The rank product method with two samples.
Breitling et al. (2004) introduced a statistical technique, the rank product method, for detecting differentially regulated genes in replicated microarray experiments. The technique has achieved widespread acceptance and is now used more broadly, in such diverse fields as RNAi analysis, proteomics, and machine learning. In this note, we extend the rank product method to the two-sample setting, provide the distribution theory for the method in this setting, and give numerical details for implementing it.
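As a rough illustration of the statistic this abstract builds on, the sketch below computes the basic rank product for replicated data: each gene is ranked within every replicate, and the ranks are combined by a geometric mean. It covers only the original replicate-based form, not the two-sample extension or its distribution theory; the function names and the toy fold changes are hypothetical.

```python
from math import prod

def ranks_desc(values):
    """Rank values so the largest gets rank 1 (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def rank_product(fold_changes):
    """Geometric mean of per-replicate ranks.

    fold_changes[r][g] is the (hypothetical) fold change of gene g
    in replicate r. A small rank product marks a gene that is
    consistently near the top of the list across replicates.
    """
    k = len(fold_changes)
    n_genes = len(fold_changes[0])
    per_rep = [ranks_desc(rep) for rep in fold_changes]
    return [
        prod(per_rep[r][g] for r in range(k)) ** (1.0 / k)
        for g in range(n_genes)
    ]

# Toy data: 2 replicates, 3 genes (values are made up for illustration).
rp = rank_product([[2.0, 0.5, 1.0], [3.0, 1.0, 0.2]])
```

Gene 0 ranks first in both replicates, so it receives the smallest rank product; in practice significance would be assessed against the null distribution of the statistic, which this sketch does not attempt.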
Statistical methods for ranking data
This book introduces advanced undergraduates, graduate students, and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions, and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, the EM algorithm, and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypothesis testing. The techniques described in the book are illustrated with examples, and the statistical software is provided on the authors’ website.
Finite sampling inequalities: an application to two-sample Kolmogorov-Smirnov statistics.
We review a finite-sampling exponential bound due to Serfling and discuss related exponential bounds for the hypergeometric distribution. We then discuss how such bounds motivate some new results for two-sample empirical processes. Our development complements recent results by Wei and Dudley (2012) concerning exponential bounds for two-sided Kolmogorov-Smirnov statistics by giving corresponding results for one-sided statistics with emphasis on "adjusted" inequalities of the type proved originally by Dvoretzky et al. (1956) and by Massart (1990) for one-sample versions of these statistics.
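For orientation, the one-sample inequalities mentioned in this abstract are usually stated as below. This is a sketch from the standard statements of those results, with F_n denoting the empirical distribution function of n i.i.d. observations from F; the notation is assumed here, and the two-sample bounds developed in the paper are not reproduced.

```latex
% Two-sided bound with Massart's constant (valid for every eps > 0)
\[
  P\Big(\sup_x \lvert F_n(x) - F(x)\rvert > \varepsilon\Big)
  \le 2\,e^{-2 n \varepsilon^{2}}, \qquad \varepsilon > 0 .
\]
% One-sided bound (valid once eps is not too small)
\[
  P\Big(\sup_x \big(F_n(x) - F(x)\big) > \varepsilon\Big)
  \le e^{-2 n \varepsilon^{2}}, \qquad
  \varepsilon \ge \sqrt{\tfrac{\ln 2}{2n}} .
\]
```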
Statistical Optimality in Multipartite Ranking and Ordinal Regression.
Statistical optimality in multipartite ranking is investigated as an extension of bipartite ranking. We consider the optimality of ranking algorithms through minimization of the theoretical risk which combines pairwise ranking errors of ordinal categories with differential ranking costs. The extension shows that for a certain class of convex loss functions including exponential loss, the optimal ranking function can be represented as a ratio of weighted conditional probability of upper categories to lower categories, where the weights are given by the misranking costs. This result also bridges traditional ranking methods such as the proportional odds model in statistics with various ranking algorithms in machine learning. Further, the analysis of multipartite ranking with different costs provides a new perspective on non-smooth list-wise ranking measures such as the discounted cumulative gain and preference learning. We illustrate our findings with a simulation study and real data analysis.
Statistical inference of Minimum Rank Factor Analysis
For any given number of factors, Minimum Rank Factor Analysis yields optimal communalities for an observed covariance matrix in the sense that the unexplained common variance with that number of factors is minimized, subject to the constraint that both the diagonal matrix of unique variances and the
Prototyping a Distributed Information Retrieval System That Uses Statistical Ranking.
Built using a distributed architecture, this prototype distributed information retrieval system uses statistical ranking techniques to provide better service to the end user. Distributed architecture was shown to be a feasible alternative to centralized or CD-ROM information retrieval, and user testing of the ranking methodology showed both…
Rank correlation among different statistical models in ranking of winter wheat genotypes
Several statistical methods have been developed for analyzing genotype × environment (GE) interactions in crop breeding programs to identify genotypes with high yield and stability performances. Four statistical methods, including joint regression analysis (JRA), additive mean effects and multiplicative interaction (AMMI) analysis, genotype plus GE interaction (GGE) biplot analysis, and yield–stability (YSi) statistic were used to evaluate GE interaction in 20 winter wheat genotypes grown in 24 environments in Iran. The main objective was to evaluate the rank correlations among the four statistical methods in genotype rankings for yield, stability and yield–stability. Three kinds of genotypic ranks (yield ranks, stability ranks, and yield–stability ranks) were determined with each method. The results indicated the presence of GE interaction, suggesting the need for stability analysis. With respect to yield, the genotype rankings by the GGE biplot and AMMI analysis were significantly correlated (P < 0.01). For stability ranking, the rank correlations ranged from 0.53 (GGE–YSi; P < 0.05) to 0.97 (JRA–YSi; P < 0.01). AMMI distance (AMMID) was highly correlated (P < 0.01) with variance of regression deviation (S2di) in JRA (r = 0.83) and Shukla stability variance (σ2) in YSi (r = 0.86), indicating that these stability indices can be used interchangeably. No correlation was found between yield ranks and stability ranks (AMMID, S2di, σ2, and GGE stability index), indicating that they measure static stability and accordingly could be used if selection is based primarily on stability. For yield–stability, rank correlation coefficients among the statistical methods varied from 0.64 (JRA–YSi; P < 0.01) to 0.89 (AMMI–YSi; P < 0.01), indicating that AMMI and YSi were closely associated in the genotype ranking for integrating yield with stability performance. Based on the results, it can be concluded that YSi was closely correlated with (i) JRA in ranking genotypes for stability and (ii) AMMI for integrating yield and stability.
Efficient nonrigid registration using ranked order statistics
Tennakoon, Ruwan B.; Bab-Hadiashar, Alireza; de Bruijne, Marleen
2013-01-01
Non-rigid image registration techniques are widely used in medical imaging applications. Due to high computational complexities of these techniques, finding appropriate registration method to both reduce the computation burden and increase the registration accuracy has become an intense area of research. In this paper we propose a fast and accurate non-rigid registration method for intra-modality volumetric images. Our approach exploits the information provided by an order statistics based segmentation method, to find the important regions for registration and use an appropriate sampling scheme to target those areas and reduce the registration computation time. A unique advantage of the proposed method is its ability to identify the point of diminishing returns and stop the registration process. Our experiments on registration of real lung CT images, with expert annotated landmarks, show...
Objective: The purpose of this study was to investigate the current situation and present a conceptual model for a clinical governance information system, using UML, in two sample hospitals. Background: Although the use of information is one of the fundamental components of clinical governance, information management unfortunately receives little attention. Material and Methods: A cross-sectional study was conducted from October 2012 to May 2013. Data were gathered through questionnaires and interviews in two sample hospitals. Face and content validity of the questionnaire were confirmed by experts. Data were first collected from a pilot hospital, revisions were made, and the final questionnaire was prepared. Data were analyzed using descriptive statistics and SPSS 16 software. Results: Based on the scenario derived from the questionnaires, UML diagrams were produced using Rational Rose 7 software. The results showed that 32.14 percent of the hospitals' indicators were calculated. No database had been designed, and 100 percent of the hospitals' clinical governance units needed one to be created. Conclusion: The clinical governance units of the hospitals do not have access to all the indicators they need to perform their mission. Defining processes, drawing models, and creating a database are essential for designing information systems. PMID:27147804
A major problem in using SVD (singular-value decomposition) as a tool in determining the effective rank of a perturbed matrix is that of distinguishing between significantly small and significantly large singular values. To this end, confidence regions are derived for the perturbed singular values of matrices with noisy observation data. The analysis is based on the theories of perturbations of singular values and statistical significance tests. Threshold bounds for perturbation due to finite-precision and i.i.d. random models are evaluated. In random models, the threshold bounds depend on the dimension of the matrix, the noise variance, and a predefined statistical level of significance. Results applied to the problem of determining the effective order of a linear autoregressive system from the approximate rank of a sample autocorrelation matrix are considered. Various numerical examples illustrating the usefulness of these bounds and comparisons to other previously known approaches are given.
Useful experimental designs and rank order statistics in educational research
Experimental educational research is of great impact because it illuminates cause-and-effect relationships by accumulating empirical evidence. The present article does not propose new methods but brings three useful experimental designs as well as appropriate statistical procedures (rank order statistics) to the attention of the reader to conduct educational experiments, even with small samples. By means of their systematic use combined with the process-product paradigm of experimental educational research, the influence of essential variables (teacher, context, and process variables) in schools, universities, and other educational institutions can be investigated. The statistical procedures described in this article guarantee that small samples (e.g. a school class) can be successfully used, and that product variables (e.g. knowledge, comprehension, transfer) are only required to meet the criteria of an ordinal scale. The experimental designs and statistical procedures are exemplified by hypothetical data and detailed calculations.
Inverted rank distributions: Macroscopic statistics, universality classes, and critical exponents
An inverted rank distribution is an infinite sequence of positive sizes ordered in a monotone increasing fashion. Interlacing together Lorenzian and oligarchic asymptotic analyses, we establish a macroscopic classification of inverted rank distributions into five “socioeconomic” universality classes: communism, socialism, criticality, feudalism, and absolute monarchy. We further establish that: (i) communism and socialism are analogous to a “disordered phase”, feudalism and absolute monarchy are analogous to an “ordered phase”, and criticality is the “phase transition” between order and disorder; (ii) the universality classes are characterized by two critical exponents, one governing the ordered phase, and the other governing the disordered phase; (iii) communism, criticality, and absolute monarchy are characterized by sharp exponent values, and are inherently deterministic; (iv) socialism is characterized by a continuous exponent range, is inherently stochastic, and is universally governed by continuous power-law statistics; (v) feudalism is characterized by a continuous exponent range, is inherently stochastic, and is universally governed by discrete exponential statistics. The results presented in this paper yield a universal macroscopic socioeconophysical perspective of inverted rank distributions.
Homogeneity and change-point detection tests for multivariate data using rank statistics
Detecting and locating changes in highly multivariate data is a major concern in several current statistical applications. In this context, the first contribution of the paper is a novel non-parametric two-sample homogeneity test for multivariate data based on the well-known Wilcoxon rank statistic. The proposed two-sample homogeneity test statistic can be extended to deal with ordinal or censored data as well as to test for the homogeneity of more than two samples. The second contribution of the paper concerns the use of the proposed test statistic to perform retrospective change-point analysis. It is first shown that the approach is computationally feasible even when looking for a large number of change-points thanks to the use of dynamic programming. Computable asymptotic $p$-values for the test are then provided in the case where a single potential change-point is to be detected. Compared to available alternatives, the proposed approach appears to be very reliable and robust. This is particularly true in ...
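The classical univariate Wilcoxon rank-sum statistic that this abstract takes as its starting point can be sketched as follows. This is only the textbook one-dimensional test with a normal approximation and no tie correction in the variance; it is not the multivariate statistic or the change-point procedure proposed in the paper, and the function names are hypothetical.

```python
from math import sqrt, erfc

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (W, z, two-sided p) for samples x and y.

    W is the sum of the pooled ranks of x; z standardizes W under the
    null hypothesis that both samples share one distribution.
    """
    n, m = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n])
    mean = n * (n + m + 1) / 2.0
    var = n * m * (n + m + 1) / 12.0  # assumes no ties
    z = (w - mean) / sqrt(var)
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return w, z, p

w, z, p = rank_sum_test([1, 2, 3], [4, 5, 6])
```

With samples this small the normal approximation is crude; an exact permutation distribution would be the usual choice.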
Statistical regularities in the rank-citation profile of scientists
Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile c_i(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each c_i(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different c_i(r) profiles, our results demonstrate the utility of the β_i scaling parameter in conjunction with h_i for quantifying individual publication impact. We show that the total number of citations C_i tallied from a scientist's N_i papers scales as [formula omitted]. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
We consider the blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for the evaluation of the probability of rejecting the null hypothesis at given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margin for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
Statistical regularities in the rank-citation profile of scientists.
Recent science of science research shows that scientific impact measures for journals and individual articles have quantifiable regularities across both time and discipline. However, little is known about the scientific impact distribution at the scale of an individual scientist. We analyze the aggregate production and impact using the rank-citation profile c(i)(r) of 200 distinguished professors and 100 assistant professors. For the entire range of paper rank r, we fit each c(i)(r) to a common distribution function. Since two scientists with equivalent Hirsch h-index can have significantly different c(i)(r) profiles, our results demonstrate the utility of the β(i) scaling parameter in conjunction with h(i) for quantifying individual publication impact. We show that the total number of citations C(i) tallied from a scientist's N(i) papers scales as [Formula: see text]. Such statistical regularities in the input-output patterns of scientists can be used as benchmarks for theoretical models of career progress.
Astronomical Site Ranking Based on Tropospheric Wind Statistics
We present comprehensive and reliable statistics of high altitude wind speeds and the tropospheric flows at the location of five important astronomical observatories. Statistical analysis exclusively of high altitude winds points to La Palma as the most suitable site for adaptive optics, with a mean value of 22.13 m/s at the 200 mbar pressure level. La Silla is at the bottom of the ranking, with the largest average 200 mbar wind speed (33.35 m/s). We have found a clear annual periodicity of high altitude winds for the five sites under study. We have also explored the connection of high to low altitude atmospheric winds as a first approach to the linear relationship between the average velocity of the turbulence and high altitude winds (Sarazin & Tokovinin 2001). We may conclude that high and low altitude winds show good linear relationships at the five selected sites. The highest correlation coefficients correspond to Paranal and San Pedro Martir, while La Palma and La Silla show similar high to low alti...
We introduce the novel concept of statistical energy as a statistical tool. We define statistical energy of statistical distributions in a similar way as for electric charge distributions. Charges of opposite sign are in a state of minimum energy if they are equally distributed. This property is used to check whether two samples belong to the same parent distribution, to define goodness-of-fit tests and to unfold distributions distorted by measurement. The approach is binning-free and especially powerful in multidimensional applications.
Few studies comprehensively evaluate which types of life stress are most strongly associated with depressive episode onsets, over and above other forms of stress, and comparisons between acute and chronic stress are particularly lacking. Past research implicates major (moderate to severe) stressful life events (SLEs), and to a lesser extent, interpersonal forms of stress; research conflicts on whether dependent or independent SLEs are more potent, but theory favors dependent SLEs. The present study used 5 years of annual diagnostic and life stress interviews of chronic stress and SLEs from 2 separate samples (Sample 1 N = 432; Sample 2 N = 146) transitioning into emerging adulthood; 1 sample also collected early adversity interviews. Multivariate analyses simultaneously examined multiple forms of life stress to test hypotheses that all major SLEs, then particularly interpersonal forms of stress, and then dependent SLEs would contribute unique variance to major depressive episode (MDE) onsets. Person-month survival analysis consistently implicated chronic interpersonal stress and major interpersonal SLEs as statistically unique predictors of risk for MDE onset. In addition, follow-up analyses demonstrated temporal precedence for chronic stress; tested differences by gender; showed that recent chronic stress mediates the relationship between adolescent adversity and later MDE onsets; and revealed interactions of several forms of stress with socioeconomic status (SES). Specifically, as SES declined, there was an increasing role for noninterpersonal chronic stress and noninterpersonal major SLEs, coupled with a decreasing role for interpersonal chronic stress. Implications for future etiological research were discussed.
Rankings & Estimates: Rankings of the States 2015 and Estimates of School Statistics 2016
The data presented in this combined report--"Rankings & Estimates"--provide facts about the extent to which local, state, and national governments commit resources to public education. As one might expect in a nation as diverse as the United States--with respect to economics, geography, and politics--the level of commitment to…
Statistical regularities in the rank-citation profile of scientists
Recent "science of science" research shows common regularities in the publication patterns of scientific papers across time and discipline. Here we analyze the complete publication careers of 300 scientists and find remarkable regularity in the functional form of the rank-citation profile c_{i}(r) for each scientist i =1...300. We find that the rank-ordered citation distribution c_{i}(r) can be approximated by a discrete generalized beta distribution (DGBD) over the entire range of ranks r, which allows for the characterization and comparison of c_{i}(r) using a common framework. The functional form of the DGBD has two scaling exponents, beta_i and gamma_i, which determine the scaling behavior of c_{i}(r) for both small and large rank r. The crossover between two scaling regimes suggests a complex reinforcement or positive-feedback relation between the impact of a scientist's most famous papers and the impact of his/her other papers. Moreover, since two scientists with equivalent Hirsch h-index values may hav...
Reliable detection of directional couplings using rank statistics.
To detect directional couplings from time series various measures based on distances in reconstructed state spaces were introduced. These measures can, however, be biased by asymmetries in the dynamics' structure, noise color, or noise level, which are ubiquitous in experimental signals. Using theoretical reasoning and results from model systems we identify the various sources of bias and show that most of them can be eliminated by an appropriate normalization. We furthermore diminish the remaining biases by introducing a measure based on ranks of distances. This rank-based measure outperforms existing distance-based measures concerning both sensitivity and specificity for directional couplings. Therefore, our findings are relevant for a reliable detection of directional couplings from experimental signals.
Statistics for Ranking Program Committees and Editorial Boards
Ranking groups of researchers is important in several contexts and can serve many purposes such as the fair distribution of grants based on the scientist's publication output, concession of research projects, classification of journal editorial boards and many other applications in a social context. In this paper, we propose a method for measuring the performance of groups of researchers. The proposed method is called alpha-index and it is based on two parameters: (i) the homogeneity of the h-indexes of the researchers in the group; and (ii) the h-group, which is an extension of the h-index for groups. Our method integrates the concepts of homogeneity and absolute value of the h-index into a single measure which is appropriate for the evaluation of groups. We report on experiments that assess computer science conferences based on the h-indexes of their program committee members. Our results are similar to a manual classification scheme adopted by a research agency.
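The h-index on which the proposed measure is built has a simple definition: the largest h such that h papers each have at least h citations. A minimal sketch of that definition follows; the alpha-index and the h-group are not specified in enough detail in this abstract to implement, so they are left out, and the function name is hypothetical.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h

# Hypothetical citation counts for one researcher.
example = h_index([10, 8, 5, 4, 3])
```

Applied to each member of a committee, the resulting list of h-indexes is the kind of input a group-level measure would then summarize.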
Nonrigid registration of volumetric images using ranked order statistics
Non-rigid image registration techniques using intensity based similarity measures are widely used in medical imaging applications. Due to high computational complexities of these techniques, particularly for volumetric images, finding appropriate registration methods to both reduce the computation burden and increase the registration accuracy has become an intensive area of research. In this paper we propose a fast and accurate non-rigid registration method for intra-modality volumetric images. Our approach exploits the information provided by an order statistics based segmentation method, to find the important regions for registration and use an appropriate sampling scheme to target those areas and reduce the registration computation time. A unique advantage of the proposed method is its ability to identify the point of diminishing returns and stop the registration process. Our experiments...
Poisson statistics of PageRank probabilities of Twitter and Wikipedia networks
We use the methods of quantum chaos and Random Matrix Theory for analysis of statistical fluctuations of PageRank probabilities in directed networks. In this approach the effective energy levels are given by a logarithm of PageRank probability at a given node. After the standard energy level unfolding procedure we establish that the nearest spacing distribution of PageRank probabilities is described by the Poisson law typical for integrable quantum systems. Our studies are done for the Twitter network and three networks of Wikipedia editions in English, French and German. We argue that due to absence of level repulsion the PageRank order of nearby nodes can be easily interchanged. The obtained Poisson law implies that the nearby PageRank probabilities fluctuate as random independent variables.
Comparison of three summary statistics for ranking genes in genome-wide association studies.
Problems associated with insufficient power have haunted the analysis of genome-wide association studies and are likely to be the main challenge for the analysis of next-generation sequencing data. Ranking genes according to their strength of association with the investigated phenotype is one solution. To obtain rankings for genes, researchers can draw from a wide range of statistics summarizing the relationships between variants mapped to a gene and the phenotype. Hence, it is of interest to explore the performance of these statistics in the context of rankings. To this end, we conducted a simulation study (limited to genes of equal sizes) of three different summary statistics examining the ability to rank genes in a meaningful order. The weighted sum of squared marginal score test (Pan, 2009), the RareCover algorithm (Bhatia et al., 2010), and elastic net regularization (Zou and Hastie, 2005) were chosen because they can handle common as well as rare variants. The test based on the score statistic outperformed both other methods in almost all investigated scenarios. It was the only measure to consistently detect genes with interacting causal variants. However, the RareCover algorithm proved better at identifying genes including causal variants with small effect sizes and low minor allele frequency than the weighted sum of squared marginal score test. The performance of the elastic net regularization was unimpressive for all but the simplest scenarios. Copyright © 2013 John Wiley & Sons, Ltd.
Statistical reliability and path diversity based PageRank algorithm improvements
Hong, Dohy
In this paper we present new improvement ideas of the original PageRank algorithm. The first idea is to introduce an evaluation of the statistical reliability of the ranking score of each node based on the local graph property and the second one is to introduce the notion of the path diversity. The path diversity can be exploited to dynamically modify the increment value of each node in the random surfer model or to dynamically adapt the damping factor. We illustrate the impact of such modifications through examples and simple simulations.
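For reference, the baseline that such modifications start from can be sketched as a plain power iteration on the random surfer model with a fixed damping factor. This is the standard textbook formulation, not the reliability-weighted or path-diversity variants proposed in the paper; the function name, the default damping of 0.85, and the uniform handling of dangling nodes are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank.

    links maps each node to the list of nodes it points to. Rank held
    by nodes without out-links is spread uniformly over all nodes.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        # Teleportation term plus the redistributed dangling mass.
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical three-node graph: a and b both link to c, c links nowhere.
scores = pagerank({"a": ["c"], "b": ["c"], "c": []})
```

A variant that adapts the damping factor dynamically would replace the constant `damping` inside the loop; how that adaptation is defined is specific to the paper and is not attempted here.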
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
Linguistic Analysis of the Human Heartbeat Using Frequency and Rank Order Statistics
Complex physiologic signals may carry unique dynamical signatures that are related to their underlying mechanisms. We present a method based on rank order statistics of symbolic sequences to investigate the profile of different types of physiologic dynamics. We apply this method to heart rate fluctuations, the output of a central physiologic control system. The method robustly discriminates patterns generated from healthy and pathologic states, as well as aging. Furthermore, we observe increased randomness in the heartbeat time series with physiologic aging and pathologic states and also uncover nonrandom patterns in the ventricular response to atrial fibrillation.
Statistical regularities in the rank-citation profile of individual scientists
Citation counts and paper tallies are ubiquitous in the achievement ratings of individual scientists. As a result, there have been many recent studies which propose measures for scientific impact (e.g. the $h$-index) and the distribution of impact measures among scientists. However, being just a single number, the $h$-index cannot account for the full impact information contained in an author's set of publications. Alternative "single-number" indices are also frequently proposed, but they too suffer from the shortfalls of not being comprehensive. In this talk I will discuss an alternative approach, which is to analyze the fundamental properties of the entire rank-citation profile (from which all single-value indices are derived). Using the complete publication careers of 200 highly-cited physicists and 100 assistant professors, I will demonstrate remarkable statistical regularity in the functional form of the rank-citation profile $c_i(r)$ for each physicist $i = 1, \ldots, 300$. We find that $c_i(r)$ can be approximated by a discrete generalized beta distribution over the entire range of ranks $r$, which allows for the characterization and comparison of $c_i(r)$ using a common framework. Since two scientists can have equivalent $h_i$ values while having different $c_i(r)$, our results demonstrate the utility of a scaling parameter, $\beta_i$, in conjunction with $h_i$, to quantify a scientist's publication impact.
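The point that equal $h$ values can hide very different rank-citation profiles is easy to check numerically. The sketch below uses the standard definition of the $h$-index (the largest $h$ such that $h$ papers have at least $h$ citations each); the function name `h_index` and the two citation lists are invented for illustration and are not data from the talk.

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # the rank-citation profile c(r)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Two made-up profiles with very different shapes.
profile_a = [100, 50, 4, 4, 1]
profile_b = [4, 4, 4, 4]
h_a = h_index(profile_a)
h_b = h_index(profile_b)
```

Both profiles give the same $h$ even though the first contains two highly cited papers, which is the information loss that a profile-level description is meant to recover.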
In this paper we discuss and question the use of statistical significance tests in relation to university rankings as recently suggested. We outline the assumptions behind and interpretations of statistical significance tests and relate this to examples from the recent SCImago Institutions Ranking....... By use of statistical power analyses and demonstration of effect sizes, we emphasize that the importance of empirical findings lies in “differences that make a difference” and not statistical significance tests per se. Finally we discuss the crucial assumption of randomness and question the presumption...... that randomness is present in the university ranking data. We conclude that the application of statistical significance tests in relation to university rankings, as recently advocated, is problematic and can be misleading....
Reproducibility-optimized test statistic for ranking genes in microarray studies.
A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows good performance consistently under various simulated conditions and on an Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.
In recent years, there has been a renewed interest in applying statistical ranking criteria to identify sites on a road network, which potentially present high traffic crash risks or are over-represented in certain type of crashes, for further engineering evaluation and safety improvement. This requires that good estimates of ranks of crash risks be obtained at individual intersections or road segments, or some analysis zones. The nature of this site ranking problem in roadway safety is related to two well-established statistical problems known as the small area (or domain) estimation problem and the disease mapping problem. The former arises in the context of providing estimates using sample survey data for a small geographical area or a small socio-demographic group in a large area, while the latter stems from estimating rare disease incidences for typically small geographical areas. The statistical problem is such that direct estimates of certain parameters associated with a site (or a group of sites) with adequate precision cannot be produced, due to a small available sample size, the rareness of the event of interest, and/or a small exposed population or sub-population in question. Model based approaches have offered several advantages to these estimation problems, including increased precision by "borrowing strengths" across the various sites based on available auxiliary variables, including their relative locations in space. Within the model based approach, generalized linear mixed models (GLMM) have played key roles in addressing these problems for many years. The objective of the study, on which this paper is based, was to explore some of the issues raised in recent roadway safety studies regarding ranking methodologies in light of the recent statistical development in space-time GLMM. First, general ranking approaches are reviewed, which include naïve or raw crash-risk ranking, scan based ranking, and model based ranking. Through simulations, the
In the random censorship model, the log-rank test is often used for comparing a control group with different dose groups. If the number of tumors is small, so-called exact methods are often applied for computing critical values from a permutational distribution. Two of these exact methods are discussed and shown to be incorrect. The correct permutational distribution is derived and studied with respect to its behavior under unequal censoring in the light of recent results proving that the permutational version and the unconditional version of the log-rank test are asymptotically equivalent even under unequal censoring. The log-rank test is studied by simulations of a realistic scenario from a bioassay with small numbers of tumors.
Improving Statistical Machine Translation Through N-best List Re-ranking and Optimization
2014-03-27
Language Processing, 1352–1362. Association for Computational Linguistics, Edinburgh, Scotland , UK., July 2011. URL http://www.aclweb.org/anthology/D11-1125...Josef. “Minimum Error Rate Training in Statistical Machine Translation”. Erhard Hinrichs and Dan Roth (editors), Proceedings of the 41st Annual Meeting of
EasyGene – a prokaryotic gene finder that ranks ORFs by statistical significance
Background: Contrary to other areas of sequence analysis, a measure of statistical significance of a putative gene has not been devised to help in discriminating real genes from the masses of random Open Reading Frames (ORFs) in prokaryotic genomes. Therefore, many genomes have too many short ORFs annotated as genes. Results: In this paper, we present a new automated gene-finding method, EasyGene, which estimates the statistical significance of a predicted gene. The gene finder is based on a hidden Markov model (HMM) that is automatically estimated for a new genome. Using extensions of similarities in Swiss-Prot, a high quality training set of genes is automatically extracted from the genome and used to estimate the HMM. Putative genes are then scored with the HMM, and based on score and length of an ORF, the statistical significance is calculated. The measure of statistical significance for an ORF is the expected number of ORFs in one megabase of random sequence at the same significance level or better, where the random sequence has the same statistics as the genome in the sense of a third order Markov chain. Conclusions: The result is a flexible gene finder whose overall performance matches or exceeds other methods. The entire pipeline of computer processing from the raw input of a genome or set of contigs to a list of putative genes with significance is automated, making it easy to apply EasyGene to newly sequenced organisms. EasyGene with pre-trained models can be accessed at http://www.cbs.dtu.dk/services/EasyGene.
High-throughput sequencing techniques are increasingly affordable and produce massive amounts of data. Together with other high-throughput technologies, such as microarrays, there is an enormous amount of resources in databases. The collection of these valuable data has been routine for more than a decade. Despite different technologies, many experiments share the same goal. For instance, the aims of RNA-seq studies often coincide with those of differential gene expression experiments based on microarrays. As such, it would be logical to utilize all available data. However, there is a lack of biostatistical tools for the integration of results obtained from different technologies. Although diverse technological platforms produce different raw data, one commonality for experiments with the same goal is that all the outcomes can be transformed into a platform-independent data format, rankings, for the same set of items. Here we present the R package TopKLists, which allows for statistical inference on the lengths of informative (top-k) partial lists, for stochastic aggregation of full or partial lists, and for graphical exploration of the input and consolidated output. A graphical user interface has also been implemented for providing access to the underlying algorithms. To illustrate the applicability and usefulness of the package, we integrated microRNA data of non-small cell lung cancer across different measurement techniques and drew conclusions. The package can be obtained from CRAN under an LGPL-3 license.
GOODNESS-OF-FIT TEST ON TWO SAMPLES
In this paper, a new statistic for testing whether two samples come from the same population is derived from a simple linear model with an artificial parameter. Its limit distribution is a chi-squared distribution with 2 degrees of freedom under the null hypothesis and a noncentral chi-squared distribution with 2 degrees of freedom under a certain sequence of alternative hypotheses. Finally, we compare its power with that of other two-sample tests, in particular the Smirnov statistic.
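A chi-squared limit with 2 degrees of freedom is convenient in practice because its tail probability has a closed form, $P(\chi^2_2 > x) = e^{-x/2}$. The sketch below uses only that standard identity to turn an observed statistic into an asymptotic p-value; it does not compute the paper's statistic, and the function names are assumptions made for illustration.

```python
import math


def chi2_2df_sf(x):
    """Upper tail probability P(X > x) for X ~ chi-squared with 2 d.f."""
    if x < 0:
        return 1.0
    return math.exp(-x / 2.0)


def chi2_2df_critical(alpha):
    """Critical value c with P(X > c) = alpha, obtained by inverting exp(-c/2)."""
    return -2.0 * math.log(alpha)


critical_5pct = chi2_2df_critical(0.05)  # about 5.99
p_value = chi2_2df_sf(7.2)               # asymptotic p-value for an observed value of 7.2
```

Any statistic whose null limit is $\chi^2_2$ can be referred to the same threshold: reject at the 5% level when the observed value exceeds roughly 5.99.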
Two sampling techniques for game meat.
A study was conducted to compare the excision sampling technique used by the export market and the sampling technique preferred by European countries, namely the biotrace cattle and swine test. The measuring unit for the excision sampling was grams (g) and square centimetres (cm2) for the swabbing technique. The two techniques were compared after a pilot test was conducted on spiked approved beef carcasses (n = 12) that statistically proved the two measuring units correlated. The two sampling techniques were conducted on the same game carcasses (n = 13) and analyses performed for aerobic plate count (APC), Escherichia coli and Staphylococcus aureus, for both techniques. A more representative result was obtained by swabbing and no damage was caused to the carcass. Conversely, the excision technique yielded fewer organisms and caused minor damage to the carcass. The recovery ratio from the sampling technique improved 5.4 times for APC, 108.0 times for E. coli and 3.4 times for S. aureus over the results obtained from the excision technique. It was concluded that the sampling methods of excision and swabbing can be used to obtain bacterial profiles from both export and local carcasses and could be used to indicate whether game carcasses intended for the local market are possibly on par with game carcasses intended for the export market and therefore safe for human consumption.
In Environmental Epidemiology, long lists of relative risk estimates from exposed populations are compared to a reference to scrutinize the dataset for extremes. Here, inference on disease profiles for given areas, or for fixed disease population signatures, is of interest, and summaries can be obtained by averaging over areas or diseases. We have developed a multivariate hierarchical Bayesian approach to estimate posterior rank distributions and we show how to produce league tables of ranks with credibility intervals useful to address the above mentioned inferential problems. Applying the procedure to a real dataset from the report “Environment and Health in Sardinia (Italy)”, we selected 18 areas characterized by high environmental pressure from industrial, mining or military activities, investigated for 29 causes of death among male residents. Ranking diseases highlighted the increased burdens of neoplastic (cancerous) and non-neoplastic respiratory diseases in the heavily polluted area of Portoscuso. The averaged ranks by disease over areas showed lung cancer among the three highest positions.
Quantum communication is concerned with the complexity of entanglement of a state, and statistical data analysis is concerned with the complexity of a model. A common keyword for both is "rank". In this paper we show that both communities are pursuing the same target and that the methods used are slightly different. Two different methods, the range criterion method from quantum communication and the determinant polynomial method, are shown as examples.
Background: In many areas of medical research, a bivariate analysis is desirable because it simultaneously tests two response variables that are of equal interest and importance in two populations. Several parametric and nonparametric bivariate procedures are available for the location problem but each of them requires a series of stringent assumptions such as specific distribution, affine-invariance or elliptical symmetry. The aim of this study is to propose a powerful test statistic that requires none of the aforementioned assumptions. We have reduced the bivariate problem to the univariate problem of sum or subtraction of measurements. A simple bivariate test for the difference in location between two populations is proposed. Method: In this study the proposed test is compared with Hotelling's $T^2$ test, the two-sample Rank test, the Cramer test for the multivariate two-sample problem and Mathur's test using Monte Carlo simulation techniques. The power study shows that the proposed test performs better than any of its competitors for most of the populations considered and is equivalent to the Rank test in specific distributions. Conclusions: Using simulation studies, we show that the proposed test will perform much better under different conditions of underlying population distribution such as normality or non-normality, skewed or symmetric, medium tailed or heavy tailed. The test is therefore recommended for practical applications because it is more powerful than any of the alternatives compared in this paper for almost all the shifts in location and in any direction.
In an academic world that is becoming ever more competitive, we would like to know which university is ‘the best’. Various rankings are at our beck and call, including Times Higher Education, Shanghai, QS and Leiden. The criticism of these lists is harsh, however, partly because universities like to
The embryonic stem cell test (EST) is applied as a model system for detection of embryotoxicants. The application of transcriptomics allows a more detailed effect assessment compared to the morphological endpoint. Genes involved in cell differentiation, modulated by chemical exposures, may be useful as biomarkers of developmental toxicity. We describe a statistical approach to obtain a predictive gene set for toxicity potency ranking of compounds within one class. This resulted in a gene set based on differential gene expression across concentration-response series of phthalatic monoesters. We determined the concentration at which gene expression was changed at least 1.5-fold. Genes responding with the same potency ranking in vitro and in vivo embryotoxicity were selected. A leave-one-out cross-validation showed that the relative potency of each phthalate was always predicted correctly. The classical morphological 50% effect level (ID50) in EST was similar to the predicted concentration using gene set expression responses. A general down-regulation of development-related genes and up-regulation of cell-cycle related genes was observed, reminiscent of the differentiation inhibition in EST. This study illustrates the feasibility of applying dedicated gene set selections as biomarkers for developmental toxicity potency ranking on the basis of in vitro testing in the EST.
Declining recognition of top university lists prompts China to look for new ways to evaluate its higher learning institutions. Zhejiang University for the first time has overtaken Peking University and Tsinghua University to rank No. 1 on the latest list of Chinese college rankings. The rankings are an important part of the book Picking Your University and
Optimal tests for the two-sample spherical location problem
We tackle the classical two-sample spherical location problem for directional data by having recourse to the Le Cam methodology, habitually used in classical "linear" multivariate analysis. More precisely we construct locally and asymptotically optimal (in the maximin sense) parametric tests, which we then turn into semi-parametric ones in two distinct ways. First, by using a studentization argument; this leads to so-called pseudo-FvML tests. Second, by resorting to the invariance principle; this leads to efficient rank-based tests. Within each construction, the semi-parametric tests inherit optimality under a given distribution (the FvML in the first case, any rotationally symmetric one in the second) from their parametric counterparts and also improve on the latter by being valid under the whole class of rotationally symmetric distributions. Asymptotic relative efficiencies are calculated and the finite-sample behavior of the proposed tests is investigated by means of a Monte Carlo simulation.
Currently the ranking of scientists is based on the $h$-index, which is widely perceived as an imprecise and simplistic though still useful metric. We find that the $h$-index actually favours modestly performing researchers and propose a simple criterion for proper ranking.
University rankings: The web ranking
The publication in 2003 of the Ranking of Universities by Jiao Tong University of Shanghai has not only revolutionized academic studies on higher education but has also had an important impact on the national policies and the individual strategies of the sector. The work gathers the main characteristics of this and other global university rankings, paying special attention to their potential benefits and limitations. The Web Ranking is analyzed in depth, presenting the model on which its compound indicator is based and analyzing its different variables and main results. DOI: 10.18870/hlrc.v2i1.56. The PDF document contains both the original in Spanish and an English translation.
This note attempts to sketch the history of spectral ranking, a general umbrella name for techniques that apply the theory of linear maps (in particular, eigenvalues and eigenvectors) to matrices that do not represent geometric transformations, but rather some kind of relationship between entities. Although recently made famous by the ample press coverage of Google's PageRank algorithm, spectral ranking was devised more than fifty years ago, in almost exactly the same terms, and has been studied in psychology and the social sciences. I will try to describe it in precise and modern mathematical terms, highlighting along the way the contributions of previous scholars.
Objective: Choosing the most efficient statistical test is one of the essential problems of statistics. Asymptotic relative efficiency is a notion that enables the quantitative comparison, in large samples, of two different tests of the same statistical hypothesis. The notion of the asymptotic efficiency of tests is more complicated than that of the asymptotic efficiency of estimates. This paper discusses the effect of sample size on the expected values and variances of non-parametric tests for two independent samples and determines the most effective test for different sample sizes using the Fraser efficiency value. Material and Methods: Since calculating the power value when comparing tests is often impractical, using the asymptotic relative efficiency value is favorable. Asymptotic relative efficiency is an indispensable technique for comparing and ordering statistical tests in large samples. It is especially useful in nonparametric statistics, where there exist numerous heuristic tests such as the linear rank tests. In this study, the sample size ranges over 2 ≤ n ≤ 50. Results: In both balanced and unbalanced cases, the expected values and variances of all the tests discussed in this paper increase as the sample size increases. Additionally, according to the Fraser efficiency, the Mann-Whitney U test is the most efficient of the non-parametric tests used to compare two independent samples, regardless of their sizes. Conclusion: According to the Fraser efficiency, the Mann-Whitney U test is the most efficient test.
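To make the statement about growing expected values and variances concrete, here is a minimal sketch of the Mann-Whitney U statistic together with its standard null moments for samples without ties, $E[U] = mn/2$ and $\mathrm{Var}[U] = mn(m+n+1)/12$. It does not compute the Fraser efficiency discussed in the abstract; the function names and the sample values are invented for illustration.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x: number of pairs (x_i, y_j) with x_i > y_j.

    Tied pairs count one half, the usual convention.
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def u_null_moments(m, n):
    """Mean and variance of U under the null hypothesis, assuming no ties."""
    mean = m * n / 2.0
    variance = m * n * (m + n + 1) / 12.0
    return mean, variance


x = [1.1, 2.3, 3.8]
y = [0.5, 2.0, 4.1, 5.0]
u_x = mann_whitney_u(x, y)
u_y = mann_whitney_u(y, x)
mean_u, var_u = u_null_moments(len(x), len(y))
```

Both moments grow with the sample sizes $m$ and $n$, consistent with the trend reported in the Results; the two U values always sum to $mn$ when there are no ties.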
The two-sample problem with induced dependent censorship.
Induced dependent censorship is a general phenomenon in health service evaluation studies in which a measure such as quality-adjusted survival time or lifetime medical cost is of interest. We investigate the two-sample problem and propose two classes of nonparametric tests. Based on consistent estimation of the survival function for each sample, the two classes of test statistics examine the cumulative weighted difference in hazard functions and in survival functions. We derive a unified asymptotic null distribution theory and inference procedure. The tests are applied to trial V of the International Breast Cancer Study Group and show that long duration chemotherapy significantly improves time without symptoms of disease and toxicity of treatment as compared with the short duration treatment. Simulation studies demonstrate that the proposed tests, with a wide range of weight choices, perform well under moderate sample sizes.
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Testing Homogeneity in a Semiparametric Two-Sample Problem
We study a two-sample homogeneity testing problem, in which one sample comes from a population with density $f(x)$ and the other is from a mixture population with mixture density $(1-\lambda)f(x)+\lambda g(x)$. This problem arises naturally from many statistical applications such as tests for partial differential gene expression in microarray studies or genetic studies for gene mutation. Under the semiparametric assumption $g(x)=f(x)e^{\alpha+\beta x}$, a penalized empirical likelihood ratio test could be constructed, but its implementation is hindered by the fact that there is neither a feasible algorithm for computing the test statistic nor available research results on its theoretical properties. To circumvent these difficulties, we propose an EM test based on the penalized empirical likelihood. We prove that the EM test has a simple chi-square limiting distribution, and we also demonstrate its competitive testing performance by simulations. A real-data example is used to illustrate the proposed methodology.
Multivariate generalizations of the Wald--Wolfowitz and Smirnov two-sample tests
Multivariate generalizations of the Wald--Wolfowitz runs statistic and the Smirnov maximum deviation statistic for the two-sample problem are presented. They are based on the minimal spanning tree of the pooled sample points. Some null distribution results are derived and a simulation study of power is reported. 5 figures, 2 tables.
A MODIFIED LIKELIHOOD RATIO TEST FOR HOMOGENEITY IN BIVARIATE NORMAL MIXTURES OF TWO SAMPLES
This paper investigates the asymptotic properties of a modified likelihood ratio statistic for testing homogeneity in bivariate normal mixture models of two samples. The asymptotic null distribution of the modified likelihood ratio statistic is found to be $\chi^2_2$, a chi-squared distribution with 2 degrees of freedom.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Wikipedia ranking of world universities
We use the directed networks between articles of 24 Wikipedia language editions to produce the Wikipedia Ranking of World Universities (WRWU) using the PageRank, 2DRank and CheiRank algorithms. This approach allows us to incorporate various cultural views on world universities using a mathematical statistical analysis independent of cultural preferences. The Wikipedia ranking of the top 100 universities has about 60% overlap with the Shanghai university ranking, demonstrating the reliability of this approach. At the same time, WRWU incorporates all knowledge accumulated in 24 Wikipedia editions, giving greater prominence to historically important universities and leading to a different estimation of the efficiency of world countries in university education. The historical development of university ranking is analyzed over ten centuries of their history.
The Asymptotics of Ranking Algorithms
We consider the predictive problem of supervised ranking, where the task is to rank sets of candidate items returned in response to queries. Although there exist statistical procedures that come with guarantees of consistency in this setting, these procedures require that individuals provide a complete ranking of all items, which is rarely feasible in practice. Instead, individuals routinely provide partial preference information, such as pairwise comparisons of items, and more practical approaches to ranking have aimed at modeling this partial preference data directly. As we show, however, such an approach has serious theoretical shortcomings. Indeed, we demonstrate that many commonly used surrogate losses for pairwise comparison data do not yield consistency; surprisingly, we show inconsistency even in low-noise settings. With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences and develop $U$-statistic-based empirical risk minimi...
Time evolution of Wikipedia network ranking
We study the time evolution of ranking and spectral properties of the Google matrix of the English Wikipedia hyperlink network during the years 2003-2011. The statistical properties of the ranking of Wikipedia articles via PageRank and CheiRank probabilities, as well as the matrix spectrum, are shown to be stabilized for 2007-2011. Special emphasis is placed on the ranking of Wikipedia personalities and universities. We show that PageRank selection is dominated by politicians while 2DRank, which combines PageRank and CheiRank, gives more weight to personalities of the arts. The Wikipedia PageRank of universities recovers 80% of the top universities of the Shanghai ranking during the considered time period.
A C++ Program for the Cramér-Von Mises Two-Sample Test
As larger sets of high-throughput data in genomics and proteomics become more readily available, there is a growing need for fast algorithms designed to compute exact p values of distribution-free statistical tests. We present a program for computing the exact distribution of the two-sample Cramér-von Mises test statistic under the null hypothesis that the two samples are drawn from the same continuous distribution. The program makes it possible to handle substantially larger sample sizes than earlier proposed computational tools. The C++ source code for the program is published with this paper, and an R package is under development.
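As a point of reference for what such a program computes, the two-sample Cramér-von Mises statistic can be written as $T = \frac{NM}{(N+M)^2}\sum_{z}\,[F_N(z) - G_M(z)]^2$, where the sum runs over the pooled observations and $F_N$, $G_M$ are the two empirical distribution functions. The sketch below evaluates $T$ and obtains an exact p-value by brute-force enumeration of all splits of the pooled sample. This is not the algorithm of the published C++ program: the enumeration is only feasible for very small samples, and the function names are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import combinations


def cvm_two_sample(x, y):
    """Two-sample Cramér-von Mises statistic from the empirical CDFs."""
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    total = 0.0
    for z in xs + ys:
        f = bisect_right(xs, z) / n  # empirical CDF of x at z
        g = bisect_right(ys, z) / m  # empirical CDF of y at z
        total += (f - g) ** 2
    return n * m / (n + m) ** 2 * total


def cvm_exact_p_value(x, y):
    """Exact permutation p-value by enumerating every relabelling.

    The cost grows as C(N + M, N), so this is a teaching sketch only.
    """
    pooled = list(x) + list(y)
    observed = cvm_two_sample(x, y)
    hits = count = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        count += 1
        # Small tolerance guards against floating-point round-off in ties.
        if cvm_two_sample(xs, ys) >= observed - 1e-12:
            hits += 1
    return hits / count


t_stat = cvm_two_sample([1, 3], [2, 4])
p_exact = cvm_exact_p_value([1, 3], [2, 4])
```

With perfectly interleaved samples every relabelling gives a statistic at least as large as the observed one, so the p-value is 1. The need to avoid this combinatorial enumeration is what motivates dedicated exact-distribution programs.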
In recent years, immunological science has evolved, and cancer vaccines are now approved and available for treating existing cancers. Because cancer vaccines require time to elicit an immune response, a delayed treatment effect is expected and is actually observed in drug approval studies. Accordingly, we propose the evaluation of survival endpoints by weighted log-rank tests with the Fleming-Harrington class of weights. We consider group sequential monitoring, which allows early efficacy stopping, and determine a semiparametric information fraction for the Fleming-Harrington family of weights, which is necessary for the error spending function. Moreover, we give a flexible survival model in cancer vaccine studies that considers not only the delayed treatment effect but also the long-term survivors. In a Monte Carlo simulation study, we illustrate that when the primary analysis is a weighted log-rank test emphasizing the late differences, the proposed information fraction can be a useful alternative to the surrogate information fraction, which is proportional to the number of events. Copyright © 2016 John Wiley & Sons, Ltd.
Ranking Theory and Conditional Reasoning.
Ranking theory is a formal epistemology that has been developed in over 600 pages in Spohn's recent book The Laws of Belief, which aims to provide a normative account of the dynamics of beliefs that presents an alternative to current probabilistic approaches. It has long been received in the AI community, but it has not yet found application in experimental psychology. The purpose of this paper is to derive clear, quantitative predictions by exploiting a parallel between ranking theory and a statistical model called logistic regression. This approach is illustrated by the development of a model for the conditional inference task using Spohn's (2013) ranking theoretic approach to conditionals.
2005-01-01
The present study examines the relation between psychopathy and the Big Five dimensions of personality in two samples of adolescents. Specifically, the study tests the hypothesis that the aspect of psychopathy representing selfishness, callousness, and interpersonal manipulation (Factor 1) is most strongly associated with low Agreeableness,…
The reduced rank regression model is a multivariate regression model with a coefficient matrix with reduced rank. The reduced rank regression algorithm is an estimation procedure, which estimates the reduced rank regression model. It is related to canonical correlations and involves calculating e...
On Wasserstein Two-Sample Testing and Related Families of Nonparametric Tests
Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing’s classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
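Two of the univariate quantities mentioned in this abstract can be computed in a few lines. The sketch below assumes real-valued samples and, for the Wasserstein distance, equal sample sizes, in which case the 1-Wasserstein distance reduces to the mean absolute difference between the sorted samples. The entropic smoothing and the multivariate statistics discussed in the survey are not shown, and the function names are illustrative.

```python
def wasserstein_1d(x, y):
    """1-Wasserstein distance between two equal-size samples on the real line."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    # In one dimension the optimal coupling matches order statistics.
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


def ks_statistic(x, y):
    """Two-sided Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    n, m = len(x), len(y)
    return max(
        abs(sum(v <= z for v in x) / n - sum(v <= z for v in y) / m)
        for z in list(x) + list(y)
    )


w = wasserstein_1d([0, 1, 2], [1, 2, 3])   # a pure shift by 1
ks = ks_statistic([0, 1, 2], [1, 2, 3])
```

For a pure shift of the sample by one unit, the Wasserstein distance equals the shift, whereas the Kolmogorov-Smirnov statistic depends only on how the two samples interleave.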
A tilting approach to ranking influence
We suggest a new approach, which is applicable for general statistics computed from random samples of univariate or vector-valued or functional data, to assessing the influence that individual data have on the value of a statistic, and to ranking the data in terms of that influence. Our method is based on, first, perturbing the value of the statistic by ‘tilting’, or reweighting, each data value, where the total amount of tilt is constrained to be the least possible, subject to achieving a given small perturbation of the statistic, and, then, taking the ranking of the influence of data values to be that which corresponds to ranking the changes in data weights. It is shown, both theoretically and numerically, that this ranking does not depend on the size of the perturbation, provided that the perturbation is sufficiently small. That simple result leads directly to an elegant geometric interpretation of the ranks; they are the ranks of the lengths of projections of the weights onto a ‘line’ determined by the first empirical principal component function in a generalized measure of covariance. To illustrate the generality of the method we introduce and explore it in the case of functional data, where (for example) it leads to generalized boxplots. The method has the advantage of providing an interpretable ranking that depends on the statistic under consideration. For example, the ranking of data, in terms of their influence on the value of a statistic, is different for a measure of location and for a measure of scale. This is as it should be; a ranking of data in terms of their influence should depend on the manner in which the data are used. Additionally, the ranking recognizes, rather than ignores, sign, and in particular can identify left- and right-hand ‘tails’ of the distribution of a random function or vector.
Nonparametric multivariate rank tests and their unbiasedness
Although unbiasedness is a basic property of a good test, many tests on vector parameters or scalar parameters against two-sided alternatives are not finite-sample unbiased. This was already noticed by Sugiura [Ann. Inst. Statist. Math. 17 (1965) 261--263]; he found an alternative against which the Wilcoxon test is not unbiased. The problem is even more serious in multivariate models. When testing the hypothesis against an alternative which fits well with the experiment, it should be verified whether the power of the test under this alternative cannot be smaller than the significance level. Surprisingly, this serious problem is not frequently considered in the literature. The present paper considers the two-sample multivariate testing problem. We construct several rank tests which are finite-sample unbiased against a broad class of location/scale alternatives and are finite-sample distribution-free under the hypothesis and alternatives. Each of them is locally most powerful against a specific alternative of t...
Comments on the rank product method for analyzing replicated experiments.
Breitling et al. introduced a statistical technique, the rank product method, for detecting differentially regulated genes in replicated microarray experiments. The technique has achieved widespread acceptance and is now used more broadly, in such diverse fields as RNAi analysis, proteomics, and machine learning. In this note, we relate the rank product method to linear rank statistics and provide an alternative derivation of distribution theory attending the rank product method.
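For orientation, the basic rank product can be sketched as follows: within each replicate, genes are ranked by fold change (rank 1 for the largest value), and the rank product of a gene is the geometric mean of its ranks across replicates. The sketch assumes no ties and does not cover the distribution theory discussed in the note; the data and function names are made up for illustration.

```python
import math


def ranks_descending(values):
    """Rank 1 goes to the largest value; ties are assumed absent."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks


def rank_products(replicates):
    """Geometric mean of each gene's ranks across replicates.

    replicates: list of lists, one inner list of fold changes per replicate,
    with genes in the same order in every replicate.
    """
    k = len(replicates)
    n_genes = len(replicates[0])
    per_replicate = [ranks_descending(rep) for rep in replicates]
    # Summing logs avoids overflow when there are many replicates.
    return [
        math.exp(sum(math.log(r[g]) for r in per_replicate) / k)
        for g in range(n_genes)
    ]


# Hypothetical fold changes for three genes in two replicates.
rp = rank_products([[2.0, 0.5, 1.0], [1.8, 0.2, 0.9]])
```

A small rank product indicates a gene that is consistently near the top of the list in every replicate.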
Ranking Operations Management Conferences
Several publications have appeared in the field of Operations Management which rank Operations Management related journals. Several ranking systems exist for journals based on, for example, perceived relevance and quality, citation, and author affiliation. Many academics also publish at conferences
Sparse structure regularized ranking
Learning ranking scores is critical for the multimedia database retrieval problem. In this paper, we propose a novel ranking score learning algorithm by exploring the sparse structure and using it to regularize ranking scores. To explore the sparse structure, we assume that each multimedia object could be represented as a sparse linear combination of all other objects, and combination coefficients are regarded as a similarity measure between objects and used to regularize their ranking scores. Moreover, we propose to learn the sparse combination coefficients and the ranking scores simultaneously. A unified objective function is constructed with regard to both the combination coefficients and the ranking scores, and is optimized by an iterative algorithm. Experiments on two multimedia database retrieval data sets demonstrate the significant improvements of the proposed algorithm over state-of-the-art ranking score learning algorithms.
Nose biopsy: a comparison between two sampling techniques.
Preoperative biopsy is important in obtaining preliminary information that may help in tailoring the optimal treatment. The aim of this study was to compare two sampling techniques for obtaining nasal biopsies, nasal forceps and nasal scissors, in terms of pathological results. Biopsies of nasal lesions were taken from patients undergoing nasal surgery by both techniques. Each sample was examined by a senior pathologist who was blinded to the sampling method. A grading system was used to rate the crush artifact in every sample (none, mild, moderate, severe). A comparison was made between the severity of the crush artifact and the pathological results of the two techniques. One hundred and forty-four samples were taken from 46 patients. Thirty-one were males and the mean age was 49.6 years. Samples taken by forceps had significantly higher grades of crush artifacts compared to those taken by scissors. The degree of crush artifacts had a significant influence on the accuracy of the preoperative biopsy. Forceps cause a significant amount of crush artifacts compared to scissors. The degree of crush artifact in the tissue sample influences the accuracy of the biopsy.
Missing Data Problems for Two Samples on a Dichotomous Variable.
The problem of comparing proportions when some data are missing is investigated, and determination is made of what statistical techniques are appropriate under each of several probability models describing the observations likely to be missing. Monte Carlo methods were used to investigate the properties of standard estimators under each of the…
RankExplorer: Visualization of Ranking Changes in Large Time Series Data.
For many applications involving time series data, people are often interested in the changes of item values over time as well as their ranking changes. For example, people search many words via search engines like Google and Bing every day. Analysts are interested in both the absolute searching number for each word as well as their relative rankings. Both sets of statistics may change over time. For very large time series data with thousands of items, how to visually present ranking changes is an interesting challenge. In this paper, we propose RankExplorer, a novel visualization method based on ThemeRiver to reveal the ranking changes. Our method consists of four major components: 1) a segmentation method which partitions a large set of time series curves into a manageable number of ranking categories; 2) an extended ThemeRiver view with embedded color bars and changing glyphs to show the evolution of aggregation values related to each ranking category over time as well as the content changes in each ranking category; 3) a trend curve to show the degree of ranking changes over time; 4) rich user interactions to support interactive exploration of ranking changes. We have applied our method to some real time series data and the case studies demonstrate that our method can reveal the underlying patterns related to ranking changes which might otherwise be obscured in traditional visualizations.
Paired comparisons analysis: an axiomatic approach to ranking methods
In this paper we present an axiomatic analysis of several ranking methods for general tournaments. We find that the ranking method obtained by applying maximum likelihood to the (Zermelo-)Bradley-Terry model, the most common method in statistics and psychology, is one of the ranking methods that per
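As a minimal sketch of the maximum likelihood ranking under the Bradley-Terry model mentioned here, the code below applies the classical fixed-point (Zermelo) iteration p_i = W_i / sum_j n_ij / (p_i + p_j), where W_i is the number of wins of player i and n_ij the number of games between i and j. It assumes every player has at least one win and one loss so that the estimate exists; the win matrix and the function name are hypothetical.

```python
def bradley_terry(wins, iters=500):
    """Maximum likelihood strengths for the Bradley-Terry model.

    wins[i][j] is the number of times player i beat player j.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new.append(total_wins / denom)
        # Strengths are only defined up to scale; normalize to mean 1.
        scale = n / sum(new)
        p = [v * scale for v in new]
    return p


# Hypothetical round robin: every pair of players meets four times.
wins = [
    [0, 3, 2],
    [1, 0, 3],
    [2, 1, 0],
]
strengths = bradley_terry(wins)
ranking = sorted(range(len(strengths)), key=lambda i: -strengths[i])
```

In a balanced round robin such as this one, the maximum likelihood ranking agrees with the ranking by total number of wins; the two can differ when players meet unequal numbers of times.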
There are now many methods available to assess the relative citation performance of peer-reviewed journals. Regardless of their individual faults and advantages, citation-based metrics are used by researchers to maximize the citation potential of their articles, and by employers to rank academic track records. The absolute value of any particular index is arguably meaningless unless compared to other journals, and different metrics result in divergent rankings. To provide a simple yet more objective way to rank journals within and among disciplines, we developed a κ-resampled composite journal rank incorporating five popular citation indices: Impact Factor, Immediacy Index, Source-Normalized Impact Per Paper, SCImago Journal Rank and Google 5-year h-index; this approach provides an index of relative rank uncertainty. We applied the approach to six sample sets of scientific journals from Ecology (n = 100 journals), Medicine (n = 100), Multidisciplinary (n = 50); Ecology + Multidisciplinary (n = 25), Obstetrics & Gynaecology (n = 25) and Marine Biology & Fisheries (n = 25). We then cross-compared the κ-resampled ranking for the Ecology + Multidisciplinary journal set to the results of a survey of 188 publishing ecologists who were asked to rank the same journals, and found a 0.68-0.84 Spearman's ρ correlation between the two rankings datasets. Our composite index approach therefore approximates relative journal reputation, at least for that discipline. Agglomerative and divisive clustering and multi-dimensional scaling techniques applied to the Ecology + Multidisciplinary journal set identified specific clusters of similarly ranked journals, with only Nature & Science separating out from the others. When comparing a selection of journals within or among disciplines, we recommend collecting multiple citation-based metrics for a sample of relevant and realistic journals to calculate the composite rankings and their relative uncertainty windows.
A two-sample Bayesian t-test for microarray data
Background: Determining whether a gene is differentially expressed in two different samples remains an important statistical problem. Prior work in this area has featured the use of t-tests with pooled estimates of the sample variance based on similarly expressed genes. These methods do not display consistent behavior across the entire range of pooling and can be biased when the prior hyperparameters are specified heuristically. Results: A two-sample Bayesian t-test is proposed for use in determining whether a gene is differentially expressed in two different samples. The test method is an extension of earlier work that made use of point estimates for the variance. The method proposed here explicitly calculates in analytic form the marginal distribution for the difference in the mean expression of two samples, obviating the need for point estimates of the variance without recourse to posterior simulation. The prior distribution involves a single hyperparameter that can be calculated in a statistically rigorous manner, making clear the connection between the prior degrees of freedom and prior variance. Conclusion: The test is easy to understand and implement, and application to both real and simulated data shows that the method has equal or greater power compared to the previous method and demonstrates consistent Type I error rates. The test is generally applicable outside the microarray field to any situation where prior information about the variance is available and is not limited to cases where estimates of the variance are based on many similar observations.
Bornmann, Stefaner, de Moya Anegon, and Mutz (in press) have introduced a web application (www.excellencemapping.net) which is linked to both academic ranking lists published hitherto (e.g. the Academic Ranking of World Universities) as well as spatial visualization approaches. The web application visualizes institutional performance within specific subject areas as ranking lists and on custom tile-based maps. The new, substantially enhanced version of the web application and the multilevel logistic regression on which it is based are described in this paper. Scopus data were used which have been collected for the SCImago Institutions Ranking. Only those universities and research-focused institutions are considered that have published at least 500 articles, reviews and conference papers in the period 2006 to 2010 in a certain Scopus subject area. In the enhanced version, the effect of single covariates (such as the per capita GDP of a country in which an institution is located) on two performance metrics (bes...
A study of serial ranks via random graphs
Serial ranks have long been used as the basis for nonparametric tests of independence in time series analysis. We shall study the underlying graph structure of serial ranks. This will lead us to a basic martingale which will allow us to construct a weighted approximation to a serial rank process. To show the applicability of this approximation, we will use it to prove two very general central limit theorems for Wald-Wolfowitz-type serial rank statistics.
Ranking of States and Commodities by Cash Receipts, 1992
This publication presents two types of ranking information derived from the U.S. Department of Agriculture's cash receipts statistics for the marketing of agricultural commodities within States. One type is the 25 leading commodities for each State and the Nation, ranked according to the estimated value of receipts. The second is the ranking of States by receipts from each of the 25 leading U.S. commodities and by several major commodity groups. The ranking of commodities produced in a State ...
This paper ranks Dutch economists using information about publications and citations. Rankings involve the aggregation of several performance dimensions. Instead of using a cardinal approach, where each dimension is weighted based on impact factors of journals for example, we use an ordinal approach
In this paper the concept of page rank for the world wide web is discussed. The possibility of describing the distribution of page rank by an exponential law is considered. It is shown that the concept is essentially equal to that of status score, a centrality measure discussed already in 1953 by Ka
This note explains how Emil Artin's proof that row rank equals column rank for a matrix with entries in a field leads naturally to the formula for the nullity of a matrix and also to an algorithm for solving any system of linear equations in any number of variables. This material could be used in any course on matrix theory or linear algebra.
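The two facts named in this note can be checked numerically on a small example. The sketch below computes the rank by Gaussian elimination over exact fractions, compares it with the rank of the transpose (row rank versus column rank), and derives the nullity as the number of columns minus the rank. It is an illustration only and does not reproduce Artin's proof; the matrix and function names are made up.

```python
from fractions import Fraction


def rank(matrix):
    """Rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


A = [
    [1, 2, 3],
    [2, 4, 6],  # twice the first row
    [1, 0, 1],
]
row_rank = rank(A)
column_rank = rank(transpose(A))
nullity = len(A[0]) - row_rank  # rank-nullity: rank + nullity = number of columns
```

Here the second row is a multiple of the first, so the rank is 2 for both the matrix and its transpose, and the solution space of the homogeneous system has dimension 1.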
2008-09-01
Assessing the potential impact on environmental and human health from the production and use of chemicals or from polluted sites involves a multi-criteria evaluation scheme. A priori, several parameters must be addressed, e.g., production tonnage, specific release scenarios, geographical and site-specific factors, in addition to various substance-dependent parameters. Further socio-economic factors may be taken into consideration. The number of parameters to be included may well appear to be prohibitive for developing a sensible model. The study introduces hierarchical partial order ranking (HPOR), which remedies this problem. In HPOR, the original parameters are initially grouped based on their mutual connection, and a set of meta-descriptors is derived representing the ranking corresponding to the single groups of descriptors, respectively. A second partial order ranking is carried out based on the meta-descriptors, the final ranking being disclosed through average ranks. An illustrative example on the prioritization of polluted sites is given.
Correlation of Expert and Search Engine Rankings
In previous research it has been shown that link-based web page metrics can be used to predict experts' assessment of quality. We are interested in a related question: do expert rankings of real-world entities correlate with search engine rankings of corresponding web resources? For example, each year US News & World Report publishes a list of (among others) top 50 graduate business schools. Does their expert ranking correlate with the search engine ranking of the URLs of those business schools? To answer this question we conducted 9 experiments using 8 expert rankings on a range of academic, athletic, financial and popular culture topics. We compared the expert rankings with the rankings in Google, Live Search (formerly MSN) and Yahoo (with list lengths of 10, 25, and 50). In 57 search engine vs. expert comparisons, only 1 strong and 4 moderate correlations were statistically significant. In 42 inter-search engine comparisons, only 2 strong and 4 moderate correlations were statistically significant. The ...
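The abstract does not say which correlation coefficient was used. As one common choice for comparing two rankings of the same items, the sketch below computes Spearman's rank correlation with the standard formula rho = 1 - 6 * sum(d^2) / (n (n^2 - 1)), which assumes there are no ties. The rankings and the function name are invented for illustration.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("need two rankings of the same length, at least 2 items")
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical example: an expert ranking of five items versus the order
# in which a search engine lists the corresponding web pages.
expert = [1, 2, 3, 4, 5]
engine = [2, 1, 3, 5, 4]
rho = spearman_rho(expert, engine)
```

Identical rankings give 1, exactly reversed rankings give -1, and the two rankings above, which differ by two adjacent swaps, give 0.8.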
Ranking Economic History Journals
This study ranks, for the first time, 12 international academic journals that have economic history as their main topic. The ranking is based on data collected for the year 2007. Journals are ranked using standard citation analysis where we adjust for age, size and self-citation of journals. We also compare the leading economic history journals with the leading journals in economics in order to measure the influence on economics of economic history, and vice versa. With a few exceptions, our results confirm the general idea about which economic history journals are the most influential for economic...
This paper presents key new developments in the THES - QS World University Rankings in 2007, related to enhancements to the "Peer Review", "Data Collection" and "Statistical Aggregation" utilised in this ranking as well as discussing the decision to utilise Full-Time Equivalent (FTE) figures for personnel statistics. Indicator correlation is also…
Algebraic and computational aspects of real tensor ranks
This book provides comprehensive summaries of theoretical (algebraic) and computational aspects of tensor ranks, maximal ranks, and typical ranks, over the real number field. Although tensor ranks have often been discussed over the complex number field, it should be emphasized that this book treats real tensor ranks, which have direct applications in statistics. The book provides several interesting ideas, including determinant polynomials, determinantal ideals, absolutely nonsingular tensors, absolutely full column rank tensors, and their connection to bilinear maps and Hurwitz-Radon numbers. In addition to detailed reviews of methods to determine real tensor ranks, global theories such as the Jacobian method are also reviewed in detail. The book also includes an accessible and comprehensive introduction to the mathematical background, with basics of positive polynomials and calculations using the Groebner basis. Furthermore, this book provides insights into numerical methods of finding tensor ranks through...
Co-integration Rank Testing under Conditional Heteroskedasticity
null distributions of the rank statistics coincide with those derived by previous authors who assume either i.i.d. or (strict and covariance) stationary martingale difference innovations. We then propose wild bootstrap implementations of the co-integrating rank tests and demonstrate that the associated bootstrap rank statistics replicate the first-order asymptotic null distributions of the rank statistics. We show the same is also true of the corresponding rank tests based on the i.i.d. bootstrap of Swensen (2006). The wild bootstrap, however, has the important property that, unlike the i.i.d. bootstrap, it preserves in the re-sampled data the pattern of heteroskedasticity present in the original shocks. Consistent with this, numerical evidence suggests that, relative to tests based on the asymptotic critical values or the i.i.d. bootstrap, the wild bootstrap rank tests perform very well in small samples un...
Ranking Economic History Journals
This study ranks, for the first time, 12 international academic journals that have economic history as their main topic. The ranking is based on data collected for the year 2007. Journals are ranked using standard citation analysis where we adjust for age, size and self-citation of journals. We also compare the leading economic history journals with the leading journals in economics in order to measure the influence on economics of economic history, and vice versa. With a few exceptions, our results confirm the general idea about which economic history journals are the most influential for economic history, and that, although economic history is quite independent from economics as a whole, knowledge exchange between the two fields is indeed going on.
Adaptive distributional extensions to DFR ranking
Petersen, Casper; Simonsen, Jakob Grue; Järvelin, Kalervo
Divergence From Randomness (DFR) ranking models assume that informative terms are distributed in a corpus differently than non-informative terms. Different statistical models (e.g. Poisson, geometric) are used to model the distribution of non-informative terms, producing different DFR models. An ...
2002-01-01
Ranking Workplace Competencies: Student and Graduate Perceptions.
New Zealand business students and graduates made similar rankings of the five most important workplace competencies: computer literacy, customer service orientation, teamwork and cooperation, self-confidence, and willingness to learn. Graduates placed greater importance on most of the 24 competencies, resulting in a statistically significant…
Non-parametric three-way mixed ANOVA with aligned rank tests.
Research problems that require a non-parametric analysis of multifactor designs with repeated measures arise in the behavioural sciences. There is, however, a lack of available procedures in commonly used statistical packages. In the present study, a generalization of the aligned rank test for the two-way interaction is proposed for the analysis of the typical sources of variation in a three-way analysis of variance (ANOVA) with repeated measures. It can be implemented in the usual statistical packages. Its statistical properties are tested by using simulation methods with two sample sizes (n = 30 and n = 10) and three distributions (normal, exponential and double exponential). Results indicate substantial increases in power for non-normal distributions in comparison with the usual parametric tests. Similar levels of Type I error for both parametric and aligned rank ANOVA were obtained with non-normal distributions and large sample sizes. Degrees-of-freedom adjustments for Type I error control in small samples are proposed. The procedure is applied to a case study with 30 participants per group where it detects gender differences in linguistic abilities in blind children not shown previously by other methods.
Diversifying customer review rankings.
E-commerce Web sites owe much of their popularity to consumer reviews accompanying product descriptions. On-line customers spend hours and hours going through heaps of textual reviews to decide which products to buy. At the same time, each popular product has thousands of user-generated reviews, making it impossible for a buyer to read everything. Current approaches to displaying reviews to users or recommending an individual review for a product are based on the recency or helpfulness of each review. In this paper, we present a framework to rank product reviews by optimizing the coverage of the ranking with respect to sentiment or aspects, or by summarizing all reviews with the top-K reviews in the ranking. To accomplish this, we make use of the assigned star rating for a product as an indicator for a review's sentiment polarity and compare bag-of-words (language model) with topic models (latent Dirichlet allocation) as a means of representing aspects. Our evaluation on manually annotated review data from a commercial review Web site demonstrates the effectiveness of our approach, outperforming plain recency ranking by 30% and obtaining best results by combining language and topic model representations.
Outlier detection is an important data mining task for consistency checks, fraud detection, etc. Binary decision making on whether or not an object is an outlier is not appropriate in many applications and moreover hard to parametrize. Thus, recently, methods for outlier ranking have been proposed...
The rankings of China's leading retailers for fast-moving consumer goods, a collaboration between Beijing-based CTR Market Research and CIB, have entered their third term. According to our findings, with competitive advantages such as better shopping environments and cheaper prices, the large-scale retailers, or hypermarkets, are continuing to increase their market shares.
We consider maintaining information about the rank of a matrix under changes of the entries. For $n \times n$ matrices, we show an upper bound of $O(n^{1.575})$ arithmetic operations and a lower bound of $\Omega(n)$ arithmetic operations per element change. The upper bound is valid when changing up to $O(n^{0.575})$ entries in a single column of the matrix. We also give an algorithm that maintains the rank using $O(n^2)$ arithmetic operations per rank one update. These bounds appear to be the first nontrivial bounds for the problem. The upper bounds are valid for arbitrary fields, whereas the lower bound is valid for algebraically closed fields. The upper bound for element updates uses fast rectangular matrix multiplication, and the lower bound involves further development of an earlier technique for proving lower bounds for dynamic computation of rational functions.
Improving Ranking Using Quantum Probability
The paper shows that ranking information units by quantum probability differs from ranking them by classical probability, provided the same data are used for parameter estimation. As probability of detection (also known as recall or power) and probability of false alarm (also known as fallout or size) measure the quality of ranking, we point out and show that ranking by quantum probability yields higher probability of detection than ranking by classical probability provided a given probability of ...
Interference Alignment as a Rank Constrained Rank Minimization
We show that the maximization of the sum degrees-of-freedom for the static flat-fading multiple-input multiple-output (MIMO) interference channel is equivalent to a rank constrained rank minimization (RCRM) problem, when the signal spaces span all available dimensions. The rank minimization corresponds to maximizing interference alignment (IA) such that interference spans the lowest dimensional subspace possible. The rank constraints account for the useful signal spaces spanning all available spatial dimensions. That way, we reformulate all IA requirements to requirements involving ranks. Then, we present a convex relaxation of the RCRM problem inspired by recent results in compressed sensing and low-rank matrix completion theory that rely on approximating rank with the nuclear norm. We show that the convex envelope of the sum of ranks of the interference matrices is the sum of their corresponding nuclear norms and introduce tractable constraints that are asymptotically equivalent to the rank constraints for the ini...
Ranking of States and Commodities by Cash Receipts, 1991
This publication identifies the 25 leading agricultural commodities produced in each State and the United States, ranked by the value of cash receipts. The major producing States, ranked by cash receipts, for each of the 25 leading commodities in the United States and for several major commodity groups are also identified. The information is derived from U.S. Department of Agriculture's cash receipts statistics for the marketing of agricultural commodities within States. The ranking of commod...
Low rank Multivariate regression
We consider in this paper the multivariate regression problem, when the target regression matrix $A$ is close to a low rank matrix. Our primary interest is in the practical case where the variance of the noise is unknown. Our main contribution is to propose in this setting a criterion to select among a family of low rank estimators and prove a non-asymptotic oracle inequality for the resulting estimator. We also investigate the easier case where the variance of the noise is known and show that the penalties appearing in our criteria are minimal (in some sense). These penalties involve the expected value of the Ky-Fan quasi-norm of some random matrices. These quantities can be evaluated easily in practice and upper bounds can be derived from recent results in random matrix theory.
About the time when a gymnastics educator and folk high school principal got a whole generation of farmhands to straighten their backs, lift their gaze and look the other classes in the eye. And about how the same principal developed a strong sympathy for Nazism and even had the opportunity to exchange words with Hitler....
Netflix [15], the Cite-Seer network of citations [29], or for ranking of college football teams [31]. Another setting which can be reduced to the...centrality measures have been proposed for the analysis of temporal network data in neuroscience , for studying the functional activity in the human brain using...given the score sheet of all soccer games in the England Football Premier League, recording the goal difference for each game, but without disclosing who
A note on the ranking of earthquake forecasts
The ranking problem of earthquake forecasts is considered. We formulate simple statistical requirements for a forecast quality measure R and analyze some R-ranking methods on this basis, in particular, the pari-mutuel gambling method by Zechar & Zhuang (2014).
Paired Comparisons Analysis : An Axiomatic Approach to Rankings in Tournaments
In this paper we present an axiomatic analysis of several ranking methods for tournaments. We find that two of them exhibit a very good behaviour with respect to the set of properties under consideration. One of them is the maximum likelihood ranking, the most common method in statistics and psychol
Many complex systems can be described as multiplex networks in which the same nodes can interact with one another in different layers, thus forming a set of interacting and co-evolving networks. Examples of such multiplex systems are social networks where people are involved in different types of relationships and interact through various forms of communication media. The ranking of nodes in multiplex networks is one of the most pressing and challenging tasks that research on complex networks is currently facing. When pairs of nodes can be connected through multiple links and in multiple layers, the ranking of nodes should necessarily reflect the importance of nodes in one layer as well as their importance in other interdependent layers. In this paper, we draw on the idea of biased random walks to define the Multiplex PageRank centrality measure in which the effects of the interplay between networks on the centrality of nodes are directly taken into account. In particular, depending on the intensity of the in...
Empirical likelihood for balanced ranked-set sampled data
Ranked-set sampling (RSS) often provides more efficient inference than simple random sampling (SRS). In this article, we propose a systematic nonparametric technique, RSS-EL, for hypothesis testing and interval estimation with balanced RSS data using empirical likelihood (EL). We detail the approach for interval estimation and hypothesis testing in one-sample and two-sample problems and general estimating equations. In all three cases, RSS is shown to provide more efficient inference than SRS of the same size. Moreover, the RSS-EL method does not require any easily violated assumptions needed by existing rank-based nonparametric methods for RSS data, such as perfect ranking, identical ranking scheme in two groups, and location shift between two population distributions. The merit of the RSS-EL method is also demonstrated through simulation studies.
Rankings from Fuzzy Pairwise Comparisons
We propose a new method for deriving rankings from fuzzy pairwise comparisons. It is based on the observation that quantification of the uncertainty of the pairwise comparisons should be used to obtain a better crisp ranking, instead of a fuzzified version of the ranking obtained from crisp pairwise
University Rankings and Social Science
University rankings widely affect the behaviours of prospective students and their families, university executive leaders, academic faculty, governments and investors in higher education. Yet the social science foundations of global rankings receive little scrutiny. Rankings that simply recycle reputation without any necessary connection to real…
Fractional cointegration rank estimation
We consider cointegration rank estimation for a p-dimensional Fractional Vector Error Correction Model. We propose a new two-step procedure which allows testing for further long-run equilibrium relations with possibly different persistence levels. The first step consists in estimating......-likelihood ratio test of no-cointegration on the estimated p - r common trends that are not cointegrated under the null. The cointegration degree is re-estimated in the second step to allow for new cointegration relationships with different memory. We augment the error correction model in the second step...
Random Walker Ranking for NCAA Division I-A Football
We develop a one-parameter family of ranking systems for NCAA Division I-A football teams based on a collection of voters, each with a single vote, executing independent random walks on a network defined by the teams (vertices) and the games played (edges). The virtue of this class of ranking systems lies in the simplicity of its explanation. We discuss the statistical properties of the randomly walking voters and relate them to the community structure of the underlying network. We compare the results of these rankings for recent seasons with Bowl Championship Series standings and component rankings. To better understand this ranking system, we also examine the asymptotic behaviors of the aggregate of walkers. Finally, we consider possible generalizations to this ranking algorithm.
Rankings, creativity and urbanism
2008-08-01
Competition among cities is one of the main drivers of urban renewal, and rankings have become instruments for measuring the quality of cities. We examine the case of an old industrial quarter now being transformed into a "creative district" through a large-scale urban project. Its analysis reveals three critical points. First, it obliges us to reconsider the definition of urban innovation and how the past, identity and memory are integrated into the construction of the future. Second, it shows that innovation and knowledge do not arise by chance, but are the result of a large and complex network involving knowledge, spaces, actors and institutions that differ in nature, scale and magnitude. Finally, it forces us to reflect on the value attributed to the "local" in urban renewal processes.
2016-04-01
In order to accurately predict a digital camera response to spectral stimuli, the spectral sensitivity functions of its sensor need to be known. These functions can be determined by direct measurement in the lab (a difficult and lengthy procedure) or through simple statistical inference. Statistical inference methods are based on the observation that when a camera responds linearly to spectral stimuli, the device spectral sensitivities are linearly related to the camera RGB response values, and so can be found through regression. However, for rendered images, such as the JPEG images taken by a mobile phone, this assumption of linearity is violated. Even small departures from linearity can negatively impact the accuracy of the recovered spectral sensitivities, when a regression method is used. In our work, we develop a novel camera spectral sensitivity estimation technique that can recover the linear device spectral sensitivities from linear images and the effective linear sensitivities from rendered images. According to our method, the rank order of a pair of responses imposes a constraint on the shape of the underlying spectral sensitivity curve (of the sensor). Technically, each rank-pair splits the space where the underlying sensor might lie in two parts (a feasible region and an infeasible region). By intersecting the feasible regions from all the ranked-pairs, we can find a feasible region of sensor space. Experiments demonstrate that using rank orders delivers equal estimation to the prior art. However, the rank-based method delivers a step-change in estimation performance when the data is not linear and, for the first time, allows for the estimation of the effective sensitivities of devices that may not even have "raw mode." Experiments validate our method.
Modified likelihood ratio test for homogeneity in normal mixtures with two samples
This paper investigates the modified likelihood ratio test (LRT) for homogeneity in normal mixtures of two samples with mixing proportions unknown. It is proved that the limit distribution of the modified likelihood ratio test is $\chi^2(1)$.
Rank Diversity of Languages: Generic Behavior in Computational Linguistics
Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: “heads” consist of words which almost do not change their rank in time, “bodies” are words of general use, while “tails” are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied. PMID:25849150
This work is a compilation of reports on ongoing research at the University of North Dakota. Topics include: Control Technology and Coal Preparation Research (SOx/NOx control, waste management), Advanced Research and Technology Development (turbine combustion phenomena, combustion inorganic transformation, coal/char reactivity, liquefaction reactivity of low-rank coals, gasification ash and slag characterization, fine particulate emissions), Combustion Research (fluidized bed combustion, beneficiation of low-rank coals, combustion characterization of low-rank coal fuels, diesel utilization of low-rank coals), Liquefaction Research (low-rank coal direct liquefaction), and Gasification Research (hydrogen production from low-rank coals, advanced wastewater treatment, mild gasification, color and residual COD removal from Synfuel wastewaters, Great Plains Gasification Plant, gasifier optimization).
A Unified Approach to Constructing Nonparametric Rank Tests.
International Conference on Robust Rank-Based and Nonparametric Methods
The contributors to this volume include many of the distinguished researchers in this area. Many of these scholars have collaborated with Joseph McKean to develop underlying theory for these methods, obtain small sample corrections, and develop efficient algorithms for their computation. The papers cover the scope of the area, including robust nonparametric rank-based procedures through Bayesian and big data rank-based analyses. Areas of application include biostatistics and spatial areas. Over the last 30 years, robust rank-based and nonparametric methods have developed considerably. These procedures generalize traditional Wilcoxon-type methods for one- and two-sample location problems. Research into these procedures has culminated in complete analyses for many of the models used in practice including linear, generalized linear, mixed, and nonlinear models. Settings are both multivariate and univariate. With the development of R packages in these areas, computation of these procedures is easily shared with r...
Beyond Zipf's Law: The Lavalette Rank Function and its Properties
Fontanelli, Oscar; Yang, Yaning; Cocho, Germinal; Li, Wentian
Although Zipf's law is widespread in natural and social data, one often encounters situations where one or both ends of the ranked data deviate from the power-law function. Previously we proposed the Beta rank function to improve the fitting of data which does not follow a perfect Zipf's law. Here we show that when the two parameters in the Beta rank function have the same value, the Lavalette rank function, the probability density function can be derived analytically. We also show both computationally and analytically that Lavalette distribution is approximately equal, though not identical, to the lognormal distribution. We illustrate the utility of Lavalette rank function in several datasets. We also address three analysis issues on the statistical testing of Lavalette fitting function, comparison between Zipf's law and lognormal distribution through Lavalette function, and comparison between lognormal distribution and Lavalette distribution.
University Rankings in Critical Perspective
Pusser, Brian; Marginson, Simon
Ranking Models in Conjoint Analysis
In this paper we consider the estimation of probabilistic ranking models in the context of conjoint experiments. By using approximate rather than exact ranking probabilities, we do not need to compute high-dimensional integrals. We extend the approximation technique proposed by
2011-01-01
This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of $n$ objects can be identified by standard sorting methods using $n \log_2 n$ pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a $d$-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in $\mathbb{R}^d$. We show that under this assumption the number of possible rankings grows like $n^{2d}$ and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than $d \log n$ adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorith...
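The sorting baseline mentioned in the abstract above can be sketched as follows. This is only the standard comparison-sort approach, not the paper's adaptive algorithm; the function name `rank_by_pairwise_comparisons`, the random points, and the reference point are assumptions introduced for illustration.

```python
import functools
import math
import random


def rank_by_pairwise_comparisons(points, reference):
    """Rank point indices by distance to `reference`, counting pairwise comparisons."""
    comparisons = 0

    def squared_distance(index):
        return sum((x - r) ** 2 for x, r in zip(points[index], reference))

    def compare(i, j):
        # One call corresponds to one pairwise query: "is i closer than j?"
        nonlocal comparisons
        comparisons += 1
        di, dj = squared_distance(i), squared_distance(j)
        return (di > dj) - (di < dj)

    ranking = sorted(range(len(points)), key=functools.cmp_to_key(compare))
    return ranking, comparisons


random.seed(0)
n, d = 64, 2  # hypothetical problem size and embedding dimension
points = [[random.random() for _ in range(d)] for _ in range(n)]
reference = [0.5] * d

ranking, used = rank_by_pairwise_comparisons(points, reference)
print("comparisons used by sorting:", used)
print("n * log2(n) for reference:  ", n * math.log2(n))
```

The printed count gives a sense of the sorting budget that the paper aims to undercut by exploiting the low-dimensional embedding of the objects.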
2006-01-01
We show that the empirical ranking of volatility models can be inconsistent for the true ranking if the evaluation is based on a proxy for the population measure of volatility. For example, the substitution of a squared return for the conditional variance in the evaluation of ARCH-type models can...
2012-01-01
In this article we explore the dual role of global university rankings in the creation of a new, knowledge-identified, transnational capitalist class and in facilitating new forms of social exclusion. We examine how and why the practice of ranking universities has become widely defined by national and international organisations as an important…
2009-01-01
We introduce a novel type of orientation-selective rank features that are sensitive to contrast modulations (second-order stimuli). Variance Ranklets are designed in close analogy with the standard Ranklets, but use the Siegel–Tukey statistics for dispersion instead of the Wilcoxon statistics. Their
2008-01-01
In the International Statistical Classification of Diseases, Tenth Revision (ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders, Third and Fourth Edition (DSM-III-IV), the presence of one of Schneider's "first-rank symptoms" (FRS) is symptomatically sufficient for the schizophrenia...
Improving Ranking Using Quantum Probability
The paper shows that ranking information units by quantum probability differs from ranking them by classical probability, provided the same data are used for parameter estimation. As probability of detection (also known as recall or power) and probability of false alarm (also known as fallout or size) measure the quality of ranking, we point out and show that ranking by quantum probability yields a higher probability of detection than ranking by classical probability, given the same probability of false alarm and the same parameter estimation data. As quantum probability has provided more effective detectors than classical probability within domains other than data management, we conjecture that a system that can implement subspace-based detectors will be more effective than a system that implements set-based detectors, the effectiveness being calculated as expected recall estimated over the probability of detection and expected fallout estimated over the probability of false alarm.
There are currently two philosophies for building grammars and parsers -- Statistically induced grammars and Wide-coverage grammars. One way to combine the strengths of both approaches is to have a wide-coverage grammar with a heuristic component which is domain independent but whose contribution is tuned to particular domains. In this paper, we discuss a three-stage approach to disambiguation in the context of a lexicalized grammar, using a variety of domain independent heuristic techniques. We present a training algorithm which uses hand-bracketed treebank parses to set the weights of these heuristics. We compare the performance of our grammar against the performance of the IBM statistical grammar, using both untrained and trained weights for the heuristics.
Effect of Doximity Residency Rankings on Residency Applicants’ Program Choices
Introduction: Choosing a residency program is a stressful and important decision. Doximity released residency program rankings by specialty in September 2014. This study sought to investigate the impact of those rankings on residency application choices made by fourth year medical students. Methods: A 12-item survey was administered in October 2014 to fourth year medical students at three schools. Students indicated their specialty, awareness of and perceived accuracy of the rankings, and the rankings' impact on the programs to which they chose to apply. Descriptive statistics were reported for all students and those applying to Emergency Medicine (EM). Results: A total of 461 (75.8%) students responded, with 425 applying in one of the 20 Doximity ranked specialties. Of the 425, 247 (58%) were aware of the rankings and 177 looked at them. On a 1-100 scale (100=very accurate), students reported a mean ranking accuracy rating of 56.7 (SD 20.3). Forty-five percent of students who looked at the rankings modified the number of programs to which they applied. The majority added programs. Of the 47 students applying to EM, 18 looked at the rankings and 33% changed their application list with most adding programs. Conclusion: The Doximity rankings had real effects on students applying to residencies as almost half of students who looked at the rankings modified their program list. Additionally, students found the rankings to be moderately accurate. Graduating students might benefit from emphasis on more objective characterization of programs to assess in light of their own interests and personal/career goals.
To rank or to be ranked: the impact of global rankings in higher education
Global university rankings have cemented the notion of a world university market arranged in a single "league table" for comparative purposes and have given a powerful impetus to intranational and international competitive pressures in the sector. Both the research rankings by Shanghai Jiao Tong Uni
Parga, Joyce A
In this study, dominance rank instability among male Lemur catta during mating was investigated. Also, data on agonism and sexual behavior across five consecutive mating seasons in a population of L. catta on St. Catherines Island, USA, were collected. Instances of male rank instability were categorized into three types. Type 1 consisted of a temporary switch in the dominance ranks of two males, which lasted for a period of minutes or hours. Type 2 dyadic male agonistic interactions showed highly variable outcomes for a period of time during which wins and losses were neither predictable nor consistent. Type 3 interactions consisted of a single agonistic win by a lower-ranked male over a more dominant male. More Type 2 interactions (indicating greater dominance instability) occurred when males had not spent the previous mating season in the same group, but this trend was not statistically significant. The majority of periods of male rank instability were preceded by female proceptivity or receptivity directed to a lower-ranked male. As such, exhibition of female mate choice for a lower-ranking male appeared to incite male-male competition. Following receipt of female proceptivity or receptivity, males who were lower-ranking took significantly longer to achieve their first agonistic win over a more dominant male than did males who were higher-ranked. Ejaculation frequently preceded loss of dominance. In conclusion, temporary rank reversals and overall dominance rank instability commonly occur among male L. catta in mating contexts, and these temporary increases in dominance status appear to positively affect male mating success. (c) 2008 Wiley-Liss, Inc.
Universal scaling in sports ranking
Ranking is a ubiquitous phenomenon in the human society. By clicking the web pages of Forbes, you may find all kinds of rankings, such as world's most powerful people, world's richest people, top-paid tennis stars, and so on and so forth. Herewith, we study a specific kind, sports ranking systems in which players' scores and prize money are calculated based on their performances in attending various tournaments. A typical example is tennis. It is found that the distributions of both scores and prize money follow universal power laws, with exponents nearly identical for most sports fields. In order to understand the origin of this universal scaling we focus on the tennis ranking systems. By checking the data we find that, for any pair of players, the probability that the higher-ranked player will top the lower-ranked opponent is proportional to the rank difference between the pair. Such a dependence can be well fitted to a sigmoidal function. By using this feature, we propose a simple toy model which can simul...
A Generalized Approach to the Two Sample Problem: The Quantile Approach.
This paper considers identification and estimation of a general nonlinear Errors-in-Variables (EIV) model using two samples. Both samples consist of a dependent variable, some error-free covariates, and an error-prone covariate, for which the measurement error has unknown distribution and could be arbitrarily correlated with the latent true values; and neither sample contains an accurate measurement of the corresponding true variable. We assume that the regression model of interest - the conditional distribution of the dependent variable given the latent true covariate and the error-free covariates - is the same in both samples, but the distributions of the latent true covariates vary with observed error-free discrete covariates. We first show that the general latent nonlinear model is nonparametrically identified using the two samples when both could have nonclassical errors, without either instrumental variables or independence between the two samples. When the two samples are independent and the nonlinear regression model is parameterized, we propose sieve Quasi Maximum Likelihood Estimation (Q-MLE) for the parameter of interest, and establish its root-n consistency and asymptotic normality under possible misspecification, and its semiparametric efficiency under correct specification, with easily estimated standard errors. A Monte Carlo simulation and a data application are presented to show the power of the approach.
Two-Sample, Bivariate Hypothesis Testing Methods Based on Tukey's Depth.
Conducted simulations to explore methods for comparing bivariate distributions corresponding to two independent groups, all of which are based on Tukey's "depth," a generalization of the notion of ranks to multivariate data. Discusses steps needed to control Type I error. (SLD)
FUNSTAT and statistical image representations
General ideas of functional statistical inference analysis of one sample and two samples, univariate and bivariate are outlined. ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
In this paper we continue the study of projective planes which admit collineation groups of low rank (Kallaher [1] and Bachmann [2,3]). A rank 5 collineation group of a projective plane ℙ of order $n \neq 3$ is proved to be flag-transitive. As in the rank 3 and rank 4 case this implies that ℙ is not desarguesian and that $n$ is (a prime power) of the form $m^4$ if $m$ is odd and $n = m^2$ with $m \equiv 0 \pmod{4}$ if $n$ is even. Our proof relies on the classification of all doubly transitive groups of finite degree (which follows from the classification of all finite simple groups).
Frahm, K M; Shepelyansky, D L
We build up a directed network tracing links from a given integer to its divisors and analyze the properties of the Google matrix of this network. The PageRank vector of this matrix is computed numerically and it is shown that its probability is inversely proportional to the PageRank index thus being similar to the Zipf law and the dependence established for the World Wide Web. The spectrum of the Google matrix of integers is characterized by a large gap and a relatively small number of nonzero eigenvalues. A simple semi-analytical expression for the PageRank of integers is derived that allows to find this vector for matrices of billion size. This network provides a new PageRank order of integers.
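A small-scale version of the construction described above can be sketched as follows. Several details are assumptions for this example rather than the authors' exact setup: links go from each integer n to its divisors m with 1 < m < n (without multiplicity), integers with no such divisor are treated as dangling nodes that jump uniformly, the damping factor is 0.85, and the names `divisor_google_matrix` and `pagerank` are hypothetical.

```python
import numpy as np


def divisor_google_matrix(size, damping=0.85):
    """Google matrix for integers 1..size with links n -> m when m divides n, 1 < m < n."""
    adjacency = np.zeros((size, size))
    for n in range(2, size + 1):
        for m in range(2, n):
            if n % m == 0:
                adjacency[m - 1, n - 1] = 1.0  # column n holds the out-links of n

    column_sums = adjacency.sum(axis=0)
    # Dangling nodes (1 and the primes here) get a uniform column.
    stochastic = np.full((size, size), 1.0 / size)
    has_links = column_sums > 0
    stochastic[:, has_links] = adjacency[:, has_links] / column_sums[has_links]

    # Standard damping: follow a link with probability `damping`, else jump uniformly.
    return damping * stochastic + (1.0 - damping) / size


def pagerank(google, tol=1e-12, max_iter=1000):
    """Power iteration for the stationary vector of a column-stochastic matrix."""
    size = google.shape[0]
    vector = np.full(size, 1.0 / size)
    for _ in range(max_iter):
        updated = google @ vector
        if np.abs(updated - vector).sum() < tol:
            return updated
        vector = updated
    return vector


google = divisor_google_matrix(100)
scores = pagerank(google)
top_integers = (np.argsort(-scores)[:5] + 1).tolist()
print("integers with the largest PageRank:", top_integers)
```

Dense matrices are used only to keep the sketch short; for the matrix sizes mentioned in the abstract a sparse representation or an analytical expression would be needed.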
Rank of Stably Dissipative Graphs
For the class of stably dissipative Lotka-Volterra systems we prove that the rank of its defining matrix, which is the dimension of the associated invariant foliation, is completely determined by the system's graph.
Ranking Queries on Uncertain Data
Uncertain data is inherent in many important applications, such as environmental surveillance, market analysis, and quantitative economics research. Due to the importance of those applications and rapidly increasing amounts of uncertain data collected and accumulated, analyzing large collections of uncertain data has become an important task. Ranking queries (also known as top-k queries) are often natural and useful in analyzing uncertain data. Ranking Queries on Uncertain Data discusses the motivations/applications, challenging problems, the fundamental principles, and the evaluation algorith
2017-04-01
We propose a function-based temporal pooling method that captures the latent structure of the video sequence data - e.g., how frame-level features evolve over time in a video. We show how the parameters of a function that has been fit to the video data can serve as a robust new video representation. As a specific example, we learn a pooling function via ranking machines. By learning to rank the frame-level features of a video in chronological order, we obtain a new representation that captures the video-wide temporal dynamics of a video, suitable for action recognition. Other than ranking functions, we explore different parametric models that could also explain the temporal changes in videos. The proposed functional pooling methods, and rank pooling in particular, are easy to interpret and implement, fast to compute and effective in recognizing a wide variety of actions. We evaluate our method on various benchmarks for generic action, fine-grained action and gesture recognition. Results show that rank pooling brings an absolute improvement of 7-10 average pooling baseline. At the same time, rank pooling is compatible with and complementary to several appearance and local motion based methods and features, such as improved trajectories and deep learning features.
2017-05-01
Complex networks have emerged as a simple yet powerful framework to represent and analyze a wide range of complex systems. The problem of ranking the nodes and the edges in complex networks is critical for a broad range of real-world problems because it affects how we access online information and products, how success and talent are evaluated in human activities, and how scarce resources are allocated by companies and policymakers, among others. This calls for a deep understanding of how existing ranking algorithms perform, and which are their possible biases that may impair their effectiveness. Many popular ranking algorithms (such as Google's PageRank) are static in nature and, as a consequence, they exhibit important shortcomings when applied to real networks that rapidly evolve in time. At the same time, recent advances in the understanding and modeling of evolving networks have enabled the development of a wide and diverse range of ranking algorithms that take the temporal dimension into account. The aim of this review is to survey the existing ranking algorithms, both static and time-aware, and their applications to evolving networks. We emphasize both the impact of network evolution on well-established static algorithms and the benefits from including the temporal dimension for tasks such as prediction of network traffic, prediction of future links, and identification of significant nodes.
2015-09-01
Our results demonstrate that RANKL expression was observed in the tumor element in 68% of human OS using IHC. However, the staining intensity was relatively low and only 37% (29/79) of samples exhibited ≥10% RANKL-positive tumor cells. RANK expression was not observed in OS tumor cells. In contrast, RANK expression was clearly observed in other cells within OS samples, including the myeloid osteoclast precursor compartment, osteoclasts and giant osteoclast cells. The intensity and frequency of RANKL and RANK staining in OS samples were substantially less than those observed in GCTB samples. The observation that RANKL is expressed in OS cells themselves suggests that these tumors may mediate an osteoclastic response, and anti-RANKL therapy may potentially be protective against bone pathologies in OS. However, the absence of RANK expression in primary human OS cells suggests that any autocrine RANKL/RANK signaling in human OS tumor cells is not operative, and anti-RANKL therapy would not directly affect the tumor.
Ranking structures and rank-rank correlations of countries: The FIFA and UEFA cases
Ranking of agents competing with each other in complex systems may lead to paradoxes according to the pre-chosen different measures. A discussion is presented on such rank-rank, similar or not, correlations based on the case of European countries ranked by UEFA and FIFA from different soccer competitions. The first question to be answered is whether an empirical and simple law is obtained for such (self-) organizations of complex sociological systems with such different measuring schemes. It is found that the power law form is not the best description contrary to many modern expectations. The stretched exponential is much more adequate. Moreover, it is found that the measuring rules lead to some inner structures in both cases.
2009-11-01
Recently, the abundance of digital data is enabling the implementation of graph-based ranking algorithms that provide system level analysis for ranking publications and authors. Here, we take advantage of the entire Physical Review publication archive (1893-2006) to construct authors' networks where weighted edges, as measured from opportunely normalized citation counts, define a proxy for the mechanism of scientific credit transfer. On this network, we define a ranking method based on a diffusion algorithm that mimics the spreading of scientific credits on the network. We compare the results obtained with our algorithm with those obtained by local measures such as the citation count and provide a statistical analysis of the assignment of major career awards in the area of physics. A website where the algorithm is made available to perform customized rank analysis can be found at the address http://www.physauthorsrank.org.
2015-01-01
rank genes according to their difference in gene expression levels. This article constructs measures of the agreement of two or more ordered lists. We use the standard deviation of the ranks to define a measure of agreement that both provides an intuitive interpretation and can be applied to any number of lists even if some or all are incomplete or censored. The approach can identify change-points in the agreement of the lists and the sequential changes of agreement as a function of the depth of the lists can be compared graphically to a permutation based reference set. The usefulness of these tools...
2009-12-01
By allowing the regression coefficients to change with certain covariates, the class of varying coefficient models offers a flexible approach to modeling nonlinearity and interactions between covariates. This paper proposes a novel estimation procedure for the varying coefficient models based on local ranks. The new procedure provides a highly efficient and robust alternative to the local linear least squares method, and can be conveniently implemented using an existing R software package. Theoretical analysis and numerical simulations both reveal that the gain of the local rank estimator over the local linear least squares estimator, measured by the asymptotic mean squared error or the asymptotic mean integrated squared error, can be substantial. In the normal error case, the asymptotic relative efficiency for estimating both the coefficient functions and the derivative of the coefficient functions is above 96%; even in the worst case scenarios, the asymptotic relative efficiency has a lower bound of 88.96% for estimating the coefficient functions, and a lower bound of 89.91% for estimating their derivatives. The new estimator may achieve the nonparametric convergence rate even when the local linear least squares method fails due to infinite random error variance. We establish the large sample theory of the proposed procedure by utilizing results from generalized U-statistics, whose kernel function may depend on the sample size. We also extend a resampling approach, which perturbs the objective function repeatedly, to the generalized U-statistics setting, and demonstrate that it can accurately estimate the asymptotic covariance matrix.
2011-01-01
Marketing and statistical literature available to practitioners provides a wide range of sampling methods that can be implemented in the context of marketing research. The ranking sampling method is based on dividing the general population into several strata, that is, into subdivisions which are relatively homogeneous regarding a certain characteristic. The sample is then composed by selecting, from each stratum, a certain number of components (which can be proportional or non-proportional to the size of the stratum) until the pre-established volume of the sample is reached. Using ranking sampling within marketing research requires the determination of some relevant statistical indicators: average, dispersion, sampling error, etc. To that end, the paper contains a case study which illustrates the actual approach used in order to apply the ranking sample method within a marketing research study made by a company which provides Internet connection services, on a particular category of customers: small and medium enterprises.
2009-01-01
In this article we study a semiparametric mixture model for the two-sample problem with right censored data. The model implies that the densities for the continuous outcomes are related by a parametric tilt but otherwise unspecified. It provides a useful alternative to the Cox (1972) proportional hazards model for the comparison of treatments based on right censored survival data. We propose an iterative algorithm for the semiparametric maximum likelihood estimates of the parametric and nonparametric components of the model. The performance of the proposed method is studied using simulation. We illustrate our method in an application to melanoma.
2012-11-19
Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications. © 2012 Wang et al; licensee BioMed Central Ltd.
... 14 Aeronautics and Space 5 2010-01-01 2010-01-01 false Final ranking. 1214.1105 Section 1214.1105... Recruitment and Selection Program § 1214.1105 Final ranking. Final rankings will be based on a combination of... preference will be included in this final ranking in accordance with applicable regulations....
The Globalization of College and University Rankings
In the era of globalization, accountability, and benchmarking, university rankings have achieved a kind of iconic status. The major ones--the Academic Ranking of World Universities (ARWU, or the "Shanghai rankings"), the QS (Quacquarelli Symonds Limited) World University Rankings, and the "Times Higher Education" World…
2000-01-01
This is the final version of ANT-0142 ("An embedding approach to Dwork's conjecture"). It reduces the higher rank case of the conjecture over a general base variety to the rank one case over the affine space. The general rank one case is completed in ANT-0235 "Rank one case of Dwork's conjecture". Both papers will appear in JAMS.
2003-11-01
The APOA1-C3-A4-A5 gene complex encodes genes whose products are implicated in the metabolism of HDL and/or triglycerides. Although the relationship between polymorphisms in this gene cluster and dyslipidemias was first reported more than 15 years ago, association and linkage results have remained inconclusive. This is due, in part, to the oligogenic and multivariate nature of dyslipidemic phenotypes. Therefore, we investigate evidence of linkage of APOC3 and HDL using two samples of dyslipidemic pedigrees: familial combined hyperlipidemia (FCHL) and isolated low-HDL (ILHDL). We used a strategy that deals with several difficulties inherent in the study of complex traits: by using a Bayesian Markov Chain Monte Carlo (MCMC) approach we allow for oligogenic trait models, as well as simultaneous incorporation of covariates, in the context of multipoint analysis. By using this approach on extended pedigrees we provide evidence of linkage of APOC3 and HDL level variation in two samples with different ascertainment. In addition to APOC3, we estimate that two to three genes, each with a substantial effect on total variance, are responsible for HDL variation in both data sets. We also provide evidence, using the FCHL data set, for a pleiotropic effect between HDL, HDL3 and triglycerides at the APOC3 locus.
Exact Rational Expectations, Cointegration, and Reduced Rank Regression
We interpret the linear relations from exact rational expectations models as restrictions on the parameters of the statistical model called the cointegrated vector autoregressive model for non-stationary variables. We then show how reduced rank regression, Anderson (1951), plays an important role...
Tibet natural conservation areas rank first in China
According to statistics compiled by the Environment Bureau of the T.A.R., Tibet has established 7 different conservation areas at national level, 8 at provincial level, and 23 at sub-provincial level. The total conservation area reaches 407,000 sq km. This accounts for one third of the T.A.R. and ranks first in China.
Let Us Rank Journalism Programs
Unlike law, business, and medical schools, as well as universities in general, journalism schools and journalism programs have rarely been ranked. Publishers such as "U.S. News & World Report," "Forbes," "Bloomberg Businessweek," and "Washington Monthly" do not pay them much mind. What is the best…
Measuring and Ranking Value Drivers
Analysis of the strength of value drivers is crucial to understanding their influence in the process of free cash flow generation. The paper addresses the issue of value driver measurement and ranking. The research reveals that value drivers have a similar pattern across industries.
Charter School Laws: Ranking Scorecard.
This is the fifth report prepared by the Center for Education Reform (CER) evaluating the capacity and flexibility of state laws promoting charter schools. Three primary factors were evaluated in preparing charter-school quality rankings by state. The center finds that the establishment of multiple sponsoring authorities, in addition to local…
A Review of Ranking Models in Data Envelopment Analysis
2013-01-01
In the course of improving various abilities of data envelopment analysis (DEA) models, many investigations have been carried out for ranking decision-making units (DMUs). This is an important issue both in theory and practice. There exist a variety of papers which apply different ranking methods to a real data set. Here the ranking methods are divided into seven groups. As each of the existing methods can be viewed from different aspects, it is possible that these groups overlap somewhat with one another. The first group conducts the evaluation by a cross-efficiency matrix where the units are self- and peer-evaluated. In the second one, the ranking units are based on the optimal weights obtained from multiplier model of DEA technique. In the third group, super-efficiency methods are dealt with which are based on the idea of excluding the unit under evaluation and analyzing the changes of frontier. The fourth group involves methods based on benchmarking, which adopts the idea of being a useful target for the inefficient units. The fourth group uses the multivariate statistical techniques, usually applied after conducting the DEA classification. The fifth research area ranks inefficient units through proportional measures of inefficiency. The sixth approach involves multiple-criteria decision methodologies with the DEA technique. In the last group, some different methods of ranking units are mentioned.
Cross ranking of cities and regions: population versus income
This paper explores the relationship between the inner economical structure of communities and their population distribution through a rank-rank analysis of official data, along statistical physics ideas within two techniques. The data is taken on Italian cities. The analysis is performed both at a global (national) and at a more local (regional) level in order to distinguish ‘macro’ and ‘micro’ aspects. First, the rank-size rule is found not to be a standard power law, as in many other studies, but a doubly decreasing power law. Next, the Kendall τ and the Spearman ρ rank correlation coefficients, which measure pair concordance and the correlation between fluctuations in two rankings, respectively (as a correlation function does in thermodynamics), are calculated for finding rank correlation (if any) between demography and wealth. Results show not only global disparities for the whole (country) set, but also (regional) disparities, when comparing the number of cities in regions, the number of inhabitants in cities and that in regions, as well as when comparing the aggregated tax income of the cities and that of regions. Different outliers are pointed out and justified. Interestingly, two classes of cities in the country and two classes of regions in the country are found. ‘Common sense’ social, political, and economic considerations sustain the findings. More importantly, the methods show that they allow to distinguish communities, very clearly, when specific criteria are numerically sound. A specific modeling for the findings is presented, i.e. for the doubly decreasing power law and the two phase system, based on statistics theory, e.g. urn filling. The model ideas can be expected to hold when similar rank relationship features are observed in fields. It is emphasized that the analysis makes more sense than one through a Pearson Π value-value correlation analysis.
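The two rank correlation coefficients named in this abstract can be illustrated with a minimal sketch. The population and income figures below are hypothetical placeholders, not the Italian data analysed in the paper, and the sketch assumes there are no ties in either variable.

```python
from itertools import combinations


def ranks(values):
    """Return rank positions (1 = largest value); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def kendall_tau(x, y):
    """Kendall tau: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def spearman_rho(x, y):
    """Spearman rho from squared rank differences (no-ties formula)."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical cities: population and aggregated tax income (arbitrary units).
population = [2_800_000, 1_350_000, 960_000, 870_000, 650_000]
income = [41.0, 55.0, 18.0, 30.0, 16.0]

print(kendall_tau(population, income))
print(spearman_rho(population, income))
```

Both coefficients depend only on the orderings, so any monotone rescaling of population or income leaves them unchanged, which is the sense in which a rank-rank analysis differs from a value-value (Pearson) one.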
Comparison of the efficiency between two sampling plans for aflatoxins analysis in maize.
Variance and performance of two sampling plans for aflatoxins quantification in maize were evaluated. Eight lots of maize were sampled using two plans: manual, using sampling spear for kernels; and automatic, using a continuous flow to collect milled maize. Total variance and sampling, preparation, and analysis variance were determined and compared between plans through multifactor analysis of variance. Four theoretical distribution models were used to compare aflatoxins quantification distributions in eight maize lots. The acceptance and rejection probabilities for a lot under certain aflatoxin concentration were determined using variance and the information on the selected distribution model to build the operational characteristic curves (OC). Sampling and total variance were lower at the automatic plan. The OC curve from the automatic plan reduced both consumer and producer risks in comparison to the manual plan. The automatic plan is more efficient than the manual one because it expresses more accurately the real aflatoxin contamination in maize.
2014-01-01
In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during the process of carcinogenesis. Thus, it is important to prioritize those genes that are strongly associated with the functionality of a network. Traditional statistical methods are often inept to identify biologically relevant member genes, motivating researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to incorporate fully the true interplay between biological knowledge and gene expression data. To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA) in this paper. COCA explicitly captures those genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem to maximize the coordinative effort, COCA is designed to first extract the coordinative components based on a partial guidance from knowledge genes and then rank the genes according to their participation strengths. An embedded bootstrapping procedure is implemented to improve statistical robustness of the solutions. COCA was initially tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance as compared to traditional statistical methods. Finally, the COCA approach has been applied to stem cell data to identify biologically relevant genes in signaling pathways. As a result, the COCA approach uncovers novel pathway members that may shed light into the pathway deregulation in cancers. We have developed a new integrative strategy to combine biological knowledge and microarray data for gene ranking. The method utilizes knowledge genes for a guidance to first extract coordinative components, and then rank the genes according to their contribution related to a network or pathway. The experimental results show that such a knowledge-guided strategy
In men's professional tennis, players aspire to hold the top ranking position. On the way to the top spot, reaching the top 100 can be seen as a significant career milestone. National Federations undertake extensive efforts to assist their players to reach the top 100. However, objective data considering reasonable ranking yardsticks for top 100 success in men's professional tennis are lacking. Therefore, it is difficult for National Federations and those involved in player development to give empirical programming advice to young players. By taking a closer look at the ranking history of professional male tennis players, this article tries to provide those involved in player development a more objective basis for decision-making. The 100 names, countries, birthdates and ranking histories of the top 100 players listed in the Association of Tennis Professionals (ATP) at 31 December 2009 were recorded from websites in the public domain. Descriptive statistics were reported for the ranking milestones of interest. Results confirmed the merits of the International Tennis Federation's junior tour with 91% of the top 100 professionals earning a junior ranking, the mean peak of which was 94.1, s=148.9. On average, top 100 professionals achieved their best junior rankings and earned their first ATP point at similar ages, suggesting that players compete on both the junior and professional tours during their transition. Once professionally ranked, players took an average 4.5, s=2.1 years to reach the ATP top 100 at the mean age of 21.5, s=2.6 years, which contrasts with the mean current age of the top 100 of 26.8, s=3.2. The best professional rankings of players born in 1982 or earlier were positively related to the ages at which players earned their first ATP point and then entered the top 100, suggesting that the ages associated with these ranking milestones may have some forecasting potential. Future work should focus on the change in top 100 demographics over time as well
The paper presents a methodology for calculating the aggregate global university ranking (Aggregated Global University Ranking, or AGUR), which consists of an automated presentation of the comparable lists of names for different universities from particular global university rankings (using Machine Learning and Mining Data algorithms) and a simple procedure of aggregating particular global university rankings (summing up the university ranking positions from different particular rankings and their subsequent ranking). The second procedure makes it possible to bring lists of universities from particular rankings, which are nonidentical by length, to one size. The paper includes a sample AGUR for six particular global university rankings as of 2013, as well as cross-correlation matrices and intersection matrices for AGUR for 2011-2013, all created by means of using the Python-based software.
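The aggregation step described above (summing positions across rankings, then re-ranking) might look roughly like the following sketch. The abstract does not say how a university missing from a shorter list is scored, so the rule used here (position = list length + 1) is an assumption made purely for illustration, as are the function name `aggregate_rankings` and the placeholder university names.

```python
def aggregate_rankings(rankings):
    """Sum each item's position over all rankings, then sort by the total.

    rankings: list of lists, each ordered from best to worst.
    Assumption (not from the source): an item absent from a list is
    given the position len(list) + 1, i.e. just below its last entry.
    """
    universities = sorted({name for ranking in rankings for name in ranking})
    totals = {}
    for name in universities:
        total = 0
        for ranking in rankings:
            if name in ranking:
                total += ranking.index(name) + 1
            else:
                total += len(ranking) + 1
        totals[name] = total
    # Ties in the summed position are broken alphabetically.
    ordered = sorted(universities, key=lambda name: (totals[name], name))
    return ordered, totals


# Three hypothetical rankings of unequal length.
rankings = [
    ["A", "B", "C", "D"],
    ["B", "A", "D"],
    ["A", "C", "B"],
]
ordered, totals = aggregate_rankings(rankings)
print(ordered)
print(totals)
```

Because every university receives a score from every list, the aggregated list has one fixed length regardless of how long the individual rankings are.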
In plain, uncomplicated language, and using detailed examples to explain the key concepts, models, and algorithms in vertical search ranking, Relevance Ranking for Vertical Search Engines teaches readers how to manipulate ranking algorithms to achieve better results in real-world applications. This reference book for professionals covers concepts and theories from the fundamental to the advanced, such as relevance, query intention, location-based relevance ranking, and cross-property ranking. It covers the most recent developments in vertical search ranking applications, such as freshness-based relevance theory for new search applications, location-based relevance theory for local search applications, and cross-property ranking theory for applications involving multiple verticals. It introduces ranking algorithms and teaches readers how to manipulate ranking algorithms for the best results. It covers concepts and theories from the fundamental to the advanced. It discusses the state of the art: development of ...
Learning to rank has become important in recent years due to its successful application in information retrieval, recommender systems, computational biology, and other areas. Ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been widely used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problems. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. A primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method trains much faster than kernel RankSVM and achieves comparable or better performance than state-of-the-art ranking algorithms.
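As an illustration of one of the two kernel approximations mentioned, the sketch below builds random Fourier features for the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) and compares the inner product of the mapped vectors with the exact kernel value. It is a generic textbook construction, not the authors' implementation; the names `make_rff`, `approx_kernel` and the parameter values are assumptions chosen for the example.

```python
import math
import random


def make_rff(dim, n_features, gamma, seed=0):
    """Return a feature map z such that z(x).z(y) approximates the RBF kernel.

    Frequencies are drawn from N(0, 2*gamma*I), the Fourier transform of
    exp(-gamma * ||x - y||^2); phases are uniform on [0, 2*pi).
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)]
               for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return feature_map


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value."""
    return math.exp(-gamma * sum((a - c) ** 2 for a, c in zip(x, y)))


def approx_kernel(feature_map, x, y):
    """Kernel estimate: plain dot product in the random feature space."""
    return sum(a * c for a, c in zip(feature_map(x), feature_map(y)))


gamma = 0.5
phi = make_rff(dim=3, n_features=4000, gamma=gamma)
x, y = [0.2, -0.1, 0.4], [0.0, 0.3, 0.1]
print(rbf_kernel(x, y, gamma), approx_kernel(phi, x, y))
```

Once the data are mapped through such a feature map, a linear ranking model can be trained on the mapped vectors, which avoids forming the full kernel matrix; the approximation error shrinks roughly as one over the square root of the number of random features.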
Many definitions exist for sample quantiles and are included in statistical software. The need to adopt a standard definition of sample quantiles has been recognized and different definitions have been compared in terms of satisfying some desirable properties, but no consensus has been found. We outline here that comparisons of the sample quantile definitions are irrelevant because the probabilities associated with order-ranked sample values are known exactly. Accordingly, the standard definition for sample quantiles should be based on the true rank probabilities. We show that this allows more accurate inference of the tails of the distribution, and thus improves estimation of the probability of extreme events.
A Practical Guide to Implementing Nonparametric and Rank-Based Procedures. Nonparametric Statistical Methods Using R covers traditional nonparametric methods and rank-based analyses, including estimation and inference for models ranging from simple location models to general linear and nonlinear models for uncorrelated and correlated responses. The authors emphasize applications and statistical computation. They illustrate the methods with many real and simulated data examples using R, including the packages Rfit and npsm. The book first gives an overview of the R language and basic statistical c
Rank modulation has been recently proposed as a scheme for storing information in flash memories. While rank modulation has advantages in improving write speed and endurance, the current encoding approach is based on the "push to the top" operation that is not efficient in the general case. We propose a new encoding procedure where a cell level is raised to be higher than the minimal necessary subset - instead of all - of the other cell levels. This new procedure leads to a significantly more compressed (lower charge levels) encoding. We derive an upper bound for a family of codes that utilize the proposed encoding procedure, and consider code constructions that achieve that bound for several special cases.
We show that the empirical ranking of volatility models can be inconsistent for the true ranking if the evaluation is based on a proxy for the population measure of volatility. For example, the substitution of a squared return for the conditional variance in the evaluation of ARCH-type models can result in an inferior model being chosen as "best" with a probability that converges to one as the sample size increases. We document the practical relevance of this problem in an empirical application and by simulation experiments. Our results provide an additional argument for using the realized variance in out-of-sample evaluations rather than the squared return. We derive the theoretical results in a general framework that is not specific to the comparison of volatility models. Similar problems can arise in comparisons of forecasting models whenever the predicted variable is a latent variable.
The final ranking of a championship is determined by quality attributes combined with other factors which should be filtered out of any decision on relegation or draft for upper level tournaments. Factors like referees' mistakes and the difficulty of certain matches due to their accidental importance to the opponents should have their influence reduced. This work tests approaches to combine classification rules considering the imprecision of the number of points as a measure of quality and of the variables that provide reliable explanation for it. Two home-advantage variables are tested and shown to be apt to enter as explanatory variables. Independence between the criteria is checked against the hypothesis of maximal correlation. The importance of factors and of composition rules is evaluated on the basis of correlation between rank vectors, number of classes and number of clubs in tail classes. Data from five years of the Brazilian Soccer Championship are analyzed.
"Times Higher Education" 100 under 50 ranking is a new twist to the university ranking. It focuses on universities that have a history of 50 years or less with the purpose of offsetting the advantage of prestige of the older ones. This article re-analysed the data publicly available and looked into relevant conceptual and statistical issues. The…
We propose and justify a new approach to constructing optimal nonlinear transforms of random vectors. We show that the proposed transform improves such characteristics of rank-reduced transforms as compression ratio and accuracy of decompression, and reduces the required computational work. The proposed transform $\mathcal{T}_p$ is presented in the form of a sum with $p$ terms where each term is interpreted as a particular rank-reduced transform. Moreover, terms in $\mathcal{T}_p$ are represented as a combination of three operations $\mathcal{F}_k$, $\mathcal{Q}_k$ and $\boldsymbol{\varphi}_k$ with $k=1,\ldots,p$. The prime idea is to determine $\mathcal{F}_k$ separately, for each $k=1,\ldots,p$, from an associated rank-constrained minimization problem similar to that used in the Karhunen-Loève transform. The operations $\mathcal{Q}_k$ and $\boldsymbol{\varphi}_k$ are auxiliary for finding $\mathcal{F}_k$. The contribution of each term in $\mathcal{T}_p$ improves the entire transform performance. A corresponding unconstrained nonlinear optimal transform is also considered. Such a transform is important in its own right because it is treated as an optimal filter without signal compression. A rigorous analysis of errors associated with the proposed transforms is given.
Recently it has been recognized that many complex social, technological and biological networks have a multilayer nature and can be described by multiplex networks. Multiplex networks are formed by a set of nodes connected by links having different connotations forming the different layers of the multiplex. Characterizing the centrality of the nodes in a multiplex network is a challenging task since the centrality of the node naturally depends on the importance associated to links of a certain type. Here we propose to assign to each node of a multiplex network a centrality called Functional Multiplex PageRank that is a function of the weights given to every different pattern of connections (multilinks) existent in the multiplex network between any two nodes. Since multilinks distinguish all the possible ways in which the links in different layers can overlap, the Functional Multiplex PageRank can describe important non-linear effects when large relevance or small relevance is assigned to multilinks with overlap. Here we apply the Functional Page Rank to the multiplex airport networks, to the neuronal network of the nematode C. elegans, and to social collaboration and citation networks between scientists. This analysis reveals important differences existing between the most central nodes of these networks, and the correlations between their so-called pattern to success.
This paper describes our solution for WSDM Cup 2016. Ranking the query independent importance of scholarly articles is a critical and challenging task, due to the heterogeneity and dynamism of entities involved. Our approach is called Ensemble enabled Weighted PageRank (EWPR). To do this, we first propose Time-Weighted PageRank that extends PageRank by introducing a time decaying factor. We then develop an ensemble method to assemble the authorities of the heterogeneous entities involved in scholarly articles. We finally propose to use external data sources to further improve the ranking accuracy. Our experimental study shows that our EWPR is a good choice for ranking scholarly articles.
In this paper, rank set sampling (RSS) is introduced with a view to increasing the efficiency of estimates of the simple regression model. The regression model is considered with respect to samples taken using simple random sampling (SRS), systematic sampling (SYS) and rank set sampling (RSS). It is found that the R² and adjusted R² obtained from the regression model based on the rank set sample are higher than those of the other two sampling schemes. Similarly, the root mean square error, p-values and coefficient of variation are much lower in the rank set based regression model. Under a validation technique (jackknifing), the measures of R², adjusted R² and RMSE are also more consistent for RSS than for SRS and SYS. Results are supported with an empirical study involving a real data set on Pinus Wallichiana taken from block Langate of district Kupwara.
Collaborative ranking is an emerging field of recommender systems that utilizes users' preference data rather than rating values. Unfortunately, neighbor-based collaborative ranking has gained little attention despite its greater flexibility and justifiability. This paper proposes a novel framework, called SibRank, that seeks to improve the state-of-the-art neighbor-based collaborative ranking methods. SibRank represents users' preferences as a signed bipartite network and finds similar users through a novel personalized ranking algorithm in signed networks.
The purpose of the paper is to identify advantages and disadvantages of various methods of constructing rankings. The subject of our study is important due to the international debate on development and welfare measurement methods and ways of comparing results obtained for different countries. Because GDP per capita does not allow sufficient assessment, countries are compared on the basis of many criteria and results are usually presented in the form of rankings. We discuss different outranking methods originating from multidimensional statistical analysis and multicriteria optimization and compare them, taking into consideration the effect of each method and each set of criteria on the final result. Our remarks are illustrated by rankings of development and economic performance built for European Union countries. Our observations and results can also be regarded as an opinion in the discussion on the report of the International Commission on Measurement of Economic Performance and Social Progress chaired by J.E. Stiglitz and A. Sen.
Many complex phenomena, from the selection of traits in biological systems to hierarchy formation in social and economic entities, show signs of competition and heterogeneous performance in the temporal evolution of their components, which may eventually lead to stratified structures such as the wealth distribution worldwide. However, it is still unclear whether the road to hierarchical complexity is determined by the particularities of each phenomenon, or if there are universal mechanisms of stratification common to many systems. Human sports and games, with their (varied but simplified) rules of competition and measures of performance, serve as an ideal test bed to look for universal features of hierarchy formation. With this goal in mind, we analyse here the behaviour of players and team rankings over time for several sports and games. Even though, for a given time, the distribution of performance ranks varies across activities, we find statistical regularities in the dynamics of ranks. Specifically the ran...
Competence in many domains rests on children developing conceptual and procedural knowledge, as well as procedural flexibility. However, research on the developmental relations between these different types of knowledge has yielded unclear results, in part because little attention has been paid to the validity of the measures or to the effects of prior knowledge on the relations. To overcome these problems, we modeled the three constructs in the domain of equation solving as latent factors and tested (a) whether the predictive relations between conceptual and procedural knowledge were bidirectional, (b) whether these interrelations were moderated by prior knowledge, and (c) how both constructs contributed to procedural flexibility. We analyzed data from 2 measurement points each from two samples (Ns = 228 and 304) of middle school students who differed in prior knowledge. Conceptual and procedural knowledge had stable bidirectional relations that were not moderated by prior knowledge. Both kinds of knowledge contributed independently to procedural flexibility. The results demonstrate how changes in complex knowledge structures contribute to competence development.
We present a simple general method for combining two one-sample confidence procedures to obtain inferences in the two-sample problem. Some applications give striking connections to established methods; for example, combining exact binomial confidence procedures gives new confidence intervals on the difference or ratio of proportions that match inferences using Fisher's exact test, and numeric studies show the associated confidence intervals bound the type I error rate. Combining exact one-sample Poisson confidence procedures recreates standard confidence intervals on the ratio, and introduces new ones for the difference. Combining confidence procedures associated with one-sample t-tests recreates the Behrens-Fisher intervals. Other applications provide new confidence intervals with fewer assumptions than previously needed. For example, the method creates new confidence intervals on the difference in medians that do not require shift and continuity assumptions. We create a new confidence interval for the difference between two survival distributions at a fixed time point when there is independent censoring by combining the recently developed beta product confidence procedure for each single sample. The resulting interval is designed to guarantee coverage regardless of sample size or censoring distribution, and produces equivalent inferences to Fisher's exact test when there is no censoring. We show theoretically that when combining intervals asymptotically equivalent to normal intervals, our method has asymptotically accurate coverage. Importantly, all situations studied suggest guaranteed nominal coverage for our new interval whenever the original confidence procedures themselves guarantee coverage.
This study demonstrates the reliability and validity of the Clergy Occupational Distress Index (CODI). The five-item index allows researchers to measure the frequency that clergy, who traditionally have not been the subject of occupational health studies, experience occupational distress. We assess the reliability and validity of the index using two samples of clergy: a nationally representative sample of clergy and a sample of clergy from nine Protestant denominations. Exploratory factor analysis and Cronbach's scores are generated. Construct validity is measured by examining the association between CODI scores and depressive symptoms while controlling for demographic, ministerial, and health variables. In both samples, the five items of the CODI load onto a single factor and the Cronbach's alpha scores are robust. The regression model indicates that a high score on the CODI (i.e., more frequent occupational distress) is positively associated with having depressive symptoms within the last 4 weeks. The CODI can be used to identify clergy who frequently experience occupational distress and to understand how occupational distress affects clergy's health, ministerial career, and the functioning of their congregation.
University rankings are extremely important not only for future students, but also for universities themselves. They have a large impact on institutions of higher education. Many universities believe that rankings help them to create and maintain a reputation. Ranking systems function as a kind of fashion arena, where universities compare themselves with one another. Universities want to improve their position in published classifications, so they often try to change their policy and strategy. They also try to influence the ranking indicators, for example by hiring Nobel Prize winners. Therefore, there is an increasing need for reliable and transparent information about schools. However, universities need not only statistical data, but also tools that will be useful in their comparisons and evaluations. The article presents the possibility of using one of the methods of graphic presentation of multidimensional empirical data structure, the so-called RGM, proposed by M. Rybaczuk. Thanks to this method, universities could easily compare themselves with one another. They could also identify the fields of activity in which they are able to improve. The proposed way of graphically presenting universities could be a useful addition to traditional rankings, which just show a list of schools from best to worst.
The statistical analysis of measurement data has become a key component of many quantum engineering experiments. As standard full state tomography becomes unfeasible for large dimensional quantum systems, one needs to exploit prior information and the ‘sparsity’ properties of the experimental state in order to reduce the dimensionality of the estimation problem. In this paper we propose model selection as a general principle for finding the simplest, or most parsimonious, explanation of the data, by fitting different models and choosing the estimator with the best trade-off between likelihood fit and model complexity. We apply two well established model selection methods, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), to models consisting of states of fixed rank and datasets such as are currently produced in multiple ions experiments. We test the performance of AIC and BIC on randomly chosen low rank states of four ions, and study the dependence of the selected rank on the number of measurement repetitions for one ion states. We then apply the methods to real data from a four ions experiment aimed at creating a Smolin state of rank 4. By applying the two methods together with the Pearson χ² test we conclude that the data can be suitably described with a model whose rank is between 7 and 9. Additionally we find that the mean square error of the maximum likelihood estimator for pure states is close to that of the optimal over all possible measurements.
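For reference, the two criteria named in this abstract have the following standard textbook forms. The symbols are notational assumptions introduced here, not taken from the abstract: $\hat{L}$ is the maximized likelihood of a candidate model, $k$ its number of free parameters, and $n$ the number of observations.

```latex
\[
  \mathrm{AIC} = 2k - 2\ln\hat{L},
  \qquad
  \mathrm{BIC} = k\ln n - 2\ln\hat{L}.
\]
```

Under either criterion the candidate with the smallest value is preferred. Since $\ln n > 2$ once $n > e^2 \approx 7.4$, BIC penalizes each additional parameter more heavily than AIC for all but very small samples, so it tends to select models of equal or lower complexity.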
We construct a counterexample to the Rank versus Genus Conjecture, i.e. a closed orientable hyperbolic 3-manifold with rank of its fundamental group smaller than its Heegaard genus. Moreover, we show that the discrepancy between rank and Heegaard genus can be arbitrarily large for hyperbolic 3-manifolds. We also construct toroidal such examples containing hyperbolic JSJ pieces.
The purpose of this study is to offer a comprehensive assessment of journal standings in Marketing from two perspectives. The discipline perspective of rankings is obtained from a collection of published journal ranking studies during the past 15 years. The studies in the published ranking stream are assessed for reliability by examining internal…
The Probability Ranking Principle states that the document set with the highest values of probability of relevance optimizes information retrieval effectiveness given the probabilities are estimated as accurately as possible. The key point of the principle is the separation of the document set into two subsets with a given level of fallout and with the highest recall. The paper introduces the separation between two vector subspaces and shows that the separation yields a more effective performance than the optimal separation into subsets with the same available evidence, the performance being measured with recall and fallout. The result is proved mathematically and exemplified experimentally.
A high-quality manufacturing system should be capable of meeting the company's goals. Moreover, it is essential for any organization that its manufacturing system be aligned with the company's strategy. There is always potential for improvement in the components of manufacturing systems, but it is also essential to identify the particular areas of the components that need improvement. In this paper, we discuss the most appropriate criteria for good manufacturing systems with the help of a survey that identified the importance of seven different criteria according to the experts' experience, and we rank them accordingly.
Two headwaters located in southwest France were monitored for 3 and 2 years (Auvézère and Aixette watershed, respectively) with two sampling strategies: grab and passive sampling with polar organic chemical integrative sampler (POCIS). These watersheds are rural and characterized by agricultural areas with similar breeding practices, except that the Auvézère watershed contains apple production for agricultural diversification and the downstream portion of the Aixette watershed is in a peri-urban area. The agricultural activities of both are extensive, i.e., with limited supply of fertilizer and pesticides. The sampling strategies used here give specific information: grab samples for higher pesticide content and POCIS for contamination background noise and number of compounds found. Agricultural catchments in small headwater streams are characterized by a background noise of pesticide contamination in the range of 20-70 ng/L, but there may also be transient and high-peak pesticide contamination (2000-3000 ng/L) caused by rain events, poor use of pesticides, and/or the small size of the water body. This study demonstrates that between two specific runoff events, contamination was low; hence the importance of passive sampler use. While the peak pesticide concentrations seen here are a toxicity risk for aquatic life, the pesticide background noise of single compounds does not pose obvious acute or chronic risks; however, this study did not consider the risk from synergistic "cocktail" effects. Proper tools and sampling strategies may link watershed activities (agricultural, non-agricultural) to pesticides detected in the water, and data from both grab and passive samples can contribute to discussions on environmental effects in headwaters, an area of great importance for biodiversity.
S. Panchapakesan has made significant contributions to ranking and selection and has published in many other areas of statistics, including order statistics, reliability theory, stochastic inequalities, and inference. Written in his honor, the twenty invited articles in this volume reflect recent advances in these areas and form a tribute to Panchapakesan's influence and impact on these areas. Thematically organized, the chapters cover a broad range of topics from: Inference; Ranking and Selection; Multiple Comparisons and Tests; Agreement Assessment; Reliability; and Biostatistics. Featuring
Whenever ranking data are collected, such as in elections, surveys, and database searches, it is frequently the case that partial rankings are available instead of, or sometimes in addition to, full rankings. Statistical methods for partial rankings have been discussed in the literature. However, there has been relatively little published on their Fourier analysis, perhaps because the abstract nature of the transforms involved impedes insight. This paper provides as its novel contributions an analysis of the Fourier transform for partial rankings, with particular attention to the first three ranks, while emphasizing basic signal processing properties of transform magnitude and phase. It shows that the transform and its magnitude satisfy a projection invariance and analyzes the reconstruction of data from either magnitude or phase alone. The analysis is motivated by appealing to corresponding properties of the familiar DFT and by application to two real-world data sets.
Background: In many areas of medical research, a bivariate analysis is desirable because it simultaneously tests two response variables that are of equal interest and importance in two populations. Several parametric and nonparametric bivariate procedures are available for the location problem but each of them requires a series of stringent assumptions such as specific distribution, affine-invariance or elliptical symmetry. The aim of this study is to propose a powerful test statistic...
Thousands of safety issues have been collected on-line at the Idaho National Engineering and Environmental Laboratory (INEEL) as part of the Issue Management Plan. However, there has been no established approach to prioritize collected and future issues. The authors developed a methodology, based on hazards assessment, to identify and risk rank over 5000 safety issues collected at INEEL. This approach was required to be easily applied and understandable for site adaptation and commensurate with the Integrated Safety Plan. High-risk issues were investigated and mitigative/preventive measures were suggested and ranked based on a cost-benefit scheme to provide risk-informed safety measures. This methodology was consistent with other integrated safety management goals and tasks, providing a site-wide risk-informed decision tool to reduce hazardous conditions and focus resources on high-risk safety issues. As part of the issue management plan, this methodology was incorporated at the issue collection level and training was provided to management to better familiarize decision-makers with concepts of safety and risk. This prioritization methodology and issue dissemination procedure will be discussed. Results of issue prioritization and training efforts will be summarized. Difficulties and advantages of the process will be reported. Development and incorporation of this process into INEEL's lessons learned reporting and the site-wide integrated safety management program will be shown with an emphasis on establishing self-reliance and ownership of safety issues.
Rank modulation is a way of encoding information to correct errors in flash memory devices as well as impulse noise in transmission lines. Modeling rank modulation involves construction of packings of the space of permutations equipped with the Kendall tau distance. We present several general constructions of codes in permutations that cover a broad range of code parameters. In particular, we show a number of ways in which conventional error-correcting codes can be modified to correct errors in the Kendall space. Codes that we construct afford simple encoding and decoding algorithms of essentially the same complexity as required to correct errors in the Hamming metric. For instance, from binary BCH codes we obtain codes correcting $t$ Kendall errors in $n$ memory cells that support the order of $n!/(\log_2 n!)^t$ messages, for any constant $t = 1, 2, \ldots$ We also construct families of codes that correct a number of errors that grows with $n$ at varying rates, from $\Theta(n)$ to $\Theta(n^{2})$. One of our constr...
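The Kendall tau distance mentioned in this abstract is the minimum number of adjacent transpositions needed to turn one permutation into another, which equals the number of item pairs the two permutations order differently. The following is a minimal illustrative sketch of that definition only, not of the code constructions in the paper; the function name `kendall_tau_distance` is hypothetical.

```python
from typing import Sequence


def kendall_tau_distance(p: Sequence[int], q: Sequence[int]) -> int:
    """Count the pairs of items that p and q rank in opposite order."""
    if sorted(p) != sorted(q):
        raise ValueError("p and q must be permutations of the same items")
    # Position of every item in q, so that p can be rewritten in q's coordinates.
    position_in_q = {item: idx for idx, item in enumerate(q)}
    relabeled = [position_in_q[item] for item in p]
    # Each inversion in the relabeled sequence is one discordant pair.
    n = len(relabeled)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if relabeled[i] > relabeled[j]
    )


if __name__ == "__main__":
    # One adjacent swap separates these two rankings of three cells.
    print(kendall_tau_distance([1, 0, 2], [0, 1, 2]))
```

The quadratic pair scan keeps the sketch close to the definition; for long permutations a merge-sort inversion count would give the same value in O(n log n) time.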
We propose an adiabatic quantum algorithm to evaluate the PageRank vector, the most widely used tool in ranking the relative importance of internet pages. We present extensive numerical simulations which provide evidence that this quantum algorithm outputs any component of the PageRank vector, and thus the ranking of the corresponding webpage, in a time which scales polylogarithmically in the number of webpages. This would constitute an exponential speed-up with respect to all known classical algorithms designed to evaluate the PageRank.
A graph is called an integral graph if it has an integral spectrum, i.e., all eigenvalues are integers. A graph is called a circulant graph if it is a Cayley graph on the circulant group, i.e., its adjacency matrix is circulant. The rank of a graph is defined to be the rank of its adjacency matrix. The rank is important because of its applications in physics, chemistry and combinatorics. In this paper, using Ramanujan sums, we study the rank of integral circulant graphs, give some simple computational formulas for the rank, and provide an example which shows the formula is sharp.
... 24 Housing and Urban Development 3 2010-04-01 2010-04-01 false Ranking of applications. 599.401... Communities § 599.401 Ranking of applications. (a) Ranking order. Rural and urban applications will be ranked... applications ranked first. (b) Separate ranking categories. After initial ranking, both rural and...
The Library of Babel, described by Jorge Luis Borges, stores an enormous amount of information. The Library exists ab aeterno. Wikipedia, a free online encyclopaedia, becomes a modern analogue of such a Library. Information retrieval and ranking of Wikipedia articles become the challenge of modern society. While PageRank highlights very well known nodes with many ingoing links, CheiRank highlights very communicative nodes with many outgoing links. In this way the ranking becomes two-dimensional. Using CheiRank and PageRank we analyze the properties of two-dimensional ranking of all Wikipedia English articles and show that it gives their reliable classification with rich and nontrivial features. Detailed studies are done for countries, universities, personalities, physicists, chess players, Dow-Jones companies and other categories.
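The contrast between the two rankings in this abstract can be illustrated on a toy graph: PageRank rewards nodes with many ingoing links, while CheiRank is PageRank computed on the same network with every link direction inverted, so it rewards nodes with many outgoing links. The sketch below uses plain power iteration; the function names, the five-node graph, the damping value 0.85 and the uniform handling of dangling nodes are all illustrative assumptions, not details from the study.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank on an adjacency-list graph."""
    nodes = sorted(set(graph) | {v for targets in graph.values() for v in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Mass held by dangling nodes (no outgoing links) is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for u in nodes:
            targets = graph.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


def reverse(graph: Graph) -> Graph:
    """Return the graph with every link direction inverted."""
    reversed_graph: Graph = {}
    for u, targets in graph.items():
        reversed_graph.setdefault(u, [])
        for v in targets:
            reversed_graph.setdefault(v, []).append(u)
    return reversed_graph


def cheirank(graph: Graph, damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """CheiRank: PageRank of the link-inverted graph."""
    return pagerank(reverse(graph), damping, iterations)


# Hypothetical toy network: "hub" links out to many pages, "sink" is linked to by many.
toy: Graph = {"hub": ["a", "b", "c"], "a": ["sink"], "b": ["sink"], "c": ["sink"], "sink": []}

if __name__ == "__main__":
    pr, cr = pagerank(toy), cheirank(toy)
    for node in sorted(toy):
        print(f"{node:>4}  PageRank={pr[node]:.3f}  CheiRank={cr[node]:.3f}")
```

Plotting each node at its (PageRank, CheiRank) position gives the kind of two-dimensional ranking the abstract describes; in this toy graph the heavily linked-to node leads one axis and the heavily linking node leads the other.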
1999-01-01
The work described in this report has been performed as a part of the RESTRAT Project FI4P-CT95-0021a (PL 950128) co-funded by the Nuclear Fission Safety Programme of the European Commission. The RESTRAT project has the overall objective of developing generic methodologies for ranking restoration...... techniques as a function of contamination and site characteristics. The project includes analyses of existing remediation methodologies and contaminated sites, and is structured in the following steps: - characterisation of relevant contaminated sites - identification and characterisation of relevant restoration...... techniques - assessment of the radiological impact - development and application of a selection methodology for restoration options - formulation of generic conclusions and development of a manual. The project is intended to apply to situations in which sites with nuclear installations have been contaminated...
Cosmic Statistics of Statistics
The errors on statistics measured in finite galaxy catalogs are exhaustively investigated. The theory of errors on factorial moments by Szapudi & Colombi (1996) is applied to cumulants via a series expansion method. All results are subsequently extended to the weakly non-linear regime. Together with previous investigations this yields an analytic theory of the errors for moments and connected moments of counts in cells from highly nonlinear to weakly nonlinear scales. The final analytic formu...
Comparing classical and quantum PageRanks
Loke, T; Rodriguez, J; Small, M; Wang, J B
2015-01-01
Following recent developments in quantum PageRanking, we present a comparative analysis of discrete-time and continuous-time quantum-walk-based PageRank algorithms. For the discrete-time case, we introduce an alternative PageRank measure based on the maximum probabilities achieved by the walker on the nodes. We demonstrate that the required time of evolution does not scale significantly with increasing network size. We affirm that all three quantum PageRank measures considered here distinguish clearly between outerplanar hierarchical, scale-free, and Erdős-Rényi network types. Relative to classical PageRank and to different extents, the quantum measures better highlight secondary hubs and resolve ranking degeneracy among peripheral nodes for the networks we studied in this paper.
The world is addicted to ranking: everything, from the reputation of scientists, journals, and universities to purchasing decisions is driven by measured or perceived differences between them. Here, we analyze empirical data capturing real time ranking in a number of systems, helping to identify the universal characteristics of ranking dynamics. We develop a continuum theory that not only predicts the stability of the ranking process, but shows that a noise-induced phase transition is at the heart of the observed differences in ranking regimes. The key parameters of the continuum theory can be explicitly measured from data, allowing us to predict and experimentally document the existence of three phases that govern ranking stability.
Ranking is an important task in machine learning and information retrieval, e.g., in collaborative filtering, recommender systems, and drug discovery. A kernel-based stochastic gradient descent algorithm with the least squares loss is proposed for ranking in this paper. The implementation of this algorithm is simple, and an expression of the solution is derived via a sampling operator and an integral operator. An explicit convergence rate for learning a ranking function is given in terms of suitable choices of the step size and the regularization parameter. The analysis technique used here is capacity independent and is novel in the error analysis of ranking learning. Experimental results on real-world data have shown the effectiveness of the proposed algorithm in ranking tasks, which verifies the theoretical analysis of the ranking error.
We consider rank modulation codes for flash memories that allow for handling arbitrary charge drop errors. Unlike classical rank modulation codes used for correcting errors that manifest themselves as swaps of two adjacently ranked elements, the proposed translocation rank codes account for more general forms of errors that arise in storage systems. Translocations represent a natural extension of the notion of adjacent transpositions and as such may be analyzed using related concepts in combinatorics and rank modulation coding. Our results include tight bounds on the capacity of translocation rank codes, construction techniques for asymptotically good codes, as well as simple decoding methods for one class of structured codes. As part of our exposition, we also highlight the close connections between the new code family and permutations with short common subsequences, deletion and insertion error-correcting codes for permutations and permutation arrays.
Distributions over rankings are used to model data in various settings such as preference analysis and political elections. The factorial size of the space of rankings, however, typically forces one to make structural assumptions, such as smoothness, sparsity, or probabilistic independence about these underlying distributions. We approach the modeling problem from the computational principle that one should make structural assumptions which allow for efficient calculation of typical probabilistic queries. For ranking models, "typical" queries predominantly take the form of partial ranking queries (e.g., given a user's top-k favorite movies, what are his preferences over remaining movies?). In this paper, we argue that riffled independence factorizations proposed in recent literature [7, 8] are a natural structural assumption for ranking distributions, allowing for particularly efficient processing of partial ranking queries.
Conditionally bounding analytic ranks of elliptic curves
We describe a method for bounding the rank of an elliptic curve under the assumptions of the Birch and Swinnerton-Dyer conjecture and the generalized Riemann hypothesis. As an example, we compute, under these conjectures, exact upper bounds for curves which are known to have rank at least as large as 20, 21, 22, 23, and 24. For the known curve of rank at least 28, we get a bound of 30.
It is demonstrated that every (0,1)-matrix of size n×m having Boolean rank n contains a column with at least √n/2−1 zero entries. This bound is shown to be asymptotically optimal. As a corollary, it is established that the size of a full-rank Boolean matrix is bounded from above by a function of its tropical and determinantal ranks. Bibliography: 16 titles.
An abstract polytope of rank n is said to be chiral if its automorphism group has precisely two orbits on the flags, such that adjacent flags belong to distinct orbits. The present paper describes a general method for deriving new finite chiral polytopes from old finite chiral polytopes of the same rank. In particular, the technique is used to construct many new examples in ranks 3, 4 and 5.
A mere hyperbolic law, such as the Zipf's law power function, is often inadequate to describe rank-size relationships. An alternative theoretical distribution is proposed, based on theoretical physics arguments and starting from the Yule-Simon distribution. A model leading to a universal form is proposed. A theoretical suggestion for the "best (or optimal) distribution" is provided through an entropy argument. Rankings of areas by the number of cities in various countries, and some sports competition rankings, serve as illustrations.
We prove that (i) rank(K2(E)) ≥ 1 for all elliptic curves E defined over Q with a rational torsion point of exact order N ≥ 4; (ii) rank(K2(E)) ≥ 1 for all but at most one R-isomorphism class of elliptic curves E defined over Q with a rational torsion point of exact order 3. We give some sufficient conditions for rank(K2(Ez)) ≥ 1.
For an exponential on a nonarchimedean ordered field, we introduce the notion of the exponential rank, in analogy to the rank of the field. This gives information about the growth rate of the exponential, and about the convex valuations on the field which are compatible with the exponential. We give several characterizations of these valuations, using maps induced by the exponential on the value group of the natural valuation and on the rank of the field. Finally, we construct exponential fie...
I have traced the theories of Otto Rank as they appeared in his major technical writings. Against this background, I have discussed references to Rank in past and contemporary psychoanalytic literature. This paper describes three important contributions of Rank--his birth trauma theory, leading to his theory of the birth of the self; his emphasis on present experience (forerunner of the current "here-and-now" theory); and his writings about the creative potential of the termination process.
2010-01-01
Invenio is the web-based integrated digital library system developed at CERN. Within this framework, we present four types of ranking models based on the citation graph that complement the simple approach based on citation counts: time-dependent citation counts, a relevancy ranking which extends the PageRank model, a time-dependent ranking which combines the freshness of citations with PageRank and a ranking that takes into consideration the external citations. We present our analysis and results obtained on two main data sets: Inspire and CERN Document Server. Our main contributions are: (i) a study of the currently available ranking methods based on the citation graph; (ii) the development of new ranking methods that correct some of the identified limitations of the current methods such as treating all citations of equal importance, not taking time into account or considering the citation graph complete; (iii) a detailed study of the key parameters for these ranking methods. (The original publication is ava...
Methods for matrix decomposition have found numerous applications in image processing, in particular for the problem of template decomposition. Since existing matrix decomposition techniques are mainly concerned with the linear domain, we consider it timely to investigate matrix decomposition techniques in the nonlinear domain with applications in image processing. The mathematical basis for these investigations is the new theory of rank within minimax algebra. Thus far, only minimax decompositions of rank 1 and rank 2 matrices into outer product expansions are known to the image processing community. We derive a heuristic algorithm for the decomposition of matrices having arbitrary rank.
The Department of Homeland Security (DHS) characterized and prioritized the physical cross-border threats and hazards to the nation stemming from terrorism, market-driven illicit flows of people and goods (illegal immigration, narcotics, funds, counterfeits, and weaponry), and other nonmarket concerns (movement of diseases, pests, and invasive species). These threats and hazards pose a wide diversity of consequences with very different combinations of magnitudes and likelihoods, making it very challenging to prioritize them. This article presents the approach that was used at DHS to arrive at a consensus regarding the threats and hazards that stand out from the rest based on the overall risk they pose. Due to time constraints for the decision analysis, it was not feasible to apply multiattribute methodologies like multiattribute utility theory or the analytic hierarchy process. Using a holistic approach was considered, such as the deliberative method for ranking risks first published in this journal. However, an ordinal ranking alone does not indicate relative or absolute magnitude differences among the risks. Therefore, the use of the deliberative method for ranking risks is not sufficient for deciding whether there is a material difference between the top-ranked and bottom-ranked risks, let alone deciding what the stand-out risks are. To address this limitation of ordinal rankings, the deliberative method for ranking risks was augmented by adding an additional step to transform the ordinal ranking into a ratio scale ranking. This additional step enabled the selection of stand-out risks to help prioritize further analysis.
The work described in this report has been performed as a part of the RESTRAT Project FI4P-CT95-0021a (PL 950128) co-funded by the Nuclear Fission Safety Programme of the European Commission. The RESTRAT project has the overall objective of developing generic methodologies for ranking restoration techniques as a function of contamination and site characteristics. The project includes analyses of existing remediation methodologies and contaminated sites, and is structured in the following steps: characterisation of relevant contaminated sites; identification and characterisation of relevant restoration techniques; assessment of the radiological impact; development and application of a selection methodology for restoration options; formulation of generic conclusions and development of a manual. The project is intended to apply to situations in which sites with nuclear installations have been contaminated with radioactive materials as a result of the operation of these installations. The areas considered for remedial measures include contaminated land areas, rivers and sediments in rivers, lakes, and sea areas. Five contaminated European sites have been studied. Various remedial measures have been envisaged with respect to the optimisation of the protection of the populations being exposed to the radionuclides at the sites. Cost-benefit analysis and multi-attribute utility analysis have been applied for optimisation. Health, economic and social attributes have been included and weighting factors for the different attributes have been determined by the use of scaling constants.
The long controversy over the term ‘Quaternary’ as a chronostratigraphic unit may be reaching an apotheosis, judging from recent papers (Pillans and Naish, 2004; Gibbard et al., 2005; and references therein). The debate is no longer centered on whether there should be a place in the geological time scale for a unit termed ‘Quaternary’; despite its dubious past, it cannot be denied that a large body of earth-historical research is strongly identified with this term. The challenge now concerns an appropriate rank and definition of Quaternary with regard to other chronostratigraphic units. Several options have been proposed (Pillans and Naish, 2004), and Gibbard et al. (2005) encourage a debate on these before a decision is reached. In this brief note, we describe an arrangement not previously considered that seems advantageous. It is instructive, however, to first review the Pleistocene Series and Neogene System, the two units that are directly affected by introduction of the Quaternary into the chronostratigraphic hierarchy.
U.S. Environmental Protection Agency — The RANKED_OAS are all the Conservation Opportunity Areas identified by MoRAP that have subsequently been ranked by patch size, landform representation, and the...
A new algorithm to generate all Dyck words is presented, which is used in ranking and unranking Dyck words. We emphasize the importance of using Dyck words in encoding objects related to Catalan numbers. As a consequence of formulas used in the ranking algorithm we can obtain a recursive formula for the nth Catalan number.
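To make the notions of ranking and unranking concrete, here is a minimal sketch that uses the standard counting approach in lexicographic order with "(" sorted before ")". It is not the new algorithm the abstract refers to; the function names and the parenthesis encoding are assumptions made for illustration. The helper `completions` counts the ways to finish a partial word, and `completions(2 * n, 0)` equals the nth Catalan number.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def completions(remaining, height):
    """Count ways to finish a Dyck word with `remaining` symbols left
    when `height` opening symbols are currently unmatched."""
    if height < 0 or height > remaining:
        return 0
    if remaining == 0:
        return 1  # only reachable with height == 0
    return (completions(remaining - 1, height + 1)
            + completions(remaining - 1, height - 1))

def rank(word):
    """Position of `word` among all Dyck words of its length (0-based)."""
    r, height = 0, 0
    for i, ch in enumerate(word):
        remaining = len(word) - i - 1
        if ch == "(":
            height += 1
        else:
            # Every word with "(" at this position precedes this one.
            r += completions(remaining, height + 1)
            height -= 1
    return r

def unrank(r, pairs):
    """Inverse of `rank` for Dyck words with `pairs` matched pairs."""
    word, height = [], 0
    for i in range(2 * pairs):
        remaining = 2 * pairs - i - 1
        with_open = completions(remaining, height + 1)
        if r < with_open:
            word.append("(")
            height += 1
        else:
            r -= with_open
            word.append(")")
            height -= 1
    return "".join(word)
```

For three pairs there are five words, so `rank("((()))")` is 0 and `rank("()()()")` is 4, and `unrank` reverses the mapping.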
The amount of online information has grown exponentially over the past few decades, and users become more and more dependent on ranking and recommendation systems to address their information seeking needs. The advance in information technologies has enabled users to provide feedback on the utilities of the underlying ranking and recommendation…
Regression problems with a number of related response variables are typically analyzed by separate multiple regressions. This paper shows how these regressions can be visualized jointly in a biplot based on reduced-rank regression. Reduced-rank regression combines multiple regression and principal c
This article presents a comparative judgment approach for holistically scored constructed response tasks. In this approach, the grader rank orders (rather than rates) the quality of a small set of responses. A prior automated evaluation of responses guides both set formation and scaling of rankings. Sets are formed to have similar prior scores and…
Past research has observed that certain subgroups (e.g., individuals who are overweight/obese) have inaccurate estimates of survival rates for particular cancers (e.g., colon cancer). However, no study has examined whether the lay public can accurately rank cancer survival rates in comparison with one another (i.e., rank cancers from most deadly…
We classify rank 2 cluster varieties (those whose corresponding skew-form has rank 2) according to the deformation type of a generic fiber U of their X-spaces, as defined by Fock and Goncharov. Our approach is based on the work of Gross, Hacking, and Keel for cluster varieties and log Calabi-Yau ...
We describe our participation in the WebCLEF 2007 task, targeted at snippet retrieval from web data. Our system ranks snippets based on a simple similarity-based centrality, inspired by the web page ranking algorithms. We experimented with retrieval units (sentences and paragraphs) and with the
The purpose of this paper is to explore international university ranking systems. As a compilation study, this paper presents the specific criteria that each ranking system uses and the main critiques of these ranking systems. Since there are many ranking systems in this area of research, this study focuses on only the most cited and referenced ranking systems. As there is no consensus on the criteria that these systems use, this paper has no intention of identifying the best ranking system based on a comparative analysis. Rather, this paper may inform relevant interest groups in higher education about the ranking systems and their decisive factors as universities attempt to place their names on the best-known ranking systems. This study may provide brief but extensive information to Turkish higher education researchers and universities as they participate in Erasmus and Mevlana to open their doors to the international higher education world. To become one of the top universities, this paper suggests that universities should know and understand the realities of international competitiveness in higher education and, based on such understanding, mold their physical, structural and academic futures toward achieving their visions and foundational missions.
Entity ranking using Wikipedia as a pivot
Rankings and the Global Reputation Race
This chapter delves into the growing influence and impact of rankings on higher education, as a lens through which to view how the race for reputation and status is changing the higher education landscape, both globally and nationally. The author considers the extent to which rankings are driving policy choices and institutional decisions and the…
We study higher rank Donaldson-Thomas invariants of a Calabi-Yau 3-fold using Joyce-Song's wall-crossing formula. We construct quivers whose counting invariants coincide with the Donaldson-Thomas invariants. As a corollary, we prove the integrality and a certain symmetry for the higher rank invariants.
We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix. The consistency results are valid not only in the classic asymptotic regime, when the number of responses $n$ and predictors $p$ stay bounded and the number of observations $m$ grows, but also when either, or both, $n$ and $p$ grow, possibly much faster than $m$. Our finite sample prediction and estimation performance bounds show that the RSC estimator achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly ...
We consider the local rank-modulation scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory. We study Gray codes for the local rank-modulation scheme in order to simulate conventional multi-level flash cells while retaining the benefits of rank modulation. Unlike the limited scope of previous works, we consider code constructions for the entire range of parameters including the code length, sliding window size, and overlap between adjacent windows. We show our constructed codes have asymptotically-optimal rate. We also provide efficient encoding, decoding, and next-state algorithms.
The PageRank algorithm makes it possible to rank the nodes of a network through a specific eigenvector of the Google matrix, using a damping parameter $\alpha \in ]0,1[$. Using extensive numerical simulations of large web networks, we determine numerically and analytically the universal features of the PageRank vector at its emergence when $\alpha \rightarrow 1$. The whole network can be divided into a core part and a group of invariant subspaces. For $\alpha \rightarrow 1$ the PageRank converges to a universal power law distribution on the invariant subspaces whose size distribution also follows a universal power law. The convergence of PageRank at $\alpha \rightarrow 1$ is controlled by eigenvalues of the core part of the Google matrix which are exponentially close to unity, leading to large relaxation times as, for example, in spin glasses.
The PageRank algorithm enables us to rank the nodes of a network through a specific eigenvector of the Google matrix, using a damping parameter $\alpha \in ]0,1[$. Using extensive numerical simulations of large web networks, with a special accent on British University networks, we determine numerically and analytically the universal features of the PageRank vector at its emergence when $\alpha \rightarrow 1$. The whole network can be divided into a core part and a group of invariant subspaces. For $\alpha \rightarrow 1$, PageRank converges to a universal power-law distribution on the invariant subspaces whose size distribution also follows a universal power law. The convergence of PageRank at $\alpha \rightarrow 1$ is controlled by eigenvalues of the core part of the Google matrix, which are extremely close to unity, leading to large relaxation times as, for example, in spin glasses.
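For readers unfamiliar with the computation behind these two abstracts, the sketch below finds the PageRank vector by power iteration on a small link graph. It is a generic textbook-style version, not the numerical method used in the papers; the dictionary representation, the uniform treatment of dangling nodes, and the default damping value 0.85 are assumptions.

```python
def pagerank(links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank.

    `links` maps each node to the list of nodes it links to; every
    linked node must also appear as a key. Nodes without outgoing
    links spread their weight uniformly over all nodes.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links[v])
        # Teleportation term plus the redistributed dangling weight.
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in links[v]:
                new[w] += alpha * rank[v] / len(links[v])
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank

# Two pages point at "c", which has no outgoing links of its own.
scores = pagerank({"a": ["c"], "b": ["c"], "c": []})
```

The number of iterations needed grows as `alpha` approaches 1, which is one simple way to see the slow relaxation the abstracts describe.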
The email retrieval task, which helps the user retrieve the email(s) related to a submitted query, has recently received much attention. To our knowledge, existing email retrieval ranking approaches sort the retrieved emails based on heuristic rules, which are either search clues or predefined user criteria rooted in email fields. Unfortunately, the user usually does not know which rule yields the best ranking for a given query. This paper presents a new email retrieval ranking approach to tackle this problem. It ranks the retrieved emails based on a scoring function that depends on crucial email fields, namely subject, content, and sender. The paper also proposes an architecture that allows every user in a network or group of users, if permissible, to identify the most important network senders who are interested in the submitted query words. The experimental evaluation on the Enron corpus shows that our approach outperforms known email retrieval ranking approaches.
Background: Journal impact factors and their ranks are used widely by journals, researchers, and research assessment exercises. Methods: Based on citations to journals in research and experimental medicine in 2005, Bayesian Markov chain Monte Carlo methods were used to estimate the uncertainty associated with these journal performance indicators. Results: Intervals representing plausible ranges of values for journal impact factor ranks indicated that most journals cannot be ranked with great precision. Only the top and bottom few journals could place any confidence in their rank position. Intervals were wider and overlapping for most journals. Conclusion: Decisions placed on journal impact factors are potentially misleading where the uncertainty associated with the measure is ignored. This article proposes that caution should be exercised in the interpretation of journal impact factors and their ranks, and specifically that a measure of uncertainty should be routinely presented alongside the point estimate.
In recent years, the Internet has become embedded in the purchasing decisions of consumers. The purpose of this paper is to study whether the Internet behavior of users correlates with their actual behavior in the computer games market. Rather than proposing the most accurate model for computer game sales, we aim to investigate to what extent web search query data can be exploited to nowcast (a contraction of “now” and “forecasting”, referring to techniques used to make short-term forecasts), that is, to predict the present status of the ranking of mobile games in the world. Google search query data is used for this purpose, since this data can provide a real-time view of the topics of interest. Various statistical techniques are used to show the effectiveness of using web search query data to nowcast mobile games ranking.
Measures of galaxy environment -- II. Rank-ordered mark correlations
We analyze environmental correlations using mark clustering statistics with the mock galaxy catalogue constructed by Muldrew et al. (Paper I). We find that mark correlation functions are able to detect even a small dependence of galaxy properties on the environment, quantified by the overdensity $1+\delta$, while such a small dependence would be difficult to detect by traditional methods. We then show that rank ordering the marks and using the rank as a weight is a simple way of comparing the correlation signals for different marks. With this we quantify to what extent fixed-aperture overdensities are sensitive to large-scale halo environments, nearest-neighbor overdensities are sensitive to small-scale environments within haloes, and colour is a better tracer of overdensity than is luminosity.
The Kruskal-Wallis test is a non-parametric test for the equality of K population medians. The test statistic involved is a measure of the overall closeness of the K average ranks in the individual samples to the average rank in the combined sample. The resulting acceptance region of the test, however, may not be the smallest region with the required acceptance probability under the null hypothesis. Here, an alternative acceptance region is constructed such that it has the smallest size while still having the required acceptance probability. Compared to the Kruskal-Wallis test, the alternative test is found to have larger average power, computed from the powers along evenly chosen directions of deviation of the medians.
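The description of the test statistic can be made concrete with a short sketch of the classical Kruskal-Wallis H statistic, which weights the squared distance of each sample's average rank from the average rank in the combined sample. This covers only the standard statistic, not the alternative acceptance region proposed in the paper; the function names are assumptions, and tied values receive average ranks without the usual tie correction factor.

```python
def midranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum_i n_i * (mean_rank_i - (N + 1) / 2)^2."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    overall = (n_total + 1) / 2.0  # average rank in the combined sample
    total, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        mean_rank = sum(group_ranks) / len(g)
        total += len(g) * (mean_rank - overall) ** 2
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total

h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

In the example, the three average ranks are 2, 5 and 8 against a combined average of 5, which gives H = 7.2; samples whose average ranks coincide give H = 0.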
The purpose of this paper is to highlight the importance of a population model in guiding the design and interpretation of simulation studies used to investigate the Spearman rank correlation. The Spearman rank correlation has been known for over a hundred years to applied researchers and methodologists alike and is one of the most widely used non-parametric statistics. Still, certain misconceptions can be found, either explicitly or implicitly, in the published literature because a population definition for this statistic is rarely discussed within the social and behavioural sciences. By relying on copula distribution theory, a population model is presented for the Spearman rank correlation, and its properties are explored both theoretically and in a simulation study. Through the use of the Iman-Conover algorithm (which allows the user to specify the rank correlation as a population parameter), simulation studies from previously published articles are explored, and it is found that many of the conclusions purported in them regarding the nature of the Spearman correlation would change if the data-generation mechanism better matched the simulation design. More specifically, issues such as small sample bias and lack of power of the t-test and r-to-z Fisher transformation disappear when the rank correlation is calculated from data sampled where the rank correlation is the population parameter. A proof for the consistency of the sample estimate of the rank correlation is shown as well as the flexibility of the copula model to encompass results previously published in the mathematical literature.
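For reference, the sample statistic and the copula-based population parameter discussed in this abstract can be written as follows. These are the standard textbook forms rather than formulas quoted from the paper, and the symbols ($R_i$, $S_i$ for the ranks of the two variables, $d_i$ for their difference, $C$ for the copula of a pair with continuous margins) are notation assumed here.

```latex
% Sample Spearman rank correlation: Pearson correlation of the ranks
r_s = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}
           {\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 \, \sum_{i=1}^{n} (S_i - \bar{S})^2}}

% Equivalent form when there are no ties, with d_i = R_i - S_i
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)}

% Population version expressed through the copula C
\rho_s = 12 \int_0^1 \int_0^1 C(u, v) \, du \, dv - 3
```

The last expression depends only on the copula, which is why a simulation design that fixes the copula can treat the rank correlation itself as the population parameter.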
Tumor diagnosis by analyzing gene expression profiles has become an interesting topic in bioinformatics, and the main problem is to identify the genes related to a tumor. This paper proposes a rank sum method to identify the related genes based on the rank sum test theory in statistics. The tumor diagnosis system is constructed by a support vector machine (SVM) trained on the set of the related gene expression profiles. The experiments demonstrate that the tumor diagnosis system constructed with the rank sum method and SVM can reach an accuracy level of 96.2% on the colon data and 100% on the leukemia data.
The Human Development Index (HDI) is a composite statistic of life expectancy, education, and income indices, which is used to rank countries on different aspects of human development, including life expectancy, education, and living standards. This paper uses the fuzzy analytical hierarchy process (AHP) to rank five influencing factors, namely income, culture, healthcare, knowledge and civil rights, in Iran. Using a questionnaire in linguistic form, the study asks experts to judge the relative importance of each pair of the five items and ranks them using the fuzzy AHP technique. The results indicate that income is the number one priority, followed by knowledge, culture, civil rights and healthcare affairs.
Academic rankings are a controversial subject in higher education. However, despite all the criticism, academic rankings are here to stay, and more and more stakeholders use rankings to obtain information about institutions' performance. The two most well-known rankings, The Times and the Shanghai Jiao Tong University rankings, have different methodologies. The Times ranking is based on peer review, whereas the Shanghai ranking has only quantitative indicators and is mainly based on research outputs. In Germany, the CHE ranking uses a different methodology from the traditional rankings, allowing the users to choose criteria and weights. The Portuguese higher education institutions are performing below their European peers, and the Government believes that an academic ranking could improve both performance and competitiveness between institutions. The purpose of this paper is to analyse the advantages and problems of academic rankings and provide guidance for a new Portuguese ranking.
We propose an adiabatic quantum algorithm for generating a quantum pure state encoding of the PageRank vector, the most widely used tool in ranking the relative importance of internet pages. We present extensive numerical simulations which provide evidence that this algorithm can prepare the quantum PageRank state in a time which, on average, scales polylogarithmically in the number of web pages. We argue that the main topological feature of the underlying web graph allowing for such a scaling is the out-degree distribution. The top-ranked log(n) entries of the quantum PageRank state can then be estimated with a polynomial quantum speed-up. Moreover, the quantum PageRank state can be used in "q-sampling" protocols for testing properties of distributions, which require exponentially fewer measurements than all classical schemes designed for the same task. This can be used to decide whether to run a classical update of the PageRank.
Background: There is no publicly available resource that provides the relative severity of adverse drug reactions (ADRs). Such a resource would be useful for several applications, including assessment of the risks and benefits of drugs and improvement of patient-centered care. It could also be used to triage predictions of drug adverse events. Objective: The intent of the study was to rank ADRs according to severity. Methods: We used Internet-based crowdsourcing to rank ADRs according to severity. We assigned 126,512 pairwise comparisons of ADRs to 2589 Amazon Mechanical Turk workers and used these comparisons to rank order 2929 ADRs. Results: There is good correlation (rho=.53) between the mortality rates associated with ADRs and their rank. Our ranking highlights severe drug-ADR predictions, such as cardiovascular ADRs for raloxifene and celecoxib. It also triages genes associated with severe ADRs such as epidermal growth-factor receptor (EGFR), associated with glioblastoma multiforme, and SCN1A, associated with epilepsy. Conclusions: ADR ranking lays a first stepping stone in personalized drug risk assessment. Ranking of ADRs using crowdsourcing may have useful clinical and financial implications, and should be further investigated in the context of health care decision making.
This paper is concerned with a matchmaker for ranking web services using semantics. Several semantic matchmaking methods have been proposed so far. Most of them, however, focus on classifying services into predefined categories rather than providing a ranking result. In this paper, a new semantic matchmaking method is proposed for ranking web services. It uses semantic distance to estimate the degree of matching between a service and a user request. Four types of semantic distance are defined, and four algorithms are implemented to calculate them, respectively. Experimental results show that the proposed semantic matchmaker significantly outperforms the keyword-based baseline method.
show that the problem of deciding whether a non-trivial community exists is NP-complete. Nevertheless, experiments show that a very simple greedy approach can identify members of a community in the Danish part of the web graph with time complexity dependent only on the size of the found community and its immediate surroundings. The members are ranked with a “local” variant of the PageRank algorithm. Results are reported from successful experiments on identifying and ranking Danish Computer Science sites and Danish Chess pages using only a few representatives....
Otto Rank, one of Sigmund Freud's original followers, posited the existence of an "urge to immortality" as man's deepest drive. In his Psychology and the Soul, Rank traced the desire for immortality through four historical eras, with particular emphasis on the creativity of the hero and the artist. By the end of his life, Rank had not only repudiated orthodox psychoanalysis and developed then abandoned a psychology of the will, he had moved "beyond psychology" to a religious view of history and the nature of man.
This book has four chapters. In chapter one we recall the notions of RD codes, MRD codes, circulant rank codes and constant rank codes and describe their properties. In chapter two we introduce a few new classes of codes and study some of their properties. In this chapter we also introduce the notions of fuzzy RD codes and fuzzy RD bicodes. Rank distance m-codes are introduced in chapter three and the property of m-covering radius is analysed. Chapter four indicates some applications of these new classes of codes.
Algebraic statistics brings together ideas from algebraic geometry, commutative algebra, and combinatorics to address problems in statistics and its applications. Computer algebra provides powerful tools for the study of algorithms and software. However, these tools are rarely prepared to address statistical challenges, and therefore new algebraic results often need to be developed. This interplay between algebra and statistics fertilizes both disciplines. Algebraic statistics is a relativ...
Statistical power is important in a meta-analysis study, although few studies have examined the performance of simulated power in meta-analysis. The purpose of this study is to inform researchers about statistical power estimation on two sample mean difference test under different situations: (1) the discrepancy between the analytical power and…
Currently, trauma center quality benchmarking is based on risk adjusted observed-expected (O/E) mortality ratios. However, failure to account for number of patients has been recently shown to produce unreliable mortality estimates, especially for low-volume centers. This study explores the effect of reliability adjustment (RA), a statistical technique developed to eliminate bias introduced by low volume on risk-adjusted trauma center benchmarking. Analysis of the National Trauma Data Bank 2010 was performed. Patients 16 years or older with blunt or penetrating trauma and an Injury Severity Score (ISS) of 9 or greater were included. Based on the statistically accepted standards of the Trauma Quality Improvement Program methodology, risk-adjusted mortality rates were generated for each center and used to rank them accordingly. Hierarchical logistic regression modeling was then performed to adjust these rates for reliability using an empiric Bayes approach. The impact of RA was examined by (1) recalculating interfacility variations in adjusted mortality rates and (2) comparing adjusted hospital mortality quintile rankings before and after RA. A total of 557 facilities (with 278,558 patients) were included. RA significantly reduced the variation in risk-adjusted mortality rates between centers from 14-fold (0.7-9.8%) to only 2-fold (4.4-9.6%) after RA. This reduction in variation was most profound for smaller centers. A total of 68 "best" hospitals and 18 "worst" hospitals based on current risk adjustment methods were reclassified after performing RA. "Reliability adjustment" dramatically reduces variations in risk-adjusted mortality arising from statistical noise, especially for lower volume centers. Moreover, the absence of RA had a profound impact on hospital performance assessment, suggesting that nearly one of every six hospitals in National Trauma Data Bank would have been inappropriately placed among the very best or very worst quintile of rankings. RA should be
Complex networks are formal frameworks capturing the interdependencies between the elements of large systems and databases. This formalism allows the use of network navigation methods to rank the importance that each constituent has for the global organization of the system. A key example is PageRank navigation, which is at the core of the most widely used search engine of the World Wide Web. Inspired by this classical algorithm, we define a quantum navigation method providing a unique ranking of the elements of a network. We analyze the convergence of quantum navigation to the stationary rank of networks and show that quantumness decreases the number of navigation steps before convergence. In addition, we show that quantum navigation resolves degeneracies found in classical ranks. By implementing the quantum algorithm in real networks, we confirm these improvements and show that quantum coherence unveils new hierarchical features of the global organization of complex systems.
J. Appl. Sci. Environ. ... Knowledge Workers' characteristics, in this paper, we seek to identify factors influencing the Productivity of ... cost, time and performance (Afrazeh et al., 2003). ... ranking tools and the theoretical framework used to.
The process of rank aggregation is intimately intertwined with the structure of skew-symmetric matrices. We apply recent advances in the theory and algorithms of matrix completion to skew-symmetric matrices. This combination of ideas produces a new method for ranking a set of items. The essence of our idea is that a rank aggregation describes a partially filled skew-symmetric matrix. We extend an algorithm for matrix completion to handle skew-symmetric data and use that to extract ranks for each item. Our algorithm applies to both pairwise comparison and rating data. Because it is based on matrix completion, it is robust to both noise and incomplete data. We show a formal recovery result for the noiseless case and present a detailed study of the algorithm on synthetic data and Netflix ratings.
A survey of 701 Texas high school students revealed that they ranked the prestige of six careers in the following order: (1) minister, (2) television reporter, (3) accountant, (4) policeman, (5) high school teacher, (6) newspaper reporter. (GT)
Low-rank matrix recovery (LRMR) has become an increasingly popular technique for analyzing data with missing entries, gross corruptions, and outliers. As a significant component of LRMR, the model of low-rank representation (LRR) seeks the lowest-rank representation among all samples, and it is robust for recovering subspace structures. This paper attempts to solve the problem of LRR with partially observed entries. Firstly, we construct a nonconvex minimization problem by taking low-rankness, robustness, and incompleteness into consideration. Then we employ the technique of augmented Lagrange multipliers to solve the proposed program. Finally, experimental results on synthetic and real-world datasets validate the feasibility and effectiveness of the proposed method.
For the evaluation of information flow in bivariate time series, information measures have been employed, such as the transfer entropy (TE), the symbolic transfer entropy (STE), defined similarly to TE but on the ranks of the components of the reconstructed vectors, and the transfer entropy on rank vectors (TERV), similar to STE but forming the ranks for the future samples of the response system with regard to the current reconstructed vector. Here we extend TERV for multivariate time series, and account for the presence of confounding variables, called partial transfer entropy on ranks (PTERV). We investigate the asymptotic properties of PTERV, and also partial STE (PSTE), construct parametric significance tests under approximations with Gaussian and gamma null distributions, and show that the parametric tests cannot achieve the power of the randomization test using time-shifted surrogates. Using simulations on known coupled dynamical systems and applying parametric and randomization significance tests, we s...
Let $E$ and $F$ be Banach spaces, and let $B(E,F)$ denote the set of all bounded linear operators from $E$ into $F$. Let $T_0 \in B(E,F)$ have an outer inverse $T_0^{\#} \in B(F,E)$. A characteristic condition for $S = (I + T_0^{\#}(T - T_0))^{-1} T_0^{\#}$, with $T \in B(E,F)$ and $\|T_0^{\#}(T - T_0)\| < 1$, to be a generalized inverse of $T$ is presented, and hence a rank theorem for operators from $E$ into $F$ is established (which generalizes the rank theorem for matrices to Banach spaces). Consequently, an improved finite rank theorem and a new rank theorem are deduced. These results will be very useful in nonlinear functional analysis.
Is Hitler bigger than Napoleon? Washington bigger than Lincoln? Picasso bigger than Einstein? Quantitative analysts are rapidly finding homes in social and cultural domains, from finance to politics. What about history? In this fascinating book, Steve Skiena and Charles Ward bring quantitative analysis to bear on ranking and comparing historical reputations. They evaluate each person by aggregating the traces of millions of opinions, just as Google ranks webpages. The book includes a technical discussion for readers interested in the details of the methods, but no mathematical or computational background is necessary to understand the rankings or conclusions. Along the way, the authors present the rankings of more than one thousand of history's most significant people in science, politics, entertainment, and all areas of human endeavor. Anyone interested in history or biography can see where their favorite figures place in the grand scheme of things.
Roof supported on arches, (practice) halls. Lead designer Kolde Projekt AS, sub-designer EKK AS, architects Peep Jänes, Toomas Rank. Interior design: SAB Lember & Padar OÜ. Consultant Karl Õiger. 5 illustrations
Tensor methods are among the most prominent tools for the numerical solution of high-dimensional problems where functions of multiple variables have to be approximated. Such high-dimensional approximation problems naturally arise in stochastic analysis and uncertainty quantification. In many practical situations, the approximation of high-dimensional functions is made computationally tractable by using rank-structured approximations. In this talk, we present algorithms for approximation in hierarchical tensor format using statistical methods. Sparse representations in a given tensor format are obtained with adaptive or convex relaxation methods, with parameters selected by cross-validation.
Google's PageRank has created a new synergy to information retrieval for a better ranking of Web pages. It ranks documents depending on the topology of the graphs and the weights of the nodes. PageRank has significantly advanced the field of information retrieval and keeps Google ahead of competitors in the search engine market. It has been deployed in bibliometrics to evaluate research impact, yet few of these studies focus on the important impact of the damping factor (d) for ranking purposes. This paper studies how varied damping factors in the PageRank algorithm can provide additional insight into the ranking of authors in an author co-citation network. Furthermore, we propose weighted PageRank algorithms. We select 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculate the ranks of these 108 authors based on PageRank with damping factor ranging from 0.05 to 0.95. In order to test the relationship between these diffe...
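The damping-factor sweep described in this abstract can be illustrated with a minimal power-iteration sketch. This is plain (unweighted) PageRank on a toy directed graph, not the weighted variants the authors propose; the function name `pagerank`, the toy graph, and the uniform handling of dangling nodes are assumptions made for illustration only.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=200):
    """Plain PageRank by power iteration.

    links: dict mapping each node to a list of nodes it points to.
    d:     damping factor (probability of following a link).
    Dangling nodes (no out-links) spread their rank uniformly; this is
    one common convention, assumed here rather than taken from the paper.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Teleportation term plus the redistributed dangling mass.
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for v in nodes:
            outs = links.get(v, [])
            for t in outs:
                new[t] += d * rank[v] / len(outs)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical toy citation graph: three authors cite "hub", who cites "a".
toy = {"a": ["hub"], "b": ["hub"], "c": ["hub", "a"], "hub": ["a"]}
for d in (0.05, 0.50, 0.95):
    scores = pagerank(toy, d=d)
    ordering = sorted(scores, key=scores.get, reverse=True)
    print(d, ordering)
```

A small damping factor pulls all scores toward the uniform value 1/n, while a large one lets the link structure dominate, which is why the ranking can change as d varies.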
The aim of this paper is to ascertain the correlation between selected cognitive abilities, age and performance of judokas according to ranking. The study group consisted of judokas aged 18 ± 2.4 years. The Stroop Color-Word Test - Victoria Version (VST) was the instrument used to determine the level of cognitive abilities. The data obtained were analysed with the Pearson correlation (r) test. The results of the study show an indirect correlation (p < 0.01) between age and all three categories of the Stroop test: the higher the age, the lower the time (better performance) of the probands in the Stroop test. There was no statistically significant correlation between performance in the categories of the Stroop test and rankings. The outcomes show that the level of the selected cognitive abilities depends on age, but does not affect the ranking of the judokas.
This paper presents a panoramic macroscopic outlook of rank distributions. We establish a general framework for the analysis of rank distributions, which classifies them into five macroscopic "socioeconomic" states: monarchy, oligarchy-feudalism, criticality, socialism-capitalism, and communism. Oligarchy-feudalism is shown to be characterized by discrete macroscopic rank distributions, and socialism-capitalism is shown to be characterized by continuous macroscopic size distributions. Criticality is a transition state between oligarchy-feudalism and socialism-capitalism, which can manifest allometric scaling with multifractal spectra. Monarchy and communism are extreme forms of oligarchy-feudalism and socialism-capitalism, respectively, in which the intrinsic randomness vanishes. The general framework is applied to three different models of rank distributions—top-down, bottom-up, and global—and unveils each model's macroscopic universality and versatility. The global model yields a macroscopic classification of the generalized Zipf law, an omnipresent form of rank distributions observed across the sciences. An amalgamation of the three models establishes a universal rank-distribution explanation for the macroscopic emergence of a prevalent class of continuous size distributions, ones governed by unimodal densities with both Pareto and inverse-Pareto power-law tails.
© 2012 Springer Science+Business Media, LLC. All rights reserved. Article Outline: Glossary; Definition of the Subject and Introduction; The Bayesian Statistical Paradigm; Three Examples; Comparison with the Frequentist Statistical Paradigm; Future Directions; Bibliography
We developed the Biomedical Informatics Researchers ranking website (rank.informatics-review.com) to overcome many of the limitations of previous scientific productivity ranking strategies. The website is composed of four key components that work together to create an automatically updating ranking website: (1) list of biomedical informatics researchers, (2) Google Scholar scraper, (3) display page, and (4) updater. The site has been useful to other groups in evaluating researchers, such as tenure and promotions committees in interpreting the various citation statistics reported by candidates. Creation of the Biomedical Informatics Researchers ranking website highlights the vast differences in scholarly productivity among members of the biomedical informatics research community.
We take a system point of view toward constructing any power or ranking hierarchy on a society of human or animal players. The most common hierarchy is the linear ranking, which is habitually used in nearly all real-world problems. A stronger version of linear ranking via increasing and unvarying winning potentials, known as the Bradley-Terry model, is particularly popular. Only recently have non-linear ranking hierarchies been discussed and developed, through recognition of dominance information content beyond direct dyadic wins and losses. We take this development further by rigorously arguing for the necessity of accommodating the system's global pattern information content, and then introducing a systemic test of the Bradley-Terry model. Our test statistic with an ensemble-based empirical distribution compares favorably with the deviance test equipped with a chi-squared asymptotic approximation. Several simulated and real data sets are analyzed throughout our development.
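For readers unfamiliar with the Bradley-Terry model mentioned in this abstract, the sketch below fits its "winning potentials" from win counts using the classical minorization-maximization (Zermelo) iteration. It illustrates the model only, not the authors' systemic test; the function name `fit_bradley_terry`, the input layout, and the sample data are hypothetical, and the iteration assumes every player has at least one win and one loss within a connected comparison graph.

```python
def fit_bradley_terry(wins, n_iter=500, tol=1e-12):
    """Fit Bradley-Terry strengths p_i, where P(i beats j) = p_i / (p_i + p_j).

    wins: dict mapping (winner, loser) to the number of such outcomes.
    Returns strengths normalised to sum to 1.
    """
    players = sorted({p for pair in wins for p in pair})
    strength = {p: 1.0 / len(players) for p in players}
    total_wins = {p: 0 for p in players}
    games = {}  # unordered pair -> number of games played between them
    for (i, j), w in wins.items():
        total_wins[i] += w
        key = frozenset((i, j))
        games[key] = games.get(key, 0) + w
    for _ in range(n_iter):
        new = {}
        for i in players:
            denom = 0.0
            for key, n in games.items():
                if i in key:
                    (j,) = key - {i}
                    denom += n / (strength[i] + strength[j])
            # MM update: wins of i divided by the expected-game weight.
            new[i] = total_wins[i] / denom if denom > 0 else strength[i]
        scale = sum(new.values())
        new = {p: v / scale for p, v in new.items()}
        if max(abs(new[p] - strength[p]) for p in players) < tol:
            return new
        strength = new
    return strength


# Hypothetical dominance data among three animals.
observed = {("x", "y"): 5, ("y", "x"): 2, ("y", "z"): 4,
            ("z", "y"): 3, ("x", "z"): 6, ("z", "x"): 1}
fitted = fit_bradley_terry(observed)
print(sorted(fitted, key=fitted.get, reverse=True))
```

Sorting players by fitted strength yields exactly the kind of linear ranking the abstract describes as the common default.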
We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. Hotelling's classical $T^2$ test does not work for this "large $p$, small $n$" situation. The proposed test does not require explicit conditions in the relationship between the data dimension and sample size. This offers much flexibility in analyzing high-dimensional data. An application of the proposed test is in testing significance for sets of genes which we demonstrate in an empirical study on a leukemia data set.
The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their 'public relations' for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford's law, and 1/f noise.
Ecological ranking is a prerequisite to many kinds of environmental decisions. It requires a set of 'objects' (e.g., competing sites for species reintroduction, or competing alternatives of environmental management) to be evaluated on the basis of multiple weighted criteria, and then ranked from the best to the worst, or vice versa. The resulting ranking is then used to choose the course of action (e.g., the optimal sites where a species can be reintroduced, or the optimal management scenario for a protected area). In this work, a new tool called FuzRnk is proposed as a modification of the classic fuzzy algorithm. FuzRnk, which is freely available upon request from the author, allows for a fuzzy ranking of GIS objects (e.g., landscape patches or zones within protected areas). With respect to the classic fuzzy algorithm, FuzRnk introduces two modifications: (a) criteria can be weighted on the basis of their importance; (b) not only the best performances, but also the worst ones, are considered in the ranking procedure.
Ranking players or teams in sports is of practical interest. From the viewpoint of networks, a ranking system is equivalent to a centrality measure for sports networks, whereby a directed link represents the result of a single game. Previously proposed network-based ranking systems are derived from static networks, i.e., aggregation of the results of games over time. However, the score (i.e., strength) of a player, for example, depends on time. Defeating a renowned player at peak performance is intuitively more rewarding than defeating the same player in other periods. To account for this factor, we propose a dynamic variant of such a network-based ranking system and apply it to professional men's tennis data. Our ranking system, also interpreted as a centrality measure for directed temporal networks, has two parameters. One parameter represents the exponential decay rate of the past score, and the other parameter controls the effect of indirect wins on the score. We derive a set of linear online update equ...
Welding of pyroclastic deposits involves flattening of glassy pyroclasts under a compactional load at temperatures above the glass transition temperature. Progressive welding is recorded by changes in the petrographic (e.g., fabric) and physical (e.g., density) properties of the deposits. Mapping the intensity of welding can be integral to studies of pyroclastic deposits, but making systematic comparisons between deposits can be problematical. Here we develop a scheme for ranking welding intensity in pyroclastic deposits on the basis of petrographic textural observations (e.g., oblateness of pumice lapilli and micro-fabric orientation) and measurements of physical properties, including density, porosity, point load strength and uniaxial compressive strength. Our dataset comprises measurements on 100 samples collected from a single cooling unit of the Bandelier Tuff and parallel measurements on 8 samples of more densely welded deposits. The proposed classification comprises six ranks of welding intensity ranging from unconsolidated (Rank I) to obsidian-like vitrophyre (Rank VI) and should allow for reproducible mapping of subtle variations in welding intensity between different deposits. The application of the ranking scheme is demonstrated by using published physical property data on welded pyroclastic deposits to map the total accumulated strain and to reconstruct their pre-welding thicknesses.
Recent evidence suggests that perceptions of social class rank influence a variety of social cognitive tendencies, from patterns of causal attribution to moral judgment. In the present studies we tested the hypotheses that upper-class rank individuals would be more likely to endorse essentialist lay theories of social class categories (i.e., that social class is founded in genetically based, biological differences) than would lower-class rank individuals and that these beliefs would decrease support for restorative justice--which seeks to rehabilitate offenders, rather than punish unlawful action. Across studies, higher social class rank was associated with increased essentialism of social class categories (Studies 1, 2, and 4) and decreased support for restorative justice (Study 4). Moreover, manipulated essentialist beliefs decreased preferences for restorative justice (Study 3), and the association between social class rank and class-based essentialist theories was explained by the tendency to endorse beliefs in a just world (Study 2). Implications for how class-based essentialist beliefs potentially constrain social opportunity and mobility are discussed.
The t-test commonly used for comparing two samples is based on the assumption that the samples are random and belong to the same normal population. These assumptions may or may not be valid for different types of experimental data. In cases where these assumptions do not hold, it would be preferable to use tests which are independent of the nature of the distribution of the parent population. A number of such tests, some developed in the Defence Science Laboratory, are given in this paper. The tests depend on a sequence of A's and B's obtained by pooling together the two samples {Xm} and {Yn}, arranging them in ascending or descending order, and treating the observations belonging to {Xm} and {Yn} as A's and B's respectively. For this sequence the number of AB's, or of AB's and BA's, is noted for the following cases: (1) between any two observations of the sequence separated by (k-1) observations or fewer; (2) between any two observations in blocks of (k+1) consecutive observations moving from one end to the other. It has been found that the standardized deviates of these statistics serve as more reliable tests than any of the other existing tests. Further work is in progress to confirm these findings.
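The pooling step and the counting rule of case (1) can be sketched as follows. This is one reading of the abstract, not the paper's own procedure: two positions are treated as "separated by (k-1) observations or fewer" when their indices differ by at most k, samples are assumed to contain no ties, and the function names `ab_sequence` and `count_ab_within` are hypothetical. The standardization of the resulting count is not given in the abstract and is therefore omitted.

```python
def ab_sequence(x, y):
    """Pool two samples, sort ascending, and label X-values 'A', Y-values 'B'."""
    pooled = [(value, "A") for value in x] + [(value, "B") for value in y]
    pooled.sort(key=lambda item: item[0])  # assumes no ties between samples
    return "".join(label for _, label in pooled)


def count_ab_within(seq, k, both_orders=False):
    """Count ordered pairs (i < j, j - i <= k) that read 'AB'.

    With both_orders=True, pairs reading 'BA' are counted as well.
    """
    count = 0
    for i in range(len(seq)):
        last = min(i + k, len(seq) - 1)
        for j in range(i + 1, last + 1):
            pair = seq[i] + seq[j]
            if pair == "AB" or (both_orders and pair == "BA"):
                count += 1
    return count


# Hypothetical measurements from two small samples.
x_sample = [1.2, 3.4, 5.1, 7.7]
y_sample = [2.0, 2.9, 6.3, 8.8]
sequence = ab_sequence(x_sample, y_sample)
print(sequence, count_ab_within(sequence, k=2), count_ab_within(sequence, k=2, both_orders=True))
```

When the two samples come from the same population the A's and B's are well mixed, so unusually small or large counts suggest a difference between the populations.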
Ranking spreaders by decomposing complex networks
Ranking the nodes' ability of spreading in networks is crucial for designing efficient strategies to hinder spreading in the case of diseases or accelerate spreading in the case of information dissemination. In the well-known k-shell method, nodes are ranked only according to the links between the remaining nodes (residual links) while the links connecting to the removed nodes (exhausted links) are entirely ignored. In this Letter, we propose a mixed degree decomposition (MDD) procedure in which both the residual degree and the exhausted degree are considered. By simulating the epidemic spreading process on real networks, we show that the MDD method can outperform the k-shell and degree methods in ranking spreaders.
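A minimal sketch of a decomposition that uses both residual and exhausted links is given below. The abstract does not state how the two degrees are combined; the weighted sum k_r + lam * k_e and the parameter name `lam` are assumptions made here for illustration, as are the function name and the toy graph. With lam = 0 the procedure reduces to ordinary k-shell peeling, and with lam = 1 it reproduces plain degree.

```python
def mixed_degree_decomposition(adj, lam=0.7):
    """Peel an undirected graph by mixed degree k_m = k_r + lam * k_e.

    adj: dict mapping each node to a set of neighbours.
    k_r: number of links to nodes still in the graph (residual links).
    k_e: number of links to already removed nodes (exhausted links).
    Returns a dict mapping each node to the shell level at which it was removed.
    """
    remaining = set(adj)
    removed = set()
    shell = {}

    def mixed_degree(v):
        k_r = sum(1 for u in adj[v] if u in remaining)
        k_e = sum(1 for u in adj[v] if u in removed)
        return k_r + lam * k_e

    while remaining:
        level = min(mixed_degree(v) for v in remaining)
        while True:
            # Remove every node whose mixed degree has fallen to the level.
            peel = [v for v in remaining if mixed_degree(v) <= level]
            if not peel:
                break
            for v in peel:
                shell[v] = level
            remaining.difference_update(peel)
            removed.update(peel)
    return shell


# Toy graph: a triangle (0, 1, 2) with a pendant node 3 attached to node 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(mixed_degree_decomposition(graph, lam=0.0))
print(mixed_degree_decomposition(graph, lam=0.7))
```

In the toy graph the pendant link lifts node 0 above nodes 1 and 2 once lam is positive, a distinction that pure k-shell peeling cannot make.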
Matrix low-rank approximation is intimately related to data modelling; a problem that arises frequently in many different fields. Low Rank Approximation: Algorithms, Implementation, Applications is a comprehensive exposition of the theory, algorithms, and applications of structured low-rank approximation. Local optimization methods and effective suboptimal convex relaxations for Toeplitz, Hankel, and Sylvester structured problems are presented. A major part of the text is devoted to application of the theory. Applications described include: system and control theory: approximate realization, model reduction, output error, and errors-in-variables identification; signal processing: harmonic retrieval, sum-of-damped exponentials, finite impulse response modeling, and array processing; machine learning: multidimensional scaling and recommender system; computer vision: algebraic curve fitting and fundamental matrix estimation; bioinformatics for microarray data analysis; chemometrics for multivariate calibration; ...
Probabilities and odds, derived from vectors of ranks, are here compared as measures of efficiency of decision-making units (DMUs). These measures are computed with the goal of providing preliminary information before starting a Data Envelopment Analysis (DEA) or the application of any other methodology for evaluation or composition of preferences. Preferences, quality and productivity evaluations are usually measured with errors or subject to the influence of other random disturbances. Reducing evaluations to ranks and treating the ranks as estimates of location parameters of random variables, we are able to compute the probability of each DMU being classified as the best according to the consumption of each input and the production of each output. Employing the probabilities of being the best as efficiency measures, we stretch distances between the most efficient units. We combine these partial probabilities in a global efficiency score determined in terms of proximity to the efficiency frontier.
We present a method to find the most influential rock guitarist by applying the Google PageRank algorithm to information extracted from Wikipedia articles. The influence of a guitarist was estimated from the number of guitarists citing him/her as an influence and the influence of the latter. We extracted this who-influenced-whom data from the Wikipedia biographies and converted it to a directed graph where a node represents a guitarist and an edge between two nodes indicates the influence of one guitarist on the other. Next we used the Google PageRank algorithm to rank the guitarists. The results provide a quantitative foundation for the idea that most contemporary rock guitarists are influenced by early blues guitarists. Although no direct comparison exists, the list was validated against a number of other best-of lists available online and found to be mostly compatible.
One of the challenges of protein structure prediction is to identify long-range interactions between amino acids. To reliably predict such interactions, we enumerate, score and rank all beta-topologies (partitions of beta-strands into sheets, orderings of strands within sheets and orientations of paired strands) of a given protein. We show that the beta-topology corresponding to the native structure is, with high probability, among the top-ranked. Since full enumeration is very time-consuming, we also suggest a method to deal with proteins with many beta-strands. The results reported in this paper are highly relevant for ab initio protein structure prediction methods based on decoy generation. The top-ranked beta-topologies can be used to find initial conformations from which conformational searches can be started. They can also be used to filter decoys by removing those with poorly...
For years, there has been growing interest in cost reduction for products and services. Privatization is considered one of the most important techniques for increasing the relative efficiency of publicly held firms. In this paper, we present an empirical investigation to rank important barriers to privatization of the television (TV) media industry in Iran. The study designs and distributes a questionnaire using a sample of 234 out of 600 graduate students who were enrolled in media communication studies. The survey considers social, cultural and economic factors, as well as rules and regulations, influencing privatization of the TV media industry. The survey uses the ranking method presented by Cook and Kress (1990) [Cook, W. D., & Kress, M. (1990). A data envelopment model for aggregating preference rankings. Management Science, 36(11), 1302-1310.]. The results of the investigation indicate that rules and regulations are the most important barriers to privatization of Iranian TV, followed by cultural, social and economic factors.
Thermal properties of different rank coals
Thermal properties of various coal samples which have different rank and petrography were investigated under both inert and oxidizing conditions up to 900 °C in a thermal analysis system. Peat, anthracite, and bituminous coal samples from different countries and various lignites from Turkey such as Askale, Soma, and Elbistan were used. DTA (Differential Thermal Analysis) and TGA (Thermogravimetric Analysis) techniques were applied. DTG (Derivative Thermogravimetric) curves were derived and interpreted considering the physical and chemical properties, and the rank of coals. Pyrolytic chars obtained from the inert atmosphere experiments were examined applying SEM (Scanning Electron Microscopy) and XRD (X-ray Diffractometry) techniques. It was found that the thermal reactivity and the apparent thermal properties of different rank coals differ considerably under both conditions. 6 refs., 4 figs., 3 tabs.
Abstract: This paper introduces rank-based tests for the cointegrating rank in an Error Correction Model with i.i.d. elliptical innovations. The tests are asymptotically distribution-free, and their validity does not depend on the actual distribution of the innovations. This result holds despite the
Colleges and universities are "ranksteering"--driving under the influence of popular college rankings systems like "U.S. News and World Report's" Best Colleges. This article examines the criticisms of college rankings and describes how a group of education leaders is honing a plan to end the tyranny of the ratings game and better help students and…
Sparse coding, which represents a data point as a sparse reconstruction code with regard to a dictionary, has been a popular data representation method. Meanwhile, in database retrieval problems, learning the ranking scores from data points plays an important role. Up to now, these two problems have always been considered separately, assuming that data coding and ranking are two independent and irrelevant problems. However, is there any internal relationship between sparse coding and ranking score learning? If yes, how to explore and make use of this internal relationship? In this paper, we try to answer these questions by developing the first joint sparse coding and ranking score learning algorithm. To explore the local distribution in the sparse code space, and also to bridge coding and ranking problems, we assume that in the neighborhood of each data point, the ranking scores can be approximated from the corresponding sparse codes by a local linear function. By considering the local approximation error of ranking scores, the reconstruction error and sparsity of sparse coding, and the query information provided by the user, we construct a unified objective function for learning of sparse codes, the dictionary and ranking scores. We further develop an iterative algorithm to solve this optimization problem.
This volume provides a compact presentation of modern statistical physics at an advanced level. Beginning with questions on the foundations of statistical mechanics all important aspects of statistical physics are included, such as applications to ideal gases, the theory of quantum liquids and superconductivity and the modern theory of critical phenomena. Beyond that attention is given to new approaches, such as quantum field theory methods and non-equilibrium problems.
Statistical Methods provides a discussion of the principles of the organization and technique of research, with emphasis on its application to the problems in social statistics. This book discusses branch statistics, which aims to develop practical ways of collecting and processing numerical data and to adapt general statistical methods to the objectives in a given field. Organized into five parts encompassing 22 chapters, this book begins with an overview of how to organize the collection of such information on individual units, primarily as accomplished by government agencies. This text then
This book discusses statistical methods that are useful for treating problems in modern optics, and the application of these methods to solving a variety of such problems. This book covers a variety of statistical problems in optics, including both theory and applications. The text covers the necessary background in statistics, statistical properties of light waves of various types, the theory of partial coherence and its applications, imaging with partially coherent light, atmospheric degradations of images, and noise limitations in the detection of light. New topics have been introduced i
2010-01-01
A new edition of the trusted guide on commonly used statistical distributions Fully updated to reflect the latest developments on the topic, Statistical Distributions, Fourth Edition continues to serve as an authoritative guide on the application of statistical methods to research across various disciplines. The book provides a concise presentation of popular statistical distributions along with the necessary knowledge for their successful use in data modeling and analysis. Following a basic introduction, forty popular distributions are outlined in individual chapters that are complete with re
The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their ‘public relations’ for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford’s law, and 1/f noise. - Highlights: • Harmonic statistics are described and reviewed in detail. • Connections to various statistical laws are established. • Connections to perturbation, renormalization and dynamics are established.
The new Excellence Indicator in the World Report of the SCImago Institutions Rankings 2011
The new excellence indicator in the World Report of the SCImago Institutions Rankings (SIR) makes it possible to test differences in the ranking in terms of statistical significance. For example, at the 17th position of these rankings, UCLA has an output of 37,994 papers with an excellence indicator of 28.9. Stanford University follows at the 19th position with 37,885 papers and 29.1 excellence, and z = -0.607. The difference between these two institutions thus is not statistically significant. We provide a calculator at http://www.leydesdorff.net/scimago11/scimago11.xls in which one can fill out this test for any two institutions and also test, for each institution, whether its score is significantly above or below expectation (assuming that 10% of the papers are for stochastic reasons in the top-10% set).
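The z value quoted in this abstract can be reproduced with a pooled two-proportion z-test, treating the excellence indicator as the share of an institution's papers in the top-10% set. The sketch below is only an illustration under that assumption: the function name `two_proportion_z` is hypothetical, and the formula used in the authors' spreadsheet may differ.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic; p1 and p2 are fractions in [0, 1]."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

# Figures quoted in the abstract: UCLA (28.9%, 37,994 papers)
# versus Stanford (29.1%, 37,885 papers).
z = two_proportion_z(0.289, 37994, 0.291, 37885)
print(round(z, 3))  # prints -0.607
```

Because |z| is well below 1.96, the difference is not significant at the conventional 5% level, in line with the abstract's conclusion.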
Learning to rank for information retrieval
Liu, Tie-Yan
2011-01-01
Due to the fast growth of the Web and the difficulties in finding desired information, efficient and effective information retrieval systems have become more important than ever, and the search engine has become an essential tool for many people. The ranker, a central component in every search engine, is responsible for the matching between processed queries and indexed documents. Because of its central role, great attention has been paid to the research and development of ranking technologies. In addition, ranking is also pivotal for many other information retrieval applications, such as coll
Improved Relevance Ranking in WebGather
LEI Ming; WANG Jianyong; CHEN Baojue; LI Xiaoming
The amount of information on the web is growing rapidly, and search engines that rely on keyword matching usually return too many low quality matches. To improve search results, a challenging task for search engines is how to effectively calculate a relevance ranking for each web page. This paper discusses in what order a search engine should return the URLs it has produced in response to a user's query, so as to show more relevant pages first. Emphasis is given to the ranking functions adopted by WebGather that take link structure and user popularity factors into account. Experimental results are also presented to evaluate the proposed strategy.
Reduced Multiplicative Tolerance Ranking and Applications
Sebastian Sitarz
In this paper a reduced multiplicative tolerance, a measure of sensitivity analysis in multi-objective linear programming (MOLP), is presented. By using this new measure a method for ranking the set of efficient extreme solutions is proposed. The idea is to rank these solutions by values of the reduced tolerance. This approach can be applied to many MOLP problems, where sensitivity analysis is important for a decision maker. In the paper, applications of the presented methodology are shown in the market model and the transportation problem.
Two-dimensional ranking of Wikipedia articles
Zhirov, A O; Shepelyansky, D L
The Library of Babel, described by Jorge Luis Borges, stores an enormous amount of information. The Library exists ab aeterno. Wikipedia, a free online encyclopaedia, becomes a modern analogue of such a Library. Information retrieval and ranking of Wikipedia articles become the challenge of modern society. We analyze the properties of two-dimensional ranking of all Wikipedia English articles and show that it gives their reliable classification with rich and nontrivial features. Detailed studies are done for countries, universities, personalities, physicists, chess players, Dow-Jones companies and other categories.
Ranking mutual funds using Sortino method
Khosro Faghani Makrani
One of the primary concerns in most business activities is to determine an efficient method for ranking mutual funds. This paper performs an empirical investigation to rank 42 mutual funds listed on the Tehran Stock Exchange using the Sortino method over the period 2011-2012. The results of the survey have been compared with the market return, and they confirm that there were some positive and meaningful relationships between Sortino return and market return. In addition, there were some positive and meaningful relationships between the two Sortino methods.
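For orientation, the Sortino ratio in its common textbook form divides the mean return in excess of a target by the downside deviation, that is, the root mean square of the shortfalls below the target. The sketch below ranks two invented funds this way. The fund names and returns are hypothetical, and the abstract does not specify the two Sortino variants the paper compares, so this is not the authors' procedure.

```python
import math

def sortino_ratio(returns, target=0.0):
    """Mean excess return over the target divided by downside deviation."""
    excess = [r - target for r in returns]
    # Only shortfalls below the target contribute to the downside deviation.
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0.0:
        raise ValueError("no returns below target; ratio undefined")
    return (sum(excess) / len(excess)) / downside

# Hypothetical periodic returns for two made-up funds.
funds = {
    "fund_a": [0.02, -0.01, 0.03, -0.02],
    "fund_b": [0.01, 0.00, 0.01, -0.01],
}
ranking = sorted(funds, key=lambda name: sortino_ratio(funds[name]), reverse=True)
print(ranking)
```

Here fund_b ranks first even though its mean return is lower, because its shortfalls below the target are smaller.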
Estimation of Low-Rank Covariance Function
Koltchinskii, Vladimir; Lounici, Karim; Tsybakov, Alexander B.
We consider the problem of estimating a low rank covariance function $K(t,u)$ of a Gaussian process $S(t), t \in [0,1]$ based on $n$ i.i.d. copies of $S$ observed in a white noise. We suggest a new estimation procedure adapting simultaneously to the low rank structure and the smoothness of the covariance function. The new procedure is based on nuclear norm penalization and exhibits superior performances as compared to the sample covariance function by a polynomial factor in the sample size $n$...
PageRank for low frequency earthquake detection
Aguiar, A. C.; Beroza, G. C.
We have analyzed Hi-Net seismic waveform data during the April 2006 tremor episode in the Nankai Trough in SW Japan using the autocorrelation approach of Brown et al. (2008), which detects low frequency earthquakes (LFEs) based on pair-wise waveform matching. We have generalized this to exploit the fact that waveforms may repeat multiple times, on more than just a pair-wise basis. We are working towards developing a sound statistical basis for event detection, but that is complicated by two factors. First, the statistical behavior of the autocorrelations varies between stations. Analyzing one station at a time assures that the detection threshold will only depend on the station being analyzed. Second, the positive detections do not satisfy "closure." That is, if window A correlates with window B, and window B correlates with window C, then window A and window C do not necessarily correlate with one another. We want to evaluate whether or not a linked set of windows are correlated due to chance. To do this, we map our problem onto one that has previously been solved for web search, and apply Google's PageRank algorithm. PageRank is the probability of a 'random surfer' visiting a particular web page; it assigns a ranking to a webpage based on the number of links associated with that page. For windows of seismic data instead of webpages, the windows with high probabilities suggest likely LFE signals. Once identified, we stack the matched windows to improve the SNR and use these stacks as template signals to find other LFEs within continuous data. We compare the results among stations and declare a detection if they are found in a statistically significant number of stations, based on multinomial statistics. We compare our detections using the single-station method to detections found by Shelly et al. (2007) for the April 2006 tremor sequence in Shikoku, Japan. We find strong similarity between the results, as well as many new detections that were not found using
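The random-surfer idea described in this abstract can be illustrated with a plain power iteration. The sketch below is a generic PageRank on a small undirected graph in which nodes stand for waveform windows and edges for matched pairs. The toy pairs, the damping value 0.85, and the function name `pagerank` are assumptions made for illustration; the abstract does not say how the authors build or weight their graph.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on a dict {node: [nodes it links to]}."""
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            targets = links.get(u, [])
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank

# Hypothetical matched window pairs; each match is treated as a two-way link.
pairs = [("A", "D"), ("B", "D"), ("C", "D"), ("A", "B")]
links = {}
for a, b in pairs:
    links.setdefault(a, []).append(b)
    links.setdefault(b, []).append(a)

scores = pagerank(links)
best = max(scores, key=scores.get)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

In this toy graph window D matches three other windows and receives the highest score, which is the sense in which a high PageRank would flag a window as a likely repeating signal.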
Suitable for graduate students and researchers in applied probability and statistics, as well as for scientists in biology, computer science, pharmaceutical science and medicine, this title brings together a collection of chapters illustrating the depth and diversity of theory, methods and applications in the area of scan statistics.
Petocz, Peter; Sowey, Eric
In this article, the authors focus on hypothesis testing--that peculiarly statistical way of deciding things. Statistical methods for testing hypotheses were developed in the 1920s and 1930s by some of the most famous statisticians, in particular Ronald Fisher, Jerzy Neyman and Egon Pearson, who laid the foundations of almost all modern methods of…
Accelerators and detectors are expensive, both in terms of money and human effort. It is thus important to invest effort in performing a good statistical analysis of the data, in order to extract the best information from it. This series of five lectures deals with practical aspects of statistical issues that arise in typical High Energy Physics analyses.
The ranks of Maiorana-McFarland bent functions
In this paper, the ranks of a special family of Maiorana-McFarland bent functions are discussed. The upper and lower bounds of the ranks are given and those bent functions whose ranks achieve these bounds are determined. As a consequence, the inequivalence of some bent functions are derived. Furthermore, the ranks of the functions of this family are calculated when t 6.
Nominal versus Attained Weights in Universitas 21 Ranking
Soh, Kaycheng
Universitas 21 Ranking of National Higher Education Systems (U21 Ranking) is one of the three new ranking systems appearing in 2012. In contrast with the other systems, U21 Ranking uses countries as the unit of analysis. It has several features which lend it greater trustworthiness, but it also shares some methodological issues with the other…
Some upper and lower bounds on PSD-rank
T. J. Lee (Troy); Z. Wei (Zhaohui); R. M. de Wolf (Ronald)
2014-01-01
Positive semidefinite rank (PSD-rank) is a relatively new quantity with applications to combinatorial optimization and communication complexity. We first study several basic properties of PSD-rank, and then develop new techniques for showing lower bounds on the PSD-rank. All of these
Some upper and lower bounds on PSD-rank
Lee, T.; Wei, Z.; de Wolf, R.
Positive semidefinite rank (PSD-rank) is a relatively new complexity measure on matrices, with applications to combinatorial optimization and communication complexity. We first study several basic properties of PSD-rank, and then develop new techniques for showing lower bounds on the PSD-rank. All
The effect of new links on Google PageRank
Avrachenkov, Konstatin; Litvak, Nelly
PageRank is one of the principal criteria according to which Google ranks Web pages. PageRank can be interpreted as a frequency of visiting a Web page by a random surfer and thus it reflects the popularity of a Web page. We study the effect of newly created links on Google PageRank. We discuss to wh
5 CFR 451.302 - Ranks for senior career employees.
... 5 Administrative Personnel 1 2010-01-01 2010-01-01 false Ranks for senior career employees. 451... AWARDS Presidential Rank Awards § 451.302 Ranks for senior career employees. (a) The circumstances under... Professional to a senior career employee are set forth in 5 U.S.C. 4507a. (b) To be eligible for a rank...
The Seven Deadly Sins of World University Ranking: A Summary from Several Papers
Soh, Kaycheng
2017-01-01
World university rankings use the weight-and-sum approach to process data. Although this seems to pass the common sense test, it has statistical problems. In recent years, seven such problems have been uncovered: spurious precision, weight discrepancies, assumed mutual compensation, indicator redundancy, inter-system discrepancy, negligence of…
How different from random are docking predictions when ranked by scoring functions?
Feliu, Elisenda; Oliva, Baldomero
Docking algorithms predict the structure of protein-protein interactions. They sample the orientation of two unbound proteins to produce various predictions about their interactions, followed by a scoring step to rank the predictions. We present a statistical assessment of scoring functions used...
On bundles of rank 3 computing Clifford indices
Lange, H
Let $C$ be a smooth irreducible projective algebraic curve defined over the complex numbers. The notion of the Clifford index of $C$ was extended a few years ago to semistable bundles of any rank. Recent work has been focussed mainly on the rank-2 Clifford index, although interesting results have also been obtained for the case of rank 3. In this paper we extend this work, obtaining improved lower bounds for the rank-3 Clifford index. This allows the first computations of the rank-3 index in non-trivial cases and examples for which the rank-3 index is greater than the rank-2 index.
City Life: Rankings (Livability) versus Perceptions (Satisfaction)
Okulicz-Kozaryn, Adam
I investigate the relationship between the popular Mercer city ranking (livability) and survey data (satisfactions). Livability aims to capture "objective" quality of life such as infrastructure. Survey items capture "subjective" quality of life such as satisfaction with city. The relationship between objective measures of quality of life and…
Ranking Very Many Typed Entities on Wikipedia
Zaragoza, Hugo; Rode, Henning; Mika, Peter; Atserias, Jordi; Ciaramita, Massimiliano; Attardi, Guiseppe
We discuss the problem of ranking very many entities of different types. In particular we deal with a heterogeneous set of types, some being very generic and some very specific. We discuss two approaches for this problem: i) exploiting the entity containment graph and ii) using a Web search engine t
Subject Gateway Sites and Search Engine Ranking.
Thelwall, Mike
Discusses subject gateway sites and commercial search engines for the Web and presents an explanation of Google's PageRank algorithm. The principal question addressed is the conditions under which a gateway site will increase the likelihood that a target page is found in search engines. (LRW)
Primate Innovation: Sex, Age and Social Rank
Reader, S.M.; Laland, K.N.
Analysis of an exhaustive survey of primate behavior collated from the published literature revealed significant variation in rates of innovation among individuals of different sex, age and social rank. We searched approximately 1,000 articles in four primatology journals, together with other releva
Kinesiology Faculty Citations across Academic Rank
Knudson, Duane
Citations to research reports are used as a measure for the influence of a scholar's research line when seeking promotion, grants, and awards. The current study documented the distributions of citations to kinesiology scholars of various academic ranks. Google Scholar Citations was searched for user profiles using five research interest areas…
Ranking related entities: components and analyses
Bron, M.; Balog, K.; de Rijke, M.
Related entity finding is the task of returning a ranked list of homepages of relevant entities of a specified type that need to engage in a given relationship with a given source entity. We propose a framework for addressing this task and perform a detailed analysis of four core components; co-occu
An algorithm for ranking assignments using reoptimization
Pedersen, Christian Roed; Nielsen, Lars Relund; Andersen, Kim Allan
We consider the problem of ranking assignments according to cost in the classical linear assignment problem. An algorithm partitioning the set of possible assignments, as suggested by Murty, is presented where, for each partition, the optimal assignment is calculated using a new reoptimization technique. Computational results for the new algorithm are presented...
Texture Repairing by Unified Low Rank Optimization
Xiao Liang; Xiang Ren; Zhengdong Zhang; Yi Ma
In this paper, we show how to harness both low-rank and sparse structures in regular or near-regular textures for image completion. Our method is based on a unified formulation for both random and contiguous corruption. In addition to the low rank property of texture, the algorithm also uses the sparse assumption of the natural image: because the natural image is piecewise smooth, it is sparse in certain transformed domain (such as Fourier or wavelet transform). We combine low-rank and sparsity properties of the texture image together in the proposed algorithm. Our algorithm based on convex optimization can automatically and correctly repair the global structure of a corrupted texture, even without precise information about the regions to be completed. This algorithm integrates texture rectification and repairing into one optimization problem. Through extensive simulations, we show our method can complete and repair textures corrupted by errors with both random and contiguous supports better than existing low-rank matrix recovery methods. Our method demonstrates significant advantage over local patch based texture synthesis techniques in dealing with large corruption, non-uniform texture, and large perspective deformation.
Block Models and Personalized PageRank
Kloumann, Isabel; Kleinberg, Jon
Methods for ranking the importance of nodes in a network have a rich history in machine learning and across domains that analyze structured data. Recent work has evaluated these methods through the seed set expansion problem: given a subset $S$ of nodes from a community of interest in an underlying graph, can we reliably identify the rest of the community? We start from the observation that the most widely used techniques for this problem, personalized PageRank and heat kernel methods, operate in the space of landing probabilities of a random walk rooted at the seed set, ranking nodes according to weighted sums of landing probabilities of different length walks. Both schemes, however, lack an a priori relationship to the seed set objective. In this work we develop a principled framework for evaluating ranking methods by studying seed set expansion applied to the stochastic block model. We derive the optimal gradient for separating the landing probabilities of two classes in a stochastic block model, and find, ...
BPR: Bayesian Personalized Ranking from Implicit Feedback
Rendle, Steffen; Gantner, Zeno; Schmidt-Thieme, Lars
Item recommendation is the task of predicting a personalized ranking on a set of items (e.g. websites, movies, products). In this paper, we investigate the most common scenario with implicit feedback (e.g. clicks, purchases). There are many methods for item recommendation from implicit feedback like matrix factorization (MF) or adaptive k-nearest-neighbor (kNN). Even though these methods are designed for the item prediction task of personalized ranking, none of them is directly optimized for ranking. In this paper we present a generic optimization criterion BPR-Opt for personalized ranking that is the maximum posterior estimator derived from a Bayesian analysis of the problem. We also provide a generic learning algorithm for optimizing models with respect to BPR-Opt. The learning method is based on stochastic gradient descent with bootstrap sampling. We show how to apply our method to two state-of-the-art recommender models: matrix factorization and adaptive kNN. Our experiments indicate that for the task of p...
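As a rough illustration of the learning scheme this abstract describes, the sketch below applies stochastic gradient ascent to the pairwise objective ln sigmoid(x_ui - x_uj) with a matrix factorization score x_ui = w_u · h_i, sampling (user, observed item, unobserved item) triples at random. The helper names (`bpr_step`, `train_bpr`), the hyperparameters, and the toy feedback data are assumptions for illustration, not the authors' reference implementation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bpr_step(w_u, h_i, h_j, lr=0.05, reg=0.0):
    """One ascent step on ln sigmoid(x_uij) - (reg/2)*||params||^2 for a triple
    (u, i, j): item i was observed for user u, item j was not.
    Updates the vectors in place and returns x_uij before the update."""
    x_uij = dot(w_u, h_i) - dot(w_u, h_j)
    g = 1.0 - sigmoid(x_uij)  # derivative of ln sigmoid(x) at x_uij
    for f in range(len(w_u)):
        wu, hi, hj = w_u[f], h_i[f], h_j[f]  # pre-update values
        w_u[f] += lr * (g * (hi - hj) - reg * wu)
        h_i[f] += lr * (g * wu - reg * hi)
        h_j[f] += lr * (-g * wu - reg * hj)
    return x_uij

def train_bpr(observed, n_items, k=4, steps=5000, seed=0):
    """observed maps each user to the set of item indices with feedback.
    Every user needs at least one observed and one unobserved item."""
    rng = random.Random(seed)
    users = sorted(observed)
    W = {u: [rng.gauss(0, 0.1) for _ in range(k)] for u in users}
    H = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        u = rng.choice(users)  # triples are drawn at random with replacement
        i = rng.choice(sorted(observed[u]))
        j = rng.choice([t for t in range(n_items) if t not in observed[u]])
        bpr_step(W[u], H[i], H[j], lr=0.05, reg=0.01)
    return W, H

# Hypothetical implicit feedback: user 0 clicked items 0 and 1, user 1 items 2 and 3.
W, H = train_bpr({0: {0, 1}, 1: {2, 3}}, n_items=4)
ranking_user0 = sorted(range(4), key=lambda t: dot(W[0], H[t]), reverse=True)
print(ranking_user0)
```

The design point worth noting is that the update depends only on the score difference between an observed and an unobserved item, so the model is fitted to the ordering of items rather than to individual ratings.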
Cointegration rank testing under conditional heteroskedasticity
Cavaliere, Giuseppe; Rahbek, Anders Christian; Taylor, Robert M.
We analyze the properties of the conventional Gaussian-based cointegrating rank tests of Johansen (1996, Likelihood-Based Inference in Cointegrated Vector Autoregressive Models) in the case where the vector of series under test is driven by globally stationary, conditionally heteroskedastic...
An Application of Sylvester's Rank Inequality
Kung, Sidney H.
Using two well known criteria for the diagonalizability of a square matrix plus an extended form of Sylvester's Rank Inequality, the author presents a new condition for the diagonalization of a real matrix from which one can obtain the eigenvectors by simply multiplying some associated matrices without solving a linear system of simultaneous…
Global Rank Axioms for Poset Matroids
Shu Chao LI; Yan Qin FENG
An excellent introduction to the topic of poset matroids is due to Barnabei, Nicoletti and Pezzoli. In this paper, we investigate the rank axioms for poset matroids; thereby we can characterize poset matroids in a "global" version and a "pseudo-global" version. Some corresponding properties of combinatorial schemes are also obtained.
A note on ranking assignments using reoptimization
Pedersen, Christian Roed; Nielsen, L.R.; Andersen, K.A.
We consider the problem of ranking assignments according to cost in the classical linear assignment problem. An algorithm partitioning the set of possible assignments, as suggested by Murty, is presented where, for each partition, the optimal assignment is calculated using a new reoptimization...
Alternative Class Ranks Using Z-Scores
Brown, Philip H.; Van Niel, Nicholas
Grades at US colleges and universities have increased precipitously over the last 50 years, suggesting that their signalling power has become attenuated. Moreover, average grades have risen disproportionately in some departments, implying that weak students in departments with high grades may obtain better class ranks than strong students in…
Ranking health between countries in international comparisons
Brønnum-Hansen, Henrik
Cross-national comparisons and ranking of summary measures of population health sometimes give rise to inconsistent and diverging conclusions. In order to minimise confusion, international comparative studies ought to be based on well-harmonised data with common standards of definitions...
Suppression pheromone and cockroach rank formation
Kou, Rong; Chang, Huan-Wen; Chen, Shu-Chun; Ho, Hsiao-Yung
Although agonistic behaviors in the male lobster cockroach (Nauphoeta cinerea) are well known, the formation of an unstable hierarchy has long been a puzzle. In this study, we investigate how the unstable dominance hierarchy in N. cinerea is maintained via a pheromone signaling system. In agonistic interactions, aggressive posture (AP) is an important behavioral index of aggression. This study showed that, during the formation of a governing hierarchy, thousands of nanograms of 3-hydroxy-2-butanone (3H-2B) were released by the AP-adopting dominant in the first encounter fight and then during the early domination period, and that this release of 3H-2B was related to rank maintenance, but not to rank establishment. For rank maintenance, 3H-2B functioned as a suppression pheromone, which suppressed the fighting capability of rivals and kept them in a submissive state. During the period of rank maintenance, as the dominant male gradually decreased his 3H-2B release, the fighting ability of the subordinate gradually developed, as shown by the increasing odds of a subordinate adopting an AP (OSAP). The OSAP was negatively correlated with the amount of 3H-2B released by the dominant and positively correlated with the number of domination days. The fact that the same OSAP could be achieved earlier by reducing the amount of 3H-2B released by the dominant indicates that whether the subordinate adopts an offensive strategy depends on what the dominant is doing.
George Wilbur: Otto Rank and Hanns Sachs.
Roazen, Paul
George Wilbur, a pioneering Cape Cod psychoanalytic psychiatrist, was a long-standing editor of the journal "American Imago," and an excellent source of information about the Viennese analysts Otto Rank and Hanns Sachs. Wilbur was also knowledgeable about the early reception of psychoanalysis in the Boston community.
Rank-frequency relation for Chinese characters
Deng, W B; Li, B; Wang, Q A
Zipf's law states that the ordered frequencies $f_1>f_2> ...$ of different words in a text satisfy $f_r\propto r^{-\gamma}$ with $\gamma\approx 1$ and rank $r$. The law applies to many languages with alphabetical writing systems, but was so far found to be absent for the rank-frequency relation of Chinese characters, the main (and oldest) example of a logographic writing system. Here we show that Zipf's law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters). The scenario of its validity is similar to that of Zipf's law for words in short English texts. We focus on short texts, since for the sake of the rank-frequency analysis, long texts are just mixtures of shorter, thematically homogeneous pieces. For long texts (or for mixtures of short texts), Zipf's law holds for a relatively small range of ranks, but it is still important, since for all Chinese texts (we studied) it accounts for $\simeq 40\%$ of the overall frequency. The previous results on th...
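The rank-frequency relation $f_r\propto r^{-\gamma}$ can be checked on a text by counting symbol frequencies, sorting them in decreasing order, and fitting a straight line to log-frequency against log-rank. The sketch below does this with an ordinary least-squares fit. The function name `zipf_exponent` and the synthetic token list are hypothetical, and a log-log regression is only a crude estimator, not the procedure used in the paper.

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Least-squares slope of log(frequency) against log(rank); returns gamma.
    Needs at least two distinct tokens."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic "text": the symbol of rank r occurs exactly 120 / r times,
# so the frequencies follow f_r = 120 * r**-1.
tokens = []
for r in range(1, 7):
    tokens.extend([f"c{r}"] * (120 // r))

gamma = zipf_exponent(tokens)
print(round(gamma, 6))  # prints 1.0
```

On real texts the fit is usually restricted to a range of ranks, since the abstract itself notes that the law holds only over a limited range for long texts.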
The cactus rank of cubic forms
Bernardi, Alessandra
We prove that the smallest degree of an apolar 0-dimensional scheme to a general cubic form in $n+1$ variables is at most $2n+2$, when $n\geq 8$, and therefore smaller than the rank of the form. When $n=8$ we show that the bound is sharp, i.e. the smallest degree of an apolar subscheme is 18.
Rank-one LMIs and Lyapunov's inequality
Henrion, D.; Meinsma, G.
We describe a new proof of the well-known Lyapunov's matrix inequality about the location of the eigenvalues of a matrix in some region of the complex plane. The proof makes use of standard facts from quadratic and semi-definite programming. Links are established between the Lyapunov matrix, rank-on
Nath, Rajender; Kumar, Naresh
A search engine returns an ordered list of web search results in response to a user query, with the important pages usually displayed at the top and less important ones afterwards. The user may nevertheless have to look through many result screens to find the required documents. Many page ranking algorithms have been proposed in the literature to compute the rank of a page; PageRank is the example considered in this work. This algorithm treats all links equally when distributing rank scores, and as a result it sometimes gives equal importance to several pages. In practice this is unsatisfactory because, if two pages have the same rank, there is no way to judge which page is more important than the other. This paper therefore proposes another idea for organizing the search results, one that describes which page is more important when PageRank produces a conflict of equal ranks, so that the user can get more relevant and important results easily and in a short span of time.
VaRank: a simple and powerful tool for ranking genetic variants.
Geoffroy, Véronique; Pizot, Cécile; Redin, Claire; Piton, Amélie; Vasli, Nasim; Stoetzel, Corinne; Blavier, André; Laporte, Jocelyn; Muller, Jean
Background. Most genetic disorders are caused by single nucleotide variations (SNVs) or small insertion/deletions (indels). High throughput sequencing has broadened the catalogue of human variation, including common polymorphisms, rare variations or disease causing mutations. However, identifying one variation among hundreds or thousands of others is still a complex task for biologists, geneticists and clinicians. Results. We have developed VaRank, a command-line tool for the ranking of genetic variants detected by high-throughput sequencing. VaRank scores and prioritizes variants annotated either by Alamut Batch or SnpEff. A barcode allows users to quickly view the presence/absence of variants (with homozygote/heterozygote status) in analyzed samples. VaRank supports the commonly used VCF input format for variants analysis thus allowing it to be easily integrated into NGS bioinformatics analysis pipelines. VaRank has been successfully applied to disease-gene identification as well as to molecular diagnostics setup for several hundred patients. Conclusions. VaRank is implemented in Tcl/Tk, a scripting language which is platform-independent but has been tested only on Unix environment. The source code is available under the GNU GPL, and together with sample data and detailed documentation can be downloaded from http://www.lbgi.fr/VaRank/.
Dwelling Price Ranking versus Socioeconomic Clustering: Possibility of Imputation
Fleishman Larisa
In order to characterize the socioeconomic profile of various geographic units, it is common practice to use aggregated indices. However, calculating such indices requires a wide variety of variables from various data sources to be available concurrently. Using a number of administrative databases for 2001 and 2003, this study examines whether dwelling prices in a given locality can serve as a proxy for its socioeconomic level. Based on statistical and geographic criteria, we developed a Dwelling Price Ranking (DPR) methodology. Our findings show that the DPR can serve as a good approximation for the socioeconomic cluster (SEC) calculated by the Israel Central Bureau of Statistics for years when the required data were available. As opposed to the SEC, the suggested DPR indicator can easily be calculated, thus ensuring a continuous socioeconomic index series. Both parametric and nonparametric statistical analyses were carried out to examine the additional social, demographic, location, crime and security effects that are exogenous to the SEC. Complementary analysis of the recently published SEC series for 2006 and 2008 shows that our conclusions remain valid. The proposed methodology and the findings obtained may be applicable for different statistical purposes in other countries that possess dwelling transaction data.
Information content of partially rank-ordered set samples
Hatefi, Armin; Jozani, Mohammad Jafari
Partially rank-ordered set (PROS) sampling is a generalization of ranked set sampling in which rankers are not required to fully rank the sampling units in each set, hence having more flexibility to perform the necessary judgemental ranking process. The PROS sampling has a wide range of applications in different fields ranging from environmental and ecological studies to medical research and it has been shown to be superior over ranked set sampling and simple random sampling for estimating th...
Del Carratore, Francesco; Jankevics, Andris; Eisinga, Rob; Heskes, Tom; Hong, Fangxin; Breitling, Rainer
The Rank Product (RP) is a statistical technique widely used to detect differentially expressed features in molecular profiling experiments such as transcriptomics, metabolomics and proteomics studies. An implementation of the RP and the closely related Rank Sum (RS) statistics has been available in the RankProd Bioconductor package for several years. However, several recent advances in the understanding of the statistical foundations of the method have made a complete refactoring of the existing package desirable. We implemented a completely refactored version of the RankProd package, which provides a more principled implementation of the statistics for unpaired datasets. Moreover, the permutation-based P-value estimation methods have been replaced by exact methods, providing faster and more accurate results. RankProd 2.0 is available at Bioconductor (https://www.bioconductor.org/packages/devel/bioc/html/RankProd.html) and as part of the mzMatch pipeline (http://www.mzmatch.sourceforge.net). Contact: rainer.breitling@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
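As a rough illustration of the statistic itself, the sketch below computes a rank product as the geometric mean of a feature's ranks across replicates. It is not the RankProd package API: the function names, the tie-breaking rule, and the fold-change values are assumptions, and no P-value estimation is attempted.

```python
import math


def rank_descending(values):
    """Rank 1 goes to the largest value; ties keep input order (assumption)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def rank_product(replicates):
    """Geometric mean of each feature's rank over all replicates.

    replicates is a list of equal-length lists, one per replicate,
    holding a fold change (or similar score) for every feature.
    """
    k = len(replicates)
    per_replicate_ranks = [rank_descending(rep) for rep in replicates]
    n_features = len(replicates[0])
    return [
        math.prod(ranks[g] for ranks in per_replicate_ranks) ** (1.0 / k)
        for g in range(n_features)
    ]


# Hypothetical fold changes for three features in three replicates.
fold_changes = [
    [2.0, 0.5, 1.0],
    [1.8, 0.4, 1.2],
    [2.2, 0.9, 0.7],
]
rp = rank_product(fold_changes)
```

A small rank product means the feature is consistently near the top of every replicate; here the first feature is ranked first in all three replicates.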
Ross, Sheldon M
In this revised text, master expositor Sheldon Ross has produced a unique work in introductory statistics. The text's main merits are the clarity of presentation, contemporary examples and applications from diverse areas, and an explanation of intuition and ideas behind the statistical methods. To quote from the preface, "It is only when a student develops a feel or intuition for statistics that she or he is really on the path toward making sense of data." Ross achieves this goal through a coherent mix of mathematical analysis, intuitive discussions and examples. Ross's clear writin
Ross, Sheldon M
In this 3rd edition revised text, master expositor Sheldon Ross has produced a unique work in introductory statistics. The text's main merits are the clarity of presentation, contemporary examples and applications from diverse areas, and an explanation of intuition and ideas behind the statistical methods. Concepts are motivated, illustrated and explained in a way that attempts to increase one's intuition. To quote from the preface, "It is only when a student develops a feel or intuition for statistics that she or he is really on the path toward making sense of data." Ross achieves this
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Wannier, Gregory H
Until recently, the field of statistical physics was traditionally taught as three separate subjects: thermodynamics, statistical mechanics, and kinetic theory. This text, a forerunner in its field and now a classic, was the first to recognize the outdated reasons for their separation and to combine the essentials of the three subjects into one unified presentation of thermal physics. It has been widely adopted in graduate and advanced undergraduate courses, and is recommended throughout the field as an indispensable aid to the independent study and research of statistical physics. Designed for
Blakemore, J S
Semiconductor Statistics presents statistics aimed at complementing existing books on the relationships between carrier densities and transport effects. The book is divided into two parts. Part I provides introductory material on the electron theory of solids, and then discusses carrier statistics for semiconductors in thermal equilibrium. Of course a solid cannot be in true thermodynamic equilibrium if any electrical current is passed; but when currents are reasonably small the distribution function is but little perturbed, and the carrier distribution for such a "quasi-equilibrium" co
Ranking of delay factors in construction projects after Egyptian revolution
Remon Fayek Aziz
Time is one of the major considerations throughout the project management life cycle and can be regarded as one of the most important parameters of a project and the driving force of project success. Time delay is a very frequent phenomenon and is associated with nearly all construction projects, yet little effort has been made to curtail it. This research attempts to identify, investigate, and rank the factors perceived to affect delays in Egyptian construction projects with respect to their relative importance, so as to propose possible ways of coping with this phenomenon. To achieve this objective, the researcher invited practitioners and experts, comprising a statistically representative sample, to participate in a structured questionnaire survey. Brainstorming was also used, through which a number of delay factors in construction projects were identified. In total, ninety-nine (99) factors were short-listed for the questionnaire survey and categorized into nine (9) major categories. The survey was conducted with experts and representatives from private, public, and local general construction firms. The data were analyzed using the Relative Importance Index (RII), ranking, and simple percentages. Factors and categories were ranked according to their level of importance for delay, especially after 25/1/2011 (the Egyptian revolution). Based on the case study results, the factors and categories contributing most to delays (those needing attention) are discussed, and recommendations are made for minimizing and controlling delays in construction projects. This paper can also serve as a guide for all construction parties seeking effective management of construction projects to achieve a competitive level of quality and a time-effective project.
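The abstract does not state how the Relative Importance Index is computed. A commonly used form is RII = (sum of ratings) / (highest possible rating x number of respondents); the sketch below assumes that form, a 1 to 5 rating scale, and invented factor names and ratings.

```python
def relative_importance_index(ratings, highest_rating=5):
    """RII under the assumed form: sum of ratings / (highest rating * respondents)."""
    return sum(ratings) / (highest_rating * len(ratings))


# Hypothetical survey responses (1 = not important, 5 = extremely important).
survey = {
    "late payments": [5, 4, 4, 3, 5],
    "design changes": [3, 3, 4, 2, 3],
    "material shortage": [4, 4, 5, 4, 4],
}
rii = {factor: relative_importance_index(r) for factor, r in survey.items()}

# Rank factors from most to least important according to their RII.
ranking = sorted(rii, key=rii.get, reverse=True)
```

Under this definition the index lies between 0 and 1, so factors rated on the same scale can be compared and ranked directly.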
Tryggestad, Kjell
The aim of the study is to describe how the inclusion and exclusion of materials and calculative devices construct the boundaries and distinctions between statistical facts and artifacts in economics. My methodological approach is inspired by John Graunt's (1667) Political arithmetic and more recent work...... within constructivism and the field of Science and Technology Studies (STS). The result of this approach is here termed reversible statistics, reconstructing the findings of a statistical study within economics in three different ways. It is argued that all three accounts are quite normal, albeit...... in different ways. The presence and absence of diverse materials, both natural and political, is what distinguishes them from each other. Arguments are presented for a more symmetric relation between the scientific statistical text and the reader. I will argue that a more symmetric relation can be achieved...
2017-08-08
In large datasets, it is time consuming or even impossible to pick out interesting images. Our proposed solution is to find statistics to quantify the information in each image and use those to identify and pick out images of interest.
Serdobolskii, Vadim Ivanovich
This monograph presents the mathematical theory of statistical models described by an essentially large number of unknown parameters, comparable with the sample size or even much larger. In this sense, the proposed theory can be called "essentially multiparametric". It is developed on the basis of the Kolmogorov asymptotic approach, in which the sample size increases along with the number of unknown parameters. This theory opens a way to solve central problems of multivariate statistics that until now have remained unsolved. Traditional statistical methods based on the idea of infinite sampling often break down when applied to real problems and, depending on the data, can be inefficient, unstable or even inapplicable. In this situation, practical statisticians are forced to use various heuristic methods in the hope that they will find a satisfactory solution. The mathematical theory developed in this book presents a regular technique for implementing new, more efficient versions of statistical procedures. ...
Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles
Lai Po-Ting
Background. Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers (interactor normalization task or INT) and then to return a list of interaction pairs for each article (interaction pair task or IPT). These two tasks are evaluated in terms of the area under curve of the interpolated precision/recall (AUC iP/R) score because the order of identifiers in the output list is important for ease of curation. Results. Our INT system developed for the BioCreAtIvE II.5 INT challenge achieved a promising AUC iP/R of 43.5% by using a support vector machine (SVM)-based ranking procedure. Using our new re-ranking algorithm, we have been able to improve system performance (AUC iP/R) by 1.84%. Our experimental results also show that with the re-ranked INT results, our unsupervised IPT system can achieve a competitive AUC iP/R of 23.86%, which outperforms the best BC II.5 INT system by 1.64%. Compared to using only SVM ranked INT results, using re-ranked INT results boosts AUC iP/R by 7.84%. Statistical significance t-test results show that our INT/IPT system with re-ranking outperforms that without re-ranking by a statistically significant difference. Conclusions. In this paper, we present a new re-ranking algorithm that considers co-occurrence among identifiers in an article to improve INT and IPT ranking results. Combining the re-ranked INT results with an unsupervised approach to find associations among interactors, the proposed method can boost the IPT performance. We also implement score computation using dynamic programming, which is faster and more efficient than traditional approaches.
A multivariate rank test for comparing mass size distributions
Lombard, F.
Particle size analyses of a raw material are commonplace in the mineral processing industry. Knowledge of particle size distributions is crucial in planning milling operations to enable an optimum degree of liberation of valuable mineral phases, to minimize plant losses due to an excess of oversize or undersize material or to attain a size distribution that fits a contractual specification. The problem addressed in the present paper is how to test the equality of two or more underlying size distributions. A distinguishing feature of these size distributions is that they are not based on counts of individual particles. Rather, they are mass size distributions giving the fractions of the total mass of a sampled material lying in each of a number of size intervals. As such, the data are compositional in nature, in the terminology of Aitchison [1], that is, multivariate vectors whose components add to 100%. In the literature, various versions of Hotelling's $T^2$ have been used to compare matched pairs of such compositional data. In this paper, we propose a robust test procedure based on ranks as a competitor to Hotelling's $T^2$. In contrast to the latter statistic, the power of the rank test is not unduly affected by the presence of outliers or of zeros among the data. © 2012 Taylor and Francis Group, LLC.
Weighted Page Content Rank for Ordering Web Search Result
POOJA SHARMA,
With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools in order to find, extract, filter and evaluate the desired information and resources. Web structure mining and content mining play an effective role in this approach. Two ranking algorithms are PageRank and Weighted PageRank. PageRank is a commonly used algorithm in Web structure mining. Weighted PageRank also takes the importance of the inlinks and outlinks of the pages into account, so the rank score is not distributed equally among all links. In this paper we propose a new algorithm, Weighted Page Content Rank (WPCR), based on web content mining and structure mining, which determines the relevancy of the pages to a given query better than the existing PageRank and Weighted PageRank algorithms.
A Review of Outcomes of Seven World University Ranking Systems
Mahmood Khosrowjerdi
Many national and international ranking systems rank the universities and higher education institutions of the world, nationally or internationally, based on the same or different criteria. The question is whether we need all these ranking systems. Are the outcomes of these ranking systems as different as they claim? This study collected and analyzed data from the results of seven major ranking systems including Shanghai, QS, 4International, Webometrics, HEEACT, and Leiden University ranking. Results showed a significant correlation among the outcomes of these international ranking systems in ranking and rating the world's top 50 universities. The highest correlations were between Shanghai - THE (Spearman's Rho = 0.85), Shanghai - Webometrics (Spearman's Rho = 0.81) and Shanghai - Leiden (Spearman's Rho = 0.80). Finally, some suggestions for improving current ranking systems are considered.
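For readers unfamiliar with the statistic reported here, the following sketch computes Spearman's rho for two rankings without ties, using the textbook formula rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)). The two rank lists are invented for illustration and are not data from the study.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman's rank correlation for two tie-free rankings of the same items."""
    n = len(ranks_a)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


# Hypothetical positions of the same five universities in two ranking systems.
system_x = [1, 2, 3, 4, 5]
system_y = [2, 1, 3, 5, 4]
rho = spearman_rho(system_x, system_y)
```

A value near 1 means the two systems order the institutions almost identically; a value near -1 means one ordering is close to the reverse of the other.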
A logical framework for ranking landslide inventory maps
Santangelo, Michele; Fiorucci, Federica; Bucci, Francesco; Cardinali, Mauro; Ardizzone, Francesca; Marchesini, Ivan; Cesare Mondini, Alessandro; Reichenbach, Paola; Rossi, Mauro; Guzzetti, Fausto
Landslide inventory maps are essential for quantitative landslide hazard and risk assessments, and for geomorphological and ecological studies. Landslide maps, including geomorphological, event based, multi-temporal, and seasonal inventory maps, are most commonly prepared through the visual interpretation of (i) monoscopic and stereoscopic aerial photographs, (ii) satellite images, (iii) LiDAR derived images, aided by more or less extensive field surveys. Landslide inventory maps are the basic information for a number of different scientific, technical and civil protection purposes, such as: (i) quantitative geomorphic analyses, (ii) erosion studies, (iii) deriving landslide statistics, (iv) urban development planning, (v) landslide susceptibility, hazard and risk evaluation, and (vi) landslide monitoring systems. Despite several decades of activity in landslide inventory making, still no worldwide-accepted standards, best practices and protocols exist for the ranking and the production of landslide inventory maps. Standards for the preparation (and/or ranking) of landslide inventories should indicate the minimum amount of information for a landslide inventory map, given the scale, the type of images, the instrumentation available, and the available ancillary data. We recently attempted a systematic description and evaluation of a total of 22 geomorphological inventories, 6 multi-temporal inventories, 10 event inventories, and 3 seasonal inventories, in the scale range between 1:10,000 and 1:500,000, prepared for areas in different geological and geomorphological settings. All of the analysed inventories were carried out by using image interpretation techniques, or field surveys. Firstly, a detailed characterisation was performed for each landslide inventory, mainly collecting metadata related (i) to the amount of information used for preparing the landslide inventory (i.e. images used, instrumentation, ancillary data, digitalisation method, legend, validation
Low-rank quadratic semidefinite programming
Yuan, Ganzhao
Low rank matrix approximation is an attractive model in large scale machine learning problems, because it can not only reduce the memory and runtime complexity, but also provide a natural way to regularize parameters while preserving learning accuracy. In this paper, we address a special class of nonconvex quadratic matrix optimization problems, which require a low rank positive semidefinite solution. Despite their non-convexity, we exploit the structure of these problems to derive an efficient solver that converges to their local optima. Furthermore, we show that the proposed solution is capable of dramatically enhancing the efficiency and scalability of a variety of concrete problems, which are of significant interest to the machine learning community. These problems include the Top-k Eigenvalue problem, Distance learning and Kernel learning. Extensive experiments on UCI benchmarks have shown the effectiveness and efficiency of our proposed method. © 2012.
Correlated Topic Model for Web Services Ranking
Mustapha AZNAG
With the increasing number of published Web services providing similar functionalities, it is very tedious for a service consumer to select the appropriate one according to her/his needs. In this paper, we explore several probabilistic topic models, Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA) and Correlated Topic Model (CTM), to extract latent factors from web service descriptions. In our approach, topic models are used as efficient dimension reduction techniques, which are able to capture semantic relationships between word-topic and topic-service interpreted in terms of probability distributions. To address the limitation of keyword-based queries, we represent web service descriptions as a vector space and we introduce a new approach for discovering and ranking web services using latent factors. In our experiment, we evaluated our Service Discovery and Ranking approach by calculating the precision (P@n) and the normalized discounted cumulative gain (NDCGn).
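The two evaluation measures named at the end of this abstract can be sketched as follows. This is a generic illustration, not the authors' evaluation code: the log2 discount, the choice to build the ideal ordering from the retrieved list only, and the relevance grades are assumptions.

```python
import math


def precision_at_n(relevances, n):
    """Fraction of the top-n results with a relevance grade above zero."""
    return sum(1 for rel in relevances[:n] if rel > 0) / n


def dcg_at_n(relevances, n):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:n]))


def ndcg_at_n(relevances, n):
    """DCG normalized by the DCG of the ideally ordered list."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance of four returned services, in ranked order.
graded = [3, 0, 2, 1]
p_at_3 = precision_at_n(graded, 3)
ndcg_at_4 = ndcg_at_n(graded, 4)
```

P@n only counts how many of the top results are relevant, whereas NDCG also rewards placing the most relevant services earlier in the list.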
Compressed Sensing with Rank Deficient Dictionaries
Hansen, Thomas Lundgaard; Johansen, Daniel Højrup; Jørgensen, Peter Bjørn
In compressed sensing it is generally assumed that the dictionary matrix constitutes a (possibly overcomplete) basis of the signal space. In this paper we consider dictionaries that do not span the signal space, i.e. rank deficient dictionaries. We show that in this case the signal-to-noise ratio...... (SNR) in the compressed samples can be increased by selecting the rows of the measurement matrix from the column space of the dictionary. As an example application of compressed sensing with a rank deficient dictionary, we present a case study of compressed sensing applied to the Coarse Acquisition (C....../A) step in a GPS receiver. Simulations show that for this application the proposed choice of measurement matrix yields an increase in SNR performance of up to 5 − 10 dB, compared to the conventional choice of a fully random measurement matrix. Furthermore, the compressed sensing based C/A step is compared...
A linear functional strategy for regularized ranking.
Kriukova, Galyna; Panasiuk, Oleksandra; Pereverzyev, Sergei V; Tkachenko, Pavlo
Regularization schemes are frequently used for performing ranking tasks. This topic has been intensively studied in recent years. However, to be effective a regularization scheme should be equipped with a suitable strategy for choosing a regularization parameter. In the present study we discuss an approach which is based on the idea of a linear combination of regularized rankers corresponding to different values of the regularization parameter. The coefficients of the linear combination are estimated by means of the so-called linear functional strategy. We provide a theoretical justification of the proposed approach and illustrate it with numerical experiments, some of which concern ranking the risk of nocturnal hypoglycemia in diabetes patients.
Deep Impact: Unintended consequences of journal rank
Brembs, Björn
Much has been said about the increasing bureaucracy in science, stifling innovation, hampering the creativity of researchers and incentivizing misconduct, even outright fraud. Many anecdotes have been recounted, observations described and conclusions drawn about the negative impact of impact assessment on scientists and science. However, few of these accounts have drawn their conclusions from data, and those that have typically relied on a few studies. In this review, we present the most recent and pertinent data on the consequences that our current scholarly communication system has had on various measures of scientific quality (such as utility/citations, methodological soundness, expert ratings and retractions). These data confirm previous suspicions: using journal rank as an assessment tool is bad scientific practice. Moreover, the data lead us to argue that any journal rank (not only the currently-favored Impact Factor) would have this negative impact. Therefore, we suggest that abandoning journals altoge...
Detectability of ranking hierarchies in directed networks
Letizia, Elisa; Lillo, Fabrizio
Identifying hierarchies and rankings of nodes in directed graphs is fundamental in many applications such as social network analysis, biology, economics, and finance. A recently proposed method identifies the hierarchy by finding the ordered partition of nodes which minimizes a score function, termed agony. This function penalizes the links violating the hierarchy in a way depending on the strength of the violation. To investigate the detectability of ranking hierarchies we introduce an ensemble of random graphs, the Hierarchical Stochastic Block Model. We find that agony may fail to identify hierarchies when the structure is not strong enough and the size of the classes is small with respect to the whole network. We analytically characterize the detectability threshold and we show that an iterated version of agony can partly overcome this resolution limit.
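The abstract does not give the formula for agony. The sketch below assumes the commonly cited definition, in which a directed edge from u to v incurs a penalty of max(0, r(u) - r(v) + 1) for a rank assignment r, so that only edges pointing strictly up the hierarchy are free. The edge direction convention, function name, and toy graphs are assumptions.

```python
def agony(edges, rank):
    """Total agony of a rank assignment under the assumed penalty.

    edges is a list of (source, target) pairs; rank maps node -> integer level.
    An edge from a lower level to a strictly higher level costs nothing;
    any other edge costs its level difference plus one.
    """
    return sum(max(0, rank[u] - rank[v] + 1) for u, v in edges)


# Hypothetical chain a -> b -> c: a perfect hierarchy.
chain = [("a", "b"), ("b", "c")]
levels = {"a": 0, "b": 1, "c": 2}
chain_agony = agony(chain, levels)

# Adding the back edge c -> a creates a cycle that violates the hierarchy.
cycle = chain + [("c", "a")]
cycle_agony = agony(cycle, levels)
```

The back edge spans two levels in the wrong direction, so it alone contributes a penalty of three, which shows how the score grows with the strength of the violation.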
Image Segmentation by Discounted Cumulative Ranking on Maximal Cliques
Carreira, Joao; Sminchisescu, Cristian
We propose a mid-level image segmentation framework that combines multiple figure-ground (FG) hypotheses, constrained at different locations and scales, into interpretations that tile the entire image. The problem is cast as optimization over sets of maximal cliques sampled from the graph connecting non-overlapping, putative figure-ground segment hypotheses. Potential functions over cliques combine unary Gestalt-based figure quality scores and pairwise compatibilities among spatially neighboring segments, constrained by T-junctions and the boundary interface statistics resulting from projections of real 3d scenes. Learning the model parameters is formulated as rank optimization, alternating between sampling image tilings and optimizing their potential function parameters. State of the art results are reported on both the Berkeley and the VOC2009 segmentation dataset, where a 28% improvement was achieved.
BRICS countries in international innovation rankings
RODIONOVA I.; MASSAROVA A.; EPIFANTSEVA A.
The BRICS countries (China, Russia, Brazil, India and South Africa) are the largest emerging markets which are undergoing the processes of economic modernization and restructuring and taking the leading positions on many indicators on the global arena, extending beyond the regional scale. In the article, the positions of the BRICS countries in the international rankings of innovation capabilities will be considered in comparison with the leaders of the global economy. The recommendations for ...
Fuzzy Logic Based Power System Contingency Ranking
A. Y. Abdelaziz
Voltage stability is a major concern in the planning and operation of power systems. It is well known that voltage instability and collapse have led to major system failures. Modern transmission networks are more heavily loaded than ever before to meet the growing demand. One of the major consequences of such a stressed system is voltage collapse or instability. This paper presents maximum loadability identification of a load bus in a power transmission network. In this study, the Fast Voltage Stability Index (FVSI) is utilized as the indicator of the maximum loadability, termed Qmax. In this technique, reactive power loading is increased gradually at a particular load bus until the FVSI approaches unity. A critical value of FVSI is therefore set as the maximum loadability point. This value keeps the system from entering the voltage-collapse region. The main purpose of the maximum loadability assessment is to plan for the maximum allowable load value to avoid voltage collapse, which is important in power system planning and risk assessment. The most important task in security analysis is identifying the critical contingencies from a large list of credible contingencies and ranking them according to their severity. The condition of voltage stability in a power system can be characterized by the use of voltage stability indices. This paper presents a fuzzy approach for ranking the contingencies using a composite index based on a parallel operated fuzzy inference engine. The Line Flow index (L.F) and bus Voltage Magnitude (VM) of the load buses are expressed in fuzzy set notation. Further, they are evaluated using fuzzy rules to obtain an overall Criticality Index. Contingencies are ranked in decreasing order of Criticality Index, and the ranking obtained is then compared with that of the FVSI method.
Ranking hubs and authorities using matrix functions
2012-01-01
Wage sensitivity rankings and temporal convergence
Jones, Ronald W.; Neary, J. Peter
1988-01-01
Rank one case of Dwork's conjecture
Wan, D
This paper proves the general rank one case of Dwork's conjecture over the affine space. It generalizes and improves the method of ANT-0141 "Dwork's conjecture on unit root zeta functions" (Ann. Math., 150(1999), 867-929). In addition, explicit information about the zeros and poles (along the Gouvêa-Mazur conjecture direction) for the unit root zeta function is obtained. The paper is to appear in JAMS.
Jana, Madhusudan
Statistical Mechanics is a self-sufficient text, written in a lucid manner and keeping in mind the examination system of the universities. The need to study this subject and its relation to thermodynamics are discussed in detail. Starting from the Liouville theorem, statistical mechanics is gradually and thoroughly developed. All three types of statistical distribution functions are derived separately, with their range of applications and limitations. Non-interacting ideal Bose and Fermi gases are discussed thoroughly. Properties of liquid He-II and the corresponding models are described. White dwarfs and condensed matter physics, and transport phenomena (thermal and electrical conductivity, Hall effect, magnetoresistance, viscosity, diffusion, etc.) are discussed. A basic understanding of the Ising model is given to explain the phase transition. The book ends with detailed coverage of the method of ensembles (namely microcanonical, canonical and grand canonical) and their applications. Various numerical and conceptual problems ar...
Schwabl, Franz
The completely revised new edition of the classical book on Statistical Mechanics covers the basic concepts of equilibrium and non-equilibrium statistical physics. In addition to a deductive approach to equilibrium statistics and thermodynamics based on a single hypothesis - the form of the microcanonical density matrix - this book treats the most important elements of non-equilibrium phenomena. Intermediate calculations are presented in complete detail. Problems at the end of each chapter help students to consolidate their understanding of the material. Beyond the fundamentals, this text demonstrates the breadth of the field and its great variety of applications. Modern areas such as renormalization group theory, percolation, stochastic equations of motion and their applications to critical dynamics, kinetic theories, as well as fundamental considerations of irreversibility, are discussed. The text will be useful for advanced students of physics and other natural sciences; a basic knowledge of quantum mechan...
Ashburn, James R.; Colvert, Paul M.
We introduce a Bayesian mean-value approach for ranking all college football teams using only win-loss data. This approach is unique in that the prior distribution necessary to handle undefeated and winless teams is calculated self-consistently. Furthermore, we will show statistics supporting the validity of the prior distribution. Finally, a brief comparison with other football rankings will be presented.
Biodepolymerization studies of low rank Indian coals
Selvi, V.A.; Banerjee, R.; Ram, L.C.; Singh, G. [FRI, Dhanbad (India). Environmental Management Division
Biodepolymerization of some lower rank Indian coals by Pleurotus djamor, Pleurotus citrinopileatus and Aspergillus species was studied in a batch system. The main disadvantage of burning low rank coals is their low calorific value. To get the maximum benefit from low rank coals, the non-fuel uses of coal need to be explored. The liquefaction of coal is the preliminary process for such approaches. The present study was undertaken specifically to investigate the optimization of biodepolymerization of Neyveli lignite by P. djamor. The pH of the media reached a constant value of about 7.8 by microbial action. The effect of different carbon and nitrogen sources and the influence of chelators and metal ions on depolymerization of lignite were also studied. Lignite was solubilized by P. djamor only to a limited extent without the addition of carbon and nitrogen sources. Sucrose was the most suitable carbon source for coal depolymerization by P. djamor, and sodium nitrate followed by urea was the best nitrogen source. Chelators such as salicylic acid and TEA, and the metal ions Mg²⁺, Fe³⁺, Ca²⁺, Cu²⁺ and Mn²⁺, enhanced the lignite solubilization process. The findings of the study showed that, compared to sub-bituminous and bituminous coal, lignite has a higher rate of solubilization.
Reduced-Rank Hidden Markov Models
Siddiqi, Sajid M; Gordon, Geoffrey J
We introduce the Reduced-Rank Hidden Markov Model (RR-HMM), a generalization of HMMs that can model smooth state evolution as in Linear Dynamical Systems (LDSs) as well as non-log-concave predictive distributions as in continuous-observation HMMs. RR-HMMs assume an m-dimensional latent state and n discrete observations, with a transition matrix of rank k <= m. This implies the dynamics evolve in a k-dimensional subspace, while the shape of the set of predictive distributions is determined by m. Latent state belief is represented with a k-dimensional state vector and inference is carried out entirely in R^k, making RR-HMMs as computationally efficient as k-state HMMs yet more expressive. To learn RR-HMMs, we relax the assumptions of a recently proposed spectral learning algorithm for HMMs (Hsu, Kakade and Zhang 2009) and apply it to learn k-dimensional observable representations of rank-k RR-HMMs. The algorithm is consistent and free of local optima, and we extend its performance guarantees to cover the RR-...
Ranking environmental liabilities at a petroleum refinery
Lupo, M. [K. W. Brown Environmental Services, College Station, TX (United States)
A new computer model is available to allow the management of a petroleum refinery to prioritize environmental action and construct a holistic approach to remediation. A large refinery may have numerous solid waste management units regulated by the Resource Conservation and Recovery Act (RCRA), as well as process units that emit hazardous chemicals into the environment. These sources can impact several environmental media, potentially including the air, the soil, the groundwater, the unsaturated zone water, and surface water. The number of chemicals of concern may be large. The new model is able to rank the sources by considering the impact of each chemical in each medium from each source in terms of concentration, release rate, and a weighted index based on toxicity. In addition to environmental impact, the sources can be ranked in three other ways: (1) by cost to remediate, (2) by environmental risk reduction caused by the remediation in terms of the decreases in release rate, concentration, and weighted index, and (3) by cost-benefit, which is the environmental risk reduction for each source divided by the cost of the remedy. Ranking each unit in the refinery allows management to use its limited environmental resources in a pro-active strategic manner that produces long-term results, rather than in reactive, narrowly focused, costly, regulatory-driven campaigns that produce only short-term results.
Ranking environmental liabilities in the petroleum industry
1996-12-31
An exploration, production, transportation, or refining company may have numerous discontinued operations, as well as active process units that may emit potentially hazardous chemicals into the environment. These sources can impact different environmental media including air, soil, groundwater, the unsaturated zone above the water table, and surface water. The number of chemicals of concern may be large. A procedure is put forth for ranking the sources by considering the impact of each chemical in each medium from each source in terms of concentration, release rate, and a weighted index based on toxicity. In addition to environmental impact, the sources can be ranked in three other ways: (1) by cost to remediate, (2) by environmental risk reduction, and (3) by cost benefit. Ranking each remediation project enables management to realize the maximum benefit from environmental remediation projects by strategically planning the investment of limited resources. An example is presented in which a subset of the remediation projects is chosen for funding. If the sources were chosen for remediation based on their risk, one would achieve more than 95% of the planned risk reduction at nearly 75% of the total cost of all of the projects. On the other hand, if the projects were selected based on the cost-benefit analysis, 85% of the planned risk reduction could be attained with only 15.5% of the planned budget.
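The cost-benefit ranking described in the two abstracts above (risk reduction for each source divided by the cost of its remedy) can be sketched in a few lines. This is only an illustrative sketch: the `Project` class, the helper names, the project names and all numbers are hypothetical and are not taken from either paper or from the computer model it describes.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    cost: float            # cost to remediate (hypothetical units)
    risk_reduction: float  # planned risk reduction (hypothetical index units)


def rank_by_cost_benefit(projects):
    """Sort projects by risk reduction per unit cost, best ratio first."""
    return sorted(projects, key=lambda p: p.risk_reduction / p.cost, reverse=True)


def select_within_budget(ranked, budget):
    """Greedily fund projects in ranked order while they still fit the budget."""
    chosen, spent = [], 0.0
    for p in ranked:
        if spent + p.cost <= budget:
            chosen.append(p)
            spent += p.cost
    return chosen, spent


# Hypothetical data, for illustration only.
projects = [
    Project("tank farm", 400.0, 20.0),
    Project("landfill", 50.0, 30.0),
    Project("sump", 100.0, 40.0),
    Project("flare", 450.0, 10.0),
]
ranked = rank_by_cost_benefit(projects)
chosen, spent = select_within_budget(ranked, budget=200.0)
print([p.name for p in ranked], [p.name for p in chosen], spent)
```

With these made-up numbers the two funded projects deliver 70 of the 100 planned risk-reduction units for 150 of the 1000 total cost units, which mirrors the kind of trade-off the second abstract reports.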
Web Mining Using PageRank Algorithm
Vignesh. V
Web mining is the application of data mining to the extraction and automatic discovery of web-based information. It is one of the most universal and dominant applications on the Internet; as the web grows in size, search tools that combine the results of multiple search engines are becoming more valuable. However, almost none of these studies deals with the genetic relation algorithm (GRA), an evolutionary method with a graph structure. GRA was designed both to increase the effectiveness of search engines and to improve their efficiency. GRA considers the correlation coefficient between stock brands as strength, which indicates the relation between nodes in each individual of GRA. The reduced set of hyperlinks provided by GRA in the final generation consists of only the hyperlinks most similar to the query. However, the end user is still not fully satisfied. To improve user satisfaction, the PageRank algorithm is used to measure the importance of a page and to prioritize the pages returned by GRA, which reduces the user's searching time. The PageRank algorithm allocates a rank to the filtered links based on the number of keyword occurrences in the content.
Ranking with uncertain labels and its applications
YAN Shuicheng; WANG Huan; LIU Jianzhuang; TANG Xiao'ou; Thomas S.Huang
The techniques for image analysis and classification generally consider the image sample labels fixed and without uncertainties. The rank regression problem studied in this paper is based on training samples with uncertain labels, which is often the case for manually estimated image labels. A core ranking model is first designed as the bilinear fusion of multiple candidate kernels. Then, the parameters for feature selection and kernel selection are learned simultaneously by maximum a posteriori estimation for given samples and uncertain labels. The provably convergent Expectation Maximization (EM) method is used to infer these parameters iteratively. The effectiveness of the proposed algorithm is finally validated by extensive experiments on an age ranking task and a human tracking task. The popular FG-NET and the large-scale Yamaha aging databases are used for the age estimation experiments, and our algorithm significantly outperforms the state-of-the-art algorithms reported in the related literature. The experimental result of the human tracking task also validates its advantage over the conventional linear regression algorithm.
Rank Awareness in Joint Sparse Recovery
Davies, Mike E
In this paper we revisit the sparse multiple measurement vector (MMV) problem where the aim is to recover a set of jointly sparse multichannel vectors from incomplete measurements. This problem has received increasing interest as an extension of the single channel sparse recovery problem which lies at the heart of the emerging field of compressed sensing. However the sparse approximation problem has origins which include links to the field of array signal processing where we find the inspiration for a new family of MMV algorithms based on the MUSIC algorithm. We highlight the role of the rank of the coefficient matrix X in determining the difficulty of the recovery problem. We derive the necessary and sufficient conditions for the uniqueness of the sparse MMV solution, which indicates that the larger the rank of X the less sparse X needs to be to ensure uniqueness. We also show that the larger the rank of X the less the computational effort required to solve the MMV problem through a combinatorial search. In ...
Ranking agility factors affecting hospitals in Iran
M. Abdi Talarposht
Background: Agility is an effective response to a changing and unpredictable environment, using these changes as opportunities for organizational improvement. Objective: The aim of the present study was to rank the factors affecting the agile supply chain of hospitals in Iran. Methods: This applied study was conducted by a cross-sectional descriptive method at some point of 2015 for one year. The research population included managers, administrators, faculty members and experts of selected hospitals. A total of 260 people were selected as the sample from the health centers. The construct validity of the questionnaire was approved by confirmatory factor analysis and its reliability was approved by Cronbach's alpha (α=0.97). All data were analyzed by Kolmogorov-Smirnov, Chi-square and Friedman tests. Findings: The development of staff skills, the use of information technology, the integration of processes, appropriate planning, and customer satisfaction and product quality had a significant impact on the agility of public hospitals in Iran (P<0.001). New product introduction earned the highest ranking and the development of staff skills earned the lowest ranking. Conclusion: New product introduction, market responsiveness and sensitivity, cost reduction, and the integration of organizational processes obtained the better ratings for the agility of hospitals in Iran. Therefore, planners and officials of hospitals should promote the quality and variety of customer-oriented services and provide a basis for investment in the hospital, among other measures, in order to achieve an agile supply chain in the public hospitals of Iran.
Subspace Expanders and Matrix Rank Minimization
Khajehnejad, Amin; Hassibi, Babak
Matrix rank minimization (RM) problems recently gained extensive attention due to numerous applications in machine learning, system identification and graphical models. In the RM problem, one aims to find the matrix with the lowest rank that satisfies a set of linear constraints. The existing algorithms include nuclear norm minimization (NNM) and singular value thresholding. Thus far, most of the attention has been on i.i.d. Gaussian measurement operators. In this work, we introduce a new class of measurement operators, and a novel recovery algorithm, which is notably faster than NNM. The proposed operators are based on what we refer to as subspace expanders, which are inspired by the well known expander graphs based measurement matrices in compressed sensing. We show that given an $n \times n$ PSD matrix of rank $r$, it can be uniquely recovered from a minimal sampling of $O(nr)$ measurements using the proposed structures, and the recovery algorithm can be cast as matrix inversion after a few initial processing s...
The matrix method to calculate page rank
H. Barboucha, M. Nasri
Choosing the right keywords is relatively easy, whereas getting a high PageRank is more complicated. The PageRank index is what defines the position in the result pages of search engines (for Google of course, but the other engines now use more or less the same kind of algorithm). It is therefore very important to understand how this type of algorithm works in order to appear on the first page of results (the only page read in 95% of cases), or at least to be among the first. In this paper we propose to clarify the operation of this algorithm using a matrix method and a JavaScript program that makes it possible to experiment with this type of analysis. It is of course a simplified version, but it can add value to a website, help it achieve a high ranking in the search results and reach a larger customer base. The aim is to disclose an algorithm that calculates the relevance of each page. It is in fact a mathematical algorithm based on a web graph. This graph is formed of all the web pages, which are modeled by nodes, and the hyperlinks, which are modeled by arcs.
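A matrix formulation of PageRank on a small web graph can be sketched as follows. This is a minimal illustration written in Python rather than the JavaScript program the abstract mentions; the function name, the damping factor of 0.85, the handling of pages without outgoing links and the four-page example graph are all assumptions made for the sketch, not details taken from the paper.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration on the column-stochastic link matrix of a web graph.

    `links` maps each page to the list of pages it links to; every target
    must also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}

    # M[i][j] is the probability of following a link from page j to page i.
    M = [[0.0] * n for _ in range(n)]
    for src, targets in links.items():
        j = index[src]
        if targets:
            for dst in targets:
                M[index[dst]][j] += 1.0 / len(targets)
        else:
            # Assumption: a page with no outgoing links jumps anywhere uniformly.
            for i in range(n):
                M[i][j] = 1.0 / n

    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [
            (1.0 - damping) / n
            + damping * sum(M[i][j] * rank[j] for j in range(n))
            for i in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return dict(zip(pages, rank))


# Hypothetical four-page graph: nodes are pages, arcs are hyperlinks.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph page C collects links from three pages and ends up with the highest score, while page D, which nothing links to, keeps only the baseline share (1 - damping) / n.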
Rank-dependant factorization of entanglement evolution
Siomau, Michael, E-mail: siomau@nld.ds.mpg.de [Physics Department, Jazan University, P.O. Box 114, 45142 Jazan (Saudi Arabia); Network Dynamics, Max Planck Institute for Dynamics and Self-Organization (MPIDS), 37077 Göttingen (Germany)
Highlights: • In some cases the complex entanglement evolution can be factorized on simple terms. • We suggest factorization equations for multiqubit entanglement evolution. • The factorization is solely defined by the rank of the final state density matrices. • The factorization is independent on the local noisy channels and initial pure states. - Abstract: The description of the entanglement evolution of a complex quantum system can be significantly simplified due to the symmetries of the initial state and the quantum channels, which simultaneously affect parts of the system. Using concurrence as the entanglement measure, we study the entanglement evolution of few qubit systems, when each of the qubits is affected by a local unital channel independently on the others. We found that for low-rank density matrices of the final quantum state, such complex entanglement dynamics can be completely described by a combination of independent factors representing the evolution of entanglement of the initial state, when just one of the qubits is affected by a local channel. We suggest necessary conditions for the rank of the density matrices to represent the entanglement evolution through the factors. Our finding is supported with analytical examples and numerical simulations.
Estimation of rank correlation for clustered data.
Rosner, Bernard; Glynn, Robert J
2017-06-30
Rohatgi, Vijay K
Unified treatment of probability and statistics examines and analyzes the relationship between the two fields, exploring inferential issues. Numerous problems, examples, and diagrams--some with solutions--plus clear-cut, highlighted summaries of results. Advanced undergraduate to graduate level. Contents: 1. Introduction. 2. Probability Model. 3. Probability Distributions. 4. Introduction to Statistical Inference. 5. More on Mathematical Expectation. 6. Some Discrete Models. 7. Some Continuous Models. 8. Functions of Random Variables and Random Vectors. 9. Large-Sample Theory. 10. General Meth
Mandl, Franz
The Manchester Physics Series General Editors: D. J. Sandiford; F. Mandl; A. C. Phillips Department of Physics and Astronomy, University of Manchester Properties of Matter B. H. Flowers and E. Mendoza Optics Second Edition F. G. Smith and J. H. Thomson Statistical Physics Second Edition E. Mandl Electromagnetism Second Edition I. S. Grant and W. R. Phillips Statistics R. J. Barlow Solid State Physics Second Edition J. R. Hook and H. E. Hall Quantum Mechanics F. Mandl Particle Physics Second Edition B. R. Martin and G. Shaw The Physics of Stars Second Edition A. C. Phillips Computing for Scient
Levine-Wissing, Robin
All Access for the AP® Statistics Exam Book + Web + Mobile Everything you need to prepare for the Advanced Placement® exam, in a study system built around you! There are many different ways to prepare for an Advanced Placement® exam. What's best for you depends on how much time you have to study and how comfortable you are with the subject matter. To score your highest, you need a system that can be customized to fit you: your schedule, your learning style, and your current level of knowledge. This book, and the online tools that come with it, will help you personalize your AP® Statistics prep
Freund, Rudolf J; Wilson, William J
Statistical Methods, 3e provides students with a working introduction to statistical methods offering a wide range of applications that emphasize the quantitative skills useful across many academic disciplines. This text takes a classic approach emphasizing concepts and techniques for working out problems and interpreting results. The book includes research projects, real-world case studies, numerous examples and data exercises organized by level of difficulty. This text requires that a student be familiar with algebra. New to this edition: NEW expansion of exercises a
Davidson, Norman
Clear and readable, this fine text assists students in achieving a grasp of the techniques and limitations of statistical mechanics. The treatment follows a logical progression from elementary to advanced theories, with careful attention to detail and mathematical development, and is sufficiently rigorous for introductory or intermediate graduate courses. Beginning with a study of the statistical mechanics of ideal gases and other systems of non-interacting particles, the text develops the theory in detail and applies it to the study of chemical equilibrium and the calculation of the thermody
LIU TianQing; LIN Nan; ZHANG BaoXue
Ranked-set sampling (RSS) often provides more efficient inference than simple random sampling (SRS). In this article, we propose a systematic nonparametric technique, RSS-EL, for hypothesis testing and interval estimation with balanced RSS data using empirical likelihood (EL). We detail the approach for interval estimation and hypothesis testing in one-sample and two-sample problems and general estimating equations. In all three cases, RSS is shown to provide more efficient inference than SRS of the same size. Moreover, the RSS-EL method does not require any easily violated assumptions needed by existing rank-based nonparametric methods for RSS data, such as perfect ranking, identical ranking scheme in two groups, and location shift between two population distributions. The merit of the RSS-EL method is also demonstrated through simulation studies.
Ranking welding intensity in pyroclastic deposits
Quane, S. L.; Russell, J. K.
Pyroclastic deposits emplaced at high temperatures and having sufficient thickness become welded. The welding process involves sintering, compaction and flattening of hot glassy pyroclastic material and is attended by systematic changes in physical properties. Historically, the terms nonwelded, incipiently welded, partially welded with pumice, partially welded with fiamme, moderately welded and densely welded have been used as field descriptors for welding intensity (e.g., Smith & Bailey, 1966; Smith, 1979; Ross & Smith, 1980; Streck & Grunder, 1995). While using these descriptive words is often effective for delineating variations of welding intensity within a single deposit, their qualitative character does not provide for consistency between field areas or workers, and inhibits accurate comparison between deposits. Hence, there is a need for a universal classification of welding intensity in pyroclastic deposits. Here we develop an objective ranking system. The system recognizes 8 ranks (I to VIII) based on measurements of physical properties and petrographic characteristics. The physical property measurements include both lab and field observations: density, porosity, uniaxial compressive strength, point load strength, fiamme elongation, and foliation/fabric. The values are normalized in order to make the system universal. The rank divisions are adaptations of a rock mass-rating scheme based on rock strength (Hoek & Brown, 1980) and previous divisions of welding degree based on physical properties (e.g., density: Ragan & Sheridan, 1972, Streck & Grunder, 1995; fiamme elongation: Peterson, 1979). Each rank comprises a range of normalized values for each of the physical properties and a corresponding set of petrographic characteristics. Our new ranking system provides a consistent, objective means by which each sample or section of welded tuff can be evaluated, thus providing a much needed uniformity in nomenclature for degree of welding. References: Hoek, E. & Brown, E
Discovering motifs in ranked lists of DNA sequences.
Eran Eden
Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP-chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP-chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP-chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked
Gallavotti, Giovanni
C. Cercignani: A sketch of the theory of the Boltzmann equation.- O.E. Lanford: Qualitative and statistical theory of dissipative systems.- E.H. Lieb: many particle Coulomb systems.- B. Tirozzi: Report on renormalization group.- A. Wehrl: Basic properties of entropy in quantum mechanics.
Asian top universities in six world university ranking systems
Mahmood Khosrowjerdi
2013-12-01
There are a variety of ranking systems for universities throughout the different continents of the world. The majority of the world ranking systems pay special attention to the evaluation of universities and higher education institutions at the national and international level. This paper studies the similarities among, and the status of, top Asian universities in the lists of top 200 universities produced by these world ranking systems. Findings show that there are some parallels among these international rankings. For example, correlations were found between the QS and Webometrics rankings (R = 0.78), the QS and THE rankings (R = 0.53), and the Shanghai and HEEACT rankings (R = 0.58). The highest correlation belongs to QS-Webometrics (R = 0.78). The findings show no evidence that the country of origin of a ranking system biases the rank of that country's own universities relative to those of other countries. For instance, the QS ranking of the United States classifies many universities of China and Japan as top Asian universities. The HEEACT Ranking System of Taiwan includes just one university of Taiwan in the high ranking category (as other rankings do). The Shanghai Ranking of China assigns a lower grade to universities of China and Hong Kong in comparison with the QS ranking of the USA. Finally, some suggestions are made to improve the benefits of the ranking systems in order to promote the situation of higher education in the world, together with recommendations for combining the indicators of these ranking systems into a more comprehensive one for the world.
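The abstract above reports correlations between pairs of ranking systems without saying which coefficient was used. As an assumption for illustration only, the sketch below uses Spearman's rank correlation, a common choice for comparing two ranked lists; the function name and the five-university toy data are hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two tie-free rankings of the same items.

    Uses the closed form rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is
    only valid when neither ranking contains ties.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical positions of five universities in two ranking systems.
system_one = [1, 2, 3, 4, 5]
system_two = [2, 1, 3, 5, 4]
rho = spearman_rho(system_one, system_two)
print(round(rho, 2))  # 0.8 for this toy data
```

Real ranking lists usually overlap only partially and may contain ties, so an actual comparison would need to restrict the lists to shared institutions and use a tie-aware formula.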
Fan, Xinghua; Kubwabo, Cariton; Rasmussen, Pat E; Wu, Fang
An analytical method for the simultaneous determination of 13 organophosphate esters (OPEs) in house dust was developed. The method is based on solvent extraction by sonication, sample cleanup by solid phase extraction (SPE), and analysis by gas chromatography-positive chemical ionization-tandem mass spectrometry (GC/PCI-MS/MS). Method detection limits (MDLs) ranged from 0.03 to 0.43 μg/g and recoveries from 60% to 118%. The inter- and intra-day variations ranged from 3% to 23%. The method was applied to dust samples collected using two vacuum sampling techniques from 134 urban Canadian homes: a sample of fresh or "active" dust (FD) collected by technicians and a composite sample taken from the household vacuum cleaner (HD). Results show that the two sampling methods (i.e., FD vs HD) provided comparable results. Tributoxyethyl phosphate (TBEP), triphenyl phosphate (TPhP), tris(chloropropyl) phosphate (TCPP), tri(2-chloroethyl) phosphate (TCEP), tris(dichloro-isopropyl) phosphate (TDCPP), tricresyl phosphate (TCrP), and tri-n-butyl phosphate (TnBP) were detected in the majority of samples. The most predominant OPE was TBEP, with median concentrations of 31.9 μg/g and 22.8 μg/g in FD and HD samples, respectively, 1 to 2 orders of magnitude higher than other OPEs. The method was also applied to the analysis of OPEs in the National Institute of Standards and Technology (NIST) standard reference material (NIST SRM 2585, organic contaminants in house dust). The results from SRM 2585 may contribute to the certification of OPE concentration values in this SRM. Crown Copyright © 2013. Published by Elsevier B.V. All rights reserved.
Donald W. Zimmerman
2004-01-01
Limitations of log-rank tests for analysing longevity data in biogerontology.
Le Bourg, Eric
2014-08-01
Normalized entropy of rank distribution: a novel measure of heterogeneity of complex networks
Wu Jun; Tan Yue-Jin; Deng Hong-Zhong; Zhu Da-Zhi
2007-01-01
A scientometrics law about co-authors and their ranking. The co-author core
Ausloos, Marcel
2012-01-01
Theoretical analysis on convergence behavior of rank filters
YE; Wanzhou
2004-01-01
Carlos-Roberto Peña-Barrera
The main objectives of this research are the following: (1) to make the results of the Ranking U-Sapiens Colombia 2010_2, which classifies each Colombian higher education institution (IES) by score, position and quartile, known to the national and international scientific community and to society in general; (2) to highlight the most important movements when comparing the results of the 2010_1 ranking with those of 2010_2; (3) to publish the responses of some actors in national academia regarding the dynamics of research in the country; (4) to acknowledge some institutions, media outlets and researchers that have taken an interest in this research by way of reflection, referencing or citation; and (5) to present the «Sello Ranking U-Sapiens Colombia» for the classified IES. In terms of actors, the scope of this study covered each and every national IES (although only some managed to enter the ranking); in terms of time, it covered a period referring to the first semester of 2010 with respect to: (1) the 2010-1 results for journals indexed in Publindex, (2) the master's and doctoral programs active during 2010-1 according to the Ministerio de Educación Nacional, and (3) the results for research groups classified for 2010 according to Colciencias. The method used for this research is the same as for the 2010_1 ranking, except for an even more detailed specification in one of the steps of the model (the variables α, β, γ); it is entirely quantitative, and the data for the variables underlying its results come from Colciencias and the Ministerio de Educación Nacional; on this occasion the results per variable for 2010_1 and 2010_2 are also reported. The most relevant results are these: (1) 8 IES entered the ranking and 3 left; (2) the top 3 IES are public; (3) in total there are 6 university institutions in the ranking; (4) 7 of the top 10 IES are
West, Caroline; Khalikova, Maria A; Lesellier, Eric; Héberger, Károly
2015-08-28
The identification of a suitable stationary phase in supercritical fluid chromatography (SFC) is a major source of difficulty for those with little experience in this technique. Several protocols have been suggested for column classification in high-performance liquid chromatography (HPLC), gas chromatography (GC), and SFC. However, none of the proposed classification schemes received general acceptance. A fair way to compare columns was proposed with the sum of ranking differences (SRD). In this project, we used the retention data obtained for 86 test compounds with varied polarity and structure, analyzed on 71 different stationary phases encompassing the full range in polarity of commercial packed columns currently available to the SFC chromatographer, with a single set of mobile phase and operating conditions (carbon dioxide-methanol mobile phase, 25 °C, 150 bar outlet pressure, 3 mL/min). First, a reference column was selected and the 70 remaining columns were ranked based on this reference column and the retention data obtained on the 86 analytes. As these analytes previously served for the calculation of linear solvation energy relationships (LSER) on the 71 columns, SRD ranks were compared to LSER methodology. Finally, an external comparison based on the analysis of 10 other analytes (UV filters) related the observed selectivity to SRD ranking. Comparison of elution orders of the UV filters to the SRD rankings is highly supportive of the adequacy of SRD methodology to select similar and dissimilar columns.
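The core of the sum of ranking differences can be sketched briefly: the analytes are ranked by their retention on each column and on the reference column, and the absolute rank differences are summed per column. The sketch below is a simplified illustration under stated assumptions; the function names, the average-rank treatment of ties and the four-analyte retention values are hypothetical, and the validation steps usually paired with SRD (such as comparison against random rankings) are omitted.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def srd(column, reference):
    """Sum of absolute rank differences between a column and the reference."""
    col_ranks = average_ranks(column)
    ref_ranks = average_ranks(reference)
    return sum(abs(c - r) for c, r in zip(col_ranks, ref_ranks))


# Hypothetical retention data for four analytes on a reference and three columns.
reference = [1.0, 2.0, 3.0, 4.0]
columns = {
    "col_a": [1.1, 2.2, 2.9, 4.5],  # same elution order as the reference
    "col_b": [4.0, 3.0, 2.0, 1.0],  # reversed elution order
    "col_c": [1.0, 3.0, 2.0, 4.0],  # one swapped pair
}
scores = {name: srd(data, reference) for name, data in columns.items()}
print(sorted(scores.items(), key=lambda item: item[1]))
```

A small SRD marks a column whose elution order resembles the reference, while a large SRD marks a dissimilar column.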
Irreducible Killing Tensors from Third Rank Killing-Yano Tensors
Popa, Florian Catalin; Tintareanu-Mircea, Ovidiu
We investigate higher rank Killing-Yano tensors, showing that third rank Killing-Yano tensors are not always trivial objects, since it is possible to construct irreducible Killing tensors from them. We give as an example the Kimura IIC metric, where from second rank Killing-Yano tensors we obtain a reducible Killing tensor and from third rank Killing-Yano tensors we obtain three Killing tensors, one reducible and two irreducible.
The ranks of Maiorana-McFarland bent functions
WENG GuoBiao; FENG RongQuan; QIU WeiSheng; ZHENG ZhiMing
In this paper, the ranks of a special family of Maiorana-McFarland bent functions are discussed. The upper and lower bounds of the ranks are given, and those bent functions whose ranks achieve these bounds are determined. As a consequence, the inequivalence of some bent functions is derived. Furthermore, the ranks of the functions of this family are calculated when t≤6.
Natrella, Mary Gibbons
Formulated to assist scientists and engineers engaged in army ordnance research and development programs, this well-known and highly regarded handbook is a ready reference for advanced undergraduate and graduate students as well as for professionals seeking engineering information and quantitative data for designing, developing, constructing, and testing equipment. Topics include characterizing and comparing the measured performance of a material, product, or process; general considerations in planning experiments; statistical techniques for analyzing extreme-value data; use of transformations
Rasch analysis for the evaluation of rank of student response time in multiple choice examinations.
Thompson, James J; Yang, Tong; Chauvin, Sheila W
The availability of computerized testing has broadened the scope of person assessment beyond the usual accuracy-ability domain to include response time analyses. Because there are contexts in which speed is important, e.g. medical practice, it is important to develop tools by which individuals can be evaluated for speed. In this paper, the ability of Rasch measurement to convert ordinal nonparametric rankings of speed to measures is examined and compared to similar measures derived from parametric analysis of response times (pace) and semi-parametric logarithmic time-scaling procedures. Assuming that similar spans of the measures were used, non-parametric methods of raw ranking or percentile-ranking of persons by questions gave statistically acceptable person estimates of speed virtually identical to the parametric or semi-parametric methods. Because no assumptions were made about the underlying time distributions with ranking, generality of conclusions was enhanced. The main drawbacks of the non-parametric ranking procedures were the lack of information on question duration and the overall assignment by the model of variance to the person by question interaction.
Estimation of (near) low-rank matrices with noise and high-dimensional scaling
Negahban, Sahand
High-dimensional inference refers to problems of statistical estimation in which the ambient dimension of the data may be comparable to or possibly even larger than the sample size. We study an instance of high-dimensional inference in which the goal is to estimate a matrix $\Theta^* \in \mathbb{R}^{k \times p}$ on the basis of $N$ noisy observations, and the unknown matrix $\Theta^*$ is assumed to be either exactly low rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. We consider an $M$-estimator based on regularization by the trace or nuclear norm over matrices, and analyze its performance under high-dimensional scaling. We provide non-asymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models, and then illustrate their consequences for a number of specific matrix models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices fro...
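The abstract describes nuclear-norm regularization only in words. As a minimal sketch, assume the simplest setting in which the whole matrix $Y$ is observed directly; this is an assumption made here for illustration and is not one of the observation models analyzed in the paper. In that setting the minimizer of $\tfrac{1}{2}\|Y - \Theta\|_F^2 + \lambda \|\Theta\|_*$ is obtained by soft-thresholding the singular values of $Y$. The function name `svt`, the variable `lam`, and the example matrix are hypothetical.

```python
import numpy as np


def svt(matrix, threshold):
    """Soft-threshold the singular values of `matrix` by `threshold`."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - threshold, 0.0)  # shrink, then clip at zero
    return (u * s_shrunk) @ vt                 # rebuild with shrunken spectrum


# Hypothetical observation: two strong directions plus one weak one.
observed = np.diag([3.0, 1.0, 0.2])
lam = 0.5

estimate = svt(observed, lam)
print(np.round(estimate, 3))
print("rank before:", np.linalg.matrix_rank(observed))
print("rank after: ", np.linalg.matrix_rank(estimate))
```

Singular values below the threshold are set to zero, which is how the nuclear-norm penalty favors low-rank estimates; the surviving singular values are each reduced by `lam`.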
Akbudak, Kadir
2017-05-11
Annihilating Filter-Based Low-Rank Hankel Matrix Approach for Image Inpainting.
Jin, Kyong Hwan; Ye, Jong Chul
In this paper, we propose a patch-based image inpainting method using a low-rank Hankel structured matrix completion approach. The proposed method exploits the annihilation property between a shift-invariant filter and image data observed in many existing inpainting algorithms. In particular, by exploiting the commutative property of the convolution, the annihilation property results in a low-rank block Hankel structure data matrix, and the image inpainting problem becomes a low-rank structured matrix completion problem. The block Hankel structured matrices are obtained patch-by-patch to adapt to the local changes in the image statistics. To solve the structured low-rank matrix completion problem, we employ an alternating direction method of multipliers with factorization matrix initialization using the low-rank matrix fitting algorithm. As a side product of the matrix factorization, locally adaptive dictionaries can be also easily constructed. Despite the simplicity of the algorithm, the experimental results using irregularly subsampled images as well as various images with globally missing patterns showed that the proposed method outperforms existing state-of-the-art image inpainting methods.
Meneghetti, M; Dahle, H; Limousin, M
The existence of an arc statistics problem was at the center of a strong debate in the last fifteen years. With the aim to clarify if the optical depth for giant gravitational arcs by galaxy clusters in the so called concordance model is compatible with observations, several studies were carried out which helped to significantly improve our knowledge of strong lensing clusters, unveiling their extremely complex internal structure. In particular, the abundance and the frequency of strong lensing events like gravitational arcs turned out to be a potentially very powerful tool to trace the structure formation. However, given the limited size of observational and theoretical data-sets, the power of arc statistics as a cosmological tool has been only minimally exploited so far. On the other hand, the last years were characterized by significant advancements in the field, and several cluster surveys that are ongoing or planned for the near future seem to have the potential to make arc statistics a competitive cosmo...
Tutorial: Calculating Percentile Rank and Percentile Norms Using SPSS
Baumgartner, Ted A.
Practitioners can benefit from using norms, but they often have to develop their own percentile rank and percentile norms. This article is a tutorial on how to quickly and easily calculate percentile rank and percentile norms using SPSS, and this information is presented for a data set. Some issues in calculating percentile rank and percentile…
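For readers without SPSS, the calculation can be sketched in Python. The sketch assumes one common textbook convention, in which the percentile rank of a score is the percentage of scores below it plus half the percentage of scores equal to it. The tutorial's own SPSS procedure may use a different convention, and the scores below are invented for illustration.

```python
def percentile_rank(scores, value):
    """Percent of scores below `value`, plus half the percent equal to it."""
    below = sum(1 for s in scores if s < value)
    equal = sum(1 for s in scores if s == value)
    return 100.0 * (below + 0.5 * equal) / len(scores)


# Hypothetical raw scores for ten examinees.
scores = [10, 12, 12, 15, 18, 20, 20, 20, 25, 30]

# A small norms table: one percentile rank per distinct raw score.
norms = {value: percentile_rank(scores, value) for value in sorted(set(scores))}
for value, rank in norms.items():
    print(f"score {value:>3}: percentile rank {rank:5.1f}")
```

With this convention, tied scores share the same percentile rank, and no score receives a rank of exactly 0 or 100.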
Synthesis of Partial Rankings of Points of Interest Using Crowdsourcing
Keles, Ilkcan; Saltenis, Simonas; Jensen, Christian Søndergaard
to the query keywords and the query location. A key challenge in being able to make progress on the design of ranking functions is to be able to assess the quality of the results returned by ranking functions. We propose a model that synthesizes a ranking of points of interest from answers to crowdsourced...
Variation in rank abundance replicate samples and impact of clustering
Neuteboom, J.H.; Struik, P.C.
Calculating a single-sample rank abundance curve by using the negative-binomial distribution provides a way to investigate the variability within rank abundance replicate samples and yields a measure of the degree of heterogeneity of the sampled community. The calculation of the single-sample rank a
Amalgams of Rank 2 and Characteristic 3 Involving M11
黄建华; 李慧陵
In this paper we investigate the amalgams (M11, X) of rank 2 and characteristic 3, where X is a group of Lie type of rank 1 or a permutation group of low rank, and give a characterization of the 3-local subgroups of the finite sporadic groups Co3 of Ly and Suz.
10 CFR 455.131 - State ranking of grant applications.
2010-01-01
25 CFR 1001.3 - Priority ranking for negotiations.
2010-04-01
§ 1001.3 Priority ranking for negotiations. In addition to the eligibility criteria identified above, a tribe or consortium of tribes seeking priority ranking for negotiations must submit...
Academic Ranking--From Its Genesis to Its International Expansion
Vieira, Rosilene C.; Lima, Manolita C.
Given the visibility and popularity of rankings that encompass the measurement of quality of post-graduate courses, for instance, the MBA (Master of Business Administration) or graduate studies program (MSc and PhD) as do global academic rankings--Academic Ranking of World Universities-ARWU, Times Higher/Thomson Reuters World University Ranking…
Ranking Quality in Higher Education: Guiding or Misleading?
Bergseth, Brita; Petocz, Peter; Abrandt Dahlgren, Madeleine
The study examines two different models of measuring, assessing and ranking quality in higher education. Do different systems of quality assessment lead to equivalent conclusions about the quality of education? This comparative study is based on the rankings of 24 Swedish higher education institutions. Two ranking actors have independently…
THE RANK AND COEXPONENT OF A FINITE P-GROUP
Yujie MA
In this paper, we present a sharp bound for the rank of a finite p-group in terms of its coexponent. As to finite p-groups with p odd, we also give a sufficient condition for which the normal rank is equal to its rank.
Control by Numbers: New Managerialism and Ranking in Higher Education
Lynch, Kathleen
This paper analyses the role of rankings as an instrument of new managerialism. It shows how rankings are reconstituting the purpose of universities, the role of academics and the definition of what it is to be a student. The paper opens by examining the forces that have facilitated the emergence of the ranking industry and the ideologies…
Ranking Scholarly Publishers in Political Science: An Alternative Approach
Garand, James C.; Giles, Micheal W.
Previous research has documented how political scientists evaluate and rank scholarly journals, but the evaluation and ranking of scholarly book publishers has drawn less attention. In this article, we use data from a survey of 603 American political scientists to generate a ranking of scholarly publishers in political science. We used open-ended…
An Improved Technique for Ranking Semantic Associations
S Narayana
The primary focus of search techniques in the first generation of the Web was accessing relevant documents. Although this satisfies many user requirements, it is insufficient when the user wishes to access actionable information involving complex relationships between two given entities. Finding such complex relationships (also known as semantic associations) is especially useful in applications such as national security, pharmacy, and business intelligence. The next frontier is therefore discovering relevant semantic associations between two entities present in large semantic metadata repositories. Given two entities, there exists a huge number of semantic associations between them, so the associations must be ranked in order to find the more relevant ones. For this purpose, Aleman Meza et al. proposed a method involving six metrics, viz. context, subsumption, rarity, popularity, association length, and trust. To compute the overall rank of the associations, this method computes context, subsumption, rarity, and popularity values for each component of the association and for all the associations. However, many components appear repeatedly in many associations, so it is not necessary to compute the context, subsumption, rarity, popularity, and trust values of the components every time for each association; the previously computed values may instead be reused when computing the overall rank of the associations. This paper proposes a method that reuses the previously computed values by means of a hash data structure, thus reducing the execution time. To demonstrate the effectiveness of the proposed method, experiments were conducted on the SWETO ontology. The results show that the proposed method is more efficient than the other existing methods.
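The reuse idea described in this abstract can be sketched with an ordinary hash map. The Python fragment below caches a per-component score in a dictionary so that a component shared by several associations is scored only once. The scoring function, the component names, and the averaging rule are placeholders invented for illustration; they do not reproduce the six metrics or the ranking formula of the paper.

```python
def make_cached_scorer(score_component):
    """Wrap a scoring function so each component is scored at most once."""
    cache = {}

    def scorer(component):
        if component not in cache:  # hash lookup instead of recomputation
            cache[component] = score_component(component)
        return cache[component]

    return scorer, cache


calls = []  # records every expensive evaluation, for demonstration only


def expensive_score(component):
    """Placeholder for a costly metric such as rarity or popularity."""
    calls.append(component)
    return len(component) / 10.0


scorer, cache = make_cached_scorer(expensive_score)

# Hypothetical associations; most components are shared between the two paths.
associations = [
    ("personA", "worksFor", "orgX", "locatedIn", "cityY"),
    ("personB", "worksFor", "orgX", "locatedIn", "cityY"),
]


def association_rank(association):
    """Toy overall rank: the mean of the cached component scores."""
    return sum(scorer(c) for c in association) / len(association)


overall = [association_rank(a) for a in associations]
print("components scored:", len(calls), "of", sum(len(a) for a in associations))
```

The saving grows with the amount of overlap between associations, which is the situation the abstract describes.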
Robust Generalized Low Rank Approximations of Matrices.
Shi, Jiarong; Yang, Wei; Zheng, Xiuyun
In recent years, the intrinsic low rank structure of some datasets has been extensively exploited to reduce dimensionality, remove noise and complete the missing entries. As a well-known technique for dimensionality reduction and data compression, Generalized Low Rank Approximations of Matrices (GLRAM) claims its superiority on computation time and compression ratio over the SVD. However, GLRAM is very sensitive to sparse large noise or outliers, and its robust version has not yet been explored or solved. To address this problem, this paper proposes a robust method for GLRAM, named Robust GLRAM (RGLRAM). We first formulate RGLRAM as an l1-norm optimization problem which minimizes the l1-norm of the approximation errors. Secondly, we apply the technique of Augmented Lagrange Multipliers (ALM) to solve this l1-norm minimization problem and derive a corresponding iterative scheme. Then the weak convergence of the proposed algorithm is discussed under mild conditions. Next, we investigate a special case of RGLRAM and extend RGLRAM to a general tensor case. Finally, the extensive experiments on synthetic data show that it is possible for RGLRAM to exactly recover both the low rank and the sparse components while it may be difficult for previous state-of-the-art algorithms. We also discuss three issues on RGLRAM: the sensitivity to initialization, the generalization ability and the relationship between the running time and the size/number of matrices. Moreover, the experimental results on images of faces with large corruptions illustrate that RGLRAM obtains better denoising and compression performance than other methods.
Prioritization and Ranking Problems Exporting Iranian Saffron
Reza Askandari
The main objective of this study is to prioritize and rank the problems of exporting Iranian saffron. Following a comprehensive review of the literature on related issues, data and information were collected using questionnaires with 816% reliability. The sample consists of 231 managers and sales and marketing staff of companies exporting Iranian saffron. The results show a positive and significant relationship between the dependent and independent variables, export performance and export barriers.
Tecer rises in the Capes ranking
José Aparecido
An even greater surprise was to find that we continue on the path of consolidation, rising in the ranking and reaching B3 in some fields, as can be seen on the Qualis Capes search portal http://qualis.capes.gov.br/webqualis/principal.seam, which shows our classification as follows: B3 Administration, Accounting and Tourism; B4 Applied Social Sciences I; B4 Education; B4 Interdisciplinary; B5 Law; B5 History; C Computer Science
Simple approach for ranking structure determining residues.
Luna-Martínez, Oscar D; Vidal-Limón, Abraham; Villalba-Velázquez, Miryam I; Sánchez-Alcalá, Rosalba; Garduño-Juárez, Ramón; Uversky, Vladimir N; Becerril, Baltazar
Mutating residues has been a common task in order to study structural properties of the protein of interest. Here, we propose and validate a simple method that allows the identification of structural determinants; i.e., residues essential for preservation of the stability of global structure, regardless of the protein topology. This method evaluates all of the residues in a 3D structure of a given globular protein by ranking them according to their connectivity and movement restrictions without topology constraints. Our results matched up with sequence-based predictors that look for intrinsically disordered segments, suggesting that protein disorder can also be described with the proposed methodology.
Dissipative homoclinic loops and rank one chaos
Wang, Qiudong; Ott, William
We prove that when subjected to periodic forcing of the form $p_{\mu, \rho, \omega}(t) = \mu (\rho h(x,y) + \sin(\omega t))$, certain second order systems of differential equations with dissipative homoclinic loops admit strange attractors with SRB measures for a set of forcing parameters $(\mu, \rho, \omega)$ of positive measure. Our proof applies the recent theory of rank one maps, developed by Wang and Young based on the analysis of strongly dissipative Hénon maps by Benedicks and Carleson.
Compressive Sensing via Nonlocal Smoothed Rank Function.
Fan, Ya-Ru; Huang, Ting-Zhu; Liu, Jun; Zhao, Xi-Le
Compressive sensing (CS) theory asserts that we can reconstruct signals and images with only a small number of samples or measurements. Recent works exploiting the nonlocal similarity have led to better results in various CS studies. To better exploit the nonlocal similarity, in this paper, we propose a non-convex smoothed rank function based model for CS image reconstruction. We also propose an efficient alternating minimization method to solve the proposed model, which reduces a difficult and coupled problem to two tractable subproblems. Experimental results have shown that the proposed method performs better than several existing state-of-the-art CS methods for image reconstruction.
Spiders for rank 2 Lie algebras
Kuperberg, G
1996-01-01
Do PageRank-based author rankings outperform simple citation counts?
Fiala, Dalibor; Žitnik, Slavko; Bajec, Marko
2015-01-01
Torres-Salinas, Daniel
2015-12-01
Qingzhai FAN; Xiaochun FANG
2009-01-01
The authors prove that the crossed product of an infinite dimensional simple separable unital C*-algebra with stable rank one by an action of a finite group with the tracial Rokhlin property has again stable rank one. It is also proved that the crossed product of an infinite dimensional simple separable unital C*-algebra with real rank zero by an action of a finite group with the tracial Rokhlin property has again real rank zero.
In 1975 John Tukey proposed a multivariate median which is the 'deepest' point in a given data cloud in R^d. Later, in measuring the depth of an arbitrary point z with respect to the data, David Donoho and Miriam Gasko considered hyperplanes through z and determined its 'depth' by the smallest portion of data that are separated by such a hyperplane. Since then, these ideas have proved extremely fruitful. A rich statistical methodology has developed that is based on data depth and, more general...
Sheffield, Scott
2009-01-01
Rank hypocrisies the insult of the REF
Sayer, Derek
2015-01-01
Rational and real positive semidefinite rank can be different
Gouveia, João; Fawzi, Hamza; Robinson, Richard Z.
Given a nonnegative matrix M with rational entries, we consider two quantities: the usual positive semidefinite (psd) rank, where the matrix is factored through the cone of real symmetric psd matrices, and the rational-restricted psd rank, where the matrix factors are required to be rational symmetric psd matrices. It is clear that the rational-restricted psd rank is always an upper bound to the usual psd rank. We show that this inequality may be strict by exhibiting a matrix with psd rank fo...
Generalization Performance of Regularized Ranking With Multiscale Kernels.
Zhou, Yicong; Chen, Hong; Lan, Rushi; Pan, Zhibin
The regularized kernel method for the ranking problem has attracted increasing attention in machine learning. Previous regularized ranking algorithms are usually based on reproducing kernel Hilbert spaces with a single kernel. In this paper, we go beyond this framework by investigating the generalization performance of regularized ranking with multiscale kernels. A novel ranking algorithm with multiscale kernels is proposed and its representer theorem is proved. We establish an upper bound on the generalization error in terms of the complexity of the hypothesis spaces. It shows that the multiscale ranking algorithm can achieve satisfactory learning rates under mild conditions. Experiments demonstrate the effectiveness of the proposed method for drug discovery and recommendation tasks.
Mallik, Saurav; Maulik, Ujjwal
2015-10-01
Robins, Vanessa; Turner, Katharine
2016-11-01
OPG/RANKL/RANK cytokine system in renal osteodystrophy
Ivica Avberšek-Lužnik
Background: Renal osteodystrophy is one of the most common complications affecting patients with end-stage renal disease treated with hemodialysis (HD). The action of calciotropic hormones in renal osteodystrophy is regulated by the OPG/RANKL/RANK system. Its function is modulated by interleukins, calcitriol and parathyroid hormone (PTH). The aim of our study was to confirm that this system is involved in the pathogenesis of renal osteodystrophy and supports the mechanism of PTH action on bone. Methods: 106 HD patients (mean age 60 years) and 50 healthy volunteers (mean age 64 years) were enrolled in the study. In serum samples of patients and controls we determined concentrations of OPG, RANKL, tartrate-resistant acid phosphatase 5b (TRAP 5b), serum C-terminal telopeptide cross-links of type I collagen (CTx), bone specific alkaline phosphatase (BALP), osteocalcin (OC) and parathyroid hormone (PTH). We compared serum measurements of HD patients and controls and assessed the correlation of OPG and RANKL with bone markers. The most frequent OPG promoter gene polymorphisms were also determined. SPSS 12.1 for Windows was used for statistical analysis. Results: Median OPG concentrations were approximately three times higher in HD patients (0.804 µg/l) than in healthy volunteers (0.272 µg/l). Mean serum RANKL concentrations were 1.66-fold higher in HD patients (1.36 pmol/l) than in controls (0.82 pmol/l). Serum RANKL levels differed significantly between patients with and without calcitriol therapy (p = 0.001). After dividing HD patients into tertiles according to PTH, we observed significantly higher OPG values in the lower tertile and significantly higher RANKL values in the upper tertile (p < 0.001). OPG did not correlate with bone resorption markers. Only weak correlation of bone formation markers with osteocalcin was noted. In contrast to OPG, RANKL correlated well with PTH, OC and CTx. OPG promoter gene polymorphisms (149 T → C, 163 A → G, 950 T → C) do not influence OPG expression and
Reynaud-Bouret, Patricia; Laurent, Béatrice
Considering two independent Poisson processes, we address the question of testing equality of their respective intensities. We construct multiple testing procedures from the aggregation of single tests whose testing statistics come from model selection, thresholding and/or kernel estimation methods. The corresponding critical values are computed through a non-asymptotic wild bootstrap approach. The obtained tests are proved to be exactly of level $\alpha$, and to satisfy non-asymptotic oracle type inequalities. From these oracle type inequalities, we deduce that our tests are adaptive in the minimax sense over a large variety of classes of alternatives based on classical and weak Besov bodies in the univariate case, but also Sobolev and anisotropic Nikol'skii-Besov balls in the multivariate case. A simulation study furthermore shows that they perform strongly in practice.
Fitting Ranked Linguistic Data with Two-Parameter Functions
Wentian Li
2010-07-01
Ranking de universidades chilenas: un análisis multivariado
Firinguetti Limone, Luis
2015-06-01
EU Country Rankings' Sensitivity to the Choice of Welfare Indicators
Hussain, M. Azhar
2016-01-01
Tensor Rank and Stochastic Entanglement Catalysis for Multipartite Pure States
Chen, Lin; Duan, Runyao; Ji, Zhengfeng; Winter, Andreas
2010-01-01
Noma, Hisashi; Matsui, Shigeyuki
2013-05-20
Low-Rank Positive Semidefinite Matrix Recovery From Corrupted Rank-One Measurements
Li, Yuanxin; Sun, Yue; Chi, Yuejie
2017-01-01
Paine, Gregory Harold
1982-03-01
Re-ranking via User Feedback: Georgetown University at TREC 2015 DD Track
GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists
Steinfeld Israel
Background: Since the inception of the GO annotation project, a variety of tools have been developed that support exploring and searching the GO database. In particular, a variety of tools that perform GO enrichment analysis are currently available. Most of these tools require as input a target set of genes and a background set and seek enrichment in the target set compared to the background set. A few tools also exist that support analyzing ranked lists. The latter typically rely on simulations or on union-bound correction for assigning statistical significance to the results. Results: GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. This is particularly useful in many typical cases where genomic data may be naturally represented as a ranked list of genes (e.g. by level of expression or of differential expression). GOrilla employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, called mHG, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. This enables rigorous statistical analysis of thousands of genes and thousands of GO terms on the order of seconds. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms. Conclusion: GOrilla is an efficient GO analysis tool with unique features that make it a useful addition to the existing repertoire of GO enrichment tools. GOrilla's unique features and advantages over other threshold-free enrichment tools include rigorous statistics, fast running time and an effective graphical representation.
GOrilla is publicly available at: http://cbl-gorilla.cs.technion.ac.il
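To make the flexible-threshold idea more concrete, the Python sketch below computes a minimum-hypergeometric score for a ranked list. Each gene is marked 1 if it carries the GO term and 0 otherwise, a hypergeometric tail probability is evaluated at every cutoff, and the smallest one is kept. This is a simplified reading of the mHG idea written for illustration, not GOrilla's code. It returns only the raw score and does not compute the exact p-value that corrects for trying many cutoffs. The function names and the example labels are hypothetical.

```python
from math import comb


def hypergeom_tail(total, annotated, drawn, hits):
    """P(X >= hits) when `drawn` genes are sampled from `total` genes,
    of which `annotated` carry the term (hypergeometric distribution)."""
    upper = min(drawn, annotated)
    favorable = sum(
        comb(annotated, k) * comb(total - annotated, drawn - k)
        for k in range(hits, upper + 1)
    )
    return favorable / comb(total, drawn)


def mhg_score(labels):
    """Return (smallest tail probability, cutoff at which it occurs)."""
    total, annotated = len(labels), sum(labels)
    best, best_cutoff, hits = 1.0, 0, 0
    for cutoff, label in enumerate(labels, start=1):
        hits += label  # annotated genes among the top `cutoff` genes
        tail = hypergeom_tail(total, annotated, cutoff, hits)
        if tail < best:
            best, best_cutoff = tail, cutoff
    return best, best_cutoff


# Hypothetical ranked list: the three annotated genes sit at the very top.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
score, cutoff = mhg_score(labels)
print(score, cutoff)
```

Because the score is a minimum over many cutoffs, it is not itself a p-value; the abstract's point is that GOrilla accounts for this multiplicity exactly rather than by simulation.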
Classifying and ranking DMUs in interval DEA
GUO Jun-peng; WU Yu-hua; LI Wen-hua
During efficiency evaluation by DEA, the inputs and outputs of DMUs may be intervals because of insufficient information or measurement error. For this reason, interval DEA is proposed. To make the efficiency scores more discriminative, this paper builds an Interval Modified DEA (IMDEA) model based on MDEA. Furthermore, models for obtaining upper and lower bounds of the efficiency scores for each DMU are set up. Based on this, the DMUs are classified into three types. Next, a new order relation between intervals, which can express the DM's preference among the three types, is proposed. As a result, a full and more convincing ranking is made on all the DMUs. Finally, an example is given.
PSPACE Bounds for Rank-1 Modal Logics
Schröder, Lutz
For lack of general algorithmic methods that apply to wide classes of logics, establishing a complexity bound for a given modal logic is often a laborious task. The present work is a step towards a general theory of the complexity of modal logics. Our main result is that all rank-1 logics enjoy a shallow model property and thus are, under mild assumptions on the format of their axiomatisation, in PSPACE. This leads to a unified derivation of tight PSPACE-bounds for a number of logics including K, KD, coalition logic, graded modal logic, majority logic, and probabilistic modal logic. Our generic algorithm moreover finds tableau proofs that witness pleasant proof-theoretic properties including a weak subformula property. This generality is made possible by a coalgebraic semantics, which conveniently abstracts from the details of a given model class and thus allows covering a broad range of logics in a uniform way.
Ultrasonic ranking of toughness of tungsten carbide
Vary, A.; Hull, D. R.
The feasibility of using ultrasonic attenuation measurements to rank tungsten carbide alloys according to their fracture toughness was demonstrated. Six samples of cobalt-cemented tungsten carbide (WC-Co) were examined. These varied in cobalt content from approximately 2 to 16 weight percent. The toughness generally increased with increasing cobalt content. Toughness was first determined by the Palmqvist and short rod fracture toughness tests. Subsequently, ultrasonic attenuation measurements were correlated with both these mechanical test methods. It is shown that there is a strong increase in ultrasonic attenuation corresponding to increased toughness of the WC-Co alloys. A correlation between attenuation and toughness exists for a wide range of ultrasonic frequencies. However, the best correlation for the WC-Co alloys occurs when the attenuation coefficient measured in the vicinity of 100 megahertz is compared with toughness as determined by the Palmqvist technique.
Vexler, Albert; Yu, Jihnhee
In clinical trials examining the incidence of pneumonia it is a common practice to measure infection via both invasive and non-invasive procedures. In the context of a recently completed randomized trial comparing two treatments the invasive procedure was only utilized in certain scenarios due to the added risk involved, and given that the level of the non-invasive procedure surpassed a given threshold. Hence, what was observed was bivariate data with a pattern of missingness in the invasive variable dependent upon the value of the observed non-invasive observation within a given pair. In order to compare two treatments with bivariate observed data exhibiting this pattern of missingness we developed a semi-parametric methodology utilizing the density-based empirical likelihood approach in order to provide a non-parametric approximation to Neyman-Pearson-type test statistics. This novel empirical likelihood approach has both parametric and non-parametric components. The non-parametric component utilizes the observations for the non-missing cases, while the parametric component is utilized to tackle the case where observations are missing with respect to the invasive variable. The method is illustrated through its application to the actual data obtained in the pneumonia study and is shown to be an efficient and practical method. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Testing for Statistical Discrimination based on Gender
Lesner, Rune Vammen
Query Specific Rank Fusion for Image Retrieval.
Zhang, Shaoting; Yang, Ming; Cour, Timothee; Yu, Kai; Metaxas, Dimitris N
Recently two lines of image retrieval algorithms demonstrate excellent scalability: 1) local features indexed by a vocabulary tree, and 2) holistic features indexed by compact hashing codes. Although both of them are able to search visually similar images effectively, their retrieval precision may vary dramatically among queries. Therefore, combining these two types of methods is expected to further enhance the retrieval precision. However, the feature characteristics and the algorithmic procedures of these methods are dramatically different, which is very challenging for the feature-level fusion. This motivates us to investigate how to fuse the ordered retrieval sets, i.e., the ranks of images, given by multiple retrieval methods, to boost the retrieval precision without sacrificing their scalability. In this paper, we model retrieval ranks as graphs of candidate images and propose a graph-based query specific fusion approach, where multiple graphs are merged and reranked by conducting a link analysis on a fused graph. The retrieval quality of an individual method is measured on-the-fly by assessing the consistency of the top candidates' nearest neighborhoods. Hence, it is capable of adaptively integrating the strengths of the retrieval methods using local or holistic features for different query images. This proposed method does not need any supervision, has few parameters, and is easy to implement. Extensive and thorough experiments have been conducted on four public datasets, i.e., the UKbench, Corel-5K, Holidays and the large-scale San Francisco Landmarks datasets. Our proposed method has achieved very competitive performance, including state-of-the-art results on several data sets, e.g., the N-S score 3.83 for UKbench.
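The idea of merging per-method graphs and reranking by link analysis can be illustrated with a small sketch. It is not the authors' algorithm: the edge weights, the summation rule in `fuse_graphs`, and the use of personalized PageRank as the link analysis are all assumptions made for illustration, and the image names are hypothetical.

```python
def fuse_graphs(graphs):
    """Merge weighted graphs by summing the weights of shared edges (assumed rule)."""
    fused = {}
    for graph in graphs:
        for u, nbrs in graph.items():
            row = fused.setdefault(u, {})
            for v, w in nbrs.items():
                row[v] = row.get(v, 0.0) + w
    return fused


def personalized_pagerank(graph, query, damping=0.85, iters=100):
    """Link analysis on the fused graph: a random walk that restarts at the query."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    score = {n: 1.0 if n == query else 0.0 for n in nodes}
    for _ in range(iters):
        # Restart mass always goes back to the query image.
        nxt = {n: (1.0 - damping) if n == query else 0.0 for n in nodes}
        for u in nodes:
            nbrs = graph.get(u, {})
            total = sum(nbrs.values())
            if total == 0:
                # Dangling node: return its mass to the query.
                nxt[query] += damping * score[u]
                continue
            for v, w in nbrs.items():
                nxt[v] += damping * score[u] * w / total
        score = nxt
    return score


# Hypothetical candidate graphs from two retrieval methods for query "q".
local_graph = {"q": {"a": 1.0, "b": 1.0}, "a": {"q": 1.0}, "b": {"q": 1.0}}
holistic_graph = {"q": {"a": 1.0, "c": 1.0}, "a": {"q": 1.0}, "c": {"q": 1.0}}

fused = fuse_graphs([local_graph, holistic_graph])
scores = personalized_pagerank(fused, "q")
reranked = sorted((n for n in scores if n != "q"), key=scores.get, reverse=True)
```

In this toy input, image `a` is retrieved by both methods, so its fused edge to the query is heavier and it is placed first in `reranked`.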
Engineering Students Designing a Statistical Procedure for Quantifying Variability
Hjalmarson, Margret A.
2007-01-01
Lesieur, Thibault; Krzakala, Florent; Zdeborová, Lenka
This article is an extended version of previous work of Lesieur et al (2015 IEEE Int. Symp. on Information Theory Proc. pp 1635-9 and 2015 53rd Annual Allerton Conf. on Communication, Control and Computing (IEEE) pp 680-7) on low-rank matrix estimation in the presence of constraints on the factors into which the matrix is factorized. Low-rank matrix factorization is one of the basic methods used in data analysis for unsupervised learning of relevant features and other types of dimensionality reduction. We present a framework to study the constrained low-rank matrix estimation for a general prior on the factors, and a general output channel through which the matrix is observed. We draw a parallel with the study of vector-spin glass models, presenting a unifying way to study a number of problems considered previously in separate statistical physics works. We present a number of applications for the problem in data analysis. We derive in detail a general form of the low-rank approximate message passing (Low-RAMP) algorithm, which is known in statistical physics as the TAP equations. We thus unify the derivation of the TAP equations for models as different as the Sherrington-Kirkpatrick model, the restricted Boltzmann machine, the Hopfield model or vector (xy, Heisenberg and other) spin glasses. The state evolution of the Low-RAMP algorithm is also derived, and is equivalent to the replica symmetric solution for the large class of vector-spin glass models. In the section devoted to results we study in detail phase diagrams and phase transitions for the Bayes-optimal inference in low-rank matrix estimation. We present a typology of phase transitions and their relation to performance of algorithms such as the Low-RAMP or commonly used spectral methods.
Equivalent statistics and data interpretation.
2016-10-14
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
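One piece of the equivalence described above can be checked numerically: for a pooled-variance two-sample t test, Cohen's d and the t statistic determine each other once the sample sizes are known, via d = t·sqrt(1/n1 + 1/n2). The sketch below assumes the pooled-variance (Student) form of the test; the function names and the data are made up for illustration.

```python
import math
from statistics import mean, variance


def pooled_sd(x, y):
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(x), len(y)
    return math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def t_statistic(x, y):
    """Student's two-sample t statistic (equal-variance form)."""
    n1, n2 = len(x), len(y)
    return (mean(x) - mean(y)) / (pooled_sd(x, y) * math.sqrt(1 / n1 + 1 / n2))


def cohens_d(x, y):
    """Cohen's d computed directly from the data."""
    return (mean(x) - mean(y)) / pooled_sd(x, y)


def d_from_t(t, n1, n2):
    """Recover Cohen's d from t and the sample sizes alone."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def t_from_d(d, n1, n2):
    """Inverse mapping: recover t from Cohen's d and the sample sizes."""
    return d / math.sqrt(1 / n1 + 1 / n2)


# Made-up illustrative samples.
x = [5.1, 4.9, 6.2, 5.8, 5.5]
y = [4.2, 4.8, 5.0, 4.4]

t = t_statistic(x, y)
d = cohens_d(x, y)
print(math.isclose(d_from_t(t, len(x), len(y)), d))  # the two routes to d agree
```

Because the mapping is one-to-one for fixed sample sizes, reporting either t or d carries the same information; the p value and other statistics mentioned in the abstract would need a t-distribution routine that is outside this sketch.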
Applying weighted PageRank to author citation networks
Ding, Ying
This paper aims to identify whether different weighted PageRank algorithms can be applied to author citation networks to measure the popularity and prestige of a scholar from a citation perspective. Information Retrieval (IR) was selected as a test field and data from 1956-2008 were collected from Web of Science (WOS). Weighted PageRank with citation and publication as weighted vectors were calculated on author citation networks. The results indicate that both popularity rank and prestige rank were highly correlated with the weighted PageRank. Principal Component Analysis (PCA) was conducted to detect relationships among these different measures. For capturing prize winners within the IR field, prestige rank outperformed all the other measures.
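As a rough illustration of a weighted PageRank on an author citation network, the sketch below replaces the uniform teleportation term of ordinary PageRank with a vector proportional to a per-author weight such as citation or publication counts. That reading of "weighted vectors" is an assumption, as are the damping factor of 0.85, the function name, and the toy network.

```python
def weighted_pagerank(links, weights, damping=0.85, iters=100):
    """Power iteration for PageRank with a weighted teleportation vector.

    links:   author -> {cited author: number of citations}; every cited
             author must also appear as a key of `weights`.
    weights: author -> non-negative weight (e.g. citation or publication count).
    """
    nodes = list(weights)
    total_weight = sum(weights.values())
    teleport = {n: weights[n] / total_weight for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for u in nodes:
            out = links.get(u, {})
            out_total = sum(out.values())
            if out_total == 0:
                # Author cites nobody: spread the mass by the teleport vector.
                for n in nodes:
                    nxt[n] += damping * rank[u] * teleport[n]
            else:
                for v, count in out.items():
                    nxt[v] += damping * rank[u] * count / out_total
        rank = nxt
    return rank


# Toy network: A and B cite each other once; A carries three times B's weight.
toy_links = {"A": {"B": 1}, "B": {"A": 1}}
toy_weights = {"A": 3.0, "B": 1.0}
toy_rank = weighted_pagerank(toy_links, toy_weights)
```

With a symmetric citation structure, the only asymmetry comes from the weight vector, so `A` ends up with the higher score; with equal weights the two authors tie.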
Social Rank, Stress, Fitness, and Life Expectancy in Wild Rabbits
von Holst, Dietrich; Hutzelmeyer, Hans; Kaetzke, Paul; Khaschei, Martin; Schönheiter, Ronald
Bayesian Thurstonian models for ranking data using JAGS.
Johnson, Timothy R; Kuhn, Kristine M
2013-09-01
A Thurstonian model for ranking data assumes that observed rankings are consistent with those of a set of underlying continuous variables. This model is appealing since it renders ranking data amenable to familiar models for continuous response variables-namely, linear regression models. To date, however, the use of Thurstonian models for ranking data has been very rare in practice. One reason for this may be that inferences based on these models require specialized technical methods. These methods have been developed to address computational challenges involved in these models but are not easy to implement without considerable technical expertise and are not widely available in software packages. To address this limitation, we show that Bayesian Thurstonian models for ranking data can be very easily implemented with the JAGS software package. We provide JAGS model files for Thurstonian ranking models for general use, discuss their implementation, and illustrate their use in analyses.
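The generative side of a Thurstonian model can be sketched in a few lines: each item gets a latent continuous value, and the observed ranking is simply the ordering of those values. The sketch below only simulates rankings from independent normal latent variables with a common standard deviation; it is not the JAGS implementation described above and does no Bayesian inference. The item names, means, and function names are hypothetical.

```python
import random


def sample_ranking(means, sd=1.0, rng=random):
    """Draw latent utilities z_j ~ Normal(mean_j, sd) and order items best to worst."""
    latent = {item: rng.gauss(mu, sd) for item, mu in means.items()}
    return sorted(latent, key=latent.get, reverse=True)


def first_place_shares(means, n_draws=10000, seed=0):
    """Estimate how often each item is ranked first under the model."""
    rng = random.Random(seed)  # seeded for reproducibility
    counts = {item: 0 for item in means}
    for _ in range(n_draws):
        counts[sample_ranking(means, rng=rng)[0]] += 1
    return {item: c / n_draws for item, c in counts.items()}


# Hypothetical latent means for three items.
item_means = {"A": 1.0, "B": 0.0, "C": -1.0}
shares = first_place_shares(item_means)
```

Items with larger latent means are ranked first more often, which is the link between the continuous-variable model and the observed ranking data.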