George C. McBane
2006-05-01
A set of FORTRAN subprograms is presented to compute density and cumulative distribution functions and critical values for the range ratio statistics of Dixon (1951, The Annals of Mathematical Statistics). These statistics are useful for the detection of outliers in small samples.
Outlier detection using autoencoders
Lyudchik, Olga
2016-01-01
Outlier detection is a crucial part of many data analysis applications. The goal of outlier detection is to separate a core of regular observations from polluting ones, called "outliers". We propose an outlier detection method using a deep autoencoder. In our research the proposed method was applied to detect outlier points in the MNIST dataset of handwritten digits. The experimental results show that the proposed method has the potential to be used for anomaly detection.
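The reconstruction-error idea behind autoencoder outlier detection can be sketched as follows. This is a hypothetical minimal example, not the paper's code: it uses NumPy, substitutes synthetic low-dimensional data for MNIST, and replaces the deep autoencoder with its closed-form linear analogue (a PCA bottleneck), since a linear autoencoder trained to minimize reconstruction error recovers the principal subspace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Inlier data lying near a 2-D plane inside a 10-D space, plus a few
# off-plane outliers (a synthetic stand-in for the MNIST digits).
basis = rng.normal(size=(2, 10))
X_in = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 10))
X_out = rng.normal(size=(5, 10)) * 3.0
X = np.vstack([X_in, X_out])

# A linear autoencoder with a 2-unit bottleneck is equivalent to PCA,
# so we "train" it in closed form via the SVD of the centered data.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
encode = Vt[:2]                      # encoder weights (bottleneck = 2)
Z = (X - mu) @ encode.T              # latent codes
X_rec = Z @ encode + mu              # decoder reconstruction

# Outlier score = reconstruction error; poorly modeled points score high.
scores = np.linalg.norm(X - X_rec, axis=1)
flags = scores > scores.mean() + 3 * scores.std()
print(np.where(flags)[0])
```

A trained deep autoencoder is used the same way: the encoder/decoder become learned nonlinear maps, and points the model cannot reconstruct well receive the highest outlier scores.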
Local Outlier Detection with Interpretation
Dang, Xuan-Hong; Micenková, Barbora; Assent, Ira
2013-01-01
Outlier detection aims at searching for a small set of objects that are inconsistent or considerably deviating from other objects in a dataset. Existing research focuses on outlier identification while omitting the equally important problem of outlier interpretation. This paper presents a novel...... that this learning task can be solved via the matrix eigen-decomposition and its solution contains essential information to reveal features that are most important to interpret the exceptional properties of outliers. We demonstrate the appealing performance of LODI via a number of synthetic and real world datasets...
Multivariate Voronoi Outlier Detection for Time Series.
Zwilling, Chris E; Wang, Michelle Yongmei
2014-10-01
Outlier detection is a primary step in many data mining and analysis applications, including healthcare and medical research. This paper presents a general method to identify outliers in multivariate time series based on a Voronoi diagram, which we call Multivariate Voronoi Outlier Detection (MVOD). The approach copes with outliers in a multivariate framework, via designing and extracting effective attributes or features from the data that can take parametric or nonparametric forms. Voronoi diagrams allow for automatic configuration of the neighborhood relationship of the data points, which facilitates the differentiation of outliers and non-outliers. Experimental evaluation demonstrates that our MVOD is an accurate, sensitive, and robust method for detecting outliers in multivariate time series data.
An enhanced Monte Carlo outlier detection method.
Zhang, Liangxiao; Li, Peiwu; Mao, Jin; Ma, Fei; Ding, Xiaoxia; Zhang, Qi
2015-09-30
Outlier detection is crucial in building a highly predictive model. In this study, we propose an enhanced Monte Carlo outlier detection method that establishes cross-prediction models based on determinate normal samples and analyzes the distribution of prediction errors individually for dubious samples. One simulated and three real datasets were used to illustrate and validate the performance of our method, and the results indicated that it outperformed Monte Carlo outlier detection in outlier diagnosis. After the outliers were removed, in the validation by Kovats retention indices the root mean square error of prediction decreased from 3.195 to 1.655, and the average cross-validation prediction error decreased from 2.0341 to 1.2780. This method helps establish a good model by eliminating outliers. © 2015 Wiley Periodicals, Inc.
Outlier detection in surveying networks
Preweda, Edward
2014-01-01
The paper concerns robust estimation methods, which allow outliers in surveying networks to be eliminated. Network adjustment is performed by the method of least squares. A key problem is the correct selection of weights, resulting from the different standard deviations of the observations. In the case of gross errors, their impact on the adjustment results can be minimized by reducing the weights of the outlying observations. The second solution is the elimination of such observations as th...
A Modified Approach for Detection of Outliers
Iftikhar Hussain Adil
2015-04-01
Tukey's boxplot is a very popular tool for the detection of outliers. It reveals the location, spread, and skewness of the data. It works well for detecting outliers when the data are symmetric. When the data are skewed, it draws the boundary away from the whisker on the compressed side while declaring erroneous outliers on the extended side of the distribution. Hubert and Vandervieren (2008) adjusted Tukey's technique to overcome this problem. However, another problem arises: the adjusted boxplot constructs an interval of critical values that can even exceed the extremes of the data, in which case the adjusted boxplot is unable to detect outliers. This paper gives a solution to this problem, and the proposed approach detects outliers properly. The validity of the technique has been checked by constructing fences around the true 95% values of different distributions. Simulation was applied by drawing different sample sizes from chi-square, beta, and lognormal distributions. Fences constructed by the modified technique are closer to the true 95% values than those of the adjusted boxplot, which demonstrates its superiority over the existing technique.
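A sketch of the two fence constructions being compared, assuming NumPy; the medcouple implementation is a naive O(n²) version that ignores the tie-breaking kernel at the median, and the adjustment constants (-4, 3) are those of Hubert and Vandervieren (2008):

```python
import numpy as np

def medcouple(x):
    """Naive O(n^2) medcouple, a robust skewness measure in [-1, 1].
    (Ties at the median need a special kernel; ignored here for brevity.)"""
    x = np.sort(np.asarray(x, dtype=float))
    m = np.median(x)
    lo, hi = x[x <= m], x[x >= m]
    h = [((xj - m) - (m - xi)) / (xj - xi) for xi in lo for xj in hi if xj > xi]
    return float(np.median(h))

def tukey_fences(x):
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def adjusted_fences(x):
    """Skewness-adjusted fences of Hubert & Vandervieren (2008)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mc = medcouple(x)
    if mc >= 0:
        return q1 - 1.5 * np.exp(-4 * mc) * iqr, q3 + 1.5 * np.exp(3 * mc) * iqr
    return q1 - 1.5 * np.exp(-3 * mc) * iqr, q3 + 1.5 * np.exp(4 * mc) * iqr

rng = np.random.default_rng(1)
data = rng.chisquare(df=3, size=500)   # right-skewed sample, no true outliers
lo_t, hi_t = tukey_fences(data)
lo_a, hi_a = adjusted_fences(data)
print(f"Tukey:    ({lo_t:.2f}, {hi_t:.2f})")
print(f"Adjusted: ({lo_a:.2f}, {hi_a:.2f})")
```

For right-skewed data the medcouple is positive, so both adjusted fences shift upward: the upper fence extends to avoid flagging legitimate right-tail values, and the lower fence tightens on the compressed side.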
[Stellar spectral outliers detection based on Isomap].
Bu, Yu-De; Pan, Jing-Chang; Chen, Fu-Qiang
2014-01-01
How to find the spectra misclassified by traditional methods is a key problem that has been widely studied by experts in astronomical data processing. We found that the Isomap algorithm performs well on this problem. By comparing the performance of Isomap with that of principal component analysis (PCA), we found that (1) Isomap projects spectra with similar features close together and spectra with different features far apart, while PCA may project spectra with different features into nearby regions; and (2) the outliers given by Isomap can be easily determined, and most of them are binary stars of high scientific value, while the outliers given by PCA are difficult to determine and most are not binary stars. Thus, Isomap is more efficient than PCA at finding outliers. Since the spectral data used in the experiment are from the ninth data release of the Sloan Digital Sky Survey (SDSS DR9), Isomap can efficiently find the spectra misclassified by the SDSS pipeline and noticeably improve classification accuracy. Furthermore, since most of the spectra misclassified by the SDSS pipeline are binary stars, Isomap can improve the efficiency of finding binary stars of high scientific value. Although the experimental results show that Isomap is more sensitive to noise than PCA, this disadvantage does not affect the application of Isomap in spectral classification, since most spectra with low signal-to-noise ratios are spectra whose spectral type cannot be determined manually.
Outlier detection from ETL Execution trace
Goswami, Saptarsi; Chakrabarti, Amlan
2012-01-01
Extract, Transform, Load (ETL) is an integral part of Data Warehousing (DW) implementation. The commercial tools used for this purpose capture a large amount of execution trace in the form of various log files with a plethora of information. However, there has hardly been any initiative to proactively analyze ETL logs to improve their efficiency. In this paper we use outlier detection to find the processes that vary most from the group in terms of execution trace. As our experiment was carried out on actual production processes, we consider any outlier a signal rather than noise. To identify the input parameters for the outlier detection algorithm, we conducted a survey among a developer community with a varied mix of experience and expertise. We use simple text parsing to extract the features shortlisted in the survey from the logs. Subsequently, we applied a clustering-based outlier detection technique to the logs. By this process we reduced our domain of detailed analy...
Outlier Detection Using Nonconvex Penalized Regression
She, Yiyuan
2010-01-01
This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the $n$ data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual $L_1$ penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The $L_1$ penalty corresponds to soft thresholding. We introduce a thresholding (denoted by $\Theta$) based iterative procedure for outlier detection ($\Theta$-IPOD). A version based on hard thresholding correctly identifies outliers on some hard test problems. We find that $\Theta$-IPOD is much faster than iteratively reweighted least squares for large data because each iteration costs at most $O(np)$ (and sometimes much less), avoiding an $O(np^2)$ least squares estimate. We describe the connection between $\Theta$-IPOD and $M$-estimators. Our proposed method has one tuning parameter with which to both identify outliers and estimate regression...
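An alternating form of the $\Theta$-IPOD iteration can be sketched as follows. This is a hypothetical NumPy illustration with hard thresholding, not the authors' implementation; note that refitting β on y − γ and then thresholding the new residuals reproduces the fixed-point update γ ← Θ((I − H)y + Hγ; λ), where H is the hat matrix.

```python
import numpy as np

def hard_threshold(r, lam):
    # Θ(r; λ): keep entries with |r| > λ, zero the rest (hard thresholding)
    return np.where(np.abs(r) > lam, r, 0.0)

def theta_ipod(X, y, lam, n_iter=50):
    """Sketch of a Θ-IPOD-style iteration: alternate a least-squares fit
    with thresholding of residuals into a sparse mean-shift vector gamma."""
    gamma = np.zeros_like(y)
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X, y - gamma, rcond=None)
        gamma = hard_threshold(y - X @ beta, lam)
    return beta, gamma

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)
y[:5] += 8.0                           # inject 5 gross outliers

beta, gamma = theta_ipod(X, y, lam=1.0)
print("flagged:", np.nonzero(gamma)[0])
```

The nonzero entries of gamma mark the flagged observations, while beta is estimated from the effectively cleaned data; in practice λ is the tuning parameter the abstract refers to.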
The weirdest SDSS galaxies: results from an outlier detection algorithm
Baron, Dalya
2016-01-01
How can we discover objects we did not know existed within the large datasets that now abound in astronomy? We present an outlier detection algorithm that we developed, based on an unsupervised Random Forest. We test the algorithm on more than two million galaxy spectra from the Sloan Digital Sky Survey and examine the 400 galaxies with the highest outlier score. We find objects which have extreme emission line ratios and abnormally strong absorption lines, objects with unusual continua, including extremely reddened galaxies. We find galaxy-galaxy gravitational lenses, double-peaked emission line galaxies, and close galaxy pairs. We find galaxies with high ionisation lines, galaxies which host supernovae, and galaxies with unusual gas kinematics. Only a fraction of the outliers we find were reported by previous studies that used specific and tailored algorithms to find a single class of unusual objects. Our algorithm is general and detects all of these classes, and many more, regardless of what makes them pec...
Outlier Detection in Structural Time Series Models
Marczak, Martyna; Proietti, Tommaso
Structural change affects the estimation of economic signals, like the underlying growth rate or the seasonally adjusted series. An important issue, which has attracted a great deal of attention also in the seasonal adjustment literature, is its detection by an expert procedure. The general... We investigate via Monte Carlo simulations how this approach performs for detecting additive outliers and level shifts in the analysis of nonstationary seasonal time series. The reference model is the basic structural model, featuring a local linear trend, possibly integrated of order two, stochastic seasonality, and a stationary component. Further, we apply both kinds of indicator saturation to detect additive outliers and level shifts in the industrial production series in five European countries.
Comparative Study of Various Techniques on Outlier Detection
K, Varma Mamta and Rajesh Nigam
2011-08-01
Diverse patterns from web data, commonly referred to as web outliers, exceptional cases, or noise, exist in many real-world databases. Outliers are data objects with characteristics different from those of other data objects. A formal definition of outliers is given by D. Hawkins: "An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism". Detection of such outliers (outlier mining) is important for numerous applications, such as detecting criminal activities in e-commerce, video surveillance, weather prediction, intrusion detection, and pharmaceutical research. This paper focuses on a comparative study of various outlier detection techniques.
Outlier detection in the GARCH (1,1) model
Ph.H.B.F. Franses (Philip Hans); D.J.C. van Dijk (Dick)
1999-01-01
In this paper the issue of detecting and handling outliers in the GARCH(1,1) model is addressed. Simulation evidence shows that neglecting even a single outlier has a dramatic effect on parameter estimates. To detect and correct for outliers, we propose an adaptation of the iterative procedure in Chen an...
Detection Procedure for a Single Additive Outlier and Innovational Outlier in a Bilinear Model
Azami Zaharim
2007-01-01
A single-outlier detection procedure for data generated from BL(1,1,1,1) models is developed. It is carried out in three stages. First, the measures of impact of an IO and an AO, denoted ω_IO and ω_AO respectively, are derived based on the least squares method. Second, test statistics and test criteria are defined for classifying an observation as an outlier of the respective type. Finally, a general single-outlier detection procedure is presented to distinguish a particular type of outlier at a time point t.
Outlier Detection Techniques For Wireless Sensor Networks: A Survey
Zhang, Y.; Meratnia, Nirvana; Havinga, Paul J.M.
2008-01-01
In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are...
Outlier Edge Detection Using Random Graph Generation Models and Applications
Zhang, Honglei; Gabbouj, Moncef
2016-01-01
Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has merely focused on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information to detect outlier edges. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experiment results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Prefe...
Statistical Outlier Detection for Jury Based Grading Systems
Thompson, Mary Kathryn; Clemmensen, Line Katrine Harder; Rosas, Harvey
2013-01-01
This paper presents an algorithm that was developed to identify statistical outliers from the scores of grading jury members in a large project-based first year design course. The background and requirements for the outlier detection system are presented. The outlier detection algorithm...... and the follow-up procedures for score validation and appeals are described in detail. Finally, the impact of various elements of the outlier detection algorithm, their interactions, and the sensitivity of their numerical values are investigated. It is shown that the difference in the mean score produced...... by a grading jury before and after a suspected outlier is removed from the mean is the single most effective criterion for identifying potential outliers but that all of the criteria included in the algorithm have an effect on the outlier detection process....
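The leave-one-out mean-shift criterion described above can be sketched in a few lines (a hypothetical NumPy example with made-up jury scores; the course's actual thresholds and follow-up rules are not reproduced):

```python
import numpy as np

def leave_one_out_shift(scores):
    """For each juror, |mean with the juror - mean without the juror|:
    the single most effective criterion reported in the abstract."""
    scores = np.asarray(scores, dtype=float)
    total, n = scores.sum(), scores.size
    loo_means = (total - scores) / (n - 1)   # jury mean with each juror removed
    return np.abs(loo_means - total / n)

jury = [7.5, 8.0, 7.0, 7.8, 2.0]   # hypothetical scores; juror 4 deviates
shift = leave_one_out_shift(jury)
suspect = int(np.argmax(shift))
print(suspect, shift.round(3))
```

Because removing score x_i shifts the mean by |x_i − mean| / (n − 1), the juror whose removal moves the jury mean the most is the natural first candidate for the validation and appeals procedures described in the paper.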
Detection of Outliers methods in medical studies
Babaee Gh
2007-09-01
Background: An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. Outliers sometimes reflect abnormalities in results obtained from collected data and information. Recognizing outlier data is important for researchers, physicians, and others working in the medical sciences: before drawing conclusions they must check the data for outliers, assess their effect on information bias, and control or remove them to obtain exact data with minimum bias. In this paper we try, using known techniques and tests, to control outliers and minimize the errors related to them. Methods: This study was carried out on the heights of 30 students at Tarbiat Modares University, measured in meters. We applied methods such as the Z-test, Grubbs' test, and graphical methods to determine outliers, evaluated the advantages and disadvantages of each method, and finally compared them with one another. Results: The tests showed that the values 153 and 110 among the collected data were outliers, and all of the methods agreed. Calculation of the quartiles and interquartile fences showed that observations below 125 or above 141 were mild outliers, and observations below 119 or above 147 were severe outliers. Accordingly, the values 110 and 153 are severe outliers, as confirmed by all methods. Conclusion: The results showed that all the methods were useful for determining outlier data; among them, the quartile method was important for distinguishing severe and mild outliers. Also, Grubbs' test with a p-value is very useful for reporting outliers.
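The quartile-based classification into mild and severe outliers can be sketched as follows (a hypothetical NumPy example with invented heights; the fences use the common 1.5·IQR and 3·IQR multipliers, not the exact cutoffs reported in the study):

```python
import numpy as np

def classify_outliers(x):
    """Quartile fences: mild outliers fall outside Q1/Q3 ± 1.5·IQR,
    severe outliers outside Q1/Q3 ± 3·IQR (common textbook multipliers)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    mild = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)
    severe = (x < q1 - 3.0 * iqr) | (x > q3 + 3.0 * iqr)
    return mild & ~severe, severe

# Hypothetical heights (cm); 110 and 153 are planted extremes.
heights = np.array([130, 131, 132, 133, 133, 134, 134, 135,
                    135, 136, 136, 137, 138, 139, 110, 153])
mild, severe = classify_outliers(heights)
print("mild:", heights[mild], "severe:", heights[severe])
```

With this sample both planted extremes fall outside the 3·IQR fences and are classified as severe, mirroring the study's distinction between mild and severe outliers.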
Query-Based Outlier Detection in Heterogeneous Information Networks
Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei
2015-01-01
Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397
Outlier Detection Method in Linear Regression Based on Sum of Arithmetic Progression
Adikaram, K. K. L. B.; Hussein, M. A.; Effenberger, M.; Becker, T.
2014-01-01
We introduce a new nonparametric outlier detection method for linear series, which requires no imputation of missing or removed data. For an arithmetic progression (a series without outliers) with n elements, the ratio (R) of the sum of the minimum and maximum elements to the sum of all elements is always 2/n, where 2/n ∈ (0,1]. R ≠ 2/n always implies the existence of outliers: R < 2/n implies that the minimum is an outlier, and R > 2/n implies that the maximum is an outlier. Based upon this, we derived a new method for identifying significant and nonsignificant outliers separately. Two different techniques were used to manage missing data and removed outliers: (1) recalculate the terms after (or before) the removed or missing element while maintaining the initial angle in relation to a certain point, or (2) transform the data into a constant value that is not affected by missing or removed elements. With a reference element that was not an outlier, the method detected all outliers in data sets with 6 to 1000 elements containing 50% outliers deviating by a factor of ±1.0e−2 to ±1.0e+2 from the correct value. PMID:25121139
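The core identity is easy to verify numerically: for an arithmetic progression the sum is n(min + max)/2, so (min + max)/sum = 2/n exactly. A minimal NumPy sketch (illustrative only; the full method with significance grading and missing-data handling is not reproduced):

```python
import numpy as np

def range_sum_ratio(x):
    """R = (min + max) / sum; equals 2/n exactly for an arithmetic progression."""
    x = np.asarray(x, dtype=float)
    return (x.min() + x.max()) / x.sum()

ap = np.arange(5.0, 55.0, 5.0)       # 5, 10, ..., 50: arithmetic, n = 10
print(range_sum_ratio(ap), 2 / len(ap))   # both equal 0.2

bad = ap.copy()
bad[-1] = 500.0                      # inflate the maximum
print(range_sum_ratio(bad))          # R > 2/n: the maximum is an outlier
```

Inflating the maximum adds the same amount to numerator and denominator, and since min + max < sum for n > 2 this pushes R above 2/n; deflating the minimum pushes R below it.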
Penalized Weighted Least Squares for Outlier Detection and Robust Regression
Gao, Xiaoli; Fang, Yixin
2016-01-01
To conduct regression analysis for data contaminated with outliers, many approaches have been proposed for simultaneous outlier detection and robust regression, including the approach proposed in this manuscript, called "penalized weighted least squares" (PWLS). By assigning each observation an individual weight and incorporating a lasso-type penalty on the log-transformation of the weight vector, PWLS is able to perform outlier detection and robust regression simultaneou...
Using Person Fit Statistics to Detect Outliers in Survey Research.
Felt, John M; Castaneda, Ruben; Tiemensma, Jitske; Depaoli, Sarah
2017-01-01
Context: When working with health-related questionnaires, outlier detection is important. However, traditional methods of outlier detection (e.g., boxplots) can miss participants with "atypical" responses to the questions who otherwise have similar total (subscale) scores. In addition to detecting outliers, it can be of clinical importance to determine the reason for the outlier status or "atypical" response. Objective: The aim of the current study was to illustrate how to derive person fit statistics for outlier detection through a statistical method examining person fit with a health-based questionnaire. Design and Participants: Patients treated for Cushing's syndrome (n = 394) were recruited from the Cushing's Support and Research Foundation's (CSRF) listserv and Facebook page. Main Outcome Measure: Patients were directed to an online survey containing the CushingQoL (English version). A two-dimensional graded response model was estimated, and person fit statistics were generated using the Zh statistic. Results: Conventional outlier detection methods revealed no outliers reflecting extreme scores on the subscales of the CushingQoL. However, person fit statistics identified 18 patients with "atypical" response patterns (|Zh| > 2.00), which would otherwise have been missed. Conclusion: While the conventional methods of outlier detection indicated no outliers, person fit statistics identified several patients with "atypical" response patterns who otherwise appeared average. Person fit statistics allow researchers to delve further into the underlying problems experienced by these "atypical" patients treated for Cushing's syndrome. Annotated code is provided to aid other researchers in using this method.
Outlier Mining Based Abnormal Machine Detection in Intelligent Maintenance
ZHANG Lei; CAO Qi-xin; LEE Jay
2009-01-01
Assessing a machine's performance by comparison with the same or similar machines is important for implementing intelligent maintenance for machine swarms. In this paper, an outlier mining based abnormal machine detection algorithm is proposed for this purpose. First, outlier mining based on clustering is introduced and the definition of the cluster-based global outlier factor (CBGOF) is presented. Then the modified swarm intelligence clustering (MSIC) algorithm is suggested and an outlier mining algorithm based on MSIC is proposed. The algorithm can not only cluster machines according to their performance but also detect possible abnormal machines. Finally, a comparison of mobile soccer robots' performance shows that the algorithm is feasible and effective.
The weirdest SDSS galaxies: results from an outlier detection algorithm
Baron, Dalya; Poznanski, Dovi
2017-03-01
How can we discover objects we did not know existed within the large data sets that now abound in astronomy? We present an outlier detection algorithm that we developed, based on an unsupervised Random Forest. We test the algorithm on more than two million galaxy spectra from the Sloan Digital Sky Survey and examine the 400 galaxies with the highest outlier score. We find objects which have extreme emission line ratios and abnormally strong absorption lines, objects with unusual continua, including extremely reddened galaxies. We find galaxy-galaxy gravitational lenses, double-peaked emission line galaxies and close galaxy pairs. We find galaxies with high ionization lines, galaxies that host supernovae and galaxies with unusual gas kinematics. Only a fraction of the outliers we find were reported by previous studies that used specific and tailored algorithms to find a single class of unusual objects. Our algorithm is general and detects all of these classes, and many more, regardless of what makes them peculiar. It can be executed on imaging, time series and other spectroscopic data, operates well with thousands of features, is not sensitive to missing values and is easily parallelizable.
Detection of Outliers in Spatial-Temporal Data
Rogers, James P.
2010-01-01
Outlier detection is an important data mining task that is focused on the discovery of objects that deviate significantly when compared with a set of observations that are considered typical. Outlier detection can reveal objects that behave anomalously with respect to other observations, and these objects may highlight current or future problems. …
Stratification-Based Outlier Detection over the Deep Web.
Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming
2016-01-01
For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web.
Outliers detection in multivariate time series by independent component analysis.
Baragona, Roberto; Battaglia, Francesco
2007-07-01
In multivariate time series, outlying data that do not fit the common pattern may often be observed. Occurrences of outliers are unpredictable events that may severely distort the analysis of the multivariate time series. For instance, model building, seasonality assessment, and forecasting may be seriously affected by undetected outliers. The dependence structure of the multivariate time series gives rise to the well-known smearing and masking phenomena that prevent the use of most outlier identification techniques. It may be noticed, however, that a convenient way of representing multiple outliers consists of superimposing a deterministic disturbance on a Gaussian multivariate time series. Outliers may then be modeled as non-Gaussian time series components. Independent component analysis is a recently developed tool that is likely to be able to extract possible outlier patterns. In practice, independent component analysis may be used to analyze multivariate observable time series and separate regular and outlying unobservable components. In the factor models framework too, it is shown that independent component analysis is a useful tool for the detection of outliers in multivariate time series. Some algorithms that perform independent component analysis are compared. It has been found that all algorithms are effective in detecting various types of outliers, such as patches, level shifts, and isolated outliers, even at the beginning or the end of the stretch of observations. Also, there is no appreciable difference in the ability of different algorithms to display the outlying observations pattern.
Spatial Outlier Detection of CO2 Monitoring Data Based on Spatial Local Outlier Factor
Liu Xin
2015-12-01
The spatial local outlier factor (SLOF) algorithm was adopted in this study for spatial outlier detection because of the limitations of traditional static threshold detection. Based on the spatial characteristics of CO2 monitoring data obtained in a carbon capture and storage (CCS) project, a K-Nearest Neighbour (KNN) graph was constructed using the latitude and longitude of the monitoring points to identify their spatial neighbourhoods. SLOF was then used to calculate the outlier degree of each monitoring point, and the 3σ rule was employed to identify spatial outliers. Finally, the selection of the K value was analysed and the optimal one selected. The results show that, compared with the static threshold method, the proposed algorithm has higher detection precision. It overcomes the shortcomings of the static threshold method and improves the accuracy and diversity of local outlier detection, providing a reliable reference for the safety assessment and warning of CCS monitoring.
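The neighbourhood-density idea behind SLOF can be illustrated with a plain local outlier factor computation plus the 3σ rule (a hypothetical NumPy sketch; the paper's SLOF builds the KNN graph from latitude/longitude rather than from Euclidean distances in feature space):

```python
import numpy as np

def lof_scores(points, k=5):
    """Plain local outlier factor (Breunig et al.): score >> 1 means the
    point is much less densely surrounded than its neighbours."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    knn = np.argsort(d, axis=1)[:, :k]                  # k nearest neighbours
    nn_d = np.take_along_axis(d, knn, axis=1)           # their distances (sorted)
    kdist = nn_d[:, -1]                                 # distance to k-th neighbour
    # reachability distance of p from neighbour o: max(k-dist(o), d(p, o))
    reach = np.maximum(kdist[knn], nn_d)
    lrd = 1.0 / reach.mean(axis=1)                      # local reachability density
    return lrd[knn].mean(axis=1) / lrd                  # LOF ratio

rng = np.random.default_rng(3)
pts = np.vstack([rng.normal(size=(60, 2)), [[8.0, 8.0]]])   # one far point
scores = lof_scores(pts)
flags = scores > scores.mean() + 3 * scores.std()           # the 3σ rule
print(np.where(flags)[0])
```

Points inside the cluster get LOF near 1, the isolated point gets a much larger value, and the 3σ cut on the score distribution singles it out, analogous to how SLOF scores monitoring stations against their geographic neighbours.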
Detection of outliers in reference distributions: performance of Horn's algorithm.
Solberg, Helge Erik; Lahti, Ari
2005-12-01
Medical laboratory reference data may be contaminated with outliers that should be eliminated before estimation of the reference interval. A statistical test for outliers has been proposed by Paul S. Horn and coworkers (Clin Chem 2001;47:2137-45). The algorithm operates in two steps: (a) mathematically transform the original data to approximate a Gaussian distribution; and (b) establish detection limits (Tukey fences) based on the central part of the transformed distribution. We studied the specificity of Horn's test algorithm (the probability of false detection of outliers) using Monte Carlo computer simulations performed on 13 types of probability distributions covering a wide range of positive and negative skewness. Distributions with 3% of the original observations replaced by random outliers were used to also examine the sensitivity of the test (the probability of detection of true outliers). Three data transformations were used: the Box and Cox function (used in the original Horn's test), the Manly exponential function, and the John and Draper modulus function. For many of the probability distributions, the specificity of Horn's algorithm was rather poor compared with the theoretical expectation. The cause of this poor performance was at least partially related to remaining non-Gaussian kurtosis (peakedness). The sensitivity showed great variation, dependent both on the type of underlying distribution and on the location of the outliers (upper and/or lower tail). Although Horn's algorithm is undoubtedly an improvement over older methods for outlier detection, reliable statistical identification of outliers in reference data remains a challenge.
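Horn's two-step scheme (transform toward normality, then Tukey fences) can be illustrated roughly as below. The Box-Cox transform and 1.5 × IQR fences follow the description above, but the skewed reference data and the gross outlier are synthetic, and this is a sketch rather than the published implementation.

```python
import numpy as np
from scipy import stats

def horn_like_outliers(x):
    """Two-step check in the spirit of Horn's algorithm:
    (a) Box-Cox transform toward a Gaussian shape,
    (b) Tukey fences at the quartiles +/- 1.5 * IQR."""
    z, _ = stats.boxcox(x)                  # requires strictly positive data
    q1, q3 = np.percentile(z, [25, 75])
    iqr = q3 - q1
    return (z < q1 - 1.5 * iqr) | (z > q3 + 1.5 * iqr)

rng = np.random.default_rng(1)
ref = rng.lognormal(mean=1.0, sigma=0.3, size=200)   # skewed reference data
ref[0] = 60.0                                        # planted gross outlier
flags = horn_like_outliers(ref)
print(flags[0], flags.sum())
```

The specificity problem studied in the paper shows up here too: on strongly non-Gaussian data the fences can flag a few legitimate observations even after the transform.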
Outlier Detection for DNA Fragment Assembly
Boucher, Christina; Lokshtanov, Daniel
2011-01-01
Given $n$ length-$\ell$ strings $S = \{s_1, ..., s_n\}$ over a constant size alphabet $\Sigma$ together with parameters $d$ and $k$, the objective in the {\em Consensus String with Outliers} problem is to find a subset $S^*$ of $S$ of size $n-k$ and a string $s$ such that $\sum_{s_i \in S^*} d(s_i, s) \leq d$. Here $d(x, y)$ denotes the Hamming distance between the two strings $x$ and $y$. We prove: (i) a variant of {\em Consensus String with Outliers} where the number of outliers $k$ is fixed and the objective is to minimize the total distance $\sum_{s_i \in S^*} d(s_i, s)$ admits a simple PTAS. (ii) Under the natural assumption that the number of outliers $k$ is small, the PTAS for the distance minimization version of {\em Consensus String with Outliers} performs well. In particular, as long as $k \leq cn$ for a fixed constant $c < 1$, the algorithm provides a $(1+\epsilon)$-approximate solution in time $f(1/\epsilon)(n\ell)^{O(1)}$ and is thus an EPTAS. (iii) In order to improve the PTAS for {\em Consensus St...
Outlier Detection and Explanation for Domain Experts
Micenková, Barbora
to poor overall performance. Furthermore, in many applications some labeled examples of outliers are available but not sufficient in number to serve as training data for standard supervised learning methods. As such, this valuable information is typically ignored. We introduce a new paradigm for outlier...... to a supervised classifier. The resulting method is robust to parameters and as such can easily be applied to data by non-experts in data mining. We also consider the case where computational resources at test time are limited and introduce a feature selection technique that respects a computational budget...
Algorithms for Speeding up Distance-Based Outlier Detection
National Aeronautics and Space Administration — The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address...
Detecting outliers in multivariate data while controlling false alarm rate
André Achim
2012-06-01
Full Text Available Outlier identification often implies inspecting each z-transformed variable and adding a Mahalanobis D^2. Multiple outliers may mask each other by increasing variance estimates. Caroni and Prescott (1992) proposed a multivariate extension of Rosner's (1983) technique to circumvent masking, taking sample size into account to keep the false alarm risk below, say, alpha = .05. Simulation studies here compare the single multivariate approach to "multiple-univariate plus multivariate" tests, each at a Bonferroni-corrected alpha level, in terms of power to detect outliers. Results suggest the former is better only up to about 12 variables. Macros in an Excel spreadsheet implement these techniques.
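The single-multivariate-test idea, a squared Mahalanobis distance with a sample-size-aware (Bonferroni-corrected) cutoff, might be sketched like this. It is a simplification rather than the Caroni and Prescott procedure, which iteratively recomputes statistics after removing each suspected outlier; the data here are synthetic.

```python
import numpy as np
from scipy import stats

def mahalanobis_flags(X, alpha=0.05):
    """Flag observations whose squared Mahalanobis distance from the
    sample mean exceeds a chi-square cutoff, Bonferroni-corrected for
    sample size to control the overall false alarm risk."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    d = X - mu
    d2 = np.einsum('ij,jk,ik->i', d, cov_inv, d)   # squared distances
    cutoff = stats.chi2.ppf(1 - alpha / n, df=p)   # Bonferroni-corrected
    return d2 > cutoff

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
X[0] = [8.0, -8.0, 8.0]            # planted multivariate outlier
out = mahalanobis_flags(X)
print(out[0], out.sum())
```

Note the masking effect the abstract mentions: the planted outlier inflates the covariance estimate, which is exactly why sequential-deletion procedures like Rosner's and its multivariate extension exist.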
Outliers in Questionnaire Data: Can They Be Detected and Should They Be Removed?
Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas
2011-01-01
Outliers in questionnaire data are unusual observations, which may bias statistical results, and outlier statistics may be used to detect such outliers. The authors investigated the effect outliers have on the specificity and the sensitivity of each of six different outlier statistics. The Mahalanobis distance and the item-pair based outlier…
Detection of additive outliers in seasonal time series
Haldrup, Niels; Montañés, Antonio; Sansó, Andreu
The detection and location of additive outliers in integrated variables has attracted much attention recently because such outliers tend to affect unit root inference among other things. Most of these procedures have been developed for non-seasonal processes. However, the presence of seasonality...... in the form of seasonally varying means and variances affect the properties of outlier detection procedures, and hence appropriate adjustments of existing methods are needed for seasonal data. In this paper we suggest modifications of tests proposed by Shin et al. (1996) and Perron and Rodriguez (2003......) to deal with data sampled at a seasonal frequency and the size and power properties are discussed. We also show that the presence of periodic heteroscedasticity will inflate the size of the tests and hence will tend to identify an excessive number of outliers. A modified Perron-Rodriguez test which allows...
Outlier Detection with Space Transformation and Spectral Analysis
Dang, Xuan-Hong; Micenková, Barbora; Assent, Ira
2013-01-01
Detecting a small number of outliers from a set of data observations is always challenging. In this paper, we present an approach that exploits space transformation and uses spectral analysis in the newly transformed space for outlier detection. Unlike most existing techniques in the literature...... benefits the process of mapping data into a usually lower dimensional space. Outliers are then identified by spectral analysis of the eigenspace spanned by the set of leading eigenvectors derived from the mapping procedure. The proposed technique is purely data-driven and imposes no assumptions regarding...... the data distribution, making it particularly suitable for identification of outliers from irregular, non-convex shaped distributions and from data with diverse, varying densities....
Outlier Detection Method Use for the Network Flow Anomaly Detection
Rimas Ciplinskas
2016-06-01
Full Text Available New and existing methods of cyber-attack detection are constantly being developed and improved because of the great number of attacks and the demand to protect against them. In practice, current methods of attack detection operate like antivirus programs, i.e. signatures of known attacks are created and attacks are detected by matching them. These methods have a drawback: they cannot detect new attacks. As a solution, anomaly detection methods are used. They allow detection of deviations from normal network behaviour that may indicate a new type of attack. This article introduces a new method that detects network flow anomalies using the local outlier factor algorithm. The research identified the groups of features that showed the best results for anomalous flow detection according to the highest values of precision, recall and F-measure.
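As a rough illustration of flow anomaly detection with the local outlier factor, the sketch below runs scikit-learn's LocalOutlierFactor on synthetic flow features. The feature set (packet count, mean packet size, duration) is invented for the example and is not the article's feature grouping.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical flow features: packet count, mean packet size, duration.
rng = np.random.default_rng(3)
normal = rng.normal(loc=[100, 500, 1.0], scale=[10, 50, 0.1], size=(200, 3))
attack = np.array([[900, 40, 0.01]])       # flood-like anomalous flow
flows = np.vstack([normal, attack])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(flows)            # -1 marks anomalous flows
print(labels[-1], (labels == -1).sum())
```

In a real deployment the features would be standardized first, since LOF is distance-based and the raw feature scales differ by orders of magnitude.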
WeirdestGalaxies: Outlier Detection Algorithm on Galaxy Spectra
Baron, Dalya; Poznanski, Dovi
2017-05-01
WeirdestGalaxies finds the weirdest galaxies in the Sloan Digital Sky Survey (SDSS) by using a basic outlier detection algorithm. It uses an unsupervised Random Forest (RF) algorithm to assign a similarity measure (or distance) between every pair of galaxy spectra in the SDSS. It then uses the distance matrix to find the galaxies that have the largest distance, on average, from the rest of the galaxies in the sample, defining them as outliers.
Distance Based Method for Outlier Detection of Body Sensor Networks
Haibin Zhang
2016-01-01
Full Text Available We propose a distance-based method for outlier detection in body sensor networks. First, we use Kernel Density Estimation (KDE) to calculate the probability of the distance to the k nearest neighbours for the diagnosed data. If the probability is less than a threshold, and the distance of the data to its left and right neighbours is greater than a predefined value, the diagnosed data point is declared an outlier. We then formalize a sliding-window-based method to improve outlier detection performance. Finally, to estimate the KDE from training sensor readings containing errors, we introduce a Hidden Markov Model (HMM) based method to estimate the most probable ground truth values, i.e., those with the maximum probability of producing the training data. Simulation results show that the proposed method achieves good detection accuracy with a low false alarm rate.
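The first step described above, a KDE over k-nearest-neighbour distances with a density threshold, can be sketched as follows. The sliding-window and HMM refinements are omitted, and the body-temperature-like readings and the threshold value are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

def knn_distance(data, point, k=5):
    """Mean distance from `point` to its k nearest readings in `data`."""
    d = np.sort(np.abs(data - point))
    return d[:k].mean()

rng = np.random.default_rng(4)
readings = rng.normal(36.8, 0.2, size=300)         # temperature-like data

# Fit a KDE to the k-NN distances of the (assumed clean) training readings.
train_d = np.array([knn_distance(np.delete(readings, i), readings[i])
                    for i in range(len(readings))])
kde = gaussian_kde(train_d)

def is_outlier(x, threshold=0.05):
    """A diagnosed reading is an outlier if the KDE-estimated density of
    its k-NN distance falls below the threshold."""
    return kde.evaluate([knn_distance(readings, x)])[0] < threshold

print(is_outlier(36.9), is_outlier(45.0))
```

A normal reading sits at a typical k-NN distance, where the estimated density is high; a gross error sits at a distance the KDE assigns essentially zero density.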
Multivariate Outlier Detection in Genetic Evaluation in Nordic Jersey Cattle
Gao, Hongding; Madsen, Per; Pösö, Jukka
A procedure was developed for detection of multivariate outliers based on an approximation of the Mahalanobis Distance (MD) and was implemented in the Nordic Jersey population. Evaluations are carried out by Nordic Cattle Genetic Evaluation (NAV), which uses a 9-trait model for milk, protein and fat...... means and covariance matrix for the actual PY, lactation and DIM. Accuracy of EBVs is improved for animals having extreme outlier record(s) deleted, compared with EBVs based on data not filtered for MD....
Hyperellipsoidal SVM-Based Outlier Detection Technique for Geosensor Networks
Zhang, Yang; Meratnia, N.; Havinga, P.J.M.
2009-01-01
Recently, wireless sensor networks providing fine-grained spatio-temporal observations have become one of the major monitoring platforms for geo-applications. Alongside data acquisition, outlier detection is essential in geosensor networks to ensure data quality, secure monitoring and reliable de...
Shape based kinetic outlier detection in real-time PCR
D'Atri Mario
2010-04-01
Full Text Available Background: Real-time PCR has recently become the technique of choice for absolute and relative nucleic acid quantification. The gold-standard quantification method in real-time PCR assumes that the compared samples have similar PCR efficiency. However, many factors present in biological samples affect PCR kinetics, confounding quantification analysis. In this work we propose a new strategy to detect outlier samples, called SOD. Results: A Richards function was fitted to the fluorescence readings to parameterize the amplification curves. There was no significant correlation between the calculated amplification parameters (plateau, slope and y-coordinate of the inflection point) and the log of input DNA, demonstrating that this approach can be used to obtain a "fingerprint" for each amplification curve. To identify outlier runs, the calculated parameters of each unknown sample were compared with those of the standard samples. When a significant underestimation of starting DNA molecules was found, due to the presence of biological inhibitors such as tannic acid, IgG or quercetin, SOD efficiently marked these amplification profiles as outliers. SOD was subsequently compared with KOD, the current approach based on PCR efficiency estimation. The data obtained showed that SOD was more sensitive than KOD, whereas SOD and KOD were equally specific. Conclusion: Our results demonstrate, for the first time, that outlier detection can be based on amplification shape instead of PCR efficiency. SOD represents an improvement in real-time PCR analysis because it decreases the variance of the data, thus increasing the reliability of quantification.
Constructing Three-Dimension Space Graph for Outlier Detection Algorithms in Data Mining
ZHANG Jing; SUN Zhi-hui
2004-01-01
Outlier detection has important practical value in the data mining literature. Outlier detection algorithms based on different theories have different outlier definitions and mining processes. By analyzing existing outlier detection algorithms with respect to their criteria and underlying theory, a three-dimensional space graph for constructing such algorithms and an improved GridOf algorithm are proposed.
Detecting Outlier Microarray Arrays by Correlation and Percentage of Outliers Spots
Song Yang
2006-01-01
Full Text Available We developed a quality assurance (QA) tool, namely the microarray outlier filter (MOF), and have applied it to our microarray datasets for the identification of problematic arrays. Our approach is based on comparing the arrays using the correlation coefficient and the number of outlier spots generated on each array to reveal outlier arrays. For a human universal reference (HUR) dataset, which is used as a technical control in our standard hybridization procedure, 3 outlier arrays were identified out of 35 experiments. For a human blood dataset, 12 outlier arrays were identified from 185 experiments. In general, arrays from human blood samples displayed greater variation in their gene expression profiles than arrays from HUR samples. As a result, MOF identified two distinct patterns in the occurrence of outlier arrays. These results demonstrate that this methodology is a valuable QA practice for identifying questionable microarray data prior to downstream analysis.
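The correlation half of the MOF criterion can be illustrated as below: an array whose mean correlation with the other arrays is unusually low is flagged. The cutoff and the data are illustrative, and the outlier-spot count criterion is not included.

```python
import numpy as np

def outlier_arrays(expr, corr_cut=0.7):
    """Flag arrays (rows) whose mean pairwise correlation with the
    other arrays falls below a cutoff; a sketch of the correlation
    criterion only."""
    r = np.corrcoef(expr)                   # arrays x arrays correlation
    np.fill_diagonal(r, np.nan)             # ignore self-correlation
    mean_r = np.nanmean(r, axis=1)
    return mean_r < corr_cut

rng = np.random.default_rng(5)
profile = rng.normal(size=1000)             # shared expression profile
expr = profile + rng.normal(scale=0.3, size=(10, 1000))
expr[3] = rng.normal(size=1000)             # array unrelated to the rest
flags = outlier_arrays(expr)
print(flags[3], flags.sum())
```

In the paper's setting the cutoff would be calibrated per dataset, since (as the abstract notes) blood samples show more legitimate variation than technical-control HUR samples.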
Electricity Price Forecasting Based on AOSVR and Outlier Detection
Zhou Dianmin; Gao Lin; Gao Feng
2005-01-01
Electricity price is the first consideration for all participants in the electric power market, and its characteristics are related both to the market mechanism and to variation in the behaviour of market participants. It is necessary to build a real-time price forecasting model with adaptive capability, and because there are outliers in the price data, they should be detected and filtered out when training the forecasting model by regression. In view of these points, this paper presents an electricity price forecasting method based on accurate online support vector regression (AOSVR) and outlier detection. Numerical testing results show that the method is effective in forecasting electricity prices in the electric power market.
The ANZROD model: better benchmarking of ICU outcomes and detection of outliers.
Paul, Eldho; Bailey, Michael; Kasza, Jessica; Pilcher, David
2016-03-01
To compare the impact of the 2013 Australian and New Zealand Risk of Death (ANZROD) model and the 2002 Acute Physiology and Chronic Health Evaluation (APACHE) III-j model as risk-adjustment tools for benchmarking performance and detecting outliers in Australian and New Zealand intensive care units. Data were extracted from the Australian and New Zealand Intensive Care Society Adult Patient Database for all ICUs that contributed data between 1 January 2010 and 31 December 2013. Annual standardised mortality ratios (SMRs) were calculated for ICUs using the ANZROD and APACHE III-j models. They were plotted on funnel plots separately for each hospital type, with ICUs above the upper 99.8% control limit considered as potential outliers with worse performance than their peer group. Overdispersion parameters were estimated for both models. Overall fit was assessed using the Akaike information criterion (AIC) and Bayesian information criterion (BIC). Outlier association with mortality was assessed using a logistic regression model. The ANZROD model identified more outliers than the APACHE III-j model during the study period. The numbers of outliers in rural, metropolitan, tertiary and private hospitals identified by the ANZROD model were 3, 2, 6 and 6, respectively; and those identified by the APACHE III-j model were 2, 0, 1 and 1, respectively. The degree of overdispersion was less for the ANZROD model compared with the APACHE III-j model in each year. The ANZROD model showed better overall fit to the data, with smaller AIC and BIC values than the APACHE III-j model. Outlier ICUs identified using the ANZROD model were more strongly associated with increased mortality. The ANZROD model reduces variability in SMRs due to casemix, as measured by overdispersion, and facilitates more consistent identification of true outlier ICUs, compared with the APACHE III-j model.
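The funnel-plot idea of flagging units above an upper control limit can be sketched as follows. The limit here is approximated with a Poisson quantile at the model-expected death count, without the overdispersion adjustment discussed in the abstract, and the numbers are invented.

```python
import numpy as np
from scipy import stats

def smr_outliers(observed, expected, level=0.998):
    """Flag units whose standardised mortality ratio (SMR = observed /
    expected deaths) lies above the upper funnel-plot control limit,
    approximated by a Poisson quantile at the expected count."""
    upper = stats.poisson.ppf(level, expected) / expected
    return observed / expected > upper

obs = np.array([52, 95, 130, 48])          # observed deaths per ICU
exp = np.array([50.0, 90.0, 70.0, 55.0])   # model-expected deaths
print(smr_outliers(obs, exp))
```

The 99.8% level matches the control limit used in the study; a better risk-adjustment model (like ANZROD versus APACHE III-j) changes the expected counts and hence which units breach the limit.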
Outlier estimation and detection: Application to Skin Lesion Classification
Sigurdsson, Sigurdur; Larsen, Jan; Hansen, Lars Kai
2002-01-01
We extend MacKay's (1992) Bayesian approach to neural classifiers to include an outlier detector mechanism. We show that the outlier detector can locate misclassified samples...
Okada, Sachiko; Nagase, Keisuke; Ito, Ayako; Ando, Fumihiko; Nakagawa, Yoshiaki; Okamoto, Kazuya; Kume, Naoto; Takemura, Tadamasa; Kuroda, Tomohiro; Yoshihara, Hiroyuki
2014-01-01
Comparison of financial indices helps to illustrate differences in operations and efficiency among similar hospitals. Outlier data tend to influence statistical indices, and so detection of outliers is desirable. Development of a methodology for financial outlier detection using information systems will help to reduce the time and effort required, eliminate the subjective elements in detection of outlier data, and improve the efficiency and quality of analysis. The purpose of this research was to develop such a methodology. Financial outliers were defined based on a case model. An outlier-detection method using the distances between cases in multi-dimensional space is proposed. Experiments using three diagnosis groups indicated successful detection of cases for which the profitability and income structure differed from other cases. Therefore, the method proposed here can be used to detect outliers.
A. Bhushan
2015-07-01
Full Text Available In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data for various reasons, such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent the propagation of errors in subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in this type of spatiotemporal data stream. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of applying IPCA to outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes by presenting two new IPCA-based outlier detection methods and performing a comparative analysis with existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.
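One simple way to use IPCA for outlier scoring, shown below, is to fit scikit-learn's IncrementalPCA on the stream in batches and score each new reading by its reconstruction error. This is an illustration of the general idea on synthetic correlated sensor data, not either of the paper's two methods.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(6)
# Stream of correlated 5-dimensional sensor readings arriving in batches.
basis = rng.normal(size=(2, 5))
def batch(n):
    return rng.normal(size=(n, 2)) @ basis + rng.normal(scale=0.05, size=(n, 5))

ipca = IncrementalPCA(n_components=2)
for _ in range(10):                         # incremental fit on the stream
    ipca.partial_fit(batch(100))

def recon_error(X):
    """Reconstruction error under the current IPCA model; readings that
    do not fit the learned subspace get large errors."""
    Z = ipca.inverse_transform(ipca.transform(X))
    return np.linalg.norm(X - Z, axis=1)

test = batch(5)
test[0] += rng.normal(scale=3.0, size=5)    # corrupt one reading
err = recon_error(test)
print(err.argmax())
```

Because the model is updated with partial_fit, the subspace tracks gradual environmental change while instrumental errors still stand out as large residuals.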
Prediction of heterogeneous differential genes by detecting outliers to a Gaussian tight cluster.
Yang, Zihua; Yang, Zhengrong
2013-03-05
Heterogeneously and differentially expressed genes (hDEGs) are a common phenomenon due to biological diversity. An hDEG is often observed in gene expression experiments (with two experimental conditions) where it is highly expressed in a few experimental samples, or in drug trial experiments for cancer studies with drug resistance heterogeneity among the disease group. These highly expressed samples are called outliers. Accurate detection of outliers among hDEGs is therefore desirable for disease diagnosis and effective drug design. The standard approach for detecting hDEGs is to choose an appropriate subset of outliers to represent the experimental group. However, existing methods typically overlook hDEGs with very few outliers. We present in this paper a simple algorithm for detecting hDEGs by sequentially testing for potential outliers with respect to a tight cluster of non-outliers, among an ordered subset of the experimental samples. This avoids making any restrictive assumptions about how the outliers are distributed. We use simulated and real data to illustrate that the proposed algorithm achieves a good separation between the tight cluster of low expressions and the outliers for hDEGs. The proposed algorithm assesses each potential outlier in relation to the cluster of non-outliers without making explicit assumptions about the outlier distribution. Simulated examples and breast cancer data sets are used to illustrate the suitability of the proposed algorithm for identifying hDEGs with small numbers of outliers.
Strategies for Identification and Detection of Outliers in Multiple Regression.
Vannoy, Martha
Outliers are frequently found in data sets and can cause problems for researchers if not addressed. Failure to identify and deal with outliers in an appropriate manner may lead researchers to report erroneous results. Using a multiple regression context, this paper examines some of the reasons for the presence of outliers and simple methods for…
Outlier Detection Techniques for SQL and ETL Tuning
Goswami, Saptarsi; Chakrabarti, Amlan
2012-01-01
RDBMSs are at the heart of both OLTP and OLAP applications. For both types of application, thousands of queries expressed in SQL are executed on a daily basis. All commercial DBMS engines capture various attributes about these executed queries in system tables. The queries need to conform to best practices and be tuned to ensure optimal performance. While checklists, and often tools, are used to enforce this, a black-box profiling technique such as outlier detection is not employed for a summary-level understanding of the queries. This is the motivation of the paper, as such a technique not only points out inefficiencies built into the system, but also has the potential to reveal evolving best practices and inappropriate usage. This can certainly reduce latency in information flow and support optimal utilization of hardware and software capacity. In this paper we start by formulating the problem. We explore four outlier detection techniques. We apply these techniques to a rich corpus of production queries ...
Time Series Outlier Detection Based on Sliding Window Prediction
Yufeng Yu
2014-01-01
Full Text Available In order to detect outliers in hydrological time series data, and thereby improve data quality and the quality of decision-making related to the design, operation, and management of water resources, this research develops a time series outlier detection method for hydrologic data that can identify data deviating from historical patterns. The method first builds a forecasting model on the historical data and then uses it to predict future values. Anomalies are assumed to occur when observed values fall outside a given prediction confidence interval (PCI), which is calculated from the predicted value and a confidence coefficient. The use of the PCI as a threshold rests mainly on the fact that it accounts for the uncertainty in the data series parameters of the forecasting model, addressing the problem of suitable threshold selection. The method performs fast, incremental evaluation of data as they become available, scales to large quantities of data, and requires no preclassification of anomalies. Experiments with different real-world hydrologic time series showed that the proposed method is fast, correctly identifies abnormal data, and can be used for hydrologic time series analysis.
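The sliding-window PCI scheme can be sketched with a deliberately simple forecasting model: predict the next value as the window mean and set the interval width from the window's standard deviation. The paper's actual forecasting model is more sophisticated, and the flow series here is synthetic.

```python
import numpy as np

def pci_anomalies(series, window=24, coef=3.0):
    """Flag points falling outside a prediction confidence interval
    built from a sliding window: predicted value = window mean,
    interval half-width = coef * window standard deviation."""
    flags = np.zeros(len(series), dtype=bool)
    for t in range(window, len(series)):
        w = series[t - window:t]
        pred, spread = w.mean(), w.std()
        flags[t] = abs(series[t] - pred) > coef * spread
    return flags

rng = np.random.default_rng(7)
level = np.sin(np.linspace(0, 6 * np.pi, 500))      # seasonal-like signal
flow = 10 + level + rng.normal(scale=0.1, size=500)
flow[300] += 5.0                                    # spike in the record
flags = pci_anomalies(flow)
print(flags[300], flags.sum())
```

Because the interval is recomputed at every step, the scheme is incremental: each new observation needs only the last `window` values, which is what lets the method scale to long records.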
Performance Evaluation of Density-Based Outlier Detection on High Dimensional Data
P. Murugavel
2013-02-01
Full Text Available Outlier detection is the task of finding objects that are considerably dissimilar, exceptional or inconsistent with respect to the remaining data. It has wide applications, including data analysis, financial fraud detection, network intrusion detection and clinical diagnosis of diseases. In data analysis applications, outliers are often considered error or noise and are removed once detected. Approaches to detect and remove outliers have been studied by several researchers. Density-based approaches have proved effective in detecting outliers, but usually require a huge amount of computation. In this paper, two approaches that enhance the traditional density-based method for removing outliers are analysed. The first method uses data partitioning and speed-up strategies to avoid large computations. The second presents unified clustering and outlier detection using a Neighbourhood-based Local Density Factor (NLDF). The aim of both models is to improve the performance of outlier detection and clustering and to speed up the whole process. In this paper, the workings of these two approaches are studied and a performance evaluation based on clustering efficiency and outlier detection efficiency is presented.
On damage detection in wind turbine gearboxes using outlier analysis
Antoniadou, Ifigeneia; Manson, Graeme; Dervilis, Nikolaos; Staszewski, Wieslaw J.; Worden, Keith
2012-04-01
The proportion of worldwide installed wind power in power systems increases over the years as a result of the steadily growing interest in renewable energy sources. Still, the advantages offered by the use of wind power are overshadowed by the high operational and maintenance costs, resulting in the low competitiveness of wind power in the energy market. In order to reduce the costs of corrective maintenance, the application of condition monitoring to gearboxes becomes highly important, since gearboxes are among the wind turbine components with the most frequent failure observations. While condition monitoring of gearboxes in general is common practice, with various methods having been developed over the last few decades, wind turbine gearbox condition monitoring faces a major challenge: the detection of faults under the time-varying load conditions prevailing in wind turbine systems. Classical time and frequency domain methods fail to detect faults under variable load conditions, due to the temporary effect that these faults have on vibration signals. This paper uses the statistical discipline of outlier analysis for the damage detection of gearbox tooth faults. A simplified two-degree-of-freedom gearbox model considering nonlinear backlash, time-periodic mesh stiffness and static transmission error, simulates the vibration signals to be analysed. Local stiffness reduction is used for the simulation of tooth faults and statistical processes determine the existence of intermittencies. The lowest level of fault detection, the threshold value, is considered and the Mahalanobis squared-distance is calculated for the novelty detection problem.
Higham, J. E.; Brevis, W.; Keylock, C. J.
2016-12-01
The present work proposes a novel method of detection and estimation of outliers in particle image velocimetry measurements by the modification of the temporal coefficients associated with a proper orthogonal decomposition of an experimental time series. Using synthetic outliers applied to two sequences of vector fields, the method is benchmarked against state-of-the-art approaches recently proposed to remove the influence of outliers. Compared with these methods, the proposed approach offers an increase in accuracy and robustness for the detection of outliers and comparable accuracy for their estimation.
Tsunami arrival time detection system applicable to discontinuous time series data with outliers
Lee, Jun-Whan; Park, Sun-Cheon; Lee, Duk Kee; Lee, Jong Ho
2016-12-01
Timely detection of tsunamis with water level records is a critical but logistically challenging task because of outliers and gaps. Since tsunami detection algorithms require several hours of past data, outliers could cause false alarms, and gaps can stop the tsunami detection algorithm even after the recording is restarted. In order to avoid such false alarms and time delays, we propose the Tsunami Arrival time Detection System (TADS), which can be applied to discontinuous time series data with outliers. TADS consists of three algorithms, outlier removal, gap filling, and tsunami detection, which are designed to update whenever new data are acquired. After calibrating the thresholds and parameters for the Ulleung-do surge gauge located in the East Sea (Sea of Japan), Korea, the performance of TADS was discussed based on a 1-year dataset with historical tsunamis and synthetic tsunamis. The results show that the overall performance of TADS is effective in detecting a tsunami signal superimposed on both outliers and gaps.
On the Evaluation of Outlier Detection: Measures, Datasets, and an Empirical Study Continued
Campos, G. O.; Zimek, A.; Sander, J.;
2016-01-01
The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate...
Adjusted functional boxplots for spatio-temporal data visualization and outlier detection
Sun, Ying
2011-10-24
This article proposes a simulation-based method to adjust functional boxplots for correlation when visualizing functional and spatio-temporal data, as well as when detecting outliers. We start by investigating the relationship between the spatio-temporal dependence and the empirical outlier detection rule of 1.5 times the 50% central region. Then, we propose to simulate observations without outliers on the basis of a robust estimator of the covariance function of the data. We select the constant factor in the functional boxplot to control the probability of correctly detecting no outliers. Finally, we apply the selected factor to the functional boxplot of the original data. As applications, the factor selection procedure and the adjusted functional boxplots are demonstrated on sea surface temperatures, spatio-temporal precipitation and general circulation model (GCM) data. The outlier detection performance is also compared before and after the factor adjustment.
Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor
2017-05-01
Outlier detection has been used extensively in data analysis to detect anomalous observations in data and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in the linear functional relationship model. Using the residual values of the Caires and Wyatt model, we apply a hierarchical clustering procedure. With the use of a tree diagram, we illustrate the graphical approach to outlier detection. A simulation study is done to verify the accuracy of the proposed method, and an illustration with a real data set shows its practical applicability.
Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection
Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas
2011-01-01
Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…
An Unbiased Distance-based Outlier Detection Approach for High-dimensional Data
Nguyen, Hoang Vu; Gopalkrishnan, Vivekanand; Assent, Ira
2011-01-01
Traditional outlier detection techniques usually fail to work efficiently on high-dimensional data due to the curse of dimensionality. This work proposes a novel method for subspace outlier detection that specifically deals with multidimensional spaces where feature relevance is a local rather than a global property. Different from existing approaches, it is not grid-based and is unbiased with respect to dimensionality. Thus, its performance is impervious to grid resolution as well as the curse of dimensionality. In addition, our approach ranks the outliers, allowing users to select the number of desired outliers, thus mitigating the issue of a high false alarm rate. Extensive empirical studies on real datasets show that our approach efficiently and effectively detects outliers, even in high-dimensional spaces.
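The ranking idea the abstract describes can be illustrated with the simplest global distance-based score: rank points by their mean distance to their k nearest neighbors, then let the user pick the top-n. This is only the global, full-space relative of the paper's subspace method; all names are illustrative:

```python
import math

# Global kNN-distance outlier ranking: a point far from its nearest
# neighbors gets a high score. The paper does this per local subspace;
# here we use the full space for simplicity.
def knn_outlier_scores(points, k=2):
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)   # mean distance to k nearest neighbors
    return scores

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_outlier_scores(pts)
# ranking lets the user select the number of desired outliers
top = max(range(len(pts)), key=lambda i: scores[i])
print(pts[top])
```

Because scores are ranked rather than thresholded, the false-alarm trade-off is controlled by how many top-ranked points the user inspects.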
On the Evaluation of Outlier Detection and One-Class Classification Methods
Swersky, Lorne; Marques, Henrique O.; Sander, Jörg
2016-01-01
It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem. In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. Our experiments led to conclusions...
Exploratory functional flood frequency analysis and outlier detection
Chebana, Fateh; Dabo-Niang, Sophie; Ouarda, Taha B. M. J.
2012-04-01
The prevention of flood risks and the effective planning and management of water resources require river flows to be continuously measured and analyzed at a number of stations. For a given station, a hydrograph can be obtained as a graphical representation of the temporal variation of flow over a period of time. The information provided by the hydrograph is essential to determine the severity of extreme events and their frequencies. A flood hydrograph is commonly characterized by its peak, volume, and duration. Traditional hydrological frequency analysis (FA) approaches focused separately on each of these features in a univariate context. Recent multivariate approaches considered these features jointly in order to take into account their dependence structure. However, all these approaches are based on the analysis of a number of characteristics and do not make use of the full information content of the hydrograph. The objective of the present work is to propose a new framework for FA using the hydrographs as curves: functional data. In this context, the whole hydrograph is considered as one infinite-dimensional observation. This context allows us to provide more effective and efficient estimates of the risk associated with extreme events. The proposed approach contributes to addressing the problem of lack of data commonly encountered in hydrology by fully employing all the information contained in the hydrographs. A number of functional data analysis tools are introduced and adapted to flood FA with a focus on exploratory analysis as a first stage toward a complete functional flood FA. These methods, including data visualization, location and scale measures, principal component analysis, and outlier detection, are illustrated in a real-world flood analysis case study from the province of Quebec, Canada.
Outlier detection for particle image velocimetry data using a locally estimated noise variance
Lee, Yong; Yang, Hua; Yin, ZhouPing
2017-03-01
This work describes an adaptive spatial variable threshold outlier detection algorithm for raw gridded particle image velocimetry data using a locally estimated noise variance. This method is an iterative procedure, and each iteration is composed of a reference vector field reconstruction step and an outlier detection step. We construct the reference vector field using a weighted adaptive smoothing method (Garcia 2010 Comput. Stat. Data Anal. 54 1167-78), and the weights are determined in the outlier detection step using a modified outlier detector (Ma et al 2014 IEEE Trans. Image Process. 23 1706-21). A hard decision on the final weights of the iteration can produce outlier labels of the field. The technical contribution is that the spatial variable threshold motivation is embedded in the modified outlier detector with a locally estimated noise variance in an iterative framework for the first time. It turns out that a spatially variable threshold is preferable to a single spatially constant threshold in complicated flows such as vortex flows or turbulent flows. Synthetic cellular vortical flows with simulated scattered or clustered outliers are adopted to evaluate the performance of the proposed method in comparison with popular validation approaches. The method also turns out to be beneficial in a real PIV measurement of turbulent flow. The experimental results demonstrate that the proposed method yields competitive performance in terms of outlier under-detection and over-detection counts. In addition, the outlier detection method is computationally efficient and adaptive, requires no user-defined parameters, and corresponding implementations are provided in the supplementary materials.
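The fixed-threshold baseline this paper improves on is the normalized median test of Westerweel & Scarano (2005): compare a vector to the median of its neighbors, normalized by the median residual of the neighborhood. A minimal sketch of that classical test (the paper's contribution, not shown here, is replacing the single constant threshold with a spatially varying one from a local noise estimate):

```python
from statistics import median

# Normalized median test for one vector given its spatial neighbors.
# eps = 0.1 is the conventional small constant guarding against a zero
# median residual in very smooth regions.
def normalized_median_residual(value, neighbors, eps=0.1):
    m = median(neighbors)
    res = [abs(v - m) for v in neighbors]
    return abs(value - m) / (median(res) + eps)

neighbors = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.1, 0.9]
print(normalized_median_residual(1.02, neighbors))   # small -> valid vector
print(normalized_median_residual(9.0, neighbors))    # large -> outlier
```

In the classical test a single cutoff (often around 2) is applied everywhere; the paper argues this is too coarse for vortical or turbulent flows.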
Morris, Drew; Nossin-Manor, Revital; Taylor, Margot J; Sled, John G
2011-07-01
In diffusion weighted MRI, subject motion and brain pulsation lead both to signal drop-outs and image misalignment. Unsedated neonates, with their higher heart rate and propensity for motion are particularly prone to degraded scan quality that impairs diffusion tensor estimation. Retrospective registration and robust estimators are two methods that have previously been demonstrated to address motion and intensity outliers, respectively, in diffusion data. However, when taken together, the resampling of images to correct for misalignment can have the effect of averaging outlier voxels with uncorrupted voxels, thereby making outliers more difficult to detect. This article presents a method to remove outliers prior to resampling while taking misalignment into account so that this averaging of outliers with good data can be avoided. The proposed method is compared to other processing pipelines using simulations and data from unsedated preterm neonates. These results demonstrate advantages to the proposed method, particularly in subjects with high motion. Copyright © 2010 Wiley-Liss, Inc.
Methods of Detecting Outliers in A Regression Analysis Model ...
PROF. O. E. OSUAGWU
2013-06-01
Jun 1, 2013 ... Capacity), X2 (Design Pressure), X3 (Boiler Type), X4 (Drum Type) were used. The analysis of the ... 1.2 Identification Of Outliers. There is no such thing as a simple test. However, there are many ..... Psychological. Bulletin, 95 ...
Segmentation by Large Scale Hypothesis Testing - Segmentation as Outlier Detection
Darkner, Sune; Dahl, Anders Lindbjerg; Larsen, Rasmus
2010-01-01
locally. We propose a method based on large scale hypothesis testing with a consistent method for selecting an appropriate threshold for the given data. By estimating the background distribution we characterize the segment of interest as a set of outliers with a certain probability based on the estimated...
Semantic-based Detection of Segment Outliers and Unusual Events for Wireless Sensor Networks
Gao, Lianli; Bruenig, Michael; Hunter, Jane
2014-01-01
Environmental scientists have increasingly been deploying wireless sensor networks to capture valuable data that measures and records precise information about our environment. One of the major challenges associated with wireless sensor networks is the quality of the data and, more specifically, the detection of segment outliers and unusual events. Most previous research has focused on detecting outliers that are errors caused by unreliable sensors and sensor nodes. However, there is a...
Detection of outliers and establishment of targets in external quality assessment programs.
Zhou, Qi; Li, Shaonan; Li, Xiaopeng; Wang, Wei; Wang, Zhiguo
2006-10-01
The establishment of target values is important in external quality assessment (EQA) programs since most of the programs use the target as the assessment criterion. Results submitted to EQA programs usually are not Gaussian-distributed due to the contamination with outliers. The traditional or robust statistical method can be chosen for the truncation of outliers. We investigated the two methods when setting targets for glucose in an EQA program. The results of glucose were analyzed as an all-methods group and divided into four subgroups according to the analytical methods prior to testing of each data distribution. Based on the distribution, the traditional or robust statistical method was used to detect outliers. After removal of outliers, the data distributions were retested and if Gaussian distributions were obtained, the mean values were used as the target. Original data sets were not Gaussian-distributed for all tested groups. Therefore, the robust statistical method was employed to detect outliers. After truncation of outliers, the data sets showed Gaussian distributions and the means were used to set target values. The targets of glucose were determined for all-methods and individual methods from the mean values following removal of outliers using the robust statistical method. This led to comparable targets among the tested groups.
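The truncation-then-mean procedure the abstract describes can be sketched with a common robust rule: discard results whose robust z-score, based on the median and the median absolute deviation (MAD), exceeds a cutoff, then take the mean of the remainder as the target. This is a generic robust truncation in the spirit of the EQA procedure, not the specific algorithm of the paper; the cutoff of 3 and the 1.4826 MAD scaling are conventional choices:

```python
from statistics import median

# Robust truncation of EQA results: remove gross outliers via median/MAD,
# then use the mean of the retained results as the target value.
def robust_target(results, cutoff=3.0):
    m = median(results)
    mad = 1.4826 * median(abs(x - m) for x in results)
    kept = [x for x in results if abs(x - m) <= cutoff * mad]
    return sum(kept) / len(kept), kept

glucose = [5.1, 5.3, 5.0, 5.2, 5.15, 5.25, 9.8]   # hypothetical mmol/L results
target, kept = robust_target(glucose)
print(round(target, 3))
```

The gross result (9.8) is excluded before the mean is formed, so the target reflects the consensus of the remaining laboratories.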
Ensemble Learning Method for Outlier Detection and its Application to Astronomical Light Curves
Nun, Isadora; Protopapas, Pavlos; Sim, Brandon; Chen, Wesley
2016-09-01
Outlier detection is necessary for automated data analysis, with specific applications spanning almost every domain from financial markets to epidemiology to fraud detection. We introduce a novel mixture-of-experts outlier detection model, which uses a dynamically trained, weighted network of five distinct outlier detection methods. After dimensionality reduction, individual outlier detection methods score each data point for “outlierness” in this new feature space. Our model then uses dynamically trained parameters to weigh the scores of each method, allowing for a finalized outlier score. We find that the mixture-of-experts model performs, on average, better than any single expert model in identifying both artificially and manually picked outliers. This mixture model is applied to a data set of astronomical light curves, after dimensionality reduction via time series feature extraction. Our model was tested using three fields from the MACHO catalog and generated a list of anomalous candidates. We confirm that the outliers detected using this method belong to rare classes, like Novae, He-burning, and red giant stars; other outlier light curves identified have no available information associated with them. To elucidate their nature, we created a website containing the light-curve data and information about these objects. Users can attempt to classify the light curves, give conjectures about their identities, and sign up for follow-up messages about the progress made on identifying these objects. This user-submitted data can be used to further train our mixture-of-experts model. Our code is publicly available to all who are interested.
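The score-fusion step can be sketched simply: each "expert" produces an outlierness score per point; scores are min-max normalized and combined with weights. In the paper the weights are dynamically trained; here they are fixed, and all data are hypothetical:

```python
# Mixture-of-experts style score fusion: normalize each expert's scores to
# [0, 1], then take a weighted sum to get a finalized outlier score.
def normalize(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def ensemble_scores(expert_scores, weights):
    experts = [normalize(s) for s in expert_scores]
    n = len(experts[0])
    return [sum(w * e[i] for w, e in zip(weights, experts)) for i in range(n)]

# two hypothetical experts scoring five points; point 4 looks anomalous to both
e1 = [0.1, 0.2, 0.15, 0.1, 0.9]
e2 = [1.0, 1.2, 0.8, 1.1, 6.0]
combined = ensemble_scores([e1, e2], weights=[0.5, 0.5])
print(combined.index(max(combined)))
```

Normalization matters because the raw score scales of different detectors are not comparable; without it, one expert would dominate the sum.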
M. Martinez-Camara
2014-05-01
Full Text Available Emissions of harmful substances into the atmosphere are a serious environmental concern. In order to understand and predict their effects, it is necessary to estimate the exact quantity and timing of the emissions from sensor measurements taken at different locations. A number of methods exist for solving this problem. However, these existing methods assume Gaussian additive errors, making them extremely sensitive to outlier measurements. We first show that the errors in real-world measurement datasets come from a heavy-tailed distribution, i.e., include outliers. Hence, we propose to robustify the existing inverse methods by adding a blind outlier detection algorithm. The improved performance of our method is demonstrated on a real dataset and compared to previously proposed methods. For the blind outlier detection, we first use an existing algorithm, RANSAC, and then propose a modification called TRANSAC, which provides a further performance improvement.
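RANSAC, the existing algorithm the authors start from, can be sketched on the simplest possible model: fit a 1-D line from random point pairs, keep the model with the largest consensus set, and treat points outside it as outliers. This illustrates plain RANSAC only, not the TRANSAC variant the paper proposes:

```python
import random

# Minimal RANSAC for y = a*x + b: repeatedly fit a line through two random
# points, count inliers within `tol`, and keep the best consensus set.
def ransac_line(xs, ys, n_iter=200, tol=0.5, seed=0):
    rng = random.Random(seed)
    best_inliers, best_model = [], (0.0, 0.0)
    for _ in range(n_iter):
        i, j = rng.sample(range(len(xs)), 2)
        if xs[i] == xs[j]:
            continue
        a = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - a * xs[i]
        inliers = [k for k in range(len(xs)) if abs(ys[k] - (a * xs[k] + b)) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers, best_model = inliers, (a, b)
    return best_model, best_inliers

xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1.0, 3.1, 4.9, 7.0, 9.1, 30.0, 13.0]   # roughly y = 2x + 1, one outlier
(a, b), inliers = ransac_line(xs, ys)
print(round(a, 1), round(b, 1))
```

Because the model is fit from minimal samples, a single gross outlier cannot drag the estimate the way it would in a Gaussian least-squares fit, which is exactly the sensitivity the paper sets out to remove.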
WIRELESS SENSOR NETWORKS AND FUSION OF CONTEXTUAL INFORMATION FOR WEATHER OUTLIER DETECTION
A. Amidi
2013-09-01
Full Text Available Weather stations are often expensive; hence it may be difficult to obtain data with a high spatial coverage. A low-cost alternative is the wireless sensor network (WSN), which can be deployed as a weather station and address the aforementioned shortcoming. Due to imperfect sensors, the raw data provided by a WSN may be of low quality and reliability, which motivates the application of outlier detection methods. Outliers may include errors or potentially useful information called events. In this research, forecast values are utilized as contextual information for weather outlier detection. In this paper, outliers are identified by comparing the patterns of the WSN and the forecasts. With that approach, temporal outliers are detected with respect to the slopes of the WSN data and the forecasts within a pre-defined tolerance. The experimental results from the real data set validate the applicability of using contextual information in the context of WSNs for outlier detection in terms of accuracy and energy efficiency.
Trend-Residual Dual Modeling for Detection of Outliers in Low-Cost GPS Trajectories.
Chen, Xiaojian; Cui, Tingting; Fu, Jianhong; Peng, Jianwei; Shan, Jie
2016-12-01
Low-cost GPS (receiver) has become a ubiquitous and integral part of our daily life. Despite noticeable advantages such as being cheap, small, light, and easy to use, its limited positioning accuracy devalues and hampers its wide applications for reliable mapping and analysis. Two conventional techniques to remove outliers in a GPS trajectory are thresholding and Kalman-based methods, which are difficult in selecting appropriate thresholds and modeling the trajectories. Moreover, they are insensitive to medium and small outliers, especially for low-sample-rate trajectories. This paper proposes a model-based GPS trajectory cleaner. Rather than examining speed and acceleration or assuming a pre-determined trajectory model, we first use a cubic smoothing spline to adaptively model the trend of the trajectory. The residuals, i.e., the differences between the trend and GPS measurements, are then further modeled by a time series method. Outliers are detected by scoring the residuals at every GPS trajectory point. Compared to conventional procedures, the trend-residual dual modeling approach has the following features: (a) it is able to model trajectories and detect outliers adaptively; (b) only one critical value for outlier scores needs to be set; (c) it is able to robustly detect unapparent outliers; and (d) it is effective in cleaning outliers for GPS trajectories with low sample rates. Tests are carried out on three real-world GPS trajectories datasets. The evaluation demonstrates an average of 9.27 times better performance in outlier detection for GPS trajectories than thresholding and Kalman-based techniques.
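The trend-residual idea can be sketched in a few lines: estimate a smooth trend, score the residuals robustly, and flag points with large scores. In this sketch a moving average stands in for the paper's cubic smoothing spline and a median/MAD score stands in for its time-series residual model; the single cutoff plays the role of the one critical value the paper requires:

```python
from statistics import median

# Trend-residual dual modeling, simplified: smooth trend + robust residual score.
def moving_average(ys, w=2):
    n = len(ys)
    return [sum(ys[max(0, i - w):min(n, i + w + 1)]) /
            len(ys[max(0, i - w):min(n, i + w + 1)]) for i in range(n)]

def trend_residual_outliers(ys, cutoff=3.0):
    trend = moving_average(ys)
    res = [y - t for y, t in zip(ys, trend)]
    m = median(res)
    mad = 1.4826 * median(abs(r - m) for r in res) or 1e-9
    return [i for i, r in enumerate(res) if abs(r - m) / mad > cutoff]

track = [0.0, 1.0, 2.1, 3.0, 4.0, 25.0, 6.1, 7.0, 8.0, 9.1]  # jump at index 5
print(trend_residual_outliers(track))
```

Scoring residuals against a fitted trend, rather than thresholding raw speed or acceleration, is what makes the approach sensitive to medium and small outliers on sparse trajectories.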
A random effects variance shift model for detecting and accommodating outliers in meta-analysis.
Gumedze, Freedom N; Jackson, Dan
2011-02-16
Meta-analysis typically involves combining the estimates from independent studies in order to estimate a parameter of interest across a population of studies. However, outliers often occur even under the random effects model. The presence of such outliers could substantially alter the conclusions in a meta-analysis. This paper proposes a methodology for identifying and, if desired, downweighting studies that do not appear representative of the population they are thought to represent under the random effects model. An outlier is taken as an observation (study result) with an inflated random effect variance. We used the likelihood ratio test statistic as an objective measure for determining whether observations have inflated variance and are therefore considered outliers. A parametric bootstrap procedure was used to obtain the sampling distribution of the likelihood ratio test statistics and to account for multiple testing. Our methods were applied to three illustrative and contrasting meta-analytic data sets. For the three meta-analytic data sets our methods gave robust inferences when the identified outliers were downweighted. The proposed methodology provides a means to identify and, if desired, downweight outliers in meta-analysis. It does not eliminate them from the analysis, however, and we consider the proposed approach preferable to simply removing any or all apparently outlying results. We do not, however, propose that our methods in any way replace or diminish the standard random effects methodology that has proved so useful; rather, they are helpful when used in conjunction with the random effects model.
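The standard random-effects machinery the variance shift model builds on is DerSimonian-Laird pooling. A minimal sketch of that baseline (not of the variance shift model or its LRT, which augment it by giving suspect studies an inflated random-effect variance); the effect sizes and variances below are hypothetical:

```python
# DerSimonian-Laird random-effects pooling of study effects (e.g. log RRs).
def dersimonian_laird(effects, variances):
    w = [1.0 / v for v in variances]
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)   # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return pooled, tau2

log_rr = [0.9, 1.1, 1.0, 1.05, 2.4]       # hypothetical study effects; last is aberrant
var = [0.04, 0.05, 0.03, 0.04, 0.05]
pooled, tau2 = dersimonian_laird(log_rr, var)
print(round(pooled, 2), round(tau2, 3))
```

Note how the single aberrant study inflates the between-study variance tau² for everyone; the variance shift model instead assigns that extra variance to the suspect study alone.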
Outlier Detection in GNSS Pseudo-Range/Doppler Measurements for Robust Localization.
Zair, Salim; Le Hégarat-Mascle, Sylvie; Seignez, Emmanuel
2016-04-22
In urban areas or space-constrained environments with obstacles, vehicle localization using Global Navigation Satellite System (GNSS) data is hindered by Non-Line Of Sight (NLOS) and multipath receptions. These phenomena induce faulty data that disrupt the precise localization of the GNSS receiver. In this study, we detect the outliers among the observations, Pseudo-Range (PR) and/or Doppler measurements, and we evaluate how discarding them improves the localization. We specify a contrario modeling for GNSS raw data to derive an algorithm that partitions the dataset between inliers and outliers. Then, only the inlier data are considered in the localization process performed either through a classical Particle Filter (PF) or a Rao-Blackwellization (RB) approach. Both localization algorithms exclusively use GNSS data, but they differ by the way Doppler measurements are processed. An experiment has been performed with a GPS receiver aboard a vehicle. Results show that the proposed algorithms are able to detect the 'outliers' in the raw data while being robust to non-Gaussian noise and to intermittent satellite blockage. We compare the performance results achieved either estimating only PR outliers or estimating both PR and Doppler outliers. The best localization is achieved using the RB approach coupled with PR-Doppler outlier estimation.
A kernel-based approach for detecting outliers of high-dimensional biological data.
Oh, Jung Hun; Gao, Jean
2009-04-29
In many cases biomedical data sets contain outliers that make it difficult to achieve reliable knowledge discovery. Data analysis without removing outliers could lead to wrong results and provide misleading information. We propose a new outlier detection method based on Kullback-Leibler (KL) divergence. The original concept of KL divergence was designed as a measure of distance between two distributions. Stemming from that, we extend it to biological sample outlier detection by forming sample sets composed of nearest neighbors. KL divergence is defined between two sample sets with and without the test sample. To handle the non-linearity of sample distribution, original data is mapped into a higher feature space. We address the singularity problem due to small sample size during KL divergence calculation. Kernel functions are applied to avoid direct use of mapping functions. The performance of the proposed method is demonstrated on a synthetic data set, two public microarray data sets, and a mass spectrometry data set for liver cancer study. Comparative studies with a Mahalanobis distance-based method and one-class support vector machine (SVM) are reported, showing that the proposed method performs better in finding outliers. Our idea was derived from the Markov blanket algorithm, a feature selection method based on KL divergence. That is, while the Markov blanket algorithm removes redundant and irrelevant features, our proposed method detects outliers. Compared to other algorithms, our proposed method shows better or comparable performance for small sample and high-dimensional biological data. This indicates that the proposed method can be used to detect outliers in biological data sets.
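The core idea, KL divergence between the neighbor set with and without the test sample, can be sketched in one dimension under a Gaussian assumption (the paper works in a kernel-induced feature space instead; this univariate simplification only illustrates the scoring principle):

```python
import math

# KL divergence between two 1-D Gaussians N(mu0, var0) || N(mu1, var1).
def gaussian_kl(mu0, var0, mu1, var1):
    return 0.5 * (var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1 + math.log(var1 / var0))

# Score a sample by how much its neighborhood's fitted Gaussian changes
# when the sample is included: outliers shift the distribution a lot.
def kl_outlier_score(sample, neighbors):
    def stats(xs):
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs) + 1e-9
        return mu, var
    mu0, var0 = stats(neighbors)
    mu1, var1 = stats(neighbors + [sample])
    return gaussian_kl(mu0, var0, mu1, var1)

neigh = [1.0, 1.1, 0.9, 1.05, 0.95]
print(kl_outlier_score(1.0, neigh), kl_outlier_score(5.0, neigh))
```

An inlying sample barely perturbs the fitted distribution, so its KL score is near zero; an outlier inflates the variance and shifts the mean, producing a large score.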
Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection
Fatemeh Azmandian; Ayse Yilmazer; Jennifer G Dy; Javed A Aslam; David R Kaeli
2014-01-01
Acquiring a set of features that emphasize the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In our work, we present a novel non-parametric evaluation criterion for filter-based feature selection which has an eye towards the final goal of outlier detection. The proposed method seeks the subset of features that represent the inherent characteristics of the normal dataset while forcing outliers to stand out, making them more easily distinguished by outlier detection algorithms. Experimental results on real datasets show the advantage of our feature selection algorithm compared with popular and state-of-the-art methods. We also show that the proposed algorithm is able to overcome the small sample space problem and perform well on highly imbalanced datasets. Furthermore, due to the highly parallelizable nature of the feature selection, we implement the algorithm on a graphics processing unit (GPU) to gain significant speedup over the serial version. The benefits of the GPU implementation are two-fold, as its performance scales very well in terms of the number of features, as well as the number of data points.
A one-class kernel fisher criterion for outlier detection.
Dufrenois, Franck
2015-05-01
Recently, Dufrenois and Noyer proposed a one class Fisher's linear discriminant to isolate normal data from outliers. In this paper, a kernelized version of their criterion is presented. Originally on the basis of an iterative optimization process, alternating between subspace selection and clustering, I show here that their criterion has an upper bound making these two problems independent. In particular, the estimation of the label vector is formulated as an unconstrained binary linear problem (UBLP) which can be solved using an iterative perturbation method. Once the label vector is estimated, an optimal projection subspace is obtained by solving a generalized eigenvalue problem. Like many other kernel methods, the performance of the proposed approach depends on the choice of the kernel. Constructed with a Gaussian kernel, I show that the proposed contrast measure is an efficient indicator for selecting an optimal kernel width. This property simplifies the model selection problem which is typically solved by costly (generalized) cross-validation procedures. Initialization, convergence analysis, and computational complexity are also discussed. Lastly, the proposed algorithm is compared with recent novelty detectors on synthetic and real data sets.
Outlier Detection Algorithm Based on Approximate Outlier Factor
谢岳山; 樊晓平; 廖志芳; 周国恩; 刘世杰
2013-01-01
Outlier detection algorithms based on clustering produce results that are coarse and insufficiently accurate. To address this problem, this paper proposes an outlier detection algorithm based on an Approximate Outlier Factor (AOF). The algorithm defines a similarity distance and an outlier similarity coefficient, and provides a pruning strategy based on the similarity distance that shrinks the set of suspect outlier candidates and lowers the computational complexity of outlier detection. Experiments on the public datasets Iris, Labor and Segment-test show that the algorithm is more effective at detecting outliers and reducing the candidate set than classical outlier detection algorithms.
Outlier detection in near-infrared spectroscopic analysis by using Monte Carlo cross-validation
LIU ZhiChao; CAI WenSheng; SHAO XueGuang
2008-01-01
An outlier detection method is proposed for near-infrared spectral analysis. The underlying philosophy of the method is that, in random test (Monte Carlo) cross-validation, the probability of outliers presenting in good models with smaller prediction residual error sum of squares (PRESS) or in bad models with larger PRESS should be obviously different from that of normal samples. The method first builds a large number of PLS models by using random test cross-validation; the models are then sorted by PRESS, and finally the outliers are recognized according to the accumulative probability of each sample in the sorted models. For validation of the proposed method, four data sets, including three published data sets and a large data set of tobacco lamina, were investigated. The proposed method proved to be highly efficient and accurate compared with the conventional leave-one-out (LOO) cross-validation method.
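The Monte Carlo cross-validation principle can be sketched with a simpler cousin of the paper's procedure: across many random train/test splits, collect each sample's out-of-sample prediction errors; outliers show persistently large errors. A 1-D least-squares line stands in for the paper's PLS models, and the accumulative-probability step is replaced by a per-sample median error:

```python
import random

# Ordinary least squares for y = a*x + b over (x, y) pairs.
def ls_line(pts):
    n = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

# Monte Carlo CV: many random splits; record each sample's prediction error
# whenever it falls in the test set, then summarize with the median.
def mccv_scores(data, n_splits=200, frac=0.7, seed=7):
    rng = random.Random(seed)
    errors = [[] for _ in data]
    for _ in range(n_splits):
        train = set(rng.sample(range(len(data)), int(frac * len(data))))
        a, b = ls_line([data[i] for i in train])
        for i in set(range(len(data))) - train:
            x, y = data[i]
            errors[i].append(abs(y - (a * x + b)))
    med = lambda xs: sorted(xs)[len(xs) // 2]
    return [med(e) if e else 0.0 for e in errors]

data = [(x, 2 * x + 1) for x in range(20)] + [(20, 100)]   # last point is off
scores = mccv_scores(data)
print(scores.index(max(scores)))
```

Averaging over many random splits is what makes the diagnosis stable: a normal sample occasionally lands in a badly fit split, but only a true outlier is mispredicted in essentially every split.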
AnyOut : Anytime Outlier Detection Approach for High-dimensional Data
Assent, Ira; Kranen, Philipp; Baldauf, Corinna;
2012-01-01
the problem of determining within any period of time whether an object in a data stream is anomalous. The more time is available, the more reliable the decision should be. We introduce AnyOut, an algorithm capable of solving anytime outlier detection, and investigate different approaches to build up...
Z.R. Struzik; A.P.J.M. Siebes (Arno)
2002-01-01
We present a method of detecting and localising outliers in financial time series and other stochastic processes. The method checks the internal consistency of the scaling behaviour of the process within the paradigm of the multifractal spectrum. Deviation from the expected spectrum is i...
Finding Fraud in Health Insurance Data with Two-Layer Outlier Detection Approach
Konijn, R.M.; Kowalczyk, W.
2011-01-01
Conventional techniques for detecting outliers address the problem of finding isolated observations that significantly differ from other observations that are stored in a database. For example, in the context of health insurance, one might be interested in finding unusual claims concerning
Santos, Jose O. dos, E-mail: osmansantos@ig.com.br [Instituto Federal de Educacao, Ciencia e Tecnologia de Sergipe (IFS), Lagarto, SE (Brazil); Munita, Casimiro S., E-mail: camunita@ipen.br [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil); Soares, Emilio A.A., E-mail: easoares@ufan.edu.br [Universidade Federal do Amazonas (UFAM), Manaus, AM (Brazil). Dept. de Geociencias
2013-07-01
The detection of outliers in geochemical studies is one of the main difficulties in the interpretation of a dataset, because outliers can disturb statistical methods. The search for outliers in geochemical studies is usually based on the Mahalanobis distance (MD), since points in multivariate space that lie at a distance larger than some predetermined value from the center of the data are considered outliers. However, the MD is very sensitive to the presence of discrepant samples. Many robust estimators for location and covariance have been introduced in the literature, such as the Minimum Covariance Determinant (MCD) estimator. Using MCD estimators to calculate the MD leads to the so-called Robust Mahalanobis Distance (RD). In this context, in this work the RD was used to detect outliers in a geological study of samples collected from the confluence of the Negro and Solimoes rivers. The purpose of this study was to examine the contributions of the sediments deposited by the Solimoes and Negro rivers to the filling of the tectonic depressions at Parana do Ariau. For that purpose, 113 samples were analyzed by Instrumental Neutron Activation Analysis (INAA), determining the concentrations of As, Ba, Ce, Co, Cr, Cs, Eu, Fe, Hf, K, La, Lu, Na, Nd, Rb, Sb, Sc, Sm, U, Yb, Ta, Tb, Th and Zn. From the dataset it was possible to construct the ellipse corresponding to the robust Mahalanobis distance for each group of samples. The samples found outside the tolerance ellipse were considered outliers. The results showed that the Robust Mahalanobis Distance was more appropriate for the identification of the outliers, since it is a more restrictive method. (author)
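The classical (non-robust) Mahalanobis distance the RD improves on can be computed by hand in two dimensions, where the covariance matrix inverts in closed form. This sketch shows only the classical MD; the paper's point is that replacing the mean and covariance below with MCD estimates gives the robust RD, since the classical estimates are themselves dragged toward the outliers:

```python
from statistics import mean

# Classical Mahalanobis distance for 2-D data, using the closed-form
# inverse of the 2x2 covariance matrix.
def mahalanobis2(data):
    mx = mean(x for x, _ in data); my = mean(y for _, y in data)
    sxx = mean((x - mx) ** 2 for x, _ in data)
    syy = mean((y - my) ** 2 for _, y in data)
    sxy = mean((x - mx) * (y - my) for x, y in data)
    det = sxx * syy - sxy * sxy
    out = []
    for x, y in data:
        dx, dy = x - mx, y - my
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        out.append(d2 ** 0.5)
    return out

pts = [(1, 2), (2, 4.1), (3, 5.9), (4, 8.2), (5, 9.9), (3, -6)]  # one stray point
d = mahalanobis2(pts)
print(d.index(max(d)))
```

With a single stray point the classical MD still flags it, but with several outliers the mean and covariance get masked; that masking effect is why the paper switches to MCD-based estimates.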
Alavi Majd, Hamid; Najafi Ghobadi, Khadijeh; Akbarzadeh Baghban, Alireza; Ahmadi, Nayebali; Sajjadi, Elham
2014-05-01
Meta-analysis is a statistical technique in which the results of two or more independent studies, with similar objectives, are mathematically combined in order to improve the reliability of the results. Outliers, which may exist even in random models, can affect the validity and strength of meta-analysis results. The current study uses the "random effects variance shift model" to evaluate and correct the outliers in a meta-analysis of the effect of albendazole in treating patients with Ascaris lumbricoides infection. The study used data from 14 clinical trials; each article was composed of two groups, a treatment group and a placebo group. These articles compared the effect of single dose intakes of 400 mg albendazole in treating two groups of patients with Ascaris lumbricoides infection. The articles were published in a number of internationally indexed journals between 1983 and 2013. For both groups in each article, the total number of participants, the number of those with Ascaris lumbricoides infection, and the number of those recovered following the intake of albendazole were identified and recorded. The relative risk (RR) and variance were computed for each article individually. Then, using meta-analysis, the RR was computed for all the articles together. In order to detect outliers, the "random effects variance shift model" and "likelihood ratio test" (LRT) were used. Adopting the bootstrap method, the accuracy rates for the sampling distribution of the tests, which were used for multiple testing, were obtained and the relevant graphs were depicted. For data analysis, STATA and R software were used. According to the meta-analysis results, the estimate for RR was 2.91, with a 95% confidence interval of 2.6 to 3.25. According to the method used in this study, three articles (articles number 4, 7, and 12) were outliers and, as such, they were detected in the graphs. We can detect and accommodate outliers in meta-analysis by using random effects variance shift models.
Di, Nur Faraidah Muhammad; Satari, Siti Zanariah
2017-05-01
Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose multiple outlier detection for circular regression models based on a clustering algorithm. Clustering techniques basically utilize a distance measure to define the distance between data points. Here, we introduce a similarity distance based on the Euclidean distance for the circular model and obtain a cluster tree using the single-linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify cluster groups that exceed the stopping rule as potential outliers. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods perform well and are applicable for circular regression models.
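A minimal sketch of the clustering step, under stated assumptions: the circular distance below is the standard shortest-arc metric, and a fixed cut height stands in for the paper's mean-direction and circular-standard-deviation stopping rule, which is not reproduced here.

```python
import math

def circ_dist(a, b):
    """Shortest arc between two angles in radians (range 0..pi)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def single_linkage(angles, cut):
    """Merge clusters whose closest members are nearer than `cut`."""
    clusters = [[i] for i in range(len(angles))]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(circ_dist(angles[a], angles[b])
                        for a in clusters[i] for b in clusters[j])
                if d < cut:
                    clusters[i] += clusters[j]
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters

# angles concentrated near 0, plus two points on the opposite side of the circle
angles = [0.05, 0.1, -0.08, 0.02, 0.12, math.pi, math.pi - 0.05]
clusters = single_linkage(angles, cut=0.5)
outliers = min(clusters, key=len)   # smallest cluster = potential outliers
print(sorted(outliers))             # → [5, 6]
```

The two near-antipodal angles form their own small cluster, which the size-based rule labels as the potential outlier group.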
Automated Parameters for Troubled-Cell Indicators Using Outlier Detection
Vuik, M.J.; Ryan, J.K.
2016-01-01
In Vuik and Ryan [J. Comput. Phys., 270 (2014), pp. 138--160] we studied the use of troubled-cell indicators for discontinuity detection in nonlinear hyperbolic partial differential equations and introduced a new multiwavelet technique to detect troubled cells. We found that these methods perform well…
Li, Weizhi; Mo, Weirong; Zhang, Xu; Squiers, John J.; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.
2015-12-01
Multispectral imaging (MSI) was implemented to develop a burn tissue classification device to assist burn surgeons in planning and performing debridement surgery. To build a classification model via machine learning, training data accurately representing the burn tissue was needed, but assigning raw MSI data to appropriate tissue classes is prone to error. We hypothesized that removing outliers from the training dataset would improve classification accuracy. A swine burn model was developed to build an MSI training database and study an algorithm's burn tissue classification abilities. After the ground-truth database was generated, we developed a multistage method based on Z-test and univariate analysis to detect and remove outliers from the training dataset. Using 10-fold cross validation, we compared the algorithm's accuracy when trained with and without the presence of outliers. The outlier detection and removal method reduced the variance of the training data. Test accuracy was improved from 63% to 76%, matching the accuracy of clinical judgment of expert burn surgeons, the current gold standard in burn injury assessment. Given that there are few surgeons and facilities specializing in burn care, this technology may improve the standard of burn care for patients without access to specialized facilities.
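The outlier-removal idea generalizes beyond MSI. The following is a hedged single-feature sketch, not the authors' multistage Z-test and univariate pipeline: the cutoff z_cut = 2 and the toy band intensities are assumptions.

```python
import statistics

def remove_outliers(values, z_cut=2.0):
    """Drop values more than z_cut standard deviations from the mean."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mu) <= z_cut * sd]

# one spectral-band intensity per training sample for a single tissue class;
# 0.95 plays the role of a mislabeled sample from another class
band = [0.51, 0.48, 0.50, 0.52, 0.49, 0.47, 0.53, 0.95]
clean = remove_outliers(band)
print(len(band), len(clean))                                     # → 8 7
print(statistics.pvariance(clean) < statistics.pvariance(band))  # → True
```

As in the paper, removing the outlier reduces the variance of the training feature, which is what gives the downstream classifier a cleaner class model.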
Micenková, Barbora; McWilliams, Brian; Assent, Ira
2014-01-01
Years of research in unsupervised outlier detection have produced numerous algorithms to score data according to their exceptionality. However, the nature of outliers heavily depends on the application context and different algorithms are sensitive to outliers of different nature. This makes it very…
An Illustration of New Methods in Machine Condition Monitoring, Part II: Adaptive outlier detection
Antoniadou, I.; Worden, K.; Marchesiello, S.; Mba, C.; Garibaldi, L.
2017-05-01
There have been many recent developments in the application of data-based methods to machine condition monitoring. A powerful methodology based on machine learning has emerged, where diagnostics are based on a two-step procedure: extraction of damage-sensitive features, followed by unsupervised learning (novelty detection) or supervised learning (classification). The objective of the current pair of papers is simply to illustrate one state-of-the-art procedure for each step, using synthetic data representative of reality in terms of size and complexity. The second paper in the pair will deal with novelty detection. Although there has been considerable progress in the use of outlier analysis for novelty detection, most of the papers produced so far have suffered from the fact that simple algorithms break down if multiple outliers are present or if damage is already present in a training set. The objective of the current paper is to illustrate the use of phase-space thresholding; an algorithm which has the ability to detect multiple outliers inclusively in a data set.
Rapid eye movement sleep behavior disorder as an outlier detection problem
Kempfner, Jacob; Sørensen, Gertrud Laura; Nikolic, M.
2014-01-01
…for quantitative methods to establish objective criteria. This study proposes a semiautomatic algorithm for the early detection of Parkinson's disease. This is achieved by distinguishing between normal REM sleep and REM sleep without atonia by considering muscle activity as an outlier detection problem. METHODS: Sixteen healthy control subjects, 16 subjects with idiopathic REM sleep behavior disorder, and 16 subjects with periodic limb movement disorder were enrolled. Different combinations of five surface electromyographic channels, including the EOG, were tested. A muscle activity score was automatically computed from manually scored REM sleep. This was accomplished by the use of subject-specific features combined with an outlier detector (one-class support vector machine classifier). RESULTS: It was possible to correctly separate idiopathic REM sleep behavior disorder subjects from healthy control subjects…
The Outlier Detection for Ordinal Data Using Scaling Technique of Regression Coefficients
Adnan, Arisman; Sugiarto, Sigit
2017-06-01
The aim of this study is to detect outliers by using the coefficients of Ordinal Logistic Regression (OLR) for the case of k category responses, where scores range from 1 (the best) to 8 (the worst). We detect them by using the sum of moduli of the ordinal regression coefficients calculated by the jackknife technique. This technique is improved by scaling the regression coefficients to their means. The R language has been used on a set of ordinal data from a reference distribution. Furthermore, we compare this approach by using studentised residual plots of the jackknife technique for ANOVA (Analysis of Variance) and OLR. This study shows that the jackknifing technique, along with the proper scaling, may lead us to reveal outliers in ordinal regression reasonably well.
Efficient estimation of dynamic density functions with an application to outlier detection
Qahtan, Abdulhakim Ali Ali
2012-01-01
In this paper, we propose a new method to estimate the dynamic density over data streams, named KDE-Track, as it is based on the conventional and widely used Kernel Density Estimation (KDE) method. KDE-Track can efficiently estimate the density with linear complexity by using interpolation on a kernel model, which is incrementally updated upon the arrival of streaming data. Both theoretical analysis and experimental validation show that KDE-Track outperforms traditional KDE and a baseline method, Cluster-Kernels, on estimation accuracy of complex density structures in data streams, computing time and memory usage. KDE-Track is also demonstrated to timely catch the dynamic density of synthetic and real-world data. In addition, KDE-Track is used to accurately detect outliers in sensor data and is compared with two existing methods developed for detecting outliers and cleaning sensor data. © 2012 ACM.
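The core density-based detection idea can be illustrated without the streaming machinery. A toy batch sketch follows; the bandwidth, threshold and data are assumptions, and KDE-Track's interpolation model and incremental updates are not reproduced.

```python
import math

def kde(x, sample, h):
    """Gaussian kernel density estimate at x from `sample` with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm

sample = [4.8, 5.0, 5.1, 4.9, 5.2, 5.0, 4.7, 5.3, 9.0]   # 9.0 sits far out
dens = [kde(x, sample, h=0.3) for x in sample]
threshold = 0.2 * max(dens)          # assumed cutoff: 20% of the peak density
outliers = [x for x, d in zip(sample, dens) if d < threshold]
print(outliers)  # → [9.0]
```

A point in a low-density region of the estimated distribution is flagged; a streaming estimator like KDE-Track makes the same test possible as the density drifts over time.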
REPPlab: An R package for detecting clusters and outliers using exploratory projection pursuit
Fischer, Daniel; Berro, Alain; Nordhausen, Klaus; Ruiz-Gazen, Anne
2016-01-01
The R package REPPlab is designed to explore multivariate data sets using one-dimensional unsupervised projection pursuit. It is useful in practice as a preprocessing step to find clusters or as an outlier detection tool for multivariate numerical data. Apart from the package tourr, which implements smooth sequences of projection matrices, and rggobi, which provides an interface to a dynamic graphics package called GGobi, there is no implementation of exploratory projection pursuit tools available…
Accuracy Evaluation of A Diagnostic Test by Detecting Outliers and Influential Observations
Hsien-Chueh Peter YANG; Tsung-Hao CHEN; Cheng-Wu CHEN; Chen-Yuan CHEN; Chun-Te LIU
2008-01-01
Logit regression analysis is widely applied in scientific studies and laboratory experiments, where skewed observations on a data set are often encountered. A number of problems with this method, for example, outliers and influential observations, can cause overdispersion when a model is fitted. In this study a systematic statistical approach, including the plotting of several indices is used to diagnose the lack-of-fit of a logistic regression model. The outliers and influential observations on data from laboratory experiments are then detected. Specifically we take account of the interaction of an internal solitary wave (ISW) with an obstacle, i.e., an underwater ridge, and also analyze the effects of the ridge height, the lower layer water depth, and the potential energy on the amplitude-based transmission rate of the ISW. As concluded, the goodness-of-fit of the revised logit regression model is better than that of the model without this approach.
Open-Source Radiation Exposure Extraction Engine (RE3) with Patient-Specific Outlier Detection.
Weisenthal, Samuel J; Folio, Les; Kovacs, William; Seff, Ari; Derderian, Vana; Summers, Ronald M; Yao, Jianhua
2016-08-01
We present an open-source, picture archiving and communication system (PACS)-integrated radiation exposure extraction engine (RE3) that provides study-, series-, and slice-specific data for automated monitoring of computed tomography (CT) radiation exposure. RE3 was built using open-source components and seamlessly integrates with the PACS. RE3 calculations of dose length product (DLP) from the Digital Imaging and Communications in Medicine (DICOM) headers showed high agreement (R² = 0.99) with the vendor dose pages. For study-specific outlier detection, RE3 constructs robust, automatically updating multivariable regression models to predict DLP in the context of patient gender and age, scan length, water-equivalent diameter (Dw), and scanned body volume (SBV). As proof of concept, the model was trained on 811 CT chest, abdomen + pelvis (CAP) exams and 29 outliers were detected. The continuous variables used in the outlier detection model were scan length (R² = 0.45), Dw (R² = 0.70), SBV (R² = 0.80), and age (R² = 0.01). The categorical variables were gender (male average 1182.7 ± 26.3 and female 1047.1 ± 26.9 mGy cm) and pediatric status (pediatric average 710.7 ± 73.6 mGy cm and adult 1134.5 ± 19.3 mGy cm).
Dugravot, Aline; Sabia, Severine; Shipley, Martin J.; Welch, Catherine; Kivimaki, Mika; Singh-Manoux, Archana
2015-01-01
Background: Participants' non-adherence to protocol affects data quality. In longitudinal studies, this leads to outliers that can be present at the level of the population or the individual. The purpose of the present study is to elaborate a method for detection of outliers in a study of cognitive ageing. Methods: In the Whitehall II study, data on a cognitive test battery have been collected in 1997-99, 2002-04, 2007-09 and 2012-13. Outliers at the 2012-13 wave were identified using a 4-step procedure: (1) identify cognitive tests with potential non-adherence to protocol, (2) choose a prediction model between a simple model with socio-demographic covariates and one that also includes health behaviours and health measures, (3) define an outlier using a studentized residual, and (4) study the impact of exclusion of outliers by estimating the effect of age and diabetes on cognitive decline. Results: 5516 participants provided cognitive data in 2012-13. Comparisons of rates of annual decline over the first three and all four waves of data suggested outliers in three of the 5 tests. Mean residuals for the 2012-13 wave were larger for the basic compared to the more complex prediction model (all p…). Residuals greater than two standard deviations of the residuals identified approximately 7% of observations as outliers. Removal of these observations from the analyses showed that both age and diabetes had associations with cognitive decline similar to those observed with the first three waves of data; these associations were weaker or absent in non-cleaned data. Conclusions: Identification of outliers is important as they obscure the effects of known risk factors and introduce bias in the estimates of cognitive decline. We showed that an informed approach, using the range of data collected in a longitudinal study, may be able to identify outliers. PMID:26161552
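Step (3) of the procedure can be sketched on invented data: fit a simple model, studentize the residuals and flag anything beyond two standard deviations. The single predictor (age) and the scores below are assumptions for illustration, not Whitehall II data.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

age = [50, 53, 56, 59, 62, 65, 68, 71, 74, 77]
score = [30.0, 29.5, 28.7, 28.3, 27.5, 27.1, 26.4, 25.9, 25.1, 15.0]  # last test aborted
a, b = fit_line(age, score)
resid = [y - (a + b * x) for x, y in zip(age, score)]
sd = statistics.stdev(resid)
flagged = [i for i, r in enumerate(resid) if abs(r / sd) > 2]
print(flagged)  # → [9]
```

The aborted test produces a residual far below the fitted age trend, so only that observation exceeds the two-standard-deviation rule.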
An optimized outlier detection algorithm for jury-based grading of engineering design projects
Thompson, Mary Kathryn; Espensen, Christina; Clemmensen, Line Katrine Harder
2016-01-01
This work characterizes and optimizes an outlier detection algorithm to identify potentially invalid scores produced by jury members while grading engineering design projects. The paper describes the original algorithm and the associated adjudication process in detail. The impact of the various conditions in the algorithm on the false positive and false negative rates is explored. A response surface design is performed to optimize the algorithm using a data set from Fall 2010. Finally, the results are tested against a data set from Fall 2011. It is shown that all elements of the original algorithm…, but no true optimum seems to exist. The performance of the best optimizations and the original algorithm are similar. Therefore, it should be possible to choose new coefficient values for jury populations in other cultures and contexts logically and empirically without a full optimization, as long…
Risk pre-warning of tender evaluation for civil projects: an outlier detection model
[Anonymous]
2008-01-01
The marking scheme method removes the low scores of the contractor's attributes given by experts when the overall score is calculated, which may result in a contractor with latent risks winning the project. In order to remedy this defect of the marking scheme method, an outlier detection model, which is one task of knowledge discovery in data, is established on the basis of the sum of similar coefficients. Then, the model is applied to the historical score data of tender evaluation for ci…
Md. S. Rana
2008-01-01
Full Text Available Problem statement: The problem of heteroscedasticity occurs in regression analysis for many practical reasons. It is now evident that the heteroscedastic problem affects both the estimation and test procedure of regression analysis, so it is really important to be able to detect this problem for possible remedy. The existence of a few extreme or unusual observations that we often call outliers is a very common feature in data analysis. In this study we have shown how the existence of outliers makes the detection of heteroscedasticity cumbersome. Often outliers occurring in a homoscedastic model make the model heteroscedastic, on the other hand, outliers may distort the diagnostic tools in such a way that we cannot correctly diagnose the heteroscedastic problem in the presence of outliers. Neither of these situations is desirable. Approach: This article introduced a robust test procedure to detect the problem of heteroscedasticity which will be unaffected in the presence of outliers. We have modified one of the most popular and commonly used tests, the Goldfeld-Quandt, by replacing its nonrobust components by robust alternatives. Results: The performance of the newly proposed test is investigated extensively by real data sets and Monte Carlo simulations. The results suggest that the robust version of this test offers substantial improvements over the existing tests. Conclusion/Recommendations: The proposed robust Goldfeld-Quandt test should be employed instead of the existing tests in order to avoid misleading conclusion.
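The flavour of robustifying a variance-ratio test can be shown on residuals directly. This is a hedged sketch, not the authors' modified Goldfeld-Quandt test: it merely swaps the usual standard deviation for the median absolute deviation (MAD) in the subgroup spread ratio; the data and the illustrative cutoff of 2 are assumptions.

```python
import statistics

def mad(xs):
    """Median absolute deviation from the median."""
    m = statistics.median(xs)
    return statistics.median([abs(x - m) for x in xs])

def gq_ratio(resid, spread):
    """Squared spread of the last third of residuals over the first third."""
    k = len(resid) // 3
    return spread(resid[-k:]) ** 2 / spread(resid[:k]) ** 2

# residual spread grows with the index (heteroscedasticity), but one gross
# outlier sits in the low-variance part and inflates its apparent spread
resid = [0.1, -0.2, 0.1, 5.0, -0.1, 0.3, -0.4, 0.5, -2.0, 2.5, -3.0, 3.5]
classic = gq_ratio(resid, statistics.stdev)
robust = gq_ratio(resid, mad)
print(classic < 2, robust > 2)  # → True True
```

The single outlier masks the heteroscedasticity from the stdev-based ratio, while the MAD-based ratio still exposes it, which is the kind of failure the robust Goldfeld-Quandt test is designed to avoid.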
A Modified Outlier Detection Method in Dynamic Data Reconciliation
Zhou, Lingke; Su, Hongye; Chu, Jian
2005-01-01
Data reconciliation technology can decrease the level of corruption of process data due to measurement noise, but the presence of outliers caused by process peaks or unmeasured disturbances will smear the reconciled results. Based on the analysis of limitation of conventional outlier detection algorithms, a modified outlier detection method in dynamic data reconciliation (DDR) is proposed in this paper. In the modified method, the outliers of each variable are distinguished individually and the weight is modified accordingly. Therefore, the modified method can use more information of normal data, and can efficiently decrease the effect of outliers. Simulation of a continuous stirred tank reactor (CSTR) process verifies the effectiveness of the proposed algorithm.
Explaining outliers by subspace separability
Micenková, Barbora; Ng, Raymond T.; Dang, Xuan-Hong
2013-01-01
Outliers are extraordinary objects in a data collection. Depending on the domain, they may represent errors, fraudulent activities or rare events that are the subject of our interest. Existing approaches focus on detection of outliers or degrees of outlierness (ranking), but do not provide a possible explanation of how these objects deviate from the rest of the data. Such explanations would help the user to interpret or validate the detected outliers. The problem addressed in this paper is as follows: given an outlier detected by an existing algorithm, we propose a method that determines possible explanations for the outlier. These explanations are expressed in the form of subspaces in which the given outlier shows separability from the inliers. In this manner, our proposed method complements existing outlier detection algorithms by providing additional information about the outliers. Our method is designed to work…
K. K. L. B. Adikaram
2015-01-01
Full Text Available Grubbs' test (extreme studentized deviate test, maximum normed residual test) is used in various fields to identify outliers in a data set, which are ranked in the order x1 ≤ x2 ≤ x3 ≤ ⋯ ≤ xn (i = 1, 2, 3, …, n). However, ranking of data eliminates the actual sequence of a data series, which is an important factor for determining outliers in some cases (e.g., time series). Thus, in such a data set, Grubbs' test will not identify outliers correctly. This paper introduces a technique for transforming data from sequence-bound linear form to sequence-unbound form (y = c). Applying Grubbs' test to the transformed data set detects outliers more accurately. In addition, the new technique improves the outlier detection capability of Grubbs' test. Results show that Grubbs' test was capable of identifying outliers at significance level 0.01 after transformation, while it was unable to identify them prior to transformation even at significance level 0.05.
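The transformation idea can be sketched as follows. The series is illustrative, detrending by a least-squares line stands in for the paper's sequence-bound-to-unbound mapping, and the critical value of Grubbs' test (which requires Student's t quantiles) is not computed here.

```python
import statistics

def grubbs_stat(xs):
    """G = max |x - mean| / s, the two-sided Grubbs statistic."""
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    return max(abs(x - mu) for x in xs) / sd

def detrend(xs):
    """Subtract the least-squares line fitted against the index."""
    n = len(xs)
    t = list(range(n))
    mt, mx = sum(t) / n, sum(xs) / n
    b = (sum((ti - mt) * (xi - mx) for ti, xi in zip(t, xs))
         / sum((ti - mt) ** 2 for ti in t))
    return [xi - (mx + b * (ti - mt)) for ti, xi in zip(t, xs)]

# steadily rising series with one value well off the trend line (index 4)
series = [10, 12, 14, 16, 25, 20, 22, 24, 26, 28]
print(round(grubbs_stat(series), 2), round(grubbs_stat(detrend(series)), 2))
# the statistic grows sharply after detrending, exposing the off-trend value
```

On the raw series the trend itself dominates the spread, so the off-trend value is masked; after the transformation the residual of index 4 stands alone, which mirrors the paper's significance-level comparison.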
Revisiting Baarda's concept of minimal detectable bias with regard to outlier identifiability
Prószyński, W.
2015-10-01
The concept of minimal detectable bias (MDB) as initiated by Baarda (Publ Geod New Ser 2(5), 1968) and later developed by Wang and Chen (Acta Geodaet et Cartograph Sin Engl Edn 42-51, 1994), Schaffrin (J Eng Surv 123:126-137, 1997), Teunissen (IEEE Aerosp Electron Syst Mag 5(7):35-41, 1990, J Geod 72:236-244 1998, Testing theory: an introduction. Delft University Press, Delft, 2000) and others, refers to the issue of outlier detectability. A supplementation of the concept is proposed for the case of correlated observations contaminated with a single gross error. The supplementation consists mainly of an outlier identifiability index assigned to each individual observation in a network and a mis-identifiability index being the maximum probability of identifying a wrong observation. To those indices there can also be added the MDB multiplying factor to increase the identifiability index to a satisfactory level. As auxiliary measures there are indices of partial identifiability concerning pairs of observations. The indices were derived assuming the generalized outlier identification procedure as in Knight et al. (J Geod. doi: 10.1007/s00190-010-0392-4, 2010), which, with one outlier case being assumed, is similar to Baarda's w-test (Baarda in Publ Geod New Ser 2(5), 1968). The following two options of identifiability indices and partial identifiability indices are distinguished: I. the indices related to identification of a contaminated observation within a set of observations suspected of containing a gross error (identifiability), II. the indices related to identification of a contaminated observation within a whole set of observations (pseudo-identifiability). To characterize the proposed approach in the context of existing solutions on a similar topic, namely separability testing, the properties of both types of identifiability indices are discussed with reference to the concept of Minimal Separable Bias (Wang and Knight in J Glob Position Syst 11(1):46-57, 2012).
A comparison of four different methods for outlier detection in bioequivalence studies.
Ramsay, Timothy; Elkum, Naser
2005-01-01
Bioequivalence studies, required by law whenever a new formulation of an existing drug product is introduced to the market, are designed to test whether the bioavailability, defined as the rate and extent to which a substance reaches systemic circulation, is equivalent for each of two or more formulations. Detection and treatment of outlying data in bioequivalence studies are practically important, because inclusion or deletion of potential outlying data may lead to a different conclusion concerning bioequivalence. A review of the literature reveals that four different methods have been proposed for detecting outliers in bioavailability/bioequivalence studies. We present the results of an extensive computer simulation testing the small sample performance of these four testing methods, the results of which indicate that one of these, the estimates distance test, is substantially more powerful than the alternatives.
McGinnis, Ralph E.; Deloukas, Panos; McLaren, William M.; Inouye, Michael
2010-01-01
We describe a novel approach for evaluating SNP genotypes of a genome-wide association scan to identify "ethnic outlier" subjects whose ethnicity is different or admixed compared to most other subjects in the genotyped sample set. Each ethnic outlier is detected by counting a genomic excess of "rare" heterozygotes and/or homozygotes whose frequencies are low (…). Detection of outliers is enhanced by evaluating only genomic regions of visualized admixture rather than diluting outlier ancestry by evaluating the entire genome considered in aggregate. We have validated our method in the Wellcome Trust Case Control Consortium (WTCCC) study of 17,000 subjects as well as in HapMap subjects and simulated outliers of known ethnicity and admixture. The method's ability to precisely delineate chromosomal segments of non-Caucasian ethnicity has enabled us to demonstrate previously unreported non-Caucasian admixture in two HapMap Caucasian parents and in a number of WTCCC subjects. Its sensitive detection of ethnic outliers and simple visual discrimination of discrete chromosomal segments of different ethnicity implies that this method of rare heterozygotes and homozygotes (RHH) is likely to have diverse and important applications in humans and other species. PMID:20211853
Martinez, Rafael; Rodriguez, Francisco de Borja; Camacho, David
2007-01-01
The main contribution of this paper is to design an Information Retrieval (IR) technique based on Algorithmic Information Theory (using the Normalized Compression Distance, NCD), statistical techniques (outliers), and a novel organization of the database structure. The paper shows how these can be integrated to retrieve information from generic databases using long (text-based) queries. Two important problems are analyzed in the paper. On the one hand, how to detect "false positives", where the distance among documents is very low yet there is no actual similarity. On the other hand, we propose a way to structure a document database in which similarity distance estimation depends on the length of the selected text. Finally, the experimental evaluations carried out to study these problems are shown.
A simple method to choose the most representative stride and detect outliers.
Sangeux, Morgan; Polak, Julia
2015-02-01
Kinematic data for gait analysis consist of joint angle curves plotted against the percentage of the gait cycle. A typical gait analysis entails repeated measurement of the kinematic data. We present an automatic and computationally inexpensive method to choose the most representative curve and detect outliers amongst repeated curves. The method is based on the notion of depth, where the deepest curve is the equivalent of the median for univariate data. The method applies to a single kinematic variable or to multiple kinematic variables such as the gait profile. It is sensitive to both the shape and position of the curves. A comparison with an existing statistical method is presented, as well as an example on one patient's data. Copyright © 2014 Elsevier B.V. All rights reserved.
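A simple proxy for the depth idea (not the authors' definition of curve depth): rank repeated curves by their total distance to all the others; the most central curve minimizes this total and the candidate outlier maximizes it. The synthetic knee-angle strides below are assumptions.

```python
import math

def dist(c1, c2):
    """Root-mean-square difference between two sampled curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)) / len(c1))

def rank_curves(curves):
    """Return (index of most central curve, index of most outlying curve)."""
    totals = [sum(dist(c, d) for d in curves) for c in curves]
    order = sorted(range(len(curves)), key=totals.__getitem__)
    return order[0], order[-1]

# five strides sampled at 0..100% of the gait cycle (11 points each);
# stride 3 is offset in position, as after a marker-placement error
t = [i / 10 for i in range(11)]
strides = [[60 * math.sin(math.pi * x) + off for x in t]
           for off in (0.0, 1.0, -1.0, 15.0, 0.5)]
rep, out = rank_curves(strides)
print(rep, out)  # → 4 3
```

Because the RMS distance responds to both shape and position, the offset stride is the farthest from every other stride, while the stride closest to the ensemble is chosen as representative.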
EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers.
Chiner-Oms, Alvaro; González-Candelas, Fernando
2016-01-01
We present EvalMSA, a software tool for evaluating and detecting outliers in multiple sequence alignments (MSAs). This tool allows the identification of divergent sequences in MSAs by scoring the contribution of each row in the alignment to its quality using a sum-of-pairs-based method and additional analyses. Our main goal is to provide users with objective data in order to take informed decisions about the relevance and/or pertinence of including/retaining a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R. Therefore, it is necessary to install the R-base package in order to get full functionality. Binary packages are freely available from http://sourceforge.net/projects/evalmsa/ for Linux and Windows.
Quasar lenses and galactic streams: outlier selection and Gaia multiplet detection
Agnello, Adriano
2017-10-01
I describe two novel techniques originally devised to select strongly lensed quasar candidates in wide-field surveys. The first relies on outlier selection in optical and mid-infrared magnitude space; the second combines mid-infrared colour selection with Gaia spatial resolution, to identify multiplets of objects with quasar-like colours. Both methods have already been applied successfully to the Sloan Digital Sky Survey, ATLAS and Dark Energy Survey footprints: besides recovering known lenses from previous searches, they have led to new discoveries, including quadruply lensed quasars, which are rare within the rare-object class of quasar lenses. As a serendipitous by-product, at least four candidate Galactic streams in the South have been identified among foreground contaminants. There is considerable scope for tailoring the WISE-Gaia multiplet search to stellar-like objects, instead of quasar-like, and to automatically detect Galactic streams.
A new algorithm for automatic Outlier Detection in GPS Time Series
Cannavo', Flavio; Mattia, Mario; Rossi, Massimo; Palano, Mimmo; Bruno, Valentina
2010-05-01
Nowadays continuous GPS time series are considered a crucial product of GPS permanent networks, useful in many geoscience fields such as active tectonics, seismology, crustal deformation and volcano monitoring (Altamimi et al. 2002, Elósegui et al. 2006, Aloisi et al. 2009). Although GPS data elaboration software has increased in reliability, the time series are still affected by different kinds of noise, from intrinsic noise (e.g. tropospheric delay) to un-modeled noise (e.g. cycle slips, satellite faults, parameter changes). Typically, GPS time series present characteristic noise that is a linear combination of white noise and correlated colored noise; this characteristic is fractal in the sense that it is evident at every considered time scale or sampling rate. The un-modeled noise sources result in spikes, outliers and steps. These kinds of errors can appreciably influence the estimation of velocities of the monitored sites. Outlier detection in generic time series is a widely treated problem in the literature (Wei, 2005), but it is not fully developed for this specific kind of GPS series. We propose a robust automatic procedure for cleaning GPS time series of outliers and, especially for long daily series, of steps due to strong seismic or volcanic events or merely to instrumentation changes such as antenna and receiver upgrades. The procedure is divided into two steps: a first step for colored noise reduction and a second step for outlier detection through adaptive series segmentation. Both algorithms present novel ideas and are nearly unsupervised. In particular, we propose an algorithm that estimates an autoregressive model for the colored noise in GPS time series in order to subtract the effect of non-Gaussian noise from the series. This step is useful for the subsequent step (i.e. adaptive segmentation), which requires the hypothesis of Gaussian noise. The proposed algorithms are tested in a benchmark case study and the results
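The first step above, colored-noise reduction via an autoregressive model, can be sketched in miniature. An AR(1) fit via the lag-1 autocorrelation is an illustrative simplification; the paper's actual estimator and model order are not reproduced here.

```python
# Minimal sketch of autoregressive whitening: fit an AR(1) model to the series
# and subtract its one-step prediction, leaving residuals that are closer to
# white (Gaussian) noise, as required by the subsequent segmentation step.

def ar1_whiten(x):
    """Return (phi, residuals) where e[t] = x[t] - phi * x[t-1], mean-removed."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    num = sum(d[t] * d[t - 1] for t in range(1, n))  # lag-1 autocovariance
    den = sum(v * v for v in d)                      # lag-0 autocovariance
    phi = num / den if den else 0.0
    return phi, [d[t] - phi * d[t - 1] for t in range(1, n)]

series = [0.0, 0.9, 1.7, 2.4, 3.0, 3.5, 3.9]  # slowly drifting (correlated) signal
phi, resid = ar1_whiten(series)
```

The residuals, rather than the raw series, are then screened for outliers.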
A Finite Mixture Method for Outlier Detection and Robustness in Meta-Analysis
Beath, Ken J.
2014-01-01
When performing a meta-analysis unexplained variation above that predicted by within study variation is usually modeled by a random effect. However, in some cases, this is not sufficient to explain all the variation because of outlier or unusual studies. A previously described method is to define an outlier as a study requiring a higher random…
LOSITAN: A workbench to detect molecular adaptation based on a Fst-outlier method
Beja-Pereira Albano
2008-07-01
Full Text Available Abstract Background Testing for selection is becoming one of the most important steps in the analysis of multilocus population genetics data sets. Existing applications are difficult to use, leaving many non-trivial, error-prone tasks to the user. Results Here we present LOSITAN, a selection detection workbench based on a well evaluated Fst-outlier detection method. LOSITAN greatly facilitates correct approximation of model parameters (e.g. genome-wide average, neutral Fst), provides data import and export functions, iterative contour smoothing and generation of graphics in an easy-to-use graphical user interface. LOSITAN is able to exploit modern multi-core processor architectures by locally parallelizing fdist, halving computation time on current dual-core machines, with almost linear performance gains on machines with more cores. Conclusion LOSITAN makes selection detection feasible for a much wider range of users, even for large population genomic datasets, by providing both an easy-to-use interface and the essential functionality to complete the whole selection detection process.
Krumpe, Tanja; Walter, Carina; Rosenstiel, Wolfgang; Spüler, Martin
2016-08-01
Objective. In this study, the feasibility of detecting a P300 via an asynchronous classification mode in a reactive EEG-based brain-computer interface (BCI) was evaluated. The P300 is one of the most popular BCI control signals and therefore used in many applications, mostly for active communication purposes (e.g. P300 speller). As the majority of all systems work with a stimulus-locked mode of classification (synchronous), the field of applications is limited. A new approach needs to be applied in a setting in which a stimulus-locked classification cannot be used due to the fact that the presented stimuli cannot be controlled or predicted by the system. Approach. A continuous observation task requiring the detection of outliers was implemented to test such an approach. The study was divided into an offline and an online part. Main results. Both parts of the study revealed that an asynchronous detection of the P300 can successfully be used to detect single events with high specificity. It also revealed that no significant difference in performance was found between the synchronous and the asynchronous approach. Significance. The results encourage the use of an asynchronous classification approach in suitable applications without a potential loss in performance.
Explaining outliers by subspace separability
Micenková, Barbora; Ng, Raymond T.; Dang, Xuan-Hong;
2013-01-01
Outliers are extraordinary objects in a data collection. Depending on the domain, they may represent errors, fraudulent activities or rare events that are subject of our interest. Existing approaches focus on detection of outliers or degrees of outlierness (ranking), but do not provide a possible...
Detection of Outliers and Imputing of Missing Values for Water Quality UV-VIS Absorbance Time Series
Leonardo Plazas-Nossa
2017-01-01
Full Text Available Context: The UV-Vis absorbance data collected using online optical sensors for water quality detection may contain outliers and/or missing values. Therefore, data pre-processing is a necessary prerequisite to monitoring data processing. The aim of this study is thus to propose a method that detects and removes outliers as well as fills gaps in time series. Method: Outliers are detected using a Winsorising procedure, and the Discrete Fourier Transform (DFT) and the Inverse Fast Fourier Transform (IFFT) are applied to complete the time series. Together, these tools were used to analyse a case study comprising three sites in Colombia: (i) Bogotá D.C. Salitre-WWTP (Waste Water Treatment Plant) influent; (ii) Bogotá D.C. Gibraltar Pumping Station (GPS); and (iii) Itagüí, San Fernando-WWTP influent (Medellín metropolitan area), analysed via UV-Vis (ultraviolet and visible) spectra. Results: Outlier detection with the proposed method obtained promising results when window parameter values are small and self-similar, even though the three time series exhibited different sizes and behaviours. The DFT made it possible to process gaps of different lengths containing missing values. To assess the validity of the proposed method, continuous subsets (sections of the absorbance time series without outliers or missing values) were removed from the original time series, yielding an average 12% error rate in the three testing time series. Conclusions: The application of the DFT and the IFFT, using the 10% most important harmonics of useful values, can be applied in different settings, specifically to time series of water quality and quantity in urban sewer systems. One potential application would be the analysis of dry weather flows as opposed to rain events, achieved by detecting values that correspond to unusual behaviour in a time series. Additionally, the result hints at the potential of the method for correcting other hydrologic time series.
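The DFT-based gap-filling step can be sketched as follows. The top-10% harmonic retention follows the abstract; the mean-value first guess for the gaps and the naive O(n²) DFT are assumptions made for clarity (a real implementation would use an FFT).

```python
import cmath, math

# Sketch of DFT gap filling: fill missing values with a first guess, take the
# DFT, keep only the largest harmonics (the paper keeps the top 10%), and
# reconstruct the series via the inverse transform.

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def fill_gaps(series, keep_frac=0.10):
    known = [v for v in series if v is not None]
    guess = sum(known) / len(known)                  # first guess for the gaps
    x = [guess if v is None else v for v in series]
    X = dft(x)
    k = max(1, int(keep_frac * len(X)))
    top = set(sorted(range(len(X)), key=lambda i: -abs(X[i]))[:k])
    X = [c if i in top else 0 for i, c in enumerate(X)]  # keep top harmonics
    rec = idft(X)
    return [rec[t] if series[t] is None else series[t] for t in range(len(series))]

series = [math.sin(2 * math.pi * t / 20) for t in range(40)]
series[7] = None                                     # introduce a gap
filled = fill_gaps(series, keep_frac=0.10)
```

Known values are passed through untouched; only the gaps receive the harmonic reconstruction.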
A Bayesian outlier criterion to detect SNPs under selection in large data sets.
Mathieu Gautier
Full Text Available BACKGROUND: The recent advent of high-throughput SNP genotyping technologies has opened new avenues of research for population genetics. In particular, a growing interest in the identification of footprints of selection, based on genome scans for adaptive differentiation, has emerged. METHODOLOGY/PRINCIPAL FINDINGS: The purpose of this study is to develop an efficient model-based approach to perform Bayesian exploratory analyses for adaptive differentiation in very large SNP data sets. The basic idea is to start with a very simple model for neutral loci that is easy to implement under a Bayesian framework and to identify selected loci as outliers via Posterior Predictive P-values (PPP-values). Applications of this strategy are considered using two different statistical models. The first assumes populations evolving under pure genetic drift from a common ancestral population, while the second relies on populations under migration-drift equilibrium. The robustness and power of the two resulting Bayesian model-based approaches to detect SNPs under selection are further evaluated through extensive simulations. An application to a cattle data set is also provided. CONCLUSIONS/SIGNIFICANCE: The procedure described turns out to be much faster than former Bayesian approaches and also reasonably efficient, especially at detecting loci under positive selection.
Díaz Muñiz, C; García Nieto, P J; Alonso Fernández, J R; Martínez Torres, J; Taboada, J
2012-11-15
Water quality controls involve a large number of variables and observations, often subject to some outliers. An outlier is an observation that is numerically distant from the rest of the data or that appears to deviate markedly from other members of the sample in which it occurs. An interesting analysis is to find those observations that produce measurements different from the pattern established in the sample. Identification of atypical observations is therefore an important concern in water quality monitoring, and a difficult task because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using oxygen and turbidity as indicator variables. Until now, methods have been based on considering the different parameters as a vector whose components were their concentration values. Our approach lies in considering water quality monitoring through time as curves instead of vectors; that is to say, the data set is treated as a time-dependent function rather than as a set of discrete values at different time instants. The methodology, which is based on the concept of functional depth, was applied to the detection of outliers in water quality monitoring samples in the San Esteban estuary. Results were discussed in terms of origin, causes, etc., and compared with those obtained using the conventional method based on vector comparison. Finally, the advantages of the functional method are presented. Copyright © 2012 Elsevier B.V. All rights reserved.
Li, Ke; Ye, Chuyang; Yang, Zhen; Carass, Aaron; Ying, Sarah H.; Prince, Jerry L.
2016-03-01
Cerebellar peduncles (CPs) are white matter tracts connecting the cerebellum to other brain regions. Automatic segmentation methods of the CPs have been proposed for studying their structure and function. Usually the performance of these methods is evaluated by comparing segmentation results with manual delineations (ground truth). However, when a segmentation method is run on new data (for which no ground truth exists) it is highly desirable to efficiently detect and assess algorithm failures so that these cases can be excluded from scientific analysis. In this work, two outlier detection methods aimed to assess the performance of an automatic CP segmentation algorithm are presented. The first one is a univariate non-parametric method using a box-whisker plot. We first categorize automatic segmentation results of a dataset of diffusion tensor imaging (DTI) scans from 48 subjects as either a success or a failure. We then design three groups of features from the image data of nine categorized failures for failure detection. Results show that most of these features can efficiently detect the true failures. The second method—supervised classification—was employed on a larger DTI dataset of 249 manually categorized subjects. Four classifiers—linear discriminant analysis (LDA), logistic regression (LR), support vector machine (SVM), and random forest classification (RFC)—were trained using the designed features and evaluated using a leave-one-out cross validation. Results show that the LR performs worst among the four classifiers and the other three perform comparably, which demonstrates the feasibility of automatically detecting segmentation failures using classification methods.
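The first, univariate approach above can be sketched with a standard box-whisker (IQR) fence. The feature values below are made up for illustration; the paper derives its features from the DTI image data.

```python
# Sketch of box-whisker failure screening: flag a segmentation run when a
# quality feature falls outside the fences Q1 - 1.5*IQR or Q3 + 1.5*IQR.

def quartiles(values):
    s = sorted(values)
    n = len(s)
    def q(p):
        idx = p * (n - 1)
        lo, hi = int(idx), min(int(idx) + 1, n - 1)
        frac = idx - lo
        return s[lo] * (1 - frac) + s[hi] * frac
    return q(0.25), q(0.75)

def iqr_outliers(values, k=1.5):
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lo or v > hi]

feature = [0.91, 0.93, 0.90, 0.94, 0.92, 0.31, 0.95]  # one failed segmentation
print(iqr_outliers(feature))  # → [5]
```

Flagged indices would then be reviewed and excluded from scientific analysis, as the abstract describes.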
Mixture Based Outlier Filtration
P. Pecherková
2006-01-01
Full Text Available Success/failure of adaptive control algorithms – especially those designed using the Linear Quadratic Gaussian criterion – depends on the quality of the process data used for model identification. One of the most harmful types of process data corruption is outliers, i.e. ‘wrong data’ lying far away from the range of real data. The presence of outliers in the data negatively affects the estimation of the dynamics of the system. This effect is magnified when the outliers are grouped into blocks. In this paper, we propose an algorithm for outlier detection and removal. It is based on modelling the corrupted data by a two-component probabilistic mixture. The first component of the mixture models uncorrupted process data, while the second models outliers. When the outlier component is detected to be active, a prediction from the uncorrupted data component is computed and used as a reconstruction of the observed data. The resulting reconstruction filter is compared to standard methods on simulated and real data. The filter exhibits excellent properties, especially in the case of blocks of outliers.
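The two-component idea can be sketched as follows. The fixed component parameters and the "last accepted value" prediction are illustrative assumptions; the paper estimates the mixture online and predicts from the uncorrupted-data component's model.

```python
import math

# Sketch of two-component mixture filtration: model clean data with a narrow
# Gaussian and outliers with a broad one, flag points where the outlier
# component is more likely, and reconstruct them from the clean component's
# prediction (here simply the last accepted value).

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def filter_outliers(data, mu, sigma_clean, sigma_out):
    cleaned, last = [], mu
    for x in data:
        if normal_pdf(x, mu, sigma_out) > normal_pdf(x, mu, sigma_clean):
            cleaned.append(last)          # outlier component active: reconstruct
        else:
            cleaned.append(x)             # clean component active: keep
            last = x
    return cleaned

data = [5.1, 4.9, 5.2, 42.0, 41.5, 5.0]  # a block of two outliers
print(filter_outliers(data, mu=5.0, sigma_clean=0.5, sigma_out=20.0))
```

Note how the block of two consecutive outliers is reconstructed from the last clean observation, the case the paper highlights.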
Motulsky, Harvey J; Brown, Ronald E
2006-03-09
Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method falsely detects one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives.
Motulsky Harvey J
2006-03-01
Full Text Available Abstract Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method falsely detects one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives.
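The two-stage ROUT idea can be illustrated on a straight-line fit. The Lorentzian-weighted iteratively reweighted least squares, the fixed scale s, and the simple residual cutoff below are stand-ins for the paper's adaptive scheme and FDR-based outlier test.

```python
# Illustrative sketch of the ROUT-style pipeline: (1) a robust fit assuming
# Lorentzian-distributed scatter, via IRLS with weights 1/(1 + (r/s)^2);
# (2) points with large robust residuals are flagged and the remainder is
# refit by ordinary least squares.

def wls_line(xs, ys, ws):
    """Weighted least-squares slope and intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    b = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / \
        sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    return b, my - b * mx

def rout_line(xs, ys, s=1.0, cutoff=3.0, iters=20):
    ws = [1.0] * len(xs)
    for _ in range(iters):
        b, a = wls_line(xs, ys, ws)
        ws = [1.0 / (1.0 + ((y - (a + b * x)) / s) ** 2) for x, y in zip(xs, ys)]
    keep = [i for i, (x, y) in enumerate(zip(xs, ys))
            if abs(y - (a + b * x)) < cutoff * s]
    out = [i for i in range(len(xs)) if i not in keep]
    b, a = wls_line([xs[i] for i in keep], [ys[i] for i in keep], [1.0] * len(keep))
    return b, a, out

xs = [0, 1, 2, 3, 4, 5]
ys = [0.1, 1.0, 2.1, 2.9, 30.0, 5.1]   # point at x=4 is an outlier
slope, intercept, outliers = rout_line(xs, ys)
print(outliers)  # → [4]
```

The final least-squares fit is then essentially unaffected by the contaminated point.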
Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.
2013-01-01
The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
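For context, the standard single-low-outlier Grubbs-Beck screen that the paper generalizes can be sketched as follows. The 10%-level critical value uses the polynomial approximation commonly cited from the Bulletin 17B guidelines; the generalized test of the paper iterates a related statistic over multiple candidate low flows.

```python
import math, statistics

# Sketch of the classic Grubbs-Beck low-outlier screen on annual peak flows:
# work in log10 space, compute the 10%-level critical value K10, and flag
# flows whose log falls below mean - K10 * stdev.

def grubbs_beck_low_outliers(flows):
    logs = [math.log10(q) for q in flows]
    n = len(logs)
    # Bulletin 17B approximation of the 10%-significance critical value
    k10 = -0.9043 + 3.345 * math.sqrt(math.log10(n)) - 0.4046 * math.log10(n)
    threshold = statistics.mean(logs) - k10 * statistics.stdev(logs)
    return [q for q in flows if math.log10(q) < threshold]

flows = [820, 950, 1100, 1300, 1500, 1700, 2100, 2600, 3, 1200]  # one tiny flow
print(grubbs_beck_low_outliers(flows))  # → [3]
```

Flows flagged this way would be recoded as "less-than" values before fitting the frequency distribution, as the abstract describes.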
El Azami, Meriem; Hammers, Alexander; Jung, Julien; Costes, Nicolas; Bouet, Romain; Lartizien, Carole
2016-01-01
Pattern recognition methods, such as computer aided diagnosis (CAD) systems, can help clinicians in their diagnosis by marking abnormal regions in an image. We propose a machine learning system based on a one-class support vector machine (OC-SVM) classifier for the detection of abnormalities in magnetic resonance images (MRI) applied to patients with intractable epilepsy. The system learns the features associated with healthy control subjects, allowing a voxelwise assessment of the deviation of a test subject pattern from the learned patterns. While any number of various features can be chosen and learned, here we focus on two texture parameters capturing image patterns associated with epileptogenic lesions on T1-weighted brain MRI, e.g. heterotopia and a blurred junction between the grey and white matter. The CAD output consists of patient specific 3D maps locating clusters of suspicious voxels ranked by size and degree of deviation from control patterns. System performance was evaluated using realistic simulations of challenging detection tasks as well as clinical data of 77 healthy control subjects and of eleven patients (13 lesions). It was compared to that of a mass univariate statistical parametric mapping (SPM) single subject analysis based on the same set of features. For all simulations, OC-SVM yielded significantly higher values of the area under the ROC curve (AUC) and higher sensitivity at low false positive rate. For the clinical data, both OC-SVM and SPM successfully detected 100% of the lesions in the MRI positive cases (3/13). For the MRI negative cases (10/13), OC-SVM detected 7/10 lesions and SPM analysis detected 5/10 lesions. In all experiments, OC-SVM produced fewer false positive detections than SPM. OC-SVM may be a versatile system for unbiased lesion detection. PMID:27603778
Aggarwal, Charu C
2013-01-01
With the increasing advances in hardware technology for data collection, and advances in software technology (databases) for data organization, computer scientists have increasingly participated in the latest advancements of the outlier analysis field. Computer scientists, specifically, approach this field based on their practical experiences in managing large amounts of data, and with far fewer assumptions: the data can be of any type, structured or unstructured, and may be extremely large. Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians and
Bekkevold, Dorte; Gross, Riho; Arula, Timo
2016-01-01
analyses reveal clear genetic differences between ecotypes and hence support reproductive isolation. Loci showing non-neutral behaviour, so-called outlier loci, show convergence between autumn spawning herring from demographically disjoint populations, potentially reflecting selective processes associated...... with autumn spawning ecotypes. The abundance and exploitation of the two ecotypes have varied strongly over space and time in the Baltic Sea, where autumn spawners have faced strong depression for decades. The results therefore have practical implications by highlighting the need for specific management...
Butt, D.M., E-mail: Dennis.Butt@forces.gc.ca [Royal Military College of Canada, Dept. of Chemistry and Chemical Engineering, Kingston, Ontario (Canada); Underhill, P.R.; Krause, T.W., E-mail: Thomas.Krause@rmc.ca [Royal Military College of Canada, Dept. of Physics, Kingston, Ontario (Canada)
2016-09-15
Ageing aircraft are susceptible to fatigue cracks at bolt hole locations in multi-layer aluminum wing lap-joints due to the cyclic loading conditions experienced during typical aircraft operation. Current inspection techniques require removal of fasteners to permit inspection of the second layer from within the bolt hole. Inspection from the top layer without fastener removal is desirable in order to minimize aircraft downtime while reducing the risk of collateral damage. The ability to detect second layer cracks without fastener removal has been demonstrated using a pulsed eddy current (PEC) technique. The technique utilizes a breakdown of the measured signal response into its principal components, each of which is multiplied by a representative factor known as a score. The reduced data set of scores, which represents the measured signal, is examined for outliers using cluster analysis methods in order to detect the presence of defects. However, the cluster analysis methodology is limited by the fact that a number of representative signals, obtained from fasteners where defects are not present, are required in order to perform classification of the data. Alternatively, blind outlier detection can be achieved without representative defect-free signals by using a modified smallest half-volume (MSHV) approach. Results obtained using this approach suggest that self-calibrating blind detection of cyclic fatigue cracks in second layer wing structures in the presence of ferrous fasteners is possible without prior knowledge of the sample under test and without the use of costly calibration standards. (author)
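The score-based idea can be sketched in miniature. Power iteration stands in for a full principal-component decomposition, and a median/MAD rule stands in for the paper's cluster-analysis and MSHV machinery; the signal vectors are made up for illustration.

```python
# Sketch of blind score-based outlier detection: project each measured signal
# onto the leading principal component and flag scores far from the bulk.

def first_pc_scores(signals, iters=200):
    n = len(signals[0])
    mean = [sum(s[i] for s in signals) / len(signals) for i in range(n)]
    centered = [[s[i] - mean[i] for i in range(n)] for s in signals]
    v = [1.0] * n
    for _ in range(iters):
        # power iteration: w = C v, computed without forming C explicitly
        proj = [sum(r[i] * v[i] for i in range(n)) for r in centered]
        w = [sum(p * r[i] for p, r in zip(proj, centered)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(r[i] * v[i] for i in range(n)) for r in centered]

def flag_outliers(scores, k=3.5):
    med = sorted(scores)[len(scores) // 2]
    mad = sorted(abs(s - med) for s in scores)[len(scores) // 2]
    return [i for i, s in enumerate(scores) if abs(s - med) > k * (mad or 1e-12)]

signals = [[1.0, 2.0, 1.0], [1.1, 2.1, 1.0], [0.9, 1.9, 1.1],
           [1.0, 2.0, 0.9], [5.0, 9.0, 4.0]]   # last signal: a defect
scores = first_pc_scores(signals)
print(flag_outliers(scores))  # → [4]
```

The appeal, as the abstract notes, is that no defect-free reference signals are needed: the bulk of the scores defines "normal" by itself.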
Patching rainfall data using regression methods. 3. Grouping, patching and outlier detection
Pegram, Geoffrey
1997-11-01
Rainfall data are used, amongst other things, for augmenting or repairing streamflow records in a water resources analysis environment. Gaps in rainfall records cause problems in the construction of water-balance models using monthly time-steps, when it becomes necessary to estimate missing values. Modest extensions are sometimes also desirable. It is also important to identify outliers as possible erroneous data and to group data which are hydrologically similar in order to accomplish good patching. Algorithms are described which accomplish these tasks using the covariance biplot, multiple linear regression, singular value decomposition and the pseudo-Expectation-Maximization algorithm.
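The patching step can be sketched with a single neighbour and ordinary linear regression (the paper uses multiple linear regression over a hydrologically similar group, with grouping and outlier screening done beforehand). Station totals below are illustrative.

```python
# Sketch of regression patching: fit the target station's monthly totals
# against a neighbour using months where both are present, then estimate the
# target's missing months from the fitted line.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

def patch(target, neighbour):
    pairs = [(nb, t) for t, nb in zip(target, neighbour) if t is not None]
    b, a = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return [a + b * nb if t is None else t for t, nb in zip(target, neighbour)]

target    = [80, 120, None, 60, 150, None]   # monthly totals with two gaps
neighbour = [78, 118, 95, 61, 149, 40]
patched = patch(target, neighbour)
print(patched)
```

Observed months pass through unchanged; only the gaps are estimated.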
Using Innovative Outliers to Detect Discrete Shifts in Dynamics in Group-Based State-Space Models
Chow, Sy-Miin; Hamaker, Ellen L.; Allaire, Jason C.
2009-01-01
Outliers are typically regarded as data anomalies that should be discarded. However, dynamic or "innovative" outliers can be appropriately utilized to capture unusual but substantively meaningful shifts in a system's dynamics. We extend De Jong and Penzer's 1998 approach for representing outliers in single-subject state-space models to a…
Wang, Rui; Li, Chao; Wang, Jie; Wei, Xiaoer; Li, Yuehua; Hui, Chun; Zhu, Yuemin; Zhang, Su
2014-12-01
White matter lesions (WMLs) are commonly observed on the magnetic resonance (MR) images of normal elderly in association with vascular risk factors, such as hypertension or stroke. An accurate WML detection provides significant information for disease tracking, therapy evaluation, and normal aging research. In this article, we present an unsupervised WML segmentation method that uses Gaussian mixture model to describe the intensity distribution of the normal brain tissues and detects the WMLs as outliers to the normal brain tissue model based on extreme value theory. The detection of WMLs is performed by comparing the probability distribution function of a one-sided normal distribution and a Gumbel distribution, which is a specific extreme value distribution. The performance of the automatic segmentation is validated on synthetic and clinical MR images with regard to different imaging sequences and lesion loads. Results indicate that the segmentation method has a favorable accuracy competitive with other state-of-the-art WML segmentation methods.
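The detection rule described above can be sketched as a density comparison. The distribution parameters below are illustrative; the paper fits them to the normal-tissue intensity model.

```python
import math

# Sketch of extreme-value WML detection: treat an intensity as a lesion
# candidate when the Gumbel (extreme value) model explains it better than the
# one-sided normal tail of the normal-tissue model.

def halfnormal_pdf(x, sigma):
    """One-sided normal density for x >= 0."""
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-x * x / (2 * sigma ** 2))

def gumbel_pdf(x, mu, beta):
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta

def is_outlier(x, sigma=1.0, mu=3.0, beta=0.5):
    return gumbel_pdf(x, mu, beta) > halfnormal_pdf(x, sigma)

print(is_outlier(0.5))  # typical intensity → False
print(is_outlier(4.0))  # extreme intensity → True
```

Intuitively, the Gumbel density dominates only far out in the tail, which is exactly where lesion intensities sit relative to normal tissue.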
Petersen, D.; Naveed, P.; Ragheb, A.; Niedieker, D.; El-Mashtoly, S. F.; Brechmann, T.; Kötting, C.; Schmiegel, W. H.; Freier, E.; Pox, C.; Gerwert, K.
2017-06-01
Endoscopy plays a major role in the early recognition of cancers that are not externally accessible, and therewith in increasing the survival rate. Raman spectroscopic fiber-optical approaches can help to decrease the impact on the patient, increase objectivity in tissue characterization, reduce expenses and provide a significant time advantage in endoscopy. In gastroenterology, early recognition of malignant and precursor lesions is relevant. Instantaneous and precise differentiation between adenomas as precursor lesions for cancer and hyperplastic polyps on the one hand, and between high- and low-risk alterations on the other, is important. Raman fiber-optical measurements of colon biopsy samples taken during colonoscopy were carried out during a clinical study, and samples of adenocarcinoma (22), tubular adenomas (141), hyperplastic polyps (79) and normal tissue (101) from 151 patients were analyzed. This allows us to focus on the bioinformatic analysis and to set the stage for Raman endoscopic measurements. Since spectral differences between normal and cancerous biopsy samples are small, special care has to be taken in data analysis. Using a leave-one-patient-out cross-validation scheme, three different outlier identification methods were investigated to decrease the influence of systematic errors, such as a residual risk of misplacement of the sample and spectral dilution of marker bands (esp. in cancerous tissue), and therewith to optimize the experimental design. Furthermore, other validation methods, such as leave-one-sample-out and leave-one-spectrum-out cross-validation schemes, were compared with leave-one-patient-out cross-validation. High-risk lesions were differentiated from low-risk lesions with a sensitivity of 79%, specificity of 74% and an accuracy of 77%; cancer and normal tissue with a sensitivity of 79%, specificity of 83% and an accuracy of 81%. The additionally applied outlier identification enabled us to improve the recognition of neoplastic biopsy samples.
Petersen, D; Naveed, P; Ragheb, A; Niedieker, D; El-Mashtoly, S F; Brechmann, T; Kötting, C; Schmiegel, W H; Freier, E; Pox, C; Gerwert, K
2017-06-15
Endoscopy plays a major role in the early recognition of cancers that are not externally accessible, and therewith in increasing the survival rate. Raman spectroscopic fiber-optical approaches can help to decrease the impact on the patient, increase objectivity in tissue characterization, reduce expenses and provide a significant time advantage in endoscopy. In gastroenterology, early recognition of malignant and precursor lesions is relevant. Instantaneous and precise differentiation between adenomas as precursor lesions for cancer and hyperplastic polyps on the one hand, and between high- and low-risk alterations on the other, is important. Raman fiber-optical measurements of colon biopsy samples taken during colonoscopy were carried out during a clinical study, and samples of adenocarcinoma (22), tubular adenomas (141), hyperplastic polyps (79) and normal tissue (101) from 151 patients were analyzed. This allows us to focus on the bioinformatic analysis and to set the stage for Raman endoscopic measurements. Since spectral differences between normal and cancerous biopsy samples are small, special care has to be taken in data analysis. Using a leave-one-patient-out cross-validation scheme, three different outlier identification methods were investigated to decrease the influence of systematic errors, such as a residual risk of misplacement of the sample and spectral dilution of marker bands (esp. in cancerous tissue), and therewith to optimize the experimental design. Furthermore, other validation methods, such as leave-one-sample-out and leave-one-spectrum-out cross-validation schemes, were compared with leave-one-patient-out cross-validation. High-risk lesions were differentiated from low-risk lesions with a sensitivity of 79%, specificity of 74% and an accuracy of 77%; cancer and normal tissue with a sensitivity of 79%, specificity of 83% and an accuracy of 81%. The additionally applied outlier identification enabled us to improve the recognition of neoplastic biopsy samples. Copyright
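The leave-one-patient-out scheme discussed above can be sketched as a fold generator: all spectra from one patient form the test fold, so spectra from the same patient never appear in both training and test sets. Sample records below are illustrative.

```python
# Sketch of leave-one-patient-out cross-validation splitting.

def leave_one_patient_out(samples):
    """samples: list of (patient_id, spectrum) pairs; yields (pid, train, test)."""
    patients = sorted({pid for pid, _ in samples})
    for held_out in patients:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

samples = [("p1", [0.1]), ("p1", [0.2]), ("p2", [0.3]), ("p3", [0.4])]
for pid, train, test in leave_one_patient_out(samples):
    assert not any(s[0] == pid for s in train)   # no patient leakage
```

Compared with leave-one-spectrum-out splitting, this prevents the optimistic bias that arises when near-identical spectra from one patient land on both sides of the split.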
Rough K-means Outlier Factor Based on Entropy Computation
Djoko Budiyanto Setyohadi
2014-07-01
Full Text Available Many studies of outlier detection have been developed based on the cluster-based outlier detection approach, since it does not need any prior knowledge of the dataset. However, previous studies only regard the outlier factor computation with respect to a single point or a small cluster, which reflects its deviation from a common cluster. Furthermore, all objects within an outlier cluster are assumed to be similar. Intuitively, outlier objects can be grouped into outlier clusters, and the outlier factors of the objects within an outlier cluster should differ gradually; it is not natural for the outlierness of each object within an outlier cluster to be similar. This study proposes a new outlier detection method based on a hybrid of the Rough K-Means clustering algorithm and entropy computation. We introduce an outlier degree measure, namely the entropy outlier factor, for cluster-based outlier detection. The proposed algorithm sequentially finds the outlier cluster and calculates the outlier factor degree of the objects within it. Each object within the outlier cluster is evaluated using cluster-based entropy with respect to the whole cluster. The performance of the algorithm has been tested on four UCI benchmark data sets and shows superior results, especially in detection rate.
Penalized unsupervised learning with outliers.
Witten, Daniela M
2013-01-01
We consider the problem of performing unsupervised learning in the presence of outliers - that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an "error" term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations' errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored.
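The error-term idea described above can be sketched as follows. This is an illustrative reconstruction, not Witten's published algorithm: the deterministic initialization, the value of `lam`, and the update order are simplifying assumptions. Each observation gets an error row; group soft-thresholding shrinks most rows exactly to zero, and the rows that survive mark the outliers.

```python
import numpy as np

def robust_kmeans(X, k, lam, n_iter=50):
    """K-means with a per-observation error matrix E penalized by a group
    lasso, so most rows of E are shrunk exactly to zero; rows left nonzero
    mark the observations treated as outliers."""
    n, p = X.shape
    E = np.zeros((n, p))
    # deterministic spread-out initialization (an assumption for this sketch)
    centers = X[np.linspace(0, n - 1, k, dtype=int)].astype(float).copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        Xc = X - E                                # "cleaned" data
        d = ((Xc[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                      # assignment step
        for j in range(k):                        # center update
            if (labels == j).any():
                centers[j] = Xc[labels == j].mean(0)
        R = X - centers[labels]                   # raw residuals
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # group soft-thresholding: small residual rows collapse to zero
        E = R * np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    outliers = np.linalg.norm(E, axis=1) > 0
    return labels, centers, outliers
```

Because the centers are computed from the cleaned data `X - E`, a gross outlier no longer drags its cluster mean toward itself, which is exactly the failure mode of plain K-means that the abstract describes.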
Lydia Hopp
2013-12-01
Full Text Available We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large-scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data of more than 200 mature aggressive B-cell lymphoma patients. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques and enables investigation of the similarity relations between the samples. The method also allows outliers caused by contamination to be detected and corrected. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes characterized by differential functional and clinical characteristics.
Changhao Piao
2015-01-01
Full Text Available A novel cell-balancing algorithm for the battery management system (BMS) of a lithium-ion battery pack is proposed in this paper; cell balancing is a key technology for lithium-ion battery packs in the electric vehicle field. The distance-based outlier detection algorithm uses two characteristic parameters (voltage and state of charge) to calculate each cell's abnormality value and then identifies the unbalanced cells. Abnormal and normal battery cells were distinguished by an online clustering strategy, and bleeding circuits (R = 33 ohm) were used to balance the abnormal cells. Simulation results showed that with the proposed balancing algorithm, the usable capacity of the battery pack increased by 0.614 Ah (9.5%) compared to that without balancing.
An Outlier Detection Algorithm Based on Similarity Measurement
孙启林; 方宏彬; 张健; 刘明术
2012-01-01
Outlier detection is an important topic in data mining and is widely used in fields such as credit card fraud detection and network intrusion detection. Combining hierarchical clustering and similarity, this paper presents a similarity measurement function for high-dimensional data and the concept of class density; based on class density, the outlier of high-dimensional data is redefined, and an outlier detection algorithm based on similarity measurement is proposed. Experiments show that this algorithm has value for outlier detection in high-dimensional data.
Detecting Outliers in Marathon Data by Means of the Andrews Plot
Stehlík, Milan; Wald, Helmut; Bielik, Viktor; Petrovič, Juraj
2011-09-01
For an optimal race performance it is important that the runner keeps a steady pace during most of the competition. First-time runners or athletes without much competition experience often suffer a "blow-out" after a few kilometers of the race. This can happen because of strong emotional experiences or poor control of running intensity. The half-marathon competition pace of middle-level recreational athletes is approximately 10 s quicker than their training pace. If an athlete runs the first third of the race (7 km) at a pace 20 s quicker than his capacity (trainability) allows, he will experience a blow-out in the last third of the race, reflected in reduced running intensity, an inability to keep a steady pace over the last kilometers, and in the final time as well. In sports science there are many diagnostic methods ([3], [2], [6]) used to predict optimal race pace and final time, but practical evidence for these methods in the field (competition, race) is lacking. One of the conditions that must be met is that the athletes not only have similar final times but also keep as constant a pace as possible during the whole race. For this reason it is very important to find outliers. Our experimental group consisted of 20 recreationally trained athletes (mean age 32.6 ± 8.9 years). Before the race the athletes were instructed to run on the basis of their subjective feeling and previous experience. The data (running pace, average and maximal heart rate for each kilometer) were collected with the GPS-enabled personal trainer Forerunner 305.
Outlier Rejecting Multirate Model for State Estimation
无
2006-01-01
The wavelet transform was introduced to detect and eliminate outliers in the time-frequency domain. By incorporating outlier rejection and multirate information extraction through the wavelet transform, a new outlier-rejecting multirate model for state estimation is proposed. The model is applied to state estimation with interacting multiple models; as outliers are eliminated and more reasonable multirate information is extracted, the estimation accuracy is greatly enhanced. Simulation results show that the new model is robust to outliers and that estimation performance is significantly improved.
Community Outlier Detection Based on Overlapping Modularity
封海岳; 薛安荣
2013-01-01
A community outlier is a special kind of outlier mined by combining the community characteristics of the data with the attributes of the objects themselves. Existing community outlier detection algorithms ignore the overlap between communities, which results in inaccurate community division. To address this problem, we put forward a community outlier detection method that introduces the intrinsic attributes of objects into the computation of similarity and overlapping modularity. First, nodes are clustered according to the similarity between them; clustering is then iterated according to the change in overlapping modularity, and after several rounds the partition with the largest overlapping modularity is selected as the division result. Community outliers are finally determined according to the deviation of their intrinsic attributes, thereby solving the problem of detecting community outliers in overlapping communities. Experimental results show that the proposed algorithm not only discovers overlapping communities accurately but also detects community outliers effectively.
Pham, Ninh Dang; Pagh, Rasmus
2012-01-01
Outlier mining in d-dimensional point sets is a fundamental and well-studied data mining task due to its variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments on concepts of distance or nearest neighbor are deteriorated in high-dimensional data. Following up on the work of Kriegel et al. (KDD '08), we investigate the use of the angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Also, our approach is suitable to be performed in a parallel environment to achieve a parallel speedup. We introduce a theoretical analysis of the quality...
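The cubic-time baseline that this work accelerates can be sketched directly from Kriegel et al.'s definition: for each point, the angle-based outlier factor (ABOF) is the weighted variance over all point pairs of the inner-product term below; the near-linear random-projection estimator of the abstract is not reproduced here.

```python
import numpy as np

def abof(X):
    """Naive cubic-time angle-based outlier factor (Kriegel et al., KDD '08):
    for point p, the weighted variance over pairs (a, b) of
    <pa, pb> / (|pa|^2 |pb|^2), with pair weights 1 / (|pa| |pb|).
    A small score means the angles to all other point pairs barely vary,
    which is the hallmark of a point lying outside the data."""
    n = len(X)
    scores = np.empty(n)
    for i in range(n):
        diffs = np.delete(X, i, axis=0) - X[i]
        sq = (diffs ** 2).sum(1)                 # squared norms |pa|^2
        vals, wts = [], []
        for a in range(len(diffs)):
            for b in range(a + 1, len(diffs)):
                vals.append(diffs[a] @ diffs[b] / (sq[a] * sq[b]))
                wts.append(1.0 / np.sqrt(sq[a] * sq[b]))
        vals, wts = np.array(vals), np.array(wts)
        mean = np.average(vals, weights=wts)
        scores[i] = np.average((vals - mean) ** 2, weights=wts)
    return scores
```

On a small cloud with one distant point, the distant point receives the lowest score, since all its difference vectors point in nearly the same direction.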
Implementation of Network Intrusion Detection System Based on Density-based Outliers Mining
Huang,Guangqiu; Peng,Xuyou; LV,Dingquan
2005-01-01
The paper puts forward a new method of density-based anomaly data mining, which is used to design the engine of a network intrusion detection system (NIDS); a new NIDS is then constructed around this engine. The NIDS can find new, unknown intrusion behaviors, which are used to update the intrusion rule base, against which online intrusion detection is carried out with the BM pattern-matching algorithm. Finally, all modules of the NIDS are described in a formalized language.
Outlier-based Health Insurance Fraud Detection for U.S. Medicaid Data
Thornton, Dallas; van Capelleveen, Guido; Poel, Mannes; van Hillegersberg, Jos; Mueller, Roland
Fraud, waste, and abuse in the U.S. healthcare system are estimated at $700 billion annually. Predictive analytics offers government and private payers the opportunity to identify and prevent or recover such billings. This paper proposes a data-driven method for fraud detection based on comparative
Damage Detection in an Operating Vestas V27 Wind Turbine Blade by use of Outlier Analysis
Ulriksen, Martin Dalgaard; Tcherniak, Dmitri; Damkilde, Lars
2015-01-01
The present paper explores the application of a well-established vibration-based damage detection method to an operating Vestas V27 wind turbine blade. The blade is analyzed in a total of four states, namely, a healthy one plus three damaged ones in which trailing edge openings of increasing sizes...
Outlier Detection of Single Sensor Based on Sliding Windows
龙滢; 裘晓峰
2014-01-01
Data quality is a major challenge for the Web of Things (WoT); detecting outliers in WoT data both improves data quality and mines the underlying information. In small-scale scenarios such as the smart home, spatial correlation among the data is severely lacking, so only temporal correlation can be used for outlier detection on single-sensor data. This paper presents a distance-based outlier detection algorithm for sliding windows: time complexity is reduced by processing only the instance entering and the instance leaving the window, and space complexity is reduced by storing only the k neighbors of each instance. In addition, based on definitions of local and global outliers within sliding windows, the paper designs the outlier detection workflow. The algorithm was simulated on real data from a smart-home demo scenario, using the detection rate (DR) and false alarm rate (FR) as indices to analyze how the parameters affect the results. The simulation results show that the algorithm achieves good detection performance: local outlier detection guarantees a high DR, and global outlier detection guarantees a low FR.
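The window-based scheme above can be sketched minimally as follows. This is a naive illustration, not the paper's algorithm: it re-scans the window rather than storing each instance's k neighbors, does not separate local from global outliers, and flags early values because the window starts empty; w, k and r are illustrative defaults.

```python
from collections import deque

def sliding_window_outliers(stream, w=10, k=3, r=2.0):
    """Distance-based outlier flagging over a sliding window: a new value
    is an outlier if fewer than k of the last w values lie within
    distance r of it."""
    window = deque(maxlen=w)   # oldest value drops out automatically
    flags = []
    for x in stream:
        neighbors = sum(1 for y in window if abs(y - x) <= r)
        flags.append(neighbors < k)
        window.append(x)
    return flags
```

Because only the entering value is scored and the deque evicts the leaving value automatically, per-item cost is O(w) here; the paper's neighbor-storage trick reduces this further.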
Study of Scientific Frontier Detection Based on Outliers
张英杰; 冷伏海
2011-01-01
In view of the low-frequency characteristics of emerging topics in keyword-cluster-based scientific frontier detection, the outliers outside high-frequency keyword clusters were taken as the research targets, and a Web of Science (WoS) dataset on International Space Station (ISS) research from 2006 to 2010 was selected for the study. After constructing matrixes of the inclusion index, proximity index and equivalence index, outliers were detected using a cluster-based selection algorithm in the fields of human research, gravitational biology, space station technology, gravitational physics, earth observation, space astronomy, and so on. Hot research frontiers were distinguished by comparing this result with the high-frequency burst-word detection of CiteSpace II. Outliers such as mars and space weather matched the burst-word results and represent current hot research frontiers, which validates outlier-based frontier detection as an effective way to identify hot research topics on the ISS. Finally, the clustering outliers were visualized with Matlab for a clear view of their evolution.
Hierarchical Outlier Detection for Point Cloud Data Using a Density Analysis Method
朱俊锋; 胡翔云; 张祖勋; 熊小东
2015-01-01
Laser scanning and image matching are both effective ways to obtain dense point cloud data; however, outliers are inevitable in both. A novel hierarchical outlier detection method is proposed for the automatic outlier detection of point clouds from image matching and airborne laser scanning. The method has two main steps: first, hierarchical density estimation is used to remove single and small-cluster outliers; then a progressive TIN method is used to recover non-outliers removed in the previous step. Experimental results indicate the effectiveness of the method on both types of point cloud data, including low-quality point clouds from image matching. Quantitative analysis shows an outlier detection rate higher than 97%.
A Note on the Vogelsang Test for Additive Outliers
Haldrup, Niels; Sansó, Andreu
The role of additive outliers in integrated time series has attracted some attention recently, and research shows that outlier detection should be an integral part of unit root testing procedures. Recently, Vogelsang (1999) suggested an iterative procedure for the detection of multiple additive outliers... to be taken to detect outliers in nonstationary time series...
Qiu, Yunping; Moir, Robyn; Willis, Ian; Beecher, Chris; Tsai, Yu-Hsuan; Garrett, Timothy J; Yost, Richard A; Kurland, Irwin J
2016-03-01
Isotopic ratio outlier analysis (IROA) is a (13)C metabolomics profiling method that eliminates sample-to-sample variance, discriminates against noise and artifacts, and improves compound identification; it was previously performed with accurate-mass liquid chromatography/mass spectrometry (LC/MS). This is the first report using IROA technology in combination with accurate-mass gas chromatography/time-of-flight mass spectrometry (GC/TOF-MS), here used to examine the S. cerevisiae metabolome. S. cerevisiae was grown in YNB media containing randomized 95% (13)C or 5% (13)C glucose as the sole carbon source, so that the isotopomer pattern of all metabolites would mirror the labeled glucose. When these IROA experiments are combined, the abundance of the heavy isotopologues in the 5% (13)C extracts, or of the light isotopologues in the 95% (13)C extracts, follows the binomial distribution, showing mirrored peak pairs for the molecular ion. The mass difference between the (12)C monoisotopic and the (13)C monoisotopic peaks equals the number of carbons in the molecule. The IROA-GC/MS protocol developed here, using both chemical and electron ionization, extends the information acquired from the isotopic peak patterns for formula generation; the process can be formulated as an algorithm in which the number of carbons, as well as the numbers of methoximations and silylations, are used as search constraints. In electron impact (EI/IROA) spectra, artifactual peaks are identified and easily removed, which has the potential to generate "clean" EI libraries. The combination of chemical ionization (CI) IROA and EI/IROA affords a metabolite identification procedure that enables the identification of coeluting metabolites, and allowed us to characterize 126 metabolites in the current study.
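The carbon-counting rule stated above (peak-pair spacing equals the number of carbons) reduces to one line of arithmetic; the helper name below is illustrative, not part of any IROA software.

```python
# Each 12C -> 13C substitution adds the 13C-12C mass difference (~1.003355 Da),
# so the spacing of an IROA mirrored peak pair counts the carbons directly.
C13_C12 = 1.003355  # Da

def carbon_count(m_light, m_heavy):
    """Number of carbons from an IROA peak pair: m_light is the all-12C
    monoisotopic mass, m_heavy the all-13C monoisotopic mass."""
    return round((m_heavy - m_light) / C13_C12)
```

For glucose (C6H12O6, monoisotopic mass 180.0634 Da), the fully labeled partner sits near 186.0835 Da and the function returns 6.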
Piñeiro Di Blasi, J I; Martínez Torres, J; García Nieto, P J; Alonso Fernández, J R; Díaz Muñiz, C; Taboada, J
2015-01-01
The purpose of the authorities in establishing water quality standards is to enhance water quality and prevent pollution, protecting public health and welfare in accordance with the public interest: drinking water supplies; conservation of fish, wildlife and other beneficial aquatic life; agricultural, industrial, recreational and other reasonable and necessary uses; and maintaining and improving the biological integrity of the waters. Water quality controls therefore involve a large number of variables and observations, often subject to outliers. An outlier is an observation that is numerically distant from the rest of the data or that appears to deviate markedly from other members of the sample in which it occurs; an interesting analysis is to find those observations whose measurements differ from the pattern established in the sample. Identification of atypical observations is thus an important concern in water quality monitoring, and a difficult task because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using turbidity, conductivity and ammonium ion as indicator variables. Until now, methods treated the different parameters as a vector whose components were their concentration values. The innovation of this approach lies in considering water quality monitoring over time as continuous curves instead of discrete points; that is, the dataset is treated as a time-dependent function rather than as a set of discrete values at different time instants. This new methodology, based on the concept of functional depth, was applied successfully to the detection of outliers in water quality monitoring samples in the Nalón river basin. Results are discussed in terms of origin, causes, etc. Finally, the conclusions as well as advantages of ...
de Vienne, Damien M; Ollier, Sébastien; Aguileta, Gabriela
2012-06-01
Full genome data sets are currently being explored on a regular basis to infer phylogenetic trees, but there are often discordances among the trees produced by different genes. An important goal in phylogenomics is to identify which individual genes and species produce the same phylogenetic tree and are thus likely to share the same evolutionary history. On the other hand, it is also essential to identify which genes and species produce discordant topologies and therefore evolve in a different way or represent noise in the data. The latter are outlier genes or species, and they can provide a wealth of information on potentially interesting biological processes, such as incomplete lineage sorting, hybridization, and horizontal gene transfer. Here, we propose a new method to explore the genomic tree space and detect outlier genes and species based on multiple co-inertia analysis (MCOA), which efficiently captures and compares the similarities in the phylogenetic topologies produced by individual genes. Our method allows the rapid identification of outlier genes and species by extracting the similarities and discrepancies, in terms of pairwise distances, between all the species in all the trees simultaneously. This is achieved by using MCOA, which finds successive decomposition axes from individual ordinations (i.e., derived from distance matrices) that maximize a covariance function. The method is freely available as a set of R functions. The source code and tutorial can be found online at http://phylomcoa.cgenomics.org.
Kracht, Oliver; Reuter, Hannes I.; Gerboles, Michel
2013-04-01
We present a consolidated screening tool for the detection of outliers in air quality monitoring data, which considers both attribute values and spatio-temporal relationships, together with an application example of warnings on abnormal values in time series of PM10 datasets in AirBase. Spatial or temporal outliers in air quality datasets represent stations or individual measurements which differ significantly from other recordings within their spatio-temporal neighbourhood. Such abnormal values can be identified as being extreme compared to their neighbours, even though they do not necessarily differ significantly from the statistical distribution of the entire population. The identification of such outliers can be of interest as the basis of data quality control systems when several contributors report their measurements to the collection of larger datasets. Beyond this, it can also provide a simple solution to investigate the accuracy of station classifications. Seen from another viewpoint, it can be used as a tool to detect irregular air pollution emission events (e.g. the influence of fires, wind erosion events, or other accidental situations). The presented procedure for outlier detection was designed based on existing literature. Specifically, we adapted the "Smooth Spatial Attribute Method" that was first developed for the identification of outlier values in networks of traffic sensors [1]. Since a free and extensible simulation platform was considered important, all codes were prototyped in the R environment, which is available under the GNU General Public License [2]. Our algorithms are based on the definition of a neighbourhood for each air quality measurement, corresponding to a spatio-temporal domain limited by time (e.g., +/- 2 days) and distance (e.g., +/- 1 spherical degree) around the location of ambient air monitoring stations. The objective of the method is that within such a given spatio-temporal domain, in which
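The neighbourhood idea above can be sketched as a simple screen: compare each record with the other records inside its time/distance box and flag large deviations. This is an illustrative reduction, not the Smooth Spatial Attribute Method itself, and the thresholds are placeholders rather than the paper's settings.

```python
import numpy as np

def flag_spatiotemporal_outliers(records, days=2.0, deg=1.0, z=3.0):
    """Flag records (t, lat, lon, value) that deviate from the mean of
    their spatio-temporal neighbourhood (+/- days in time, +/- deg in
    lat/lon) by more than z standard deviations of that neighbourhood."""
    recs = np.asarray(records, dtype=float)
    t, lat, lon, v = recs.T
    flags = np.zeros(len(recs), dtype=bool)
    for i in range(len(recs)):
        near = ((np.abs(t - t[i]) <= days)
                & (np.abs(lat - lat[i]) <= deg)
                & (np.abs(lon - lon[i]) <= deg))
        near[i] = False                 # exclude the record itself
        if near.sum() < 3:
            continue                    # too few neighbours to judge
        mu = v[near].mean()
        sd = max(v[near].std(), 1e-9)   # guard against a constant neighbourhood
        flags[i] = abs(v[i] - mu) > z * sd
    return flags
```

Note that, as in the abstract, a value is judged against its neighbours only, so it can be flagged even when it is unremarkable for the network as a whole.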
Square Symmetric Neighborhood Based Local Outlier Detection Algorithm
揭财明; 刘慧君; 朱庆生
2012-01-01
NDOD (an outlier detection algorithm based on neighborhood and density) may produce wrong estimates for objects located in transition regions between clusters with significantly different density distributions. To avoid this problem and to reduce computational complexity, this paper proposes a new density-based algorithm named SSNOD (square symmetric neighborhood based local outlier detection algorithm). Using a grid-based idea, the algorithm partitions the dataset with square neighborhoods and expands them rapidly, quickly ruling out non-outliers and mitigating the "curse of dimensionality". By incorporating a memory mechanism, the number and range of neighborhood queries are significantly decreased, and a newly defined outlierness metric improves detection accuracy. Experimental results show that SSNOD outperforms NDOD and similar algorithms in both speed and detection accuracy.
Outlier Detection in Regression Using an Iterated One-Step Approximation to the Huber-Skip Estimator
Johansen, Søren; Nielsen, Bent
2013-01-01
In regression we can delete outliers based upon a preliminary estimator and re-estimate the parameters by least squares based upon the retained observations. We study the properties of an iteratively defined sequence of estimators based on this idea. We relate the sequence to the Huber-skip estimator. We provide a stochastic recursion equation for the estimation error in terms of a kernel, the previous estimation error and a uniformly small error term. The main contribution is the analysis of the solution of the stochastic recursion equation as a fixed point, and the results...
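The delete-and-refit iteration described above can be sketched as follows; the MAD-based scale estimate and the cut-off value are illustrative choices for this sketch, not the quantities analyzed in the paper.

```python
import numpy as np

def iterated_huber_skip(X, y, c=2.5, max_iter=10):
    """Iterated one-step outlier deletion: starting from a preliminary OLS
    fit, delete observations whose robustly scaled residuals exceed the
    cut-off c, re-estimate by least squares on the retained observations,
    and repeat until the estimate stabilizes."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # preliminary OLS fit
    keep = np.ones(len(y), dtype=bool)
    for _ in range(max_iter):
        resid = y - X @ beta
        scale = np.median(np.abs(resid)) / 0.6745  # MAD-based scale
        keep = np.abs(resid) <= c * scale          # skip large residuals
        beta_new = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
        if np.allclose(beta_new, beta):
            break                                  # reached a fixed point
        beta = beta_new
    return beta, keep
```

The fixed point reached by this loop is exactly the object the abstract studies via the stochastic recursion equation.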
Continuous Outlier Monitoring on Uncertain Data Streams
曹科研; 王国仁; 韩东红; 丁国辉; 王爱侠; 石凌旭
2014-01-01
Outlier detection on data streams is an important task in data mining, and the challenges become even larger when the data are uncertain. This paper studies the problem of outlier detection on uncertain data streams. We propose Continuous Uncertain Outlier Detection (CUOD), which can quickly determine the nature of uncertain elements by pruning to improve efficiency. Furthermore, we propose a pruning approach, Probability Pruning for Continuous Uncertain Outlier Detection (PCUOD), to reduce the detection cost; it estimates outlier probabilities and thereby effectively reduces the amount of computation. The cost of the incremental PCUOD algorithm satisfies the demands of uncertain data streams. Finally, a new method for parameter-variable queries is added to CUOD, enabling the concurrent execution of different queries. To the best of our knowledge, this is the first work on outlier detection over uncertain data streams that can handle parameter-variable queries simultaneously. Our methods are verified on both real and synthetic data; the results show that they reduce the required storage and running time.
Dynamic Outlier Detection for Process Control Time Series
刘芳; 毛志忠
2012-01-01
To overcome the shortcomings of traditional wavelet-based outlier detection and to handle the characteristics of oscillatory data collected while a control loop is regulating, an online outlier detection method combining an auto-regression (AR) model with wavelet analysis is proposed. By introducing an improved robust AR model, the method eliminates the deficiencies of wavelet analysis in detecting outliers in control process data. To avoid the pre-selected detection threshold required by traditional methods, a hidden Markov model (HMM) is introduced to analyze the wavelet coefficients, and its parameters are updated online, improving detection precision. Experiments and applications show that the proposed method is well suited to oscillatory control process data and is of practical use.
Manifold Outlier Detection Algorithm Based on Local-Correlation Dimension
黄添强; 李凯; 郭躬德
2011-01-01
Traditional outlier detection algorithms are not suitable for detecting manifold outliers. Denoising algorithms for manifold learning have been reported, but algorithms specifically for manifold outlier detection are few. Motivated by experimental observations, a manifold outlier detection algorithm based on the local-correlation dimension is proposed. First, the nature of the intrinsic dimension is discussed and, based on experimental observations, the local-correlation dimension is adopted to measure manifold outlierness. It is then proved that the outlierness of samples on manifolds can be characterized by the local-correlation dimension, and an outlier detection algorithm is built on this property. Evaluations on artificial and real data show that the algorithm detects manifold outliers and performs better than the recently reported manifold blurry mean shift denoising algorithm.
Outlier free Real Estate Predictive Model
Geetali Banerji
2015-11-01
Full Text Available Studying human behaviour is difficult for many reasons, ranging from intentional responses to cognitive processes other than those of interest occurring at the same time; these processes can be psychological, social or cognitive. They operate in the background and sometimes have no effect on the collected data, while at other times they are reflected in the results. All of these undesired behaviours produce measurable responses that sometimes happen to be correct by chance. Some responses, however, attract attention because of their unusual aspects and are denoted outliers. If outliers are not properly handled in the design phase, they may affect the resulting inferences or the experimental outcome at an early stage; it is therefore necessary to treat outliers before it is too late, in the design phase itself. The influence of outliers matters more when the sample size is small and the examined statistic is less robust. In this paper we detect outlier patterns in real estate data and remove them to a great degree, which not only changes the results drastically but also improves rule formation. We also provide a structured and comprehensive overview of research on outlier detection.
Iterative Outlier Removal: A Method for Identifying Outliers in Laboratory Recalibration Studies.
Parrinello, Christina M; Grams, Morgan E; Sang, Yingying; Couper, David; Wruck, Lisa M; Li, Danni; Eckfeldt, John H; Selvin, Elizabeth; Coresh, Josef
2016-07-01
Extreme values that arise for any reason, including through nonlaboratory, measurement-procedure-related processes (inadequate mixing, evaporation, mislabeling), lead to outliers and inflate errors in recalibration studies. We present an approach termed iterative outlier removal (IOR) for identifying such outliers. We previously identified substantial laboratory drift in uric acid measurements in the Atherosclerosis Risk in Communities (ARIC) Study over time. Serum uric acid was originally measured in 1990-1992 on a Coulter DACOS instrument using a uricase-based measurement procedure. To recalibrate previously measured concentrations to a newer enzymatic colorimetric measurement procedure, uric acid was remeasured in 200 participants from stored plasma in 2011-2013 on a Beckman Olympus 480 autoanalyzer. To conduct IOR, we excluded data points >3 SDs from the mean difference and repeated this process on the resulting data until no outliers remained. In simulation, IOR detected more outliers and yielded greater precision. The original mean difference (SD) in uric acid was 1.25 (0.62) mg/dL. After 4 iterations, 9 outliers were excluded, and the mean difference (SD) was 1.23 (0.45) mg/dL. A single round of outlier removal (the standard approach) would have excluded only 4 outliers [mean difference (SD) = 1.22 (0.51) mg/dL]. Applying the recalibration (derived from Deming regression) from each approach to the original measurements, the prevalence of hyperuricemia (>7 mg/dL) was 28.5% before IOR and 8.5% after IOR. IOR is a useful method for removing extreme outliers irrelevant to recalibrating laboratory measurements, and it identifies more extraneous outliers than the standard approach.
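The IOR procedure above translates almost verbatim into code; the function name and return convention are illustrative, not taken from the paper's software.

```python
import numpy as np

def iterative_outlier_removal(diffs, n_sd=3.0):
    """IOR as described above: repeatedly drop points lying more than n_sd
    standard deviations from the mean of the currently retained data,
    until no such points remain. Returns retained values and a keep-mask."""
    d = np.asarray(diffs, dtype=float)
    keep = np.ones(d.size, dtype=bool)
    while True:
        mu, sd = d[keep].mean(), d[keep].std()
        new_keep = keep & (np.abs(d - mu) <= n_sd * sd)
        if new_keep.sum() == keep.sum():   # no outliers left: converged
            return d[keep], keep
        keep = new_keep
```

Iterating matters because a gross outlier inflates the SD on the first pass and can shelter smaller outliers from the 3-SD rule, exactly the effect the study reports (9 outliers after iteration versus 4 after a single pass).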
Ped_Outlier software for automatic identification of within-family outliers.
Zorkoltseva, Irina V; Aulchenko, Yurii S; van Duijn, Cornelia M; Axenovich, Tatiana I
2010-08-01
High-throughput resequencing technology has brought family-based studies back into the focus of genetic research. Within-family outliers (individuals whose phenotype is very unlike the phenotype of their relatives) may carry rare variants of large effect, so resequencing them provides a highly powered strategy for rare-variant detection. On the other hand, such outliers may complicate the search for common variants of smaller effect, because they may obscure a real linkage signal. We have developed a program, Ped_Outlier, that allows automatic detection of within-family outliers in a sample of pedigrees of arbitrary structure and size. We tested our program by identifying within-family outliers for adult height and intracranial volume in a large pedigree. Results of linkage analysis of these traits demonstrated that identification of within-family outliers is one of the important steps of pedigree analysis. The program Ped_Outlier is freely available at http://mga.bionet.nsc.ru/soft/index.html. Copyright © 2010 Elsevier Ltd. All rights reserved.
Mohamed B. El Mashade
2014-01-01
This paper addresses the problem of detecting partially correlated χ2 fluctuating targets with two and four degrees of freedom. It presents the performance analysis, in its exact form, of the GTM-CFAR processor when the operating environment is contaminated with extraneous targets and the radar receiver post-detection integrates M pulses of exponentially correlated targets. Mathematical formulas for the detection and false alarm probabilities are derived, in the absence as well as in the presence of spurious targets which fluctuate in accordance with the so-called moderately fluctuating χ2 targets.
I. Arismendi
2014-05-01
Full Text Available Central tendency statistics may not capture relevant or desired characteristics of the variability of continuous phenomena, and thus they may not completely track temporal patterns of change. Here, we present two methodological approaches to identify long-term changes in environmental regimes. First, we use higher statistical moments (skewness and kurtosis) to examine potential changes of empirical distributions at the decadal scale. Second, we adapt an outlier detection procedure combining a non-metric multidimensional scaling technique and higher density region plots to detect anomalous years. We illustrate the use of these approaches by examining long-term stream temperature data from minimally and highly human-influenced streams. In particular, we contrast predictions about thermal regime responses to changing climates and human-related water uses. Using these methods, we effectively diagnose years with unusual thermal variability, patterns in variability through time, and spatial variability linked to regional and local factors that influence stream temperature. Our findings highlight the complexity of responses of the thermal regimes of streams and reveal a differentiated vulnerability to both climate warming and human-related water uses. The two approaches presented here can be applied to a variety of other continuous phenomena to address historical changes, extreme events, and their associated ecological responses.
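The first approach above, tracking higher moments over time, amounts to computing sample skewness and excess kurtosis per decade. A minimal numpy sketch on synthetic "stream temperature" series (the data and distributions here are illustrative assumptions, not the study's records):

```python
import numpy as np

def higher_moments(x):
    """Sample skewness and excess kurtosis of a 1-D series."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return (z**3).mean(), (z**4).mean() - 3.0

# Hypothetical daily temperatures for two "decades" (3650 days each):
rng = np.random.default_rng(1)
decade1 = rng.normal(10.0, 2.0, 3650)                 # roughly symmetric
decade2 = rng.gamma(shape=2.0, scale=2.0, size=3650)  # right-skewed
skew1, kurt1 = higher_moments(decade1)
skew2, kurt2 = higher_moments(decade2)
print(round(skew1, 2), round(skew2, 2))
```

A shift from near-zero to strongly positive skewness between decades is the kind of distributional change that a comparison of means alone would miss.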
Micenková, Barbora; McWilliams, Brian; Assent, Ira
into the existing unsupervised algorithms. In this paper, we show how to use powerful machine learning approaches to combine labeled examples with arbitrary unsupervised outlier scoring algorithms. We aim to get the best of both worlds, supervised and unsupervised. Our approach is also a viable...
Andrade, Monica de Carvalho Vasconcelos
2004-07-01
This work presents and discusses a neural network technique aimed at detecting outliers in a set of gas centrifuge isotope separation experimental data. To evaluate this new technique, the detection result is compared to the result of statistical analysis combined with cluster analysis. This method for outlier detection has considerable potential in the field of data analysis; it is at the same time easier and faster to use and requires much less knowledge of the physics involved in the process. This work established a procedure for detecting experiments suspected of containing gross errors within a data set where the usual techniques for identifying such errors cannot be applied or where their use demands excessively long work. (author)
Stellar Spectral Outlier Detection Based on Isomap
卜育德; 潘景昌; 陈福强
2014-01-01
How to find the spectra misclassified by traditional methods is a key problem that has been widely studied by experts in astronomical data processing. We found that the Isomap algorithm performs well on this problem. By comparing the performance of Isomap with that of principal component analysis (PCA), we found that (1) Isomap can project spectra with similar features together and project spectra with different features far apart, while PCA may project spectra with different features into nearby regions; (2) the outliers given by Isomap can be easily determined, and most of them are binary stars with high scientific value, while the outliers given by PCA are difficult to determine and most are not binary stars. Thus, Isomap is more efficient than PCA at finding outliers. Since the spectral data used in the experiment are from the ninth data release of the Sloan Digital Sky Survey (SDSS DR9), Isomap can efficiently find the spectra misclassified by the SDSS pipeline and obviously improve the classification accuracy. Furthermore, since most of the spectra misclassified by the SDSS pipeline are binary stars, Isomap can improve the efficiency of finding binary stars with high scientific value. Though the experimental results show that Isomap is more sensitive to noise than PCA, this disadvantage will not affect the application of Isomap in spectral classification, since most spectra with low signal-to-noise ratios are spectra whose spectral type cannot be determined manually.
Mohamed B. El Mashade
2014-10-01
Full Text Available This paper addresses the problem of detecting partially correlated χ2 fluctuating targets with two and four degrees of freedom. It presents the performance analysis, in its exact form, of the GTM-CFAR processor when the operating environment is contaminated with extraneous targets and the radar receiver post-detection integrates M pulses of exponentially correlated targets. Mathematical formulas for the detection and false alarm probabilities are derived, in the absence as well as in the presence of spurious targets which fluctuate in accordance with the so-called moderately fluctuating χ2 targets. A thorough performance assessment through several numerical examples, which considers the role each parameter plays in processor performance, is also given. The results show that the processor performance improves, for weak SNR of the primary target, as the correlation coefficient ρs increases, and this occurs either in the absence or in the presence of outlying targets. As the strength of the target return increases, the processor tends to invert this behavior. The SWI & SWII and SWIII & SWIV models enclose the correlated target cases when the target correlation follows χ2 fluctuation models with two and four degrees of freedom, respectively, and this behavior is common to all GTM-based detectors.
Tabatabaee, Hamidreza; Ghahramani, Fariba; Choobineh, Alireza; Arvinfar, Mona
2016-01-01
Teacher evaluation, as an important strategy for improving the quality of education, has been considered by universities and leads to a better understanding of the strengths and weaknesses of education. Analysis of instructors' scores is one of the main fields of educational research. Since outliers affect the analysis and interpretation of information both structurally and conceptually, understanding the methods of detecting outliers in collected data can be helpful for scholars, data analysts, and researchers. The present study aimed to present and compare the available techniques for detecting outliers. In this cross-sectional study, the statistical population included the evaluation forms of instructors completed by the students of Shiraz School of Health in the first and second semesters of the academic year 2012-2013. All the forms from these years (N=1317) were entered into analysis through a census. Then, four methods (Dixon, Gauss, Grubbs, and graphical methods) were used for determining outliers. The kappa coefficient was also used to determine the agreement among the methods. In this study, 1317 forms were completed by 203 undergraduate and 1114 postgraduate students. The mean scores given by undergraduates and postgraduates were 17.24±3.04 and 18.90±1.82, respectively. The results showed that the Dixon and Grubbs methods were the most appropriate for determining the outliers of evaluation scores in small samples, because they had appropriate agreement. On the other hand, NPP and QQ plots were the most appropriate methods for large samples. The results showed that each of the studied methods could help us, in some way, determine outliers. Researchers and analysts who intend to select and use these methods must first review the observations with the help of descriptive information and an overview of the distribution. Determination of outliers is important in the evaluation of instructors, because by determining the outliers and removing the data that might have been
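For small samples, the Grubbs-type statistic compares the most extreme point's standardized deviation against a tabulated critical value. A minimal sketch of computing the statistic on hypothetical evaluation scores (the critical-value lookup, which depends on n and the significance level, is omitted):

```python
import numpy as np

def grubbs_statistic(x):
    """Grubbs' G = max |x_i - mean| / s, plus the index of the suspect
    point. Comparison against tabulated critical values is left out."""
    x = np.asarray(x, dtype=float)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    i = int(z.argmax())
    return z[i], i

# Hypothetical instructor scores with one suspiciously low value:
scores = np.array([17.1, 17.5, 16.9, 17.3, 17.0, 17.4, 17.2, 9.8])
G, idx = grubbs_statistic(scores)
print(idx, round(G, 2))
```

The flagged index points at the 9.8 score; whether it is declared an outlier then depends on the chosen significance level.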
Boucher, Christina; Ma, Bin
2011-02-15
Given n strings s1, …, sn, each of length ℓ, and a nonnegative integer d, the CLOSEST STRING problem asks to find a center string s such that none of the input strings has Hamming distance greater than d from s. Finding a common pattern in many, but not necessarily all, input strings is an important task that plays a role in many applications in bioinformatics. Although the closest string model is robust to the oversampling of strings in the input, it is severely affected by the existence of outliers. We propose a refined model, the closest string with outliers (CSWO) problem, to overcome this limitation. This new model asks for a center string s that is within Hamming distance d of at least n - k of the n input strings, where k is a parameter describing the maximum number of outliers. A CSWO solution not only provides the center string as a representative for the set of strings but also reveals the outliers of the set. We provide fixed-parameter algorithms for CSWO when d and k are parameters, for both bounded and unbounded alphabets. We also show that when the alphabet is unbounded the problem is W[1]-hard with respect to n - k, ℓ, and d. Our refined model abstractly models finding common patterns in several but not all input strings. We initiate the study of the computability of this model and show that it is sensitive to different parameterizations. Lastly, we conclude by suggesting several open problems which warrant further investigation.
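The CSWO formulation can be made concrete with a brute-force checker: enumerate candidate centers and accept one within distance d of at least n - k inputs. This is exponential in ℓ (|Σ|^ℓ candidates), so it only illustrates the problem definition on a tiny hypothetical instance, not the paper's fixed-parameter algorithms:

```python
from itertools import product

def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def cswo_brute(strings, d, k, alphabet="ACGT"):
    """Brute-force CSWO: find a center s within Hamming distance d of at
    least n - k inputs, and return s with the excluded outliers."""
    n, ell = len(strings), len(strings[0])
    for cand in product(alphabet, repeat=ell):
        s = "".join(cand)
        close = sum(1 for t in strings if hamming(s, t) <= d)
        if close >= n - k:
            return s, [t for t in strings if hamming(s, t) > d]
    return None, None

strings = ["ACGT", "ACGA", "ACGG", "TTTT"]
center, outliers = cswo_brute(strings, d=1, k=1)
print(center, outliers)
```

With k = 1 the divergent string TTTT is reported as the outlier, while plain CLOSEST STRING (k = 0) would have no feasible center for d = 1.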
An MEF-Based Localization Algorithm against Outliers in Wireless Sensor Networks.
Wang, Dandan; Wan, Jiangwen; Wang, Meimei; Zhang, Qiang
2016-07-07
Precise localization has attracted considerable interest in Wireless Sensor Network (WSN) localization systems. Due to internal or external disturbances, the existence of outliers, including both distance outliers and anchor outliers, severely decreases localization accuracy. In order to eliminate both kinds of outliers simultaneously, an outlier detection method is proposed based on the maximum entropy principle and fuzzy set theory. Since not all outliers can be detected in the detection process, the Maximum Entropy Function (MEF) method is utilized to tolerate the errors and calculate the optimal estimated locations of the unknown nodes. Simulation results demonstrate that the proposed localization method remains stable as the outliers vary. Moreover, the localization accuracy is greatly improved by wisely rejecting outliers.
朱秀英; 韩保民; 尹培昌
2013-01-01
The ionosphere-free linear combination of dual-frequency onboard GPS P codes was used as the basic observable, and a majority-voting method was applied to detect and remove outliers. Onboard GPS observations from the CHAMP satellite were then used to simulate and detect outliers in order to verify the validity of the algorithm. The results indicated that the detection of outlier positions was relatively accurate and that outliers in the undifferenced observations were effectively removed, providing a good quality-control foundation for precise orbit determination using undifferenced onboard GPS observations.
System Identification in Presence of Outliers.
Yu, Chao; Wang, Qing-Guo; Zhang, Dan; Wang, Lei; Huang, Jiangshuai
2016-05-01
The outlier detection problem for dynamic systems is formulated as a matrix decomposition problem with low rank and sparse matrices, and further recast as a semidefinite programming problem. A fast algorithm is presented to solve the resulting problem while keeping the solution matrix structure and it can greatly reduce the computational cost over the standard interior-point method. The computational burden is further reduced by proper construction of subsets of the raw data without violating low-rank property of the involved matrix. The proposed method can make exact detection of outliers in case of no or little noise in output observations. In case of significant noise, a novel approach based on under-sampling with averaging is developed to denoise while retaining the saliency of outliers, and so-filtered data enables successful outlier detection with the proposed method while the existing filtering methods fail. Use of recovered "clean" data from the proposed method can give much better parameter estimation compared with that based on the raw data.
刘露; 左万利; 彭涛
2016-01-01
Mining the rich semantic information hidden in heterogeneous information networks is an important task in data mining. The value, data distribution, and generation mechanism of outliers all differ from those of normal data. Analyzing their generation mechanism, and even eliminating outliers, is of great significance. Outlier detection in homogeneous information networks has been studied and explored for a long time. However, little of this work is aimed at dynamic outlier detection in heterogeneous networks, and many issues remain to be settled. Due to the dynamics of a heterogeneous information network, normal data may become outliers over time. This paper proposes a dynamic tensor-representation-based outlier detection method, called TRBOutlier. It constructs a tensor index tree according to the high-order data represented by the tensor. Features are added to a direct item set and an indirect item set, respectively, when searching the tensor index tree. Meanwhile, we describe a clustering method based on the correlation of short texts to judge whether the objects in the dataset change their original clusters, and we then detect outliers dynamically. This model keeps the semantic relationships in heterogeneous networks as much as possible while greatly reducing time and space complexity. The experimental results show that our proposed method can detect outliers dynamically in heterogeneous information networks both effectively and efficiently.
Detecting Fraudulent Manipulation of Accounting Ratios in Financial ...
Detecting Fraudulent Manipulation of Accounting Ratios in Financial Reporting ... adoption of mandatory accounting standards, and voluntary accounting changes. ... are expected to consider whether the information presented in the financial ...
Influence of outliers on QTL mapping for complex traits
Yousaf HAYAT; Jian YANG; Hai-ming XU; Jun ZHU
2008-01-01
A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that dropping outliers could provide evidence of additional QTL and epistatic loci affecting the 1st Brain-OB and the 2nd Brain-OB in a cross of the abovementioned population. The results also revealed a remarkable increase in the estimated heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. A drastic increase could be observed in the estimates of standard errors for the position, additive, and additive × environment interaction effects of QTLs in the presence of outliers.
Influence of outliers on QTL mapping for complex traits.
Hayat, Yousaf; Yang, Jian; Xu, Hai-ming; Zhu, Jun
2008-12-01
A method was proposed for the detection of outliers and influential observations in the framework of a mixed linear model, prior to quantitative trait locus (QTL) mapping analysis. We investigated the impact of outliers on QTL mapping for complex traits in a mouse BXD population, and observed that dropping outliers could provide evidence of additional QTL and epistatic loci affecting the 1st Brain-OB and the 2nd Brain-OB in a cross of the abovementioned population. The results also revealed a remarkable increase in the estimated heritabilities of QTL in the absence of outliers. In addition, simulations were conducted to investigate the detection powers and false discovery rates (FDRs) of QTLs in the presence and absence of outliers. The results suggested that the presence of a small proportion of outliers could increase the FDR and hence decrease the detection power of QTLs. A drastic increase could be observed in the estimates of standard errors for the position, additive, and additive × environment interaction effects of QTLs in the presence of outliers.
魏萌; 王靳辉; 衡广辉
2012-01-01
Outliers in correlated observations are one of the difficult and important problems in data processing. After systematically reviewing the research history of this puzzle, a Bayesian detection method was put forward and applied to a GPS network, utilizing modern Bayesian theories and methods. First of all, based on the posterior probabilities of classification variables, Bayesian methods for locating outliers in correlated observations were proposed, and a Gibbs sampling algorithm for calculating the posterior probability of the classification variables was designed. Secondly, modern Bayesian statistical theory was applied to derive Bayesian estimators for the outliers, based on the maximum a posteriori estimation principle. Then, the new methods were applied to a GPS network adjustment. These numerical examples demonstrated that the new methods are effective: under correlated observations, they can detect multiple outliers simultaneously and effectively eliminate their adverse influence.
The distribution-based p-value for the outlier sum in differential gene expression analysis.
Chen, Lin-An; Chen, Dung-Tsa; Chan, Wenyaw
2010-03-01
Outlier sums were proposed by Tibshirani & Hastie (2007) and Wu (2007) for detecting outlier genes, where only a small subset of disease samples shows unusually high gene expression, but those papers did not develop the statistics' distributional properties or formal statistical inference. In this study, a new outlier sum for the detection of outlier genes is proposed, its asymptotic distribution theory is developed, and the p-value based on this outlier sum is formulated. Its analytic form is derived on the basis of large-sample theory. We compare the proposed method with existing outlier sum methods through power comparisons. Our method is applied to DNA microarray data from samples of primary breast tumors examined by Huang et al. (2003). The results show that the proposed method is more efficient in detecting outlier genes.
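The outlier-sum idea can be sketched as follows: robustly standardize one gene's expression across all samples, then sum the disease-group values beyond an upper threshold. The median/MAD standardization and the q75 + IQR cutoff used here are our reading of this family of statistics and should be treated as assumptions, not this paper's exact definition:

```python
import numpy as np

def outlier_sum(control, disease):
    """Outlier-sum sketch for one gene: standardize by median/MAD over
    all samples, then sum disease-group values above q75 + IQR."""
    control = np.asarray(control, dtype=float)
    disease = np.asarray(disease, dtype=float)
    x = np.concatenate([control, disease])
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    z = (x - med) / mad
    zd = z[len(control):]                      # disease-group values
    q1, q3 = np.percentile(z, [25, 75])
    return zd[zd > q3 + (q3 - q1)].sum()

# Hypothetical expression values; two disease samples are aberrantly high.
control = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1.15])
disease = np.array([1.0, 1.1, 0.9, 5.0, 6.0, 1.05])
print(round(outlier_sum(control, disease), 1))
```

Only the subset of unusually high disease samples contributes, which is what distinguishes this statistic from a t-test on group means.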
Outlier detection for control process data based on wavelet-HMM methods
刘芳; 毛志忠
2011-01-01
Given the limitations of wavelet-based outlier detection, this paper proposes an outlier detection method for process data called the wavelet hidden Markov model (W-HMM) algorithm. In this algorithm, the signal is decomposed at a certain scale, and when the wavelet decomposition of a signal differs from most other wavelet decompositions, the signal is treated as a potential outlier. To make a further accurate judgement, the similarity probability between the wavelet coefficients of this signal and those of a normal signal is calculated, and the final confirmation is obtained using the Viterbi algorithm, which finds the optimal state chain of the hidden Markov model (HMM). Finally, numerical experiments and an application show the effectiveness and practicality of the proposed detection method.
田海清; 王春光; 张海军; 郁志宏; 李建康
2012-01-01
Outlier samples strongly influence the precision of the calibration model in soluble solids content measurement of melons using NIR spectra. According to the possible sources of outlier samples, three methods (predicted concentration residual test; Chauvenet test; leverage and studentized residual test) were used to discriminate these outliers. Nine suspicious outliers were detected in a calibration set of 85 fruit samples. Considering that the 9 suspicious outlier samples might contain some non-outlier samples, they were returned to the model one by one to see whether they influenced the model and prediction precision. In this way, 5 samples that were helpful to the model rejoined the calibration set, and a new model was developed with a correlation coefficient (r) of 0.889 and a root mean square error of calibration (RMSEC) of 0.601 °Brix. For 35 unknown samples, the root mean square error of prediction (RMSEP) was 0.854 °Brix. This model performed better than the one developed with no outliers eliminated from the calibration set (r = 0.797, RMSEC = 0.849 °Brix, RMSEP = 1.19 °Brix), and was more representative and stable than the one with all 9 samples eliminated from the calibration set (r = 0.892, RMSEC = 0.605 °Brix, RMSEP = 0.862 °Brix).
Edge Detection Operators: Peak Signal to Noise Ratio Based Comparison
D. Poobathy
2014-09-01
Full Text Available Edge detection is a vital task in digital image processing. It makes image segmentation and pattern recognition more convenient, and it also helps with object detection. Many edge detectors are available for pre-processing in computer vision, but Canny, Sobel, Laplacian of Gaussian (LoG), Roberts, and Prewitt are the most widely applied algorithms. This paper compares these operators by checking the Peak Signal to Noise Ratio (PSNR) and Mean Squared Error (MSE) of the resultant images. It evaluates the performance of each algorithm with Matlab and Java. A set of four universally standardized test images is used for the experimentation. The PSNR and MSE results are numeric values; based on these, the performance of the algorithms is identified. The time required for each algorithm to detect edges is also documented. After the experimentation, the Canny operator was found to be the best among the others in edge detection accuracy.
Influence of polarization extinction ratio on distributed polarization coupling detection
XU Tian-hua; TANG Feng; JING Wen-cai; ZHANG Hong-xia; JIA Da-gong; YU Chang-song; ZHOU Ge; ZHANG Yi-mo
2008-01-01
Distributed polarization coupling in polarization-maintaining fibers can be detected by using a white-light Michelson interferometer. This technique usually requires that only one polarization mode be excited. However, in practical measurement, the injection polarization direction cannot be exactly aligned to one of the principal axes of the PMF, so the influence of the polarization extinction ratio should be considered. Based on polarization coupling theory, the influence of the incident polarization extinction on the measurement result is evaluated and analyzed, and a method for distributed polarization coupling detection is developed for the case when both orthogonal eigenmodes are excited.
Outlier Identification in Model-Based Cluster Analysis.
Evans, Katie; Love, Tanzy; Thurston, Sally W
2015-04-01
In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify these; however, it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion or for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate the ability of our method to detect true outliers without falsely identifying many non-outliers, and improved performance over other approaches under most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance which avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data.
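The second criterion above, flagging observations whose removal changes the cluster-specific variance substantially, can be sketched for a single one-dimensional cluster. The ratio threshold of 1.5 is an arbitrary illustration, not the rule used by the paper or by MCLUST:

```python
import numpy as np

def variance_ratio_outliers(x, ratio=1.5):
    """Flag points whose leave-one-out removal shrinks the cluster
    variance by more than the given ratio."""
    x = np.asarray(x, dtype=float)
    full_var = x.var(ddof=1)
    flags = []
    for i in range(len(x)):
        loo_var = np.delete(x, i).var(ddof=1)  # variance without point i
        if full_var / loo_var > ratio:
            flags.append(i)
    return flags

# Hypothetical cluster with one point far from the rest:
cluster = np.array([4.9, 5.1, 5.0, 4.8, 5.2, 5.05, 12.0])
print(variance_ratio_outliers(cluster))
```

Removing the distant point collapses the variance, so only that index is flagged; removing any tight point barely changes it.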
Outlier Identification in Model-Based Cluster Analysis
Evans, Katie; Love, Tanzy; Thurston, Sally W.
2015-01-01
In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify these; however, it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion or for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate the ability of our method to detect true outliers without falsely identifying many non-outliers, and improved performance over other approaches under most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance which avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data. PMID:26806993
TESTING FOR OUTLIERS IN TIME SERIES USING WAVELETS
ZHANG Tong; ZHANG Xibin; ZHANG Shiying
2003-01-01
One remarkable feature of wavelet decomposition is that the wavelet coefficients are localized: any singularity in the input signal can only affect the wavelet coefficients near the singularity. This localized property of the wavelet coefficients allows us to identify singularities in the input signal by studying the wavelet coefficients at different resolution levels. This paper considers wavelet-based approaches for the detection of outliers in time series. Outliers are high-frequency phenomena associated with wavelet coefficients of large absolute value at different resolution levels. On the basis of the first-level wavelet coefficients, this paper presents a diagnostic to identify outliers in a time series. Under the null hypothesis that there is no outlier, the proposed diagnostic is distributed as χ2 with 1 degree of freedom. Empirical examples are presented to demonstrate the application of the proposed diagnostic.
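The localization property above can be demonstrated with first-level Haar detail coefficients: an additive outlier inflates only the coefficient covering it, and the squared standardized coefficient is roughly χ2(1) under the null. The MAD-based scale estimate here is an assumption for the sketch, not this paper's estimator:

```python
import numpy as np

def haar_detail(x):
    """First-level Haar detail coefficients of an even-length series."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def outlier_diagnostic(x):
    """Squared standardized first-level coefficients; under the
    no-outlier null each is roughly chi-square with 1 df."""
    d = haar_detail(x)
    scale = 1.4826 * np.median(np.abs(d - np.median(d)))  # robust SD
    return ((d - np.median(d)) / scale) ** 2

rng = np.random.default_rng(2)
series = rng.normal(0.0, 1.0, 64)
series[40] += 15.0  # inject an additive outlier
stat = outlier_diagnostic(series)
print(int(stat.argmax()))  # pair index 20 covers samples 40-41
```

Only the coefficient straddling the contaminated sample blows up, so comparing each squared coefficient to a χ2(1) quantile localizes the outlier.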
Goovaerts Pierre
2004-07-01
Full Text Available Abstract Background Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is, however, not a relevant null hypothesis for highly complex and organized systems, such as those encountered in the environmental and health sciences, in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for the regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results We developed a typology of neutral models that progressively relaxes the assumptions of the null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and the incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of the regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. In particular, this new
Unmasking Outliers in Large Distributed Databases Using Cluster Based Approach: CluBSOLD
A. Rama Satish
2016-04-01
Full Text Available Outliers are dissimilar or inconsistent data objects with respect to the remaining objects in the data set, or objects that are far from their cluster centroids. Detecting outliers is a very important concept in the Knowledge Discovery in Databases process for finding hidden knowledge. The task of detecting outliers has been studied in a large number of research areas, such as financial data analysis, large distributed systems, biological data analysis, data mining, scientific applications, and health monitoring. Existing research on outlier detection shows that density-based outlier detection techniques are robust. Identifying outliers in a distributed environment is not a simple task, because processing a distributed database raises two major issues. The first is rendering the massive data generated from different databases. The second is data integration, which may cause data security violations and sensitive information leakage. Handling a distributed database is a difficult task. In this paper, we present cluster-based outlier detection to spot outliers in large and dynamically updated distributed databases, in which cell-density-based centralized detection is used to deal with the massive data rendering problem and the data integration problem. Experiments were conducted on various datasets, and the obtained results clearly show the robustness of the proposed technique for finding outliers in large distributed databases.
Classifying hospitals as mortality outliers: logistic versus hierarchical logistic models.
Alexandrescu, Roxana; Bottle, Alex; Jarman, Brian; Aylin, Paul
2014-05-01
The use of hierarchical logistic regression for provider profiling has been recommended because patients are clustered within hospitals, but it has some associated difficulties. We assess changes in hospital outlier status based on standard logistic versus hierarchical logistic modelling of mortality. The study population consisted of all patients admitted to acute, non-specialist hospitals in England between 2007 and 2011 with a primary diagnosis of acute myocardial infarction, acute cerebrovascular disease or fracture of neck of femur or a primary procedure of coronary artery bypass graft or repair of abdominal aortic aneurysm. We compared standardised mortality ratios (SMRs) from non-hierarchical models with SMRs from hierarchical models, without and with shrinkage estimates of the predicted probabilities (Model 1 and Model 2). The SMRs from standard logistic and hierarchical models were strongly and significantly correlated (r > 0.91, p = 0.01). More outliers were recorded under standard logistic regression than under hierarchical modelling only when shrinkage estimates were used (Model 2): out of a cumulative 565 pairs of hospitals under study, 21 hospitals changed from low-outlier status and 8 from high-outlier status under logistic regression to non-outlier status under shrinkage estimates. Both standard logistic and hierarchical modelling identified nearly the same hospitals as mortality outliers. The choice of methodological approach should, however, also consider whether the modelling aim is judgment or improvement, as shrinkage may be more appropriate for the former than the latter.
Undersampled Phase Retrieval with Outliers.
Weller, Daniel S; Pnueli, Ayelet; Divon, Gilad; Radzyner, Ori; Eldar, Yonina C; Fessler, Jeffrey A
2015-12-01
This paper proposes a general framework for reconstructing sparse images from undersampled (squared)-magnitude data corrupted with outliers and noise. This phase retrieval method uses a layered approach, combining repeated minimization of a convex majorizer (surrogate for a nonconvex objective function), and iterative optimization of that majorizer using a preconditioned variant of the alternating direction method of multipliers (ADMM). Since phase retrieval is nonconvex, this implementation uses multiple initial majorization vectors. The introduction of a robust 1-norm data fit term that is better adapted to outliers exploits the generality of this framework. The derivation also describes a normalization scheme for the regularization parameter and a known adaptive heuristic for the ADMM penalty parameter. Both 1D Monte Carlo tests and 2D image reconstruction simulations suggest the proposed framework, with the robust data fit term, reduces the reconstruction error for data corrupted with both outliers and additive noise, relative to competing algorithms having the same total computation.
Screening for Outliers in Multiple Trait Genetic Evaluation
Madsen, Per; Pösa, Jukka; Pedersen, Jørn
2012-01-01
Use of multivariate models in genetic evaluation requires a multivariate method for detecting erroneous outliers that cannot be detected using univariate methods. A simple rule for detecting outliers based on an approximated Mahalanobis distance was applied to Jersey data from the routine Nordic genetic evaluation in dairy cattle. Application of such a rule is simple to implement and increased the accuracy of predicted breeding values for animals that have one or more records edited. Potential biases in evaluations for contemporary animals were also reduced. Optimum editing rules can be determined using...
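The Mahalanobis-distance editing rule can be sketched as below. This is a minimal illustration, not the authors' implementation: the training data, cutoff, and trait values are hypothetical, and the point of the example is that a record whose traits are individually plausible can still be jointly inconsistent, which is exactly the case a univariate screen misses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "clean" bivariate records (two correlated traits)
train = rng.multivariate_normal([0.0, 0.0],
                                [[1.0, 0.8], [0.8, 1.0]], size=500)
mu = train.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(train, rowvar=False))

def mahalanobis_sq(x):
    """Squared Mahalanobis distance of a record from the training data."""
    d = x - mu
    return float(d @ inv_cov @ d)

CUTOFF = 13.8  # approx. chi-square(2) 99.9% quantile

ok = mahalanobis_sq(np.array([0.5, 0.6]))     # consistent trait combination
bad = mahalanobis_sq(np.array([2.0, -2.0]))   # each trait mild, jointly odd
```

Both traits of the second record are within two standard deviations marginally, yet its Mahalanobis distance is far above the cutoff because it violates the traits' positive correlation.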
Augmented kludge waveforms for detecting extreme-mass-ratio inspirals
Chua, Alvin J. K.; Moore, Christopher J.; Gair, Jonathan R.
2017-08-01
The extreme-mass-ratio inspirals (EMRIs) of stellar-mass compact objects into massive black holes are an important class of source for the future space-based gravitational-wave detector LISA. Detecting signals from EMRIs will require waveform models that are both accurate and computationally efficient. In this paper, we present the latest implementation of an augmented analytic kludge (AAK) model, publicly available at https://github.com/alvincjk/EMRI_Kludge_Suite as part of an EMRI waveform software suite. This version of the AAK model has improved accuracy compared to its predecessors, with two-month waveform overlaps against a more accurate fiducial model exceeding 0.97 for a generic range of sources; it also generates waveforms 5-15 times faster than the fiducial model. The AAK model is well suited for scoping out data analysis issues in the upcoming round of mock LISA data challenges. A simple analytic argument shows that it might even be viable for detecting EMRIs with LISA through a semicoherent template bank method, while the use of the original analytic kludge in the same approach will result in around 90% fewer detections.
Robust Clustering Using Outlier-Sparsity Regularization
Forero, Pedro A; Giannakis, Georgios B
2011-01-01
Notwithstanding the popularity of conventional clustering algorithms such as K-means and probabilistic clustering, their clustering results are sensitive to the presence of outliers in the data. Even a few outliers can compromise the ability of these algorithms to identify meaningful hidden structures, rendering their outcome unreliable. This paper develops robust clustering algorithms that aim not only to cluster the data, but also to identify the outliers. The novel approaches rely on the infrequent presence of outliers in the data, which translates to sparsity in a judiciously chosen domain. Capitalizing on the sparsity in the outlier domain, outlier-aware robust K-means and probabilistic clustering approaches are proposed. Their novelty lies in identifying outliers while effecting sparsity in the outlier domain through carefully chosen regularization. A block coordinate descent approach is developed to obtain iterative algorithms with convergence guarantees and small excess computational complexity with res...
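The outlier-sparsity idea can be sketched as an alternating procedure: each point carries an outlier vector that is driven to zero by a group soft-threshold unless its residual is large, and centroids are refit on the "cleaned" data. The exact formulation in the paper may differ; the data, the λ value, and the fixed initialization below are purely illustrative.

```python
import numpy as np

def robust_kmeans(X, k, lam, init_idx, iters=50):
    """Sketch of outlier-aware K-means (assumed form): outlier vectors
    o_i come from group soft-thresholding the residuals, and centroids
    are refit on X - O.  lam controls outlier sparsity."""
    C = X[init_idx].astype(float).copy()
    O = np.zeros_like(X, dtype=float)
    for _ in range(iters):
        # assign cleaned points to the nearest centroid
        d = np.linalg.norm((X - O)[:, None, :] - C[None, :, :], axis=-1)
        z = d.argmin(axis=1)
        # group soft-threshold of residuals -> sparse outlier vectors
        R = X - C[z]
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        O = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12)) * R
        # refit centroids on the cleaned data
        for j in range(k):
            if np.any(z == j):
                C[j] = (X[z == j] - O[z == j]).mean(axis=0)
    return C, O, z

# two tight clusters plus one gross outlier (toy data)
X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
              [3, 3], [3.1, 3], [3, 3.1], [3.1, 3.1],
              [10.0, -10.0]])
C, O, z = robust_kmeans(X, k=2, lam=2.0, init_idx=[0, 4])
# only the last point should receive a nonzero outlier vector
```

With a small residual the shrinkage factor hits zero, so inlier rows of `O` vanish exactly; only the gross outlier retains a nonzero outlier vector.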
Spectral Ratios for Crack Detection Using P and Rayleigh Waves
Enrique Olivera-Villaseñor
2012-01-01
Full Text Available We obtain numerical results to help the detection and characterization of subsurface cracks in solids by the application of P and Rayleigh elastic waves. The response is obtained from boundary integral equations, which belongs to the field of elastodynamics. Once the implementation of the boundary conditions has been done, a system of Fredholm integral equations of the second kind and order zero is found. This system is solved using the method of Gaussian elimination. Resonance peaks in the frequency domain allow us to infer the presence of cracks using spectral ratios. Several models of cracked media were analyzed, where effects due to different crack orientations and locations were observed. The results obtained are in good agreement with those published in the references.
2010-10-01
... 42 Public Health 2 2010-10-01 2010-10-01 false Outliers. 413.237 Section 413.237 Public Health...) Services and Organ Procurement Costs § 413.237 Outliers. (a) The following definitions apply to this section. (1) ESRD outlier services are the following items and services that are included in the ESRD PPS...
Top-k(σ) outlier detection algorithm for wireless sensor networks
Hu Shi; Li Guanghui; Feng Hailin
2016-01-01
Outlier detection plays an important role in wireless sensor network (WSN) systems for environment monitoring: it helps people monitor the condition of the WSN itself, and it can also detect emergent environmental events such as forest fires and air pollution. By improving the top-k algorithm, a top-k(σ) outlier detection algorithm for WSNs is proposed in this paper. Unlike the top-k algorithm, the proposed algorithm uses the distribution of the data collected by the sensor nodes to construct an appropriate data grid, puts the normalized data sets into the grid, and then sets a distance threshold σ to reconstruct the PC (populated-cells) list. The algorithm sorts the numbers of data points in each cell and in its neighborhood, computes the Euclidean distance R-D between two data subsets, and compares R-D with σ to verify the degree to which a subset deviates from the normal data sets, thereby improving the precision of outlier detection. For several given datasets, simulation results under MATLAB show that the threshold σ has a great effect on the performance of the outlier detection algorithm. When σ ∈ [2.5, 3], the top-k(σ) algorithm has higher detection accuracy and a lower false positive rate. If σ = 3, for the five given data sets, the average outlier detection accuracy of the top-k(σ) algorithm is 93.70%, which is 4.94% higher than that of the top-k algorithm, and its average false positive rate is 4.48% lower than that of the top-k algorithm.
Robust PCA via Outlier Pursuit
Xu, Huan; Sanghavi, Sujay
2010-01-01
Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount ...
Stochastic epigenetic outliers can define field defects in cancer.
Teschendorff, Andrew E; Jones, Allison; Widschwendter, Martin
2016-04-22
There is growing evidence that DNA methylation alterations may contribute to carcinogenesis. Recent data also suggest that DNA methylation field defects in normal pre-neoplastic tissue represent infrequent stochastic "outlier" events. This presents a statistical challenge for standard feature selection algorithms, which assume frequent alterations in a disease phenotype. Although differential variability has emerged as a novel feature selection paradigm for the discovery of outliers, a growing concern is that these could result from technical confounders, in principle thus favouring algorithms which are robust to outliers. Here we evaluate five differential variability algorithms in over 700 DNA methylomes, including two of the largest cohorts profiling precursor cancer lesions, and demonstrate that most of the novel proposed algorithms lack the sensitivity to detect epigenetic field defects at genome-wide significance. In contrast, algorithms which recognise heterogeneous outlier DNA methylation patterns are able to identify many sites in pre-neoplastic lesions, which display progression in invasive cancer. Thus, we show that many DNA methylation outliers are not technical artefacts, but define epigenetic field defects which are selected for during cancer progression. Given that cancer studies aiming to find epigenetic field defects are likely to be limited by sample size, adopting the novel feature selection paradigm advocated here will be critical to increase assay sensitivity.
Research on Algorithms for Mining Distance-Based Outliers
WANG Lizhen; ZOU Likun
2005-01-01
Outlier detection is an important and valuable research area in KDD (Knowledge Discovery in Databases). The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even weather forecasting. Among existing methods for finding outliers, the notion of DB- (distance-based) outliers is not restricted computationally to small values of the number of dimensions k and goes beyond the data space. Here, we study algorithms for mining DB-outliers, focusing on algorithms unlimited by k. First, we present a partition-based algorithm (the PBA), whose key idea is to gain efficiency by divide-and-conquer. Second, we present an optimized algorithm called the object-class-based algorithm (the OCBA); its computation is independent of k, and its efficiency is as good as that of the cell-based algorithm. We provide experimental results showing that the two new algorithms have better execution efficiency.
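The DB-outlier notion the abstract builds on can be illustrated with a simple nested-loop check. This is a baseline sketch of the distance-based definition, not the paper's PBA or OCBA algorithms; the points, fraction, and distance threshold are illustrative.

```python
import numpy as np

def db_outliers(points, frac, dmin):
    """Nested-loop DB(frac, dmin)-outlier detection: a point is an
    outlier if at least `frac` of the other points lie farther than
    `dmin` from it."""
    n = len(points)
    flags = []
    for i in range(n):
        dists = np.linalg.norm(points - points[i], axis=1)
        far = np.sum(dists > dmin) / (n - 1)   # self distance is 0, ignored
        flags.append(far >= frac)
    return np.array(flags)

pts = np.array([[0, 0], [0.1, 0.2], [0.2, 0.1], [0.15, 0.05],
                [0.05, 0.15], [5.0, 5.0]])
flags = db_outliers(pts, frac=0.9, dmin=1.0)
# the isolated point at (5, 5) is the only DB(0.9, 1.0)-outlier
```

The partition- and cell-based algorithms in the paper accelerate exactly this O(n²) test by pruning regions that cannot contain outliers.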
Gui Qingming; Li Tao; Heng Guanghui
2011-01-01
Time series observations may be influenced by outliers, and forecasting directly while neglecting this influence will lead to false results, so it is important to find a proper outlier detection method. Based on the theory of Bayesian statistical inference, we first put forward a Bayesian method for locating outliers in autoregressive models; then, under normal-gamma prior information, we derive computation methods for the posterior probability based on the mean shift model and the variance inflation model respectively, and estimate the outlying disturbances with the Bayesian method. Finally, the method is applied to the data modeling of an ionospheric VTEC series, and comparison of the forecasting results before and after model correction verifies the effectiveness of the new method.
Anomalous human behavior detection: an adaptive approach
van Leeuwen, Coen; Halma, Arvid; Schutte, Klamer
2013-05-01
Detection of anomalies (outliers or abnormal instances) is an important element in a range of applications such as fault, fraud, and suspicious behavior detection and knowledge discovery. In this article we propose a new method for anomaly detection and test its ability to detect anomalous behavior in videos from DARPA's Mind's Eye program, containing a variety of human activities. In this semi-unsupervised task a set of normal instances is provided for training, after which unknown abnormal behavior has to be detected in a test set. The features extracted from the video data have high dimensionality and are sparse and inhomogeneously distributed in the feature space, making it a challenging task. Given these characteristics a distance-based method is preferred, but choosing a threshold to classify instances as (ab)normal is non-trivial. Our novel approach, the Adaptive Outlier Distance (AOD), is able to detect outliers in these conditions based on local distance ratios. The underlying assumption is that the local maximum distance between labeled examples is a good indicator of the variation in that neighborhood, and therefore a local threshold will result in more robust outlier detection. We compare our method to existing state-of-the-art methods such as the Local Outlier Factor (LOF) and the Local Distance-based Outlier Factor (LDOF). The results of the experiments show that our novel approach improves the quality of the anomaly detection.
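The local-threshold assumption can be sketched as follows. This is an assumed form for illustration only; the paper's exact Adaptive Outlier Distance definition may differ. The idea: the maximum pairwise distance among a test point's k nearest normal examples estimates the local variation, and the point is flagged if it lies farther from its nearest normal example than that local spread.

```python
import numpy as np

def is_outlier_local(x, train, k=5):
    """Local-threshold outlier test (sketch): compare x's distance to
    its nearest normal example against the maximum pairwise distance
    among x's k nearest normal examples."""
    d = np.linalg.norm(train - x, axis=1)
    local = train[np.argsort(d)[:k]]
    pair = np.linalg.norm(local[:, None, :] - local[None, :, :], axis=-1)
    return d.min() > pair.max()

train = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                  [0.2, 0.2], [0.1, 0.1]])
inside = is_outlier_local(np.array([0.1, 0.15]), train)  # False
far = is_outlier_local(np.array([2.0, 2.0]), train)      # True
```

Because the threshold adapts to the neighborhood's own spread, dense and sparse regions of the training data are judged on their own scale rather than by a single global cutoff.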
A Dataset that Is 44% Outliers
Hayden, Robert W.
2005-01-01
The data illustrate outliers that are not mistakes and not observations that are unusually high or low. The reasons for them are all interesting historically. They illustrate that "outliers" need not be errors but may instead be particularly interesting cases. The data also illustrate that different data displays may differ in their ability to…
How Significant Is a Boxplot Outlier?
Dawson, Robert
2011-01-01
It is common to consider Tukey's schematic ("full") boxplot as an informal test for the existence of outliers. While the procedure is useful, it should be used with caution, as at least 30% of samples from a normally-distributed population of any size will be flagged as containing an outlier, while for small samples (N less than 10) even extreme…
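The boxplot rule the abstract cautions against is easy to state and to probe empirically. The sketch below flags points beyond Tukey's fences and then checks, on simulated normal samples, how often a pure normal sample still contains a flagged point (the data and sample size are illustrative; the exact rate varies with sample size).

```python
import numpy as np

def boxplot_flags(x):
    """Points beyond Tukey's fences (quartiles +/- 1.5 * IQR), i.e. the
    observations a schematic boxplot would plot individually."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

data = np.array([2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 12.0])
flags = boxplot_flags(data)          # only 12.0 is flagged

# Dawson's caution: many pure normal samples still contain a flagged point
rng = np.random.default_rng(1)
rate = np.mean([boxplot_flags(rng.standard_normal(20)).any()
                for _ in range(2000)])
```

A nontrivial fraction of the simulated all-normal samples gets flagged, which is why a boxplot outlier should not be read as a formal significance test.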
Social norm perception in groups with outliers.
Dannals, Jennifer E; Miller, Dale T
2017-09-01
Social outliers draw a lot of attention from those inside and outside their group and yet little is known about their impact on perceptions of their group as a whole. The present studies examine how outliers influence observers' summary perceptions of a group's behavior and inferences about the group's descriptive and prescriptive norms. Across 4 studies (N = 1,718) we examine how observers perceive descriptive and prescriptive social norms in groups containing outliers of varying degrees. We find consistent evidence that observers overweight outlying behavior when judging the descriptive and prescriptive norms, but overweight outliers less as they become more extreme, especially in perceptions of the prescriptive norm. We find this pattern across norms pertaining to punctuality (Studies 1-2 and 4) and clothing formality (Study 3) and for outliers who are both prescriptively and descriptively deviant (e.g., late arrivers), as well as for outliers who are only descriptive deviants (e.g., early arrivers). We further demonstrate that observers' perceptions of the group shift in the direction of moderate outliers. This occurs because observers anchor on the outlier's behavior and adjust their recollections of nonoutlying individuals, making their inferences about the group's average behavior more extreme. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Detecting microalbuminuria by urinary albumin/creatinine concentration ratio
Jensen, J S; Clausen, P; Borch-Johnsen, K
1997-01-01
BACKGROUND: Microalbuminuria, i.e. a subclinical increase of the albumin excretion rate in urine, may be a novel atherosclerotic risk factor. This study aimed to test whether microalbuminuria can be identified by measurement of urinary albumin concentration or urinary albumin/creatinine concentration ratio, instead of the usual measurement of the albumin excretion rate in a timed urine collection. METHODS: All 2579 subjects analysed were screened in a population based epidemiological study. Participants with diabetes mellitus, renal disease, haematuria, or urinary tract infection were not included. Urinary albumin (Ualb) and creatinine (Ucreat) concentrations were measured in an overnight collected sample by enzyme-linked immunosorbent and colorimetric assays, respectively. Urinary albumin excretion rate (UAER) and urinary albumin/creatinine concentration ratio (Ualb/Ucreat) were calculated.
Pirson, Magali; Dramaix, Michèle; Leclercq, Pol; Jackson, Terri
2006-03-01
The objective of this study was to find factors that could explain high and low resource use outliers, by associating an explanatory analysis with a statistical analysis. High resource use outliers were selected according to the following rule: 75th percentile + 1.5* inter-quartile range. Low resource use outliers were selected according to: 25th percentile - 1.5* inter-quartile range. The statistical approach was based on a multivariate analysis using logistic regression. A decision tree approach using predictors from this analysis (intensive care unit (ICU) stay, high severity of illness and social factors associated with longer length of stay) was also tested as a more intuitive tool for use by hospitals in focussing review efforts on "not explained" cost outliers. High resource use outliers accounted for 6.31% of the hospital stays versus 1.07% for low resource use outliers. The probability of a patient being a high resource use outlier was higher with an increase in the length of stay (odds ratios (OR) = 1.08), when the patient was treated in an intensive care unit (OR = 3.02), with a major or extreme severity of illness (OR=1.46), and with the presence of social factors (OR = 1.44). The probability of being a low outlier is lower for older patients (OR = 0.98). The probability of being a low outlier is also lower without readmission within the year (OR = 0.55). The more intuitive decision tree method identified 92.26% of the cases identified through residuals of the regression model. One quarter of the high cost outliers were flagged for additional review ("not justified" on the basis of the model), with nearly three-quarters "justified" by clinical and social factors. The analysis of cost outliers can meet different aims (financing of justifiable outliers, improvement of the care process for the outliers not justifiable on medical or social grounds). The two methods are complementary, by proposing a statistical and a didactic approach to achieve the goal of
Detection of marine methane emissions with AVIRIS band ratios
Bradley, Eliza S.; Leifer, Ira; Roberts, Dar A.; Dennison, Philip E.; Washburn, Libe
2011-05-01
The relative source contributions of methane (CH4) have high uncertainty, creating a need for local-scale characterization in concert with global satellite measurements. However, efforts towards methane plume imaging have yet to provide convincing results for concentrated sources. Although atmospheric CH4 mapping did not motivate the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) design, recent studies suggest its potential for studying concentrated CH4 sources such as the Coal Oil Point (COP) seep field (˜0.015 Tg CH4 yr-1) offshore Santa Barbara, California. In this study, we developed a band ratio approach on high glint COP AVIRIS data and demonstrate the first successful local-scale remote sensing mapping of natural atmospheric CH4 plumes. Plume origins closely matched surface and sonar-derived seepage distributions, with plume characteristics consistent with wind advection. Imaging spectrometer data may also be useful for high spatial-resolution characterization of concentrated, globally-significant CH4 emissions from offshore platforms and cattle feedlots.
Blue outliers among intermediate redshift quasars
Marziani, P; Stirpe, G M; Dultzin, D; Del Olmo, A; Martínez-Carballo, M A
2015-01-01
[Oiii]{\\lambda}{\\lambda}4959,5007 "blue outliers" -- that are suggestive of outflows in the narrow line region of quasars -- appear to be much more common at intermediate z (high luminosity) than at low z. About 40% of quasars in a Hamburg ESO intermediate-z sample of 52 sources qualify as blue outliers (i.e., quasars with [OIII] {\\lambda}{\\lambda}4959,5007 lines showing large systematic blueshifts with respect to rest frame). We discuss major findings on what has become an intriguing field in active galactic nuclei research and stress the relevance of blue outliers to feedback and host galaxy evolution.
Outlier Mining Based on Principal Component Estimation
Hu Yang; Ting Yang
2005-01-01
Outlier mining is an important aspect of data mining, and outlier mining based on Cook's distance is most commonly used. But when the data exhibit multicollinearity, the traditional Cook method is no longer effective. Considering the merits of principal component estimation, we use it in place of least squares estimation and then give a Cook distance measure based on principal component estimation, which can be used in outlier mining. At the same time, we investigate related theory and application issues.
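For reference, the classical Cook's distance under ordinary least squares can be computed as below; the paper substitutes a principal-component estimator for OLS when predictors are collinear, so this sketch shows only the OLS baseline, with hypothetical data.

```python
import numpy as np

def cooks_distance(X, y):
    """Classical Cook's distance for an OLS fit with intercept."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])       # add intercept
    H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T    # hat matrix
    h = np.diag(H)
    e = y - H @ y                               # residuals
    s2 = e @ e / (n - p - 1)
    k = p + 1                                   # parameters incl. intercept
    return e**2 * h / (s2 * k * (1 - h)**2)

X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 20.0])
d = cooks_distance(X, y)
# the high-leverage point (10, 20) dominates the fit
```

The last observation combines high leverage with a pull on the slope, so its Cook's distance dwarfs the others; it is exactly this measure that the paper recomputes under principal component estimation when X.T @ X is ill-conditioned.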
Outliers Mining in Time Series Data Sets
Anonymous
2002-01-01
In this paper, we present a cluster-based algorithm for time series outlier mining. We use the discrete Fourier transform (DFT) to map time series from the time domain to the frequency domain, so that each time series becomes a point in k-dimensional space. For these points, a cluster-based algorithm is developed to mine the outliers: it first partitions the input points into disjoint clusters and then prunes the clusters judged unable to contain outliers. Our algorithm has been run on the electrical load time series of a steel enterprise and proved to be effective.
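The DFT mapping step can be sketched as follows. The series here are hypothetical stand-ins for the paper's electrical load data, and the nearest-neighbour score replaces the paper's cluster-and-prune stage as a simpler illustration of why the frequency-space embedding separates the anomalous series.

```python
import numpy as np

def dft_features(series, k=4):
    """Map a time series to the magnitudes of its first k DFT
    coefficients, turning sequences into points in k-dimensional space."""
    return np.abs(np.fft.rfft(series)[:k])

# three similar curves and one with very different frequency content
t = np.arange(48)
series = [np.sin(2 * np.pi * t / 48) + 0.05 * i for i in range(3)]
series.append(np.sin(2 * np.pi * t / 6))
feats = np.array([dft_features(s) for s in series])

# score each point by distance to its nearest neighbour in feature space
d = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
scores = d.min(axis=1)
# the last series stands far from the rest in frequency space
```

Truncating to the first k coefficients compresses each series to a few numbers while preserving the low-frequency shape, which is what makes the subsequent distance-based clustering and pruning tractable.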
Testing for Multivariate Outliers in the Presence of Missing Data
Woodward, W. A.; Sain, S. R.; Gray, H. L.; Zhao, B.; Fisk, M. D.
We consider the problem of multivariate outlier testing for purposes of distinguishing seismic signals of underground nuclear events from training samples based on non-nuclear seismic events when certain data are missing. We consider the case in which the training data follow a multivariate normal distribution. Assume a potential outlier is observed on which k features of interest are measured, and that a training set of n observations on these k features is available but that some of the observations in the training data have missing features. The approach currently used in practice is to perform the outlier testing using a generalized likelihood ratio test procedure based only on the data vectors in the training data with complete data. When there is a substantial amount of missing data within the training set, use of this strategy may lead to a loss of valuable information. An alternative procedure is to incorporate all n of the data vectors in the training data, using the EM algorithm to appropriately handle the missing data in the training set. Resampling methods are used to find appropriate critical regions. We use simulation results and analysis of models fit to Pg/Lg ratios for the WMQ station in China to compare these two strategies for dealing with missing data.
A Unified Approach to Spatial Outlier Detection
2001-12-10
Spatial Outlier Detection from GSM Mobility Data
Shad, Shafqat Ali; Chen, Enhong
2012-01-01
This paper has been withdrawn by the authors. With the rapid growth of cellular networks, many mobility datasets are publicly available, which has attracted researchers to study human mobility as a spatio-temporal phenomenon. Mobility profile building is the main task in spatio-temporal trend analysis, and profiles can be extracted from the location information available in the dataset. The location information is usually gathered through GPS, service-provider-assisted faux GPS and Cell Global I...
Daniel L Roden
Full Text Available Complex human diseases can show significant heterogeneity between patients with the same phenotypic disorder. An outlier detection strategy was developed to identify variants at the level of gene transcription that are of potential biological and phenotypic importance. Here we describe a graphical software package, z-score outlier detection (ZODET), that enables identification and visualisation of gross abnormalities in gene expression (outliers) in individuals, using whole genome microarray data. The mean and standard deviation of expression in a healthy control cohort are used to detect both over- and under-expressed probes in individual test subjects. We compared the potential of ZODET to detect outlier genes in gene expression datasets with a previously described statistical method, gene tissue index (GTI), using a simulated expression dataset and a publicly available monocyte-derived macrophage microarray dataset. Taken together, these results support ZODET as a novel approach to identify outlier genes of potential pathogenic relevance in complex human diseases. The algorithm is implemented using R packages and Java. The software is freely available from http://www.ucl.ac.uk/medicine/molecular-medicine/publications/microarray-outlier-analysis.
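The z-score screen described above can be sketched in a few lines (ZODET itself is in R and Java; this Python sketch and its simulated expression values are illustrative only, and the cutoff is an assumption).

```python
import numpy as np

def z_outliers(control, test, z_cut=3.0):
    """ZODET-style screen (sketch): z-score each probe in a test sample
    against the mean and SD of a healthy control cohort, flagging probes
    with |z| above the cutoff as over- or under-expressed."""
    mu = control.mean(axis=0)
    sd = control.std(axis=0, ddof=1)
    z = (test - mu) / sd
    return z, np.abs(z) > z_cut

# 20 controls x 5 probes (hypothetical expression values)
rng = np.random.default_rng(7)
control = rng.normal(loc=10.0, scale=1.0, size=(20, 5))
test = control.mean(axis=0).copy()
test[2] += 6.0                      # grossly over-expressed probe
z, flags = z_outliers(control, test)
```

Because the reference distribution comes from controls only, the method flags individual-level outliers rather than group-level differential expression, which is the distinction the abstract draws against standard feature selection.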
Time of Flight Estimation in the Presence of Outliers: A Biosonar-Inspired Machine Learning Approach
2013-08-29
However, since the impact of an outlier on the overall error is proportional to the bias squared, an outlier event will seriously affect the MSE... Assume we can train a (weak) classifier that assigns labels to an estimate...
Improving Electronic Sensor Reliability by Robust Outlier Screening
Federico Cuesta
2013-10-01
Full Text Available Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms, Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC), and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. These methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs.
Pretorius, Carel J; Dimeski, Goce; O'Rourke, Peter K; Marquart, Louise; Tyack, Shirley A; Wilgen, Urs; Ungerer, Jacobus P J
2011-05-01
It is important that cardiac troponin be measured accurately with a robust method to limit false results with potentially adverse clinical outcomes. In this study, we characterized the robustness of 4 analytical platforms by measuring the outlier rate between duplicate results. We measured cardiac troponin concurrently in duplicate with 4 analyzers on 2391 samples. The outliers were detected from the difference between duplicate results and by calculating a z value: z = (result 1 - result 2) ÷ √(SD1(est)² + SD2(est)²), with z > 3.48 identifying outliers with a probability of 0.0005. The outlier rates were as follows: Abbott Architect i2000SR STAT Troponin-I, 0.10% (0.01%-0.19%); Beckman Coulter Access2 Enhanced AccuTnI, 0.44% (0.25%-0.63%); Roche Cobas e601 TroponinT hs, 0.06% (0.00%-0.13%); and Siemens ADVIA Centaur XP TnI-Ultra, 0.10% (0.01%-0.19%). The occurrence of outliers was higher than statistically expected on all platforms except the Cobas e601 (χ² = 2.7; P = 0.10). A conservative approach with a constant 10% CV and z > 5.0 identified outliers with clear clinical impact and resulted in outlier rates of 0.11% (0.02%-0.20%) with the Architect i2000SR STAT Troponin-I, 0.36% (0.19%-0.53%) with the Access2 Enhanced AccuTnI, 0.02% (0.00%-0.06%) with the Cobas e601 TroponinT hs, and 0.06% (0.00%-0.13%) with the ADVIA Centaur XP TnI-Ultra. Outliers occurred on all analytical platforms, at different rates. Clinicians should be made aware by their laboratory colleagues of the existence of outliers and the rate at which they occur.
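The duplicate z-value from the study can be computed directly from its formula; the sketch below uses the study's conservative constant 10% CV to estimate each result's SD (the troponin values themselves are hypothetical).

```python
import math

def duplicate_z(result1, result2, cv1=0.10, cv2=0.10):
    """z-value between duplicate results, with each SD estimated as
    CV x result, per the study's conservative constant-CV approach."""
    sd1, sd2 = cv1 * result1, cv2 * result2
    return (result1 - result2) / math.sqrt(sd1**2 + sd2**2)

# concordant duplicates vs. a discordant pair
z_ok = duplicate_z(0.05, 0.06)
z_bad = duplicate_z(0.05, 0.50)
# |z| > 5.0 flags an outlier under the conservative rule
```

Scaling the SDs with the results means the flag reflects relative disagreement, so a tenfold discrepancy is flagged at any concentration while ordinary imprecision near the limit of detection is not.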
Finding multivariate outliers in fMRI time-series data.
Magnotti, John F; Billor, Nedret
2014-10-01
A fundamental challenge for researchers studying the brain is to explain how distributed patterns of brain activity relate to a specific representation or computation. Multivariate techniques are therefore becoming increasingly popular for pattern localization of functional magnetic resonance imaging (fMRI) data. The increased power of these techniques can be offset by their susceptibility to multivariate outliers, a problem not directly encountered when fMRI data are analyzed in more common univariate analysis techniques. We test how two algorithms, High Dimensional Blocked Adaptive Computationally Efficient Outlier Nominators (HD BACON) and Principal Component based Outlier detection (PCOut), can detect multivariate outliers in high-dimensional fMRI data, in which the number of variables is larger than the number of observations. We show how these methods can be applied to individual, voxel time-series to identify outlying voxels within a region of interest. Finally, we compare these methods with simulated data to identify which aspects of the data each method is most sensitive to. Voxels identified by both algorithms were primarily on the edges of univariate activation clusters and near the boundaries between different tissue types. Simulation results showed the PCOut outperformed HD BACON, maintaining both high sensitivity and specificity across a wide range of outlier contamination percentages. Our results suggest that multivariate analysis of fMRI can benefit from including multivariate outlier detection as a routine data quality check prior to model fitting. Copyright © 2014 Elsevier Ltd. All rights reserved.
Outlier-resilient complexity analysis of heartbeat dynamics
Lo, Men-Tzung; Chang, Yi-Chung; Lin, Chen; Young, Hsu-Wen Vincent; Lin, Yen-Hung; Ho, Yi-Lwun; Peng, Chung-Kang; Hu, Kun
2015-03-01
Complexity in physiological outputs is believed to be a hallmark of healthy physiological control. How to accurately quantify the degree of complexity in physiological signals with outliers remains a major barrier for translating this novel concept of nonlinear dynamics theory to clinical practice. Here we propose a new approach to estimate the complexity in a signal by analyzing the irregularity of the sign time series of its coarse-grained time series at different time scales. Using surrogate data, we show that the method can reliably assess the complexity in noisy data while being highly resilient to outliers. We further apply this method to the analysis of human heartbeat recordings. Without removing any outliers due to ectopic beats, the method is able to detect a degradation of cardiac control in patients with congestive heart failure and an even greater degradation in critically ill patients whose survival relies on an extracorporeal membrane oxygenator (ECMO). Moreover, the derived complexity measures can predict the mortality of ECMO patients. These results indicate that the proposed method may serve as a promising tool for monitoring the cardiac function of patients in clinical settings.
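The pipeline sketched in the abstract above (coarse-grain the series, take the sign of increments, quantify irregularity) can be illustrated with a toy example. This is an assumed simplification using Shannon entropy of short sign patterns, not the authors' exact complexity measure:

```python
import math

def coarse_grain(x, scale):
    """Non-overlapping window averages of the series at the given scale."""
    n = len(x) // scale
    return [sum(x[i*scale:(i+1)*scale]) / scale for i in range(n)]

def sign_series(x):
    """Signs of successive increments: +1 for up, -1 for down (ties count as +1)."""
    return [1 if b - a >= 0 else -1 for a, b in zip(x, x[1:])]

def pattern_entropy(s, m=3):
    """Shannon entropy of length-m sign patterns, a simple irregularity index."""
    counts = {}
    for i in range(len(s) - m + 1):
        w = tuple(s[i:i+m])
        counts[w] = counts.get(w, 0) + 1
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())
```

A perfectly monotone signal yields a single sign pattern and zero entropy, while an irregular signal distributes probability across many patterns; repeating this across several coarse-graining scales gives a multiscale profile in the spirit of the method described.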
New Vehicle Detection Method with Aspect Ratio Estimation for Hypothesized Windows
Jisu Kim
2015-12-01
All kinds of vehicles have different ratios of width to height, which are called aspect ratios. Most previous works, however, use a fixed aspect ratio for vehicle detection (VD), which degrades performance. Thus, the estimation of a vehicle's aspect ratio is an important part of robust VD. Taking this idea into account, a new on-road vehicle detection system is proposed in this paper. The proposed method estimates the aspect ratio of the hypothesized windows to improve VD performance, and uses an Aggregate Channel Feature (ACF) and a support vector machine (SVM) to verify the hypothesized windows with the estimated aspect ratio. The contribution of this paper is threefold. First, the estimation of the vehicle aspect ratio is inserted between hypothesis generation (HG) and hypothesis verification (HV). Second, a simple HG method named the signed horizontal edge map is proposed to speed up VD. Third, a new measure is proposed to represent the overlapping ratio between the ground truth and the detection results; this measure is used to show that the proposed method outperforms previous works in terms of robust VD. Finally, the Pittsburgh dataset is used to verify the performance of the proposed method.
Identifying Outliers in Data from Patient Record.
Baumberger, Dieter; Buergin, Reto
2016-01-01
It is important for health services to be able to identify potential outliers with minimal effort as part of their daily evaluation of care data from patient records. This study evaluates the suitability of three statistical methods for identifying nursing outliers. The results show that, using methods implemented in the nursing workload measurement system "LEP" with reference to real data, unusual LEP minute profiles (movement, nutrition and so on) can be identified with little effort; these methods therefore seem promising for application in the health services' daily evaluation process. The lessons learned are used to create requirement criteria for the further development of software solutions. It is recommended that the methods for identifying outliers in the daily evaluation process be standardized in order to increase the efficiency of secondary use of care data from patient records.
Falsely elevated troponin I results due to outliers indicate a lack of analytical robustness.
Ungerer, Jacobus P J; Pretorius, Carel J; Dimeski, Goce; O'Rourke, Peter K; Tyack, Shirley A
2010-05-01
Troponin (Tn) is the preferred biochemical marker for the diagnosis of acute coronary syndrome. Spurious false Tn results (outliers) may cause significant problems with clinical management. We investigated the occurrence of outliers and whether this phenomenon could be explained by analytical imprecision. Troponin I (TnI) was measured in duplicate with Beckman AccuTnI reagent if the first TnI result was ≥0.04 µg/L (n = 5265). All TnI requests were performed in duplicate in a subset of samples for one calendar month (n = 881). A total of 13,178 TnI requests were received during the study period. Variables were sample type, centrifugation speed and analyser. Results were identified as outliers when the difference between two results exceeded a critical difference (CD) limit defined by CD = z × √2 × SD(analytical). Outliers at the 0.0005 probability level were detected in 102 of 5265 duplicate observations (1.94 ± 0.37%). This translated into an outlier rate of 0.55 ± 0.13% for all TnI results and 1.37 ± 0.31% for results above 0.04 µg/L. Outliers resulted only in falsely elevated TnI values and were not dependent on the analyser, centrifugation speed or sample type. TnI outliers occurred more frequently than anticipated, could not be explained by analytical imprecision and indicated a lack of robustness in the assay. The high rate and the magnitude of the errors will complicate clinical management and carry a risk of detrimental patient outcomes. The outlier rate is a useful parameter to define the robustness of assays.
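The critical-difference rule in this abstract can be sketched directly; the default z here assumes the 0.0005 probability level mentioned above (z ≈ 3.48), and the function names are illustrative:

```python
import math

def critical_difference(sd_analytical, z=3.48):
    """CD = z * sqrt(2) * SD(analytical): the largest difference between
    duplicates expected from analytical imprecision alone."""
    return z * math.sqrt(2) * sd_analytical

def is_duplicate_outlier(r1, r2, sd_analytical, z=3.48):
    # Flag the pair when the observed difference exceeds the critical difference
    return abs(r1 - r2) > critical_difference(sd_analytical, z)
```

With an analytical SD of 0.01, the critical difference is about 0.049, so duplicates of 0.05 and 0.50 would be flagged while 0.05 and 0.06 would not.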
Dealing with Outliers: Robust, Resistant Regression
Glasser, Leslie
2007-01-01
Least-squares linear regression is the best of statistics and it is the worst of statistics. The reasons for this paradoxical claim, arising from possible inapplicability of the method and the excessive influence of "outliers", are discussed and substitute regression methods based on median selection, which is both robust and resistant, are…
Fuzzy Treatment of Candidate Outliers in Measurements
Giampaolo E. D'Errico
2012-01-01
Robustness against the possible occurrence of outlying observations is critical to the performance of a measurement process. Open questions relevant to statistical testing for candidate outliers are reviewed. A novel fuzzy logic approach is developed and exemplified in a metrology context. A simulation procedure is presented and discussed by comparing fuzzy versus probabilistic models.
赵振英; 林君; 张怀柱
2014-01-01
In the present paper, outlier detection methods for the determination of oil yield in oil shale using near-infrared (NIR) diffuse reflection spectroscopy were studied. During quantitative analysis with near-infrared spectroscopy, environmental changes and operator errors can both produce outliers. The presence of outliers affects the overall distribution of the samples and degrades predictive capability, so the detection of outliers is important for the construction of high-quality calibration models. Principal component analysis-Mahalanobis distance (PCA-MD) and resampling by half-means (RHM) were applied to the discrimination and elimination of outliers in this work. The thresholds and confidence levels for MD and RHM were optimized using the performance of the partial least squares (PLS) models constructed after the elimination of outliers. Compared with the model constructed from the full sample set, the RMSEP values of the models constructed after applying PCA-MD (with a threshold equal to the sum of the mean and standard deviation of the MD), RHM (with a confidence level of 85%), and the combination of PCA-MD and RHM were reduced by 48.3%, 27.5% and 44.8%, respectively. The predictive ability of the calibration model was thus effectively improved.
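A minimal sketch of the PCA-Mahalanobis-distance screen described above, using the mean-plus-one-standard-deviation threshold the abstract mentions. The function and parameter names are my own, and the number of retained components is an assumption:

```python
import numpy as np

def pca_mahalanobis_outliers(X, n_components=2, k=1.0):
    """Flag samples whose Mahalanobis distance in the leading PCA subspace
    exceeds mean(MD) + k*std(MD), the threshold form used in the abstract."""
    Xc = X - X.mean(axis=0)
    # PCA via SVD; project the centered data onto the leading components
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:n_components].T                      # PCA scores
    cov = np.atleast_2d(np.cov(T, rowvar=False))
    inv = np.linalg.inv(cov)
    md = np.sqrt(np.einsum('ij,jk,ik->i', T, inv, T))  # Mahalanobis distances
    thresh = md.mean() + k * md.std()
    return md, md > thresh
```

Samples flagged by the boolean mask would be removed before refitting the PLS calibration model, mirroring the workflow in the abstract.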
Mining Distance-Based Outliers in Near Linear Time
National Aeronautics and Space Administration — Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule Abstract: Defining outliers by their distance to...
Statistical Mechanics of Learning in the Presence of Outliers
Dietrich, Rainer; Opper, Manfred
1998-01-01
Using methods of statistical mechanics, we analyse the effect of outliers on the supervised learning of a classification problem. The learning strategy aims at selecting informative examples and discarding outliers. We compare two algorithms which perform the selection either in a soft or a hard way. When the fraction of outliers grows large, the estimation errors undergo a first order phase transition.
42 CFR 412.80 - Outlier cases: General provisions.
2010-10-01
... transferring hospitals, CMS provides for additional payment, beyond standard DRG payments, to a hospital for...) Outlier cases in transferring hospitals. CMS provides cost outlier payments to a transferring hospital for... MEDICARE PROGRAM PROSPECTIVE PAYMENT SYSTEMS FOR INPATIENT HOSPITAL SERVICES Payments for Outlier Cases...
No Sexual Dimorphism Detected in Digit Ratios of the Fire Salamander (Salamandra salamandra).
Balogová, Monika; Nelson, Emma; Uhrin, Marcel; Figurová, Mária; Ledecký, Valent; Zyśk, Bartłomiej
2015-10-01
It has been proposed that digit ratio may be used as a biomarker of early developmental effects. Specifically, the second-to-fourth digit ratio (2D:4D) has been linked to the effects of sex hormones and their receptor genes, but other digit ratios have also been investigated. Across taxa, patterns of sexual dimorphism in digit ratios are ambiguous and a scarcity of studies in basal tetrapods makes it difficult to understand how ratios have evolved. Here, we focus on examining sex differences in digit ratios (2D:3D, 2D:4D, and 3D:4D) in a common amphibian, the fire salamander (Salamandra salamandra). We used graphic software to measure soft tissue digit length and digit bone length from X-rays. We found a nonsignificant tendency in males to have a lower 2D:3D than females; however, no sexual differences were detected in the other ratios. We discuss our results in the context of other studies of digit ratios, and how sex determination systems, as well as other factors, might impact patterns of sexual dimorphism, particularly in reptiles and in amphibians. Our findings suggest that caution is needed when using digit ratios as a potential indicator of prenatal hormonal effects in amphibians and highlight the need for more comparative studies to elucidate the evolutionary and genetic mechanisms implicated in sexually dimorphic patterns across taxonomic groups.
Ovidiu Galescu
2012-01-01
Background. Blood pressure (BP) percentiles in childhood are assessed according to age, gender, and height. Objective. To create a simple BP/height ratio for both systolic BP (SBP) and diastolic BP (DBP), and to study the relationship between BP/height ratios and the corresponding BP percentiles in children. Methods. We analyzed data on height and BP from the 2006-2007 NHANES data. BP percentiles were calculated for 3775 children. Receiver-operating characteristic (ROC) curve analyses were performed to calculate the sensitivity and specificity of BP/height ratios as diagnostic tests for elevated BP (>90th percentile). Correlation analysis was performed between BP percentiles and BP/height ratios. Results. The average age was 12.54 ± 2.67 years. The SBP/height and DBP/height ratios correlated strongly with the SBP and DBP percentiles in both boys (P < 0.001, R² = 0.85 and 0.86) and girls (P < 0.001, R² = 0.85 and 0.90). The cutoffs for the SBP/height and DBP/height ratios in boys were ≥0.75 and ≥0.46, respectively; in girls they were ≥0.75 and ≥0.48, respectively, with sensitivity and specificity in the range of 83-100%. Conclusion. BP/height ratios are simple indices with high sensitivity and specificity for detecting elevated BP in children. These ratios can easily be used in the routine medical care of children.
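The ratio cutoffs reported above translate into a one-line screening rule. This sketch assumes BP in mmHg and height in cm (the units under which cutoffs like 0.75 make sense), and the function name is illustrative:

```python
def bp_height_flags(sbp, dbp, height_cm, male=True):
    """Screen for elevated BP using the abstract's ratio cutoffs:
    SBP/height >= 0.75 (both sexes); DBP/height >= 0.46 (boys) or 0.48 (girls).
    Returns (systolic_flag, diastolic_flag)."""
    sbp_ratio = sbp / height_cm
    dbp_ratio = dbp / height_cm
    dbp_cut = 0.46 if male else 0.48
    return sbp_ratio >= 0.75, dbp_ratio >= dbp_cut
```

For example, a boy 160 cm tall with BP 130/80 has ratios 0.81 and 0.50 and would be flagged on both counts.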
Spectrum based feature extraction using spectrum intensity ratio for SSVEP detection.
Itai, Akitoshi; Funase, Arao
2012-01-01
In recent years, the steady-state visual evoked potential (SSVEP) has been used as a basis for brain-computer interfaces (BCI) [1]. Various feature extraction and classification techniques have been proposed to achieve SSVEP-based BCI. SSVEP feature extraction is performed in the frequency domain, regardless of the limitations on the flickering frequency of the visual stimulus imposed by the hardware architecture. We introduce here a feature extraction method using a spectrum intensity ratio. Results show that the detection ratio reaches 84% using the spectrum intensity ratio with unsupervised classification. The results also indicate that the SSVEP is enhanced by the proposed feature extraction using the second harmonic.
Adaptive vector validation in image velocimetry to minimise the influence of outlier clusters
Masullo, Alessandro; Theunissen, Raf
2016-03-01
The universal outlier detection scheme (Westerweel and Scarano in Exp Fluids 39:1096-1100, 2005) and the distance-weighted universal outlier detection scheme for unstructured data (Duncan et al. in Meas Sci Technol 21:057002, 2010) are the most common PIV data validation routines. However, such techniques rely on a spatial comparison of each vector with those in a fixed-size neighbourhood and their performance subsequently suffers in the presence of clusters of outliers. This paper proposes an advancement to render outlier detection more robust while reducing the probability of mistakenly invalidating correct vectors. Velocity fields undergo a preliminary evaluation in terms of local coherency, which parametrises the extent of the neighbourhood with which each vector will be compared subsequently. Such adaptivity is shown to reduce the number of undetected outliers, even when implemented in the aforementioned validation schemes. In addition, the authors present an alternative residual definition considering vector magnitude and angle, adopting a modified Gaussian-weighted, distance-based averaging median. This procedure is able to adapt the degree of acceptable background fluctuations in velocity to the local displacement magnitude. The traditional, extended and recommended validation methods are numerically assessed on the basis of flow fields from an isolated vortex, a turbulent channel flow and a DNS simulation of forced isotropic turbulence. The resulting validation method is adaptive, requires no user-defined parameters and is demonstrated to yield the best performances in terms of outlier under- and over-detection. Finally, the novel validation routine is applied to the PIV analysis of experimental studies focused on the near wake behind a porous disc and on a supersonic jet, illustrating the potential gains in spatial resolution and accuracy.
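The fixed-neighbourhood baseline that this paper improves upon, the normalized median test of Westerweel and Scarano, can be sketched for a single velocity component as follows. The eps and threshold values are the commonly quoted defaults (0.1 and 2); this is the baseline scheme, not the adaptive method proposed above:

```python
import numpy as np

def normalized_median_test(u, eps=0.1, threshold=2.0):
    """Normalized median test on a 2-D array of one velocity component,
    using the 8 neighbours of each interior vector."""
    ny, nx = u.shape
    flags = np.zeros_like(u, dtype=bool)
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            # 3x3 window around (j, i), centre removed -> 8 neighbours
            nb = np.delete(u[j-1:j+2, i-1:i+2].ravel(), 4)
            med = np.median(nb)
            r_m = np.median(np.abs(nb - med))   # median residual of neighbours
            r0 = abs(u[j, i] - med) / (r_m + eps)
            flags[j, i] = r0 > threshold
    return flags
```

On a uniform field with a single spurious vector, only the spike is flagged; the paper's point is that this fixed 8-neighbour comparison degrades when outliers appear in clusters.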
Pervasive selection or is it…? Why are FST outliers sometimes so frequent?
Bierne, Nicolas; Roze, Denis; Welch, John J
2013-04-01
It is now common for population geneticists to estimate FST for a large number of loci across the genome, before testing for selected loci as being outliers to the FST distribution. One surprising result of such FST scans is the often high proportion (>1% and sometimes >10%) of outliers detected, and this is often interpreted as evidence for pervasive local adaptation. In this issue of Molecular Ecology, Fourcade et al. (2013) observe that a particularly high rate of FST outliers has often been found in river organisms, such as fishes or damselflies, despite there being no obvious reason why selection should affect a larger proportion of the genomes of these organisms. Using computer simulations, Fourcade et al. (2013) show that the strong correlation in co-ancestry produced in long one-dimensional landscapes (such as rivers, valleys, peninsulas, oceanic ridges or coastlines) greatly increases the neutral variance in FST, especially when the landscape is further reticulated into fractal networks. As a consequence, outlier tests have a high rate of false positives, unless this correlation can be taken into account. Fourcade et al.'s study highlights an extreme case of the general problem, first noticed by Robertson (1975a,b) and Nei & Maruyama (1975), that correlated co-ancestry inflates the neutral variance in FST when compared to its expectation under an island model of population structure. Similar warnings about the validity of outlier tests have appeared regularly since then but have not been widely cited in the recent genomics literature. We further emphasize that FST outliers can arise in many different ways and that outlier tests are not designed for situations where the genetic architecture of local adaptation involves many loci.
John Patrick Mpindi
BACKGROUND: Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type ('outlier genes'), a hallmark of potential oncogenes. METHODOLOGY: A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed equally well to existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target. CONCLUSIONS/SIGNIFICANCE: Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. The algorithm is
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kueyoung; Choung, Sungwook; Chung, Il Moon
2017-05-01
A hydrogeological dataset often includes substantial deviations that need to be inspected. In the present study, three outlier identification methods that take advantage of ensemble regression - the three-sigma rule (3σ), the interquartile range (IQR), and the median absolute deviation (MAD) - are proposed, taking into account the non-Gaussian characteristics of groundwater data. For validation purposes, the performance of the methods is compared using simulated and actual groundwater data under a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level, whereas only the IQR method performs well at a 30% outlier level. When applying the methods to real groundwater data, the outlier identification performance of the IQR method is found to be superior to the other two methods. However, the IQR method shows a limitation in that it identifies excessive false outliers, which may be overcome by joint application with the other methods (for example, the 3σ rule and MAD). The proposed methods can also be applied as potential tools for the detection of future anomalies by model training based on currently available data.
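The three univariate rules compared above have compact textbook forms. This sketch applies them to a single sample of values (the study applies them to ensemble-regression residuals, which is not reproduced here):

```python
import statistics

def sigma_outliers(x, k=3.0):
    """Three-sigma rule: values more than k standard deviations from the mean."""
    m, s = statistics.mean(x), statistics.stdev(x)
    return [v for v in x if abs(v - m) > k * s]

def iqr_outliers(x, k=1.5):
    """IQR rule: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q = statistics.quantiles(x, n=4)
    q1, q3 = q[0], q[2]
    iqr = q3 - q1
    return [v for v in x if v < q1 - k * iqr or v > q3 + k * iqr]

def mad_outliers(x, k=3.0):
    """MAD rule: values more than k scaled MADs from the median.
    The 1.4826 factor scales MAD to the SD of a Gaussian."""
    med = statistics.median(x)
    mad = statistics.median([abs(v - med) for v in x])
    return [v for v in x if abs(v - med) > k * 1.4826 * mad]
```

On a small sample such as [1, 2, 3, 4, 5, 100], the 3σ rule misses the outlier because the outlier itself inflates the standard deviation (a masking effect), while the IQR and MAD rules flag it, which is consistent with the robustness ranking discussed in the abstract.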
Residuals and outliers in replicate design crossover studies.
Schall, Robert; Endrenyi, Laszlo; Ring, Arne
2010-07-01
Outliers in bioequivalence trials may arise through various mechanisms, requiring different interpretation and handling of such data points. For example, regulatory authorities might permit exclusion from analysis of outliers caused by product or process failure, while exclusion of outliers caused by subject-by-treatment interaction generally is not acceptable. In standard 2 x 2 crossover studies it is not possible to distinguish between relevant types of outliers based on statistical criteria alone. However, in replicate design (2-treatment, 4-period) crossover studies three types of outliers can be distinguished: (i) Subject outliers are usually unproblematic, at least regarding the analysis of bioequivalence, and may require no further action; (ii) Subject-by-formulation outliers may affect the outcome of the bioequivalence test but generally cannot simply be removed from analysis; and (iii) Removal of single-data-point outliers from analysis may be justified in certain cases. As a very simple but effective diagnostic tool for the identification and classification of outliers in replicate design crossover studies we propose to calculate and plot three types of residual corresponding to the three different types of outliers that can be distinguished. The residuals are obtained from four mutually orthogonal linear contrasts of the four data points associated with each subject. If preferred, outlier tests can be applied to the resulting sets of residuals after suitable standardization.
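The four mutually orthogonal contrasts can be written down directly for one subject's four data points (two Test and two Reference administrations). This is an illustrative parameterization of such contrasts, not necessarily the exact scaling the authors use:

```python
def replicate_contrasts(t1, r1, t2, r2):
    """Four mutually orthogonal contrasts of a subject's four data points
    in a 2-treatment, 4-period replicate crossover (ordering t1, r1, t2, r2):
      subject      (1,1,1,1)/4   - overall subject level
      formulation  (1,-1,1,-1)/2 - subject-by-formulation contrast
      within_t     (1,0,-1,0)/2  - single-point contrast within Test
      within_r     (0,1,0,-1)/2  - single-point contrast within Reference"""
    subject = (t1 + r1 + t2 + r2) / 4
    formulation = (t1 + t2 - r1 - r2) / 2
    within_t = (t1 - t2) / 2
    within_r = (r1 - r2) / 2
    return subject, formulation, within_t, within_r
```

Plotting each contrast across subjects (after suitable standardization) separates the three outlier types: an extreme subject contrast suggests a subject outlier, an extreme formulation contrast a subject-by-formulation outlier, and an extreme within-formulation contrast a single-data-point outlier.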
GNSS spoofing detection: Theoretical analysis and performance of the Ratio Test metric in open sky
Jie Huang
2016-03-01
Nowadays more and more applications rely on the information provided by Global Navigation Satellite Systems (GNSSs), but the vulnerability of GNSS signals to interference, jamming and spoofing is a growing concern. Among all the possible sources of intentional interference, spoofing is extremely deceptive and sinister: the victim receiver may not be able to warn the user and discern between authentic and false signals. For this reason, a receiver featuring spoofing detection capabilities might become a need in many cases. Different types of spoofing detection algorithms have been presented in recent literature. One of the first, referred to as the Ratio Metric, allows for the monitoring of possible distortions in the signal correlation. The effectiveness of the Ratio Test has been widely discussed and demonstrated; in this paper we analyze its performance, proposing a mathematical model that is used to assess the false alarm and detection probabilities.
On Approximating String Selection Problems with Outliers
Boucher, Christina; Levy, Avivit; Pritchard, David; Weimann, Oren
2012-01-01
Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings, and a parameter d, find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We prove this problem has no PTAS unless ZPP=NP, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P=NP. We also observe Closest to k Strings has no EPTAS unless W[1]=FPT. In sum, outliers help model problems associated with using biological data, but we show the problem of finding an approximate solution is computationally difficult.
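The objective of the Close to Most Strings problem can be stated in a few lines of code. This sketch only evaluates a candidate string; it does not attempt the optimization itself, which the abstract shows is computationally hard:

```python
def hamming(a, b):
    """Hamming distance between two same-length strings."""
    return sum(x != y for x, y in zip(a, b))

def close_to_most_strings(candidate, strings, d):
    """Number of input strings within Hamming distance d of the candidate,
    i.e. the count of 'non-outliers' that the problem seeks to maximize."""
    return sum(hamming(candidate, s) <= d for s in strings)
```

For a set like ["AAAA", "AAAT", "TTTT"] with d = 1, the candidate "AAAA" covers two strings and leaves "TTTT" as the outlier.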
Motion artifact cancellation and outlier rejection for clip-type ppg-based heart rate sensor.
Shimazaki, Takunori; Hara, Shinsuke; Okuhata, Hiroyuki; Nakamura, Hajime; Kawabata, Takashi
2015-01-01
Heart rate sensing can be used not only to understand exercise intensity but also to detect life-critical conditions during sports activities. To reduce stress during exercise and allow the heart rate sensor to be attached easily, we developed a clip-type photoplethysmography (PPG)-based heart rate sensor. The sensor can be attached simply by clipping it to the waistband of undershorts, and it employs a motion artifact (MA) cancellation technique. However, due to its low contact pressure, sudden jumps and drops, called "outliers," are often observed in the sensed heart rate, so we also developed a simple outlier rejection technique. In an experiment with five male subjects (4 sets per subject), we confirmed the MA cancellation and outlier rejection capabilities.
Cao, Xiaochun; Wang, Xiao; Jin, Di; Cao, Yixin; He, Dongxiao
2013-10-21
Community detection is important for understanding networks. Previous studies observed that communities are not necessarily disjoint and might overlap. It is also agreed that some outlier vertices participate in no community, and some hubs in a community might take more important roles than others. Each of these facts has been independently addressed in previous work. But there is no algorithm, to our knowledge, that can identify these three structures altogether. To overcome this limitation, we propose a novel model where vertices are measured by their centrality in communities, and define the identification of overlapping communities, hubs, and outliers as an optimization problem, calculated by nonnegative matrix factorization. We test this method on various real networks, and compare it with several competing algorithms. The experimental results not only demonstrate its ability of identifying overlapping communities, hubs, and outliers, but also validate its superior performance in terms of clustering quality.
Latent Clustering Models for Outlier Identification in Telecom Data
Ye Ouyang
2016-01-01
Collected telecom data traffic has boomed in recent years, due to the development of 4G mobile devices and other similar high-speed machines. The ability to quickly identify unexpected traffic data in this stream is critical for mobile carriers, as anomalies can be caused by either fraudulent intrusion or technical problems. Clustering models can help to identify issues by showing patterns in network data, which can quickly catch anomalies and highlight previously unseen outliers. In this article, we develop and compare clustering models for telecom data, focusing on those that incorporate time-stamp information. Two main models are introduced, solved in detail, and analyzed: Gaussian Probabilistic Latent Semantic Analysis (GPLSA) and time-dependent Gaussian Mixture Models (time-GMM). These models are then compared with clustering models that do not use time-stamp information, such as a plain Gaussian model and GMM. We perform computations on both sample and real telecom traffic data to show that the efficiency and robustness of GPLSA make it the superior method for detecting outliers, providing results automatically with few tuning parameters or expertise requirements.
Poland’s Trade with East Asia: An Outlier Approach
Tseng Shoiw-Mei
2015-12-01
Poland achieved an excellent reputation for economic transformation during the recent global recession. The European debt crisis, however, quickly forced the reorientation of Poland's trade outside of the European Union (EU), especially toward the dynamic region of East Asia. This study analyzes time series data from 1999 to 2013 to detect outliers and thereby determine the bilateral trade paths between Poland and each East Asian country during Poland's accession to the EU in 2004, the global financial crisis of 2008-2009, and the European debt crisis of 2010-2013. From the Polish standpoint, the results showed significantly clustered outliers in these periods, with general trade paths running from dependence through distancing and improvement to the chance of approaching East Asian partners. This study also shows that not only China but also several other countries present an excellent opportunity for boosting bilateral trade, especially with regard to Poland's exports.
Signal to Noise Ratio Estimations for a Volcanic ASH Detection Lidar. Case Study: The Met Office
Georgoussis, George; Adam, Mariana; Avdikos, George
2016-06-01
In this paper we calculate the signal-to-noise ratio (SNR) of a 3-channel commercial (Raymetics) volcanic ash detection system (LR111-D300), already operating at the Met Office. The methodology for accurate estimation is presented for daytime and nighttime conditions. The results show that SNR values are higher than 10 for ranges up to 13 km under both nighttime and daytime conditions. This is quite a good result compared with other values reported in the literature and shows that such a system is able to detect volcanic ash over a range of 20 km.
Radiation detection method and system using the sequential probability ratio test
Nelson, Karl E.; Valentine, John D.; Beauchamp, Brock R.
2007-07-17
A method and system using the Sequential Probability Ratio Test (SPRT) to enhance the detection of an elevated level of radiation, by determining whether a set of observations is consistent with a specified model within given bounds of statistical significance. In particular, the SPRT is used in the present invention to maximize the range of detection by providing processing mechanisms for estimating the dynamic background radiation, adjusting the models to reflect the amount of background knowledge at the current point in time, analyzing the current sample using the models to determine statistical significance, and determining when the sample has returned to the expected background conditions.
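A minimal SPRT sketch for Poisson counts, consistent with the patent's description of comparing observations against background and background-plus-source models. The rates, error probabilities, and function names here are illustrative assumptions, not details from the patent:

```python
import math

def sprt_update(llr, count, bg_rate, signal_rate, dt=1.0):
    """One SPRT step on a Poisson count: add the log-likelihood ratio of
    'background + source' (rate signal_rate) versus background alone."""
    lam0 = bg_rate * dt
    lam1 = signal_rate * dt
    return llr + count * math.log(lam1 / lam0) - (lam1 - lam0)

def sprt_decide(llr, alpha=0.001, beta=0.01):
    """Wald's decision thresholds for false-alarm rate alpha, miss rate beta."""
    upper = math.log((1 - beta) / alpha)    # cross -> declare elevated radiation
    lower = math.log(beta / (1 - alpha))    # cross -> declare background only
    if llr >= upper:
        return "alarm"
    if llr <= lower:
        return "background"
    return None  # keep sampling
```

The log-likelihood ratio accumulates across samples until it crosses one of the two thresholds, which is what lets the SPRT trade a variable observation time for a guaranteed error rate and thereby extend the detection range.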
N. M. S. M. Kadhim
2015-03-01
Very-High-Resolution (VHR) satellite imagery is a powerful source of data for detecting and extracting information about urban constructions. Shadow in VHR satellite imagery provides vital information on urban construction forms, illumination direction, and the spatial distribution of objects, which can help further understanding of the built environment. However, to extract shadows, the automated detection of shadows from images must be accurate. This paper reviews current automatic approaches that have been used for shadow detection from VHR satellite images and comprises two main parts. In the first part, shadow concepts are presented in terms of shadow appearance in VHR satellite imagery, current shadow detection methods, and the usefulness of shadow detection in urban environments. In the second part, we adopted two approaches considered state of the art for shadow detection and segmentation, using WorldView-3 and Quickbird images. In the first approach, the ratios between the NIR and visible bands were computed on a pixel-by-pixel basis, which allows for disambiguation between shadows and dark objects. To obtain an accurate shadow candidate map, we further refined the shadow map after applying the ratio algorithm to the Quickbird image. The second selected approach is the GrabCut segmentation approach, whose performance in detecting the shadow regions of urban objects was examined using the true-colour image from WorldView-3. Further refinement was applied to attain a segmented shadow map. Although the detection of shadow regions is a very difficult task when they are derived from a VHR satellite image that comprises only the visible spectrum range (RGB true colour), the results demonstrate that the GrabCut algorithm achieves a reasonable separation of shadow regions from other objects in the WorldView-3 image. In addition, the derived shadow map from the Quickbird image indicates
李庆华; 李新; 蒋盛益
2005-01-01
Outlier detection is one of the most fundamental problems studied in the data mining field, with wide applications in fraud identification, weather forecasting, customer classification, and intrusion detection. To meet the needs of network intrusion detection, we propose a new anomaly-mining algorithm based on clustering of mixed-attribute data and, building on the essential property that outliers are rare points in a dataset, give new definitions of data similarity and outlier degree. The proposed algorithm has linear time complexity. Experiments on the KDDCUP99 and Wisconsin Prognosis Breast Cancer datasets show that, while offering near-linear time complexity and good scalability, the algorithm can effectively discover the outliers in a dataset.
Cram, Peter; Lu, Xin; Kates, Stephen L; Li, Yue; Miller, Benjamin J
2011-07-01
Little is known about readmission rates for total hip and total knee arthroplasty (THA and TKA). Our objective was to examine readmission rates and whether hospitals with high and low readmission rates at baseline remain outliers in subsequent years. We identified Medicare beneficiaries who underwent THA (N = 245 995) and TKA (N = 517 867) between 2003 and 2005. We created four different hospital cohorts: low and high volume for THA and TKA. We calculated 30-day risk-standardized readmission rates (RSRRs) for each hospital for each year. Hospitals were defined as having low (lowest 25% of all hospitals), high (highest 25% of hospitals), and intermediate readmission rates (all others) for each year. Hospitals were labeled outliers if they had consistently low or high readmission rates for all years. We examined the number and characteristics of outlier and nonoutlier hospitals. Unadjusted readmission rates in 2003 for THA ranged from 0% to 94.7% (inter-quartile range: 0%-7.0%) and for TKA from 0% to 94.4% (inter-quartile range: 0.7%-5.9%). Of 255 low-volume THA hospitals with low readmission rates in 2003 (RSRRs ≤ 3.5%), 34 were outliers for all 3 years, significantly more than predicted; likewise, more hospitals than predicted remained high-readmission outliers for all 3 years. Outlier and nonoutlier hospitals did not differ in meaningful ways (teaching status and staffing ratios). Results were similar for other hospital cohorts. Using a 3-year window allows for identification of hospitals with consistently higher and lower readmission rates than predicted.
Improved anomaly detection using multi-scale PLS and generalized likelihood ratio test
Madakyaru, Muddu
2017-02-16
Process monitoring has a central role in the process industry to enhance productivity, efficiency, and safety, and to avoid expensive maintenance. In this paper, a statistical approach that exploits the advantages of multiscale PLS (MSPLS) models and of a generalized likelihood ratio (GLR) test to better detect anomalies is proposed. Specifically, to account for the multivariate and multiscale nature of process dynamics, an MSPLS algorithm combining PLS and wavelet analysis is used as the modeling framework. Then, GLR hypothesis testing is applied to the uncorrelated residuals obtained from the MSPLS model to further improve the anomaly detection abilities of these latent-variable-based fault detection methods. Applications to simulated distillation column data are used to evaluate the proposed MSPLS-GLR algorithm.
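A minimal sketch of the GLR step on model residuals, assuming i.i.d. Gaussian residuals with known variance and testing for a mean shift. This is the generic GLR recipe, not the paper's exact MSPLS formulation:

```python
def glr_mean_shift(residuals, sigma=1.0):
    """GLR statistic for a mean shift in i.i.d. Gaussian residuals:
    T = n * mean(r)^2 / sigma^2. Under H0 (no anomaly) T is chi-square
    with 1 degree of freedom, so comparing against e.g. 3.84 gives a
    5% false-alarm rate."""
    n = len(residuals)
    mean = sum(residuals) / n
    return n * mean * mean / (sigma * sigma)
```

In the paper's scheme the residuals come from the MSPLS model, whose wavelet-based filtering is what decorrelates them enough for this test to be valid.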
Detection of spatial variations in the (D/H) ratio in the local interstellar medium
Vidal-Madjar, Alfred; Lemoine, Martin; Ferlet, Roger; Hebrard, Guillaume; Koester, Detlev; Audouze, Jean; Casse, Michel; Vangioni-Flam, Elisabeth; Webb, John
1998-10-01
We present high-resolution (Delta lambda = 3.7 km.s^{-1}) HST-GHRS observations of the DA white dwarf G191-B2B, and derive the interstellar D/H ratio on the line of sight. We have observed and analysed simultaneously the interstellar lines of H I, D I, N I, O I, Si II and Si III. We detect three absorbing clouds, and derive a total H I column density N(H I) = (2.4 ± 0.1) × 10^{18} cm^{-2}, confirming our Cycle 1 estimate, but in disagreement with other previous measurements. We derive an average D/H ratio over the three absorbing clouds N(D I)_total/N(H I)_total = (1.12 ± 0.08) × 10^{-5}, in disagreement with the value of the local D/H previously reported by Linsky et al. (1995) toward Capella. We re-analyze the GHRS data of the Capella line of sight, and confirm their estimate, as we find (D/H)_Capella = (1.56 ± 0.1) × 10^{-5} in the Local Interstellar Cloud in which the solar system is embedded. This shows that the D/H ratio varies by at least ~30% within the local interstellar medium. Furthermore, the Local Interstellar Cloud is also detected toward G191-B2B, and we show that the D/H ratio in this component, toward G191-B2B, can be made compatible with that derived toward Capella. However, this comes at the expense of a much smaller value of the D/H ratio averaged over the other two components, of order 0.9 × 10^{-5}, such that the D/H ratio averaged over all three components remains at the above value, i.e. (D/H)_total = 1.12 × 10^{-5}. We thus conclude that either the D/H ratio varies from cloud to cloud, and/or the D/H ratio varies within the Local Interstellar Cloud in which the Sun is embedded, although our observations neither prove nor disprove this latter possibility. Based on observations with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS5-26555.
Yang, Jie; McArdle, Conor; Daniels, Stephen
2014-01-01
A Similarity Ratio Analysis (SRA) method is proposed for early-stage Fault Detection (FD) in plasma etching processes using real-time Optical Emission Spectrometer (OES) data as input. The SRA method can help to realise a highly precise control system by detecting abnormal etch-rate faults in real-time during an etching process. The method processes spectrum scans at successive time points and uses a windowing mechanism over the time series to alleviate problems with timing uncertainties due to process shift from one process run to another. A SRA library is first built to capture features of a healthy etching process. By comparing with the SRA library, a Similarity Ratio (SR) statistic is then calculated for each spectrum scan as the monitored process progresses. A fault detection mechanism, named 3-Warning-1-Alarm (3W1A), takes the SR values as inputs and triggers a system alarm when certain conditions are satisfied. This design reduces the chance of false alarm, and provides a reliable fault reporting service. The SRA method is demonstrated on a real semiconductor manufacturing dataset. The effectiveness of SRA-based fault detection is evaluated using a time-series SR test and also using a post-process SR test. The time-series SR provides an early-stage fault detection service, so less energy and materials will be wasted by faulty processing. The post-process SR provides a fault detection service with higher reliability than the time-series SR, but with fault testing conducted only after each process run completes.
Jie Yang
Full Text Available A Similarity Ratio Analysis (SRA) method is proposed for early-stage Fault Detection (FD) in plasma etching processes using real-time Optical Emission Spectrometer (OES) data as input. The SRA method can help to realise a highly precise control system by detecting abnormal etch-rate faults in real-time during an etching process. The method processes spectrum scans at successive time points and uses a windowing mechanism over the time series to alleviate problems with timing uncertainties due to process shift from one process run to another. An SRA library is first built to capture features of a healthy etching process. By comparing with the SRA library, a Similarity Ratio (SR) statistic is then calculated for each spectrum scan as the monitored process progresses. A fault detection mechanism, named 3-Warning-1-Alarm (3W1A), takes the SR values as inputs and triggers a system alarm when certain conditions are satisfied. This design reduces the chance of false alarm, and provides a reliable fault reporting service. The SRA method is demonstrated on a real semiconductor manufacturing dataset. The effectiveness of SRA-based fault detection is evaluated using a time-series SR test and also using a post-process SR test. The time-series SR provides an early-stage fault detection service, so less energy and materials will be wasted by faulty processing. The post-process SR provides a fault detection service with higher reliability than the time-series SR, but with fault testing conducted only after each process run completes.
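The windowed scan-to-library comparison can be sketched as below, assuming cosine similarity as the measure (the paper's exact SR definition may differ); `window` absorbs run-to-run timing shift:

```python
import math

def cosine(u, v):
    """Cosine similarity between two spectra (vectors of line intensities)."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def similarity_ratio(scan, t, library, window=2):
    """Best similarity between `scan` (taken at time index t) and any
    healthy-library scan within +/- `window` time points, so that small
    run-to-run timing shifts do not register as faults."""
    lo, hi = max(0, t - window), min(len(library), t + window + 1)
    return max(cosine(scan, ref) for ref in library[lo:hi])
```

A 3W1A-style rule would then raise the alarm only after several consecutive low-SR warnings, which is what keeps the false-alarm rate down.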
Adaptive Local Outlier Probability for Dynamic Process Monitoring
Yuxin Ma; Hongbo Shi; Mengling Wang
2014-01-01
Complex industrial processes often have multiple operating modes and present time-varying behavior. The data in one mode may follow specific Gaussian or non-Gaussian distributions. In this paper, a numerically efficient moving-window local outlier probability algorithm is proposed. Its key feature is the capability to handle complex data distributions and successive operating condition changes, including slow dynamic variations and instant mode shifts. First, a two-step adaptation approach is introduced and designed updating rules are applied to keep the monitoring model up-to-date. Then, a semi-supervised monitoring strategy is developed with an updating switch rule to deal with mode changes. Based on local probability models, the algorithm has a superior ability to detect faulty conditions and to adapt quickly to slow variations and new operating modes. Finally, the utility of the proposed method is demonstrated with a numerical example and a non-isothermal continuous stirred tank reactor.
Differentiation of thyroid lesion detected by FDG PET/CT using SUV ratio
Kim, Bom Sahn; Kang, Won Jun; Lee, Dong Soo; Chung, June Key; Lee, Myung Chul [Seoul National Univ. College of Medicine, Seoul (Korea, Republic of)
2007-07-01
We investigated the usefulness of the SUV ratio to discriminate focal thyroid lesions incidentally detected on 18F-FDG PET/CT (FDG PET) in patients with malignant disease. A total of 2167 subjects with malignant tumors underwent PET/CT for staging. Forty-five of the 2167 subjects (2.1%) showed hypermetabolic thyroid lesions on FDG PET. Of the 45, 21 lesions were confirmed by pathology (n = 16) or follow-up exam (n = 5). Seventeen patients had focal FDG uptake, while 4 patients had diffuse thyroid uptake. The standardized uptake value (SUV) was measured by drawing regions of interest (ROIs) on the bilateral thyroid lobes and liver. Of the 21 patients, 12 thyroid lesions were confirmed as malignant and 9 as benign. All bilateral thyroid FDG uptakes were determined to be benign disease such as thyroiditis. For the seventeen focal thyroid incidentalomas, FDG PET had a sensitivity of 100% (12/12) and a specificity of 60% (3/5). Malignant nodules had a significantly higher lesion-to-liver ratio than benign nodules (2.1 ± 0.9 vs. 1.2 ± 0.6, p = 0.029). With the ROC curve, the best cut-off value of the lesion-to-liver ratio was 1.0, with a sensitivity of 100% and a specificity of 60% (area under the curve = 0.783). The SUV ratio of the lesion to the contralateral lobe did not reach statistical significance for determining malignancy (3.7 ± 2.1 vs. 2.6 ± 1.7, p = 0.079). This study showed that focal thyroidal FDG uptake detected by FDG PET could be differentiated with best performance by the SUV ratio of lesion to liver.
Generalized FMD Detection for Spectrum Sensing Under Low Signal-to-Noise Ratio
Lin, Feng; Hu, Zhen; Hou, Shujie; Browning, James P; Wicks, Michael C
2012-01-01
Spectrum sensing is a fundamental problem in cognitive radio. We propose a detection algorithm for spectrum sensing in cognitive radio networks based on a function of the covariance matrix. The monotonically increasing property of a function of a matrix involving the trace operation is utilized as the cornerstone of this algorithm. The advantage of the proposed algorithm is that it works under extremely low signal-to-noise ratio, lower than -30 dB, with limited sample data. Theoretical analysis of threshold setting for the algorithm is discussed. A performance comparison between the proposed algorithm and other state-of-the-art methods is provided via simulation on a captured digital television (DTV) signal.
Carbon isotope ratio mass spectrometry for detection of endogenous steroid use: a testing strategy.
Ahrens, Brian D; Butch, Anthony W
2013-07-01
Isotope ratio mass spectrometry (IRMS) testing is performed to determine if an atypical steroid profile is due to administration of an endogenous steroid. Androsterone (Andro) and etiocholanolone (Etio), and/or the androstanediols (5α- and 5β-androstane-3α,17β-diol) are typically analyzed by IRMS to determine the (13)C/(12)C ratio. The ratios of these target compounds are compared to the (13)C/(12)C ratio of an endogenous reference compound (ERC) such as 5β-pregnane-3α,20α-diol (Pdiol). Concentrations of Andro and Etio are high, so (13)C/(12)C ratios can easily be measured in most urine samples. Despite the potentially improved sensitivity of the androstanediols for detecting the use of some testosterone formulations, additional processing steps are often required that increase labour costs and turnaround times. Since this can be problematic when performing large numbers of IRMS measurements, we established thresholds for Andro and Etio that can be used to determine the need for additional androstanediol testing. Using these criteria, 105 out of 2639 urine samples exceeded the Andro and/or Etio thresholds, with 52 of these samples being positive based on Andro and Etio IRMS testing alone. The remaining 53 urine samples had androstanediol IRMS testing performed and 3 samples were positive based on the androstanediol results. A similar strategy was used to establish a threshold for Pdiol to identify athletes with relatively (13)C-depleted values so that an alternative ERC can be used to confirm or establish a true endogenous reference value. Adoption of a similar strategy by other laboratories can significantly reduce IRMS sample processing and analysis times, thereby increasing testing capacity.
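The triage logic can be sketched as below. The function name and the `screen`/`positive` thresholds (differences in δ13C between the ERC and the target compounds) are hypothetical illustrations, not the laboratory's actual criteria:

```python
def irms_triage(delta_erc, delta_andro, delta_etio, screen=3.0, positive=4.0):
    """Triage sketch: a sample is 'positive' if Andro or Etio is clearly
    13C-depleted relative to the ERC; borderline depletion triggers the
    costlier androstanediol follow-up; otherwise no further IRMS work.
    delta_* are per-mil delta-13C values (more negative = more depleted)."""
    gap = max(delta_erc - delta_andro, delta_erc - delta_etio)
    if gap >= positive:
        return "positive"
    if gap >= screen:
        return "androstanediol follow-up"
    return "negative"
```

The point of the strategy is exactly this ordering: the cheap Andro/Etio measurement resolves most samples, and only the borderline band pays for the extra processing steps.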
Detection of Bleeding in Wireless Capsule Endoscopy Images Using Range Ratio Color
Al-Rahayfeh, Amer A; 10.5121/ijma.2010.2201
2010-01-01
Wireless Capsule Endoscopy (WCE) is a device to detect abnormalities in the colon, esophagus, small intestine, and stomach. Distinguishing bleeding in WCE images from non-bleeding is a hard and very time-consuming job for a human reviewer. Consequently, automating the classification of bleeding frames will not only expedite the process but also reduce the burden on doctors. Using the purity of the red color we can detect bleeding areas in WCE images. However, red color values of varying intensity appear in different parts of the small intestine, so it is not enough to depend on the red color feature alone. We select RGB (Red, Green, Blue) because it takes raw-level values and is easy to use. In this paper we define a range-ratio-color condition for each of R, G, and B. We divide each image into multiple pixels and apply the range-ratio-color condition to each pixel, then count the number of pixels that satisfy the condition. If the number of pixels is greater than zero, the frame is classified as a bleeding frame.
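A minimal sketch of the range-ratio classification, assuming simple lower bounds on the R/G and R/B ratios plus an absolute red floor; the actual ranges are tuned in the paper:

```python
def count_bleeding_pixels(image, rg_min=1.5, rb_min=1.5, r_min=100):
    """Count pixels whose red channel dominates green and blue by at least
    the given ratios and exceeds an absolute floor. `image` is a 2-D list
    of (r, g, b) tuples; +1 avoids division by zero."""
    count = 0
    for row in image:
        for r, g, b in row:
            if r >= r_min and r / (g + 1) >= rg_min and r / (b + 1) >= rb_min:
                count += 1
    return count

def is_bleeding_frame(image):
    """Frame-level decision: any qualifying pixel marks the frame."""
    return count_bleeding_pixels(image) > 0
```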
Fisher's linear discriminant ratio based threshold for moving human detection in thermal video
Sharma, Lavanya; Yadav, Dileep Kumar; Singh, Annapurna
2016-09-01
In video surveillance, moving human detection in thermal video is a critical phase that filters out redundant information to extract relevant information. Moving object detection is applied to thermal video because it mitigates challenging problems such as dynamic backgrounds and illumination variation. In this work, we propose a new background subtraction method using a threshold based on Fisher's linear discriminant ratio. This threshold is determined automatically at run-time for each pixel of every sequential frame, without the involvement of an external source such as a programmer or user for threshold selection, and provides better pixel classification at run-time. The method handles problems generated by the multimodal behavior of the background more accurately by using Fisher's ratio, which maximizes the separation between object pixels and background pixels. To check the efficacy of the proposed method, its performance is evaluated in terms of the various parameters depicted in the analysis. The experimental results and their analysis demonstrate better performance of the proposed method against the considered peer methods.
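The threshold selection can be sketched as a search over a pixel's intensity history for the split that maximizes Fisher's discriminant ratio between the two resulting classes. This is the generic criterion, not the paper's exact per-pixel run-time rule:

```python
def fisher_threshold(values):
    """Return the split value t maximizing Fisher's ratio
    (m1 - m2)^2 / (v1 + v2) between {v <= t} and {v > t},
    where m and v are class means and variances. Needs at least
    two distinct values."""
    best_t, best_ratio = None, -1.0
    for t in sorted(set(values))[:-1]:
        lo = [v for v in values if v <= t]
        hi = [v for v in values if v > t]
        m1, m2 = sum(lo) / len(lo), sum(hi) / len(hi)
        v1 = sum((v - m1) ** 2 for v in lo) / len(lo)
        v2 = sum((v - m2) ** 2 for v in hi) / len(hi)
        ratio = (m1 - m2) ** 2 / (v1 + v2 + 1e-12)
        if ratio > best_ratio:
            best_ratio, best_t = ratio, t
    return best_t
```

On a bimodal history (background intensities plus warmer object intensities) the maximizing split lands between the two modes, which is what makes the resulting per-pixel classification robust to multimodal backgrounds.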
A Comparison of Best Fit Lines for Data with Outliers
Glaister, P.
2005-01-01
Three techniques for determining a straight line fit to data are compared. The methods are applied to a range of datasets containing one or more outliers, and to a specific example from the field of chemistry. For the method which is the most resistant to the presence of outliers, a Microsoft Excel spreadsheet, as well as two Matlab routines, are…
Araki, Shin; Shimadera, Hikari; Yamamoto, Kouhei; Kondo, Akira
2017-03-01
Land use regression (LUR) and regression kriging have been widely used to estimate the spatial distribution of air pollutants, especially in health studies. The quality of observations is crucial to these methods because they depend entirely on observations: when monitoring data contain biases or uncertainties, the estimated map will not be reliable. In this study, we apply the spatial outlier detection method widely used in soil science to observations of PM2.5 and NO2 obtained from the regulatory monitoring network in Japan. The spatial distributions of annual means are modelled by both LUR and regression kriging using data sets with and without the detected outliers, and the obtained results are compared to examine the effect of spatial outliers. Spatial outliers remarkably deteriorate the prediction accuracy, except for the LUR model for NO2. This discrepancy might be due to differences in the characteristics of PM2.5 and NO2; the difference in the number of observations makes only a limited contribution. Although further investigation at different spatial scales is required, our study demonstrates that spatial outlier detection is an effective procedure for air pollutant data and should be applied whenever observation-based prediction methods are used to generate concentration maps.
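A neighbourhood-based screen in the spirit of the soil-science method: flag a station whose value deviates from the robust statistics of its spatial neighbours. The `radius` and `k` parameters, and the median/MAD formulation, are illustrative assumptions rather than the paper's exact procedure:

```python
def spatial_outliers(points, k=3.0, radius=1.5):
    """points: list of (x, y, value) station records. A station is flagged
    when its value differs from the median of neighbours within `radius`
    by more than k robust standard deviations (1.4826 * MAD). Stations
    with fewer than 3 neighbours are skipped."""
    flagged = []
    for i, (x, y, v) in enumerate(points):
        neigh = sorted(w for j, (a, b, w) in enumerate(points)
                       if j != i and (a - x) ** 2 + (b - y) ** 2 <= radius ** 2)
        if len(neigh) < 3:
            continue
        med = neigh[len(neigh) // 2]
        mad = sorted(abs(w - med) for w in neigh)[len(neigh) // 2]
        if abs(v - med) > k * (1.4826 * mad + 1e-9):
            flagged.append(i)
    return flagged
```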
Muscle MRS detects elevated PDE/ATP ratios prior to fatty infiltration in Becker muscular dystrophy.
Wokke, B H; Hooijmans, M T; van den Bergen, J C; Webb, A G; Verschuuren, J J; Kan, H E
2014-11-01
Becker muscular dystrophy (BMD) is characterized by progressive muscle weakness. Muscles show structural changes (fatty infiltration, fibrosis) and metabolic changes, both of which can be assessed using MRI and MRS. It is unknown at what stage of the disease process metabolic changes arise and how this might vary for different metabolites. In this study we assessed metabolic changes in skeletal muscles of Becker patients, both with and without fatty infiltration, quantified via Dixon MRI and (31)P MRS. MRI and (31)P MRS scans were obtained from 25 Becker patients and 14 healthy controls using a 7 T MR scanner. Five lower-leg muscles were individually assessed for fat and muscle metabolite levels. In the peroneus, soleus and anterior tibialis muscles with non-increased fat levels, PDE/ATP ratios were higher (P < 0.02) compared with controls, whereas in all muscles with increased fat levels PDE/ATP ratios were higher compared with healthy controls (P ≤ 0.05). The Pi/ATP ratio in the peroneus muscles was higher in muscles with increased fat fractions (P = 0.005), and the PCr/ATP ratio was lower in the anterior tibialis muscles with increased fat fractions (P = 0.005). There were no other significant changes in metabolites, but an increase in tissue pH was found in all muscles of the total group of BMD patients in comparison with healthy controls (P < 0.05). These findings suggest that (31)P MRS can be used to detect early changes in individual muscles of BMD patients, which are present before the onset of fatty infiltration.
Bonnice, W. F.; Motyka, P.; Wagner, E.; Hall, S. R.
1986-01-01
The performance of the orthogonal series generalized likelihood ratio (OSGLR) test in detecting and isolating commercial aircraft control surface and actuator failures is evaluated. A modification to incorporate age-weighting which significantly reduces the sensitivity of the algorithm to modeling errors is presented. The steady-state implementation of the algorithm based on a single linear model valid for a cruise flight condition is tested using a nonlinear aircraft simulation. A number of off-nominal no-failure flight conditions including maneuvers, nonzero flap deflections, different turbulence levels and steady winds were tested. Based on the no-failure decision functions produced by off-nominal flight conditions, the failure detection and isolation performance at the nominal flight condition was determined. The extension of the algorithm to a wider flight envelope by scheduling on dynamic pressure and flap deflection is examined. Based on this testing, the OSGLR algorithm should be capable of detecting control surface failures that would affect the safe operation of a commercial aircraft. Isolation may be difficult if there are several surfaces which produce similar effects on the aircraft. Extending the algorithm over the entire operating envelope of a commercial aircraft appears feasible.
Prospects for detection of gravitational waves from intermediate-mass-ratio inspirals.
Brown, Duncan A; Brink, Jeandrew; Fang, Hua; Gair, Jonathan R; Li, Chao; Lovelace, Geoffrey; Mandel, Ilya; Thorne, Kip S
2007-11-16
We explore prospects for detecting gravitational waves from stellar-mass compact objects spiraling into intermediate-mass black holes (BHs; M approximately 50 to 350 solar masses) with ground-based observatories, and estimate a rate for such intermediate-mass-ratio inspirals detectable by Advanced LIGO. We show that if the central body is not a BH but its metric is stationary, axisymmetric, reflection-symmetric and asymptotically flat, then the waves will likely be triperiodic, as for a BH. We suggest that the evolutions of the waves' three fundamental frequencies and of the complex amplitudes of their spectral components encode (in principle) details of the central body's metric, the energy and angular momentum exchange between the central body and the orbit, and the time-evolving orbital elements. We estimate that advanced ground-based detectors can constrain central-body deviations from a BH with interesting accuracy.
Outlier reset CUSUM for the exploration of copy number alteration data.
Lai, Yinglei; Gastwirth, Joseph L
2015-08-01
Copy number alteration (CNA) data have been collected to study disease-related chromosomal amplifications and deletions. The CUSUM procedure and related plots have been used to explore CNA data. In practice, it is possible to observe outliers, and modifications of the CUSUM procedure may then be required. An outlier reset modification of the CUSUM (ORCUSUM) procedure is developed in this paper. The threshold value for detecting outliers or significant CUSUMs can be derived using results for sums of independent truncated normal random variables. Bartels' non-parametric test for autocorrelation is also introduced to the analysis of copy number variation data. Our simulation results indicate that the ORCUSUM procedure can still be used even when the degree of autocorrelation is low. Furthermore, the results show the outliers' impact on the traditional CUSUM's performance and illustrate the advantage of the ORCUSUM's outlier reset feature. Additionally, we discuss how the ORCUSUM can be applied to examine CNA data with a simulated data set. To illustrate the procedure, recently collected single nucleotide polymorphism (SNP) based CNA data from The Cancer Genome Atlas (TCGA) Research Network are analyzed. The method is applied to a data set collected in an ovarian cancer study. Three cytogenetic bands (cytobands) are considered to illustrate the method. The cytobands 11q13 and 9p21 have been shown to be related to ovarian cancer and are presented as positive examples. The cytoband 3q22, which is less likely to be disease related, is presented as a negative example. These results illustrate the usefulness of the ORCUSUM procedure as an exploratory tool for the analysis of SNP-based CNA data.
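The outlier-reset idea can be sketched as a one-sided CUSUM that flags and discards single extreme observations instead of absorbing them; the `slack` and `outlier` constants are illustrative, not the paper's derived truncated-normal thresholds:

```python
def orcusum(xs, slack=0.5, outlier=4.0):
    """One-sided CUSUM with outlier reset. An observation whose magnitude
    exceeds `outlier` is recorded individually and the running sum is
    reset to zero, so a single spike cannot masquerade as a sustained
    shift; otherwise the usual max(0, S + x - slack) update applies.
    Returns (cusum path, indices of reset outliers)."""
    cusum, path, outliers = 0.0, [], []
    for i, x in enumerate(xs):
        if abs(x) > outlier:
            outliers.append(i)
            cusum = 0.0  # reset instead of absorbing the spike
        else:
            cusum = max(0.0, cusum + x - slack)
        path.append(cusum)
    return path, outliers
```

Without the reset, the spike at index 2 below would inflate the path and could trigger a spurious "sustained amplification" call; with it, the spike is reported separately and the path stays near zero.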
Applying Artificial Neural Network to Predict Semiconductor Machine Outliers
Keng-Chieh Yang
2013-01-01
Full Text Available Advanced semiconductor processes are produced by very sophisticated and complex machines. The demand for higher precision in monitoring systems becomes more vital as devices shrink to smaller sizes. High-quality, high-resolution checking mechanisms must rely on advanced information systems, such as fault detection and classification (FDC). FDC can detect deviations of machine parameters in a timely manner when the parameters deviate from their original values and exceed the specification range. This study adopts a backpropagation neural network model and grey relational analysis as tools to analyze the data, using FDC data to detect semiconductor machine outliers. Data collected for network training span three different intervals: a 6-month period, a 3-month period, and a 1-month period. The results demonstrate that the 3-month period gives the best result and the 6-month period the worst. The findings indicate that a machine deteriorates quickly after 6 months of continuous use. Equipment engineers and managers can attend to this phenomenon to improve production yield.
Sparse maximum harmonics-to-noise-ratio deconvolution for weak fault signature detection in bearings
Miao, Yonghao; Zhao, Ming; Lin, Jing; Xu, Xiaoqiang
2016-10-01
De-noising and enhancement of the weak fault signature from the noisy signal are crucial for fault diagnosis, as features are often very weak and masked by the background noise. Deconvolution methods have a significant advantage in counteracting the influence of the transmission path and enhancing the fault impulses. However, the performance of traditional deconvolution methods is greatly affected by some limitations, which restrict the application range. Therefore, this paper proposes a new deconvolution method, named sparse maximum harmonics-to-noise-ratio deconvolution (SMHD), that employs a novel index, the harmonics-to-noise ratio (HNR), as the objective function for iteratively choosing the optimum filter coefficients to maximize the HNR. SMHD is designed to enhance latent periodic impulse faults in heavy-noise signals by calculating the HNR to estimate the period. A sparse factor is utilized to further suppress the noise and improve the signal-to-noise ratio of the filtered signal in every iteration step. In addition, the updating process of the sparse threshold value and the period guarantees the robustness of SMHD. On this basis, the new method not only overcomes the limitations associated with the traditional deconvolution methods, minimum entropy deconvolution (MED) and maximum correlated kurtosis deconvolution (MCKD), but also yields better results, even if the fault period is not provided in advance. Moreover, the efficiency of the proposed method is verified by simulations and bearing data from different test rigs. The results show that the proposed method is effective in the detection of various bearing faults compared with the original MED and MCKD.
Harbater, Osnat; Gannot, Israel
2014-03-01
The pathogenic process of Alzheimer's Disease (AD), characterized by amyloid plaques and neurofibrillary tangles in the brain, begins years before the clinical diagnosis. Here, we suggest a novel method which may detect AD up to nine years earlier than current exams; it is minimally invasive, with minimal risk, pain and side effects. The method is based on previous reports which relate the concentrations of biomarkers in the Cerebrospinal Fluid (CSF) (Aβ and Tau proteins) to the future development of AD in mild cognitive impairment patients. Our method, which uses fluorescence measurements of the relative concentrations of the CSF biomarkers, replaces the lumbar puncture process required for CSF drawing. The process uses a miniature needle coupled through an optical fiber to a laser source and a detector. The laser radiation excites fluorescent probes which were previously injected and bound to the CSF biomarkers. Using the ratio between the fluorescence intensities emitted from the two biomarkers, which is correlated with their concentration ratio, the patient's risk of developing AD is estimated. A theoretical model was developed and validated using Monte Carlo simulations, demonstrating the relation between fluorescence emission and biomarker concentration. The method was tested using multi-layered tissue phantoms simulating the epidural fat, the CSF in the sub-arachnoid space and the bone. These phantoms were prepared with different scattering and absorption coefficients, thicknesses and fluorescence concentrations in order to simulate variations in human anatomy and in the needle location. The theoretical and in-vitro results are compared and the method's accuracy is discussed.
Eigenvalue ratio detection based on exact moments of smallest and largest eigenvalues
Shakir, Muhammad
2011-01-01
Detection based on the eigenvalues of the received signal covariance matrix is currently one of the most effective solutions for the spectrum sensing problem in cognitive radios. However, the results of these schemes always depend on asymptotic assumptions, since the closed-form expression of the exact eigenvalue-ratio distribution is exceptionally complex to compute in practice. In this paper, a non-asymptotic spectrum sensing approach to approximate the extreme eigenvalues is introduced. In this context, a Gaussian approximation approach based on the exact analytical moments of the extreme eigenvalues is presented, in which the extreme eigenvalues are treated as dependent Gaussian random variables whose joint probability density function (PDF) is approximated by a bivariate Gaussian distribution for any number of cooperating secondary users and received samples. The notion of a Copula is used to analyze the extent of the dependency between the extreme eigenvalues. The decision threshold based on the ratio of the dependent Gaussian extreme eigenvalues is then derived. The performance of the newly proposed approach is compared with the previously published asymptotic Tracy-Widom approximation approach. © 2011 ICST.
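The decision statistic itself is simple; a dependency-free sketch for two cooperating receivers uses the closed-form eigenvalues of a 2x2 sample covariance matrix (threshold setting, the paper's actual contribution, is omitted here):

```python
import math

def eigenvalue_ratio_2x2(samples_a, samples_b):
    """Largest/smallest eigenvalue of the 2x2 sample covariance of two
    receiver streams. A correlated signal across the receivers drives the
    ratio well above 1; pure independent noise keeps it near 1. The 2x2
    eigenvalues follow from the trace/determinant quadratic."""
    n = len(samples_a)
    ma, mb = sum(samples_a) / n, sum(samples_b) / n
    saa = sum((a - ma) ** 2 for a in samples_a) / n
    sbb = sum((b - mb) ** 2 for b in samples_b) / n
    sab = sum((a - ma) * (b - mb) for a, b in zip(samples_a, samples_b)) / n
    tr, det = saa + sbb, saa * sbb - sab * sab
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    lmax, lmin = tr / 2 + disc, tr / 2 - disc
    return lmax / max(lmin, 1e-12)
```

The detector then compares this ratio to a threshold; the paper's contribution is computing that threshold from the exact moments of the extreme eigenvalues rather than from asymptotic (Tracy-Widom) results.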
Shi, Yonggang; Lai, Rongjie; Toga, Arthur W
2011-01-01
In this paper we propose a novel system for the accurate reconstruction of cortical surfaces from magnetic resonance images. At the core of our system is a novel framework for outlier detection and pruning by integrating intrinsic Reeb analysis of Laplace-Beltrami eigen-functions with topology-preserving evolution for localized filtering of outliers, which avoids unnecessary smoothing and shrinkage of cortical regions with high curvature. In our experiments, we compare our method with FreeSurfer and illustrate that our results can better capture cortical geometry in deep sulcal regions. To demonstrate the robustness of our method, we apply it to over 1300 scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We show that cross-sectional group differences and longitudinal changes can be detected successfully with our method.
Detection of water molecules in inert gas based plasma by the ratios of atomic spectral lines
Bernatskiy, A. V.; Ochkin, V. N.
2017-01-01
A new approach is considered to detect water leaks into the inert plasma-forming gas present in the reactor chamber. It is based on the intensity ratio of the Dα and Hα spectral lines in combination with the intensities of O, Ar and Xe lines. The concentrations of H2O, O, H and D particles have been measured with high sensitivity. At a D2 admixture pressure p(D2) = 0.025 mbar, we used an acquisition time of 10 s to measure the rate of water molecules injected from outside, Γ0 = 1.4 × 10^{-9} mbar·m^3·s^{-1}, and of water molecules entering the plasma, Γ = 5 × 10^{-11} mbar·m^3·s^{-1}. Scaling shows that at small D2 admixtures (10^{-4} mbar), leaks with rates Γ0 ≈ 6 × 10^{-12} mbar·m^3·s^{-1} and Γ ≈ 2 × 10^{-13} mbar·m^3·s^{-1} can be detected and measured. The difference between the Γ0 and Γ values is due to the high degree of H2O dissociation, which can be up to 97-98%.
Pradhan, Anagha; Kielmann, Karina; Gupte, Himanshu; Bamne, Arun; Porter, John D H; Rangan, Sheela
2010-05-20
India's Revised National Tuberculosis Control Programme (RNTCP) is deemed highly successful in terms of detection and cure rates. However, some patients experience delays in accessing diagnosis and treatment. Patients falling between the 96th and 100th percentiles for these access indicators are often ignored as atypical 'outliers' when assessing programme performance. They may, however, provide clues to understanding why some patients never reach the programme. This paper examines the underlying vulnerabilities of patients with extreme values for delays in accessing the RNTCP in Mumbai city, India. We conducted a cross-sectional study with 266 new sputum positive patients registered with the RNTCP in Mumbai. Patients were classified as 'outliers' if patient, provider and system delays were beyond the 95th percentile for the respective variable. Case profiles of 'outliers' for patient, provider and system delays were examined and compared with the rest of the sample to identify key factors responsible for delays. Forty-two patients were 'outliers' on one or more of the delay variables. All 'outliers' had a significantly lower per capita income than the remaining sample. The lack of economic resources was compounded by social, structural and environmental vulnerabilities. Longer patient delays were related to patients' perception of symptoms as non-serious. Provider delays were incurred as a result of private providers' failure to respond to tuberculosis in a timely manner. Diagnostic and treatment delays were minimal, however, analysis of the 'outliers' revealed the importance of social support in enabling access to the programme. A proxy for those who fail to reach the programme, these case profiles highlight unique vulnerabilities that need innovative approaches by the RNTCP. The focus on 'outliers' provides a less resource- and time-intensive alternative to community-based studies for understanding the barriers to reaching public health programmes.
Discovering Outliers of Potential Drug Toxicities Using a Large-scale Data-driven Approach.
Luo, Jake; Cisler, Ron A
2016-01-01
We systematically compared the adverse effects of cancer drugs to detect event outliers across different clinical trials using a data-driven approach. Because many cancer drugs are toxic to patients, better understanding of adverse events of cancer drugs is critical for developing therapies that could minimize the toxic effects. However, due to the large variabilities of adverse events across different cancer drugs, methods to efficiently compare adverse effects across different cancer drugs are lacking. To address this challenge, we present an exploration study that integrates multiple adverse event reports from clinical trials in order to systematically compare adverse events across different cancer drugs. To demonstrate our methods, we first collected data on 186,339 clinical trials from ClinicalTrials.gov and selected 30 common cancer drugs. We identified 1602 cancer trials that studied the selected cancer drugs. Our methods effectively extracted 12,922 distinct adverse events from the clinical trial reports. Using the extracted data, we ranked all 12,922 adverse events based on their prevalence in the clinical trials, such as nausea 82%, fatigue 77%, and vomiting 75.97%. To detect the significant drug outliers that could have a statistically high possibility of causing an event, we used the boxplot method to visualize adverse event outliers across different drugs and applied Grubbs' test to evaluate the significance. Analyses showed that by systematically integrating cross-trial data from multiple clinical trial reports, adverse event outliers associated with cancer drugs can be detected. The method was demonstrated by detecting the following four statistically significant adverse event cases: the association of the drug axitinib with hypertension (Grubbs' test, P < 0.001), the association of the drug imatinib with muscle spasm (P < 0.001), the association of the drug vorinostat with deep vein thrombosis (P < 0.001), and the association of the drug afatinib
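A self-contained sketch of the Grubbs' test step described above, applied to hypothetical per-drug event rates. The critical value is the tabulated G for n = 10 at alpha = 0.05 (about 2.290); a real analysis should look it up (or derive it from a t-quantile) for the actual sample size and significance level:

```python
# Grubbs' test for a single extreme value: G = |x_suspect - mean| / sd,
# compared against a tabulated critical value for the given n and alpha.
from statistics import mean, stdev

def grubbs_max_outlier(rates, g_crit=2.290):
    """Return the most extreme value, its G statistic, and the test outcome."""
    m, s = mean(rates), stdev(rates)
    suspect = max(rates, key=lambda x: abs(x - m))
    G = abs(suspect - m) / s
    return suspect, G, G > g_crit

# hypothetical hypertension rates (%) for one event across ten drugs
rates = [4.1, 3.8, 5.0, 4.4, 3.9, 4.6, 4.2, 4.0, 4.3, 21.5]
suspect, G, is_outlier = grubbs_max_outlier(rates)
```

Here the 21.5% rate is flagged as a statistically significant outlier, mirroring how a drug with an unusually high event rate would stand out in a cross-trial boxplot.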
胡奎
2013-01-01
Addressing the characteristics of radar measurement data, this paper proposes a radar data outlier elimination method based on a soft-weighted k-mean-distance outlier factor. The measurement sequence is first soft-weighted, then mapped into a feature space where the k-mean-distance outlier factor is computed to detect outliers. The method was tested on a segment of measurement data from one task of a certain radar type. The experimental results show that the method can effectively recognize outlier data while preserving the integrity of the trajectory measurement data to the greatest extent.
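A hedged sketch of the k-mean-distance outlier factor idea (the soft weighting and feature-space mapping of the original method are omitted, and the track values are hypothetical): score each measurement by its average distance to its k nearest neighbours and flag scores far above the typical score.

```python
# k-mean-distance outlier factor: the average distance to the k nearest
# neighbours is small for points embedded in the track and large for
# wild values, so a mean + 2*sd cut on the scores isolates outliers.
from statistics import mean, stdev

def k_mean_distance_scores(xs, k=3):
    scores = []
    for i, x in enumerate(xs):
        dists = sorted(abs(x - y) for j, y in enumerate(xs) if j != i)
        scores.append(mean(dists[:k]))
    return scores

track = [10.1, 10.3, 10.2, 10.4, 10.5, 42.0, 10.6, 10.7]  # one wild point
scores = k_mean_distance_scores(track)
m, s = mean(scores), stdev(scores)
outliers = [x for x, sc in zip(track, scores) if sc > m + 2 * s]
```

Only the wild value 42.0 is removed; the rest of the track is kept intact, which is the "maximum integrity" property the abstract emphasizes.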
Mareck, Ute; Geyer, Hans; Flenker, Ulrich; Piper, Thomas; Thevis, Mario; Schänzer, Wilhelm
2007-01-01
According to World Anti-Doping Agency (WADA) rules (WADA Technical Document TD2004EAAS), urine samples containing dehydroepiandrosterone (DHEA) concentrations greater than 100 ng mL(-1) shall be submitted to isotope ratio mass spectrometry (IRMS) analysis. The threshold concentration is based on the equivalent to the glucuronide, and the DHEA concentrations have to be adjusted for a specific gravity value of 1.020. In 2006, 11,012 doping control urine samples from national and international federations were analyzed in the Cologne doping control laboratory, 100 (0.9%) of them yielding concentrations of DHEA greater than 100 ng mL(-1). Sixty-eight percent of the specimens showed specific gravity values higher than 1.020, 52% originated from soccer players, 95% were taken in competition, 85% were male urines, and 99% of the IRMS results did not indicate an application of testosterone or related prohormones. Only one urine sample was reported as an adverse analytical finding, having 319 ng mL(-1) DHEA (screening result), more than 10,000 ng mL(-1) androsterone, and depleted carbon isotope ratio values for the testosterone metabolites androsterone and etiocholanolone. Statistical evaluation showed significantly different DHEA concentrations between specimens taken in- and out-of-competition, whereas females showed smaller DHEA values than males for both types of control. A strong influence of sport discipline on DHEA excretion was also detectable. The highest DHEA values were detected for game sports (soccer, basketball, handball, ice hockey), followed by boxing and wrestling. In 2007, 6622 doping control urine samples were analyzed for 3alpha,5-cyclo-5alpha-androstan-6beta-ol-17-one (3alpha,5-cyclo), a DHEA metabolite which was described as a useful gas chromatography-mass spectrometry (GC-MS) screening marker for DHEA abuse. Nineteen urine specimens showed concentrations higher than the suggested threshold of 140 ng mL(-1), six urine samples yielded
肖健华
2003-01-01
The paper introduces a kernel-based approach to outlier detection and points out that the approach becomes hard to realize as the number of samples increases. To reduce the size of the optimization problem, a distance-based sample selection method is proposed. By selecting samples, the computational workload and the memory demand can be decreased greatly, so that real-time requirements can ultimately be met.
Statistical data preparation: management of missing values and outliers.
Kwak, Sang Kyu; Kim, Jong Hae
2017-08-01
Missing values and outliers are frequently encountered while collecting data. The presence of missing values reduces the data available to be analyzed, compromising the statistical power of the study, and eventually the reliability of its results. In addition, it causes a significant bias in the results and degrades the efficiency of the data. Outliers significantly affect the process of estimating statistics (e.g., the average and standard deviation of a sample), resulting in overestimated or underestimated values. Therefore, the results of data analysis are considerably dependent on the ways in which the missing values and outliers are processed. In this regard, this review discusses the types of missing values, ways of identifying outliers, and dealing with the two.
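As a minimal illustration of the two preparation steps this review discusses, the sketch below drops missing values (the simplest policy; imputation is the common alternative) and flags outliers with the standard boxplot/IQR rule. Quartile conventions differ between packages; `statistics.quantiles` with its default exclusive method is assumed here, and the sample is hypothetical:

```python
# Flag values outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR after removing
# missing entries (represented as None).
from statistics import quantiles

def iqr_outliers(values):
    data = [v for v in values if v is not None]   # listwise deletion
    q1, _, q3 = quantiles(data, n=4)              # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < lo or v > hi]

sample = [12, 14, None, 13, 15, 14, 13, 12, 99, 14]
flagged = iqr_outliers(sample)
```

The value 99 is flagged; whether to remove, winsorize, or investigate such a value is the analyst's decision, as the review notes.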
Merging galaxies produce outliers from the Fundamental Metallicity Relation
Grønnow, Asger; Christensen, Lise
2015-01-01
From a large sample of $\approx 170,000$ local SDSS galaxies we find that the Fundamental Metallicity Relation (FMR) has an overabundance of outliers, compared to what would be expected from a Gaussian distribution of residuals, with significantly lower metallicities than predicted from their stellar mass and star formation rate (SFR). This low-metallicity population has lower stellar masses, bimodal specific SFRs with enhanced star formation within the aperture, and smaller half-light radii than the general sample, and is hence a physically distinct population. We show that they are consistent with being galaxies that are merging or have recently merged with a satellite galaxy. In this scenario, low-metallicity gas flows in from large radii, diluting the metallicity of star-forming regions and enhancing the specific SFR until the inflowing gas is processed and the metallicity has recovered. We introduce a simple model in which mergers with a mass ratio larger than a minimum dilute the central galaxy's metall...
Factors influencing hospital high length of stay outliers.
Freitas, Alberto; Silva-Costa, Tiago; Lopes, Fernando; Garcia-Lema, Isabel; Teixeira-Pinto, Armando; Brazdil, Pavel; Costa-Pereira, Altamiro
2012-08-20
The study of length of stay (LOS) outliers is important for the management and financing of hospitals. Our aim was to study variables associated with high LOS outliers and their evolution over time. We used hospital administrative data from inpatient episodes in public acute care hospitals in the Portuguese National Health Service (NHS), with discharges between 2000 and 2009, together with some hospital characteristics. The dependent variable, LOS outliers, was calculated for each diagnosis related group (DRG) using a trim point defined for each year by the geometric mean plus two standard deviations. Hospitals were classified on the basis of administrative, economic and teaching characteristics. We also studied the influence of comorbidities and readmissions. Logistic regression models, including a multivariable logistic regression, were used in the analysis. All the logistic regressions were fitted using generalized estimating equations (GEE). In the nearly nine million inpatient episodes analysed we found a proportion of 3.9% high LOS outliers, accounting for 19.2% of total inpatient days. The number of hospital patient discharges increased between 2000 and 2005 and slightly decreased after that. The proportion of outliers ranged between the lowest value of 3.6% (in 2001 and 2002) and the highest value of 4.3% in 2009. Teaching hospitals with over 1,000 beds have significantly more outliers than other hospitals, even after adjustment for readmissions and several patient characteristics. In recent years both average LOS and high LOS outliers have been increasing in Portuguese NHS hospitals. As high LOS outliers represent an important proportion of total inpatient days, this should be seen as an important alert for the management of hospitals and for national health policies. As expected, age, type of admission, and hospital type were significantly associated with high LOS outliers. The proportion of high outliers does not seem to be related to their
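A sketch of the per-DRG trim point described above: geometric mean of LOS plus two standard deviations, with stays above the trim point counted as high outliers. Whether the standard deviation is taken on the raw or the log scale varies between payment systems; the raw-scale variant and the LOS values are assumptions here:

```python
# Trim point = geometric mean of LOS + 2 * SD; episodes above it are
# counted as high LOS outliers for the DRG.
from math import exp, log
from statistics import mean, stdev

def high_los_outliers(los_days):
    gmean = exp(mean(log(d) for d in los_days))   # geometric mean
    trim = gmean + 2 * stdev(los_days)
    return trim, [d for d in los_days if d > trim]

los = [2, 3, 3, 4, 4, 5, 5, 6, 7, 40]   # hypothetical stays for one DRG
trim, outliers = high_los_outliers(los)
```

The 40-day stay exceeds the trim point and would be flagged, while the long tail it creates barely moves the geometric mean, which is why the geometric rather than arithmetic mean anchors the rule.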
Identification and influence of spatial outliers in air quality measurements
O'Leary, B. F.; Lemke, L. D.
2015-12-01
The heterogeneous nature of urban air complicates the analysis of spatial and temporal variability in air quality measurements. Evaluation of potentially inaccurate measurements (i.e., outliers) poses particularly difficult challenges in extensive air quality datasets with multiple measurements distributed in time and space. This study investigated the identification and impact of outliers in measurements of NO2, BTEX, PM2.5, and PM10 in the contiguous Detroit, Michigan, USA and Windsor, Ontario, Canada international airshed. Measurements were taken at 100 locations during September 2008 and June 2009 and modeled at a 300m by 300m scale resolution. The objective was to determine if outliers were present and, if so, to quantify the magnitude of their impact on modeled spatial pollution distributions. The study built upon previous investigations by the Geospatial Determinants of Health Outcomes Consortium that examined relationships between air pollutant distributions and asthma exacerbations in the Detroit and Windsor airshed. Four independent approaches were initially employed to identify potential outliers: boxplots, variogram clouds, difference maps, and the Local Moran's I statistic. Potential outliers were subsequently reevaluated for consistency among methods and individually assessed to select a final set of outliers. The impact of excluding individual outliers was subsequently determined by revising the spatially variable air pollution models and recalculating associations between air contaminant concentrations and asthma exacerbations in Detroit and Windsor in 2008. For the pollutants examined, revised associations revealed weaker correlations with spatial outliers removed. Nevertheless, the approach employed improves the model integrity by increasing our understanding of the spatial variability of air pollution in the built environment and providing additional insights into the association between acute asthma exacerbations and air pollution.
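A minimal sketch of the Local Moran's I statistic used above for spatial outlier screening: I_i = z_i · Σ_j w_ij z_j on standardized values with row-standardized binary neighbour weights. A strongly negative I_i marks a site unlike its neighbours (a candidate spatial outlier). Permutation-based significance testing, and the monitoring values themselves, are assumptions omitted or invented here:

```python
# Local Moran's I per site: product of the site's z-score and the mean
# z-score of its neighbours (row-standardized weights).
from statistics import mean, pstdev

def local_morans_i(values, neighbours):
    m, s = mean(values), pstdev(values)
    z = [(v - m) / s for v in values]
    scores = []
    for i, nbrs in enumerate(neighbours):
        lag = sum(z[j] for j in nbrs) / len(nbrs)
        scores.append(z[i] * lag)
    return scores

# five monitoring sites on a line; site 2 is high among low neighbours
no2 = [11.0, 12.0, 30.0, 11.5, 12.5]
nbrs = [[1], [0, 2], [1, 3], [2, 4], [3]]
scores = local_morans_i(no2, nbrs)
```

Site 2 gets the most negative score, flagging it for the kind of individual reassessment the study describes before exclusion.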
Saudan, Christophe; Augsburger, Marc; Mangin, Patrice; Saugy, Martial
2007-01-01
Since GHB (gamma-hydroxybutyric acid) is naturally produced in the human body, clinical and forensic toxicologists must be able to discriminate between endogenous levels and a concentration resulting from exposure. To suggest an alternative to the use of interpretative concentration cut-offs, the detection of exogenous GHB in urine specimens was investigated by means of gas chromatography/combustion/isotope ratio mass spectrometry (GC/C/IRMS). GHB was isolated from urinary matrix by successive purification on Oasis MCX and Bond Elute SAX solid-phase extraction (SPE) cartridges prior to high-performance liquid chromatography (HPLC) fractioning using an Atlantis dC18 column eluted with a mixture of formic acid and methanol. Subsequent intramolecular esterification of GHB leading to the formation of gamma-butyrolactone (GBL) was carried out to avoid introduction of additional carbon atoms for carbon isotopic ratio analysis. A precision of 0.3 per thousand was determined using this IRMS method for samples at GHB concentrations of 10 mg/L. The (13)C/(12)C ratios of GHB in samples of subjects exposed to the drug ranged from -32.1 to -42.1 per thousand, whereas the results obtained for samples containing GHB of endogenous origin at concentration levels less than 10 mg/L were in the range -23.5 to -27.0 per thousand. Therefore, these preliminary results show that a possible discrimination between endogenous and exogenous GHB can be made using carbon isotopic ratio analyses.
Robust estimation of unbalanced mixture models on samples with outliers.
Galimzianova, Alfiia; Pernuš, Franjo; Likar, Boštjan; Špiclin, Žiga
2015-11-01
Mixture models are often used to compactly represent samples from heterogeneous sources. However, in the real world, samples generally contain an unknown fraction of outliers, and the sources generate different or unbalanced numbers of observations. Such unbalanced and contaminated samples may, for instance, be obtained by high density data sensors such as imaging devices. Estimation of unbalanced mixture models from samples with outliers requires robust estimation methods. In this paper, we propose a novel robust mixture estimator incorporating trimming of the outliers based on component-wise confidence level ordering of observations. The proposed method is validated and compared to the state-of-the-art FAST-TLE method on two data sets: one consisting of synthetic samples with a varying fraction of outliers and a varying balance between mixture weights, while the other contained structural magnetic resonance images of the brain with tumors of varying volumes. The results on both data sets clearly indicate that the proposed method is capable of robustly estimating unbalanced mixtures over a broad range of outlier fractions. As such, it is applicable to real-world samples, in which the outlier fraction cannot be estimated in advance.
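A rough single-component sketch of the trimming idea in the spirit of FAST-TLE (not the paper's component-wise confidence-level method): alternately fit a Gaussian and refit on the h best-fitting observations, so a fixed fraction of presumed outliers never enters the estimate. The data and trim fraction are illustrative assumptions:

```python
# Trimmed Gaussian fit: keep the h observations closest to the current
# mean, re-estimate, and repeat; contaminating values are excluded
# from every estimate after the first pass.
from statistics import mean, stdev

def trimmed_gaussian_fit(xs, trim_frac=0.2, iters=5):
    h = int(len(xs) * (1 - trim_frac))       # observations kept
    kept = sorted(xs)[:h]                    # crude initial subset
    for _ in range(iters):
        m = mean(kept)
        kept = sorted(xs, key=lambda x: abs(x - m))[:h]
    return mean(kept), stdev(kept)

data = [5.0, 5.2, 4.9, 5.1, 5.3, 4.8, 5.0, 5.2, 40.0, 41.0]
m, s = trimmed_gaussian_fit(data)
```

Despite 20% gross contamination, the estimate stays near the true component; an untrimmed fit of the same data would be pulled far upward by the two contaminating values.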
Outlier Ranking via Subspace Analysis in Multiple Views of the Data
Muller, Emmanuel; Assent, Ira; Iglesias, Patricia
2012-01-01
, a novel outlier ranking concept. Outrank exploits subspace analysis to determine the degree of outlierness. It considers different subsets of the attributes as individual outlier properties. It compares clustered regions in arbitrary subspaces and derives an outlierness score for each object. Its...
42 CFR 412.86 - Payment for extraordinarily high-cost day outliers.
2010-10-01
... 42 Public Health 2 2010-10-01 2010-10-01 false Payment for extraordinarily high-cost day outliers... Outlier Cases, Special Treatment Payment for New Technology, and Payment Adjustment for Certain Replaced Devices Payment for Outlier Cases § 412.86 Payment for extraordinarily high-cost day outliers. For...
Rodrigues, João Fabrício Mota; Coelho, Marco Túlio Pacheco
2016-01-01
Sampling biodiversity is an essential step for conservation, and understanding the efficiency of sampling methods allows us to estimate the quality of our biodiversity data. Sex ratio is an important population characteristic, but until now, no study has evaluated how efficiently the sampling methods commonly used in biodiversity surveys estimate the sex ratio of populations. We used a virtual ecologist approach to investigate whether active and passive capture methods are able to accurately sample a population's sex ratio, and whether differences in movement pattern and detectability between males and females produce biased estimates of sex ratios when using these methods. Our simulation allowed the recognition of individuals, similar to mark-recapture studies. We found that differences in both movement patterns and detectability between males and females produce biased estimates of sex ratios. However, increasing the sampling effort or the number of sampling days improves the ability of passive or active capture methods to properly sample the sex ratio. Thus, prior knowledge regarding movement patterns and detectability for species is important information to guide field studies aiming to understand sex ratio related patterns. PMID:27441554
Detecting animal by-product intake using stable isotope ratio mass spectrometry (IRMS).
da Silva, D A F; Biscola, N P; Dos Santos, L D; Sartori, M M P; Denadai, J C; da Silva, E T; Ducatti, C; Bicudo, S D; Barraviera, B; Ferreira, R S
2016-11-01
Sheep are used in many countries as food and for manufacturing bioproducts. However, when these animals consume animal by-products (ABP), which is widely prohibited, there is a risk of transmitting scrapie - a fatal prion disease in human beings. Therefore, it is essential to develop sensitive methods to detect previous ABP intake to select safe animals for producing biopharmaceuticals. We used stable isotope ratio mass spectrometry (IRMS) for (13)C and (15)N to trace animal proteins in the serum of three groups of sheep: 1 - received only vegetable protein (VP) for 89 days; 2 - received animal and vegetable protein (AVP); and 3 - received animal and vegetable protein with animal protein subsequently removed (AVPR). Groups 2 and 3 received diets with 30% bovine meat and bone meal (MBM) added to a vegetable diet (from days 16-89 in the AVP group and until day 49 in the AVPR group, when MBM was removed). The AVPR group showed (15)N equilibrium 5 days after MBM removal (54th day). Conversely, (15)N equilibrium in the AVP group occurred 22 days later (76th day). The half-life differed between these groups by 3.55 days. In the AVPR group, (15)N elimination required 53 days, which was similar to this isotope's incorporation time. Turnover was determined based on natural (15)N signatures. IRMS followed by turnover calculations was used to evaluate the time period for the incorporation and elimination of animal protein in sheep serum. The δ(13)C and δ(15)N values were used to track animal protein in the diet. This method is biologically and economically relevant for the veterinary field because it can track protein over time or make a point assessment of animal feed with high sensitivity and resolution, providing a low-cost analysis coupled with fast detection. Isotopic profiles could be measured throughout the experimental period, demonstrating the potential to use the method for traceability and certification assessments.
Feng, Xiao-Jing; Jiang, Guo-Fang; Fan, Zhou
2015-09-03
Identification of loci under divergent selection is a key step in understanding the evolutionary process, because those loci are responsible for the genetic variations that affect fitness in different environments. Understanding how environmental forces give rise to adaptive genetic variation is a challenge in pest control. Here, we performed an amplified fragment length polymorphism (AFLP) genome scan in populations of the bamboo locust, Ceracris kiangsu, to search for candidate loci that are influenced by selection along an environmental gradient in southern China. In outlier locus detection, loci that demonstrate significantly higher or lower among-population genetic differentiation than expected under neutrality are identified as outliers. We used several outlier detection methods to study the features of C. kiangsu, including the methods DFDIST, BayeScan, and logistic regression. A total of 97 outlier loci were detected in the C. kiangsu genome with very high statistical support. Moreover, the results suggested that divergent selection arising from environmental variation has been driven by differences in temperature, precipitation, humidity and sunshine. These findings illustrate that divergent selection and potential local adaptation are prevalent in locusts despite seemingly high levels of gene flow. Thus, we propose that native environments in each population may induce divergent natural selection.
Sawyer, Nicola; Blennerhassett, John; Lambert, Ramon; Sheehan, Paul; Vasikaran, Samuel D
2014-07-01
False-positive cardiac troponin (Tn) results caused by outliers have been reported on various analytical platforms. We have compared the precision profile and outlier rate of the Abbott Diagnostics contemporary troponin I (TnI) assay with their high sensitivity (hs) TnI assay. Three studies were conducted over a 10-month period using routine patients' samples. TnI was measured in duplicate using the contemporary TnI assay in Study 1 and Study 2 (n = 7011 and 7089) and the hs-TnI assay in Study 3 (n = 1522). Critical outliers were defined as duplicate results whose absolute difference exceeded a critical difference (CD = z × √2 × SD_analytical) at a probability level of 0.0005, with one of the results on the opposite side of the decision limit to its partner. The TnI concentration at 10% imprecision (coefficient of variation) was 0.034 µg/L (Study 1) and 0.042 µg/L (Study 2) for the contemporary TnI assay, and 0.006 µg/L (6 ng/L) for the hs-TnI assay. The critical outlier rates for the contemporary TnI assay were 0.51% (Study 1) and 0.37% (Study 2) using a cut-off of 0.04 µg/L, and 0% for the hs-TnI assay using gender-specific cut-offs. The significant number of critical outliers detected using the contemporary TnI assay may pose a risk for misclassification of patients. By contrast, no critical outliers were detected using the hs-TnI assay. However, the total outlier rates for both assays were significantly higher than the expected variability of either assay. The cause of these outliers remains unclear.
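A sketch of the duplicate-based critical-outlier rule described above: a pair is a critical outlier when its absolute difference exceeds CD = z · √2 · SD_analytical and the two results straddle the decision cut-off. The z value used here (≈3.48, corresponding to a two-sided 0.0005 level) and the example concentrations are assumptions for illustration:

```python
# Critical-outlier check for a duplicate pair of troponin results.
from math import sqrt

def is_critical_outlier(x1, x2, sd_analytical, cutoff, z=3.48):
    cd = z * sqrt(2) * sd_analytical          # critical difference
    straddles = (x1 < cutoff) != (x2 < cutoff)  # opposite sides of cut-off
    return abs(x1 - x2) > cd and straddles

# hypothetical duplicate TnI results (ug/L), SD_a = 0.005, cut-off 0.04
flag_a = is_critical_outlier(0.020, 0.090, 0.005, 0.04)  # discordant pair
flag_b = is_critical_outlier(0.030, 0.032, 0.005, 0.04)  # concordant pair
```

Only the discordant pair is flagged: its difference exceeds the critical difference and one replicate would classify the patient on each side of the decision limit.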
Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers
Hachey Mark
2009-10-01
Full Text Available Abstract Background The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in simulation studies. However, the performance of these methods in two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. Methods We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*pop, and cluster detection methods such as local Moran's I and the SaTScan elliptic version, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and the SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. Results For simulated data with outlier patterns, Tango's MEET, Moran's I and I*pop had powers less than 0.2, while SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*pop (with 50% of the total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion The SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*pop perform best in global clustering scenarios among the selected methods. The use of SaTScan for
Kong, Tian Fook; Shen, Xinhui; Marcos; Yang, Chun
2017-06-01
We present a microfluidic impedance device for achieving both the flow ratio sensing and the conductivity difference detection between sample stream and reference buffer. By using a flow focusing configuration, with the core flow having a higher conductivity sample than the sheath flow streams, the conductance of the device varies linearly with the flow ratio, with R2 > 0.999. On the other hand, by using deionized (DI)-water sheath flow as a reference, we can detect the difference in conductivity between the buffer of core flow and sheath DI-water with a high detection sensitivity of up to 1 nM of sodium chloride solution. Our study provides a promising approach for on-chip flow mixing characterization and bacteria detection.
A Multi-Objective Genetic Algorithm for Outlier Removal.
Nahum, Oren E; Yosipof, Abraham; Senderowitz, Hanoch
2015-12-28
Quantitative structure activity relationship (QSAR) or quantitative structure property relationship (QSPR) models are developed to correlate activities for sets of compounds with their structure-derived descriptors by means of mathematical models. The presence of outliers, namely, compounds that differ in some respect from the rest of the data set, compromises the ability of statistical methods to derive QSAR models with good prediction statistics. Hence, outliers should be removed from data sets prior to model derivation. Here we present a new multi-objective genetic algorithm for the identification and removal of outliers based on the k nearest neighbors (kNN) method. The algorithm was used to remove outliers from three different data sets of pharmaceutical interest (logBBB, factor 7 inhibitors, and dihydrofolate reductase inhibitors), and its performance was compared with that of five other methods for outlier removal. The results suggest that the new algorithm provides filtered data sets that (1) better maintain the internal diversity of the parent data sets and (2) give rise to QSAR models with much better prediction statistics. Equally good filtered data sets in terms of these metrics were obtained when another objective function was added to the algorithm (termed "preservation"), forcing it to remove certain compounds with low probability only. This option is highly useful when specific compounds should preferably be kept in the final data set, either because they have favorable activities or because they represent interesting molecular scaffolds. We expect this new algorithm to be useful in future QSAR applications.
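A hedged sketch of the kNN idea underlying the algorithm above (the multi-objective genetic search itself is omitted): a compound is suspect when its activity differs strongly from the mean activity of its k nearest neighbours in descriptor space. The descriptors and activities below are invented for illustration:

```python
# kNN activity residuals: for each compound, predict its activity from
# its k nearest neighbours in descriptor space; a large residual marks
# a potential activity outlier.
from statistics import mean

def knn_activity_residuals(descriptors, activities, k=2):
    residuals = []
    for i, d in enumerate(descriptors):
        order = sorted((j for j in range(len(descriptors)) if j != i),
                       key=lambda j: sum((a - b) ** 2
                                         for a, b in zip(d, descriptors[j])))
        pred = mean(activities[j] for j in order[:k])
        residuals.append(abs(activities[i] - pred))
    return residuals

desc = [(0.1, 1.0), (0.2, 1.1), (0.15, 0.9), (0.9, 0.1), (1.0, 0.2)]
act = [5.0, 5.1, 9.9, 2.0, 2.1]          # compound 2 breaks the local trend
res = knn_activity_residuals(desc, act)
worst = max(range(len(res)), key=lambda i: res[i])
```

Compound 2 has the largest residual: it sits among structurally similar compounds with very different activity, which is exactly the kind of point whose removal improves prediction statistics.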
Is procalcitonin to C-reactive protein ratio useful for the detection of late onset neonatal sepsis?
Hahn, Won-Ho; Song, Joon-Hwan; Kim, Ho; Park, Suyeon
2017-02-21
Procalcitonin (PCT) has been reported as a sensitive marker for neonatal bacterial infections. Recently, a small number of studies reported the usefulness of the PCT/C-reactive protein (CRP) ratio for detecting infectious conditions in adults. Thus, we conducted this study to evaluate the PCT/CRP ratio in late onset neonatal sepsis. Serum PCT and CRP were measured in blood samples taken 7-60 days after birth in 106 neonates with late onset sepsis and 212 controls matched for gestational age, postnatal age, birth weight and gender. Areas under the ROC curve (AUC) were calculated and pairwise comparisons between ROC curves were performed. As a result, CRP (AUC 0.96) showed the best performance in detection of sepsis from healthy controls compared with PCT (AUC 0.87) and PCT/CRP ratio (AUC 0.62); CRP > PCT > PCT/CRP ratio in pairwise comparison (Psepsis from healthy controls compared with PCT/CRP ratio (AUC 0.54); CRP = PCT > PCT/CRP ratio in pairwise comparison (Pdetection of blood culture proven sepsis from suspected sepsis, PCT (AUC 0.70) and PCT/CRP ratio (AUC 0.73) showed better performance compared with CRP (AUC 0.51); PCT = PCT/CRP ratio > CRP in pairwise comparison (Psepsis and healthy controls. However, the PCT/CRP ratio seems to be helpful in distinguishing proven sepsis from suspected sepsis together with PCT. Further studies are warranted to elucidate the efficacy of the PCT/CRP ratio with the enrollment of a sufficient number of infants.
The cause of outliers in electromagnetic pulse (EMP) locations
Fenimore, Edward E. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)]
2014-10-02
We present methods to calculate the location of EMP pulses when observed by 5 or more satellites. Simulations show that, even with a good initial guess and fitting a location to all of the data, there are sometimes outlier results whose locations are much worse than in most cases. By comparing simulations using different ionospheric transfer functions (ITFs), it appears that the outliers are caused by not including the additional path length due to refraction, rather than by not including higher-order terms in the Appleton-Hartree equation. We suggest ways that the outliers can be corrected. These correction methods require one to use an electron density profile along the line of sight from the event to the satellite rather than using the total electron content (TEC) to characterize the ionosphere.
Outlier detection algorithms for least squares time series regression
Johansen, Søren; Nielsen, Bent
We review recent asymptotic results on some robust methods for multiple regression. The regressors include stationary and non-stationary time series as well as polynomial terms. The methods include the Huber-skip M-estimator, 1-step Huber-skip M-estimators, in particular the Impulse Indicator...... theory involves normal distribution results and Poisson distribution results. The theory is applied to a time series data set....
Stratification-Based Outlier Detection over the Deep Web
Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming
2016-01-01
.... Introduction As a result of the rapid development of e-commerce, the deep web has been increasingly valued by data mining researchers in recent years. The deep web, which is termed to make a contr...
Detection of Outliers in TWSTFT Data Used in TAI
2009-11-01
INTRODUCTION: Each month, the BIPM Time, Frequency, and Gravimetry Section produces International Atomic Time (TAI) and Coordinated Universal Time... Outlier removal in TWSTFT links has been implemented for the calculation of time links in the BIPM Time, Frequency, and Gravimetry Section in order to improve...
Privacy Preserving Outlier Detection through Random Nonlinear Data Distortion
National Aeronautics and Space Administration — Consider a scenario in which the data owner has some private/sensitive data and wants a data miner to access it for studying important patterns without revealing the...
Irradiated Xenon Isotopic Ratio Measurement for Failed Fuel Detection and Location in Fast Reactor
Ito, Chikara; Iguchi, Tetsuo; Harano, Hideki
2009-08-01
The accuracy of xenon isotopic ratio burn-up calculations used for failed fuel identification was evaluated by an irradiation test of xenon tag gas samples in the Joyo test reactor. The experiment was carried out using pressurized steel capsules containing unique blend ratios of stable xenon tag gases in an on-line creep rupture experiment in Joyo. The tag gas samples were irradiated to total neutron fluences of 1.6 to 4.8 × 10^26 n/m^2. Laser resonance ionization mass spectrometry was used to analyze the cover gas containing released tag gas diluted to isotopic ratios of 10^0 to 10^2 ppb. The isotopic ratios of xenon tag gases after irradiation were calculated using the ORIGEN2 code. The neutron cross sections of xenon nuclides were based on the JENDL-3.3 library. These cross sections were collapsed into one group using the neutron spectra of Joyo. The comparison of measured and calculated xenon isotopic ratios provided C/E values that ranged from 0.92 to 1.10. The differences between calculation and measurement were considered to be mainly due to measurement errors and xenon nuclide cross-section uncertainties.
Kim, Man Soo; Son, Jong Min; Koh, In Jun; Bahk, Ji Hoon; In, Yong
2017-08-01
A considerable percentage of outliers with under- or over-correction continues to be reported despite precise preoperative planning and cautious intraoperative correction of lower limb alignment in medial opening-wedge high tibial osteotomy (MOWHTO). The purpose of this study was to determine whether our novel technique for the intraoperative adjustment of alignment under valgus stress reduces the number of outliers in patients undergoing MOWHTO compared to the conventional technique, which corrects alignment according to the cable method only. One hundred seventeen consecutive knees were enrolled in this case-control study. The first 52 knees (51 patients) were corrected in accordance with preoperative plans using the Dugdale method with modification with an intraoperative cable (group 1). In the other 65 knees (60 patients), the angle was corrected using the Dugdale method and limb alignment was adjusted using the intraoperative cable technique by applying valgus stress to the knee joint (group 2). The postoperative weight bearing line ratios and mechanical axis of the lower limb were compared at one year postoperatively. Each knee was evaluated according to the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) score preoperatively and at one year postoperatively. A significant reduction in the number of outliers was seen in group 2 compared to group 1 (group 1 = 48.1%, group 2 = 9.2%). The intraoperative adjustment of alignment under valgus stress thus reduced the number of outliers compared to a technique that corrected alignment using the cable method in patients undergoing MOWHTO. Level III, retrospective comparative study.
Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T
2015-01-01
Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses the indicator 2/n to identify a linear fit, where n is the number of terms in a series. The ratios R_max = (a_max - a_min)/(S_n - a_min*n) and R_min = (a_max - a_min)/(a_max*n - S_n) are always equal to 2/n, where a_max is the maximum element, a_min is the minimum element and S_n is the sum of all elements. If a series expected to follow y = c contains data that do not agree with the form y = c, then R_max > 2/n and R_min > 2/n imply that the maximum and minimum elements, respectively, do not agree with the linear fit. We define threshold values for outlier and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 - 1. Given this relation and a transformation technique that transforms data into the form y = c, we show that removing all data that do not agree with the linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and the nature of the distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over existing linear fit methods. Since a perfect linear relation between two variables is impossible in the real world, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit even when the percentage of data agreeing with the linear fit is less than 50% and the deviation of the data that do not agree with the linear fit is very small, of the order of ±10^-4%. The method produces incorrect detections only when numerical accuracy is insufficient in the calculation process.
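Under the stated definitions, the two ratios are easy to verify numerically. The sketch below follows the abstract's variable names; the example series and the planted outlier are illustrative.

```python
def linear_fit_ratios(y):
    """Rmax = (amax - amin) / (Sn - amin*n) and
    Rmin = (amax - amin) / (amax*n - Sn).

    For an exact arithmetic (linear) series both ratios equal 2/n;
    an outlying maximum or minimum pushes the corresponding ratio
    above 2/n.  (For a constant series both numerator and
    denominator vanish, so the ratios are undefined.)
    """
    n = len(y)
    amax, amin, sn = max(y), min(y), sum(y)
    return (amax - amin) / (sn - amin * n), (amax - amin) / (amax * n - sn)

y = [2.0 + 3.0 * i for i in range(10)]     # exact linear series, n = 10
base = linear_fit_ratios(y)                # both ratios equal 2/10
y[4] = 100.0                               # plant one large outlier
bumped = linear_fit_ratios(y)              # Rmax now exceeds 2/n
```

Comparing each ratio against the thresholds 2/n * (1 + k1) and 2/n * (1 + k2) then flags the maximum or minimum element as an outlier or noise, as described above.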
Antolovich, M; Li, X; Robards, K
2001-05-01
Stable carbon isotope ratio analysis (SCIRA) was used to determine the authenticity of commercial Australian orange juices. Thirty-five samples of Valencia (delta(13)C values from -23.8 to -24.7 ppt) and eight samples of Navel juices (delta(13)C values from -24.1 to -24.5 ppt) of known origin were used to establish a decision level before analysis. No significant seasonal variations in (13)C/(12)C ratio were observed. Variations in combustion temperature in the method were also found to be insignificant.
Chan, Cheng Leng; Rudrappa, Sowmya; Ang, Pei San; Li, Shu Chuen; Evans, Stephen J W
2017-08-01
The ability to detect safety concerns from spontaneous adverse drug reaction reports in a timely and efficient manner remains important in public health. This paper explores the behaviour of the Sequential Probability Ratio Test (SPRT) and its ability to detect signals of disproportionate reporting (SDRs) in the Singapore context. We used SPRT with a combination of two hypothesised relative risks (hRRs) of 2 and 4.1 to detect signals of both common and rare adverse events in our small database. We compared SPRT with other methods in terms of the number of signals detected and whether labelled adverse drug reactions were detected or the reaction terms were considered serious. The other methods used were the reporting odds ratio (ROR), Bayesian Confidence Propagation Neural Network (BCPNN) and Gamma Poisson Shrinker (GPS). The SPRT produced 2187 signals in common with all methods, 268 unique signals, and 70 signals in common with at least one other method; it did not produce signals in 178 cases where two other methods detected them, and 403 signals were unique to one of the other methods. In terms of sensitivity, ROR performed better than the other methods, but the SPRT method found more new signals. The performance of the methods was similar for negative predictive value and specificity. Using a combination of hRRs for SPRT could be a useful screening tool for regulatory agencies, and a more detailed investigation of the medical utility of the system is merited.
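For a Poisson model of report counts, an SPRT of this kind reduces to a log-likelihood-ratio threshold test. The sketch below is a generic illustration under assumed Wald boundaries and an illustrative hypothesised relative risk of 2; it is not the authors' exact implementation, and the alpha/beta values are assumptions.

```python
import math

def sprt_llr(observed, expected, h_rr):
    """Poisson log-likelihood ratio for H1: RR = h_rr vs H0: RR = 1."""
    return observed * math.log(h_rr) - expected * (h_rr - 1.0)

def sprt_signal(observed, expected, h_rr=2.0, alpha=0.05, beta=0.20):
    """Flag an SDR when the LLR crosses Wald's upper boundary."""
    upper = math.log((1.0 - beta) / alpha)
    return sprt_llr(observed, expected, h_rr) >= upper

# A drug-event pair with 12 reports where about 3 were expected.
flagged = sprt_signal(12, 3.0, h_rr=2.0)
# A pair reporting at roughly the background rate.
not_flagged = sprt_signal(3, 3.0, h_rr=2.0)
```

In sequential use the statistic is recomputed as reports accumulate, which is what makes the SPRT attractive as a routine screening tool.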
Bailey, A.; Blossey, P. N.; Noone, D.; Nusbaumer, J.; Wood, R.
2017-06-01
As global temperatures rise, regional differences in evaporation (E) and precipitation (P) are likely to become more disparate, causing the drier E-dominated regions of the tropics to become drier and the wetter P-dominated regions to become wetter. Models suggest that such intensification of the water cycle should already be taking place; however, quantitatively verifying these changes is complicated by inherent difficulties in measuring E and P with sufficient spatial coverage and resolution. This paper presents a new metric for tracking changes in regional moisture imbalances (e.g., E-P) by defining δDq—the isotope ratio normalized to a reference water vapor concentration of 4 mmol mol^-1—and evaluates its efficacy using both remote sensing retrievals and climate model simulations in the tropics. By normalizing the isotope ratio with respect to water vapor concentration, δDq isolates the portion of isotopic variability most closely associated with shifts between E- and P-dominated regimes. Composite differences in δDq between cold and warm phases of El Niño-Southern Oscillation (ENSO) verify that δDq effectively tracks changes in the hydrological cycle when large-scale convective reorganization takes place. Simulated δDq also demonstrates sensitivity to shorter-term variability in E-P at most tropical locations. Since the isotopic signal of E-P in free tropospheric water vapor transfers to the isotope ratios of precipitation, multidecadal observations of both water vapor and precipitation isotope ratios should provide key evidence of changes in regional moisture imbalances now and in the future.
Heterodyne detection with an injection laser. Part 2: Signal-to-noise ratio
Marcuse, D. (AT and T Bell Labs., Holmdel, NJ (USA))
1990-04-01
The authors previously presented a theory of the conversion efficiency of the self-heterodyne laser detector. In this device a light signal is passed into the resonant cavity of an actively oscillating injection laser, causing an electrical signal at the difference frequency between laser and signal to flow through the wire supplying the dc bias to the laser. In this paper they derive an expression for the signal-to-noise ratio of the self-heterodyne laser detector. The authors' result shows that in the limit of ideal operation (that is complete population inversion and no internal losses) the signal-to-noise ratio of the self-heterodyne laser detector reaches one-half of the quantum noise limit. To describe the signal-to-noise ratio in a realistic self-heterodyne laser detector, the authors introduce an excess noise factor and plot its value for a few representative examples. Excess noise is typically on the order of 10 to 20 dB.
Adrián-Martínez, S; Bou-Cabo, M; Felis, I; Llorens, C; Martínez-Mora, J A; Saldaña, M
2015-01-01
The study and application of signal detection techniques based on the cross-correlation method for acoustic transient signals in noisy and reverberant environments are presented. These techniques are shown to provide a high signal-to-noise ratio, good signal discernment from very close echoes and accurate detection of the signal arrival time. The proposed methodology has been tested on real data collected in environments and conditions where its benefits can be shown. This work focuses on acoustic detection applied to positioning and calibration tasks in underwater structures, such as the ANTARES and KM3NeT deep-sea neutrino telescopes, as well as to particle detection through acoustic events for the COUPP/PICO detectors. Moreover, a method for obtaining the real amplitude of the signal in time (voltage) by using cross-correlation has been developed and tested and is described in this work.
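The arrival-time estimation step can be sketched as follows. The sampling rate, pulse shape and noise level are illustrative assumptions; real systems correlate against calibrated transducer waveforms.

```python
import numpy as np

def arrival_time(received, template, fs):
    """Arrival time (s) as the lag maximizing the cross-correlation
    of the received trace with a known signal template."""
    corr = np.correlate(received, template, mode="valid")
    return np.argmax(corr) / fs

fs = 1000.0                                 # 1 kHz sampling (assumed)
t = np.arange(100) / fs
template = np.sin(2 * np.pi * 50 * t)       # 50 Hz, 0.1 s pulse template
rx = np.zeros(1000)
rx[300:400] += template                     # true arrival at 0.300 s
rng = np.random.default_rng(1)
rx += 0.1 * rng.normal(size=rx.size)        # additive ambient noise
estimate = arrival_time(rx, template, fs)
```

Correlating against a known template is what gives the method its robustness to noise and close echoes: the correlation peak stands well above the per-lag noise floor even when the raw trace looks buried.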
Diffusion-based outlier rejection for underwater navigation
Vike, Steinar; Jouffroy, Jerome
trajectory segments at a time. The proposed observer contains a nonlinear feedback gain, constraining the estimate to detach from measurements that are considered as outliers. The observer is proven to be exponentially convergent. Simulation and experimental results are presented to illustrate the benefits...
Do We Often Find ARCH Because Of Neglected Outliers?
Ph.H.B.F. Franses (Philip Hans); D.J.C. van Dijk (Dick)
1997-01-01
textabstractIn this paper we test for (Generalized) AutoRegressive Conditional Heteroskedasticity [(G)ARCH] in daily and weekly data on 22 exchange rates and 13 stock market indices using the standard Lagrange Multiplier [LM] test for GARCH and a new LM test that is resistant to additive outliers. T
Testing for Smooth Transition Nonlinearity in the Presence of Outliers
D.J.C. van Dijk (Dick); Ph.H.B.F. Franses (Philip Hans); A. Lucas (André)
1996-01-01
textabstractRegime-switching models, like the smooth transition autoregressive (STAR) model are typically applied to time series of moderate length. Hence, the nonlinear features which these models intend to describe may be reflected in only a few observations. Conversely, neglected outliers in a li
Outliers: Elementary Teachers Who Actually Teach Social Studies
Anderson, Derek
2014-01-01
This mixed methods study identified six elementary teachers, who, despite the widespread marginalization of elementary social studies, spent considerable time on the subject. These six outliers from a sample of forty-six Michigan elementary teachers were interviewed, and their teaching was observed to better understand how and why they deviate…
Outliers, Cheese, and Rhizomes: Variations on a Theme of Limitation
Stone, Lynda
2011-01-01
All research has limitations, for example, from paradigm, concept, theory, tradition, and discipline. In this article Lynda Stone describes three exemplars that are variations on limitation and are "extraordinary" in that they change what constitutes future research in each domain. Malcolm Gladwell's present day study of outliers makes a…
Outliers in Assessments. Research Report. ETS RR-08-41
Haberman, Shelby J.
2008-01-01
Outliers in assessments are often treated as a nuisance for data analysis; however, they can also assist in quality assurance. Their frequency can suggest problems with form codes, scanning accuracy, ability of examinees to enter responses as they intend, or exposure of items.
Metabolomic Biomarker Identification in Presence of Outliers and Missing Values.
Kumar, Nishith; Hoque, Md Aminul; Shahjaman, Md; Islam, S M Shahinul; Mollah, Md Nurul Haque
2017-01-01
Metabolomics is a sophisticated, high-throughput technology based on the entire set of metabolites, which is known as the connector between genotypes and phenotypes. For any phenotypic change, potential metabolite (biomarker) identification is very important because it provides diagnostic as well as prognostic markers and can help to develop new biomolecular therapies. Biomarker identification from metabolomics data analysis is hampered by the use of high-throughput technology that provides a high dimensional data matrix containing missing values as well as outliers. Missing value imputation and outlier handling techniques therefore play an important role in identifying biomarkers correctly. Although several missing value imputation techniques are available, outliers deteriorate the accuracy of imputation as well as the accuracy of biomarker identification. Therefore, in this paper we propose a new biomarker identification technique combining the groupwise robust singular value decomposition, t-test, and fold-change approach, which can identify biomarkers more correctly from metabolomics datasets. We also compare the performance of the proposed technique with those of other traditional techniques for biomarker identification using both simulated and real data analysis in the absence and presence of outliers. Using our proposed method on a hepatocellular carcinoma (HCC) dataset, we identified four upregulated and two downregulated metabolites as potential metabolomic biomarkers for HCC disease.
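The t-test plus fold-change screening component can be sketched as follows. This is a generic sketch of that step only, not of the groupwise robust SVD imputation; the cut-off values and the simulated data are illustrative assumptions.

```python
import numpy as np

def tstat_log2fc(ctrl, case):
    """Per-metabolite Welch t statistic and log2 fold change (case/ctrl)."""
    ctrl, case = np.asarray(ctrl, float), np.asarray(case, float)
    m1, m2 = ctrl.mean(axis=0), case.mean(axis=0)
    se = np.sqrt(ctrl.var(axis=0, ddof=1) / len(ctrl)
                 + case.var(axis=0, ddof=1) / len(case))
    return (m2 - m1) / se, np.log2(m2 / m1)

rng = np.random.default_rng(3)
ctrl = rng.normal(100.0, 5.0, size=(20, 3))   # 20 samples, 3 metabolites
case = rng.normal(100.0, 5.0, size=(20, 3))
case[:, 0] *= 2.0                             # metabolite 0 upregulated 2x
t, lfc = tstat_log2fc(ctrl, case)
hits = (np.abs(t) > 3.0) & (np.abs(lfc) > 0.8)  # illustrative dual cut-off
```

Requiring both a large test statistic and a large fold change is a common way to avoid flagging statistically significant but biologically negligible differences; the paper's contribution is making the upstream matrix robust to outliers before this step.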
On the Impact of Cepheid Outliers on the Distance Ladder
Becker, M R; Rozo, E; Marshall, P; Rykoff, E S
2015-01-01
Recent work by Efstathiou (2014) highlighted the importance of outliers in the period-luminosity (PL) relation of Cepheid data on the distance ladder. We present a statistical framework designed to address this difficulty, and apply it to the Cepheid data from the Milky Way (MW), the Large Magellanic Cloud (LMC), and the Riess et al. (2011) (hereafter R11) dataset. We consider two possible models of the outlier population in the R11 Cepheid dataset. One of these models exhibits tension between the PL relation of the R11 Cepheids and the MW+LMC Cepheids, while the other does not. We extend our models to adequately account for tension between the Cepheid data sets when appropriate. Our outlier treatment has a significant impact on the distance scales to supernova hosts with Cepheid distances, increasing the uncertainty in these distances by a median factor of ~30%. We further find that our Cepheid outlier treatment translates into a modest, but non-negligible increase in the statistical uncertainty of H0, addi...
Outlier-Tolerance RML Identification of Parameters in CAR Model
Hong Teng-teng
2016-10-01
Full Text Available The measured data inevitably contain abnormal values, even under normal operating conditions. Most existing algorithms, such as least squares identification and maximum likelihood estimation, are easily affected by abnormal data and exhibit large identification deviations. How to reduce the sensitivity of existing algorithms to abnormal data, or how to build a new parameter identification algorithm with outlier tolerance, is a difficult task that needs to be addressed in applications of system identification technology. In this paper, the sensitivity of the RML algorithm to sampled abnormal data is analyzed, and a new improved algorithm for the CAR process is established to improve the outlier tolerance of RML identification when there are outliers in the sampled series. The improved algorithm not only effectively inhibits the negative impact of abnormal data but also effectively improves the quality of the parameter identification results. Simulations given in this paper show that the improved RML algorithm has strong outlier tolerance. These results are relevant to engineering control, signal processing, industrial automation, aerospace and other fields.
F.A. Kreuger; H. Beerman (Henk); H.G. Nijs (Huub); M. van Ballegooijen (Marjolein)
1998-01-01
textabstractBACKGROUND: In organized screening programmes for cervical cancer, pre-cancerous lesions are detected by cervical smears. However, during follow-up after a positive smear these pre-cancerous lesions are not always found. The purpose of the study is to analys
Muramatsu, Chisako; Hatanaka, Yuji; Iwase, Tatsuhiko; Hara, Takeshi; Fujita, Hiroshi
2010-03-01
Abnormalities of the retinal vasculature can indicate health conditions in the body, such as high blood pressure and diabetes. Providing an automatically determined width ratio of arteries and veins (A/V ratio) on retinal fundus images may help physicians in the diagnosis of hypertensive retinopathy, which may cause blindness. The purpose of this study was to detect major retinal vessels and classify them into arteries and veins for the determination of the A/V ratio. Images used in this study were obtained from the DRIVE database, which consists of 20 cases each for training and testing vessel detection algorithms. Starting with the reference standard of vasculature segmentation provided in the database, major arteries and veins, each in the upper and lower temporal regions, were manually selected to establish the gold standard. We applied the black top-hat transformation and a double-ring filter to detect retinal blood vessels. From the extracted vessels, large vessels extending from the optic disc to the temporal regions were selected as target vessels for calculation of the A/V ratio. Image features were extracted from the vessel segments from quarter-disc to one disc diameter from the edge of the optic discs. The target segments in the training cases were classified into arteries and veins by using linear discriminant analysis, and the selected parameters were applied to those in the test cases. Out of 40 pairs, 30 pairs (75%) of arteries and veins in the 20 test cases were correctly classified. The result can be used for the automated calculation of the A/V ratio.
Outlier Preservation by Dimensionality Reduction Techniques
Onderwater, M.
2015-01-01
Sensors are increasingly part of our daily lives: motion detection, lighting control, and energy consumption all rely on sensors. Combining this information into, for instance, simple and comprehensive graphs can be quite challenging. Dimensionality reduction is often used to address this problem, b
The high cost of low-acuity ICU outliers.
Dahl, Deborah; Wojtal, Greg G; Breslow, Michael J; Holl, Randy; Huguez, Debra; Stone, David; Korpi, Gloria
2012-01-01
Direct variable costs were determined on each hospital day for all patients with an intensive care unit (ICU) stay in four Phoenix-area hospital ICUs. Average daily direct variable cost in the four ICUs ranged from $1,436 to $1,759 and represented 69.4 percent and 45.7 percent of total hospital stay cost for medical and surgical patients, respectively. Daily ICU cost and length of stay (LOS) were higher in patients with higher ICU admission acuity of illness as measured by the APACHE risk prediction methodology; 16.2 percent of patients had an ICU stay in excess of six days, and these LOS outliers accounted for 56.7 percent of total ICU cost. While higher-acuity patients were more likely to be ICU LOS outliers, 11.1 percent of low-risk patients were outliers. The low-risk group included 69.4 percent of the ICU population and accounted for 47 percent of all LOS outliers. Low-risk LOS outliers accounted for 25.3 percent of ICU cost and incurred fivefold higher hospital stay costs and mortality rates. These data suggest that severity of illness is an important determinant of daily resource consumption and LOS, regardless of whether the patient arrives in the ICU with high acuity or develops complications that increase acuity. The finding that a substantial number of long-stay patients come into the ICU with low acuity and deteriorate after ICU admission is not widely recognized and represents an important opportunity to improve patient outcomes and lower costs. ICUs should consider adding low-risk LOS data to their quality and financial performance reports.
Nonlinear Optimization-Based Device-Free Localization with Outlier Link Rejection
Wendong Xiao
2015-04-01
Full Text Available Device-free localization (DFL) is an emerging wireless technique for estimating the location of a target that does not have any attached electronic device. It has found extensive use in Smart City applications such as healthcare at home and in hospitals, location-based services at smart spaces, city emergency response and infrastructure security. In DFL, wireless devices are used as sensors that can sense the target by transmitting and receiving wireless signals collaboratively. Many DFL systems are implemented based on received signal strength (RSS) measurements, and the location of the target is estimated by detecting the changes of the RSS measurements of the wireless links. Due to the uncertainty of the wireless channel, certain links may be seriously polluted and result in erroneous detection. In this paper, we propose a novel nonlinear optimization approach with outlier link rejection (NOOLR) for RSS-based DFL. It consists of three key strategies: (1) affected link identification by differential RSS detection; (2) outlier link rejection via the geometrical positional relationship among links; (3) target location estimation by formulating and solving a nonlinear optimization problem. Experimental results demonstrate that NOOLR is robust to the fluctuation of the wireless signals, with superior localization accuracy compared with the existing Radio Tomographic Imaging (RTI) approach.
Antoniadou, Ifigeneia; Manson, G.; Dervilis, N.; Worden, K. [Sheffield Univ. (United Kingdom); Barszcz, T.; Staszewski, W. [AGH Univ. of Science and Technology, Krakow (Poland)
2012-07-01
Wind turbines are subject to variable aerodynamic loads and extreme environmental conditions. Wind turbine components fail frequently, resulting in high maintenance costs. For this reason, gearbox condition monitoring becomes important since gearboxes are among the wind turbine components with the most frequent failure observations. The major challenge here is the detection of faults under the time varying operating conditions prevailing in wind turbine systems. This paper analyses wind turbine gearbox vibration data using the empirical mode decomposition method and the statistical discipline of outlier analysis for the damage detection of gearbox tooth faults. The instantaneous characteristics of the signals are obtained with the application of the Hilbert transform. The lowest level of fault detection, the threshold value, is considered and Mahalanobis squared-distance is calculated for the novelty detection problem. (orig.)
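The novelty-detection step based on the Mahalanobis squared distance can be sketched as follows. The feature dimensionality, the simulated "healthy" data and the quantile-based threshold are illustrative assumptions; the paper derives its threshold from training data of the undamaged condition.

```python
import numpy as np

def mahalanobis_sq(X_train, x):
    """Mahalanobis squared distance of observation x from the
    training data describing the normal (undamaged) condition."""
    mu = X_train.mean(axis=0)
    cov = np.cov(X_train, rowvar=False)
    diff = np.asarray(x, dtype=float) - mu
    return float(diff @ np.linalg.inv(cov) @ diff)

rng = np.random.default_rng(2)
normal = rng.normal(size=(200, 4))            # features, healthy gearbox
train_d = [mahalanobis_sq(normal, row) for row in normal]
threshold = float(np.quantile(train_d, 0.99))  # illustrative threshold
novel = mahalanobis_sq(normal, np.full(4, 6.0))  # deviating observation
```

Any new observation whose distance exceeds the threshold is declared novel, which is the lowest level of damage identification (detection only, no localization).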
Bhide, Amar; Rana, Ritu; Dhavilkar, Mrugaya; Amodio-Hernandez, Montserrat; Deshpande, Deepika; Caric, Vedrana
2015-05-01
To explore the correlation between urinary protein:creatinine ratio and 24-h excretion of protein, we studied 149 women referred to a day assessment unit for investigations for suspected preeclampsia. Paired samples were obtained for measurement of the urinary protein:creatinine ratio and 24-h protein excretion. Collection of a 24-h urine sample was validated by the daily creatinine excretion. The outcome measure was proteinuria of 300 mg/day or more. Inaccurate 24-h collection was observed in 17% of women. All women (n = 56) with a protein:creatinine ratio >60 mg/mM had significant proteinuria. No woman with a protein:creatinine ratio <18 mg/mM had significant proteinuria. We recommend that a dual cut-off should be used for excluding and "ruling in" the diagnosis of significant proteinuria. A 24-h urine collection should be used only for urinary protein:creatinine ratio values between 18 and 60 mg/mM in the detection of significant proteinuria. © 2015 Nordic Federation of Societies of Obstetrics and Gynecology.
Signal-to-noise ratio application to seismic marker analysis and fracture detection
Xu Hui-Qun; Gui Zhi-Xian
2014-01-01
Seismic data with high signal-to-noise ratios (SNRs) are useful in reservoir exploration. To obtain high-SNR seismic data, significant effort is required to achieve noise attenuation in seismic data processing, which is costly in materials and in human and financial resources. We introduce a method for improving the SNR of seismic data, in which the SNR is calculated using a frequency-domain method. Furthermore, we optimize and discuss the critical parameters and the calculation procedure. We applied the proposed method to real data and found that the SNR is high at the seismic marker and low in the fracture zone. Consequently, the method can be used to extract detailed information about fracture zones that are inferred by structural analysis but not observed in conventional seismic data.
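One simple variant of a frequency-domain SNR estimate can be sketched as follows. The band limits, sampling rate and in-band/out-of-band definition are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def snr_freq_db(trace, fs, band):
    """Rough frequency-domain SNR (dB): spectral power inside an
    assumed signal band versus power outside it."""
    power = np.abs(np.fft.rfft(trace)) ** 2
    freqs = np.fft.rfftfreq(len(trace), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return 10.0 * np.log10(power[in_band].sum() / power[~in_band].sum())

fs = 512.0
t = np.arange(2048) / fs
marker = np.sin(2 * np.pi * 32 * t)         # 32 Hz "marker" energy
rng = np.random.default_rng(4)
snr_good = snr_freq_db(marker + 0.1 * rng.normal(size=t.size), fs, (27, 37))
snr_poor = snr_freq_db(marker + 1.0 * rng.normal(size=t.size), fs, (27, 37))
```

Mapping such an estimate along a seismic section would show the high values at a coherent marker and the low values in a fracture zone described above.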
42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).
2010-10-01
... outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Payments for Outlier Cases, Special Treatment Payment for New Technology, and Payment Adjustment for Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...
42 CFR 484.240 - Methodology used for the calculation of the outlier payment.
2010-10-01
... 42 Public Health 5 2010-10-01 2010-10-01 false Methodology used for the calculation of the outlier... Payment System for Home Health Agencies § 484.240 Methodology used for the calculation of the outlier payment. (a) CMS makes an outlier payment for an episode whose estimated cost exceeds a threshold amount...
42 CFR 412.84 - Payment for extraordinarily high-cost cases (cost outliers).
2010-10-01
... outliers). 412.84 Section 412.84 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Payments for Outlier Cases, Special Treatment Payment for New Technology, and Payment Adjustment for Certain Replaced Devices Payment for Outlier Cases § 412.84 Payment for extraordinarily high-cost cases...
Tosun, Murat
2013-06-01
Honey can be adulterated in various ways; one method is the addition of different sugar syrups during or after honey production. Starch-based sugar syrups, high-fructose corn syrup (HFCS), glucose syrup (GS) and saccharose syrups (SS) produced from beet or cane can all be used to adulterate honey. In this study, adulterated honey samples were prepared by adding HFCS, GS and SS (beet sugar) at ratios of 0%, 10%, 20%, 40% and 50% by weight. (13)C/(12)C analysis was conducted on these samples using an isotope ratio mass spectrometer in combination with an elemental analyser (EA-IRMS). As a result, adulteration with C(4) sugar syrups (HFCS and GS) could be detected to a certain extent, while adulteration with C(3) sugar syrups (beet sugar) could not. Detecting adulteration with SS (beet sugar) therefore remains a serious problem, especially in countries where beet is used to manufacture sugar, and analytical methods that close this gap and detect such adulteration precisely are still needed. Copyright © 2012 Elsevier Ltd. All rights reserved.
Chrétien, Stéphane; Guyeux, Christophe; Conesa, Bastien; Delage-Mouroux, Régis; Jouvenot, Michèle; Huetz, Philippe; Descôtes, Françoise
2016-08-31
Non-negative matrix factorization has become an essential tool for feature extraction in a wide spectrum of applications. In the present work, our objective is to extend the applicability of the method to the case of missing and/or corrupted data due to outliers. An essential property for missing-data imputation and detection of outliers is that the uncorrupted data matrix is low rank, i.e. has only a small number of degrees of freedom. We devise a new version of the Bregman proximal idea which preserves nonnegativity and combine it with the augmented Lagrangian approach for simultaneous reconstruction of the features of interest and detection of the outliers using a sparsity-promoting ℓ1 penalty. An application to the analysis of gene expression data of patients with bladder cancer is finally proposed.
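The robust Bregman-proximal variant described above is not reproduced here; as a baseline, the classical Lee-Seung multiplicative-update NMF that such robust methods extend can be sketched in a few lines (this is the standard algorithm, not the paper's method, and omits the ℓ1 outlier term):

```python
import numpy as np

def nmf(X, rank, iters=200, eps=1e-9, seed=0):
    """Plain NMF via Lee-Seung multiplicative updates (Frobenius loss).

    Classical baseline only: the paper above adds a Bregman-proximal step
    and a sparsity-promoting l1 outlier term that are not shown here.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        # each update keeps factors nonnegative by construction
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

On exactly low-rank nonnegative data the reconstruction W @ H converges close to X; corrupted entries are what the robust variant is designed to absorb separately.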
Mustafa Altındiş, Zafer Çetinkaya, Raike Kalaycı, Ihsan H Ciftçi, Alpaslan Arslan, Orhan C. Aktepe
2011-06-01
Full Text Available Objectives: The aim of the present study was to evaluate the efficacy (recovery rate, time to detection and drug susceptibility testing (DST) of mycobacteria) of a new colorimetric medium, Dio-TK, and to compare it with the routinely used conventional media, Lowenstein-Jensen (LJ) and the Bactec 460 TB culture system. Materials and methods: In total, 901 clinical specimens were investigated for tuberculosis by Ehrlich-Ziehl-Neelsen smear staining and by the Lowenstein-Jensen, BACTEC 460 TB and Dio-TK culture systems. Results: Nineteen of the 901 clinical specimens (2.1%) were positive by at least one of these methods: 17 (89.5%) by smear staining, 17 (89.5%) by Lowenstein-Jensen, 19 (100%) by BACTEC 460 TB and 14 (73.7%) by Dio-TK. NAP and niacin identification tests were applied to the Mycobacterium strains; 12 of the 19 isolates (63.1%) were identified as M. tuberculosis complex and 7 (36.9%) as mycobacteria other than tuberculosis (MOTT) bacilli. Ten of the 12 M. tuberculosis complex strains (83.3%) were not resistant to any major drug; of the remaining two isolates, one was resistant to streptomycin and the other to both streptomycin and isoniazid. Conclusion: Our data suggest that the advantages of the Dio-TK medium over other mycobacterial culture systems, such as early detection and differentiation of mycobacterial growth from contamination, make it a practical and rapid system for daily use and a suitable alternative to other currently available solid media, such as LJ, for the detection of mycobacteria and DST. J Microbiol Infect Dis 2011;1(1):5-9.
Kakade, Rohan; Walker, John G.; Phillips, Andrew J.
2016-08-01
Confocal fluorescence microscopy (CFM) is widely used in the biological sciences because of its enhanced 3D resolution, which allows image sectioning and removal of out-of-focus blur. This is achieved by rejecting the light outside a detection pinhole in a plane confocal with the illuminated object. In this paper, an alternative detection arrangement is examined in which the entire detection/image plane is recorded using an array detector rather than a pinhole detector. An attempt is then made to recover the object from the whole set of recorded photon array data; in this paper maximum-likelihood estimation has been applied. The recovered object estimates are shown (through computer simulation) to have good resolution, image sectioning and signal-to-noise ratio compared with conventional pinhole CFM images.
Li Yue; Yang Bao-Jun; Deng Xiao-Ying; Jin Lei; Du Li-Zhi
2004-01-01
In the zero-order approximation, we use a small-parameter perturbation method to prove that, when the chaotic system based on the Duffing-Holmes equation stays in the stable periodic state, the harmonic frequency in the solution of the equation is close to that of the driving force; this is the physical mechanism behind the detection of the unknown frequency of a weak harmonic signal using chaos theory. Simulation experiments show that the proposed method, in which the frequency of the stable system is determined from the number of times the trajectory circulates directionally across a fixed point (x, ẋ) of the phase plane within a fixed simulation time, is successful. Analyzing the effect of the damping ratio on the chaotic detection result shows that, for different frequency ranges, the corresponding damping ratio α must be chosen carefully.
Gao, Feng; Dong, Junyu; Li, Bo; Xu, Qizhi; Xie, Cui
2016-10-01
Change detection is of high practical value for hazard assessment, crop growth monitoring, and urban sprawl detection. Synthetic aperture radar (SAR) images are an ideal information source for change detection because they are independent of atmospheric and sunlight conditions. Existing SAR image change detection methods usually generate a difference image (DI) first and then use clustering to classify the pixels of the DI into changed and unchanged classes, but useful information may be lost in the DI generation process. This paper proposes an SAR image change detection method based on the neighborhood-based ratio (NR) and the extreme learning machine (ELM). The NR operator is used to obtain pixels that have a high probability of being changed or unchanged; image patches centered at these pixels are then generated, and an ELM is trained on these patches. Finally, the pixels in both original SAR images are classified by the pretrained ELM model, and the preclassification result and the ELM classification result are combined to form the final change map. Experimental results on three real SAR image datasets and one simulated dataset show that the proposed method is robust to speckle noise and effective at detecting change information in multitemporal SAR images.
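The abstract above does not spell out the NR operator, so the following is an illustrative stand-in rather than the paper's exact definition: a toy neighborhood-ratio difference image that compares local window means of two co-registered images, with values near 0 for unchanged pixels and near 1 for strongly changed ones.

```python
import numpy as np

def neighborhood_ratio_di(img1, img2, r=1, eps=1e-6):
    """Toy neighborhood-based ratio difference image for two co-registered
    amplitude images. For each pixel, local means over a (2r+1)x(2r+1)
    window are compared; 0 means 'identical neighborhoods', values toward
    1 mean 'likely changed'. Illustrative only, not the paper's NR operator.
    """
    def local_mean(a):
        p = np.pad(a.astype(float), r, mode="edge")
        out = np.zeros(a.shape, dtype=float)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                out += p[r + dy: r + dy + a.shape[0],
                         r + dx: r + dx + a.shape[1]]
        return out / (2 * r + 1) ** 2

    m1, m2 = local_mean(img1), local_mean(img2)
    ratio = np.minimum(m1, m2) / (np.maximum(m1, m2) + eps)
    return 1.0 - ratio
```

Thresholding such a DI (or, as in the paper, feeding patches around high- and low-confidence pixels to a classifier) yields the change map.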
Mumford, Jeanette A
2017-07-01
Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifacts and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers, including models that use the first level variance or the group level residual magnitude to differentially weight subjects. The most typically used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates, has been shown to perform well, with better Type I error control, but only with large sample sizes (500-1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model, nor has it been compared to other modeling approaches that handle outliers such as FSL's Flame 1 and FSL's outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degrees of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL's Flame 1, Flame 1 with its outlier de-weighting algorithm, and Kendall's Tau. Additionally, subject omission using the Cook's Distance measure with OLS and nonparametric inference with the OLS statistic are studied. Pros and cons of these models, as well as general strategies for detecting outliers in data and taking precautions to avoid inflated Type I error rates, are discussed.
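The robust regressions compared in the abstract above bound the influence of outlying subjects on the slope. As a generic illustration of the idea (a plain Huber M-estimator fitted by iteratively reweighted least squares, not the specific HAC estimator or FSL models studied in the paper):

```python
import numpy as np

def huber_slope(x, y, delta=1.345, iters=50):
    """Robust simple regression (intercept, slope) via IRLS with Huber
    weights. A generic robust M-estimator for illustration; not the HAC
    estimator or the FSL models compared in the paper above.
    """
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # OLS starting point
    for _ in range(iters):
        res = y - X @ beta
        # robust scale from the median absolute deviation of residuals
        s = np.median(np.abs(res - np.median(res))) / 0.6745 + 1e-12
        u = np.abs(res) / (delta * s)
        w = np.where(u <= 1.0, 1.0, 1.0 / u)     # Huber weights
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
    return beta                                   # [intercept, slope]
```

With one gross outlier the OLS slope is pulled far from the truth while the Huber slope barely moves, which is the behavior the group-level comparisons above are probing.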
Xu-Sheng Zhang; Chi Zhang; Wei-Jia Zeng
2016-01-01
Objective:To find the value of elastic strain ratio detection by transvaginal elastography in diagnosis of cervical carcinoma and judgment of illness.Methods:A total of 178 cases of women receiving vaginal ultrasound in our hospital from August 2011 to June 2014 were selected as research subjects, obtained cervical tissue block received pathological examination, and according to pathological results, all research subjects were divided into cervical carcinoma group 51 cases, benign cervical lesion group 78 cases and healthy control group 49 cases. Elastography pressure release index and elastic strain ratio, protein expression of CDK8, Ki-67, P53, Pin1 and Cyclin D1 in cervical tissue, serum tumor maker levels, serum COX-2, MMP-9, SCCAg, Apo-1/Fas and S-TK1 levels of all subjects were detected, and the correlation between elastic strain ratio and above cervical carcinoma-related indicator values was further analyzed.Results:Elastography pressure release index and elastic strain ratio of cervical carcinoma group were higher than those of benign cervical lesion group and healthy control group; protein expression levels of CDK8, Ki-67, P53, Pin1 and Cyclin D1 in cervical tissue of cervical carcinoma group were higher than those of benign cervical lesion group and healthy control group; serum CA125, Cyfra21-1, CA724 and CEA levels of cervical carcinoma group were higher than those of benign cervical lesion group and healthy control group; serum COX-2, MMP-9, SCCAg, Apo-1/Fas and S-TK1 levels of cervical carcinoma group were higher than those of benign cervical lesion group and healthy control group; elastic strain ratio had positive correlation with protein expression levels of CDK8, Ki-67, P53, Pin1 and Cyclin D1 in cervical tissue as well as CA125, Cyfra21-1, CA724, CEA, COX-2, MMP-9, SCCAg, Apo-1/Fas and S-TK1 levels in serum.Conclusion:Elastic strain ratio detected by transvaginal elastography can be used as the effective means for diagnosis of cervical carcinoma
Menzies Robert T.
2016-01-01
Full Text Available The JPL airborne Laser Absorption Spectrometer instrument has been flown several times in the 2007-2011 time frame for the purpose of measuring CO2 mixing ratios in the lower atmosphere. The four most recent flight campaigns were on the NASA DC-8 research aircraft, in support of the NASA ASCENDS (Active Sensing of CO2 Emissions over Nights, Days, and Seasons) mission formulation studies. This instrument operates in the 2.05-μm spectral region. The Integrated Path Differential Absorption (IPDA) method is used to retrieve weighted CO2 column mixing ratios. We present key features of the CO2LAS signal processing, data analysis, and the calibration/validation methodology. Results from flights in various U.S. locations during the past three years include observed mid-day CO2 drawdown in the Midwest, also cases of point-source and regional plume detection that enable the calculation of emission rates.
Menzies, Robert T.; Spiers, Gary D.; Jacob, Joseph C.
2016-06-01
The JPL airborne Laser Absorption Spectrometer instrument has been flown several times in the 2007-2011 time frame for the purpose of measuring CO2 mixing ratios in the lower atmosphere. The four most recent flight campaigns were on the NASA DC-8 research aircraft, in support of the NASA ASCENDS (Active Sensing of CO2 Emissions over Nights, Days, and Seasons) mission formulation studies. This instrument operates in the 2.05-μm spectral region. The Integrated Path Differential Absorption (IPDA) method is used to retrieve weighted CO2 column mixing ratios. We present key features of the CO2LAS signal processing, data analysis, and the calibration/validation methodology. Results from flights in various U.S. locations during the past three years include observed mid-day CO2 drawdown in the Midwest, also cases of point-source and regional plume detection that enable the calculation of emission rates.
Outlier robustness for wind turbine extrapolated extreme loads
Natarajan, Anand; Verelst, David Robert
2012-01-01
Methods for extrapolating extreme loads to a 50-year probability of exceedance, which display robustness to the presence of outliers in the simulated loads data set, are described. Case studies of isolated high extreme out-of-plane loads are discussed to emphasize their underlying physical reasons....... Stochastic identification of numerical artifacts in simulated loads is demonstrated using the method of principal component analysis. The extrapolation methodology is made robust to outliers through a weighted loads approach, whereby the eigenvalues of the correlation matrix obtained using the loads with its...... simulation is demonstrated and compared with published results. Further, the effects of varying wind inflow angle and shear exponent are brought out. Parametric fitting techniques that consider all extreme loads including ‘outliers’ are proposed, and the physical reasons that result in isolated high extreme loads...
Factor-based forecasting in the presence of outliers
Kristensen, Johannes Tang
2014-01-01
Macroeconomic forecasting using factor models estimated by principal components has become a popular research topic with many both theoretical and applied contributions in the literature. In this paper we attempt to address an often neglected issue in these models: The problem of outliers...... in the data. Most papers take an ad-hoc approach to this problem and simply screen datasets prior to estimation and remove anomalous observations. We investigate whether forecasting performance can be improved by using the original unscreened dataset and replacing principal components with a robust...... apply the estimator in a simulated real-time forecasting exercise to test its merits. We use a newly compiled dataset of US macroeconomic series spanning the period 1971:2–2012:10. Our findings suggest that the chosen treatment of outliers does affect forecasting performance and that in many cases...
No Longer the Outlier: Updating the Air Component Structure
2016-06-23
4 | Air & Space Power Journal No Longer the Outlier Updating the Air Component Structure Lt Gen CQ Brown Jr., USAF Lt Col Rick Fournier, USAF...This article may be reproduced in whole or in part without permission. If it is reproduced, the Air and Space Power Journal requests a courtesy...COCOM) with a highly competent air component that is capable of conducting and supporting air, space, and cyberspace operations within its assigned
Outliers and Extremes: Dragon-Kings or Dragon-Fools?
Schertzer, D. J.; Tchiguirinskaia, I.; Lovejoy, S.
2012-12-01
Geophysics seems full of monsters like Victor Hugo's Court of Miracles, and monstrous extremes have been statistically considered as outliers with respect to more normal events. However, a characteristic magnitude separating abnormal events from normal ones would be at odds with the generic scaling behaviour of nonlinear systems, contrary to "fat tailed" probability distributions and self-organized criticality. More precisely, it can be shown [1] how the apparent monsters could be mere manifestations of a singular measure mishandled as a regular measure. Monstrous fluctuations are the rule, not outliers, and they are more frequent than usually thought, to the point that (theoretical) statistical moments can easily be infinite. The empirical estimates of the latter are erratic and diverge with sample size. The corresponding physics is that intense small-scale events cannot be smoothed out by upscaling. However, based on a few examples, it has also been argued [2] that one should consider "genuine" outliers of fat tailed distributions so monstrous that they can be called "dragon-kings". We critically analyse these arguments, e.g. finite sample size and statistical estimates of the largest events, and multifractal phase transitions vs. more classical phase transitions. We emphasize the fact that dragon-kings are not needed in order for the largest events to become predictable. This is rather reminiscent of the Feast of Fools picturesquely described by Victor Hugo. [1] D. Schertzer, I. Tchiguirinskaia, S. Lovejoy et P. Hubert (2010): No monsters, no miracles: in nonlinear sciences hydrology is not an outlier! Hydrological Sciences Journal, 55 (6) 965 - 979. [2] D. Sornette (2009): Dragon-Kings, Black Swans and the Prediction of Crises. International Journal of Terraspace Science and Engineering 1(3), 1-17.
Transaction Outlier Detection Using OLAP Visualization on a Private Higher Education Data Warehouse
Gusti Ngurah Mega Nata
2016-07-01
Full Text Available Detecting outliers in a data warehouse is important. Data in a data warehouse are already aggregated and follow a multidimensional model: aggregation is applied because the data warehouse is used by top-level management for fast data analysis, while the multidimensional data model allows the data to be viewed from the various dimensions of a business object. Detecting outliers in a data warehouse therefore requires techniques that can find outliers in aggregated data and can view them from multiple business dimensions, which makes it a new challenge. On-line analytical processing (OLAP) visualization, in turn, is an important task for presenting trend information (reports) from a data warehouse in the form of data visualizations. In this study, OLAP visualization is used to detect transaction outliers. The OLAP operation employed is drill-down, and the visualizations used are one-dimensional, two-dimensional and multidimensional views built with the Weave Desktop tool. The data warehouse was built bottom-up. The case study was carried out at a private higher education institution, and the task addressed was detecting outliers in student tuition payment transactions in each semester. Outlier detection on visualizations that use a single dimension table is easier to analyze than detection on visualizations that use two or more dimension tables; in other words, the more dimension tables involved, the harder the outlier detection analysis becomes. Keywords — outlier detection, OLAP visualization, data warehouse
Barkhoudarian, Sarkis; Kittinger, Scott
2006-01-01
Optical spectrometry can provide a means of characterizing rocket engine exhaust plume impurities due to eroded materials, as well as the combustion mixture ratio, without any interference with the plume. Fiberoptic probes and cables were designed, fabricated and installed on Space Shuttle Main Engines (SSME), allowing the plume spectra to be monitored in real time with a commercial off-the-shelf (COTS) fiberoptic spectrometer located in a test-stand control room. The probes and cables survived the harsh engine environment over numerous hot-fire tests. When the plume was seeded with a nickel alloy powder, the spectrometer successfully detected all the metallic and OH radical spectra from 300 to 800 nanometers.
Extension of EMA to address regional skew and low outliers
Griffis, V.W.; Stedinger, J.R.; Cohn, T.A.; Bizier, P.; DeBarry, P.
2003-01-01
The recently developed expected moments algorithm [EMA] (Cohn et al. 1997) does as well as MLEs at estimating LP3 flood quantiles using systematic and historical information. Needed extensions include use of a regional skewness estimator and its precision to be consistent with Bulletin 17B and to make use of such hydrologic information. Another issue addressed by Bulletin 17B is the treatment of low outliers. A Monte Carlo study illustrates the performance of an extended EMA estimator compared to estimators that employ the complete data set with and without use of regional skew, conditional probability adjustment from Bulletin 17B, and an estimator that uses probability plot regression to compute substitute values for low outliers. Estimators that use a regional skew all do better than estimators that fail to use an informative regional skewness estimator. For LP3 data, the low outlier rejection procedure results in no loss of overall accuracy, and the differences between the MSEs of the estimators that used an informative regional skew were generally negligible in the skew range of real interest.
Rank regression: an alternative regression approach for data with outliers.
Chen, Tian; Tang, Wan; Lu, Ying; Tu, Xin
2014-10-01
Linear regression models are widely used in mental health and related health services research. However, the classic linear regression analysis assumes that the data are normally distributed, an assumption that is not met by the data obtained in many studies. One method of dealing with this problem is to use semi-parametric models, which do not require that the data be normally distributed. But semi-parametric models are quite sensitive to outlying observations, so the generated estimates are unreliable when study data includes outliers. In this situation, some researchers trim the extreme values prior to conducting the analysis, but the ad-hoc rules used for data trimming are based on subjective criteria so different methods of adjustment can yield different results. Rank regression provides a more objective approach to dealing with non-normal data that includes outliers. This paper uses simulated and real data to illustrate this useful regression approach for dealing with outliers and compares it to the results generated using classical regression models and semi-parametric regression models.
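The rank regression estimator described above is not reproduced here; as a hedged illustration of a closely related rank-based robust idea, the Theil-Sen estimator (the median of all pairwise slopes) shows how rank/median operations neutralize outliers that would distort an ordinary least-squares fit. This is a deliberate stand-in, not the paper's estimator:

```python
import numpy as np

def theil_sen(x, y):
    """Median-of-pairwise-slopes fit (intercept, slope).

    A rank/median-based robust estimator in the same spirit as the rank
    regression discussed above, shown here as an illustrative alternative,
    not the paper's exact method.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)          # all index pairs i < j
    slopes = (y[j] - y[i]) / (x[j] - x[i])
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)         # median residual intercept
    return intercept, slope
```

Because the median of the pairwise slopes ignores the minority of pairs that involve an outlier, a single corrupted point leaves the fit essentially unchanged.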
Luyan Zhang; Huihui Li; Jiankang Wang
2012-01-01
Epistasis is a commonly observed genetic phenomenon and an important source of variation in complex traits, which can maintain additive variance and therefore assure long-term genetic gain in breeding. Inclusive composite interval mapping (ICIM) is able to identify epistatic quantitative trait loci (QTLs) whether or not the two interacting QTLs have any additive effects. In this article, we conducted a simulation study to evaluate the detection power and false discovery rate (FDR) of ICIM epistatic mapping, considering F2 and doubled haploid (DH) populations, different F2 segregation ratios and population sizes. Results indicated that estimates of QTL locations and effects were unbiased, and that the detection power of epistatic mapping was largely affected by population size, heritability of epistasis, and the amount and distribution of genetic effects. When the same logarithm of odds (LOD) threshold was used, QTL detection power was higher in the F2 population than in the DH population; meanwhile, the FDR in F2 was also higher than that in DH. Increasing the marker density from 10 cM to 5 cM led to similar detection power but a higher FDR. In simulated populations, ICIM achieved better mapping results than multiple interval mapping (MIM) in the estimation of QTL positions and effects. Finally, we give the epistatic mapping results of ICIM in one actual rice (Oryza sativa L.) population.
Makkulau Makkulau
2010-01-01
Full Text Available There are several problems in industrial processes, for example problems associated with product quality. In statistics, an observation that differs significantly from the average is called an outlier. Outliers can significantly influence the results of modeling, which can in turn affect decision making. This research develops an outlier detection method based on the Likelihood Displacement Statistic, called the Likelihood Displacement Statistic-Lagrange (LDL) method. The LDL method is applied to sugar and molasses production data from the Djombang Baru Sugar Factory, Jombang, East Java. The results show that the factors influencing sugar and molasses production are sugar cane with less than 5% dirt, sugar cane with 5-7% dirt, sugar cane with more than 7% dirt, and imbibition water
Keshtkaran, Mohammad Reza; Yang, Zhi
2017-06-01
Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies that rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques used for spike sorting give a projection subspace which is not necessarily the most discriminative one; clusters that appear inherently separable in some discriminative subspace may therefore overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of a larger number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.
Pattisahusiwa, Asis [Bandung Institute of Technology (Indonesia); Liong, The Houw; Purqon, Acep [Earth physics and complex systems research group, Bandung Institute of Technology (Indonesia)
2015-09-30
Seismo-ionospheric research studies ionosphere disturbances associated with seismic activity. Many previous studies have shown that heliogeomagnetic activity or strong earthquakes can cause disturbances in the ionosphere; however, it is difficult to attribute these disturbances to their sources. In this research, we propose a method to separate these disturbances/outliers by using nu-SVR with worldwide GPS data. TEC data related to the 26 December 2004 Sumatra and the 11 March 2011 Honshu earthquakes were analyzed. After analyzing TEC data at several locations around the earthquake epicenters and comparing them with geomagnetic data, the method shows good results on average in detecting the sources of these outliers. The method is promising for use in future research.
Sun, Xiaoli; Abshire, James B.
2011-01-01
Integrated path differential absorption (IPDA) lidar can be used to remotely measure the column density of gases in the path to a scattering target [1]. The total column gas molecular density can be derived from the ratio of the laser echo signal power with the laser wavelength on the gas absorption line (on-line) to that off the line (off-line). Both coherent detection and direct detection IPDA lidar have been used successfully in the past in horizontal path and airborne remote sensing measurements. However, for space based measurements, the signal propagation losses are often orders of magnitude higher, and it is important to use the most efficient laser modulation and detection technique to minimize the average laser power and the electrical power drawn from the spacecraft. This paper analyzes the receiver signal-to-noise ratio (SNR) of several laser modulation and detection techniques versus the average received laser power under similar operating environments. Coherent detection [2] can give the best receiver performance when the local oscillator laser is relatively strong and the heterodyne mixing losses are negligible. Coherent detection has a high signal gain and a very narrow bandwidth for the background light and detector dark noise. However, coherent detection must maintain a high degree of coherence between the local oscillator laser and the received signal in both temporal and spatial modes. This often results in high system complexity and low overall measurement efficiency. For measurements through the atmosphere, the coherence diameter of the received signal also limits the useful size of the receiver telescope. Direct detection IPDA lidars are simpler to build and have fewer constraints on the transmitter and receiver components. They can use much larger 'photon-bucket' type telescopes to reduce the demands on the laser transmitter. Here we consider the two most widely used direct detection IPDA lidar techniques. The first technique uses two CW
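The on/off-line ratio principle stated above leads, in the standard textbook form, to a one-line retrieval: for a round-trip path, the column number density is N = ln(P_off / P_on) / (2 Δσ), where Δσ is the on-minus-off absorption cross-section. The abstract states only the ratio principle, so the factor of 2 (two-way path) and the exact form below are the generic IPDA relation, not necessarily the authors' retrieval:

```python
import numpy as np

def ipda_column_density(p_on, p_off, dsigma_m2):
    """Column number density (molecules / m^2) from on-line and off-line
    echo powers, using the standard two-way IPDA relation

        N = ln(P_off / P_on) / (2 * delta_sigma)

    where delta_sigma is the on-minus-off absorption cross-section (m^2).
    The round-trip factor 2 is the textbook assumption, not necessarily
    the exact retrieval used by the instrument described above.
    """
    return np.log(p_off / p_on) / (2.0 * dsigma_m2)
```

A round-trip consistency check: attenuating the on-line power by exp(-2 Δσ N) and inverting recovers N.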
Shi, Yonggang; Lai, Rongjie
2013-01-01
In this paper we present a novel system for the automated reconstruction of cortical surfaces from T1-weighted magnetic resonance images. At the core of our system is a unified Reeb analysis framework for the detection and removal of geometric and topological outliers on tissue boundaries. Using intrinsic Reeb analysis, our system can pinpoint the location of spurious branches and topological outliers, and correct them with localized filtering using information from both image intensity distributions and geometric regularity. In this system, we have also developed enhanced tissue classification with Hessian features for improved robustness to image inhomogeneity, and adaptive interpolation to achieve sub-voxel accuracy in reconstructed surfaces. By integrating these novel developments, we have a system that can automatically reconstruct cortical surfaces with improved quality and dramatically reduced computational cost as compared with the popular FreeSurfer software. In our experiments, we demonstrate the robustness of our method in large-scale studies using 40 simulated MR images and the MR images of 200 subjects from two databases: the Alzheimer's Disease Neuroimaging Initiative (ADNI) and the International Consortium for Brain Mapping (ICBM). In comparisons with FreeSurfer, we show that our system is able to generate surfaces that better represent cortical anatomy and produce thickness features with higher statistical power in population studies. PMID:23086519
Kfits: A software framework for fitting and cleaning outliers in kinetic measurements.
Rimon, Oded; Reichmann, Dana
2017-09-14
Kinetic measurements have played an important role in elucidating biochemical and biophysical phenomena for over a century. While many tools for analysing kinetic measurements exist, most require low noise levels in the data, leaving outlier measurements to be cleaned manually. This is particularly true for protein misfolding and aggregation processes, which are extremely noisy and hence difficult to model. Understanding these processes is paramount, as they are associated with diverse physiological processes and disorders, most notably neurodegenerative diseases. Therefore, a better tool for analysing and cleaning protein aggregation traces is required. Here we introduce Kfits, an intuitive graphical tool for detecting and removing noise caused by outliers in protein aggregation kinetics data. Following its workflow allows the user to quickly and easily clean large quantities of data and receive kinetic parameters for assessment of the results. With minor adjustments, the software can be applied to any type of kinetic measurement, not restricted to protein aggregation. Kfits is implemented in Python and available online at http://kfits.reichmannlab.com, in source at https://github.com/odedrim/kfits/, or by direct installation from PyPI (`pip install kfits`). oded.rimon@mail.huji.ac.il or danare@mail.huji.ac.il. Supplementary data are available at Bioinformatics online.
Wang, Xiao; Cao, Xiaochun; Jin, Di; Cao, Yixin; He, Dongxiao
2016-03-01
Owing to its crucial importance in the study of large-scale networks, many researchers have devoted themselves to the detection of communities in various networks. It is now widely agreed that communities usually overlap with each other. In some communities there exist members that play a special role as hubs (also known as leaders), whose importance merits special attention. Moreover, it is also observed that some members of the network do not belong to any community in a convincing way, and hence are recognized as outliers. Failure to detect and exclude outliers will distort, sometimes significantly, the outcome of community detection. In short, it is preferable for a community detection method to detect all three structures altogether. This becomes even more interesting, and also more challenging, under the unsupervised assumption, that is, when we do not assume prior knowledge of the number K of communities. Our approach is to define a novel generative model and formalize the detection of overlapping communities, hubs, and outliers as an optimization problem on it. When K is given, we propose a normalized symmetric nonnegative matrix factorization algorithm based on Kullback-Leibler (KL) divergence to learn the parameters of the model. Otherwise, by combining KL divergence with a prior model on the parameters, we introduce another parameter learning method based on Bayesian symmetric nonnegative matrix factorization to learn the parameters while determining K. We therefore present a community detection method, arguably in the most general sense, that detects all three structures altogether without prior knowledge of the number of communities. Finally, we test the proposed method on various real-world networks. The experimental results, compared against several state-of-the-art algorithms, indicate its superior performance in terms of both clustering accuracy and community quality.
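As a concrete, deliberately simplified illustration of the symmetric NMF machinery mentioned above, the sketch below factors a symmetric nonnegative matrix A ≈ WWᵀ with Frobenius-norm multiplicative updates; the paper's actual algorithms minimize KL divergence and add normalization and Bayesian priors, which are omitted here:

```python
import random

def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def symmetric_nmf(A, k, iters=300, beta=0.5):
    """Factor a symmetric nonnegative A ~ W W^T using the damped
    multiplicative update W <- W * (1 - beta + beta * AW / (W W^T W))."""
    random.seed(0)  # fixed seed for reproducibility
    n = len(A)
    W = [[random.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        AW = matmul(A, W)
        WWtW = matmul(matmul(W, [list(r) for r in zip(*W)]), W)
        W = [[W[i][j] * (1 - beta + beta * AW[i][j] / max(WWtW[i][j], 1e-12))
              for j in range(k)] for i in range(n)]
    return W

# Toy adjacency matrix with two obvious communities {0, 1} and {2, 3}
A = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
W = symmetric_nmf(A, k=2)
```

Each row of W gives a node's nonnegative membership weights over the k communities; in the spirit of the paper's model, uniformly small weights would mark outliers and large weights mark hubs.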
Influence of outliers on accuracy estimation in genomic prediction in plant breeding.
Estaghvirou, Sidi Boubacar Ould; Ogutu, Joseph O; Piepho, Hans-Peter
2014-10-01
Outliers often pose problems in analyses of data in plant breeding, but their influence on the performance of methods for estimating predictive accuracy in genomic prediction studies has not yet been evaluated. Here, we evaluate the influence of outliers on the performance of methods for accuracy estimation in genomic prediction studies using simulation. We simulated 1000 datasets for each of 10 scenarios to evaluate the influence of outliers on the performance of seven methods for estimating accuracy. These scenarios are defined by the number of genotypes, marker effect variance, and magnitude of outliers. To mimic outliers, we added to one observation in each simulated dataset, in turn, 5-, 8-, and 10-times the error SD used to simulate small and large phenotypic datasets. The effect of outliers on accuracy estimation was evaluated by comparing deviations in the estimated and true accuracies for datasets with and without outliers. Outliers adversely influenced accuracy estimation, more so at small values of genetic variance or number of genotypes. A method for estimating heritability and predictive accuracy in plant breeding and another used to estimate accuracy in animal breeding were the most accurate and resistant to outliers across all scenarios and are therefore preferable for accuracy estimation in genomic prediction studies. The performances of the other five methods that use cross-validation were less consistent and varied widely across scenarios. The computing time for the methods increased as the size of outliers and sample size increased and the genetic variance decreased. Copyright © 2014 Ould Estaghvirou et al.
Taravat, Alireza; Del Frate, Fabio
2012-12-01
Accurate knowledge of the spatial extent and distribution of an oil spill is very important for an efficient response. This is because most petroleum products spread rapidly on the water surface when released into the ocean, with the majority of the affected area becoming covered by very thin sheets. This article presents a study examining the feasibility of using Landsat ETM+ images to detect oil spill pollution. Landsat ETM+ images for 1, 10, and 17 May 2010 were used to study the oil spill in the Gulf of Mexico. An attempt has been made to perform ratio operations to enhance the feature. The study concluded that the band difference between 660 and 560 nm, the ratio of 660 to 560 nm, and the ratio of 825 to 560 nm normalized by 480 nm provide the best results. A multilayer perceptron neural network classifier is used to perform a pixel-based supervised classification. The results indicate the potential of Landsat ETM+ data for oil spill detection, and the promising results achieved encourage further analysis of the potential of the optical oil spill detection approach.
Cao, Zhen; Wang, Yong; Li, Zhanguo; Yu, Naisen
2016-12-01
Snowflake-like ZnO structures originating from self-assembled nanowires were prepared by a low-temperature aqueous solution method. The as-grown hierarchical ZnO structures were investigated by X-ray diffraction (XRD) and field-emission scanning electron microscopy (FESEM). The results showed that the snowflake-like ZnO structures were composed of high-aspect-ratio nanowires. Furthermore, gas-sensing responses to various test gases at 10 and 50 ppm were measured, confirming that the ZnO structures show good selectivity and response to acetone and could serve as a sensor for detecting low concentrations of acetone.
Francisco Javier Moreno Arboleda
2014-01-01
Diverse movement patterns may be identified when we study a set of moving entities. One of these patterns is known as a V-formation, for it is shaped like the letter V. Informally, a set of entities shows a V-formation if the entities are located on one of its two characteristic lines. These lines meet at a position where there is a single entity considered the leader of the formation. Another movement pattern is known as a circular formation, for it is shaped like a circle. Informally, circular formations present a set of entities grouped around a center such that the distance from these entities to the center is less than a given threshold. In this paper we present a model to identify V-formations and circular formations with outliers. An outlier is an entity which is part of a formation but lies away from it. We also present a model to identify doughnut formations, which are an extension of circular formations. We present formal rules for our models and an algorithm for detecting outliers. The model was validated with NetLogo, a programming and modeling environment for the simulation of natural and social phenomena.
Zhang, Hua; Wang, Caixia; Wang, Kui; Xuan, Xiaopeng; Lv, Qingzhang; Jiang, Kai
2016-11-15
Ultratrace changes of glutathione (GSH) in the mitochondria of cancer cells can mildly and effectively induce apoptosis at an early stage. Thus, if these ultratrace GSH changes could be recognized and imaged, it would benefit fundamental research on cancer therapy. Many fluorescent probes for GSH have been reported, but probes with the ultrasensitivity and high selectivity needed for ratiometric imaging of ultratrace GSH changes in the mitochondria of cancer cells are scarce. Herein, based on the different reaction mechanisms of sulfonamide at different pH, a sulfonamide-based reactive ratiometric fluorescent probe (IQDC-M) is reported for recognizing and imaging ultratrace GSH changes in the mitochondria of cancer cells. The detection limit of IQDC-M for ultratrace GSH changes is as low as 2.02 nM, which is far less than 1.0‰ of the endogenous GSH in living cells. During the recognition process, IQDC-M emits distinct fluorescent signals at 520 nm and 592 nm, allowing it to report ultratrace GSH changes in a ratiometric mode. More importantly, the recognition occurs specifically in the mitochondria of cancer cells because of the appropriate water/oil amphipathy (log P) of IQDC-M. Together, these properties make it possible for IQDC-M to image and monitor ultratrace GSH changes in mitochondria during cancer cell apoptosis for the first time. Copyright © 2016 Elsevier B.V. All rights reserved.
Liu, Quan; Grant, Gerald; Li, Jianjun; Zhang, Yan; Hu, Fangyao; Li, Shuqin; Wilson, Christy; Chen, Kui; Bigner, Darell; Vo-Dinh, Tuan
2011-03-01
We report the development of a compact point-detection fluorescence spectroscopy system and two data analysis methods to quantify the intrinsic fluorescence redox ratio and diagnose brain cancer in an orthotopic brain tumor rat model. Our system employs one compact CW diode laser (407 nm) to excite two primary endogenous fluorophores, reduced nicotinamide adenine dinucleotide and flavin adenine dinucleotide. The spectra were first analyzed using a spectral filtering modulation method developed previously to derive the intrinsic fluorescence redox ratio, which has the advantages of insensitivity to optical coupling and rapid data acquisition and analysis. This method represents a convenient and rapid alternative for achieving intrinsic fluorescence-based redox measurements as compared to complicated model-based methods. It is worth noting that the method can also extract total hemoglobin concentration at the same time, but only if the emission path length of the fluorescence light, which depends on the illumination and collection geometry of the optical probe, is long enough that the effect of hemoglobin absorption on fluorescence intensity is significant. A multivariate method was then used to statistically classify normal tissues and tumors. While the first method offers quantitative tissue metabolism information, the second method provides high overall classification accuracy. The two methods provide complementary capabilities for understanding cancer development and noninvasively diagnosing brain cancer. The results of our study suggest that this portable system can potentially be used to demarcate the elusive boundary between a brain tumor and the surrounding normal tissue during surgical resection.
Duan, Jubao; Sanders, Alan R; Moy, Winton; Drigalenko, Eugene I; Brown, Eric C; Freda, Jessica; Leites, Catherine; Göring, Harald H H; Gejman, Pablo V
2015-08-15
We searched a gene expression dataset comprised of 634 schizophrenia (SZ) cases and 713 controls for expression outliers (i.e., extreme tails of the distribution of transcript expression values) with SZ cases overrepresented compared with controls. These outlier genes were enriched for brain expression and for genes known to be associated with neurodevelopmental disorders. SZ cases showed higher outlier burden (i.e., total outlier events per subject) than controls for genes within copy number variants (CNVs) associated with SZ or neurodevelopmental disorders. Outlier genes were enriched for CNVs and for rare putative regulatory variants, but this only explained a small proportion of the outlier subjects, highlighting the underlying presence of additional genetic and potentially, epigenetic mechanisms.
Identification and influence of spatio-temporal outliers in urban air quality measurements.
O'Leary, Brendan; Reiners, John J; Xu, Xiaohong; Lemke, Lawrence D
2016-12-15
Forty-eight potential outliers in air pollution measurements taken simultaneously in Detroit, Michigan, USA and Windsor, Ontario, Canada in 2008 and 2009 were identified using four independent methods: box plots, variogram clouds, difference maps, and the Local Moran's I statistic. These methods were subsequently used in combination to reduce and select a final set of 13 outliers for nitrogen dioxide (NO2), volatile organic compounds (VOCs), total benzene, toluene, ethyl benzene, and xylene (BTEX), and particulate matter in two size fractions (PM2.5 and PM10). The selected outliers were excluded from the measurement datasets and used to revise air pollution models. In addition, a set of temporally-scaled air pollution models was generated using time series measurements from community air quality monitors, with and without the selected outliers. The influence of outlier exclusion on associations with asthma exacerbation rates aggregated at a postal zone scale in both cities was evaluated. Results demonstrate that the inclusion or exclusion of outliers influences the strength of observed associations between intraurban air quality and asthma exacerbation in both cities. The box plot, variogram cloud, and difference map methods largely determined the final list of outliers, due to the high degree of conformity among their results. The Moran's I approach was not useful for outlier identification in the datasets studied. Removing outliers changed the spatial distribution of modeled concentration values and derivative exposure estimates averaged over postal zones. Overall, associations between air pollution and acute asthma exacerbation rates were weaker with outliers removed, but improved with the addition of temporal information. Decreases in statistically significant associations between air pollution and asthma resulted, in part, from smaller pollutant concentration ranges used for linear regression. Nevertheless, the practice of identifying outliers through
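Of the four methods, the box-plot approach is the simplest to reproduce; a minimal sketch of Tukey's fences (the k = 1.5 convention is the standard default, not necessarily the exact rule used in the study, and the example readings are hypothetical):

```python
def boxplot_outliers(values, k=1.5):
    """Return the values falling outside Tukey's fences
    [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot outlier rule."""
    xs = sorted(values)

    def quantile(q):
        # Linear interpolation between order statistics
        pos = q * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

# A hypothetical run of hourly NO2 readings with one spike
print(boxplot_outliers([10, 11, 12, 11, 10, 12, 11, 50]))  # -> [50]
```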
Canizares, Priscilla; Sopuerta, Carlos F
2012-01-01
[abridged] The detection of gravitational waves from extreme-mass-ratio inspiral (EMRI) binaries, comprising a stellar-mass compact object orbiting around a massive black hole, is one of the main targets for low-frequency gravitational-wave detectors in space, like the Laser Interferometer Space Antenna (LISA or eLISA/NGO). The long-duration gravitational waveforms emitted by such systems encode the structure of the strong-field region of the massive black hole, in which the inspiral occurs. The detection and analysis of EMRIs will therefore allow us to study the geometry of massive black holes and determine whether their nature is as predicted by General Relativity and even to test whether General Relativity is the correct theory to describe the dynamics of these systems. To achieve this, EMRI modeling in alternative theories of gravity is required to describe the generation of gravitational waves. In this paper, we explore to what extent EMRI observations with LISA or eLISA/NGO might be able to distinguish between G...
A. Coenen (Adriaan); M. Lubbers (Marisa); A. Kurata (Akira); A.K. Kono (Atsushi K.); A. Dedic (Admir); R.G. Chelu (Raluca Gabriela); M.L. Dijkshoorn (Marcel); Rossi, A. (Alexia); R.J.M. van Geuns (Robert Jan); K. Nieman (Koen)
2016-01-01
Objectives: To investigate the additional value of transmural perfusion ratio (TPR) in dynamic CT myocardial perfusion imaging for detection of haemodynamically significant coronary artery disease compared with fractional flow reserve (FFR). Methods: Subjects with suspected or known coro
Identifying outliers and implausible values in growth trajectory data.
Yang, Seungmi; Hutcheon, Jennifer A
2016-01-01
To illustrate how conditional growth percentiles can be adapted for use to systematically identify implausible measurements in growth trajectory data. The use of conditional growth percentiles as a tool to assess serial weight data was reviewed. The approach was applied to 86,427 weight measurements (kg) taken between birth and age 6.5 years in 8217 girls participating in the Promotion of Breastfeeding Intervention Trial in Belarus. A conditional mean and variance was calculated for each weight measurement, reflecting the expected weight at the current visit given the girl's previous weights. Measurements were flagged as outliers if they were more than 4 standard deviations (SD) above or below the expected (conditional) weight. The method identified 234 weight measurements (0.3%) from 216 girls as potential outliers. Review of these trajectories confirmed the implausibility of the flagged measurements, and that the approach identified observations that would not have been identified using a conventional cross-sectional approach (± 4 SD of the population mean) for identifying implausible values. Stata code to implement the approach is provided. Conditional growth percentiles can be used to systematically identify implausible values in growth trajectory data and may be particularly useful for large datasets where the high number of trajectories makes ad hoc approaches unfeasible. Copyright © 2016 Elsevier Inc. All rights reserved.
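Once the conditional mean and SD are in hand, the flagging rule described above reduces to a one-line check; a minimal sketch (the weights and SD in the example are hypothetical, not from the trial data):

```python
def flag_implausible(observed, expected, sd, z=4.0):
    """Flag a measurement more than z conditional SDs away from its
    conditional expectation (z = 4, as in the approach above)."""
    return abs(observed - expected) > z * sd

# Hypothetical example: a girl expected to weigh 20.0 kg given her
# previous weights, with a conditional SD of 1.5 kg
print(flag_implausible(12.0, 20.0, 1.5))  # -> True  (8 kg below; > 4 SD)
print(flag_implausible(21.0, 20.0, 1.5))  # -> False (within 4 SD)
```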
Deleting Outliers in Robust Regression with Mixed Integer Programming
Georgios Zioutas; Antonios Avramidis
2005-01-01
In robust regression we often have to decide how many unusual observations should be removed from the sample in order to obtain a better fit for the rest of the observations. Generally, we use the basic principle of LTS, which is to fit the majority of the data, identifying as outliers those points that cause the biggest damage to the robust fit. However, in the LTS regression method the choice of default values for a high breakdown point seriously affects the efficiency of the estimator. In the proposed approach we introduce a penalty cost for discarding an outlier; consequently, the best fit for the majority of the data is obtained by discarding only catastrophic observations. This penalty cost is based on robust design weights and a high-breakdown-point residual scale taken from the LTS estimator. The robust estimate is obtained by solving a convex quadratic mixed integer programming problem, in which the objective function minimizes the sum of the squared residuals and the penalties for discarded observations. The proposed mathematical programming formulation is suitable for small-sample data. Moreover, we conduct a simulation study to compare other robust estimators with our approach in terms of efficiency and robustness.
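The role of the penalty cost can be seen in a toy version of the discard decision: for a fixed fit, discarding observation i is worthwhile exactly when its squared residual exceeds the penalty. The sketch below illustrates only that decision rule, not the full quadratic mixed integer program (function name and numbers are illustrative):

```python
def penalized_trim(residuals, penalty):
    """For a fixed fit, keep observation i iff r_i**2 <= penalty:
    keeping it costs r_i**2, discarding it costs the flat penalty."""
    kept = [r for r in residuals if r * r <= penalty]
    discarded = [r for r in residuals if r * r > penalty]
    cost = sum(r * r for r in kept) + penalty * len(discarded)
    return kept, discarded, cost

# Two ordinary residuals and one catastrophic one
kept, discarded, cost = penalized_trim([0.5, -0.3, 8.0], penalty=4.0)
print(discarded)  # -> [8.0]
```

In the full method, this decision is encoded with binary variables and solved jointly with the regression coefficients, so the fit itself adapts as observations are discarded.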
Summary of prospective quantification of reimbursement recovery from inpatient acute care outliers.
Silberstein, Gerald S; Paulson, Albert S
2011-01-01
The purpose of this study is to identify and quantify inpatient acute care hospital cases that are eligible for additional financial reimbursement. Acute care hospitals are reimbursed by third-party payers on behalf of their patients. Reimbursement is a fixed amount dependent primarily upon the diagnostic related group (DRG) of the case and the service intensity weight of the individual hospital. This method is used by nearly all third-party payers. For a given case, reimbursement is fixed (all else being equal) until a certain threshold level of charges, the cost outlier threshold, is reached. Above this amount the hospital is partially reimbursed for additional charges above the cost outlier threshold. Hospital discharge information has been described as having an error rate of between 7 and 22 percent in attribution of basic case characteristics. It can be expected that there is a significant error rate in the attribution of charges as well. This could be due to miscategorization of the case, misapplication of charges, or other causes. Identification of likely cases eligible for additional reimbursement would alleviate financial pressure where hospitals would have to absorb high expenses for outlier cases. Determining predicted values for total charges for each case was accomplished by exploring associative relationships between charges and case-specific variables. These variables were clinical, demographic, and administrative. Year-by-year comparisons show that these relationships appear stable throughout the five-year period under study. Beta coefficients developed in Year 1 are applied to develop predictions for Year 3 cases. This was also done for year pairs 2 and 4, and 3 and 5. Based on the predicted and actual value of charges, recovery amounts were calculated for each case in the second year of the year pairs. The year gap is necessary to allow for collection and analysis of the data of the first year of each pair. The analysis was performed in two parts
Daria A Gaykalova
Head and Neck Squamous Cell Carcinoma (HNSCC) is the fifth most common cancer, annually affecting over half a million people worldwide. Presently, there are no accepted biomarkers for clinical detection and surveillance of HNSCC. In this work, a comprehensive genome-wide analysis of epigenetic alterations in primary HNSCC tumors was employed in conjunction with cancer-specific outlier statistics to define novel biomarker genes which are differentially methylated in HNSCC. The 37 identified biomarker candidates were top-scoring outlier genes with prominent differential methylation in tumors, but with no signal in normal tissues. These putative candidates were validated in independent HNSCC cohorts from our institution and TCGA (The Cancer Genome Atlas). Using the top candidates, ZNF14, ZNF160, and ZNF420, an assay was developed for detection of HNSCC in primary tissue and saliva samples with 100% specificity when compared to normal control samples. Given the high detection specificity, the analysis of ZNF DNA methylation in combination with other DNA methylation biomarkers may be useful in the clinical setting for HNSCC detection and surveillance, particularly in high-risk patients. Several additional candidates identified through this work can be further investigated toward future development of a multi-gene panel of biomarkers for the surveillance and detection of HNSCC.
Liu, Yan; Zumbo, Bruno D.
2007-01-01
The impact of outliers on Cronbach's coefficient [alpha] has not been documented in the psychometric or statistical literature. This is an important gap because coefficient [alpha] is the most widely used measurement statistic in all of the social, educational, and health sciences. The impact of outliers on coefficient [alpha] is investigated for…
A Geometric Analysis of Subspace Clustering with Outliers
Soltanolkotabi, Mahdi
2011-01-01
This paper considers the problem of clustering a collection of unlabeled data points assumed to lie near a union of lower-dimensional planes. As is common in computer vision or unsupervised learning applications, we do not know in advance how many subspaces there are, nor do we have any information about their dimensions. We develop a novel geometric analysis of an algorithm named sparse subspace clustering (SSC) [Elhamifar and Vidal, 2009], which significantly broadens the range of problems where it is provably effective. For instance, we show that SSC can recover multiple subspaces, each of dimension comparable to the ambient dimension. We also prove that SSC can correctly cluster data points even when the subspaces of interest intersect. Further, we develop an extension of SSC that succeeds when the data set is corrupted with possibly overwhelmingly many outliers. Underlying our analysis are clear geometric insights, which may bear on other sparse recovery problems. A numerical study complements our theoretica...
New models for describing outliers in meta-analysis.
Baker, Rose; Jackson, Dan
2016-09-01
An unobserved random effect is often used to describe the between-study variation that is apparent in meta-analysis datasets. A normally distributed random effect is conventionally used for this purpose. When outliers or other unusual estimates are included in the analysis, the use of alternative random effect distributions has previously been proposed. Instead of adopting the usual hierarchical approach to modelling between-study variation, and so directly modelling the study-specific true underlying effects, we propose two new marginal distributions for modelling heterogeneous datasets. These two distributions are suggested because numerical integration is not needed to evaluate the likelihood, which makes the computation required when fitting our models much more robust. The properties of the new distributions are described, and the methodology is exemplified by fitting models to four datasets. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.
Object classification and outliers analysis in the forthcoming Gaia mission
Ordóñez-Blanco, D.; Arcay, B.; Dafonte, C.; Manteiga, M.; Ulla, A.
2010-12-01
Astrophysics is evolving towards the rational optimization of costly observational material through the intelligent exploitation of large astronomical databases from both terrestrial telescopes and space mission archives. However, there has been relatively little advance in the development of the highly scalable data exploitation and analysis tools needed to generate the scientific returns from these large and expensively obtained datasets. Among the upcoming projects of astronomical instrumentation, Gaia is the next cornerstone ESA mission. The Gaia survey foresees the creation of a data archive and its future exploitation with automated or semi-automated analysis tools. This work reviews some of the developments being carried out by the Gaia Data Processing and Analysis Consortium for object classification and the analysis of outliers in the forthcoming mission.
The variance of length of stay and the optimal DRG outlier payments.
Felder, Stefan
2009-09-01
Prospective payment schemes in health care often include supply-side insurance for cost outliers. In hospital reimbursement, prospective payments for patient discharges, based on their classification into diagnosis related group (DRGs), are complemented by outlier payments for long stay patients. The outlier scheme fixes the length of stay (LOS) threshold, constraining the profit risk of the hospitals. In most DRG systems, this threshold increases with the standard deviation of the LOS distribution. The present paper addresses the adequacy of this DRG outlier threshold rule for risk-averse hospitals with preferences depending on the expected value and the variance of profits. It first shows that the optimal threshold solves the hospital's tradeoff between higher profit risk and lower premium loading payments. It then demonstrates for normally distributed truncated LOS that the optimal outlier threshold indeed decreases with an increase in the standard deviation.
A Geometrical-Statistical Approach to Outlier Removal for TDOA Measurements
Compagnoni, Marco; Pini, Alessia; Canclini, Antonio; Bestagini, Paolo; Antonacci, Fabio; Tubaro, Stefano; Sarti, Augusto
2017-08-01
The curse of outlier measurements in estimation problems is a well-known issue in a variety of fields. Therefore, outlier removal procedures, which enable the identification of spurious measurements within a set, have been developed for many different scenarios and applications. In this paper, we propose a statistically motivated outlier removal algorithm for time differences of arrival (TDOAs), or equivalently range differences (RDs), acquired at sensor arrays. The method exploits the TDOA-space formalism and requires knowledge only of the relative sensor positions. As the proposed method is completely independent of the application for which the measurements are used, it can be reliably employed to identify outliers within a set of TDOA/RD measurements in different fields (e.g., acoustic source localization, sensor synchronization, radar, remote sensing, etc.). The proposed outlier removal algorithm is validated by means of synthetic simulations and real experiments.
Statistical Outliers and Dragon-Kings as Bose-Condensed Droplets
Yukalov, V I
2012-01-01
A theory of exceptional extreme events, characterized by their abnormal sizes compared with the rest of the distribution, is presented. Such outliers, called "dragon-kings", have been reported in the distribution of financial drawdowns, city-size distributions (e.g., Paris in France and London in the UK), in material failure, epileptic seizure intensities, and other systems. Within our theory, the large outliers are interpreted as droplets of Bose-Einstein condensate: the appearance of outliers is a natural consequence of the occurrence of Bose-Einstein condensation controlled by the relative degree of attraction, or utility, of the largest entities. For large populations, Zipf's law is recovered (except for the dragon-king outliers). The theory thus provides a parsimonious description of the possible coexistence of a power law distribution of event sizes (Zipf's law) and dragon-king outliers.
Canizares, Priscilla; Gair, Jonathan R.; Sopuerta, Carlos F.
2012-08-01
The detection of gravitational waves from extreme-mass-ratio inspirals (EMRI) binaries, comprising a stellar-mass compact object orbiting around a massive black hole, is one of the main targets for low-frequency gravitational-wave detectors in space, like the Laser Interferometer Space Antenna (LISA) or evolved LISA/New Gravitational Observatory (eLISA/NGO). The long-duration gravitational-waveforms emitted by such systems encode the structure of the strong field region of the massive black hole, in which the inspiral occurs. The detection and analysis of EMRIs will therefore allow us to study the geometry of massive black holes and determine whether their nature is as predicted by general relativity and even to test whether general relativity is the correct theory to describe the dynamics of these systems. To achieve this, EMRI modeling in alternative theories of gravity is required to describe the generation of gravitational waves. However, up to now, only a restricted class of theories has been investigated. In this paper, we explore to what extent EMRI observations with a space-based gravitational-wave observatory like LISA or eLISA/NGO might be able to distinguish between general relativity and a particular modification of it, known as dynamical Chern-Simons modified gravity. Our analysis is based on a parameter estimation study which uses approximate gravitational waveforms obtained via a radiative-adiabatic method. In this framework, the trajectory of the stellar object is modeled as a sequence of geodesics in the spacetime of the modified-gravity massive black hole. The evolution between geodesics is determined by flux formulae based on general relativistic post-Newtonian and black hole perturbation theory computations. Once the trajectory of the stellar compact object has been obtained, the waveforms are computed using the standard multipole formulae for gravitational radiation applied to this trajectory. Our analysis is restricted to a five
B. Langford
2015-03-01
Full Text Available All eddy-covariance flux measurements are associated with random uncertainties which are a combination of sampling error due to natural variability in turbulence and sensor noise. The former is the principal error for systems where the signal-to-noise ratio of the analyser is high, as is usually the case when measuring fluxes of heat, CO2 or H2O. Where signal is limited, which is often the case for measurements of other trace gases and aerosols, instrument uncertainties dominate. Here we apply a consistent approach based on auto- and cross-covariance functions to quantify the total random flux error and the random error due to instrument noise separately. As with previous approaches, the random error quantification assumes that the time-lag between wind and concentration measurement is known. However, if combined with commonly used automated methods that identify the individual time-lag by looking for the maximum in the cross-covariance function of the two entities, analyser noise additionally leads to a systematic bias in the fluxes. Combining datasets from several analysers and using simulations, we show that the method of time-lag determination becomes increasingly important as the magnitude of the instrument error approaches that of the sampling error. The flux bias can be particularly significant for disjunct data, whereas using a prescribed time-lag eliminates these effects (provided the time-lag does not fluctuate unduly over time). We also demonstrate that when sampling at higher elevations, where low frequency turbulence dominates and covariance peaks are broader, both the probability and magnitude of bias are magnified. We show that the statistical significance of noisy flux data can be increased (the limit of detection can be decreased) by appropriate averaging of individual fluxes, but only if systematic biases are avoided by using a prescribed time-lag. Finally, we make recommendations for the analysis and reporting of data with
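The bias the abstract describes is easy to reproduce numerically: searching for the maximum of the cross-covariance selects noise peaks when the analyser is noisy, whereas evaluating the cross-covariance at a prescribed lag does not. A minimal sketch with synthetic signals (all parameters here are illustrative, not the authors' data):

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_lag = 20000, 5
w = rng.normal(size=n)                          # vertical-wind proxy
c = np.empty(n)
c[true_lag:] = 0.8 * w[:-true_lag]              # scalar responds to w with a delay
c[:true_lag] = 0.0
c += rng.normal(scale=2.0, size=n)              # heavy analyser noise (low-SNR channel)

def cross_cov(a, b, lag):
    """Covariance of a[t] with b[t + lag], lag >= 0."""
    a0, b0 = a - a.mean(), b - b.mean()
    return float(np.mean(a0[:len(a0) - lag] * b0[lag:]))

lags = range(0, 30)
best_lag = max(lags, key=lambda L: cross_cov(w, c, L))   # automated lag search
flux_prescribed = cross_cov(w, c, true_lag)              # prescribed-lag estimate

# A channel with ZERO true flux: searching for the covariance maximum still
# returns a positive "flux" on average -- the systematic bias discussed above.
noise_only = rng.normal(scale=2.0, size=n)
flux_noise_max = max(cross_cov(w, noise_only, L) for L in lags)
```

With a decent signal the lag search still finds the true lag, but on the noise-only channel the maximum over lags is positively biased even though the true covariance is zero.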
Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P
2014-06-26
To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
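A minimal numpy sketch of the "robust Poisson" idea (simulated data, not the study's design): fit a Poisson regression to a binary outcome by Fisher scoring, then use the sandwich variance, which remains valid even though the outcome is Bernoulli rather than Poisson.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x = rng.integers(0, 2, size=n).astype(float)     # binary exposure
p = np.where(x == 1, 0.30, 0.20)                 # true relative risk = 0.30/0.20 = 1.5
y = rng.binomial(1, p).astype(float)
X = np.column_stack([np.ones(n), x])

# Poisson regression (log link) fitted by Newton / Fisher scoring
beta = np.zeros(2)
for _ in range(50):
    mu = np.exp(X @ beta)
    H = X.T @ (X * mu[:, None])                  # X' W X with W = diag(mu)
    beta = beta + np.linalg.solve(H, X.T @ (y - mu))

rr_hat = float(np.exp(beta[1]))                  # estimated relative risk

# Robust (sandwich) variance for log(RR): H^-1 (X' diag((y-mu)^2) X) H^-1
mu = np.exp(X @ beta)
H = X.T @ (X * mu[:, None])
S = X.T @ (X * ((y - mu) ** 2)[:, None])
Hinv = np.linalg.inv(H)
se_log_rr = float(np.sqrt((Hinv @ S @ Hinv)[1, 1]))
```

The sandwich step is what makes the estimator "robust" in the Poisson-for-binary-data sense; the naive Poisson variance would be too large here because Bernoulli variance is smaller than Poisson variance at the same mean.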
Meijer, Marrigje F; Reininga, Inge H F; Boerboom, Alexander L; Bulstra, Sjoerd K; Stevens, Martin
2014-10-01
Computer-assisted surgery (CAS) has been developed to enhance prosthetic alignment during primary TKAs. Imageless CAS improves coronal and sagittal alignment compared with conventional TKA. However, the effect of imageless CAS on rotational alignment remains unclear. We conducted a systematic and qualitative review of the current literature regarding the effectiveness of imageless CAS during TKA on (1) rotational alignment of the femoral and tibial components and tibiofemoral mismatch in terms of deviation from neutral rotation, and (2) the number of femoral and tibial rotational outliers. Data sources included PubMed, MEDLINE, and EMBASE. Study selection, data extraction, and methodologic quality assessment were conducted independently by two reviewers. Standardized mean difference with 95% CI was calculated for continuous variables (rotational alignment of the femoral or tibial component and tibiofemoral mismatch). To compare the number of outliers for femoral and tibial component rotation, the odds ratio and 95% CI were calculated. The literature search produced 657 potentially relevant studies, 17 of which met the inclusion criteria. One study was considered as having high methodologic quality, 15 studies had medium, and one study had low quality. Conflicting evidence was found for all outcome measures except for tibiofemoral mismatch. Moderate evidence was found that imageless CAS had no influence on postoperative tibiofemoral mismatch. The measurement protocol for measuring tibial rotation varied among the studies and in only one of the studies was the sample size calculation based on one of the outcome measures used in our systematic review. More studies of high methodologic quality and with a sample size calculation based on the outcome measures will be helpful to assess whether an imageless CAS TKA improves femoral and tibial rotational alignment and tibiofemoral mismatch or decreases the number of femoral and tibial rotational outliers. To statistically
An optimized Leave One Out approach to efficiently identify outliers
Biagi, L.; Caldera, S.; Perego, D.
2012-04-01
Least squares (LS) are a well established and very popular statistical toolbox in geomatics. Particularly, LS are applied to routinely adjust geodetic networks in the cases both of classical surveys and of modern GNSS permanent networks, both at the local and at the global spatial scale. The linearized functional model between the observables and a vector of unknown parameters is given. A vector of N observations and its a priori covariance is available. Typically, the observations vector can be decomposed into n subvectors, internally correlated but reciprocally uncorrelated. This happens, for example, when double differences are built from undifferenced observations and are processed to estimate the network coordinates of a GNSS session. Note that when all the observations are independent, n=N: this is for example the case of the adjustment of a levelling network. LS provide the estimates of the parameters, the observables, the residuals and of the a posteriori variance. The testing of the initial hypotheses, the rejection of outliers and the estimation of accuracies and reliabilities can be performed at different levels of significance and power. However, LS are not robust. The a posteriori estimation of the variance can be biased by one unmodelled outlier in the observations. In some cases, the unmodelled bias is spread into all the residuals and its identification is difficult. A possible solution to this problem is given by the so called Leave One Out (LOO) approach. A particular subvector can be excluded from the adjustment, whose results are used to check the residuals of the excluded subvector. Clearly, the check is more robust, because a bias in the subvector does not affect the adjustment results. The process can be iterated on all the subvectors. LOO is robust but can be very slow, when n adjustments are performed. An optimized LOO algorithm has been studied. The usual LS adjustment on all the observations is performed to obtain a 'batch' result. The
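For the simplest case (independent observations, n = N) the LOO check can be sketched as: exclude observation i, adjust on the rest, and test i's residual against the outlier-free scale estimate. The data and the 4-sigma threshold below are illustrative, not the paper's optimized algorithm:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = np.linspace(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=n)
y[20] += 2.0                                     # inject one gross outlier

def loo_flags(X, y, k=4.0):
    """Flag observation i if its residual against the fit WITHOUT i is large."""
    n = len(y)
    flags = []
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        resid = y[keep] - X[keep] @ beta
        sigma = resid.std(ddof=X.shape[1])       # scale from the outlier-free fit
        if abs(y[i] - X[i] @ beta) > k * sigma:
            flags.append(i)
    return flags

outliers = loo_flags(X, y)
```

Because the suspect observation never contaminates the adjustment used to test it, the check is robust; the cost is n adjustments, which is exactly the inefficiency the optimized algorithm addresses.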
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference of linear regression model using ordinary least squares (OLS) estimators would be severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods which were reported to be less sensitive to the presence of outliers. In addition, ridge regression technique was employed to tackle multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when simultaneous presence of multicollinearity and multiple outliers occurred in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators; M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity and percentage of outliers used. However, when outliers occurred in only a single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, the robust ridge regression is the best alternative as compared to robust and conventional least squares estimators when dealing with simultaneous presence of multicollinearity and outliers.
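A WRM-style estimator can be sketched as iteratively reweighted ridge regression with Huber weights (an illustrative construction; the paper's exact WRM/WRMM definitions may differ). The simulated design below has two nearly collinear predictors and y-direction outliers:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
z = rng.normal(size=n)
x1 = z + 0.01 * rng.normal(size=n)               # nearly collinear predictors
x2 = z + 0.01 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)
y[:10] += 15.0                                   # gross outliers in the y-direction

def ridge_m(X, y, k=1.0, c=1.345, iters=30):
    """Huber M-estimation combined with a ridge penalty (robust ridge sketch)."""
    p = X.shape[1]
    beta = np.zeros(p)
    for _ in range(iters):
        r = y - X @ beta
        s = max(np.median(np.abs(r - np.median(r))) / 0.6745, 1e-12)  # MAD scale
        u = np.abs(r) / s
        w = np.where(u <= c, 1.0, c / u)         # Huber weights: downweight big residuals
        A = X.T @ (X * w[:, None]) + k * np.eye(p)
        beta = np.linalg.solve(A, X.T @ (w * y))
    return beta

beta_robust = ridge_m(X, y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

The ridge term stabilizes the nearly singular normal matrix, while the Huber weights keep the y-direction outliers from inflating the intercept the way they do for OLS.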
Michel, T. (E-mail: thilo.michel@physik.uni-erlangen.de); Anton, G.; Boehnel, M.; Durst, J.; Firsching, M.; Korn, A.; Kreisler, B.; Loehr, A.; Nachtrab, F.; Niederloehner, D.; Sukowski, F.; Takoukam Talla, P. [all: Physikalisches Institut, Universitaet Erlangen-Nuernberg, Erwin-Rommel-Strasse 1, 91058 Erlangen (Germany)]
2006-12-01
We outline in this paper that the noise of a photon counting pixel detector depends on the detection efficiency and the average multiplicity of counts per interacting photon. We give a simple expression for the signal-to-noise ratio (SNR) and zero-frequency detective quantum efficiency (DQE). We describe a method to determine the DQE from measured data and to optimize the DQE as a function of energy threshold.
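As a hedged sketch of the kind of expression described (the paper's exact formula may differ): for a counting detector with detection efficiency $\varepsilon$ and a random multiplicity $m$ of counts per interacting photon, a standard compound-Poisson argument gives

```latex
% N incident photons (Poisson distributed); on average \varepsilon N interact,
% each producing m counts with mean \bar{m} and second moment \langle m^2 \rangle:
%   E[C] = \varepsilon N \bar{m}, \quad Var[C] = \varepsilon N \langle m^2 \rangle
\mathrm{SNR}_{\mathrm{out}}^2
  = \frac{(\varepsilon N \bar{m})^2}{\varepsilon N \langle m^2 \rangle}
  = \varepsilon N \, \frac{\bar{m}^2}{\langle m^2 \rangle},
\qquad
\mathrm{DQE}(0)
  = \frac{\mathrm{SNR}_{\mathrm{out}}^2}{\mathrm{SNR}_{\mathrm{in}}^2}
  = \varepsilon \, \frac{\bar{m}^2}{\langle m^2 \rangle}.
```

This shows the dependence the abstract names: DQE grows with efficiency and is degraded by spread in the count multiplicity (for deterministic multiplicity, $\bar{m}^2 = \langle m^2 \rangle$ and $\mathrm{DQE}(0) = \varepsilon$).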
2013-01-01
Background Decades of research strongly suggest that the genetic etiology of autism spectrum disorders (ASDs) is heterogeneous. However, most published studies focus on group differences between cases and controls. In contrast, we hypothesized that the heterogeneity of the disorder could be characterized by identifying pathways for which individuals are outliers rather than pathways representative of shared group differences of the ASD diagnosis. Methods Two previously published blood gene expression data sets – the Translational Genetics Research Institute (TGen) dataset (70 cases and 60 unrelated controls) and the Simons Simplex Consortium (Simons) dataset (221 probands and 191 unaffected family members) – were analyzed. All individuals of each dataset were projected to biological pathways, and each sample’s Mahalanobis distance from a pooled centroid was calculated to compare the number of case and control outliers for each pathway. Results Analysis of a set of blood gene expression profiles from 70 ASD and 60 unrelated controls revealed three pathways whose outliers were significantly overrepresented in the ASD cases: neuron development including axonogenesis and neurite development (29% of ASD, 3% of control), nitric oxide signaling (29%, 3%), and skeletal development (27%, 3%). Overall, 50% of cases and 8% of controls were outliers in one of these three pathways, which could not be identified using group comparison or gene-level outlier methods. In an independently collected data set consisting of 221 ASD and 191 unaffected family members, outliers in the neurogenesis pathway were heavily biased towards cases (20.8% of ASD, 12.0% of control). Interestingly, neurogenesis outliers were more common among unaffected family members (Simons) than unrelated controls (TGen), but the statistical significance of this effect was marginal (Chi squared P outlier groups were distinct for each implicated pathway. Moreover, our results suggest that by seeking
Karrila, Seppo; Lee, Julian Hock Ean; Tucker-Kellogg, Greg
2011-04-18
A core component in translational cancer research is biomarker discovery using gene expression profiling for clinical tumors. This is often based on cell line experiments; one population is sampled for inference in another. We disclose a semisupervised workflow focusing on binary (switch-like, bimodal) informative genes that are likely cancer relevant, to mitigate this non-statistical problem. Outlier detection is a key enabling technology of the workflow, and aids in identifying the focus genes. We compare the outlier detection techniques MOST, LSOSS, COPA, ORT, OS, and the t-test, using a publicly available NSCLC dataset. Removing genes with a Gaussian distribution is computationally efficient and matches MOST particularly well, while COPA and OS also pick prognostically relevant genes in their top ranks. Our stability assessment also favours both MOST and COPA; the latter does not pair well with prefiltering for non-Gaussianity, but can handle data sets lacking non-cancer cases. We provide R code for replicating our approach or extending it.
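COPA, one of the compared techniques, has a particularly compact form and can be sketched as follows (a common formulation; the benchmarked implementations may differ in details such as the percentile used). A gene activated in only a small subgroup of samples scores high even though a group-mean test would miss it:

```python
import numpy as np

rng = np.random.default_rng(4)
genes, samples = 100, 40
expr = rng.normal(size=(genes, samples))
expr[7, :5] += 6.0            # gene 7: strong activation in a 5-sample outlier subgroup

def copa_scores(expr, q=90):
    """COPA: median-center each gene, scale by its MAD, take the q-th percentile."""
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True) * 1.4826
    z = (expr - med) / mad
    return np.percentile(z, q, axis=1)

scores = copa_scores(expr)
top_gene = int(np.argmax(scores))
```

Median and MAD are used instead of mean and standard deviation precisely so that the outlier subgroup itself does not mask its own signal.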
Effect of Dosimetric Outliers on the Performance of a Commercial Knowledge-Based Planning Solution.
Delaney, Alexander R; Tol, Jim P; Dahele, Max; Cuijpers, Johan; Slotman, Ben J; Verbakel, Wilko F A R
2016-03-01
RapidPlan, a commercial knowledge-based planning solution, uses a model library containing the geometry and associated dosimetry of existing plans. This model predicts achievable dosimetry for prospective patients that can be used to guide plan optimization. However, it is unknown how suboptimal model plans (outliers) influence the predictions or resulting plans. We investigated the effect of, first, removing outliers from the model (cleaning it) and subsequently adding deliberate dosimetric outliers. Clinical plans from 70 head and neck cancer patients comprised the uncleaned (UC) ModelUC, from which outliers were cleaned (C) to create ModelC. The last 5 to 40 patients of ModelC were replanned with no attempt to spare the salivary glands. These substantial dosimetric outliers were reintroduced to the model in increments of 5, creating Model5 to Model40 (Model5-40). These models were used to create plans for a 10-patient evaluation group. Plans from ModelUC and ModelC, and ModelC and Model5-40 were compared on the basis of boost (B) and elective (E) target volume homogeneity indexes (HIB/HIE) and mean doses to oral cavity, composite salivary glands (compsal) and swallowing (compswal) structures. On average, outlier removal (ModelC vs ModelUC) had minimal effects on HIB/HIE (0%-0.4%) and sparing of organs at risk (mean dose difference to oral cavity and compsal/compswal were ≤0.4 Gy). Model5-10 marginally improved compsal sparing, whereas adding a larger number of outliers (Model20-40) led to deteriorations in compsal up to 3.9 Gy, on average. These increases are modest compared to the 14.9 Gy dose increases in the added outlier plans, due to the placement of optimization objectives below the inferior boundary of the dose-volume histogram-predicted range. Overall, dosimetric outlier removal from or addition of 5 to 10 outliers to a 70-patient model had marginal effects on resulting plan quality. Although the addition of >20 outliers deteriorated plan quality, the
Outliers, inliers and the generalized least trimmed squares estimator in system identification
Erwei BAI
2003-01-01
The least trimmed squares estimator (LTS) is a well known robust estimator in terms of protecting the estimate from the outliers. Its high computational complexity is however a problem in practice. We show that the LTS estimate can be obtained by a simple algorithm with the complexity O(N ln N) for large N, where N is the number of measurements. We also show that though the LTS is robust in terms of the outliers, it is sensitive to the inliers. The concept of the inliers is introduced. Moreover, the Generalized Least Trimmed Squares estimator (GLTS) together with its solution are presented that reduce the effect of both the outliers and the inliers.
Robust Estimators for the Correlation Measure to Resist Outliers in Data
Juthaphorn Sinsomboonthong
2016-12-01
Full Text Available The objective of this research was to propose a composite correlation coefficient to estimate the rank correlation coefficient of two variables. A simulation study was conducted using 228 situations for a bivariate normal distribution to compare the robustness properties of the proposed rank correlation coefficient with three estimators, namely, Spearman’s rho, Kendall’s tau and Plantagenet’s correlation coefficients when the data were contaminated with outliers. In both cases of non-outliers and outliers in the data, it was found that the composite correlation coefficient seemed to be the most robust estimator for all sample sizes, whatever the level of the correlation coefficient.
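The proposed composite estimator is not specified in the abstract, but the rank-based baselines it is compared against are standard and easy to sketch in pure Python (small illustrative data; no tie handling beyond average ranks):

```python
def ranks(xs):
    """Average ranks (1-based), handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    return pearson(ranks(x), ranks(y))          # Pearson on the ranks

def kendall(x, y):
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):               # concordant minus discordant pairs
            a = (x[i] - x[j]) * (y[i] - y[j])
            s += (a > 0) - (a < 0)
    return 2 * s / (n * (n - 1))

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2 * v + 1 for v in x]
y_out = y[:-1] + [-100]                         # one gross outlier
```

On clean monotone data all three estimators equal 1; replacing a single point with a gross outlier flips Pearson's sign while the rank-based measures degrade gracefully, which is the robustness property the simulation study quantifies.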
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R[superscript 2] and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
Vikas Bansal
Full Text Available Copy number variations (CNVs are one of the main sources of variability in the human genome. Many CNVs are associated with various diseases including cardiovascular disease. In addition to hybridization-based methods, next-generation sequencing (NGS technologies are increasingly used for CNV discovery. However, respective computational methods applicable to NGS data are still limited. We developed a novel CNV calling method based on outlier detection applicable to small cohorts, which is of particular interest for the discovery of individual CNVs within families, de novo CNVs in trios and/or small cohorts of specific phenotypes like rare diseases. Approximately 7,000 rare diseases are currently known, which collectively affect ∼6% of the population. For our method, we applied the Dixon's Q test to detect outliers and used a Hidden Markov Model for their assessment. The method can be used for data obtained by exome and targeted resequencing. We evaluated our outlier-based method in comparison to the CNV calling tool CoNIFER using eight HapMap exome samples and subsequently applied both methods to targeted resequencing data of patients with Tetralogy of Fallot (TOF, the most common cyanotic congenital heart disease. In both the HapMap samples and the TOF cases, our method is superior to CoNIFER, such that it identifies more true positive CNVs. Called CNVs in TOF cases were validated by qPCR and HapMap CNVs were confirmed with available array-CGH data. In the TOF patients, we found four copy number gains affecting three genes, of which two are important regulators of heart development (NOTCH1, ISL1 and one is located in a region associated with cardiac malformations (PRODH at 22q11. In summary, we present a novel CNV calling method based on outlier detection, which will be of particular interest for the analysis of de novo or individual CNVs in trios or cohorts up to 30 individuals, respectively.
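Dixon's Q test, the outlier detector at the core of this method (and of the range-ratio statistics in the Dixon 1951 reference above), is simple to sketch. The critical values below are the commonly tabulated two-sided 95% values for the r10 ratio; verify them against an authoritative table before real use:

```python
# Commonly tabulated two-sided critical values for Dixon's r10 ratio, alpha = 0.05
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568,
             8: 0.526, 9: 0.493, 10: 0.466}

def dixon_q(data):
    """Return (Q, suspect_value) for the most extreme point (Dixon's r10 statistic)."""
    xs = sorted(data)
    spread = xs[-1] - xs[0]
    q_low = (xs[1] - xs[0]) / spread    # is the smallest value an outlier?
    q_high = (xs[-1] - xs[-2]) / spread # is the largest value an outlier?
    return (q_high, xs[-1]) if q_high >= q_low else (q_low, xs[0])

data = [2.1, 2.2, 2.2, 2.3, 2.4, 5.0]
q, suspect = dixon_q(data)
is_outlier = q > Q_CRIT_95[len(data)]
```

Here Q = (5.0 - 2.4) / (5.0 - 2.1) ≈ 0.897, well above the n = 6 critical value of 0.625, so the point 5.0 is flagged. The test is designed for small samples, which is why it suits cohorts of a few dozen individuals.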
Krishnan, S.; Kerkhoff, H.G.
2013-01-01
Stringent quality requirements on final electronic products are continuously forcing semiconductor industries, especially the automobile industry, to insert additional reliability tests in their production flow. The problem with linear regression models for outlier identification in analog and RF
The effect of phenotypic outliers and non-normality on rare-variant association testing.
Auer, Paul L; Reiner, Alex P; Leal, Suzanne M
2016-08-01
Rare-variant association studies (RVAS) have made important contributions to human complex trait genetics. These studies rely on specialized statistical methods for analyzing rare-variant associations, both individually and in aggregate. We investigated the impact that phenotypic outliers and non-normality have on the performance of rare-variant association testing procedures. Ignoring outliers or non-normality can significantly inflate Type I error rates. We found that rank-based inverse normal transformation (INT) and trait winsorisation were both effective at maintaining Type I error control without sacrificing power in the presence of outliers. INT was the optimal method for non-normally distributed traits. For RVAS of quantitative traits with outliers or non-normality, we recommend using INT to transform phenotypic values before association testing.
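The rank-based INT the study recommends can be sketched in a few lines (the Blom offset of 0.375 is a common choice; other offsets exist, and ties would need average ranks, which this sketch omits):

```python
from statistics import NormalDist

def inverse_normal_transform(values, offset=0.375):
    """Rank-based INT: z_i = Phi^-1((r_i - c) / (n + 1 - 2c)) with Blom offset c."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    nd = NormalDist()
    z = [0.0] * n
    for rank0, idx in enumerate(order):
        r = rank0 + 1                    # 1-based rank (no tie handling here)
        z[idx] = nd.inv_cdf((r - offset) / (n + 1 - 2 * offset))
    return z

skewed = [0.1, 0.2, 0.3, 0.5, 1.0, 2.5, 7.0, 40.0, 1000.0]
z = inverse_normal_transform(skewed)
```

The transform preserves the ordering of the phenotype but maps it onto normal quantiles, so a single extreme value (1000.0 above) ends up at a modest z-score instead of dominating the test statistic.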
Eigenvalue Outliers of Non-Hermitian Random Matrices with a Local Tree Structure.
Neri, Izaak; Metz, Fernando Lucas
2016-11-25
Spectra of sparse non-Hermitian random matrices determine the dynamics of complex processes on graphs. Eigenvalue outliers in the spectrum are of particular interest, since they determine the stationary state and the stability of dynamical processes. We present a general and exact theory for the eigenvalue outliers of random matrices with a local tree structure. For adjacency and Laplacian matrices of oriented random graphs, we derive analytical expressions for the eigenvalue outliers, the first moments of the distribution of eigenvector elements associated with an outlier, the support of the spectral density, and the spectral gap. We show that these spectral observables obey universal expressions, which hold for a broad class of oriented random matrices.
Taming outliers in pulsar-timing datasets with hierarchical likelihoods and Hamiltonian sampling
Vallisneri, Michele
2016-01-01
Pulsar-timing datasets have been analyzed with great success using probabilistic treatments based on Gaussian distributions, with applications ranging from studies of neutron-star structure to tests of general relativity and searches for nanosecond gravitational waves. As for other applications of Gaussian distributions, outliers in timing measurements pose a significant challenge to statistical inference, since they can bias the estimation of timing and noise parameters, and affect reported parameter uncertainties. We describe and demonstrate a practical end-to-end approach to perform Bayesian inference of timing and noise parameters robustly in the presence of outliers, and to identify these probabilistically. The method is fully consistent (i.e., outlier-ness probabilities vary in tune with the posterior distributions of the timing and noise parameters), and it relies on the efficient sampling of the hierarchical form of the pulsar-timing likelihood. Such sampling has recently become possible with a "no-U-...
An application of robust ridge regression model in the presence of outliers to real data problem
Shariff, N. S. Md.; Ferdaos, N. A.
2017-09-01
Multicollinearity and outliers often lead to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is believed to be affected by the presence of outliers. The combination of GM-estimation and a ridge parameter that is robust towards both problems is of interest in this study. As such, both techniques are employed to investigate the relationship between stock market price and macroeconomic variables in Malaysia, since the data set is suspected to involve both multicollinearity and outlier problems. There are four macroeconomic factors selected for this study, which are the Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results in the presence of multicollinearity and outliers in the real data.
A method to account for outliers in the development of safety performance functions.
El-Basyouny, Karim; Sayed, Tarek
2010-07-01
Accident data sets can include some unusual data points that are not typical of the rest of the data. The presence of these data points (usually termed outliers) can have a significant impact on the estimates of the parameters of safety performance functions (SPFs). Few studies have considered outliers analysis in the development of SPFs. In these studies, the practice has been to identify and then exclude outliers from further analysis. This paper introduces alternative mixture models based on the multivariate Poisson lognormal (MVPLN) regression. The proposed approach presents outlier resistance modeling techniques that provide robust safety inferences by down-weighting the outlying observations rather than rejecting them. The first proposed model is a scale-mixture model that is obtained by replacing the normal distribution in the Poisson-lognormal hierarchy by the Student t distribution, which has heavier tails. The second model is a two-component mixture (contaminated normal model) where it is assumed that most of the observations come from a basic distribution, whereas the remaining few outliers arise from an alternative distribution that has a larger variance. The results indicate that the estimates of the extra-Poisson variation parameters were considerably smaller under the mixture models leading to higher precision. Also, both mixture models have identified the same set of outliers. In terms of goodness-of-fit, both mixture models have outperformed the MVPLN. The outlier rejecting MVPLN model provided a superior fit in terms of a much smaller DIC and standard deviations for the parameter estimates. However, this approach tends to underestimate uncertainty by producing too small standard deviations for the parameter estimates, which may lead to incorrect conclusions. It is recommended that the proposed outlier resistance modeling techniques be used unless the exclusion of the outlying observations can be justified because of data related reasons (e
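The contaminated-mixture idea behind the second model can be sketched at the level of a single residual (illustrative parameters and a uniform contamination component, not the paper's multivariate Poisson-lognormal hierarchy): each observation's posterior probability of belonging to the wide component is exactly the soft down-weight that replaces hard rejection.

```python
import math

def outlier_posterior(residuals, sigma=1.0, theta=0.05, spread=100.0):
    """Posterior P(outlier | r) under (1-theta)*N(0, sigma^2) + theta*U(-spread, spread).

    Assumes all residuals lie inside the uniform support [-spread, spread]."""
    out = []
    for r in residuals:
        p_good = (1 - theta) * math.exp(-0.5 * (r / sigma) ** 2) \
                 / (sigma * math.sqrt(2 * math.pi))
        p_bad = theta / (2 * spread)          # flat density of the contamination
        out.append(p_bad / (p_good + p_bad))
    return out

post = outlier_posterior([0.1, -0.5, 1.2, 8.0])
```

Typical residuals receive near-zero outlier probability and full weight in the fit, while the gross residual (8 sigma) is assigned to the contamination component with probability close to one, so it barely influences the parameter estimates.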
Outlier rejection fuzzy c-means (ORFCM) algorithm for image segmentation
2013-01-01
This paper presents a fuzzy clustering-based technique for image segmentation. Many attempts have been put into practice to increase the conventional fuzzy c-means (FCM) performance. In this paper, the sensitivity of the soft membership function of the FCM algorithm to the outlier is considered and the new exponent operator on the Euclidean distance is implemented in the membership function to improve the outlier rejection characteristics of the FCM. The comparative quantitative and qua...
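The sensitivity the paper targets is visible in the standard FCM membership update itself; a sketch (synthetic data, deterministic initialization; the ORFCM modification would change how the distance enters u, which is marked below):

```python
import numpy as np

rng = np.random.default_rng(5)
pts = np.vstack([rng.normal([0.0, 0.0], 0.3, (50, 2)),
                 rng.normal([4.0, 4.0], 0.3, (50, 2)),
                 [[50.0, 50.0]]])               # one far-away outlier

def fcm(pts, k=2, m=2.0, iters=50):
    """Standard FCM; ORFCM would alter how d enters the membership u."""
    centers = pts[[0, 50]].astype(float)        # deterministic init, one per blob
    for _ in range(iters):
        # d[i, j] = distance of point i to center j
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u[i, j] = 1 / sum_l (d_ij / d_il)^(2/(m-1))  -- soft memberships
        u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        um = u ** m
        centers = (um.T @ pts) / um.T.sum(axis=1, keepdims=True)
    return u, centers

u, centers = fcm(pts)
```

Memberships sum to one for every point, so the distant outlier is forced to split its membership roughly evenly between the two clusters and drags both centers toward it; damping the distance term in u is what gives ORFCM its outlier rejection.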
Jiang, Yang; Gong, Yuanzheng; Wang, Thomas D.; Seibel, Eric J.
2017-02-01
Multimodal endoscopy, with fluorescence-labeled probes binding to overexpressed molecular targets, is a promising technology to visualize early-stage cancer. The T/B ratio is the quantitative measure used to correlate fluorescence regions with cancer. Currently, T/B ratio calculation is done in post-processing and does not provide real-time feedback to the endoscopist. To achieve real-time computer-assisted diagnosis (CAD), we establish image processing protocols for calculating the T/B ratio and locating high-risk fluorescence regions for guiding biopsy and therapy in Barrett's esophagus (BE) patients. Methods: The Chan-Vese algorithm, an active contour model, is used to segment high-risk regions in fluorescence videos. A semi-implicit gradient descent method was applied to minimize the energy function of this algorithm and evolve the segmentation. The surrounding background was then identified using a morphology operation. The average T/B ratio was computed and regions of interest were highlighted based on user-selected thresholding. Evaluation was conducted on 50 fluorescence videos acquired from clinical video recordings using a custom multimodal endoscope. Results: With a processing speed of 2 fps on a laptop computer, we obtained accurate segmentation of high-risk regions as judged by experts. For each case, the clinical user could optimize the target boundary by changing the penalty on the area inside the contour. Conclusion: An automatic, real-time procedure for calculating the T/B ratio and identifying high-risk regions of early esophageal cancer was developed. Future work will increase processing speed to at least 5 fps, refine the clinical interface, and apply the method to additional GI cancers and fluorescence peptides.
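Once a segmentation mask exists, the T/B computation itself is a one-liner; a sketch with a simple intensity threshold standing in for the Chan-Vese contour (synthetic frame, hypothetical values):

```python
import numpy as np

rng = np.random.default_rng(6)
frame = rng.normal(10.0, 1.0, size=(64, 64))    # background fluorescence
frame[20:30, 20:30] += 20.0                     # bright "target" region

def t_b_ratio(frame, thresh):
    """Mean intensity inside the segmented target over mean background intensity."""
    mask = frame > thresh                        # stand-in for Chan-Vese segmentation
    return float(frame[mask].mean() / frame[~mask].mean()), mask

ratio, mask = t_b_ratio(frame, thresh=20.0)
```

In the real pipeline the mask comes from the evolved contour rather than a global threshold, and the background is taken from a morphological neighborhood of the target rather than the whole frame; the ratio step is unchanged.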
Taming outliers in pulsar-timing datasets with hierarchical likelihoods and Hamiltonian sampling
Vallisneri, Michele; van Haasteren, Rutger
2017-01-01
Pulsar-timing datasets have been analyzed with great success using probabilistic treatments based on Gaussian distributions, with applications ranging from studies of neutron-star structure to tests of general relativity and searches for nanosecond gravitational waves. As for other applications of Gaussian distributions, outliers in timing measurements pose a significant challenge to statistical inference, since they can bias the estimation of timing and noise parameters, and affect reported parameter uncertainties. We describe and demonstrate a practical end-to-end approach to perform Bayesian inference of timing and noise parameters robustly in the presence of outliers, and to identify these probabilistically. The method is fully consistent (i.e., outlier-ness probabilities vary in tune with the posterior distributions of the timing and noise parameters), and it relies on the efficient sampling of the hierarchical form of the pulsar-timing likelihood. Such sampling has recently become possible with a "no-U-turn" Hamiltonian sampler coupled to a highly customized reparametrization of the likelihood; this code is described elsewhere, but it is already available online. We recommend our method as a standard step in the preparation of pulsar-timing-array datasets: even if statistical inference is not affected, follow-up studies of outlier candidates can reveal unseen problems in radio observations and timing measurements; furthermore, confidence in the results of gravitational-wave searches will only benefit from stringent statistical evidence that datasets are clean and outlier-free.
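The "outlier-ness probability" idea can be illustrated with a toy two-component mixture (not the paper's full hierarchical pulsar-timing likelihood): each residual is either drawn from the nominal Gaussian noise model or, with small prior probability, from a much broader outlier distribution. The values of `theta` and `width` below are assumptions.

```python
import math

def outlier_probability(residual, sigma=1.0, theta=0.05, width=10.0):
    """Posterior probability that a timing residual is an outlier under
    a simple two-component mixture: with prior theta the point comes
    from a broad 'outlier' Gaussian (width*sigma), otherwise from the
    nominal noise model."""
    def gauss(r, s):
        return math.exp(-0.5 * (r / s) ** 2) / (s * math.sqrt(2 * math.pi))
    p_in = (1 - theta) * gauss(residual, sigma)
    p_out = theta * gauss(residual, width * sigma)
    return p_out / (p_in + p_out)

print(outlier_probability(0.5))   # small: consistent with the noise model
print(outlier_probability(8.0))   # near 1: flagged as an outlier
```

In the hierarchical treatment these probabilities are not fixed but "vary in tune" with the noise parameters, since `sigma` itself is sampled.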
Sembroni, Andrea; Molin, Paola; Dramis, Francesco; Faccenna, Claudio; Abebe, Bekele
2017-05-01
An outlier consists of an area of younger rocks surrounded by older ones. Its formation is mainly related to the erosion of surrounding rocks which causes the interruption of the original continuity of the rocks. Because of its origin, an outlier is an important witness of the paleogeography of a region and, therefore, essential to understand its topographic and geological evolution. The Mekele Outlier (N Ethiopia) is characterized by poorly incised Mesozoic marine sediments and dolerites (∼2000 m in elevation), surrounded by strongly eroded Precambrian and Paleozoic rocks and Tertiary volcanic deposits in a context of a mantle supported topography. In the past, studies about the Mekele outlier focused mainly in the mere description of the stratigraphic and tectonic settings without taking into account the feedback between surface and deep processes in shaping such peculiar feature. In this study we present the geological and geomorphometric analyses of the Mekele Outlier taking into account the general topographic features (slope map, swath profiles, local relief), the river network and the principal tectonic lineaments of the outlier. The results trace the evolution of the study area as related not only to the mere erosion of the surrounding rocks but to a complex interaction between surface and deep processes where the lithology played a crucial role.
Robust maximum likelihood estimation for stochastic state space model with observation outliers
AlMutawa, J.
2016-08-01
The objective of this paper is to develop a robust maximum likelihood estimation (MLE) for the stochastic state space model via the expectation maximisation algorithm to cope with observation outliers. Two types of outliers and their influence are studied in this paper: namely, the additive outlier (AO) and the innovative outlier (IO). Due to the sensitivity of the MLE to AO and IO, we propose two techniques for robustifying the MLE: the weighted maximum likelihood estimation (WMLE) and the trimmed maximum likelihood estimation (TMLE). The WMLE is easy to implement with weights estimated from the data; however, it is still sensitive to IO and to a patch of AO outliers. On the other hand, the TMLE reduces to a combinatorial optimisation problem and is hard to implement, but it is effective against both types of outliers presented here. To overcome the difficulty, we apply a parallel randomised algorithm that has a low computational cost. A Monte Carlo simulation shows the efficiency of the proposed algorithms. An earlier version of this paper was presented at the 8th Asian Control Conference, Kaohsiung, Taiwan, 2011.
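The trimming idea behind the TMLE can be sketched on a toy static-parameter problem (estimating a Gaussian mean rather than the paper's state space parameters; the trim count is an assumption):

```python
def trimmed_mean_mle(data, trim=2):
    """Toy trimmed maximum-likelihood estimate of a Gaussian mean:
    discard the `trim` observations with the largest absolute residuals
    from a provisional fit, then refit on the kept subset."""
    mu = sum(data) / len(data)                        # provisional fit
    kept = sorted(data, key=lambda x: abs(x - mu))[:len(data) - trim]
    return sum(kept) / len(kept)                      # refit on kept points

data = [1.0, 1.1, 0.9, 1.05, 0.95, 50.0, -40.0]      # two gross outliers
print(round(trimmed_mean_mle(data), 2))  # 1.0
```

The combinatorial difficulty the paper mentions arises because, in general, one must search over which subset to keep; the randomised parallel algorithm replaces that exhaustive search.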
Eduardo S. Cantú
2013-02-01
Distorted sex ratios occur in hematologic disorders. For example, chronic lymphocytic leukemia (CLL) displays disproportionate sex ratios with a large male excess. However, the underlying genetics of these disparities are poorly understood, and gender differences for specific cytogenetic abnormalities have not been carefully investigated. We sought to provide an initial characterization of gender representation in genetic abnormalities in CLL by using fluorescence in situ hybridization (FISH). We confirm the well-known skewed male-to-female (M/F) sex ratio of ~1.5 in our CLL study population, but also determine the genotypic M/F sex ratio values corresponding to specific FISH DNA probes. Genetic changes in CLL detectable by four FISH probes were statistically compared with respect to gender. Initial FISH evaluations of 4698 CLL patients were retrospectively examined, and new findings on the genotypic M/F sex ratios for these probes are reported. This study represents the largest CLL survey conducted in the United States using FISH probes. The CLL database demonstrated that FISH abnormalities (trisomy 12, 13q14.3 deletion, and 17p13.1 deletion probes) had skewed M/F ratios of ~1.5. Also, statistical analysis showed that ATM gene loss (11q22.3-q23.1 deletion), solely or with other abnormalities, was considerably higher in males, with an M/F ratio of 2.5 that differed significantly from M/F ratios of 1.0 or 1.5. We hypothesize that interactions involving these autosomal abnormalities (trisomy 12, and deletions of 11q22.3, 13q14.3, and 17p13.1) and the sex chromosomes may provide the genetic basis for the altered phenotypic M/F ratio in CLL.
Ulriksen, M. D.; Damkilde, L.
2016-02-01
Contrary to global modal parameters such as eigenfrequencies, mode shapes inherently provide structural information on a local level. Therefore, this particular modal parameter and its derivatives are utilized extensively for damage identification. Typically, more or less advanced mathematical methods are employed to identify damage-induced discontinuities in the spatial mode shape signals, thereby potentially facilitating damage detection and/or localization. However, because they rely on distinguishing damage-induced discontinuities from other signal irregularities, these methods share an intrinsic deficiency: high sensitivity to measurement noise. In the present paper, a damage localization method which, compared to the conventional mode shape-based methods, has greatly enhanced robustness towards measurement noise is proposed. The method is based on signal processing of a spatial mode shape by means of continuous wavelet transformation (CWT) and subsequent application of a generalized discrete Teager-Kaiser energy operator (GDTKEO) to identify damage-induced mode shape discontinuities. In order to evaluate whether the identified discontinuities are in fact damage-induced, outlier analysis is conducted by applying the Mahalanobis metric to major principal scores of the sensor-located bands of the signal-processed mode shape. The method is tested analytically and benchmarked with other mode shape-based damage localization approaches on the basis of a free-vibrating beam and validated experimentally in the context of a residential-sized wind turbine blade subjected to an impulse load.
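The outlier-analysis step (Mahalanobis distances of principal scores from a healthy-state baseline) can be sketched as follows; the baseline data and the flagging threshold of 4 are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mahalanobis_distances(scores, baseline):
    """Mahalanobis distance of each observation's principal scores from
    a healthy-state baseline; large distances mark damage-induced
    discontinuities as statistical outliers."""
    mu = baseline.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))
    d = scores - mu
    # per-row quadratic form d_i^T * cov_inv * d_i
    return np.sqrt(np.einsum('ij,jk,ik->i', d, cov_inv, d))

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))               # healthy baseline scores
test = np.array([[0.1, -0.2], [6.0, 6.0]])     # second point is anomalous
print(mahalanobis_distances(test, base) > 4)
```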
Tingting SHAO
2015-03-01
Background and objective: Mediastinal involvement in lung cancer is a highly significant prognostic factor for survival, and accurate staging of the mediastinum correctly identifies patients who will benefit the most from surgery. Positron emission tomography/computed tomography (PET/CT) has become the standard imaging modality for the staging of patients with lung cancer. The aim of this study is to investigate 18-fluoro-2-deoxy-glucose (18F-FDG) PET/CT imaging in the detection of mediastinal disease in lung cancer. Methods: A total of 72 patients newly diagnosed with non-small cell lung cancer (NSCLC) who underwent preoperative whole-body 18F-FDG PET/CT were retrospectively included. All patients underwent radical surgery and mediastinal lymph node dissection. Mediastinal disease was histologically confirmed in 45 of 413 lymph nodes. PET/CT readers analyzed the images visually and evaluated the lymph node short axis, the lymph node maximum standardized uptake value (SUVmax), the node/aorta density ratio, the node/aorta SUV ratio, and other parameters, using the histopathological results as the reference standard. The optimal cutoff value for each ratio was determined by receiver operating characteristic curve analysis. Results: Using a threshold of 0.9 for the density ratio and 1.2 for the SUV ratio yielded high accuracy for the detection of mediastinal disease. Combining lymph node short axis, SUVmax, density ratio, and SUV ratio on integrated PET/CT gave an accuracy of 95.2% for diagnosing mediastinal lymph nodes. The diagnostic accuracy with conventional PET/CT was 89.8%, whereas that of PET/CT comprehensive analysis was 90.8%. Conclusion: The node/aorta density ratio and SUV ratio may be complementary to conventional visual interpretation and SUVmax measurement. The use of lymph node short axis, SUVmax, and both ratios in combination is better than either conventional PET/CT analysis or PET
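ROC-based cutoff selection of the kind described above is typically done by maximizing Youden's J (sensitivity + specificity - 1). A small sketch with hypothetical ratio values, not the study's data:

```python
def youden_optimal_cutoff(values, labels):
    """Picks the cutoff maximising Youden's J over the observed values,
    the standard way an ROC analysis yields thresholds such as 0.9
    (density ratio) or 1.2 (SUV ratio)."""
    best = None
    for c in sorted(set(values)):
        tp = sum(v >= c and l for v, l in zip(values, labels))
        fn = sum(v < c and l for v, l in zip(values, labels))
        tn = sum(v < c and not l for v, l in zip(values, labels))
        fp = sum(v >= c and not l for v, l in zip(values, labels))
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if best is None or j > best[0]:
            best = (j, c)
    return best[1]

# hypothetical node/aorta ratios with malignancy labels
ratios = [0.6, 0.7, 0.8, 1.0, 1.1, 1.3]
malignant = [False, False, False, True, True, True]
print(youden_optimal_cutoff(ratios, malignant))  # 1.0
```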
Detection of HD in the atmospheres of Uranus and Neptune : a new determination of the D/H ratio
Feuchtgruber, H; Lellouch, E; Bezard, B; Encrenaz, T; de Graauw, T; Davis, GR
1999-01-01
Observations with the Short Wavelength Spectrometer (SWS) onboard the Infrared Space Observatory (ISO) have led to the first unambiguous detection of HD in the atmospheres of Uranus and Neptune, from its R(2) rotational line at 37.7 μm. Using the S(0) and S(1) quadrupolar lines of H2 at 28.2 and 17.0
Detection of chloronium and measurement of the 35Cl/37Cl isotopic ratio at z=0.89 toward PKS1830-211
Muller, S; Guelin, M; Henkel, C; Combes, F; Gerin, M; Aalto, S; Beelen, A; Darling, J; Horellou, C; Martin, S; Menten, K M; Dinh-V-Trung,; Zwaan, M A
2014-01-01
We report the first extragalactic detection of chloronium (H2Cl+), in the z=0.89 absorber in front of the lensed blazar PKS1830-211. The ion is detected through its 1_11-0_00 line along two independent lines of sight toward the North-East and South-West images of the blazar. The relative abundance of H2Cl+ is significantly higher (by a factor ~7) in the NE line of sight, which has a lower H2/H fraction, indicating that H2Cl+ preferably traces the diffuse gas component. From the ratio of the H2^35Cl+ and H2^37Cl+ absorptions toward the SW image, we measure a 35Cl/37Cl isotopic ratio of 3.1 (-0.2; +0.3) at z=0.89, similar to that observed in the Galaxy and the solar system.
Byung Eun Lee
2014-09-01
This paper proposes an algorithm for fault detection and faulted phase and winding identification of a three-winding power transformer based on the induced voltages in the electrical power system. The ratio of the induced voltages of the primary-secondary, primary-tertiary and secondary-tertiary windings is the same as the corresponding turns ratio during normal operating conditions, magnetic inrush, and over-excitation. It differs from the turns ratio during an internal fault. For a single-phase and a three-phase power transformer with wye-connected windings, the induced voltages of each pair of windings are estimated. For a three-phase power transformer with delta-connected windings, the induced voltage differences are estimated using the line currents, because the delta winding currents are practically unavailable. Six detectors are suggested for fault detection. An additional three detectors and a rule for faulted phase and winding identification are presented as well. The proposed algorithm can not only detect an internal fault, but also identify the faulted phase and winding of a three-winding power transformer. The various test results with Electromagnetic Transients Program (EMTP)-generated data show that the proposed algorithm successfully discriminates internal faults from normal operating conditions, including magnetic inrush and over-excitation. This paper concludes by implementing the algorithm in a prototype relay based on a digital signal processor.
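The core detection principle, that the induced-voltage ratio equals the turns ratio except during an internal fault, can be sketched as a single comparator. The 5% tolerance is an assumed margin; a real relay would add filtering and security logic around it.

```python
def ratio_fault_detector(v_primary, v_secondary, turns_ratio, tol=0.05):
    """Flags an internal fault when the measured induced-voltage ratio
    deviates from the nameplate turns ratio by more than `tol`.
    During normal operation, inrush and over-excitation the two ratios
    agree, so no trip is issued."""
    measured = v_primary / v_secondary
    return abs(measured - turns_ratio) / turns_ratio > tol

print(ratio_fault_detector(220.0, 110.0, 2.0))   # healthy: False
print(ratio_fault_detector(220.0, 140.0, 2.0))   # internal fault: True
```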
Integrated genomic analysis of survival outliers in glioblastoma.
Peng, Sen; Dhruv, Harshil; Armstrong, Brock; Salhia, Bodour; Legendre, Christophe; Kiefer, Jeffrey; Parks, Julianna; Virk, Selene; Sloan, Andrew E; Ostrom, Quinn T; Barnholtz-Sloan, Jill S; Tran, Nhan L; Berens, Michael E
2017-06-01
To elucidate molecular features associated with disproportionate survival of glioblastoma (GB) patients, we conducted deep genomic comparative analysis of a cohort of patients receiving standard therapy (surgery plus concurrent radiation and temozolomide); "GB outliers" were identified: long-term survivor of 33 months (LTS; n = 8) versus short-term survivor of 7 months (STS; n = 10). We implemented exome, RNA, whole genome sequencing, and DNA methylation for collection of deep genomic data from STS and LTS GB patients. LTS GB showed frequent chromosomal gains in 4q12 (platelet derived growth factor receptor alpha and KIT) and 12q14.1 (cyclin-dependent kinase 4), and deletion in 19q13.33 (BAX, branched chain amino-acid transaminase 2, and cluster of differentiation 33). STS GB showed frequent deletion in 9p11.2 (forkhead box D4-like 2 and aquaporin 7 pseudogene 3) and 22q11.21 (Hypermethylated In Cancer 2). LTS GB showed 2-fold more frequent copy number deletions compared with STS GB. Gene expression differences showed the STS cohort with altered transcriptional regulators: activation of signal transducer and activator of transcription (STAT)5a/b, nuclear factor-kappaB (NF-κB), and interferon-gamma (IFNG), and inhibition of mitogen-activated protein kinase (MAPK1), extracellular signal-regulated kinase (ERK)1/2, and estrogen receptor (ESR)1. Expression-based biological concepts prominent in the STS cohort include metabolic processes, anaphase-promoting complex degradation, and immune processes associated with major histocompatibility complex class I antigen presentation; the LTS cohort features genes related to development, morphogenesis, and the mammalian target of rapamycin signaling pathway. Whole genome methylation analyses showed that a methylation signature of 89 probes distinctly separates LTS from STS GB tumors. We posit that genomic instability is associated with longer survival of GB (possibly with vulnerability to standard therapy); conversely, genomic
Outlier Removal and the Relation with Reporting Errors and Quality of Psychological Research
Bakker, Marjan; Wicherts, Jelte M.
2014-01-01
Background The removal of outliers to acquire a significant result is a questionable research practice that appears to be commonly used in psychology. In this study, we investigated whether the removal of outliers in psychology papers is related to weaker evidence (against the null hypothesis of no effect), a higher prevalence of reporting errors, and smaller sample sizes in these papers compared to papers in the same journals that did not report the exclusion of outliers from the analyses. Methods and Findings We retrieved a total of 2667 statistical results of null hypothesis significance tests from 153 articles in main psychology journals, and compared results from articles in which outliers were removed (N = 92) with results from articles that reported no exclusion of outliers (N = 61). We preregistered our hypotheses and methods and analyzed the data at the level of articles. Results show no significant difference between the two types of articles in median p value, sample sizes, or prevalence of all reporting errors, large reporting errors, and reporting errors that concerned the statistical significance. However, we did find a discrepancy between the reported degrees of freedom of t tests and the reported sample size in 41% of articles that did not report removal of any data values. This suggests common failure to report data exclusions (or missingness) in psychological articles. Conclusions We failed to find that the removal of outliers from the analysis in psychological articles was related to weaker evidence (against the null hypothesis of no effect), sample size, or the prevalence of errors. However, our control sample might be contaminated due to nondisclosure of excluded values in articles that did not report exclusion of outliers. Results therefore highlight the importance of more transparent reporting of statistical analyses. PMID:25072606
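The degrees-of-freedom consistency check the authors applied can be sketched directly: for an independent-samples t test, the reported df should equal N minus the number of groups, so a mismatch hints at unreported exclusions.

```python
def df_consistent(reported_df, n_total, n_groups=2):
    """For an independent-samples t test, degrees of freedom should be
    N minus the number of groups; a mismatch suggests unreported
    exclusion of data values (the discrepancy found in 41% of articles
    that reported no removals)."""
    return reported_df == n_total - n_groups

print(df_consistent(38, 40))   # True: all 40 participants analysed
print(df_consistent(35, 40))   # False: 3 values silently dropped?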
Tracking and Resolving CT Dose Metric Outliers Using Root-Cause Analysis.
Chen, Yingming Amy; MacGregor, Kate; Li, Iris; Concepcion, Lianne; Deva, Djeven Parameshvara; Dowdell, Timothy; Gray, Bruce Garstang
2016-06-01
The aim of this study was to examine the frequency and type of outlier dose metrics for three common CT examination types on the basis of a root-cause analysis (RCA) approach. Institutional review board approval was obtained for this retrospective observational study. The requirement to obtain informed consent was waived. Between January 2010 and December 2013, radiation dose metric data from 34,615 CT examinations, including 26,878 routine noncontrast CT head, 2,992 CT pulmonary angiographic (CTPA), and 4,745 renal colic examinations, were extracted from a radiation dose index monitoring database and manually cleaned. Dose outliers were identified on the basis of the statistical distribution of volumetric CT dose index and dose-length product for each examination type; values higher than the 99th percentile and less than the 1st percentile were flagged for RCA. There were 397 noncontrast CT head, 52 CTPA, and 80 renal colic outliers. Root causes for high-outlier examinations included repeat examinations due to patient motion (n = 122 [31%]), modified protocols mislabeled as "routine" (n = 69 [18%]), higher dose examinations for patients with large body habitus (n = 27 [7%]), repeat examinations due to technical artifacts (n = 20 [5%]), and repeat examinations due to suboptimal contrast timing (CTPA examinations) (n = 18 [5%]). Root causes for low-outlier examinations included low-dose protocols (n = 112 [29%]) and aborted examinations (n = 8 [2%]). On the basis of examination frequency over a 3-month period, the 90th and 10th percentile values were set in the radiation dose index monitoring database as thresholds for sending notifications to staff members responsible for outlier investigations. Systematic RCA of dose outliers identifies sources of variation and dose excess and pinpoints specific protocol and technical shortcomings for corrective action. Copyright © 2016 American College of Radiology. Published by Elsevier Inc. All rights reserved.
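The flagging rule (values above the 99th or below the 1st percentile per examination type) is straightforward to sketch; the dose values below are fabricated for illustration.

```python
import numpy as np

def flag_dose_outliers(dlp_values, lo_pct=1, hi_pct=99):
    """Flags dose metric values below the 1st or above the 99th
    percentile of their examination type, mirroring the thresholds
    used to select cases for root-cause analysis."""
    lo, hi = np.percentile(dlp_values, [lo_pct, hi_pct])
    return [v for v in dlp_values if v < lo or v > hi]

doses = list(range(100, 200)) + [900]   # one grossly high examination
print(flag_dose_outliers(doses))  # [100, 900]
```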
Mustafa Karabacak; Kenan Ahmet Turkdogan; Abuzer Coskun; Orhan Akpinar; Ali Duman; Mücahit Kapci; Sevki Hakan Eren; Pınar Karabacak
2015-01-01
Objective: To investigate the neutrophil–lymphocyte ratio (NLR), an indicator of systemic inflammation, in patients with carbon monoxide (CO) poisoning. Methods: We included 528 patients (275 women) who presented with a diagnosis of CO poisoning between June 2009 and March 2014. The control group was composed of 54 patients (24 women). Platelet count and mean platelet volume were significantly higher in the CO poisoning group. Results: White blood cell count (9.8 ± 3.3 vs 8.6 ± 2.9 × 10³/mL, respectively; P = 0.01) and neutrophil count (6.00 ± 2.29 vs 4.43 ± 2.04 × 10³/mL, respectively; P Conclusions: The increase of NLR may indicate the progression of fatal complications due to CO poisoning.
Singh, Niraj Kumar; Barman, Animesh
2016-01-01
Several parameters of the ocular vestibular-evoked myogenic potential (oVEMP) have been used to identify Meniere's disease. Nonetheless, the frequency-amplitude ratio (FAR), the ratio of amplitudes between two frequencies, is one parameter that has failed to attract researchers' attention despite proving its worth in the diagnosis of Meniere's disease when used in conjunction with the cervical VEMP. Thus, the present study aimed at investigating the utility of the FAR of oVEMP in identifying Meniere's disease and finding an optimum frequency pair for its diagnosis. Using a case-control design, oVEMPs were recorded for tone bursts of 500, 750, 1000, and 1500 Hz from 36 individuals with unilateral definite Meniere's disease in the age range of 15 to 50 years. For comparison purposes, oVEMPs at the above frequencies were also obtained from an equal number of age- and gender-matched healthy individuals. The amplitudes at 750, 1000, and 1500 Hz and at the tuned frequency, which was the frequency with the largest peak-to-peak amplitude among the above-mentioned frequencies, were divided by the amplitude at 500 Hz to obtain FARs for the 750/500, 1000/500, 1500/500, and tuned frequency/500 frequency pairs. The results revealed significantly higher FARs in the Meniere's disease group than in the healthy controls for all the frequency pairs (p < 0.05). A sensitivity of almost 90% and a specificity of 100% were obtained for 1000/500 and 750/500, whereas the other frequency pairs produced a sensitivity of about 56% while still showing a specificity of 100%. High sensitivity and specificity, coupled with a considerably reduced test duration when using only two frequencies, make FAR an attractive option, with 1000/500 as the frequency pair of choice.
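Computing the FAR itself is a simple amplitude division; the sketch below uses hypothetical oVEMP amplitudes for an affected ear.

```python
def frequency_amplitude_ratios(amplitudes):
    """Frequency-amplitude ratio (FAR): peak-to-peak oVEMP amplitude at
    each higher frequency divided by the amplitude at 500 Hz.  In
    Meniere's disease the frequency tuning shifts upward, so ratios
    such as 1000/500 rise above those of healthy ears."""
    ref = amplitudes[500]
    return {f: a / ref for f, a in amplitudes.items() if f != 500}

# hypothetical amplitudes (uV), keyed by tone-burst frequency in Hz
far = frequency_amplitude_ratios({500: 4.0, 750: 6.0, 1000: 8.0, 1500: 5.0})
print(far)  # {750: 1.5, 1000: 2.0, 1500: 1.25}
```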
Einian, M R; Aghamiri, S M R; Ghaderi, R
2016-01-01
The discrimination of the composition of environmental and non-environmental materials by estimation of the (234)U/(238)U activity ratio in alpha-particle spectrometry is important in many applications. If the interfering elements are not completely separated from the uranium, they can interfere with the determination of (234)U. Source thickness caused by iron remaining from the source preparation phase, together with the alpha lines of the interfering elements, can broaden the (234)U alpha line in the spectra. Therefore, the asymmetric broadening of the (234)U alpha line and the overlapping of peaks make the analysis of the alpha-particle spectra and the interpretation of the results difficult. Applying an Artificial Neural Network (ANN) to a spectrometry system is attractive because it sidesteps the limitations of classical approaches by extracting the desired information from the input data. In this work, averages over a partial raw uranium spectrum were considered. Each point whose slope was on the order of 0-1% per 10 channels was used as input to a multi-layer feed-forward error-back-propagation network. The network was trained on an alpha spectrum library developed in the present work. The training data in this study were actual spectral data with a range of reasonable thicknesses and interfering elements. According to the results, the method applied here to estimate the activity ratio can examine the alpha spectrum for peaks that would not be expected for a source of a given element, and provide clues about the composition of uranium contamination in environmental samples in fast screening and classification procedures.
LIU Liang-yun; HUANG Wen-jiang; PU Rui-liang; WANG Ji-hua
2014-01-01
Spectral reflectance in the near-infrared (NIR) shoulder (750-900 nm) region is affected by internal leaf structure, but it has rarely been investigated. In this study, a dehydration treatment and three paraquat herbicide applications were conducted to explore how spectral reflectance and shape in the NIR shoulder region responded to various stresses. A new spectral ratio index in the NIR shoulder region (NSRI), defined by a simple ratio of reflectance at 890 nm to reflectance at 780 nm, was proposed for assessing leaf structure deterioration. Firstly, a wavelength-independent increase in spectral reflectance in the NIR shoulder region was observed from the mature leaves with slight dehydration. An increase in spectral slope in the NIR shoulder would be expected only when water stress developed sufficiently to cause severe leaf dehydration resulting in an alteration in cell structure. Secondly, the alteration of leaf cell structure caused by paraquat herbicide applications resulted in a wavelength-dependent variation of spectral reflectance in the NIR shoulder region. The NSRI increased significantly under an herbicide application. Although the dehydration process also occurred with the herbicide injury, NSRI is more sensitive to herbicide injury than the water-related indices (water index and normalized difference water index) and the normalized difference vegetation index. Finally, the sensitivity of NSRI to stripe rust in winter wheat was examined, yielding a determination coefficient of 0.61, which is more significant than the normalized difference vegetation index (NDVI), water index (WI) and normalized difference water index (NDWI), with determination coefficients of 0.45, 0.36 and 0.13, respectively. In this study, all experimental results demonstrated that NSRI increases with internal leaf structure deterioration, and that it is also a sensitive spectral index for herbicide injury and stripe rust in winter wheat.
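The index itself is a one-line ratio; the reflectance values in the example are hypothetical.

```python
def nsri(reflectance_890, reflectance_780):
    """NIR shoulder ratio index: R890 / R780, rising as internal leaf
    structure deteriorates under herbicide injury or stripe rust."""
    return reflectance_890 / reflectance_780

# hypothetical band reflectances for a healthy and a stressed leaf
healthy = nsri(0.48, 0.46)
stressed = nsri(0.50, 0.42)
print(stressed > healthy)  # True: the stressed leaf shows a higher NSRI
```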
Bin Li
2016-05-01
Water quality maintenance should be considered from an ecological perspective, since water is a substrate ingredient in the biogeochemical cycle and is closely linked with ecosystem functioning and services. Addressing the status of live organisms in aquatic ecosystems is a critical issue for appropriate prediction and water quality management. Recently, genetic changes in biological organisms have garnered more attention due to their in-depth expression of environmental stress on aquatic ecosystems in an integrative manner. In this study we demonstrate that genetic diversity responds adaptively to environmental constraints. We applied a self-organizing map (SOM) to characterize complex Amplified Fragment Length Polymorphism (AFLP) data of aquatic insects in six streams in Japan with natural and anthropogenic variability. After SOM training, the loci compositions of aquatic insects effectively responded to environmental selection pressure. To measure how important the role of loci compositions was in the population division, we altered the AFLP data by flipping the existence of given loci, individual by individual. Subsequently we recognized the cluster change of the individuals with altered data using the trained SOM. Based on SOM recognition of these altered data, we determined the outlier loci (over the 90th percentile) that showed drastic changes in their assigned clusters (D). Subsequently, environmental responsiveness (Ek') was also calculated to address relationships with outliers in different species. Outlier loci were sensitive to slightly polluted conditions, including Chl-a, NH4-N, NOX-N, PO4-P, and SS, and the food material, epilithon. Natural environmental factors such as altitude and sediment additionally showed relationships with outliers at somewhat lower levels. Poly-loci-like responsiveness was detected in adapting to environmental constraints. SOM training followed by recognition shed light on developing algorithms de novo to
Srivastava, S.
2015-12-01
Gravity Recovery and Climate Experiment (GRACE) data are widely used in hydrological studies of large-scale basins (≥100,000 sq km). GRACE data (Stokes coefficients or equivalent water height) used for hydrological studies are not direct observations but result from high-level processing of raw data from the GRACE mission. Partner agencies such as CSR, GFZ and JPL implement their own methodologies, and their processing methods are independent of each other. The primary sources of error in GRACE data are measurement and modeling errors and the processing strategies of these agencies. Because of the different processing methods, the final data from the partner agencies are inconsistent with each other at some epochs. GRACE data provide spatio-temporal variations in the Earth's gravity, mainly attributed to seasonal fluctuations in water storage on the Earth's surface and subsurface. During the quantification of errors/uncertainties, several high positive and negative peaks were observed that do not correspond to any hydrological process but may emanate from a combination of the primary error sources or from other geophysical processes (e.g., earthquakes, landslides) resulting in a redistribution of the Earth's mass. Such peaks can be considered outliers for hydrological studies. In this work, an algorithm has been designed to extract outliers from the GRACE data for the Indo-Gangetic plain, which considers the seasonal variations and the trend in the data. Different outlier detection methods have been used, such as the Z-score, modified Z-score and adjusted boxplot. For verification, assimilated hydrological (GLDAS) and hydro-meteorological data are used as the reference. The results show that the consistency among all data sets improved significantly after the removal of outliers.
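Of the methods named, the modified Z-score is easy to sketch; being median/MAD-based, it is not inflated by the very peaks it is meant to flag. The 3.5 cutoff is the conventional Iglewicz-Hoaglin threshold, and the sample series is fabricated.

```python
def modified_z_scores(series):
    """Modified Z-score (Iglewicz-Hoaglin): 0.6745*(x - median)/MAD.
    |score| > 3.5 is the usual outlier cutoff."""
    xs = sorted(series)
    n = len(xs)
    median = xs[n // 2] if n % 2 else 0.5 * (xs[n // 2 - 1] + xs[n // 2])
    devs = sorted(abs(x - median) for x in series)
    mad = devs[n // 2] if n % 2 else 0.5 * (devs[n // 2 - 1] + devs[n // 2])
    return [0.6745 * (x - median) / mad for x in series]

series = [1.0, 1.2, 0.9, 1.1, 9.0]           # one spurious peak
flagged = [x for x, z in zip(series, modified_z_scores(series))
           if abs(z) > 3.5]
print(flagged)  # [9.0]
```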
N. Mijailovic
2015-01-01
Full Text Available Knowledge about the knee cartilage deformation ratio and the knee cartilage stress distribution is of particular importance in clinical studies, because these are basic indicators of cartilage state and also provide information about joint cartilage wear, so that medical doctors can predict when it is necessary to perform surgery on a patient. In this research, we apply several kinds of sensors: a system of infrared cameras with reflective markers, a three-axis accelerometer, and a force plate. The fluorescent marker and accelerometers are placed on the patient’s hip, knee, and ankle, respectively. During a normal walk we record the spatial positions of the markers, the acceleration, and the ground reaction force measured by the force plate. The measured data are included in a biomechanical model of the knee joint. The geometry for this model is defined from CT images. The model includes the impact of ground reaction forces, the contact force between femur and tibia, patient body weight, ligaments, and muscle forces. The boundary conditions are created for the finite element method in order to noninvasively determine the cartilage stress distribution.
Ahmad, Rafiq; Tripathy, Nirmalya; Ahn, Min-Sang; Hahn, Yoon-Bong
2017-04-01
This study demonstrates a highly stable, selective and sensitive uric acid (UA) biosensor based on high-aspect-ratio zinc oxide nanorods (ZNRs) vertically grown on the electrode surface via a simple one-step low-temperature solution route. Uricase enzyme was immobilized on the ZNRs, followed by Nafion covering, to fabricate UA-sensing electrodes (Nafion/Uricase-ZNRs/Ag). The fabricated electrodes showed enhanced performance with an attractive analytical response: a high sensitivity of 239.67 μA cm-2 mM-1 over a wide linear range (0.01-4.56 mM), rapid response time (~3 s), low detection limit (5 nM), and a low apparent Michaelis-Menten constant (Kmapp, 0.025 mM). In addition, the selectivity, reproducibility and long-term storage stability of the biosensor were also demonstrated. These results can be attributed to the high aspect ratio of the vertically grown ZNRs, which provides a high surface area leading to enhanced enzyme immobilization, high electrocatalytic activity, and direct electron transfer during electrochemical detection of UA. We expect that this biosensor platform will be advantageous for fabricating ultrasensitive, robust, low-cost sensing devices for detection of numerous analytes.
Hines, Jason A.; Mark, William D.
2014-02-01
The frequency-domain ALR (average-log-ratio) damage-detection algorithm [MSSP 24 (2010) 2807-2823] is utilized to illustrate damage detection and progression on notched-tooth spiral-bevel gears. Use of equal weighting of increases or decreases of individual rotational-harmonic amplitudes caused by damage, for early ALR detections, is substantiated. Continuously improving statistical reliability of ALR is documented by using increasing numbers of rotational-harmonic amplitude-ratios and increasing numbers of waveforms in the synchronous averaging. Sensitivity of the ALR algorithm to incipient damage is observed to be comparable to that obtained from the kurtosis-based Figure of Merit 4 (FM4). In contrast to FM4, ALR is shown to monotonically increase with increasing damage and running time. Interestingly, this diagnostic technique can be implemented with remarkably low analog-to-digital conversion rates. Computation of ALR for differing torque levels shows strong indications of weakening tooth-stiffness and increasing tooth-plastic-deformation. ALR computation utilizing tooth-rotational-location windowing also is illustrated.
Sloth, Jens Jørgen; Larsen, Erik Huusfeldt
2000-01-01
Inductively coupled plasma dynamic reaction cell mass spectrometry (ICP-DRC-MS) was characterised for the detection of the six naturally occurring selenium isotopes. The potentially interfering argon dimers at the selenium masses m/z 74, 76, 78 and 80 were reduced in intensity by approximately five...... orders of magnitude by using methane as reactive cell gas in the DRC. By using 3% v/v methanol in water for carbon-enhanced ionisation of selenium, the sensitivity of Se-80 was 10(4) counts s(-1) per ng ml(-1) of selenium, and the estimated limit of detection was 6 pg ml(-1). The precision of the isotope...... ratios. Deuterated methane used as the DRC gas showed that hydrogen transfer from methane was not involved in the formation of SeH as SeD was absent in the mass spectrum. The almost interference-free detection of selenium by ICP-DRC-MS made the detection of the Se-80 isotope possible for detection...
Persistent and extreme outliers in causes of death by state, 1999-2013.
Boscoe, Francis P
2015-01-01
In the United States, state-specific mortality rates that are high relative to national rates can result from legitimate reasons or from variability in coding practices. This paper identifies instances of state-specific mortality rates that were at least twice the national rate in each of three consecutive five-year periods (termed persistent outliers), along with rates that were at least five times the national rate in at least one five-year period (termed extreme outliers). The resulting set of 71 outliers, 12 of which appear on both lists, illuminates mortality variations within the country, including some that are amenable to improvement either because they represent preventable causes of death or highlight weaknesses in coding techniques. Because the approach used here is based on relative rather than absolute mortality, it is not dominated by the most common causes of death such as heart disease and cancer.
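The two definitions map directly onto a ratio check; the data structure and example rates below are hypothetical, not the paper's mortality data.

```python
def classify_outliers(state_rates, national_rates):
    # state_rates / national_rates: {cause: [rate in each of three
    # consecutive five-year periods]}. Persistent: >= 2x national in all
    # three periods; extreme: >= 5x national in at least one period.
    persistent, extreme = [], []
    for cause, rates in state_rates.items():
        ratios = [r / n for r, n in zip(rates, national_rates[cause])]
        if all(q >= 2.0 for q in ratios):
            persistent.append(cause)
        if any(q >= 5.0 for q in ratios):
            extreme.append(cause)
    return persistent, extreme

state = {"cause_a": [10.0, 11.0, 12.0], "cause_b": [30.0, 2.0, 2.0]}
national = {"cause_a": [4.0, 5.0, 5.0], "cause_b": [5.0, 5.0, 5.0]}
persistent, extreme = classify_outliers(state, national)
```

Because the test is on relative rates, a rare cause of death can qualify just as easily as a common one, which is the point the abstract makes about not being dominated by heart disease and cancer.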
Robust Kalman tracking and smoothing with propagating and non-propagating outliers
Ruckdeschel, Peter; Pupashenko, Daria
2012-01-01
A common situation in filtering where classical Kalman filtering does not perform particularly well is tracking in the presence of propagating outliers. This calls for robustness understood in a distributional sense, i.e., we enlarge the distributional assumptions made in the ideal model by suitable neighborhoods. Based on optimality results for distributionally robust Kalman filtering from Ruckdeschel [01,10], we propose new robust recursive filters and smoothers designed for this purpose, as well as specialized versions for non-propagating outliers. We apply these procedures in the context of a GPS problem arising in the car industry. To better understand these filters, we study their behavior on stylized outlier patterns (for which they are not designed) and compare them to other approaches to the tracking problem. Finally, in a simulation study we discuss the efficiency of our procedures in comparison to competitors.
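The core idea behind such robust recursive filters can be illustrated in one dimension: clip the innovation before applying the Kalman gain, so a gross observation outlier moves the state by a bounded amount. This is a sketch of the general principle under assumed noise parameters, not the paper's optimally robust construction.

```python
def robust_kalman_1d(ys, q=0.01, r=1.0, clip=2.0):
    # random-walk state x, observation y = x + noise;
    # the standardized innovation is clipped at +/- clip
    x, p = 0.0, 1.0
    out = []
    for y in ys:
        p += q                      # predict step (process noise q)
        k = p / (p + r)             # Kalman gain
        s = (p + r) ** 0.5          # innovation standard deviation
        innov = y - x
        innov = max(-clip * s, min(clip * s, innov))  # Huber-style clipping
        x += k * innov
        p *= 1 - k
        out.append(x)
    return out

ys = [5.0] * 20
ys[10] = 100.0                      # a single gross (non-propagating) outlier
track = robust_kalman_1d(ys)
```

The classical filter (clip set very large) would jump by roughly the gain times the full 95-unit innovation at the outlier; the clipped filter moves by at most k * clip * s and recovers immediately.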
Sandborg, Michael; Carlsson, G.A. (Linkoeping Univ. (Sweden). Dept. of Radiation Physics)
1992-06-01
A lower limit to patient irradiation in diagnostic radiology is set by the fundamental stochastics of the energy imparted to the image receptor (quantum noise). Image quality is investigated here and expressed in terms of the signal-to-noise ratio due to quantum noise. The Monte Carlo method is used to calculate signal-to-noise ratios (SNR_ΔS) and detective quantum efficiencies (DQE_ΔS) in imaging thin contrasting details of air, fat, bone and iodine within a water phantom, using x-ray spectra (40-140 kV) and detectors of CsI, BaFCl and Gd2O2S. The atomic composition of the contrasting detail considerably influences the values of SNR_ΔS, due to the different modulations of the energy spectra of primary photons passing beside and through the contrasting detail. (author)
The Impact of Outliers on Net-Benefit Regression Model in Cost-Effectiveness Analysis.
Wen, Yu-Wen; Tsai, Yi-Wen; Wu, David Bin-Chia; Chen, Pei-Fen
2013-01-01
Ordinary least squares (OLS) regression has been widely used to analyze patient-level data in cost-effectiveness analysis (CEA). However, estimates, inference and decision making in economic evaluations based on OLS estimation may be biased by the presence of outliers. Robust estimation, in contrast, can remain unaffected and provide results that are resistant to outliers. The objective of this study is to explore the impact of outliers on net-benefit regression (NBR) in CEA using OLS and to propose a potential solution using robust estimators, i.e. Huber M-estimation, Hampel M-estimation, Tukey's bisquare M-estimation, MM-estimation and least trimmed squares estimation. Simulations under different outlier-generating scenarios and an empirical example were used to obtain the regression estimates of NBR by OLS and the five robust estimators. The empirical size and empirical power of OLS and the robust estimators were then compared in the context of hypothesis testing. Simulations showed that the five robust approaches, compared with OLS estimation, led to lower empirical sizes and achieved higher empirical powers in testing cost-effectiveness. In a real example of antiplatelet therapy, the estimated incremental net benefit by OLS estimation was lower than those by the robust approaches because of outliers in the cost data. Robust estimators demonstrated a higher probability of cost-effectiveness compared to OLS estimation. The presence of outliers can bias the results of NBR and their interpretation. Robust estimation in NBR is recommended as an appropriate method to avoid such biased decision making.
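Among the robust alternatives listed, Huber M-estimation is the easiest to sketch: iteratively reweighted least squares in which observations with large residuals get weight delta*s/|r| instead of 1. The simple-regression setting and toy data below are illustrative, not the study's net-benefit model.

```python
def huber_irls(x, y, delta=1.345, iters=50):
    # Huber M-estimation for y = b0 + b1*x via IRLS;
    # delta = 1.345 gives ~95% efficiency under normal errors
    n = len(x)
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        # robust scale estimate from the median absolute residual
        s = sorted(abs(r) for r in resid)[n // 2] / 0.6745 or 1.0
        w = [1.0 if abs(r) <= delta * s else delta * s / abs(r) for r in resid]
        sw = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sy = sum(wi * yi for wi, yi in zip(w, y))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        b1 = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        b0 = (sy - b1 * sx) / sw
    return b0, b1

x = list(range(10))
y = [2.0 * xi for xi in x]
y[5] = 40.0                      # one gross outlier in the outcome
b0, b1 = huber_irls(x, y)
```

On this toy data OLS gives slope ≈ 2.18 and intercept ≈ 2.18, pulled by the single outlier; the Huber fit stays essentially at slope 2 and intercept 0, mirroring the abstract's finding that outliers in cost data distorted the OLS estimate.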
Welch, C; Petersen, I; Walters, K; Morris, R W; Nazareth, I; Kalaitzaki, E; White, I R; Marston, L; Carpenter, J
2012-07-01
PURPOSE: In the UK, primary care databases include repeated measurements of health indicators at the individual level. As these databases encompass a large population, some individuals have extreme values, but some values may also be recorded incorrectly. The challenge for researchers is to distinguish between records that are due to incorrect recording and those which represent true but extreme values. This study evaluated different methods to identify outliers. METHODS: Ten percent of practices were selected at random to evaluate the recording of 513,367 height measurements. Population-level outliers were identified using boundaries defined from Health Survey for England data. Individual-level outliers were identified by fitting a random-effects model with subject-specific slopes for height measurements, adjusted for age and sex. Any height measurement with a patient-level standardised residual more extreme than ±10 was identified as an outlier and excluded. The model was subsequently refitted twice, after removing outliers at each stage. This method was compared with existing methods of removing outliers. RESULTS: Most outliers were identified at the population level using the boundaries defined from the Health Survey for England (1550 of 1643). Once these were removed from the database, fitting the random-effects model to the remaining data successfully identified only 75 further outliers. This method was more efficient at identifying true outliers than existing methods. CONCLUSIONS: We propose a new two-stage approach to identifying outliers in longitudinal data and show that it can successfully identify outliers at both the population and individual level. Copyright © 2011 John Wiley & Sons, Ltd.
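The two-stage idea can be sketched as below. Note that the paper's ±10 threshold applies to residuals from a random-effects model adjusted for age and sex; the plain per-subject standardization here is a stand-in and needs a much smaller illustrative threshold.

```python
from statistics import mean, stdev

def two_stage_outliers(records, lo, hi, z_thresh):
    # records: list of (subject_id, value) repeated measurements.
    # Stage 1: drop values outside externally defined population bounds.
    stage1 = [(s, v) for s, v in records if lo <= v <= hi]
    by_subj = {}
    for s, v in stage1:
        by_subj.setdefault(s, []).append(v)
    # Stage 2: flag values with extreme within-subject standardized residuals.
    flagged = []
    for s, v in stage1:
        vals = by_subj[s]
        if len(vals) < 3:
            continue
        m, sd = mean(vals), stdev(vals)
        if sd > 0 and abs(v - m) / sd > z_thresh:
            flagged.append((s, v))
    return stage1, flagged

records = [("a", 170.0), ("a", 171.0), ("a", 169.0), ("a", 172.0), ("a", 250.0),
           ("b", 160.0), ("b", 161.0), ("b", 159.0), ("b", 162.0), ("b", 178.0)]
kept, flagged = two_stage_outliers(records, lo=100.0, hi=230.0, z_thresh=1.5)
```

Subject a's 250 cm entry is caught by the population bounds alone, while subject b's 178 cm reading is plausible at population level and only stands out against that subject's own history, which is exactly the division of labour the abstract reports.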
Paul, Jijo, E-mail: jijopaul1980@gmail.com [Department of Diagnostic Radiology, Goethe University Hospital, Theodor-Stern-Kai 7, 60590 Frankfurt am Main (Germany); Department of Biophysics, Goethe University, Max von Laue-Str.1, 60438 Frankfurt am Main (Germany); Bauer, Ralf W. [Department of Diagnostic Radiology, Goethe University Hospital, Theodor-Stern-Kai 7, 60590 Frankfurt am Main (Germany); Maentele, Werner [Department of Biophysics, Goethe University, Max von Laue-Str.1, 60438 Frankfurt am Main (Germany); Vogl, Thomas J. [Department of Diagnostic Radiology, Goethe University Hospital, Theodor-Stern-Kai 7, 60590 Frankfurt am Main (Germany)
2011-11-15
Objective: The purpose of this study was to evaluate image fusion in dual-energy computed tomography for detecting various anatomic structures, based on the effect on contrast enhancement, contrast-to-noise ratio (CNR), signal-to-noise ratio (SNR) and image quality. Material and methods: Forty patients underwent a CT of the neck in dual-energy mode (DECT) on a Somatom Definition Flash dual-source CT scanner (Siemens, Forchheim, Germany). Tube voltage: 80 kV and Sn140 kV; tube current: 110 and 290 mA s; collimation: 2 x 32 x 0.6 mm. Raw data were reconstructed using a soft convolution kernel (D30f). Fused images were calculated using a spectrum of weighting factors (0.0, 0.3, 0.6, 0.8 and 1.0) generating different ratios between the 80- and Sn140-kV images (e.g. factor 0.6 corresponds to 60% of the information from the 80-kV image and 40% from the Sn140-kV image). CT values and SNRs were measured in the ascending aorta, thyroid gland, fat, muscle, CSF, spinal cord, bone marrow and brain. In addition, CNR values were calculated for the aorta, thyroid, muscle and brain. Subjective image quality was evaluated using a 5-point grading scale. Results were compared using paired t-tests and the nonparametric paired Wilcoxon-Wilcox test. Results: Statistically significant increases in mean CT values were noted in anatomic structures when increasing weighting factors were used (all P ≤ 0.001). For example, mean CT values derived from the contrast-enhanced aorta were 149.2 ± 12.8 Hounsfield units (HU), 204.8 ± 14.4 HU, 267.5 ± 18.6 HU, 311.9 ± 22.3 HU, and 347.3 ± 24.7 HU when the weighting factors 0.0, 0.3, 0.6, 0.8 and 1.0 were used. The highest SNR and CNR values were found when the weighting factor 0.6 was used. The difference in CNR between the weighting factors 0.6 and 0.3 was statistically significant in the contrast-enhanced aorta and thyroid gland (P = 0.012 and P = 0.016, respectively). Visual assessment of image quality showed the highest score for the data reconstructed using the
Zhang, G; Brown, E W; Hammack, T S
2013-11-01
Salmonella enterica ssp. enterica serovar Enteritidis is the leading reported cause of Salmonella infections. Most Salmonella Enteritidis infections are associated with whole shell eggs and egg products. This project attempted to lay the foundation for improving the Food and Drug Administration's current Bacteriological Analytical Manual method for the detection of Salmonella Enteritidis in shell eggs. Two Salmonella Enteritidis isolates were used for comparisons among different preenrichment and enrichment media and for the evaluation of egg:preenrichment broth ratios for the detection of Salmonella Enteritidis in shell eggs. The effect of surface disinfection on the detection of Salmonella Enteritidis in shell eggs was also investigated. The results indicated that tryptic soy broth (TSB) was similar to TSB plus ferrous sulfate, but significantly (α = 0.05) better than nutrient broth, Universal Preenrichment broth, and buffered peptone water when used for preenrichment of Salmonella in shell eggs. Salmonella Enteritidis populations after enrichment with Rappaport-Vassiliadis broth were 0.40 to 1.11 log cfu/mL of culture lower than those in preenrichment cultures. The reduction was statistically significant (α = 0.05). Egg:broth ratios at 1:9 and 1:2 produced significantly (α = 0.05) higher Salmonella Enteritidis populations after preenrichment with TSB with inoculum levels at 4 cfu/100 g of eggs and 40 cfu/1,000 g of eggs than the ratio at 1:1. Salmonella Enteritidis populations in TSB preenrichment cultures of shell eggs surface-disinfected with 70% alcohol:iodine/potassium iodide solution and untreated control were 9.11 ± 0.11 and 9.18 ± 0.05 log cfu/mL, respectively, for SE 13-2, and 9.20 ± 0.04 and 9.16 ± 0.05 log cfu/mL, respectively, for SE CDC_2010K_1543. Surface disinfection of eggs did not reduce the sensitivity of detection of Salmonella Enteritidis in liquid eggs. These results could improve the Food and Drug Administration's current
Coenen, Adriaan; Lubbers, Marisa M.; Dedic, Admir; Chelu, Raluca G.; Geuns, Robert-Jan M. van; Nieman, Koen [Erasmus University Medical Center, Department of Radiology, Rotterdam (Netherlands); Erasmus University Medical Center, Department of Cardiology, Rotterdam (Netherlands); Kurata, Akira; Kono, Atsushi; Dijkshoorn, Marcel L. [Erasmus University Medical Center, Department of Radiology, Rotterdam (Netherlands); Rossi, Alexia [Erasmus University Medical Center, Department of Radiology, Rotterdam (Netherlands); Barts Health NHS Trust, NIHR Cardiovascular Biomedical Research Unit at Barts, William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London and Department of Cardiology, London (United Kingdom)
2017-06-15
To investigate the additional value of transmural perfusion ratio (TPR) in dynamic CT myocardial perfusion imaging for detection of haemodynamically significant coronary artery disease compared with fractional flow reserve (FFR). Subjects with suspected or known coronary artery disease were prospectively included and underwent a CT-MPI examination. From the CT-MPI time-point data absolute myocardial blood flow (MBF) values were temporally resolved using a hybrid deconvolution model. An absolute MBF value was measured in the suspected perfusion defect. TPR was defined as the ratio between the subendocardial and subepicardial MBF. TPR and MBF results were compared with invasive FFR using a threshold of 0.80. Forty-three patients and 94 territories were analysed. The area under the receiver operator curve was larger for MBF (0.78) compared with TPR (0.65, P = 0.026). No significant differences were found in diagnostic classification between MBF and TPR with a territory-based accuracy of 77 % (67-86 %) for MBF compared with 70 % (60-81 %) for TPR. Combined MBF and TPR classification did not improve the diagnostic classification. Dynamic CT-MPI-based transmural perfusion ratio predicts haemodynamically significant coronary artery disease. However, diagnostic performance of dynamic CT-MPI-derived TPR is inferior to quantified MBF and has limited incremental value. (orig.)
Liu, Yan; Wu, Amery D.; Zumbo, Bruno D.
2010-01-01
In a recent Monte Carlo simulation study, Liu and Zumbo showed that outliers can severely inflate the estimates of Cronbach's coefficient alpha for continuous item response data--visual analogue response format. Little, however, is known about the effect of outliers for ordinal item response data--also commonly referred to as Likert, Likert-type,…
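The inflation mechanism is easy to reproduce: a respondent who is extreme on every item adds shared variance across items, which coefficient alpha reads as internal consistency. The toy data are ours, not the simulation conditions of Liu and Zumbo.

```python
from statistics import variance

def cronbach_alpha(items):
    # items: one list of scores per item, respondents in the same order
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# three nearly uncorrelated items -> alpha close to zero
base = [[1, 2, 3, 4, 5], [3, 1, 4, 2, 5], [2, 4, 1, 5, 3]]
# the same items plus one respondent scoring far off-scale on every item
contaminated = [item + [50] for item in base]
```

Here alpha rises from about 0.18 for `base` to nearly 1.0 for `contaminated`: a single outlying respondent manufactures almost perfect apparent reliability.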
Liu, Yan; Zumbo, Bruno D.; Wu, Amery D.
2012-01-01
Previous studies have rarely examined the impact of outliers on the decisions about the number of factors to extract in an exploratory factor analysis. The few studies that have investigated this issue have arrived at contradictory conclusions regarding whether outliers inflated or deflated the number of factors extracted. By systematically…
Waldo, Stephen W; McCabe, James M; Kennedy, Kevin F; Zigler, Corwin M; Pinto, Duane S; Yeh, Robert W
2017-05-16
Public reporting of percutaneous coronary intervention (PCI) outcomes may create disincentives for physicians to provide care for critically ill patients, particularly at institutions with worse clinical outcomes. We therefore sought to evaluate the procedural management and in-hospital outcomes of patients treated for acute myocardial infarction before and after a hospital had been publicly identified as a negative outlier. Using state reports, we identified hospitals that were recognized as negative PCI outliers in 2 states (Massachusetts and New York) from 2002 to 2012. State hospitalization files were used to identify all patients with an acute myocardial infarction within these states. Procedural management and in-hospital outcomes were compared among patients treated at outlier hospitals before and after public report of outlier status. Patients at nonoutlier institutions were used to control for temporal trends. Among 86 hospitals, 31 were reported as outliers for excess mortality. Outlier facilities were larger, treating more patients with acute myocardial infarction and performing more PCIs than nonoutlier hospitals. The likelihood of PCI at outlier (relative risk [RR], 1.13; 95% confidence interval [CI], 1.12-1.15) and nonoutlier institutions (RR, 1.13; 95% CI, 1.11-1.14) increased in a similar fashion (interaction P=0.50) after public report of outlier status. The likelihood of in-hospital mortality decreased at outlier institutions (RR, 0.83; 95% CI, 0.81-0.85) after public report, and to a lesser degree at nonoutlier institutions (RR, 0.90; 95% CI, 0.87-0.92). In-hospital mortality decreased further at outlier institutions after public recognition of outlier status in comparison with the prior period (RR, 0.72; 95% CI, 0.66-0.79), a decline that exceeded the reduction at nonoutlier institutions (RR, 0.87; 95% CI, 0.80-0.96). The rates of percutaneous revascularization increased similarly at outlier and nonoutlier institutions after public report of outlier status.
Wang, Jingzhu; Wu, Moutian; Liu, Xin; Xu, Youxuan
2011-12-20
Androstenedione (4-androstene-3,17-dione) is banned by the World Anti-Doping Agency (WADA) as an endogenous steroid. The official method to confirm androstenedione abuse is isotope ratio mass spectrometry (IRMS). According to the guidance published by WADA, atypical steroid profiles are required to trigger IRMS analysis. However, in some situations, steroid profile parameters are not effective enough to suspect the misuse of endogenous steroids. The aim of this study was to investigate the atypical steroid profile induced by androstenedione administration and the detection of androstenedione doping using IRMS. Ingestion of androstenedione resulted in changes in urinary steroid profile, including increased concentrations of androsterone (An), etiocholanolone (Etio), 5α-androstane-3α,17β-diol (5α-diol), and 5β-androstane-3α,17β-diol (5β-diol) in all of the subjects. Nevertheless, the testosterone/epitestosterone (T/E) ratio was elevated only in some of the subjects. The rapid increases in the concentrations of An and Etio, as well as in T/E ratio for some subjects could provide indicators for initiating IRMS analysis only for a short time period, 2-22h post-administration. However, IRMS could provide positive determinations for up to 55h post-administration. This study demonstrated that, 5β-diol concentration or Etio/An ratio could be utilized as useful indicators for initiating IRMS analysis during 2-36h post-administration. Lastly, Etio, with slower clearance, could be more effectively used than An for the confirmation of androstenedione doping using IRMS.
Yoh, Kousei; Kuwabara, Akiko; Tanaka, Kiyoshi
2014-09-01
Vertebral fracture (VFx) is associated with various comorbidities and increased mortality. In this paper, we studied the value of height loss for detecting VFx using two indices: historical height loss (HHL), the difference between the maximal height and the current height (CH), and the CH/knee height (KH) ratio. One hundred and fifty-one postmenopausal women visiting the outpatient clinic of orthopaedics were studied for their CH, self-reported maximal height, KH, and radiographically diagnosed number of VFx. VFx was present in 41.1 % of the subjects. Multiple regression analyses revealed that the number of prevalent fractures was a significant predictor of HHL and the CH/KH ratio. Receiver operating characteristic curve analysis showed that for HHL, the area under the curve (AUC), with 95 % CI in parentheses, was 0.84 (0.77, 0.90), 0.88 (0.83, 0.94), and 0.91 (0.86, 0.96) for ≥ 1, ≥ 2, and ≥ 3 fractures, respectively. For the presence of ≥ 1 VFx, the cut-off value was 4.0 cm (specificity 79 %; sensitivity 79 %). Regarding the CH/KH ratio, the AUC was 0.73 (0.65, 0.82), 0.85 (0.78, 0.93), and 0.91 (0.86, 0.96) for ≥ 1, ≥ 2, and ≥ 3 fractures, respectively. For the presence of ≥ 1 VFx, the cut-off value was 3.3 (specificity 47 %; sensitivity 91 %). Both cut-off values, for HHL and the CH/KH ratio, had high negative predictive value across a wide range of theoretical VFx prevalence. Thus, HHL and CH/KH were both good detectors of VFx. Our data can serve as the basis for determining the cut-off value for screening or case finding of subjects with VFx.
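At a fixed cut-off, the reported sensitivity/specificity pairs reduce to simple counts; the helper and the HHL values below are hypothetical, not the study's data.

```python
def sens_spec(values, labels, cutoff):
    # flag value >= cutoff as positive; labels: 1 = fracture, 0 = none
    tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
    fn = sum(1 for v, y in zip(values, labels) if y == 1 and v < cutoff)
    tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
    fp = sum(1 for v, y in zip(values, labels) if y == 0 and v >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# hypothetical historical height loss (cm) and radiographic fracture status
hhl = [5.2, 4.1, 3.0, 1.0, 6.3, 2.2]
fx = [1, 0, 1, 0, 1, 0]
sens, spec = sens_spec(hhl, fx, cutoff=4.0)
```

Sweeping the cut-off and plotting sensitivity against 1 - specificity traces the ROC curve whose area (AUC) the abstract reports.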
The effects of additive outliers on tests for unit roots and cointegration
Ph.H.B.F. Franses (Philip Hans); N. Haldrup (Niels)
1994-01-01
The properties of the univariate Dickey-Fuller test and the Johansen test for the cointegrating rank when there exist additive outlying observations in the time series are examined. The analysis provides analytical as well as numerical evidence that additive outliers may produce spurious
The impact of a surgical assessment unit on numbers of general surgery outliers.
Jacobson, Alexandra; Poole, Garth; Hill, Andrew G; Biggar, Magdalena
2016-12-02
Patient care and efficiency outcomes are improved if acute patients admitted to non-specialty (outlier) wards are minimised [1]. Assessment units may help to reduce numbers of outlier patients [2]. A surgical assessment unit (SAU) was recently established at Middlemore Hospital. We aimed to determine the impact of its introduction on numbers of general surgery outliers on post-acute ward rounds. A 10-bed SAU was introduced in July 2015, coinciding with the closure of 20 beds on the general surgical wards. The numbers and locations of patients on post-acute ward rounds before and after the establishment of the SAU were compared, using a two-tailed Student t-test for statistical comparisons. There were fewer general surgery patients on outlier wards after the introduction of the SAU (mean 1.7 before vs 0.8 after, p=0.04). Despite a net reduction in general surgery beds and no change in the overall number of post-acute patients, the establishment of a SAU was associated with a reduction in outliers.
Simulation MLE of Parameters of the Mixture Distribution in the Presence of Two Outliers
Einolah Deiri
2014-12-01
Full Text Available In the present paper, we deal with the estimation of the parameters of the Exponentiated Gamma (EG) distribution in the presence of multiple (r = 2) outliers. The maximum likelihood and moment estimators are derived. These estimators are compared empirically using Monte Carlo simulation when all the parameters are unknown; their bias and MSE are investigated with the help of numerical techniques.
Addawe, Rizavel C.; Addawe, Joel M.; Magadia, Joselito C.
2016-11-01
The Least Squares (LS), Least Median of Squares (LMdS), Reweighted Least Squares (RLS) and Trimmed Least Squares (TLS) estimators are used to obtain parameter estimates of AR models using the DE algorithm. The empirical study indicated that the RLS estimator is very reasonable because of its smaller root mean square error (RMSE), particularly for the Gaussian AR(1) process with unknown drift and additive outliers. Moreover, while LS performs well on shorter processes with a lower percentage and smaller magnitude of additive outliers (AOs), RLS and TLS compare favorably with LS for longer AR processes. Thus, this study recommends the Reweighted Least Squares estimator as an alternative to the LS estimator in the case of autoregressive processes with additive outliers. The experiment also demonstrates that the Differential Evolution (DE) algorithm obtains optimal solutions for fitting first-order autoregressive processes with outliers using these estimators. At the request of all authors of the paper, and with the agreement of the Proceedings Editor, an updated version of this article was published on 15 December 2016. The original version supplied to AIP Publishing contained errors in some of the mathematical equations and in Table 2. The errors have been corrected in the updated and re-published article.
Estimation of the scale parameter of gamma model in presence of outlier observations
M. E. Ghitany
1990-01-01
Full Text Available This paper considers the Bayesian point estimation of the scale parameter for a two-parameter gamma life-testing model in the presence of several outlier observations in the data. The Bayesian analysis is carried out under the assumption of a squared-error loss function and fixed or random shape parameter.
Development and Impact of Implementing FY91 (Version 8) Champus DRG weights and Outlier Criteria,
1992-05-20
[OCR fragment of a table of CHAMPUS DRG case weights with outlier (trim point) criteria; legible entries include O.R. procedures for obesity (weight 1.7266), parathyroid procedures (0.8712), thyroid procedures (0.7487), and laparoscopy & incisional tubal procedures.]
Frutiger, Jerome; Abildskov, Jens; Sin, Gürkan
2015-01-01
Flammability data are needed to assess the risk of fires and explosions. This study presents a new group contribution (GC) model to predict the upper flammability limit (UFL) of organic chemicals. Furthermore, it provides a systematic method for outlier treatment in order to improve the parameter
Milano, Ilaria; Babbucci, Massimiliano; Cariani, Alessia; Atanassova, Miroslava; Bekkevold, Dorte; Carvalho, Gary R; Espiñeira, Montserrat; Fiorentino, Fabio; Garofalo, Germana; Geffen, Audrey J; Hansen, Jakob H; Helyar, Sarah J; Nielsen, Einar E; Ogden, Rob; Patarnello, Tomaso; Stagioni, Marco; Tinti, Fausto; Bargelloni, Luca
2014-01-01
Shallow population structure is generally reported for most marine fish and explained as a consequence of high dispersal, connectivity and large population size. Targeted gene analyses, and more recently genome-wide studies, have challenged this view, suggesting that adaptive divergence might occur even when neutral markers indicate genetic homogeneity across populations. Here, 381 SNPs located in transcribed regions were used to assess large- and fine-scale population structure in the European hake (Merluccius merluccius), a widely distributed demersal species of high priority for the European fishery. Analysis of 850 individuals from 19 locations across the entire distribution range showed evidence for several outlier loci, with significantly higher resolving power. While 299 putatively neutral SNPs confirmed the genetic break between basins (F(CT) = 0.016) and weak differentiation within basins, outlier loci revealed a dramatic divergence between Atlantic and Mediterranean populations (F(CT) range 0.275-0.705) and fine-scale significant population structure. Outlier loci separated North Sea and Northern Portugal populations from all other Atlantic samples and revealed a strong differentiation among Western, Central and Eastern Mediterranean geographical samples. Significant correlation of allele frequencies at outlier loci with sea surface temperature and salinity supported the hypothesis that populations might be adapted to local conditions. Such evidence highlights the importance of integrating information from neutral and adaptive evolutionary patterns towards a better assessment of genetic diversity. Accordingly, the generated outlier SNP data could be used for tackling illegal practices in hake fishing and commercialization, as well as to develop explicit spatial models for defining management units and stock boundaries.
A Novel EKF-SLAM Algorithm Against Outlier Disturbance
吕太之
2012-01-01
There is not only sensor noise but also outlier disturbance when a robot explores unknown environments. The traditional EKF-SLAM algorithm does not consider the impact of outlier disturbance, which may lead to positioning failure. The new algorithm detects outlier disturbance by comparing two successive observation results in polar coordinates. When a disturbance is detected, the covariance is inflated, so that the uncertainty of the system state is expanded and the state quickly converges to the true value. Simulation results show that the proposed algorithm is better than traditional EKF-SLAM in both estimation accuracy and robustness of mobile robot SLAM.
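The detect-then-inflate step can be illustrated on a one-dimensional filter: an observation whose innovation fails a gating test is treated as external disturbance, the update is skipped, and the state covariance is inflated so that later consistent observations pull the state back quickly. This is a simplified scalar sketch of the idea, not the paper's full EKF-SLAM formulation, and the gate and inflation factors are assumptions.

```python
def filter_with_inflation(ys, q=0.01, r=0.25, gate=3.0, inflate=50.0):
    # scalar Kalman filter with innovation gating and covariance inflation
    x, p = 0.0, 1.0
    out = []
    for y in ys:
        p += q                        # predict (process noise q)
        s = p + r                     # innovation variance
        innov = y - x
        if innov * innov > gate * gate * s:
            # disturbance detected: grow uncertainty instead of updating,
            # so the state is not dragged away and re-converges fast
            p *= inflate
        else:
            k = p / s
            x += k * innov
            p *= 1 - k
        out.append(x)
    return out

ys = [1.0] * 5 + [9.0] + [1.0] * 5    # one externally disturbed observation
track = filter_with_inflation(ys)
```

At the disturbed step the estimate is left untouched; a plain Kalman update would have been dragged toward 9 and then decayed back over several steps.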
Liu, Yan; Zumbo, Bruno D.
2012-01-01
There is a lack of research on the effects of outliers on the decisions about the number of factors to retain in an exploratory factor analysis, especially for outliers arising from unintended and unknowingly included subpopulations. The purpose of the present research was to investigate how outliers from an unintended and unknowingly included…
Mullaney, J R; Huynh, M; Goulding, A D; Frayer, D
2009-01-01
(Abridged) We investigate the far-infrared properties of X-ray sources detected in the Chandra Deep Field-South (CDF-S) survey using the ultra-deep 70um and 24um Spitzer observations taken in this field. We rely on stacking analyses of the 70um data to characterise the average 70um properties of the X-ray sources. Using Spitzer-IRS data of the Swift-BAT sample of z~0 active galactic nuclei (hereafter, AGNs), we show that the 70um/24um flux ratio can distinguish between AGN-dominated and starburst-dominated systems out to z~1.5. From stacking analysis we find that both high redshift and z~0 AGNs follow the same tendency toward warmer 70um/24um colours with increasing X-ray luminosity (L_X). We also show that the 70um flux can be used to determine the infrared (8-1000um) luminosities of high redshift AGNs. We use this information to show that L_X=10^{42-43} erg/s AGNs at high redshifts (z=1-2) have infrared to X-ray luminosity ratios (hereafter, L_IR/L_X) that are, on average, 4.7_{-2.0}^{+10.2} and 12.7^{+7.1}...
Boissière, Louis; Bourghli, Anouar; Vital, Jean-Marc; Gille, Olivier; Obeid, Ibrahim
2013-06-01
Sagittal malalignment is frequently observed in adult scoliosis. The C7 plumb line, lumbar lordosis and pelvic tilt are the main factors used to evaluate sagittal balance and the need for a vertebral osteotomy to correct it. We describe a ratio, the lumbar lordosis index (LLI: lumbar lordosis/pelvic incidence), and analyze its relationship with spinal malalignment and vertebral osteotomies. 53 consecutive patients undergoing surgery for adult scoliosis had preoperative and postoperative full-spine EOS radiographs to measure spino-pelvic parameters and LLI. The lack of lordosis was calculated after prediction of the theoretical lumbar lordosis. Correlation analysis between the different parameters was performed. All parameters were correlated with spinal malalignment, but LLI was the most strongly correlated (r = -0.978). It was also the best parameter in this study for predicting the need for a spinal osteotomy (r = 1 if LLI <0.5). LLI is a statistically validated parameter for sagittal malalignment analysis. It can be used as a mathematical tool to detect spinal malalignment in adult scoliosis and to guide the surgeon's decision to perform a vertebral osteotomy for sagittal correction, as well as for the interpretation of clinical series in adult scoliosis.
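The LLI ratio and the reported decision threshold reduce to simple arithmetic; the function names below are illustrative, not from the paper.

```python
def lumbar_lordosis_index(lumbar_lordosis_deg, pelvic_incidence_deg):
    """LLI = lumbar lordosis / pelvic incidence (both in degrees)."""
    return lumbar_lordosis_deg / pelvic_incidence_deg

def osteotomy_indicated(lli, threshold=0.5):
    """In the study, LLI < 0.5 predicted the need for a vertebral osteotomy."""
    return lli < threshold
```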
One Class Classification for Anomaly Detection: Support Vector Data Description Revisited
Pauwels, E.J.; Ambekar, O.; Perner, P.
2011-01-01
The Support Vector Data Description (SVDD) has been introduced to address the problem of anomaly (or outlier) detection. It essentially fits the smallest possible sphere around the given data points, allowing some points to be excluded as outliers. Whether or not a point is excluded is governed by
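The sphere-fitting idea can be sketched minimally as follows, under a strong simplifying assumption: the centre is placed at the data mean rather than optimised as in true SVDD, and the radius is chosen so that roughly a fraction `nu` of training points fall outside. All names here are illustrative.

```python
import numpy as np

def simple_sphere_description(X, nu=0.05):
    """Simplified stand-in for SVDD: centre at the data mean, radius set
    so that about a fraction `nu` of training points lie outside it."""
    center = X.mean(axis=0)
    d = np.linalg.norm(X - center, axis=1)
    radius = np.quantile(d, 1.0 - nu)
    return center, radius

def is_outlier(x, center, radius):
    """A point outside the sphere is flagged as an outlier."""
    return np.linalg.norm(x - center) > radius
```

True SVDD additionally optimises the centre and supports kernels, which lets the boundary wrap non-spherical data.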
Anomalous human behavior detection: An Adaptive approach
Leeuwen, C. van; Halma, A.; Schutte, K.
2013-01-01
Detection of anomalies (outliers or abnormal instances) is an important element in a range of applications such as fault, fraud and suspicious behavior detection and knowledge discovery. In this article we propose a new method for anomaly detection and test its ability to detect anomalous
Weller, Wendy E; Kabra, Garima; Cozzens, Kimberly S; Hannan, Edward L
2010-04-01
The major objective of this study was to determine whether individual hospital performance would be assessed differently if clinical data were added to an administrative dataset. Patients in the 2004 New York State AMI Registry (AMI registry) who could be matched to patients in the New York State Hospital Discharge Database (SPARCS model) comprised the study sample (n=3153). Stepwise logistic regression models were developed (SPARCS model, SPARCS/AMI registry model). Risk-adjusted mortality rates (RAMR) for each hospital in the matched dataset were determined and compared for the SPARCS and the SPARCS/AMI registry model. The RAMR for each hospital was determined by dividing its observed mortality rate by its expected mortality rate and multiplying by the overall mortality rate for the state of New York. Hospitals were considered outliers if they had a RAMR significantly higher or lower than the overall statewide mortality rate. Hierarchical Models were also used to identify hospital outliers. The SPARCS logistic model identified two high hospital outliers; the SPARCS/AMI registry model identified one of those outliers and no others. When Hierarchical Models were used, the SPARCS model also identified two high outliers (one in common with the logistic model) and the SPARCS/AMI registry model identified one high outlier (the same as identified in the logistic model). It is worth exploring the impact of the addition of a small number of clinical data elements to administrative datasets on hospital outlier status. Copyright 2008 Elsevier Ireland Ltd. All rights reserved.
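The RAMR computation described above is a one-line formula; the function name below is illustrative.

```python
def risk_adjusted_mortality_rate(observed_rate, expected_rate, statewide_rate):
    """RAMR = (observed / expected) * overall statewide mortality rate,
    as described in the study above."""
    return observed_rate / expected_rate * statewide_rate
```

For example, a hospital observing twice its expected mortality gets a RAMR of twice the statewide rate; outlier status then depends on whether that RAMR differs significantly from the statewide rate.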
de Jong, Felice A; Beecher, Chris
2012-09-01
Metabolomics or biochemical profiling is a fast emerging science; however, there are still many associated bottlenecks to overcome before measurements will be considered robust. Advances in MS resolution and sensitivity, ultra pressure LC-MS, ESI, and isotopic approaches such as flux analysis and stable-isotope dilution, have made it easier to quantitate biochemicals. The digitization of mass spectrometers has simplified informatic aspects. However, issues of analytical variability, ion suppression and metabolite identification still plague metabolomics investigators. These hurdles need to be overcome for accurate metabolite quantitation not only for in vitro systems, but for complex matrices such as biofluids and tissues, before it is possible to routinely identify biomarkers that are associated with the early prediction and diagnosis of diseases. In this report, we describe a novel isotopic-labeling method that uses the creation of distinct biochemical signatures to eliminate current bottlenecks and enable accurate metabolic profiling.
Ghani, Muhammad U.; Wong, Molly D.; Wu, Di; Zheng, Bin; Fajardo, Laurie L.; Yan, Aimin; Fuh, Janis; Wu, Xizeng; Liu, Hong
2017-05-01
The objective of this study was to demonstrate the potential benefits of using high energy x-rays in comparison with the conventional mammography imaging systems for phase sensitive imaging of breast tissues with varying glandular-adipose ratios. This study employed two modular phantoms simulating the glandular (G) and adipose (A) breast tissue composition in 50 G-50 A and 70 G-30 A percentage densities. Each phantom had a thickness of 5 cm with a contrast detail test pattern embedded in the middle. For both phantoms, the phase contrast images were acquired using a micro-focus x-ray source operated at 120 kVp and 4.5 mAs, with a magnification factor (M) of 2.5 and a detector with a 50 µm pixel pitch. The mean glandular dose delivered to the 50 G-50 A and 70 G-30 A phantom sets were 1.33 and 1.3 mGy, respectively. A phase retrieval algorithm based on the phase attenuation duality that required only a single phase contrast image was applied. Conventional low energy mammography images were acquired using GE Senographe DS and Hologic Selenia systems utilizing their automatic exposure control (AEC) settings. In addition, the automatic contrast mode (CNT) was also used for the acquisition with the GE system. The AEC mode applied higher dose settings for the 70 G-30 A phantom set. As compared to the phase contrast images, the dose levels for the AEC mode acquired images were similar while the dose levels for the CNT mode were almost double. The observer study, contrast-to-noise ratio and figure of merit comparisons indicated a large improvement with the phase retrieved images in comparison to the AEC mode images acquired with the clinical systems for both density levels. As the glandular composition increased, the detectability of smaller discs decreased with the clinical systems, particularly with the GE system, even at higher dose settings. As compared to the CNT mode (double dose) images, the observer study also indicated that the phase retrieved images provided
An analysis of oil production by OPEC countries: Persistence, breaks, and outliers
Pestana Barros, Carlos, E-mail: cbarros@iseg.utl.p [Instituto Superior de Economia e Gestao and Research Unit on Complexity and Economics, Technical University of Lisbon, Lisbon (Portugal); Gil-Alana, Luis A., E-mail: alana@unav.e [University of Navarra, Pamplona (Spain); Payne, James E., E-mail: jepayne@ilstu.ed [Department of Economics, Illinois State University, Normal, IL 61790-4200 (United States)
2011-01-15
This study examines the time series behaviour of oil production for OPEC member countries within a fractional integration modelling framework recognizing the potential for structural breaks and outliers. The analysis is undertaken using monthly data from January 1973 to October 2008 for 13 OPEC member countries. The results indicate there is mean reverting persistence in oil production, with breaks identified in 10 out of the 13 countries examined. Thus, shocks affecting the structure of OPEC oil production will have persistent effects in the long run for all countries, and in some cases the effects are expected to be permanent. - Research Highlights: → Mean reverting persistence in oil production with breaks identified in 10 out of the 13 countries examined. → Standard analysis based on cointegration techniques and involving oil production should be examined in the more general context of fractional cointegration. → Analysis of outliers did not alter the main conclusions of the study.
A COMPARISON BETWEEN CLASSICAL AND ROBUST METHOD IN A FACTORIAL DESIGN IN THE PRESENCE OF OUTLIER
Anwar Fitrianto
2013-01-01
Full Text Available Analysis of Variance (ANOVA) techniques based on the classical Least Squares (LS) method require several assumptions, such as normality, constant variance and independence. These assumptions can be violated for several reasons, such as the presence of an outlying observation. There is ample evidence in the literature that the LS estimate is easily affected by outliers. To remedy this problem, a robust procedure that provides estimation, inference and testing that are not influenced by outlying observations is put forward. A well-known approach to handling datasets with outliers is M-estimation. In this study, both classical and robust procedures are applied to data from a factorial experiment. The results show that, in the presence of an outlier, the classical least squares estimates, unlike the robust estimates, lead to misleading conclusions in factorial designs.
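M-estimation downweights observations with large residuals instead of letting them dominate the fit. A minimal, self-contained sketch of a Huber M-estimator of location is given below; the full robust factorial analysis would apply the same weighting within the ANOVA model, and the tuning constant c = 1.345 is a standard default chosen here for illustration.

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    mu = float(np.median(x))
    s = float(np.median(np.abs(x - mu))) / 0.6745  # robust scale from the MAD
    if s == 0.0:
        s = 1.0
    for _ in range(max_iter):
        r = (x - mu) / s
        absr = np.abs(r)
        w = np.ones_like(absr)
        big = absr > c
        w[big] = c / absr[big]  # Huber weights: downweight large residuals
        mu_new = float(np.sum(w * x) / np.sum(w))
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu
```

On data such as [1, 2, 3, 4, 100], the sample mean is dragged above 20 by the outlier, while the Huber estimate stays near 3, which is the kind of resistance the abstract describes.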
Wararit PANICHKITKOSOLKUL
2012-09-01
Full Text Available Guttman and Tiao [1], and Chang [2] showed that outliers may cause serious bias in estimating autocorrelations, partial correlations, and autoregressive moving average parameters (cited in Chang et al. [3]). This paper presents a modified weighted symmetric estimator for a Gaussian first-order autoregressive AR(1) model with additive outliers. We apply a recursive median adjustment based on an exponentially weighted moving average (EWMA) to the weighted symmetric estimator of Park and Fuller [4]. We consider the following estimators: the weighted symmetric estimator, the recursive mean adjusted weighted symmetric estimator proposed by Niwitpong [5], the recursive median adjusted weighted symmetric estimator proposed by Panichkitkosolkul [6], and the proposed weighted symmetric estimator using an adjusted recursive median based on the EWMA. Using Monte Carlo simulations, we compare the mean square error (MSE) of the estimators. Simulation results show that the proposed estimator provides a lower MSE than the other estimators in almost all situations.
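The EWMA underlying the recursive median adjustment is the simple recurrence s_t = λ·x_t + (1 − λ)·s_{t−1}. A minimal sketch follows; the smoothing constant λ = 0.2 is an illustrative choice, not the paper's.

```python
def ewma(x, lam=0.2):
    """Exponentially weighted moving average:
    s_t = lam * x_t + (1 - lam) * s_{t-1}, started at the first value."""
    s = x[0]
    out = [s]
    for v in x[1:]:
        s = lam * v + (1 - lam) * s
        out.append(s)
    return out
```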
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.
Mining Distance Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule
Bay, Stephen D.; Schwabacher, Mark
2003-01-01
Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
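The randomized nested loop with pruning can be sketched as follows. This is a simplified illustration of the idea, not the authors' exact code: a point's outlier score is its distance to its k-th nearest neighbour, and a candidate is abandoned as soon as its running k-NN distance falls below the weakest score in the current top list, since that running distance can only shrink.

```python
import random
import numpy as np

def top_outliers(X, k=3, n_out=2, seed=0):
    """Return indices of the n_out points with the largest k-NN distance,
    using random processing order and cutoff-based pruning."""
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)          # random order makes early pruning likely
    cutoff = 0.0                # weakest score in the current top list
    top = []                    # (score, index), kept sorted descending
    for i in order:
        neigh = []              # k smallest distances seen so far for point i
        pruned = False
        for j in order:
            if i == j:
                continue
            d = float(np.linalg.norm(X[i] - X[j]))
            neigh.append(d)
            neigh.sort()
            neigh = neigh[:k]
            if len(neigh) == k and neigh[-1] < cutoff:
                pruned = True   # running k-NN distance can only shrink,
                break           # so this point cannot enter the top list
        if not pruned and len(neigh) == k:
            top.append((neigh[-1], i))
            top.sort(reverse=True)
            top = top[:n_out]
            if len(top) == n_out:
                cutoff = top[-1][0]
    return [i for _, i in top]
```

Because most points are non-outliers, they are usually pruned after examining only a few neighbours, which is the source of the near-linear average-case behaviour the abstract reports.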
Hollander, J; Galindo, J; Butlin, R K
2015-02-01
A fundamental issue in speciation research is to evaluate phenotypic variation and the genomics driving the evolution of reproductive isolation between sister taxa. Above all, hybrid zones are excellent study systems for researchers to examine the association of genetic differentiation, phenotypic variation and the strength of selection. We investigated two contact zones in the marine gastropod Littorina saxatilis and utilized landmark-based geometric morphometric analysis together with amplified fragment length polymorphism (AFLP) markers to assess phenotypic and genomic divergence between ecotypes under divergent selection. From genetic markers, we calculated the cline width, linkage disequilibrium and the average effective selection on a locus. Additionally, we conducted an association analysis linking the outlier loci and phenotypic variation between ecotypes and show that a proportion of outlier loci are associated with key adaptive phenotypic traits.
On minorities and outliers: The case for making Big Data small
Brooke Foucault Welles
2014-07-01
Full Text Available In this essay, I make the case for choosing to examine small subsets of Big Data datasets—making big data small. Big Data allows us to produce summaries of human behavior at a scale never before possible. But in the push to produce these summaries, we risk losing sight of a secondary but equally important advantage of Big Data—the plentiful representation of minorities. Women, minorities and statistical outliers have historically been omitted from the scientific record, with problematic consequences. Big Data affords the opportunity to remedy those omissions. However, to do so, Big Data researchers must choose to examine very small subsets of otherwise large datasets. I encourage researchers to embrace an ethical, empirical and epistemological stance on Big Data that includes minorities and outliers as reference categories, rather than the exceptions to statistical norms.
Abundant Topological Outliers in Social Media Data and Their Effect on Spatial Analysis.
Westerholt, Rene; Steiger, Enrico; Resch, Bernd; Zipf, Alexander
2016-01-01
Twitter and related social media feeds have become valuable data sources to many fields of research. Numerous researchers have thereby used social media posts for spatial analysis, since many of them contain explicit geographic locations. However, despite its widespread use within applied research, a thorough understanding of the underlying spatial characteristics of these data is still lacking. In this paper, we investigate how topological outliers influence the outcomes of spatial analyses of social media data. These outliers appear when different users contribute heterogeneous information about different phenomena simultaneously from similar locations. As a consequence, various messages representing different spatial phenomena are captured closely to each other, and are at risk to be falsely related in a spatial analysis. Our results reveal indications for corresponding spurious effects when analyzing Twitter data. Further, we show how the outliers distort the range of outcomes of spatial analysis methods. This has significant influence on the power of spatial inferential techniques, and, more generally, on the validity and interpretability of spatial analysis results. We further investigate how the issues caused by topological outliers are composed in detail. We unveil that multiple disturbing effects are acting simultaneously and that these are related to the geographic scales of the involved overlapping patterns. Our results show that at some scale configurations, the disturbances added through overlap are more severe than at others. Further, their behavior turns into a volatile and almost chaotic fluctuation when the scales of the involved patterns become too different. Overall, our results highlight the critical importance of thoroughly considering the specific characteristics of social media data when analyzing them spatially.
Meghani, Salimah H; Byun, Eeeseung; Chittams, Jesse
Addressing the needs of understudied and vulnerable populations first and foremost necessitates the correct application and interpretation of research that is designed to understand sources of disparities in healthcare or health systems outcomes. In this brief research report, we discuss some important concerns and considerations in handling "outliers" when conducting disparities-related research. To illustrate these concerns, we use data from our recently completed study that investigated sources of disparities in cancer pain outcomes between African Americans and Whites with cancer-related pain. A choice-based conjoint (CBC) study was conducted to compare preferences for analgesic treatment for cancer pain between African Americans and Whites. Compared to Whites, African Americans were both disproportionately more likely to make pain treatment decisions based on analgesic side-effects and more likely to have extreme values for the CBC-elicited utilities for analgesic "side-effects." Our findings raise conceptual and methodological considerations in handling extreme values when conducting disparities-related research. Extreme values or outliers can be caused by random variation, measurement error, or true heterogeneity in a clinical phenomenon. The researchers should consider: 1) whether systematic patterns of extreme values exist and 2) whether systematic patterns of extreme values are consistent with a clinical pattern (e.g., poor management of cancer pain and side-effects in racial/ethnic subgroups, as documented by many previous studies). As may be evident, these considerations are particularly important in health disparities research, where extreme values may actually represent a clinical reality, such as unequal treatment or a disproportionate burden of symptoms in certain subgroups. Approaches to handling outliers, such as non-parametric analyses, log transforming clinically important extreme values, or removing outliers may represent a missed opportunity in
Sharifi, Mona; Marshall, Gareth; Goldman, Roberta; Rifas-Shiman, Sheryl L; Horan, Christine M; Koziol, Renata; Marshall, Richard; Sequist, Thomas D; Taveras, Elsie M
2014-01-01
New approaches for obesity prevention and management can be gleaned from positive outliers-that is, individuals who have succeeded in changing health behaviors and reducing their body mass index (BMI) in the context of adverse built and social environments. We explored perspectives and strategies of parents of positive outlier children living in high-risk neighborhoods. We collected up to 5 years of height/weight data from the electronic health records of 22,443 Massachusetts children, ages 6 to 12 years, seen for well-child care. We identified children with any history of BMI in the 95th percentile or higher (n = 4007) and generated a BMI z-score slope for each child using a linear mixed effects model. We recruited parents for focus groups from the subsample of children with negative slopes who also lived in zip codes where >15% of children were obese. We analyzed focus group transcripts using an immersion/crystallization approach. We reached thematic saturation after 5 focus groups with 41 parents. Commonly cited outcomes that mattered most to parents and motivated change were child inactivity, above-average clothing sizes, exercise intolerance, and negative peer interactions; few reported BMI as a motivator. Convergent strategies among positive outlier families were family-level changes, parent modeling, consistency, household rules/limits, and creativity in overcoming resistance. Parents voiced preferences for obesity interventions that include tailored education and support that extend outside clinical settings and are delivered by both health care professionals and successful peers. Successful strategies learned from positive outlier families can be generalized and tested to accelerate progress in reducing childhood obesity. Copyright © 2014 Academic Pediatric Association. Published by Elsevier Inc. All rights reserved.
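The study estimated each child's BMI z-score trend with a linear mixed-effects model. As a simplified stand-in, one can fit an ordinary least-squares slope per child, where a negative slope marks a positive outlier (improving BMI z-score); this is an assumption for illustration, not the paper's model.

```python
import numpy as np

def bmi_z_slope(ages, z_scores):
    """Per-child least-squares slope of BMI z-score over age (simplified
    stand-in for the paper's linear mixed-effects model)."""
    return float(np.polyfit(ages, z_scores, 1)[0])

def is_positive_outlier(ages, z_scores):
    """Negative slope = BMI z-score improving over time."""
    return bmi_z_slope(ages, z_scores) < 0
```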
On the identification of Dragon Kings among extreme-valued outliers
M. Riva
2013-07-01
Full Text Available Extreme values of earth, environmental, ecological, physical, biological, financial and other variables often form outliers to heavy tails of empirical frequency distributions. Quite commonly such tails are approximated by stretched exponential, log-normal or power functions. Recently there has been an interest in distinguishing between extreme-valued outliers that belong to the parent population of most data in a sample and those that do not. The first type, called Gray Swans by Nassim Nicholas Taleb (often confused in the literature with Taleb's totally unknowable Black Swans), is drawn from a known distribution of the tails which can thus be extrapolated beyond the range of sampled values. However, the magnitudes and/or space–time locations of unsampled Gray Swans cannot be foretold. The second type of extreme-valued outliers, termed Dragon Kings by Didier Sornette, may in his view be sometimes predicted based on how other data in the sample behave. This intriguing prospect has recently motivated some authors to propose statistical tests capable of identifying Dragon Kings in a given random sample. Here we apply three such tests to log air permeability data measured on the faces of a Berea sandstone block and to synthetic data generated in a manner statistically consistent with these measurements. We interpret the measurements to be, and generate synthetic data that are, samples from α-stable sub-Gaussian random fields subordinated to truncated fractional Gaussian noise (tfGn). All these data have frequency distributions characterized by power-law tails with extreme-valued outliers about the tail edges.
Farahi, Morteza; Rojas, Monica; Mañanas, Miguel Angel; Farina, Dario
2016-01-01
Knowledge of the location of muscle Innervation Zones (IZs) is important in many applications, e.g. for minimizing the quantity of injected botulinum toxin in the treatment of spasticity or for deciding on the type of episiotomy during child delivery. Surface EMG (sEMG) can be noninvasively recorded to assess physiological and morphological characteristics of contracting muscles. However, it is not always possible to record signals of high quality. Moreover, muscles can have multiple IZs, which should all be identified. We designed a fully automatic algorithm based on enhanced Graph-Cut image segmentation and morphological image processing to identify up to five IZs in 60-ms intervals of very-low to moderate quality sEMG signals detected with multi-channel electrodes (20 bipolar channels with an Inter-Electrode Distance (IED) of 5 mm). An anisotropic multilayered cylinder model was used to simulate 750 sEMG signals with signal-to-noise ratios ranging from -5 to 15 dB (using Gaussian noise), and each 60-ms signal frame included 1 to 5 IZs. The micro- and macro-averaged performance indices were then reported for the proposed IZ detection algorithm. In the micro-averaging procedure, the numbers of True Positives, False Positives and False Negatives in each frame were summed up to generate cumulative measures. In the macro-averaging, on the other hand, precision and recall were calculated for each frame and their averages used to determine the F1-score. Overall, the micro- (macro-) averaged sensitivity, precision and F1-score of the algorithm for IZ channel identification were 82.7% (87.5%), 92.9% (94.0%) and 87.5% (90.6%), respectively. For the correctly identified IZ locations, the average bias error was 0.02±0.10 IED ratio, and the average absolute conduction velocity estimation error was 0.41±0.40 m/s for such frames. A sensitivity analysis, including increasing the IED and reducing the interpolation coefficient for time samples, was also performed
The obligation of physicians to medical outliers: a Kantian and Hegelian synthesis
Marco Alan P
2004-06-01
Full Text Available Abstract Background Patients who present to medical practices without health insurance or with serious co-morbidities can become fiscal disasters to those who care for them. Their consumption of scarce resources has caused consternation among providers and institutions, especially as it concerns the amount and type of care they should receive. In fact, some providers may try to avoid caring for them altogether, or at least try to limit their institutional or practice exposure to them. Discussion We present a philosophical discourse, with emphasis on the writings of Immanuel Kant and G.W.F. Hegel, as to why physicians have the moral imperative to give such "outliers" considerate and thoughtful care. Outliers are defined, and the ideals of morality, responsibility, good will, duty, and principle are applied to the care of patients whose financial means are meager and to those whose care is physiologically futile. Actions of moral worth, unconditional good will, and doing what is right are examined. Summary Outliers are a legitimate economic concern to individual practitioners and institutions; however, this should not lead to an evasion of care. These patients should be identified early in their course of care, but such identification should be preceded by a well-planned recognition of this burden, and appropriate staffing and funding should be secured. A thoughtful team approach by medical practices and their institutions, involving both clinicians and non-clinicians, should be pursued.
Errors, Omissions, and Outliers in Hourly Vital Signs Measurements in Intensive Care.
Maslove, David M; Dubin, Joel A; Shrivats, Arvind; Lee, Joon
2016-11-01
To empirically examine the prevalence of errors, omissions, and outliers in hourly vital signs recorded in the ICU. Retrospective analysis of vital signs measurements from a large-scale clinical data warehouse (Multiparameter Intelligent Monitoring in Intensive Care III). Data were collected from the medical, surgical, cardiac, and cardiac surgery ICUs of a tertiary medical center in the United States. We analyzed data from approximately 48,000 ICU stays including approximately 28 million vital signs measurements. None. We used the vital sign day as our unit of measurement, defined as all the recordings from a single patient for a specific vital sign over a single 24-hour period. Approximately 30-40% of vital sign days included at least one gap of greater than 70 minutes between measurements. Between 3% and 10% of blood pressure measurements included logical inconsistencies. With the exception of pulse oximetry vital sign days, the readings in most vital sign days were normally distributed. We found that 15-38% of vital sign days contained at least one statistical outlier, of which 6-19% occurred simultaneously with outliers in other vital signs. We found a significant number of missing, erroneous, and outlying vital signs measurements in a large ICU database. Our results provide empirical evidence of the nonrepresentativeness of hourly vital signs. Additional studies should focus on determining optimal sampling frequencies for recording vital signs in the ICU.
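The abstract does not state the exact outlier and inconsistency criteria used. A plausible sketch applies the common Tukey IQR fence for statistical outliers and a basic logical check for blood pressure (systolic must exceed diastolic); all names and thresholds here are assumptions, not the study's definitions.

```python
import numpy as np

def flag_statistical_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [(v < lo or v > hi) for v in values]

def bp_inconsistent(systolic, diastolic):
    """A logical inconsistency: systolic pressure at or below diastolic."""
    return systolic <= diastolic
```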
Valley sign in Becker muscular dystrophy and outliers of Duchenne and Becker muscular dystrophy
Pradhan Sunil
2004-04-01
Full Text Available The valley sign has been described in patients with Duchenne muscular dystrophy (DMD). As there are genetic and clinical similarities between DMD and Becker muscular dystrophy (BMD), this clinical sign is evaluated in this study in BMD and in DMD/BMD outliers. To evaluate the sign, 28 patients with BMD, 8 DMD/BMD outliers and 44 age-matched male controls with other neuromuscular diseases were studied. The sign was examined after asking patients to abduct their arms to about 90° with hands directed upwards; the muscle bulk over the back of the shoulders was observed. The sign was considered positive if the infraspinatus and deltoid muscles were enlarged and, between these two muscles, the muscles forming the posterior axillary fold were wasted, as if there were a valley between two mounts. Twenty-five BMD patients and 7 DMD/BMD outliers had a positive valley sign, although it was less marked than in DMD. It was absent in all 44 controls. It was concluded that the presence of the valley sign may help in differentiating BMD from other progressive neuromuscular disorders of that age group.
Modeling of activation data in the BrainMapTM database: Detection of outliers
Nielsen, Finn Årup; Hansen, Lars Kai
2002-01-01
We describe a system for meta-analytical modeling of activation foci from functional neuroimaging studies. Our main vehicle is a set of density models in Talairach space capturing the distribution of activation foci in sets of experiments labeled by lobar anatomy. One important use of such density...
Observing the Unobservable - Distributed Online Outlier Detection in Wireless Sensor Networks
Zhang, Y.
2010-01-01
The generation of wireless sensor networks (WSNs) enables human beings to observe and reason about the physical environment better, more easily, and faster. Wireless sensor nodes equipped with sensing, processing, wireless communication and actuation capabilities can be densely deployed in a wide geographical area.
Outlier detection in healthcare fraud: A case study in the Medicaid dental domain
van Capelleveen, Guido Cornelis; Poel, Mannes; Mueller, Roland M.; Thornton, Dallas; van Hillegersberg, Jos
2016-01-01
Health care insurance fraud is a pressing problem, causing substantial and increasing costs in medical insurance programs. Due to the large volume of claims submitted, estimated at 5 billion per day, reviewing individual claims or providers is a difficult task. This encourages the employment of automated…
Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating
He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei
2013-01-01
Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
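A regression screen of the kind this abstract describes can be sketched as follows. Note this is an illustrative sketch, not the authors' exact procedure: the use of ordinary least squares on the common-item difficulty estimates and the |z| > 2 cutoff are assumptions.

```python
# Hedged sketch: regress new-form item difficulty estimates on old-form
# estimates for the common items, then flag items whose standardized
# residuals are large. The least-squares fit and the z-cutoff are
# illustrative assumptions, not the method's published specifics.
def flag_inconsistent_items(old_b, new_b, z_cut=2.0):
    """Return indices of common items with inconsistent parameter estimates."""
    n = len(old_b)
    mx = sum(old_b) / n
    my = sum(new_b) / n
    sxx = sum((x - mx) ** 2 for x in old_b)
    sxy = sum((x - mx) * (y - my) for x, y in zip(old_b, new_b))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [y - (slope * x + intercept) for x, y in zip(old_b, new_b)]
    s = (sum(r * r for r in resid) / (n - 2)) ** 0.5  # residual standard error
    if s == 0:
        return []  # perfectly consistent estimates, nothing to flag
    return [i for i, r in enumerate(resid) if abs(r) / s > z_cut]
```

An item whose new-form estimate drifts far from the regression line of the other common items is flagged for review before equating.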
Predictors of High Profit and High Deficit Outliers under SwissDRG of a Tertiary Care Center
Mehra, Tarun; Müller, Christian Thomas Benedikt; Volbracht, Jörk; Seifert, Burkhardt; Moos, Rudolf
2015-01-01
…However, a significant proportion of cases is largely over- or underfunded. We therefore decided to analyze the earnings outliers of our hospital to search for predictors enabling a better grouping under SwissDRG…
Benyamin, Beben; Perola, Markus; Cornes, Belinda K; Madden, Pamela Af; Palotie, Aarno; Nyholt, Dale R; Montgomery, Grant W; Peltonen, Leena; Martin, Nicholas G; Visscher, Peter M
2008-04-01
Most information in linkage analysis for quantitative traits comes from pairs of relatives that are phenotypically most discordant or concordant. Confounding this, within-family outliers from non-genetic causes may create false positives and negatives. We investigated the influence of within-family outliers empirically, using one of the largest genome-wide linkage scans for height. The subjects were drawn from Australian twin cohorts consisting of 8447 individuals in 2861 families, providing a total of 5815 possible pairs of siblings in sibships. A variance component linkage analysis was performed, either including or excluding the within-family outliers. Using the entire dataset, the largest LOD scores were on chromosome 15q (LOD 2.3) and 11q (1.5). Excluding within-family outliers increased the LOD score for most regions, but the LOD score on chromosome 15 decreased from 2.3 to 1.2, suggesting that the outliers may create false negatives and false positives, although rare alleles of large effect may also be an explanation. Several regions suggestive of linkage to height were found after removing the outliers, including 1q23.1 (2.0), 3q22.1 (1.9) and 5q32 (2.3). We conclude that the investigation of the effect of within-family outliers, which is usually neglected, should be a standard quality control measure in linkage analysis for complex traits and may reduce the noise for the search of common variants of modest effect size as well as help identify rare variants of large effect and clinical significance. We suggest that the effect of within-family outliers deserves further investigation via theoretical and simulation studies.
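The within-family quality-control screen the authors recommend could be sketched as a simple rule. Both the deviation-from-family-mean criterion and the cutoff of 2 trait standard deviations below are illustrative assumptions, not the study's exact procedure.

```python
# Hedged sketch: within one sibship, flag siblings whose phenotype
# (e.g. height) deviates from the family mean by more than k population
# (trait) standard deviations. The criterion and k = 2.0 are assumptions
# for illustration only.
def within_family_outliers(values, trait_sd, k=2.0):
    """Return indices of within-family outliers in one sibship."""
    m = sum(values) / len(values)
    return [i for i, v in enumerate(values) if abs(v - m) > k * trait_sd]
```

For example, with heights of 175, 178 and 150 cm and a trait standard deviation of about 7 cm, the 150 cm sibling is flagged, while a sibship of 170, 172 and 174 cm yields no flags.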
Delate, Thomas; Meyer, Roxanne; Jenkins, Daniel
2017-08-01
Although most biologic medications for patients with rheumatoid arthritis (RA) have recommended fixed dosing, actual biologic dosing may vary among real-world patients, since some patients can receive higher (high-dose outliers) or lower (low-dose outliers) doses than recommended in the medication package inserts. The objective was to describe the patterns of care for biologic-dosing outliers and nonoutliers in biologic-naive patients with RA. This was a retrospective, longitudinal cohort study of patients with RA who were not pregnant and were aged ≥ 18 years. Outlier status was defined as a patient having received at least 1 dose 110% of the approved dose in the package insert at any time during the study period. Baseline patient profiles, treatment exposures, and outcomes were collected during the 180 days before and up to 2 years after biologic initiation and compared across index biologic outlier groups. Patients were followed for at least 1 year, with a subanalysis of those patients who remained members for 2 years. This study included 434 RA patients with 1 year of follow-up and 372 RA patients with 2 years of follow-up. Overall, the vast majority of patients were female (≈75%), and the groups had similar baseline characteristics. Approximately 10% of patients were outliers in both follow-up cohorts. Etanercept (ETN) patients were least likely to become outliers, and adalimumab (ADA) patients were most likely to become outliers. Of all outliers during the 1-year follow-up, patients were more likely to be high-dose outliers (55%) than low-dose outliers (45%). Median 1- and 2-year adjusted total biologic costs (based on wholesale acquisition costs) were higher for ADA and ETN nonoutliers than for infliximab (IFX) nonoutliers. Biologic persistence was highest for IFX patients. Charlson Comorbidity Index score, an ETN or IFX index biologic, and treatment with a nonbiologic disease-modifying antirheumatic drug (DMARD) before biologic initiation were associated with becoming a high- or low-dose outlier (c-statistic = 0…
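The outlier definition in this abstract can be sketched as a classification rule. The 110% high-dose cutoff appears in the abstract; the 90% low-dose cutoff is an assumption, since the abstract's definition is truncated.

```python
# Hedged sketch of the dosing-outlier definition: a patient is a high-dose
# outlier if any administered dose exceeds 110% of the labeled (package
# insert) dose. The 90% low-dose cutoff is an assumption, as the original
# text is truncated at that point.
def classify_dosing(doses, labeled_dose, high_cut=1.10, low_cut=0.90):
    """Classify a patient's dosing pattern relative to the labeled dose."""
    ratios = [d / labeled_dose for d in doses]
    if any(r > high_cut for r in ratios):
        return "high-dose outlier"
    if any(r < low_cut for r in ratios):
        return "low-dose outlier"
    return "nonoutlier"
```

A patient is classified by their most extreme dose: one 60 mg administration against a 50 mg labeled dose (120%) is enough to make the patient a high-dose outlier.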
Baloloy, A. B.; Blanco, A. C.; Gana, B. S.; Sta. Ana, R. C.; Olalia, L. C.
2016-09-01
The Philippines has a booming sugarcane industry contributing about PHP 70 billion annually to the local economy through raw sugar, molasses and bioethanol production (SRA, 2012). Sugarcane planters adopt different farm practices in cultivating sugarcane, one of which is cane burning to eliminate unwanted plant material and facilitate easier harvest. Information on burned sugarcane extent is significant in yield estimation models to calculate total sugar lost during harvest; pre-harvest burning can reduce sucrose by 2.7% - 5% of the potential yield (Gomez et al., 2006; Hiranyavasit, 2016). This study employs a method for detecting burned sugarcane areas and determining burn severity through the Differenced Normalized Burn Ratio (dNBR) using Landsat 8 images acquired during the late milling season in Tarlac, Philippines. Total burned area was computed per burn severity class based on pre-fire and post-fire images. Results show that 75.38% of the total sugarcane fields in Tarlac were burned with post-fire regrowth; 16.61% were recently burned; and only 8.01% were unburned. The monthly dNBR for February to March generated the largest area with low-severity burn (1,436 ha) and high-severity burn (31.14 ha) due to pre-harvest burning. Post-fire regrowth was highest in April to May, when previously burned areas had already been replanted with sugarcane. The maximum dNBR of the entire late milling season (February to May) recorded a larger extent of areas with high and low post-fire regrowth compared with areas of low, moderate and high burn severity. The Normalized Difference Vegetation Index (NDVI) was used to analyse vegetation dynamics between the burn severity classes. A significant positive correlation, rho = 0.99, was observed between dNBR and dNDVI at the 5% level (p = 0.004). An accuracy of 89.03% was calculated for the Landsat-derived NBR, validated using actual mill data for crop year 2015-2016.
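The NBR/dNBR computation behind this workflow can be sketched per pixel. This is a minimal sketch using the standard band-ratio definitions (NIR and SWIR2 reflectances, e.g. Landsat 8 bands 5 and 7); the study's exact preprocessing and severity class breaks are not given here.

```python
# Minimal sketch of the per-pixel NBR/dNBR computation described above.
def nbr(nir, swir2):
    """Normalized Burn Ratio from NIR and SWIR2 surface reflectances."""
    return (nir - swir2) / (nir + swir2)

def dnbr(pre_nir, pre_swir2, post_nir, post_swir2):
    """Differenced NBR: pre-fire NBR minus post-fire NBR.
    Large positive values indicate more severe burns; negative values
    indicate post-fire regrowth (vegetation recovery)."""
    return nbr(pre_nir, pre_swir2) - nbr(post_nir, post_swir2)
```

Healthy vegetation reflects strongly in the NIR and weakly in the SWIR2, so burning lowers NBR; a burned pixel therefore shows a positive pre-minus-post difference, which is then binned into severity classes.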