WorldWideScience

Sample records for outlier detection applied

  1. Outlier detection using autoencoders

    CERN Document Server

    Lyudchik, Olga

    2016-01-01

    Outlier detection is a crucial part of many data analysis applications. The goal of outlier detection is to separate a core of regular observations from some polluting ones, called “outliers”. We propose an outlier detection method using a deep autoencoder. In our research the proposed method was applied to detect outlier points in the MNIST dataset of handwritten digits. The experimental results show that the proposed method has the potential to be used for anomaly detection.
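
    A minimal sketch of the reconstruction-error idea behind this record, using scikit-learn's MLPRegressor as a stand-in autoencoder; the synthetic data, architecture, and top-1% threshold are assumptions, not the authors' setup:

    ```python
    # Score points by autoencoder reconstruction error: points the model cannot
    # reconstruct well are flagged as outliers.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))      # stand-in for flattened image vectors
    X[:10] += 6.0                        # inject a few outliers

    # An MLP trained to reproduce its own input acts as a simple autoencoder;
    # the narrow hidden layer forces a compressed representation.
    ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    ae.fit(X, X)

    errors = np.mean((X - ae.predict(X)) ** 2, axis=1)   # reconstruction MSE
    threshold = np.percentile(errors, 99)                # assumed cutoff: top 1%
    print(np.where(errors > threshold)[0])
    ```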

  2. Selection of tests for outlier detection

    NARCIS (Netherlands)

    Bossers, H.C.M.; Hurink, Johann L.; Smit, Gerardus Johannes Maria

    Integrated circuits are tested thoroughly in order to meet the high demands on quality. As an additional step, outlier detection is used to detect potentially unreliable chips so that quality can be improved further. However, it is often unclear to which tests outlier detection should be applied and …

  3. A Modified Approach for Detection of Outliers

    Directory of Open Access Journals (Sweden)

    Iftikhar Hussain Adil

    2015-04-01

    Tukey’s boxplot is a very popular tool for the detection of outliers. It reveals the location, spread and skewness of the data. It works nicely for detecting outliers when the data are symmetric. When the data are skewed, however, its boundary lies away from the whisker on the compressed side, while it declares erroneous outliers on the extended side of the distribution. Hubert and Vandervieren (2008) adjusted Tukey’s technique to overcome this problem. However, another problem arises: the adjusted boxplot constructs an interval of critical values that can even exceed the extremes of the data, in which case the adjusted boxplot is unable to detect outliers. This paper gives a solution to this problem, and the proposed approach detects outliers properly. The validity of the technique has been checked by constructing fences around the true 95% values of different distributions. Simulation has been applied by drawing samples of different sizes from chi-square, beta and lognormal distributions. Fences constructed by the modified technique are closer to the true 95% values than those of the adjusted boxplot, which proves its superiority over the existing technique.
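
    A sketch of the classical and skewness-adjusted fences discussed above, with the medcouple computed naively in O(n²) (tie handling at the median is omitted); the fence constants follow Hubert and Vandervieren (2008):

    ```python
    # Tukey's fences vs. the skewness-adjusted fences of Hubert & Vandervieren (2008).
    import numpy as np

    def medcouple(x):
        # Robust skewness: median of h(xi, xj) over xi <= med <= xj, xi != xj.
        x = np.sort(np.asarray(x, dtype=float))
        m = np.median(x)
        lo, hi = x[x <= m], x[x >= m]
        h = [((xj - m) - (m - xi)) / (xj - xi) for xi in lo for xj in hi if xj != xi]
        return np.median(h)

    def fences(x, adjusted=True):
        q1, q3 = np.percentile(x, [25, 75])
        iqr = q3 - q1
        if not adjusted:
            return q1 - 1.5 * iqr, q3 + 1.5 * iqr
        mc = medcouple(x)
        if mc >= 0:
            return q1 - 1.5 * np.exp(-4 * mc) * iqr, q3 + 1.5 * np.exp(3 * mc) * iqr
        return q1 - 1.5 * np.exp(-3 * mc) * iqr, q3 + 1.5 * np.exp(4 * mc) * iqr

    x = np.random.default_rng(1).lognormal(size=500)   # right-skewed sample
    lo, hi = fences(x)
    print("flagged:", np.sum((x < lo) | (x > hi)))
    ```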

  4. Stratification-Based Outlier Detection over the Deep Web

    OpenAIRE

    Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S.; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

    2016-01-01

    For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of the deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over the deep web. In the context of the deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribu...

  5. Stratification-Based Outlier Detection over the Deep Web.

    Science.gov (United States)

    Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

    2016-01-01

    For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of the deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over the deep web. In the context of the deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over the deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in the deep web.

  6. INCREMENTAL PRINCIPAL COMPONENT ANALYSIS BASED OUTLIER DETECTION METHODS FOR SPATIOTEMPORAL DATA STREAMS

    Directory of Open Access Journals (Sweden)

    A. Bhushan

    2015-07-01

    In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data due to various reasons such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent the propagation of errors in subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in such spatiotemporal data streams. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of applying IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes by presenting two new IPCA-based outlier detection methods and performing a comparative analysis with the existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.
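
    A rough sketch of how IPCA can score a stream (not the paper's two new methods): each arriving batch is scored by reconstruction error under the current model before the model is updated; the 3-sigma flag rule is an assumption:

    ```python
    # Streaming outlier flagging with scikit-learn's IncrementalPCA.
    import numpy as np
    from sklearn.decomposition import IncrementalPCA

    rng = np.random.default_rng(0)
    ipca = IncrementalPCA(n_components=3)

    for t in range(20):                   # simulated stream of sensor batches
        batch = rng.normal(size=(50, 10))
        if t > 0:                         # score once the model has seen data
            recon = ipca.inverse_transform(ipca.transform(batch))
            err = np.sum((batch - recon) ** 2, axis=1)
            flags = err > err.mean() + 3 * err.std()   # assumed 3-sigma rule
            print(f"batch {t}: {flags.sum()} suspected outliers")
        ipca.partial_fit(batch)           # then update the model incrementally
    ```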

  7. Detection of outliers in gas centrifuge experimental data

    International Nuclear Information System (INIS)

    Andrade, Monica C.V.; Nascimento, Claudio A.O.

    2005-01-01

    Isotope separation in a gas centrifuge is a very complex process. The development and optimization of a gas centrifuge requires experimentation. These data contain experimental errors and, like other experimental data, there may be some gross errors, also known as outliers. The detection of outliers in gas centrifuge experimental data may be quite complicated because there is not enough repetition for precise statistical determination and the physical equations may be applied only to the control of the mass flows. Moreover, the concentrations are poorly predicted by phenomenological models. This paper presents the application of a three-layer feed-forward neural network to the detection of outliers in a very extensive experiment for the analysis of the separation performance of a gas centrifuge. (author)

  8. The good, the bad and the outliers: automated detection of errors and outliers from groundwater hydrographs

    Science.gov (United States)

    Peterson, Tim J.; Western, Andrew W.; Cheng, Xiang

    2018-03-01

    Suspicious groundwater-level observations are common and can arise for many reasons, ranging from an unforeseen biophysical process to bore failure and data management errors. Unforeseen observations may provide valuable insights that challenge existing expectations and can be deemed outliers, while monitoring and data handling failures can be deemed errors; if ignored, both may compromise trend analysis and groundwater model calibration. Ideally, outliers and errors should be identified, but to date this has been a subjective process that is not reproducible and is inefficient. This paper presents an approach to objectively and efficiently identify multiple types of errors and outliers. The approach requires only the observed groundwater hydrograph, requires no particular consideration of the hydrogeology, the drivers (e.g. pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Herein, the algorithms and time-series model are detailed and applied to four observation bores with varying dynamics. The detection of outliers was most reliable when the observation data were acquired quarterly or more frequently. Outlier detection where the groundwater-level variance is nonstationary or the absolute trend increases rapidly was more challenging, with the former likely to result in an underestimation and the latter an overestimation of the number of outliers.
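
    The HydroSight algorithms are more elaborate (a time-series model of the hydrograph); purely as an illustration of flagging suspicious points in a monitoring record, a generic rolling-median/MAD screen with assumed window and cutoff:

    ```python
    # Generic rolling-median + MAD screening of a (simulated) groundwater record.
    # This is NOT the HydroSight method, only a simple illustration.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    idx = pd.date_range("2000-01-01", periods=400, freq="7D")
    level = pd.Series(np.cumsum(rng.normal(0, 0.05, 400)) + 20.0, index=idx)
    level.iloc[[50, 200]] += 3.0                      # injected spikes

    med = level.rolling(15, center=True, min_periods=5).median()
    mad = (level - med).abs().rolling(15, center=True, min_periods=5).median()
    score = (level - med).abs() / (1.4826 * mad + 1e-9)
    print(level[score > 3.5])                         # assumed cutoff
    ```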

  9. Adjusted functional boxplots for spatio-temporal data visualization and outlier detection

    KAUST Repository

    Sun, Ying

    2011-10-24

    This article proposes a simulation-based method to adjust functional boxplots for correlations when visualizing functional and spatio-temporal data, as well as detecting outliers. We start by investigating the relationship between the spatio-temporal dependence and the empirical outlier detection rule of 1.5 times the 50% central region. Then, we propose to simulate observations without outliers on the basis of a robust estimator of the covariance function of the data. We select the constant factor in the functional boxplot to control the probability of correctly detecting no outliers. Finally, we apply the selected factor to the functional boxplot of the original data. As applications, the factor selection procedure and the adjusted functional boxplots are demonstrated on sea surface temperatures, spatio-temporal precipitation and general circulation model (GCM) data. The outlier detection performance is also compared before and after the factor adjustment. © 2011 John Wiley & Sons, Ltd.

  10. Detection of outliers in a gas centrifuge experimental data

    Directory of Open Access Journals (Sweden)

    M. C. V. Andrade

    2005-09-01

    Isotope separation with a gas centrifuge is a very complex process. The development and optimization of a gas centrifuge requires experimentation. These data contain experimental errors and, like other experimental data, there may be some gross errors, also known as outliers. The detection of outliers in gas centrifuge experimental data is quite complicated because there is not enough repetition for precise statistical determination and the physical equations may be applied only to the control of the mass flow. Moreover, the concentrations are poorly predicted by phenomenological models. This paper presents the application of a three-layer feed-forward neural network to the detection of outliers in the analysis performed on a very extensive experiment.

  11. Statistical Outlier Detection for Jury Based Grading Systems

    DEFF Research Database (Denmark)

    Thompson, Mary Kathryn; Clemmensen, Line Katrine Harder; Rosas, Harvey

    2013-01-01

    This paper presents an algorithm that was developed to identify statistical outliers from the scores of grading jury members in a large project-based first year design course. The background and requirements for the outlier detection system are presented. The outlier detection algorithm and the follow-up procedures for score validation and appeals are described in detail. Finally, the impact of various elements of the outlier detection algorithm, their interactions, and the sensitivity of their numerical values are investigated. It is shown that the difference in the mean score produced by a grading jury before and after a suspected outlier is removed from the mean is the single most effective criterion for identifying potential outliers, but that all of the criteria included in the algorithm have an effect on the outlier detection process.

  12. An improved data clustering algorithm for outlier detection

    Directory of Open Access Journals (Sweden)

    Anant Agarwal

    2016-12-01

    Data mining is the extraction of hidden predictive information from large databases. This is a technology with the potential to study and analyze useful information present in data. Data objects that do not fit the general behavior of the data are termed outliers. Outlier detection in databases has numerous applications such as fraud detection, customized marketing, and the search for terrorism. By definition, outliers are rare occurrences and hence represent a small portion of the data. However, the use of outlier detection for various purposes is not an easy task. This research proposes a modified PAM (Partitioning Around Medoids) algorithm for detecting outliers. The proposed technique has been implemented in Java. The results produced by the proposed technique are found to be better than those of the existing technique in terms of outliers detected and time complexity.
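
    The paper's specific modification of PAM is not spelled out in the record; below is a plain k-medoids sketch that scores each point by its distance to the nearest medoid, a common clustering-based outlier criterion:

    ```python
    # Plain PAM-style k-medoids, then rank points by distance to nearest medoid.
    import numpy as np

    def pam(X, k, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise dists
        medoids = rng.choice(len(X), size=k, replace=False)
        for _ in range(n_iter):
            labels = np.argmin(D[:, medoids], axis=1)
            new = medoids.copy()
            for j in range(k):        # move each medoid to the cheapest member
                members = np.where(labels == j)[0]
                if len(members):
                    new[j] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
            if np.array_equal(new, medoids):
                break
            medoids = new
        return medoids, np.min(D[:, medoids], axis=1)

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2)),
                   [[4.0, 12.0]]])    # one far-away point
    _, dist = pam(X, k=2)
    print(np.argsort(dist)[-3:])      # highest distances = outlier candidates
    ```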

  13. Spatial Outlier Detection of CO2 Monitoring Data Based on Spatial Local Outlier Factor

    Directory of Open Access Journals (Sweden)

    Liu Xin

    2015-12-01

    The spatial local outlier factor (SLOF) algorithm was adopted in this study for spatial outlier detection because of the limitations of traditional static threshold detection. Based on the spatial characteristics of CO2 monitoring data obtained in a carbon capture and storage (CCS) project, a K-Nearest Neighbour (KNN) graph was constructed from the latitude and longitude of the monitoring points to identify their spatial neighbourhoods. SLOF was then adopted to calculate the outlier degrees of the monitoring points, and the 3σ rule was employed to identify the spatial outliers. Finally, the selection of the K value was analysed and the optimal value chosen. The results show that, compared with the static threshold method, the proposed algorithm has higher detection precision. It can overcome the shortcomings of the static threshold method and improve the accuracy and diversity of local outlier detection, which provides a reliable reference for the safety assessment and warning of CCS monitoring.
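
    A simplified stand-in for the pipeline described (true SLOF builds the neighbourhood from coordinates only; here scikit-learn's generic LOF is run on coordinates plus the measured value, followed by the 3σ rule on the scores):

    ```python
    # LOF score on (lon, lat, CO2) followed by a 3-sigma flag on the scores.
    import numpy as np
    from sklearn.neighbors import LocalOutlierFactor

    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 1, size=(200, 2))          # monitoring locations
    co2 = 400 + 5 * coords[:, 0] + rng.normal(0, 1, 200)
    co2[[17, 93]] += 25                                # injected anomalies

    lof = LocalOutlierFactor(n_neighbors=10)
    lof.fit(np.column_stack([coords, co2]))
    score = -lof.negative_outlier_factor_              # larger = more outlying
    print(np.where(score > score.mean() + 3 * score.std())[0])   # 3-sigma rule
    ```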

  14. Outlier Detection and Explanation for Domain Experts

    DEFF Research Database (Denmark)

    Micenková, Barbora

    In many data exploration tasks, extraordinary and rarely occurring patterns called outliers are more interesting than the prevalent ones. For example, they could represent frauds in insurance, intrusions in network and system monitoring, or motion in video surveillance. Decades of research have … to poor overall performance. Furthermore, in many applications some labeled examples of outliers are available but not sufficient in number to serve as training data for standard supervised learning methods. As such, this valuable information is typically ignored. We introduce a new paradigm for outlier detection where supervised and unsupervised information are combined to improve the performance while reducing the sensitivity to the parameters of individual outlier detection algorithms. We do this by learning a new representation using the outliers from the outputs of unsupervised outlier detectors as input …

  15. Application of median-equation approach for outlier detection in geodetic networks

    Directory of Open Access Journals (Sweden)

    Serif Hekimoglu

    Outliers sometimes occur in geodetic data sets for various reasons. There are two main approaches to detecting them: tests for outliers (Baarda's and Pope's tests) and robust methods (the Danish method, Huber's method, etc.). These methods use Least Squares Estimation (LSE). Outliers affect the LSE results; in particular, LSE smears the effects of the outliers onto the good observations, and sometimes wrong results may be obtained. To avoid these effects, a method that does not use LSE should be preferred. The median is a high-breakdown-point estimator, and if it is applied for outlier detection, reliable results can be obtained. In this study, a robust method is proposed that uses the median as a threshold value on the median residuals obtained from median equations. If the a priori variance of the observations is known, the reliability of the new approach is greater than in the case where the a priori variance is unknown.
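
    The median equations themselves are not reproduced in the record; in their spirit, a minimal sketch of thresholding median residuals scaled by the MAD (the 3.0 cutoff is an assumption):

    ```python
    # Median residuals scaled by the MAD, compared against a threshold.
    import numpy as np

    obs = np.array([10.02, 10.01, 9.99, 10.03, 10.00, 10.74, 9.98])
    m = np.median(obs)
    mad = np.median(np.abs(obs - m))
    z = np.abs(obs - m) / (1.4826 * mad)   # 1.4826: MAD -> sigma for Gaussians
    print(np.where(z > 3.0)[0])            # index 5 is flagged
    ```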

  16. Detecting Outlier Microarray Arrays by Correlation and Percentage of Outliers Spots

    Directory of Open Access Journals (Sweden)

    Song Yang

    2006-01-01

    We developed a quality assurance (QA) tool, namely the microarray outlier filter (MOF), and have applied it to our microarray datasets for the identification of problematic arrays. Our approach is based on comparing the arrays using the correlation coefficient and the number of outlier spots generated on each array to reveal outlier arrays. For a human universal reference (HUR) dataset, which is used as a technical control in our standard hybridization procedure, 3 outlier arrays were identified out of 35 experiments. For a human blood dataset, 12 outlier arrays were identified from 185 experiments. In general, arrays from human blood samples displayed greater variation in their gene expression profiles than arrays from HUR samples. As a result, MOF identified two distinct patterns in the occurrence of outlier arrays. These results demonstrate that this methodology is a valuable QA practice for identifying questionable microarray data prior to downstream analysis.

  17. OUTLIER DETECTION IN PARTIAL ERRORS-IN-VARIABLES MODEL

    Directory of Open Access Journals (Sweden)

    JUN ZHAO

    The weighted total least squares (WTLS) estimate is very sensitive to outliers in the partial errors-in-variables (EIV) model. A new procedure for detecting outliers based on data snooping is presented in this paper. First, a two-step iterated method of computing the WTLS estimates for the partial EIV model based on standard LS theory is proposed. Second, the corresponding w-test statistics are constructed to detect outliers when the observations and coefficient matrix are contaminated with outliers, and a specific algorithm for detecting outliers is suggested. When the variance factor is unknown, it may be estimated by the least median of squares (LMS) method. Finally, simulated data and real data concerning a two-dimensional affine transformation are analyzed. The numerical results show that the new test procedure is able to judge whether the outliers are located in the x component, the y component or both coordinate components when the observations and coefficient matrix are contaminated with outliers.
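
    For the classical (non-WTLS) case, data snooping with Baarda-style w-tests looks as below; uncorrelated observations and a known variance factor are assumed, and the paper's two-step WTLS iteration is not shown:

    ```python
    # Baarda-style data snooping on an ordinary least-squares adjustment.
    import numpy as np

    rng = np.random.default_rng(0)
    A = np.column_stack([np.ones(30), np.linspace(0, 1, 30)])  # design matrix
    sigma0 = 0.01                                              # known std. dev.
    y = A @ np.array([2.0, -1.0]) + rng.normal(0, sigma0, 30)
    y[7] += 0.1                                                # gross error

    H = A @ np.linalg.solve(A.T @ A, A.T)     # hat matrix
    v = y - H @ y                             # residuals
    qvv = 1.0 - np.diag(H)                    # residual cofactors (P = I)
    w = v / (sigma0 * np.sqrt(qvv))           # w-test statistics
    print(np.where(np.abs(w) > 3.29)[0])      # two-sided alpha = 0.001
    ```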

  18. Detection of Outliers in Regression Model for Medical Data

    Directory of Open Access Journals (Sweden)

    Stephen Raj S

    2017-07-01

    In regression analysis, an outlier is an observation for which the residual is large in magnitude compared to other observations in the data set. The detection of outliers and influential points is an important step of regression analysis. Outlier detection methods have been used to detect and remove anomalous values from data. In this paper, we detect the presence of outliers in simple linear regression models for a medical data set. Chatterjee and Hadi mentioned that ordinary residuals are not appropriate for diagnostic purposes and that a transformed version of them is preferable. First, we investigate the presence of outliers based on existing procedures using residuals and standardized residuals. Next, we use a new approach of standardized scores for detecting outliers without the use of predicted values. The performance of the new approach was verified with real-life data.
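
    The record's new standardized-score approach is not detailed here; for the conventional baseline it mentions, internally studentized residuals can be obtained with statsmodels:

    ```python
    # Internally studentized residuals r_i = e_i / (s * sqrt(1 - h_ii)).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 40)
    y = 3 + 0.5 * x + rng.normal(0, 0.3, 40)
    y[12] += 2.5                                   # injected outlier

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    r = fit.get_influence().resid_studentized_internal
    print(np.where(np.abs(r) > 2.5)[0])            # assumed cutoff
    ```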

  19. Good and Bad Neighborhood Approximations for Outlier Detection Ensembles

    DEFF Research Database (Denmark)

    Kirner, Evelyn; Schubert, Erich; Zimek, Arthur

    2017-01-01

    Outlier detection methods have used approximate neighborhoods in filter-refinement approaches. Outlier detection ensembles have used artificially obfuscated neighborhoods to achieve diverse ensemble members. Here we argue that outlier detection models could be based on approximate neighborhoods in the first place, thus gaining in both efficiency and effectiveness. It depends, however, on the type of approximation, as only some seem beneficial for the task of outlier detection, while no (large) benefit can be seen for others. In particular, we argue that space-filling curves are beneficial …

  20. Ensemble Learning Method for Outlier Detection and its Application to Astronomical Light Curves

    Science.gov (United States)

    Nun, Isadora; Protopapas, Pavlos; Sim, Brandon; Chen, Wesley

    2016-09-01

    Outlier detection is necessary for automated data analysis, with specific applications spanning almost every domain from financial markets to epidemiology to fraud detection. We introduce a novel mixture-of-experts outlier detection model, which uses a dynamically trained, weighted network of five distinct outlier detection methods. After dimensionality reduction, individual outlier detection methods score each data point for “outlierness” in this new feature space. Our model then uses dynamically trained parameters to weigh the scores of each method, producing a final outlier score. We find that the mixture-of-experts model performs, on average, better than any single expert model in identifying both artificially and manually picked outliers. This mixture model is applied to a data set of astronomical light curves, after dimensionality reduction via time series feature extraction. Our model was tested on three fields from the MACHO catalog and generated a list of anomalous candidates. We confirm that the outliers detected using this method belong to rare classes, like novae, He-burning, and red giant stars; other outlier light curves identified have no available information associated with them. To elucidate their nature, we created a website containing the light-curve data and information about these objects. Users can attempt to classify the light curves, offer conjectures about their identities, and sign up for follow-up messages about the progress made in identifying these objects. These user-submitted data can be used to further train our mixture-of-experts model. Our code is publicly available to all who are interested.
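
    A sketch of the ensemble idea with three stock detectors from scikit-learn; the paper trains the weights dynamically, whereas fixed, assumed weights are used here:

    ```python
    # Combine normalized scores of several unsupervised detectors.
    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (300, 5)), rng.normal(7, 1, (5, 5))])

    def unit(s):                                   # min-max normalize to [0, 1]
        return (s - s.min()) / (s.max() - s.min())

    s1 = unit(-IsolationForest(random_state=0).fit(X).score_samples(X))
    s2 = unit(-LocalOutlierFactor().fit(X).negative_outlier_factor_)
    s3 = unit(-OneClassSVM(nu=0.05).fit(X).score_samples(X))

    weights = np.array([0.4, 0.3, 0.3])            # assumed, not learned
    combined = weights @ np.vstack([s1, s2, s3])
    print(np.argsort(combined)[-5:])               # top-5 outlier candidates
    ```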

  1. Comparative Study of Outlier Detection Algorithms via Fundamental Analysis Variables: An Application on Firms Listed in Borsa Istanbul

    Directory of Open Access Journals (Sweden)

    Senol Emir

    2016-04-01

    In a data set, an outlier refers to a data point that is considerably different from the others. Detecting outliers provides useful application-specific insights and leads to choosing the right prediction models. Outlier detection (also known as anomaly detection or novelty detection) has been studied in statistics and machine learning for a long time. It is an essential preprocessing step of the data mining process. In this study, the outlier detection step in the data mining process is applied to identify the top 20 outlier firms. Three outlier detection algorithms are applied to fundamental analysis variables of firms listed in Borsa Istanbul for the 2011-2014 period. The results of each algorithm are presented and compared. Findings show that 15 different firms are identified by the three outlier detection methods. KCHOL and SAHOL have the greatest number of appearances, with 12 observations, among these firms. By investigating the results, it is concluded that each of the three algorithms produces a different outlier firm list due to differences in their approaches to outlier detection.

  2. Using Person Fit Statistics to Detect Outliers in Survey Research

    Directory of Open Access Journals (Sweden)

    John M. Felt

    2017-05-01

    Context: When working with health-related questionnaires, outlier detection is important. However, traditional methods of outlier detection (e.g., boxplots) can miss participants with “atypical” responses to the questions that otherwise have similar total (subscale) scores. In addition to detecting outliers, it can be of clinical importance to determine the reason for the outlier status or “atypical” response. Objective: The aim of the current study was to illustrate how to derive person fit statistics for outlier detection through a statistical method examining person fit with a health-based questionnaire. Design and Participants: Patients treated for Cushing's syndrome (n = 394) were recruited from the Cushing's Support and Research Foundation's (CSRF) listserv and Facebook page. Main Outcome Measure: Patients were directed to an online survey containing the CushingQoL (English version). A two-dimensional graded response model was estimated, and person fit statistics were generated using the Zh statistic. Results: Conventional outlier detection methods revealed no outliers reflecting extreme scores on the subscales of the CushingQoL. However, person fit statistics identified 18 patients with “atypical” response patterns (|Zh| > 2.00), which would have otherwise been missed. Conclusion: While the conventional methods of outlier detection indicated no outliers, person fit statistics identified several patients with “atypical” response patterns who otherwise appeared average. Person fit statistics allow researchers to delve further into the underlying problems experienced by these “atypical” patients treated for Cushing's syndrome. Annotated code is provided to aid other researchers in using this method.

  3. Spatial Outlier Detection of CO2 Monitoring Data Based on Spatial Local Outlier Factor

    OpenAIRE

    Liu Xin; Zhang Shaoliang; Zheng Pulin

    2015-01-01

    Spatial local outlier factor (SLOF) algorithm was adopted in this study for spatial outlier detection because of the limitations of the traditional static threshold detection. Based on the spatial characteristics of CO2 monitoring data obtained in the carbon capture and storage (CCS) project, the K-Nearest Neighbour (KNN) graph was constructed using the latitude and longitude information of the monitoring points to identify the spatial neighbourhood of the monitoring points. Then ...

  4. Outlier Detection Techniques For Wireless Sensor Networks: A Survey

    NARCIS (Netherlands)

    Zhang, Y.; Meratnia, Nirvana; Havinga, Paul J.M.

    2008-01-01

    In the field of wireless sensor networks, measurements that significantly deviate from the normal pattern of sensed data are considered as outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are …

  5. Development of a methodology for the detection of hospital financial outliers using information systems.

    Science.gov (United States)

    Okada, Sachiko; Nagase, Keisuke; Ito, Ayako; Ando, Fumihiko; Nakagawa, Yoshiaki; Okamoto, Kazuya; Kume, Naoto; Takemura, Tadamasa; Kuroda, Tomohiro; Yoshihara, Hiroyuki

    2014-01-01

    Comparison of financial indices helps to illustrate differences in operations and efficiency among similar hospitals. Outlier data tend to influence statistical indices, and so detection of outliers is desirable. Development of a methodology for financial outlier detection using information systems will help to reduce the time and effort required, eliminate the subjective elements in detection of outlier data, and improve the efficiency and quality of analysis. The purpose of this research was to develop such a methodology. Financial outliers were defined based on a case model. An outlier-detection method using the distances between cases in multi-dimensional space is proposed. Experiments using three diagnosis groups indicated successful detection of cases for which the profitability and income structure differed from other cases. Therefore, the method proposed here can be used to detect outliers. Copyright © 2013 John Wiley & Sons, Ltd.
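
    The paper's case model is not given in the record; one standard way to score "distances between cases in multi-dimensional space" is a robust Mahalanobis distance, sketched here with assumed data and threshold:

    ```python
    # Robust Mahalanobis distances flag cases far from the bulk of the data.
    import numpy as np
    from scipy.stats import chi2
    from sklearn.covariance import MinCovDet

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))                    # stand-in financial indices
    X[:3] += 5                                       # atypical cases

    d2 = MinCovDet(random_state=0).fit(X).mahalanobis(X)   # squared distances
    cutoff = chi2.ppf(0.975, df=X.shape[1])                # assumed threshold
    print(np.where(d2 > cutoff)[0])
    ```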

  6. A New Outlier Detection Method for Multidimensional Datasets

    KAUST Repository

    Abdel Messih, Mario A.

    2012-07-01

    This study develops a novel hybrid method for outlier detection (HMOD) that combines the ideas of distance-based and density-based methods. The proposed method has two main advantages over most other outlier detection methods. The first is that it works well on both dense and sparse datasets. The second is that, unlike most other outlier detection methods that require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only required parameter is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD, which makes it very efficient in applications. Moreover, we propose a new way of using outlier detection for redundancy reduction in datasets, in which users can specify a confidence level evaluating how accurately the less redundant dataset represents the original dataset. HMOD is evaluated on synthetic datasets (dense and mixed “dense and sparse”) and on a bioinformatics problem: redundancy reduction of a dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level of the reduced dataset representing the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.

  7. Outlier Detection with Space Transformation and Spectral Analysis

    DEFF Research Database (Denmark)

    Dang, Xuan-Hong; Micenková, Barbora; Assent, Ira

    2013-01-01

    Detecting a small number of outliers from a set of data observations is always challenging. In this paper, we present an approach that exploits space transformation and uses spectral analysis in the newly transformed space for outlier detection. Unlike most existing techniques in the literature, which rely on notions of distances or densities, this approach introduces a novel concept based on local quadratic entropy for evaluating the similarity of a data object with its neighbors. This information theoretic quantity is used to regularize the closeness amongst data instances and subsequently benefits the process of mapping data into a usually lower dimensional space. Outliers are then identified by spectral analysis of the eigenspace spanned by the set of leading eigenvectors derived from the mapping procedure. The proposed technique is purely data-driven and imposes no assumptions regarding …

  8. On the Evaluation of Outlier Detection and One-Class Classification Methods

    DEFF Research Database (Denmark)

    Swersky, Lorne; Marques, Henrique O.; Sander, Jörg

    2016-01-01

    It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem. In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies …

  9. Detection of additive outliers in seasonal time series

    DEFF Research Database (Denmark)

    Haldrup, Niels; Montañés, Antonio; Sansó, Andreu

    The detection and location of additive outliers in integrated variables have attracted much attention recently because such outliers tend to affect unit root inference, among other things. Most of these procedures have been developed for non-seasonal processes. However, the presence of seasonality … to deal with data sampled at a seasonal frequency, and the size and power properties are discussed. We also show that the presence of periodic heteroscedasticity will inflate the size of the tests and hence will tend to identify an excessive number of outliers. A modified Perron-Rodriguez test which allows periodically varying variances is suggested and is shown to have excellent properties in terms of both power and size.

  10. Elimination of some unknown parameters and its effect on outlier detection

    Directory of Open Access Journals (Sweden)

    Serif Hekimoglu

    Outliers in an observation set badly affect all the estimated unknown parameters and residuals; that is why outlier detection has great importance for reliable estimation results. Tests for outliers (e.g. Baarda's and Pope's tests) are frequently used to detect outliers in geodetic applications. In order to reduce the computational time, the elimination of some unknown parameters, which are not of interest, is sometimes performed. In this case, although the estimated unknown parameters and residuals do not change, the cofactor matrix of the residuals and the redundancies of the observations change. In this study, the effects of eliminating unknown parameters on tests for outliers have been investigated. We have proved that the redundancies in the initial functional model (IFM) are smaller than those in the reduced functional model (RFM), where elimination is performed. To show this situation, a horizontal control network was simulated and many experiments were performed. According to the simulation results, tests for outliers in the IFM are more reliable than those in the RFM.

  11. On the Evaluation of Outlier Detection: Measures, Datasets, and an Empirical Study Continued

    DEFF Research Database (Denmark)

    Campos, G. O.; Zimek, A.; Sander, J.

    2016-01-01

    The evaluation of unsupervised outlier detection algorithms is a constant challenge in data mining research. Little is known regarding the strengths and weaknesses of different standard outlier detection models, and the impact of parameter choices for these algorithms. The scarcity of appropriate … are available online in the repository at: http://www.dbs.ifi.lmu.de/research/outlier-evaluation/ …

  12. IVS Combination Center at BKG - Robust Outlier Detection and Weighting Strategies

    Science.gov (United States)

    Bachmann, S.; Lösler, M.

    2012-12-01

    Outlier detection plays an important role within the IVS combination. Even if the original data are the same for all contributing Analysis Centers (ACs), the analyzed data show differences due to analysis software characteristics. The treatment of outliers is thus a fine line between preserving data heterogeneity and eliminating real outliers. Robust outlier detection based on the Least Median of Squares (LMS) is used within the IVS combination. This method allows reliable outlier detection with a small number of input parameters. A similar problem arises for the weighting of the individual solutions within the combination process. Variance component estimation (VCE) is used to control the weighting factor for each AC. The Operator-Software-Impact (OSI) method takes into account that the analyzed data are strongly influenced by the software and the responsible operator. It allows the VCE to be made more sensitive to the diverse input data. This method has already been set up within GNSS data analysis as well as the analysis of troposphere data. The benefit of an OSI realization within the VLBI combination and its potential in weighting factor determination has not been investigated before.

  13. Distance Based Method for Outlier Detection of Body Sensor Networks

    Directory of Open Access Journals (Sweden)

    Haibin Zhang

    2016-01-01

    We propose a distance-based method for outlier detection in body sensor networks. First, we use Kernel Density Estimation (KDE) to calculate the probability of the distance to the k nearest neighbors for diagnosed data. If the probability is less than a threshold, and the distance of the data point to its left and right neighbors is greater than a pre-defined value, the diagnosed data point is declared an outlier. Further, we formalize a sliding-window-based method to improve the outlier detection performance. Finally, to estimate the KDE from training sensor readings with errors, we introduce a Hidden Markov Model (HMM) based method to estimate the most probable ground truth values, that is, those with the maximum probability of producing the training data. Simulation results show that the proposed method achieves good detection accuracy with a low false alarm rate.
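
    A sketch of the KDE-on-distances step only (the left/right-neighbor check, sliding window, and HMM refinements are omitted); the k and the 1% probability cutoff are assumptions:

    ```python
    # Estimate the density of k-NN distances with a KDE; flag improbable ones.
    import numpy as np
    from scipy.stats import gaussian_kde
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    readings = rng.normal(70, 2, 300)                     # body-sensor values
    readings[-1] = 95.0                                   # faulty reading

    k = 5
    X = readings.reshape(-1, 1)
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    dk = dist[:, -1]                                      # distance to k-th neighbor

    density = gaussian_kde(dk)(dk)                        # density at each distance
    print(np.where(density < np.quantile(density, 0.01))[0])   # assumed cutoff
    ```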

  14. Algorithms for Speeding up Distance-Based Outlier Detection

    Data.gov (United States)

    National Aeronautics and Space Administration — The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address...

  15. An Unbiased Distance-based Outlier Detection Approach for High-dimensional Data

    DEFF Research Database (Denmark)

    Nguyen, Hoang Vu; Gopalkrishnan, Vivekanand; Assent, Ira

    2011-01-01

    … than a global property. Different from existing approaches, it is not grid-based and is dimensionality unbiased. Thus, its performance is impervious to grid resolution as well as the curse of dimensionality. In addition, our approach ranks the outliers, allowing users to select the number of desired outliers, thus mitigating the issue of a high false alarm rate. Extensive empirical studies on real datasets show that our approach efficiently and effectively detects outliers, even in high-dimensional spaces.

  16. A NOTE ON THE CONVENTIONAL OUTLIER DETECTION TEST PROCEDURES

    Directory of Open Access Journals (Sweden)

    JIANFENG GUO

    Under the assumption that the variance-covariance matrix is fully populated, Baarda's w-test turns out to be completely different from the standardized least-squares residual. Unfortunately, this is not generally recognized. In the limiting case of only one degree of freedom, all three types of test statistics, including the Gaussian normal test, Student's t-test and Pope's Tau-test, will be invalid for the identification of outliers: (1) all the squares of the Gaussian normal test statistic coincide with the goodness-of-fit (global) test statistic, even for correlated observations; hence, the failure of the global test implies that all the observations will be flagged as outliers, and thus the Gaussian normal test is inconclusive for the localization of outliers; (2) the absolute values of the Tau-test statistic are all exactly equal to one, no matter whether the observations are contaminated, so the Tau-test cannot work for outlier detection in this situation; and (3) Student's t-test statistics are undefined.

  17. A Distributed Algorithm for the Cluster-Based Outlier Detection Using Unsupervised Extreme Learning Machines

    Directory of Open Access Journals (Sweden)

    Xite Wang

    2017-01-01

    Outlier detection is an important data mining task, whose target is to find the abnormal or atypical objects in a given dataset. Techniques for detecting outliers have many applications, such as credit card fraud detection and environmental monitoring. Our previous work proposed the Cluster-Based (CB) outlier and gave a centralized method using unsupervised extreme learning machines to compute CB outliers. In this paper, we propose a new distributed algorithm for CB outlier detection (DACB). On the master node, we collect a small number of points from the slave nodes to obtain a threshold. On each slave node, we design a new filtering method that can use the threshold to efficiently speed up the computation. Furthermore, we also propose a ranking method to optimize the order of cluster scanning. Finally, the effectiveness and efficiency of the proposed approaches are verified through extensive simulation experiments.

  18. Iterative Outlier Removal: A Method for Identifying Outliers in Laboratory Recalibration Studies.

    Science.gov (United States)

    Parrinello, Christina M; Grams, Morgan E; Sang, Yingying; Couper, David; Wruck, Lisa M; Li, Danni; Eckfeldt, John H; Selvin, Elizabeth; Coresh, Josef

    2016-07-01

    Extreme values that arise for any reason, including those through nonlaboratory measurement procedure-related processes (inadequate mixing, evaporation, mislabeling), lead to outliers and inflate errors in recalibration studies. We present an approach termed iterative outlier removal (IOR) for identifying such outliers. We previously identified substantial laboratory drift in uric acid measurements in the Atherosclerosis Risk in Communities (ARIC) Study over time. Serum uric acid was originally measured in 1990-1992 on a Coulter DACOS instrument using an uricase-based measurement procedure. To recalibrate previous measured concentrations to a newer enzymatic colorimetric measurement procedure, uric acid was remeasured in 200 participants from stored plasma in 2011-2013 on a Beckman Olympus 480 autoanalyzer. To conduct IOR, we excluded data points >3 SDs from the mean difference. We continued this process using the resulting data until no outliers remained. IOR detected more outliers and yielded greater precision in simulation. The original mean difference (SD) in uric acid was 1.25 (0.62) mg/dL. After 4 iterations, 9 outliers were excluded, and the mean difference (SD) was 1.23 (0.45) mg/dL. Conducting only one round of outlier removal (standard approach) would have excluded 4 outliers [mean difference (SD) = 1.22 (0.51) mg/dL]. Applying the recalibration (derived from Deming regression) from each approach to the original measurements, the prevalence of hyperuricemia (>7 mg/dL) was 28.5% before IOR and 8.5% after IOR. IOR is a useful method for removal of extreme outliers irrelevant to recalibrating laboratory measurements, and identifies more extraneous outliers than the standard approach. © 2016 American Association for Clinical Chemistry.
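
    The IOR loop as described (drop points more than 3 SDs from the mean of the remaining differences, repeat until none are flagged) is short enough to sketch directly; the simulated data are an assumption:

    ```python
    # Iterative outlier removal on paired measurement differences.
    import numpy as np

    def ior(diff, n_sd=3.0):
        diff = np.asarray(diff, dtype=float)
        keep = np.ones(len(diff), dtype=bool)
        while True:
            d = diff[keep]
            out = np.abs(diff - d.mean()) > n_sd * d.std()
            new_keep = keep & ~out        # once removed, a point stays removed
            if new_keep.sum() == keep.sum():
                return keep               # True = retained measurement
            keep = new_keep

    rng = np.random.default_rng(0)
    diffs = rng.normal(1.25, 0.45, 200)
    diffs[:5] += rng.normal(0, 3.0, 5)    # contaminated pairs
    kept = ior(diffs)
    print(len(diffs) - kept.sum(), "outliers removed")
    ```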

  19. Electricity Price Forecasting Based on AOSVR and Outlier Detection

    Institute of Scientific and Technical Information of China (English)

    Zhou Dianmin; Gao Lin; Gao Feng

    2005-01-01

    Electricity price is the first consideration for all participants in the electric power market, and its characteristics are related to both the market mechanism and variation in the behaviors of market participants. It is necessary to build a real-time price forecasting model with adaptive capability, and because there are outliers in the price data, they should be detected and filtered out when training the forecasting model by a regression method. In view of these points, this paper presents an electricity price forecasting method based on accurate on-line support vector regression (AOSVR) and outlier detection. Numerical testing results show that the method is effective in forecasting electricity prices in the electric power market.

  20. Detection of outliers by neural network on the gas centrifuge experimental data of isotopic separation process

    International Nuclear Information System (INIS)

    Andrade, Monica de Carvalho Vasconcelos

    2004-01-01

    This work presents and discusses a neural network technique aimed at the detection of outliers in a set of gas centrifuge isotope separation experimental data. In order to evaluate this new technique, the detection result is compared to the result of statistical analysis combined with cluster analysis. This method for the detection of outliers presents considerable potential in the field of data analysis; it is at the same time easier and faster to use and requires much less knowledge of the physics involved in the process. This work established a procedure for detecting experiments suspected to contain gross errors within a data set where the usual techniques for identifying such errors cannot be applied, or where their use would demand excessively long work. (author)

  1. An Improved Semisupervised Outlier Detection Algorithm Based on Adaptive Feature Weighted Clustering

    Directory of Open Access Journals (Sweden)

    Tingquan Deng

    2016-01-01

    Various approaches to outlier detection already exist, among which semisupervised methods achieve encouraging superiority due to the introduction of prior knowledge. In this paper, an adaptive feature-weighted clustering-based semisupervised outlier detection strategy is proposed. This method maximizes the membership degree of a labeled normal object to the cluster it belongs to and minimizes the membership degrees of a labeled outlier to all clusters. In consideration of the distinct significance of features or components of a dataset in determining whether an object is an inlier or outlier, each feature is adaptively assigned a different weight according to the degree of deviation between that feature of all objects and that of a certain cluster prototype. A series of experiments on a synthetic dataset and several real-world datasets are implemented to verify the effectiveness and efficiency of the proposed method.

  2. Outlier Detection in GNSS Pseudo-Range/Doppler Measurements for Robust Localization

    Directory of Open Access Journals (Sweden)

    Salim Zair

    2016-04-01

    In urban areas or space-constrained environments with obstacles, vehicle localization using Global Navigation Satellite System (GNSS) data is hindered by Non-Line-Of-Sight (NLOS) and multipath receptions. These phenomena induce faulty data that disrupt the precise localization of the GNSS receiver. In this study, we detect the outliers among the observations, Pseudo-Range (PR) and/or Doppler measurements, and we evaluate how discarding them improves the localization. We specify an a-contrario model for GNSS raw data to derive an algorithm that partitions the dataset between inliers and outliers. Then, only the inlier data are considered in the localization process, performed either through a classical Particle Filter (PF) or a Rao-Blackwellization (RB) approach. Both localization algorithms exclusively use GNSS data, but they differ in the way Doppler measurements are processed. An experiment has been performed with a GPS receiver aboard a vehicle. Results show that the proposed algorithms are able to detect the outliers in the raw data while being robust to non-Gaussian noise and to intermittent satellite blockage. We compare the performance achieved by estimating only PR outliers with that achieved by estimating both PR and Doppler outliers. The best localization is achieved using the RB approach coupled with PR-Doppler outlier estimation.

  3. Multivariate Functional Data Visualization and Outlier Detection

    KAUST Repository

    Dai, Wenlin

    2017-03-19

    This article proposes a new graphical tool, the magnitude-shape (MS) plot, for visualizing both the magnitude and shape outlyingness of multivariate functional data. The proposed tool builds on the recent notion of functional directional outlyingness, which measures the centrality of functional data by simultaneously considering the level and the direction of their deviation from the central region. The MS-plot intuitively presents not only levels but also directions of magnitude outlyingness on the horizontal axis or plane, and demonstrates shape outlyingness on the vertical axis. A dividing curve or surface is provided to separate non-outlying data from the outliers. Both the simulated data and the practical examples confirm that the MS-plot is superior to existing tools for visualizing centrality and detecting outliers for functional data.

  4. Why General Outlier Detection Techniques Do Not Suffice For Wireless Sensor Networks?

    NARCIS (Netherlands)

    Zhang, Y.; Meratnia, Nirvana; Havinga, Paul J.M.

    2009-01-01

    Raw data collected in wireless sensor networks are often unreliable and inaccurate due to noise, faulty sensors and harsh environmental effects. Sensor data that significantly deviate from the normal pattern of sensed data are often called outliers. Outlier detection in wireless sensor networks aims at …

  5. Supervised Outlier Detection in Large-Scale MVS Point Clouds for 3D City Modeling Applications

    Science.gov (United States)

    Stucker, C.; Richard, A.; Wegner, J. D.; Schindler, K.

    2018-05-01

    We propose to use a discriminative classifier for outlier detection in large-scale point clouds of cities generated via multi-view stereo (MVS) from densely acquired images. What makes outlier removal hard are the varying distributions of inliers and outliers across a scene. Heuristic outlier removal using a specific feature that encodes point distribution often delivers unsatisfying results. Although most outliers can be identified correctly (high recall), many inliers are erroneously removed (low precision), too. This aggravates object 3D reconstruction due to missing data. We thus propose to discriminatively learn class-specific distributions directly from the data to achieve high precision. We apply a standard Random Forest classifier that infers a binary label (inlier or outlier) for each 3D point in the raw, unfiltered point cloud and test two approaches for training. In the first, non-semantic approach, features are extracted without considering the semantic interpretation of the 3D points. The trained model approximates the average distribution of inliers and outliers across all semantic classes. Second, semantic interpretation is incorporated into the learning process, i.e. we train separate inlier-outlier classifiers per semantic class (building facades, roof, ground, vegetation, fields, and water). The performance of learned filtering is evaluated on several large SfM point clouds of cities. We find that the results confirm our underlying assumption that discriminatively learning inlier-outlier distributions does improve precision over global heuristics, by up to ≈ 12 percentage points. Moreover, semantically informed filtering that models class-specific distributions further improves precision by up to ≈ 10 percentage points, being able to remove very isolated building, roof, and water points while preserving inliers on building facades and vegetation.
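
    A toy version of the supervised filtering idea: hand-crafted per-point features feed a Random Forest that predicts inlier/outlier labels; the features, labels, and data here are assumptions, not the paper's pipeline:

    ```python
    # Random Forest inlier/outlier classification of 3D points from toy features.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    def features(pts):
        # Toy per-point features: mean and max of the 10 nearest-neighbor distances.
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
        knn = np.sort(d, axis=1)[:, 1:11]
        return np.column_stack([knn.mean(axis=1), knn[:, -1]])

    inliers = rng.normal(0, 1, (500, 3))            # dense surface points
    noise = rng.uniform(-6, 6, (50, 3))             # sparse MVS-style noise
    X = features(np.vstack([inliers, noise]))
    y = np.r_[np.zeros(500), np.ones(50)]           # 1 = outlier

    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print("training accuracy:", (clf.predict(X) == y).mean())
    ```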

  6. Detection of Outliers in Panel Data of Intervention Effects Model Based on Variance of Remainder Disturbance

    Directory of Open Access Journals (Sweden)

    Yanfang Lyu

    2015-01-01

    The presence of outliers can result in seriously biased parameter estimates. In order to detect outliers in panel data models, this paper presents a modeling method to assess intervention effects based on the variance of the remainder disturbance, using an arbitrary strictly positive, twice continuously differentiable function. This paper also provides a Lagrange Multiplier (LM) approach to detect and identify a general type of outlier. Furthermore, fixed effects models and random effects models are discussed for identifying outliers, and the corresponding LM test statistics are given. The LM test statistics for an individual-based model to detect outliers are given as a particular case. Finally, this paper presents an application using panel data and explains the advantages of the proposed method.

  7. System and Method for Outlier Detection via Estimating Clusters

    Science.gov (United States)

    Iverson, David J. (Inventor)

    2016-01-01

    An efficient method and system for real-time or offline analysis of multivariate sensor data for use in anomaly detection, fault detection, and system health monitoring is provided. Models automatically derived from training data, typically nominal system data acquired from sensors in normally operating conditions or from detailed simulations, are used to identify unusual, out of family data samples (outliers) that indicate possible system failure or degradation. Outliers are determined through analyzing a degree of deviation of current system behavior from the models formed from the nominal system data. The deviation of current system behavior is presented as an easy to interpret numerical score along with a measure of the relative contribution of each system parameter to any off-nominal deviation. The techniques described herein may also be used to "clean" the training data.
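
    A generic stand-in for the cluster-based monitoring idea (not the patented algorithm): learn clusters from nominal training data, then score new samples by their distance to the nearest cluster centre:

    ```python
    # Score samples by distance to the nearest cluster learned from nominal data.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    nominal = rng.normal(0, 1, (1000, 4))            # healthy-system sensor data
    km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(nominal)

    def deviation(samples):
        d = np.linalg.norm(samples[:, None, :] - km.cluster_centers_[None], axis=2)
        return d.min(axis=1)                         # distance to nearest centre

    threshold = deviation(nominal).max()             # assumed alert threshold
    new = rng.normal(0, 1, (5, 4))
    new[0] += 4.0                                    # off-nominal sample
    print(deviation(new) > threshold)
    ```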

  8. Shape based kinetic outlier detection in real-time PCR

    Directory of Open Access Journals (Sweden)

    D'Atri Mario

    2010-04-01

    Background: Real-time PCR has recently become the technique of choice for absolute and relative nucleic acid quantification. The gold-standard quantification method in real-time PCR assumes that the compared samples have similar PCR efficiency. However, many factors present in biological samples affect PCR kinetics, confounding quantification analysis. In this work we propose a new strategy to detect outlier samples, called SOD. Results: A Richards function was fitted on fluorescence readings to parameterize the amplification curves. There was no significant correlation between the calculated amplification parameters (plateau, slope and y-coordinate of the inflection point) and the log of input DNA, demonstrating that this approach can be used to obtain a “fingerprint” for each amplification curve. To identify outlier runs, the calculated parameters of each unknown sample were compared to those of the standard samples. When a significant underestimation of starting DNA molecules was found, due to the presence of biological inhibitors such as tannic acid, IgG or quercitin, SOD efficiently marked these amplification profiles as outliers. SOD was subsequently compared with KOD, the current approach based on PCR efficiency estimation. The data obtained showed that SOD was more sensitive than KOD, whereas SOD and KOD were equally specific. Conclusion: Our results demonstrate, for the first time, that outlier detection can be based on amplification shape instead of PCR efficiency. SOD represents an improvement in real-time PCR analysis because it decreases the variance of the data, thus increasing the reliability of quantification.
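
    A sketch of the shape-based idea: fit a Richards-type curve to an amplification profile and read off its shape parameters; the parameterization, data, and decision rule here are illustrative assumptions, not the published SOD procedure:

    ```python
    # Fit a Richards curve to one simulated qPCR amplification profile.
    import numpy as np
    from scipy.optimize import curve_fit

    def richards(x, plateau, slope, x0, nu):
        return plateau / (1.0 + nu * np.exp(-slope * (x - x0))) ** (1.0 / nu)

    cycles = np.arange(1, 41, dtype=float)
    rng = np.random.default_rng(0)
    fluor = richards(cycles, 100, 0.6, 22, 1.0) + rng.normal(0, 1, cycles.size)

    p, _ = curve_fit(richards, cycles, fluor, p0=[100, 0.5, 20, 1.0], maxfev=10000)
    print(dict(zip(["plateau", "slope", "x0", "nu"], np.round(p, 2))))
    # An unknown run would be flagged when its fitted shape parameters fall
    # outside the range observed for the standard curves.
    ```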

  9. Explaining outliers by subspace separability

    DEFF Research Database (Denmark)

    Micenková, Barbora; Ng, Raymond T.; Dang, Xuan-Hong

    2013-01-01

    Outliers are extraordinary objects in a data collection. Depending on the domain, they may represent errors, fraudulent activities or rare events that are the subject of our interest. Existing approaches focus on the detection of outliers or degrees of outlierness (ranking), but do not provide a possible … with any existing outlier detection algorithm, and it also includes a heuristic that gives a substantial speedup over the baseline strategy.

  10. Detection of Outliers and Imputing of Missing Values for Water Quality UV-VIS Absorbance Time Series

    OpenAIRE

    Plazas-Nossa, Leonardo; Ávila Angulo, Miguel Antonio; Torres, Andrés

    2017-01-01

    Context: The collection of UV-Vis absorbance using online optical sensors for water quality detection may yield outliers and/or missing values. Therefore, pre-processing to correct these anomalies is required to improve the analysis of monitoring data. The aim of this study is to propose a method to detect outliers as well as to fill in the gaps in time series. Method: Outliers are detected using a Winsorising procedure and the application of the Discrete Fourier Transform (DFT) and the Inverse of F…
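
    A sketch of the Winsorising step only (the DFT-based gap filling is not shown); the clipping limits are an assumption:

    ```python
    # Clip the most extreme 1% at each tail to tame sensor spikes.
    import numpy as np
    from scipy.stats.mstats import winsorize

    rng = np.random.default_rng(0)
    absorbance = rng.normal(0.8, 0.05, 200)
    absorbance[[30, 120]] = [5.0, -2.0]              # spikes in the series

    clean = np.asarray(winsorize(absorbance, limits=(0.01, 0.01)))
    print(absorbance[[30, 120]], "->", clean[[30, 120]])
    ```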

  11. Adaptive distributed outlier detection for WSNs.

    Science.gov (United States)

    De Paola, Alessandra; Gaglio, Salvatore; Lo Re, Giuseppe; Milazzo, Fabrizio; Ortolani, Marco

    2015-05-01

    The paradigm of pervasive computing is gaining more and more attention nowadays, thanks to the possibility of obtaining precise and continuous monitoring. Ease of deployment and adaptivity are typically implemented by adopting autonomous and cooperative sensory devices; however, for such systems to be of any practical use, reliability and fault tolerance must be guaranteed, for instance by detecting corrupted readings amidst the huge amount of gathered sensory data. This paper proposes an adaptive, distributed Bayesian approach for detecting outliers in data collected by a wireless sensor network; our algorithm aims at optimizing classification accuracy, time complexity and communication complexity, while also considering externally imposed constraints on these conflicting goals. The experimental evaluation showed that our approach is able to improve the considered metrics for latency and energy consumption, with limited impact on classification accuracy.

  13. Outlier Detection in Structural Time Series Models

    DEFF Research Database (Denmark)

    Marczak, Martyna; Proietti, Tommaso

    Structural change affects the estimation of economic signals, like the underlying growth rate or the seasonally adjusted series. An important issue, which has attracted a great deal of attention also in the seasonal adjustment literature, is its detection by an expert procedure. The general-to-specific approach to the detection of structural change, currently implemented in Autometrics via indicator saturation, has proven to be both practical and effective in the context of stationary dynamic regression models and unit-root autoregressions. By focusing on impulse- and step-indicator saturation, we investigate via Monte Carlo simulations how this approach performs for detecting additive outliers and level shifts in the analysis of nonstationary seasonal time series. The reference model is the basic structural model, featuring a local linear trend, possibly integrated of order two, stochastic seasonality...

  14. Learning Outlier Ensembles

    DEFF Research Database (Denmark)

    Micenková, Barbora; McWilliams, Brian; Assent, Ira

    Years of research in unsupervised outlier detection have produced numerous algorithms to score data according to their exceptionality. However, the nature of outliers heavily depends on the application context, and different algorithms are sensitive to outliers of different nature. This makes it very difficult to assess the suitability of a particular algorithm without a priori knowledge. On the other hand, in many applications, some examples of outliers exist or can be obtained in addition to the vast amount of unlabeled data. Unfortunately, this extra knowledge cannot be simply incorporated into the existing unsupervised algorithms. In this paper, we show how to use powerful machine learning approaches to combine labeled examples together with arbitrary unsupervised outlier scoring algorithms. We aim to get the best out of the two worlds: supervised and unsupervised. Our approach is also a viable...

  15. Adjusted functional boxplots for spatio-temporal data visualization and outlier detection

    KAUST Repository

    Sun, Ying; Genton, Marc G.

    2011-01-01

    This article proposes a simulation-based method to adjust functional boxplots for correlations when visualizing functional and spatio-temporal data, as well as detecting outliers. We start by investigating the relationship between the spatio

  16. Music Outlier Detection Using Multiple Sequence Alignment and Independent Ensembles

    NARCIS (Netherlands)

    Bountouridis, D.; Koops, Hendrik Vincent; Wiering, F.; Veltkamp, R.C.

    2016-01-01

    The automated retrieval of related music documents, such as cover songs or folk melodies belonging to the same tune, has been an important task in the field of Music Information Retrieval (MIR). Yet outlier detection, the process of identifying those documents that deviate significantly from the

  17. Detecting isotopic ratio outliers

    Science.gov (United States)

    Bayne, C. K.; Smith, D. H.

    An alternative method is proposed for improving isotopic ratio estimates. This method mathematically models pulse-count data and uses iterative reweighted Poisson regression to estimate model parameters to calculate the isotopic ratios. This computer-oriented approach provides theoretically better methods than conventional techniques to establish error limits and to identify outliers.
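
    As an illustration of the general idea (not the authors' exact procedure), the following sketch fits a Poisson regression to simulated pulse-count data with statsmodels, whose GLM fitting is itself performed by iteratively reweighted least squares, and flags outlying counts by their standardized Pearson residuals. The 3-sigma cutoff and the synthetic data are assumptions.

```python
# Sketch: Poisson regression on pulse-count data with residual-based
# outlier flagging. The |residual| > 3 rule is assumed for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)                  # e.g., scan index during a measurement
counts = rng.poisson(lam=200 * np.exp(-0.5 * t))
counts[10] *= 3                            # inject a spurious pulse burst

X = sm.add_constant(t)
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
resid = fit.resid_pearson                  # standardized Pearson residuals
print("flagged scans:", np.flatnonzero(np.abs(resid) > 3.0))
```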

  18. Detecting isotopic ratio outliers

    International Nuclear Information System (INIS)

    Bayne, C.K.; Smith, D.H.

    1986-01-01

    An alternative method is proposed for improving isotopic ratio estimates. This method mathematically models pulse-count data and uses iterative reweighted Poisson regression to estimate model parameters to calculate the isotopic ratios. This computer-oriented approach provides theoretically better methods than conventional techniques to establish error limits and to identify outliers

  19. Outlier Detection in Nonlinear Regression Using the Likelihood Displacement Statistical Method

    Directory of Open Access Journals (Sweden)

    Siti Tabi'atul Hasanah

    2012-11-01

    Full Text Available An outlier is an observation that is very different (extreme) from the other observational data, or data that do not follow the general pattern of the model. Sometimes outliers provide information that cannot be provided by other data; that is why outliers should not simply be eliminated. Outliers can also be influential observations. There are many methods that can be used to detect outliers. Previous studies addressed outlier detection in linear regression; here, outlier detection is developed for nonlinear regression, specifically multiplicative nonlinear regression. Detection uses the likelihood displacement (LD) statistical method, which detects outliers by removing the suspected outlier data. The parameters are estimated by the maximum likelihood method, yielding the maximum likelihood estimates. Using the LD method, the observations whose removal gives a large likelihood displacement are considered to contain outliers. The accuracy of the LD method in detecting outliers is then shown by comparing the MSE of LD with the MSE of the regression in general. The test statistic used is Λ; the initial hypothesis is rejected when the observation is shown to be an outlier.

  20. Time Series Outlier Detection Based on Sliding Window Prediction

    Directory of Open Access Journals (Sweden)

    Yufeng Yu

    2014-01-01

    Full Text Available In order to detect outliers in hydrological time series data, for improving data quality and decision-making quality related to design, operation, and management of water resources, this research develops a time series outlier detection method for hydrologic data that can be used to identify data that deviate from historical patterns. The method first builds a forecasting model on the historical data and then uses it to predict future values. Anomalies are assumed to take place if the observed values fall outside a given prediction confidence interval (PCI), which can be calculated from the predicted value and a confidence coefficient. The use of the PCI as threshold rests mainly on the fact that it considers the uncertainty in the data series parameters of the forecasting model, addressing the problem of suitable threshold selection. The method performs fast, incremental evaluation of data as it becomes available, scales to large quantities of data, and requires no preclassification of anomalies. Experiments with different real-world hydrologic time series showed that the proposed method is fast, correctly identifies abnormal data, and can be used for hydrologic time series analysis.
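
    A minimal sketch of the sliding-window prediction idea follows: a simple one-step predictor built from the trailing window, with the prediction confidence interval (PCI) acting as the anomaly threshold. The window length, the confidence coefficient z, and the mean predictor are illustrative assumptions; the paper's forecasting model is more elaborate.

```python
# Sketch: sliding-window prediction with a PCI as the anomaly threshold.
import numpy as np

def pci_outliers(series, window=24, z=3.0):
    """Flag points falling outside predicted value +/- z * window std."""
    series = np.asarray(series, dtype=float)
    flags = np.zeros(series.size, dtype=bool)
    for i in range(window, series.size):
        hist = series[i - window:i]
        pred = hist.mean()                    # simple one-step predictor
        half_width = z * hist.std(ddof=1)     # PCI half-width
        flags[i] = abs(series[i] - pred) > half_width
    return flags

rng = np.random.default_rng(0)
flow = 100 + 10 * np.sin(np.arange(500) / 20) + rng.normal(0, 2, 500)
flow[300] += 40                               # injected anomaly
print(np.flatnonzero(pci_outliers(flow)))
```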

  1. Detecting isotopic ratio outliers

    International Nuclear Information System (INIS)

    Bayne, C.K.; Smith, D.H.

    1985-01-01

    An alternative method is proposed for improving isotopic ratio estimates. This method mathematically models pulse-count data and uses iterative reweighted Poisson regression to estimate model parameters to calculate the isotopic ratios. This computer-oriented approach provides theoretically better methods than conventional techniques to establish error limits and to identify outliers. 6 refs., 3 figs., 3 tabs

   2. [Outlier sample discrimination methods for building calibration models in melon quality detection using NIR spectra].

    Science.gov (United States)

    Tian, Hai-Qing; Wang, Chun-Guang; Zhang, Hai-Jun; Yu, Zhi-Hong; Li, Jian-Kang

    2012-11-01

    Outlier samples strongly influence the precision of the calibration model in soluble solids content measurement of melons using NIR spectra. According to the possible sources of outlier samples, three methods (predicted concentration residual test; Chauvenet test; leverage and studentized residual test) were used to discriminate these outliers. Nine suspicious outliers were detected from a calibration set of 85 fruit samples. Considering that the 9 suspicious outlier samples might contain some non-outlier samples, they were returned to the model one by one to see whether or not they influenced the model and its prediction precision. In this way, 5 samples which were helpful to the model rejoined the calibration set, and a new model was developed with a correlation coefficient (r) of 0.889 and a root mean square error of calibration (RMSEC) of 0.601 degrees Brix. For 35 unknown samples, the root mean square error of prediction (RMSEP) was 0.854 degrees Brix. The performance of this model was better than that of the model developed with no outliers eliminated from the calibration set (r = 0.797, RMSEC = 0.849 degrees Brix, RMSEP = 1.19 degrees Brix), and it was more representative and stable than the model with all 9 samples eliminated from the calibration set (r = 0.892, RMSEC = 0.605 degrees Brix, RMSEP = 0.862 degrees Brix).

  3. Efficient estimation of dynamic density functions with an application to outlier detection

    KAUST Repository

    Qahtan, Abdulhakim Ali Ali; Zhang, Xiangliang; Wang, Suojin

    2012-01-01

    In this paper, we propose a new method to estimate the dynamic density over data streams, named KDE-Track as it is based on a conventional and widely used Kernel Density Estimation (KDE) method. KDE-Track can efficiently estimate the density with linear complexity by using interpolation on a kernel model, which is incrementally updated upon the arrival of streaming data. Both theoretical analysis and experimental validation show that KDE-Track outperforms traditional KDE and a baseline method Cluster-Kernels on estimation accuracy of the complex density structures in data streams, computing time and memory usage. KDE-Track is also demonstrated on timely catching the dynamic density of synthetic and real-world data. In addition, KDE-Track is used to accurately detect outliers in sensor data and compared with two existing methods developed for detecting outliers and cleaning sensor data. © 2012 ACM.
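
    A plain sliding-window KDE gives a feel for density-based outlier flagging on a stream, though it recomputes the density from scratch at each step, exactly the cost that KDE-Track's incremental interpolation avoids. The window size and density cutoff below are assumptions.

```python
# Sketch: density-based outlier flagging with a naive sliding-window KDE.
# A baseline in the spirit of the paper, not the authors' KDE-Track.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
stream[750] = 20.0                      # a point far from the current density

window = 200
for i in range(window, stream.size):
    kde = gaussian_kde(stream[i - window:i])   # density of recent history
    if kde(stream[i])[0] < 1e-4:               # low density => outlier (assumed cutoff)
        # Note: points right after the regime change at t=500 are flagged
        # until the window adapts to the new density.
        print(f"t={i}: value {stream[i]:.1f} flagged")
```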

  4. Detecting outliers and learning complex structures with large spectroscopic surveys - a case study with APOGEE stars

    Science.gov (United States)

    Reis, Itamar; Poznanski, Dovi; Baron, Dalya; Zasowski, Gail; Shahaf, Sahar

    2018-05-01

    In this work, we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the data set, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the data set for objects allows us to find objects that are impossible to find using their best-fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the data set, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data.

  5. Outlier detection by robust Mahalanobis distance in geological data obtained by INAA to provenance studies

    Energy Technology Data Exchange (ETDEWEB)

    Santos, Jose O. dos, E-mail: osmansantos@ig.com.br [Instituto Federal de Educacao, Ciencia e Tecnologia de Sergipe (IFS), Lagarto, SE (Brazil); Munita, Casimiro S., E-mail: camunita@ipen.br [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil); Soares, Emilio A.A., E-mail: easoares@ufan.edu.br [Universidade Federal do Amazonas (UFAM), Manaus, AM (Brazil). Dept. de Geociencias

    2013-07-01

    The detection of outliers in geochemical studies is one of the main difficulties in the interpretation of a dataset, because outliers can disturb the statistical method. The search for outliers in geochemical studies is usually based on the Mahalanobis distance (MD), since points in multivariate space that lie at a distance larger than some predetermined value from the center of the data are considered outliers. However, the MD is very sensitive to the presence of discrepant samples. Many robust estimators of location and covariance have been introduced in the literature, such as the Minimum Covariance Determinant (MCD) estimator. Using MCD estimators to calculate the MD leads to the so-called Robust Mahalanobis Distance (RD). In this context, RD was used in this work to detect outliers in a geological study of samples collected from the confluence of the Negro and Solimoes rivers. The purpose of this study was to study the contributions of the sediments deposited by the Solimoes and Negro rivers to the filling of the tectonic depressions at Parana do Ariau. For that, 113 samples were analyzed by Instrumental Neutron Activation Analysis (INAA), in which the concentrations of As, Ba, Ce, Co, Cr, Cs, Eu, Fe, Hf, K, La, Lu, Na, Nd, Rb, Sb, Sc, Sm, U, Yb, Ta, Tb, Th and Zn were determined. From the dataset it was possible to construct the ellipse corresponding to the robust Mahalanobis distance for each group of samples. The samples found outside the tolerance ellipse were considered outliers. The results showed that the Robust Mahalanobis Distance was more appropriate for the identification of the outliers, since it is a more restrictive method. (author)

  6. Outlier detection by robust Mahalanobis distance in geological data obtained by INAA to provenance studies

    International Nuclear Information System (INIS)

    Santos, Jose O. dos; Munita, Casimiro S.; Soares, Emilio A.A.

    2013-01-01

    The detection of outliers in geochemical studies is one of the main difficulties in the interpretation of a dataset, because outliers can disturb the statistical method. The search for outliers in geochemical studies is usually based on the Mahalanobis distance (MD), since points in multivariate space that lie at a distance larger than some predetermined value from the center of the data are considered outliers. However, the MD is very sensitive to the presence of discrepant samples. Many robust estimators of location and covariance have been introduced in the literature, such as the Minimum Covariance Determinant (MCD) estimator. Using MCD estimators to calculate the MD leads to the so-called Robust Mahalanobis Distance (RD). In this context, RD was used in this work to detect outliers in a geological study of samples collected from the confluence of the Negro and Solimoes rivers. The purpose of this study was to study the contributions of the sediments deposited by the Solimoes and Negro rivers to the filling of the tectonic depressions at Parana do Ariau. For that, 113 samples were analyzed by Instrumental Neutron Activation Analysis (INAA), in which the concentrations of As, Ba, Ce, Co, Cr, Cs, Eu, Fe, Hf, K, La, Lu, Na, Nd, Rb, Sb, Sc, Sm, U, Yb, Ta, Tb, Th and Zn were determined. From the dataset it was possible to construct the ellipse corresponding to the robust Mahalanobis distance for each group of samples. The samples found outside the tolerance ellipse were considered outliers. The results showed that the Robust Mahalanobis Distance was more appropriate for the identification of the outliers, since it is a more restrictive method. (author)
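
    The RD approach described in these two records maps directly onto standard library calls: the sketch below (an illustration, not the authors' code) fits a Minimum Covariance Determinant estimator, computes squared robust Mahalanobis distances, and uses a chi-squared quantile as the tolerance ellipse, with the 97.5% level as an assumed convention.

```python
# Sketch: robust Mahalanobis distances via the MCD estimator.
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=200)
X[:5] += [6, -6]                          # discrepant samples

mcd = MinCovDet(random_state=0).fit(X)
rd2 = mcd.mahalanobis(X)                  # squared robust distances
cutoff = chi2.ppf(0.975, df=X.shape[1])   # tolerance ellipse boundary
print("outliers:", np.flatnonzero(rd2 > cutoff))
```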

  7. An MEF-Based Localization Algorithm against Outliers in Wireless Sensor Networks.

    Science.gov (United States)

    Wang, Dandan; Wan, Jiangwen; Wang, Meimei; Zhang, Qiang

    2016-07-07

    Precise localization has attracted considerable interest in Wireless Sensor Networks (WSNs) localization systems. Due to the internal or external disturbance, the existence of the outliers, including both the distance outliers and the anchor outliers, severely decreases the localization accuracy. In order to eliminate both kinds of outliers simultaneously, an outlier detection method is proposed based on the maximum entropy principle and fuzzy set theory. Since not all the outliers can be detected in the detection process, the Maximum Entropy Function (MEF) method is utilized to tolerate the errors and calculate the optimal estimated locations of unknown nodes. Simulation results demonstrate that the proposed localization method remains stable while the outliers vary. Moreover, the localization accuracy is highly improved by wisely rejecting outliers.

  8. Detection of Outliers and Imputing of Missing Values for Water Quality UV-VIS Absorbance Time Series

    Directory of Open Access Journals (Sweden)

    Leonardo Plazas-Nossa

    2017-01-01

    Full Text Available Context: The UV-Vis absorbance collection using online optical captors for water quality detection may yield outliers and/or missing values. Therefore, data pre-processing is a necessary prerequisite to monitoring data processing. Thus, the aim of this study is to propose a method that detects and removes outliers as well as fills gaps in time series. Method: Outliers are detected using the Winsorising procedure, and the Discrete Fourier Transform (DFT) and the Inverse Fast Fourier Transform (IFFT) are applied to complete the time series. Together, these tools were used to analyse a case study comprising three sites in Colombia: (i) Bogotá D.C. Salitre-WWTP (Waste Water Treatment Plant) influent; (ii) Bogotá D.C. Gibraltar Pumping Station (GPS); and (iii) Itagüí, San Fernando-WWTP influent (Medellín metropolitan area), analysed via UV-Vis (Ultraviolet and Visible) spectra. Results: Outlier detection with the proposed method obtained promising results when window parameter values are small and self-similar, even though the three time series exhibited different sizes and behaviours. The DFT made it possible to process gaps of different lengths containing missing values. To assess the validity of the proposed method, continuous subsets (sections of the absorbance time series without outliers or missing values) were removed from the original time series, yielding an average 12% error rate in the three testing time series. Conclusions: The application of the DFT and the IFFT, using the 10% most important harmonics of useful values, can be useful for later use in different applications, specifically for time series of water quality and quantity in urban sewer systems. One potential application would be the analysis of dry weather in contrast to rain events, achieved by detecting values that correspond to unusual behaviour in a time series. Additionally, the results hint at the potential of the method for correcting other hydrologic time series.
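
    A compact sketch of the two ingredients, Winsorising and DFT/IFFT reconstruction keeping the 10% most important harmonics, is given below on synthetic absorbance data. The percentile limits, the initial mean fill, and the processing order are illustrative assumptions.

```python
# Sketch: Winsorise extreme values, then fill gaps from the strongest 10%
# of DFT harmonics. Illustrative, not the authors' exact pipeline.
import numpy as np
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(4)
t = np.arange(512)
absorbance = 1 + 0.3 * np.sin(2 * np.pi * t / 64) + rng.normal(0, 0.02, t.size)
absorbance[100:110] = np.nan                  # a gap of missing values

x = absorbance.copy()
obs = ~np.isnan(x)
x[obs] = np.asarray(winsorize(x[obs], limits=(0.05, 0.05)))  # damp outliers
x[~obs] = x[obs].mean()                       # crude initial fill for the FFT
spec = np.fft.fft(x)
keep = np.abs(spec) >= np.quantile(np.abs(spec), 0.90)  # top 10% harmonics
recon = np.real(np.fft.ifft(np.where(keep, spec, 0)))
x[~obs] = recon[~obs]                         # impute the gap from the reconstruction
```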

  9. Open-Source Radiation Exposure Extraction Engine (RE3) with Patient-Specific Outlier Detection.

    Science.gov (United States)

    Weisenthal, Samuel J; Folio, Les; Kovacs, William; Seff, Ari; Derderian, Vana; Summers, Ronald M; Yao, Jianhua

    2016-08-01

    We present an open-source, picture archiving and communication system (PACS)-integrated radiation exposure extraction engine (RE3) that provides study-, series-, and slice-specific data for automated monitoring of computed tomography (CT) radiation exposure. RE3 was built using open-source components and seamlessly integrates with the PACS. RE3 calculations of dose length product (DLP) from the Digital Imaging and Communications in Medicine (DICOM) headers showed high agreement (R² = 0.99) with the vendor dose pages. For study-specific outlier detection, RE3 constructs robust, automatically updating multivariable regression models to predict DLP in the context of patient gender and age, scan length, water-equivalent diameter (Dw), and scanned body volume (SBV). As proof of concept, the model was trained on 811 CT chest, abdomen + pelvis (CAP) exams and 29 outliers were detected. The continuous variables used in the outlier detection model were scan length (R² = 0.45), Dw (R² = 0.70), SBV (R² = 0.80), and age (R² = 0.01). The categorical variables were gender (male average 1182.7 ± 26.3 and female 1047.1 ± 26.9 mGy cm) and pediatric status (pediatric average 710.7 ± 73.6 mGy cm and adult 1134.5 ± 19.3 mGy cm).
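
    The outlier-detection core of such a system can be sketched as a multivariable regression over the covariates named in the abstract, with studies flagged by large standardized residuals. Everything below (data, coefficients, the 3-sigma rule) is an assumption for illustration; RE3 itself builds robust, automatically updating models.

```python
# Sketch: DLP outlier detection by multivariable regression residuals.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 800
X = np.column_stack([
    rng.normal(45, 8, n),      # scan length (cm)
    rng.normal(30, 4, n),      # water-equivalent diameter Dw (cm)
    rng.normal(25, 6, n),      # scanned body volume (L)
    rng.integers(1, 90, n),    # age (years)
])
dlp = 20 * X[:, 1] + 15 * X[:, 2] + rng.normal(0, 60, n)  # synthetic DLP (mGy cm)
dlp[:8] *= 2                                              # overexposed studies

model = LinearRegression().fit(X, dlp)
resid = dlp - model.predict(X)
z = (resid - resid.mean()) / resid.std(ddof=1)
print("flagged studies:", np.flatnonzero(np.abs(z) > 3))
```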

  10. Exploring Outliers in Crowdsourced Ranking for QoE

    OpenAIRE

    Xu, Qianqian; Yan, Ming; Huang, Chendi; Xiong, Jiechao; Huang, Qingming; Yao, Yuan

    2017-01-01

    Outlier detection is a crucial part of robust evaluation for crowdsourceable assessment of Quality of Experience (QoE) and has attracted much attention in recent years. In this paper, we propose some simple and fast algorithms for outlier detection and robust QoE evaluation based on the nonconvex optimization principle. Several iterative procedures are designed with or without knowing the number of outliers in samples. Theoretical analysis is given to show that such procedures can reach stati...

  11. Outlier-resilient complexity analysis of heartbeat dynamics

    Science.gov (United States)

    Lo, Men-Tzung; Chang, Yi-Chung; Lin, Chen; Young, Hsu-Wen Vincent; Lin, Yen-Hung; Ho, Yi-Lwun; Peng, Chung-Kang; Hu, Kun

    2015-03-01

    Complexity in physiological outputs is believed to be a hallmark of healthy physiological control. How to accurately quantify the degree of complexity in physiological signals with outliers remains a major barrier for translating this novel concept of nonlinear dynamic theory to clinical practice. Here we propose a new approach to estimate the complexity of a signal by analyzing the irregularity of the sign time series of its coarse-grained time series at different time scales. Using surrogate data, we show that the method can reliably assess the complexity in noisy data while being highly resilient to outliers. We further apply this method to the analysis of human heartbeat recordings. Without removing any outliers due to ectopic beats, the method is able to detect a degradation of cardiac control in patients with congestive heart failure and an even greater degradation in critically ill patients whose survival relies on an extracorporeal membrane oxygenator (ECMO). Moreover, the derived complexity measures can predict the mortality of ECMO patients. These results indicate that the proposed method may serve as a promising tool for monitoring the cardiac function of patients in clinical settings.

  12. Outlier Ranking via Subspace Analysis in Multiple Views of the Data

    DEFF Research Database (Denmark)

    Muller, Emmanuel; Assent, Ira; Iglesias, Patricia

    2012-01-01

    Outrank, a novel outlier ranking concept, exploits subspace analysis to determine the degree of outlierness. It considers different subsets of the attributes as individual outlier properties. It compares clustered regions in arbitrary subspaces and derives an outlierness score for each object. Its principled integration of multiple views into an outlierness measure uncovers outliers that are not detectable in the full attribute space. Our experimental evaluation demonstrates that Outrank successfully determines a high quality outlier ranking, and outperforms state-of-the-art outlierness measures.

  13. Sparsity-weighted outlier FLOODing (OFLOOD) method: Efficient rare event sampling method using sparsity of distribution.

    Science.gov (United States)

    Harada, Ryuhei; Nakamura, Tomotake; Shigeta, Yasuteru

    2016-03-30

    As an extension of the Outlier FLOODing (OFLOOD) method [Harada et al., J. Comput. Chem. 2015, 36, 763], the sparsity of the outliers defined by a hierarchical clustering algorithm, FlexDice, was considered to achieve an efficient conformational search as sparsity-weighted "OFLOOD." In OFLOOD, FlexDice detects areas of sparse distribution as outliers. The outliers are regarded as candidates that have high potential to promote conformational transitions and are employed as initial structures for conformational resampling by restarting molecular dynamics simulations. When detecting outliers, FlexDice assigns each outlier a rank in the hierarchy, which relates to sparsity in the distribution. In this study, we define lower rank (first ranked), medium rank (second ranked), and highest rank (third ranked) outliers, respectively. For instance, the first-ranked outliers are located in a given conformational space away from the clusters (highly sparse distribution), whereas the third-ranked outliers are near the clusters (a moderately sparse distribution). To achieve an efficient conformational search, resampling from the outliers with a given rank is performed. As demonstrations, this method was applied to several model systems: alanine dipeptide, Met-enkephalin, Trp-cage, T4 lysozyme, and glutamine binding protein. In each demonstration, the present method successfully reproduced transitions among metastable states. In particular, the first-ranked OFLOOD highly accelerated the exploration of conformational space by expanding the edges. In contrast, the third-ranked OFLOOD intensively reproduced local transitions among neighboring metastable states. For quantitative evaluation of the sampled snapshots, free energy calculations were performed with a combination of umbrella samplings, providing rigorous landscapes of the biomolecules. © 2015 Wiley Periodicals, Inc.

  14. A new approach for assessing the state of environment using isometric log-ratio transformation and outlier detection for computation of mean PCDD/F patterns in biota.

    Science.gov (United States)

    Lehmann, René

    2015-01-01

    To assess the state of the environment, various compartments are examined as part of monitoring programs. Within monitoring, a special focus is on chemical pollution. One of the most toxic substances ever synthesized is the well-known dioxin 2,3,7,8-TCDD (2,3,7,8-tetrachlorodibenzo-p-dioxin). Other PCDD/F (polychlorinated dibenzo-dioxins and furans) can act toxic too. They are ubiquitous and persistent in various environmental compartments. Assessing the state of the environment requires knowledge of typical local patterns of PCDD/F for as many compartments as possible. For various species of wild animals and plants (so-called biota), I present the mean local congener profiles of ubiquitous PCDD/F contamination, reflecting typical patterns and levels of environmental burden for various years. Trends in time series of means can indicate success or failure of a measure of PCDD/F reduction. For short time series of mean patterns, it can be hard to detect trends. A new approach that considers, in parallel, the proportions of outliers in the corresponding annual cross-sectional data sets can help detect decreasing or increasing environmental burden and support the analysis of time series. Further, this article reveals the true structure of PCDD/F data in biota, that is, their compositional data structure. It prevents the direct application of standard statistical procedures to the data, rendering the results of such analysis meaningless. Results indicate that the compositional data structure of PCDD/F in biota is of great interest and should be taken into account in future studies. An isometric log-ratio (ilr) transformation is used, providing data to which standard statistical procedures can be applied. Focusing on the identification of typical PCDD/F patterns in biota, outliers are removed from the annual data since they represent an extraordinary situation in the environment. Identification of outliers yields two advantages. First, typical (mean) profiles and levels of PCDD/F contamination...
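
    For readers unfamiliar with the transformation, a minimal ilr sketch follows, using the standard sequence of balance coordinates; after this mapping, ordinary multivariate statistics and outlier tests can be applied to compositional PCDD/F profiles. The example profile is invented.

```python
# Sketch: isometric log-ratio (ilr) coordinates of a composition.
import numpy as np

def ilr(x):
    """Map a composition of D parts to D-1 ilr coordinates."""
    x = np.asarray(x, dtype=float)
    x = x / x.sum()                              # close the composition
    D = x.size
    z = np.empty(D - 1)
    for j in range(1, D):
        gm = np.exp(np.mean(np.log(x[:j])))      # geometric mean of first j parts
        z[j - 1] = np.sqrt(j / (j + 1.0)) * np.log(gm / x[j])
    return z

profile = np.array([0.12, 0.30, 0.25, 0.18, 0.15])  # illustrative congener shares
print(ilr(profile))
```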

  15. A simple transformation independent method for outlier definition.

    Science.gov (United States)

    Johansen, Martin Berg; Christensen, Peter Astrup

    2018-04-10

    Definition and elimination of outliers is a key element for medical laboratories establishing or verifying reference intervals (RIs), especially as inclusion of just a few outlying observations may seriously affect the determination of the reference limits. Many methods have been developed for the definition of outliers. Several of these methods are developed for the normal distribution, and data often require transformation before outlier elimination. We have developed a non-parametric, transformation-independent outlier definition. The new method relies on drawing reproducible histograms, which is done by using defined bin sizes above and below the median. The method is compared to the method recommended by CLSI/IFCC, which uses Box-Cox transformation (BCT) and Tukey's fences for outlier definition. The comparison is done on eight simulated distributions and an indirect clinical dataset. The comparison on the simulated distributions shows that, without outliers added, the recommended method in general defines fewer outliers. However, when outliers are added on one side, the proposed method often produces better results. With outliers on both sides the methods are equally good. Furthermore, it is found that the presence of outliers affects the BCT, and subsequently the limits determined by the currently recommended methods; this is especially seen in skewed distributions. The proposed outlier definition reproduced current RI limits on clinical data containing outliers. We find our simple transformation-independent outlier detection method to be as good as or better than the currently recommended methods.
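
    The comparison method named in the abstract (Box-Cox transformation followed by Tukey's fences) is easy to reproduce; the sketch below uses SciPy's maximum-likelihood Box-Cox and the conventional 1.5 * IQR fences. The simulated data and injected outliers are assumptions.

```python
# Sketch: Box-Cox transformation followed by Tukey's fences.
import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(6)
values = rng.lognormal(mean=1.0, sigma=0.4, size=300)
values = np.append(values, [40.0, 55.0])         # injected high outliers

transformed, lam = boxcox(values)                # lambda estimated by ML;
                                                 # requires strictly positive data
q1, q3 = np.percentile(transformed, [25, 75])
iqr = q3 - q1
mask = (transformed < q1 - 1.5 * iqr) | (transformed > q3 + 1.5 * iqr)
print("outlying values:", values[mask])
```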

  16. Mining Outlier Data in Mobile Internet-Based Large Real-Time Databases

    Directory of Open Access Journals (Sweden)

    Xin Liu

    2018-01-01

    Full Text Available Mining outlier data guarantees access security and data scheduling of parallel databases and maintains high-performance operation of real-time databases. Traditional mining methods generate abundant interference data and suffer from reduced accuracy, efficiency, and stability, which causes severe deficiencies. This paper proposes a new method for mining outlier data, which is used to analyze real-time data features, obtain magnitude spectrum models of outlier data, establish a decision-tree information chain transmission model for outlier data in the mobile Internet, obtain the information flow of internal outlier data in the information chain of a large real-time database, and cluster the data. Based on local characteristic time scale parameters of the information flow, the phase features of the outlier data before filtering are obtained; the decision-tree outlier-classification feature-filtering algorithm is adopted to acquire signals for analysis and instantaneous amplitude, and to obtain the phase-frequency characteristics of the outlier data. Wavelet transform threshold denoising is combined with signal denoising to analyze data offset, to correct the formed detection filter model, and to realize outlier data mining. The simulation suggests that the method detects the characteristic outlier data feature response distribution; reduces response time, iteration frequency, and mining error rate; improves mining adaptation and coverage; and shows good mining outcomes.

  17. An Efficient Method for Detection of Outliers in Tracer Curves Derived from Dynamic Contrast-Enhanced Imaging

    Directory of Open Access Journals (Sweden)

    Linning Ye

    2018-01-01

    Full Text Available The presence of outliers in tracer concentration-time curves derived from dynamic contrast-enhanced imaging can adversely affect the analysis of the tracer curves by model fitting. A computationally efficient method for detecting outliers in tracer concentration-time curves is presented in this study. The proposed method is based on a piecewise linear model and implemented using a robust clustering algorithm. The method is noniterative, and all the parameters are automatically estimated. To compare the proposed method with existing Gaussian model based and robust regression-based methods, simulation studies were performed by simulating tracer concentration-time curves using the generalized Tofts model and kinetic parameters derived from different tissue types. Results show that the proposed method and the robust regression-based method achieve better detection performance than the Gaussian model based method. Compared with the robust regression-based method, the proposed method can achieve similar detection performance with much faster computation speed.

  18. Ranking Fragment Ions Based on Outlier Detection for Improved Label-Free Quantification in Data-Independent Acquisition LC-MS/MS

    Science.gov (United States)

    Bilbao, Aivett; Zhang, Ying; Varesio, Emmanuel; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard

    2016-01-01

    Data-independent acquisition LC-MS/MS techniques complement supervised methods for peptide quantification. However, due to the wide precursor isolation windows, these techniques are prone to interference at the fragment ion level, which in turn is detrimental for accurate quantification. The “non-outlier fragment ion” (NOFI) ranking algorithm has been developed to assign low priority to fragment ions affected by interference. By using the optimal subset of high priority fragment ions these interfered fragment ions are effectively excluded from quantification. NOFI represents each fragment ion as a vector of four dimensions related to chromatographic and MS fragmentation attributes and applies multivariate outlier detection techniques. Benchmarking conducted on a well-defined quantitative dataset (i.e. the SWATH Gold Standard), indicates that NOFI on average is able to accurately quantify 11-25% more peptides than the commonly used Top-N library intensity ranking method. The sum of the area of the Top3-5 NOFIs produces similar coefficients of variation as compared to the library intensity method but with more accurate quantification results. On a biologically relevant human dendritic cell digest dataset, NOFI properly assigns low priority ranks to 85% of annotated interferences, resulting in sensitivity values between 0.92 and 0.80 against 0.76 for the Spectronaut interference detection algorithm. PMID:26412574

  19. Rapid eye movement sleep behavior disorder as an outlier detection problem

    DEFF Research Database (Denmark)

    Kempfner, Jacob; Sørensen, Gertrud Laura; Nikolic, M.

    2014-01-01

    OBJECTIVE: Idiopathic rapid eye movement (REM) sleep behavior disorder is a strong early marker of Parkinson's disease and is characterized by REM sleep without atonia and/or dream enactment. Because these measures are subject to individual interpretation, there is consequently a need for quantitative methods to establish objective criteria. This study proposes a semiautomatic algorithm for the early detection of Parkinson's disease. This is achieved by distinguishing between normal REM sleep and REM sleep without atonia by considering muscle activity as an outlier detection problem. METHODS: Sixteen healthy control subjects, 16 subjects with idiopathic REM sleep behavior disorder, and 16 subjects with periodic limb movement disorder were enrolled. Different combinations of five surface electromyographic channels, including the EOG, were tested. A muscle activity score was automatically...

  20. Outlier identification and visualization for Pb concentrations in urban soils and its implications for identification of potential contaminated land

    International Nuclear Information System (INIS)

    Zhang Chaosheng; Tang Ya; Luo Lin; Xu Weilin

    2009-01-01

    Outliers in urban soil geochemical databases may imply potential contaminated land. Different methodologies which can be easily implemented for the identification of global and spatial outliers were applied to Pb concentrations in urban soils of Galway City in Ireland. Due to the strongly skewed probability distribution of the data, a Box-Cox transformation was performed prior to further analyses. The graphic methods of histogram and box-and-whisker plot were effective in the identification of global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association (local Moran's I), cross-validation of kriging, and a geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. Different methods showed generally consistent results, but differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with. - Outliers in urban geochemical databases can be detected to provide guidance for identification of potential contaminated land.

  1. Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points

    Directory of Open Access Journals (Sweden)

    Ettore Marubini

    2014-01-01

    Full Text Available This paper presents a robust two-stage procedure for the identification of outlying observations in regression analysis. The exploratory stage identifies leverage points and vertical outliers through a robust distance estimator based on the Minimum Covariance Determinant (MCD). After deletion of these points, the confirmatory stage carries out an Ordinary Least Squares (OLS) analysis on the remaining subset of data and investigates the effect of adding back in the previously deleted observations. Cut-off points pertinent to different diagnostics are generated by bootstrapping, and the cases are definitively labelled as good-leverage, bad-leverage, vertical outliers and typical cases. The procedure is applied to four examples.

  2. A statistical test for outlier identification in data envelopment analysis

    Directory of Open Access Journals (Sweden)

    Morteza Khodabin

    2010-09-01

    Full Text Available In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these "deterministic" frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the presented method, each observation is deleted from the sample once and the resulting linear program is solved, leading to a distribution of efficiency estimates. Based on the achieved distribution, a pared test is designed to identify the potential outlier(s). We illustrate the method through a real data set. The method could be used in a first step, as an exploratory data analysis, before using any frontier estimation.

  3. A New Methodology Based on Imbalanced Classification for Predicting Outliers in Electricity Demand Time Series

    Directory of Open Access Journals (Sweden)

    Francisco Javier Duque-Pintor

    2016-09-01

    Full Text Available The occurrence of outliers in real-world phenomena is quite usual. If these anomalous data are not properly treated, unreliable models can be generated. Many approaches in the literature are focused on a posteriori detection of outliers. However, a new methodology to a priori predict the occurrence of such data is proposed here. Thus, the main goal of this work is to predict the occurrence of outliers in time series, by using, for the first time, imbalanced classification techniques. In this sense, the problem of forecasting outlying data has been transformed into a binary classification problem, in which the positive class represents the occurrence of outliers. Given that the number of outliers is much lower than the number of common values, the resultant classification problem is imbalanced. To create training and test sets, robust statistical methods have been used to detect outliers in both sets. Once the outliers have been detected, the instances of the dataset are labeled accordingly. Namely, if any of the samples composing the next instance are detected as an outlier, the label is set to one. As a study case, the methodology has been tested on electricity demand time series in the Spanish electricity market, in which most of the outliers were properly forecast.
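
    A minimal sketch of the overall recipe, labelling historical outliers with a robust (IQR) rule, building lagged features, and training a class-weighted classifier on the resulting imbalanced problem, is shown below. The IQR labelling, lag length and logistic model are illustrative stand-ins for the paper's statistically more careful pipeline.

```python
# Sketch: outlier forecasting as imbalanced binary classification.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
demand = 1000 + 100 * np.sin(np.arange(2000) / 24) + rng.normal(0, 20, 2000)
demand[rng.choice(2000, 30, replace=False)] += 300       # rare demand spikes

q1, q3 = np.percentile(demand, [25, 75])
labels = ((demand < q1 - 1.5 * (q3 - q1)) |
          (demand > q3 + 1.5 * (q3 - q1))).astype(int)   # 1 = outlier

lags = 24
X = np.column_stack([demand[i:len(demand) - lags + i] for i in range(lags)])
y = labels[lags:]                                        # next step's class

clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print("predicted outlier rate:", clf.predict(X).mean())
```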

  4. Displaying an Outlier in Multivariate Data | Gordor | Journal of ...

    African Journals Online (AJOL)

    ... a multivariate data set is proposed. The technique involves the projection of the multidimensional data onto a single dimension called the outlier displaying component. When the observations are plotted on this component the outlier is appreciably revealed. Journal of Applied Science and Technology (JAST), Vol. 4, Nos.

  5. Evaluating Outlier Identification Tests: Mahalanobis "D" Squared and Comrey "Dk."

    Science.gov (United States)

    Rasmussen, Jeffrey Lee

    1988-01-01

    A Monte Carlo simulation was used to compare the Mahalanobis "D" Squared and the Comrey "Dk" methods of detecting outliers in data sets. Under the conditions investigated, the "D" Squared technique was preferable as an outlier removal statistic. (SLD)

  6. Outlier identification and visualization for Pb concentrations in urban soils and its implications for identification of potential contaminated land

    Energy Technology Data Exchange (ETDEWEB)

    Zhang Chaosheng, E-mail: chaosheng.zhang@nuigalway.i [School of Geography and Archaeology, National University of Ireland, Galway (Ireland); Tang Ya [Department of Environmental Sciences, Sichuan University, Chengdu, Sichuan 610065 (China); Luo Lin; Xu Weilin [State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu, Sichuan 610065 (China)

    2009-11-15

    Outliers in urban soil geochemical databases may imply potential contaminated land. Different methodologies which can be easily implemented for the identification of global and spatial outliers were applied to Pb concentrations in urban soils of Galway City in Ireland. Due to the strongly skewed probability distribution of the data, a Box-Cox transformation was performed prior to further analyses. The graphic methods of histogram and box-and-whisker plot were effective in the identification of global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association (local Moran's I), cross-validation of kriging, and a geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. Different methods showed generally consistent results, but differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with. - Outliers in urban geochemical databases can be detected to provide guidance for identification of potential contaminated land.
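
    The global/spatial distinction drawn in these records can be mimicked in a few lines: global outliers from a boxplot rule on Box-Cox transformed values, spatial outliers as points that differ strongly from their nearest neighbours. This is a simplified analogue, not the paper's GIS workflow; thresholds and data are assumptions.

```python
# Sketch: global vs. spatial outliers for point-measured soil Pb.
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import boxcox

rng = np.random.default_rng(8)
coords = rng.uniform(0, 10, size=(300, 2))           # sample locations (km)
pb = rng.lognormal(mean=3.0, sigma=0.5, size=300)    # Pb (mg/kg), skewed
pb[0] = 2000.0                                       # contaminated spot

z, _ = boxcox(pb)
q1, q3 = np.percentile(z, [25, 75])
global_out = (z < q1 - 1.5 * (q3 - q1)) | (z > q3 + 1.5 * (q3 - q1))

tree = cKDTree(coords)
_, idx = tree.query(coords, k=6)                     # self + 5 neighbours
neigh_mean = z[idx[:, 1:]].mean(axis=1)
spatial_out = np.abs(z - neigh_mean) > 3 * z.std(ddof=1)

print("global:", np.flatnonzero(global_out), "spatial:", np.flatnonzero(spatial_out))
```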

  7. Outlier Detection in Urban Air Quality Sensor Networks

    NARCIS (Netherlands)

    van Zoest, V.M.; Stein, A.; Hoek, Gerard

    2018-01-01

    Low-cost urban air quality sensor networks are increasingly used to study the spatio-temporal variability in air pollutant concentrations. Recently installed low-cost urban sensors, however, are more prone to result in erroneous data than conventional monitors, e.g., leading to outliers. Commonly

   8. ARIMA Modeling and Outlier Detection of Rainfall Data for the Evaluation of Millimeter-Wave Radio Systems

    Directory of Open Access Journals (Sweden)

    Achmad Mauludiyanto

    2009-01-01

    Full Text Available The purpose of this paper is to present the results of ARIMA modeling and outlier detection for rainfall data in Surabaya. The paper explains the steps in the formation of rainfall models, in particular the Box-Jenkins procedure for ARIMA modeling and outlier detection. The early stage of ARIMA modeling is the identification of the stationarity of the data, both in mean and in variance. Stationarity in the variance can be evaluated with the Box-Cox transformation, while stationarity in the mean can be assessed with data plots and the form of the ACF. Identification of the ACF and PACF of the stationary data is used to determine the candidate orders of the ARIMA model. The next stage is to estimate the parameters and run diagnostic checks to see the suitability of the model. The diagnostic check evaluates whether the residuals of the model satisfy the white noise and normality conditions. The Ljung-Box test can be used to validate the white noise condition, while the Kolmogorov-Smirnov test evaluates normality of the distribution. The residual tests showed that the residuals of the ARIMA model were not white noise, indicating the existence of outliers in the data. Thus, the next step taken was outlier detection, to eliminate outlier effects and increase the accuracy of predictions of the ARIMA model. The ARIMA modeling and outlier detection were implemented using the MINITAB package and MATLAB. The research shows that ARIMA modeling with outlier detection can reduce the prediction error as measured by the Mean Square Error (MSE) criterion. Quantitatively, the decline in the value of the MSE by incorporating outlier detection is 23.7%, with an average decline of 6.5%.
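
    A minimal Box-Jenkins style sketch in Python (the paper used MINITAB and MATLAB) is given below: fit an ARIMA model and scan the residuals for outliers. The (1,0,1) order, the 3-sigma rule, and the synthetic rainfall series are assumptions; the paper additionally re-estimates the model after adjusting for detected outliers.

```python
# Sketch: ARIMA fit with a simple residual-based outlier scan.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(9)
rain = 50 + np.cumsum(rng.normal(0, 1, 300)) * 0.1 + rng.normal(0, 2, 300)
rain[150] += 25                                  # an additive outlier

fit = ARIMA(rain, order=(1, 0, 1)).fit()
resid = fit.resid
flags = np.flatnonzero(np.abs(resid) > 3 * resid.std())
print("suspected outliers at:", flags)
```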

  9. Estimating the number of components and detecting outliers using Angle Distribution of Loading Subspaces (ADLS) in PCA analysis.

    Science.gov (United States)

    Liu, Y J; Tran, T; Postma, G; Buydens, L M C; Jansen, J

    2018-08-22

    Principal Component Analysis (PCA) is widely used in analytical chemistry, to reduce the dimensionality of a multivariate data set in a few Principal Components (PCs) that summarize the predominant patterns in the data. An accurate estimate of the number of PCs is indispensable to provide meaningful interpretations and extract useful information. We show how existing estimates for the number of PCs may fall short for datasets with considerable coherence, noise or outlier presence. We present here how Angle Distribution of the Loading Subspaces (ADLS) can be used to estimate the number of PCs based on the variability of loading subspace across bootstrap resamples. Based on comprehensive comparisons with other well-known methods applied on simulated dataset, we show that ADLS (1) may quantify the stability of a PCA model with several numbers of PCs simultaneously; (2) better estimate the appropriate number of PCs when compared with the cross-validation and scree plot methods, specifically for coherent data, and (3) facilitate integrated outlier detection, which we introduce in this manuscript. We, in addition, demonstrate how the analysis of different types of real-life spectroscopic datasets may benefit from these advantages of ADLS. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
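
    The core of ADLS, measuring how much the loading subspace wobbles across bootstrap resamples, can be sketched with principal angles; unstable (noise) components produce large angles. The resample count and the use of the largest angle are illustrative choices, not the authors' exact statistic.

```python
# Sketch: bootstrap variability of a k-component PCA loading subspace,
# quantified by principal angles.
import numpy as np
from scipy.linalg import subspace_angles

def loadings(X, k):
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                              # p x k loading matrix

rng = np.random.default_rng(10)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 10))  # rank-3 signal
X += 0.1 * rng.normal(size=X.shape)              # plus noise

for k in (2, 3, 4, 5):
    ref = loadings(X, k)
    angles = [subspace_angles(ref, loadings(X[rng.integers(0, 100, 100)], k)).max()
              for _ in range(50)]
    # Angles stay small up to the true rank (3) and blow up beyond it.
    print(f"k={k}: mean max angle {np.degrees(np.mean(angles)):.1f} deg")
```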

  10. ZODET: software for the identification, analysis and visualisation of outlier genes in microarray expression data.

    Directory of Open Access Journals (Sweden)

    Daniel L Roden

    Full Text Available Complex human diseases can show significant heterogeneity between patients with the same phenotypic disorder. An outlier detection strategy was developed to identify variants at the level of gene transcription that are of potential biological and phenotypic importance. Here we describe a graphical software package, z-score outlier detection (ZODET), that enables identification and visualisation of gross abnormalities in gene expression (outliers) in individuals, using whole genome microarray data. The mean and standard deviation of expression in a healthy control cohort are used to detect both over- and under-expressed probes in individual test subjects. We compared the potential of ZODET to detect outlier genes in gene expression datasets with a previously described statistical method, gene tissue index (GTI), using a simulated expression dataset and a publicly available monocyte-derived macrophage microarray dataset. Taken together, these results support ZODET as a novel approach to identify outlier genes of potential pathogenic relevance in complex human diseases. The algorithm is implemented using R packages and Java. The software is freely available from http://www.ucl.ac.uk/medicine/molecular-medicine/publications/microarray-outlier-analysis.
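
    The z-score core of ZODET reduces to a few lines of NumPy: per-probe mean and standard deviation from the control cohort, then a z-score for the test subject. The |z| > 5 cutoff and the simulated data are assumptions; ZODET itself adds configurable thresholds, reporting and visualisation.

```python
# Sketch: per-probe z-score outlier detection against a control cohort.
import numpy as np

rng = np.random.default_rng(11)
controls = rng.normal(8.0, 0.5, size=(40, 1000))   # 40 controls x 1000 probes
patient = rng.normal(8.0, 0.5, size=1000)
patient[42] = 12.0                                 # grossly over-expressed probe

mu = controls.mean(axis=0)
sd = controls.std(axis=0, ddof=1)
z = (patient - mu) / sd
print("outlier probes:", np.flatnonzero(np.abs(z) > 5))
```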

  11. A Note on optimal estimation in the presence of outliers

    Directory of Open Access Journals (Sweden)

    John N. Haddad

    2017-06-01

    Full Text Available Haddad, J. 2017. A note on optimal estimation in the presence of outliers. Lebanese Science Journal, 18(1): 136-141. The basic estimation problem of the mean and standard deviation of a random normal process in the presence of an outlying observation is considered. The value of the outlier is taken as a constraint imposed on the maximization problem of the log likelihood. It is shown that the optimal solution of the maximization problem exists, and expressions for the estimates are given. Applications to estimation in the presence of outliers and outlier detection are discussed and illustrated through a simulation study and an analysis of trade data.

  12. SU-F-T-97: Outlier Identification in Radiation Therapy Knowledge Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Sheng, Y [Duke University, Durham, NC (United States); Ge, Y [University of North Carolina at Charlotte, Charlotte, NC (United States); Yuan, L; Yin, F; Wu, Q [Duke University Medical Center, Durham, NC (United States); Li, T [Thomas Jefferson University, Philadelphia, PA (United States)

    2016-06-15

    Purpose: To investigate the impact of outliers on knowledge modeling in radiation therapy, and to develop a systematic workflow for identifying and analyzing geometric and dosimetric outliers using pelvic cases. Methods: Four groups (G1-G4) of pelvic plans were included: G1 (37 prostate cases), G2 (37 prostate plus lymph node cases), and G3 (37 prostate bed cases) are all clinical IMRT cases. G4 are 10 plans outside G1 re-planned with dynamic arc to simulate dosimetric outliers. The workflow involves two steps: (1) identify geometric outliers, assess impact and clean up; (2) identify dosimetric outliers, assess impact and clean up. (1) A baseline model was trained with all G1 cases. G2/G3 cases were then individually added to the baseline model as geometric outliers. The impact on the model was assessed by comparing the leverage statistic of inliers (G1) and outliers (G2/G3). Receiver-operating-characteristic (ROC) analysis was performed to determine the optimal threshold. (2) A separate baseline model was trained with 32 G1 cases. Each G4 case (dosimetric outlier) was then progressively added to perturb this model. DVH predictions were performed using these perturbed models for the remaining 5 G1 cases. The normal tissue complication probability (NTCP) calculated from the predicted DVH was used to evaluate the dosimetric outliers' impact. Results: The leverage of inliers and outliers was significantly different. The Area-Under-Curve (AUC) for differentiating G2 from G1 was 0.94 (threshold: 0.22) for bladder and 0.80 (threshold: 0.10) for rectum. For differentiating G3 from G1, the AUC (threshold) was 0.68 (0.09) for bladder and 0.76 (0.08) for rectum. A significant increase in NTCP started from models with 4 dosimetric outliers for bladder (p<0.05), and with only 1 dosimetric outlier for rectum (p<0.05). Conclusion: We established a systematic workflow for identifying and analyzing geometric and dosimetric outliers, and investigated statistical metrics for detecting. Results validated the

  13. A Note on the Vogelsang Test for Additive Outliers

    DEFF Research Database (Denmark)

    Haldrup, Niels; Sansó, Andreu

    The role of additive outliers in integrated time series has attracted some attention recently, and research shows that outlier detection should be an integral part of unit root testing procedures. Recently, Vogelsang (1999) suggested an iterative procedure for the detection of multiple additive outliers in integrated time series. However, the procedure appears to suffer from serious size distortions towards the finding of too many outliers, as has been shown by Perron and Rodriguez (2003). In this note we prove the inconsistency of the test in each step of the iterative procedure, and hence alternative routes need...

  14. On damage detection in wind turbine gearboxes using outlier analysis

    Science.gov (United States)

    Antoniadou, Ifigeneia; Manson, Graeme; Dervilis, Nikolaos; Staszewski, Wieslaw J.; Worden, Keith

    2012-04-01

    The proportion of worldwide installed wind power in power systems increases over the years as a result of the steadily growing interest in renewable energy sources. Still, the advantages offered by the use of wind power are overshadowed by the high operational and maintenance costs, resulting in the low competitiveness of wind power in the energy market. In order to reduce the costs of corrective maintenance, the application of condition monitoring to gearboxes becomes highly important, since gearboxes are among the wind turbine components with the most frequent failure observations. While condition monitoring of gearboxes in general is common practice, with various methods having been developed over the last few decades, wind turbine gearbox condition monitoring faces a major challenge: the detection of faults under the time-varying load conditions prevailing in wind turbine systems. Classical time and frequency domain methods fail to detect faults under variable load conditions, due to the temporary effect that these faults have on vibration signals. This paper uses the statistical discipline of outlier analysis for the damage detection of gearbox tooth faults. A simplified two-degree-of-freedom gearbox model considering nonlinear backlash, time-periodic mesh stiffness and static transmission error, simulates the vibration signals to be analysed. Local stiffness reduction is used for the simulation of tooth faults and statistical processes determine the existence of intermittencies. The lowest level of fault detection, the threshold value, is considered and the Mahalanobis squared-distance is calculated for the novelty detection problem.

  15. Modeling of activation data in the BrainMapTM database: Detection of outliers

    DEFF Research Database (Denmark)

    Nielsen, Finn Årup; Hansen, Lars Kai

    2002-01-01

    models is identification of novelty, i.e., low probability database events. We rank the novelty of the outliers and investigate the cause for 21 of the most novel, finding several outliers that are entry and transcription errors or infrequent or non-conforming terminology. We briefly discuss the use...

  16. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    Science.gov (United States)

    Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.

    2013-01-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as "less-than" values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
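
    For orientation, the classic single Grubbs-Beck screen (which the paper generalizes to multiple PILFs) can be sketched as below, using the widely cited Bulletin 17B approximation for the 10% one-sided critical value K_N on log-transformed flows. Constants should be verified against the guidelines before any real use.

```python
# Sketch: classic Grubbs-Beck low-outlier screen for a flood series.
import numpy as np

def grubbs_beck_low(flows):
    x = np.log10(np.asarray(flows, dtype=float))
    n = x.size
    # Bulletin 17B approximation of the 10% one-sided critical value.
    kn = -0.9043 + 3.345 * np.sqrt(np.log10(n)) - 0.4046 * np.log10(n)
    x_crit = x.mean() - kn * x.std(ddof=1)       # low-outlier threshold
    return np.flatnonzero(x < x_crit)

rng = np.random.default_rng(12)
flows = rng.lognormal(mean=6.0, sigma=0.5, size=50)
flows[0] = 5.0                                   # a potentially influential low flow
print("low outliers:", grubbs_beck_low(flows))
```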

  17. Robust data reconciliation and outlier detection with swarm intelligence in a thermal reactor power calculation

    Energy Technology Data Exchange (ETDEWEB)

    Valdetaro, Eduardo Damianik, E-mail: valdtar@eletronuclear.gov.br [ELETRONUCLEAR - ELETROBRAS, Angra dos Reis, RJ (Brazil). Angra 2 Operating Dept.; Coordenacao dos Programas de Pos-Graduacao de Engenharia (PEN/COPPE/UFRJ), RJ (Brazil). Programa de Engenharia Nuclear; Schirru, Roberto, E-mail: schirru@lmp.ufrj.br [Coordenacao dos Programas de Pos-Graduacao de Engenharia (PEN/COPPE/UFRJ), RJ (Brazil). Programa de Engenharia Nuclear

    2011-07-01

    In nuclear power plants, Data Reconciliation (DR) and Gross Error Detection (GED) are techniques of increasing interest, primarily used to take the mass and energy balance into account, which brings direct and indirect financial benefits. Data reconciliation is formulated as a constrained minimization problem, where the constraints correspond to the energy and mass balance model. Statistical methods are combined with the minimization of a quadratic error form. Solving a nonlinear optimization problem using conventional methods can be troublesome, because a multimodal function with differentiated solutions introduces difficulties in the search for an optimal solution. Many techniques were developed to solve Data Reconciliation and Outlier Detection; some of them use, for example, Quadratic Programming, Lagrange Multipliers, or Mixed-Integer Non Linear Programming, and others use evolutionary algorithms like Genetic Algorithms (GA). Recently the Particle Swarm Optimization (PSO) has been shown to be a potential tool as a global optimization algorithm when applied to data reconciliation. Robust Statistics is also of increasing interest; it is used when measured data are contaminated by random errors and one cannot assume the error is normally distributed, a situation which reflects real problem conditions. The aim of this work is to present a brief comparison between the classical data reconciliation technique and the robust data reconciliation and gross error detection with swarm intelligence procedure in calculating the thermal reactor power for a simplified heat circuit diagram of a steam turbine plant, using real data obtained from the Angra 2 nuclear power plant. The main objective is to test the potential of the robust DR and GED method in an integrated framework using swarm intelligence and the three-part redescending estimator of Hampel when applied to a real process condition. The results evaluate the potential use of the robust technique in

  18. Segmentation by Large Scale Hypothesis Testing - Segmentation as Outlier Detection

    DEFF Research Database (Denmark)

    Darkner, Sune; Dahl, Anders Lindbjerg; Larsen, Rasmus

    2010-01-01

    We propose a novel and efficient way of performing local image segmentation. For many applications a threshold of pixel intensities is sufficient, but determining the appropriate threshold value can be difficult. In cases with large global intensity variation the threshold value has to be adapted locally. We propose a method based on large scale hypothesis testing with a consistent method for selecting an appropriate threshold for the given data. By estimating the background distribution we characterize the segment of interest as a set of outliers with a certain probability based on the estimated...... a microscope, and we show how the method can handle transparent particles with significant glare points. The method generalizes to other problems. This is illustrated by applying the method to camera calibration images and MRI of the midsagittal plane for gray and white matter separation and segmentation......
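
    A minimal sketch of the idea (not the authors' implementation), assuming a roughly Gaussian background estimated robustly from the image and Benjamini-Hochberg control of the false discovery rate:

        import numpy as np
        from scipy.stats import norm

        def segment_by_testing(img, q=0.01):
            # Segment bright objects as statistical outliers of the background.
            x = img.ravel().astype(float)
            med = np.median(x)
            sd = 1.4826 * np.median(np.abs(x - med))   # robust background scale (MAD)
            p = norm.sf((x - med) / sd)                # one-sided p-values vs background
            order = np.argsort(p)
            n = p.size
            passed = p[order] <= q * (np.arange(1, n + 1) / n)   # Benjamini-Hochberg
            k = passed.nonzero()[0].max() + 1 if passed.any() else 0
            mask = np.zeros(n, dtype=bool)
            mask[order[:k]] = True
            return mask.reshape(img.shape)

        rng = np.random.default_rng(1)
        img = rng.normal(100.0, 5.0, (64, 64))         # synthetic background
        img[20:28, 30:38] += 40.0                      # bright "particle"
        print(segment_by_testing(img).sum(), "pixels flagged")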

  19. Analysis and detection of functional outliers in water quality parameters from different automated monitoring stations in the Nalón river basin (Northern Spain).

    Science.gov (United States)

    Piñeiro Di Blasi, J I; Martínez Torres, J; García Nieto, P J; Alonso Fernández, J R; Díaz Muñiz, C; Taboada, J

    2015-01-01

    The purpose of the authorities in establishing water quality standards is to enhance water quality and prevent pollution, protecting public health and welfare in accordance with the public interest: drinking water supplies; conservation of fish, wildlife and other beneficial aquatic life; agricultural, industrial, recreational and other reasonable and necessary uses; and the maintenance and improvement of the biological integrity of the waters. Water quality controls therefore involve a large number of variables and observations, often subject to outliers. An outlier is an observation that is numerically distant from the rest of the data, or that appears to deviate markedly from other members of the sample in which it occurs. An interesting analysis is to find those observations that produce measurements different from the pattern established in the sample. Identification of atypical observations is thus an important concern in water quality monitoring, and a difficult task because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using turbidity, conductivity and ammonium ion as indicator variables. Until now, methods were based on considering the different parameters as a vector whose components were their concentration values. The innovation of this approach lies in considering water quality monitoring over time as continuous curves instead of discrete points; that is to say, the dataset of the problem is treated as a time-dependent function rather than a set of discrete values at different time instants. This new methodology, based on the concept of functional depth, was applied successfully to the detection of outliers in water quality monitoring samples in the Nalón river basin. Results are discussed in terms of origin, causes, etc. Finally, the conclusions as well as advantages of...
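
    A minimal sketch of the functional-depth idea, using the modified band depth with bands formed by pairs of curves; the synthetic turbidity-like curves and this particular depth variant are illustrative assumptions, not the paper's exact procedure:

        import numpy as np
        from itertools import combinations

        def modified_band_depth(curves):
            # curves: (n_curves, n_times). For each curve, the depth is the
            # average fraction of time points at which it lies inside the
            # envelope of a pair of curves. Low depth suggests a functional
            # outlier.
            n, t = curves.shape
            depth = np.zeros(n)
            for i, j in combinations(range(n), 2):
                lo = np.minimum(curves[i], curves[j])
                hi = np.maximum(curves[i], curves[j])
                inside = (curves >= lo) & (curves <= hi)    # (n, t) broadcast
                depth += inside.mean(axis=1)
            return depth / (n * (n - 1) / 2)

        # Daily turbidity-like curves; curve 0 carries an anomalous event.
        rng = np.random.default_rng(2)
        t = np.linspace(0, 2 * np.pi, 48)
        curves = np.sin(t) + 0.1 * rng.standard_normal((20, t.size))
        curves[0] += np.where((t > 3) & (t < 4), 3.0, 0.0)  # injected outlier event
        depth = modified_band_depth(curves)
        print("most outlying curve:", depth.argmin())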

  1. Detection of outliers by neural network on the gas centrifuge experimental data of isotopic separation process; Aplicacao de redes neurais para deteccao de erros grosseiros em dados de processo de separacao de isotopos de uranio por ultracentrifugacao

    Energy Technology Data Exchange (ETDEWEB)

    Andrade, Monica de Carvalho Vasconcelos

    2004-07-01

    This work presents and discusses a neural network technique aimed at the detection of outliers in a set of gas centrifuge isotope separation experimental data. In order to evaluate this new technique, the detection results are compared with those of a statistical analysis combined with cluster analysis. This method for the detection of outliers presents considerable potential in the field of data analysis; it is at the same time easier and faster to use and requires much less knowledge of the physics involved in the process. This work established a procedure for detecting experiments suspected of containing gross errors within a data set where the usual techniques for identifying such errors cannot be applied, or where their use would demand excessively lengthy work. (author)

  2. Principal component analysis applied to Fourier transform infrared spectroscopy for the design of calibration sets for glycerol prediction models in wine and for the detection and classification of outlier samples.

    Science.gov (United States)

    Nieuwoudt, Helene H; Prior, Bernard A; Pretorius, Isak S; Manley, Marena; Bauer, Florian F

    2004-06-16

    Principal component analysis (PCA) was used to identify the main sources of variation in the Fourier transform infrared (FT-IR) spectra of 329 wines of various styles. The FT-IR spectra were gathered using a specialized WineScan instrument. The main sources of variation included the reducing sugar and alcohol content of the samples, as well as the stage of fermentation and the maturation period of the wines. The implications of the variation between the different wine styles for the design of calibration models with accurate predictive abilities were investigated using glycerol calibration in wine as a model system. PCA enabled the identification and interpretation of samples that were poorly predicted by the calibration models, as well as the detection of individual samples in the sample set that had atypical spectra (i.e., outlier samples). The Soft Independent Modeling of Class Analogy (SIMCA) approach was used to establish a model for the classification of the outlier samples. A glycerol calibration for wine was developed (reducing sugar content 8% v/v) with satisfactory predictive ability (SEP = 0.40 g/L). The RPD value (ratio of the standard deviation of the data to the standard error of prediction) was 5.6, indicating that the calibration is suitable for quantification purposes. A calibration for glycerol in special late harvest and noble late harvest wines (RS 31-147 g/L, alcohol > 11.6% v/v) with a prediction error SECV = 0.65 g/L, was also established. This study yielded an analytical strategy that combined the careful design of calibration sets with measures that facilitated the early detection and interpretation of poorly predicted samples and outlier samples in a sample set. The strategy provided a powerful means of quality control, which is necessary for the generation of accurate prediction data and therefore for the successful implementation of FT-IR in the routine analytical laboratory.
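
    A generic sketch of PCA-based spectral outlier screening in the spirit of this strategy, using Hotelling's T² on the scores and Q residuals, with simple empirical percentile cutoffs (an assumption of this sketch, not the SIMCA control limits used in the paper):

        import numpy as np
        from sklearn.decomposition import PCA

        def pca_outlier_flags(X, n_components=5, pct=97.5):
            # Flag atypical spectra via Hotelling's T^2 (score distance) and
            # Q residuals (reconstruction error).
            pca = PCA(n_components=n_components).fit(X)
            scores = pca.transform(X)
            t2 = ((scores / scores.std(axis=0, ddof=1)) ** 2).sum(axis=1)
            resid = X - pca.inverse_transform(scores)
            q = (resid ** 2).sum(axis=1)
            return (t2 > np.percentile(t2, pct)) | (q > np.percentile(q, pct))

        rng = np.random.default_rng(3)
        X = rng.standard_normal((329, 200))      # stand-in for 329 wine spectra
        X[5] += 4.0                              # one atypical "sample"
        print(np.nonzero(pca_outlier_flags(X))[0])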

  3. Calculation of climatic reference values and its use for automatic outlier detection in meteorological datasets

    Directory of Open Access Journals (Sweden)

    B. Téllez

    2008-04-01

    Full Text Available The climatic reference values for monthly and annual average air temperature and total precipitation in Catalonia – northeast of Spain – are calculated using a combination of statistical methods and geostatistical techniques of interpolation. In order to estimate the uncertainty of the method, the initial dataset is split into two parts that are, respectively, used for estimation and validation. The resulting maps are then used in the automatic outlier detection in meteorological datasets.

  4. Slowing ash mortality: a potential strategy to slam emerald ash borer in outlier sites

    Science.gov (United States)

    Deborah G. McCullough; Nathan W. Siegert; John Bedford

    2009-01-01

    Several isolated outlier populations of emerald ash borer (Agrilus planipennis Fairmaire) were discovered in 2008 and additional outliers will likely be found as detection surveys and public outreach activities...

  5. Outlier analysis

    CERN Document Server

    Aggarwal, Charu C

    2013-01-01

    With the increasing advances in hardware technology for data collection, and advances in software technology (databases) for data organization, computer scientists have increasingly participated in the latest advancements of the outlier analysis field. Computer scientists, specifically, approach this field based on their practical experiences in managing large amounts of data, and with far fewer assumptions: the data can be of any type, structured or unstructured, and may be extremely large. Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians and...

  6. A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data

    DEFF Research Database (Denmark)

    Pham, Ninh Dang; Pagh, Rasmus

    2012-01-01

    Outlier mining in d-dimensional point sets is a fundamental and well studied data mining task due to its variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments on concepts of distance or nearest neighbor are deteriorated in high-dimensional data. Following up on the work of Kriegel et al. (KDD '08), we investigate the use of the angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Also, our approach is suitable to be performed in a parallel environment to achieve a parallel speedup. We introduce a theoretical analysis of the quality......
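
    For intuition, a naive cubic-time computation of the angle-based outlier factor of Kriegel et al.; the paper's actual contribution, the near-linear randomized estimator, is not reproduced here:

        import numpy as np
        from itertools import combinations

        def abof(X):
            # ABOF(a) is the weighted variance, over pairs of other points
            # (b, c), of <b-a, c-a> / (|b-a|^2 |c-a|^2), with weights
            # 1/(|b-a| |c-a|). Small values indicate outliers, which "see"
            # the rest of the data under a narrow range of angles.
            n = X.shape[0]
            scores = np.empty(n)
            for a in range(n):
                d = np.delete(X, a, axis=0) - X[a]
                nrm = np.linalg.norm(d, axis=1)
                vals, wts = [], []
                for b, c in combinations(range(n - 1), 2):
                    vals.append((d[b] @ d[c]) / (nrm[b] ** 2 * nrm[c] ** 2))
                    wts.append(1.0 / (nrm[b] * nrm[c]))
                vals, wts = np.array(vals), np.array(wts)
                m = (wts * vals).sum() / wts.sum()
                scores[a] = (wts * (vals - m) ** 2).sum() / wts.sum()
            return scores

        rng = np.random.default_rng(4)
        X = np.vstack([rng.standard_normal((60, 10)),
                       8 + rng.standard_normal((1, 10))])   # one planted outlier
        print("lowest-ABOF point:", abof(X).argmin())        # expect index 60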

  7. Gear Fault Detection Effectiveness as Applied to Tooth Surface Pitting Fatigue Damage

    Science.gov (United States)

    Lewicki, David G.; Dempsey, Paula J.; Heath, Gregory F.; Shanthakumaran, Perumal

    2010-01-01

    A study was performed to evaluate fault detection effectiveness as applied to gear-tooth-pitting-fatigue damage. Vibration and oil-debris monitoring (ODM) data were gathered from 24 sets of spur pinion and face gears run during a previous endurance evaluation study. Three common condition indicators (RMS, FM4, and NA4 [Ed.'s note: See Appendix A, Definitions]) were deduced from the time-averaged vibration data and used with the ODM to evaluate their performance for gear fault detection. The NA4 parameter showed to be a very good condition indicator for the detection of gear tooth surface pitting failures. The FM4 and RMS parameters performed average to below average in detection of gear tooth surface pitting failures. The ODM sensor was successful in detecting a significant amount of debris from all the gear tooth pitting fatigue failures. Excluding outliers, the average cumulative mass at the end of a test was 40 mg.
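
    The RMS and FM4 indicators can be sketched directly from their standard definitions; the crude construction of the difference signal below is an illustrative stand-in for the usual removal of gear-mesh components from the time-synchronous average:

        import numpy as np

        def rms(x):
            # Root mean square of the time-synchronous averaged (TSA) signal.
            x = np.asarray(x, dtype=float)
            return np.sqrt((x ** 2).mean())

        def fm4(d):
            # FM4: kurtosis of the difference signal d (TSA with the regular
            # gear-mesh components removed). It is near 3 for a healthy,
            # Gaussian-like signal and grows when tooth damage adds impulses.
            d = np.asarray(d, dtype=float) - np.mean(d)
            n = d.size
            return n * (d ** 4).sum() / ((d ** 2).sum() ** 2)

        # Toy example: a mesh tone plus noise, with an impulsive fault added.
        rng = np.random.default_rng(5)
        t = np.arange(2048)
        tsa = np.sin(2 * np.pi * t / 32) + 0.1 * rng.standard_normal(t.size)
        diff = tsa - np.sin(2 * np.pi * t / 32)   # crude removal of the mesh tone
        diff[1000] += 2.0                         # simulated tooth-damage impulse
        print("RMS =", round(rms(tsa), 3), "FM4 =", round(fm4(diff), 2))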

  8. Factor-based forecasting in the presence of outliers

    DEFF Research Database (Denmark)

    Kristensen, Johannes Tang

    2014-01-01

    Macroeconomic forecasting using factor models estimated by principal components has become a popular research topic, with many theoretical and applied contributions in the literature. In this paper we attempt to address an often neglected issue in these models: the problem of outliers in the data. Most papers take an ad-hoc approach to this problem and simply screen datasets prior to estimation and remove anomalous observations. We investigate whether forecasting performance can be improved by using the original unscreened dataset and replacing principal components with a robust...... apply the estimator in a simulated real-time forecasting exercise to test its merits. We use a newly compiled dataset of US macroeconomic series spanning the period 1971:2–2012:10. Our findings suggest that the chosen treatment of outliers does affect forecasting performance and that in many cases......

  9. Outlier identification in urban soils and its implications for identification of potential contaminated land

    Science.gov (United States)

    Zhang, Chaosheng

    2010-05-01

    Outliers in urban soil geochemical databases may imply potential contaminated land. Different methodologies which can be easily implemented for the identification of global and spatial outliers were applied to Pb concentrations in urban soils of Galway City in Ireland. Due to its strongly skewed distribution, a Box-Cox transformation was performed prior to further analyses. The graphic methods of the histogram and the box-and-whisker plot were effective in identifying global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association (local Moran's I), cross-validation of kriging, and geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. Different methods showed generally consistent results, but differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with.
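
    A minimal sketch of the local Moran's I computation on a toy one-dimensional transect; the weight matrix and the omission of significance testing are simplifications:

        import numpy as np

        def local_morans_i(values, W):
            # Local Moran's I for each site. values: (n,) attribute (e.g., a
            # Box-Cox-transformed Pb concentration); W: (n, n) row-standardized
            # spatial weights with zero diagonal. A large |I_i| with a sign
            # opposite to its neighbours flags a spatial outlier.
            z = values - values.mean()
            m2 = (z ** 2).mean()
            return (z / m2) * (W @ z)

        # Toy transect: neighbours are adjacent sites; site 3 is a hot spot.
        vals = np.array([5.0, 5.2, 4.9, 25.0, 5.1, 4.8])
        n = vals.size
        W = np.zeros((n, n))
        for i in range(n):
            for j in (i - 1, i + 1):
                if 0 <= j < n:
                    W[i, j] = 1.0
        W = W / W.sum(axis=1, keepdims=True)        # row-standardize
        print(local_morans_i(vals, W).round(2))     # site 3 strongly negative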

  10. Modeling Data Containing Outliers using ARIMA Additive Outlier (ARIMA-AO)

    Science.gov (United States)

    Saleh Ahmar, Ansari; Guritno, Suryo; Abdurakhman; Rahman, Abdul; Awi; Alimuddin; Minggi, Ilham; Arif Tiro, M.; Kasim Aidid, M.; Annas, Suwardi; Utami Sutiksno, Dian; Ahmar, Dewi S.; Ahmar, Kurniawan H.; Abqary Ahmar, A.; Zaki, Ahmad; Abdullah, Dahlan; Rahim, Robbi; Nurdiyanto, Heri; Hidayat, Rahmat; Napitupulu, Darmawan; Simarmata, Janner; Kurniasih, Nuning; Andretti Abdillah, Leon; Pranolo, Andri; Haviluddin; Albra, Wahyudin; Arifin, A. Nurani M.

    2018-01-01

    The aim of this study is to discuss the detection and correction of data containing additive outliers (AO) in the ARIMA(p, d, q) model. The detection and correction of the data use an iterative procedure popularized by Box, Jenkins, and Reinsel (1994). Using this method, we obtained ARIMA models fitted to the data containing AO; the coefficients obtained from the iteration process using regression methods are added to the original ARIMA model. In the simulated data, the initial model for the data containing AO is ARIMA(2,0,0) with MSE = 36.780; after detection and correction of the data, the iteration yields an ARIMA(2,0,0) model with regression coefficients Zt = 0.106 + 0.204Zt-1 + 0.401Zt-2 - 329X1(t) + 115X2(t) + 35.9X3(t) and MSE = 19.365. This shows an improvement in the forecasting error rate.

  11. A tandem regression-outlier analysis of a ligand cellular system for key structural modifications around ligand binding.

    Science.gov (United States)

    Lin, Ying-Ting

    2013-04-30

    A tandem technique of hard equipment is often used for the chemical analysis of a single cell: the first part is the separation of the wanted chemicals from the bulk of the cell; the second part is the actual detection of the important identities. To identify the key structural modifications around ligand binding, the present study aims to develop a counterpart of the tandem technique for cheminformatics. A statistical regression and its outliers act as a computational technique for separation. A PPARγ (peroxisome proliferator-activated receptor gamma) agonist cellular system was subjected to such an investigation. Results show that this tandem regression-outlier analysis, or the prioritization of the context equations tagged with features of the outliers, is an effective regression technique of cheminformatics for detecting key structural modifications, as well as their tendency to impact ligand binding. The key structural modifications around ligand binding are effectively extracted or characterized out of the cellular reactions, because molecular binding is the paramount factor in such a ligand cellular system and key structural modifications around ligand binding are expected to create outliers. Therefore, such outliers can be captured by this tandem regression-outlier analysis.

  12. Methods of Detecting Outliers in A Regression Analysis Model. | Ogu ...

    African Journals Online (AJOL)

    A boilers dataset with dependent variable Y (man-hours) and four independent variables X1 (Boiler Capacity), X2 (Design Pressure), X3 (Boiler Type), X4 (Drum Type) was used. The analysis of the boilers data revealed an unexpected group of outliers. The results from the findings showed that an observation can be outlying ...

  13. Examination of pulsed eddy current for inspection of second layer aircraft wing lap-joint structures using outlier detection methods

    Energy Technology Data Exchange (ETDEWEB)

    Butt, D.M., E-mail: Dennis.Butt@forces.gc.ca [Royal Military College of Canada, Dept. of Chemistry and Chemical Engineering, Kingston, Ontario (Canada); Underhill, P.R.; Krause, T.W., E-mail: Thomas.Krause@rmc.ca [Royal Military College of Canada, Dept. of Physics, Kingston, Ontario (Canada)

    2016-09-15

    Ageing aircraft are susceptible to fatigue cracks at bolt hole locations in multi-layer aluminum wing lap-joints due to cyclic loading conditions experienced during typical aircraft operation. Current inspection techniques require removal of fasteners to permit inspection of the second layer from within the bolt hole. Inspection from the top layer without fastener removal is desirable in order to minimize aircraft downtime while reducing the risk of collateral damage. The ability to detect second layer cracks without fastener removal has been demonstrated using a pulsed eddy current (PEC) technique. The technique utilizes a breakdown of the measured signal response into its principal components, each of which is multiplied by a representative factor known as a score. The reduced data set of scores, which represents the measured signal, is examined for outliers using cluster analysis methods in order to detect the presence of defects. However, the cluster analysis methodology is limited by the fact that a number of representative signals, obtained from fasteners where defects are not present, are required in order to perform classification of the data. Alternatively, blind outlier detection can be achieved without having to obtain representative defect-free signals, by using a modified smallest half-volume (MSHV) approach. Results obtained using this approach suggest that self-calibrating blind detection of cyclic fatigue cracks in second layer wing structures in the presence of ferrous fasteners is possible without prior knowledge of the sample under test and without the use of costly calibration standards. (author)
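
    The modified smallest half-volume details are specific to the paper; a closely related off-the-shelf alternative is the minimum covariance determinant (MCD), sketched here on hypothetical principal-component scores with an assumed empirical cutoff:

        import numpy as np
        from sklearn.covariance import MinCovDet

        rng = np.random.default_rng(6)
        scores = rng.standard_normal((200, 4))         # stand-in PCA scores, one row per fastener
        scores[:3] += np.array([4.0, -3.0, 2.0, 0.0])  # three "cracked" signals

        mcd = MinCovDet(random_state=0).fit(scores)
        d2 = mcd.mahalanobis(scores)        # squared robust Mahalanobis distances
        cut = np.percentile(d2, 97.5)       # simple empirical cutoff (assumption)
        print("flagged fasteners:", np.nonzero(d2 > cut)[0])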

  14. An optimized outlier detection algorithm for jury-based grading of engineering design projects

    DEFF Research Database (Denmark)

    Thompson, Mary Kathryn; Espensen, Christina; Clemmensen, Line Katrine Harder

    2016-01-01

    This work characterizes and optimizes an outlier detection algorithm to identify potentially invalid scores produced by jury members while grading engineering design projects. The paper describes the original algorithm and the associated adjudication process in detail. The impact of the various...... (the base rule and the three additional conditions) play a role in the algorithm's performance and should be included in the algorithm. Because there is significant interaction between the base rule and the additional conditions, many acceptable combinations that balance the FPR and FNR can be found, but no true optimum seems to exist. The performance of the best optimizations and the original algorithm are similar. Therefore, it should be possible to choose new coefficient values for jury populations in other cultures and contexts logically and empirically without a full optimization as long......

  15. Analyzing contentious relationships and outlier genes in phylogenomics.

    Science.gov (United States)

    Walker, Joseph F; Brown, Joseph W; Smith, Stephen A

    2018-06-08

    Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that less than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Here, we examined two datasets where supermatrix and coalescent-based species trees conflict. We identified two highly influential "outlier" genes in each dataset. When removed from each dataset, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate dataset have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant dataset did not exhibit any obvious systematic error and therefore may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting (ILS). Here we explored a framework that allows for quickly examining alternative edges and support for large phylogenomic datasets that does not assume a single topology for all genes. For both datasets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic datasets by asking more targeted edge-based questions.

  16. Baseline Estimation and Outlier Identification for Halocarbons

    Science.gov (United States)

    Wang, D.; Schuck, T.; Engel, A.; Gallman, F.

    2017-12-01

    The aim of this paper is to build a baseline model for halocarbons and to statistically identify outliers under specific conditions. Time series of regional CFC-11 and chloromethane measurements are discussed, taken over the last 4 years at two locations: a monitoring station northwest of Frankfurt am Main (Germany) and the Mace Head station (Ireland). In addition to analyzing the time series of CFC-11 and chloromethane, a statistical approach to outlier identification is introduced in order to make a better estimation of the baseline. A second-order polynomial plus harmonics is fitted to the CFC-11 and chloromethane mixing ratio data. Measurements with a large distance to the fitted curve are regarded as outliers and flagged. The routine is applied iteratively without the flagged measurements until no additional outliers are found. Both the model fitting and the proposed outlier identification method are realized with the help of the programming language Python. During the period, CFC-11 shows a gradual downward trend, and there is a slight upward trend in the mixing ratios of chloromethane. The concentration of chloromethane also has a strong seasonal variation, mostly due to the seasonal cycle of OH. The use of this statistical method has a considerable effect on the results: it efficiently identifies a series of outliers according to the standard deviation requirements. After removing the outliers, the fitted curves and trend estimates are more reliable.
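
    A minimal sketch of the fitting-and-flagging loop, assuming a single annual harmonic and a 3-sigma flagging rule (the paper's exact criteria are not specified in this record):

        import numpy as np

        def fit_baseline(t, y, n_sigma=3.0, max_iter=10):
            # Second-order polynomial plus an annual harmonic, fitted by least
            # squares; points further than n_sigma residual standard deviations
            # from the curve are flagged and the fit repeated without them,
            # until no new points are flagged. t is in years.
            X = np.column_stack([np.ones_like(t), t, t**2,
                                 np.sin(2*np.pi*t), np.cos(2*np.pi*t)])
            keep = np.ones(t.size, dtype=bool)
            for _ in range(max_iter):
                beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
                resid = y - X @ beta
                new_keep = np.abs(resid) <= n_sigma * resid[keep].std(ddof=1)
                if (new_keep == keep).all():
                    break
                keep = new_keep
            return beta, ~keep                  # fit coefficients, outlier flags

        # Synthetic CFC-11-like series: slow decline, seasonality, two spikes.
        rng = np.random.default_rng(7)
        t = np.arange(0, 4, 1 / 52.0)
        y = 230 - 2.0*t + 1.5*np.sin(2*np.pi*t) + 0.5*rng.standard_normal(t.size)
        y[[40, 150]] += 8.0                     # pollution events
        beta, flags = fit_baseline(t, y)
        print("flagged indices:", np.nonzero(flags)[0])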

  17. The outlier sample effects on multivariate statistical data processing geochemical stream sediment survey (Moghangegh region, North West of Iran)

    International Nuclear Information System (INIS)

    Ghanbari, Y.; Habibnia, A.; Memar, A.

    2009-01-01

    In a geochemical stream sediment survey of the Moghangegh Region in north-west Iran (1:50,000 sheet), 152 samples were collected, and after analysis and processing of the data it was revealed that the Yb, Sc, Ni, Li, Eu, Cd, Co and As contents in one sample were far higher than in the other samples. After detecting this sample as an outlier, the effect of this sample on multivariate statistical data processing was investigated to assess the destructive effects of outlier samples in geochemical exploration. Pearson and Spearman correlation coefficient methods and cluster analysis were used for the multivariate studies, and the scatter plots of some elements together with the regression profiles are given for the cases of 152 and 151 samples, and the results are compared. After investigation of the multivariate statistical data processing results, it was realized that the existence of outlier samples may produce the following relations between elements: - a true relation between two elements, neither of which has an outlier frequency in the outlier sample; - a false relation between two elements, one of which has an outlier frequency in the outlier sample; - a completely false relation between two elements, both of which have outlier frequencies in the outlier sample.

  18. Quality assurance using outlier detection on an automatic segmentation method for the cerebellar peduncles

    Science.gov (United States)

    Li, Ke; Ye, Chuyang; Yang, Zhen; Carass, Aaron; Ying, Sarah H.; Prince, Jerry L.

    2016-03-01

    Cerebellar peduncles (CPs) are white matter tracts connecting the cerebellum to other brain regions. Automatic segmentation methods of the CPs have been proposed for studying their structure and function. Usually the performance of these methods is evaluated by comparing segmentation results with manual delineations (ground truth). However, when a segmentation method is run on new data (for which no ground truth exists) it is highly desirable to efficiently detect and assess algorithm failures so that these cases can be excluded from scientific analysis. In this work, two outlier detection methods aimed to assess the performance of an automatic CP segmentation algorithm are presented. The first one is a univariate non-parametric method using a box-whisker plot. We first categorize automatic segmentation results of a dataset of diffusion tensor imaging (DTI) scans from 48 subjects as either a success or a failure. We then design three groups of features from the image data of nine categorized failures for failure detection. Results show that most of these features can efficiently detect the true failures. The second method—supervised classification—was employed on a larger DTI dataset of 249 manually categorized subjects. Four classifiers—linear discriminant analysis (LDA), logistic regression (LR), support vector machine (SVM), and random forest classification (RFC)—were trained using the designed features and evaluated using a leave-one-out cross validation. Results show that the LR performs worst among the four classifiers and the other three perform comparably, which demonstrates the feasibility of automatically detecting segmentation failures using classification methods.

  19. Improving Electronic Sensor Reliability by Robust Outlier Screening

    Directory of Open Access Journals (Sweden)

    Federico Cuesta

    2013-10-01

    Full Text Available Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. These methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs.

  1. Raman fiber-optical method for colon cancer detection: Cross-validation and outlier identification approach

    Science.gov (United States)

    Petersen, D.; Naveed, P.; Ragheb, A.; Niedieker, D.; El-Mashtoly, S. F.; Brechmann, T.; Kötting, C.; Schmiegel, W. H.; Freier, E.; Pox, C.; Gerwert, K.

    2017-06-01

    Endoscopy plays a major role in the early recognition of cancer which is not externally accessible, and therewith in increasing the survival rate. Raman spectroscopic fiber-optical approaches can help to decrease the impact on the patient, increase objectivity in tissue characterization, reduce expenses and provide a significant time advantage in endoscopy. In gastroenterology an early recognition of malignant and precursor lesions is relevant. Instantaneous and precise differentiation between adenomas as precursor lesions for cancer and hyperplastic polyps on the one hand, and between high- and low-risk alterations on the other hand, is important. Raman fiber-optical measurements of colon biopsy samples taken during colonoscopy were carried out during a clinical study, and samples of adenocarcinoma (22), tubular adenomas (141), hyperplastic polyps (79) and normal tissue (101) from 151 patients were analyzed. This allows us to focus on the bioinformatic analysis and to set the stage for Raman endoscopic measurements. Since spectral differences between normal and cancerous biopsy samples are small, special care has to be taken in data analysis. Using a leave-one-patient-out cross-validation scheme, three different outlier identification methods were investigated to decrease the influence of systematic errors, like a residual risk of misplacement of the sample and spectral dilution of marker bands (especially in cancerous tissue), and therewith optimize the experimental design. Furthermore, other validation methods like leave-one-sample-out and leave-one-spectrum-out cross-validation schemes were compared with leave-one-patient-out cross-validation. High-risk lesions were differentiated from low-risk lesions with a sensitivity of 79%, specificity of 74% and an accuracy of 77%; cancer and normal tissue with a sensitivity of 79%, specificity of 83% and an accuracy of 81%. Additionally applied outlier identification enabled us to improve the recognition of neoplastic biopsy samples.
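
    The leave-one-patient-out scheme maps directly onto scikit-learn's LeaveOneGroupOut; the classifier below and the synthetic stand-in data are illustrative assumptions, not the study's actual model:

        import numpy as np
        from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(8)
        X = rng.standard_normal((600, 50))        # stand-in Raman spectra
        patient = rng.integers(0, 151, size=600)  # patient ID for each spectrum
        y = rng.integers(0, 2, size=600)          # high- vs low-risk label

        # All spectra of one patient are held out together, so the model is
        # never tested on a patient it has seen during training.
        clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        acc = cross_val_score(clf, X, y, groups=patient,
                              cv=LeaveOneGroupOut(), scoring="accuracy")
        print("leave-one-patient-out accuracy: %.2f" % acc.mean())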

  2. Unsupervised Condition Change Detection In Large Diesel Engines

    DEFF Research Database (Denmark)

    Pontoppidan, Niels Henrik; Larsen, Jan

    2003-01-01

    This paper presents a new method for unsupervised change detection which combines independent component modeling and probabilistic outlier detection. The method further provides a compact data representation, which is amenable to interpretation, i.e., the detected condition changes can be investigated further. The method is successfully applied to unsupervised condition change detection in large diesel engines from acoustical emission sensor signals, and compared to more classical techniques based on principal component analysis and Gaussian mixture models.

  3. Cancer Outlier Analysis Based on Mixture Modeling of Gene Expression Data

    Directory of Open Access Journals (Sweden)

    Keita Mori

    2013-01-01

    Full Text Available Molecular heterogeneity of cancer, partially caused by various chromosomal aberrations or gene mutations, can yield substantial heterogeneity in the gene expression profile of cancer samples. To detect cancer-related genes which are active only in a subset of cancer samples, or cancer outliers, several methods have been proposed in the context of multiple testing. Such cancer outlier analyses will generally suffer from a serious lack of power compared with the standard multiple testing setting, where common activation of genes across all cancer samples is assumed. In this paper, we consider information sharing across genes and cancer samples via a parametric normal mixture modeling of the gene expression levels of cancer samples across genes, after standardization using the reference normal sample data. A gene-based statistic for gene selection is developed on the basis of a posterior probability of cancer outlier for each cancer sample. Some efficiency improvement by using our method was demonstrated, even under settings with misspecified, heavy-tailed t-distributions. An application to a real dataset from hematologic malignancies is provided.
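
    A toy sketch of a two-component normal mixture fitted by EM, returning the posterior probability that each standardized expression value is an outlier; the starting values and component forms are illustrative assumptions, not the paper's exact model:

        import numpy as np
        from scipy.stats import norm

        def outlier_posterior(z, pi=0.05, mu1=3.0, sd0=1.0, sd1=1.0, iters=50):
            # z: standardized expression of one gene across cancer samples
            # (relative to the normal reference). Component 0 is the null
            # N(0, sd0); component 1 the "activated" N(mu1, sd1); pi and mu1
            # are re-estimated by EM.
            for _ in range(iters):
                f0 = (1 - pi) * norm.pdf(z, 0.0, sd0)
                f1 = pi * norm.pdf(z, mu1, sd1)
                w = f1 / (f0 + f1)              # E-step: posterior of outlier
                pi = w.mean()                   # M-step updates
                mu1 = (w * z).sum() / w.sum()
            return w

        rng = np.random.default_rng(9)
        z = np.concatenate([rng.normal(0, 1, 95), rng.normal(4, 1, 5)])
        post = outlier_posterior(z)
        print("samples with posterior > 0.9:", np.nonzero(post > 0.9)[0])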

  4. Cross-visit tumor sub-segmentation and registration with outlier rejection for dynamic contrast-enhanced MRI time series data.

    Science.gov (United States)

    Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M

    2010-01-01

    Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and so overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because of problems due to motion and outlier time series. We obtained spatially-contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.

  5. Optimum outlier model for potential improvement of environmental cleaning and disinfection.

    Science.gov (United States)

    Rupp, Mark E; Huerta, Tomas; Cavalieri, R J; Lyden, Elizabeth; Van Schooneveld, Trevor; Carling, Philip; Smith, Philip W

    2014-06-01

    The effectiveness and efficiency of 17 housekeepers in the terminal cleaning of 292 hospital rooms were evaluated through adenosine triphosphate detection. A subgroup of housekeepers was identified who were significantly more effective and efficient than their coworkers. These optimum outliers may be used in performance improvement efforts to optimize environmental cleaning.

  6. Nonlinear Optimization-Based Device-Free Localization with Outlier Link Rejection

    Directory of Open Access Journals (Sweden)

    Wendong Xiao

    2015-04-01

    Full Text Available Device-free localization (DFL) is an emerging wireless technique for estimating the location of a target that does not have any attached electronic device. It has found extensive use in Smart City applications such as healthcare at home and in hospitals, location-based services in smart spaces, city emergency response and infrastructure security. In DFL, wireless devices are used as sensors that can sense the target by transmitting and receiving wireless signals collaboratively. Many DFL systems are implemented based on received signal strength (RSS) measurements, and the location of the target is estimated by detecting the changes in the RSS measurements of the wireless links. Due to the uncertainty of the wireless channel, certain links may be seriously polluted and result in erroneous detection. In this paper, we propose a novel nonlinear optimization approach with outlier link rejection (NOOLR) for RSS-based DFL. It consists of three key strategies: (1) affected link identification by differential RSS detection; (2) outlier link rejection via the geometrical positional relationship among links; (3) target location estimation by formulating and solving a nonlinear optimization problem. Experimental results demonstrate that NOOLR is robust to the fluctuation of the wireless signals, with superior localization accuracy compared with the existing Radio Tomographic Imaging (RTI) approach.

  7. Construction of composite indices in presence of outliers

    OpenAIRE

    Mishra, SK

    2008-01-01

    Effects of outliers on mean, standard deviation and Pearson’s correlation coefficient are well known. The Principal Components analysis uses Pearson’s product moment correlation coefficients to construct composite indices from indicator variables and hence may be very sensitive to effects of outliers in data. Median, mean deviation and Bradley’s coefficient of absolute correlation are less susceptible to effects of outliers. This paper proposes a method to obtain composite indices by maximiza...

  8. On the identification of Dragon Kings among extreme-valued outliers

    Science.gov (United States)

    Riva, M.; Neuman, S. P.; Guadagnini, A.

    2013-07-01

    Extreme values of earth, environmental, ecological, physical, biological, financial and other variables often form outliers to heavy tails of empirical frequency distributions. Quite commonly such tails are approximated by stretched exponential, log-normal or power functions. Recently there has been an interest in distinguishing between extreme-valued outliers that belong to the parent population of most data in a sample and those that do not. The first type, called Gray Swans by Nassim Nicholas Taleb (often confused in the literature with Taleb's totally unknowable Black Swans), is drawn from a known distribution of the tails which can thus be extrapolated beyond the range of sampled values. However, the magnitudes and/or space-time locations of unsampled Gray Swans cannot be foretold. The second type of extreme-valued outliers, termed Dragon Kings by Didier Sornette, may in his view be sometimes predicted based on how other data in the sample behave. This intriguing prospect has recently motivated some authors to propose statistical tests capable of identifying Dragon Kings in a given random sample. Here we apply three such tests to log air permeability data measured on the faces of a Berea sandstone block and to synthetic data generated in a manner statistically consistent with these measurements. We interpret the measurements to be, and generate synthetic data that are, samples from α-stable sub-Gaussian random fields subordinated to truncated fractional Gaussian noise (tfGn). All these data have frequency distributions characterized by power-law tails with extreme-valued outliers about the tail edges.

  9. GTI: a novel algorithm for identifying outlier gene expression profiles from integrated microarray datasets.

    Directory of Open Access Journals (Sweden)

    John Patrick Mpindi

    Full Text Available BACKGROUND: Meta-analysis of gene expression microarray datasets presents significant challenges for statistical analysis. We developed and validated a new bioinformatic method for the identification of genes upregulated in subsets of samples of a given tumour type ('outlier genes'), a hallmark of potential oncogenes. METHODOLOGY: A new statistical method (the gene tissue index, GTI) was developed by modifying and adapting algorithms originally developed for statistical problems in economics. We compared the potential of the GTI to detect outlier genes in meta-datasets with four previously defined statistical methods, COPA, the OS statistic, the t-test and ORT, using simulated data. We demonstrated that the GTI performed equally well to existing methods in a single study simulation. Next, we evaluated the performance of the GTI in the analysis of combined Affymetrix gene expression data from several published studies covering 392 normal samples of tissue from the central nervous system, 74 astrocytomas, and 353 glioblastomas. According to the results, the GTI was better able than most of the previous methods to identify known oncogenic outlier genes. In addition, the GTI identified 29 novel outlier genes in glioblastomas, including TYMS and CDKN2A. The over-expression of these genes was validated in vivo by immunohistochemical staining data from clinical glioblastoma samples. Immunohistochemical data were available for 65% (19 of 29) of these genes, and 17 of these 19 genes (90%) showed a typical outlier staining pattern. Furthermore, raltitrexed, a specific inhibitor of TYMS used in the therapy of tumour types other than glioblastoma, also effectively blocked cell proliferation in glioblastoma cell lines, thus highlighting this outlier gene candidate as a potential therapeutic target. CONCLUSIONS/SIGNIFICANCE: Taken together, these results support the GTI as a novel approach to identify potential oncogene outliers and drug targets. The algorithm is...

  10. A computational study on outliers in world music.

    Science.gov (United States)

    Panteli, Maria; Benetos, Emmanouil; Dixon, Simon

    2017-01-01

    The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as 'outliers'. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the 'uniqueness' of the music of each country.

  11. Poland’s Trade with East Asia: An Outlier Approach

    Directory of Open Access Journals (Sweden)

    Tseng Shoiw-Mei

    2015-12-01

    Full Text Available Poland achieved an excellent reputation for economic transformation during the recent global recession. The European debt crisis, however, quickly forced the reorientation of Poland's trade outside of the European Union (EU), especially toward the dynamic region of East Asia. This study analyzes time series data from 1999 to 2013, detecting outliers in order to determine the bilateral trade paths between Poland and each East Asian country during the events of Poland's accession to the EU in 2004, the global financial crisis from 2008 to 2009, and the European debt crisis from 2010 to 2013. From the Polish standpoint, the results showed significant clustering of outliers in the above periods, and general trade paths running from dependence through distancing and improvement to the chance of approaching East Asian partners. This study also shows that not only China but also several other countries present an excellent opportunity for boosting bilateral trade, especially with regard to Poland's exports.

  12. Portraying the Expression Landscapes of B-Cell Lymphoma: Intuitive Detection of Outlier Samples and of Molecular Subtypes

    Directory of Open Access Journals (Sweden)

    Lydia Hopp

    2013-12-01

    Full Text Available We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large-scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data of more than 200 mature aggressive B-cell lymphoma patients. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques, and enables investigation of the similarity relations between the samples. The method also allows detection and correction of outliers caused by contamination. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes, which are characterized by differential functional and clinical characteristics.

  13. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers

    Directory of Open Access Journals (Sweden)

    Hachey Mark

    2009-10-01

    Full Text Available Abstract Background The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in many simulation studies. However, the performance of these methods in two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. Methods We compare methods for global clustering evaluation, including Tango's Index, Moran's I, and Oden's I*pop, and cluster detection methods such as local Moran's I and the SaTScan elliptic version, on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and the SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. Results For simulated data with outlier patterns, Tango's MEET, Moran's I and I*pop had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*pop (with 50% of the total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion The SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*pop perform best in global clustering scenarios among the selected methods. The use of SaTScan for...

  14. Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort

    International Nuclear Information System (INIS)

    Meredith, Jeremy S.; Vetter, Jeffrey S.

    2011-01-01

    In the past few years, performance improvements in CPUs and memory technologies have outpaced those of storage systems. When extrapolated to the exascale, this trend places strict limits on the amount of data that can be written to disk for full analysis, resulting in an increased reliance on characterizing in-memory data. Many of these characterizations are simple, but require sorted data. This paper explores an example of this type of characterization - the identification of quartiles and statistical outliers - and presents a performance analysis of a distributed heterogeneous radix sort as well as an assessment of current architectural bottlenecks.
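
    A serial sketch of the kernel: an LSD radix sort followed by order-statistic quartiles and Tukey fences; the distribution across heterogeneous nodes, which is the paper's subject, is omitted:

        import numpy as np

        def radix_sort(a, bits=8):
            # LSD radix sort for non-negative integers; each pass is a stable
            # sort on one digit (a stand-in for a counting-sort kernel).
            a = np.asarray(a, dtype=np.int64).copy()
            passes = max(1, (int(a.max()).bit_length() + bits - 1) // bits)
            for p in range(passes):
                digit = (a >> (p * bits)) & ((1 << bits) - 1)
                a = a[np.argsort(digit, kind="stable")]
            return a

        def quartiles_and_outliers(a):
            s = radix_sort(a)
            q1, q3 = s[s.size // 4], s[(3 * s.size) // 4]  # order-statistic quartiles
            iqr = q3 - q1
            lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr        # Tukey fences
            return q1, q3, s[(s < lo) | (s > hi)]

        rng = np.random.default_rng(10)
        data = rng.integers(100, 200, size=10_000)
        data[:5] = [2, 3, 900, 950, 1000]                  # inject outliers
        print(quartiles_and_outliers(data))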

  15. Outlier Detection in Regression Using an Iterated One-Step Approximation to the Huber-Skip Estimator

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    2013-01-01

    In regression we can delete outliers based upon a preliminary estimator and re-estimate the parameters by least squares based upon the retained observations. We study the properties of an iteratively defined sequence of estimators based on this idea. We relate the sequence to the Huber-skip estimator. We show that the normalized estimation errors are tight and are close to a linear function of the kernel, thus providing a stochastic expansion of the estimators, which is the same as for the Huber-skip. This implies that the iterated estimator is a close approximation of the Huber-skip...
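
    A minimal sketch of the iterated one-step idea: starting from full-sample least squares, repeatedly delete observations with large absolute residuals and re-estimate; the cutoff c = 2.5 is an illustrative choice, not the paper's:

        import numpy as np

        def iterated_huber_skip(X, y, c=2.5, iters=20):
            # Start from full-sample OLS, then alternately discard points with
            # |residual| > c * sigma_hat and refit by least squares on the rest.
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            keep = np.ones(y.size, dtype=bool)
            for _ in range(iters):
                resid = y - X @ beta
                sigma = np.sqrt((resid ** 2).mean())
                keep = np.abs(resid) <= c * sigma
                new_beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
                if np.allclose(new_beta, beta):
                    break
                beta = new_beta
            return beta, keep

        rng = np.random.default_rng(11)
        n = 200
        X = np.column_stack([np.ones(n), rng.standard_normal(n)])
        y = X @ np.array([1.0, 2.0]) + 0.3 * rng.standard_normal(n)
        y[:10] += 6.0                                  # contamination
        print(iterated_huber_skip(X, y)[0])            # close to [1, 2]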

  18. Deteksi Outlier Transaksi Menggunakan Visualisasi-Olap Pada Data Warehouse Perguruan Tinggi Swasta

    Directory of Open Access Journals (Sweden)

    Gusti Ngurah Mega Nata

    2016-07-01

    Full Text Available Detecting outliers in a data warehouse is important. Data in a data warehouse have already been aggregated and follow a multidimensional model. Aggregation is applied because the data warehouse is used by top-level management for fast data analysis, while the multidimensional model is used to view data from the various dimensions of a business object. Detecting outliers in a data warehouse therefore requires techniques that can find outliers in aggregated data and can view them across multiple business dimensions, which poses a new challenge. On the other hand, On-Line Analytical Processing (OLAP) visualization is an important task for presenting trend information (reports) from a data warehouse as data visualizations. In this study, OLAP visualization is used to detect transaction outliers, and we analyze outlier detection using OLAP visualization. The OLAP operation used is drill-down. The visualizations used are one-dimensional, two-dimensional, and multidimensional, built with the weave desktop tool. The data warehouse was built bottom-up. The case study was conducted at a private university, addressing the detection of outliers in student tuition-payment transactions in each semester. Outlier detection on visualizations built from a single dimension table proved easier to analyze than detection on visualizations involving two or more dimension tables; in other words, the more dimension tables involved, the harder the outlier analysis becomes. Keywords — outlier detection, OLAP visualization, data warehouse

  19. Treatment of Outliers via Interpolation Method with Neural Network Forecast Performances

    Science.gov (United States)

    Wahir, N. A.; Nor, M. E.; Rusiman, M. S.; Gopal, K.

    2018-04-01

    Outliers often lurk in many datasets, especially in real data. Such anomalous data can negatively affect statistical analyses, primarily normality, variance, and estimation aspects. Hence, handling the occurrences of outliers require special attention. Therefore, it is important to determine the suitable ways in treating outliers so as to ensure that the quality of the analyzed data is indeed high. As such, this paper discusses an alternative method to treat outliers via linear interpolation method. In fact, assuming outlier as a missing value in the dataset allows the application of the interpolation method to interpolate the outliers thus, enabling the comparison of data series using forecast accuracy before and after outlier treatment. With that, the monthly time series of Malaysian tourist arrivals from January 1998 until December 2015 had been used to interpolate the new series. The results indicated that the linear interpolation method, which was comprised of improved time series data, displayed better results, when compared to the original time series data in forecasting from both Box-Jenkins and neural network approaches.
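    A small pandas sketch of the treatment described: flag outliers, set them to missing, and linearly interpolate. The z-score detection rule and the synthetic series are illustrative assumptions; the paper's detection step may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
series = pd.Series(100 + rng.normal(0, 5, 60))     # stand-in for a monthly series
series.iloc[[10, 40]] = [200.0, -50.0]             # artificial outliers

z = (series - series.mean()) / series.std()
cleaned = series.mask(z.abs() > 3)                 # treat outliers as missing values
cleaned = cleaned.interpolate(method="linear")     # fill them by linear interpolation

print(series.iloc[[10, 40]].values, "->", cleaned.iloc[[10, 40]].round(1).values)
```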

  20. A Geometrical-Statistical Approach to Outlier Removal for TDOA Measurements

    Science.gov (United States)

    Compagnoni, Marco; Pini, Alessia; Canclini, Antonio; Bestagini, Paolo; Antonacci, Fabio; Tubaro, Stefano; Sarti, Augusto

    2017-08-01

    The curse of outlier measurements in estimation problems is a well-known issue in a variety of fields. Therefore, outlier removal procedures, which enable the identification of spurious measurements within a set, have been developed for many different scenarios and applications. In this paper, we propose a statistically motivated outlier removal algorithm for time differences of arrival (TDOAs), or equivalently range differences (RD), acquired at sensor arrays. The method exploits the TDOA-space formalism and works by only knowing relative sensor positions. As the proposed method is completely independent from the application for which measurements are used, it can be reliably used to identify outliers within a set of TDOA/RD measurements in different fields (e.g. acoustic source localization, sensor synchronization, radar, remote sensing, etc.). The proposed outlier removal algorithm is validated by means of synthetic simulations and real experiments.

  1. The Space-Time Variation of Global Crop Yields, Detecting Simultaneous Outliers and Identifying the Teleconnections with Climatic Patterns

    Science.gov (United States)

    Najafi, E.; Devineni, N.; Pal, I.; Khanbilvardi, R.

    2017-12-01

    An understanding of the climate factors that influence the space-time variability of crop yields is important for food security purposes and can help us predict global food availability. In this study, we address how the crop yield trends of countries across the globe were related to each other during the last several decades, and which climatic variables triggered simultaneously high/low crop yields across the world. Robust Principal Component Analysis (rPCA) is used to identify the primary modes of variation in wheat, maize, sorghum, rice, soybeans, and barley yields. Relations between these modes of variability and important climatic variables, especially anomalous sea surface temperature (SSTa), are examined from 1964 to 2010. rPCA is also used to identify simultaneous outliers in each year, i.e. systematically high/low crop yields across the globe. The results demonstrate the spatiotemporal patterns of these crop yields and the climate-related events that caused them, as well as the connection of outliers with weather extremes. We find that among climatic variables, SST has had the most impact on creating simultaneous crop yield variability and yield outliers in many countries. An understanding of this phenomenon can benefit global crop trade networks.

  2. Identification of unusual events in multichannel bridge monitoring data using wavelet transform and outlier analysis

    Science.gov (United States)

    Omenzetter, Piotr; Brownjohn, James M. W.; Moyo, Pilate

    2003-08-01

    Continuously operating instrumented structural health monitoring (SHM) systems are becoming a practical alternative to visual inspection for assessment of the condition and soundness of civil infrastructure. However, converting the large amount of data from an SHM system into usable information is a great challenge to which special signal processing techniques must be applied. This study is devoted to the identification of abrupt, anomalous and potentially onerous events in the time histories of static, hourly sampled strains recorded by a multi-sensor SHM system installed in a major bridge structure in Singapore and operating continuously for a long time. Such events may result, among other causes, from sudden settlement of the foundation, ground movement, excessive traffic load or failure of post-tensioning cables. A method of outlier detection in multivariate data has been applied to the problem of finding and localizing sudden events in the strain data. For sharp discrimination of abrupt strain changes from slowly varying ones, the wavelet transform has been used. The proposed method has been successfully tested using known events recorded during construction of the bridge, and later effectively used for detection of anomalous post-construction events.

  3. Reduction of ZTD outliers through improved GNSS data processing and screening strategies

    Science.gov (United States)

    Stepniak, Katarzyna; Bock, Olivier; Wielgosz, Pawel

    2018-03-01

    Though Global Navigation Satellite System (GNSS) data processing has been significantly improved over the years, it is still commonly observed that zenith tropospheric delay (ZTD) estimates contain many outliers which are detrimental to meteorological and climatological applications. In this paper, we show that ZTD outliers in double-difference processing are mostly caused by sub-daily data gaps at reference stations, which cause disconnections of clusters of stations from the reference network and common mode biases due to the strong correlation between stations in short baselines. They can reach a few centimetres in ZTD and usually coincide with a jump in formal errors. The magnitude and sign of these biases are impossible to predict because they depend on different errors in the observations and on the geometry of the baselines. We elaborate and test a new baseline strategy which solves this problem and significantly reduces the number of outliers compared to the standard strategy commonly used for positioning (e.g. determination of national reference frame) in which the pre-defined network is composed of a skeleton of reference stations to which secondary stations are connected in a star-like structure. The new strategy is also shown to perform better than the widely used strategy maximizing the number of observations available in many GNSS programs. The reason is that observations are maximized before processing, whereas the final number of used observations can be dramatically lower because of data rejection (screening) during the processing. The study relies on the analysis of 1 year of GPS (Global Positioning System) data from a regional network of 136 GNSS stations processed using Bernese GNSS Software v.5.2. A post-processing screening procedure is also proposed to detect and remove a few outliers which may still remain due to short data gaps. It is based on a combination of range checks and outlier checks of ZTD and formal errors. The accuracy of the

  4. New approach for the identification of implausible values and outliers in longitudinal childhood anthropometric data.

    Science.gov (United States)

    Shi, Joy; Korsiak, Jill; Roth, Daniel E

    2018-03-01

    We aimed to demonstrate the use of jackknife residuals to take advantage of the longitudinal nature of available growth data in assessing potential biologically implausible values and outliers. Artificial errors were induced in 5% of length, weight, and head circumference measurements, measured on 1211 participants from the Maternal Vitamin D for Infant Growth (MDIG) trial from birth to 24 months of age. Each child's sex- and age-standardized z-score or raw measurements were regressed as a function of age in child-specific models. Each error responsible for a biologically implausible decrease between a consecutive pair of measurements was identified based on the higher of the two absolute values of jackknife residuals in each pair. In further analyses, outliers were identified as those values beyond fixed cutoffs of the jackknife residuals (e.g., greater than +5 or less than -5 in primary analyses). Kappa, sensitivity, and specificity were calculated over 1000 simulations to assess the ability of the jackknife residual method to detect induced errors and to compare these methods with the use of conditional growth percentiles and conventional cross-sectional methods. Among the induced errors that resulted in a biologically implausible decrease in measurement between two consecutive values, the jackknife residual method identified the correct value in 84.3%-91.5% of these instances when applied to the sex- and age-standardized z-scores, with kappa values ranging from 0.685 to 0.795. Sensitivity and specificity of the jackknife method were higher than those of the conditional growth percentile method, but specificity was lower than for conventional cross-sectional methods. Using jackknife residuals provides a simple method to identify biologically implausible values and outliers in longitudinal child growth data sets in which each child contributes at least 4 serial measurements. Crown Copyright © 2018. Published by Elsevier Inc. All rights reserved.
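    The jackknife (externally studentized) residuals used here can be computed directly; a numpy sketch for one child's series, with illustrative variable names and the paper's |residual| > 5 cutoff.

```python
import numpy as np

def jackknife_residuals(age, z):
    """Externally studentized residuals from a child-specific linear fit of z on age."""
    n = len(age)
    X = np.column_stack([np.ones(n), age])
    H = X @ np.linalg.inv(X.T @ X) @ X.T           # hat matrix
    resid = z - H @ z
    h = np.diag(H)
    p = X.shape[1]
    s2 = (resid ** 2).sum() / (n - p)
    # leave-one-out variance estimate for each observation
    s2_i = (s2 * (n - p) - resid ** 2 / (1 - h)) / (n - p - 1)
    return resid / np.sqrt(s2_i * (1 - h))

age = np.array([0, 3, 6, 9, 12, 18, 24], dtype=float)   # months; >= 4 serial measures
z = np.array([0.1, 0.2, 0.1, 3.5, 0.3, 0.2, 0.4])       # induced error at 9 months
r = jackknife_residuals(age, z)
print(np.abs(r) > 5)                                     # flag per the paper's cutoff
```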

  5. A Pareto scale-inflated outlier model and its Bayesian analysis

    OpenAIRE

    Scollnik, David P. M.

    2016-01-01

    This paper develops a Pareto scale-inflated outlier model. This model is intended for use when data from some standard Pareto distribution of interest is suspected to have been contaminated with a relatively small number of outliers from a Pareto distribution with the same shape parameter but with an inflated scale parameter. The Bayesian analysis of this Pareto scale-inflated outlier model is considered and its implementation using the Gibbs sampler is discussed. The paper contains three wor...

  6. Latent Clustering Models for Outlier Identification in Telecom Data

    Directory of Open Access Journals (Sweden)

    Ye Ouyang

    2016-01-01

    Full Text Available Collected telecom data traffic has boomed in recent years, due to the development of 4G mobile devices and other similar high-speed machines. The ability to quickly identify unexpected traffic data in this stream is critical for mobile carriers, as anomalies can be caused by either fraudulent intrusion or technical problems. Clustering models can help to identify issues by showing patterns in network data, which can quickly catch anomalies and highlight previously unseen outliers. In this article, we develop and compare clustering models for telecom data, focusing on those that incorporate time-stamp information. Two main models are introduced, solved in detail, and analyzed: Gaussian Probabilistic Latent Semantic Analysis (GPLSA) and time-dependent Gaussian Mixture Models (time-GMM). These models are then compared with other clustering models, such as the Gaussian model and GMM (which do not use time-stamp information). We perform computations on both sample and telecom traffic data to show that the efficiency and robustness of GPLSA make it the superior method for detecting outliers and providing results automatically, with few tuning parameters and little required expertise.
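    Of the baselines compared, the time-independent GMM is the simplest to illustrate. Below is a minimal sklearn sketch that flags the lowest-likelihood records as outliers; the synthetic traffic and the 1% threshold are assumptions, and GPLSA itself is not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
traffic = np.vstack([rng.normal(0, 1, (500, 2)),
                     rng.normal(6, 0.3, (8, 2))])    # a small anomalous burst

gmm = GaussianMixture(n_components=2, random_state=0).fit(traffic)
log_lik = gmm.score_samples(traffic)                 # per-sample log-likelihood
threshold = np.percentile(log_lik, 1)                # flag the lowest 1%
print(np.where(log_lik < threshold)[0])
```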

  7. Identification of Outlier Loci Responding to Anthropogenic and Natural Selection Pressure in Stream Insects Based on a Self-Organizing Map

    Directory of Open Access Journals (Sweden)

    Bin Li

    2016-05-01

    Full Text Available Water quality maintenance should be considered from an ecological perspective, since water is a substrate ingredient in the biogeochemical cycle and is closely linked with ecosystem functioning and services. Addressing the status of living organisms in aquatic ecosystems is a critical issue for appropriate prediction and water quality management. Recently, genetic changes in biological organisms have garnered more attention due to their in-depth, integrative expression of environmental stress on aquatic ecosystems. In this study we demonstrate that genetic diversity responds adaptively to environmental constraints. We applied a self-organizing map (SOM) to characterize complex Amplified Fragment Length Polymorphisms (AFLP) of aquatic insects in six streams in Japan with natural and anthropogenic variability. After SOM training, the loci compositions of aquatic insects effectively responded to environmental selection pressure. To measure how important the role of loci composition was in the population division, we altered the AFLP data by flipping the presence of given loci individual by individual. Subsequently, we recognized the cluster change of the individuals with altered data using the trained SOM. Based on SOM recognition of these altered data, we determined the outlier loci (over the 90th percentile) that showed drastic changes in their assigned clusters (D). Subsequently, environmental responsiveness (Ek′) was also calculated to address relationships with outliers in different species. Outlier loci were sensitive to slightly polluted conditions including Chl-a, NH4-N, NOX-N, PO4-P, and SS, and to the food material, epilithon. Natural environmental factors such as altitude and sediment additionally showed relationships with outliers at somewhat lower levels. Poly-loci-like responsiveness was detected in adaptation to environmental constraints. SOM training followed by recognition shed light on developing algorithms de novo to
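    A self-contained numpy sketch of the flip-and-recognize idea: train a small SOM on presence/absence loci, flip one locus, and check whether the sample's best-matching unit (cluster) changes. The grid size, training schedules, and toy AFLP matrix are illustrative assumptions.

```python
import numpy as np

def train_som(data, grid=(5, 5), iters=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal rectangular SOM: pull the winner and its neighbours toward each sample."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(grid[0], grid[1], data.shape[1]))
    gy, gx = np.mgrid[0:grid[0], 0:grid[1]]
    for t in range(iters):
        x = data[rng.integers(len(data))]
        d = ((w - x) ** 2).sum(axis=2)
        by, bx = np.unravel_index(d.argmin(), d.shape)   # best-matching unit
        frac = t / iters
        lr = lr0 * (1 - frac) + 0.01                     # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5                # shrinking neighbourhood
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))
        w += lr * h[..., None] * (x - w)
    return w

def bmu(w, x):
    """Grid coordinates of the best-matching unit for sample x."""
    d = ((w - x) ** 2).sum(axis=2)
    return np.unravel_index(d.argmin(), d.shape)

rng = np.random.default_rng(1)
loci = (rng.random((60, 40)) > 0.5).astype(float)   # toy AFLP presence/absence matrix
som = train_som(loci)
flipped = loci[0].copy()
flipped[3] = 1 - flipped[3]                         # flip one locus of one individual
print(bmu(som, loci[0]), "->", bmu(som, flipped))   # did the assigned cluster change?
```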

  8. A Positive Deviance Approach to Early Childhood Obesity: Cross-Sectional Characterization of Positive Outliers

    OpenAIRE

    Foster, Byron Alexander; Farragher, Jill; Parker, Paige; Hale, Daniel E.

    2015-01-01

    Objective: Positive deviance methodology has been applied in the developing world to address childhood malnutrition and has potential for application to childhood obesity in the United States. We hypothesized that among children at high-risk for obesity, evaluating normal weight children will enable identification of positive outlier behaviors and practices.

  9. The high cost of low-acuity ICU outliers.

    Science.gov (United States)

    Dahl, Deborah; Wojtal, Greg G; Breslow, Michael J; Holl, Randy; Huguez, Debra; Stone, David; Korpi, Gloria

    2012-01-01

    Direct variable costs were determined on each hospital day for all patients with an intensive care unit (ICU) stay in four Phoenix-area hospital ICUs. Average daily direct variable cost in the four ICUs ranged from $1,436 to $1,759 and represented 69.4 percent and 45.7 percent of total hospital stay cost for medical and surgical patients, respectively. Daily ICU cost and length of stay (LOS) were higher in patients with higher ICU admission acuity of illness as measured by the APACHE risk prediction methodology; 16.2 percent of patients had an ICU stay in excess of six days, and these LOS outliers accounted for 56.7 percent of total ICU cost. While higher-acuity patients were more likely to be ICU LOS outliers, 11.1 percent of low-risk patients were outliers. The low-risk group included 69.4 percent of the ICU population and accounted for 47 percent of all LOS outliers. Low-risk LOS outliers accounted for 25.3 percent of ICU cost and incurred fivefold higher hospital stay costs and mortality rates. These data suggest that severity of illness is an important determinant of daily resource consumption and LOS, regardless of whether the patient arrives in the ICU with high acuity or develops complications that increase acuity. The finding that a substantial number of long-stay patients come into the ICU with low acuity and deteriorate after ICU admission is not widely recognized and represents an important opportunity to improve patient outcomes and lower costs. ICUs should consider adding low-risk LOS data to their quality and financial performance reports.

  10. The masking breakdown point of multivariate outlier identification rules

    OpenAIRE

    Becker, Claudia; Gather, Ursula

    1997-01-01

    In this paper, we consider one-step outlier identification rules for multivariate data, generalizing the concept of so-called alpha outlier identifiers, as presented in Davies and Gather (1993) for the case of univariate samples. We investigate how the finite-sample breakdown points of estimators used in these identification rules influence the masking behaviour of the rules.

  11. The obligation of physicians to medical outliers: a Kantian and Hegelian synthesis.

    Science.gov (United States)

    Papadimos, Thomas J; Marco, Alan P

    2004-06-03

    Patients who present to medical practices without health insurance or with serious co-morbidities can become fiscal disasters to those who care for them. Their consumption of scarce resources has caused consternation among providers and institutions, especially as it concerns the amount and type of care they should receive. In fact, some providers may try to avoid caring for them altogether, or at least try to limit their institutional or practice exposure to them. We present a philosophical discourse, with emphasis on the writings of Immanuel Kant and G.F.W. Hegel, as to why physicians have the moral imperative to give such "outliers" considerate and thoughtful care. Outliers are defined and the ideals of morality, responsibility, good will, duty, and principle are applied to the care of patients whose financial means are meager and to those whose care is physiologically futile. Actions of moral worth, unconditional good will, and doing what is right are examined. Outliers are a legitimate economic concern to individual practitioners and institutions, however this should not lead to an evasion of care. These patients should be identified early in their course of care, but such identification should be preceded by a well-planned recognition of this burden and appropriate staffing and funding should be secured. A thoughtful team approach by medical practices and their institutions, involving both clinicians and non-clinicians, should be pursued.

  12. The variance of length of stay and the optimal DRG outlier payments.

    Science.gov (United States)

    Felder, Stefan

    2009-09-01

    Prospective payment schemes in health care often include supply-side insurance for cost outliers. In hospital reimbursement, prospective payments for patient discharges, based on their classification into diagnosis related group (DRGs), are complemented by outlier payments for long stay patients. The outlier scheme fixes the length of stay (LOS) threshold, constraining the profit risk of the hospitals. In most DRG systems, this threshold increases with the standard deviation of the LOS distribution. The present paper addresses the adequacy of this DRG outlier threshold rule for risk-averse hospitals with preferences depending on the expected value and the variance of profits. It first shows that the optimal threshold solves the hospital's tradeoff between higher profit risk and lower premium loading payments. It then demonstrates for normally distributed truncated LOS that the optimal outlier threshold indeed decreases with an increase in the standard deviation.

  13. Abundant Topological Outliers in Social Media Data and Their Effect on Spatial Analysis.

    Science.gov (United States)

    Westerholt, Rene; Steiger, Enrico; Resch, Bernd; Zipf, Alexander

    2016-01-01

    Twitter and related social media feeds have become valuable data sources to many fields of research. Numerous researchers have thereby used social media posts for spatial analysis, since many of them contain explicit geographic locations. However, despite its widespread use within applied research, a thorough understanding of the underlying spatial characteristics of these data is still lacking. In this paper, we investigate how topological outliers influence the outcomes of spatial analyses of social media data. These outliers appear when different users contribute heterogeneous information about different phenomena simultaneously from similar locations. As a consequence, various messages representing different spatial phenomena are captured close to each other, and are at risk of being falsely related in a spatial analysis. Our results reveal indications of corresponding spurious effects when analyzing Twitter data. Further, we show how the outliers distort the range of outcomes of spatial analysis methods. This has significant influence on the power of spatial inferential techniques, and, more generally, on the validity and interpretability of spatial analysis results. We further investigate how the issues caused by topological outliers are composed in detail. We unveil that multiple disturbing effects act simultaneously and that these are related to the geographic scales of the involved overlapping patterns. Our results show that at some scale configurations, the disturbances added through overlap are more severe than at others. Further, their behavior turns into volatile and almost chaotic fluctuation when the scales of the involved patterns become too different. Overall, our results highlight the critical importance of thoroughly considering the specific characteristics of social media data when analyzing them spatially.

  14. A method for separating seismo-ionospheric TEC outliers from heliogeomagnetic disturbances by using nu-SVR

    Energy Technology Data Exchange (ETDEWEB)

    Pattisahusiwa, Asis [Bandung Institute of Technology (Indonesia); Liong, The Houw; Purqon, Acep [Earth physics and complex systems research group, Bandung Institute of Technology (Indonesia)

    2015-09-30

    Seismo-ionospheric research studies ionospheric disturbances associated with seismic activity. Many previous studies have shown that heliogeomagnetic activity or strong earthquakes can cause disturbances in the ionosphere. However, it is difficult to separate these disturbances by source. In this research, we propose a method to separate these disturbances/outliers by using nu-SVR with worldwide GPS data. TEC data related to the 26 December 2004 Sumatra and the 11 March 2011 Honshu earthquakes were analyzed. After analyzing TEC data at several locations around the earthquake epicenters and comparing them with geomagnetic data, the method shows, on average, good results in detecting the source of these outliers. This method is promising for use in future research.
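    A hedged sketch of the residual idea behind such separation: fit sklearn's NuSVR to a TEC-like series and flag points that deviate strongly from the fitted curve. The synthetic series, kernel, and 3-sigma threshold are assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(5)
t = np.linspace(0, 10, 300)[:, None]
tec = 20 + 5 * np.sin(t.ravel()) + rng.normal(0, 0.5, 300)
tec[150:153] += 6.0                        # a seismo-ionospheric-like spike

model = NuSVR(nu=0.5, C=10.0, kernel="rbf").fit(t, tec)
resid = tec - model.predict(t)             # deviation from the smooth trend
flags = np.abs(resid) > 3 * resid.std()    # candidate outliers
print(np.where(flags)[0])
```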

  15. Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations.

    Science.gov (United States)

    Bakker, Marjan; Wicherts, Jelte M

    2014-09-01

    In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates of above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen the effects thereof on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data and to have nominal Type I error rates and good power when outliers are present. PsycINFO Database Record (c) 2014 APA, all rights reserved.
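    The inflation is straightforward to reproduce by simulation; a scipy sketch comparing the t test after |Z| > 2 removal against the Mann-Whitney-Wilcoxon test on intact data, with an illustrative skewed sum-score-like distribution.

```python
import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(6)
rejections_t, rejections_mw = 0, 0
for _ in range(2000):
    # two groups from the SAME skewed distribution, so any rejection is a Type I error
    a, b = rng.binomial(10, 0.9, 30), rng.binomial(10, 0.9, 30)
    trimmed = []
    for g in (a, b):
        z = (g - g.mean()) / g.std()
        trimmed.append(g[np.abs(z) <= 2])           # common outlier-removal practice
    rejections_t += ttest_ind(*trimmed).pvalue < 0.05
    rejections_mw += mannwhitneyu(a, b).pvalue < 0.05

print("t test after removal:", rejections_t / 2000)      # inflated above 0.05
print("Mann-Whitney, no removal:", rejections_mw / 2000)  # close to nominal
```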

  16. What 'outliers' tell us about missed opportunities for tuberculosis control: a cross-sectional study of patients in Mumbai, India

    Directory of Open Access Journals (Sweden)

    Porter John DH

    2010-05-01

    Full Text Available Abstract Background India's Revised National Tuberculosis Control Programme (RNTCP) is deemed highly successful in terms of detection and cure rates. However, some patients experience delays in accessing diagnosis and treatment. Patients falling between the 96th and 100th percentiles for these access indicators are often ignored as atypical 'outliers' when assessing programme performance. They may, however, provide clues to understanding why some patients never reach the programme. This paper examines the underlying vulnerabilities of patients with extreme values for delays in accessing the RNTCP in Mumbai city, India. Methods We conducted a cross-sectional study with 266 new sputum-positive patients registered with the RNTCP in Mumbai. Patients were classified as 'outliers' if patient, provider and system delays were beyond the 95th percentile for the respective variable. Case profiles of 'outliers' for patient, provider and system delays were examined and compared with the rest of the sample to identify key factors responsible for delays. Results Forty-two patients were 'outliers' on one or more of the delay variables. All 'outliers' had a significantly lower per capita income than the remaining sample. The lack of economic resources was compounded by social, structural and environmental vulnerabilities. Longer patient delays were related to patients' perception of symptoms as non-serious. Provider delays were incurred as a result of private providers' failure to respond to tuberculosis in a timely manner. Diagnostic and treatment delays were minimal; however, analysis of the 'outliers' revealed the importance of social support in enabling access to the programme. Conclusion A proxy for those who fail to reach the programme, these case profiles highlight unique vulnerabilities that need innovative approaches by the RNTCP. The focus on 'outliers' provides a less resource- and time-intensive alternative to community-based studies for

  17. Identificación de outliers en muestras multivariantes

    OpenAIRE

    Pérez Díez de los Ríos, José Luis

    1987-01-01

    This thesis analyzes the problem of outlier observations in multivariate samples, describing the various techniques currently available for identifying outliers in multidimensional samples and showing that most of them are generalizations of ideas developed for the univariate case, or techniques based on graphical representations. It then addresses the so-called masking effect, which can arise when...

  18. The obligation of physicians to medical outliers: a Kantian and Hegelian synthesis

    Directory of Open Access Journals (Sweden)

    Marco Alan P

    2004-06-01

    Full Text Available Abstract Background Patients who present to medical practices without health insurance or with serious co-morbidities can become fiscal disasters to those who care for them. Their consumption of scarce resources has caused consternation among providers and institutions, especially as it concerns the amount and type of care they should receive. In fact, some providers may try to avoid caring for them altogether, or at least try to limit their institutional or practice exposure to them. Discussion We present a philosophical discourse, with emphasis on the writings of Immanuel Kant and G.F.W. Hegel, as to why physicians have the moral imperative to give such "outliers" considerate and thoughtful care. Outliers are defined and the ideals of morality, responsibility, good will, duty, and principle are applied to the care of patients whose financial means are meager and to those whose care is physiologically futile. Actions of moral worth, unconditional good will, and doing what is right are examined. Summary Outliers are a legitimate economic concern to individual practitioners and institutions, however this should not lead to an evasion of care. These patients should be identified early in their course of care, but such identification should be preceded by a well-planned recognition of this burden and appropriate staffing and funding should be secured. A thoughtful team approach by medical practices and their institutions, involving both clinicians and non-clinicians, should be pursued.

  19. Robust volcano plot: identification of differential metabolites in the presence of outliers.

    Science.gov (United States)

    Kumar, Nishith; Hoque, Md Aminul; Sugimoto, Masahiro

    2018-04-11

    The identification of differential metabolites in metabolomics is still a big challenge and plays a prominent role in metabolomics data analyses. Metabolomics datasets often contain outliers because of analytical, experimental, and biological ambiguity, but the currently available differential metabolite identification techniques are sensitive to outliers. We propose a kernel-weight-based, outlier-robust volcano plot for identifying differential metabolites from noisy metabolomics datasets. Two numerical experiments are used to evaluate the performance of the proposed technique against nine existing techniques, including the t-test and the Kruskal-Wallis test. Artificially generated data with outliers reveal that the proposed method results in a lower misclassification error rate and a greater area under the receiver operating characteristic curve compared with existing methods. An experimentally measured breast cancer dataset to which outliers were artificially added reveals that our proposed method produced only two non-overlapping differential metabolites, whereas the other nine methods produced between seven and 57 non-overlapping differential metabolites. Our data analyses show that the performance of the proposed differential metabolite identification technique is better than that of existing methods. Thus, the proposed method can contribute to the analysis of metabolomics data with outliers. The R package and user manual of the proposed method are available at https://github.com/nishithkumarpaul/Rvolcano.

  20. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.

  1. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

    Science.gov (United States)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies which rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace which is not necessarily the most discriminative one. Therefore, clusters which appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low-dimensional and most discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters which are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of a larger number of individual neurons, with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain-machine interface studies.

  2. Stoicism, the physician, and care of medical outliers

    Directory of Open Access Journals (Sweden)

    Papadimos Thomas J

    2004-12-01

    Full Text Available Abstract Background Medical outliers present a medical, psychological, social, and economic challenge to the physicians who care for them. The determinism of Stoic thought is explored as an intellectual basis for the pursuit of a correct mental attitude that will provide aid and comfort to physicians who care for medical outliers, thus fostering continued physician engagement in their care. Discussion The Stoic topics of good, the preferable, the morally indifferent, living consistently, and appropriate actions are reviewed. Furthermore, Zeno's cardinal virtues of Justice, Temperance, Bravery, and Wisdom are addressed, as are the Stoic passions of fear, lust, mental pain, and mental pleasure. These concepts must be understood by physicians if they are to comprehend and accept the Stoic view as it relates to having the proper attitude when caring for those with long-term and/or costly illnesses. Summary Practicing physicians, especially those that are hospital based, and most assuredly those practicing critical care medicine, will be emotionally challenged by the medical outlier. A Stoic approach to such a social and psychological burden may be of benefit.

  3. Pathway-based outlier method reveals heterogeneous genomic structure of autism in blood transcriptome.

    Science.gov (United States)

    Campbell, Malcolm G; Kohane, Isaac S; Kong, Sek Won

    2013-09-24

    Decades of research strongly suggest that the genetic etiology of autism spectrum disorders (ASDs) is heterogeneous. However, most published studies focus on group differences between cases and controls. In contrast, we hypothesized that the heterogeneity of the disorder could be characterized by identifying pathways for which individuals are outliers rather than pathways representative of shared group differences of the ASD diagnosis. Two previously published blood gene expression data sets--the Translational Genetics Research Institute (TGen) dataset (70 cases and 60 unrelated controls) and the Simons Simplex Consortium (Simons) dataset (221 probands and 191 unaffected family members)--were analyzed. All individuals of each dataset were projected to biological pathways, and each sample's Mahalanobis distance from a pooled centroid was calculated to compare the number of case and control outliers for each pathway. Analysis of a set of blood gene expression profiles from 70 ASD and 60 unrelated controls revealed three pathways whose outliers were significantly overrepresented in the ASD cases: neuron development including axonogenesis and neurite development (29% of ASD, 3% of control), nitric oxide signaling (29%, 3%), and skeletal development (27%, 3%). Overall, 50% of cases and 8% of controls were outliers in one of these three pathways, which could not be identified using group comparison or gene-level outlier methods. In an independently collected data set consisting of 221 ASD and 191 unaffected family members, outliers in the neurogenesis pathway were heavily biased towards cases (20.8% of ASD, 12.0% of control). Interestingly, neurogenesis outliers were more common among unaffected family members (Simons) than unrelated controls (TGen), but the statistical significance of this effect was marginal (Chi squared P < 0.09). Unlike group difference approaches, our analysis identified the samples within the case and control groups that manifested each expression
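    The projection described reduces, per pathway, to a Mahalanobis distance from a pooled centroid; a numpy sketch with a synthetic expression matrix and an illustrative cutoff (the gene-to-pathway mapping is assumed).

```python
import numpy as np

def mahalanobis_outliers(expr, cutoff):
    """Distance of each sample from the pooled centroid of one pathway's genes."""
    centroid = expr.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(expr, rowvar=False))
    diff = expr - centroid
    d = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    return d, d > cutoff

rng = np.random.default_rng(7)
pathway_expr = rng.normal(0, 1, (130, 12))    # 130 samples x 12 pathway genes
pathway_expr[:4] += 3.0                        # a few outlying cases
d, flags = mahalanobis_outliers(pathway_expr, cutoff=6.0)
print(flags[:10])                              # the planted outliers are flagged
```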

  4. Quality of Care at Hospitals Identified as Outliers in Publicly Reported Mortality Statistics for Percutaneous Coronary Intervention.

    Science.gov (United States)

    Waldo, Stephen W; McCabe, James M; Kennedy, Kevin F; Zigler, Corwin M; Pinto, Duane S; Yeh, Robert W

    2017-05-16

    Public reporting of percutaneous coronary intervention (PCI) outcomes may create disincentives for physicians to provide care for critically ill patients, particularly at institutions with worse clinical outcomes. We thus sought to evaluate the procedural management and in-hospital outcomes of patients treated for acute myocardial infarction before and after a hospital had been publicly identified as a negative outlier. Using state reports, we identified hospitals that were recognized as negative PCI outliers in 2 states (Massachusetts and New York) from 2002 to 2012. State hospitalization files were used to identify all patients with an acute myocardial infarction within these states. Procedural management and in-hospital outcomes were compared among patients treated at outlier hospitals before and after public report of outlier status. Patients at nonoutlier institutions were used to control for temporal trends. Among 86 hospitals, 31 were reported as outliers for excess mortality. Outlier facilities were larger, treating more patients with acute myocardial infarction and performing more PCIs than nonoutlier hospitals. Rates of percutaneous revascularization increased at outlier and nonoutlier institutions in a similar fashion (interaction P = 0.50) after public report of outlier status. The likelihood of in-hospital mortality decreased at outlier institutions (RR, 0.83; 95% CI, 0.81-0.85) after public report, and to a lesser degree at nonoutlier institutions (RR, 0.90; 95% CI, 0.87-0.92; interaction P < 0.001). Among patients who underwent PCI, in-hospital mortality decreased at outlier institutions after public recognition of outlier status in comparison with prior (RR, 0.72; 95% CI, 0.66-0.79), a decline that exceeded the reduction at nonoutlier institutions (RR, 0.87; 95% CI, 0.80-0.96; interaction P < 0.001). Large hospitals with higher clinical volume are more likely to be designated as negative outliers. The rates of percutaneous revascularization increased similarly at outlier and nonoutlier institutions after report of outlier status. After outlier

  5. Outliers and Extremes: Dragon-Kings or Dragon-Fools?

    Science.gov (United States)

    Schertzer, D. J.; Tchiguirinskaia, I.; Lovejoy, S.

    2012-12-01

    Geophysics seems full of monsters like Victor Hugo's Court of Miracles, and monstrous extremes have been statistically considered as outliers with respect to more normal events. However, a characteristic magnitude separating abnormal events from normal ones would be at odds with the generic scaling behaviour of nonlinear systems, contrary to "fat-tailed" probability distributions and self-organized criticality. More precisely, it can be shown [1] how the apparent monsters could be mere manifestations of a singular measure mishandled as a regular measure. Monstrous fluctuations are the rule, not outliers, and they are more frequent than usually thought, to the point that (theoretical) statistical moments can easily be infinite. The empirical estimates of the latter are erratic and diverge with sample size. The corresponding physics is that intense small-scale events cannot be smoothed out by upscaling. However, based on a few examples, it has also been argued [2] that one should consider "genuine" outliers of fat-tailed distributions so monstrous that they can be called "dragon-kings". We critically analyse these arguments, e.g. finite sample size and statistical estimates of the largest events, and multifractal phase transitions vs. more classical phase transitions. We emphasize the fact that dragon-kings are not needed for the largest events to become predictable. This is rather reminiscent of the Feast of Fools picturesquely described by Victor Hugo. [1] D. Schertzer, I. Tchiguirinskaia, S. Lovejoy and P. Hubert (2010): No monsters, no miracles: in nonlinear sciences hydrology is not an outlier! Hydrological Sciences Journal, 55(6), 965-979. [2] D. Sornette (2009): Dragon-Kings, Black Swans and the Prediction of Crises. International Journal of Terraspace Science and Engineering, 1(3), 1-17.

  6. Anomaly Detection using the "Isolation Forest" algorithm

    CERN Multimedia

    CERN. Geneva

    2015-01-01

    Anomaly detection can provide clues about an outlying minority class in your data: hackers in a set of network events, fraudsters in a set of credit card transactions, or exotic particles in a set of high-energy collisions. In this talk, we analyze a real dataset of breast tissue biopsies, with malignant results forming the minority class. The "Isolation Forest" algorithm finds anomalies by deliberately “overfitting” models that memorize each data point. Since outliers have more empty space around them, they take fewer steps to memorize. Intuitively, a house in the country can be identified simply as “that house out by the farm”, while a house in the city needs a longer description like “that house in Brooklyn, near Prospect Park, on Union Street, between the firehouse and the library, not far from the French restaurant”. We first use anomaly detection to find outliers in the biopsy data, then apply traditional predictive modeling to discover rules that separate anomalies from normal data...
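    A minimal sklearn sketch of the workflow the talk describes, with synthetic two-class data standing in for the biopsy set.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0, 1, (950, 4)),     # "benign" majority
               rng.normal(4, 1, (50, 4))])     # "malignant" minority

iso = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = iso.predict(X)                        # -1 = anomaly, 1 = normal
print((labels[-50:] == -1).mean(), "of the minority class flagged")
```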

  7. 42 CFR 484.240 - Methodology used for the calculation of the outlier payment.

    Science.gov (United States)

    2010-10-01

    ... for each case-mix group. (b) The outlier threshold for each case-mix group is the episode payment... the same for all case-mix groups. (c) The outlier payment is a proportion of the amount of estimated...

  8. Outlier Removal and the Relation with Reporting Errors and Quality of Psychological Research

    Science.gov (United States)

    Bakker, Marjan; Wicherts, Jelte M.

    2014-01-01

    Background The removal of outliers to acquire a significant result is a questionable research practice that appears to be commonly used in psychology. In this study, we investigated whether the removal of outliers in psychology papers is related to weaker evidence (against the null hypothesis of no effect), a higher prevalence of reporting errors, and smaller sample sizes in these papers compared to papers in the same journals that did not report the exclusion of outliers from the analyses. Methods and Findings We retrieved a total of 2667 statistical results of null hypothesis significance tests from 153 articles in main psychology journals, and compared results from articles in which outliers were removed (N = 92) with results from articles that reported no exclusion of outliers (N = 61). We preregistered our hypotheses and methods and analyzed the data at the level of articles. Results show no significant difference between the two types of articles in median p value, sample sizes, or prevalence of all reporting errors, large reporting errors, and reporting errors that concerned the statistical significance. However, we did find a discrepancy between the reported degrees of freedom of t tests and the reported sample size in 41% of articles that did not report removal of any data values. This suggests common failure to report data exclusions (or missingness) in psychological articles. Conclusions We failed to find that the removal of outliers from the analysis in psychological articles was related to weaker evidence (against the null hypothesis of no effect), sample size, or the prevalence of errors. However, our control sample might be contaminated due to nondisclosure of excluded values in articles that did not report exclusion of outliers. Results therefore highlight the importance of more transparent reporting of statistical analyses. PMID:25072606

  9. 42 CFR 412.84 - Payment for extraordinarily high-cost cases (cost outliers).

    Science.gov (United States)

    2010-10-01

    ... obtains accurate data with which to calculate either an operating or capital cost-to-charge ratio (or both... outlier payments will be based on operating and capital cost-to-charge ratios calculated based on a ratio... outliers). 412.84 Section 412.84 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF...

  10. Patterns of Care for Biologic-Dosing Outliers and Nonoutliers in Biologic-Naive Patients with Rheumatoid Arthritis.

    Science.gov (United States)

    Delate, Thomas; Meyer, Roxanne; Jenkins, Daniel

    2017-08-01

    Although most biologic medications for patients with rheumatoid arthritis (RA) have recommended fixed dosing, actual biologic dosing may vary among real-world patients, since some patients can receive higher (high-dose outliers) or lower (low-dose outliers) doses than what is recommended in medication package inserts. To describe the patterns of care for biologic-dosing outliers and nonoutliers in biologic-naive patients with RA. This was a retrospective, longitudinal cohort study of nonpregnant patients with RA aged ≥ 18 years; high-dose outliers received more than 110% of the approved dose in the package insert at any time during the study period. Baseline patient profiles, treatment exposures, and outcomes were collected during the 180 days before and up to 2 years after biologic initiation and compared across index biologic outlier groups. Patients were followed for at least 1 year, with a subanalysis of those patients who remained as members for 2 years. This study included 434 RA patients with 1 year of follow-up and 372 RA patients with 2 years of follow-up. Overall, the vast majority of patients were female (≈75%) and had similar baseline characteristics. Approximately 10% of patients were outliers in both follow-up cohorts. ETN patients were least likely to become outliers, and ADA patients were most likely to become outliers. Of all outliers during the 1-year follow-up, patients were more likely to be a high-dose outlier (55%) than a low-dose outlier (45%). Median 1- and 2-year adjusted total biologic costs (based on wholesale acquisition costs) were higher for ADA and ETN nonoutliers than for IFX nonoutliers. Biologic persistence was highest for IFX patients. Charlson Comorbidity Index score, ETN and IFX index biologic, and treatment with a nonbiologic disease-modifying antirheumatic drug (DMARD) before biologic initiation were associated with becoming high- or low-dose outliers (c-statistic = 0.79). Approximately 1 in 10 study patients with RA was identified as a

  11. Outlier identification procedures for contingency tables using maximum likelihood and $L_1$ estimates

    NARCIS (Netherlands)

    Kuhnt, S.

    2004-01-01

    Observed cell counts in contingency tables are perceived as outliers if they have low probability under an anticipated loglinear Poisson model. New procedures for the identification of such outliers are derived using the classical maximum likelihood estimator and an estimator based on the L1 norm.

  12. An unsupervised learning algorithm for fatigue crack detection in waveguides

    International Nuclear Information System (INIS)

    Rizzo, Piervincenzo; Cammarata, Marcello; Kent Harries; Dutta, Debaditya; Sohn, Hoon

    2009-01-01

    Ultrasonic guided waves (UGWs) are a useful tool in structural health monitoring (SHM) applications that can benefit from built-in transduction, moderately large inspection ranges, and high sensitivity to small flaws. This paper describes an SHM method based on UGWs and outlier analysis devoted to the detection and quantification of fatigue cracks in structural waveguides. The method combines the advantages of UGWs with the outcomes of the discrete wavelet transform (DWT) to extract defect-sensitive features aimed at performing a multivariate diagnosis of damage. In particular, the DWT is exploited to generate a set of relevant wavelet coefficients to construct a uni-dimensional or multi-dimensional damage index vector. The vector is fed to an outlier analysis to detect anomalous structural states. The general framework presented in this paper is applied to the detection of fatigue cracks in a steel beam. The probing hardware consists of a National Instruments PXI platform that controls the generation and detection of the ultrasonic signals by means of piezoelectric transducers made of lead zirconate titanate. The effectiveness of the proposed approach to diagnose the presence of defects as small as a few per cent of the waveguide cross-sectional area is demonstrated
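    The feature pipeline described (DWT coefficients condensed into a damage-index vector and fed to an outlier test) can be sketched with PyWavelets; the wavelet choice, decomposition level, and 3-sigma rule below are illustrative assumptions, not the paper's settings.

```python
import numpy as np
import pywt

def damage_index(signal, wavelet="db4", level=4):
    """Energy of detail coefficients as a simple damage-sensitive feature vector."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs[1:]])  # detail-band energies

rng = np.random.default_rng(9)
# baseline (undamaged) responses define the "normal" feature distribution
baseline = np.array([damage_index(rng.normal(0, 1, 1024)) for _ in range(50)])
mu, sigma = baseline.mean(axis=0), baseline.std(axis=0)

test = damage_index(rng.normal(0, 1.4, 1024))   # a "damaged" response
z = (test - mu) / sigma
print("anomalous state:", bool(np.any(np.abs(z) > 3)))
```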

  13. Outlier identification in colorectal surgery should separate elective and nonelective service components.

    Science.gov (United States)

    Byrne, Ben E; Mamidanna, Ravikrishna; Vincent, Charles A; Faiz, Omar D

    2014-09-01

    The identification of health care institutions with outlying outcomes is of great importance for reporting health care results and for quality improvement. Historically, elective surgical outcomes have received greater attention than nonelective results, although some studies have examined both. Differences in outlier identification between these patient groups have not been adequately explored. The aim of this study was to compare the identification of institutional outliers for mortality after elective and nonelective colorectal resection in England. This was a cohort study using routine administrative data. Ninety-day mortality was determined by using statutory records of death. Adjusted Trust-level mortality rates were calculated by using multiple logistic regression. High and low mortality outliers were identified and compared across funnel plots for elective and nonelective surgery. All English National Health Service Trusts providing colorectal surgery to an unrestricted patient population were studied. Adults admitted for colorectal surgery between April 2006 and March 2012 were included. Segmental colonic or rectal resection was performed. The primary outcome measured was 90-day mortality. Included were 195,118 patients, treated at 147 Trusts. Ninety-day mortality rates after elective and nonelective surgery were 4% and 18%. No unit with high outlying mortality for elective surgery was a high outlier for nonelective mortality and vice versa. Trust-level observed-to-expected mortality for elective and nonelective surgery was moderately correlated (Spearman ρ = 0.50), but status as an institutional mortality outlier after elective and nonelective colorectal surgery was not closely related. Therefore, mortality rates should be reported for both patient cohorts separately. This would provide a broad picture of the state of colorectal services and help direct research and quality improvement activities.

  14. Outlier robustness for wind turbine extrapolated extreme loads

    DEFF Research Database (Denmark)

    Natarajan, Anand; Verelst, David Robert

    2012-01-01

    Stochastic identification of numerical artifacts in simulated loads is demonstrated using the method of principal component analysis. The extrapolation methodology is made robust to outliers through a weighted loads approach, whereby the eigenvalues of the correlation matrix obtained using the loads with its

  15. Fuzzy Treatment of Candidate Outliers in Measurements

    Directory of Open Access Journals (Sweden)

    Giampaolo E. D'Errico

    2012-01-01

    Full Text Available Robustness against the possible occurrence of outlying observations is critical to the performance of a measurement process. Open questions relevant to statistical testing for candidate outliers are reviewed. A novel fuzzy logic approach is developed and exemplified in a metrology context. A simulation procedure is presented and discussed by comparing fuzzy versus probabilistic models.

  16. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    Science.gov (United States)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference of a linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which were reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach is examined for the simultaneous presence of multicollinearity and multiple outliers in multiple linear regression. This study compares the performance of several well-known robust estimators: M, MM, RIDGE, and the robust ridge regression estimators, namely the Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), and Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-direction), RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity, and percentage of outliers used. However, when outliers occurred in only a single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
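
    The weighted robust ridge estimators above are not spelled out in the abstract; purely as an illustration of the underlying idea, the following sketch combines ridge shrinkage with Huber-weighted iteratively reweighted least squares. The weight function, tuning constant, and penalty value are illustrative assumptions, not the WRM/WRMM/RMM procedures from the study.

    ```python
    import numpy as np

    def huber_weights(r, c=1.345):
        """Huber weights: 1 for small scaled residuals, c/|r| beyond the cutoff."""
        w = np.ones_like(r)
        big = np.abs(r) > c
        w[big] = c / np.abs(r[big])
        return w

    def robust_ridge(X, y, lam=1.0, n_iter=50, tol=1e-8):
        """Ridge M-estimation by IRLS: L2-penalized weighted least squares."""
        p = X.shape[1]
        beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)  # plain ridge start
        for _ in range(n_iter):
            r = y - X @ beta
            scale = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # MAD scale
            w = huber_weights(r / scale)
            beta_new = np.linalg.solve(X.T @ (w[:, None] * X) + lam * np.eye(p),
                                       X.T @ (w * y))
            if np.max(np.abs(beta_new - beta)) < tol:
                return beta_new
            beta = beta_new
        return beta

    # Collinear design with a few y-direction outliers, mirroring the simulation setup.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)   # near-collinear column
    y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=100)
    y[:5] += 20.0                                     # contaminated observations
    print(robust_ridge(X, y, lam=0.5))
    ```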

  17. A study of outliers in statistical distributions of mechanical properties of structural steels

    International Nuclear Information System (INIS)

    Oefverbeck, P.; Oestberg, G.

    1977-01-01

    The safety against failure of pressure vessels can be assessed by statistical methods, so-called probabilistic fracture mechanics. The data base for such estimations is admittedly rather meagre, making it necessary to assume certain conventional statistical distributions. Since the failure rates arrived at are low, for nuclear vessels of the order of 10 - to 10 - per year, the extremes of the variables involved, among other things the mechanical properties of the steel used, are of particular interest. A question sometimes raised is whether outliers, or values exceeding the extremes in the assumed distributions, might occur. In order to explore this possibility a study has been made of strength values of three qualities of structural steels, available in samples of up to about 12,000. Statistical evaluation of these samples with respect to outliers, using standard methods for this purpose, revealed the presence of such outliers in most cases, with a frequency of occurrence of, typically, a few values per thousand, estimated by the methods described. Obviously, statistical analysis alone cannot be expected to shed any light on the causes of outliers. Thus, the interpretation of these results with respect to their implication for the probabilistic estimation of the integrity of pressure vessels must await further studies of a similar nature in which the test specimens corresponding to outliers can be recovered and examined metallographically. For the moment the results should be regarded only as a factor to be considered in discussions of the safety of pressure vessels. (author)

  18. Determinants of long-term growth : New results applying robust estimation and extreme bounds analysis

    NARCIS (Netherlands)

    Sturm, J.-E.; de Haan, J.

    2005-01-01

    Two important problems exist in cross-country growth studies: outliers and model uncertainty. Employing Sala-i-Martin's (1997a,b) data set, we first use robust estimation and analyze to what extent outliers influence OLS regressions. We then use both OLS and robust estimation techniques in applying

  19. Prospective casemix-based funding, analysis and financial impact of cost outliers in all-patient refined diagnosis related groups in three Belgian general hospitals.

    Science.gov (United States)

    Pirson, Magali; Martins, Dimitri; Jackson, Terri; Dramaix, Michèle; Leclercq, Pol

    2006-03-01

    This study examined the impact of cost outliers in terms of hospital resource consumption, the financial impact of outliers under the Belgian casemix-based system, and the validity of two "proxies" for costs: length of stay and charges. The costs of all hospital stays at three Belgian general hospitals were calculated for the year 2001. High resource use outliers were selected according to the following rule: 75th percentile + 1.5 × inter-quartile range. The frequency of cost outliers varied from 7% to 8% across hospitals. Explanatory factors were: major or extreme severity of illness, longer length of stay, and intensive care unit stay. Cost outliers account for 22-30% of hospital costs. One-third of length-of-stay outliers are not cost outliers, and nearly one-quarter of charges outliers are not cost outliers. The current funding system in Belgium does not penalize hospitals having a high percentage of outliers. The billing generated by these patients largely compensates for the costs generated. Length of stay and charges are not good proxies for selecting cost outliers.
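
    The selection rule quoted above is simply the upper Tukey fence applied to per-stay costs. A minimal sketch, with fabricated cost data standing in for the hospital records:

    ```python
    import numpy as np

    def cost_outlier_fence(costs):
        """Upper fence = 75th percentile + 1.5 x inter-quartile range."""
        q1, q3 = np.percentile(costs, [25, 75])
        fence = q3 + 1.5 * (q3 - q1)
        return costs > fence, fence

    # Hypothetical per-stay costs (arbitrary units); real cost data are
    # right-skewed, hence the lognormal draw used here.
    rng = np.random.default_rng(0)
    costs = rng.lognormal(mean=8.0, sigma=0.6, size=1000)
    is_outlier, fence = cost_outlier_fence(costs)
    print(f"fence = {fence:.0f}, outlier share = {is_outlier.mean():.1%}")
    ```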

  20. A computational study on outliers in world music

    Science.gov (United States)

    Benetos, Emmanouil; Dixon, Simon

    2017-01-01

    The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country. PMID:29253027

  1. Controller modification applied for active fault detection

    DEFF Research Database (Denmark)

    Niemann, Hans Henrik; Stoustrup, Jakob; Poulsen, Niels Kjølstad

    2014-01-01

    This paper is focusing on active fault detection (AFD) for parametric faults in closed-loop systems. This auxiliary input applied for the fault detection will also disturb the external output and consequently reduce the performance of the controller. Therefore, only small auxiliary inputs are used...... with the result that the detection and isolation time can be long. In this paper it will be shown, that this problem can be handled by using a modification of the feedback controller. By applying the YJBK-parameterization (after Youla, Jabr, Bongiorno and Kucera) for the controller, it is possible to modify...... the frequency for the auxiliary input is selected. This gives that it is possible to apply an auxiliary input with a reduced amplitude. An example is included to show the results....

  2. Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York

    Directory of Open Access Journals (Sweden)

    Goovaerts Pierre

    2004-07-01

    Full Text Available Abstract Background Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is however not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. In particular, this new

  3. Comparative study of methods on outlying data detection in experimental results

    International Nuclear Information System (INIS)

    Oliveira, P.M.S.; Munita, C.S.; Hazenfratz, R.

    2009-01-01

    The interpretation of experimental results through multivariate statistical methods might reveal the existence of outliers, which is rarely taken into account by analysts. However, their presence can influence the interpretation of results, generating false conclusions. This paper shows the importance of outlier determination for a database of 89 samples of ceramic fragments, analyzed by neutron activation analysis. The results were submitted to five procedures to detect outliers: Mahalanobis distance, cluster analysis, principal component analysis, factor analysis, and standardized residuals. The results showed that although cluster analysis is one of the procedures most used to identify outliers, it can fail by not showing samples that are easily identified as outliers by other methods. In general, the statistical procedures for the identification of outliers are little known to analysts. (author)

  4. An application of robust ridge regression model in the presence of outliers to real data problem

    Science.gov (United States)

    Shariff, N. S. Md.; Ferdaos, N. A.

    2017-09-01

    Multicollinearity and outliers often lead to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is believed to be affected by the presence of outliers. The combination of GM-estimation and the ridge parameter, which is robust towards both problems, is of interest in this study. As such, both techniques are employed to investigate the relationship between stock market price and macroeconomic variables in Malaysia, as the data set is suspected to involve both multicollinearity and outlier problems. There are four macroeconomic factors selected for this study, which are the Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results in the presence of multicollinearity and outliers in the real data.

  5. Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic.

    Science.gov (United States)

    Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B; Chen, Li; Wang, Yue; Clarke, Robert

    2012-08-01

    Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. xuan@vt.edu Supplementary data are available at Bioinformatics online.

  6. A comparative study of outlier detection for large-scale traffic data by one-class SVM and kernel density estimation

    Science.gov (United States)

    Ngan, Henry Y. T.; Yung, Nelson H. C.; Yeh, Anthony G. O.

    2015-02-01

    This paper aims at presenting a comparative study of outlier detection (OD) for large-scale traffic data. Traffic data nowadays are massive in scale and collected every second throughout any modern city. In this research, the traffic flow dynamic is collected from one of the busiest 4-armed junctions in Hong Kong over a 31-day sampling period (with 764,027 vehicles in total). The traffic flow dynamic is expressed in a high-dimension spatial-temporal (ST) signal format (i.e. 80 cycles) which has a high degree of similarity within the same signal and across different signals in one direction. A total of 19 traffic directions are identified in this junction and many ST signals were collected in the 31-day period (i.e. 874 signals). In order to reduce their dimension, the ST signals first undergo a principal component analysis (PCA) to be represented as (x,y)-coordinates. These PCA (x,y)-coordinates are then assumed to be Gaussian distributed. With this assumption, the data points are further evaluated by (a) a correlation study with three variant coefficients, (b) a one-class support vector machine (SVM) and (c) kernel density estimation (KDE). The correlation study could not give any explicit OD result, while the one-class SVM and KDE provide average detection success rates (DSRs) of 59.61% and 95.20%, respectively.
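
    As a rough sketch of the pipeline described above (PCA down to two dimensions, then one-class SVM and KDE as competing outlier detectors), with random data standing in for the ST traffic signals; the nu value, bandwidth, and density cutoff are assumptions:

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KernelDensity
    from sklearn.svm import OneClassSVM

    # Stand-in for the 874 spatial-temporal signals of 80 cycles each.
    rng = np.random.default_rng(1)
    signals = rng.normal(size=(874, 80))

    # Step 1: reduce each signal to PCA (x, y)-coordinates.
    xy = PCA(n_components=2).fit_transform(signals)

    # Step 2a: one-class SVM flags points outside the learned support.
    svm_flags = OneClassSVM(nu=0.05, gamma="scale").fit_predict(xy) == -1

    # Step 2b: KDE flags the lowest-density 5% of points.
    log_dens = KernelDensity(bandwidth=0.5).fit(xy).score_samples(xy)
    kde_flags = log_dens < np.percentile(log_dens, 5)

    print(f"one-class SVM outliers: {svm_flags.sum()}, KDE outliers: {kde_flags.sum()}")
    ```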

  7. Semi-supervised learning technique for outlier detection

    OpenAIRE

    Fabio Willian Zamoner

    2014-01-01

    Outlier detection plays an important role in knowledge discovery in large databases. The study is motivated by numerous real applications such as credit card fraud, fault detection in industrial components, intrusion in computer networks, loan approval, and monitoring of medical conditions. An outlier is defined as an observation that deviates from the other observations with respect to some measure and exerts considerable influence on the analysis of the data...

  8. Anomalous human behavior detection: An Adaptive approach

    NARCIS (Netherlands)

    Leeuwen, C. van; Halma, A.; Schutte, K.

    2013-01-01

    Detection of anomalies (outliers or abnormal instances) is an important element in a range of applications such as fault, fraud, and suspicious behavior detection and knowledge discovery. In this article we propose a new method for anomaly detection and tested its ability to detect anomalous

  9. Identification of outliers and positive deviants for healthcare improvement: looking for high performers in hypoglycemia safety in patients with diabetes

    Directory of Open Access Journals (Sweden)

    Brigid Wilson

    2017-11-01

    Full Text Available Abstract Background The study objectives were to determine: (1) how statistical outliers exhibiting low rates of diabetes overtreatment performed on a reciprocal measure – rates of diabetes undertreatment; and (2) the impact of different criteria on high performing outlier status. Methods The design was serial cross-sectional, using yearly Veterans Health Administration (VHA) administrative data (2009–2013). Our primary outcome measure was the facility rate of HbA1c overtreatment of diabetes in patients at risk for hypoglycemia. Outlier status was assessed by using two approaches: calculating a facility outlier value within year, comparator group, and A1c threshold while incorporating at-risk population sizes; and examining standardized model residuals across year and A1c threshold. Facilities with outlier values in the lowest decile for all years of data using more than one threshold and comparator, or with time-averaged model residuals in the lowest decile for all A1c thresholds, were considered high performing outliers. Results Using outlier values, three of the 27 high performers from 2009 were also identified in 2010–2013 and considered outliers. There was only modest overlap between facilities identified as top performers based on the three A1c thresholds, and some identified high performers had a higher rate of A1c > 9% than the VA average in the population of patients at high risk for hypoglycemia. Conclusions Statistical identification of positive deviants for diabetes overtreatment was dependent upon the specific measures and approaches used. Moreover, because two facilities may arrive at the same results via very different pathways, it is important to consider that a "best" practice may actually reflect a separate "worst" practice.

  10. Detecting Outliers in Marathon Data by Means of the Andrews Plot

    Science.gov (United States)

    Stehlík, Milan; Wald, Helmut; Bielik, Viktor; Petrovič, Juraj

    2011-09-01

    For an optimal race performance, it is important that the runner keeps a steady pace during most of the competition. First-time runners or athletes without much competition experience often suffer a "blow out" after a few kilometers of the race. This can happen because of strong emotional experiences or poor control of running intensity. The half-marathon competition pace of middle-level recreational athletes is approximately 10 s quicker than their training pace. If an athlete runs the first third of the race (7 km) at a pace 20 s quicker than his capacity (trainability) allows, he will experience a "blow out" in the last third of the race. This is reflected in reduced running intensity, an inability to keep a steady pace in the last kilometers of the race, and in the final time as well. In sports science, there are many diagnostic methods ([3], [2], [6]) that are used for prediction of optimal race pace and final time; otherwise, practical evidence for these diagnostic methods in the field (competition, race) is lacking. One of the conditions that needs to be fulfilled is that athletes not only have similar final times but also keep as constant a pace as possible during the whole race. For this reason it is very important to find outliers. Our experimental group consisted of 20 recreationally trained athletes (mean age 32.6 ± 8.9 years). Before the race the athletes were instructed to run on the basis of their subjective feeling and previous experience. The data (running pace for each kilometer, average and maximal heart rate for each kilometer) were collected by the GPS-enabled personal trainer Forerunner 305.
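
    An Andrews plot maps each runner's vector of per-kilometer paces to a smooth curve; runners whose curves stray from the bundle are outlier candidates. A sketch with fabricated pace data (pandas provides the plot directly):

    ```python
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pandas.plotting import andrews_curves

    # Fabricated per-kilometer paces (s/km) for 20 half-marathon runners.
    rng = np.random.default_rng(2)
    paces = rng.normal(300, 5, size=(20, 21))
    paces[0, 14:] += np.arange(7) * 10        # one runner "blows out" late in the race
    df = pd.DataFrame(paces, columns=[f"km{i + 1}" for i in range(21)])
    df["runner"] = ["fader"] + ["steady"] * 19

    andrews_curves(df, "runner")              # the fader's curve separates from the bundle
    plt.show()
    ```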

  11. Improvement of statistical methods for detecting anomalies in climate and environmental monitoring systems

    Science.gov (United States)

    Yakunin, A. G.; Hussein, H. M.

    2018-01-01

    The article shows how known statistical methods, widely used in solving financial problems and in a number of other fields of science and technology, can after minor modification be effectively applied in climate and environment monitoring systems to such problems as the detection of anomalies in the form of abrupt changes in signal level, the occurrence of positive and negative outliers, and violation of the cycle form in periodic processes.

  12. Impact of outlier status on critical care patient outcomes: Does boarding medical intensive care unit patients make a difference?

    Science.gov (United States)

    Ahmad, Danish; Moeller, Katherine; Chowdhury, Jared; Patel, Vishal; Yoo, Erika J

    2018-04-01

    To evaluate the impact of outlier status, or the practice of boarding ICU patients in distant critical care units, on clinical and utilization outcomes. Retrospective observational study of all consecutive admissions to the MICU service between April 1, 2014 and January 3, 2016, at an urban university hospital. Of 1931 patients, 117 were outliers (6.1%) for the entire duration of their ICU stay. In adjusted analyses, there was no association between outlier status and hospital (OR 1.21, 95% CI 0.72-2.05, p=0.47) or ICU mortality (OR 1.20, 95% CI 0.64-2.25, p=0.57). Outliers had shorter hospital and ICU lengths of stay (LOS) in addition to fewer ventilator days. Crossover patients who had variable outlier exposure also had no increase in hospital (OR 1.61; 95% CI 0.80-3.23; p=0.18) or ICU mortality (OR 1.05; 95% CI 0.43-2.54; p=0.92) after risk adjustment. Boarding of MICU patients in distant units during times of bed nonavailability does not negatively influence patient mortality or LOS. Increased hospital and ventilator utilization observed among non-outliers in the home unit may be attributable, at least in part, to differences in patient characteristics. Prospective investigation into the practice of ICU boarding will provide further confirmation of its safety. Copyright © 2017 Elsevier Inc. All rights reserved.

  13. Robust Regression Procedures for Predictor Variable Outliers.

    Science.gov (United States)

    1982-03-01

    space of probability distributions. The influence function of the estimator is then defined as the derivative of this functional evaluated at the underlying distribution F, and it provides a measure of the impact of an outlier x0 on the estimator: IF(x0; T, F) = lim_{eps -> 0} [T((1 - eps)F + eps*delta_{x0}) - T(F)] / eps, where delta_{x0} denotes the point mass at x0 and the limit may be taken in both the positive and negative directions. An empirical influence function can be defined in a similar fashion simply by replacing F with the empirical distribution F_n.
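
    To make the reconstructed definition concrete, the sketch below computes the finite-sample analogue (the sensitivity curve) for the mean and the median; the unbounded growth for the mean illustrates exactly the kind of outlier sensitivity that motivates the robust procedures studied in the report. The example values are illustrative.

    ```python
    import numpy as np

    def empirical_influence(estimator, sample, x0):
        """Sensitivity curve: scaled change in the estimate when one
        observation at x0 is added to the sample."""
        n = len(sample)
        return (n + 1) * (estimator(np.append(sample, x0)) - estimator(sample))

    rng = np.random.default_rng(3)
    sample = rng.normal(size=100)
    for x0 in (0.0, 5.0, 50.0):
        print(f"x0 = {x0:5.1f}: mean {empirical_influence(np.mean, sample, x0):8.2f}, "
              f"median {empirical_influence(np.median, sample, x0):6.2f}")
    ```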

  14. Outlier Loci Detect Intraspecific Biodiversity amongst Spring and Autumn Spawning Herring across Local Scales.

    Directory of Open Access Journals (Sweden)

    Dorte Bekkevold

    Full Text Available Herring, Clupea harengus, is one of the ecologically and commercially most important species in European northern seas, where two distinct ecotypes have been described based on spawning time; spring and autumn. To date, it is unknown if these spring and autumn spawning herring constitute genetically distinct units. We assessed levels of genetic divergence between spring and autumn spawning herring in the Baltic Sea using two types of DNA markers, microsatellites and Single Nucleotide Polymorphisms, and compared the results with data for autumn spawning North Sea herring. Temporally replicated analyses reveal clear genetic differences between ecotypes and hence support reproductive isolation. Loci showing non-neutral behaviour, so-called outlier loci, show convergence between autumn spawning herring from demographically disjoint populations, potentially reflecting selective processes associated with autumn spawning ecotypes. The abundance and exploitation of the two ecotypes have varied strongly over space and time in the Baltic Sea, where autumn spawners have faced strong depression for decades. The results therefore have practical implications by highlighting the need for specific management of these co-occurring ecotypes to meet requirements for sustainable exploitation and ensure optimal livelihood for coastal communities.

  15. The influence of outliers on a model for the estimation of ...

    African Journals Online (AJOL)

    Veekunde

    problems that violate these assumptions is the problem of outliers. .... A normal probability plot of the ordered residuals on the normal order statistics, which are the ... observations from the normal distribution with zero mean and unit variance.

  16. Outliers, Cheese, and Rhizomes: Variations on a Theme of Limitation

    Science.gov (United States)

    Stone, Lynda

    2011-01-01

    All research has limitations, for example, from paradigm, concept, theory, tradition, and discipline. In this article Lynda Stone describes three exemplars that are variations on limitation and are "extraordinary" in that they change what constitutes future research in each domain. Malcolm Gladwell's present-day study of outliers makes a…

  17. Adaptive Outlier-tolerant Exponential Smoothing Prediction Algorithms with Applications to Predict the Temperature in Spacecraft

    OpenAIRE

    Hu Shaolin; Zhang Wei; Li Ye; Fan Shunxi

    2011-01-01

    The exponential smoothing prediction algorithm is widely used in spaceflight control and in process monitoring, as well as in economic prediction. Two key problems remain open: one concerns the rule for selecting the parameter of the exponential smoothing prediction, and the other is how to mitigate the adverse influence of outliers on the prediction. In this paper a new practical outlier-tolerant algorithm is built to adaptively select a proper parameter, and the exponential smoothing pr...

  18. Robust nonhomogeneous training samples detection method for space-time adaptive processing radar using sparse-recovery with knowledge-aided

    Science.gov (United States)

    Li, Zhihui; Liu, Hanwei; Zhang, Yongshun; Guo, Yiduo

    2017-10-01

    The performance of space-time adaptive processing (STAP) may degrade significantly when some of the training samples are contaminated by the signal-like components (outliers) in nonhomogeneous clutter environments. To remove the training samples contaminated by outliers in nonhomogeneous clutter environments, a robust nonhomogeneous training samples detection method using the sparse-recovery (SR) with knowledge-aided (KA) is proposed. First, the reduced-dimension (RD) overcomplete spatial-temporal steering dictionary is designed with the prior knowledge of system parameters and the possible target region. Then, the clutter covariance matrix (CCM) of cell under test is efficiently estimated using a modified focal underdetermined system solver (FOCUSS) algorithm, where a RD overcomplete spatial-temporal steering dictionary is applied. Third, the proposed statistics are formed by combining the estimated CCM with the generalized inner products (GIP) method, and the contaminated training samples can be detected and removed. Finally, several simulation results validate the effectiveness of the proposed KA-SR-GIP method.

  19. A Student’s t Mixture Probability Hypothesis Density Filter for Multi-Target Tracking with Outliers

    Science.gov (United States)

    Liu, Zhuowei; Chen, Shuxin; Wu, Hao; He, Renke; Hao, Lin

    2018-01-01

    In multi-target tracking, outlier-corrupted process and measurement noises can severely reduce the performance of the probability hypothesis density (PHD) filter. To solve this problem, this paper proposes a novel PHD filter, called the Student's t mixture PHD (STM-PHD) filter. The proposed filter models the heavy-tailed process noise and measurement noise as Student's t distributions and approximates the multi-target intensity as a mixture of Student's t components to be propagated in time. Then, a closed PHD recursion is obtained based on the Student's t approximation. Our approach can make full use of the heavy-tailed characteristic of the Student's t distribution to handle situations with heavy-tailed process and measurement noises. The simulation results verify that the proposed filter can overcome the negative effect generated by outliers and maintain good tracking accuracy in the simultaneous presence of process and measurement outliers. PMID:29617348

  20. SQL injection detection system

    OpenAIRE

    Vargonas, Vytautas

    2017-01-01

    SQL injection detection system. Programmers do not always ensure the security of the systems they develop, which is why it is important to look for solutions that do not rely on developers alone. In this work an SQL injection detection system is proposed. The system analyzes HTTP request parameters and detects intrusions. It is based on unsupervised machine learning. Trained on regular request data, the system detects outlier user parameters. Since training does not rely on previous knowledge of SQL injections, t...
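
    The thesis abstract does not name the learning algorithm; purely to illustrate unsupervised outlier detection on request parameters, the sketch below scores character n-gram features with an isolation forest. All parameter strings are fabricated.

    ```python
    from sklearn.ensemble import IsolationForest
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Fabricated request parameters: routine values plus one injection-like payload.
    normal = ["id=42", "page=3", "user=alice", "sort=asc", "q=shoes"] * 40
    test = ["id=17", "q=boots", "id=1 OR 1=1--"]

    # Character n-grams capture the unusual syntax of injected payloads.
    vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
    clf = IsolationForest(random_state=0).fit(vec.fit_transform(normal).toarray())
    scores = clf.predict(vec.transform(test).toarray())
    print(dict(zip(test, scores)))  # -1 marks an outlier
    ```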

  1. Identification of Outliers in Grace Data for Indo-Gangetic Plain Using Various Methods (Z-Score, Modified Z-score and Adjusted Boxplot) and Its Removal

    Science.gov (United States)

    Srivastava, S.

    2015-12-01

    Gravity Recovery and Climate Experiment (GRACE) data are widely used for hydrological studies of large-scale basins (≥100,000 sq km). The GRACE data (Stokes coefficients or equivalent water height) used for hydrological studies are not direct observations but result from high-level processing of raw data from the GRACE mission. Different partner agencies such as CSR, GFZ and JPL implement their own methodologies, and their processing methods are independent of each other. The primary sources of error in GRACE data are measurement and modeling errors and the processing strategies of these agencies. Because of the different processing methods, the final data from the partner agencies are inconsistent with each other at some epochs. GRACE data provide spatio-temporal variations in the Earth's gravity, which are mainly attributed to seasonal fluctuations in water level on the Earth's surface and subsurface. During the quantification of errors/uncertainties, several high positive and negative peaks were observed which do not correspond to any hydrological process but may emanate from a combination of primary error sources or from other geophysical processes (e.g. earthquakes, landslides, etc.) resulting in redistribution of the Earth's mass. Such peaks can be considered outliers for hydrological studies. In this work, an algorithm has been designed to extract outliers from the GRACE data for the Indo-Gangetic plain, which accounts for the seasonal variations and the trend in the data. Different outlier detection methods have been used, namely the Z-score, modified Z-score and adjusted boxplot. For verification, assimilated hydrological (GLDAS) and hydro-meteorological data are used as the reference. The results show that the consistency among all data sets improved significantly after the removal of outliers.
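
    Of the three detectors named in the title, the modified Z-score is the least standard: it replaces the mean and standard deviation with the median and MAD (Iglewicz-Hoaglin form). A minimal sketch on fabricated anomaly values; a real pipeline would, as the abstract notes, first account for the seasonal signal and trend:

    ```python
    import numpy as np

    def modified_zscore(x):
        """Median/MAD-based Z-score; far less sensitive to the outliers
        themselves than the classical mean/std version."""
        med = np.median(x)
        mad = np.median(np.abs(x - med))
        return 0.6745 * (x - med) / mad

    # Fabricated monthly equivalent-water-height anomalies (cm) with one spike.
    rng = np.random.default_rng(4)
    ewh = rng.normal(0, 3, size=120)
    ewh[60] = 25.0
    flags = np.abs(modified_zscore(ewh)) > 3.5   # 3.5 is the commonly used cutoff
    print(np.where(flags)[0])
    ```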

  2. A robust ridge regression approach in the presence of both multicollinearity and outliers in the data

    Science.gov (United States)

    Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah

    2017-08-01

    Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. This situation becomes more severe in the presence of outliers, which cause fatter tails in the error distribution than the normal distribution. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is expected to be affected by the presence of outliers due to some assumptions imposed in the modeling procedure. Thus, a robust version of the existing ridge method, with some modification in the inverse matrix and the estimated response value, is introduced. The performance of the proposed method is discussed and comparisons are made with several existing estimators, namely Ordinary Least Squares (OLS), ridge regression and robust ridge regression based on GM-estimates. The proposed method is able to produce reliable parameter estimates in the presence of both multicollinearity and outliers in the data.

  3. Tailor-made Surgical Guide Reduces Incidence of Outliers of Cup Placement.

    Science.gov (United States)

    Hananouchi, Takehito; Saito, Masanobu; Koyama, Tsuyoshi; Sugano, Nobuhiko; Yoshikawa, Hideki

    2010-04-01

    Malalignment of the cup in total hip arthroplasty (THA) increases the risks of postoperative complications such as neck cup impingement, dislocation, and wear. We asked whether a tailor-made surgical guide based on CT images would reduce the incidence of outliers beyond 10 degrees from preoperatively planned alignment of the cup compared with those without the surgical guide. We prospectively followed 38 patients (38 hips, Group 1) having primary THA with the conventional technique and 31 patients (31 hips, Group 2) using the surgical guide. We designed the guide for Group 2 based on CT images and fixed it to the acetabular edge with a Kirschner wire to indicate the planned cup direction. Postoperative CT images showed the guide reduced the number of outliers compared with the conventional method (Group 1, 23.7%; Group 2, 0%). The surgical guide provided more reliable cup insertion compared with conventional techniques. Level II, therapeutic study. See the Guidelines for Authors for a complete description of levels of evidence.

  4. Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

    Science.gov (United States)

    Ganapathy, S.; Yogesh, P.; Kannan, A.

    2012-01-01

    Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect intruders only with a high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm, are proposed for detecting intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with a low false alarm rate and a high detection rate when tested with the KDD Cup 99 data set. PMID:23056036

  5. MODVOLC2: A Hybrid Time Series Analysis for Detecting Thermal Anomalies Applied to Thermal Infrared Satellite Data

    Science.gov (United States)

    Koeppen, W. C.; Wright, R.; Pilger, E.

    2009-12-01

    We developed and tested a new, automated algorithm, MODVOLC2, which analyzes thermal infrared satellite time series data to detect and quantify the excess energy radiated from thermal anomalies such as active volcanoes, fires, and gas flares. MODVOLC2 combines two previously developed algorithms, a simple point operation algorithm (MODVOLC) and a more complex time series analysis (Robust AVHRR Techniques, or RAT) to overcome the limitations of using each approach alone. MODVOLC2 has four main steps: (1) it uses the original MODVOLC algorithm to process the satellite data on a pixel-by-pixel basis and remove thermal outliers, (2) it uses the remaining data to calculate reference and variability images for each calendar month, (3) it compares the original satellite data and any newly acquired data to the reference images normalized by their variability, and it detects pixels that fall outside the envelope of normal thermal behavior, (4) it adds any pixels detected by MODVOLC to those detected in the time series analysis. Using test sites at Anatahan and Kilauea volcanoes, we show that MODVOLC2 was able to detect ~15% more thermal anomalies than using MODVOLC alone, with very few, if any, known false detections. Using gas flares from the Cantarell oil field in the Gulf of Mexico, we show that MODVOLC2 provided results that were unattainable using a time series-only approach. Some thermal anomalies (e.g., Cantarell oil field flares) are so persistent that an additional, semi-automated 12-µm correction must be applied in order to correctly estimate both the number of anomalies and the total excess radiance being emitted by them. Although all available data should be included to make the best possible reference and variability images necessary for the MODVOLC2, we estimate that at least 80 images per calendar month are required to generate relatively good statistics from which to run MODVOLC2, a condition now globally met by a decade of MODIS observations. We also found
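
    Steps (2) and (3) amount to per-pixel, per-calendar-month reference and variability images followed by an envelope test. A toy sketch on synthetic imagery; the 4-sigma cutoff and the Gaussian background are assumptions, and the real algorithm first removes thermal outliers with MODVOLC before building the references:

    ```python
    import numpy as np

    # Synthetic radiance stack: 600 images of 64 x 64 pixels with a month index each.
    rng = np.random.default_rng(5)
    months = rng.integers(1, 13, size=600)
    data = rng.normal(100.0, 2.0, size=(600, 64, 64))
    data[300, 10, 10] += 40.0                        # one anomalously hot pixel

    flags = np.zeros(data.shape, dtype=bool)
    for m in range(1, 13):
        sel = months == m
        ref = data[sel].mean(axis=0)                 # reference image for month m
        var = data[sel].std(axis=0)                  # variability image for month m
        flags[sel] = (data[sel] - ref) / var > 4.0   # outside the normal envelope
    print(flags.sum(), "anomalous pixel detections")
    ```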

  6. TrigDB for improving the reliability of the epicenter locations by considering the neighborhood station's trigger and cutting out of outliers in operation of Earthquake Early Warning System.

    Science.gov (United States)

    Chi, H. C.; Park, J. H.; Lim, I. S.; Seong, Y. J.

    2016-12-01

    TrigDB was initially developed for the discrimination of teleseismic-origin false alarms in cases where unreasonably associated triggers produce mis-located epicenters. We have applied TrigDB to the current EEWS (Earthquake Early Warning System) since 2014. During the early stage of testing the EEWS from 2011, we adapted ElarmS from the Berkeley Seismological Laboratory (BSL) to the Korean seismic network and applied it for more than 5 years. The real-time testing results of the EEWS in Korea showed that all events inside the seismic network with magnitude greater than 3.0 were well detected. However, two events located at sea gave false location results with magnitude over 4.0 due to long-period and relatively high-amplitude signals related to teleseismic waves or regional deep sources. These teleseismic-relevant false events were caused by logical co-relation during the association procedure, and the corresponding geometric distribution of associated stations is crescent-shaped. Seismic stations are not deployed uniformly, so the expected bias ratio varies with the evaluated epicentral location. This ratio is calculated in advance and stored in a database, called TrigDB, for the discrimination of teleseismic-origin false alarms. We upgraded this method, called 'TrigDB back filling', updating the location with supplementary association of stations that were not previously associated, by comparing trigger times between sandwiched stations against predefined criteria such as travel time. We have also tested a module to reject outlier trigger times by comparing statistical values (sigma) to the trigger times. The criterion for cutting off outliers works somewhat slowly until the number of stations exceeds 8; however, the location result is much improved.

  7. Locally adaptive decision in detection of clustered microcalcifications in mammograms

    Science.gov (United States)

    Sainz de Cea, María V.; Nishikawa, Robert M.; Yang, Yongyi

    2018-02-01

    In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers not only from the presence of false positives (FPs) among the detected individual MCs but also from large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25), with fewer FPs at a given sensitivity level. The proposed adaptive decision approach can not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.
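
    The parametric branch described above treats the noisy detections as multivariate Gaussian and flags objects at large Mahalanobis distance. A compact sketch with fabricated two-dimensional features; the threshold and feature set are assumptions, not the paper's:

    ```python
    import numpy as np

    def mahalanobis_flags(features, threshold=3.0):
        """Flag objects whose Mahalanobis distance from the bulk of
        detections exceeds the threshold."""
        mu = features.mean(axis=0)
        inv_cov = np.linalg.inv(np.cov(features, rowvar=False))
        diff = features - mu
        d = np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))
        return d > threshold

    # Fabricated features: 200 noise detections plus 5 true MCs standing apart.
    rng = np.random.default_rng(6)
    objs = np.vstack([rng.normal(0, 1, size=(200, 2)),
                      rng.normal(5, 0.5, size=(5, 2))])
    print(mahalanobis_flags(objs).nonzero()[0])
    ```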

  8. Algorithms for Anomaly Detection - Lecture 1

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    The concept of statistical anomalies, or outliers, has fascinated experimentalists since the earliest attempts to interpret data. We want to know why some data points don’t seem to belong with the others: perhaps we want to eliminate spurious or unrepresentative data from our model. Or, the anomalies themselves may be what we are interested in: an outlier could represent the symptom of a disease, an attack on a computer network, a scientific discovery, or even an unfaithful partner. We start with some general considerations, such as the relationship between clustering and anomaly detection, the choice between supervised and unsupervised methods, and the difference between global and local anomalies. Then we will survey the most representative anomaly detection algorithms, highlighting what kind of data each approach is best suited to, and discussing their limitations. We will finish with a discussion of the difficulties of anomaly detection in high-dimensional data and some new directions for anomaly detec...

  9. Algorithms for Anomaly Detection - Lecture 2

    CERN Multimedia

    CERN. Geneva

    2017-01-01

    The concept of statistical anomalies, or outliers, has fascinated experimentalists since the earliest attempts to interpret data. We want to know why some data points don’t seem to belong with the others: perhaps we want to eliminate spurious or unrepresentative data from our model. Or, the anomalies themselves may be what we are interested in: an outlier could represent the symptom of a disease, an attack on a computer network, a scientific discovery, or even an unfaithful partner. We start with some general considerations, such as the relationship between clustering and anomaly detection, the choice between supervised and unsupervised methods, and the difference between global and local anomalies. Then we will survey the most representative anomaly detection algorithms, highlighting what kind of data each approach is best suited to, and discussing their limitations. We will finish with a discussion of the difficulties of anomaly detection in high-dimensional data and some new directions for anomaly detec...

  10. The effects of additive outliers on tests for unit roots and cointegration

    NARCIS (Netherlands)

    Ph.H.B.F. Franses (Philip Hans); N. Haldrup (Niels)

    1994-01-01

    The properties of the univariate Dickey-Fuller test and the Johansen test for the cointegrating rank when there exist additive outlying observations in the time series are examined. The analysis provides analytical as well as numerical evidence that additive outliers may produce spurious

  11. Identifying multiple outliers in linear regression: robust fit and clustering approach

    International Nuclear Information System (INIS)

    Robiah Adnan; Mohd Nor Mohamad; Halim Setan

    2001-01-01

    This research provides a clustering-based approach for determining potential candidates for outliers. It is a modification of the method proposed by Serbert et al. (1988), and is based on using the single linkage clustering algorithm to group the standardized predicted and residual values of the data set fitted by least trimmed squares (LTS). (Author)
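
    A rough sketch of the two-stage idea; scikit-learn has no LTS estimator, so Theil-Sen stands in for the robust fit, and the single-linkage cut height is an arbitrary assumption:

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from sklearn.linear_model import TheilSenRegressor

    # Fabricated regression data with a few contaminated responses.
    rng = np.random.default_rng(7)
    X = rng.uniform(0, 10, size=(60, 1))
    y = 2.0 * X[:, 0] + rng.normal(0, 0.5, size=60)
    y[:4] += 15.0

    # Robust fit, then standardized (predicted, residual) pairs.
    fit = TheilSenRegressor(random_state=0).fit(X, y)
    pred = fit.predict(X)
    resid = y - pred
    Z = np.column_stack([(pred - pred.mean()) / pred.std(),
                         (resid - resid.mean()) / resid.std()])

    # Single-linkage clustering; points outside the dominant cluster are candidates.
    labels = fcluster(linkage(Z, method="single"), t=2.0, criterion="distance")
    main = np.bincount(labels).argmax()
    print(np.where(labels != main)[0])
    ```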

  12. Outlier Loci and Selection Signatures of Simple Sequence Repeats (SSRs) in Flax (Linum usitatissimum L.).

    Science.gov (United States)

    Soto-Cerda, Braulio J; Cloutier, Sylvie

    2013-01-01

    Genomic microsatellites (gSSRs) and expressed sequence tag-derived SSRs (EST-SSRs) have gained wide application for elucidating genetic diversity and population structure in plants. Both marker systems are assumed to be selectively neutral when making demographic inferences, but this assumption is rarely tested. In this study, three neutrality tests were assessed for identifying outlier loci among 150 SSRs (85 gSSRs and 65 EST-SSRs) that likely influence estimates of population structure in three differentiated flax sub-populations (FST = 0.19). Moreover, the utility of gSSRs, EST-SSRs, and the combined sets of SSRs was also evaluated in assessing genetic diversity and population structure in flax. Six outlier loci were identified by at least two neutrality tests showing footprints of balancing selection. After removing the outlier loci, the STRUCTURE analysis and the dendrogram topology of EST-SSRs improved. Conversely, gSSRs and combined SSRs results did not change significantly, possibly as a consequence of the higher number of neutral loci assessed. Taken together, the genetic structure analyses established the superiority of gSSRs to determine the genetic relationships among flax accessions, although the combined SSRs produced the best results. Genetic diversity parameters did not differ statistically (P > 0.05) between gSSRs and EST-SSRs, an observation partially explained by the similar number of repeat motifs. Our study provides new insights into the ability of gSSRs and EST-SSRs to measure genetic diversity and structure in flax and confirms the importance of testing for the occurrence of outlier loci to properly assess natural and breeding populations, particularly in studies considering only few loci.

  13. Robust Wavelet Estimation to Eliminate Simultaneously the Effects of Boundary Problems, Outliers, and Correlated Noise

    Directory of Open Access Journals (Sweden)

    Alsaidi M. Altaher

    2012-01-01

    Full Text Available Classical wavelet thresholding methods suffer from boundary problems caused by the application of the wavelet transformations to a finite signal. As a result, large bias at the edges and artificial wiggles occur when the classical boundary assumptions are not satisfied. Although polynomial wavelet regression and local polynomial wavelet regression effectively reduce the risk of this problem, the estimates from these two methods can be easily affected by the presence of correlated noise and outliers, giving inaccurate estimates. This paper introduces two robust methods in which the effects of boundary problems, outliers, and correlated noise are simultaneously taken into account. The proposed methods combine thresholding estimator with either a local polynomial model or a polynomial model using the generalized least squares method instead of the ordinary one. A primary step that involves removing the outlying observations through a statistical function is considered as well. The practical performance of the proposed methods has been evaluated through simulation experiments and real data examples. The results are strong evidence that the proposed method is extremely effective in terms of correcting the boundary bias and eliminating the effects of outliers and correlated noise.

  14. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003-2012.

    Science.gov (United States)

    Khan, Diba; Rossen, Lauren M; Hamilton, Brady E; He, Yulei; Wei, Rong; Dienes, Erin

    2017-06-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003-2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. Published by Elsevier Ltd.

  15. Exploiting the information content of hydrological ''outliers'' for goodness-of-fit testing

    Directory of Open Access Journals (Sweden)

    F. Laio

    2010-10-01

    Full Text Available Validation of probabilistic models based on goodness-of-fit tests is an essential step for the frequency analysis of extreme events. The outcome of standard testing techniques, however, is mainly determined by the behavior of the hypothetical model, FX(x), in the central part of the distribution, while the behavior in the tails of the distribution, which is indeed very relevant in hydrological applications, is relatively unimportant for the results of the tests. The maximum-value test, originally proposed as a technique for outlier detection, is a suitable, but seldom applied, technique that addresses this problem. The test is specifically targeted to verify if the maximum (or minimum) values in the sample are consistent with the hypothesis that the distribution FX(x) is the real parent distribution. The application of this test is hindered by the fact that the critical values for the test should be numerically obtained when the parameters of FX(x) are estimated on the same sample used for verification, which is the standard situation in hydrological applications. We propose here a simple, analytically explicit, technique to suitably account for this effect, based on the application of censored L-moments estimators of the parameters. We demonstrate, with an application that uses artificially generated samples, the superiority of this modified maximum-value test with respect to the standard version of the test. We also show that the test has comparable or larger power with respect to other goodness-of-fit tests (e.g., the chi-squared test, Anderson-Darling test, and Fung and Paul test), in particular when dealing with small samples (sample size lower than 20-25) and when the parent distribution is similar to the distribution being tested.
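
    The authors handle estimation-on-the-same-sample analytically via censored L-moments; the sketch below instead brute-forces the same correction by parametric bootstrap with plain maximum-likelihood refitting, purely to illustrate the logic of the maximum-value test. The Gumbel parent is an assumption.

    ```python
    import numpy as np
    from scipy import stats

    def max_value_test(sample, dist=stats.gumbel_r, n_sim=1000, seed=0):
        """Monte Carlo maximum-value test: is the largest observation consistent
        with the fitted parent distribution? Refitting on every simulated sample
        accounts for the parameters being estimated from the data under test."""
        rng = np.random.default_rng(seed)
        n = len(sample)
        fitted = dist.fit(sample)
        obs = dist.cdf(sample.max(), *fitted)
        null = np.empty(n_sim)
        for i in range(n_sim):
            sim = dist.rvs(*fitted, size=n, random_state=rng)
            null[i] = dist.cdf(sim.max(), *dist.fit(sim))
        p = 2 * min((null <= obs).mean(), (null >= obs).mean())
        return obs, min(p, 1.0)

    # Fabricated annual maxima, assumed Gumbel-distributed.
    maxima = stats.gumbel_r.rvs(loc=100, scale=20, size=30, random_state=1)
    print(max_value_test(maxima))
    ```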

  16. Combined CT-based and image-free navigation systems in TKA reduces postoperative outliers of rotational alignment of the tibial component.

    Science.gov (United States)

    Mitsuhashi, Shota; Akamatsu, Yasushi; Kobayashi, Hideo; Kusayama, Yoshihiro; Kumagai, Ken; Saito, Tomoyuki

    2018-02-01

    Rotational malpositioning of the tibial component can lead to poor functional outcome in TKA. Although various surgical techniques have been proposed, precise rotational placement of the tibial component is difficult to accomplish even with the use of a navigation system. The purpose of this study is to assess whether combined CT-based and image-free navigation systems replicate the rotational alignment of the tibial component preoperatively planned on CT more accurately than the conventional method. We compared the number of outliers for rotational alignment of the tibial component using combined CT-based and image-free navigation systems (navigated group) with that of the conventional method (conventional group). Seventy-two TKAs were performed between May 2012 and December 2014. In the navigated group, the anteroposterior axis was prepared using the CT-based navigation system and the tibial component was positioned under control of the navigation. In the conventional group, the tibial component was placed with reference to the Akagi line that was determined visually. Fisher's exact probability test was performed to evaluate the results. There was a significant difference between the two groups with regard to the number of outliers: 3 outliers in the navigated group compared with 12 outliers in the conventional group. In conclusion, combined CT-based and image-free navigation systems decreased the number of rotational outliers of the tibial component and were helpful for replicating the rotational alignment of the tibial component that was preoperatively planned.

  17. Adaptive prediction applied to seismic event detection

    International Nuclear Information System (INIS)

    Clark, G.A.; Rodgers, P.W.

    1981-01-01

    Adaptive prediction was applied to the problem of detecting small seismic events in microseismic background noise. The Widrow-Hoff LMS adaptive filter used in a prediction configuration is compared with two standard seismic filters as an onset indicator. Examples demonstrate the technique's usefulness with both synthetic and actual seismic data

  18. Novelty detection for breast cancer image classification

    Science.gov (United States)

    Cichosz, Pawel; Jagodziński, Dariusz; Matysiewicz, Mateusz; Neumann, Łukasz; Nowak, Robert M.; Okuniewski, Rafał; Oleszkiewicz, Witold

    2016-09-01

    Using classification learning algorithms for medical applications may require not only refined model creation techniques and careful unbiased model evaluation, but also detecting the risk of misclassification at the time of model application. This is addressed by novelty detection, which identifies instances for which the training set is not sufficiently representative and for which it may be safer to restrain from classification and request a human expert diagnosis. The paper investigates two techniques for isolated instance identification, based on clustering and one-class support vector machines, which represent two different approaches to multidimensional outlier detection. The prediction quality for isolated instances in breast cancer image data is evaluated using the random forest algorithm and found to be substantially inferior to the prediction quality for non-isolated instances. Each of the two techniques is then used to create a novelty detection model which can be combined with a classification model and used at the time of prediction to detect instances for which the latter cannot be reliably applied. Novelty detection is demonstrated to improve random forest prediction quality and argued to deserve further investigation in medical applications.
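
    A minimal sketch of the abstention idea, with stand-ins assumed throughout: scikit-learn's tabular breast cancer dataset replaces the image features, and a one-class SVM plays the novelty detector that decides which test instances the classifier may safely label:

    ```python
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import OneClassSVM

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    scaler = StandardScaler().fit(X_tr)
    X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    novelty = OneClassSVM(nu=0.05, gamma="scale").fit(X_tr)

    covered = novelty.predict(X_te) == 1   # instances the training set represents
    pred = clf.predict(X_te)
    acc = (pred[covered] == y_te[covered]).mean()
    print(f"classified {covered.mean():.0%} of test instances; accuracy on them: {acc:.3f}")
    ```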

  19. An Improved Generalized Predictive Control in a Robust Dynamic Partial Least Square Framework

    Directory of Open Access Journals (Sweden)

    Jin Xin

    2015-01-01

    Full Text Available To tackle the sensitivity to outliers in system identification, a new robust dynamic partial least squares (PLS) model based on an outlier detection method is proposed in this paper. An improved radial basis function network (RBFN) is adopted to construct the predictive model from the input and output dataset, and a hidden Markov model (HMM) is applied to detect the outliers. After the outliers are removed, a more robust dynamic PLS model is obtained. In addition, an improved generalized predictive control (GPC) with tuning weights under the dynamic PLS framework is proposed to deal with the interaction caused by model mismatch. The results of two simulations demonstrate the effectiveness of the proposed method.

  20. Adaptive prediction applied to seismic event detection

    Energy Technology Data Exchange (ETDEWEB)

    Clark, G.A.; Rodgers, P.W.

    1981-09-01

    Adaptive prediction was applied to the problem of detecting small seismic events in microseismic background noise. The Widrow-Hoff LMS adaptive filter used in a prediction configuration is compared with two standard seismic filters as an onset indicator. Examples demonstrate the technique's usefulness with both synthetic and actual seismic data.

  1. The Super‑efficiency Model and its Use for Ranking and Identification of Outliers

    Directory of Open Access Journals (Sweden)

    Kristína Kočišová

    2017-01-01

    Full Text Available This paper employs a non-radial and non-oriented super-efficiency SBM model under the assumption of variable returns to scale to analyse the performance of twenty-two Czech and Slovak domestic commercial banks in 2015. The banks were ranked according to asset-oriented and profit-oriented intermediation approaches. We pooled the cross-country data and used them to define a common best-practice efficiency frontier. This allowed us to focus on determining relative differences in efficiency across banks. The average efficiency was evaluated separately on the "national" and "international" level. The results of the analysis show that the level of super-efficiency in the Slovak banking sector was lower compared to Czech banks. Also, the number of super-efficient banks was lower in the case of Slovakia under both approaches. Boxplot analysis was used to determine the outliers in the dataset. The results suggest that the exclusion of outliers led to better statistical characteristics of the estimated efficiency.
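
    The boxplot screening mentioned above can be reproduced with ordinary Tukey fences; a minimal sketch with illustrative super-efficiency scores rather than the paper's data.

    import numpy as np

    def tukey_outliers(x, k=1.5):
        # Flag values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].
        q1, q3 = np.percentile(x, [25, 75])
        iqr = q3 - q1
        return (x < q1 - k * iqr) | (x > q3 + k * iqr)

    scores = np.array([0.62, 0.71, 0.68, 0.74, 0.66, 0.70, 1.85, 0.69])
    mask = tukey_outliers(scores)   # flags the 1.85 super-efficiency score
    trimmed = scores[~mask]         # statistics recomputed without outliers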

  2. Improved nanostructure reconstruction by performing data refinement in optical scatterometry

    Science.gov (United States)

    Zhu, Jinlong; Jiang, Hao; Shi, Yating; Chen, Xiuguo; Zhang, Chuanwei; Liu, Shiyuan

    2016-01-01

    Recently, we have indirectly demonstrated that nanostructure reconstruction accuracy is degraded by outliers in optical scatterometry, and we have applied a robust estimation method to suppress these outliers. However, because the detection of outliers is based simply on the judgment of residual values, a possible heavy masking effect carries the risk of low measurement accuracy. In this work, a novel method is introduced to detect outliers directly, providing an intuitive display of outliers in a two-dimensional coordinate system. Moreover, a robust correction step based on the principle of least trimmed squares estimator regression is proposed to replace the conventional Gauss-Newton iteration step, by which a more reliable and accurate nanostructure reconstruction is achieved. The improved reconstruction of a one-dimensional etched Si grating has demonstrated the feasibility of the proposed methods.

  3. Improved nanostructure reconstruction by performing data refinement in optical scatterometry

    International Nuclear Information System (INIS)

    Zhu, Jinlong; Jiang, Hao; Shi, Yating; Chen, Xiuguo; Zhang, Chuanwei; Liu, Shiyuan

    2016-01-01

    Recently, we have indirectly demonstrated that nanostructure reconstruction accuracy is degraded by outliers in optical scatterometry, and we have applied a robust estimation method to suppress these outliers. However, because the detection of outliers is based simply on the judgment of residual values, a possible heavy masking effect carries the risk of low measurement accuracy. In this work, a novel method is introduced to detect outliers directly, providing an intuitive display of outliers in a two-dimensional coordinate system. Moreover, a robust correction step based on the principle of least trimmed squares estimator regression is proposed to replace the conventional Gauss–Newton iteration step, by which a more reliable and accurate nanostructure reconstruction is achieved. The improved reconstruction of a one-dimensional etched Si grating has demonstrated the feasibility of the proposed methods. (paper)

  4. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003–2012

    Science.gov (United States)

    Khan, Diba; Rossen, Lauren M.; Hamilton, Brady E.; He, Yulei; Wei, Rong; Dienes, Erin

    2017-01-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003–2012. Results indicate that teen birth rates show evidence of clustering, with hot and cold spots, and identify spatial outliers. Findings from this analysis may help inform prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. PMID:28552189

  5. STEM - software test and evaluation methods: fault detection using static analysis techniques

    International Nuclear Information System (INIS)

    Bishop, P.G.; Esp, D.G.

    1988-08-01

    STEM is a software reliability project with the objective of evaluating a number of fault detection and fault estimation methods which can be applied to high-integrity software. This report gives some interim results of applying both manual and computer-based static analysis techniques, in particular SPADE, to an early CERL version of the PODS software containing known faults. The main results of this study are as follows. The scope for thorough verification is determined by the quality of the design documentation; documentation defects become especially apparent when verification is attempted. For well-defined software, the thoroughness of SPADE-assisted verification for detecting a large class of faults was successfully demonstrated. For imprecisely defined software (not recommended for high-integrity systems) the use of tools such as SPADE is difficult and inappropriate. Analysis and verification tools are helpful through their reliability and thoroughness. However, they are designed to assist, not replace, a human in validating software. Manual inspection can still reveal errors (such as errors in specification and errors of transcription of system constants) which current tools cannot detect. There is a need for tools to automatically detect typographical errors in system constants, for example by reporting outliers to patterns. To obtain the maximum benefit from advanced tools, they should be applied during software development (when verification problems can be detected and corrected) rather than retrospectively. (author)

  6. A Global Photoionization Response to Prompt Emission and Outliers: Different Origin of Long Gamma-ray Bursts?

    Science.gov (United States)

    Wang, J.; Xin, L. P.; Qiu, Y. L.; Xu, D. W.; Wei, J. Y.

    2018-03-01

    By using the line ratio C IV λ1549/C II λ1335 as a tracer of the ionization ratio of the interstellar medium (ISM) illuminated by a long gamma-ray burst (LGRB), we identify a global photoionization response of the ionization ratio to the photon luminosity of the prompt emission assessed by either L_iso/E_peak or L_iso/E_peak^2. The ionization ratio increases with both L_iso/E_peak and L_iso/E_peak^2 for a majority of the LGRBs in our sample, although there are a few outliers. The identified dependence of C IV/C II on L_iso/E_peak^2 suggests that the scatter of the widely accepted Amati relation is related to the ionization ratio in the ISM. The outliers tend to have relatively high C IV/C II values as well as relatively high C IV λ1549/Si IV λ1403 ratios, which suggests the existence of Wolf–Rayet stars in the environment of these LGRBs. We finally argue that the outliers and the LGRBs following the identified C IV/C II–L_iso/E_peak (L_iso/E_peak^2) correlation might come from different progenitors with different local environments.

  7. Moving standard deviation and moving sum of outliers as quality tools for monitoring analytical precision.

    Science.gov (United States)

    Liu, Jiakai; Tan, Chin Hon; Badrick, Tony; Loh, Tze Ping

    2018-02-01

    An increase in analytical imprecision (expressed as CVa) can introduce additional variability (i.e. noise) into patient results, which poses a challenge to the optimal management of patients. Relatively little work has been done to address the need for continuous monitoring of analytical imprecision. Through numerical simulations, we describe the use of the moving standard deviation (movSD) and a recently described moving sum of outliers (movSO) of patient results as means for detecting increased analytical imprecision, and compare their performance against internal quality control (QC) and average of normal (AoN) approaches. The power to detect an increase in CVa is suboptimal under routine internal QC procedures. The AoN technique almost always had the highest average number of patient results affected before error detection (ANPed), indicating that it generally had the worst capability for detecting an increased CVa. On the other hand, the movSD and movSO approaches were able to detect an increased CVa at significantly lower ANPed, particularly for measurands that displayed a relatively small ratio of biological variation to CVa. In conclusion, the movSD and movSO approaches are effective in detecting an increase in CVa for high-risk measurands with small biological variation. Their performance is relatively poor when the biological variation is large. However, the clinical risk of an increase in analytical imprecision is attenuated for these measurands, as the increased analytical imprecision adds only marginally to the total variation and is less likely to impact clinical care.
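
    A rough numpy/pandas sketch of a movSD-style monitor; the window length, alarm factor, and baseline SD are illustrative assumptions, not the control limits derived in the study.

    import numpy as np
    import pandas as pd

    def moving_sd_alarm(results, window=200, factor=1.25, baseline_sd=1.0):
        # Moving standard deviation of consecutive patient results; alarm
        # when the windowed SD exceeds `factor` times the baseline
        # analytical SD, signalling an increased CVa.
        movsd = pd.Series(results).rolling(window).std()
        return movsd, movsd > factor * baseline_sd

    rng = np.random.default_rng(2)
    in_control = rng.normal(100.0, 1.0, 1000)   # stable analytical SD
    degraded = rng.normal(100.0, 1.6, 300)      # imprecision increased by 60%
    movsd, alarm = moving_sd_alarm(np.concatenate([in_control, degraded]))
    first_alarm = int(alarm.idxmax()) if alarm.any() else None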

  8. Optical fiber-applied radiation detection system

    International Nuclear Information System (INIS)

    Nishiura, Ryuichi; Uranaka, Yasuo; Izumi, Nobuyuki

    2001-01-01

    A technique to measure radiation using plastic scintillation fibers, in which a radiation-sensitive fluorescent material (scintillator) is doped into plastic optical fiber serving as the radiation sensor, was developed. The technique offers advantages such as high flexibility owing to the use of fibers, relatively easy coverage of large areas because the whole length of the fiber acts as the detecting portion, and immunity to electromagnetic noise because radiation detection and signal transmission are both optical. Because it can measure a wide range of continuous radiation distribution along the optical fiber cable using scintillation fiber and the time-of-flight method, the optical fiber-applied radiation sensing system can effectively monitor space radiation dose or apparatus operating conditions. In addition, a portable scintillation optical fiber body-surface contamination monitor can measure the concentration of radioactive materials attached to the body surface by arranging scintillation fiber, processed into a small flexible plate, around the person to be tested. The outline and fundamental properties of various application products using these plastic scintillation fibers are described. (G.K.)

  9. Probabilistic Neural Networks for Chemical Sensor Array Pattern Recognition: Comparison Studies, Improvements and Automated Outlier Rejection

    National Research Council Canada - National Science Library

    Shaffer, Ronald E

    1998-01-01

    For application to chemical sensor arrays, the ideal pattern recognition method is accurate, fast, simple to train, robust to outliers, has low memory requirements, and has the ability to produce a measure...

  10. Performance of computer-aided detection applied to full-field digital mammography in detection of breast cancers

    International Nuclear Information System (INIS)

    Sadaf, Arifa; Crystal, Pavel; Scaranelo, Anabel; Helbich, Thomas

    2011-01-01

    Objective: The aim of this retrospective study was to evaluate the performance of computer-aided detection (CAD) with full-field digital mammography (FFDM) in the detection of breast cancers. Materials and Methods: CAD was retrospectively applied to standard mammographic views of 127 cases with biopsy-proven breast cancers detected with FFDM (Senographe 2000, GE Medical Systems). CAD sensitivity was assessed in the total group of 127 cases and for subgroups based on breast density, mammographic lesion type, mammographic lesion size, histopathology and mode of presentation. Results: Overall CAD sensitivity was 91% (115 of 127 cases). There were no statistical differences (p > 0.1) in CAD detection of cancers in dense breasts, 90% (53/59), versus non-dense breasts, 91% (62/68). There was a statistical difference in detection related to lesion size, with sensitivity of 97% (22/23) for cancers > 20 mm. Conclusion: CAD applied to FFDM showed 100% sensitivity in identifying cancers manifesting as microcalcifications only and high sensitivity, 86% (71/83), for other mammographic appearances of cancer. Sensitivity is influenced by lesion size. CAD in FFDM is an adjunct helping the radiologist in early detection of breast cancers.

  11. Outlier treatment for improving parameter estimation of group contribution based models for upper flammability limit

    DEFF Research Database (Denmark)

    Frutiger, Jerome; Abildskov, Jens; Sin, Gürkan

    2015-01-01

    Flammability data is needed to assess the risk of fire and explosions. This study presents a new group contribution (GC) model to predict the upper flammability limit (UFL) of organic chemicals. Furthermore, it provides a systematic method for outlier treatment in order to improve the parameter estimation...

  12. Swarm, genetic and evolutionary programming algorithms applied to multiuser detection

    Directory of Open Access Journals (Sweden)

    Paul Jean Etienne Jeszensky

    2005-02-01

    Full Text Available In this paper, the particle swarm optimization technique, recently published in the literature, applied to Direct Sequence/Code Division Multiple Access (DS/CDMA) systems with multiuser detection (MuD) is analyzed, evaluated and compared. The efficiency of the swarm algorithm applied to DS-CDMA multiuser detection (Swarm-MuD) is compared through the tradeoff between performance and computational complexity, with complexity expressed in terms of the number of operations necessary to reach the performance obtained through the optimum detector or the maximum-likelihood (ML) detector. The comparison is accomplished among the genetic algorithm, evolutionary programming with cloning and the swarm algorithm under the same simulation basis. Additionally, a heuristic MuD complexity analysis through the number of computational operations is proposed. Finally, an analysis of the input parameters of the swarm algorithm is carried out in an attempt to find the optimum (or near-optimum) parameters for the algorithm applied to the MuD problem.

  13. Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data

    Science.gov (United States)

    Yin, Shen; Wang, Guang; Yang, Xu

    2014-07-01

    In practical industrial applications, key performance indicator (KPI)-related prediction and diagnosis are quite important for product quality and economic benefits. To meet these requirements, many advanced prediction and monitoring approaches have been developed, which can be classified into model-based or data-driven techniques. Among these approaches, partial least squares (PLS) is one of the most popular data-driven methods due to its simplicity and easy implementation in large-scale industrial processes. As PLS is totally based on measured process data, the characteristics of the process data are critical for its success. Outliers and missing values are two common characteristics of measured data which can severely affect the effectiveness of PLS. To ensure the applicability of PLS in practical industrial applications, this paper introduces a robust version of PLS that deals with outliers and missing values simultaneously. The effectiveness of the proposed method is demonstrated by the application results of KPI-related prediction and diagnosis on an industrial benchmark, the Tennessee Eastman process.

  14. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation.

    Science.gov (United States)

    Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T

    2015-01-01

    Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is the number of terms in a series. The ratios Rmax = (amax - amin)/(Sn - amin*n) and Rmin = (amax - amin)/(amax*n - Sn) are always equal to 2/n for a linear series, where amax is the maximum element, amin is the minimum element and Sn is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with the y = c form, Rmax > 2/n and Rmin > 2/n imply that the maximum and minimum elements, respectively, do not agree with linear fit. We define threshold values for outlier and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 - 1. Given this relation and a transformation technique, which transforms data into the form y = c, we show that removing all data that do not agree with linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and the nature of the distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over the existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^-4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process.
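
    A minimal sketch of the 2/n indicator as defined above, applied to a series that is linear except for one element. The thresholds k1 and k2 are illustrative, and the transformation step that first brings the data to the y = c form is omitted.

    import numpy as np

    def two_over_n_flags(a, k1=0.5, k2=0.1):
        # For an exact arithmetic (linear) series both ratios equal 2/n;
        # a ratio above 2/n*(1+k1) flags the extreme element as an
        # outlier, above 2/n*(1+k2) as noise (k1 > k2).
        a = np.asarray(a, dtype=float)
        n, s = len(a), a.sum()
        amax, amin = a.max(), a.min()
        r_max = (amax - amin) / (s - amin * n)   # tests the maximum element
        r_min = (amax - amin) / (amax * n - s)   # tests the minimum element
        ref = 2.0 / n
        return {"max_outlier": r_max > ref * (1 + k1),
                "min_outlier": r_min > ref * (1 + k1),
                "max_noise": r_max > ref * (1 + k2),
                "min_noise": r_min > ref * (1 + k2)}

    print(two_over_n_flags([1, 2, 3, 4, 5, 6, 7, 80]))   # flags the maximum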

  15. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation.

    Directory of Open Access Journals (Sweden)

    K K L B Adikaram

    Full Text Available Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is the number of terms in a series. The ratios Rmax = (amax - amin)/(Sn - amin*n) and Rmin = (amax - amin)/(amax*n - Sn) are always equal to 2/n for a linear series, where amax is the maximum element, amin is the minimum element and Sn is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with the y = c form, Rmax > 2/n and Rmin > 2/n imply that the maximum and minimum elements, respectively, do not agree with linear fit. We define threshold values for outlier and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 - 1. Given this relation and a transformation technique, which transforms data into the form y = c, we show that removing all data that do not agree with linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and the nature of the distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over the existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^-4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process.

  16. SPATIAL CLUSTER AND OUTLIER IDENTIFICATION OF GEOCHEMICAL ASSOCIATION OF ELEMENTS: A CASE STUDY IN JUIRUI COPPER MINING AREA

    Directory of Open Access Journals (Sweden)

    Tien Thanh NGUYEN

    2016-12-01

    Full Text Available Spatial clusters and spatial outliers play an important role in the study of the spatial distribution patterns of geochemical data. They characterize the fundamental properties of mineralization processes, the spatial distribution of mineral deposits, and ore element concentrations in mineral districts. In this study, a new method for the study of spatial distribution patterns of multivariate data is proposed based on a combination of the robust Mahalanobis distance and local Moran's I_i. To construct the spatial matrix, the Moran's I spatial correlogram was first used to determine the range. The robust Mahalanobis distances were then computed for an association of elements. Finally, the local Moran's I_i statistic was used to measure the degree of spatial association and discover the spatial distribution patterns of associations of the Cu, Au, Mo, Ag, Pb, Zn, As, and Sb elements, including spatial clusters and spatial outliers. Spatial patterns were analyzed at six different spatial scales (2 km, 4 km, 6 km, 8 km, 10 km and 12 km) for both the raw data and Box-Cox transformed data. The results show that the spatial cluster and spatial outlier areas identified using local Moran's I_i and the robust Mahalanobis distance accord with reality and conform well to known deposits in the study area.
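
    The robust Mahalanobis step can be sketched with scikit-learn's minimum covariance determinant estimator; the element concentrations below are synthetic stand-ins, and the local Moran's I_i stage is omitted.

    import numpy as np
    from scipy.stats import chi2
    from sklearn.covariance import MinCovDet

    rng = np.random.default_rng(3)
    # synthetic concentrations of a three-element association (e.g. Cu, Au, Mo)
    X = rng.multivariate_normal(mean=[0, 0, 0], cov=np.eye(3), size=500)
    X[:10] += 6.0                            # a few anomalous sites

    mcd = MinCovDet(random_state=0).fit(X)   # robust location and scatter
    d2 = mcd.mahalanobis(X)                  # squared robust distances
    cutoff = chi2.ppf(0.975, df=X.shape[1])  # conventional chi-square cutoff
    anomalous = np.flatnonzero(d2 > cutoff)  # candidate outlier sites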

  17. Detection of Doppler Microembolic Signals Using High Order Statistics

    Directory of Open Access Journals (Sweden)

    Maroun Geryes

    2016-01-01

    Full Text Available Robust detection of the smallest circulating cerebral microemboli is an efficient way of preventing strokes, which are the second leading cause of mortality worldwide. Transcranial Doppler ultrasound is widely considered the most convenient system for the detection of microemboli. The most common standard detection is achieved through the Doppler energy signal and depends on an empirically set constant threshold. On the other hand, in the past few years, higher order statistics have been an extensive field of research, as they represent descriptive statistics that can be used to detect signal outliers. In this study, we propose new types of microembolic detectors based on the windowed calculation of the third-moment skewness and fourth-moment kurtosis of the energy signal. During embolus-free periods the distribution of the energy is not altered, and the skewness and kurtosis signals do not exhibit any peak values. In the presence of emboli, the energy distribution is distorted, and the skewness and kurtosis signals exhibit peaks corresponding to those emboli. Applied to real signals, detection through the skewness and kurtosis signals outperformed detection through standard methods. The sensitivity and specificity reached 78% and 91% for the skewness detector and 80% and 90% for the kurtosis detector, respectively.
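
    A minimal sketch of the windowed skewness and kurtosis detectors on a synthetic energy signal; the window length and threshold are illustrative, not the study's settings.

    import numpy as np
    from scipy.stats import kurtosis, skew

    def hos_signals(energy, window=64):
        # Sliding-window skewness and kurtosis of the Doppler energy
        # signal; both peak where an embolus distorts the local
        # energy distribution.
        n = len(energy)
        sk, ku = np.zeros(n), np.zeros(n)
        for i in range(n - window):
            seg = energy[i:i + window]
            sk[i + window // 2] = skew(seg)
            ku[i + window // 2] = kurtosis(seg)   # excess kurtosis
        return sk, ku

    rng = np.random.default_rng(4)
    energy = rng.normal(0.0, 1.0, 4000) ** 2   # embolus-free background
    energy[2000:2006] += 30.0                  # short high-intensity event
    sk, ku = hos_signals(energy)
    detections = np.flatnonzero(ku > 10.0)     # illustrative threshold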

  18. Data Mining for Anomaly Detection

    Science.gov (United States)

    Biswas, Gautam; Mack, Daniel; Mylaraswamy, Dinkar; Bharadwaj, Raj

    2013-01-01

    The Vehicle Integrated Prognostics Reasoner (VIPR) program describes methods for enhanced diagnostics as well as a prognostic extension to the current state-of-the-art Aircraft Diagnostic and Maintenance System (ADMS). VIPR introduced a new anomaly detection function for discovering previously undetected and undocumented situations in which there are clear deviations from nominal behavior. Once a baseline (nominal model of operations) is established, the detection and analysis are split between on-aircraft outlier generation and off-aircraft expert analysis to characterize and classify events that may not have been anticipated by individual system providers. Offline expert analysis is supported by data curation and data mining algorithms that can be applied in the contexts of supervised and unsupervised learning. In this report, we discuss efficient methods to implement the Kolmogorov complexity measure using compression algorithms, and run a systematic empirical analysis to determine the best compression measure. Our experiments established that the combination of the DZIP compression algorithm and the CiDM distance measure provides the best results for capturing relevant properties of time series data encountered in aircraft operations. This combination was used as the basis for developing an unsupervised learning algorithm to define "nominal" flight segments using historical flight segments.

  19. An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection

    International Nuclear Information System (INIS)

    Zhang, Liangwei; Lin, Jing; Karim, Ramin

    2015-01-01

    The accuracy of traditional anomaly detection techniques implemented on full-dimensional spaces degrades significantly as dimensionality increases, thereby hampering many real-world applications. This work proposes an approach to selecting a meaningful feature subspace and conducting anomaly detection in the corresponding subspace projection. The aim is to maintain detection accuracy in high-dimensional circumstances. The suggested approach assesses the angle between all pairs of two lines for one specific anomaly candidate: the first line is connected by the relevant data point and the center of its adjacent points; the other line is one of the axis-parallel lines. Those dimensions which have a relatively small angle with the first line are then chosen to constitute the axis-parallel subspace for the candidate. Next, a normalized Mahalanobis distance is introduced to measure the local outlier-ness of an object in the subspace projection. To comprehensively compare the proposed algorithm with several existing anomaly detection techniques, we constructed artificial datasets with various high-dimensional settings and found that the algorithm displayed superior accuracy. A further experiment on an industrial dataset demonstrated the applicability of the proposed algorithm to fault detection tasks and highlighted another of its merits, namely, providing a preliminary interpretation of abnormality through feature ordering in relevant subspaces. - Highlights: • An anomaly detection approach for high-dimensional reliability data is proposed. • The approach selects relevant subspaces by assessing vectorial angles. • The novel ABSAD approach displays superior accuracy over other alternatives. • Numerical illustration confirms its efficacy in fault detection applications.
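
    A loose sketch of the angle-based idea, not the published ABSAD algorithm: the axes most aligned with the line from a point to its neighbourhood centre are retained, and the point is scored by a normalised distance in that axis-parallel subspace. The neighbourhood size and the fraction of axes kept are assumptions.

    import numpy as np

    def absad_like_scores(X, k=10, keep=0.5):
        n, _ = X.shape
        scores = np.zeros(n)
        for i in range(n):
            d = np.linalg.norm(X - X[i], axis=1)
            nn = np.argsort(d)[1:k + 1]                # k nearest neighbours
            line = X[nn].mean(axis=0) - X[i]           # line to local centre
            cos = np.abs(line) / (np.linalg.norm(line) + 1e-12)
            sub = cos >= np.quantile(cos, 1.0 - keep)  # small-angle axes
            spread = X[nn][:, sub].std(axis=0) + 1e-12
            scores[i] = np.linalg.norm(line[sub] / spread)  # local distance
        return scores

    rng = np.random.default_rng(5)
    X = rng.normal(size=(300, 50))
    X[0, :3] += 8.0                              # anomaly hidden in 3 dimensions
    ranking = np.argsort(-absad_like_scores(X))  # the anomaly should rank first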

  20. An SPSS implementation of the nonrecursive outlier deletion procedure with shifting z score criterion (Van Selst & Jolicoeur, 1994).

    Science.gov (United States)

    Thompson, Glenn L

    2006-05-01

    Sophisticated univariate outlier screening procedures are not yet available in widely used statistical packages such as SPSS. However, SPSS can accept user-supplied programs for executing these procedures. Failing this, researchers tend to rely on simplistic alternatives that can distort data because they do not adjust to cell-specific characteristics. Despite their popularity, these simple procedures may be especially ill suited for some applications (e.g., data from reaction time experiments). A user-friendly SPSS Production Facility implementation of the shifting z score criterion procedure (Van Selst & Jolicoeur, 1994) is presented in an attempt to make it easier to use. In addition to outlier screening, optional syntax modules can be added to perform tedious database management tasks (e.g., restructuring or computing means).

  1. An Unsupervised Anomalous Event Detection and Interactive Analysis Framework for Large-scale Satellite Data

    Science.gov (United States)

    LIU, Q.; Lv, Q.; Klucik, R.; Chen, C.; Gallaher, D. W.; Grant, G.; Shang, L.

    2016-12-01

    Due to the high volume and complexity of satellite data, computer-aided tools for fast quality assessment and scientific discovery are indispensable for scientists in the era of Big Data. In this work, we have developed a framework for automated anomalous event detection in massive satellite data. The framework consists of a clustering-based anomaly detection algorithm and a cloud-based tool for interactive analysis of detected anomalies. The algorithm is unsupervised and requires no prior knowledge of the data (e.g., expected normal pattern or known anomalies). As such, it works for diverse data sets and performs well even in the presence of missing and noisy data. The cloud-based tool provides an intuitive mapping interface that allows users to interactively analyze anomalies using multiple features. As a whole, our framework can (1) identify outliers in a spatio-temporal context, (2) recognize and distinguish meaningful anomalous events from individual outliers, (3) rank those events based on "interestingness" (e.g., rareness or total number of outliers) defined by users, and (4) enable interactive querying, exploration, and analysis of those anomalous events. In this presentation, we will demonstrate the effectiveness and efficiency of our framework in the application of detecting data quality issues and unusual natural events using two satellite datasets. The techniques and tools developed in this project are applicable to a diverse set of satellite data and will be made publicly available for scientists in early 2017.

  2. ROBUST: an interactive FORTRAN-77 package for exploratory data analysis using parametric, ROBUST and nonparametric location and scale estimates, data transformations, normality tests, and outlier assessment

    Science.gov (United States)

    Rock, N. M. S.

    ROBUST calculates 53 statistics, plus significance levels for 6 hypothesis tests, on each of up to 52 variables. These together allow the following properties of the data distribution for each variable to be examined in detail: (1) Location. Three means (arithmetic, geometric, harmonic) are calculated, together with the midrange and 19 high-performance robust L-, M-, and W-estimates of location (combined, adaptive, trimmed estimates, etc.). (2) Scale. The standard deviation is calculated along with the H-spread/2 (≈ semi-interquartile range), the mean and median absolute deviations from both mean and median, and a biweight scale estimator. The 23 location and 6 scale estimators programmed cover all possible degrees of robustness. (3) Normality. Distributions are tested against the null hypothesis that they are normal, using the 3rd (√b1) and 4th (b2) moments, Geary's ratio (mean deviation/standard deviation), Filliben's probability plot correlation coefficient, and a more robust test based on the biweight scale estimator. These statistics collectively are sensitive to most usual departures from normality. (4) Presence of outliers. The maximum and minimum values are assessed individually or jointly using Grubbs' maximum Studentized residuals, Harvey's and Dixon's criteria, and the Studentized range. For a single input variable, outliers can be either winsorized or eliminated, and all estimates recalculated iteratively as desired. The following data transformations also can be applied: linear, log10, generalized Box-Cox power (including log, reciprocal, and square root), exponentiation, and standardization. For more than one variable, all results are tabulated in a single run of ROBUST. Further options are incorporated to assess ratios (of two variables) as well as discrete variables, and to handle missing data. Cumulative S-plots (for assessing normality graphically) also can be generated. The mutual consistency or inconsistency of all these measures

  3. The isotope correlation experiment

    International Nuclear Information System (INIS)

    Koch, L.; Schoof, S.

    1983-01-01

    The ESARDA working group on Isotopic Correlation Techniques (ICT) and Reprocessing Input Analysis performed an Isotope Correlation Experiment (ICE) with the aim of checking the feasibility of the new technique. Ten input batches from the reprocessing of KWO fuel at the WAK plant were analysed by 4 laboratories. All information needed to compare ICT with the gravimetric and volumetric methods was available. ICT combined with simplified reactor physics calculations was included. The main objectives of the statistical data evaluation were the detection of outliers and the estimation of random errors and systematic errors of the measurements performed by the 4 laboratories. Different methods for outlier detection, analysis of variances, Grubbs' analysis for the constant-bias model and Jaech's non-constant-bias model were applied. Some of the results of the statistical analysis may seem inconsistent, which is due to the following reasons. For the statistical evaluations, isotope abundance data (weight percent) as well as nuclear concentration data (atoms/initial metal atoms) were subjected to different outlier criteria before being used for further statistical evaluations. None of the four data evaluation groups performed a complete statistical data analysis that would make possible a comparison of the different methods applied, since no commonly agreed statistical evaluation procedure existed. The results prove that ICT is as accurate as conventional techniques, which have to rely on costly mass spectrometric isotope dilution analysis. The potential for outlier detection by ICT on the basis of the results from a single laboratory is as good as outlier detection by costly interlaboratory comparison. The application of fission product or Cm-244 correlations would be more timely than remeasurements at safeguards laboratories.
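
    Grubbs' analysis mentioned above reduces, in its simplest form, to the single-outlier Grubbs test; a minimal sketch with hypothetical measurement values.

    import numpy as np
    from scipy import stats

    def grubbs_test(x, alpha=0.05):
        # Two-sided Grubbs test for a single outlier in a normal sample.
        x = np.asarray(x, dtype=float)
        n = len(x)
        g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
        t = stats.t.ppf(1.0 - alpha / (2.0 * n), n - 2)
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        return g, g_crit, g > g_crit

    # hypothetical parallel measurements of one batch by several laboratories
    batch = np.array([0.7204, 0.7198, 0.7201, 0.7203, 0.7299])
    g, g_crit, reject = grubbs_test(batch)   # the last value is flagged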

  4. Outlier-based Health Insurance Fraud Detection for U.S. Medicaid Data

    NARCIS (Netherlands)

    Thornton, Dallas; van Capelleveen, Guido; Poel, Mannes; van Hillegersberg, Jos; Mueller, Roland

    Fraud, waste, and abuse in the U.S. healthcare system are estimated at $700 billion annually. Predictive analytics offers government and private payers the opportunity to identify and prevent or recover such billings. This paper proposes a data-driven method for fraud detection based on comparative

  5. Robust bivariate error detection in skewed data with application to historical radiosonde winds

    KAUST Repository

    Sun, Ying

    2017-01-18

    The global historical radiosonde archives date back to the 1920s and contain the only directly observed measurements of temperature, wind, and moisture in the upper atmosphere, but they contain many random errors. Most of the focus on cleaning these large datasets has been on temperatures, but winds are important inputs to climate models and in studies of wind climatology. The bivariate distribution of the wind vector does not have elliptical contours but is skewed and heavy-tailed, so we develop two methods for outlier detection based on the bivariate skew-t (BST) distribution, using either distance-based or contour-based approaches to flag observations as potential outliers. We develop a framework to robustly estimate the parameters of the BST and then show how the tuning parameter to get these estimates is chosen. In simulation, we compare our methods with one based on a bivariate normal distribution and a nonparametric approach based on the bagplot. We then apply all four methods to the winds observed for over 35,000 radiosonde launches at a single station and demonstrate differences in the number of observations flagged across eight pressure levels and through time. In this pilot study, the method based on the BST contours performs very well.

  6. Robust bivariate error detection in skewed data with application to historical radiosonde winds

    KAUST Repository

    Sun, Ying; Hering, Amanda S.; Browning, Joshua M.

    2017-01-01

    The global historical radiosonde archives date back to the 1920s and contain the only directly observed measurements of temperature, wind, and moisture in the upper atmosphere, but they contain many random errors. Most of the focus on cleaning these large datasets has been on temperatures, but winds are important inputs to climate models and in studies of wind climatology. The bivariate distribution of the wind vector does not have elliptical contours but is skewed and heavy-tailed, so we develop two methods for outlier detection based on the bivariate skew-t (BST) distribution, using either distance-based or contour-based approaches to flag observations as potential outliers. We develop a framework to robustly estimate the parameters of the BST and then show how the tuning parameter to get these estimates is chosen. In simulation, we compare our methods with one based on a bivariate normal distribution and a nonparametric approach based on the bagplot. We then apply all four methods to the winds observed for over 35,000 radiosonde launches at a single station and demonstrate differences in the number of observations flagged across eight pressure levels and through time. In this pilot study, the method based on the BST contours performs very well.

  7. An approach to the analysis of SDSS spectroscopic outliers based on self-organizing maps. Designing the outlier analysis software package for the next Gaia survey

    Science.gov (United States)

    Fustes, D.; Manteiga, M.; Dafonte, C.; Arcay, B.; Ulla, A.; Smith, K.; Borrachero, R.; Sordo, R.

    2013-11-01

    Aims: A new method applied to the segmentation and further analysis of the outliers resulting from the classification of astronomical objects in large databases is discussed. The method is being used in the framework of the Gaia satellite Data Processing and Analysis Consortium (DPAC) activities to prepare automated software tools that will be used to derive basic astrophysical information to be included in the final Gaia archive. Methods: Our algorithm has been tested by means of simulated Gaia spectrophotometry, which is based on SDSS observations and theoretical spectral libraries covering a wide sample of astronomical objects. Self-organizing map networks are used to organize the information in clusters of objects, as homogeneously as possible according to their spectral energy distributions, and to project them onto a 2D grid where the data structure can be visualized. Results: We demonstrate the usefulness of the method by analyzing the spectra that were rejected by the SDSS spectroscopic classification pipeline and thus classified as "UNKNOWN". First, our method can help distinguish between astrophysical objects and instrumental artifacts. Additionally, the application of our algorithm to SDSS objects of unknown nature has allowed us to identify classes of objects with similar astrophysical natures. In addition, the method allows for the potential discovery of hundreds of new objects, such as white dwarfs and quasars. Therefore, the proposed method is shown to be very promising for data exploration and knowledge discovery in very large astronomical databases, such as the archive from the upcoming Gaia mission.

  8. Enhanced Isotopic Ratio Outlier Analysis (IROA) Peak Detection and Identification with Ultra-High Resolution GC-Orbitrap/MS: Potential Application for Investigation of Model Organism Metabolomes

    Directory of Open Access Journals (Sweden)

    Yunping Qiu

    2018-01-01

    Full Text Available Identifying non-annotated peaks may have a significant impact on the understanding of biological systems. In silico methodologies have focused on ESI LC/MS/MS for identifying non-annotated MS peaks. In this study, we employed in silico methodology to develop an Isotopic Ratio Outlier Analysis (IROA) workflow using enhanced mass spectrometric data acquired with the ultra-high resolution GC-Orbitrap/MS to determine the identity of non-annotated metabolites. The higher resolution of the GC-Orbitrap/MS, together with its wide dynamic range, resulted in more IROA peak pairs detected, and increased reliability of chemical formulae generation (CFG). IROA uses two different 13C-enriched carbon sources (randomized 95% 12C and 95% 13C) to produce mirror-image isotopologue pairs, whose mass difference reveals the carbon chain length (n), which aids in the identification of endogenous metabolites. Accurate m/z, n, and derivatization information are obtained from our GC/MS workflow for unknown metabolite identification, and aid in silico methodologies for identifying isomeric and non-annotated metabolites. We were able to mine more mass spectral information using the same Saccharomyces cerevisiae growth protocol (Qiu et al., Anal. Chem. 2016) with the ultra-high resolution GC-Orbitrap/MS, using 10% ammonia in methane as the CI reagent gas. We identified 244 IROA peak pairs, which significantly increased IROA detection capability compared with our previous report (126 IROA peak pairs using a GC-TOF/MS machine). For 55 selected metabolites identified from matched IROA CI and EI spectra, using the GC-Orbitrap/MS vs. GC-TOF/MS, the average mass deviation for GC-Orbitrap/MS was 1.48 ppm, whereas the average mass deviation was 32.2 ppm for the GC-TOF/MS machine. In summary, the higher resolution and wider dynamic range of the GC-Orbitrap/MS enabled more accurate CFG, and the coupling of accurate mass GC/MS IROA methodology with in silico fragmentation has great

  9. Enhanced Isotopic Ratio Outlier Analysis (IROA) Peak Detection and Identification with Ultra-High Resolution GC-Orbitrap/MS: Potential Application for Investigation of Model Organism Metabolomes.

    Science.gov (United States)

    Qiu, Yunping; Moir, Robyn D; Willis, Ian M; Seethapathy, Suresh; Biniakewitz, Robert C; Kurland, Irwin J

    2018-01-18

    Identifying non-annotated peaks may have a significant impact on the understanding of biological systems. In silico methodologies have focused on ESI LC/MS/MS for identifying non-annotated MS peaks. In this study, we employed in silico methodology to develop an Isotopic Ratio Outlier Analysis (IROA) workflow using enhanced mass spectrometric data acquired with the ultra-high resolution GC-Orbitrap/MS to determine the identity of non-annotated metabolites. The higher resolution of the GC-Orbitrap/MS, together with its wide dynamic range, resulted in more IROA peak pairs detected, and increased reliability of chemical formulae generation (CFG). IROA uses two different 13C-enriched carbon sources (randomized 95% 12C and 95% 13C) to produce mirror-image isotopologue pairs, whose mass difference reveals the carbon chain length (n), which aids in the identification of endogenous metabolites. Accurate m/z, n, and derivatization information are obtained from our GC/MS workflow for unknown metabolite identification, and aid in silico methodologies for identifying isomeric and non-annotated metabolites. We were able to mine more mass spectral information using the same Saccharomyces cerevisiae growth protocol (Qiu et al., Anal. Chem. 2016) with the ultra-high resolution GC-Orbitrap/MS, using 10% ammonia in methane as the CI reagent gas. We identified 244 IROA peak pairs, which significantly increased IROA detection capability compared with our previous report (126 IROA peak pairs using a GC-TOF/MS machine). For 55 selected metabolites identified from matched IROA CI and EI spectra, using the GC-Orbitrap/MS vs. GC-TOF/MS, the average mass deviation for GC-Orbitrap/MS was 1.48 ppm, whereas the average mass deviation was 32.2 ppm for the GC-TOF/MS machine. In summary, the higher resolution and wider dynamic range of the GC-Orbitrap/MS enabled more accurate CFG, and the coupling of accurate mass GC/MS IROA methodology with in silico fragmentation has great potential in

  10. Principal components in the discrimination of outliers: A study in simulation sample data corrected by Pearson's and Yates´s chi-square distance

    Directory of Open Access Journals (Sweden)

    Manoel Vitor de Souza Veloso

    2016-04-01

    Full Text Available The current study employs Monte Carlo simulation to build a significance test indicating the principal components that best discriminate outliers. Different sample sizes were generated by a multivariate normal distribution with different numbers of variables and correlation structures. Corrections by Pearson's and Yates's chi-square distance were provided for each sample size. Pearson's correction showed the best performance. By increasing the number of variables, significance probabilities in favor of hypothesis H0 were reduced. To illustrate the proposed method, a multivariate time series of sales volume rates in the state of Minas Gerais, obtained from different market segments, was analyzed.

  11. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation

    Science.gov (United States)

    Adikaram, K. K. L. B.; Becker, T.

    2015-01-01

    Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is the number of terms in a series. The ratios Rmax = (amax − amin)/(Sn − amin*n) and Rmin = (amax − amin)/(amax*n − Sn) are always equal to 2/n, where amax is the maximum element, amin is the minimum element and Sn is the sum of all elements. If any series expected to follow y = c consists of data that do not agree with y = c form, Rmax > 2/n and Rmin > 2/n imply that the maximum and minimum elements, respectively, do not agree with linear fit. We define threshold values for outliers and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 − 1. Given this relation and transformation technique, which transforms data into the form y = c, we show that removing all data that do not agree with linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and nature of distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over the existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^−4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process. PMID:26571035

  12. Quantile index for gradual and abrupt change detection from CFB boiler sensor data in online settings

    NARCIS (Netherlands)

    Maslov, A.; Pechenizkiy, M.; Kärkkäinen, T.; Tähtinen, M.

    2012-01-01

    In this paper we consider the problem of online detection of gradual and abrupt changes in sensor data with high levels of noise and outliers. We propose a simple heuristic method based on the Quantile Index (QI) and study how robust this method is for detecting both gradual and abrupt changes.

  13. Self-adaptive change detection in streaming data with non-stationary distribution

    KAUST Repository

    Zhang, Xiangliang

    2010-01-01

    Non-stationary distribution, in which the data distribution evolves over time, is a common issue in many application fields, e.g., intrusion detection and grid computing. Detecting changes in massive streaming data with a non-stationary distribution helps to raise alarms on anomalies, to clean noise, and to report new patterns. In this paper, we employ a novel approach for detecting changes in streaming data with the purpose of improving the quality of modeling the data streams. Through observing the outliers, this change detection approach uses a weighted standard deviation to monitor the evolution of the distribution of data streams. A cumulative statistical test, Page-Hinkley, is employed to collect the evidence of changes in distribution. The parameter used for reporting the changes is self-adaptively adjusted according to the distribution of the data streams, rather than set to a fixed empirical value. The self-adaptability of the novel approach enhances the effectiveness of modeling data streams by catching distribution changes in a timely manner. We validated the approach on an online clustering framework with the benchmark KDDcup 1999 intrusion detection data set as well as with a real-world grid data set. The validation results demonstrate its better performance in achieving higher accuracy and a lower percentage of outliers compared to other change detection approaches.
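
    A minimal sketch of the Page-Hinkley test for an upward change in the mean; the drift and threshold parameters are illustrative, and the weighted-standard-deviation monitoring and self-adaptive parameter of the paper are not reproduced.

    import numpy as np

    def page_hinkley(stream, delta=0.05, threshold=50.0):
        # Accumulate deviations from the running mean minus a tolerated
        # drift `delta`; alarm when the cumulative sum rises `threshold`
        # above its running minimum.
        mean, cum, cum_min = 0.0, 0.0, 0.0
        for t, x in enumerate(stream):
            mean += (x - mean) / (t + 1)   # incremental running mean
            cum += x - mean - delta
            cum_min = min(cum_min, cum)
            if cum - cum_min > threshold:
                return t                   # index of the first alarm
        return None

    rng = np.random.default_rng(6)
    stream = np.concatenate([rng.normal(0.0, 1.0, 1000),
                             rng.normal(1.5, 1.0, 500)])
    alarm_at = page_hinkley(stream)        # expected soon after index 1000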

  14. Automated rice leaf disease detection using color image analysis

    Science.gov (United States)

    Pugoy, Reinald Adrian D. L.; Mariano, Vladimir Y.

    2011-06-01

    In rice-related institutions such as the International Rice Research Institute, assessing the health condition of a rice plant through its leaves, which is usually done as a manual eyeball exercise, is important for developing good nutrient and disease management strategies. In this paper, an automated system that can detect diseases present in a rice leaf using color image analysis is presented. In the system, the outlier region is first obtained from the rice leaf image under test using histogram intersection between the test image and healthy rice leaf images. The outlier region is then subjected to a threshold-based K-means clustering algorithm to group related regions into clusters. These clusters are then analyzed further to determine the suspected diseases of the rice leaf.
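
    A small sketch of the histogram-intersection step, assuming hypothetical healthy and test leaf images; the thresholded K-means stage of the system is not shown.

    import numpy as np

    def colour_histogram(img, bins=16):
        # Concatenated per-channel histogram of an RGB image, normalised
        # so that two identical histograms intersect to 1.
        h = np.concatenate([np.histogram(img[..., c], bins=bins,
                                         range=(0, 256))[0] for c in range(3)])
        return h / h.sum()

    def histogram_intersection(h1, h2):
        return np.minimum(h1, h2).sum()

    rng = np.random.default_rng(7)
    healthy = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    test = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    similarity = histogram_intersection(colour_histogram(healthy),
                                        colour_histogram(test))
    # a low similarity indicates a large outlier region to be clustered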

  15. Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks.

    Science.gov (United States)

    Azcorra, A; Chiroque, L F; Cuevas, R; Fernández Anta, A; Laniado, H; Lillo, R E; Romo, J; Sguera, C

    2018-05-03

    Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and a channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users who can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, to support the identification of influential users. MUOD is scalable and can hence be used in large OSNs. Moreover, it labels the outliers by shape, magnitude, or amplitude, depending on their features. This allows classifying the outlier users into multiple different classes, which are likely to include different types of influential users. Applying MUOD to a subset of roughly 400 million Google+ users has allowed us to automatically identify and discriminate sets of outlier users that present features associated with different definitions of influential users, such as the capacity to attract engagement, the capacity to attract a large number of followers, or a high infection capacity.

  16. Robust Deep Network with Maximum Correntropy Criterion for Seizure Detection

    Directory of Open Access Journals (Sweden)

    Yu Qi

    2014-01-01

    Full Text Available Effective seizure detection from long-term EEG is highly important for seizure diagnosis. Existing methods usually design the feature and the classifier individually, while little work has been done on the simultaneous optimization of the two parts. This work proposes a deep network to jointly learn a feature and a classifier so that they can help each other to make the whole system optimal. To deal with the challenge of impulsive noise and outliers caused by EMG artifacts in EEG signals, we formulate a robust stacked autoencoder (R-SAE) as a part of the network to learn an effective feature. In the R-SAE, the maximum correntropy criterion (MCC) is proposed to reduce the effect of noise/outliers. Unlike the mean square error (MSE), the output of the MCC kernel increases more slowly than that of the MSE as the input moves away from the center. Thus, the effect of those noises/outliers positioned far away from the center can be suppressed. The proposed method is evaluated on 33.6 hours of scalp EEG data from six patients. Our method achieves a sensitivity of 100% and a specificity of 99%, which is promising for clinical applications.
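
    The correntropy-based loss can be written in a few lines; a minimal numpy illustration of why the MCC bounds the influence of outlier residuals, with an illustrative kernel width sigma.

    import numpy as np

    def correntropy_loss(residuals, sigma=1.0):
        # Loss derived from the maximum correntropy criterion: it
        # saturates for large residuals, so impulsive noise and outliers
        # contribute little to the gradient, unlike the squared error.
        return np.mean(1.0 - np.exp(-residuals**2 / (2.0 * sigma**2)))

    residuals = np.array([0.1, -0.2, 0.05, 8.0])   # one outlier residual
    mse = np.mean(residuals**2)          # about 16.0, dominated by the outlier
    mcc = correntropy_loss(residuals)    # about 0.26, the outlier saturates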

  17. Applied network security monitoring collection, detection, and analysis

    CERN Document Server

    Sanders, Chris

    2013-01-01

    Applied Network Security Monitoring is the essential guide to becoming an NSM analyst from the ground up. This book takes a fundamental approach to NSM, complete with dozens of real-world examples that teach you the key concepts of NSM. Network security monitoring is based on the principle that prevention eventually fails. In the current threat landscape, no matter how much you try, motivated attackers will eventually find their way into your network. At that point, it is your ability to detect and respond to that intrusion that can be the difference between a small incident and a major di

  18. A practical method to detect the freezing/thawing onsets of seasonal frozen ground in Alaska

    Science.gov (United States)

    Chen, Xiyu; Liu, Lin

    2017-04-01

    Microwave remote sensing can provide useful information about the freeze/thaw state of soil at the Earth's surface. An edge detection method is applied in this study to estimate the onsets of soil freeze/thaw state transitions using L-band space-borne radiometer data. The Soil Moisture Active Passive (SMAP) mission carries an L-band radiometer and can provide daily brightness temperature (TB) at horizontal and vertical polarizations. We use the normalized polarization ratio (NPR), calculated from the Level-1C TB product of SMAP (spatial resolution: 36 km), as the indicator of soil freeze/thaw state to estimate the freezing and thawing onsets in Alaska in 2015 and 2016. NPR is calculated from the difference between TB at vertical and horizontal polarizations. It is therefore strongly sensitive to changes in the liquid water content of the soil and independent of the soil temperature. Onset estimation is based on the detection of abrupt changes in NPR during transition seasons using an edge detection method, and validation consists of comparing the estimated onsets with onsets derived from in situ measurements. According to the comparison, the estimated onsets were generally 15 days earlier than the measured onsets in 2015. In 2016, however, the estimates were on average 4 days earlier than the measured onsets, which may be due to less snow cover. Moreover, we extended our estimation to the entire state of Alaska. The estimated freeze/thaw onsets showed a reasonable latitude-dependent distribution, although there are still some outliers caused by noisy variation of the NPR. Finally, we attempt to remove these outliers and improve the performance of the method by smoothing the NPR time series.
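
    A minimal sketch of the NPR computation and a step-filter edge detector on a synthetic series; the transition magnitudes and filter width are assumptions, not SMAP values.

    import numpy as np

    def npr(tb_v, tb_h):
        # Normalized polarization ratio from vertically and horizontally
        # polarized brightness temperatures.
        return (tb_v - tb_h) / (tb_v + tb_h)

    def strongest_edge(series, half_width=5):
        # Moving step filter (+1/-1 halves); the largest absolute
        # response marks the most abrupt change in the series.
        kernel = np.r_[np.ones(half_width), -np.ones(half_width)] / half_width
        response = np.convolve(series, kernel, mode="valid")
        return int(np.argmax(np.abs(response))) + half_width

    rng = np.random.default_rng(8)
    # synthetic daily NPR across a spring thaw: liquid water raises NPR
    series = np.r_[rng.normal(0.02, 0.004, 120), rng.normal(0.05, 0.004, 120)]
    thaw_onset = strongest_edge(series)    # expected near day 120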

  19. Supervised detection of anomalous light curves in massive astronomical catalogs

    International Nuclear Information System (INIS)

    Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

    2014-01-01

    The development of synoptic sky surveys has led to a massive amount of data for which the resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all the information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. By leaving out one of the classes on the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known

  20. Supervised Detection of Anomalous Light Curves in Massive Astronomical Catalogs

    Science.gov (United States)

    Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

    2014-09-01

    The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered an outlier insofar as it has a low joint probability. By leaving out one of the classes in the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known
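
    The voting-distribution idea above can be sketched in a few lines with scikit-learn. A Gaussian mixture stands in for the paper's Bayesian network as the density model over vote vectors, and the data are synthetic placeholders rather than light-curve features.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.mixture import GaussianMixture

        # Toy stand-in for features of known variability classes.
        X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                                   n_classes=4, n_clusters_per_class=1, random_state=0)
        rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
        votes = rf.predict_proba(X)              # per-object class-vote fractions

        # Density model of the joint vote distribution (a mixture here,
        # instead of the Bayesian network used in the paper).
        density = GaussianMixture(n_components=4, random_state=0).fit(votes)
        log_p = density.score_samples(votes)
        outliers = log_p < np.percentile(log_p, 1)   # objects voted on unusually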

  1. The Outlier Sectors: Areas of Non-Free Trade in the North American Free Trade Agreement

    OpenAIRE

    Eric T. Miller

    2002-01-01

    Since its entry into force, the North American Free Trade Agreement (NAFTA) has been enormously influential as a model for trade liberalization. While trade in goods among Canada, the United States and Mexico has been liberalized to a significant degree, this most famous of agreements nonetheless contains areas of recalcitrant protectionism. The first part of this paper identifies these "outlier sectors" and classifies them by primary source advocating protectionism, i.e., producer interests ...

  2. Universal ligation-detection-reaction microarray applied for compost microbes

    Directory of Open Access Journals (Sweden)

    Romantschuk Martin

    2008-12-01

    Full Text Available Abstract Background Composting is one of the methods utilised in recycling organic communal waste. The composting process is dependent on aerobic microbial activity and proceeds through a succession of different phases each dominated by certain microorganisms. In this study, a ligation-detection-reaction (LDR) based microarray method was adapted for species-level detection of compost microbes characteristic of each stage of the composting process. LDR utilises the specificity of the ligase enzyme to covalently join two adjacently hybridised probes. A zip-oligo is attached to the 3'-end of one probe and fluorescent label to the 5'-end of the other probe. Upon ligation, the probes are combined in the same molecule and can be detected in a specific location on a universal microarray with complementary zip-oligos enabling equivalent hybridisation conditions for all probes. The method was applied to samples from Nordic composting facilities after testing and optimisation with fungal pure cultures and environmental clones. Results Probes targeted for fungi were able to detect 0.1 fmol of target ribosomal PCR product in an artificial reaction mixture containing 100 ng competing fungal ribosomal internal transcribed spacer (ITS) area or herring sperm DNA. The detection level was therefore approximately 0.04% of total DNA. Clone libraries were constructed from eight compost samples. The LDR microarray results were in concordance with the clone library sequencing results. In addition a control probe was used to monitor the per-spot hybridisation efficiency on the array. Conclusion This study demonstrates that the LDR microarray method is capable of sensitive and accurate species-level detection from a complex microbial community. The method can detect key species from compost samples, making it a basis for a tool for compost process monitoring in industrial facilities.

  3. Validation of the Applied Biosystems RapidFinder Shiga Toxin-Producing E. coli (STEC) Detection Workflow.

    Science.gov (United States)

    Cloke, Jonathan; Matheny, Sharon; Swimley, Michelle; Tebbs, Robert; Burrell, Angelia; Flannery, Jonathan; Bastin, Benjamin; Bird, Patrick; Benzinger, M Joseph; Crowley, Erin; Agin, James; Goins, David; Salfinger, Yvonne; Brodsky, Michael; Fernandez, Maria Cristina

    2016-11-01

    The Applied Biosystems™ RapidFinder™ STEC Detection Workflow (Thermo Fisher Scientific) is a complete protocol for the rapid qualitative detection of Escherichia coli (E. coli) O157:H7 and the "Big 6" non-O157 Shiga-like toxin-producing E. coli (STEC) serotypes (defined as serogroups: O26, O45, O103, O111, O121, and O145). The RapidFinder STEC Detection Workflow makes use of either the automated preparation of PCR-ready DNA using the Applied Biosystems PrepSEQ™ Nucleic Acid Extraction Kit in conjunction with the Applied Biosystems MagMAX™ Express 96-well magnetic particle processor or the Applied Biosystems PrepSEQ Rapid Spin kit for manual preparation of PCR-ready DNA. Two separate assays comprise the RapidFinder STEC Detection Workflow, the Applied Biosystems RapidFinder STEC Screening Assay and the Applied Biosystems RapidFinder STEC Confirmation Assay. The RapidFinder STEC Screening Assay includes primers and probes to detect the presence of stx1 (Shiga toxin 1), stx2 (Shiga toxin 2), eae (intimin), and E. coli O157 gene targets. The RapidFinder STEC Confirmation Assay includes primers and probes for the "Big 6" non-O157 STEC and E. coli O157:H7. The use of these two assays in tandem allows a user to detect accurately the presence of the "Big 6" STECs and E. coli O157:H7. The performance of the RapidFinder STEC Detection Workflow was evaluated in a method comparison study, in inclusivity and exclusivity studies, and in a robustness evaluation. The assays were compared to the U.S. Department of Agriculture (USDA), Food Safety and Inspection Service (FSIS) Microbiology Laboratory Guidebook (MLG) 5.09: Detection, Isolation and Identification of Escherichia coli O157:H7 from Meat Products and Carcass and Environmental Sponges for raw ground beef (73% lean) and USDA/FSIS-MLG 5B.05: Detection, Isolation and Identification of Escherichia coli non-O157:H7 from Meat Products and Carcass and Environmental Sponges for raw beef trim. No statistically significant

  4. Predictors of High Profit and High Deficit Outliers under SwissDRG of a Tertiary Care Center.

    Science.gov (United States)

    Mehra, Tarun; Müller, Christian Thomas Benedikt; Volbracht, Jörk; Seifert, Burkhardt; Moos, Rudolf

    2015-01-01

    Case weights of Diagnosis Related Groups (DRGs) are determined by the average cost of cases from a previous billing period. However, a significant number of cases are largely over- or underfunded. We therefore decided to analyze earning outliers of our hospital to search for predictors enabling a better grouping under SwissDRG. 28,893 inpatient cases without additional private insurance discharged from our hospital in 2012 were included in our analysis. Outliers were defined by the interquartile range method. Predictors for deficit and profit outliers were determined with logistic regressions. Predictors were shortlisted with the LASSO regularized logistic regression method and compared to the results of a Random forest analysis. 10 of these parameters were selected for quantile regression analysis to quantify their impact on earnings. Psychiatric diagnosis and admission as an emergency case were significant predictors of a higher deficit, with negative regression coefficients for all analyzed quantiles (p<0.001). Admission from an external health care provider was a significant predictor of a higher deficit in all but the 90% quantile (p<0.001 for Q10, Q20, Q50, Q80 and p = 0.0017 for Q90). Burns predicted higher earnings for cases which were favorably remunerated (p<0.001 for the 90% quantile). Osteoporosis predicted a higher deficit in the most underfunded cases, but did not predict differences in earnings for balanced or profitable cases (Q10 and Q20: p<0.001, Q50: p = 0.10, Q80: p = 0.88 and Q90: p = 0.52). ICU stay, mechanical ventilation and patient clinical complexity level score (PCCL) predicted higher losses at the 10% quantile but also higher profits at the 90% quantile (p<0.001). We suggest considering psychiatric diagnosis, admission as an emergency case and admission from an external health care provider as DRG split criteria, as they predict large, consistent and significant losses.
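
    The interquartile-range definition of earning outliers takes only a few lines; the multiplier k = 1.5 is the conventional Tukey choice, assumed here rather than taken from the paper.

        import numpy as np

        def iqr_outliers(earnings, k=1.5):
            # Deficit/profit outliers fall below/above the Tukey fences.
            q1, q3 = np.percentile(earnings, [25, 75])
            low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
            return earnings < low, earnings > high

        deficit, profit = iqr_outliers(np.array([-5000.0, -120.0, 30.0, 80.0, 95.0, 7000.0]))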

  5. Predictors of High Profit and High Deficit Outliers under SwissDRG of a Tertiary Care Center.

    Directory of Open Access Journals (Sweden)

    Tarun Mehra

    Full Text Available Case weights of Diagnosis Related Groups (DRGs) are determined by the average cost of cases from a previous billing period. However, a significant number of cases are largely over- or underfunded. We therefore decided to analyze earning outliers of our hospital to search for predictors enabling a better grouping under SwissDRG. 28,893 inpatient cases without additional private insurance discharged from our hospital in 2012 were included in our analysis. Outliers were defined by the interquartile range method. Predictors for deficit and profit outliers were determined with logistic regressions. Predictors were shortlisted with the LASSO regularized logistic regression method and compared to the results of a Random forest analysis. 10 of these parameters were selected for quantile regression analysis to quantify their impact on earnings. Psychiatric diagnosis and admission as an emergency case were significant predictors of a higher deficit, with negative regression coefficients for all analyzed quantiles (p<0.001). Admission from an external health care provider was a significant predictor of a higher deficit in all but the 90% quantile (p<0.001 for Q10, Q20, Q50, Q80 and p = 0.0017 for Q90). Burns predicted higher earnings for cases which were favorably remunerated (p<0.001 for the 90% quantile). Osteoporosis predicted a higher deficit in the most underfunded cases, but did not predict differences in earnings for balanced or profitable cases (Q10 and Q20: p<0.001, Q50: p = 0.10, Q80: p = 0.88 and Q90: p = 0.52). ICU stay, mechanical ventilation and patient clinical complexity level score (PCCL) predicted higher losses at the 10% quantile but also higher profits at the 90% quantile (p<0.001). We suggest considering psychiatric diagnosis, admission as an emergency case and admission from an external health care provider as DRG split criteria, as they predict large, consistent and significant losses.

  6. Outlier SNP markers reveal fine-scale genetic structuring across European hake populations (Merluccius merluccius)

    DEFF Research Database (Denmark)

    Milano, I.; Babbucci, M.; Cariani, A.

    2014-01-01

    fishery. Analysis of 850 individuals from 19 locations across the entire distribution range showed evidence for several outlier loci, with significantly higher resolving power. While 299 putatively neutral SNPs confirmed the genetic break between basins (FCT = 0.016) and weak differentiation within basins...... even when neutral markers provide genetic homogeneity across populations. Here, 381 SNPs located in transcribed regions were used to assess large- and fine-scale population structure in the European hake (Merluccius merluccius), a widely distributed demersal species of high priority for the European

  7. Efficient Estimation of Dynamic Density Functions with Applications in Streaming Data

    KAUST Repository

    Qahtan, Abdulhakim Ali Ali

    2016-01-01

    application is to detect outliers in data streams from sensor networks based on the estimated PDF. The method detects outliers accurately and outperforms baseline methods designed for detecting and cleaning outliers in sensor data. The third application

  8. New Quality Control Algorithm Based on GNSS Sensing Data for a Bridge Health Monitoring System

    Directory of Open Access Journals (Sweden)

    Jae Kang Lee

    2016-05-01

    Full Text Available This research introduces an improvement plan for the reliability of Global Navigation Satellite System (GNSS) positioning solutions. The most suitable adjustment and positioning methodology must be considered in order to maximize the utility of GNSS applications. Although various studies have been conducted on GNSS-based Bridge Health Monitoring Systems (BHMS), outliers that depend on the signal reception environment have not been considered until now. Since these outliers may be contained in the GNSS data collected from major bridge members, and can reduce the reliability of the whole monitoring system by delivering false information, they should be detected and eliminated in the preceding adjustment stage. In this investigation, the Detection, Identification, Adaptation (DIA) technique was applied and implemented through an algorithm. It can be applied directly to GNSS data collected from long-span cable-stayed bridges, and most of the outliers were efficiently detected and eliminated simultaneously. By these means, the reliability of GNSS positioning is greatly improved. Improvement in GNSS positioning accuracy is directly linked to the safety of the bridge itself, and at the same time the reliability of the monitoring system in terms of system operation can also be increased.
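
    The detection and identification steps of DIA are commonly built on Baarda-style w-tests of least-squares residuals. A compact sketch for a linear model with uncorrelated, equal-variance observations (a simplifying assumption; GNSS adjustment uses full weight matrices):

        import numpy as np

        def w_tests(A, y, sigma0):
            # Reduced QR gives the hat matrix implicitly: H = Q @ Q.T.
            Q, _ = np.linalg.qr(A)
            e = y - Q @ (Q.T @ y)                # least-squares residuals
            qe = 1.0 - np.sum(Q**2, axis=1)      # diag of I - H (residual cofactors)
            return e / (sigma0 * np.sqrt(qe))    # |w_i| > 3.29 flags observation i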

  9. The source of prehistoric obsidian artefacts from the Polynesian outlier of Taumako in the Solomon Islands

    Energy Technology Data Exchange (ETDEWEB)

    Leach, Foss [Otago Univ., Dunedin (New Zealand). Dept. of Anthropology

    1985-01-01

    Six obsidian artefacts from the Polynesian outlier of Taumako in the Solomon Islands dating to between 500 and 1000 B.C. were analysed for trace elements by the PIXE-PIGME method. Four are shown to derive from Vanuatu, but the remaining two artefacts do not match any of the known 66 sources in the Pacific region. Continuing difficulties with the methodology of Pacific obsidian sourcing are discussed. 14 refs; 2 tables.

  10. RE-EXAMINING HIGH ABUNDANCE SLOAN DIGITAL SKY SURVEY MASS-METALLICITY OUTLIERS: HIGH N/O, EVOLVED WOLF-RAYET GALAXIES?

    International Nuclear Information System (INIS)

    Berg, Danielle A.; Skillman, Evan D.; Marble, Andrew R.

    2011-01-01

    We present new MMT spectroscopic observations of four dwarf galaxies representative of a larger sample observed by the Sloan Digital Sky Survey and identified by Peeples et al. as low-mass, high oxygen abundance outliers from the mass-metallicity relation. Peeples showed that these four objects (with metallicity estimates of 8.5 ≤ 12 + log(O/H) ≤ 8.8) have oxygen abundance offsets of 0.4-0.6 dex from the M_B luminosity-metallicity relation. Our new observations extend the wavelength coverage to include the [O II] λλ3726, 3729 doublet, which adds leverage in oxygen abundance estimates and allows measurements of N/O ratios. All four spectra are low excitation, with relatively high N/O ratios (N/O ≳ 0.10), each of which tend to bias estimates based on strong emission lines toward high oxygen abundances. These spectra all fall in a regime where the 'standard' strong-line methods for metallicity determinations are not well calibrated either empirically or by photoionization modeling. By comparing our spectra directly to photoionization models, we estimate oxygen abundances in the range of 7.9 ≤ 12 + log(O/H) ≤ 8.4, consistent with the scatter of the mass-metallicity relation. We discuss the physical nature of these galaxies that leads to their unusual spectra (and previous classification as outliers), finding their low excitation, elevated N/O, and strong Balmer absorption are consistent with the properties expected from galaxies evolving past the 'Wolf-Rayet galaxy' phase. We compare our results to the 'main' sample of Peeples and conclude that they are outliers primarily due to enrichment of nitrogen relative to oxygen and not due to unusually high oxygen abundances for their masses or luminosities.

  11. Transfer Entropy Estimation and Directional Coupling Change Detection in Biomedical Time Series

    Directory of Open Access Journals (Sweden)

    Lee Joon

    2012-04-01

    Full Text Available Abstract Background The detection of change in magnitude of directional coupling between two non-linear time series is a common subject of interest in the biomedical domain, including studies involving the respiratory chemoreflex system. Although transfer entropy is a useful tool in this avenue, no study to date has investigated how different transfer entropy estimation methods perform in typical biomedical applications featuring small sample size and presence of outliers. Methods With respect to detection of increased coupling strength, we compared three transfer entropy estimation techniques using both simulated time series and respiratory recordings from lambs. The following estimation methods were analyzed: fixed-binning with ranking, kernel density estimation (KDE), and the Darbellay-Vajda (D-V) adaptive partitioning algorithm extended to three dimensions. In the simulated experiment, sample size was varied from 50 to 200, while coupling strength was increased. In order to introduce outliers, the heavy-tailed Laplace distribution was utilized. In the lamb experiment, the objective was to detect increased respiratory-related chemosensitivity to O2 and CO2 induced by a drug, domperidone. Specifically, the separate influence of end-tidal PO2 and PCO2 on minute ventilation (V̇E) before and after administration of domperidone was analyzed. Results In the simulation, KDE detected increased coupling strength at the lowest SNR among the three methods. In the lamb experiment, D-V partitioning resulted in the statistically strongest increase in transfer entropy post-domperidone for PO2 → V̇E. In addition, D-V partitioning was the only method that could detect an increase in transfer entropy for PCO2 → V̇E, in agreement with experimental findings. Conclusions Transfer entropy is capable of detecting directional coupling changes in non-linear biomedical time series analysis featuring a small number of observations and presence of outliers. The results
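
    Of the three estimators compared, fixed binning is the simplest to write down. A plain-histogram sketch (the bin count and natural-log units are illustrative choices, and the ranking step of the paper is omitted):

        import numpy as np

        def transfer_entropy(x, y, bins=8):
            # TE from x to y: sum p(y+, y, x) * log[ p(y+|y,x) / p(y+|y) ].
            samples = np.column_stack([y[1:], y[:-1], x[:-1]])
            p_xyz, _ = np.histogramdd(samples, bins=bins)
            p_xyz /= p_xyz.sum()
            p_yy = p_xyz.sum(axis=2)          # p(y+, y)
            p_yx = p_xyz.sum(axis=0)          # p(y, x)
            p_y = p_xyz.sum(axis=(0, 2))      # p(y)
            te = 0.0
            for i, j, k in zip(*np.nonzero(p_xyz)):
                te += p_xyz[i, j, k] * np.log(
                    p_xyz[i, j, k] * p_y[j] / (p_yx[j, k] * p_yy[i, j]))
            return te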

  12. Using a cross-model loadings plot to identify protein spots causing 2-DE gels to become outliers in PCA

    DEFF Research Database (Denmark)

    Kristiansen, Luise Cederkvist; Jacobsen, Susanne; Jessen, Flemming

    2010-01-01

    The multivariate method PCA is an exploratory tool often used to get an overview of multivariate data, such as the quantified spot volumes of digitized 2-DE gels. PCA can reveal hidden structures present in the data, and thus enables identification of potential outliers and clustering. Based on PCA...

  13. Real-time detection of organic contamination events in water distribution systems by principal components analysis of ultraviolet spectral data.

    Science.gov (United States)

    Zhang, Jian; Hou, Dibo; Wang, Ke; Huang, Pingjie; Zhang, Guangxin; Loáiciga, Hugo

    2017-05-01

    The detection of organic contaminants in water distribution systems is essential to protect public health from potentially harmful compounds resulting from accidental spills or intentional releases. Existing methods for detecting organic contaminants are based on quantitative analyses such as chemical testing and gas/liquid chromatography, which are time- and reagent-consuming and involve costly maintenance. This study proposes a novel procedure based on the discrete wavelet transform and principal component analysis for detecting organic contamination events from ultraviolet spectral data. Firstly, the spectrum of each observation is transformed using the discrete wavelet transform with a coiflet mother wavelet to capture abrupt changes along the wavelength. Principal component analysis is then employed to approximate the spectra based on capture and fusion features. The significance of Hotelling's T² statistic is calculated and used to detect outliers. An alarm of a contamination event is triggered by sequential Bayesian analysis when outliers appear continuously in several observations. The effectiveness of the proposed procedure is tested on-line using a pilot-scale setup and experimental data.
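
    A condensed sketch of the processing chain, assuming PyWavelets and scikit-learn; the coiflet order, decomposition level, component count, and alarm threshold are placeholders:

        import numpy as np
        import pywt
        from sklearn.decomposition import PCA

        def t2_scores(spectra, n_components=3):
            # 1) DWT with a coiflet mother wavelet; keep all coefficients.
            feats = np.array([np.concatenate(pywt.wavedec(s, "coif2", level=3))
                              for s in spectra])
            # 2) PCA on the wavelet features.
            pca = PCA(n_components=n_components).fit(feats)
            scores = pca.transform(feats)
            # 3) Hotelling's T²: squared scores scaled by component variances.
            return np.sum(scores**2 / pca.explained_variance_, axis=1)

    Observations whose T² exceeds a high quantile of its reference distribution would be treated as outliers, with the sequential Bayesian step deciding when repeated outliers constitute an event.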

  14. Online Detection of Anomalous Sub-trajectories: A Sliding Window Approach Based on Conformal Anomaly Detection and Local Outlier Factor

    OpenAIRE

    Laxhammar , Rikard; Falkman , Göran

    2012-01-01

    Part 4: First Conformal Prediction and Its Applications Workshop (COPA 2012). Automated detection of anomalous trajectories is an important problem in the surveillance domain. Various algorithms based on learning of normal trajectory patterns have been proposed for this problem. Yet, these algorithms suffer from one or more of the following limitations: First, they are essentially designed for offline anomaly detection in databases. Second, they are insensitive to loca...

  15. An Applied Physicist Does Econometrics

    Science.gov (United States)

    Taff, L. G.

    2010-02-01

    The biggest problem facing those attempting to understand econometric data via modeling is that economics has no F = ma. Without a theoretical underpinning, econometricians have no way to build a good model to fit observations to. Physicists do, and when F = ma failed, we knew it. Still desiring to comprehend econometric data, applied economists turn to mis-applying probability theory, especially with regard to the assumptions concerning random errors, and to choosing extremely simplistic analytical formulations of inter-relationships. This introduces model bias to an unknown degree. An applied physicist, used to having to match observations to a numerical or analytical model with a firm theoretical basis, modify the model, re-perform the analysis, and then know why, and when, to delete "outliers", is at a considerable advantage when quantitatively analyzing econometric data. I treat two cases. One is to determine the household density distribution of total assets, annual income, age, level of education, race, and marital status. Each of these "independent" variables is highly correlated with every other, but only current annual income and level of education follow a linear relationship. The other is to discover the functional dependence of total assets on the distribution of assets: total assets has an amazingly tight power law dependence on a quadratic function of portfolio composition. Who knew?

  16. Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

    Science.gov (United States)

    Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

    2014-06-26

    To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study, a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination was low. The robust Poisson models are more robust (or less sensitive) to outliers than the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
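
    A "modified Poisson" fit is just a Poisson working model with a robust sandwich covariance. A toy example with statsmodels (simulated data, not the paper's simulation design):

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        x = rng.binomial(1, 0.5, size=2000)                 # exposure indicator
        y = rng.binomial(1, np.where(x == 1, 0.30, 0.15))   # common binary outcome

        X = sm.add_constant(x)
        # Poisson working model + robust (HC0 sandwich) standard errors.
        fit = sm.GLM(y, X, family=sm.families.Poisson()).fit(cov_type="HC0")
        rr = np.exp(fit.params[1])    # estimated relative risk, close to 2.0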

  17. A new approach for structural health monitoring by applying anomaly detection on strain sensor data

    Science.gov (United States)

    Trichias, Konstantinos; Pijpers, Richard; Meeuwissen, Erik

    2014-03-01

    Structural Health Monitoring (SHM) systems help to monitor critical infrastructures (bridges, tunnels, etc.) remotely and provide up-to-date information about their physical condition. In addition, they help to predict the structure's life and required maintenance in a cost-efficient way. Typically, inspection data give insight into the structural health. The global structural behavior, and predominantly the structural loading, is generally measured with vibration and strain sensors. Acoustic emission sensors are increasingly used for measuring global crack activity near critical locations. In this paper, we present a procedure for local structural health monitoring by applying Anomaly Detection (AD) on strain sensor data, for sensors applied along the expected crack path. Sensor data are analyzed by automatic anomaly detection in order to find crack activity at an early stage. This approach targets the monitoring of critical structural locations, such as welds, near which strain sensors can be applied during construction, and/or locations with limited inspection possibilities during structural operation. We investigate several anomaly detection techniques to detect changes in statistical properties, indicating structural degradation. The most effective one is a novel polynomial fitting technique, which tracks slow changes in sensor data. Our approach has been tested on a representative test structure (bridge deck) in a lab environment, under constant and variable amplitude fatigue loading. In both cases, the evolving cracks at the monitored locations were successfully detected, autonomously, by our AD monitoring tool.
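
    The slow-change tracking idea can be sketched as a sliding-window polynomial fit whose leading coefficient is monitored over time; the window length, degree, and threshold below are illustrative guesses, not the authors' tuned values.

        import numpy as np

        def trend_anomalies(strain, window=200, degree=2, z_crit=4.0):
            # Fit a polynomial per window and keep the leading coefficient.
            coeffs = [np.polyfit(np.arange(window), strain[s:s + window], degree)[0]
                      for s in range(0, len(strain) - window, window)]
            c = np.asarray(coeffs)
            # Windows whose trend departs from the coefficient history are flagged.
            z = (c - c.mean()) / c.std(ddof=1)
            return np.flatnonzero(np.abs(z) > z_crit)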

  18. Setup Instructions for the Applied Anomaly Detection Tool (AADT) Web Server

    Science.gov (United States)

    2016-09-01

    The Applied Anomaly Detection Tool (AADT) has been developed for several platforms: Android, iOS, and Windows. The Windows version has been developed as a web server that provides the tool as a web service, and this report gives instructions for its installation and setup under Microsoft Windows.

  19. Tourism Demand in Catalonia: detecting external economic factors

    OpenAIRE

    Clavería González, Óscar; Datzira, Jordi

    2009-01-01

    There is a lack of studies on tourism demand in Catalonia. To fill the gap, this paper focuses on detecting the macroeconomic factors that determine tourism demand in Catalonia. We also analyse the relation between these factors and tourism demand. Despite the strong seasonal component and the outliers in the time series of some countries, overnight stays give a better indication of tourism demand in Catalonia than the number of tourists. The degree of linear association between the macroecon...

  20. A positive deviance approach to early childhood obesity: cross-sectional characterization of positive outliers.

    Science.gov (United States)

    Foster, Byron Alexander; Farragher, Jill; Parker, Paige; Hale, Daniel E

    2015-06-01

    Positive deviance methodology has been applied in the developing world to address childhood malnutrition and has potential for application to childhood obesity in the United States. We hypothesized that among children at high risk for obesity, evaluating normal weight children would enable identification of positive outlier behaviors and practices. In a community at high risk for obesity, a cross-sectional mixed-methods analysis was conducted of normal weight, overweight, and obese children, classified by BMI percentile. Parents were interviewed using a semistructured format in regard to their children's general health, feeding and activity practices, and perceptions of weight. Interviews were conducted in 40 homes in the lower Rio Grande Valley in Texas with a largely Hispanic (87.5%) population. Demographics, including income, education, and food assistance use, did not vary between groups. Nearly all (93.8%) parents of normal weight children perceived their child to be lower than the median weight. Group differences were observed for reported juice and yogurt consumption. Differences in both emotional feeding behaviors and parents' internalization of reasons for healthy habits were identified between groups. We found subtle variations in reported feeding and activity practices by weight status among healthy children in a population at high risk for obesity. The behaviors and attitudes described were consistent with previous literature; however, the local strategies associated with a healthy weight are novel, potentially providing a basis for a specific intervention in this population.

  1. Engaging children in the development of obesity interventions: Exploring outcomes that matter most among obesity positive outliers.

    Science.gov (United States)

    Sharifi, Mona; Marshall, Gareth; Goldman, Roberta E; Cunningham, Courtney; Marshall, Richard; Taveras, Elsie M

    2015-11-01

    To explore outcomes and measures of success that matter most to 'positive outlier' children who improved their body mass index (BMI) despite living in obesogenic neighborhoods. We collected residential address and longitudinal height/weight data from electronic health records of 22,657 children ages 6-12 years in Massachusetts. We defined obesity "hotspots" as zip codes where >15% of children had a BMI ≥95th percentile. Using linear mixed effects models, we generated a BMI z-score slope for each child with a history of obesity. We recruited 10-12 year-olds with negative slopes living in hotspots for focus groups. We analyzed group transcripts and discussed emerging themes in iterative meetings using an immersion/crystallization approach. We reached thematic saturation after 4 focus groups with 21 children. Children identified bullying and negative peer comparisons related to physical appearance, clothing size, and athletic ability as motivating them to achieve a healthier weight, and they measured success as improvement in these domains. Positive relationships with friends and family facilitated both behavior change initiation and maintenance. The perspectives of positive outlier children can provide insight into children's motivations leading to successful obesity management. Child/family engagement should guide the development of patient-centered obesity interventions. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  2. Entropy Measures for Stochastic Processes with Applications in Functional Anomaly Detection

    Directory of Open Access Journals (Sweden)

    Gabriel Martos

    2018-01-01

    Full Text Available We propose a definition of entropy for stochastic processes. We provide a reproducing kernel Hilbert space model to estimate entropy from a random sample of realizations of a stochastic process, namely functional data, and introduce two approaches to estimate minimum entropy sets. These sets are relevant to detect anomalous or outlier functional data. A numerical experiment illustrates the performance of the proposed method; in addition, we conduct an analysis of mortality rate curves as an interesting application in a real-data context to explore functional anomaly detection.

  3. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    Science.gov (United States)

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…
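
    One common form of the regression approach: regress new-form difficulty estimates on old-form estimates for the common items and flag large standardized residuals. The 2.0 cutoff is an illustrative convention, not necessarily the authors' criterion.

        import numpy as np

        def flag_common_item_outliers(b_old, b_new, z_crit=2.0):
            # Line of best fit between the two sets of difficulty estimates.
            slope, intercept = np.polyfit(b_old, b_new, 1)
            resid = b_new - (slope * b_old + intercept)
            z = (resid - resid.mean()) / resid.std(ddof=1)
            return np.abs(z) > z_crit      # True marks suspect common items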

  4. Genomic outlier profile analysis: mixture models, null hypotheses, and nonparametric estimation.

    Science.gov (United States)

    Ghosh, Debashis; Chinnaiyan, Arul M

    2009-01-01

    In most analyses of large-scale genomic data sets, differential expression analysis is typically assessed by testing for differences in the mean of the distributions between 2 groups. A recent finding by Tomlins and others (2005) is of a different type of pattern of differential expression in which a fraction of samples in one group have overexpression relative to samples in the other group. In this work, we describe a general mixture model framework for the assessment of this type of expression, called outlier profile analysis. We start by considering the single-gene situation and establishing results on identifiability. We propose 2 nonparametric estimation procedures that have natural links to familiar multiple testing procedures. We then develop multivariate extensions of this methodology to handle genome-wide measurements. The proposed methodologies are compared using simulation studies as well as data from a prostate cancer gene expression study.

  5. The internal structure of eclogite-facies ophiolite complexes: Implications from the Austroalpine outliers within the Zermatt-Saas Zone, Western Alps

    Science.gov (United States)

    Weber, Sebastian; Martinez, Raul

    2016-04-01

    The Western Alpine Penninic domain is a classical accretionary prism that formed after the closure of the Penninic oceans in the Paleogene. Continental and oceanic nappes were telescoped into the Western Alpine stack associated with continent-continent collision. Within the Western Alpine geologic framework, the ophiolite nappes of the Zermatt-Saas Zone and the Tsate Unit are the remnants of the southern branch of the Piemonte-Liguria ocean basin. In addition, a series of continental basement slices reported as lower Austroalpine outliers have preserved an eclogitic high-pressure imprint, and are tectonically sandwiched between these oceanic nappes. Since the outliers occur at an unusual intra-ophiolitic setting and show a polymetamorphic character, this group of continental slices is of special importance for understanding the tectono-metamorphic evolution of Western Alps. Recently, more geochronological data from the Austroalpine outliers have become available that make it possible to establish a more complete picture of their complex geological history. The Lu-Hf garnet-whole rock ages for prograde growth of garnet fall into the time interval of 52 to 62 Ma (Weber et al., 2015, Fassmer et al. 2015), but are consistently higher than the Lu-Hf garnet-whole rock ages from several other locations throughout the Zermatt-Saas zone that range from 52 to 38 Ma (Skora et al., 2015). This discrepancy suggests that the Austroalpine outliers may have been subducted earlier than the ophiolites of the Zermatt-Saas Zone and therefore have been tectonically emplaced into their present intra-ophiolite position. This points to the possibility that the Zermatt-Saas Zone consists of tectonic subunits, which reached their respective pressure peaks over a prolonged time period, approximately 10-20 Ma. The pressure-temperature estimates from several members of the Austroalpine outliers indicate a complex distribution of metamorphic peak conditions, without ultrahigh

  6. Hybrid online sensor error detection and functional redundancy for systems with time-varying parameters.

    Science.gov (United States)

    Feng, Jianyuan; Turksoy, Kamuran; Samadi, Sediqeh; Hajizadeh, Iman; Littlejohn, Elizabeth; Cinar, Ali

    2017-12-01

    Supervision and control systems rely on signals from sensors to receive information to monitor the operation of a system and adjust manipulated variables to achieve the control objective. However, sensor performance is often limited by their working conditions and sensors may also be subjected to interference by other devices. Many different types of sensor errors such as outliers, missing values, drifts and corruption with noise may occur during process operation. A hybrid online sensor error detection and functional redundancy system is developed to detect errors in online signals, and replace erroneous or missing values detected with model-based estimates. The proposed hybrid system relies on two techniques, an outlier-robust Kalman filter (ORKF) and a locally-weighted partial least squares (LW-PLS) regression model, which leverage the advantages of automatic measurement error elimination with ORKF and data-driven prediction with LW-PLS. The system includes a nominal angle analysis (NAA) method to distinguish between signal faults and large changes in sensor values caused by real dynamic changes in process operation. The performance of the system is illustrated with clinical data from continuous glucose monitoring (CGM) sensors used by people with type 1 diabetes. More than 50,000 CGM sensor errors were added to the original CGM signals from 25 clinical experiments, then the performance of the error detection and functional redundancy algorithms was analyzed. The results indicate that the proposed system can successfully detect most of the erroneous signals and substitute them with reasonable estimates computed by the functional redundancy system.
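
    The outlier-robust behavior of an ORKF can be approximated by innovation gating: when the normalized innovation is implausibly large, the measurement noise is inflated so the sample barely updates the state. A one-dimensional sketch (not the authors' exact filter, and LW-PLS is omitted):

        import numpy as np

        def robust_kf(zs, q=1e-3, r=0.25, gate=3.0):
            x, p, xs = zs[0], 1.0, []
            for z in zs:
                p += q                              # predict
                nu = z - x                          # innovation
                s = p + r                           # innovation variance
                # Inflate measurement noise for gated (outlying) samples.
                r_eff = r if abs(nu) / np.sqrt(s) <= gate else r * 1e3
                k = p / (p + r_eff)                 # Kalman gain
                x, p = x + k * nu, (1 - k) * p      # update
                xs.append(x)
            return np.array(xs)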

  7. Engaging children in the development of obesity interventions: exploring outcomes that matter most among obesity positive outliers

    OpenAIRE

    Sharifi, Mona; Marshall, Gareth; Goldman, Roberta E.; Cunningham, Courtney; Marshall, Richard; Taveras, Elsie M

    2015-01-01

    Objective To explore outcomes and measures of success that matter most to 'positive outlier' children who improved their body mass index (BMI) despite living in obesogenic neighborhoods. Methods We collected residential address and longitudinal height/weight data from electronic health records of 22,657 children ages 6–12 years in Massachusetts. We defined obesity “hotspots” as zip codes where >15% of children had a BMI ≥95th percentile. Using linear mixed effects models, we gener...

  8. Asymptotic analysis of the Forward Search

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    The Forward Search is an iterative algorithm concerned with detection of outliers and other unsuspected structures in data. This approach has been suggested, analysed and applied for regression models in the monograph Atkinson and Riani (2000). An asymptotic analysis of the Forward Search is made...

  9. A novel bi-level meta-analysis approach: applied to biological pathway analysis.

    Science.gov (United States)

    Nguyen, Tin; Tagett, Rebecca; Donato, Michele; Mitrea, Cristina; Draghici, Sorin

    2016-02-01

    The accumulation of high-throughput data in public repositories creates a pressing need for integrative analysis of multiple datasets from independent experiments. However, study heterogeneity, study bias, outliers and the lack of power of available methods present real challenges in integrating genomic data. One practical drawback of many P-value-based meta-analysis methods, including Fisher's, Stouffer's, minP and maxP, is that they are sensitive to outliers. Another drawback is that, because they perform just one statistical test for each individual experiment, they may not fully exploit the potentially large number of samples within each study. We propose a novel bi-level meta-analysis approach that employs the additive method and the Central Limit Theorem within each individual experiment and also across multiple experiments. We prove that the bi-level framework is robust against bias, less sensitive to outliers than other methods, and more sensitive to small changes in signal. For comparative analysis, we demonstrate that the intra-experiment analysis has more power than the equivalent statistical test performed on a single large experiment. For pathway analysis, we compare the proposed framework versus classical meta-analysis approaches (Fisher's, Stouffer's and the additive method) as well as against a dedicated pathway meta-analysis package (MetaPath), using 1252 samples from 21 datasets related to three human diseases, acute myeloid leukemia (9 datasets), type II diabetes (5 datasets) and Alzheimer's disease (7 datasets). Our framework outperforms its competitors in correctly identifying pathways relevant to the phenotypes. The framework is sufficiently general to be applied to any type of statistical meta-analysis. The R scripts are available on demand from the authors. sorin@wayne.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
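
    The additive method with the Central Limit Theorem is simple to state: under the null, independent p-values are Uniform(0,1), so their mean is approximately normal with mean 1/2 and variance 1/(12n). A sketch:

        import numpy as np
        from scipy import stats

        def additive_meta(pvals):
            # Combined p-value from the CLT approximation of the mean of
            # n independent Uniform(0,1) variables under the null.
            p = np.asarray(pvals, dtype=float)
            z = (p.mean() - 0.5) / np.sqrt(1.0 / (12.0 * len(p)))
            return stats.norm.cdf(z)   # small when the p-values pile up near 0

    Because one aberrant study shifts the mean by at most 1/n, this combination is far less sensitive to a single outlying p-value than Fisher's log-based statistic.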

  10. A User-Adaptive Algorithm for Activity Recognition Based on K-Means Clustering, Local Outlier Factor, and Multivariate Gaussian Distribution

    Directory of Open Access Journals (Sweden)

    Shizhen Zhao

    2018-06-01

    Full Text Available Mobile activity recognition is significant to the development of human-centric pervasive applications including elderly care, personalized recommendations, etc. Nevertheless, the distribution of inertial sensor data can be influenced to a great extent by varying users. This means that the performance of an activity recognition classifier trained by one user’s dataset will degenerate when transferred to others. In this study, we focus on building a personalized classifier to detect four categories of human activities: light intensity activity, moderate intensity activity, vigorous intensity activity, and fall. In order to solve the problem caused by different distributions of inertial sensor signals, a user-adaptive algorithm based on K-Means clustering, local outlier factor (LOF, and multivariate Gaussian distribution (MGD is proposed. To automatically cluster and annotate a specific user’s activity data, an improved K-Means algorithm with a novel initialization method is designed. By quantifying the samples’ informative degree in a labeled individual dataset, the most profitable samples can be selected for activity recognition model adaption. Through experiments, we conclude that our proposed models can adapt to new users with good recognition performance.
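
    How LOF can screen a new user's unlabeled data before model adaptation, sketched with scikit-learn on synthetic features (the cluster count and neighbor count are placeholders):

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.neighbors import LocalOutlierFactor

        rng = np.random.default_rng(0)
        X_user = rng.normal(size=(500, 6))       # toy inertial-feature matrix

        # Cluster the unlabeled stream into candidate activity groups ...
        labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_user)

        # ... then drop samples LOF marks as local outliers before adaptation.
        keep = LocalOutlierFactor(n_neighbors=20).fit_predict(X_user) == 1
        X_clean, labels_clean = X_user[keep], labels[keep]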

  11. Micro- and macro-geographic scale effect on the molecular imprint of selection and adaptation in Norway spruce.

    Directory of Open Access Journals (Sweden)

    Marta Scalfi

    Full Text Available Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling-scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal-transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macrogeographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based F(ST)-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the F(ST)-outlier methods detected together 11 F(ST)-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with F(ST)-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. It also

  12. Micro- and macro-geographic scale effect on the molecular imprint of selection and adaptation in Norway spruce.

    Science.gov (United States)

    Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B

    2014-01-01

    Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling-scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal-transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macrogeographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based F(ST)-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the F(ST)-outlier methods detected together 11 F(ST)-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with F(ST)-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. It also suggests that an

  13. OutRank

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Steinhausen, Uwe

    2008-01-01

    Outlier detection is an important data mining task for consistency checks, fraud detection, etc. Binary decision making on whether or not an object is an outlier is not appropriate in many applications and moreover hard to parametrize. Thus, recently, methods for outlier ranking have been proposed...

  14. Anomaly Detection Based on Sensor Data in Petroleum Industry Applications

    Directory of Open Access Journals (Sweden)

    Luis Martí

    2015-01-01

    Full Text Available Anomaly detection is the problem of finding patterns in data that do not conform to an a priori expected behavior. This is related to the problem in which some samples are distant, in terms of a given metric, from the rest of the dataset, where these anomalous samples are indicated as outliers. Anomaly detection has recently attracted the attention of the research community, because of its relevance in real-world applications, like intrusion detection, fraud detection, fault detection and system health monitoring, among many others. Anomalies themselves can have a positive or negative nature, depending on their context and interpretation. However, in either case, it is important for decision makers to be able to detect them in order to take appropriate actions. The petroleum industry is one of the application contexts where these problems are present. The correct detection of such types of unusual information empowers the decision maker with the capacity to act on the system in order to correctly avoid, correct or react to the situations associated with them. In that application context, heavy extraction machines for pumping and generation operations, like turbomachines, are intensively monitored by hundreds of sensors that send measurements at high frequency for damage prevention. In this paper, we propose a combination of yet another segmentation algorithm (YASA), a novel fast and high-quality segmentation algorithm, with a one-class support vector machine approach for efficient anomaly detection in turbomachines. The proposal is meant to deal with the aforementioned task and to cope with the lack of labeled training data. As a result, we perform a series of empirical studies comparing our approach to other methods applied to benchmark problems and a real-life application related to oil platform turbomachinery anomaly detection.
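
    The one-class SVM side of the proposal, sketched with scikit-learn on synthetic per-segment features (YASA itself is not reproduced; nu and the feature layout are assumptions):

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import OneClassSVM

        rng = np.random.default_rng(1)
        train = rng.normal(size=(1000, 8))              # healthy-segment features
        test = np.vstack([rng.normal(size=(50, 8)),
                          rng.normal(loc=4.0, size=(5, 8))])  # 5 anomalies

        scaler = StandardScaler().fit(train)
        ocsvm = OneClassSVM(nu=0.01, gamma="scale").fit(scaler.transform(train))
        pred = ocsvm.predict(scaler.transform(test))    # -1 marks anomalous segments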

  15. Can electoral results be predicted geographically? An application of spatial cluster and outlier analysis

    Directory of Open Access Journals (Sweden)

    Carlos J. Vilalta Perdomo

    2008-01-01

    Full Text Available The results of this study show that by applying spatial statistics to electoral geography it is possible to predict electoral results. The geographic concepts of spatial cluster and spatial outlier are used, with socioeconomic spatial segregation as the predictive variable. The statistical techniques employed are Moran's global and local indices of spatial autocorrelation and linear regression analysis. For the data analyzed, it is found: (1) that Mexico City has spatial clusters of electoral support and of marginalization; (2) spatial outliers of marginalization; (3) that the political parties exclude one another geographically; and (4) that their results depend significantly on the levels of spatial segregation in the city.

  16. Reference-free fatigue crack detection using nonlinear ultrasonic modulation under various temperature and loading conditions

    Science.gov (United States)

    Lim, Hyung Jin; Sohn, Hoon; DeSimio, Martin P.; Brown, Kevin

    2014-04-01

    This study presents a reference-free fatigue crack detection technique using nonlinear ultrasonic modulation. When low frequency (LF) and high frequency (HF) inputs generated by two surface-mounted lead zirconate titanate (PZT) transducers are applied to a structure, the presence of a fatigue crack can provide a mechanism for nonlinear ultrasonic modulation and create spectral sidebands around the frequency of the HF signal. The crack-induced spectral sidebands are isolated using a combination of linear response subtraction (LRS), synchronous demodulation (SD) and continuous wavelet transform (CWT) filtering. Then, a sequential outlier analysis is performed on the extracted sidebands to identify the crack presence without reference to any baseline data obtained from the intact condition of the structure. Finally, the robustness of the proposed technique is demonstrated using actual test data obtained from a simple aluminum plate and complex aircraft fitting-lug specimens under temperature and loading variations.

  17. Applying long short-term memory recurrent neural networks to intrusion detection

    Directory of Open Access Journals (Sweden)

    Ralf C. Staudemeyer

    2015-07-01

    Full Text Available We claim that modelling network traffic as a time series with a supervised learning approach, using known genuine and malicious behaviour, improves intrusion detection. To substantiate this, we trained long short-term memory (LSTM) recurrent neural networks with the training data provided by the DARPA / KDD Cup '99 challenge. To identify suitable LSTM-RNN network parameters and structure we experimented with various network topologies. We found networks with four memory blocks containing two cells each offer a good compromise between computational cost and detection performance. We applied forget gates and shortcut connections respectively. A learning rate of 0.1 and up to 1,000 epochs showed good results. We tested the performance on all features and on extracted minimal feature sets respectively. We evaluated different feature sets for the detection of all attacks within one network and also to train networks specialised on individual attack classes. Our results show that the LSTM classifier provides superior performance in comparison to previously published results of strong static classifiers. With 93.82% accuracy and 22.13 cost, LSTM outperforms the winning entries of the KDD Cup '99 challenge by far. This is due to the fact that LSTM learns to look back in time and correlate consecutive connection records. For the first time ever, we have demonstrated the usefulness of LSTM networks for intrusion detection.
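
    A toy Keras analogue of the setup: short sequences of connection records classified by a small LSTM trained with SGD at the reported learning rate of 0.1. Shapes and data are placeholders, and Keras does not expose the paper's four-block/two-cell topology directly.

        import numpy as np
        from tensorflow import keras

        X = np.random.rand(1000, 10, 41).astype("float32")  # 10 records, 41 features
        y = np.random.randint(0, 2, size=1000)              # attack / normal label

        model = keras.Sequential([
            keras.layers.Input(shape=(10, 41)),
            keras.layers.LSTM(8),                           # small recurrent memory
            keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1),
                      loss="binary_crossentropy", metrics=["accuracy"])
        model.fit(X, y, epochs=5, batch_size=64, verbose=0)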

  18. Gas detection by correlation spectroscopy employing a multimode diode laser.

    Science.gov (United States)

    Lou, Xiutao; Somesfalean, Gabriel; Zhang, Zhiguo

    2008-05-01

    A gas sensor based on the gas-correlation technique has been developed using a multimode diode laser (MDL) in a dual-beam detection scheme. Measurement of CO2 mixed with CO as an interfering gas is successfully demonstrated using a 1570 nm tunable MDL. Despite overlapping absorption spectra and occasional mode hops, the interfering signals can be effectively excluded by a statistical procedure including correlation analysis and outlier identification. The gas concentration is retrieved from several pair-correlated signals by a linear-regression scheme, yielding a reliable and accurate measurement. This demonstrates the utility of unsophisticated MDLs as novel light sources for gas detection applications.

  19. Simultaneous estimation of cross-validation errors in least squares collocation applied for statistical testing and evaluation of the noise variance components

    Science.gov (United States)

    Behnabian, Behzad; Mashhadi Hossainali, Masoud; Malekzadeh, Ahad

    2018-02-01

    The cross-validation technique is a popular method to assess and improve the quality of prediction by least squares collocation (LSC). We present a formula for direct estimation of the vector of cross-validation errors (CVEs) in LSC which is much faster than element-wise CVE computation. We show that a quadratic form of the CVEs follows a Chi-squared distribution. Furthermore, an a posteriori noise variance factor is derived from the quadratic form of the CVEs. In order to detect blunders in the observations, the estimated standardized CVE is proposed as the test statistic, which can be applied when noise variances are known or unknown. We use LSC together with the methods proposed in this research for interpolation of crustal subsidence in the northern coast of the Gulf of Mexico. The results show that after detecting and removing outliers, the root mean square (RMS) of the CVEs and the estimated noise standard deviation are reduced by about 51 and 59%, respectively. In addition, the RMS of the LSC prediction error at data points and the RMS of the estimated observation noise are decreased by 39 and 67%, respectively. However, the RMS of the LSC prediction error on a regular grid of interpolation points covering the area is only reduced by about 4%, which is a consequence of the sparse distribution of data points in this case study. The influence of gross errors on the LSC prediction results is also investigated using lower cutoff CVEs. After elimination of outliers, the RMS of this type of error is also reduced, by 19.5% within a 5 km radius of vicinity. We propose a method using standardized CVEs for classification of the dataset into three groups with presumed different noise variances. The noise variance components for each of the groups are estimated using the restricted maximum-likelihood method via the Fisher scoring technique. Finally, LSC assessment measures were computed for the estimated heterogeneous noise variance model and compared with those of the homogeneous model. The advantage of the proposed method is the
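
    For ordinary least squares, the same "direct" idea has a closed form: leave-one-out cross-validation errors are the ordinary residuals divided by one minus the leverages. A sketch of this analogue (the collocation version additionally involves the signal covariance model):

        import numpy as np

        def loo_cv_errors(A, y):
            Q, _ = np.linalg.qr(A)             # reduced QR of the design matrix
            h = np.sum(Q**2, axis=1)           # leverages: diag of the hat matrix
            e = y - Q @ (Q.T @ y)              # ordinary residuals
            return e / (1.0 - h)               # e_cv,i = e_i / (1 - h_ii)

    Standardizing these CVEs against their estimated standard deviations yields a blunder-detection statistic in the spirit of the one described in the abstract.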

  20. Robust Subjective Visual Property Prediction from Crowdsourced Pairwise Labels.

    Science.gov (United States)

    Fu, Yanwei; Hospedales, Timothy M; Xiang, Tao; Xiong, Jiechao; Gong, Shaogang; Wang, Yizhou; Yao, Yuan

    2016-03-01

    The problem of estimating subjective visual properties from images and videos has attracted increasing interest. A subjective visual property is useful either on its own (e.g. image and video interestingness) or as an intermediate representation for visual recognition (e.g. a relative attribute). Due to its ambiguous nature, annotating the value of a subjective visual property for learning a prediction model is challenging. To make the annotation more reliable, recent studies employ crowdsourcing tools to collect pairwise comparison labels. However, using crowdsourced data also introduces outliers. Existing methods rely on majority voting to prune annotation outliers/errors and thus require a large number of pairwise labels to be collected. More importantly, as a local outlier detection method, majority voting is ineffective in identifying outliers that cause global ranking inconsistencies. In this paper, we propose a more principled way to identify annotation outliers by formulating the subjective visual property prediction task as a unified robust learning-to-rank problem, tackling outlier detection and learning to rank jointly. This differs from existing methods in that (1) the proposed method integrates local pairwise comparison labels together to minimise a cost that corresponds to global inconsistency of ranking order, and (2) the outlier detection and learning-to-rank problems are solved jointly. This not only leads to better detection of annotation outliers but also enables learning with extremely sparse annotations.
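
    A generic, HodgeRank-style toy of the joint "rank + detect outliers" formulation (not the authors' exact model): estimate global scores s from pairwise labels while a sparse term e absorbs inconsistent annotations, minimising ||As + e - y||^2 + lam*||e||_1 by alternating least squares and soft-thresholding. All names are illustrative.

    ```python
    import numpy as np

    def rank_with_outliers(pairs, y, n_items, lam=0.5, iters=100):
        A = np.zeros((len(pairs), n_items))
        for k, (i, j) in enumerate(pairs):       # y[k] ~ s[i] - s[j]
            A[k, i], A[k, j] = 1.0, -1.0
        s = np.zeros(n_items)
        e = np.zeros(len(pairs))
        for _ in range(iters):
            s, *_ = np.linalg.lstsq(A, y - e, rcond=None)   # fit scores
            r = y - A @ s
            e = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)  # soft threshold
        return s, e   # comparisons with large |e| are suspected annotation outliers
    ```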

  1. Shallow Transits—Deep Learning. I. Feasibility Study of Deep Learning to Detect Periodic Transits of Exoplanets

    Science.gov (United States)

    Zucker, Shay; Giryes, Raja

    2018-04-01

    Transits of habitable planets around solar-like stars are expected to be shallow and to have long periods, which means low information content. The current bottleneck in the detection of such transits is caused in large part by the presence of red (correlated) noise in the light curves obtained from the dedicated space telescopes. Based on the groundbreaking results that deep learning achieves in many signal and image processing applications, we propose to use deep neural networks to solve this problem. We present a feasibility study in which we applied a convolutional neural network to a simulated training set. The training set comprised light curves as received from a hypothetical high-cadence space-based telescope. We simulated the red noise using Gaussian Processes with a wide variety of hyper-parameters. We then tested the network on a completely different test set simulated in the same way. Our study proves that very difficult cases can indeed be detected. Furthermore, we show how detection trends can be studied and detection biases quantified. We have also checked the robustness of the neural-network performance against practical artifacts such as outliers and discontinuities, which are known to affect space-based high-cadence light curves. Future work will allow us to use the neural networks to characterize the transit model and identify individual transits. This new approach will certainly be an indispensable tool for the detection of habitable planets in future planet-detection space missions such as PLATO.
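
    A small numpy sketch of the simulation recipe described above: draw red noise from a Gaussian Process with a squared-exponential kernel and inject a shallow box-shaped periodic transit. The kernel form and all hyper-parameters are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 30.0, 1500)                    # days, high cadence
    amp, ell = 3e-4, 0.7                                # GP variance and length scale
    K = amp * np.exp(-0.5 * (t[:, None] - t[None, :])**2 / ell**2)
    red_noise = rng.multivariate_normal(np.zeros_like(t), K + 1e-12 * np.eye(t.size))

    period, t0, duration, depth = 7.3, 1.2, 0.15, 2e-4  # shallow periodic transit
    phase = (t - t0) % period
    flux = 1.0 + red_noise - depth * (phase < duration)
    # 'flux' now mimics one light curve of the training set
    ```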

  2. Evaluation of Robust Estimators Applied to Fluorescence Assays

    Directory of Open Access Journals (Sweden)

    U. Ruotsalainen

    2007-12-01

    Full Text Available We evaluated standard robust methods for the estimation of the fluorescence signal in novel assays used for determining biomolecule concentrations. The objective was to obtain an accurate and reliable estimate using as few observations as possible by decreasing the influence of outliers. We assumed the true signals to have a Gaussian distribution, while no assumptions about the outliers were made. The experimental results showed that the arithmetic mean performs poorly even with modest deviations. The robust methods, especially the M-estimators, performed extremely well. The results prove that the use of robust methods is advantageous in estimation problems where noise and deviations are significant, such as in biological and medical applications.
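
    For concreteness, a minimal sketch of one standard M-estimator, the Huber location estimate computed by iteratively reweighted least squares, compared with the arithmetic mean on a contaminated sample; the tuning constant c = 1.345 is the conventional choice, not a value from the paper.

    ```python
    import numpy as np

    def huber_location(x, c=1.345, iters=50):
        mu = np.median(x)
        scale = 1.4826 * np.median(np.abs(x - np.median(x)))  # MAD-based scale
        for _ in range(iters):
            r = (x - mu) / scale
            w = np.where(np.abs(r) <= c, 1.0, c / np.abs(r))  # Huber weights
            mu = np.sum(w * x) / np.sum(w)
        return mu

    rng = np.random.default_rng(2)
    x = np.concatenate([rng.normal(100.0, 2.0, 45), rng.normal(160.0, 2.0, 5)])
    print(x.mean(), huber_location(x))  # the mean is pulled up; the M-estimate is not
    ```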

  3. Outlier detection in UV/Vis spectrophotometric data

    NARCIS (Netherlands)

    Lepot, M.J.; Aubin, Jean Baptiste; Clemens, F.H.L.R.; Mašić, Alma

    2017-01-01

    UV/Vis spectrophotometers have been used to monitor water quality since the early 2000s. Calibration of these devices requires sampling campaigns to elaborate relations between recorded spectra and measured concentrations. In order to build robust calibration data sets, several spectra must be

  4. Evaluation of the expected moments algorithm and a multiple low-outlier test for flood frequency analysis at streamgaging stations in Arizona

    Science.gov (United States)

    Paretti, Nicholas V.; Kennedy, Jeffrey R.; Cohn, Timothy A.

    2014-01-01

    Flooding is among the costliest natural disasters in terms of loss of life and property in Arizona, which is why accurate estimation of flood frequency and magnitude is crucial for proper structural design and accurate floodplain mapping. Current guidelines for flood frequency analysis in the United States are described in Bulletin 17B (B17B), yet since B17B's publication in 1982 (Interagency Advisory Committee on Water Data, 1982), several improvements have been proposed as updates for future guidelines. Two proposed updates are the Expected Moments Algorithm (EMA), which accommodates historical and censored data, and a generalized multiple Grubbs-Beck (MGB) low-outlier test. The current guidelines use a standard Grubbs-Beck (GB) method to identify low outliers; the choice changes the determination of the moment estimators, because B17B uses a conditional probability adjustment to handle low outliers while EMA censors them. B17B and EMA estimates are identical if no historical information, censored data, or low outliers are present in the peak-flow data. The EMA with MGB (EMA-MGB) approach was compared to the standard B17B (B17B-GB) method for flood frequency analysis at 328 streamgaging stations in Arizona. The methods were compared using the relative percent difference (RPD) between annual exceedance probabilities (AEPs), goodness-of-fit assessments, random resampling procedures, and Monte Carlo simulations. The AEPs were calculated and compared using both station skew and weighted skew. Streamgaging stations were classified by U.S. Geological Survey (USGS) National Water Information System (NWIS) qualification codes, used to denote historical and censored peak-flow data, to better understand the effect that nonstandard flood information has on the flood frequency analysis for each method. Streamgaging stations were also grouped according to geographic flood regions and analyzed separately to better understand regional differences caused by physiography and climate. The B
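
    As background, a sketch of a one-sided Grubbs-type low-outlier screen on log-transformed peak flows. Note that B17B uses tabulated K_N critical values (and MGB generalizes the test to multiple low outliers); the t-distribution critical value below is a standard textbook stand-in, not the Bulletin's procedure.

    ```python
    import numpy as np
    from scipy import stats

    def grubbs_low_outlier(flows, alpha=0.10):
        x = np.log10(np.asarray(flows, dtype=float))
        n = x.size
        g = (x.mean() - x.min()) / x.std(ddof=1)         # one-sided statistic
        tcrit = stats.t.ppf(alpha / n, n - 2)            # lower-tail t quantile
        gcrit = (n - 1) / np.sqrt(n) * np.sqrt(tcrit**2 / (n - 2 + tcrit**2))
        return g > gcrit, 10 ** x.min()                  # flag, candidate low outlier

    flows = [820, 950, 1100, 1200, 1350, 1500, 1700, 2100, 2600, 15]
    flagged, value = grubbs_low_outlier(flows)
    print(flagged, value)                                # the 15 cfs peak is flagged
    ```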

  5. Comparing Candidate Hospital Report Cards

    Energy Technology Data Exchange (ETDEWEB)

    Burr, T.L.; Rivenburgh, R.D.; Scovel, J.C.; White, J.M.

    1997-12-31

    We present graphical and analytical methods that focus on multivariate outlier detection applied to hospital report-card data. No two methods agree on which hospitals are unusually good or bad, so we also present ways to compare the agreement between two methods. We identify factors that have a significant impact on the scoring.

  6. Automatic EEG spike detection.

    Science.gov (United States)

    Harner, Richard

    2009-10-01

    Since the 1970s advances in science and technology during each succeeding decade have renewed the expectation of efficient, reliable automatic epileptiform spike detection (AESD). But even when reinforced with better, faster tools, clinically reliable unsupervised spike detection remains beyond our reach. Expert-selected spike parameters were the first and still most widely used for AESD. Thresholds for amplitude, duration, sharpness, rise-time, fall-time, after-coming slow waves, background frequency, and more have been used. It is still unclear which of these wave parameters are essential, beyond peak-peak amplitude and duration. Wavelet parameters are very appropriate to AESD but need to be combined with other parameters to achieve desired levels of spike detection efficiency. Artificial Neural Network (ANN) and expert-system methods may have reached peak efficiency. Support Vector Machine (SVM) technology focuses on outliers rather than centroids of spike and nonspike data clusters and should improve AESD efficiency. An exemplary spike/nonspike database is suggested as a tool for assessing parameters and methods for AESD and is available in CSV or Matlab formats from the author at brainvue@gmail.com. Exploratory Data Analysis (EDA) is presented as a graphic method for finding better spike parameters and for the step-wise evaluation of the spike detection process.

  7. Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking

    KAUST Repository

    Huang, Huang

    2017-07-16

    This thesis focuses on two topics, computational methods for large spatial datasets and functional data ranking. Both are tackling the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden in fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when the exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements with the hierarchical low-rank approximation and apply the proposed fast kriging to fill gaps for satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Compared to magnitude outliers, it is more challenging to detect shape outliers as they are often masked among samples. We develop a new notion of functional data depth by taking the integration of a univariate depth function. Having a form of the integrated depth, it shares many desirable features. Furthermore, the novel formation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show the proposed outlier detection procedure outperforms competitors in various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames. Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as

  8. Short-term change detection for UAV video

    Science.gov (United States)

    Saur, Günter; Krüger, Wolfgang

    2012-11-01

    In the last years, there has been an increased use of unmanned aerial vehicles (UAV) for video reconnaissance and surveillance. An important application in this context is change detection in UAV video data. Here we address short-term change detection, in which the time between observations ranges from several minutes to a few hours. We distinguish this task from video motion detection (shorter time scale) and from long-term change detection, based on time series of still images taken several days, weeks, or even years apart. Examples of relevant changes we are looking for are recently parked or moved vehicles. As a prerequisite, a precise image-to-image registration is needed. Images are selected on the basis of the geo-coordinates of the sensor's footprint and with respect to a certain minimal overlap. The automatic image-based fine-registration adjusts the image pair to a common geometry by using a robust matching approach to handle outliers. The change detection algorithm has to distinguish between relevant and non-relevant changes. Examples of non-relevant changes are stereo disparity at 3D structures of the scene, changed length of shadows, and compression or transmission artifacts. To detect changes in image pairs we analyzed image differencing, local image correlation, and a transformation-based approach (multivariate alteration detection). As input we used color and gradient magnitude images. To cope with local misalignment of image structures we extended the approaches by a local neighborhood search. The algorithms are applied to several examples covering both urban and rural scenes. The local neighborhood search in combination with intensity and gradient magnitude differencing clearly improved the results. Extended image differencing performed better than both the correlation-based approach and the multivariate alteration detection. The algorithms are adapted to be used in semi-automatic workflows for the ABUL video exploitation system of Fraunhofer
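
    A compact sketch of extended image differencing with a local neighborhood search: for each pixel, take the minimum absolute difference over small shifts of the second image, which suppresses false changes caused by residual misalignment. The 2-pixel search radius is an illustrative assumption, and np.roll's border wrap-around is a simplification.

    ```python
    import numpy as np

    def neighborhood_difference(img1, img2, radius=2):
        h, w = img1.shape
        best = np.full((h, w), np.inf)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                shifted = np.roll(np.roll(img2, dy, axis=0), dx, axis=1)
                best = np.minimum(best, np.abs(img1 - shifted))
        return best   # high values = changes not explained by misalignment
    ```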

  9. Fast clustering using adaptive density peak detection.

    Science.gov (United States)

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include slow algorithm convergence, the instability of pre-specifying a number of intrinsic parameters, and the lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm for cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through nonparametric multivariate kernel estimation. The model parameter can then be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method performs in one single pass without any iteration and thus is fast, with great potential for big data analysis. A user-friendly R package, ADPclust, has been developed for public use.
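
    A toy version of the density-peak idea: estimate each point's local density with a multivariate kernel density estimate, compute delta (the distance to the nearest point of higher density), and pick centers where both are large. Assigning points to the nearest center, as done here, simplifies the original nearest-higher-density-neighbor assignment, and the silhouette-based center selection is omitted.

    ```python
    import numpy as np
    from scipy.stats import gaussian_kde
    from scipy.spatial.distance import cdist

    def density_peaks(X, n_centers=2):
        rho = gaussian_kde(X.T)(X.T)              # kernel density at each point
        D = cdist(X, X)
        delta = np.empty(len(X))
        for i in range(len(X)):
            higher = rho > rho[i]
            delta[i] = D[i, higher].min() if higher.any() else D[i].max()
        gamma = rho * delta                       # centers score high on both
        centers = np.argsort(gamma)[-n_centers:]
        labels = np.argmin(cdist(X, X[centers]), axis=1)  # simplified assignment
        return centers, labels
    ```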

  10. Performances of the New Real Time Tsunami Detection Algorithm applied to tide gauges data

    Science.gov (United States)

    Chierici, F.; Embriaco, D.; Morucci, S.

    2017-12-01

    Real-time tsunami detection algorithms play a key role in any Tsunami Early Warning System. We have developed a new algorithm for tsunami detection (TDA) based on real-time tide removal and real-time band-pass filtering of seabed pressure time series acquired by Bottom Pressure Recorders. The TDA algorithm greatly increases the tsunami detection probability, shortens the detection delay and enhances detection reliability with respect to the most widely used tsunami detection algorithm, while containing the computational cost. The algorithm is designed to be used also in autonomous early warning systems, with a set of input parameters and procedures which can be reconfigured in real time. We have also developed a methodology based on Monte Carlo simulations to test tsunami detection algorithms. The algorithm performance is estimated by defining and evaluating statistical parameters, namely the detection probability and the detection delay, which are functions of the tsunami amplitude and wavelength, and the rate of false alarms. In this work we present the performance of the TDA algorithm applied to tide gauge data. We have adapted the new tsunami detection algorithm and the Monte Carlo test methodology to tide gauges. Sea level data acquired by coastal tide gauges in different locations and environmental conditions have been used in order to consider realistic working scenarios in the test. We also present an application of the algorithm to the tsunami generated by the Tohoku earthquake on March 11th 2011, using data recorded by several tide gauges scattered all over the Pacific area.
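
    A minimal sketch of the signal chain described above: band-pass filter the sea-level series to remove the slow tide and high-frequency noise, then flag samples whose filtered amplitude exceeds a threshold. Band edges, filter order, and threshold are illustrative assumptions, not the TDA's actual parameters.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt

    def detect_tsunami(level, fs, band=(1/3600.0, 1/120.0), thresh=0.05):
        # pass periods between ~2 min and ~1 h, where tsunami energy lies
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        filtered = filtfilt(b, a, level)
        alarms = np.flatnonzero(np.abs(filtered) > thresh)   # threshold in metres
        return filtered, alarms

    fs = 1.0 / 30.0                                          # one sample every 30 s
    t = np.arange(0, 6 * 3600, 30.0)
    tide = 0.8 * np.sin(2 * np.pi * t / (12.42 * 3600))      # semi-diurnal tide
    wave = 0.1 * np.sin(2 * np.pi * t / 600.0) * (t > 4 * 3600)
    _, alarms = detect_tsunami(tide + wave, fs)
    print(alarms.size > 0)                                   # the late wave is flagged
    ```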

  11. Least-Squares Linear Regression and Schrodinger's Cat: Perspectives on the Analysis of Regression Residuals.

    Science.gov (United States)

    Hecht, Jeffrey B.

    The analysis of regression residuals and detection of outliers are discussed, with emphasis on determining how deviant an individual data point must be to be considered an outlier and the impact that multiple suspected outlier data points have on the process of outlier determination and treatment. Only bivariate (one dependent and one independent)…
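
    One standard way to quantify how deviant an individual point is, in the spirit of the discussion above, is the externally studentized residual from a bivariate least-squares fit, with |r| > 2 as a common rule of thumb; a minimal numpy sketch:

    ```python
    import numpy as np

    def studentized_residuals(x, y):
        X = np.column_stack([np.ones_like(x), x])
        H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
        e = y - H @ y
        n, p = X.shape
        s2 = e @ e / (n - p)
        # leave-one-out (external) variance estimate for each point
        s2_i = (s2 * (n - p) - e**2 / (1 - np.diag(H))) / (n - p - 1)
        return e / np.sqrt(s2_i * (1 - np.diag(H)))

    rng = np.random.default_rng(3)
    x = np.linspace(0, 10, 40)
    y = 2.0 + 0.5 * x + rng.normal(0, 0.3, 40)
    y[10] += 3.0                                       # planted outlier
    r = studentized_residuals(x, y)
    print(np.flatnonzero(np.abs(r) > 2))               # should include index 10
    ```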

  12. Application of surface enhanced Raman scattering and competitive adaptive reweighted sampling on detecting furfural dissolved in transformer oil

    Directory of Open Access Journals (Sweden)

    Weigen Chen

    2018-03-01

    Full Text Available Detecting furfural dissolved in mineral oil is an essential technique for evaluating the ageing condition of oil-paper insulation and the degradation of its mechanical properties. Compared with traditional detection methods, Raman spectroscopy is convenient and timesaving in operation. This study explored the application of surface enhanced Raman scattering (SERS) to quantitative analysis of furfural dissolved in oil. Oil solutions with different concentrations of furfural were prepared and calibrated by high-performance liquid chromatography. Confocal laser Raman spectroscopy (CLRS) and SERS were employed to acquire Raman spectral data. Monte Carlo cross validation (MCCV) was used to eliminate outliers in the sample set; then competitive adaptive reweighted sampling (CARS) was used to select an optimal combination of informative variables that best reflect the chemical properties of concern. Based on the selected Raman spectral features, a support vector machine (SVM) combined with particle swarm optimization (PSO) was used to build a furfural quantitative analysis model. Finally, the generalization ability and prediction precision of the established method were verified on samples prepared in the laboratory. In summary, a new spectral method is proposed to quickly detect furfural in oil, which lays a foundation for evaluating the ageing of oil-paper insulation in oil-immersed electrical equipment.

  13. Application of surface enhanced Raman scattering and competitive adaptive reweighted sampling on detecting furfural dissolved in transformer oil

    Science.gov (United States)

    Chen, Weigen; Zou, Jingxin; Wan, Fu; Fan, Zhou; Yang, Dingkun

    2018-03-01

    Detecting furfural dissolved in mineral oil is an essential technique for evaluating the ageing condition of oil-paper insulation and the degradation of its mechanical properties. Compared with traditional detection methods, Raman spectroscopy is convenient and timesaving in operation. This study explored the application of surface enhanced Raman scattering (SERS) to quantitative analysis of furfural dissolved in oil. Oil solutions with different concentrations of furfural were prepared and calibrated by high-performance liquid chromatography. Confocal laser Raman spectroscopy (CLRS) and SERS were employed to acquire Raman spectral data. Monte Carlo cross validation (MCCV) was used to eliminate outliers in the sample set; then competitive adaptive reweighted sampling (CARS) was used to select an optimal combination of informative variables that best reflect the chemical properties of concern. Based on the selected Raman spectral features, a support vector machine (SVM) combined with particle swarm optimization (PSO) was used to build a furfural quantitative analysis model. Finally, the generalization ability and prediction precision of the established method were verified on samples prepared in the laboratory. In summary, a new spectral method is proposed to quickly detect furfural in oil, which lays a foundation for evaluating the ageing of oil-paper insulation in oil-immersed electrical equipment.

  14. Autoimmune hepatitis in a teenage boy: 'overlap' or 'outlier' syndrome--dilemma for internists.

    Science.gov (United States)

    Talukdar, Arunansu; Khanra, Dibbendhu; Mukherjee, Kabita; Saha, Manjari

    2013-02-08

    An 18-year-old boy presented with upper gastrointestinal bleeding and jaundice. Investigations revealed coarse hepatomegaly, splenomegaly and advanced oesophageal varices. Blood reports showed a marked rise of alkaline phosphatase and a more than twofold rise of transaminases and IgG. Liver histology was suggestive of piecemeal necrosis, interface hepatitis and bile duct proliferation. Antinuclear antibody was positive in high titre, along with positive anti-smooth muscle antibody and antimitochondrial antibody. The patient was positive for the human leukocyte antigen DR3 type. Although an 'overlap' syndrome exists between autoimmune hepatitis (AIH) and primary biliary cirrhosis (PBC), a cholestatic variant of AIH, the rare 'outlier' syndrome, could not be excluded in our case. Moreover, 'the chicken or the egg', AIH or PBC, the dilemma for the internists continued. The patient was put on steroid and ursodeoxycholic acid with unsatisfactory response. The existing international criteria for diagnosis of AIH are not generous enough to accommodate its variant forms.

  15. Automated Detection of Knickpoints and Knickzones Across Transient Landscapes

    Science.gov (United States)

    Gailleton, B.; Mudd, S. M.; Clubb, F. J.

    2017-12-01

    Mountainous regions are ubiquitously dissected by river channels, which transmit climate and tectonic signals to the rest of the landscape by adjusting their long profiles. Fluvial response to allogenic forcing is often expressed through the upstream propagation of steepened reaches, referred to as knickpoints or knickzones. The identification and analysis of these steepened reaches have numerous applications in geomorphology, such as modelling long-term landscape evolution, understanding controls on fluvial incision, and constraining tectonic uplift histories. Traditionally, the identification of knickpoints or knickzones from fluvial profiles requires manual selection or calibration. This process is both time-consuming and subjective, as different workers may select different steepened reaches within the profile. We propose an objective, statistically-based method to systematically pick knickpoints/knickzones on a landscape scale using an outlier-detection algorithm. Our method integrates river profiles normalised by drainage area (Chi, using the approach of Perron and Royden, 2013), then separates the chi-elevation plots into a series of transient segments using the method of Mudd et al. (2014). This method allows the systematic detection of knickpoints across a DEM, regardless of size, using a high-performance algorithm implemented in the open-source Edinburgh Land Surface Dynamics Topographic Tools (LSDTopoTools) software package. After initial knickpoint identification, outliers are selected using several sorting and binning methods based on the Median Absolute Deviation, to avoid the influence of sample size. We test our method on a series of DEMs and grid resolutions, and show that our method consistently identifies accurate knickpoint locations across each landscape tested.
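
    The Median-Absolute-Deviation selection step can be sketched in a few lines: flag candidate knickpoints whose steepness change has a modified z-score above a cutoff (3.5 is a common default; the cutoff and input values here are illustrative).

    ```python
    import numpy as np

    def mad_outliers(values, cutoff=3.5):
        med = np.median(values)
        mad = np.median(np.abs(values - med))
        mod_z = 0.6745 * (values - med) / mad       # 0.6745 ~ Phi^-1(0.75)
        return np.abs(mod_z) > cutoff

    steepness_change = np.array([0.1, 0.2, 0.15, 0.12, 2.4, 0.18, 0.11])
    print(np.flatnonzero(mad_outliers(steepness_change)))   # -> [4]
    ```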

  16. Robust motion correction and outlier rejection of in vivo functional MR images of the fetal brain and placenta during maternal hyperoxia

    OpenAIRE

    You, Wonsang; Serag, Ahmed; Evangelou, Iordanis E.; Andescavage, Nickie; Limperopoulos, Catherine

    2017-01-01

    Subject motion is a major challenge in functional magnetic resonance imaging studies (fMRI) of the fetal brain and placenta during maternal hyperoxia. We propose a motion correction and volume outlier rejection method for the correction of severe motion artifacts in both fetal brain and placenta. The method is optimized to the experimental design by processing different phases of acquisition separately. It also automatically excludes high-motion volumes and all the missing data are regressed ...

  17. Robust motion correction and outlier rejection of in vivo functional MR images of the fetal brain and placenta during maternal hyperoxia

    OpenAIRE

    You, Wonsang; Serag, Ahmed; Evangelou, Iordanis E.; Andescavage, Nickie; Limperopoulos, Catherine

    2015-01-01

    Subject motion is a major challenge in functional magnetic resonance imaging studies (fMRI) of the fetal brain and placenta during maternal hyperoxia. We propose a motion correction and volume outlier rejection method for the correction of severe motion artifacts in both fetal brain and placenta. The method is optimized to the experimental design by processing different phases of acquisition separately. It also automatically excludes high-motion volumes and all the missing data are regressed ...

  18. Implicitly Weighted Methods in Robust Image Analysis

    Czech Academy of Sciences Publication Activity Database

    Kalina, Jan

    2012-01-01

    Roč. 44, č. 3 (2012), s. 449-462 ISSN 0924-9907 R&D Projects: GA MŠk(CZ) 1M06014 Institutional research plan: CEZ:AV0Z10300504 Keywords : robustness * high breakdown point * outlier detection * robust correlation analysis * template matching * face recognition Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.767, year: 2012

  19. Fault Detection Based on Tracking Differentiator Applied on the Suspension System of Maglev Train

    Directory of Open Access Journals (Sweden)

    Hehong Zhang

    2015-01-01

    Full Text Available A fault detection method based on an optimized tracking differentiator is introduced and applied to the acceleration sensor of the suspension system of a maglev train. It detects faults in the acceleration sensor by comparing the integral of the acceleration signal with the speed signal obtained from the optimized tracking differentiator. This paper optimizes the control variable when the states lie within or beyond the two-step reachable region in order to improve the performance of the approximate linear discrete tracking differentiator. Fault-tolerant control is achieved by feeding back the speed signal acquired from the optimized tracking differentiator when the acceleration sensor fails. The simulation and experiment results show the practical usefulness of the presented method.
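
    A simplified numpy sketch of the residual test described above: integrate the measured acceleration (plain trapezoidal integration standing in for the optimized tracking differentiator), compare with the speed signal, and flag a fault where the residual exceeds a threshold; the threshold is an illustrative assumption.

    ```python
    import numpy as np

    def acc_sensor_fault(acc, speed, dt, thresh=0.5):
        # trapezoidal integration of acceleration, anchored at the initial speed
        v_int = np.concatenate(([speed[0]],
                                speed[0] + np.cumsum(0.5 * (acc[1:] + acc[:-1]) * dt)))
        residual = np.abs(v_int - speed)
        return residual > thresh             # True where a sensor fault is flagged
    ```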

  20. Fusion of an Ensemble of Augmented Image Detectors for Robust Object Detection.

    Science.gov (United States)

    Wei, Pan; Ball, John E; Anderson, Derek T

    2018-03-17

    A significant challenge in object detection is accurate identification of an object's position in image space; one algorithm with one set of parameters is usually not enough, and the fusion of multiple algorithms and/or parameters can lead to more robust results. Herein, a new computational intelligence fusion approach based on the dynamic analysis of agreement among object detection outputs is proposed. Furthermore, we propose an online (versus training-only) image augmentation strategy. Experiments comparing the results with and without fusion are presented. We demonstrate that the combination of augmentation and fusion yields the best results, with respect to higher accuracy rates and reduced outlier influence. The approach is demonstrated in the context of cone, pedestrian and box detection for Advanced Driver Assistance Systems (ADAS) applications.

  1. Fusion of an Ensemble of Augmented Image Detectors for Robust Object Detection

    Directory of Open Access Journals (Sweden)

    Pan Wei

    2018-03-01

    Full Text Available A significant challenge in object detection is accurate identification of an object's position in image space; one algorithm with one set of parameters is usually not enough, and the fusion of multiple algorithms and/or parameters can lead to more robust results. Herein, a new computational intelligence fusion approach based on the dynamic analysis of agreement among object detection outputs is proposed. Furthermore, we propose an online (versus training-only) image augmentation strategy. Experiments comparing the results with and without fusion are presented. We demonstrate that the combination of augmentation and fusion yields the best results, with respect to higher accuracy rates and reduced outlier influence. The approach is demonstrated in the context of cone, pedestrian and box detection for Advanced Driver Assistance Systems (ADAS) applications.

  2. Total Variation Depth for Functional Data

    KAUST Repository

    Huang, Huang

    2016-11-15

    There has been extensive work on data depth-based methods for robust multivariate data analysis. Recent developments have moved to infinite-dimensional objects such as functional data. In this work, we propose a new notion of depth, the total variation depth, for functional data. As a measure of depth, its properties are studied theoretically, and the associated outlier detection performance is investigated through simulations. Compared to magnitude outliers, shape outliers are often masked among the rest of samples and harder to identify. We show that the proposed total variation depth has many desirable features and is well suited for outlier detection. In particular, we propose to decompose the total variation depth into two components that are associated with shape and magnitude outlyingness, respectively. This decomposition allows us to develop an effective procedure for outlier detection and useful visualization tools, while naturally accounting for the correlation in functional data. Finally, the proposed methodology is demonstrated using real datasets of curves, images, and video frames.
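
    As a toy illustration of the integrated-depth idea (the total variation depth itself is more involved), one can average a simple rank-based univariate depth over time; curves with low integrated depth become outlier candidates:

    ```python
    import numpy as np

    def integrated_depth(curves):                # curves: (n_curves, n_times)
        n = curves.shape[0]
        ranks = curves.argsort(axis=0).argsort(axis=0) + 1
        pointwise = 1.0 - np.abs(2 * ranks - n - 1) / n   # deepest at the median
        return pointwise.mean(axis=1)            # integrate (average) over time

    rng = np.random.default_rng(4)
    t = np.linspace(0, 1, 100)
    curves = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=(20, 100))
    curves[0] += 1.5                             # planted magnitude outlier
    print(integrated_depth(curves).argmin())     # expected: 0
    ```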

  3. Empirical Tryout of a New Statistic for Detecting Temporally Inconsistent Responders.

    Science.gov (United States)

    Kerry, Matthew J

    2018-01-01

    Statistical screening of self-report data is often advised to support the quality of analyzed responses - for example, to reduce insufficient effort responding (IER). One recently introduced index, based on Mahalanobis's D for detecting outliers in cross-sectional designs, replaces centered scores with difference scores between repeated-measure items: termed person temporal consistency (D2ptc). Although the adapted D2ptc index demonstrated usefulness in simulation datasets, it has not been applied to empirical data. The current study addresses D2ptc's low uptake by critically appraising its performance across three empirical applications. Independent samples were selected to represent a range of scenarios commonly encountered by organizational researchers. First, in Sample 1, a repeated measure of future time perspective (FTP) in experienced working adults (age > 40 years; n = 620) indicated that temporal inconsistency was significantly related to respondent age and item reverse-scoring. Second, in repeated measures of team efficacy aggregations, D2ptc successfully detected team-level inconsistency across repeated performance cycles. Third, the usefulness of D2ptc was examined in an experimental dataset of subjective life expectancy, which indicated significantly more stable responding in experimental conditions compared to controls. The empirical findings support D2ptc's flexible and useful application to distinct study designs. Discussion centers on current limitations and further extensions that may be of value to psychologists screening self-report data to strengthen response quality and the meaningfulness of inferences from repeated-measures self-reports. Taken together, the findings support the usefulness of the newly devised statistic for detecting IER and other extreme response patterns.
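
    The core computation behind an index of this kind can be sketched directly: replace the centered scores in Mahalanobis's D-squared with item-level difference scores between the two measurement occasions, then flag respondents whose statistic exceeds a chi-squared quantile. This is a generic sketch, not the published D2ptc procedure, and all names are illustrative.

    ```python
    import numpy as np
    from scipy import stats

    def d2_temporal(X_t1, X_t2, alpha=0.01):
        D = X_t2 - X_t1                          # (respondents, items) differences
        Dc = D - D.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(Dc, rowvar=False))
        d2 = np.einsum("ij,jk,ik->i", Dc, cov_inv, Dc)   # row-wise quadratic form
        flags = d2 > stats.chi2.ppf(1 - alpha, D.shape[1])
        return d2, flags

    rng = np.random.default_rng(5)
    t1 = rng.normal(size=(200, 6))                       # 6 illustrative items
    t2 = t1 + rng.normal(scale=0.3, size=(200, 6))       # mostly consistent retest
    t2[0] = -t1[0]                                       # one inconsistent responder
    print(d2_temporal(t1, t2)[1].nonzero()[0])           # expected: [0]
    ```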

  4. Damage detection in carbon composite material typical of wind turbine blades using auto-associative neural networks

    Science.gov (United States)

    Dervilis, N.; Barthorpe, R. J.; Antoniadou, I.; Staszewski, W. J.; Worden, K.

    2012-04-01

    The structure of a wind turbine blade plays a vital role in the mechanical and structural operation of the turbine. As new generations of offshore wind turbines try to achieve a leading role in the energy market, key challenges such as reliable Structural Health Monitoring (SHM) of the blades are significant for the economic and structural efficiency of wind energy. Fault diagnosis of wind turbine blades is a "grand challenge" due to their composite nature, weight and length. The damage detection procedure involves additional difficulties associated with aerodynamic loads, environmental conditions and gravitational loads. It will be shown that vibration dynamic response data combined with AANNs is a robust and powerful tool, offering online, real-time damage prediction. In this study the features used for SHM are Frequency Response Functions (FRFs) acquired via experimental methods based on an LMS system, by which identification of mode shapes and natural frequencies is accomplished. The methods used are statistical outlier analysis, which allows a diagnosis of deviation from normality, and an Auto-Associative Neural Network (AANN). Both of these techniques are trained on the FRF data for normal and damaged conditions. The AANN is a method which has not yet been widely used in condition monitoring of the composite materials of blades. This paper introduces a new scheme for damage detection, localisation and severity assessment by adopting simple measurements such as FRFs and exploiting multilayer neural networks and outlier novelty detection.

  5. AFLP genome scanning reveals divergent selection in natural populations of Liriodendron chinense (Magnoliaceae) along a latitudinal transect

    Directory of Open Access Journals (Sweden)

    Aihong eYang

    2016-05-01

    Full Text Available Understanding adaptive genetic variation and its relation to environmental factors is important for understanding how plants adapt to climate change and for managing genetic resources. Genome scans for loci exhibiting either notably high or low levels of population differentiation (outlier loci) provide one means of identifying genomic regions possibly associated with convergent or divergent selection. In this study, we combined an AFLP genome scan and environmental association analysis to test for signals of natural selection in natural populations of Liriodendron chinense (Chinese tulip tree; Magnoliaceae) along a latitudinal transect. We genotyped 276 individuals from 11 populations of L. chinense using 987 AFLP markers. Two complementary methods (Dfdist and BayeScan) and association analysis between AFLP loci and climate factors were applied to detect outlier loci. Our analyses recovered both neutral and potentially adaptive genetic differentiation among populations of L. chinense. We found moderate genetic diversity within populations and high genetic differentiation among populations, with reduced genetic diversity towards the periphery of the species' range. Nine AFLP marker loci showed evidence of being outliers for population differentiation under both detection methods. Of these, six were strongly associated with at least one climate factor. Temperature, precipitation and radiation were found to be three important factors influencing local adaptation of L. chinense. The outlier AFLP loci are likely not themselves the targets of natural selection, but genes neighbouring these loci might be involved in local adaptation. Hence, these candidates should be validated by further studies.

  6. Outlier Removal in Model-Based Missing Value Imputation for Medical Datasets

    Directory of Open Access Journals (Sweden)

    Min-Wei Huang

    2018-01-01

    Full Text Available Many real-world medical datasets contain some proportion of missing (attribute) values. In general, missing value imputation can be performed to solve this problem, which is to provide estimations for the missing values by a reasoning process based on the (complete) observed data. However, if the observed data contain noisy information or outliers, the estimations of the missing values may not be reliable or may even be quite different from the real values. The aim of this paper is to examine whether a combination of instance selection from the observed data and missing value imputation offers better performance than performing missing value imputation alone. In particular, three instance selection algorithms, DROP3, GA, and IB3, and three imputation algorithms, KNNI, MLP, and SVM, are used in order to find the best combination. The experimental results show that performing instance selection can have a positive impact on missing value imputation for the numerical medical datasets, and that specific combinations of instance selection and imputation methods can improve the imputation results for the mixed-type medical datasets. However, instance selection does not have a definitely positive impact on the imputation results for categorical medical datasets.
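
    A hypothetical sketch of the "instance selection before imputation" pipeline using off-the-shelf scikit-learn pieces; IsolationForest stands in for the paper's DROP3/GA/IB3 selectors and KNNImputer for KNNI, so this illustrates the workflow rather than reproducing the study.

    ```python
    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.impute import KNNImputer

    def select_then_impute(X):
        complete = ~np.isnan(X).any(axis=1)
        keep = np.ones(len(X), dtype=bool)
        # screen only the complete cases; -1 marks detected outliers
        inlier = IsolationForest(random_state=0).fit_predict(X[complete]) == 1
        keep[np.flatnonzero(complete)[~inlier]] = False
        imputer = KNNImputer(n_neighbors=5).fit(X[keep])   # fit on cleaned pool
        return imputer.transform(X)                        # impute all rows
    ```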

  7. GMDH and neural networks applied in monitoring and fault detection in sensors in nuclear power plants

    Energy Technology Data Exchange (ETDEWEB)

    Bueno, Elaine Inacio [Instituto Federal de Educacao, Ciencia e Tecnologia, Guarulhos, SP (Brazil); Pereira, Iraci Martinez; Silva, Antonio Teixeira e, E-mail: martinez@ipen.b, E-mail: teixeira@ipen.b [Instituto de Pesquisas Energeticas e Nucleares (IPEN/CNEN-SP), Sao Paulo, SP (Brazil)

    2011-07-01

    In this work a new monitoring and fault detection methodology was developed using the GMDH (Group Method of Data Handling) algorithm and artificial neural networks (ANNs), and applied to the IEA-R1 research reactor at IPEN. The monitoring and fault detection system was developed in two parts: the first dedicated to preprocessing information using the GMDH algorithm, and the second to processing information using ANNs. The preprocessing was divided in two parts. In the first part, the GMDH algorithm was used to generate a better database estimate, called matrix z, which was used to train the ANNs. In the second part the GMDH was used to select the best set of variables for training the ANNs, resulting in the best estimate of the monitored variables. The methodology was developed and tested using five different models: one theoretical model and four models using different sets of reactor variables. After an exhaustive study dedicated to sensor monitoring, fault detection in sensors was developed by simulating faults in the sensor database using deviations of +5%, +10%, +15% and +20%. The good results obtained with the present methodology show the viability of using the GMDH algorithm to select the best input variables for the ANNs, thus making possible the use of these methods in the implementation of a new monitoring and fault detection methodology applied to sensors. (author)

  8. GMDH and neural networks applied in monitoring and fault detection in sensors in nuclear power plants

    International Nuclear Information System (INIS)

    Bueno, Elaine Inacio; Pereira, Iraci Martinez; Silva, Antonio Teixeira e

    2011-01-01

    In this work a new monitoring and fault detection methodology was developed using the GMDH (Group Method of Data Handling) algorithm and artificial neural networks (ANNs), and applied to the IEA-R1 research reactor at IPEN. The monitoring and fault detection system was developed in two parts: the first dedicated to preprocessing information using the GMDH algorithm, and the second to processing information using ANNs. The preprocessing was divided in two parts. In the first part, the GMDH algorithm was used to generate a better database estimate, called matrix z, which was used to train the ANNs. In the second part the GMDH was used to select the best set of variables for training the ANNs, resulting in the best estimate of the monitored variables. The methodology was developed and tested using five different models: one theoretical model and four models using different sets of reactor variables. After an exhaustive study dedicated to sensor monitoring, fault detection in sensors was developed by simulating faults in the sensor database using deviations of +5%, +10%, +15% and +20%. The good results obtained with the present methodology show the viability of using the GMDH algorithm to select the best input variables for the ANNs, thus making possible the use of these methods in the implementation of a new monitoring and fault detection methodology applied to sensors. (author)

  9. Efficient alpha particle detection by CR-39 applying 50 Hz-HV electrochemical etching method

    International Nuclear Information System (INIS)

    Sohrabi, M.; Soltani, Z.

    2016-01-01

    Alpha particles can be detected by CR-39 by applying either chemical etching (CE), electrochemical etching (ECE), or combined pre-etching and ECE, usually through a multi-step HF-HV ECE process at temperatures much higher than room temperature. With pre-etching, the characteristic responses of fast-neutron-induced recoil tracks in CR-39 under HF-HV ECE versus KOH normality (N) show two high-sensitivity peaks around 5–6 and 15–16 N and a large-diameter peak with a minimum sensitivity around 10–11 N at 25°C. On the other hand, the 50 Hz-HV ECE method recently advanced in our laboratory detects alpha particles with high efficiency and a broad registration energy range with small ECE tracks in polycarbonate (PC) detectors. By taking advantage of the sensitivity of CR-39 to alpha particles, the efficacy of the 50 Hz-HV ECE method, and the exotic responses of CR-39 under different KOH normalities, the detection characteristics of 0.8 MeV alpha particle tracks were studied in 500 μm CR-39 for different fluences, ECE durations and KOH normalities. Alpha registration efficiency increased with ECE duration to 90 ± 2% after 6–8 h, beyond which a plateau is reached. Alpha track density versus fluence is linear up to 10⁶ tracks cm⁻². The efficiency and mean track diameter decrease as the fluence increases, up to 10⁶ alphas cm⁻². Background track density and minimum detection limit are linear functions of ECE duration and increase as normality increases. The CR-39 processed for the first time in this study by the 50 Hz-HV ECE method proved to provide a simple, efficient and practical alpha detection method at room temperature. - Highlights: • Alpha particles of 0.8 MeV were detected in CR-39 by the 50 Hz-HV ECE method. • Efficiency/track diameter was studied vs fluence and time for 3 KOH normalities. • Background track density and minimum detection limit vs duration were studied. • A new simple, efficient and low-cost alpha detection method

  10. MIDAS robust trend estimator for accurate GPS station velocities without step detection

    Science.gov (United States)

    Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

    2016-03-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes v_ij = (x_j - x_i)/(t_j - t_i) computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.
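
    The MIDAS recipe can be sketched compactly: form slopes only between data pairs separated by about one year, take their median, trim slopes outside a scaled-MAD window, and recompute the median with a robust uncertainty. Time units of years, the pairing tolerance, and the simplified uncertainty scaling are assumptions of this sketch.

    ```python
    import numpy as np

    def midas_like_trend(t, x, pair_dt=1.0, tol=0.01):
        """t in years; returns (trend, 1-sigma uncertainty)."""
        slopes = []
        for i in range(len(t)):
            j = int(np.argmin(np.abs(t - (t[i] + pair_dt))))
            if j != i and abs(t[j] - t[i] - pair_dt) < tol:
                slopes.append((x[j] - x[i]) / (t[j] - t[i]))
        slopes = np.asarray(slopes)
        med = np.median(slopes)
        mad = 1.4826 * np.median(np.abs(slopes - med))       # robust sigma
        trimmed = slopes[np.abs(slopes - med) < 2.0 * mad]   # drop one-sided outliers
        med2 = np.median(trimmed)
        # 1.2533 = sqrt(pi/2), standard error of a median; MIDAS applies further
        # corrections for correlated slopes that are omitted here.
        sigma = (1.2533 * 1.4826 * np.median(np.abs(trimmed - med2))
                 / np.sqrt(len(trimmed)))
        return med2, sigma
    ```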

  11. Damage Detection in an Operating Vestas V27 Wind Turbine Blade by use of Outlier Analysis

    DEFF Research Database (Denmark)

    Ulriksen, Martin Dalgaard; Tcherniak, Dmitri; Damkilde, Lars

    2015-01-01

    The present paper explores the application of a well-established vibration-based damage detection method to an operating Vestas V27 wind turbine blade. The blade is analyzed in a total of four states, namely, a healthy one plus three damaged ones in which trailing edge openings of increasing sizes...

  12. Protein Detection Using the Multiplexed Proximity Extension Assay (PEA) from Plasma and Vaginal Fluid Applied to the Indicating FTA Elute Micro Card™

    Science.gov (United States)

    Berggrund, Malin; Ekman, Daniel; Gustavsson, Inger; Sundfeldt, Karin; Olovsson, Matts; Enroth, Stefan; Gyllensten, Ulf

    2016-01-01

    The indicating FTA elute micro card™ has been developed to collect and stabilize the nucleic acid in biological samples and is widely used in human and veterinary medicine and other disciplines. This card is not recommended for protein analyses, since surface treatment may denature proteins. We studied the ability to analyse proteins in human plasma and vaginal fluid as applied to the indicating FTA elute micro card™ using the sensitive proximity extension assay (PEA). Among 92 proteins in the Proseek Multiplex Oncology Iv2 panel, 87 were above the limit of detection (LOD) in liquid plasma and 56 of 92 were above LOD in plasma applied to FTA cards. Washing and protein elution protocols were compared to identify an optimal method. Liquid-based cytology samples showed a lower number of proteins above LOD than FTA cards with vaginal fluid samples applied. Our results demonstrate that samples applied to the indicating FTA elute micro card™ are amenable to protein analyses, given that a sensitive protein detection assay is used. The results imply that biological samples applied to FTA cards can be used for DNA, RNA and protein detection. PMID:28936257

  13. Protein Detection Using the Multiplexed Proximity Extension Assay (PEA) from Plasma and Vaginal Fluid Applied to the Indicating FTA Elute Micro Card™

    Directory of Open Access Journals (Sweden)

    Malin Berggrund

    2016-01-01

    Full Text Available The indicating FTA elute micro card™ has been developed to collect and stabilize the nucleic acid in biological samples and is widely used in human and veterinary medicine and other disciplines. This card is not recommended for protein analyses, since surface treatment may denature proteins. We studied the ability to analyse proteins in human plasma and vaginal fluid as applied to the indicating FTA elute micro card™ using the sensitive proximity extension assay (PEA). Among 92 proteins in the Proseek Multiplex Oncology Iv2 panel, 87 were above the limit of detection (LOD) in liquid plasma and 56 of 92 were above LOD in plasma applied to FTA cards. Washing and protein elution protocols were compared to identify an optimal method. Liquid-based cytology samples showed a lower number of proteins above LOD than FTA cards with vaginal fluid samples applied. Our results demonstrate that samples applied to the indicating FTA elute micro card™ are amenable to protein analyses, given that a sensitive protein detection assay is used. The results imply that biological samples applied to FTA cards can be used for DNA, RNA and protein detection.

  14. Protein Detection Using the Multiplexed Proximity Extension Assay (PEA) from Plasma and Vaginal Fluid Applied to the Indicating FTA Elute Micro Card™.

    Science.gov (United States)

    Berggrund, Malin; Ekman, Daniel; Gustavsson, Inger; Sundfeldt, Karin; Olovsson, Matts; Enroth, Stefan; Gyllensten, Ulf

    2016-01-01

    The indicating FTA elute micro card™ has been developed to collect and stabilize the nucleic acid in biological samples and is widely used in human and veterinary medicine and other disciplines. This card is not recommended for protein analyses, since surface treatment may denature proteins. We studied the ability to analyse proteins in human plasma and vaginal fluid as applied to the indicating FTA elute micro card™ using the sensitive proximity extension assay (PEA). Among 92 proteins in the Proseek Multiplex Oncology Iv2 panel, 87 were above the limit of detection (LOD) in liquid plasma and 56 of 92 were above LOD in plasma applied to FTA cards. Washing and protein elution protocols were compared to identify an optimal method. Liquid-based cytology samples showed a lower number of proteins above LOD than FTA cards with vaginal fluid samples applied. Our results demonstrate that samples applied to the indicating FTA elute micro card™ are amenable to protein analyses, given that a sensitive protein detection assay is used. The results imply that biological samples applied to FTA cards can be used for DNA, RNA and protein detection.

  15. Optical fiber applied to radiation detection

    Energy Technology Data Exchange (ETDEWEB)

    Junior, Francisco A.B.; Costa, Antonella L.; Oliveira, Arno H. de; Vasconcelos, Danilo C., E-mail: fanbra@yahoo.com.br, E-mail: antonella@nuclear.ufmg.br, E-mail: heeren@nuclear.ufmg.br, E-mail: danilochagas@yahoo.com.br [Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG (Brazil). Escola de Engenharia. Departamento de Engenharia Nuclear

    2015-07-01

    In recent years, the production of optical fiber cables has made possible the development of a range of spectroscopic probes for in situ analysis, nondestructive testing, environmental monitoring, security investigation, and applications in radiotherapy for dose monitoring, verification and validation. In this work, a system using an optical fiber cable for light signal transmission from a NaI(Tl) radiation detector is presented. The innovative device takes advantage mainly of the optical fiber's small signal attenuation and immunity to electromagnetic interference for application in radiation detection systems. The main aim was to simplify the detection system, enabling it to reach areas that conventional devices cannot access due to their lack of mobility and external dimensions. Some tests with this innovative system are presented, and the results encourage the continuation of the research. (author)

  16. Risetime discrimination applied to pressurized Xe gas proportional counter for hard x-ray detection

    International Nuclear Information System (INIS)

    Fujii, Masami; Doi, Kosei

    1978-01-01

    A high pressure Xe proportional counter has been developed for hard X-ray observation. This counter has better energy-resolving power than a NaI scintillation counter, and large detection areas are relatively easy to realize. The counter is constructed from a cylindrical aluminum tube that can be used at 40 atmospheres pressure. Detection efficiency curves were obtained as a function of gas pressure. It is necessary to reduce impurities in the Xe gas to increase the energy-resolving power of the counter; increasing the gas pressure made the resolving power worse. The characteristics of the counter were stable for at least a few months. Waveform discrimination was applied to reduce background signals such as pulses caused by charged particles and gamma-rays. This method had previously been used for normal-pressure counters, and in the present study it was applied to the high pressure counter. It was found that the discrimination method could be applied in this case. (Kato, T.)

  17. Detection of anomalous signals in temporally correlated data (Invited)

    Science.gov (United States)

    Langbein, J. O.

    2010-12-01

    Detection of transient tectonic signals in data obtained from large geodetic networks requires the ability to detect signals that are both temporally and spatially coherent. In this report I will describe a modification to an existing method that estimates both the coefficients of a temporally correlated noise model and an efficient filter based on that noise model. This filter, when applied to the original time-series, effectively whitens (or flattens) the power spectrum. The filtered data provide the means to calculate running averages which are then used to detect deviations from the background trends. For large networks, time-series of signal-to-noise ratio (SNR) can be easily constructed since, by filtering, each of the original time-series has been transformed into one that is closer to having a Gaussian distribution with a variance of 1.0. Anomalous intervals may be identified by counting the number of GPS sites for which the SNR exceeds a specified value. For example, during one time interval, if there were 5 out of 20 time-series with SNR>2, this would be considered anomalous; typically, at 95% confidence, one would expect only about 1 out of 20 time-series with an SNR>2. For time intervals with an anomalously large number of high SNR, the spatial distribution of the SNR is mapped to identify the location of the anomalous signal(s) and their degree of spatial clustering. Estimating the filter that should be used to whiten the data requires modification of the existing methods that employ maximum likelihood estimation to determine the temporal covariance of the data. In these methods, it is assumed that the noise components in the data are a combination of white, flicker and random-walk processes and that they are derived from three different and independent sources. Instead, in this new method, the covariance matrix is constructed assuming that only one source is responsible for the noise and that source can be represented as a white
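
    A much-simplified sketch of the screening step: whiten each series with an AR(1) filter fitted to the data (a stand-in for the covariance-model-based filter described above), form running-mean SNR series, and count stations exceeding SNR > 2 per epoch. The window length and the AR(1) noise model are illustrative assumptions.

    ```python
    import numpy as np

    def whiten_ar1(x):
        xc = x - x.mean()
        phi = (xc[1:] @ xc[:-1]) / (xc[:-1] @ xc[:-1])   # lag-1 autocorrelation
        w = xc[1:] - phi * xc[:-1]                       # approximately white
        return w / w.std()

    def station_exceedance_counts(series_list, win=30):
        kernel = np.ones(win) / win
        # a running mean of unit-variance white noise has std 1/sqrt(win),
        # so multiplying by sqrt(win) restores an SNR scale
        snr = np.array([np.convolve(whiten_ar1(s), kernel, mode="valid") * np.sqrt(win)
                        for s in series_list])
        return (np.abs(snr) > 2.0).sum(axis=0)           # stations with SNR>2 per epoch
    ```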

  18. When the Plus Sign is a Negative: Challenging and Reinforcing Embodied Stigmas Through Outliers and Counter-Narratives.

    Science.gov (United States)

    Lippert, Alexandra

    2017-11-30

    When individuals become aware of their stigma, they attempt to manage their identity through discourses that both challenge and reinforce power. Identity management is fraught with tensions between the desire to fit normative social constructions and counter the same discourse. This essay explores identity management in the midst of the embodied stigmas concerning unplanned pregnancy during college and raising a biracial son. In doing so, this essay points to the difference between outlier narratives and counter-narratives. The author encourages health communication scholars to explore conditions under which storytelling moves beyond the personal to the political. Emancipatory intent does not guarantee emancipatory outcomes. Storytelling can function therapeutically for individuals while failing to redress forces that constrain human potential and agency.

  19. Group method of data handling and neural networks applied in monitoring and fault detection in sensors in nuclear power plants

    International Nuclear Information System (INIS)

    Bueno, Elaine Inacio

    2011-01-01

    The increasing demand for complexity, efficiency and reliability in modern industrial systems has stimulated studies on control theory applied to the development of monitoring and fault detection systems. In this work a new monitoring and fault detection methodology was developed using the GMDH (Group Method of Data Handling) algorithm and Artificial Neural Networks (ANNs), and applied to the IEA-R1 research reactor at IPEN. The monitoring and fault detection system was developed in two parts: the first dedicated to preprocessing information using the GMDH algorithm, and the second to processing information using ANNs. The GMDH algorithm was used in two different ways: first, to generate a better database estimate, called matrix z, which was used to train the ANNs; second, to study the best set of variables for training the ANNs, resulting in the best estimate of the monitored variables. The methodology was developed and tested using five different models: one theoretical model and four models using different sets of reactor variables. After an exhaustive study dedicated to sensor monitoring, fault detection in sensors was developed by simulating faults in the sensor database using deviations of 5%, 10%, 15% and 20%. The results obtained using the GMDH algorithm to choose the best input variables for the ANNs were better than those using only ANNs, thus making possible the use of these methods in the implementation of a new monitoring and fault detection methodology applied to sensors. (author)

  20. MORPHOLOGICAL FILLING OF DIGITAL ELEVATION MODELS

    Directory of Open Access Journals (Sweden)

    T. Krauß

    2012-09-01

    Full Text Available In this paper a new approach for a more detailed post-processing and filling of digital elevation models (DEMs) in urban areas is presented. To reach the required specifications, in a first step the errors in digital surface models (DSMs) generated by dense stereo algorithms are analyzed, and methods for detection and classification of the different types of errors are implemented. Subsequently the classified erroneous areas are handled in a separate manner to eliminate outliers and fill the DSM properly. The errors which can be detected in DSMs range from outliers – single pixels or small areas containing extremely high or low values – over noise from mismatches and single small holes, to occlusions, where large areas are not visible in one of the images of the stereo pair. To validate the presented method, artificial DSMs are generated and superimposed with all the different kinds of described errors, such as noise (small holes cut in), outliers (small areas moved up/down), occlusions (larger areas beneath steep walls) and so on. The method is subsequently applied to the artificial DSMs and the resulting filled DSMs are compared to the original artificial DSMs without the introduced errors. The method is also applied to stereo-satellite-generated DSMs from the ISPRS Commission 1 WG4 benchmark dataset, and the results are checked against the also-provided first-pulse laser DSM data. Finally the results are discussed, strengths and weaknesses of the approach are shown, and suggestions for application and optimization are given.
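
    The detect-then-fill idea can be sketched with generic image filters (a median-based outlier mask followed by a morphological grey closing); the synthetic DSM and all thresholds below are invented, and the paper's error classification is not reproduced.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)

# Synthetic DSM: a gentle ramp with a raised "building", plus outliers.
dsm = np.fromfunction(lambda r, c: 0.05 * r, (100, 100))
dsm[40:60, 40:60] += 10.0
rows = rng.integers(0, 100, 30)
cols = rng.integers(0, 100, 30)
dsm[rows, cols] += rng.choice([-50.0, 50.0], size=30)  # injected outliers

# Detect outliers as large deviations from a local median (threshold assumed).
local_median = ndimage.median_filter(dsm, size=5)
mask = np.abs(dsm - local_median) > 5.0

# Fill flagged pixels from the local median, then apply a grey closing as a
# crude morphological hole-filling step.
filled = np.where(mask, local_median, dsm)
filled = ndimage.grey_closing(filled, size=3)

print("outlier pixels detected:", int(mask.sum()))
```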

  1. Fetal cardiac cine imaging using highly accelerated dynamic MRI with retrospective motion correction and outlier rejection.

    Science.gov (United States)

    van Amerom, Joshua F P; Lloyd, David F A; Price, Anthony N; Kuklisova Murgasova, Maria; Aljabar, Paul; Malik, Shaihan J; Lohezic, Maelene; Rutherford, Mary A; Pushparajah, Kuberan; Razavi, Reza; Hajnal, Joseph V

    2018-01-01

    Development of an MRI acquisition and reconstruction strategy to depict fetal cardiac anatomy in the presence of maternal and fetal motion. The proposed strategy involves i) acquisition and reconstruction of highly accelerated dynamic MRI, followed by image-based ii) cardiac synchronization, iii) motion correction, iv) outlier rejection, and finally v) cardiac cine reconstruction. Postprocessing was entirely automated, aside from a user-defined region of interest delineating the fetal heart. The method was evaluated in 30 mid- to late gestational age singleton pregnancies scanned without maternal breath-hold. The combination of complementary acquisition/reconstruction and correction/rejection steps in the pipeline served to improve the quality of the reconstructed 2D cine images, resulting in increased visibility of small, dynamic anatomical features. Artifact-free cine images were successfully produced in 36 of 39 acquired data sets; prolonged general fetal movements precluded processing of the remaining three data sets. The proposed method shows promise as a motion-tolerant framework to enable further detail in MRI studies of the fetal heart and great vessels. Processing data in image-space allowed for spatial and temporal operations to be applied to the fetal heart in isolation, separate from extraneous changes elsewhere in the field of view. Magn Reson Med 79:327-338, 2018. © 2017 The Authors Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

  2. Environmetrics. Part 1. Modeling of water salinity and air quality data

    International Nuclear Information System (INIS)

    Braibanti, A.; Gollapalli, N. R.; Jonnalagaddaj, S. B.; Duvvuru, S.; Rupenaguntla, S. R.

    2001-01-01

    Environmetrics utilizes advanced mathematical, statistical and information tools to extract information. Two typical environmental data sets are analysed using MVATOB (Multi Variate Tool Box). The first data set corresponds to the variable river salinity. Least median squares (LMS) detected the outliers, whereas linear least squares (LLS) could not detect and remove them. The second data set consists of daily readings of air quality values. Outliers are detected by LMS and unbiased regression coefficients are estimated by multi-linear regression (MLR). As the explanatory variables are not independent, principal component regression (PCR) and partial least squares regression (PLSR) are used. Both examples demonstrate the superiority of LMS over LLS
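
    A generic sketch of least median of squares line fitting by random subsampling (a PROGRESS-style approximation; MVATOB itself is not documented here, so names and thresholds are illustrative):

```python
import numpy as np

def lms_line(x, y, n_trials=2000, seed=0):
    """Approximate least-median-of-squares fit of y = a + b*x by
    minimizing the median squared residual over random 2-point lines."""
    rng = np.random.default_rng(seed)
    best, best_med = None, np.inf
    for _ in range(n_trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        med = np.median((y - a - b * x) ** 2)
        if med < best_med:
            best, best_med = (a, b), med
    return best, best_med

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 0.2, 50)
y[::10] += 8.0                      # gross outliers that would break plain LLS

(a, b), med = lms_line(x, y)
# Flag outliers via robustly scaled residuals, as LMS-based analyses do.
resid = y - a - b * x
scale = 1.4826 * np.median(np.abs(resid))
print("flagged outliers:", np.where(np.abs(resid) > 2.5 * scale)[0])
```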

  3. Effect of the wire geometry and an externally applied magnetic field on the detection efficiency of superconducting nanowire single-photon detectors

    Energy Technology Data Exchange (ETDEWEB)

    Lusche, Robert; Semenov, Alexey; Huebers, Heinz-Willhelm [DLR, Institut fuer Planetenforschung, Berlin (Germany); Ilin, Konstantin; Siegel, Michael [Karlsruher Institut fuer Technologie (Germany); Korneeva, Yuliya; Trifonov, Andrey; Korneev, Alexander; Goltsman, Gregory [Moscow State Pedagogical University (Russian Federation)

    2013-07-01

    The interest in single-photon detectors in the near-infrared wavelength regime for applications, e.g. in quantum cryptography, has increased immensely in recent years. Superconducting nanowire single-photon detectors (SNSPD) already show quite reasonable detection efficiencies in the NIR, which can be improved even further. Novel theoretical approaches, including vortex-assisted photon counting, state that the detection efficiency in the long wavelength region can be enhanced by the detector geometry and an applied magnetic field. We present spectral measurements, in the wavelength range from 350-2500 nm, of the detection efficiency of meander-type TaN and NbN SNSPD with nanowire line widths varying from 80 to 250 nm. Due to the experimental setup used, we can accurately normalize the measured spectra and are able to extract the intrinsic detection efficiency (IDE) of our detectors. The results clearly indicate an improvement of the IDE depending on the wire width, in accordance with the theoretical models. Furthermore, we found experimentally that the smallest detectable photon flux can be increased by applying a small magnetic field to the detectors.

  4. Cost-effectiveness analysis in melanoma detection: A transition model applied to dermoscopy.

    Science.gov (United States)

    Tromme, Isabelle; Legrand, Catherine; Devleesschauwer, Brecht; Leiter, Ulrike; Suciu, Stefan; Eggermont, Alexander; Sacré, Laurine; Baurain, Jean-François; Thomas, Luc; Beutels, Philippe; Speybroeck, Niko

    2016-11-01

    The main aim of this study is to demonstrate how our melanoma disease model (MDM) can be used for cost-effectiveness analyses (CEAs) in the melanoma detection field. In particular, we used the data of two cohorts of Belgian melanoma patients to investigate the cost-effectiveness of dermoscopy. An MDM, previously constructed to calculate the melanoma burden, was slightly modified to be suitable for CEAs. Two cohorts of patients were entered into the model to calculate morbidity, mortality and costs. These cohorts consisted of melanoma patients diagnosed by dermatologists adequately, or not adequately, trained in dermoscopy. Effectiveness and costs were calculated for each cohort and compared. Effectiveness was expressed in quality-adjusted life years (QALYs), a composite measure depending on melanoma-related morbidity and mortality. Costs included costs of treatment and follow-up as well as costs of detection in non-melanoma patients and costs of excision and pathology of benign lesions excised to rule out melanoma. Our analysis concluded that melanoma diagnosis by dermatologists adequately trained in dermoscopy resulted in both a gain of QALYs (less morbidity and/or mortality) and a reduction in costs. This study demonstrates how our MDM can be used in CEAs in the melanoma detection field. The model and the methodology suggested in this paper were applied to two cohorts of Belgian melanoma patients. The analysis concluded that adequate dermoscopy training is cost-effective. The results should be confirmed by a large-scale randomised study. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Data Fault Detection in Medical Sensor Networks

    Directory of Open Access Journals (Sweden)

    Yang Yang

    2015-03-01

    Full Text Available Medical body sensors can be implanted in or attached to the human body to monitor the physiological parameters of patients all the time. Inaccurate data due to sensor faults or incorrect placement on the body will seriously influence clinicians' diagnosis, therefore detecting sensor data faults has been widely researched in recent years. Most of the typical approaches to sensor fault detection in the medical area ignore the fact that the physiological indexes of patients do not change synchronously at the same time, and fault values mixed with abnormal physiological data due to illness make it difficult to determine true faults. Based on these facts, we propose a Data Fault Detection mechanism in Medical sensor networks (DFD-M). Its mechanism includes: (1) use of a dynamic-local outlier factor (D-LOF) algorithm to identify outlying sensed data vectors; (2) use of a linear regression model based on trapezoidal fuzzy numbers to predict which readings in the outlying data vector are suspected to be faulty; (3) the proposal of a novel judgment criterion of fault state according to the prediction values. The simulation results demonstrate the efficiency and superiority of DFD-M.
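
    Step (2) of the pipeline, predicting which reading in a suspicious vector is faulty, can be illustrated with an ordinary linear regression standing in for the trapezoidal-fuzzy-number model; the signals, training window, and threshold below are invented.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Correlated vitals: heart rate and a pulse-derived signal move together.
hr = 70 + 5 * np.sin(np.linspace(0, 6, 300)) + rng.normal(0, 0.5, 300)
pulse = 0.9 * hr + rng.normal(0, 0.5, 300)
pulse[200] += 25.0                     # injected sensor fault

# Train on an initial fault-free window, then predict the remaining stream.
model = LinearRegression().fit(hr[:150].reshape(-1, 1), pulse[:150])
resid = pulse - model.predict(hr.reshape(-1, 1))

# Declare a fault when the residual exceeds k robust standard deviations.
sigma = 1.4826 * np.median(np.abs(resid[:150] - np.median(resid[:150])))
faults = np.where(np.abs(resid) > 5 * sigma)[0]
print("suspected faulty samples:", faults)
```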

  6. How Can Synchrotron Radiation Techniques Be Applied for Detecting Microstructures in Amorphous Alloys?

    Directory of Open Access Journals (Sweden)

    Gu-Qing Guo

    2015-11-01

    Full Text Available In this work, how synchrotron radiation techniques can be applied for detecting the microstructure in metallic glass (MG) is studied. Unit cells are the basic structural units in crystals, while it has been suggested that the co-existence of various clusters may be the universal structural feature in MG. Therefore, it is a challenge to detect the microstructure of MG even at the short-range scale by directly using synchrotron radiation techniques, such as X-ray diffraction and X-ray absorption methods. Here, a feasible scheme is developed in which state-of-the-art synchrotron radiation-based experiments are combined with simulations to investigate the microstructure in MG. By studying a typical MG composition (Zr70Pd30), it is found that various clusters do co-exist in its microstructure, and icosahedral-like clusters are the popular structural units. This is the structural origin of the precipitation of an icosahedral quasicrystalline phase prior to the phase transformation from glass to crystal when heating Zr70Pd30 MG.

  7. Screening mammography-detected cancers: the sensitivity of the computer-aided detection system as applied to full-field digital mammography

    International Nuclear Information System (INIS)

    Yang, Sang Kyu; Cho, Nariya; Ko, Eun Sook; Kim, Do Yeon; Moon, Woo Kyung

    2006-01-01

    We wanted to evaluate the sensitivity of the computer-aided detection (CAD) system for performing full-field digital mammography (FFDM) on the breast cancers that were originally detected by screening mammography. The CAD system (Image Checker v3.1, R2 Technology, Los Altos, Calif.) together with a full-field digital mammography system (Senographe 2000D, GE Medical Systems, Buc, France) was prospectively applied to the mammograms of 70 mammographically detected breast cancer patients (age range, 37-69; median age, 51 years) who had negative findings on their clinical examinations. The sensitivity of the CAD system, according to histopathologic findings and primary radiologic features (i.e., mass, microcalcifications or mass with microcalcifications), and the false-positive marking rate were then determined. The CAD system correctly depicted 67 of 70 breast cancer lesions (97.5%). The CAD system marked 29 of 30 breast cancers that presented with microcalcifications only (sensitivity 96.7%) and all 18 breast cancers that presented with mass together with microcalcifications (sensitivity 100%). Twenty of the 22 lesions that appeared as a mass only were marked correctly by the CAD system (sensitivity 90.9%). The CAD system correctly depicted all 22 lesions of ductal carcinoma in situ (sensitivity: 100%), all 13 lesions of invasive ductal carcinoma with ductal carcinoma in situ (sensitivity: 100%) and the 1 lesion of invasive lobular carcinoma (sensitivity: 100%). Thirty-one of the 34 lesions of invasive ductal carcinoma were marked correctly by the CAD system (sensitivity: 91.8%). The rate of false-positive marks was 0.21 mass marks per image and 0.16 microcalcification marks per image. The overall rate of false-positive marks was 0.37 per image. The CAD system using FFDM is useful for the detection of asymptomatic breast cancers, and it has a high overall tumor detection rate. The false negative cases were found in relatively small invasive ductal carcinomas.

  8. Screening mammography-detected cancers: the sensitivity of the computer-aided detection system as applied to full-field digital mammography

    Energy Technology Data Exchange (ETDEWEB)

    Yang, Sang Kyu; Cho, Nariya; Ko, Eun Sook; Kim, Do Yeon; Moon, Woo Kyung [College of Medicine Seoul National University and The Insititute of Radiation Medicine, Seoul National University Research Center, Seoul (Korea, Republic of)

    2006-04-15

    We wanted to evaluate the sensitivity of the computer-aided detection (CAD) system for performing full-field digital mammography (FFDM) on the breast cancers that were originally detected by screening mammography. The CAD system (Image Checker v3.1, R2 Technology, Los Altos, Calif.) together with a full-field digital mammography system (Senographe 2000D, GE Medical Systems, Buc, France) was prospectively applied to the mammograms of 70 mammographically detected breast cancer patients (age range, 37-69; median age, 51 years) who had negative findings on their clinical examinations. The sensitivity of the CAD system, according to histopathologic findings and primary radiologic features (i.e., mass, microcalcifications or mass with microcalcifications), and the false-positive marking rate were then determined. The CAD system correctly depicted 67 of 70 breast cancer lesions (97.5%). The CAD system marked 29 of 30 breast cancers that presented with microcalcifications only (sensitivity 96.7%) and all 18 breast cancers that presented with mass together with microcalcifications (sensitivity 100%). Twenty of the 22 lesions that appeared as a mass only were marked correctly by the CAD system (sensitivity 90.9%). The CAD system correctly depicted all 22 lesions of ductal carcinoma in situ (sensitivity: 100%), all 13 lesions of invasive ductal carcinoma with ductal carcinoma in situ (sensitivity: 100%) and the 1 lesion of invasive lobular carcinoma (sensitivity: 100%). Thirty-one of the 34 lesions of invasive ductal carcinoma were marked correctly by the CAD system (sensitivity: 91.8%). The rate of false-positive marks was 0.21 mass marks per image and 0.16 microcalcification marks per image. The overall rate of false-positive marks was 0.37 per image. The CAD system using FFDM is useful for the detection of asymptomatic breast cancers, and it has a high overall tumor detection rate. The false negative cases were found in relatively small invasive ductal carcinomas.

  9. Mitochondrial DNA heritage of Cres Islanders--example of Croatian genetic outliers.

    Science.gov (United States)

    Jeran, Nina; Havas Augustin, Dubravka; Grahovac, Blaženka; Kapović, Miljenko; Metspalu, Ene; Villems, Richard; Rudan, Pavao

    2009-12-01

    Diversity of mitochondrial DNA (mtDNA) lineages of the Island of Cres was determined by high-resolution phylogenetic analysis of a sample of 119 adult unrelated individuals from eight settlements. The composition of the mtDNA pool of this island population is in contrast with other Croatian and European populations. The analysis revealed the highest frequency for haplogroup U (29.4%), with the predominance of one single lineage of subhaplogroup U2e (20.2%). Haplogroup H is the second most prevalent, at only 27.7%. Other very interesting features of the contemporary island population are the extremely low frequency of haplogroup J (only 0.84%) and the much higher frequency of haplogroup W (12.6%) compared to other Croatian and European populations. An especially interesting finding is the strikingly higher frequency of haplogroup N1a (9.24%), represented by the African/south Asian branch that is almost absent in Europeans, while its European sister-branch, shown to be highly prevalent among Neolithic farmers, is present in contemporary Europeans at only 0.2%. Haplotype analysis revealed that only five mtDNA lineages account for almost 50% of the maternal genetic heritage of this island, and they represent supposed founder lineages. All these findings confirm that genetic drift, especially the founder effect, has played a significant role in shaping the genetic composition of the isolated population of the Island of Cres. Based on the presented data, the contemporary population of the Island of Cres can be considered a genetic "outlier" among Croatian populations.

  10. A quick method based on SIMPLISMA-KPLS for simultaneously selecting outlier samples and informative samples for model standardization in near infrared spectroscopy

    Science.gov (United States)

    Li, Li-Na; Ma, Chang-Ming; Chang, Ming; Zhang, Ren-Cheng

    2017-12-01

    A novel method based on SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) and Kernel Partial Least Squares (KPLS), named SIMPLISMA-KPLS, is proposed in this paper for the simultaneous selection of outlier samples and informative samples. It is a quick algorithm for model standardization (also known as model transfer) in near infrared (NIR) spectroscopy. NIR data on corn, for analysis of protein content, are used to evaluate the proposed method. Piecewise direct standardization (PDS) is employed for model transfer, and a comparison of SIMPLISMA-PDS-KPLS and KS-PDS-KPLS is given by discussing the prediction accuracy for protein content and the calculation speed of each algorithm. The conclusions include that SIMPLISMA-KPLS can be utilized as an alternative sample selection method for model transfer. Although it has accuracy similar to Kennard-Stone (KS), it differs from KS in that it employs concentration information in the selection program. This means that it ensures analyte information is involved in the analysis, and that the spectra (X) of the selected samples are interrelated with the concentration (y). It can also be used for outlier sample elimination simultaneously, by validation of the calibration. According to the statistics on running time, it is clear that the sample selection process is more rapid when using KPLS. The quick SIMPLISMA-KPLS algorithm is beneficial for improving the speed of online measurement using NIR spectroscopy.

  11. Anomaly Detection for Beam Loss Maps in the Large Hadron Collider

    Science.gov (United States)

    Valentino, Gianluca; Bruce, Roderik; Redaelli, Stefano; Rossi, Roberto; Theodoropoulos, Panagiotis; Jaster-Merz, Sonja

    2017-07-01

    In the LHC, beam loss maps are used to validate collimator settings for cleaning and machine protection. This is done by monitoring the loss distribution in the ring during infrequent controlled loss map campaigns, as well as in standard operation. Due to the complexity of the system, consisting of more than 50 collimators per beam, it is difficult with such methods to identify small changes in the collimation hierarchy, which may be due to setting errors or beam orbit drifts. A technique based on Principal Component Analysis and Local Outlier Factor is presented to detect anomalies in the loss maps and therefore provide an automatic check of the collimation hierarchy.

  12. Anomaly Detection for Beam Loss Maps in the Large Hadron Collider

    International Nuclear Information System (INIS)

    Valentino, Gianluca; Bruce, Roderik; Redaelli, Stefano; Rossi, Roberto; Theodoropoulos, Panagiotis; Jaster-Merz, Sonja

    2017-01-01

    In the LHC, beam loss maps are used to validate collimator settings for cleaning and machine protection. This is done by monitoring the loss distribution in the ring during infrequent controlled loss map campaigns, as well as in standard operation. Due to the complexity of the system, consisting of more than 50 collimators per beam, it is difficult with such methods to identify small changes in the collimation hierarchy, which may be due to setting errors or beam orbit drifts. A technique based on Principal Component Analysis and Local Outlier Factor is presented to detect anomalies in the loss maps and therefore provide an automatic check of the collimation hierarchy. (paper)
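
    A compact sketch of the Principal-Component-Analysis-plus-Local-Outlier-Factor chain on simulated loss maps (the feature layout, contamination, and parameters are assumptions, not the LHC configuration):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Each row is one "loss map": losses recorded at 100 monitor locations.
normal = rng.lognormal(mean=0.0, sigma=0.3, size=(200, 100))
anomalous = normal[:5].copy()
anomalous[:, 40:45] *= 8.0           # a localized change in the hierarchy

maps = np.vstack([normal, anomalous])

# Reduce to a few principal components, then score with LOF (-1 = outlier).
z = PCA(n_components=5).fit_transform(np.log(maps))
labels = LocalOutlierFactor(n_neighbors=20).fit_predict(z)
print("flagged maps:", np.where(labels == -1)[0])
```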

  13. A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

    Science.gov (United States)

    Radhakrishnan, Regunathan; Divakaran, Ajay; Xiong, Ziyou; Otsuka, Isao

    2006-12-01

    We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.

  14. Evaluating Effect of Albendazole on Trichuris trichiura Infection: A Systematic Review Article.

    Science.gov (United States)

    Ahmadi Jouybari, Toraj; Najaf Ghobadi, Khadije; Lotfi, Bahare; Alavi Majd, Hamid; Ahmadi, Nayeb Ali; Rostami-Nejad, Mohammad; Aghaei, Abbas

    2016-01-01

    The aim of the study was the assessment of outlying and influential studies and a meta-analysis of the efficacy of single-dose oral albendazole against T. trichiura infection. We searched PubMed, ISI Web of Science, Science Direct, the Cochrane Central Register of Controlled Trials, and WHO library databases between 1983 and 2014. Data from 13 clinical trial articles were used. Each article included the effect of a single oral dose (400 mg) of albendazole and placebo in treating two groups of patients with T. trichiura infection. For both groups in each article, the sample size, the number of those with T. trichiura infection, and the number of those who recovered following the intake of albendazole were identified and recorded. The relative risk and variance were computed. Funnel plots and Begg's and Egger's tests were used for the assessment of publication bias. The random effect variance shift outlier model and likelihood ratio test were applied for detecting outliers. In order to detect influence, DFFITS values, Cook's distances and COVRATIO were used. Data were analyzed using STATA and R software. Articles 13 and 9 were found to be the outlier and the influential study, respectively. The outlier is diagnosed by the variance shift of the target study in the inferential method and by the RR value in the graphical method. The funnel plot and Begg's test did not show publication bias (P=0.272); however, Egger's test confirmed it (P=0.034). Meta-analysis after removal of article 13 showed that the relative risk was 1.99 (95% CI 1.71 - 2.31). The estimated RR and our meta-analyses show that treatment of T. trichiura with single oral doses of albendazole is unsatisfactory. New anthelminthics are urgently needed.

  15. Time-series models on somatic cell score improve detection of mastitis

    DEFF Research Database (Denmark)

    Norberg, E; Korsgaard, I R; Sloth, K H M N

    2008-01-01

    In-line detection of mastitis using frequent milk sampling was studied in 241 cows in a Danish research herd. Somatic cell scores obtained on a daily basis were analyzed using a mixture of four time-series models. Probabilities were assigned to each model for the observations to belong to a normal "steady-state" development, a change in "level", a change of "slope", or an "outlier". Mastitis was indicated from the sum of probabilities for the "level" and "slope" models. The time-series models were based on the Kalman filter. Reference data were obtained from veterinary assessment of health status combined with bacteriological findings. At a sensitivity of 90% the corresponding specificity was 68%, which increased to 83% using a one-step-back smoothing. It is concluded that mixture models based on Kalman filters are efficient in handling in-line sensor data for detection of mastitis and may be useful for similar applications.
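
    A minimal local-level Kalman filter sketch in the same spirit, flagging standardized innovations instead of running the authors' four-model mixture (all variances and the simulated data are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated daily somatic cell scores with a level shift (mastitis onset).
scc = rng.normal(3.0, 0.2, 120)
scc[80:] += 1.0

# Local-level model: x_t = x_{t-1} + w_t, y_t = x_t + v_t.
q, r = 0.01, 0.04        # process / observation variances (assumed)
x, p = scc[0], 1.0
for t in range(1, len(scc)):
    p += q                            # predict
    innov = scc[t] - x                # innovation
    s = p + r                         # innovation variance
    z = innov / np.sqrt(s)            # standardized innovation
    if z > 3:
        print(f"day {t}: possible level change or outlier (z={z:.1f})")
    k = p / s                         # Kalman gain; update
    x += k * innov
    p *= 1 - k
```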

  16. DETECTION OF OUTLIERS IN THE ECONOMIC-FINANCIAL PERFORMANCE OF SPORT CLUB CORINTHIANS PAULISTA IN THE PERIOD 2008 TO 2010

    Directory of Open Access Journals (Sweden)

    Marke Geisy da Silva Dantas

    2011-12-01

    Full Text Available Intangible assets permeate the football market, where the main assets of football entities are the contracts with players, and supporters are considered important users of accounting information, since they provide resources for such entities. It is within this context that the study gains relevance, aiming to analyze the presence of outliers in the accounts of Sport Club Corinthians Paulista for the years 2008 and 2009, when the club played in Serie B of the Brazilian Championship and when the signing of Ronaldo took place, respectively. Regarding methodological procedures, this research constitutes an exploratory study, demonstrating the use of the Grubbs test to analyze the impact of intangible assets on the accounts of Corinthians, detecting abnormalities in the years studied. Data were collected from websites and articles dealing with the measurement and classification of football players as assets. Data were processed using MICROSOFT EXCEL® spreadsheets. The results showed a large percentage increase in the accounts studied when comparing the years. Two outliers were found in 2008 ("Licensing and franchises" and "Total assets"), but in 2009 five accounts exceeded normality ("Licensing and franchises", "Sponsorship and advertising", "Match receipts", "TV rights" and "Championship prize money"). In 2010, only the "TV rights" account was an outlier.
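
    For reference, the single-outlier Grubbs statistic used in the study is simple to compute; the following sketch uses invented year-over-year percentage changes, not the club's actual account values.

```python
import numpy as np
from scipy import stats

def grubbs_statistic(x, alpha=0.05):
    """G = max |x_i - mean| / sd, with its two-sided critical value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
    # Critical value from the t distribution (two-sided test).
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return g, g_crit

# Hypothetical year-over-year percentage changes for several accounts.
changes = [12.0, 8.5, 15.0, 10.2, 9.8, 240.0]   # one suspicious account
g, g_crit = grubbs_statistic(changes)
print(f"G = {g:.2f}, critical = {g_crit:.2f}, outlier: {g > g_crit}")
```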

  17. Automatic Pedestrian Crossing Detection and Impairment Analysis Based on Mobile Mapping System

    Science.gov (United States)

    Liu, X.; Zhang, Y.; Li, Q.

    2017-09-01

    Pedestrian crossings, as an important part of transportation infrastructure, serve to protect pedestrians' lives and possessions and keep traffic flow in order. As a prominent feature of the street scene, detection of pedestrian crossings contributes to 3D road marking reconstruction and to diminishing the adverse impact of outliers in 3D street scene reconstruction. Since pedestrian crossings are subject to wear and tear from heavy traffic flow, it is imperative to monitor their condition. On this account, an approach for automatic pedestrian crossing detection using images from a vehicle-based Mobile Mapping System is put forward, and crossing defilement and impairment are analyzed in this paper. First, a pedestrian crossing classifier is trained with a low recall rate. Initial detections are then refined by utilizing projection filtering, contour information analysis, and monocular vision. Finally, a pedestrian crossing detection and analysis system with high recall, precision and robustness is achieved. The system works for pedestrian crossing detection under different situations and light conditions. It can also recognize defiled and impaired crossings automatically, which facilitates monitoring and maintenance of traffic facilities, so as to reduce potential traffic safety problems and secure lives and property.

  18. AUTOMATIC PEDESTRIAN CROSSING DETECTION AND IMPAIRMENT ANALYSIS BASED ON MOBILE MAPPING SYSTEM

    Directory of Open Access Journals (Sweden)

    X. Liu

    2017-09-01

    Full Text Available Pedestrian crossings, as an important part of transportation infrastructure, serve to protect pedestrians' lives and possessions and keep traffic flow in order. As a prominent feature of the street scene, detection of pedestrian crossings contributes to 3D road marking reconstruction and to diminishing the adverse impact of outliers in 3D street scene reconstruction. Since pedestrian crossings are subject to wear and tear from heavy traffic flow, it is imperative to monitor their condition. On this account, an approach for automatic pedestrian crossing detection using images from a vehicle-based Mobile Mapping System is put forward, and crossing defilement and impairment are analyzed in this paper. First, a pedestrian crossing classifier is trained with a low recall rate. Initial detections are then refined by utilizing projection filtering, contour information analysis, and monocular vision. Finally, a pedestrian crossing detection and analysis system with high recall, precision and robustness is achieved. The system works for pedestrian crossing detection under different situations and light conditions. It can also recognize defiled and impaired crossings automatically, which facilitates monitoring and maintenance of traffic facilities, so as to reduce potential traffic safety problems and secure lives and property.

  19. Interest of the technical detection of the sentinel node applied to uterine cancers: about three cases

    International Nuclear Information System (INIS)

    Ech charraq, I.; Ben Rais, N.; Ech charra, I.; Albertini, A.F.

    2009-01-01

    Introduction The sentinel node (SN) technique was proposed in cervical cancers in order to optimise the diagnosis of metastases and lymphatic micrometastases in the early stages, while avoiding unnecessary extensive lymph node dissection. The identification of this node is done by injection of a dye and/or a radioactive colloid and its removal for pathological examination. Patients and methods We report the cases of three patients followed for uterine cancer who underwent lymphoscintigraphy before surgery. During the surgical procedure, the detection of the sentinel node was carried out after cervical injection of blue dye and using a gamma detection probe. Results The lymphoscintigraphy was positive in two cases, with positive intraoperative detection in all three cases. The pathological study revealed a node metastasis in one case. The sentinel node technique applied to uterine cancers appears feasible, essentially for uterine cancers of early stage (I). However, a risk of false negatives can be observed in advanced cancer (III), as in the case of our patient with a negative lymphoscintigraphy. Conclusion Nuclear medicine plays an important role in the detection of the sentinel node in various cancers, including uterine cancer, thus allowing appropriate oncological management. (authors)

  20. Remote detection device and detection method therefor

    International Nuclear Information System (INIS)

    Kogure, Sumio; Yoshida, Yoji; Matsuo, Takashiro; Takehara, Hidetoshi; Kojima, Shinsaku.

    1997-01-01

    The present invention provides a non-destructive detection device for collectively, efficiently and effectively conducting maintenance and inspection to confirm the integrity of a nuclear reactor through a shielding member that shields the radiation generated from the portion to be inspected. Namely, devices for direct visual detection using an underwater TV camera as a sensor, eddy current detection using a coil as a sensor, and magnetic powder flaw detection are integrated and applied collectively. Specifically, visual detection using the TV camera and eddy current flaw detection are adopted together. Flaw detection with magnetic powder is applied as a means of confirming the results of the two other kinds of detection. With such procedures, detection techniques based on different physical principles are combined, thereby enabling enhanced accuracy in the evaluation of the detection. (I.S.)

  1. Statistical methods for damage detection applied to civil structures

    DEFF Research Database (Denmark)

    Gres, Szymon; Ulriksen, Martin Dalgaard; Döhler, Michael

    2017-01-01

    Damage detection consists of monitoring the deviations of a current system from its reference state, characterized by some nominal property repeatable for every healthy state. Preferably, the damage detection is performed directly on vibration data, hereby avoiding modal identification of the structure...

  2. A Hybrid Semi-Supervised Anomaly Detection Model for High-Dimensional Data

    Directory of Open Access Journals (Sweden)

    Hongchao Song

    2017-01-01

    Full Text Available Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance for each observation and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples become similar and each sample may appear to be an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble of k-nearest neighbor graph (K-NNG) based anomaly detectors. Benefiting from its ability for nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset, in order to represent the high-dimensional data in a more compact subspace. Several nonparametric KNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by combining all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves detection accuracy and reduces computational complexity.
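
    A rough stand-in for the hybrid model, with scikit-learn's MLPRegressor playing the role of the deep autoencoder and a single KNN distance score replacing the ensemble of K-NNG detectors (sizes, scores, and thresholds are assumptions, not the paper's architecture):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# High-dimensional data: 500 nominal points on a low-dim manifold + anomalies.
latent = rng.standard_normal((500, 3))
mixing = rng.standard_normal((3, 50))
X = latent @ mixing + 0.05 * rng.standard_normal((500, 50))
anomalies = rng.uniform(-6, 6, size=(10, 50))
X_all = np.vstack([X, anomalies])

# "Autoencoder": an MLP trained to reproduce its input through a bottleneck.
ae = MLPRegressor(hidden_layer_sizes=(16, 3, 16), max_iter=2000,
                  random_state=0).fit(X, X)

# Combine reconstruction error with a KNN distance score.
recon_err = np.linalg.norm(X_all - ae.predict(X_all), axis=1)
nn = NearestNeighbors(n_neighbors=5).fit(X)
knn_dist = nn.kneighbors(X_all)[0].mean(axis=1)

score = recon_err / recon_err.std() + knn_dist / knn_dist.std()
print("top-10 anomaly indices:", np.argsort(score)[-10:])
```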

  3. Volatility persistence in crude oil markets

    International Nuclear Information System (INIS)

    Charles, Amélie; Darné, Olivier

    2014-01-01

    Financial market participants and policy-makers can benefit from a better understanding of how shocks can affect volatility over time. This study assesses the impact of structural changes and outliers on volatility persistence of three crude oil markets – Brent, West Texas Intermediate (WTI) and Organization of Petroleum Exporting Countries (OPEC) – between January 2, 1985 and June 17, 2011. We identify outliers using a new semi-parametric test based on conditional heteroscedasticity models. These large shocks can be associated with particular event patterns, such as the invasion of Kuwait by Iraq, Operation Desert Storm, Operation Desert Fox, and the Global Financial Crisis, as well as OPEC announcements on production reduction or US announcements on crude inventories. We show that outliers can bias (i) the estimates of the parameters of the equation governing volatility dynamics; (ii) the regularity and non-negativity conditions of GARCH-type models (GARCH, IGARCH, FIGARCH and HYGARCH); and (iii) the detection of structural breaks in volatility, and thus the estimation of the persistence of the volatility. Therefore, taking the outliers into account in the volatility modelling process may improve the understanding of volatility in crude oil markets. - Highlights: • We study the impact of outliers on volatility persistence of crude oil markets. • We identify outliers and patches of outliers due to specific events. • We show that outliers can bias (i) the estimates of the parameters of GARCH models, (ii) the regularity and non-negativity conditions of GARCH-type models, (iii) the detection of structural breaks in volatility of crude oil markets
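
    As a simplified illustration of flagging large shocks relative to current volatility, the following sketch uses a RiskMetrics-style EWMA variance in place of the paper's semi-parametric GARCH-based test; the simulated returns and the 4-standard-deviation threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated daily returns with two injected shocks (e.g., geopolitical events).
returns = rng.normal(0, 0.02, 1000)
returns[[300, 700]] = [0.15, -0.18]

# EWMA conditional variance (RiskMetrics-style lambda = 0.94).
lam, var = 0.94, returns[:30].var()
flags = []
for t, r in enumerate(returns):
    z = r / np.sqrt(var)
    if abs(z) > 4:                 # outlier relative to current volatility
        flags.append((t, round(z, 1)))
    var = lam * var + (1 - lam) * r**2

print("flagged days (index, z-score):", flags)
```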

  4. Automated detection of Lupus white matter lesions in MRI

    Directory of Open Access Journals (Sweden)

    Eloy Roura Perez

    2016-08-01

    Full Text Available Brain magnetic resonance imaging provides detailed information which can be used to detect and segment white matter lesions (WML). In this work we propose an approach to automatically segment WML in Lupus patients by using T1w and fluid-attenuated inversion recovery (FLAIR) images. Lupus WML appear as small focal abnormal tissue observed as hyperintensities in the FLAIR images. The quantification of these WML is a key factor for the stratification of lupus patients, and therefore both lesion detection and segmentation play an important role. In our approach, the T1w image is first used to classify the three main tissues of the brain, white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF), while the FLAIR image is then used to detect focal WML as outliers of its GM intensity distribution. A set of post-processing steps based on lesion size, tissue neighborhood, and location are used to refine the lesion candidates. The proposal is evaluated on 20 patients, presenting qualitative and quantitative results in terms of precision and sensitivity of lesion detection (True Positive Rate of 62% and Positive Predictive Value of 80%, respectively), as well as segmentation accuracy (Dice Similarity Coefficient of 72%). The obtained results illustrate the validity of the approach to automatically detect and segment lupus lesions. Besides, our approach is publicly available as an SPM8/12 toolbox extension with a simple parameter configuration.

  5. Applied superconductivity

    CERN Document Server

    Newhouse, Vernon L

    1975-01-01

    Applied Superconductivity, Volume II, is part of a two-volume series on applied superconductivity. The first volume dealt with electronic applications and radiation detection, and contains a chapter on liquid helium refrigeration. The present volume discusses magnets, electromechanical applications, accelerators, and microwave and rf devices. The book opens with a chapter on high-field superconducting magnets, covering applications and magnet design. Subsequent chapters discuss superconductive machinery such as superconductive bearings and motors; rf superconducting devices; and future prospects.

  6. Outlier detection algorithms for least squares time series regression

    DEFF Research Database (Denmark)

    Johansen, Søren; Nielsen, Bent

    We review recent asymptotic results on some robust methods for multiple regression. The regressors include stationary and non-stationary time series as well as polynomial terms. The methods include the Huber-skip M-estimator and 1-step Huber-skip M-estimators, in particular the Impulse Indicator Saturation...

  7. Methods of Detecting Outliers in a Regression Analysis Model

    African Journals Online (AJOL)

    PROF. O. E. OSUAGWU

    2013-06-01


  8. Long-range alpha detection applied to soil surface monitoring

    International Nuclear Information System (INIS)

    Caress, R.W.; Allander, K.S.; Bounds, J.A.; Catlett, M.M.; MacArthur, D.W.; Rutherford, D.A.

    1992-01-01

    The long-range alpha detection (LRAD) technique depends on the detection of ion pairs generated by alpha particles losing energy in air rather than on detection of the alpha particles themselves. Typical alpha particles generated by uranium will travel less than 3 cm in air. In contrast, the ions have been successfully detected many inches or feet away from the contamination. Since LRAD detection systems are sensitive to all ions simultaneously, large LRAD soil surface monitors (SSMs) can be used to collect all of the ions from a large sample. The LRAD SSMs are designed around the fan-less LRAD detector. In this case a five-sided box with an open bottom is placed on the soil surface. Ions generated by alpha decays on the soil surface are collected on a charged copper plate within the box. These ions create a small current from the plate to ground which is monitored with a sensitive electrometer. The current measured is proportional to the number of ions in the box, which is, in turn, proportional to the amount of alpha contamination on the surface of the soil. This report includes the design and construction of a 1-m by 1-m SSM as well as the results of a study at Fernald, OH, as part of the Uranium in Soils Integrated Demonstration.

  9. Inclusion Detection in Aluminum Alloys Via Laser-Induced Breakdown Spectroscopy

    Science.gov (United States)

    Hudson, Shaymus W.; Craparo, Joseph; De Saro, Robert; Apelian, Diran

    2018-04-01

    Laser-induced breakdown spectroscopy (LIBS) has shown promise as a technique to quickly determine molten metal chemistry in real time. Because of its characteristics, LIBS could also be used as a technique to sense unwanted inclusions and impurities. Simulated Al2O3 inclusions were added to molten aluminum via a metal-matrix composite. LIBS was performed in situ to determine whether the particles could be detected. Outlier analysis of the oxygen signal was performed on the LIBS data and compared to the oxide volume fraction measured through metallography. It was determined that LIBS could differentiate between melts with different amounts of inclusions by monitoring the fluctuations in signal for elements of interest. LIBS shows promise as an enabling tool for monitoring metal cleanliness.

  10. Combining multivariate analysis and monosaccharide composition modeling to identify plant cell wall variations by Fourier Transform Near Infrared spectroscopy

    Directory of Open Access Journals (Sweden)

    Smith-Moritz Andreia M

    2011-08-01

    Full Text Available We outline a high throughput procedure that improves outlier detection in cell wall screens using FT-NIR spectroscopy of plant leaves. The improvement relies on generating a calibration set from a subset of a mutant population by taking advantage of the Mahalanobis distance outlier scheme to construct a monosaccharide range predictive model using PLS regression. This model was then used to identify specific monosaccharide outliers from the mutant population.
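
    The Mahalanobis distance screen plus PLS regression step can be sketched as follows; the simulated spectra, component counts, and cutoff are assumptions, not the published protocol.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)

# Simulated leaf spectra (200 samples x 100 wavelengths); monosaccharide
# content is linearly encoded in two bands plus noise.
spectra = rng.standard_normal((200, 100))
content = 2.0 * spectra[:, 10] + spectra[:, 50] + rng.normal(0, 0.1, 200)

# Mahalanobis distances relative to a calibration subset.
calib = spectra[:120]
mu = calib.mean(axis=0)
cov = np.cov(calib, rowvar=False) + 1e-6 * np.eye(100)  # ridge for stability
inv = np.linalg.inv(cov)
d2 = np.einsum("ij,jk,ik->i", spectra - mu, inv, spectra - mu)
inliers = d2 < np.percentile(d2[:120], 97.5)            # cutoff assumed

# PLS model for monosaccharide prediction, trained on the screened samples.
pls = PLSRegression(n_components=5).fit(spectra[inliers], content[inliers])
pred = pls.predict(spectra).ravel()
print("candidate outliers:", np.where(~inliers)[0])
print("corr(predicted, true):", round(np.corrcoef(pred, content)[0, 1], 3))
```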

  11. Comparative Performance of Four Single Extreme Outlier Discordancy Tests from Monte Carlo Simulations

    Directory of Open Access Journals (Sweden)

    Surendra P. Verma

    2014-01-01

    Full Text Available Using highly precise and accurate Monte Carlo simulations of 20,000,000 replications and 102 independent simulation experiments with extremely low simulation errors and total uncertainties, we evaluated the performance of four single outlier discordancy tests (Grubbs test N2, Dixon test N8, skewness test N14, and kurtosis test N15) for normal samples of sizes 5 to 20. Statistical contaminations of a single observation resulting from parameters called δ, from ±0.1 up to ±20, for modeling the slippage of central tendency, or ε, from ±1.1 up to ±200, for slippage of dispersion, as well as no contamination (δ=0 and ε=±1), were simulated. Because of the use of precise and accurate random and normally distributed simulated data, very large replications, and a large number of independent experiments, this paper presents a novel approach for precise and accurate estimations of power functions of four popular discordancy tests and, therefore, should not be considered as a simple simulation exercise unrelated to probability and statistics. From both criteria, the Power of Test proposed by Hayes and Kinsella and the Test Performance Criterion of Barnett and Lewis, Dixon test N8 performs less well than the other three tests. The overall performance of these four tests could be summarized as N2≅N15>N14>N8.
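
    A toy version of this simulation design, for a two-sided single-outlier Grubbs-type statistic with empirically calibrated critical values (far fewer replications than the paper's 20,000,000, and not tied to the published N2/N8/N14/N15 tables):

```python
import numpy as np

rng = np.random.default_rng(0)

def grubbs_stat(x):
    """Max studentized absolute deviation, computed row-wise."""
    dev = np.abs(x - x.mean(axis=1, keepdims=True))
    return dev.max(axis=1) / x.std(axis=1, ddof=1)

n, reps = 10, 200_000

# Empirical 5% critical value under H0 (no contamination).
g0 = grubbs_stat(rng.standard_normal((reps, n)))
crit = np.quantile(g0, 0.95)

# Power under slippage of central tendency: one observation shifted by delta.
for delta in (1, 2, 4, 8):
    x = rng.standard_normal((reps, n))
    x[:, 0] += delta
    power = (grubbs_stat(x) > crit).mean()
    print(f"delta={delta}: empirical power = {power:.3f}")
```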

  12. Applying additive logistic regression to data derived from sensors monitoring behavioral and physiological characteristics of dairy cows to detect lameness

    NARCIS (Netherlands)

    Kamphuis, C.; Frank, E.; Burke, J.; Verkerk, G.A.; Jago, J.

    2013-01-01

    The hypothesis was that sensors currently available on farm that monitor behavioral and physiological characteristics have potential for the detection of lameness in dairy cows. This was tested by applying additive logistic regression to variables derived from sensor data. Data were collected

  13. Observed to expected or logistic regression to identify hospitals with high or low 30-day mortality?

    Science.gov (United States)

    Helgeland, Jon; Clench-Aas, Jocelyne; Laake, Petter; Veierød, Marit B.

    2018-01-01

    Introduction A common quality indicator for monitoring and comparing hospitals is based on death within 30 days of admission. An important use is to determine whether a hospital has higher or lower mortality than other hospitals. Thus, the ability to identify such outliers correctly is essential. Two approaches for detection are: 1) calculating the ratio of observed to expected number of deaths (OE) per hospital and 2) including all hospitals in a logistic regression (LR) comparing each hospital to a form of average over all hospitals. The aim of this study was to compare OE and LR with respect to correctly identifying 30-day mortality outliers. Modifications of the methods, i.e., the variance corrected approach of OE (OE-Faris), bias corrected LR (LR-Firth), and trimmed mean variants of LR and LR-Firth, were also studied. Materials and methods To study the properties of OE and LR and their variants, we performed a simulation study by generating patient data from hospitals with known outlier status (low mortality, high mortality, non-outlier). Data from simulated scenarios with varying number of hospitals, hospital volume, and mortality outlier status were analysed by the different methods and compared by level of significance (ability to falsely claim an outlier) and power (ability to reveal an outlier). Moreover, administrative data for patients with acute myocardial infarction (AMI), stroke, and hip fracture from Norwegian hospitals for 2012–2014 were analysed. Results None of the methods achieved the nominal (test) level of significance for both low and high mortality outliers. For low mortality outliers, the levels of significance were increased four- to fivefold for OE and OE-Faris. For high mortality outliers, OE and OE-Faris, LR 25% trimmed, and LR-Firth 10% and 25% trimmed maintained approximately the nominal level. The methods agreed with respect to outlier status for 94.1% of the AMI hospitals, 98.0% of the stroke hospitals, and 97.8% of the hip fracture hospitals.
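
    The OE approach can be sketched with a Poisson approximation: expected deaths come from a (hypothetical) case-mix model, and a hospital is flagged when the observed count is improbable given that expectation. Hospital volumes, risks, and the flagging rule below are invented, and the OE-Faris variance correction is not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_hosp = 25
volume = rng.integers(200, 1000, n_hosp)          # admissions per hospital
p_expected = rng.uniform(0.06, 0.10, n_hosp)      # case-mix adjusted risk
expected = volume * p_expected

# Simulate observed deaths; make hospital 0 a true high-mortality outlier.
observed = rng.poisson(expected)
observed[0] = rng.poisson(1.8 * expected[0])

oe = observed / expected
# Two-sided Poisson test per hospital (multiplicity handling omitted).
p_hi = stats.poisson.sf(observed - 1, expected)   # P(X >= observed)
p_lo = stats.poisson.cdf(observed, expected)      # P(X <= observed)
for h in range(n_hosp):
    if min(p_hi[h], p_lo[h]) < 0.025:
        kind = "high" if oe[h] > 1 else "low"
        print(f"hospital {h}: OE={oe[h]:.2f} flagged as {kind}-mortality outlier")
```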

  14. Non-stationary condition monitoring through event alignment

    DEFF Research Database (Denmark)

    Pontoppidan, Niels Henrik; Larsen, Jan

    2004-01-01

    We present an event alignment framework which enables change detection in non-stationary signals. Classical condition monitoring frameworks have been restricted to laboratory settings with stationary operating conditions, which do not resemble real-world operation. In this paper we apply the technique to non-stationary condition monitoring of large diesel engines based on acoustical emission sensor signals. The performance of the event alignment is analyzed in an unsupervised probabilistic detection framework based on outlier detection with either Principal Component Analysis or Gaussian Process modeling. We are especially interested in the true condition monitoring performance with mixed aligned and unaligned data, e.g. detection of fault conditions in unaligned examples versus false alarms on aligned normal-condition data. Further, we expect...

  15. Applying Emax model and bivariate thin plate splines to assess drug interactions.

    Science.gov (United States)

    Kong, Maiying; Lee, J Jack

    2010-01-01

    We review the semiparametric approach previously proposed by Kong and Lee and extend it to a case in which the dose-effect curves follow the Emax model instead of the median effect equation. When the maximum effects for the investigated drugs are different, we provide a procedure to obtain the additive effect based on the Loewe additivity model. Then, we apply a bivariate thin plate spline approach to estimate the effect beyond additivity along with its 95 per cent point-wise confidence interval as well as its 95 per cent simultaneous confidence interval for any combination dose. Thus, synergy, additivity, and antagonism can be identified. The advantages of the method are that it provides an overall assessment of the combination effect on the entire two-dimensional dose space spanned by the experimental doses, and it enables us to identify complex patterns of drug interaction in combination studies. In addition, this approach is robust to outliers. To illustrate this procedure, we analyzed data from two case studies.

  16. Applying the GNSS Volcanic Ash Plume Detection Technique to Consumer Navigation Receivers

    Science.gov (United States)

    Rainville, N.; Palo, S.; Larson, K. M.

    2017-12-01

    Global Navigation Satellite Systems (GNSS) such as the Global Positioning System (GPS) rely on predictably structured and constant power RF signals to fulfill their primary use for navigation and timing. When the received strength of GNSS signals deviates from the expected baseline, it is typically due to a change in the local environment. This can occur when signal reflections from the ground are modified by changes in snow or soil moisture content, as well as by attenuation of the signal from volcanic ash. This effect allows GNSS signals to be used as a source for passive remote sensing. Larson et al. (2017) have developed a detection technique for volcanic ash plumes based on the attenuation seen at existing geodetic GNSS sites. Since these existing networks are relatively sparse, this technique has been extended to use lower cost consumer GNSS receiver chips to enable higher density measurements of volcanic ash. These low-cost receiver chips have been integrated into a fully stand-alone sensor, with independent power, communications, and logging capabilities as part of a Volcanic Ash Plume Receiver (VAPR) network. A mesh network of these sensors transmits data to a local base-station which then streams the data real-time to a web accessible server. Initial testing of this sensor network has uncovered that a different detection approach is necessary when using consumer GNSS receivers and antennas. The techniques to filter and process the lower quality data from consumer receivers will be discussed and will be applied to initial results from a functioning VAPR network installation.

  17. Detection of data taking anomalies for the ATLAS experiment

    CERN Document Server

    De Castro Vargas Fernandes, Julio; The ATLAS collaboration; Lehmann Miotto, Giovanna

    2015-01-01

    The physics signals produced by the ATLAS detector at the Large Hadron Collider (LHC) at CERN are acquired and selected by a distributed Trigger and Data AcQuisition (TDAQ) system, comprising a large number of hardware devices and software components. In this work, we focus on the problem of online detection of anomalies during the data taking period. Anomalies, in this context, are defined as unexpected behaviour of the TDAQ system that results in a loss of data taking efficiency; the causes of those anomalies may come from the TDAQ itself or from external sources. While the TDAQ system operates, it publishes several useful quantities (trigger rates, dead times, memory usage…). Such information over time creates a set of time series that can be monitored in order to detect (and react to) problems (or anomalies). Here, we approach TDAQ operation monitoring through a data quality perspective, i.e., an anomaly is seen as a loss of quality (an outlier) and it is reported: this information can be used to react...

  18. Functionalized gold nanoparticle supported sensory mechanisms applied in detection of chemical and biological threat agents: A review

    International Nuclear Information System (INIS)

    Upadhyayula, Venkata K.K.

    2012-01-01

    Highlights: ► Smart sensors are needed for detection of chemical and biological threat agents. ► Smart sensors detect analytes with rapid speed, high sensitivity and selectivity. ► Functionalized gold nanoparticles (GNPs) can potentially smart sense threat agents. ► Functionalized GNPs support multiple analytical methods for sensing threat agents. ► Threat agents of all types can be detected using functionalized GNPs. - Abstract: There is a great necessity for development of novel sensory concepts supportive of smart sensing capabilities in defense and homeland security applications for detection of chemical and biological threat agents. A smart sensor is a detection device that can exhibit important features such as speed, sensitivity, selectivity, portability, and more importantly, simplicity in identifying a target analyte. Emerging nanomaterial based sensors, particularly those developed by utilizing functionalized gold nanoparticles (GNPs) as a sensing component potentially offer many desirable features needed for threat agent detection. The sensitiveness of physical properties expressed by GNPs, e.g. color, surface plasmon resonance, electrical conductivity and binding affinity are significantly enhanced when they are subjected to functionalization with an appropriate metal, organic or biomolecular functional groups. This sensitive nature of functionalized GNPs can be potentially exploited in the design of threat agent detection devices with smart sensing capabilities. In the presence of a target analyte (i.e., a chemical or biological threat agent) a change proportional to concentration of the analyte is observed, which can be measured either by colorimetric, fluorimetric, electrochemical or spectroscopic means. This article provides a review of how functionally modified gold colloids are applied in the detection of a broad range of threat agents, including radioactive substances, explosive compounds, chemical warfare agents, biotoxins, and

  19. Functionalized gold nanoparticle supported sensory mechanisms applied in detection of chemical and biological threat agents: A review

    Energy Technology Data Exchange (ETDEWEB)

    Upadhyayula, Venkata K.K., E-mail: Upadhyayula.Venkata@epa.gov [Oak Ridge Institute of Science and Education (ORISE), MC-100-44, PO Box 117, Oak Ridge, TN 37831 (United States)

    2012-02-17

    Highlights: ► Smart sensors are needed for detection of chemical and biological threat agents. ► Smart sensors detect analytes with rapid speed, high sensitivity and selectivity. ► Functionalized gold nanoparticles (GNPs) can potentially smart sense threat agents. ► Functionalized GNPs support multiple analytical methods for sensing threat agents. ► Threat agents of all types can be detected using functionalized GNPs. - Abstract: There is a great necessity for development of novel sensory concepts supportive of smart sensing capabilities in defense and homeland security applications for detection of chemical and biological threat agents. A smart sensor is a detection device that can exhibit important features such as speed, sensitivity, selectivity, portability, and more importantly, simplicity in identifying a target analyte. Emerging nanomaterial based sensors, particularly those developed by utilizing functionalized gold nanoparticles (GNPs) as a sensing component potentially offer many desirable features needed for threat agent detection. The sensitiveness of physical properties expressed by GNPs, e.g. color, surface plasmon resonance, electrical conductivity and binding affinity are significantly enhanced when they are subjected to functionalization with an appropriate metal, organic or biomolecular functional groups. This sensitive nature of functionalized GNPs can be potentially exploited in the design of threat agent detection devices with smart sensing capabilities. In the presence of a target analyte (i.e., a chemical or biological threat agent) a change proportional to concentration of the analyte is observed, which can be measured either by colorimetric, fluorimetric, electrochemical or spectroscopic means. This article provides a review of how functionally modified gold colloids are applied in the detection of a broad

  20. A collaborative computing framework of cloud network and WBSN applied to fall detection and 3-D motion reconstruction.

    Science.gov (United States)

    Lai, Chin-Feng; Chen, Min; Pan, Jeng-Shyang; Youn, Chan-Hyun; Chao, Han-Chieh

    2014-03-01

    As cloud computing and wireless body sensor network technologies gradually mature, ubiquitous healthcare services can prevent accidents instantly and effectively, as well as provide relevant information to reduce related processing time and cost. This study proposes a co-processing intermediary framework integrating cloud and wireless body sensor networks, mainly applied to fall detection and 3-D motion reconstruction. The main focuses of this study include distributed computing and resource allocation for processing sensing data over the computing architecture, network conditions, and performance evaluation. Through this framework, the transmission and computing time of sensing data are reduced to enhance overall performance for the services of fall event detection and 3-D motion reconstruction.

  1. Detecting instability in the volatility of carbon prices

    Energy Technology Data Exchange (ETDEWEB)

    Chevallier, Julien [Univ. Paris Dauphine (France)

    2011-01-15

    This article investigates the presence of outliers in the volatility of carbon prices. We compute three different measures of volatility for European Union Allowances, based on daily data (EGARCH model), option prices (implied volatility), and intraday data (realized volatility). Using the methodology developed by Zeileis et al. (2003) and Zeileis (2006), we detect instability in the volatility of carbon prices with two kinds of tests: retrospective tests (OLS-/recursive-based CUSUM processes, F-statistics, and residual sum of squares), and forward-looking tests (monitoring structural changes recursively or with moving estimates). We show evidence of strong shifts, mainly for the EGARCH and implied-volatility models, during the time period. Overall, we suggest that yearly compliance events and growing uncertainties in post-Kyoto international agreements may explain the instability in the volatility of carbon prices. (author)
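
    The retrospective OLS-based CUSUM idea can be illustrated compactly: cumulate the standardized residuals of a fitted model and test whether the excursion exceeds the boundary implied by a Brownian bridge. The Python sketch below is a minimal, self-contained illustration, not the authors' EGARCH pipeline; the simulated volatility series and the linear trend model are placeholders.

```python
import numpy as np

# OLS-based CUSUM stability check (Ploberger-Kraemer style): under the null of
# a stable model, the scaled cumulative sum of OLS residuals behaves like a
# Brownian bridge, whose absolute supremum exceeds ~1.36 with probability 5%.
rng = np.random.default_rng(4)
vol = np.concatenate([rng.normal(1.0, 0.1, 250),    # stable volatility regime
                      rng.normal(1.6, 0.1, 250)])   # regime after a structural shift
t = np.arange(len(vol))
resid = vol - np.polyval(np.polyfit(t, vol, 1), t)  # residuals of a simple OLS trend fit
cusum = np.cumsum(resid) / (resid.std(ddof=2) * np.sqrt(len(vol)))
print(np.abs(cusum).max())  # well above 1.36 here, signalling instability
```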

  2. Efficient Estimation of Dynamic Density Functions with Applications in Streaming Data

    KAUST Repository

    Qahtan, Abdulhakim

    2016-05-11

    Recent advances in computing technology allow for collecting vast amounts of data that arrive continuously in the form of streams. Mining data streams is challenged by the speed and volume of the arriving data. Furthermore, the underlying distribution of the data changes over time in unpredicted scenarios. To reduce the computational cost, data streams are often studied in forms of condensed representation, e.g., the Probability Density Function (PDF). This thesis aims at developing an online density estimator that builds a model called KDE-Track for characterizing the dynamic density of data streams. KDE-Track estimates the PDF of the stream at a set of resampling points and uses interpolation to estimate the density at any given point. To reduce the interpolation error and computational complexity, we introduce adaptive resampling, where more/fewer resampling points are used in high/low curved regions of the PDF. The PDF values at the resampling points are updated online to provide an up-to-date model of the data stream. Compared with other existing online density estimators, KDE-Track is often more accurate (as reflected by smaller error values) and more computationally efficient (as reflected by shorter running time). The anytime-available PDF estimated by KDE-Track can be applied for visualizing the dynamic density of data streams, outlier detection and change detection in data streams. In this thesis work, the first application is to visualize the taxi traffic volume in New York City. Utilizing KDE-Track allows for visualizing and monitoring the traffic flow in real time without extra overhead and provides insightful analysis of pick-up demand that can be utilized by service providers to improve service availability. The second application is to detect outliers in data streams from sensor networks based on the estimated PDF. The method detects outliers accurately and outperforms baseline methods designed for detecting and cleaning outliers in sensor data. The
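
    The mechanics of estimating a PDF at resampling points and interpolating between them can be sketched briefly. The following Python snippet is a hypothetical simplification of the KDE-Track idea (fixed grid and exponential forgetting instead of the thesis's adaptive resampling); the class and parameter names are illustrative only.

```python
import numpy as np

class OnlineKDE:
    """Toy online density estimator: KDE maintained at fixed resampling points."""

    def __init__(self, grid, bandwidth=0.5, decay=0.999):
        self.grid = np.asarray(grid, dtype=float)  # resampling points
        self.density = np.zeros_like(self.grid)    # unnormalized PDF values
        self.weight = 0.0                          # total (decayed) sample weight
        self.h = bandwidth
        self.decay = decay                         # forgetting factor -> dynamic density

    def update(self, x):
        # Gaussian kernel contribution of the new observation at each grid point
        k = np.exp(-0.5 * ((self.grid - x) / self.h) ** 2) / (self.h * np.sqrt(2 * np.pi))
        self.density = self.decay * self.density + k
        self.weight = self.decay * self.weight + 1.0

    def pdf(self, x):
        # linear interpolation between the resampling points
        return np.interp(x, self.grid, self.density / max(self.weight, 1e-12))

est = OnlineKDE(grid=np.linspace(-5, 5, 101))
for value in np.random.randn(10000):
    est.update(value)
print(est.pdf(0.0))  # near the N(0,1) density at 0 (~0.4), lowered slightly by smoothing
```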

  3. An Improvement of the Hotelling T2 Statistic in Monitoring Multivariate Quality Characteristics

    Directory of Open Access Journals (Sweden)

    Ashkan Shabbak

    2012-01-01

    Full Text Available The Hotelling T2 statistic is the most popular statistic used in multivariate control charts to monitor multiple quality characteristics. However, this statistic is easily affected by the existence of more than one outlier in the data set. To rectify this problem, robust control charts based on the minimum volume ellipsoid and the minimum covariance determinant have been proposed. Most researchers assess the performance of multivariate control charts based on the number of signals without paying much attention to whether those signals are really outliers. Accordingly, we propose to evaluate control charts not only based on the number of detected outliers but also with respect to their correct positions. In this paper, an Upper Control Limit based on the median and the median absolute deviation is also proposed. The results of this study signify that the proposed Upper Control Limit improves the detection of correct outliers but that it suffers from a swamping effect when the positions of outliers are not taken into consideration. Finally, a robust control chart based on the diagnostic robust generalised potential procedure is introduced to remedy this drawback.
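
    A minimal sketch of the chart's mechanics is given below, assuming a T2-type statistic standardized by the median and MAD and an empirical control limit; the paper's exact Upper Control Limit formula is not reproduced here.

```python
import numpy as np

# T^2-type statistic with robust location/scale (median and MAD); the
# correlation structure is still estimated classically here for brevity.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:5] += 6.0                                           # plant a few outliers

center = np.median(X, axis=0)
mad = 1.4826 * np.median(np.abs(X - center), axis=0)   # consistent robust scale
Z = (X - center) / mad
Sinv = np.linalg.inv(np.cov(Z, rowvar=False))

t2 = np.einsum('ij,jk,ik->i', Z, Sinv, Z)              # T^2 for each observation
ucl = np.percentile(t2, 97.5)                          # empirical UCL (assumption)
print(np.where(t2 > ucl)[0])                           # observations signalled
```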

  4. Robust Curb Detection with Fusion of 3D-Lidar and Camera Data

    Directory of Open Access Journals (Sweden)

    Jun Tan

    2014-05-01

    Full Text Available Curb detection is an essential component of Autonomous Land Vehicles (ALV, especially important for safe driving in urban environments. In this paper, we propose a fusion-based curb detection method by exploiting 3D-Lidar and camera data. More specifically, we first fuse the sparse 3D-Lidar points and high-resolution camera images together to recover a dense depth image of the captured scene. Based on the recovered dense depth image, we propose a filter-based method to estimate the normal direction within the image. Then, by using multi-scale normal patterns based on the curb’s geometric property, curb point features fitting the patterns are detected in the normal image row by row. After that, we construct a Markov Chain to model the consistency of curb points, which utilizes the continuous property of the curb, so that the optimal curb path linking the curb points together can be efficiently estimated by dynamic programming. Finally, we perform post-processing operations to filter the outliers, parameterize the curbs and assign confidence scores to the detected curbs. Extensive evaluations clearly show that our proposed method can detect curbs with strong robustness at real-time speed for both static and dynamic scenes.

  5. The Role of SPINK1 in ETS Rearrangement Negative Prostate Cancers

    Science.gov (United States)

    Tomlins, Scott A.; Rhodes, Daniel R.; Yu, Jianjun; Varambally, Sooryanarayana; Mehra, Rohit; Perner, Sven; Demichelis, Francesca; Helgeson, Beth E.; Laxman, Bharathi; Morris, David S.; Cao, Qi; Cao, Xuhong; Andrén, Ove; Fall, Katja; Johnson, Laura; Wei, John T.; Shah, Rajal B.; Al-Ahmadie, Hikmat; Eastham, James A.; Eggener, Scott E.; Fine, Samson W.; Hotakainen, Kristina; Stenman, Ulf-Håkan; Tsodikov, Alex; Gerald, William L.; Lilja, Hans; Reuter, Victor E.; Kantoff, Phillip W.; Scardino, Peter T.; Rubin, Mark A.; Bjartell, Anders S.; Chinnaiyan, Arul M.

    2009-01-01

    Summary ETS gene fusions have been characterized in a majority of prostate cancers; however, the key molecular alterations in ETS-negative cancers are unclear. Here we used an outlier meta-analysis (meta-COPA) to identify SPINK1 outlier-expression exclusively in a subset of ETS rearrangement negative cancers (~10% of total cases). We validated the mutual exclusivity of SPINK1 expression and ETS fusion status, demonstrated that SPINK1 outlier-expression can be detected non-invasively in urine, and observed that SPINK1 outlier-expression is an independent predictor of biochemical recurrence after resection. We identified the aggressive 22RV1 cell line as a SPINK1 outlier-expression model, and demonstrate that SPINK1 knockdown in 22RV1 attenuates invasion, suggesting a functional role in ETS rearrangement negative prostate cancers. PMID:18538735

  6. Temporal interpolation alters motion in fMRI scans: Magnitudes and consequences for artifact detection.

    Directory of Open Access Journals (Sweden)

    Jonathan D Power

    Full Text Available Head motion can be estimated at any point of fMRI image processing. Processing steps involving temporal interpolation (e.g., slice time correction or outlier replacement) often precede motion estimation in the literature. From first principles it can be anticipated that temporal interpolation will alter head motion in a scan. Here we demonstrate this effect and its consequences in five large fMRI datasets. Estimated head motion was reduced by 10-50% or more following temporal interpolation, and reductions were often visible to the naked eye. Such reductions make the data seem to be of improved quality. Such reductions also degrade the sensitivity of analyses aimed at detecting motion-related artifact and can cause a dataset with artifact to falsely appear artifact-free. These reduced motion estimates will be particularly problematic for studies needing estimates of motion in time, such as studies of dynamics. Based on these findings, it is sensible to obtain motion estimates prior to any image processing (regardless of subsequent processing steps and the actual timing of motion correction procedures, which need not be changed). We also find that outlier replacement procedures change signals almost entirely during times of motion and therefore have notable similarities to motion-targeting censoring strategies (which withhold or replace signals entirely during times of motion).

  7. SERS as an analytical tool in environmental science: The detection of sulfamethoxazole in the nanomolar range by applying a microfluidic cartridge setup.

    Science.gov (United States)

    Patze, Sophie; Huebner, Uwe; Liebold, Falk; Weber, Karina; Cialla-May, Dana; Popp, Juergen

    2017-01-01

    Sulfamethoxazole (SMX) is a commonly applied antibiotic for treating urinary tract infections; however, allergic reactions and skin eczema are known side effects that are observed for all sulfonamides. Today, this molecule is present in drinking and surface water sources. The allowed concentration in tap water is 2·10⁻⁷ mol L⁻¹. SMX could unintentionally be ingested by healthy people when drinking contaminated tap water, representing unnecessary drug intake. To assess the quality of tap water, fast, specific and sensitive detection methods are required, in consequence of which measures for improving the purification of water might be initiated in the short term. Herein, the quantitative detection of SMX down to environmentally and physiologically relevant concentrations in the nanomolar range by employing surface-enhanced Raman spectroscopy (SERS) and a microfluidic cartridge system is presented. By applying surface-water samples as matrices, the detection of SMX down to 2.2·10⁻⁹ mol L⁻¹ is achieved, which illustrates the great potential of our proposed method in environmental science. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. In vivo Raman spectroscopy detects increased epidermal antioxidative potential with topically applied carotenoids

    International Nuclear Information System (INIS)

    Lademann, J; Richter, H; Patzelt, A; Darvin, M; Sterry, W; Fluhr, J W; Caspers, P J; Van der Pol, A; Zastrow, L

    2009-01-01

    In the present study, the distribution of the carotenoids as a marker for the complete antioxidative potential in human skin was investigated before and after the topical application of carotenoids by in vivo Raman spectroscopy with an excitation wavelength of 785 nm. The carotenoid profile was assessed after a short term topical application in 4 healthy volunteers. In the untreated skin, the highest concentration of natural carotenoids was detected in different layers of the stratum corneum (SC) close to the skin surface. After topical application of carotenoids, an increase in the antioxidative potential in the skin could be observed. Topically applied carotenoids penetrate deep into the epidermis down to approximately 24 μm. This study supports the hypothesis that antioxidative substances are secreted via eccrine sweat glands and/or sebaceous glands to the skin surface. Subsequently they penetrate into the different layers of the SC

  9. Statistical Techniques For Real-time Anomaly Detection Using Spark Over Multi-source VMware Performance Data

    Energy Technology Data Exchange (ETDEWEB)

    Solaimani, Mohiuddin [Univ. of Texas-Dallas, Richardson, TX (United States); Iftekhar, Mohammed [Univ. of Texas-Dallas, Richardson, TX (United States); Khan, Latifur [Univ. of Texas-Dallas, Richardson, TX (United States); Thuraisingham, Bhavani [Univ. of Texas-Dallas, Richardson, TX (United States); Ingram, Joey Burton [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2015-09-01

    Anomaly detection refers to the identification of an irregular or unusual pattern which deviates from what is standard, normal, or expected. Such deviated patterns typically correspond to samples of interest and are assigned different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable challenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data where anomalies appear as a group. We have developed a distributed statistical approach to build a model and later use it to detect anomalies. As a case study, we investigate group anomaly detection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We have built our framework using Apache Spark to get higher throughput and lower data processing time on streaming data. We have developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint with higher accuracy by implementing a cluster-based technique to detect sporadic and continuous anomalies. We conclude that our cluster-based technique outperforms other statistical techniques with higher accuracy and lower processing time.

  10. Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.

    Science.gov (United States)

    Hu, Yi-Chung

    2014-01-01

    On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches depend on whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses a multilayer perceptron to construct a robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval impose only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.

  11. Data processing of qualitative results from an interlaboratory comparison for the detection of "Flavescence dorée" phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology.

    Science.gov (United States)

    Chabirand, Aude; Loiseau, Marianne; Renaudin, Isabelle; Poliakoff, Françoise

    2017-01-01

    A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of "Flavescence dorée" (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. First, the standard statistical approach consisted of analyzing samples known to be positive and samples known to be negative, and reporting the proportions of false-positive and false-negative results to calculate diagnostic specificity and sensitivity, respectively. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based, on the one hand, on the probability of detection model, and, on the other hand, on Bayes' theorem. These various statistical approaches are complementary and give consistent results. Their combination, and in particular, the introduction of new statistical approaches, gives overall information on the performance and limitations of the different methods, and is particularly useful for selecting the most appropriate detection scheme with regard to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6, developed respectively by Hren (2007), Pelletier (2009) and with patented oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper and their

  12. Brain tissues volume measurements from 2D MRI using parametric approach

    Science.gov (United States)

    L'vov, A. A.; Toropova, O. A.; Litovka, Yu. V.

    2018-04-01

    The purpose of the paper is to propose a fully automated method for volume assessment of structures within the human brain. Our statistical approach uses the maximum interdependency principle in the decision-making process for measurement consistency and unequal observations. Outlier detection is performed using the maximum normalized residual test. We propose a statistical model which utilizes knowledge of tissue distribution in the human brain and applies partial data restoration for precision improvement. The proposed approach is computationally efficient and independent of the segmentation algorithm used in the application.
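
    The maximum normalized residual test mentioned above is the classical Grubbs test. A small Python sketch follows, applied iteratively so that repeated outliers can be removed one at a time (two-sided form; the sample data are made up):

```python
import numpy as np
from scipy import stats

def grubbs_outliers(x, alpha=0.05):
    """Iterative two-sided Grubbs (maximum normalized residual) test."""
    x = list(map(float, x))
    flagged = []
    while len(x) > 2:
        arr = np.array(x)
        mean, sd = arr.mean(), arr.std(ddof=1)
        i = int(np.argmax(np.abs(arr - mean)))
        g = abs(arr[i] - mean) / sd                   # maximum normalized residual
        n = len(arr)
        t = stats.t.ppf(1 - alpha / (2 * n), n - 2)   # critical t value
        g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
        if g > g_crit:
            flagged.append(x.pop(i))                  # remove and test again
        else:
            break
    return flagged

print(grubbs_outliers([9.8, 10.1, 10.0, 9.9, 10.2, 14.7]))  # -> [14.7]
```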

  13. Automated microaneurysm detection algorithms applied to diabetic retinopathy retinal images

    Directory of Open Access Journals (Sweden)

    Akara Sopharak

    2013-07-01

    Full Text Available Diabetic retinopathy is the commonest cause of blindness in working age people. It is characterised and graded by the development of retinal microaneurysms, haemorrhages and exudates. The damage caused by diabetic retinopathy can be prevented if it is treated in its early stages. Therefore, automated early detection can limit the severity of the disease, improve the follow-up management of diabetic patients and assist ophthalmologists in investigating and treating the disease more efficiently. This review focuses on microaneurysm detection as the earliest clinically localised characteristic of diabetic retinopathy, a frequently observed complication in both Type 1 and Type 2 diabetes. Algorithms used for microaneurysm detection from retinal images are reviewed. A number of features used to extract microaneurysms are summarised. Furthermore, a comparative analysis of reported methods used to automatically detect microaneurysms is presented and discussed. The performance of methods and their complexity are also discussed.

  14. Applying a new computer-aided detection scheme generated imaging marker to predict short-term breast cancer risk

    Science.gov (United States)

    Mirniaharikandehei, Seyedehnafiseh; Hollingsworth, Alan B.; Patel, Bhavika; Heidari, Morteza; Liu, Hong; Zheng, Bin

    2018-05-01

    This study aims to investigate the feasibility of identifying a new quantitative imaging marker based on false-positives generated by a computer-aided detection (CAD) scheme to help predict short-term breast cancer risk. An image dataset including four-view mammograms acquired from 1044 women was retrospectively assembled. All mammograms were originally interpreted as negative by radiologists. In the next subsequent mammography screening, 402 women were diagnosed with breast cancer and 642 remained negative. An existing CAD scheme was applied 'as is' to process each image. From CAD-generated results, four detection features, including the total number of (1) initial detection seeds and (2) final detected false-positive regions, and the (3) average and (4) sum of detection scores, were computed from each image. Then, by combining the features computed from the two bilateral images of the left and right breasts from either the craniocaudal or mediolateral oblique view, two logistic regression models were trained and tested using a leave-one-case-out cross-validation method to predict the likelihood of each testing case being positive in the next subsequent screening. The new prediction model yielded a maximum prediction accuracy with an area under the ROC curve of AUC = 0.65 ± 0.017 and a maximum adjusted odds ratio of 4.49 with a 95% confidence interval of (2.95, 6.83). The results also showed a statistically significant increasing trend in the adjusted odds ratio and risk prediction scores with increasing short-term breast cancer risk.
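
    The prediction stage can be sketched in a few lines: four CAD-derived features per case, a logistic model, and leave-one-case-out cross-validation. The snippet below uses synthetic stand-in features, not the study's mammography data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 4))        # stand-in for the four CAD-derived features
y = (X @ np.array([0.8, 0.5, 0.3, 0.2]) + rng.normal(0, 1, 60) > 0).astype(int)

scores = np.empty(len(y))
for train, test in LeaveOneOut().split(X):
    model = LogisticRegression().fit(X[train], y[train])
    scores[test] = model.predict_proba(X[test])[:, 1]   # risk score for held-out case
print(scores[:5])
```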

  15. The effectiveness of robust RMCD control chart as outliers’ detector

    Science.gov (United States)

    Darmanto; Astutik, Suci

    2017-12-01

    A well-known control chart for monitoring a multivariate process is Hotelling's T2, whose parameters are classically estimated; it is very sensitive to, and marred by, the masking and swamping effects of outlier data. To overcome this situation, robust estimators are strongly recommended. One such robust estimator is the re-weighted minimum covariance determinant (RMCD), which has the same robust characteristics as the MCD. In this paper, effectiveness means the accuracy of the RMCD control chart in detecting outliers as real outliers; in other words, how effectively this control chart can identify and remove the masking and swamping effects of outliers. We assessed the effectiveness of the robust control chart by simulation, considering different scenarios: sample size n, proportion of outliers, and number of quality characteristics p. We found that in some scenarios, the RMCD robust control chart works effectively.
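
    A hedged sketch of the RMCD-style charting step: estimate robust location and scatter with the (reweighted) minimum covariance determinant, compute robust Mahalanobis distances, and signal points beyond a control limit. scikit-learn's MinCovDet already applies a reweighting step; the chi-square cutoff is a common textbook choice, not the UCL studied in the paper.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=200)
X[:8] += np.array([5.0, -5.0])          # contaminate with outliers

mcd = MinCovDet(random_state=0).fit(X)  # scikit-learn reweights the raw MCD fit
d2 = mcd.mahalanobis(X)                 # squared robust distances
ucl = chi2.ppf(0.975, df=X.shape[1])    # chi-square control limit (assumption)
print(np.where(d2 > ucl)[0])            # points signalled as outliers
```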

  16. Detecting Solar-like Oscillations in Red Giants with Deep Learning

    Science.gov (United States)

    Hon, Marc; Stello, Dennis; Zinn, Joel C.

    2018-05-01

    Time-resolved photometry of tens of thousands of red giant stars from space missions like Kepler and K2 has created the need for automated asteroseismic analysis methods. The first and most fundamental step in such analysis is to identify which stars show oscillations. It is critical that this step be performed with no, or little, detection bias, particularly when performing subsequent ensemble analyses that aim to compare the properties of observed stellar populations with those from galactic models. However, an efficient, automated solution to this initial detection step still has not been found, meaning that expert visual inspection of data from each star is required to obtain the highest level of detections. Hence, to mimic how an expert eye analyzes the data, we use supervised deep learning to not only detect oscillations in red giants, but also to predict the location of the frequency at maximum power, νmax, by observing features in 2D images of power spectra. By training on Kepler data, we benchmark our deep-learning classifier against K2 data that are given detections by the expert eye, achieving a detection accuracy of 98% on K2 Campaign 6 stars and a detection accuracy of 99% on K2 Campaign 3 stars. We further find that the estimated uncertainty of our deep-learning-based νmax predictions is about 5%. This is comparable to human-level performance using visual inspection. When examining outliers, we find that the deep-learning results are more likely to provide robust νmax estimates than the classical model-fitting method.

  17. Global Disease Detection-Achievements in Applied Public Health Research, Capacity Building, and Public Health Diplomacy, 2001-2016.

    Science.gov (United States)

    Rao, Carol Y; Goryoka, Grace W; Henao, Olga L; Clarke, Kevin R; Salyer, Stephanie J; Montgomery, Joel M

    2017-11-01

    The Centers for Disease Control and Prevention has established 10 Global Disease Detection (GDD) Program regional centers around the world that serve as centers of excellence for public health research on emerging and reemerging infectious diseases. The core activities of the GDD Program focus on applied public health research, surveillance, laboratory, public health informatics, and technical capacity building. During 2015-2016, program staff conducted 205 discrete projects on a range of topics, including acute respiratory illnesses, health systems strengthening, infectious diseases at the human-animal interface, and emerging infectious diseases. Projects incorporated multiple core activities, with technical capacity building being most prevalent. Collaborating with host countries to implement such projects promotes public health diplomacy. The GDD Program continues to work with countries to strengthen core capacities so that emerging diseases can be detected and stopped faster and closer to the source, thereby enhancing global health security.

  18. The Plasma Focus Technology Applied to the Detection of Hydrogenated Substances

    International Nuclear Information System (INIS)

    Ramos, R.; Moreno, C.; Gonzalez, J.; Clausse, A

    2003-01-01

    The feasibility study of an industrial application of thermonuclear pulsors is presented. An experiment was conducted to detect hydrogenated substances using PF technology. The detection system is composed of two neutron detectors operated simultaneously on every shot. The first detector is used to register the PF neutron yield in each shot, whereas the other one was designed to detect neutrons scattered by the blanket. We obtained the detector sensitivity charts as a function of the position in space and frontal area of the substance to be detected.

  19. Computer-aided detection system applied to full-field digital mammograms

    International Nuclear Information System (INIS)

    Vega Bolivar, Alfonso; Sanchez Gomez, Sonia; Merino, Paula; Alonso-Bartolome, Pilar; Ortega Garcia, Estrella; Munoz Cacho, Pedro; Hoffmeister, Jeffrey W.

    2010-01-01

    Background: Although mammography remains the mainstay for breast cancer screening, it is an imperfect examination with a sensitivity of 75-92% for breast cancer. Computer-aided detection (CAD) has been developed to improve mammographic detection of breast cancer. Purpose: To retrospectively estimate CAD sensitivity and false-positive rate with full-field digital mammograms (FFDMs). Material and Methods: CAD was used to evaluate 151 cases of ductal carcinoma in situ (DCIS) (n=48) and invasive breast cancer (n=103) detected with FFDM. Retrospectively, CAD sensitivity was estimated based on breast density, mammographic presentation, histopathology type, and lesion size. CAD false-positive rate was estimated with screening FFDMs from 200 women. Results: CAD detected 93% (141/151) of cancer cases: 97% (28/29) in fatty breasts, 94% (81/86) in breasts containing scattered fibroglandular densities, 90% (28/31) in heterogeneously dense breasts, and 80% (4/5) in extremely dense breasts. CAD detected 98% (54/55) of cancers manifesting as calcifications, 89% (74/83) as masses, and 100% (13/13) as mixed masses and calcifications. CAD detected 92% (73/79) of invasive ductal carcinomas, 89% (8/9) of invasive lobular carcinomas, 93% (14/15) of other invasive carcinomas, and 96% (46/48) of DCIS. CAD sensitivity for cancers 1-10 mm was 87% (47/54); 11-20 mm, 99% (70/71); 21-30 mm, 86% (12/14); and larger than 30 mm, 100% (12/12). The CAD false-positive rate was 2.5 marks per case. Conclusion: CAD with FFDM showed a high sensitivity in identifying cancers manifesting as calcifications or masses. CAD sensitivity was maintained in small lesions (1-20 mm) and invasive lobular carcinomas, which have lower mammographic sensitivity

  20. Computer-aided detection system applied to full-field digital mammograms

    Energy Technology Data Exchange (ETDEWEB)

    Vega Bolivar, Alfonso; Sanchez Gomez, Sonia; Merino, Paula; Alonso-Bartolome, Pilar; Ortega Garcia, Estrella (Dept. of Radiology, Univ. Marques of Valdecilla Hospital, Santander (Spain)), e-mail: avegab@telefonica.net; Munoz Cacho, Pedro (Dept. of Statistics, Univ. Marques of Valdecilla Hospital, Santander (Spain)); Hoffmeister, Jeffrey W. (iCAD, Inc., Nashua, NH (United States))

    2010-12-15

    Background: Although mammography remains the mainstay for breast cancer screening, it is an imperfect examination with a sensitivity of 75-92% for breast cancer. Computer-aided detection (CAD) has been developed to improve mammographic detection of breast cancer. Purpose: To retrospectively estimate CAD sensitivity and false-positive rate with full-field digital mammograms (FFDMs). Material and Methods: CAD was used to evaluate 151 cases of ductal carcinoma in situ (DCIS) (n=48) and invasive breast cancer (n=103) detected with FFDM. Retrospectively, CAD sensitivity was estimated based on breast density, mammographic presentation, histopathology type, and lesion size. CAD false-positive rate was estimated with screening FFDMs from 200 women. Results: CAD detected 93% (141/151) of cancer cases: 97% (28/29) in fatty breasts, 94% (81/86) in breasts containing scattered fibroglandular densities, 90% (28/31) in heterogeneously dense breasts, and 80% (4/5) in extremely dense breasts. CAD detected 98% (54/55) of cancers manifesting as calcifications, 89% (74/83) as masses, and 100% (13/13) as mixed masses and calcifications. CAD detected 92% (73/79) of invasive ductal carcinomas, 89% (8/9) of invasive lobular carcinomas, 93% (14/15) of other invasive carcinomas, and 96% (46/48) of DCIS. CAD sensitivity for cancers 1-10 mm was 87% (47/54); 11-20 mm, 99% (70/71); 21-30 mm, 86% (12/14); and larger than 30 mm, 100% (12/12). The CAD false-positive rate was 2.5 marks per case. Conclusion: CAD with FFDM showed a high sensitivity in identifying cancers manifesting as calcifications or masses. CAD sensitivity was maintained in small lesions (1-20 mm) and invasive lobular carcinomas, which have lower mammographic sensitivity

  1. Assessment of the detectability of geo-hazards using Google Earth applied to the Three Parallel Rivers Area, Yunnan province of China

    Science.gov (United States)

    Voermans, Michiel; Mao, Zhun; Baartman, Jantiene EM; Stokes, Alexia

    2017-04-01

    Anthropogenic activities such as hydropower, mining and road construction in mountainous areas can induce and intensify mass-wasting geo-hazards (e.g. landslides, gullies, rockslides). This compromises local safety and socio-economic development, and endangers biodiversity at a larger scale. Until today, data and knowledge to construct geo-hazard databases for further assessments have been lacking. This applies in particular to countries with recently emerged rapid economic growth, where there is no previous hazard documentation and where the means to gain data from e.g. intensive fieldwork or VHR satellite imagery and DEM processing are lacking. Google Earth (GE, https://www.google.com/earth/) is a freely available and relatively simple virtual globe, map and geographical information program, which is potentially useful in detecting geo-hazards. This research aimed at (i) testing the capability of Google Earth to detect locations of geo-hazards and (ii) identifying factors affecting the diagnostic quality of the detection, including effects of geo-hazard dimensions, environs setting and professional background and effort of GE users. This was tested on nine geo-hazard sites along road segments in the Three Parallel Rivers Area in the Yunnan province of China, where geo-hazards occur frequently. Along each road site, the position and size of each geo-hazard was measured in situ. Next, independent diagnosers with varying professional experience (students, researchers, engineers etc.) were invited to detect geo-hazard occurrence along each of the sites via GE. Finally, the inventory and diagnostic data were compared to validate the objectives. Rates of detected geo-hazards from 30 diagnosers ranged from 10% to 48%. No strong correlations were found between the type and size of the geo-hazards and their detection rates. Also the years of expertise of the diagnosers proved not to make a difference, contrary to what might be expected. Meanwhile the amount of time

  2. Detection of Anomalies in Hydrometric Data Using Artificial Intelligence Techniques

    Science.gov (United States)

    Lauzon, N.; Lence, B. J.

    2002-12-01

    This work focuses on the detection of anomalies in hydrometric data sequences, such as 1) outliers, which are individual data having statistical properties that differ from those of the overall population; 2) shifts, which are sudden changes over time in the statistical properties of the historical records of data; and 3) trends, which are systematic changes over time in the statistical properties. For the purpose of the design and management of water resources systems, it is important to be aware of these anomalies in hydrometric data, for they can induce a bias in the estimation of water quantity and quality parameters. These anomalies may be viewed as specific patterns affecting the data, and therefore pattern recognition techniques can be used for identifying them. However, the number of possible patterns is very large for each type of anomaly and consequently large computing capacities are required to account for all possibilities using the standard statistical techniques, such as cluster analysis. Artificial intelligence techniques, such as the Kohonen neural network and fuzzy c-means, are clustering techniques commonly used for pattern recognition in several areas of engineering and have recently begun to be used for the analysis of natural systems. They require much less computing capacity than the standard statistical techniques, and therefore are well suited for the identification of outliers, shifts and trends in hydrometric data. This work constitutes a preliminary study, using synthetic data representing hydrometric data that can be found in Canada. The analysis of the results obtained shows that the Kohonen neural network and fuzzy c-means are reasonably successful in identifying anomalies. This work also addresses the problem of uncertainties inherent to the calibration procedures that fit the clusters to the possible patterns for both the Kohonen neural network and fuzzy c-means. Indeed, for the same database, different sets of clusters can be

  3. More recent robust methods for the estimation of mean and standard deviation of data

    International Nuclear Information System (INIS)

    Kanisch, G.

    2003-01-01

    Outliers in a data set result in biased values of the mean and standard deviation. One way to improve the estimation of a mean is to apply tests to identify outliers and to exclude them from the calculations. Tests according to Grubbs or to Dixon, which are frequently used in practice, especially within laboratory intercomparisons, are not very efficient in identifying outliers. For more than ten years now, so-called robust methods have been used more and more; these determine the mean and standard deviation by iteration, down-weighting values far from the mean and thereby diminishing the impact of outliers. In 1989 the Analytical Methods Committee of the British Royal Society of Chemistry published such a robust method. Since 1993 the US Environmental Protection Agency has published a more efficient and quite versatile method, in which the mean and standard deviation are calculated by iteration and application of a special weight function for down-weighting outlier candidates. In 2000, W. Cofino et al. published a very efficient robust method which works quite differently from the others. It applies methods taken from the basics of quantum mechanics, such as 'wave functions' associated with each laboratory mean value, and matrix algebra (solving eigenvalue problems). In contrast to the other methods, it includes the individual measurement uncertainties. (orig.)
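
    The iterate-and-down-weight idea is easy to sketch. The Python snippet below implements a generic Huber-type M-estimator, not the exact Analytical Methods Committee, EPA, or Cofino algorithms; the tuning constant c = 1.5 is a conventional choice.

```python
import numpy as np

def robust_mean_sd(x, c=1.5, iterations=20):
    """Huber-type iterative down-weighting of values far from the mean."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)
    s = 1.4826 * np.median(np.abs(x - mu))                # MAD as starting scale
    for _ in range(iterations):
        u = (x - mu) / s
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))  # down-weight outliers
        mu = np.sum(w * x) / np.sum(w)
        s = np.sqrt(np.sum(w * (x - mu) ** 2) / (np.sum(w) - 1))
    return mu, s

values = [10.1, 9.9, 10.0, 10.2, 9.8, 15.0]   # one laboratory far off
print(robust_mean_sd(values))                  # mean stays near 10
```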

  4. Advanced methods for image registration applied to JET videos

    Energy Technology Data Exchange (ETDEWEB)

    Craciunescu, Teddy, E-mail: teddy.craciunescu@jet.uk [EURATOM-MEdC Association, NILPRP, Bucharest (Romania); Murari, Andrea [Consorzio RFX, Associazione EURATOM-ENEA per la Fusione, Padova (Italy); Gelfusa, Michela [Associazione EURATOM-ENEA – University of Rome “Tor Vergata”, Roma (Italy); Tiseanu, Ion; Zoita, Vasile [EURATOM-MEdC Association, NILPRP, Bucharest (Romania); Arnoux, Gilles [EURATOM/CCFE Fusion Association, Culham Science Centre, Abingdon, Oxon (United Kingdom)

    2015-10-15

    Graphical abstract: - Highlights: • Development of an image registration method for JET IR and fast visible cameras. • Method based on SIFT descriptors and the coherent point drift point-set registration technique. • Method able to deal with extremely noisy images and very low luminosity images. • Computation time compatible with inter-shot analysis. - Abstract: Recent years have witnessed a significant increase in the use of digital cameras on JET. They are routinely applied for imaging in the IR and visible spectral regions. One of the main technical difficulties in interpreting the data of camera-based diagnostics is the presence of movements of the field of view. Small movements occur due to machine shaking during normal pulses, while large ones may arise during disruptions. Some cameras show a correlation of image movement with change of magnetic field strength. For deriving unaltered information from the videos and allowing correct interpretation, an image registration method, based on highly distinctive scale invariant feature transform (SIFT) descriptors and on the coherent point drift (CPD) point-set registration technique, has been developed. The algorithm incorporates a complex procedure for rejecting outliers. The method has been applied for vibration correction to videos collected by the JET wide angle infrared camera and for the correction of spurious rotations in the case of the JET fast visible camera (which is equipped with an image intensifier). The method has proved able to deal with the images provided by this camera, frequently characterized by low contrast and a high level of blurring and noise.
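
    The feature side of such a method can be sketched with standard tools. The Python snippet below pairs SIFT descriptors with RANSAC homography fitting as the outlier-rejection step, since coherent point drift is not available in OpenCV; it is a stand-in for the paper's CPD stage, assuming 8-bit grayscale frames.

```python
import cv2
import numpy as np

def register(frame, reference):
    """Align one 8-bit grayscale frame to a reference frame."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(reference, None)
    k2, d2 = sift.detectAndCompute(frame, None)
    matches = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # ratio test
    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(dst, src, cv2.RANSAC, 5.0)  # rejects outlier matches
    return cv2.warpPerspective(frame, H, reference.shape[::-1])
```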

  5. Early Automatic Detection of Parkinson's Disease Based on Sleep Recordings

    DEFF Research Database (Denmark)

    Kempfner, Jacob; Sorensen, Helge B D; Nikolic, Miki

    2014-01-01

    SUMMARY: Idiopathic rapid-eye-movement (REM) sleep behavior disorder (iRBD) is most likely the earliest sign of Parkinson's Disease (PD) and is characterized by REM sleep without atonia (RSWA) and consequently increased muscle activity. However, some muscle twitching in normal subjects occurs... during REM sleep. PURPOSE: There are no generally accepted methods for evaluation of this activity and a normal range has not been established. Consequently, there is a need for objective criteria. METHOD: In this study we propose a fully automatic method for detection of RSWA. REM sleep identification... the number of outliers during REM sleep was used as a quantitative measure of muscle activity. RESULTS: The proposed method was able to automatically separate all iRBD test subjects from healthy elderly controls and subjects with periodic limb movement disorder. CONCLUSION: The proposed work is considered...

  6. Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

    Science.gov (United States)

    Kukal, Jaromír.; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

    2015-09-01

    Meteor detection is one of the most important procedures in astronomical imaging. A meteor path in the Earth's atmosphere is traditionally reconstructed from a double-station video observation system generating 2D image sequences. However, atmospheric turbulence and other factors cause spatially-temporal fluctuations of the image background, which makes the localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as its particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain a statistical description of sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. While isolated outlier points are ignored, a compact cluster of outliers indicates the presence of meteoroids after ignition.
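
    The described pipeline reduces to a few array operations per frame sequence. A minimal Python sketch on synthetic data follows (the +1 offset before the Box-Cox transform and the 4-sigma cutoff are assumptions, not values from the paper):

```python
import numpy as np
from scipy import stats

frames = np.random.gamma(shape=2.0, scale=50.0, size=(16, 64, 64))  # synthetic sky
flat, lam = stats.boxcox(frames.ravel() + 1.0)   # +1 keeps the data positive
transformed = flat.reshape(frames.shape)

diffs = np.diff(transformed, axis=0)             # temporal fluctuations
z = (diffs - diffs.mean()) / diffs.std()         # normal background model
outliers = np.abs(z) > 4.0                       # pointwise outlier mask
# isolated flagged pixels are noise; a compact cluster would indicate a meteor
print(outliers.sum(), "outlier pixels flagged")
```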

  7. Image Denoising Using Interquartile Range Filter with Local Averaging

    OpenAIRE

    Jassim, Firas Ajil

    2013-01-01

    Image denoising is one of the fundamental problems in image processing. In this paper, a novel approach to suppress noise in an image is conducted by applying the interquartile range (IQR), which is one of the statistical methods used to detect outliers in a dataset. A window of size k×k was implemented to support the IQR filter. Each pixel outside the IQR range of the k×k window is treated as a noisy pixel. The estimation of the noisy pixels was obtained by local averaging. The essential...
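
    A direct, if naive, rendering of the filter in Python (the 1.5×IQR fences are the conventional Tukey choice; the paper's exact rule may differ):

```python
import numpy as np

def iqr_filter(img, k=3):
    """Replace pixels outside the local IQR fences by the local average."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode='reflect')
    out = img.astype(float).copy()
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = padded[i:i + k, j:j + k]
            q1, q3 = np.percentile(win, [25, 75])
            lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
            if not (lo <= img[i, j] <= hi):            # treated as a noisy pixel
                good = win[(win >= lo) & (win <= hi)]
                out[i, j] = good.mean() if good.size else win.mean()
    return out

noisy = np.random.rand(32, 32) * 255
noisy[10, 10] = 10000                    # impulse noise
print(iqr_filter(noisy)[10, 10])         # restored by local averaging
```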

  8. Fourier Transform Infrared Radiation Spectroscopy Applied for Wood Rot Decay and Mould Fungi Growth Detection

    Directory of Open Access Journals (Sweden)

    Bjørn Petter Jelle

    2012-01-01

    Full Text Available Material characterization may be carried out by the attenuated total reflectance (ATR) Fourier transform infrared (FTIR) radiation spectroscopy technique, which represents a powerful experimental tool. The ATR technique may be applied to solid-state materials, liquids, and gases with no or only minor sample preparation, including materials which are nontransparent to IR radiation. This facilitation is made possible by pressing the sample directly onto various crystals, for example diamond, with high refractive indices, in a special reflectance setup. Thus ATR saves time and enables the study of materials in a pristine condition; the comprehensive sample preparation of pressing thin KBr pellets in traditional FTIR transmittance spectroscopy is hence avoided. Materials and their ageing processes, both ageing by natural and accelerated climate exposure, and the decomposition and formation of chemical bonds and products, may be studied in an ATR-FTIR analysis. In this work, the ATR-FTIR technique is utilized to detect wood rot decay and mould fungi growth on various building material substrates. An experimental challenge and aim is to be able to detect the wood rot decay and mould fungi growth at early stages when it is barely visible to the naked eye. Another goal is to be able to distinguish between various species of fungi and wood rot.

  9. AnyOut : Anytime Outlier Detection Approach for High-dimensional Data

    DEFF Research Database (Denmark)

    Assent, Ira; Kranen, Philipp; Baldauf, Corinna

    2012-01-01

    With the increase of sensor and monitoring applications, data mining on streaming data is receiving increasing research attention. As data is continuously generated, mining algorithms need to be able to analyze the data in a one-pass fashion. In many applications the rate at which the data objects...

  10. Anomaly Detection in Smart Metering Infrastructure with the Use of Time Series Analysis

    Directory of Open Access Journals (Sweden)

    Tomasz Andrysiak

    2017-01-01

    Full Text Available The article presents solutions for anomaly detection in network traffic for critical smart metering infrastructure, realized with the use of a radio sensor network. The structure of the examined smart meter network and the key security aspects which influence its correct performance as an advanced metering infrastructure (the possibility of passive and active cyberattacks) are described. An effective and quick anomaly detection method is proposed. At its initial stage, Cook's distance was used for detection and elimination of outlier observations. The data prepared in this way were used to estimate standard statistical models based on exponential smoothing, that is, Brown's, Holt's, and Winters' models. To estimate possible fluctuations in the forecasts of the implemented models, properly parameterized Bollinger Bands were used. Next, statistical relations between the estimated traffic model and its real variability were examined to detect abnormal behavior, which could indicate a cyberattack attempt. An update procedure for the standard models in case of significant real network traffic fluctuations was also proposed. The choice of optimal parameter values for the statistical models was realized by minimizing the forecast error. The results confirmed the efficiency of the presented method and the accuracy of the choice of the proper statistical model for the analyzed time series.
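
    The detection stage can be sketched with the simplest of the three models. Below, one-step forecasts from Brown-style simple exponential smoothing are wrapped in Bollinger-type bands on the forecast errors; the smoothing constant, window and band width are illustrative, not the optimized values from the article:

```python
import numpy as np

def detect(series, alpha=0.3, window=20, n_std=2.0):
    """Flag readings whose forecast error leaves the Bollinger band."""
    level, flags, errors = series[0], [], []
    for t, x in enumerate(series):
        err = x - level                        # one-step forecast error
        errors.append(err)
        recent = np.array(errors[-window:])
        if t >= window and abs(err - recent.mean()) > n_std * recent.std():
            flags.append(t)                    # outside the band -> anomaly
        level += alpha * err                   # Brown's simple exponential smoothing
    return flags

traffic = np.sin(np.linspace(0, 12, 300)) + np.random.normal(0, 0.1, 300)
traffic[200] += 3.0                            # injected anomaly
print(detect(traffic))                         # should include index 200
```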

  11. Detecting unknown attacks in wireless sensor networks that contain mobile nodes.

    Science.gov (United States)

    Banković, Zorana; Fraga, David; Moya, José M; Vallejo, Juan Carlos

    2012-01-01

    As wireless sensor networks are usually deployed in unattended areas, security policies cannot be updated in a timely fashion upon identification of new attacks. This gives attackers enough time to cause significant damage. Thus, it is of great importance to provide protection from unknown attacks. However, existing solutions are mostly concentrated on known attacks. On the other hand, mobility can make the sensor network more resilient to failures, reactive to events, and able to support disparate missions with a common set of sensors, yet the problem of security becomes more complicated. In order to address the issue of security in networks with mobile nodes, we propose a machine learning solution for anomaly detection along with a feature extraction process that tries to detect temporal and spatial inconsistencies in the sequences of sensed values and the routing paths used to forward these values to the base station. We also propose a special way to treat mobile nodes, which is the main novelty of this work. The data produced in the presence of an attacker are treated as outliers and detected using clustering techniques. These techniques are further coupled with a reputation system, thus isolating compromised nodes in a timely fashion. The proposal exhibits good performance at detecting and confining previously unseen attacks, including cases where mobile nodes are compromised.
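
    The clustering step can be illustrated generically: cluster feature vectors from benign traffic and flag points far from every centroid as outliers. The sketch below uses k-means as a stand-in for the paper's unspecified clustering technique, with synthetic features in place of the sensed-value and routing-path features:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
normal = rng.normal(0, 1, size=(300, 4))   # features from benign traffic
attack = rng.normal(6, 1, size=(5, 4))     # features produced under attack
X = np.vstack([normal, attack])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(normal)
dist = km.transform(X).min(axis=1)         # distance to the nearest centroid
threshold = np.percentile(km.transform(normal).min(axis=1), 99)
print(np.where(dist > threshold)[0])       # indices treated as outliers
```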

  12. Much of the variation in breast pathology quality assurance data in the UK can be explained by the random order in which cases arrive at individual centres, but some true outliers do exist.

    Science.gov (United States)

    Cross, Simon S; Stephenson, Timothy J; Harrison, Robert F

    2011-10-01

    To investigate the role of the random temporal order of patient arrival at screening centres in the variability seen in rates of node positivity and breast cancer grade between centres in the NHS Breast Screening Programme, computer simulations of the variation in node positivity and breast cancer grade with the random temporal arrival of patients at screening centres were performed, based on national UK audit data. Cumulative mean graphs of these data were plotted. Confidence intervals for the parameters were generated using the binomial distribution, and the UK audit data were plotted on these control limit graphs. The results showed that much of the variability in the audit data could be accounted for by the effects of the random order of arrival of cases at the screening centres. Confidence intervals of 99.7% identified true outliers in the data. Much of the variation in breast pathology quality assurance data in the UK can be explained by the random order in which cases arrive at individual centres. Control charts with confidence intervals of 99.7% plotted against the number of reported cases are useful tools for the identification of true outliers. © 2011 Blackwell Publishing Limited.
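
    The control limits themselves follow directly from the binomial distribution. A short Python sketch, assuming an illustrative national node-positivity rate of 25% (not a figure from the audit):

```python
import numpy as np
from scipy.stats import binom

p_national = 0.25                # assumed national node-positivity rate (illustrative)
for n in [50, 200, 1000]:        # cumulative cases reported by a centre
    lo, hi = binom.ppf([0.0015, 0.9985], n, p_national) / n
    print(f"n={n:5d}: {lo:.3f} - {hi:.3f}")
# a centre whose cumulative proportion falls outside its limits is a true outlier;
# the limits narrow as n grows, which is why small centres vary so widely
```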

  13. Preliminary studies on DNA retardation by MutS applied to the detection of point mutations in clinical samples

    International Nuclear Information System (INIS)

    Stanislawska-Sachadyn, Anna; Paszko, Zygmunt; Kluska, Anna; Skasko, Elzibieta; Sromek, Maria; Balabas, Aneta; Janiec-Jankowska, Aneta; Wisniewska, Alicja; Kur, Jozef; Sachadyn, Pawel

    2005-01-01

    The ability of MutS to bind DNA mismatches was applied to the detection of point mutations in PCR products. MutS recognized mismatches of from one up to five nucleotides and retarded the electrophoretic migration of mismatched DNA. The electrophoretic detection of insertions/deletions of more than three nucleotides is also possible without MutS, thanks to the DNA mobility shift caused by the presence of large insertion/deletion loops in the heteroduplex DNA. Thus, the method enables the search for a broad range of mutations: from single up to several nucleotides. The mobility shift assays were carried out in polyacrylamide gels stained with SYBR-Gold. One assay required 50-200 ng of PCR product and 1-3 μg of Thermus thermophilus His6-MutS protein. The advantages of this approach are: the small amounts of DNA required for the examination, simple and fast staining, no demand for PCR product purification, and no labelling or radioisotopes required. The method was tested in the detection of cancer-predisposing mutations in the RET, hMSH2, hMLH1, BRCA1, BRCA2 and NBS1 genes. The approach appears to be promising in screening for unknown point mutations

  14. 'Intelligent' triggering methodology for improved detectability of wavelength modulation diode laser absorption spectrometry applied to window-equipped graphite furnaces

    International Nuclear Information System (INIS)

    Gustafsson, Joergen; Axner, Ove

    2003-01-01

    The wavelength modulation-diode laser absorption spectrometry (WM-DLAS) technique experiences limited detectability when window-equipped sample compartments are used, because of multiple reflections between components in the optical system (so-called etalon effects). The problem is particularly severe when the technique is used with a window-equipped graphite furnace (GF) as atomizer, since the heating of the furnace induces drifts in the thickness of the windows and thereby also in the background signals. This paper presents a new detection methodology for WM-DLAS applied to a window-equipped GF in which the influence of the background signals from the windows is significantly reduced. The new technique, which is based upon the finding that the WM-DLAS background signals from a window-equipped GF are reproducible over a considerable period of time, consists of a novel 'intelligent' triggering procedure in which the GF is triggered at a user-chosen 'position' in the reproducible drift-cycle of the WM-DLAS background signal. The new methodology also makes use of 'higher-than-normal' detection harmonics, i.e. 4f or 6f, since these have previously been shown to have a higher signal-to-background ratio than 2f-detection when the background signals originate from thin etalons. The results show that this new combined background-drift-reducing methodology improves the limit of detection of the WM-DLAS technique used with a window-equipped GF by several orders of magnitude as compared to ordinary 2f-detection, resulting in a limit of detection for a window-equipped GF that is similar to that of an open GF

  15. Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons.

    Science.gov (United States)

    Zhuang, Yuan; Yang, Jun; Li, You; Qi, Longning; El-Sheimy, Naser

    2016-04-26

    Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention after the release of the BLE protocol. In this paper, we propose an algorithm that uses the combination of a channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target's location and the distances between the target and BLE beacons, respectively. We compare the performance of distance estimation using a separate PRM for each of the three advertisement channels (the separate strategy) with that of an aggregate PRM generated by combining information from all channels (the aggregate strategy). The FP-based location estimation results of the separate strategy and the aggregate strategy are also compared. It was found that the separate strategy provides higher accuracy; thus, it is preferable to adopt PRM and FP for each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection step to generate improved distance estimates for the EKF. After the EKF process, a second outlier detection algorithm based on statistical testing is performed to remove remaining outliers. The proposed algorithm was evaluated by various field experiments. Results show that the proposed algorithm is 15.77% more accurate than the EKF algorithm, and with sparse deployment (1 beacon per 18 m) it is 21.41% more accurate than the EKF algorithm. Therefore, the proposed algorithm is especially useful for improving localization accuracy in environments with sparse beacon deployment.
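
    The channel-separate PRM idea can be sketched by fitting one polynomial RSSI-to-distance model per BLE advertisement channel (37/38/39). The training data below are synthetic placeholders following a generic log-distance path-loss shape, not measurements from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
models = {}
for ch in (37, 38, 39):                                    # BLE advertisement channels
    d = np.linspace(1, 15, 60)                             # known training distances (m)
    rssi = -60 - 20 * np.log10(d) + rng.normal(0, 1.5, 60) # channel-specific readings
    models[ch] = np.polyfit(rssi, d, deg=3)                # one polynomial per channel

def distance(ch, rssi):
    """Channel-separate distance estimate, later screened and fed to the EKF."""
    return float(np.polyval(models[ch], rssi))

print(distance(37, -75.0), distance(39, -75.0))
```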

  16. Online platform for applying space–time scan statistics for prospectively detecting emerging hot spots of dengue fever

    Directory of Open Access Journals (Sweden)

    Chien-Chou Chen

    2016-11-01

    Full Text Available Abstract Background Cases of dengue fever have increased in areas of Southeast Asia in recent years. Taiwan hit a record-high 42,856 cases in 2015, with the majority in southern Tainan and Kaohsiung Cities. Leveraging spatial statistics and geo-visualization techniques, we aim to design an online analytical tool for local public health workers to prospectively identify ongoing hot spots of dengue fever weekly at the village level. Methods A total of 57,516 confirmed cases of dengue fever in 2014 and 2015 were obtained from the Taiwan Centers for Disease Control (TCDC). Incorporating demographic information as covariates with cumulative cases (365 days) in a discrete Poisson model, we iteratively applied space–time scan statistics with SaTScan software to detect the currently active cluster of dengue fever (reported as relative risk) in each village of Tainan and Kaohsiung every week. A village with a relative risk >1 and p value <0.05 was identified as a dengue-epidemic area. Assuming an ongoing transmission might continuously spread for two consecutive weeks, we estimated the sensitivity and specificity for detecting outbreaks by comparing the scan-based classification (dengue-epidemic vs. dengue-free village) with the true cumulative case numbers from the TCDC's surveillance statistics. Results Among the 1648 villages in Tainan and Kaohsiung, the overall sensitivity for detecting outbreaks increases as case numbers grow in a total of 92 weekly simulations. The specificity for detecting outbreaks behaves inversely, compared to the sensitivity. On average, the mean sensitivity and specificity of 2-week hot spot detection were 0.615 and 0.891 respectively (p value <0.001) for the covariate adjustment model, as the maximum spatial and temporal windows were specified as 50% of the total population at risk and 28 days. Dengue-epidemic villages were visualized and explored in an interactive map. Conclusions We designed an online analytical tool for

  17. Unbalance detection in rotor systems with active bearings using self-sensing piezoelectric actuators

    Science.gov (United States)

    Ambur, Ramakrishnan; Rinderknecht, Stephan

    2018-03-01

    Machines developed today are highly automated owing to the increased use of mechatronic systems. To ensure their reliable operation, fault detection and isolation (FDI) is an important feature along with better control. This research work aims to achieve and integrate both functions with a minimum number of components in a mechatronic system. This article investigates a rotating machine with active bearings equipped with piezoelectric actuators. There is an inherent coupling between their electrical and mechanical properties, because of which they can also be used as sensors. Mechanical deflection can be reconstructed from these self-sensing actuators using measured voltage and current signals. These virtual sensor signals are utilised to detect unbalance in a rotor system. The parameters of the unbalance, its magnitude and phase, are determined by parametric estimation in the frequency domain. The unbalance location is identified using a fault-localization hypothesis test. Robustness of the estimates against outliers in the measurements is improved using a weighted least squares method. Unbalances are detected both in simulation using a model of the system and on a real test bench. Experiments are performed in stationary as well as transient conditions. As a further step, unbalances are estimated during simultaneous actuation of the actuators in closed loop with an adaptive algorithm for vibration minimisation. This strategy could be used in systems which aim for both fault detection and control action.
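
    The robust weighted least squares step can be illustrated with an iteratively reweighted scheme that downweights large residuals. The Huber weighting and the placeholder model matrix A (regression matrix of the unbalance model) and response b are assumptions for illustration, not the authors' exact estimator.

```python
import numpy as np

def irls_huber(A, b, k=1.345, n_iter=20):
    """Iteratively reweighted least squares with Huber weights."""
    theta = np.linalg.lstsq(A, b, rcond=None)[0]   # ordinary LS start
    for _ in range(n_iter):
        r = b - A @ theta
        s = 1.4826 * np.median(np.abs(r - np.median(r))) + 1e-12  # robust scale
        u = np.abs(r) / s
        w = np.where(u <= k, 1.0, k / u)           # Huber weights
        sw = np.sqrt(w)
        theta = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)[0]
    return theta
```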

  18. Leak detection in the primary reactor coolant piping of nuclear power plant by applying beam-microphone technology

    International Nuclear Information System (INIS)

    Kasai, Yoshimitsu; Shimanskiy, Sergey; Naoi, Yosuke; Kanazawa, Junichi

    2004-01-01

    A microphone leak detection method was applied to the inlet piping of the ATR-prototype reactor Fugen. Statistical analysis showed that the cross-correlation method provided effective results for detection of a small leakage. However, such a technique has limited application due to significant distortion of the signals on the reactor site. As an alternative method, the beam-microphone provides the necessary spatial selectivity, and its performance is less affected by signal distortion. A prototype of the beam-microphone was developed and then tested at the O-arai Engineering Center of the Japan Nuclear Cycle Development Institute (JNC). On-site testing of the beam-microphone was carried out in the inlet piping room of an RBMK reactor at the Leningrad Nuclear Power Plant (LNPP) in Russia. A leak sound imitator was used to simulate the leakage sound under a leakage flow condition of 1-3 gpm (0.23-0.7 m³/h). Analysis showed that signal distortion does not seriously affect the performance of this method, but that sound reflection may result in the appearance of ghost sound sources. The test results showed that the influences of sound reflection and background noise were smaller at high frequencies, where the leakage location could be estimated with an angular accuracy of 5°, which is within the range of localization accuracy required for the leak detection system. (author)

  19. Evaluation of Techniques to Detect Significant Network Performance Problems using End-to-End Active Network Measurements

    Energy Technology Data Exchange (ETDEWEB)

    Cottrell, R.Les; Logg, Connie; Chhaparia, Mahesh; /SLAC; Grigoriev, Maxim; /Fermilab; Haro, Felipe; /Chile U., Catolica; Nazir, Fawad; /NUST, Rawalpindi; Sandford, Mark

    2006-01-25

    End-to-end fault and performance problem detection in wide area production networks is becoming increasingly hard as the complexity of the paths, the diversity of the performance, and the dependency on the network increase. Several monitoring infrastructures have been built to monitor different network metrics and collect monitoring information from thousands of hosts around the globe. Typically there are hundreds to thousands of time-series plots of network metrics which need to be inspected to identify network performance problems or anomalous variations in the traffic. Furthermore, most commercial products rely on comparison with user-configured static thresholds and often require access to SNMP-MIB information, to which a typical end-user does not usually have access. In this paper we propose new techniques to detect network performance problems proactively in close to real time, without relying on static thresholds and SNMP-MIB information. We describe and compare several algorithms that we have implemented to detect persistent network problems using anomalous variation analysis in real end-to-end Internet performance measurements. We also provide methods and/or guidance for how to set the user-settable parameters. The measurements are based on active probes running on 40 production network paths with bottlenecks varying from 0.5 Mbit/s to 1000 Mbit/s. For well-behaved data (no missed measurements and no very large outliers) with small seasonal changes, most algorithms identify similar events. We compare the algorithms' robustness with respect to false positives and missed events, especially when there are large seasonal effects in the data. Our proposed techniques cover a wide variety of network paths and traffic patterns. We also discuss the applicability of the algorithms in terms of their intuitiveness, their speed of execution as implemented, and areas of applicability. Our encouraging results compare and evaluate the accuracy of our techniques.
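
    In the same spirit of avoiding static thresholds, a minimal sketch of an adaptive detector that flags points deviating from a rolling baseline of each time series; the window size and deviation multiplier are assumed, illustrative parameters.

```python
import pandas as pd

def flag_anomalies(ts, window=96, z=4.0):
    """Flag points far from a rolling baseline; the threshold adapts to
    the series itself, so no static limit or SNMP access is needed."""
    mu = ts.rolling(window, min_periods=window // 2).median()
    sd = ts.rolling(window, min_periods=window // 2).std()
    return (ts - mu).abs() > z * sd   # boolean Series of anomaly flags
```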

  20. feets: feATURE eXTRACTOR for tIME sERIES

    Science.gov (United States)

    Cabral, Juan; Sanchez, Bruno; Ramos, Felipe; Gurovich, Sebastián; Granitto, Pablo; VanderPlas, Jake

    2018-06-01

    feets characterizes and analyzes light-curves from astronomical photometric databases for modelling, classification, data cleaning, outlier detection and data analysis. It uses machine learning algorithms to determine the numerical descriptors that characterize and distinguish the different variability classes of light-curves; these range from basic statistical measures such as the mean or standard deviation to complex time-series characteristics such as the autocorrelation function. The library is not restricted to the astronomical field and could also be applied to any kind of time series. This project is a derivative work of FATS (ascl:1711.017).
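
    As a rough illustration of the kind of descriptors such an extractor computes, a generic sketch of a few of the basic statistics named above (this is not the feets API):

```python
import numpy as np

def basic_features(mag):
    """A few simple light-curve descriptors: mean, standard deviation,
    amplitude and lag-1 autocorrelation of the magnitude series."""
    mag = np.asarray(mag, float)
    d = mag - mag.mean()
    acf1 = (d[:-1] * d[1:]).sum() / (d * d).sum()
    return {"mean": mag.mean(),
            "std": mag.std(ddof=1),
            "amplitude": (mag.max() - mag.min()) / 2,
            "acf_lag1": acf1}
```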

  1. A comparison of damage detection methods applied to civil engineering structures

    DEFF Research Database (Denmark)

    Gres, Szymon; Andersen, Palle; Johansen, Rasmus Johan

    2018-01-01

    Facilitating detection of early-stage damage is crucial for in-time repairs and cost-optimized maintenance plans of civil engineering structures. Preferably, the damage detection is performed by use of output vibration data, hereby avoiding modal identification of the structure. Most of the work...

  2. A comparison of damage detection methods applied to civil engineering structures

    DEFF Research Database (Denmark)

    Gres, Szymon; Andersen, Palle; Johansen, Rasmus Johan

    2017-01-01

    Facilitating detection of early-stage damage is crucial for in-time repairs and cost-optimized maintenance plans of civil engineering structures. Preferably, the damage detection is performed by use of output vibration data, hereby avoiding modal identification of the structure. Most of the work...

  3. Incremental Activation Detection for Real-Time fMRI Series Using Robust Kalman Filter

    Directory of Open Access Journals (Sweden)

    Liang Li

    2014-01-01

    Real-time functional magnetic resonance imaging (rt-fMRI) is a technique that enables us to observe human brain activations in real time. However, unexpected noise arising during fMRI data collection, such as sudden swallowing, head movement and manual interventions, causes much confusion and reduces the robustness of activation analysis. In this paper, a new activation detection method for rt-fMRI data is proposed based on a robust Kalman filter. The idea is to modify the extended Kalman filter by adding a sparse noise term to the measurement update step, so that the additional sparse measurement noise can be handled. The robust Kalman filter is thus designed to improve robustness against outliers and can be computed separately for each voxel. The algorithm can compute activation maps on each scan within a repetition time, which meets the requirement for real-time analysis. Experimental results show that the new algorithm performs well in terms of both robustness and real-time activation detection.

  4. A measurement-based fault detection approach applied to monitor robots swarm

    KAUST Repository

    Khaldi, Belkacem

    2017-07-10

    Swarm robotics requires continuous monitoring to detect abnormal events and to sustain normal operations. Indeed, a robot swarm with one or more faulty robots suffers degraded performance relative to the target requirements. This paper presents an innovative data-driven fault detection method for monitoring robot swarms. The method combines the flexibility of principal component analysis (PCA) models with the greater sensitivity of the exponentially-weighted moving average control chart to incipient changes. We illustrate, using simulated data collected from the ARGoS simulator, that a significant improvement in fault detection can be obtained by using the proposed method as compared to conventional PCA-based methods.
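
    A minimal sketch of the combination, under assumptions: fit PCA on fault-free training data, monitor the squared prediction error (SPE) of new samples with an exponentially-weighted moving average (EWMA) statistic, and alarm when it exceeds an asymptotic control limit. The component count and chart parameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_ewma_alarms(X_train, X_new, n_comp=3, lam=0.2, L=3.0):
    """Return boolean alarms for each new sample."""
    pca = PCA(n_components=n_comp).fit(X_train)

    def spe(X):  # squared prediction error of the PCA reconstruction
        E = X - pca.inverse_transform(pca.transform(X))
        return (E ** 2).sum(axis=1)

    s0 = spe(X_train)
    mu, sigma = s0.mean(), s0.std(ddof=1)
    ucl = mu + L * sigma * np.sqrt(lam / (2 - lam))  # asymptotic EWMA limit
    z, alarms = mu, []
    for s in spe(X_new):
        z = lam * s + (1 - lam) * z                  # EWMA recursion
        alarms.append(z > ucl)
    return np.array(alarms)
```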

  5. Applying Parametric Fault Detection to a Mechanical System

    DEFF Research Database (Denmark)

    Felício, P.; Stoustrup, Jakob; Niemann, H.

    2002-01-01

    A way of doing parametric fault detection is described. It is based on the representation of parameter changes as linear fractional transformations (LFTs). We describe a model with parametric uncertainty. A stabilizing controller is then chosen and its robustness properties are studied via mu-analysis. The parameter changes (faults) are estimated based on estimates of the fictitious signals that enter the delta block in the LFT. These signal estimators are designed by H-infinity techniques. The chosen example is an inverted pendulum.

  6. Real-Time Monitoring System Using Smartphone-Based Sensors and NoSQL Database for Perishable Supply Chain

    Directory of Open Access Journals (Sweden)

    Ganjar Alfian

    2017-11-01

    Since customer health awareness is growing, it is important for the perishable food supply chain to monitor food quality and safety. This study proposes a real-time monitoring system that utilizes smartphone-based sensors and a big data platform. Firstly, we develop a smartphone-based sensor to gather temperature, humidity, GPS, and image data. The IoT-generated sensor data on the smartphone are characterized by large storage requirements, an unstructured format, and continuous generation. Thus, in this study, we propose an effective big data platform design to handle IoT-generated sensor data. Furthermore, abnormal sensor data generated by failed sensors, called outliers, may arise in real cases. The proposed system utilizes outlier detection based on statistical and clustering approaches to filter out the outlier data. The proposed system was evaluated for system and gateway performance and tested on the kimchi supply chain in Korea. The results showed that the proposed system is capable of processing a massive input/output of sensor data efficiently when the number of sensors and clients increases. Current commercial smartphones are sufficiently capable of combining their normal operations with simultaneous performance as gateways for transmitting sensor data to the server. In addition, outlier detection based on the 3-sigma rule and DBSCAN was used to successfully detect and classify outlier data as separate from normal sensor data. This study is expected to help those who are responsible for developing real-time monitoring systems and implementing critical strategies related to the perishable supply chain.
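
    A minimal sketch of the two filters named above, combining a per-feature 3-sigma rule with DBSCAN noise labels; eps and min_samples are assumed, illustrative parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def filter_outliers(X, eps=0.5, min_samples=5):
    """Return a boolean mask: True marks a suspected outlier sample."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sigma_mask = (np.abs(X - mu) > 3 * sd).any(axis=1)   # 3-sigma rule
    db_mask = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_ == -1
    return sigma_mask | db_mask
```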

  7. Applying ISO 11929:2010 Standard to detection limit calculation in least-squares based multi-nuclide gamma-ray spectrum evaluation

    Energy Technology Data Exchange (ETDEWEB)

    Kanisch, G., E-mail: guenter.kanisch@hanse.net

    2017-05-21

    The concepts of ISO 11929 (2010) are applied to the evaluation of radionuclide activities from more complex multi-nuclide gamma-ray spectra. From net peak areas estimated by peak fitting, activities and their standard uncertainties are calculated by the weighted linear least-squares method, with an additional step in which uncertainties of the design matrix elements are taken into account. A numerical treatment of the standard's uncertainty function, based on ISO 11929 Annex C.5, leads to a procedure for deriving decision threshold and detection limit values. The methods shown allow interferences between radionuclide activities to be resolved, also when calculating detection limits, which can be improved by including more than one gamma line per radionuclide. The common single-nuclide weighted mean is extended to an interference-corrected (generalized) weighted mean, which, combined with the least-squares method, allows faster detection limit calculations. In addition, a new grouped uncertainty budget was inferred, which for each radionuclide gives uncertainty budgets from seven main variables, such as net count rates, peak efficiencies, gamma emission intensities and others; grouping refers to summation over lists of peaks per radionuclide.

  8. A coupled classification - evolutionary optimization model for contamination event detection in water distribution systems.

    Science.gov (United States)

    Oliker, Nurit; Ostfeld, Avi

    2014-03-15

    This study describes a decision support system that raises alerts for contamination events in water distribution systems. The developed model comprises a weighted support vector machine (SVM) for the detection of outliers, followed by a sequence analysis for the classification of contamination events. The contribution of this study is an improvement of contamination event detection ability and a multi-dimensional analysis of the data, differing from the parallel one-dimensional analyses conducted so far. The multivariate analysis examines the relationships between water quality parameters and detects changes in their mutual patterns. The weights of the SVM model accomplish two goals: blurring the difference between the sizes of the two classes' data sets (as there are far more normal/regular than event time measurements), and incorporating the time factor through a time-decay coefficient that ascribes higher importance to recent observations when classifying a time step measurement. All model parameters were determined by data-driven optimization, so the calibration of the model was completely autonomous. The model was trained and tested on a real water distribution system (WDS) data set with randomly simulated events superimposed on the original measurements. The model is notable for its ability to detect events that were only partly expressed in the data (i.e., affecting only some of the measured parameters). The model showed high accuracy and better detection ability compared to previous modeling attempts at contamination event detection.
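
    A compact sketch of the weighting idea, using scikit-learn as a stand-in: class weights blur the size difference between the large normal class and the small event class, and exponential time-decay sample weights give recent observations more influence. The kernel and decay rate are assumptions, not the study's calibrated values.

```python
import numpy as np
from sklearn.svm import SVC

def fit_weighted_svm(X, y, t, decay=0.01):
    """X: features, y: 0/1 labels, t: observation times."""
    age = t.max() - t                  # time elapsed since each observation
    w = np.exp(-decay * age)           # time-decay sample weights
    clf = SVC(kernel="rbf", class_weight="balanced")
    clf.fit(X, y, sample_weight=w)
    return clf
```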

  9. Long-range alpha detection applied to soil contamination and waste monitoring

    International Nuclear Information System (INIS)

    MacArthur, D.W.; Allander, K.S.; Bounds, J.A.; Close, D.A.; McAtee, J.L.

    1992-01-01

    Alpha contamination monitoring has traditionally been limited by the short range of alpha particles in air and through detector windows. The long-range alpha detector (LRAD) described in this paper circumvents that limitation by detecting alpha-produced ions rather than alpha particles directly. Since the LRAD is sensitive to all ions, it can monitor all contamination present on a large surface at one time. Because air is the "detector gas", the LRAD can detect contamination on any surface to which air can penetrate. We present data showing the sensitivity of LRAD detectors, documenting their ability to detect alpha sources in previously unmonitorable locations, and verifying the ion lifetime. Specific designs and results for soil contamination and waste monitors are also included.

  10. High performance liquid chromatography-charged aerosol detection applying an inverse gradient for quantification of rhamnolipid biosurfactants.

    Science.gov (United States)

    Behrens, Beate; Baune, Matthias; Jungkeit, Janek; Tiso, Till; Blank, Lars M; Hayen, Heiko

    2016-07-15

    A method using high performance liquid chromatography coupled to charged-aerosol detection (HPLC-CAD) was developed for the quantification of rhamnolipid biosurfactants. Qualitative sample composition was determined by liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). The relative quantification of different rhamnolipid derivatives, including di-rhamnolipids, mono-rhamnolipids, and their precursors 3-(3-hydroxyalkanoyloxy)alkanoic acids (HAAs), differed between two compared LC-MS instruments, revealing instrument-dependent responses. The HPLC-CAD method reported here provides a uniform response. An inverse gradient was applied for the absolute quantification of rhamnolipid congeners to account for the detector's dependency on solvent composition. The CAD produces a uniform response not only for the analytes but also for structurally different (nonvolatile) compounds. It was demonstrated that n-dodecyl-β-d-maltoside or deoxycholic acid can be used as alternative standards. The method of HPLC-ultraviolet (UV) detection after derivatization of rhamnolipids and HAAs to their corresponding phenacyl esters confirmed the obtained results but required additional, laborious sample preparation steps. Sensitivity, determined as limit of detection and limit of quantification for four mono-rhamnolipids, was in the range of 0.3-1.0 and 1.2-2.0 μg/mL, respectively, for HPLC-CAD and 0.4 and 1.5 μg/mL, respectively, for HPLC-UV. Linearity for HPLC-CAD was at least 0.996 (R²) in the calibrated range of about 1-200 μg/mL. Hence, the HPLC-CAD method presented here allows absolute quantification of rhamnolipids and their derivatives.

  11. Data processing of qualitative results from an interlaboratory comparison for the detection of "Flavescence dorée" phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology.

    Directory of Open Access Journals (Sweden)

    Aude Chabirand

    A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of "Flavescence dorée" (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. First, the standard statistical approach consisted of analyzing samples known to be positive and samples known to be negative, and reporting the proportions of false-positive and false-negative results to calculate diagnostic specificity and sensitivity, respectively. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based on the probability of detection model on the one hand and on Bayes' theorem on the other. These various statistical approaches are complementary and give consistent results. Their combination, and in particular the introduction of the new statistical approaches, gives overall information on the performance and limitations of the different methods, and is particularly useful for selecting the most appropriate detection scheme with regard to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6, developed respectively by Hren (2007), by Pelletier (2009) and with patented oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper can improve the reliability of the method validation process in plant pathology.
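
    To make the accordance/concordance notions concrete, here is a sketch following their usual definitions for qualitative methods: accordance is the probability that two replicates analyzed in the same laboratory agree, and concordance the probability that replicates from different laboratories agree. The input format (positives and replicate counts per laboratory, each laboratory with at least two replicates) is an assumption for illustration.

```python
import numpy as np

def accordance_concordance(pos, n):
    """pos[i]: positive results among the n[i] replicates of laboratory i."""
    pos, n = np.asarray(pos, float), np.asarray(n, float)
    # accordance: chance that two replicates drawn within one lab agree
    acc_i = (pos * (pos - 1) + (n - pos) * (n - pos - 1)) / (n * (n - 1))
    accordance = acc_i.mean()
    # concordance: chance that replicates drawn from two different labs agree
    agree, pairs = 0.0, 0.0
    for i in range(len(n)):
        for j in range(len(n)):
            if i != j:
                agree += pos[i] * pos[j] + (n[i] - pos[i]) * (n[j] - pos[j])
                pairs += n[i] * n[j]
    return accordance, agree / pairs
```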

  12. Outliers in American juvenile justice: the need for statutory reform in North Carolina and New York.

    Science.gov (United States)

    Tedeschi, Frank; Ford, Elizabeth

    2015-05-01

    There is a well-established and growing body of evidence from research that adolescents who commit crimes differ in many regards from their adult counterparts and are more susceptible to the negative effects of adjudication and incarceration in adult criminal justice systems. The age of criminal court jurisdiction in the United States has varied throughout history; yet there are only two remaining states, New York and North Carolina, that continue to automatically charge 16-year-olds as adults. This review traces the statutory history of juvenile justice in these two states, with an emphasis on the political and social factors that have contributed to their outlier status with respect to the age of criminal court jurisdiction. The neurobiological, psychological, and developmental aspects of the adolescent brain and personality, and how those issues relate both to a greater likelihood of rehabilitation in appropriate settings and to greater vulnerability in adult correctional facilities, are also reviewed. The importance of raising the age in New York and North Carolina lies not only in protecting incarcerated youths but also in preventing the associated stigma following release. Mental health practitioners are vital to the process of local and national juvenile justice reform. They can serve as experts on and advocates for appropriate mental health care and as experts on the adverse effects of the adult criminal justice system on adolescents.

  13. Comparison of algorithms for blood stain detection applied to forensic hyperspectral imagery

    Science.gov (United States)

    Yang, Jie; Messinger, David W.; Mathew, Jobin J.; Dube, Roger R.

    2016-05-01

    Blood stains are among the most important types of evidence for forensic investigation. They contain valuable DNA information, and the pattern of the stains can suggest specifics about the nature of the violence that transpired at the scene. Early detection of blood stains is particularly important since blood reacts physically and chemically with air and materials over time. Accurate identification of blood remnants, including regions that might have been intentionally cleaned, is an important aspect of forensic investigation. Hyperspectral imaging is a potential method to detect blood stains because it is non-contact and provides substantial spectral information that can be used to identify regions in a scene with trace amounts of blood. Such scenes can be highly complex given the range of material types and conditions on which blood stains may appear at a crime scene. Some stains are hard to detect by the unaided eye, especially if a conscious effort to clean the scene has occurred (we refer to these as "latent" blood stains). In this paper we present the initial results of a study of the use of hyperspectral imaging algorithms for blood detection in complex scenes. We describe a hyperspectral imaging system which generates images covering the 400-700 nm visible range with a spectral resolution of 10 nm. Three image sets of 31 wavelength bands were generated using this camera for a simulated indoor crime scene in which blood stains were placed on a T-shirt and walls. To detect blood stains in the scene, Principal Component Analysis (PCA), Subspace Reed-Xiaoli Detection (SRXD), and Topological Anomaly Detection (TAD) algorithms were used. Comparison of the three hyperspectral image analysis techniques shows that TAD is most suitable for detecting blood stains and discovering latent blood stains.
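
    For orientation, a minimal sketch of the classical global RX anomaly detector, a simpler relative of the subspace RX (SRXD) method named above: each pixel spectrum is scored by its Mahalanobis distance from the scene background. The regularization constant is an assumption.

```python
import numpy as np

def rx_scores(cube):
    """Global RX detector for a (rows, cols, bands) hyperspectral cube;
    returns a (rows, cols) anomaly-score image."""
    X = cube.reshape(-1, cube.shape[-1]).astype(float)
    mu = X.mean(axis=0)
    C = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized
    Ci = np.linalg.inv(C)
    d = X - mu
    scores = np.einsum("ij,jk,ik->i", d, Ci, d)  # Mahalanobis distance squared
    return scores.reshape(cube.shape[:2])
```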

  14. Simulation of Neutron Backscattering applied to organic material detection

    International Nuclear Information System (INIS)

    Forero, N. C.; Cruz, A. H.; Cristancho, F.

    2007-01-01

    The neutron backscattering technique is tested on the task of localizing hydrogenated explosives hidden in soil. The detector system, landmine, soil and neutron source are simulated with Geant4 in order to obtain the number of neutrons detected as several parameters, such as mine composition, relative mine-source position and soil moisture, are varied.

  15. Clustering analysis of line indices for LAMOST spectra with AstroStat

    Science.gov (United States)

    Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

    2018-06-01

    The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyze a large amount of complex survey data. Unsupervised clustering can help astronomers find the associations and outliers in a big data set. In this paper, we employ the k-means method to perform clustering on the line indices of LAMOST spectra with the powerful software AstroStat. Implementing the line index approach for analyzing astronomical spectra is an effective way to extract spectral features from low resolution spectra, which can represent the main spectral characteristics of stars. A total of 144 340 line indices for A-type stars is analyzed by calculating the intra and inter distances between pairs of stars. For the intra distance, we use the definition of the Mahalanobis distance to explore the degree of clustering for each class, while for outlier detection, we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers: a spectrum with an abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.
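
    A rough sketch of the same pipeline using scikit-learn stand-ins (k-means clustering plus a local outlier factor per spectrum); the array of line indices and the parameter choices are assumptions, and this is not the AstroStat implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import LocalOutlierFactor

def cluster_and_flag(line_indices, n_clusters=5, k=20):
    """line_indices: (n_spectra, n_indices) array of measured line indices."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(line_indices)
    lof = LocalOutlierFactor(n_neighbors=k)
    flags = lof.fit_predict(line_indices) == -1   # True marks outlier spectra
    scores = -lof.negative_outlier_factor_        # larger => more anomalous
    return labels, flags, scores
```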

  16. Robust Ridge Regression Analysis and an Application

    Directory of Open Access Journals (Sweden)

    ÖZLEM ALPU

    2013-06-01

    The problem of multicollinearity in multiple linear regression analysis leads to unreliable estimates of the regression parameters with least squares methods. In addition, outliers in the data set cause the least squares estimator to lose its best, unbiased and consistent properties. In the presence of both multicollinearity and outliers in the data set, the use of biased methods based on robust estimators is suggested. In this study, for a data set including outliers in both the x and y directions, ridge regression analysis based on several robust techniques (M, Least Trimmed Sums of Squares, Least Median of Squares, S and Generalized M) is applied and the results are compared.
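
    As an illustrative sketch of combining ridge-type shrinkage with an M-type robust loss (one of the estimator families compared in the study), scikit-learn's HuberRegressor applies a Huber loss with an L2 penalty; the design matrix X and response y are hypothetical placeholders. Note that Huber-type losses protect mainly against outliers in the y direction; leverage points in x call for LTS- or S-type estimators.

```python
from sklearn.linear_model import HuberRegressor, Ridge

def compare_fits(X, y):
    """X, y: hypothetical data with collinear predictors and a few
    contaminated observations."""
    plain_ridge = Ridge(alpha=1.0).fit(X, y)              # biased, not robust
    robust_ridge = HuberRegressor(alpha=1.0, epsilon=1.35).fit(X, y)
    return plain_ridge.coef_, robust_ridge.coef_
```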

  17. Application of site and haplotype-frequency based approaches for detecting selection signatures in cattle

    Directory of Open Access Journals (Sweden)

    Moore Stephen

    2011-06-01

    Background: 'Selection signatures' delimit regions of the genome that are, or have been, functionally important and have therefore been under either natural or artificial selection. In this study, two different and complementary methods, the integrated Haplotype Homozygosity Score (|iHS|) and the population differentiation index (FST), were applied to identify traces of decades of intensive artificial selection for traits of economic importance in modern cattle. Results: We scanned the genome of a diverse set of dairy and beef breeds from Germany, Canada and Australia genotyped with a 50K SNP panel. Across breeds, a total of 109 extreme |iHS| values exceeded the empirical threshold level of 5%, with 19, 27, 9, 10 and 17 outliers in Holstein, Brown Swiss, Australian Angus, Hereford and Simmental, respectively. Annotating the regions harboring clustered |iHS| signals revealed a panel of interesting candidate genes, such as SPATA17, MGAT1, PGRMC2 and ACTC1, COL23A1, MATN2, in the context of reproduction and muscle formation, respectively. In a further step, a new Bayesian FST-based approach was applied to a set of geographically separated populations including Holstein, Brown Swiss, Simmental, North American Angus and Piedmontese to detect differentiated loci. In total, 127 regions exceeding the 2.5 per cent threshold of the empirical posterior distribution were identified as extremely differentiated. In a substantial number of cases (56 out of 127), the extreme FST values were positioned in regions of poor gene content which deviated significantly from expectation, while other extreme FST values were found in regions of some relevant genes such as SMCP and FGF1. Conclusions: Overall, 236 regions putatively subject to recent positive selection in the cattle genome were detected. Both |iHS| and FST suggested selection in the vicinity of the Sialic acid binding Ig-like lectin 5 gene on BTA18, a region recently reported to be a major QTL with strong effects on productive life.
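
    As background for the FST scan above, a sketch of the textbook fixation index for a biallelic locus, FST = (HT - HS)/HT, computed from per-population allele frequencies; this is the basic quantity underlying the study's Bayesian estimator, not the estimator itself.

```python
import numpy as np

def basic_fst(p_subpops, weights=None):
    """p_subpops: allele frequency of one biallelic locus in each population."""
    p = np.asarray(p_subpops, float)
    w = np.full(len(p), 1 / len(p)) if weights is None else np.asarray(weights)
    p_bar = (w * p).sum()                  # pooled allele frequency
    hs = (w * 2 * p * (1 - p)).sum()       # mean within-population heterozygosity
    ht = 2 * p_bar * (1 - p_bar)           # total expected heterozygosity
    return (ht - hs) / ht if ht > 0 else 0.0
```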

  18. Multicenter validation of PCR-based method for detection of Salmonella in chicken and pig samples

    DEFF Research Database (Denmark)

    Malorny, B.; Cook, N.; D'Agostino, M.

    2004-01-01

    As part of a standardization project, an interlaboratory trial including 15 laboratories from 13 European countries was conducted to evaluate the performance of a non-proprietary polymerase chain reaction (PCR)-based method for the detection of Salmonella on artificially contaminated chicken rinse... or positive. Outlier results caused, for example, by gross departures from the experimental protocol were omitted from the analysis. For both the chicken rinse and the pig swab samples, the diagnostic sensitivity was 100%, with 100% accordance (repeatability) and concordance (reproducibility). The diagnostic... specificity was 80.1% (with 85.7% accordance and 67.5% concordance) for chicken rinse, and 91.7% (with 100% accordance and 83.3% concordance) for pig swab. Thus, the interlaboratory variation due to personnel, reagents, thermal cyclers, etc., did not affect the performance of the method, which...

  19. Development of a faulty reactivity detection system applying a digital H∞ estimator

    International Nuclear Information System (INIS)

    Suzuki, Katsuo; Suzudo, Tomoaki; Nabeshima, Kunihiko

    2004-01-01

    This paper concerns an application of a digital optimal H∞ estimator to the detection of faulty reactivity in real time. The detection system, fundamentally based on the reactivity balance method, is composed of three modules: the net reactivity estimator, the feedback reactivity estimator and the reactivity balance circuit. H∞ optimal filters are used for the two reactivity estimators, and the nonlinear neutronics are taken into consideration especially in the design of the net reactivity estimator. A series of performance tests of the detection system was conducted using numerical simulations of reactor dynamics with the insertion of a faulty reactivity for the experimental fast breeder reactor JOYO. The system detects typical artificial reactivity insertions within a few seconds, with no stationary offset and an accuracy of 0.1 cent, which is satisfactory for practical use. (author)

  20. Dynamic Water Surface Detection Algorithm Applied on PROBA-V Multispectral Data

    Directory of Open Access Journals (Sweden)

    Luc Bertels

    2016-12-01

    Water body detection worldwide using spaceborne remote sensing is a challenging task. A global-scale multi-temporal and multi-spectral image analysis method for water body detection was developed. The PROBA-V microsatellite has been fully operational since December 2013 and delivers daily near-global syntheses with spatial resolutions of 1 km and 333 m. The Red, Near-InfRared (NIR) and Short Wave InfRared (SWIR) bands of the atmospherically corrected 10-day synthesis images are first Hue, Saturation and Value (HSV) color transformed and subsequently used in a decision tree classification for water body detection. To minimize commission errors, four additional data layers are used: the Normalized Difference Vegetation Index (NDVI), Water Body Potential Mask (WBPM), Permanent Glacier Mask (PGM) and Volcanic Soil Mask (VSM). Threshold values on the hue and value bands, expressed by a parabolic function, are used to detect the water bodies. Besides the water bodies layer, a quality layer, based on the water body occurrences, is available in the output product. The performance of the Water Bodies Detection Algorithm (WBDA) was assessed using Landsat 8 scenes over 15 regions selected worldwide. A mean Commission Error (CE) of 1.5% was obtained, while a mean Omission Error (OE) of 15.4% was obtained for a minimum Water Surface Ratio (WSR) of 0.5, dropping to 9.8% for a minimum WSR of 0.6. Here, WSR is defined as the fraction of the PROBA-V pixel covered by water as derived from high spatial resolution images, e.g., Landsat 8. Both the CE = 1.5% and the OE = 9.8% (WSR = 0.6) fall within the user requirement of 15%. The WBDA is fully operational in the Copernicus Global Land Service and products are freely available.

  1. Shell-vial culture and real-time PCR applied to Rickettsia typhi and Rickettsia felis detection.

    Science.gov (United States)

    Segura, Ferran; Pons, Immaculada; Pla, Júlia; Nogueras, María-Mercedes

    2015-11-01

    Murine typhus is a zoonosis transmitted by fleas whose etiological agent is Rickettsia typhi. Rickettsia felis infection can produce similar symptoms. Both are intracellular microorganisms; therefore, their diagnosis is difficult and their infections can be misdiagnosed. Early diagnosis prevents severity and inappropriate treatment regimens. Serology cannot be applied during the early stages of infection because it requires seroconversion. The shell-vial (SV) culture assay is a powerful tool to detect Rickettsia. The aim of the study was to optimize SV culture using real-time PCR as the monitoring method. Moreover, the study analyzes which antibiotics are useful for isolating these microorganisms from fleas while avoiding contamination by other bacteria. For the first purpose, SVs were inoculated with each microorganism, incubated at different temperatures, and monitored by real-time PCR and classical methods (Gimenez staining and indirect immunofluorescence assay). R. typhi grew at all temperatures. R. felis grew at 28 and 32 °C. Real-time PCR was more sensitive than the classical methods and detected the microorganisms much earlier. Besides, the assay sensitivity was improved by increasing the number of SVs. For the second purpose, microorganisms and fleas were incubated and monitored in different concentrations of antibiotics. Gentamicin, sulfamethoxazole and trimethoprim were useful for R. typhi isolation. Gentamicin, streptomycin, penicillin and amphotericin B were useful for R. felis isolation. Finally, the optimized conditions were used to isolate R. felis from fleas collected at a veterinary clinic. R. felis was isolated at 28 and 32 °C. However, successful establishment of cultures was not possible, probably due to sub-optimal conditions of the samples.

  2. INTERPRETING THE DISTANCE CORRELATION RESULTS FOR THE COMBO-17 SURVEY

    International Nuclear Information System (INIS)

    Richards, Mercedes T.; Richards, Donald St. P.; Martínez-Gómez, Elizabeth

    2014-01-01

    The accurate classification of galaxies in large-sample astrophysical databases of galaxy clusters depends sensitively on the ability to distinguish between morphological types, especially at higher redshifts. This capability can be enhanced through a new statistical measure of association and correlation, called the distance correlation coefficient, which has more statistical power to detect associations than does the classical Pearson measure of linear relationships between two variables. The distance correlation measure offers a more precise alternative to the classical measure since it is capable of detecting nonlinear relationships that may appear in astrophysical applications. We showed recently that the comparison between the distance and Pearson correlation coefficients can be used effectively to isolate potential outliers in various galaxy data sets, and this comparison has the ability to confirm the level of accuracy associated with the data. In this work, we elucidate the advantages of distance correlation when applied to large databases. We illustrate how the distance correlation measure can be used effectively as a tool to confirm nonlinear relationships between various variables in the COMBO-17 database, including the lengths of the major and minor axes, and the alternative redshift distribution. For these outlier pairs, the distance correlation coefficient is routinely higher than the Pearson coefficient since it is easier to detect nonlinear relationships with distance correlation. The V-shaped scatter plots of Pearson versus distance correlation coefficients also reveal the patterns with increasing redshift and the contributions of different galaxy types within each redshift range.
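
    For reference, a compact sample implementation of the distance correlation of two one-dimensional samples via the double-centered distance matrices; a sketch for illustration, not the authors' code.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples; unlike Pearson's r,
    it vanishes only when the variables are independent."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a = np.abs(x[:, None] - x[None, :])          # pairwise distance matrices
    b = np.abs(y[:, None] - y[None, :])
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()   # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvarx, dvary = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvarx * dvary))
```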

  3. Problems of applied geochemistry

    Energy Technology Data Exchange (ETDEWEB)

    Ovchinnikov, L N

    1983-01-01

    The concept of applied geochemistry was introduced for the first time by A. Ye. Fersman. He linked the wide-ranging and complicated questions of geochemistry with specific problems of developing the mineral and raw material base of our country. Geochemical prospecting and the geochemistry of mineral raw materials are the most important sections of applied geochemistry. Applied geochemistry can thus rightly be viewed as a branch of science which applies geochemical methodology, the set of geochemical methods of analysis and synthesis, and the geological interpretation of data based on the laws of theoretical geochemistry to the solution of various tasks in geology, petrology, tectonics, stratigraphy, mineralogy and other geological sciences, as well as to the technology of mineral raw materials and the interrelationship of man and nature (ecogeochemistry, technogeochemistry, agrogeochemistry). The main problem of applied geochemistry, the geochemistry of ore fields, is the prehistory of ore formation. This is especially important for metallogenic and forecasting constructions, for understanding the reasons for the development of fields and detecting the laws governing their distribution, and for their genetic links with general geological processes and the products of these processes.

  4. Applying quantitative metabolomics based on chemical isotope labeling LC-MS for detecting potential milk adulterant in human milk.

    Science.gov (United States)

    Mung, Dorothea; Li, Liang

    2018-02-25

    There is an increasing demand for donor human milk to feed infants, for various reasons: a mother may be unable to provide sufficient amounts of milk for her child, or her milk may be considered unsafe for the baby. Selling and buying human milk via the Internet has gained popularity. However, there is a risk that human milk sold online contains adulterants such as animal or plant milk. Analytical tools for rapid detection of adulterants in human milk are needed. We report a quantitative metabolomics method for detecting potential milk adulterants (soy, almond, cow, goat and infant formula milk) in human milk. It is based on the use of a high-performance chemical isotope labeling (CIL) LC-MS platform to profile the metabolome of an unknown milk sample, followed by multivariate or univariate comparison of the resultant metabolomic profile with that of human milk to determine the differences. Using dansylation LC-MS to profile the amine/phenol submetabolome, we could detect an average of 4129 ± 297 (n = 9) soy metabolites, 3080 ± 470 (n = 9) almond metabolites, 4256 ± 136 (n = 18) cow metabolites, 4318 ± 198 (n = 9) goat metabolites, 4444 ± 563 (n = 9) infant formula metabolites, and 4020 ± 375 (n = 30) human metabolites. This high level of coverage allowed us to readily differentiate the six different types of samples. From the analysis of binary mixtures of human milk containing 5, 10, 25, 50 and 75% of another type of milk, we demonstrated that this method can detect the presence of as little as 5% adulterant in human milk. We envisage that this method could be applied to detect contaminants or adulterants in other types of food or drink.

  5. Robust Non-Local TV-L1 Optical Flow Estimation with Occlusion Detection.

    Science.gov (United States)

    Zhang, Congxuan; Chen, Zhen; Wang, Mingrun; Li, Ming; Jiang, Shaofeng

    2017-06-05

    In this paper, we propose a robust non-local TV-L1 optical flow method with occlusion detection to address the weak robustness of optical flow estimation under motion occlusion. Firstly, a TV-L1 form for flow estimation is defined using a combination of the brightness constancy and gradient constancy assumptions in the data term, and by varying the weight under the Charbonnier function in the smoothing term. Secondly, to handle the potential risk of outliers in the flow field, a general non-local term is added to the TV-L1 optical flow model to yield the typical non-local TV-L1 form. Thirdly, an occlusion detection method based on triangulation is presented to detect the occluded regions of the sequence. The proposed non-local TV-L1 optical flow model is solved in a linearized iterative scheme using improved median filtering and a coarse-to-fine computing strategy. Extensive experimental results indicate that the proposed method can overcome the significant influence of non-rigid motion, motion occlusion, and large displacement motion. Experiments comparing the proposed method with existing state-of-the-art methods on the Middlebury and MPI Sintel test sequences show that the proposed method has higher accuracy and better robustness.

  6. A pilot study of dentists' assessment of caries detection and staging systems applied to early caries: PEARL Network findings.

    Science.gov (United States)

    Thompson, Van P; Schenkel, Andrew B; Penugonda, Bapanaiah; Wolff, Mark S; Zeller, Gregory G; Wu, Hongyu; Vena, Don; Grill, Ashley C; Curro, Frederick A

    2016-01-01

    The International Caries Detection and Assessment System (ICDAS II) and the Caries Classification System (CCS) are caries stage description systems proposed for adoption into clinical practice. This pilot study investigated clinicians' training in and use of these systems for the detection of early caries and recommendations for individual tooth treatment. Patient participants (N = 8) with a range of noncavitated lesions (CCS ranks 2 and 4 and ICDAS II ranks 2-4), identified by a team of calibrated examiners, were recruited from the New York University College of Dentistry clinic. Eighteen dentists (8 from the Practitioners Engaged in Applied Research and Learning (PEARL) Network and 10 recruited from the Academy of General Dentistry) were randomly assigned to 1 of 3 groups: 5 dentists used only visual-tactile (VT) examination, 7 were trained in the ICDAS II, and 6 were trained in the CCS. Lesion stage for each tooth was determined by the ICDAS II and CCS groups, and recommended treatment was decided by all groups. Teeth were assessed both with and without radiographs. Caries was detected in 92.7% (95% CI, 88%-96%) of the teeth by dentists with CCS training, 88.8% (95% CI, 84%-92%) by those with ICDAS II training, and 62.3% (95% CI, 55%-69%) by the VT group. Web-based training was acceptable to all dentists in the CCS group (6 of 6) but to fewer of the dentists in the ICDAS II group (5 of 7). The modified CCS translated clinically into more accurate caries detection, particularly compared to detection by untrained dentists (the VT group). Moreover, the CCS was better accepted than the ICDAS II, but dentists in both groups were open to the application of these systems. Agreement on caries staging requires additional training prior to a larger validation study.

  7. Image processing techniques applied to the detection of optic disk: a comparison

    Science.gov (United States)

    Kumari, Vijaya V.; Narayanan, Suriya N.

    2010-02-01

    In retinal image analysis, the detection of the optic disk is of paramount importance. It facilitates the tracking of various anatomical features and the extraction of exudates, drusen, etc., present in the retina of the human eye. In some people, retinal health deteriorates with age, and the presence of exudates indicates diabetic retinopathy. The existence of exudates also increases the risk of age-related macular degeneration (ARMD), the leading cause of blindness in people above the age of 50. A prompt diagnosis at an early stage of the disease can help prevent irreversible damage to the diabetic eye. Screening to detect diabetic retinopathy helps to prevent visual loss. Optic disk detection is the rudimentary requirement for such screening. In this paper, several methods for optic disk detection are compared, which use both the properties of the optic disk and model-based approaches; each is applied to give accurate results on retinal images.

  8. Data Quality Assessment and Recommendations to Improve the Quality of Hemodialysis Database

    Directory of Open Access Journals (Sweden)

    Neda Firouraghi

    2018-01-01

    Introduction: Since clinical data contain abnormalities, quality assessment and reporting of data errors are necessary. Data quality analysis consists of developing strategies, making recommendations to avoid future errors, and improving the quality of data entry by identifying error types and their causes. This approach can therefore be extremely useful for improving the quality of databases. The aim of this study was to analyze hemodialysis (HD) patients' data in order to improve the quality of data entry and avoid future errors. Method: The study was done on the Shiraz University of Medical Sciences HD database in 2015. The database consists of 2367 patients who had at least 12 months of follow-up (22.34 ± 11.52 months) in 2012-2014. Duplicated data were removed; outliers were detected based on statistical methods, expert opinion and the relationships between variables; then, the missing values were handled in 72 variables by using IBM SPSS Statistics 22, in order to improve the quality of the database. Based on the results, recommendations were given to improve the data entry process. Results: The variables had outliers in the range of 0-9.28 percent. Seven variables had missing values over 20 percent; in the others they were between 0 and 19.73 percent. The majority of missing values belonged to serum alkaline phosphatase, uric acid, high and low density lipoprotein, total iron binding capacity, hepatitis B surface antibody titer, and parathyroid hormone. The variables with displacement (where the values of two or more variables were recorded in the wrong attribute) were weight, serum creatinine, blood urea nitrogen, and systolic and diastolic blood pressure. These variables may lead to decreased data quality. Conclusion: According to the results and expert opinion, applying data entry principles such as defining valid ranges of values, using the relationships between hemodialysis features, and developing alert systems for empty or duplicated data is recommended to improve the quality of the database.
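
    A minimal sketch of the kind of automated check these recommendations imply: per-column missingness and out-of-range rates against defined valid ranges, plus a duplicate count. The limits dictionary is a hypothetical input.

```python
import pandas as pd

def qa_report(df, limits):
    """limits: dict {column: (low, high)} of valid value ranges."""
    report = {}
    for col, (lo, hi) in limits.items():
        vals = df[col]
        report[col] = {
            "missing_%": 100 * vals.isna().mean(),
            "out_of_range_%": 100 * ((vals < lo) | (vals > hi)).mean(),
        }
    duplicates = df.duplicated().sum()   # fully duplicated records
    return pd.DataFrame(report).T, duplicates
```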

  9. Identifying genetic signatures of selection in a non-model species, alpine gentian (Gentiana nivalis L.), using a landscape genetic approach

    DEFF Research Database (Denmark)

    Bothwell, H.; Bisbing, S.; Therkildsen, Nina Overgaard

    2013-01-01

    It is generally accepted that most plant populations are locally adapted. Yet, understanding how environmental forces give rise to adaptive genetic variation is a challenge in conservation genetics and crucial to the preservation of species under rapidly changing climatic conditions. Environmental... loci, we compared outlier locus detection methods with a recently developed landscape genetic approach. We analyzed 157 loci from samples of the alpine herb Gentiana nivalis collected across the European Alps. Principal coordinates of neighbor matrices (PCNM), eigenvectors that quantify multi... variables identified eight more potentially adaptive loci than models run without spatial variables. 3) When compared to outlier detection methods, the landscape genetic approach detected four of the same loci plus 11 additional loci. 4) Temperature, precipitation, and solar radiation were the three major...

  10. Seasonal Adjustment with the R Packages x12 and x12GUI

    OpenAIRE

    Kowarik, Alexander; Meraner, Angelika; Templ, Matthias; Schopfhauser, Daniel

    2014-01-01

    The X-12-ARIMA seasonal adjustment program of the US Census Bureau extracts the different components (mainly: seasonal component, trend component, outlier component and irregular component) of a monthly or quarterly time series. It is the state-of-the-art technology for seasonal adjustment used in many statistical offices. It is possible to include a moving holiday effect, a trading day effect and user-defined regressors, and automatic outlier detection is additionally incorporated. The procedu...

  11. Can the same edge-detection algorithm be applied to on-line and off-line analysis systems? Validation of a new cinefilm-based geometric coronary measurement software

    NARCIS (Netherlands)

    J. Haase (Jürgen); C. di Mario (Carlo); P.W.J.C. Serruys (Patrick); M.M.J.M. van der Linden (Mark); D.P. Foley (David); W.J. van der Giessen (Wim)

    1993-01-01

    In the Cardiovascular Measurement System (CMS) the edge-detection algorithm, which was primarily designed for the Philips digital cardiac imaging system (DCI), is applied to cinefilms. Comparative validation of CMS and DCI was performed in vitro and in vivo with intracoronary insertion

  12. Infrared light sensor applied to early detection of tooth decay

    Science.gov (United States)

    Benjumea, Eberto; Espitia, José; Díaz, Leonardo; Torres, Cesar

    2017-08-01

    The approach to dental care is gradually shifting to a model focused on early detection and oral-disease prevention; one of the most important methods of preventing tooth decay is timely diagnosis of decay and reconstruction. The present study aims to introduce a procedure for early diagnosis of tooth decay and to compare experimental results of this method with other common treatments. In this setup, light from an infrared laser is injected into the core of a bifurcated optical fiber and guided to the tooth surface; the radiation reflected by the tooth is collected through the same bifurcated fiber and guided to the surface of a sensor that measures thermal and optical frequencies to detect early signs of decay below the tooth surface, where demineralization is difficult to spot with X-ray technology. This device could be used to diagnose tooth decay without any chemicals or radiation such as high-power lasers or X-rays.

  13. The stopping rules for winsorized tree

    Science.gov (United States)

    Ch'ng, Chee Keong; Mahat, Nor Idayu

    2017-11-01

    The winsorized tree is a modified tree-based classifier that is able to investigate and handle outliers in all nodes during the process of constructing the tree. It overcomes the tedious process of constructing a classical tree, in which splitting of branches and pruning proceed concurrently, so that the constructed tree does not grow bushy; this mechanism is controlled by the proposed algorithm. In the winsorized tree, data are screened to identify outliers. If an outlier is detected, the value is neutralized using the winsorizing approach. Both outlier identification and value neutralization are executed recursively in every node until a predetermined stopping criterion is met. The aim of this paper is to search for a significant stopping criterion that stops the tree from splitting further before overfitting. The result obtained from an experiment on the Pima Indian dataset shows that a node can produce its final successor nodes (leaves) when it has achieved about 70% information gain.
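
    A minimal sketch of the winsorizing step itself, clipping detected extremes to percentile bounds; the 5th/95th percentile bounds are assumed, illustrative choices.

```python
import numpy as np

def winsorize(x, lower=5, upper=95):
    """Neutralize extreme values by clipping to percentile bounds, as done
    per node in the winsorized tree."""
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)
```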

  14. [Advances of NIR spectroscopy technology applied in seed quality detection].

    Science.gov (United States)

    Zhu, Li-wei; Ma, Wen-guang; Hu, Jin; Zheng, Yun-ye; Tian, Yi-xin; Guan, Ya-jing; Hu, Wei-min

    2015-02-01

    Near infrared spectroscopy (NIRS) technology has developed rapidly in recent years due to its speed, low pollution, high efficiency and other advantages. It has been widely used in many fields such as food, the chemical industry, pharmacy, agriculture and so on. Seed is the most basic and important agricultural input, and seed quality is important for agricultural production. Most methods presently used for seed quality detection are destructive, slow, and require pretreatment; therefore, developing a simple and rapid method has great significance for seed quality testing. This article reviews the applications and trends of NIRS technology in the testing of seed constituents, vigor, disease, insect pests, etc. For moisture, starch, protein, fatty acid and carotene content, model identification rates were high where the relative contents were high; for trace organic constituents, identification rates were low because the relative contents were low. Heat-damaged seeds with low vigor were discriminated by NIRS, and seeds stored for different lengths of time could also be identified, although discrimination of frost-damaged seeds was not possible. NIRS could distinguish healthy from infected seeds, classify the degree of health, and identify some of the fungal pathogens. NIRS could also distinguish worm-eaten from healthy seeds and further identify the insect species, although identification was less effective for small larvae and low injury levels. Finally, existing problems and development trends for NIRS in seed quality detection are discussed, especially single-seed detection technology, which is characteristic of the seed industry; the standardization of its spectral acquisition accessories will greatly improve its applicability.

  15. Detecting New Pedestrian Facilities from VGI Data Sources

    Science.gov (United States)

    Zhong, S.; Xie, Z.

    2017-12-01

    Pedestrian facility (e.g. footbridge, pedestrian crossing and underground passage) information is an important type of basic data for location-based services (LBS) for pedestrians. However, keeping pedestrian facility information up to date is challenging because facilities change frequently. Previously, pedestrian facility information collection and updating tasks were mainly completed by highly trained specialists. However, this conventional approach has several disadvantages, such as high cost and long update cycles. Volunteered Geographic Information (VGI) has proven effective at providing new, free and fast-growing spatial data. Pedestrian trajectories, which can be seen as measurements of real pedestrian roads, are among the most valuable VGI data. Although the accuracy of the trajectories is not high, the large number of measurements enables an improvement in the quality of the road information. Thus, we develop a method for detecting new pedestrian facilities based on the current road network and pedestrian trajectories. Specifically, 1) by analyzing speed, distance and direction, outliers in the pedestrian trajectories are removed, 2) a road network matching algorithm is developed for eliminating redundant trajectories, and 3) a space-time clustering algorithm is adopted for detecting new walking facilities. The performance of the method is evaluated with a series of experiments conducted on a part of the road network of Hefei and a large number of real pedestrian trajectories, and the results are verified using Tencent Street map. The results show that the proposed method is able to detect new pedestrian facilities from VGI data accurately. We believe that the proposed method provides an alternative way for general road data acquisition and can improve the quality of LBS for pedestrians.
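
    A minimal sketch of step 1 (speed-based trajectory outlier removal): drop trajectory points that imply an implausible walking speed between consecutive fixes. The speed cap and the haversine distance are illustrative assumptions.

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between coordinate arrays."""
    R = 6_371_000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp, dl = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))

def speed_outlier_mask(lat, lon, t, vmax=3.0):
    """Keep points whose implied speed to the previous fix is <= vmax m/s."""
    d = haversine_m(lat[:-1], lon[:-1], lat[1:], lon[1:])
    v = d / np.maximum(np.diff(t), 1e-9)
    return np.concatenate([[True], v <= vmax])   # boolean keep-mask
```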

  16. Adaptive Framework for Classification and Novel Class Detection over Evolving Data Streams with Limited Labeled Data.

    Energy Technology Data Exchange (ETDEWEB)

    Haque, Ahsanul [Univ. of Texas, Dallas, TX (United States); Khan, Latifur [Univ. of Texas, Dallas, TX (United States); Baron, Michael [Univ. of Texas, Dallas, TX (United States); Ingram, Joey Burton [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2015-09-01

    Most approaches to classifying evolving data streams either divide the stream of data into fixed-size chunks or use gradual forgetting to address the problems of infinite length and concept drift. Finding the fixed size of the chunks or choosing a forgetting rate without prior knowledge about time-scale of change is not a trivial task. As a result, these approaches suffer from a trade-off between performance and sensitivity. To address this problem, we present a framework which uses change detection techniques on the classifier performance to determine chunk boundaries dynamically. Though this framework exhibits good performance, it is heavily dependent on the availability of true labels of data instances. However, labeled data instances are scarce in realistic settings and not readily available. Therefore, we present a second framework which is unsupervised in nature, and exploits change detection on classifier confidence values to determine chunk boundaries dynamically. In this way, it avoids the use of labeled data while still addressing the problems of infinite length and concept drift. Moreover, both of our proposed frameworks address the concept evolution problem by detecting outliers having similar values for the attributes. We provide theoretical proof that our change detection method works better than other state-of-the-art approaches in this particular scenario. Results from experiments on various benchmark and synthetic data sets also show the efficiency of our proposed frameworks.
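
    The unsupervised variant determines chunk boundaries by monitoring classifier confidence rather than true labels. A minimal sketch of that idea, using a one-sided CUSUM statistic on the confidence stream, is shown below; the drift and threshold parameters are illustrative assumptions, not the paper's actual change detection test.

        import numpy as np

        def confidence_change_points(conf, drift=0.05, threshold=2.0):
            """Flag chunk boundaries where mean classifier confidence drops.
            conf: stream of confidence values in [0, 1]."""
            boundaries, cusum, mean = [], 0.0, conf[0]
            for i, c in enumerate(conf):
                mean = 0.99 * mean + 0.01 * c        # slow running baseline
                cusum = max(0.0, cusum + (mean - c) - drift)
                if cusum > threshold:                # sustained confidence drop
                    boundaries.append(i)
                    cusum = 0.0                      # restart after a boundary
            return boundaries

        # Example: confidence degrades after concept drift at t = 500.
        rng = np.random.default_rng(1)
        conf = np.concatenate([rng.normal(0.9, 0.05, 500),
                               rng.normal(0.6, 0.05, 500)]).clip(0, 1)
        print(confidence_change_points(conf))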

  17. Detection of material property errors in handbooks and databases using artificial neural networks with hidden correlations

    Science.gov (United States)

    Zhang, Y. M.; Evans, J. R. G.; Yang, S. F.

    2010-11-01

    The authors have discovered a systematic, intelligent and potentially automatic method to detect errors in handbooks and stop their transmission using unrecognised relationships between materials properties. The scientific community relies on the veracity of scientific data in handbooks and databases, some of which have a long pedigree covering several decades. Although various outlier-detection procedures are employed to detect and, where appropriate, remove contaminated data, errors, which had not been discovered by established methods, were easily detected by our artificial neural network in tables of properties of the elements. We started using neural networks to discover unrecognised relationships between materials properties and quickly found that they were very good at finding inconsistencies in groups of data. They reveal variations from 10 to 900% in tables of property data for the elements and point out those that are most probably correct. Compared with the statistical method adopted by Ashby and co-workers [Proc. R. Soc. Lond. Ser. A 454 (1998) p. 1301, 1323], this method locates more inconsistencies and could be embedded in database software for automatic self-checking. We anticipate that our suggestion will be a starting point to deal with this basic problem that affects researchers in every field. The authors believe it may eventually moderate the current expectation that data field error rates will persist at between 1 and 5%.
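
    The idea of exploiting hidden correlations can be sketched as follows: train a small network to predict one property of each element from the others, and flag entries whose prediction error is far larger than typical. The network size, the robust threshold, and the synthetic property table below are illustrative assumptions, not the authors' configuration.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        # Stand-in "handbook": 80 elements x 5 correlated properties,
        # with one deliberately corrupted entry (a transcription error).
        rng = np.random.default_rng(2)
        z = rng.normal(size=(80, 2))
        table = z @ rng.normal(size=(2, 5)) + rng.normal(scale=0.05, size=(80, 5))
        table[17, 3] += 5.0                       # the planted error

        X, y = np.delete(table, 3, axis=1), table[:, 3]
        net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
        net.fit(X, y)

        # Flag rows whose residual exceeds a robust threshold (median + 5 MAD).
        resid = np.abs(y - net.predict(X))
        mad = np.median(np.abs(resid - np.median(resid)))
        print("suspect rows:", np.where(resid > np.median(resid) + 5 * mad)[0])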

  18. Color-based scale-invariant feature detection applied in robot vision

    Science.gov (United States)

    Gao, Jian; Huang, Xinhan; Peng, Gang; Wang, Min; Li, Xinde

    2007-11-01

    Scale-invariant feature detection methods typically require substantial computation and sometimes still fail to meet real-time demands in robot vision. To solve this problem, a fast method for detecting interest points is presented. To decrease computation time, the detector selects as interest points those whose scale-normalized Laplacian values are local extrema in the nonholonomic pyramid scale space. The descriptor is built from several subregions, whose width is proportional to the scale factor, and the coordinates of the descriptor are rotated according to the interest point orientation, as in the SIFT descriptor. The feature vector is computed in the original color image, and the mean values of the normalized colors g and b in each subregion are chosen as its components. Compared with the SIFT descriptor, the dimensionality of this descriptor is markedly reduced, which simplifies the point matching process. The performance of the method is analyzed theoretically in this paper, and the experimental results confirm its validity.
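
    A minimal sketch of such a descriptor: for each subregion around an interest point, compute the mean normalized chromaticities g = G/(R+G+B) and b = B/(R+G+B). The 4x4 subregion grid below is an illustrative assumption; it yields a 32-dimensional vector, far smaller than SIFT's 128.

        import numpy as np

        def color_descriptor(patch, grid=4):
            """patch: (h, w, 3) RGB patch centred on an interest point,
            already rotated to the point's orientation and scaled."""
            s = patch.sum(axis=2) + 1e-9                 # R + G + B per pixel
            g, b = patch[..., 1] / s, patch[..., 2] / s  # normalized chromaticities
            h, w = g.shape
            feats = []
            for i in range(grid):
                for j in range(grid):
                    sub = (slice(i * h // grid, (i + 1) * h // grid),
                           slice(j * w // grid, (j + 1) * w // grid))
                    feats += [g[sub].mean(), b[sub].mean()]
            return np.array(feats)                       # 2 * grid**2 values

        patch = np.random.default_rng(3).uniform(0, 255, size=(32, 32, 3))
        print(color_descriptor(patch).shape)             # (32,)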

  19. Adaptive Statistical Iterative Reconstruction-Applied Ultra-Low-Dose CT with Radiography-Comparable Radiation Dose: Usefulness for Lung Nodule Detection.

    Science.gov (United States)

    Yoon, Hyun Jung; Chung, Myung Jin; Hwang, Hye Sun; Moon, Jung Won; Lee, Kyung Soo

    2015-01-01

    To assess the performance of adaptive statistical iterative reconstruction (ASIR)-applied ultra-low-dose CT (ULDCT) in detecting small lung nodules. Thirty patients underwent both ULDCT and standard dose CT (SCT). After determining the reference standard nodules, five observers, blinded to the reference standard reading results, independently evaluated SCT and both subsets of ASIR- and filtered back projection (FBP)-driven ULDCT images. Data assessed by observers were compared statistically. Converted effective doses in SCT and ULDCT were 2.81 ± 0.92 and 0.17 ± 0.02 mSv, respectively. A total of 114 lung nodules were detected on SCT as the standard reference. There was no statistically significant difference in sensitivity between ASIR-driven ULDCT and SCT for three out of the five observers (p = 0.678, 0.735, …), but sensitivity was significantly lower with FBP-driven ULDCT than with ASIR-driven ULDCT in three out of the five observers (p …). The figure-of-merit (FOM) values of FBP-driven ULDCT, ASIR-driven ULDCT, and SCT were 0.682, 0.772, and 0.821, respectively, and there were no significant differences in FOM values between ASIR-driven ULDCT and SCT (p = 0.11), but the FOM value of FBP-driven ULDCT was significantly lower than those of ASIR-driven ULDCT and SCT (p = 0.01 and 0.00). Adaptive statistical iterative reconstruction-driven ULDCT delivering a radiation dose of only 0.17 mSv offers acceptable sensitivity in nodule detection compared with SCT and has better performance than FBP-driven ULDCT.

  20. Transmission and signal loss in mask designs for a dual neutron and gamma imager applied to mobile standoff detection

    International Nuclear Information System (INIS)

    Ayaz-Maierhafer, Birsen; Hayward, Jason P.; Ziock, Klaus P.; Blackston, Matthew A.; Fabris, Lorenzo

    2013-01-01

    In order to design a next-generation, dual neutron and gamma imager for mobile standoff detection which uses coded aperture imaging as its primary detection modality, the following design parameters have been investigated for gamma and neutron radiation incident upon a hybrid, coded mask: (1) transmission through mask elements for various mask materials and thicknesses; and (2) signal attenuation in the mask versus angle of incidence. Each of these parameters directly affects detection significance, as quantified by the signal-to-noise ratio. The hybrid mask consists of two or three layers: organic material for fast neutron attenuation and scattering, Cd for slow neutron absorption (if applied), and one of the following three photon or photon-and-slow-neutron attenuating materials: Linotype alloy, CLYC, or CZT. In the MCNP model, a line source of gamma rays (100–2500 keV), fast neutrons (1000–10,000 keV) or thermal neutrons was positioned above the hybrid mask. The radiation penetrating the mask was tallied at the surface of an ideal detector located below the surface of the last mask layer. The transmission was calculated as the ratio of the particles transmitted through the fixed aperture to the particles passing through the closed mask. In order to determine the performance of the mask considering relative motion between the source and detector, simulations were used to calculate the signal attenuation for incident radiation angles of 0–50°. The results showed that a hybrid mask can be designed to sufficiently reduce both transmission through the mask and signal loss at large angles of incidence, considering both gamma-ray and fast-neutron radiation. With properly selected material thicknesses, the signal loss of a hybrid mask, which is necessarily thicker than the mask required for either single-mode imaging, is not a setback to the system's detection significance.

  1. Quality assurance tool for organ at risk delineation in radiation therapy using a parametric statistical approach.

    Science.gov (United States)

    Hui, Cheukkai B; Nourzadeh, Hamidreza; Watkins, William T; Trifiletti, Daniel M; Alonso, Clayton E; Dutta, Sunil W; Siebers, Jeffrey V

    2018-02-26

    To develop a quality assurance (QA) tool that identifies inaccurate organ at risk (OAR) delineations. The QA tool computed volumetric features from prior OAR delineation data from 73 thoracic patients to construct a reference database. All volumetric features of the OAR delineation are computed in three-dimensional space. Volumetric features of a new OAR are compared with those in the reference database to discern delineation outliers. A multicriteria outlier detection system warns users of specific delineation outliers based on combinations of deviant features. Fifteen independent experimental sets, including automatic, propagated, and clinically approved manual delineation sets, were used for verification. The verification OARs included manipulations to mimic common errors. Three experts reviewed the experimental sets to identify and classify errors, first without the QA tool and then, 1 week later, with it. In the cohort of manual delineations with manual manipulations, the QA tool detected 94% of the mimicked errors. Overall, it detected 37% of the minor and 85% of the major errors. The QA tool improved reviewer error detection sensitivity from 61% to 68% for minor errors (P = 0.17), and from 78% to 87% for major errors (P = 0.02). The QA tool assists users in detecting potential delineation errors. QA tool integration into clinical procedures may reduce the frequency of inaccurate OAR delineation, and potentially improve the safety and quality of radiation treatment planning. © 2018 American Association of Physicists in Medicine.
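
    The multicriteria outlier warning can be sketched as a robust z-score test of each volumetric feature against the reference database, flagging an OAR only when several features deviate at once. The feature set, thresholds, and two-feature rule below are illustrative assumptions, not the paper's exact criteria.

        import numpy as np

        def delineation_warnings(ref_feats, new_feats, z_cut=3.5, min_flags=2):
            """ref_feats: (n_patients, n_features) reference volumetric features
            for one OAR; new_feats: (n_features,) features of a new delineation.
            Returns indices of deviant features if enough of them co-occur."""
            med = np.median(ref_feats, axis=0)
            mad = np.median(np.abs(ref_feats - med), axis=0) + 1e-9
            z = 0.6745 * np.abs(new_feats - med) / mad   # robust z-scores
            flags = np.where(z > z_cut)[0]
            return flags if len(flags) >= min_flags else np.array([], dtype=int)

        # Example: volume, surface area and sphericity of 73 prior delineations.
        rng = np.random.default_rng(4)
        ref = rng.normal(loc=[30.0, 60.0, 0.7], scale=[3.0, 5.0, 0.05], size=(73, 3))
        print(delineation_warnings(ref, np.array([30.5, 61.0, 0.71])))  # no warning
        print(delineation_warnings(ref, np.array([55.0, 95.0, 0.72])))  # flags 0, 1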

  2. Cluster detection methods applied to the Upper Cape Cod cancer data

    Directory of Open Access Journals (Sweden)

    Ozonoff David

    2005-09-01

    Background: A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. Methods: We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. Results: The three different latency assumptions produced three different spatial patterns of cases and controls. For the 20-year latency assumption, all three methods generally concur. However, for the 15-year latency and no-latency assumptions, the methods produce different results when testing for global clustering. Conclusion: The comparative analysis of real data sets by different statistical methods provides insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.

  3. Feature learning and change feature classification based on deep learning for ternary change detection in SAR images

    Science.gov (United States)

    Gong, Maoguo; Yang, Hailun; Zhang, Puzhao

    2017-07-01

    Ternary change detection aims to detect changes and group them into positive and negative changes. It is of great significance in the joint interpretation of spatial-temporal synthetic aperture radar images. In this study, sparse autoencoders, convolutional neural networks (CNN) and unsupervised clustering are combined to solve the ternary change detection problem without any supervision. First, a sparse autoencoder is used to transform the log-ratio difference image into a suitable feature space for extracting key changes and suppressing outliers and noise. The learned features are then clustered into three classes, which are taken as pseudo labels for training a CNN model as the change feature classifier. Reliable training samples for the CNN are selected from the feature maps learned by the sparse autoencoder with certain selection rules. Given the training samples and the corresponding pseudo labels, the CNN model can be trained by back propagation with stochastic gradient descent. During training, the CNN is driven to learn the concept of change, and a more powerful model is established to distinguish different types of changes. Unlike traditional methods, the proposed framework integrates the merits of sparse autoencoders and CNNs to learn more robust difference representations and the concept of change for ternary change detection. Experimental results on real datasets validate the effectiveness and superiority of the proposed framework.
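
    The pseudo-labeling stage can be sketched compactly. The version below substitutes PCA for the sparse autoencoder and logistic regression for the CNN purely to keep the sketch self-contained; only the overall pipeline (feature learning, three-way clustering, pseudo-label training) mirrors the paper.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cluster import KMeans
        from sklearn.linear_model import LogisticRegression

        # Stand-in log-ratio difference image: 64x64 pixels, 9x9 patches.
        rng = np.random.default_rng(5)
        img = rng.normal(size=(64, 64))
        img[20:30, 20:30] += 2.0      # positive change
        img[40:50, 40:50] -= 2.0      # negative change

        # Extract patches as feature vectors (stand-in for autoencoder input).
        k = 4
        patches = np.array([img[i - k:i + k + 1, j - k:j + k + 1].ravel()
                            for i in range(k, 64 - k) for j in range(k, 64 - k)])

        feats = PCA(n_components=8, random_state=0).fit_transform(patches)
        pseudo = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats)

        # Train the change-feature classifier on the pseudo labels.
        clf = LogisticRegression(max_iter=1000).fit(feats, pseudo)
        print("training accuracy vs pseudo labels:", clf.score(feats, pseudo))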

  4. Sources of Artefacts in Synthetic Aperture Radar Interferometry Data Sets

    Science.gov (United States)

    Becek, K.; Borkowski, A.

    2012-07-01

    In recent years, much attention has been devoted to digital elevation models (DEMs) produced using Synthetic Aperture Radar Interferometry (InSAR). This has been triggered by the relative novelty of the InSAR method and its world-famous product—the Shuttle Radar Topography Mission (SRTM) DEM. However, much less attention, if any, has been paid to sources of artefacts in SRTM. In this work, we focus not on the missing pixels (null pixels) due to shadows or the layover effect, but rather on outliers that were undetected by the SRTM validation process. The aim of this study is to identify some of the causes of the elevation outliers in SRTM. Such knowledge may be helpful to mitigate similar problems in future InSAR DEMs, notably the ones currently being developed from data acquired by the TanDEM-X mission. We analysed many cross-sections derived from SRTM. These cross-sections were extracted over the elevation test areas available from the Global Elevation Data Testing Facility (GEDTF), whose database contains about 8,500 runways with known vertical profiles. Whenever a significant discrepancy between the known runway profile and the SRTM cross-section was detected, a visual interpretation of the high-resolution satellite image was carried out to identify the objects causing the irregularities. A distance and a bearing from the outlier to the object were recorded. Moreover, we considered the SRTM look direction parameter. A comprehensive analysis of the acquired data allows us to establish that large metallic structures, such as hangars or car parking lots, cause the outliers. Water areas or plain wet terrains may also cause an InSAR outlier. The look direction and the depression angle of the InSAR system in relation to the suspected objects influence the magnitude of the outliers. We hope that these findings will be helpful in designing the error detection routines of future InSAR or, in fact, any microwave aerial- or space-based survey.

  6. A new HPLC method for the detection of iodine applied to natural samples of edible seaweeds and commercial seaweed food products.

    Science.gov (United States)

    Nitschke, Udo; Stengel, Dagmar B

    2015-04-01

    Rich in micronutrients and considered to contain high iodine levels, seaweeds have multiple applications as food/supplements and nutraceuticals with potential health implications. Here, we describe the development and validation of a new analytical method to quantify iodine as iodide (I(-)) using an isocratic HPLC system with UV detection; algal iodine was converted to I(-) via dry alkaline incineration. The method was successfully applied to 19 macroalgal species from three taxonomic groups and five commercially available seaweed food products. Fresh kelps contained the highest levels, reaching >1.0% per dry weight (DW), but concentrations differed amongst thallus parts. In addition to kelps, other brown (Fucales: ∼0.05% DW) and some red species (∼0.05% DW) can also serve as a rich source of iodine; the lowest iodine concentrations were detected in green macroalgae (∼0.005% DW), implying that quantities recommended for seaweed consumption may require species-specific re-evaluation to reach adequate daily intake levels. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons

    Directory of Open Access Journals (Sweden)

    Yuan Zhuang

    2016-04-01

    Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention since the release of the BLE protocol. In this paper, we propose an algorithm that combines a channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target's location and the distances between the target and BLE beacons, respectively. We compare the performance of distance estimation using a separate PRM for each of the three advertisement channels (the separate strategy) with that using an aggregate PRM generated by combining information from all channels (the aggregate strategy). The FP-based location estimation results of the separate and aggregate strategies are also compared. It was found that the separate strategy provides higher accuracy; it is thus preferable to adopt PRM and FP for each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection stage to generate improved distance estimates for the EKF. After the EKF process, a second outlier detection algorithm based on statistical testing is performed to remove remaining outliers. The proposed algorithm was evaluated in various field experiments. Results show that it achieved an accuracy of <2.56 m for 90% of the time with dense deployment of BLE beacons (1 beacon per 9 m), which is 35.82% better than the <3.99 m of the Propagation Model (PM) + EKF algorithm and 15.77% more accurate than the <3.04 m of the FP + EKF algorithm. With sparse deployment (1 beacon per 18 m), the proposed algorithm achieves accuracies of <3.88 m at
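
    The second-level outlier rejection, statistical testing after the EKF, is commonly implemented as a chi-square gate on the filter innovation. Below is a minimal sketch of that idea for a 1-D range-like measurement; the gate threshold and noise levels are illustrative assumptions, not the paper's tuning.

        import numpy as np

        def gated_update(x, P, z, H, R, gate=9.0):
            """One EKF-style measurement update that rejects outliers whose
            normalized innovation squared (NIS) exceeds a chi-square gate."""
            y = z - H @ x                        # innovation
            S = H @ P @ H.T + R                  # innovation covariance
            nis = (y.T @ np.linalg.solve(S, y)).item()
            if nis > gate:                       # outlier: skip the update
                return x, P, False
            K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
            x = x + K @ y
            P = (np.eye(len(x)) - K @ H) @ P
            return x, P, True

        x, P = np.array([[0.0], [0.0]]), np.eye(2)          # position, velocity
        H, R = np.array([[1.0, 0.0]]), np.array([[0.25]])   # range-like measurement
        for z in [0.1, 0.2, 8.0, 0.4]:                      # 8.0 is a spurious range
            x, P, used = gated_update(x, P, np.array([[z]]), H, R)
            print(f"z={z:4.1f} used={used} x={x.ravel().round(2)}")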

  8. Initiating statistical process control to improve quality outcomes in colorectal surgery.

    Science.gov (United States)

    Keller, Deborah S; Stulberg, Jonah J; Lawrence, Justin K; Samia, Hoda; Delaney, Conor P

    2015-12-01

    Unexpected variations in postoperative length of stay (LOS) negatively impact resources and patient outcomes. Statistical process control (SPC) measures performance, evaluates productivity, and modifies processes for optimal performance. The goal of this study was to initiate SPC to identify LOS outliers and evaluate its feasibility for improving outcomes in colorectal surgery. Review of a prospective database identified colorectal procedures performed by a single surgeon. Patients were grouped into elective and emergent categories and then stratified by laparoscopic and open approaches. All followed a standardized enhanced recovery protocol. SPC was applied to identify outliers and evaluate causes within each group. A total of 1294 cases were analyzed: 83% elective (n = 1074) and 17% emergent (n = 220). Emergent cases were 70.5% open and 29.5% laparoscopic; elective cases were 36.8% open and 63.2% laparoscopic. All groups had a wide range in LOS. LOS outliers ranged from 8.6% (elective laparoscopic) to 10.8% (emergent laparoscopic). Evaluation of outliers demonstrated patient characteristics of higher ASA scores, longer operating times, ICU requirement, and temporary nursing at discharge. Outliers had higher postoperative complication rates in the elective open (57.1 vs. 20.0%) and elective laparoscopic groups (77.6 vs. 26.1%). Outliers also had higher readmission rates for emergent open (11.4 vs. 5.4%), emergent laparoscopic (14.3 vs. 9.2%), and elective laparoscopic (32.8 vs. 6.9%) cases. Elective open outliers did not follow trends of longer LOS or higher reoperation rates. SPC is feasible and promising for improving colorectal surgery outcomes. SPC identified patient and process characteristics associated with increased LOS. SPC may allow real-time outlier identification during quality improvement efforts and reevaluation of outcomes after introducing process changes. SPC has clinical implications for improving patient outcomes and resource utilization.
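
    An SPC analysis of LOS like the one described can be sketched with an individuals control chart: the centre line is the typical LOS and the control limits sit at about three times the median moving range (the median makes the limits robust to the very outliers being sought). The data and constants below are illustrative assumptions.

        import numpy as np

        def xmr_outliers(los):
            """Return indices of LOS values outside individuals-chart limits,
            using the median moving range (robust to the outliers themselves)."""
            los = np.asarray(los, dtype=float)
            mr = np.median(np.abs(np.diff(los)))     # median moving range
            centre = np.median(los)
            ucl, lcl = centre + 3.14 * mr, max(centre - 3.14 * mr, 0.0)
            return np.where((los > ucl) | (los < lcl))[0], (lcl, centre, ucl)

        los = [3, 2, 4, 3, 3, 2, 14, 3, 4, 2, 3, 11, 3, 2]   # LOS in days
        idx, (lcl, centre, ucl) = xmr_outliers(los)
        print(f"limits = ({lcl:.1f}, {ucl:.1f}); outlier cases at index {idx}")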

  9. Artificial Intelligence Methods Applied to Parameter Detection of Atrial Fibrillation

    Science.gov (United States)

    Arotaritei, D.; Rotariu, C.

    2015-09-01

    In this paper we present a novel method for detecting atrial fibrillation (AF) based on statistical descriptors and a hybrid neuro-fuzzy and crisp system. The system's inference produces if-then-else rules that are extracted to construct a binary decision system: normal or atrial fibrillation. We use TPR (Turning Point Ratio), SE (Shannon Entropy) and RMSSD (Root Mean Square of Successive Differences), along with a new descriptor, Teager-Kaiser energy, to improve the accuracy of detection. The descriptors are calculated over a sliding window, which produces a very large number of vectors (a massive dataset) for the classifier. The window length is a crisp descriptor, while the remaining descriptors are interval-valued. The parameters of the hybrid system are adapted using a genetic algorithm (GA) with a single-objective fitness target: the highest values of sensitivity and specificity. The rules are extracted and form part of the decision system. The proposed method was tested on the Physionet MIT-BIH Atrial Fibrillation Database, and the experimental results revealed good AF detection accuracy in terms of sensitivity and specificity (above 90%).
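
    The three classical descriptors are easy to compute from an RR-interval stream. A minimal sketch over one window follows; the window length and the quantization used for Shannon entropy are illustrative assumptions.

        import numpy as np

        def af_descriptors(rr, bins=16):
            """TPR, Shannon entropy and RMSSD for one window of RR intervals (s)."""
            rr = np.asarray(rr, dtype=float)
            d = np.diff(rr)
            # Turning point ratio: fraction of interior samples that are extrema.
            tpr = ((rr[1:-1] - rr[:-2]) * (rr[1:-1] - rr[2:]) > 0).mean()
            # Shannon entropy of the quantized RR histogram.
            p = np.histogram(rr, bins=bins)[0] / len(rr)
            se = -np.sum(p[p > 0] * np.log2(p[p > 0]))
            rmssd = np.sqrt(np.mean(d ** 2))
            return tpr, se, rmssd

        rng = np.random.default_rng(6)
        regular = 0.8 + 0.02 * rng.normal(size=128)     # sinus-rhythm-like
        irregular = rng.uniform(0.4, 1.2, size=128)     # AF-like irregularity
        for name, rr in [("regular", regular), ("AF-like", irregular)]:
            print(name, np.round(af_descriptors(rr), 3))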

  10. Independent Laboratory for Detection of Irradiated Foods. Detection of the irradiated food in the INCT

    International Nuclear Information System (INIS)

    Stachowicz, W.

    2007-01-01

    The lecture presents different methods applied for the detection of irradiated foods. The structure and equipment of the Independent Laboratory for Detection of Irradiated Foods operating at the INCT are described. Several examples of the detection of food irradiation are given in detail.

  11. Quantum Dots Applied to Methodology on Detection of Pesticide and Veterinary Drug Residues.

    Science.gov (United States)

    Zhou, Jia-Wei; Zou, Xue-Mei; Song, Shang-Hong; Chen, Guan-Hua

    2018-02-14

    Pesticide and veterinary drug residues from large-scale agricultural production have become one of the key issues in food safety and ecological security. It is necessary to develop rapid, sensitive, qualitative and quantitative methodologies for the detection of pesticide and veterinary drug residues. As one of the achievements of nanoscience, quantum dots (QDs) have been widely used in the detection of pesticide and veterinary drug residues. In these methodological studies, the QD signal types used include fluorescence, chemiluminescence, electrochemical luminescence, photoelectrochemistry, etc. QDs can also be assembled into sensors with different materials, such as QD-enzyme, QD-antibody, QD-aptamer, and QD-molecularly imprinted polymer sensors. The many results obtained from different combinations of these signals and sensors are summarized in this paper to provide a reference for the application of QDs in the detection of pesticide and veterinary drug residues.

  12. "Contrasting patterns of selection at Pinus pinaster Ait. Drought stress candidate genes as revealed by genetic differentiation analyses".

    Science.gov (United States)

    Eveno, Emmanuelle; Collada, Carmen; Guevara, M Angeles; Léger, Valérie; Soto, Alvaro; Díaz, Luis; Léger, Patrick; González-Martínez, Santiago C; Cervera, M Teresa; Plomion, Christophe; Garnier-Géré, Pauline H

    2008-02-01

    The importance of natural selection in shaping adaptive trait differentiation among natural populations of allogamous tree species has long been recognized. Determining the molecular basis of local adaptation remains largely unresolved, and the respective roles of selection and demography in shaping population structure are actively debated. Using a multilocus scan that aims to detect outliers from simulated neutral expectations, we analyzed patterns of nucleotide diversity and genetic differentiation at 11 polymorphic candidate genes for drought stress tolerance in phenotypically contrasted Pinus pinaster Ait. populations across its geographical range. We compared three coalescent-based methods: two frequentist-like, including one approach developed here specifically for biallelic single nucleotide polymorphisms (SNPs), and one Bayesian. Five genes showed outlier patterns that were robust across methods, at the haplotype level for two of them. Two genes presented higher F(ST) values than expected (PR-AGP4 and erd3), suggesting that they could have been affected by diversifying selection among populations. In contrast, three genes presented lower F(ST) values than expected (dhn-1, dhn2, and lp3-1), which could represent signatures of homogenizing selection among populations. A smaller proportion of outliers were detected at the SNP level, suggesting the potential functional significance of particular combinations of sites in drought-response candidate genes. The Bayesian method appeared robust to low sample sizes, flexible with respect to assumptions regarding migration rates, and powerful for detecting selection at the haplotype level, but the frequentist-like method adapted to SNPs was more efficient for identifying outlier SNPs showing low differentiation. Population-specific effects estimated in the Bayesian method also revealed populations with lower immigration rates, which could have created favorable conditions for local adaptation. Outlier patterns are discussed

  13. Treatment on outliers in UBJ-SARIMA models for forecasting dengue cases on age groups not eligible for vaccination in Baguio City, Philippines

    Science.gov (United States)

    Magsakay, Clarenz B.; De Vera, Nora U.; Libatique, Criselda P.; Addawe, Rizavel C.; Addawe, Joel M.

    2017-11-01

    Dengue vaccination has become a breakthrough in the fight against dengue infection. It is, however, not applicable to all ages: individuals from 0 to 8 years old and adults older than 45 years old remain susceptible to the vector-borne disease. Accurately forecasting future dengue cases in the susceptible age groups would aid efforts to prevent further increases in dengue infections. For the age groups not eligible for vaccination, the presence of outliers was observed and was treated using winsorization, square root, and logarithmic transformations before fitting a SARIMA model. The best model for the age group 0 to 8 years old was found to be ARIMA(13,1,0)(1,0,0)12 with 10 fixed variables, using a square root transformation with 95% winsorization, and the best model for the age group older than 45 years old was ARIMA(7,1,0)(1,0,0)12 with 5 fixed variables, using a logarithmic transformation with 90% winsorization. These models are then used to forecast the monthly dengue cases in Baguio City for the age groups considered.
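
    Outlier treatment of this kind can be sketched with scipy and statsmodels: winsorize the monthly counts, apply a variance-stabilizing transform, then fit a seasonal ARIMA. The synthetic series and the low model orders below are illustrative assumptions, not the orders selected in the paper.

        import numpy as np
        from scipy.stats.mstats import winsorize
        from statsmodels.tsa.statespace.sarimax import SARIMAX

        # Illustrative monthly dengue-like counts with seasonality and spikes.
        rng = np.random.default_rng(7)
        months = np.arange(120)
        cases = rng.poisson(50 + 30 * np.sin(2 * np.pi * months / 12))
        cases[[30, 75]] *= 5                         # outbreak outliers

        # Winsorize (clip the top/bottom 2.5%), then log transform.
        treated = np.log1p(np.asarray(winsorize(cases, limits=(0.025, 0.025))))

        model = SARIMAX(treated, order=(1, 1, 0), seasonal_order=(1, 0, 0, 12))
        fit = model.fit(disp=False)
        forecast = np.expm1(fit.forecast(steps=12))  # back-transform to counts
        print(np.round(forecast, 1))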

  14. A measurement-based fault detection approach applied to monitor robots swarm

    KAUST Repository

    Khaldi, Belkacem; Harrou, Fouzi; Sun, Ying; Cherif, Foudil

    2017-01-01

    This paper presents an innovative data-driven fault detection method for monitoring robot swarms. The method combines the flexibility of principal component analysis (PCA) models with the greater sensitivity of the exponentially-weighted moving average (EWMA) control chart.
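
    A minimal sketch of the PCA-plus-EWMA idea: fit PCA on fault-free sensor data, monitor the squared prediction error (SPE) of new samples with an EWMA statistic, and raise an alarm when it exceeds a limit set from fault-free data. The smoothing factor, the limit, and the synthetic sensor model are illustrative assumptions.

        import numpy as np
        from sklearn.decomposition import PCA

        # Fault-free data follow a 3-factor latent model plus sensor noise.
        rng = np.random.default_rng(8)
        W = rng.normal(size=(3, 6))
        normal = rng.normal(size=(500, 3)) @ W + 0.1 * rng.normal(size=(500, 6))
        faulty = rng.normal(size=(100, 3)) @ W + 0.1 * rng.normal(size=(100, 6))
        faulty[:, 2] += 2.0                          # a biased/stuck sensor

        pca = PCA(n_components=3).fit(normal)

        def spe(x):
            """Squared prediction error: residual after PCA reconstruction."""
            r = x - pca.inverse_transform(pca.transform(x))
            return np.sum(r ** 2, axis=1)

        lam, limit = 0.2, np.quantile(spe(normal), 0.999)
        z, first_alarm = np.mean(spe(normal)), None
        for i, s in enumerate(spe(faulty)):
            z = lam * s + (1 - lam) * z              # EWMA of the SPE statistic
            if z > limit and first_alarm is None:
                first_alarm = i
        print("EWMA limit:", round(limit, 3), "first alarm at sample:", first_alarm)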

  15. Workshop applied antineutrino physics 2007

    Energy Technology Data Exchange (ETDEWEB)

    Akiri, T.; Andrieu, B.; Anjos, J.; Argyriades, J.; Barouch, G.; Bernstein, A.; Bersillon, O.; Besida, O.; Bowden, N.; Cabrera, A.; Calmet, D.; Collar, J.; Cribier, M.; Kerret, H. de; Meijer, R. de; Dudziak, F.; Enomoto, S.; Fallot, M.; Fioni, G.; Fiorentini, G.; Gale, Ph.; Georgadze, A.; Giot, L.; Gonin, M.; Guillon, B.; Henson, C.; Jonkmans, G.; Kanamaru, S.; Kawasaki, T.; Kornoukhov, V.; Lasserre, Th.; Learned, J.G.; Lefebvre, J.; Letourneau, A.; Lhillier, D.; Lindner, M.; Lund, J.; Mantovani, F.; Mcdonough, B.; Mention, G.; Monteith, A.; Motta, D.; Mueller, Th.; Oberauer, L.; Obolensky, M.; Odrzywolek, A.; Petcov, S.; Porta, A.; Queval, R.; Reinhold, B.; Reyna, D.; Ridikas, D.; Sadler, L.; Schoenert, St.; Sida, J.L.; Sinev, V.; Suekane, F.; Suvorov, Y.; Svoboda, R.; Tang, A.; Tolich, N.; Tolich, K.; Vanka, S.; Vignaud, D.; Volpe, Ch.; Wong, H

    2007-07-01

    The 'Applied Antineutrino Physics 2007' workshop is the fourth international meeting devoted to opening neutrino physics to more applied fields, such as geophysics and geochemistry, the nuclear industry, and nonproliferation. The meeting highlights the worldwide efforts already engaged to exploit the unique characteristics of neutrinos for monitoring the production of plutonium in civil nuclear power reactors. The potential industrial application of measuring the thermal power of nuclear plants with neutrinos is also addressed. Earth neutrinos were first observed in 2002 by the KamLAND experiment, and several international efforts are currently underway to use them to probe the interior of the Earth. This meeting is an opportunity to adapt the detection efforts to the real needs of geophysicists and geochemists (sources of radiogenic heat, potassium in the core, plumes). Finally, more futuristic topics, such as the detection of low-power nuclear explosions, are also discussed. This document gathers only the slides of the presentations.

  17. Recent developments in optical detection methods for microchip separations

    NARCIS (Netherlands)

    Götz, S.; Karst, U.

    2007-01-01

    This paper summarizes the features and performance of optical detection systems currently applied to monitor separations on microchip devices. Fluorescence detection, which delivers very high sensitivity and selectivity, is still the most widely applied method of detection.

  18. User Behavior Analytics

    Energy Technology Data Exchange (ETDEWEB)

    Turcotte, Melissa [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Moore, Juston Shane [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-02-28

    User behaviour analytics is the tracking, collecting and assessing of user data and activities. The goal is to detect misuse of user credentials by developing models of the normal behaviour of user credentials within a computer network and detecting outliers with respect to their baseline.
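
    A minimal sketch of the baseline-and-outlier idea: model each credential's typical daily activity count and flag days that deviate strongly from it. The z-score baseline and the threshold are illustrative assumptions about one possible model, not the LANL implementation.

        import numpy as np

        def credential_outliers(history, today, z_cut=4.0):
            """history: dict credential -> array of past daily event counts;
            today: dict credential -> today's count. Returns flagged credentials."""
            flagged = []
            for cred, counts in history.items():
                mu, sd = np.mean(counts), np.std(counts) + 1e-9
                if abs(today.get(cred, 0) - mu) / sd > z_cut:
                    flagged.append(cred)
            return flagged

        rng = np.random.default_rng(9)
        history = {f"user{i}": rng.poisson(20, size=60) for i in range(5)}
        today = {f"user{i}": 21 for i in range(5)}
        today["user3"] = 140                      # compromised credential?
        print(credential_outliers(history, today))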

  19. Detection of latent fingerprints using high-resolution 3D confocal microscopy in non-planar acquisition scenarios

    Science.gov (United States)

    Kirst, Stefan; Vielhauer, Claus

    2015-03-01

    In digitized forensics, supporting investigators in every possible manner is one of the main goals. With conservative lifting methods, the detection of traces is done manually; for non-destructive contactless methods, the necessity of detecting traces automatically is obvious for further biometric analysis. High-resolution 3D confocal laser scanning microscopy (CLSM) offers the possibility of a detection-by-segmentation approach with improved detection results. Optimal scan results with CLSM are achieved on surfaces orthogonal to the sensor, which is not always possible due to environmental circumstances or the surface's shape. This introduces additional noise, outliers and a lack of contrast, making the detection of traces even harder. Prior work showed the possibility of determining angle-independent classification models for the detection of latent fingerprints (LFP). Extending this approach, we introduce a larger feature space containing a variety of statistical, roughness, color, edge-directivity, histogram, Gabor, gradient and Tamura features based on raw data and gray-level co-occurrence matrices (GLCM) computed from high-resolution data. Our test set consists of eight different surfaces for the detection of LFP at four different acquisition angles, with a total of 1920 single scans. For each surface, at angles in steps of 10°, we capture samples from five donors to introduce variance through a variety of sweat compositions and application influences such as pressure or differences in ridge thickness. By analyzing this test set with our approach, we intend to determine angle- and substrate-dependent classification models, to identify optimal surface-specific acquisition setups, and to derive classification models for general detection purposes across both angles and substrates. The results of the overall models, with classification rates up to 75.15% (kappa 0.50), already show a positive tendency regarding the usability of the proposed methods for LFP detection on varying surfaces in non-planar acquisition scenarios.

  20. Design and analysis of experiments in the health sciences

    National Research Council Canada - National Science Library

    Van Belle, Gerald; Kerr, Kathleen F

    2012-01-01

    .... Since actual datasets are employed, users deal with real-life modeling issues and situations such as handling missing values, applying variable transformations, and addressing outliers, among others...

  1. Attribute and topology based change detection in a constellation of previously detected objects

    Science.gov (United States)

    Paglieroni, David W.; Beer, Reginald N.

    2016-01-19

    A system that applies attribute and topology based change detection to networks of objects that were detected on previous scans of a structure, roadway, or area of interest. The attributes capture properties or characteristics of the previously detected objects, such as location, time of detection, size, elongation, orientation, etc. The topology of the network of previously detected objects is maintained in a constellation database that stores attributes of previously detected objects and implicitly captures the geometrical structure of the network. A change detection system detects change by comparing the attributes and topology of new objects detected on the latest scan to the constellation database of previously detected objects.
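
    The comparison step can be sketched as follows: for each newly detected object, find the nearest previously detected object in the constellation database and report it as new, changed, or matched according to attribute differences. The distance threshold and the single size attribute below are illustrative assumptions; the patent-style description above does not fix them.

        import numpy as np

        def change_report(constellation, new_objects, pos_tol=2.0, size_tol=0.25):
            """constellation/new_objects: arrays of (x, y, size). Returns a list
            of (index, status) pairs for the newly detected objects."""
            report = []
            for i, (x, y, size) in enumerate(new_objects):
                d = np.hypot(constellation[:, 0] - x, constellation[:, 1] - y)
                j = int(np.argmin(d))
                if d[j] > pos_tol:
                    report.append((i, "new object"))
                elif abs(size - constellation[j, 2]) / constellation[j, 2] > size_tol:
                    report.append((i, f"attribute change vs object {j}"))
                else:
                    report.append((i, f"matches object {j}"))
            return report

        prev = np.array([[0.0, 0.0, 1.0], [10.0, 5.0, 2.0]])
        new = np.array([[0.1, -0.2, 1.05], [10.2, 5.1, 3.5], [40.0, 40.0, 1.0]])
        for idx, status in change_report(prev, new):
            print(idx, status)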

  2. Applying the J-optimal channelized quadratic observer to SPECT myocardial perfusion defect detection

    Science.gov (United States)

    Kupinski, Meredith K.; Clarkson, Eric; Ghaly, Michael; Frey, Eric C.

    2016-03-01

    To evaluate performance on a perfusion defect detection task from 540 image pairs of myocardial perfusion SPECT image data, we apply the J-optimal channelized quadratic observer (J-CQO). We compare AUC values of the linear Hotelling observer and J-CQO when the defect location is fixed and when it occurs in one of two locations. As expected, when the location is fixed a single channel maximizes the AUC; location variability requires multiple channels to maximize the AUC. The AUC is estimated from both the projection data and reconstructed images. J-CQO is quadratic since it uses the first- and second-order statistics of the image data from both classes. The linear data reduction by the channels is described by an L x M channel matrix, and in prior work we introduced an iterative gradient-based method for calculating the channel matrix. The dimensionality reduction from M measurements to L channels yields better estimates of these sample statistics from smaller sample sizes, and since the channelized covariance matrix is L x L instead of M x M, the matrix inverse is easier to compute. The novelty of our approach is the use of Jeffrey's divergence (J) as the figure of merit (FOM) for optimizing the channel matrix. We previously showed that the J-optimal channels are also the optimum channels for the AUC and the Bhattacharyya distance when the channel outputs are Gaussian distributed with equal means. This work evaluates the use of J as a surrogate FOM (SFOM) for the AUC when these statistical conditions are not satisfied.

  3. STREAMED VERTICAL RECTANGLE DETECTION IN TERRESTRIAL LASER SCANS FOR FACADE DATABASE PRODUCTION

    Directory of Open Access Journals (Sweden)

    J. Demantké

    2012-07-01

    A reliable and accurate facade database would be a major asset in applications such as localization of autonomous vehicles, registration and fine building modeling. Mobile mapping devices now provide the data required to create such a database, but efficient methods must be designed to tackle the enormous amount of data collected by such means (a million points per second for hours of acquisition). Another important limitation is the presence of numerous objects of many different types in urban scenes. This paper proposes a method that overcomes these two issues: – The facade detection algorithm is streamed: the data is processed in the order it was acquired. More precisely, the input data is split into overlapping blocks which are analysed in turn to extract facade parts. Close overlapping parts are then merged in order to recover the full facade rectangle. – The geometry of the neighborhood of each point is analysed to define a probability that the point belongs to a vertical planar patch. This probability is then injected into a RANdom SAmple Consensus (RANSAC) algorithm, both in the sampling step and in the hypothesis validation, in order to favour the most reliable candidates. This ensures much more robustness against outliers during the facade detection. In this way, the main vertical rectangles are detected without any prior knowledge about the data; the only assumptions are that the facades are roughly planar and vertical. The method has been successfully tested on a large dataset in Paris. The facades are detected despite the presence of trees occluding large areas of some facades. The robustness and accuracy of the detected facade rectangles make them useful for localization applications and for registration of other scans of the same city or of entire city models.
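
    The probability-weighted RANSAC idea can be sketched in a few lines: sample points with probability proportional to their verticality score, fit a plane, and keep the hypothesis with the largest weighted inlier support among near-vertical planes. The thresholds and scoring below are illustrative assumptions.

        import numpy as np

        def weighted_vertical_ransac(pts, w, n_iter=200, dist_tol=0.05, vert_tol=0.1):
            """pts: (n, 3) points; w: per-point probability of lying on a vertical
            planar patch. Returns (normal, d, inlier_mask) of the best plane."""
            rng = np.random.default_rng(0)
            p = w / w.sum()
            best = (None, None, np.zeros(len(pts), dtype=bool))
            for _ in range(n_iter):
                i = rng.choice(len(pts), size=3, replace=False, p=p)
                n = np.cross(pts[i[1]] - pts[i[0]], pts[i[2]] - pts[i[0]])
                norm = np.linalg.norm(n)
                if norm < 1e-9 or abs(n[2] / norm) > vert_tol:
                    continue                          # degenerate or not vertical
                n = n / norm
                d = -n @ pts[i[0]]
                inliers = np.abs(pts @ n + d) < dist_tol
                # Weight the score by the verticality probabilities of the inliers.
                if w[inliers].sum() > w[best[2]].sum():
                    best = (n, d, inliers)
            return best

        # A noisy vertical facade (x = 2 plane) mixed with ground-plane clutter.
        rng = np.random.default_rng(1)
        facade = np.column_stack([2 + 0.01 * rng.normal(size=300),
                                  rng.uniform(0, 10, 300), rng.uniform(0, 6, 300)])
        ground = np.column_stack([rng.uniform(0, 10, (300, 2)),
                                  0.01 * rng.normal(size=(300, 1))])
        pts = np.vstack([facade, ground])
        w = np.r_[np.full(300, 0.9), np.full(300, 0.1)]   # verticality scores
        n, d, inl = weighted_vertical_ransac(pts, w)
        print("plane normal:", n.round(2), "inliers:", int(inl.sum()))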

  4. A SVM-based quantitative fMRI method for resting-state functional network detection.

    Science.gov (United States)

    Song, Xiaomu; Chen, Nan-kuei

    2014-09-01

    Resting-state functional magnetic resonance imaging (fMRI) aims to measure baseline neuronal connectivity independent of specific functional tasks and to capture changes in the connectivity due to neurological diseases. Most existing network detection methods rely on a fixed threshold to identify functionally connected voxels under the resting state. Due to fMRI non-stationarity, the threshold cannot adapt to variation of data characteristics across sessions and subjects, and generates unreliable mapping results. In this study, a new method is presented for resting-state fMRI data analysis. Specifically, the resting-state network mapping is formulated as an outlier detection process that is implemented using one-class support vector machine (SVM). The results are refined by using a spatial-feature domain prototype selection method and two-class SVM reclassification. The final decision on each voxel is made by comparing its probabilities of functionally connected and unconnected instead of a threshold. Multiple features for resting-state analysis were extracted and examined using an SVM-based feature selection method, and the most representative features were identified. The proposed method was evaluated using synthetic and experimental fMRI data. A comparison study was also performed with independent component analysis (ICA) and correlation analysis. The experimental results show that the proposed method can provide comparable or better network detection performance than ICA and correlation analysis. The method is potentially applicable to various resting-state quantitative fMRI studies. Copyright © 2014 Elsevier Inc. All rights reserved.
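
    The outlier-detection formulation can be sketched with scikit-learn's one-class SVM: train on the bulk of voxel feature vectors and treat the flagged "outliers" as candidate functionally connected voxels. The two stand-in features and the nu parameter below are illustrative assumptions, not the paper's feature set.

        import numpy as np
        from sklearn.svm import OneClassSVM

        # Stand-in features: one row per voxel (e.g., a seed correlation and a
        # low-frequency power measure); a small minority of voxels is "connected".
        rng = np.random.default_rng(10)
        background = rng.normal(0.0, 0.1, size=(2000, 2))
        connected = rng.normal(0.6, 0.1, size=(100, 2))
        X = np.vstack([background, connected])

        ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)
        labels = ocsvm.predict(X)                 # +1 inlier, -1 outlier

        detected = np.where(labels == -1)[0]
        hits = np.sum(detected >= 2000)           # outliers among connected voxels
        print(f"{len(detected)} voxels flagged, {hits}/100 truly connected")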

  5. Robust w-Estimators for Cryo-EM Class Means

    Science.gov (United States)

    Huang, Chenxi; Tagare, Hemant D.

    2016-01-01

    A critical step in cryogenic electron microscopy (cryo-EM) image analysis is to calculate the average of all images aligned to a projection direction. This average, called the “class mean”, improves the signal-to-noise ratio in single particle reconstruction (SPR). The averaging step is often compromised because of outlier images of ice, contaminants, and particle fragments. Outlier detection and rejection in the majority of current cryo-EM methods is done using cross-correlation with a manually determined threshold. Empirical assessment shows that the performance of these methods is very sensitive to the threshold. This paper proposes an alternative: a “w-estimator” of the average image, which is robust to outliers and which does not use a threshold. Various properties of the estimator, such as consistency and influence function are investigated. An extension of the estimator to images with different contrast transfer functions (CTFs) is also provided. Experiments with simulated and real cryo-EM images show that the proposed estimator performs quite well in the presence of outliers. PMID:26841397
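
    A w-estimator of the class mean can be sketched as iteratively reweighted averaging: images far from the current mean receive small weights, so ice and contaminant images barely contribute, and no hard threshold is needed. The Tukey-style weight function and its width are illustrative assumptions, not the authors' exact estimator.

        import numpy as np

        def robust_class_mean(images, n_iter=10, c=3.0):
            """images: (n, h, w). Iteratively reweighted mean with Tukey weights."""
            mean = images.mean(axis=0)
            for _ in range(n_iter):
                d = np.sqrt(((images - mean) ** 2).mean(axis=(1, 2)))  # distances
                s = np.median(d) + 1e-9                   # robust scale
                u = d / (c * s)
                w = np.where(u < 1, (1 - u ** 2) ** 2, 0.0)  # Tukey biweight
                mean = np.tensordot(w, images, axes=1) / w.sum()
            return mean, w

        rng = np.random.default_rng(11)
        signal = np.zeros((16, 16)); signal[4:12, 4:12] = 1.0
        particles = signal + 0.5 * rng.normal(size=(40, 16, 16))
        junk = 5.0 * rng.normal(size=(5, 16, 16))         # ice/contaminants
        mean, w = robust_class_mean(np.vstack([particles, junk]))
        print("junk weights:", w[-5:].round(3))           # ~0 for outliers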

  6. Crack detecting method

    International Nuclear Information System (INIS)

    Narita, Michiko; Aida, Shigekazu

    1998-01-01

    A penetration liquid, or a slow-drying penetration liquid prepared by mixing a penetration liquid with a slow-drying liquid, is filled into an artificial crack formed in a member to be inspected, such as in boiler power generation facilities and nuclear power facilities. A developing liquid is applied around the artificial crack on the surface of the member. As the slow-drying liquid, an oil having a viscosity of 56 is preferably used. Loads are applied repeatedly to the member, and when a crack forms at the artificial crack, the penetration liquid penetrates into it. The penetration liquid that has penetrated into the crack is developed by the developing liquid previously coated around the artificial crack on the surface of the member. When a crack occurs, it is developed clearly even if the opening is small, so the crack can be recognized visually and reliably. (I.N.)

  7. Detection and segmentation of virus plaque using HOG and SVM: toward automatic plaque assay.

    Science.gov (United States)

    Mao, Yihao; Liu, Hong; Ye, Rong; Shi, Yonghong; Song, Zhijian

    2014-01-01

    Plaque assaying, the measurement of the number, diameter, and area of plaques in a Petri dish image, is a standard procedure for gauging the concentration of phage in biology. This paper presents a novel and effective method for automatic plaque assaying. The method mainly comprises the following steps. In the training stage, after pre-processing the images for noise suppression, an initial training set was prepared by sampling positive (with a plaque at the center) and negative (plaque-free) patches from the training images and extracting the HOG features from each patch. The linear SVM classifier was trained with a self-learnt supervised learning strategy to avoid possible missed detections. Specifically, the training set, which contained positive and negative patches sampled manually from training images, was used to train a preliminary classifier, which then exhaustively searched the training images to predict labels for the unlabeled patches. The mislabeled patches were evaluated by experts and relabeled, and all newly labeled patches and their corresponding HOG features were added to the initial training set to train the final classifier. In the testing stage, a sliding-window technique was first applied to the unseen image to obtain HOG features, which were input into the classifier to predict whether each patch was positive. Second, a locally adaptive Otsu method was performed on the positive patches to segment the plaques. Finally, after removing outliers, the parameters of the plaques were measured from the segmented plaques. The experimental results demonstrated that the accuracy of the proposed method was similar to that of manual measurement by experts, but it took less than 30 seconds.
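
    The patch classification stage can be sketched with skimage's HOG features and a linear SVM. The synthetic patches below stand in for dish images, and the HOG parameters are illustrative assumptions rather than the paper's settings.

        import numpy as np
        from skimage.feature import hog
        from sklearn.svm import LinearSVC

        def make_patch(with_plaque, rng, size=32):
            """Synthetic dish patch; a plaque is a dark disc on a brighter lawn."""
            img = 0.7 + 0.05 * rng.normal(size=(size, size))
            if with_plaque:
                yy, xx = np.mgrid[:size, :size]
                img[(yy - 16) ** 2 + (xx - 16) ** 2 < 64] = 0.2
            return img.clip(0, 1)

        rng = np.random.default_rng(12)
        patches = [make_patch(i % 2 == 0, rng) for i in range(200)]
        labels = [i % 2 == 0 for i in range(200)]
        feats = [hog(p, orientations=9, pixels_per_cell=(8, 8),
                     cells_per_block=(2, 2)) for p in patches]

        clf = LinearSVC(C=1.0).fit(feats[:150], labels[:150])
        print("held-out accuracy:", clf.score(feats[150:], labels[150:]))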

  8. Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction.

    Science.gov (United States)

    Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi

    2017-08-08

    Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. A just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content in this work. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with soft sensor development and outlier detection simultaneously. Moreover, with a continuously updating database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. The better prediction performance of JCSVR is validated on online silicon content prediction, in comparison with traditional soft sensors.
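
    The correntropy idea, weighting each sample by a Gaussian kernel of its residual so that gross outliers lose influence, can be sketched as iteratively reweighted least squares. The kernel width and the linear model are illustrative simplifications of the CSVR framework, not its actual formulation.

        import numpy as np

        def correntropy_regression(X, y, sigma=1.0, n_iter=20):
            """Linear fit maximizing correntropy via iteratively reweighted
            least squares; outliers receive exponentially small weights."""
            Xb = np.column_stack([X, np.ones(len(X))])    # add intercept
            beta = np.linalg.lstsq(Xb, y, rcond=None)[0]
            for _ in range(n_iter):
                r = y - Xb @ beta
                w = np.exp(-r ** 2 / (2 * sigma ** 2))    # Gaussian kernel weights
                WX = Xb * w[:, None]
                beta = np.linalg.solve(Xb.T @ WX, WX.T @ y)
            return beta

        rng = np.random.default_rng(13)
        X = rng.uniform(0, 10, size=(200, 1))
        y = 1.5 * X[:, 0] + 2.0 + 0.2 * rng.normal(size=200)
        y[::20] += 15.0                                   # gross outliers
        print("weighted fit:", correntropy_regression(X, y).round(2))  # ~[1.5, 2.0]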

  9. Applying Lean-Six-Sigma Methodology in radiotherapy: Lessons learned by the breast daily repositioning case.

    Science.gov (United States)

    Mancosu, Pietro; Nicolini, Giorgia; Goretti, Giulia; De Rose, Fiorenza; Franceschini, Davide; Ferrari, Chiara; Reggiori, Giacomo; Tomatis, Stefano; Scorsetti, Marta

    2018-03-06

    Lean Six Sigma Methodology (LSSM) was introduced in industry to provide near-perfect services in large processes by reducing the occurrence of defects. LSSM has been applied here to redesign the 2D-2D breast repositioning process (Lean) through retrospective analysis of the database (Six Sigma). Breast patients with daily 2D-2D matching before RT were considered. The five DMAIC (define, measure, analyze, improve, and control) LSSM steps were applied. The process was retrospectively measured over 30 months (7/2014-12/2016) by querying the RT Record&Verify database. Two Lean instruments (Poka-Yoke and Visual Management) were considered for improving the process. The new procedure was checked over 6 months (1-6/2017). 14,931 consecutive shifts from 1342 patients were analyzed. Only 0.8% of patients presented median shifts >1 cm. The major observed discrepancy was the monthly percentage of fractions with almost zero shifts (AZS = 13.2% ± 6.1%). An Ishikawa fishbone diagram helped define the main contributing causes of the discrepancy. A harmonized procedure involving a multidisciplinary team was defined to increase confidence in the matching procedure. AZS was reduced to 4.8% ± 0.6%. Furthermore, improved distribution symmetry (skewness moved from 1.4 to 1.1) and outlier reduction, verified by the decrease in kurtosis, demonstrated better "normalization" of the procedure after the LSSM application. LSSM was implemented in an RT department, allowing the breast repositioning matching procedure to be redesigned. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Improving the Accuracy of Cloud Detection Using Machine Learning

    Science.gov (United States)

    Craddock, M. E.; Alliss, R. J.; Mason, M.

    2017-12-01

    Cloud detection from geostationary satellite imagery has long been accomplished through multi-spectral channel differencing relative to the Earth's surface, with the clear/cloud decision made by comparing these differences to empirical thresholds. Using this methodology, the probability of detecting clouds exceeds 90%, but performance varies seasonally, regionally and temporally. The Cloud Mask Generator (CMG) database developed under this effort consists of 20 years of 4 km, 15-minute clear/cloud images based on GOES data over CONUS and Hawaii. The algorithms used to determine cloudy pixels in the imagery are based on well-known multi-spectral techniques and defined thresholds. These thresholds were produced by manually studying thousands of images, and thousands of man-hours went into determining the success and failure of the algorithms and fine-tuning the thresholds. This study investigates the potential of improving cloud detection using Random Forest (RF) ensemble classification. RF is an ideal methodology for cloud detection as it runs efficiently on large datasets, is robust to outliers and noise, and can handle highly correlated predictors such as multi-spectral satellite imagery. The RF code was developed in Python in about 4 weeks. The region of focus is Hawaii, and the predictors include visible and infrared imagery, topography and multi-spectral image products. The development of the cloud detection technique proceeds in three steps. First, the RF models are tuned to identify the optimal number of trees and number of predictors for both day and night scenes. Second, the RF models are trained using the optimal number of trees and a select number of random predictors identified during the tuning phase. Lastly, the model is used to predict clouds for a time period independent of that used during training, and the predictions are compared to truth, the CMG cloud mask.
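
    Tuning and training such a classifier is straightforward with scikit-learn; the sketch below uses synthetic stand-ins for the multi-spectral predictors, and the tree counts and feature counts searched are illustrative assumptions.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GridSearchCV

        # Stand-in predictors: visible and IR channels, channel differences,
        # and topography, with a synthetic clear/cloud label.
        rng = np.random.default_rng(14)
        X = rng.normal(size=(5000, 6))
        cloud = (0.8 * X[:, 0] - 0.6 * X[:, 2] + 0.3 * rng.normal(size=5000)) > 0

        # Tuning phase: search over number of trees and predictors per split.
        search = GridSearchCV(
            RandomForestClassifier(random_state=0),
            {"n_estimators": [100, 300], "max_features": [2, 4]},
            cv=3)
        search.fit(X[:4000], cloud[:4000])        # tuning + training
        print("best params:", search.best_params_)
        print("held-out accuracy:",
              search.best_estimator_.score(X[4000:], cloud[4000:]))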

  11. GIS applied to location of fires detection towers in domain area of tropical forest.

    Science.gov (United States)

    Eugenio, Fernando Coelho; Rosa Dos Santos, Alexandre; Fiedler, Nilton Cesar; Ribeiro, Guido Assunção; da Silva, Aderbal Gomes; Juvanhol, Ronie Silva; Schettino, Vitor Roberto; Marcatti, Gustavo Eduardo; Domingues, Getúlio Fonseca; Alves Dos Santos, Gleissy Mary Amaral Dino; Pezzopane, José Eduardo Macedo; Pedra, Beatriz Duguy; Banhos, Aureo; Martins, Lima Deleon

    2016-08-15

    In most countries, the loss of biodiversity caused by fires is worrying. Fire detection towers are crucial for rapid identification of fire outbreaks and can also be used for environmental inspection, biodiversity monitoring, telecommunications, telemetry and other purposes. Current methodologies for allocating fire detection towers over large areas are numerous, complex and not standardized by government supervisory agencies. This study therefore proposes and evaluates different methodologies for optimally locating fire detection towers, considering topography, risk areas, conservation units and heat spots. Geographic Information Systems (GIS) techniques and unaligned stratified systematic sampling were used to implement and evaluate 9 methods for allocating fire detection towers. Among the methods evaluated, method C3 was chosen, represented by 140 fire detection towers, with coverage of: a) 67% of the study area, b) 73.97% of the high-risk areas, c) 70.41% of the very-high-risk areas, d) 70.42% of the conservation units and e) 84.95% of the heat spots in 2014. The proposed methodology can be adapted to areas in other countries. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Clarifying the Hubble constant tension with a Bayesian hierarchical model of the local distance ladder

    Science.gov (United States)

    Feeney, Stephen M.; Mortlock, Daniel J.; Dalmasso, Niccolò

    2018-05-01

    Estimates of the Hubble constant, H0, from the local distance ladder and from the cosmic microwave background (CMB) are discrepant at the ~3σ level, indicating a potential issue with the standard Λ cold dark matter (ΛCDM) cosmology. A probabilistic (i.e. Bayesian) interpretation of this tension requires a model comparison calculation, which in turn depends strongly on the tails of the H0 likelihoods. Evaluating the tails of the local H0 likelihood requires the use of non-Gaussian distributions to faithfully represent anchor likelihoods and outliers, and simultaneous fitting of the complete distance-ladder data set to ensure correct uncertainty propagation. We have hence developed a Bayesian hierarchical model of the full distance ladder that does not rely on Gaussian distributions and allows outliers to be modelled without arbitrary data cuts. Marginalizing over the full ~3000-parameter joint posterior distribution, we find H0 = (72.72 ± 1.67) km s⁻¹ Mpc⁻¹ when applied to the outlier-cleaned Riess et al. data, and (73.15 ± 1.78) km s⁻¹ Mpc⁻¹ with supernova outliers reintroduced (the pre-cut Cepheid data set is not available). Using our precise evaluation of the tails of the H0 likelihood, we apply Bayesian model comparison to assess the evidence for deviation from ΛCDM given the distance-ladder and CMB data. The odds against ΛCDM are at worst ~10:1 when considering the Planck 2015 XIII data, regardless of outlier treatment, considerably less dramatic than naïvely implied by the 2.8σ discrepancy. These odds become ~60:1 when an approximation to the more-discrepant Planck Intermediate XLVI likelihood is included.
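
    The hierarchical model itself is far too large to reproduce here, but the toy sketch below illustrates the paper's central point about likelihood tails: the same ~3σ gap carries very different weight under Gaussian and heavy-tailed assumptions. The H0 values, uncertainties and degrees of freedom are illustrative stand-ins, not the paper's data.

```python
# Toy demonstration of tail dependence (not the paper's model or numbers).
import numpy as np
from scipy import stats

h0_local, sig_local = 73.2, 1.7   # local distance ladder (illustrative)
h0_cmb, sig_cmb = 67.4, 0.5      # CMB under LCDM (illustrative)

delta = h0_local - h0_cmb
sigma = np.hypot(sig_local, sig_cmb)   # combined uncertainty
n_sigma = delta / sigma

# Two-sided tail probability of a gap at least this large
p_gauss = 2 * stats.norm.sf(n_sigma)       # Gaussian tails
p_heavy = 2 * stats.t.sf(n_sigma, df=4)    # heavier Student-t tails
print(f"gap = {n_sigma:.1f} sigma")
print(f"P(Gaussian) = {p_gauss:.2e}, P(t, nu=4) = {p_heavy:.2e}")
```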

  13. Making of a solar spectral irradiance dataset I: observations, uncertainties, and methods

    Directory of Open Access Journals (Sweden)

    Schöll Micha

    2016-01-01

    Full Text Available Context. Changes in the spectral solar irradiance (SSI) are a key driver of the variability of the Earth's environment, strongly affecting the upper atmosphere but also impacting climate. However, measurements of the SSI have been sparse and of varying quality. The "First European Comprehensive Solar Irradiance Data Exploitation project" (SOLID) aims at merging the complete set of European irradiance data, complemented by archive data that include data from non-European missions. Aims. As part of SOLID, we present all available space-based SSI measurements, reference spectra, and relevant proxies in a unified format, with regular temporal re-gridding, interpolation, gap-filling, and associated uncertainty estimates. Methods. We apply a coherent methodology to all available SSI datasets. Our pipeline approach consists of pre-processing the data, interpolating missing data by exploiting the spectral coherency of SSI, temporally re-gridding the data, running an instrumental outlier detection routine, and performing a proxy-based interpolation for missing and flagged values. In particular, to detect instrumental outliers, we combine an autoregressive model with proxy data. We independently estimate the precision and stability of each individual dataset and flag all changes due to processing in an accompanying quality mask. Results. We present a unified database of solar activity records with accompanying meta-data and uncertainties. Conclusions. This dataset can be used for further investigations of the long-term trend of solar activity and the construction of a homogeneous SSI record.
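
    A minimal sketch of the flagging idea (an autoregressive model combined with proxy data) on synthetic data is shown below; the AR order, proxy regression and threshold are illustrative assumptions, not the SOLID pipeline's actual settings.

```python
# Hedged sketch: remove the proxy-predicted part of the record, model the
# residuals as AR(1), and flag implausibly large innovations. Synthetic data.
import numpy as np

rng = np.random.default_rng(2)
n = 500
proxy = np.sin(np.linspace(0, 20, n))            # stand-in activity proxy
ssi = 0.8 * proxy + 0.1 * rng.normal(size=n)
ssi[[50, 300]] += 2.0                            # injected instrumental spikes

beta = np.polyfit(proxy, ssi, 1)                 # proxy regression
resid = ssi - np.polyval(beta, proxy)
phi = np.corrcoef(resid[:-1], resid[1:])[0, 1]   # crude AR(1) coefficient
innov = resid[1:] - phi * resid[:-1]             # one-step innovations

# Robust threshold on the innovations via the median absolute deviation
mad = np.median(np.abs(innov - np.median(innov)))
flags = np.where(np.abs(innov) > 5 * 1.4826 * mad)[0] + 1
print("flagged indices:", flags)                 # expect hits near 50 and 300
```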

  14. Sensor Fusion of Position- and Micro-Sensors (MEMS) integrated in a Wireless Sensor Network for movement detection in landslide areas

    Science.gov (United States)

    Arnhardt, Christian; Fernández-Steeger, Tomas; Azzam, Rafig

    2010-05-01

    Monitoring systems in landslide areas are important elements of effective early warning structures. Data acquisition and retrieval allow the detection of movement processes and are thus essential for generating warnings in time. Apart from precise measurement, the reliability of the data is fundamental, because outliers can trigger false alarms and lead to a loss of acceptance of such systems. For monitoring mass movements and their risk, it is important to know whether there is movement, how fast it is, and how trustworthy the information is. The joint project "Sensor-based landslide early warning system" (SLEWS) deals with these questions and tries to improve data quality and reduce false alarm rates through the combination of sensor data (sensor fusion). The project concentrates on the development of a prototype alarm and early warning system (EWS) for different types of landslides, using various low-cost sensors integrated in a wireless sensor network (WSN). The network consists of numerous connection points (nodes) that transfer data directly, or over other nodes (multi-hop), in real time to a data collection point (gateway). From there, all data packages are transmitted to a spatial data infrastructure (SDI) for further processing, analysis and visualization according to end-user specifications. The ad-hoc character of the network allows the autonomous crosslinking of the nodes according to existing connections and communication strength. Because the network independently finds new or more stable connections (self-healing), a breakdown of the whole system is avoided. The bidirectional data stream enables data to be received from the network but also allows commands and targeted requests to be transferred into the WSN. For the detection of surface deformations in landslide areas, small low-cost Micro-Electro-Mechanical Systems (MEMS) and position sensors from the automotive industry, different industrial applications and from other measurement
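
    As a loose sketch of the sensor fusion idea, assuming hypothetical MEMS tilt and displacement channels, the code below suppresses isolated outliers with a median filter and raises an alarm only when both independent sensors agree; all thresholds, window sizes and channel names are invented.

```python
# Hedged fusion sketch: a single-sensor spike should not trigger an alarm
# unless an independent sensor confirms the movement. Synthetic data.
import numpy as np

def robust_exceeds(signal, threshold, window=5):
    """Median-filter a signal so isolated outliers cannot trip the alarm."""
    pad = window // 2
    padded = np.pad(signal, pad, mode="edge")
    smoothed = np.array([np.median(padded[i:i + window])
                         for i in range(len(signal))])
    return smoothed > threshold

rng = np.random.default_rng(3)
tilt = np.abs(rng.normal(0, 0.1, 200))      # MEMS tilt channel (deg)
tilt[120] = 5.0                             # one isolated outlier spike
disp = np.abs(rng.normal(0, 1.0, 200))      # position-sensor displacement (mm)

alarm = robust_exceeds(tilt, 0.5) & robust_exceeds(disp, 3.0)
print("alarm raised:", alarm.any())         # False: the lone spike is rejected
```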

  15. Automated chromatographic system with polarimetric detection laser applied in the control of fermentation processes and seaweed extracts characterization

    International Nuclear Information System (INIS)

    Fajer, V.; Naranjo, S.; Mora, W.; Patinno, R.; Coba, E.; Michelena, G.

    2012-01-01

    This work presents applications and innovations of chromatographic and polarimetric systems, in which methodologies were developed for measuring the input molasses and the resulting product of an alcohol fermentation process from a rich honey, and for evaluating the fermentation process of honey in obtaining a drink native to the Yucatán region. The composition of optically active substances in seaweed, of interest to the pharmaceutical industry, was also assessed. The findings provide alternative measurements for raw materials and products of the sugar, beekeeping and pharmaceutical industries. Liquid chromatography with automated polarimetric detection reduces measurement times to as little as 15 min, making it comparable in speed to high-resolution chromatography and significantly reducing operating costs. The chromatography system with polarimetric detection (SCDP) now includes new standard-size columns designed by the authors, which make it possible to process samples with volumes up to 1 ml and to reduce the measurement time to 15 min, decreasing the sample volume five-fold and halving the measurement time. The determination of substance concentrations from the chromatogram peaks obtained with the different columns was evaluated, and the measurement uncertainty was calculated. The results concerning the improvement of a data acquisition program (ADQUIPOL v.2.0) and new programs for the preparation of chromatograms (CROMAPOL v.1.0 and v.1.2) provide important benefits, allowing considerable time savings in the processing of results, and can be applied to other chromatography systems with appropriate adjustments. (Author)
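
    Purely as an illustrative sketch (the SCDP and CROMAPOL software are not public here), the following shows how a concentration might be estimated from a chromatogram peak via trapezoidal integration and a linear calibration with a propagated uncertainty; the calibration slope, noise level and peak shape are all invented.

```python
# Hypothetical peak-area quantification on a synthetic chromatogram.
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 15, 900)                               # retention time, min
signal = 4.0 * np.exp(-0.5 * ((t - 7.0) / 0.3) ** 2) \
         + rng.normal(0, 0.02, t.size)                    # peak plus noise

# Trapezoidal integration of the peak area
area = np.sum((signal[1:] + signal[:-1]) / 2 * np.diff(t))

k, sigma_k = 0.52, 0.01        # hypothetical calibration slope and its error
conc = k * area
sigma_c = conc * np.hypot(sigma_k / k, 0.02)   # assume ~2% relative area error
print(f"concentration = {conc:.3f} +/- {sigma_c:.3f} (arbitrary units)")
```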

  16. The design method and research status of vehicle detection system based on geomagnetic detection principle

    Science.gov (United States)

    Lin, Y. H.; Bai, R.; Qian, Z. H.

    2018-03-01

    Vehicle detection systems are applied to obtain real-time information about vehicles, realize traffic control and reduce traffic pressure. This paper reviews geomagnetic sensors as well as the research status of vehicle detection systems. Also presented is our own work on a vehicle detection system, including detection algorithms and experimental results. It is found that the GMR-based vehicle detection system achieves a detection accuracy of up to 98%, with high potential for application in road traffic control.
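
    The paper's own detection algorithm is not reproduced in this record, but a minimal sketch of the kind of adaptive-threshold logic commonly used with geomagnetic sensors is given below; all thresholds, units and the drift-tracking scheme are assumptions.

```python
# Hedged sketch: flag vehicle passages when the magnetic field deviates
# from a slowly adapting baseline; hysteresis avoids double counting.
import numpy as np

def detect_vehicles(field, base_alpha=0.01, on_thr=3.0, off_thr=1.5):
    baseline = field[0]
    state, events = False, 0
    for sample in field:
        deviation = abs(sample - baseline)
        if not state and deviation > on_thr:
            state, events = True, events + 1     # vehicle enters the zone
        elif state and deviation < off_thr:
            state = False                        # vehicle has left
        if not state:                            # track slow drift only
            baseline += base_alpha * (sample - baseline)
    return events

rng = np.random.default_rng(5)
signal = 50 + rng.normal(0, 0.2, 1000)           # Earth field + noise (uT)
signal[300:340] += 8                             # first vehicle signature
signal[700:760] += 6                             # second vehicle signature
print("vehicles detected:", detect_vehicles(signal))   # expect 2
```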

  17. Accuracy of Travel Time Estimation using Bluetooth Technology

    DEFF Research Database (Denmark)

    Araghi, Bahar Namaki; Skoven Pedersen, Kristian; Tørholm Christensen, Lars

    2012-01-01

    Short-term travel time information plays a critical role in Advanced Traffic Information Systems (ATIS) and Advanced Traffic Management Systems (ATMS). In this context, the need for accurate and reliable travel time information sources is becoming increasingly important. Bluetooth Technology (BT) has been used as a relatively new cost-effective source of travel time estimation. However, due to the low sampling rate of BT compared to other sensor technologies, the existence of outliers may significantly affect the accuracy and reliability of the travel time estimates obtained using BT. In this study, the concept of outliers and their impacts on travel time accuracy are discussed. Four different estimators, named Min-BT, Max-BT, Med-BT and Avg-BT, with different outlier detection logic are presented in this paper. These methods are used to estimate travel times using a BT-derived dataset. In order...
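
    A small sketch of the four estimators is given below, under the assumption that each device yields a (first, last) detection timestamp pair at the upstream and downstream stations; the exact definitions used in the paper may differ.

```python
# Hedged sketch of Min/Max/Med/Avg-style BT travel time estimators.
import numpy as np

def travel_times(t_up, t_down):
    """t_up/t_down: arrays of (first, last) detection times per device, in s."""
    return {
        "Min-BT": t_down[:, 0] - t_up[:, 1],   # last upstream -> first downstream
        "Max-BT": t_down[:, 1] - t_up[:, 0],   # first upstream -> last downstream
        "Med-BT": np.median(t_down, axis=1) - np.median(t_up, axis=1),
        "Avg-BT": t_down.mean(axis=1) - t_up.mean(axis=1),
    }

rng = np.random.default_rng(6)
n = 50
t_first_up = rng.uniform(0, 60, n)
t_up = np.column_stack([t_first_up, t_first_up + rng.uniform(0, 5, n)])
t_first_dn = t_up[:, 1] + rng.normal(90, 10, n)   # ~90 s link travel time
t_down = np.column_stack([t_first_dn, t_first_dn + rng.uniform(0, 5, n)])

for name, tt in travel_times(t_up, t_down).items():
    print(f"{name}: mean {tt.mean():.1f} s")
```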

  18. Registration for Optical Multimodal Remote Sensing Images Based on FAST Detection, Window Selection, and Histogram Specification

    Directory of Open Access Journals (Sweden)

    Xiaoyang Zhao

    2018-04-01

    Full Text Available In recent years, digital frame cameras have been increasingly used for remote sensing applications. However, it is always a challenge to align or register images captured with different cameras or different imaging sensor units. In this research, a novel registration method was proposed. Coarse registration was first applied to approximately align the sensed and reference images. Window selection was then used to reduce the search space, and histogram specification was applied to optimize the grayscale similarity between the images. After comparisons with other commonly-used detectors, the fast corner detector FAST (Features from Accelerated Segment Test) was selected to extract the feature points. The matching point pairs were then detected between the images, the outliers were eliminated, and geometric transformation was performed. The appropriate window size was searched for and set to one-tenth of the image width. Images acquired by a two-camera system, a camera with five imaging sensors, and a camera with replaceable filters, mounted on a manned aircraft, an unmanned aerial vehicle, and a ground-based platform, respectively, were used to evaluate the performance of the proposed method. The image analysis results showed that, through the appropriate window selection and histogram specification, the number of correctly matched point pairs increased by a factor of 11.30, and the correct matching rate increased by 36%, compared with the results based on FAST alone. The root mean square error (RMSE) in the x and y directions was generally within 0.5 pixels. In comparison with binary robust invariant scalable keypoints (BRISK), curvature scale space (CSS), Harris, speeded-up robust features (SURF), and the commercial software ERDAS and ENVI, this method resulted in larger numbers of correct matching pairs and smaller, more consistent RMSE. Furthermore, it was not necessary to choose any tie control points manually before registration.
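
    A rough sketch of the pipeline's main stages (histogram specification, FAST keypoints, outlier elimination by RANSAC, geometric transformation) using OpenCV and scikit-image is shown below; window selection is omitted, the ORB descriptor stage is an assumption since the record does not name a descriptor, and the file names are placeholders.

```python
# Hedged registration sketch; "sensed.png" and "reference.png" are placeholders.
import cv2
import numpy as np
from skimage.exposure import match_histograms

sensed = cv2.imread("sensed.png", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)

# Histogram specification: match the sensed image's grayscale to the reference
sensed = match_histograms(sensed, reference).astype(np.uint8)

fast = cv2.FastFeatureDetector_create(threshold=25)   # FAST corner detector
orb = cv2.ORB_create()                                # assumed descriptor stage
kp1 = fast.detect(sensed, None)
kp2 = fast.detect(reference, None)
kp1, des1 = orb.compute(sensed, kp1)
kp2, des2 = orb.compute(reference, kp2)

matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC eliminates outlier pairs before the geometric transformation
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
print(f"{int(inliers.sum())} correct matches of {len(matches)}")
```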

  19. Near-real time 3D probabilistic earthquakes locations at Mt. Etna volcano

    Science.gov (United States)

    Barberi, G.; D'Agostino, M.; Mostaccio, A.; Patane', D.; Tuve', T.

    2012-04-01

    An automatic procedure for locating earthquakes in quasi-real time must provide a good estimate of the earthquake location within a few seconds after the event is first detected, and is strongly needed for seismic warning systems. The reliability of an automatic location algorithm is influenced by several factors such as errors in picking seismic phases, network geometry, and velocity model uncertainties. On Mt. Etna, the seismic network is managed by INGV and the quasi-real-time earthquake locations are performed using an automatic picking algorithm based on short-term-average to long-term-average ratios (STA/LTA), calculated from an approximate squared envelope function of the seismogram, which furnishes a list of P-wave arrival times, and the location algorithm Hypoellipse with a 1D velocity model. The main purpose of this work is to investigate the performance of a different automatic procedure to improve the quasi-real-time earthquake locations. Since the automatic data processing may be affected by outliers (wrong picks), traditional earthquake location techniques based on a least-squares misfit function (L2-norm) often yield unstable and unreliable solutions. Moreover, on Mt. Etna, the 1D model is often unable to represent the complex structure of the volcano (in particular the strong lateral heterogeneities), whereas the increasing accuracy of 3D velocity models at Mt. Etna in recent years allows their use today in routine earthquake locations. Therefore, we selected as reference locations all the events that occurred on Mt. Etna in the last year (2011) that were automatically detected and located by means of the Hypoellipse code. Using this dataset (more than 300 events), we applied a nonlinear probabilistic earthquake location algorithm using the Equal Differential Time (EDT) likelihood function (Font et al., 2004; Lomax, 2005), which is much more robust in the presence of outliers in the data. Successively, by using a probabilistic
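
    A toy sketch of an EDT-style grid search, following the pair-wise formulation of Lomax (2005), is given below; it uses a uniform velocity model and synthetic picks rather than the 3D Mt. Etna model, and illustrates why one bad pick degrades only the station pairs it participates in.

```python
# Hedged EDT sketch: stack pair-wise differential-time agreement on a 2D grid.
import numpy as np
from itertools import combinations

v = 3.5                                          # km/s, assumed P velocity
stations = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, -5.0]])
true_src = np.array([4.0, 6.0])

tt = np.linalg.norm(stations - true_src, axis=1) / v
picks = tt + np.random.default_rng(7).normal(0, 0.05, len(tt))
picks[4] += 2.0                                  # one badly wrong pick (outlier)

def edt_score(x, sigma=0.1):
    pred = np.linalg.norm(stations - x, axis=1) / v
    score = 0.0
    for a, b in combinations(range(len(stations)), 2):
        diff = (picks[a] - picks[b]) - (pred[a] - pred[b])
        score += np.exp(-diff**2 / (2 * sigma**2))
    return score    # an outlier pick only weakens its own pairs

xs = ys = np.linspace(-2, 12, 141)
grid = np.array([[edt_score(np.array([x, y])) for x in xs] for y in ys])
iy, ix = np.unravel_index(grid.argmax(), grid.shape)
print(f"EDT location: ({xs[ix]:.1f}, {ys[iy]:.1f}); true (4.0, 6.0)")
```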

  20. Landscape genomics and biased FST approaches reveal single nucleotide polymorphisms under selection in goat breeds of North-East Mediterranean

    Directory of Open Access Journals (Sweden)

    Joost Stephane

    2009-02-01

    Full Text Available Abstract Background In this study we compare outlier loci detected using an FST-based method with those identified by a recently described method based on spatial analysis (SAM). We tested a panel of single nucleotide polymorphisms (SNPs) previously genotyped in individuals of goat breeds from the southern areas of the Mediterranean basin (Italy, Greece and Albania). We evaluate how the SAM method performs with SNPs, which are increasingly employed due to their high number, low cost and ease of scoring. Results The combined use of the two outlier detection approaches, never before tested with SNP polymorphisms, resulted in both methods identifying the same three loci involved in milk and meat quality, while the FST-based method identified 3 more loci as being under a selective sweep in the breeds examined. Conclusion The two methods give congruent results for FST values exceeding the 99% confidence limits. The FST and SAM methods can independently detect signatures of selection and can therefore reduce the probability of finding false positives if employed together. The outlier loci identified in this study could indicate adaptive variation in the analysed species, which is characterized by a large range of climatic conditions in the rearing areas and by a history of intense trade that implies plasticity in adapting to new environments.
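
    As a hedged illustration, the sketch below runs a toy FST scan: per-SNP FST computed from heterozygosities across three simulated populations, with loci above an empirical 99% limit flagged as outlier candidates. Methods such as FDIST-style simulation or the SAM approach build proper confidence envelopes; everything here is synthetic.

```python
# Toy FST outlier scan on simulated allele frequencies.
import numpy as np

rng = np.random.default_rng(8)
n_snps, n_pops = 1000, 3
p = rng.uniform(0.1, 0.9, n_snps)                       # ancestral frequencies
pops = rng.normal(p, 0.03, size=(n_pops, n_snps)).clip(0.01, 0.99)
pops[0, :5] = (p[:5] + 0.4).clip(0.01, 0.99)            # 5 loci under "selection"

p_bar = pops.mean(axis=0)
h_t = 2 * p_bar * (1 - p_bar)                           # total expected het.
h_s = (2 * pops * (1 - pops)).mean(axis=0)              # mean within-pop het.
fst = (h_t - h_s) / h_t                                 # per-SNP FST

limit = np.quantile(fst, 0.99)                          # empirical 99% limit
print("outlier loci:", np.where(fst > limit)[0][:10])   # expect the first 5
```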