Equivalent statistics and data interpretation.
Francis, Gregory
2017-08-01
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
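For the two-sample case described above, Cohen's d and the t statistic differ only by a factor that depends on n1 and n2. The Python sketch below assumes the pooled-variance (equal-variance) t test and uses made-up data. It only shows the t ↔ d step. Under the same assumptions, the two-sided p value is in turn a fixed function of |t| with n1 + n2 - 2 degrees of freedom.

```python
import math
from statistics import mean, variance

def pooled_sd(x, y):
    """Pooled standard deviation used by the equal-variance two-sample t test."""
    n1, n2 = len(x), len(y)
    # statistics.variance uses the n - 1 denominator (sample variance)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return math.sqrt(sp2)

def t_statistic(x, y):
    n1, n2 = len(x), len(y)
    return (mean(x) - mean(y)) / (pooled_sd(x, y) * math.sqrt(1 / n1 + 1 / n2))

def cohens_d(x, y):
    return (mean(x) - mean(y)) / pooled_sd(x, y)

def d_from_t(t, n1, n2):
    # d and t differ only by a factor that depends on the sample sizes
    return t * math.sqrt(1 / n1 + 1 / n2)

def t_from_d(d, n1, n2):
    return d / math.sqrt(1 / n1 + 1 / n2)

# Hypothetical data for illustration
x = [5.1, 4.8, 6.0, 5.5, 5.9]
y = [4.2, 4.9, 4.4, 5.0, 4.1, 4.6]
t = t_statistic(x, y)
print(cohens_d(x, y), d_from_t(t, len(x), len(y)))  # the two values agree
```

Because the mapping is invertible for fixed n1 and n2, reporting either number conveys the same information about the data.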
Statistical interpretation of geochemical data
Carambula, M.
1990-01-01
Statistical results were obtained from a geochemical study covering four aerial photographs: Zapican, Carape, Las Canias and Alferez. In total, 3020 samples were analyzed for 22 chemical elements using plasma emission spectrometry.
Statistics and Data Interpretation for Social Work
Rosenthal, James
2011-01-01
"Without question, this text will be the most authoritative source of information on statistics in the human services. From my point of view, it is a definitive work that combines a rigorous pedagogy with a down to earth (commonsense) exploration of the complex and difficult issues in data analysis (statistics) and interpretation. I welcome its publication.". -Praise for the First Edition. Written by a social worker for social work students, this is a nuts and bolts guide to statistics that presents complex calculations and concepts in clear, easy-to-understand language. It includes
Statistical transformation and the interpretation of inpatient glucose control data.
Saulnier, George E; Castro, Janna C; Cook, Curtiss B
2014-03-01
To introduce a statistical method of assessing hospital-based non-intensive care unit (non-ICU) inpatient glucose control. Point-of-care blood glucose (POC-BG) data from hospital non-ICUs were extracted for January 1 through December 31, 2011. Glucose data distribution was examined before and after Box-Cox transformations and compared to normality. Different subsets of data were used to establish upper and lower control limits, and exponentially weighted moving average (EWMA) control charts were constructed from June, July, and October data as examples to determine if out-of-control events were identified differently in nontransformed versus transformed data. A total of 36,381 POC-BG values were analyzed. In all 3 monthly test samples, glucose distributions in nontransformed data were skewed but approached a normal distribution once transformed. Interpretation of out-of-control events from EWMA control chart analyses also revealed differences. In the June test data, an out-of-control process was identified at sample 53 with nontransformed data, whereas the transformed data remained in control for the duration of the observed period. Analysis of July data demonstrated an out-of-control process sooner in the transformed (sample 55) than nontransformed (sample 111) data, whereas for October, transformed data remained in control longer than nontransformed data. Statistical transformations increase the normal behavior of inpatient non-ICU glycemic data sets. The decision to transform glucose data could influence the interpretation and conclusions about the status of inpatient glycemic control. Further study is required to determine whether transformed versus nontransformed data influence clinical decisions or evaluation of interventions.
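A minimal Python sketch of the Box-Cox step described above is shown below. It assumes λ is chosen by maximizing the normal profile log-likelihood over a coarse grid from -2 to 2. That selection rule is a common convention, not necessarily the one used in the study. The glucose values are hypothetical.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam == 0.
    Only defined for strictly positive data such as glucose readings."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def box_cox_loglik(values, lam):
    """Profile log-likelihood of lam under a normal model for the transformed data."""
    z = box_cox(values, lam)
    n = len(z)
    m = sum(z) / n
    var = sum((v - m) ** 2 for v in z) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)

def best_lambda(values, grid=None):
    """Pick lam from a coarse grid (by default -2.0 to 2.0 in steps of 0.1)."""
    grid = grid or [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_loglik(values, lam))

def skewness(values):
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed glucose values (mg/dL)
glucose = [90, 100, 110, 120, 130, 150, 180, 220, 300, 400]
lam = best_lambda(glucose)
print(lam, skewness(glucose), skewness(box_cox(glucose, lam)))
```

Control limits computed on the transformed scale can then be compared with limits computed on the raw scale.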
Theoretical, analytical, and statistical interpretation of environmental data
Lombard, S.M.
1974-01-01
The reliability of data from radiochemical analyses of environmental samples cannot be determined from nuclear counting statistics alone. The rigorous application of the principles of propagation of errors, an understanding of the physics and chemistry of the species of interest in the environment, and the application of information from research on the analytical procedure are all necessary for a valid estimation of the errors associated with analytical results. The specific case of the determination of plutonium in soil is considered in terms of analytical problems and data reliability. (U.S.)
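The Python sketch below illustrates why counting statistics alone understate the error. It propagates uncertainty for a hypothetical activity concentration C = R / (efficiency × chemical yield × mass), where R is the net count rate. It assumes Poisson counting, independent error sources and first-order (linear) propagation. The parameter names and numbers are illustrative and are not taken from the plutonium procedure.

```python
import math

def net_rate(gross_counts, t_gross, bkg_counts, t_bkg):
    """Net count rate and its 1-sigma counting uncertainty, assuming Poisson counts."""
    rate = gross_counts / t_gross - bkg_counts / t_bkg
    sigma = math.sqrt(gross_counts / t_gross ** 2 + bkg_counts / t_bkg ** 2)
    return rate, sigma

def concentration(rate, s_rate, eff, s_eff, chem_yield, s_yield, mass, s_mass):
    """C = rate / (eff * chem_yield * mass) with first-order error propagation:
    for a product/quotient of independent terms, relative errors add in quadrature."""
    c = rate / (eff * chem_yield * mass)
    rel_counting = s_rate / rate
    rel_total = math.sqrt(rel_counting ** 2 + (s_eff / eff) ** 2
                          + (s_yield / chem_yield) ** 2 + (s_mass / mass) ** 2)
    return c, c * rel_total, rel_counting, rel_total

# Hypothetical measurement: 400 gross and 100 background counts, 1000 s each
rate, s_rate = net_rate(400, 1000.0, 100, 1000.0)
c, s_c, rel_counting, rel_total = concentration(rate, s_rate, 0.25, 0.01, 0.6, 0.06, 10.0, 0.05)
print(f"C = {c:.3f} +/- {s_c:.3f}; counting-only {rel_counting:.1%}, total {rel_total:.1%}")
```

In this made-up case, the chemical-yield term alone contributes more relative uncertainty than counting does.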
A statistical model for interpreting computerized dynamic posturography data
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable but is assigned the lowest possible score, zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
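The Python simulation below shows how a latent ES combined with conditional falls produces the mixed discrete-continuous response described above. It only generates data and is not the authors' fitted model or their quasi-likelihood estimator. The normal latent distribution, the logistic fall probability and all parameter values are hypothetical.

```python
import math
import random

def simulate_equilibrium_scores(n, mu=65.0, sd=15.0, threshold=40.0, scale=5.0, seed=0):
    """Simulate observed ESs from a latent-variable model (hypothetical parameters).

    latent ES ~ Normal(mu, sd); P(fall | latent) = 1 / (1 + exp((latent - threshold) / scale)),
    so falls become more likely as the latent ES decreases."""
    rng = random.Random(seed)
    observed, fell = [], []
    for _ in range(n):
        latent = rng.gauss(mu, sd)  # the latent ES always exists
        p_fall = 1.0 / (1.0 + math.exp((latent - threshold) / scale))
        if rng.random() < p_fall:
            observed.append(0.0)  # fall: latent ES unobserved, scored as zero
            fell.append(True)
        else:
            observed.append(min(max(latent, 0.0), 100.0))  # keep on the 0..100 scale
            fell.append(False)
    return observed, fell

scores, fell = simulate_equilibrium_scores(2000)
falls = sum(fell)
print(f"{falls} falls scored 0; mean observed ES = {sum(scores) / len(scores):.1f}")
```

The spike of exact zeros next to a continuous bulk is the feature that standard regression or ANOVA models handle poorly.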
Statistical Literacy: High School Students in Reading, Interpreting and Presenting Data
Hafiyusholeh, M.; Budayasa, K.; Siswono, T. Y. E.
2018-01-01
One of the foundations of statistics for high school students is the ability to read data, present data in the form of tables and diagrams, and interpret them. The purpose of this study is to describe high school students’ competencies in reading, interpreting and presenting data. The two subjects, a male and a female student, had high levels of mathematical ability. Data were collected through task-based assignments and analyzed by reducing, presenting and verifying the data. Results showed that the students read the data based on explicit information in the diagram, such as explaining points in the diagram as the relation between the x and y axes and determining simple trends in a graph, including its maximum and minimum points. In interpreting and summarizing the data, both subjects attended to general trends and used them to predict increases or decreases. The male student estimated the (n+1)th value of the weight data using the mode, while the female student estimated it using the mean. The male student tended not to consider the characteristics of the data, whereas the female student considered them more carefully.
Misuse of statistics in the interpretation of data on low-level radiation
International Nuclear Information System (INIS)
Hamilton, L.D.
1982-01-01
Four misuses of statistics in the interpretation of data on low-level radiation are reviewed: (1) post-hoc analysis and aggregation of data leading to faulty conclusions in the reanalysis of genetic effects of the atomic bomb, and premature conclusions on the Portsmouth Naval Shipyard data; (2) inappropriate adjustment for age and ignoring differences between urban and rural areas leading to a potentially spurious increase in the incidence of cancer at Rocky Flats; (3) the hazard of summary statistics based on ill-conditioned individual rates leading to a spurious association between childhood leukemia and fallout in Utah; and (4) the danger of prematurely published preliminary work with inadequate consideration of epidemiological problems (censored data), leading to inappropriate conclusions, needless alarm at the Portsmouth Naval Shipyard, and diversion of scarce research funds.
Saulnier, George E; Castro, Janna C; Cook, Curtiss B
2014-05-01
Glucose control can be problematic in critically ill patients. We evaluated the impact of statistical transformation on interpretation of intensive care unit inpatient glucose control data. Point-of-care blood glucose (POC-BG) data derived from patients in the intensive care unit for 2011 were obtained. Box-Cox transformation of POC-BG measurements was performed, and the distribution of data was determined before and after transformation. Different data subsets were used to establish statistical upper and lower control limits. Exponentially weighted moving average (EWMA) control charts constructed from April, October, and November data determined whether out-of-control events could be identified differently in transformed versus nontransformed data. A total of 8679 POC-BG values were analyzed. POC-BG distributions in nontransformed data were skewed but approached normality after transformation. EWMA control charts revealed differences in projected detection of out-of-control events. In April, an out-of-control process resulting in the lower control limit being exceeded was identified at sample 116 in nontransformed data but not in transformed data. October transformed data detected an out-of-control process exceeding the upper control limit at sample 27 that was not detected in nontransformed data. Nontransformed November results remained in control, but transformation identified an out-of-control event less than 10 samples into the observation period. Using statistical methods to assess population-based glucose control in the intensive care unit could alter conclusions about the effectiveness of care processes for managing hyperglycemia. Further study is required to determine whether transformed versus nontransformed data change clinical decisions about the interpretation of care or intervention results. © 2014 Diabetes Technology Society.
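An EWMA control chart of the kind described above can be sketched in a few lines of Python. This sketch assumes the textbook EWMA with time-varying limits, a smoothing weight λ = 0.2 and L = 3 sigma limits. Those are common defaults and not the settings reported in the study. The baseline subset sets the centre line μ0 and σ, and the readings are hypothetical.

```python
import math
from statistics import mean, stdev

def ewma_chart(samples, mu0, sigma, lam=0.2, L=3.0):
    """EWMA statistic z_t = lam * x_t + (1 - lam) * z_{t-1}, z_0 = mu0, with
    time-varying limits mu0 +/- L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam)**(2t)))."""
    z = mu0
    points = []
    for t, x in enumerate(samples, start=1):
        z = lam * x + (1 - lam) * z
        half_width = L * sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        lcl, ucl = mu0 - half_width, mu0 + half_width
        points.append((t, z, lcl, ucl, not (lcl <= z <= ucl)))
    return points

def first_signal(points):
    """Return the 1-based sample number of the first out-of-control point, or None."""
    return next((t for t, _z, _lcl, _ucl, out in points if out), None)

# Hypothetical: limits from a baseline subset, then applied to a test month
baseline = [140.0, 152.0, 138.0, 160.0, 145.0, 150.0, 155.0, 142.0]
test_month = [148.0, 151.0, 170.0, 182.0, 175.0, 190.0, 185.0]
points = ewma_chart(test_month, mean(baseline), stdev(baseline))
print(first_signal(points))
```

Running the same chart on Box-Cox-transformed baseline and test data, with limits recomputed on that scale, is how the two analyses can come to flag different sample numbers.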
Elżbieta Biernat
2014-12-01
Background: The aim of this paper is to assess whether basic descriptive statistics are sufficient to interpret data on the physical activity of Poles in the occupational domain of life. Material and Methods: The study group consisted of 964 randomly selected Polish working professionals. The long version of the International Physical Activity Questionnaire (IPAQ) was used. Descriptive statistics included the mean (M), median (Me), maximal and minimal values (max–min), standard deviation (SD) and percentile values. Statistical inference was based on comparisons of variables at a significance level of 0.05 (Kruskal-Wallis and Pearson’s χ² tests). Results: Occupational physical activity (OPA) was declared by 46.4% of respondents (vigorous: 23.5%, moderate: 30.2%, walking: 39.5%). Total OPA amounted to 2751.1 MET-min/week (MET: Metabolic Equivalent of Task), with a very high standard deviation (SD = 5302.8) and max = 35 511 MET-min/week. This applied to all types of activity. Approximately 10% of respondents (above the 90th percentile) inflated the average. However, there was no significant difference depending on the character of the profession or the type of activity. The average sitting time was 256 min/day. As many as 39% of respondents met the World Health Organization standards through OPA alone (42.5% of white-collar workers, 38% of administrative and technical employees and only 37.9% of physical workers). Conclusions: In the data analysis it is necessary to define quantiles to provide a fuller picture of the distributions of OPA in MET-min/week. It is also crucial to update the guidelines for data processing and analysis of the long version of the IPAQ. It seems that 16 h of activity/day is not a sufficient criterion for excluding results from further analysis. Med Pr 2014;65(6):743–753
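The Python sketch below shows how MET-min/week scores are built and why quantiles describe them better than the mean. It assumes the MET weights usually applied in IPAQ scoring: 3.3 for walking, 4.0 for moderate and 8.0 for vigorous activity. It also assumes that domain totals are MET × days/week × minutes/day. The respondents are hypothetical.

```python
from statistics import mean, median, quantiles

# Assumed MET weights (the usual IPAQ scoring convention for walking,
# moderate and vigorous activity); check against the protocol actually used.
MET = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}

def occupational_met_min_week(days, minutes_per_day):
    """MET-min/week = sum over intensities of MET x days/week x minutes/day."""
    return sum(MET[k] * days.get(k, 0) * minutes_per_day.get(k, 0) for k in MET)

def summarize(values):
    """Report quantiles alongside the mean, since OPA is strongly right-skewed."""
    deciles = quantiles(values, n=10)
    return {"mean": mean(values), "median": median(values), "p90": deciles[-1],
            "min": min(values), "max": max(values)}

one = occupational_met_min_week({"walking": 5, "moderate": 3}, {"walking": 30, "moderate": 60})
sample = [0, 0, 0, 120, 300, 600, 900, one, 2400, 20000]  # hypothetical respondents
print(one, summarize(sample))
```

A single very active respondent pulls the mean far above the median, which is the pattern the abstract reports (M = 2751.1 with SD = 5302.8).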
Statistics Translated: A Step-by-Step Guide to Analyzing and Interpreting Data
Terrell, Steven R
2012-01-01
Written in a humorous and encouraging style, this text shows how the most common statistical tools can be used to answer interesting real-world questions, presented as mysteries to be solved. Engaging research examples lead the reader through a series of six steps, from identifying a researchable problem to stating a hypothesis, identifying independent and dependent variables, and selecting and interpreting appropriate statistical tests. All techniques are demonstrated both manually and with the help of SPSS software. The book provides students and others who may need to read and interpret sta
Application of Statistical Tools for Data Analysis and Interpretation in Rice Plant Pathology
Parsuram Nayak
2018-01-01
There has been significant advancement in the application of statistical tools in plant pathology during the past four decades. These tools include multivariate analysis of disease dynamics involving principal component analysis, cluster analysis, factor analysis, pattern analysis, discriminant analysis, multivariate analysis of variance, correspondence analysis, canonical correlation analysis, redundancy analysis, genetic diversity analysis, and stability analysis, which involves joint regression, additive main effects and multiplicative interactions, and genotype-by-environment interaction biplot analysis. Advanced statistical tools, such as non-parametric analysis of disease association, meta-analysis, Bayesian analysis, and decision theory, have an important place in the analysis of disease dynamics. Disease forecasting by simulation models of plant diseases has great potential in practical disease control strategies. Common mathematical tools such as monomolecular, exponential, logistic, Gompertz and linked differential equations have an important place in growth curve analysis of disease epidemics. Box-and-whisker plots have been suggested as a highly informative means of displaying a range of numerical data. Possible applications of recent advanced tools, namely linear and non-linear mixed models such as the linear mixed model, the generalized linear model, and generalized linear mixed models, are presented. The most recent technologies, such as microarray analysis, though cost effective, provide estimates of gene expression for thousands of genes simultaneously and need attention from molecular biologists. Some of these advanced tools can be applied in different branches of rice research, including crop improvement, crop production, crop protection, social sciences and agricultural engineering. Rice research scientists should take advantage of these new opportunities adequately in
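The growth-curve models named above can be written down directly. The Python sketch below uses the standard forms for a disease proportion y(t) with initial value y0 and rate r. It also estimates the logistic rate from two assessments via the logit transform. The parameter values are hypothetical, and the sketch makes no claim about how the review fits these models.

```python
import math

# Disease progress models for disease proportion y(t), 0 < y0 < 1, rate r > 0
def exponential(t, y0, r):
    return y0 * math.exp(r * t)  # unbounded, so only plausible early in an epidemic

def monomolecular(t, y0, r):
    return 1 - (1 - y0) * math.exp(-r * t)

def logistic(t, y0, r):
    return 1 / (1 + (1 - y0) / y0 * math.exp(-r * t))

def gompertz(t, y0, r):
    return math.exp(math.log(y0) * math.exp(-r * t))

def logit(y):
    return math.log(y / (1 - y))

def logistic_rate(y1, t1, y2, t2):
    """Apparent infection rate: slope of logit(y) between two assessment times."""
    return (logit(y2) - logit(y1)) / (t2 - t1)

for model in (exponential, monomolecular, logistic, gompertz):
    print(model.__name__, [round(model(t, 0.01, 0.3), 3) for t in (0, 10, 20, 30)])

r_hat = logistic_rate(logistic(10, 0.01, 0.3), 10, logistic(20, 0.01, 0.3), 20)
print(r_hat)  # recovers the rate used to generate the curve
```

Because logit(y) is linear in t for the logistic model, a straight-line fit on the transformed scale is the usual way to compare epidemics.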
Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall
2016-01-01
Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to the analysis of medical time series data: (1) the classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. Our analysis also discusses several other aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach (1) is much more flexible in terms of modeling effort, and (2) offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
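The Kaplan-Meier estimator mentioned above, one of the classical tools in the comparison, reduces to a running product over event times. Below is a minimal Python sketch. It assumes right-censored data and the usual convention that observations censored at an event time are still at risk at that time. The follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) = product over event times t_i <= t of (1 - d_i / n_i).

    times: follow-up times; events: 1 if the event occurred, 0 if censored.
    Observations censored at an event time are counted as still at risk then."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Hypothetical follow-up (years) for six screened patients
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
print(curve)
```

The estimate describes the group as a whole. That limitation is why the abstract describes individualized risk assessment as more cumbersome with classical approaches than with a dynamic Bayesian network.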
The statistical interpretations of counting data from measurements of low-level radioactivity
International Nuclear Information System (INIS)
Donn, J.J.; Wolke, R.L.
1977-01-01
The statistical model appropriate to measurements of low-level or background-dominant radioactivity is examined and the derived relationships are applied to two practical problems involving hypothesis testing: 'Does the sample exhibit a net activity above background' and 'Is the activity of the sample below some preselected limit'. In each of these cases, the appropriate decision rule is formulated, procedures are developed for estimating the preset count which is necessary to achieve a desired probability of detection, and a specific sequence of operations is provided for the worker in the field. (author)
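The first question above ("does the sample exhibit a net activity above background") can be sketched with a normal approximation to Poisson counts. The Python sketch below swaps in Currie's widely used critical-level and detection-limit approximations for paired gross and background counts taken over equal counting times. It is not necessarily the decision rule derived in this paper, and α = β = 0.05 is an assumed default.

```python
import math
from statistics import NormalDist

def critical_level(background_counts, alpha=0.05):
    """Net-count threshold L_C = z_(1-alpha) * sqrt(2B) for equal counting times."""
    z = NormalDist().inv_cdf(1 - alpha)
    return z * math.sqrt(2 * background_counts)

def detection_limit(background_counts, alpha=0.05):
    """L_D = z^2 + 2 L_C (Currie's approximation, taking beta = alpha)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return z ** 2 + 2 * critical_level(background_counts, alpha)

def net_activity_detected(gross_counts, background_counts, alpha=0.05):
    """Decide 'net activity above background' if the net count exceeds L_C."""
    return gross_counts - background_counts > critical_level(background_counts, alpha)

print(critical_level(100), detection_limit(100), net_activity_detected(130, 100))
```

The detection limit indicates how large a net count must be before it will exceed L_C with probability 1 - β. This relationship is what links the decision rule to choosing a preset count in advance.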
Podorozhnyi, D.M.; Postnikov, E.B.; Sveshnikova, L.G.; Turundaevsky, A.N.
2005-01-01
A multivariate statistical procedure for solving problems of estimating physical parameters on the basis of data from measurements with multichannel equipment is described. Within the multivariate procedure, an algorithm is constructed for estimating the energy of primary cosmic rays and the exponent in their power-law spectrum. They are investigated by using the KLEM spectrometer (NUCLEON project) as a specific example of measuring equipment. The results of computer experiments simulating the operation of the multivariate procedure for this equipment are given, the proposed approach being compared in these experiments with the one-parameter approach presently used in data processing.
The advent of new higher-throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets comprise tens or hundreds of analytes, multiple repeat measures
Data Interpretation: Using Probability
Drummond, Gordon B.; Vowler, Sarah L.
2011-01-01
Experimental data are analysed statistically to allow researchers to draw conclusions from a limited set of measurements. The hard fact is that researchers can never be certain that measurements from a sample will exactly reflect the properties of the entire group of possible candidates available to be studied (although using a sample is often the…
Nash, J. Thomas; Frishman, David
1983-01-01
Analytical results for 61 elements in 370 samples from the Ranger Mine area are reported. Most of the rocks come from drill core in the Ranger No. 1 and Ranger No. 3 deposits, but 20 samples are from unmineralized drill core more than 1 km from ore. Statistical tests show that the elements Mg, Fe, F, Be, Co, Li, Ni, Pb, Sc, Th, Ti, V, Cl, As, Br, Au, Ce, Dy, La, Sc, Eu, Tb, Yb, and Tb have a positive association with uranium, and Si, Ca, Na, K, Sr, Ba, Ce, and Cs have a negative association. For most lithologic subsets, Mg, Fe, Li, Cr, Ni, Pb, V, Y, Sm, Sc, Eu, and Yb are significantly enriched in ore-bearing rocks, whereas Ca, Na, K, Sr, Ba, Mn, Ce, and Cs are significantly depleted. These results are consistent with petrographic observations on altered rocks. Lithogeochemistry can aid exploration, but for these rocks it requires methods that are expensive and not amenable to routine use.
Does environmental data collection need statistics?
Pulles, M.P.J.
1998-01-01
The term 'statistics' with reference to environmental science and policymaking might mean different things: the development of statistical methodology, the methodology developed by statisticians to interpret and analyse such data, or the statistical data that are needed to understand environmental
Statistical data analysis handbook
National Research Council Canada - National Science Library
Wall, Francis J
1986-01-01
It must be emphasized that this is not a textbook on statistics. Instead, it is a working tool that presents data analysis in clear, concise terms which can be readily understood even by those without formal training in statistics.
Tadaki, Kohtaro
2010-01-01
The statistical mechanical interpretation of algorithmic information theory (AIT, for short) was introduced and developed in our former works [K. Tadaki, Local Proceedings of CiE 2008, pp. 425-434, 2008] and [K. Tadaki, Proceedings of LFCS'09, Springer's LNCS, vol. 5407, pp. 422-440, 2009], where we introduced the notion of thermodynamic quantities, such as the partition function Z(T), free energy F(T), energy E(T), statistical mechanical entropy S(T), and specific heat C(T), into AIT. We then discovered that, in this interpretation, the temperature T equals the partial randomness of the values of all these thermodynamic quantities, where the notion of partial randomness is a stronger representation of the compression rate by means of program-size complexity. Furthermore, we showed that this situation holds for the temperature T itself, which is one of the most typical thermodynamic quantities. Namely, we showed that, for each of the thermodynamic quantities Z(T), F(T), E(T), and S(T) above, the computability of its value at temperature T gives a sufficient condition for T ∈ (0,1) to satisfy the condition that the partial randomness of T equals T. In this paper, based on a physical argument at the same level of mathematical strictness as normal statistical mechanics in physics, we develop a total statistical mechanical interpretation of AIT which realizes a perfect correspondence to normal statistical mechanics. We do this by identifying a microcanonical ensemble in the framework of AIT. As a result, we clarify the statistical mechanical meaning of the thermodynamic quantities of AIT.
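For readers unfamiliar with the cited works, the thermodynamic quantities can be summarized as below. This is our reading of the definitions and should be checked against the cited papers for the exact statements. Each program p in the domain of an optimal prefix-free machine U is treated as a microstate with energy |p| (its length in bits), and base-2 logarithms are used throughout. At T = 1 the partition function reduces to Chaitin's halting probability Ω_U.

```latex
Z(T) = \sum_{p \in \operatorname{dom} U} 2^{-|p|/T}, \qquad
F(T) = -T \log_2 Z(T), \qquad
E(T) = \frac{1}{Z(T)} \sum_{p \in \operatorname{dom} U} |p| \, 2^{-|p|/T},

S(T) = \frac{E(T) - F(T)}{T}, \qquad
C(T) = \frac{dE(T)}{dT}, \qquad
Z(1) = \Omega_U .
```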
Interpreting Data: The Hybrid Mind
Heisterkamp, Kimberly; Talanquer, Vicente
2015-01-01
The central goal of this study was to characterize major patterns of reasoning exhibited by college chemistry students when analyzing and interpreting chemical data. Using a case study approach, we investigated how a representative student used chemical models to explain patterns in the data based on structure-property relationships. Our results…
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
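The two coefficient interpretations that recur in such reviews can be made concrete in a short Python sketch. A linear-regression slope is the expected change in the outcome per one-unit change in the predictor. A logistic-regression coefficient β becomes an odds ratio exp(β) per unit change, or exp(cβ) for a c-unit change, with other predictors held fixed. The data and coefficient values are hypothetical. For ordinal (proportional-odds) models the same exp(β) reading applies to cumulative odds, but software packages differ in their sign conventions.

```python
import math
from statistics import mean

def ols_fit(x, y):
    """Simple linear regression: slope = S_xy / S_xx, intercept = mean(y) - slope * mean(x).
    The slope is the expected change in y per one-unit increase in x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def odds_ratio(beta, delta=1.0):
    """Logistic regression: exp(beta * delta) multiplies the odds of the outcome
    for a delta-unit increase in the predictor, other predictors held fixed."""
    return math.exp(beta * delta)

slope, intercept = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, odds_ratio(0.4), odds_ratio(0.4, delta=5))
```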
The Statistical Interpretation of Entropy: An Activity
Timberlake, Todd
2010-01-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the…
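Boltzmann's argument can be illustrated by counting microstates. The Python sketch below uses N two-state particles, such as coins showing heads or tails. The macrostate with n "heads" has W = C(N, n) microstates and entropy S = k ln W. As N grows, almost all microstates crowd around n = N/2, so the system is overwhelmingly likely to be found near maximum entropy. This is a generic textbook illustration and not necessarily the activity described in the article.

```python
import math

def multiplicity(N, n):
    """Number of microstates with n 'heads' among N two-state particles."""
    return math.comb(N, n)

def entropy(N, n):
    """Boltzmann entropy S = k ln W, in units of k."""
    return math.log(multiplicity(N, n))

def fraction_near_half(N, tol=0.05):
    """Fraction of all 2**N microstates whose macrostate lies within tol of n = N/2."""
    lo, hi = math.ceil(N * (0.5 - tol)), math.floor(N * (0.5 + tol))
    return sum(math.comb(N, n) for n in range(lo, hi + 1)) / 2 ** N

for N in (10, 100, 1000):
    print(N, round(fraction_near_half(N), 4))
```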
Advanced statistics to improve the physical interpretation of atomization processes
Panão, Miguel R.O.; Radu, Lucian
2013-01-01
Highlights: ► Finite pdf mixtures improve the physical interpretation of sprays. ► A Bayesian approach using an MCMC algorithm is used to find the best finite mixture. ► The statistical method identifies multiple droplet clusters in a spray. ► Multiple drop clusters are possibly associated with multiple atomization mechanisms. ► The spray is described by its drop size distribution and not only by its moments. -- Abstract: This paper reports an analysis of the physics of atomization processes using advanced statistical tools, namely finite mixtures of probability density functions whose best fit is found using a Bayesian approach based on a Markov chain Monte Carlo (MCMC) algorithm. This approach takes into account possible multimodality and heterogeneity in drop size distributions. It therefore provides information about the complete probability density function of multimodal drop size distributions and allows the identification of subgroups in heterogeneous data, which improves the physical interpretation of atomization processes. It also overcomes the limitations of characterizing spray droplets through moments alone, in particular their tendency to mask different mechanisms of droplet formation. Finally, the method is applied to the physical interpretation of a case study based on multijet atomization processes.
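The paper selects the mixture with a Bayesian MCMC approach. As a simpler stand-in, the Python sketch below fits a finite Gaussian mixture to one-dimensional drop sizes by expectation-maximization (EM), which is a different, maximum-likelihood technique. The number of components k is fixed in advance rather than inferred, and the drop sizes are simulated rather than measured.

```python
import math
import random

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_gaussian_mixture(data, k=2, iters=200):
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization (EM)."""
    n = len(data)
    s = sorted(data)
    mus = [s[int((j + 0.5) * n / k)] for j in range(k)]  # spread initial means over the data
    m = sum(data) / n
    sds = [math.sqrt(sum((x - m) ** 2 for x in data) / n)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability that each droplet belongs to each component
        resp = []
        for x in data:
            p = [w * normal_pdf(x, mu, sd) for w, mu, sd in zip(weights, mus, sds)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: update weights, means and standard deviations
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 1e-6)  # guard against a collapsing component
    return weights, mus, sds

# Hypothetical bimodal drop sizes (micrometres) from two atomization mechanisms
rng = random.Random(1)
drops = [rng.gauss(20, 2) for _ in range(200)] + [rng.gauss(60, 5) for _ in range(100)]
weights, mus, sds = fit_gaussian_mixture(drops)
print([round(w, 2) for w in weights], [round(mu, 1) for mu in mus])
```

The mean and variance of these drop sizes would describe one broad population. The fitted mixture instead recovers two clusters, which is the kind of information that moments alone can hide.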
UN Data - Environmental Statistics: Waste
Beginning statistics with data analysis
Mosteller, Frederick; Rourke, Robert EK
2013-01-01
This introduction to the world of statistics covers exploratory data analysis, methods for collecting data, formal statistical inference, and techniques of regression and analysis of variance. 1983 edition.
Baseline Statistics of Linked Statistical Data
Scharnhorst, Andrea; Meroño-Peñuela, Albert; Guéret, Christophe
2014-01-01
We are surrounded by an ever-increasing ocean of information; everybody will agree with that. We build sophisticated strategies to govern this information: we design data models, develop infrastructures for data sharing, and build tools for data analysis. Statistical datasets curated by National
Combinatorial interpretation of Haldane-Wu fractional exclusion statistics.
Aringazin, A K; Mazhitov, M I
2002-08-01
Assuming that the maximal allowed number of identical particles in a state is an integer parameter, q, we derive the statistical weight and analyze the associated equation that defines the statistical distribution. The derived distribution covers the Fermi-Dirac and Bose-Einstein distributions in the particular cases q = 1 and q → ∞ (n_i/q → 1), respectively. We show that the derived statistical weight provides a natural combinatorial interpretation of Haldane-Wu fractional exclusion statistics, and present exact solutions of the distribution equation.
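The counting behind a statistical weight with at most q particles per state can be checked numerically. The number of ways to place n identical particles in g states with occupancy capped at q is the coefficient of x^n in (1 + x + ... + x^q)^g. The Python sketch below computes that count. For q = 1 it reduces to the fermion count C(g, n), and for q ≥ n to the boson count C(g + n - 1, n). It illustrates only the limiting cases and does not reproduce the paper's closed-form weight or its distribution equation.

```python
from math import comb

def statistical_weight(g, n, q):
    """Ways to place n identical particles in g states, at most q per state:
    the coefficient of x**n in (1 + x + ... + x**q)**g."""
    coeffs = [1]  # coeffs[i] = ways to place i particles in the states seen so far
    for _ in range(g):
        new = [0] * min(len(coeffs) + q, n + 1)
        for i, c in enumerate(coeffs):
            for k in range(min(q, n - i) + 1):
                new[i + k] += c
        coeffs = new
    return coeffs[n] if n < len(coeffs) else 0

g, n = 6, 3
print(statistical_weight(g, n, 1), comb(g, n))          # q = 1: Fermi-Dirac counting
print(statistical_weight(g, n, n), comb(g + n - 1, n))  # q >= n: Bose-Einstein counting
```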
Data analysis and interpretation for environmental surveillance
1992-06-01
The Data Analysis and Interpretation for Environmental Surveillance Conference was held in Lexington, Kentucky, February 5--7, 1990. The conference was sponsored by what is now the Office of Environmental Compliance and Documentation, Oak Ridge National Laboratory. Participants included technical professionals from all Martin Marietta Energy Systems facilities, Westinghouse Materials Company of Ohio, Pacific Northwest Laboratory, and several technical support contractors. Presentations at the conference ranged across the full spectrum of issues that affect the analysis and interpretation of environmental data. Topics included tracking systems for samples and schedules associated with ongoing programs; coalescing data from a variety of sources and pedigrees into integrated databases; methods for evaluating the quality of environmental data through empirical estimates of parameters such as charge balance, pH, and specific conductance; statistical applications to the interpretation of environmental information; and uses of environmental information in risk and dose assessments. Hearing about and discussing this wide variety of topics provided an opportunity to capture the subtlety of each discipline and to appreciate the continuity that is required among the disciplines in order to perform high-quality environmental information analysis.
Hahn, A.A.
1994-11-01
The complexity of instrumentation sometimes requires data analysis to be done before the result is presented to the control room. This tutorial reviews some of the theoretical assumptions underlying the more popular forms of data analysis and presents simple examples to illuminate the advantages and hazards of different techniques.
Workplace Statistical Literacy for Teachers: Interpreting Box Plots
Pierce, Robyn; Chick, Helen
2013-01-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are increasingly expected to interpret and apply complex data about student and school performance, and yet it is not clear that they always have the…
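Part of what makes box plots hard to read is that the numbers behind them depend on conventions. The Python sketch below computes one common version: quartiles by the "inclusive" interpolation method, whiskers drawn to the most extreme points within 1.5 × IQR of the box (Tukey's rule), and points beyond that treated as outliers. Both choices are assumptions. Other software and textbooks use other quartile definitions, so the same data can yield slightly different plots. The data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Quartiles (inclusive method), Tukey whiskers and outliers for a box plot."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {"q1": q1, "median": med, "q3": q3,
            "whisker_low": min(inside), "whisker_high": max(inside),
            "outliers": sorted(x for x in data if x < lo_fence or x > hi_fence)}

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # hypothetical test scores
print(box_plot_summary(scores))
```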
Energy statistical data. Europe
2002-04-01
This report summarizes, in a series of tables, the key 1999 energy data for 9 European countries (Germany, Belgium, Denmark, Spain, France, Italy, Netherlands, UK, Sweden). The data concern energy intensity, the share of renewable energy sources in total primary consumption, the structure of power production, CO2 emissions and their structure, and end-use consumption, primary consumption and energy prices per energy source. (J.S.)
Tuuli, Methodius G; Odibo, Anthony O
2011-08-01
The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Statistical methods for ranking data
Alvo, Mayer
2014-01-01
This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
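Two classical rank correlations built from distance functions can be sketched as follows, assuming complete rankings without ties. The book's notion of compatibility for incomplete rankings is not reproduced. Spearman's rho is a rescaling of the squared Euclidean distance between rank vectors, and Kendall's tau is a rescaling of the number of discordant pairs (the Kendall distance). The rankings used are hypothetical.

```python
from itertools import combinations

def spearman_distance(r1, r2):
    """Sum of squared rank differences."""
    return sum((a - b) ** 2 for a, b in zip(r1, r2))

def spearman_rho(r1, r2):
    n = len(r1)
    return 1 - 6 * spearman_distance(r1, r2) / (n * (n ** 2 - 1))

def kendall_distance(r1, r2):
    """Number of item pairs ranked in opposite order by the two rankings."""
    return sum(1 for i, j in combinations(range(len(r1)), 2)
               if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0)

def kendall_tau(r1, r2):
    n = len(r1)
    return 1 - 4 * kendall_distance(r1, r2) / (n * (n - 1))

judge_a = [1, 2, 3, 4]  # hypothetical ranks of four items
judge_b = [1, 3, 2, 4]
print(spearman_rho(judge_a, judge_b), kendall_tau(judge_a, judge_b))
```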
Data Literacy is Statistical Literacy
Gould, Robert
2017-01-01
Past definitions of statistical literacy should be updated in order to account for the greatly amplified role that data now play in our lives. Experience working with high-school students in an innovative data science curriculum has shown that teaching statistical literacy, augmented by data literacy, can begin early.
Statistical modeling for degradation data
Lio, Yuhlong; Ng, Hon; Tsai, Tzong-Ru
2017-01-01
This book focuses on the statistical aspects of the analysis of degradation data. In recent years, degradation data analysis has come to play an increasingly important role in different disciplines such as reliability, public health sciences, and finance. For example, information on products’ reliability can be obtained by analyzing degradation data. In addition, statistical modeling and inference techniques have been developed on the basis of different degradation measures. The book brings together experts engaged in statistical modeling and inference, presenting and discussing important recent advances in degradation data analysis and related applications. The topics covered are timely and have considerable potential to impact both statistics and reliability engineering.
[Big data in official statistics].
Zwick, Markus
2015-08-01
The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.
Official statistics and Big Data
Peter Struijs
2014-07-01
The rise of Big Data changes the context in which organisations producing official statistics operate. Big Data provides opportunities, but in order to make optimal use of Big Data, a number of challenges have to be addressed. This stimulates increased collaboration between National Statistical Institutes, Big Data holders, businesses and universities. In time, this may lead to a shift in the role of statistical institutes in the provision of high-quality and impartial statistical information to society. In this paper, the changes in context, the opportunities, the challenges and the way to collaborate are addressed. The collaboration between the various stakeholders will involve each partner building on and contributing different strengths. For national statistical offices, traditional strengths include, on the one hand, the ability to collect data and combine data sources with statistical products and, on the other hand, their focus on quality, transparency and sound methodology. In the Big Data era of competing and multiplying data sources, they continue to have a unique knowledge of official statistical production methods. And their impartiality and respect for privacy as enshrined in law uniquely position them as a trusted third party. Based on this, they may advise on the quality and validity of information of various sources. By thus positioning themselves, they will be able to play their role as key information providers in a changing society.
Pattern recognition approach to data interpretation
Wolff, Diane D; Parsons, M. L
1983-01-01
An attempt is made in this book to give scientists a detailed working knowledge of the powerful mathematical tools available to aid in data interpretation, especially when confronted with large data
Interpretation of thorium bioassay data
Juliao, L.M.Q.C.; Azeredo, A.M.G.F.; Santos, M.S.; Melo, D.R.; Dantas, B.M.; Lipsztein, J.L.
1994-01-01
A comparison has been made between bioassay data of thorium-exposed workers from two different facilities. The first of these facilities is a monazite sand extraction plant. Isotopic equilibrium between 232Th and 228Th was not observed in excreta samples of these workers. The second facility is a gas mantle factory. Isotopic equilibrium between 232Th and 228Th was observed in excreta samples of its workers. Whole-body counter measurements indicated a very low intake of thorium through inhalation. As the concentration of thorium in feces was very high, it was concluded that the main pathway of intake was ingestion, mainly via contamination through dirty hands. The comparison between the bioassay results of workers from the two facilities shows that the lack of Th isotopic equilibrium observed in the excreta of workers at the monazite sand plant possibly resulted from an additional Th intake through ingestion of contaminated fresh food. This is presumably because 228Ra is taken up from the soil by plants more efficiently than 228Th or 232Th, and 228Th subsequently grows in from its immediate parent, 228Ra. (author) 5 refs.; 3 tabs
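The ingrowth argument can be made quantitative with the two-member Bateman equation. The sketch below is a simplification, not the authors' model. It gives the 228Th/228Ra activity ratio in material that takes up radium but initially contains no thorium, such as plant tissue. The half-lives are rounded literature values, and the decay chain is simplified as noted in the code:

```python
import math

# Rounded literature half-lives in years (illustrative values).
RA228_HALF_LIFE_Y = 5.75
TH228_HALF_LIFE_Y = 1.91


def th228_ra228_activity_ratio(t_years):
    """Activity ratio A(228Th) / A(228Ra) after t years in material that held no thorium at t = 0.

    Two-member Bateman solution: the short-lived intermediate 228Ac is neglected,
    and no further radium uptake is assumed after t = 0.
    """
    lam_ra = math.log(2) / RA228_HALF_LIFE_Y
    lam_th = math.log(2) / TH228_HALF_LIFE_Y
    return lam_th / (lam_th - lam_ra) * (1.0 - math.exp(-(lam_th - lam_ra) * t_years))


for t in (0.5, 1, 2, 5, 10):
    print(f"t = {t:>4} y: 228Th/228Ra activity ratio = {th228_ra228_activity_ratio(t):.2f}")
```

The ratio starts at zero and approaches the transient-equilibrium value lam_th / (lam_th - lam_ra). As a result, recently ingested food-borne radium would yield thorium activities that are out of equilibrium with 232Th.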
Statistical analysis and data management
Anon.
1981-01-01
This report provides an overview of the history of the WIPP Biology Program. The recommendations of the American Institute of Biological Sciences (AIBS) for the WIPP biology program are summarized. The data sets available for statistical analyses and the problems associated with these data sets are also summarized. Biological studies base maps are presented. A statistical model is presented to evaluate any correlation between climatological data and small mammal captures. No statistically significant relationship was found between variance in small mammal captures on Dr. Gennaro's 90 m x 90 m grid and precipitation records from the Duval Potash Mine.
Statistical Methods for Fuzzy Data
Viertl, Reinhard
2011-01-01
Statistical data are not always precise numbers, vectors, or categories. Real data are frequently what is called fuzzy. Examples where this fuzziness is obvious are quality-of-life data and environmental, biological, medical, sociological and economic data. Measurement results can also be best described using fuzzy numbers and fuzzy vectors, respectively. Statistical analysis methods have to be adapted for the analysis of fuzzy data. In this book, the foundations of the description of fuzzy data are explained, including methods for obtaining the characterizing function of fuzzy m
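The triangle is a simple and widely used shape for a characterizing function. The sketch below assumes that shape; the book covers more general characterizing functions. It evaluates the function and its level cuts, the nested intervals {x : xi(x) >= delta} on which calculations with fuzzy numbers are commonly based:

```python
def triangular_characterizing_function(left, peak, right):
    """Characterizing (membership) function of a triangular fuzzy number.

    The function is 1 at `peak`, falls linearly to 0 at `left` and `right`,
    and is 0 outside the support [left, right].
    """
    if not left < peak < right:
        raise ValueError("require left < peak < right")

    def xi(x):
        if x <= left or x >= right:
            return 0.0
        if x <= peak:
            return (x - left) / (peak - left)
        return (right - x) / (right - peak)

    return xi


def level_cut(left, peak, right, level):
    """Closed interval {x : xi(x) >= level} for a level in (0, 1]."""
    if not 0 < level <= 1:
        raise ValueError("level must lie in (0, 1]")
    return (left + level * (peak - left), right - level * (right - peak))


# A reading recorded as "about 3, certainly between 2 and 5" (hypothetical).
xi = triangular_characterizing_function(2.0, 3.0, 5.0)
print(xi(2.5), xi(4.0), level_cut(2.0, 3.0, 5.0, 0.5))
```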
Structural interpretation of seismic data and inherent uncertainties
Bond, Clare
2013-04-01
Geoscience is perhaps unique in its reliance on incomplete datasets and in building knowledge from their interpretation. This interpretive basis of the science is fundamental at all levels, from the creation of a geological map to the interpretation of remotely sensed data. To teach and better understand the uncertainties in dealing with incomplete data, we need to understand the strategies that make individual practitioners effective interpreters. The nature of interpretation is such that interpreters must use their cognitive abilities in analysing the data to propose a sensible solution that is consistent not only with the original data but also with other knowledge and understanding. In a series of experiments, Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies used large numbers of participants to provide a statistically sound basis for analysing the results. The experiments showed that a wide variety of conceptual models were applied to single seismic datasets. Interpreters differed not only in the spatial placement of faults but also in whether they thought faults existed at all and whether the faults had the same sense of movement. Further, statistical analysis suggests that the strategies an interpreter employs are more important than expert knowledge per se in developing successful interpretations; experts are successful because they apply these techniques. In a new set of experiments, a small number of experts are studied to determine how they use their cognitive and reasoning skills in interpreting 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight into their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with
Statistical analysis of environmental data
Beauchamp, J.J.; Bowman, K.O.; Miller, F.L. Jr.
1975-10-01
This report summarizes the analyses of data obtained by the Radiological Hygiene Branch of the Tennessee Valley Authority from samples taken around the Browns Ferry Nuclear Plant in Northern Alabama. Data collection began in 1968, and a wide variety of sample types have been gathered on a regular basis. The statistical analysis of environmental data involving very low levels of radioactivity is discussed. Applications of computer calculations for data processing are described.
Vocational students' learning preferences: the interpretability of ipsative data.
Smith, P J
2000-02-01
A number of researchers have argued that ipsative data are not suitable for statistical procedures designed for normative data. Others have argued that the interpretability of such analyses of ipsative data is little affected when the number of variables and the sample size are sufficiently large. The research reported here is a factor analysis of the scores on the Canfield Learning Styles Inventory for 1,252 students in vocational education. The results of the factor analysis of these ipsative data were examined in the context of existing theory and research on vocational students. They lend support to the argument that the factor analysis of ipsative data can provide sensibly interpretable results.
Analysis of Visual Interpretation of Satellite Data
Svatonova, H.
2016-06-01
Millions of people of all ages and levels of expertise use satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a new place in education, especially in geography and in environmental issues. The article presents the results of extensive research on the visual interpretation of image data carried out from 2013 to 2015 in the Czech Republic. The research compared the success rate of the interpretation of satellite data in relation to a) the substrates (the selected colourfulness, the type of depicted landscape or special elements in the landscape) and b) selected characteristics of users (expertise, gender, age). The results of the research showed that (1) false colour images have a slightly higher percentage of successful interpretation than natural colour images, (2) colourfulness of an element expected or rehearsed by the user (regardless of the real natural colour) increases the success rate of identifying the element, (3) experts are faster in interpreting visual data than non-experts, with the same degree of accuracy in solving the task, and (4) men and women are equally successful in the interpretation of visual image data.
Robust statistics and geochemical data analysis
Di, Z.
1987-01-01
The advantages of robust procedures over ordinary least-squares procedures in geochemical data analysis are demonstrated using NURE data from the Hot Springs Quadrangle, South Dakota, USA. Robust principal components analysis with 5% multivariate trimming successfully guarded the analysis against perturbations by outliers and increased the number of interpretable factors. Regression with SINE estimates significantly increased the goodness of fit of the regression and improved the correspondence of delineated anomalies with known uranium prospects. Because outliers are ubiquitous in geochemical data, robust statistical procedures are recommended as routine replacements for ordinary least-squares procedures.
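The abstract does not define "SINE estimates". Assuming the term refers to Andrews' sine M-estimator, a robust straight-line fit can be sketched with iteratively reweighted least squares. The tuning constant 1.339 is a conventional choice, the function names are illustrative, and the data below are synthetic:

```python
import math
import statistics


def andrews_weight(u, a=1.339):
    """IRLS weight psi(u)/u for Andrews' sine function; zero weight beyond |u| = a*pi."""
    if u == 0:
        return 1.0
    if abs(u) > a * math.pi:
        return 0.0
    return math.sin(u / a) / (u / a)


def weighted_line_fit(x, y, w):
    """Weighted least-squares intercept and slope for y = b0 + b1 * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def sine_line_fit(x, y, max_iter=100, tol=1e-10):
    """Robust line fit: start from ordinary least squares, then reweight residuals."""
    w = [1.0] * len(x)
    intercept, slope = weighted_line_fit(x, y, w)
    for _ in range(max_iter):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute residual, rescaled to a normal SD.
        scale = statistics.median([abs(r) for r in residuals]) / 0.6745
        if scale < 1e-12:  # (near-)perfect fit of the majority of points
            break
        w = [andrews_weight(r / scale) for r in residuals]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        converged = abs(new_intercept - intercept) + abs(new_slope - slope) < tol
        intercept, slope = new_intercept, new_slope
        if converged:
            break
    return intercept, slope


x = list(range(10))
y = [2.0 * xi + 1.0 for xi in x]
y[9] = 60.0  # one gross outlier
print("OLS: ", weighted_line_fit(x, y, [1.0] * len(x)))
print("sine:", sine_line_fit(x, y))
```

The single outlier more than doubles the least-squares slope. The sine weights drive the outlier's weight to zero and recover the underlying line.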
Statistically significant relational data mining :
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model; statisticians favor such models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Application of descriptive statistics in analysis of experimental data
Mirilović Milorad; Pejin Ivana
2008-01-01
Statistics today is a group of scientific methods for the quantitative and qualitative investigation of variation in mass phenomena. In fact, statistics is a group of methods used for the collection, analysis, presentation and interpretation of the data needed to reach certain conclusions. Statistical analysis is divided into descriptive statistical analysis and inferential statistics. The values which represent the results of an experiment, and which are the subj...
Alternative interpretations of statistics on health effects of low-level radiation
International Nuclear Information System (INIS)
Hamilton, L.D.
1983-01-01
Four examples of the interpretation of statistics of data on low-level radiation are reviewed: (a) genetic effects of the atomic bombs at Hiroshima and Nagasaki, (b) cancer at Rocky Flats, (c) childhood leukemia and fallout in Utah, and (d) cancer among workers at the Portsmouth Naval Shipyard. Aggregation of data, adjustment for age, and other problems related to the determination of health effects of low-level radiation are discussed. Troublesome issues related to post hoc analysis are considered
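Age adjustment, one of the problems the review discusses, is commonly done by direct standardization: age-specific rates are averaged using the age structure of a common standard population. In the hypothetical example below, two populations share the same age-specific rates. Their crude rates still differ, because of age structure alone:

```python
def weighted_rate(age_specific_rates, population):
    """Average of age-specific rates weighted by the given population's age structure."""
    if len(age_specific_rates) != len(population):
        raise ValueError("need one population count per age group")
    return sum(r * n for r, n in zip(age_specific_rates, population)) / sum(population)


# Hypothetical example: identical age-specific rates (per 1,000), different age structures.
rates = [1.0, 10.0]              # young, old
population_a = [9000, 1000]      # mostly young
population_b = [5000, 5000]      # older
standard = [7000, 3000]          # an arbitrary standard population

crude_a = weighted_rate(rates, population_a)      # crude rate: own age structure
crude_b = weighted_rate(rates, population_b)
adjusted = weighted_rate(rates, standard)         # directly standardized rate
print(crude_a, crude_b, adjusted)
```

Because the age-specific rates are identical, both populations have the same standardized rate, even though population B's crude rate is almost three times higher.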
Spatial Statistical Data Fusion (SSDF)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is
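The SSDF method relies on a basis-function (spatial mixed-effects) model to keep the matrix inversion small. The following is a much simpler illustration of the underlying kriging idea only, not of the SSDF algorithm. It implements one-dimensional simple kriging with a known zero mean, an assumed exponential covariance, and a measurement-noise variance added to the diagonal. Point observations stand in for the pixel averages used in the paper:

```python
import math


def exponential_cov(h, sill=1.0, range_=1.0):
    """Assumed exponential covariance model C(h) = sill * exp(-h / range_)."""
    return sill * math.exp(-h / range_)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def simple_krige(locations, values, target, noise_var=0.0, sill=1.0, range_=1.0):
    """Simple kriging (known zero mean) of the noise-free field at `target` (1-D)."""
    n = len(locations)
    # Covariance of the noisy observations: field covariance plus noise on the diagonal.
    cov = [[exponential_cov(abs(locations[i] - locations[j]), sill, range_)
            + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Covariance between the true field at the target and each observation.
    c0 = [exponential_cov(abs(loc - target), sill, range_) for loc in locations]
    weights = solve(cov, c0)
    estimate = sum(w * v for w, v in zip(weights, values))
    variance = sill - sum(w * c for w, c in zip(weights, c0))  # kriging variance
    return estimate, variance


locations = [0.0, 1.0, 2.5]
values = [1.2, -0.3, 0.8]
print(simple_krige(locations, values, 1.7))
print(simple_krige(locations, values, 1.7, noise_var=0.5))
```

Without noise, the predictor reproduces the data exactly at observed locations. With noise, it smooths the data and reports a larger kriging variance. The dense solve shown here scales cubically with the number of observations, and that cost is what basis-function approaches avoid.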
Statistical processing of technological and radiochemical data
Lahodova, Zdena; Vonkova, Kateřina
2011-01-01
The project described in this article had two goals. The main goal was to compare technological and radiochemical data from two units of a nuclear power plant. The other goal was to check the collection, organization and interpretation of routinely measured data. Monitoring of analytical and radiochemical data is a very valuable source of knowledge about some processes in the primary circuit. Exploratory analysis of one-dimensional data was performed to estimate location and variability and to find extreme values, data trends, distribution, autocorrelation, etc. This process allowed for the cleaning and completion of the raw data. Further analyses, such as multiple comparisons, multiple correlation and analysis of variance, were then performed. The measured data were organized into a data matrix. The results are presented in this article with statistical analysis tools such as box plots, Mahalanobis distance, biplots, correlation graphs and trend graphs. Tables of data were replaced with graphs because graphs condense large amounts of information into easy-to-understand formats. The main conclusion of this work is that the collection and comprehension of data is a very substantial part of statistical processing. With well-prepared and well-understood data, accurate evaluation is possible. Cooperation between the technicians who collect data and the statistician who processes it is also very important. (author)
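The Mahalanobis distance mentioned above measures how far an observation lies from the sample mean, in units that account for the variances and the correlation of the variables. The following minimal two-variable sketch uses the closed-form inverse of the 2 x 2 sample covariance matrix and hypothetical paired measurements:

```python
import statistics


def mahalanobis_distances(xs, ys):
    """Mahalanobis distance of each point (x, y) from the sample mean."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with an n - 1 denominator.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("sample covariance matrix is singular")
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # Quadratic form d' S^-1 d using the closed-form inverse of a 2 x 2 matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2 ** 0.5)
    return distances


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 4.0]   # the last point breaks the linear trend
for x, y, d in zip(xs, ys, mahalanobis_distances(xs, ys)):
    print(f"({x}, {y}): {d:.2f}")
```

The point that breaks the correlation pattern gets the largest distance, even though neither of its coordinates is extreme on its own scale.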
Statistical Data Editing in Scientific Articles.
Habibzadeh, Farrokh
2017-07-01
Scientific journals are important scholarly forums for sharing research findings. Editors have important roles in safeguarding standards of scientific publication and should be familiar with the correct presentation of results, among other core competencies. Editors do not have access to the raw data and must therefore rely on clues in the submitted manuscripts. To identify probable errors, they should look for inconsistencies in the presented results. This article discusses common statistical problems that a knowledgeable manuscript editor can pick up. Manuscripts should contain a detailed section on the statistical analyses of the data. Numbers should be reported with appropriate precision. The standard error of the mean (SEM) should not be reported as an index of data dispersion. Mean (standard deviation [SD]) and median (interquartile range [IQR]) should be used to describe normally and non-normally distributed data, respectively. If possible, it is better to report the 95% confidence interval (CI) for statistics, at least for the main outcome variables. P values should be presented, and interpreted with caution, when a hypothesis is being tested. To advance the knowledge and skills of their members, associations of journal editors would do well to develop training courses on basic statistics and research methodology for non-experts. This would in turn improve research reporting and safeguard the body of scientific evidence. © 2017 The Korean Academy of Medical Sciences.
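The distinction between SD and SEM is easy to see numerically. The SD describes the spread of the observations and stays roughly constant as the sample grows. The SEM (SD divided by the square root of n) describes the precision of the mean and shrinks as n grows. The following is a minimal sketch with hypothetical data; the 95% CI uses a normal approximation, an assumption noted in the code:

```python
import math
import statistics


def mean_sd_sem_ci(sample, z=1.96):
    """Mean, SD, SEM and an approximate 95% CI for the mean.

    The interval uses the normal critical value z = 1.96; for small samples a
    t critical value with n - 1 degrees of freedom is more appropriate.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)     # describes the spread of the observations
    sem = sd / math.sqrt(n)           # describes the precision of the mean; shrinks as n grows
    return mean, sd, sem, (mean - z * sem, mean + z * sem)


def median_iqr(sample):
    """Median and interquartile range, for data that are not normally distributed."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return statistics.median(sample), q3 - q1


small = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0, 5.0, 5.0]
large = small * 4  # same spread, four times as many observations
for data in (small, large):
    mean, sd, sem, ci = mean_sd_sem_ci(data)
    print(f"n={len(data)}: mean={mean:.2f} SD={sd:.2f} SEM={sem:.2f} "
          f"95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
print(median_iqr(list(range(1, 10))))
```

Reporting the SEM in place of the SD would make the larger sample look less dispersed, even though its spread is essentially unchanged.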
Probabilistic interpretation of data: a physicist's approach
Miller, Guthrie
2013-01-01
This book is a physicist's approach to the interpretation of data using Markov Chain Monte Carlo (MCMC). The concepts are derived from first principles using a style of mathematics that quickly elucidates the basic ideas, sometimes with the aid of examples. Probabilistic data interpretation is a straightforward problem involving conditional probability. A prior probability distribution is essential, and examples are given. In this small book (200 pages), the reader is led from the most basic concepts of mathematical probability all the way to parallel processing algorithms for Markov Chain Monte Carlo. Fortran source code (for eigenvalue analysis of finite discrete Markov Chains, for MCMC, and for nonlinear least squares) is included with the supplementary material for this book (available online).
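The book's supplementary code is in Fortran. The following is an independent Python sketch of a random-walk Metropolis sampler, not the book's code, and all of its settings are assumptions for illustration. It targets the posterior of a normal mean with known standard deviation and a normal prior. That case has a closed-form answer, so the sampler can be checked:

```python
import math
import random
import statistics


def log_posterior(mu, data, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Unnormalized log posterior of a normal mean with known sigma and a normal prior."""
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    log_likelihood = -0.5 * sum(((x - mu) / sigma) ** 2 for x in data)
    return log_prior + log_likelihood


def metropolis(data, n_samples=20000, burn_in=2000, step=0.5, seed=1):
    """Random-walk Metropolis sampler for the posterior of mu."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = mu + rng.gauss(0.0, step)      # symmetric proposal
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if i >= burn_in:
            samples.append(mu)
    return samples


data = [4.8, 5.3, 5.1, 4.6, 5.2]
samples = metropolis(data)

# Closed-form posterior for comparison (sigma = 1, prior N(0, 10^2)).
precision = 1 / 10.0 ** 2 + len(data) / 1.0 ** 2
exact_mean = (sum(data) / 1.0 ** 2) / precision
print(statistics.mean(samples), exact_mean)
print(statistics.stdev(samples), 1 / math.sqrt(precision))
```

Only ratios of posterior densities enter the acceptance step, so the normalizing constant of the posterior is never needed.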
Interpretable Categorization of Heterogeneous Time Series Data
Lee, Ritchie; Kochenderfer, Mykel J.; Mengshoel, Ole J.; Silbermann, Joshua
2017-01-01
We analyze data from simulated aircraft encounters to validate and inform the development of a prototype aircraft collision avoidance system. The high-dimensional and heterogeneous time series dataset is analyzed to discover properties of near mid-air collisions (NMACs) and categorize the NMAC encounters. Domain experts use these properties to better organize and understand NMAC occurrences. Existing solutions either are not capable of handling high-dimensional and heterogeneous time series datasets or do not provide explanations that are interpretable by a domain expert. The latter is critical to the acceptance and deployment of safety-critical systems. To address this gap, we propose grammar-based decision trees along with a learning algorithm. Our approach extends decision trees with a grammar framework for classifying heterogeneous time series data. A context-free grammar is used to derive decision expressions that are interpretable, application-specific, and support heterogeneous data types. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to a simulated aircraft encounter dataset and evaluate the performance of four variants of our learning algorithm. The best algorithm is used to analyze and categorize near mid-air collisions in the aircraft encounter dataset. We describe each discovered category in detail and discuss its relevance to aircraft collision avoidance.
Data Systems and Reports as Active Participants in Data Interpretation
Rankin, Jenny Grant
2016-01-01
Most data-informed decision-making in education is undermined by flawed interpretations. Educator-driven interventions to improve data use are beneficial but not omnipotent, as data misunderstandings persist at schools and school districts commended for ideal data use support. Meanwhile, most data systems and reports display figures without…
Statistical processing of experimental data
NAVRÁTIL, Pavel
2012-01-01
This thesis covers probability theory and statistical sets. It contains solved and unsolved problems on probability, random variables and their distributions, random vectors, statistical sets, and regression and correlation analysis. Solutions to the unsolved problems are included.
Marviken test-data interpretation, second project
International Nuclear Information System (INIS)
Collen, J.; Johansson, A.
1978-12-01
A brief description is given of the investigations carried out and the conclusions drawn within the MARTIN-II project, which involved the evaluation and interpretation of the data from the full-scale containment response tests at the Marviken Power Station. The data from the tests, which were completed in 1976, provide information about the periodic pressure oscillations and rapid pressure spikes induced in the pressure-suppression containment during study comprise the following items: - Influence of test parameters on pressure oscillations and pressure spikes - Pressure spikes in the wetwell pool - High frequency oscillations - Comparisons between single-pipe and multi-pipe data The study was carried out by Studsvik Energiteknik AB with consulting efforts from AB ASEA-ATOM. It was financed by the Swedish Nuclear Power Inspectorate. (Auth.)
Farrell, Mary Beth
2018-06-01
This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic, the higher the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability (P) value. Calculation of P values is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements follow what is referred to as a normal distribution, which takes the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. The confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being
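A small hypothetical example shows the difference between raw percentage agreement and Cohen's κ. Two readers classify ten scans, and κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each reader's marginal label frequencies. The following is a minimal sketch:

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers who assign categorical labels to the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(reader_a)
    p_observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    # Chance agreement from each reader's marginal label frequencies.
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


reader_a = ["pos"] * 5 + ["neg"] * 5
reader_b = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "pos"]
agreement = sum(a == b for a, b in zip(reader_a, reader_b)) / len(reader_a)
print(f"percent agreement = {agreement:.0%}, kappa = {cohen_kappa(reader_a, reader_b):.2f}")
```

With balanced labels, half of the agreement is expected by chance alone, so 80% raw agreement corresponds to a κ of only 0.6.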
Interpreting Statistical Findings: A Guide For Health Professionals And Students
Walker, Jan
2010-01-01
This book is aimed at those studying and working in the field of health care, including nurses and the professions allied to medicine, who have little prior knowledge of statistics but for whom critical review of research is an essential skill.
Systems Analysis for Interpretation of Phosphoproteomics Data
Munk, Stephanie; Refsgaard, Jan C; Olsen, Jesper V
2016-01-01
Global phosphoproteomics investigations yield overwhelming datasets with up to tens of thousands of quantified phosphosites. The main challenge after acquiring such large-scale data is to extract the biological meaning and relate this to the experimental question at hand. Systems level analysis...... provides the best means for extracting functional insights from such types of datasets, and this has primed a rapid development of bioinformatics tools and resources over the last decade. Many of these tools are specialized databases that can be mined for annotation and pathway enrichment, whereas others...... provide a platform to generate functional protein networks and explore the relations between proteins of interest. The use of these tools requires careful consideration with regard to the input data, and the interpretation demands a critical approach. This chapter provides a summary of the most...
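Pathway enrichment is frequently assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The chapter summary does not say which test the tools use, so the sketch below only assumes this common choice. The counts are hypothetical:

```python
from math import comb


def enrichment_p_value(k, n_selected, n_annotated, n_background):
    """One-sided hypergeometric p-value P(X >= k).

    X counts how many of `n_selected` proteins drawn at random from
    `n_background` quantified proteins belong to a pathway that contains
    `n_annotated` of them.
    """
    upper = min(n_selected, n_annotated)
    tail = sum(comb(n_annotated, i) * comb(n_background - n_annotated, n_selected - i)
               for i in range(k, upper + 1))
    return tail / comb(n_background, n_selected)


# Hypothetical counts: 2,000 quantified proteins, 150 in the pathway,
# 100 significantly regulated, 20 of which are in the pathway (7.5 expected).
print(enrichment_p_value(20, 100, 150, 2000))
```

The choice of background matters. Using all quantified proteins rather than the whole proteome is one way to respect the input-data considerations the chapter mentions.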
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
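The simple two-environment analysis the abstract refers to can be written in a few lines. Each genotype's reaction norm slope is the difference between its mean phenotypes divided by the difference between the environments. The following is a naive multistep sketch with hypothetical means, the kind of approach the authors compare against mixed models:

```python
import statistics


def two_environment_slopes(mean_env1, mean_env2, env1=0.0, env2=1.0):
    """Reaction norm slope of each genotype from its mean phenotype in two environments."""
    if len(mean_env1) != len(mean_env2):
        raise ValueError("need one mean per genotype in each environment")
    return [(m2 - m1) / (env2 - env1) for m1, m2 in zip(mean_env1, mean_env2)]


# Hypothetical genotype means in two environments.
env1_means = [10.0, 12.0, 11.0]
env2_means = [14.0, 13.0, 15.0]
slopes = two_environment_slopes(env1_means, env2_means)
print(slopes, statistics.mean(slopes), statistics.variance(slopes))
```

The variance of these estimated slopes also contains the sampling error of the genotype means, so it tends to overstate the true variance in slopes. Separating these two sources is one reason mixed-model approaches can give more robust inferences.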
Statistical analysis of management data
Gatignon, Hubert
2013-01-01
This book offers a comprehensive approach to multivariate statistical analyses. It provides theoretical knowledge of the concepts underlying the most important multivariate techniques and an overview of actual applications.
Laterally constrained inversion for CSAMT data interpretation
Wang, Ruo; Yin, Changchun; Wang, Miaoyue; Di, Qingyun
2015-10-01
Laterally constrained inversion (LCI) has been successfully applied to the inversion of DC resistivity, TEM and airborne EM data. However, it has not yet been applied to the interpretation of controlled-source audio-frequency magnetotelluric (CSAMT) data. In this paper, we apply the LCI method to CSAMT data inversion by preconditioning the Jacobian matrix. We apply a weighting matrix to the Jacobian to balance the sensitivity of model parameters, so that the resolution with respect to different model parameters becomes more uniform. Numerical experiments confirm that this can improve the convergence of the inversion. We first invert a synthetic dataset with and without noise to investigate the effect of applying LCI to CSAMT data. For the noise-free data, the results show that the LCI method recovers the true model better than traditional single-station inversion. For the noisy data, the true model is recovered even at a noise level of 8%, indicating that LCI inversions are to some extent insensitive to noise. Then, we re-invert two CSAMT datasets, collected in a watershed and in a coal mine area in Northern China, respectively, and compare our results with those from previous inversions. The comparison with the previous inversion in the coal mine shows that the LCI method delivers smoother layer interfaces that correlate well with seismic data. The comparison with a global search algorithm, simulated annealing (SA), in the watershed shows that both methods deliver very similar, good results, but the LCI algorithm presented in this paper runs much faster. The inversion results for the coal mine CSAMT survey show that the LCI identified a conductive water-bearing zone that the previous inversions had not revealed. This further demonstrates that the method presented in this paper works for CSAMT data inversion.
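One common way to balance parameter sensitivities is to scale each column of the Jacobian to unit norm. The abstract does not give the exact weighting matrix, so the sketch below assumes this column scaling, and its function names are illustrative. If J' = JW and the model update is written as dp = W dq, then J dp = J' dq, so a step computed for the scaled problem maps back through W:

```python
import math


def column_scaling(jacobian):
    """Diagonal weights w_j = 1 / ||J[:, j]|| and the scaled Jacobian J @ diag(w)."""
    n_params = len(jacobian[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in jacobian)) for j in range(n_params)]
    if any(norm == 0.0 for norm in norms):
        raise ValueError("a model parameter has zero sensitivity")
    weights = [1.0 / norm for norm in norms]
    scaled = [[row[j] * weights[j] for j in range(n_params)] for row in jacobian]
    return weights, scaled


def unscale_update(scaled_update, weights):
    """Map a step computed for the scaled parameters back: delta_p = W @ delta_q."""
    return [w * dq for w, dq in zip(weights, scaled_update)]


# Hypothetical Jacobian: data are far more sensitive to parameter 0 than to parameter 1.
J = [[100.0, 0.01],
     [50.0, 0.02],
     [0.0, 0.02]]
weights, J_scaled = column_scaling(J)
print([math.sqrt(sum(row[j] ** 2 for row in J_scaled)) for j in range(2)])
```

The scaled step would still need the usual damping and, for LCI, the lateral constraint terms, neither of which is shown here.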
Statistical data analysis using SAS: intermediate statistical methods
Marasinghe, Mervyn G
2018-01-01
The aim of this textbook (previously titled SAS for Data Analytics) is to teach the use of SAS for statistical analysis of data for advanced undergraduate and graduate students in statistics, data science, and disciplines involving analyzing data. The book begins with an introduction beyond the basics of SAS, illustrated with non-trivial, real-world, worked examples. It proceeds to SAS programming and applications, SAS graphics, statistical analysis of regression models, analysis of variance models, analysis of variance with random and mixed effects models, and then takes the discussion beyond regression and analysis of variance to conclude. Pedagogically, the authors introduce theory and methodological basis topic by topic, present a problem as an application, followed by a SAS analysis of the data provided and a discussion of results. The text focuses on applied statistical problems and methods. Key features include: end of chapter exercises, downloadable SAS code and data sets, and advanced material suitab...
The interpretation of quantitative microbial data
DEFF Research Database (Denmark)
Ribeiro Duarte, Ana Sofia
, there are several distribution alternatives available to describe concentrations and several methods to fit distributions to bacterial data; on the other hand, predictive models are built based on controlled laboratory experiments of microbial behaviour, and may not be appropriate to apply in the context of real food...... zeroes as censored below a quantification threshold. The method that is presented estimates the prevalence of contamination within a food lot and the parameters (mean and standard deviation) characterizing the within-lot distribution of concentrations, without assuming a LOQ, and using raw plate count....... Perspectives of future work include the validation of the method developed in manuscript I with real data, and its presentation as a tool made available to the scientific community by developing, for example, a working package for the statistical software R. Also, the author expects that a standardized way...
Advanced statistical methods in data science
Chen, Jiahua; Lu, Xuewen; Yi, Grace; Yu, Hao
2016-01-01
This book gathers invited presentations from the 2nd Symposium of the ICSA-Canada Chapter held at the University of Calgary from August 4-6, 2015. The aim of this Symposium was to promote advanced statistical methods in big-data sciences, to allow researchers to exchange ideas on statistics and data science, and to embrace the challenges and opportunities of statistics and data science in the modern world. It addresses diverse themes in advanced statistical analysis in big-data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, longitudinal and functional data analysis, the design and analysis of studies with response-dependent and multi-phase designs, time series and robust statistics, and statistical inference based on likelihood, empirical likelihood and estimating functions. The editorial group selected 14 high-quality presentations from this successful symposium and invited the presenters to prepare a fu...
HistFitter software framework for statistical data analysis
Baak, M.; Côte, D.; Koutsman, A.; Lorenz, J.; Short, D.
2015-01-01
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fitted to data and interpreted with statistical tests. A key innovation of HistFitter is its design, which is rooted in core analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its very fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with mu...
Handbook of univariate and multivariate data analysis and interpretation with SPSS
Ho, Robert
2006-01-01
Many statistics texts tend to focus more on the theory and mathematics underlying statistical tests than on their applications and interpretation. This can leave readers with little understanding of how to apply statistical tests or how to interpret their findings. While the SPSS statistical software has done much to alleviate the frustrations of social science professionals and students who must analyze data, they still face daunting challenges in selecting the proper tests, executing the tests, and interpreting the test results. With emphasis firmly on such practical matters, this handbook
Distributed data collection for a database of radiological image interpretations
Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.
1997-01-01
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window based 2048 X 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Methods for statistical data analysis of multivariate observations
Gnanadesikan, R
1997-01-01
A practical guide for multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of
Statistical methods for astronomical data analysis
Chattopadhyay, Asis Kumar
2014-01-01
This book introduces “Astrostatistics” as a subject in its own right with rewarding examples, including work by the authors with galaxy and Gamma Ray Burst data to engage the reader. This includes a comprehensive blending of Astrophysics and Statistics. The first chapter’s coverage of preliminary concepts and terminologies for astronomical phenomena will appeal to both Statistics and Astrophysics readers as helpful context. Statistics concepts covered in the book provide a methodological framework. A unique feature is the inclusion of different possible sources of astronomical data, as well as software packages for converting the raw data into appropriate forms for data analysis. Readers can then use the appropriate statistical packages for their particular data analysis needs. The ideas of statistical inference discussed in the book help readers determine how to apply statistical tests. The authors cover different applications of statistical techniques already developed or specifically introduced for ...
Statistical Models and Methods for Lifetime Data
Lawless, Jerald F
2011-01-01
Praise for the First Edition: "An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ." (Choice); "This is an important book, which will appeal to statisticians working on survival analysis problems." (Biometrics); "A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook." (Statistics in Medicine). The statistical analysis of lifetime or response time data is a key tool in engineering,
Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics
Dowding, Irene; Haufe, Stefan
2018-01-01
Hierarchically organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
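The core idea of using within-subject variances, rather than treating every subject mean as equally reliable, can be sketched as an inverse-variance weighted group mean. This is a minimal sketch under simplifying assumptions: it uses a fixed-effect model (no between-subject variance) and a normal approximation for the test, and the function names are hypothetical. It is not the authors' exact procedure, which may also account for between-subject variability.

```python
import math
from statistics import mean, variance

def subject_summary(trials):
    """Return (subject mean, squared standard error of that mean) from trial-level values."""
    if len(trials) < 2:
        raise ValueError("need at least two trials to estimate within-subject variance")
    return mean(trials), variance(trials) / len(trials)

def inverse_variance_group_test(subjects):
    """Fixed-effect, inverse-variance weighted group mean with a two-sided z test against 0.

    Assumes no between-subject variance and a normal approximation; a subject whose
    trials are all identical (zero variance) would cause a division by zero.
    """
    summaries = [subject_summary(trials) for trials in subjects]
    weights = [1.0 / v for _, v in summaries]  # more precise subjects get larger weights
    total_weight = sum(weights)
    group_mean = sum(w * m for w, (m, _) in zip(weights, summaries)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    z = group_mean / standard_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p from the standard normal
    return group_mean, z, p_two_sided

# Hypothetical trial-level data: the second subject is far noisier than the first
subjects = [[0.9, 1.0, 1.1], [-5.0, 5.0, 0.0]]
print(inverse_variance_group_test(subjects))
```

In this toy input the noisy second subject contributes almost nothing to the group mean, whereas the naive approach described in the abstract would average the two subject means (1.0 and 0.0) with equal weight.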
Interpretation of the results of statistical measurements. [search for basic probability model]
Olshevskiy, V. V.
1973-01-01
For random processes, the calculated probability characteristic and the measured statistical estimate are used in a quality functional, which defines the difference between the two functions. Based on the assumption that the statistical measurement procedure is organized so that the parameters for a selected model are optimized, it is shown that the interpretation of experimental research is a search for a basic probability model.
Shafieloo, Arman
2012-01-01
By introducing Crossing functions and hyper-parameters, I show that the Bayesian interpretation of the Crossing Statistics [1] can easily be used for model selection among cosmological models. In this approach, falsifying a cosmological model requires neither comparing it with other models nor assuming any particular parametrization of cosmological quantities such as the luminosity distance, the Hubble parameter, or the equation of state of dark energy. Instead, the hyper-parameters of the Crossing functions act as discriminators between correct and wrong models. Using this approach, one can falsify any assumed cosmological model without putting priors on the underlying actual model of the universe and its parameters; hence the issue of dark energy parametrization is resolved. It is also shown that the method has low sensitivity to the intrinsic dispersion of the data, which is another important characteristic when testing cosmological models against data with high uncertainties.
Systematic interpretation of microarray data using experiment annotations
Frohme, Marcus
2006-12-01
Background: Up to now, microarray data have mostly been assessed in the context of only one or a few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data when they are available in a statistically accessible format. Results: We provide means to preprocess these additional data and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well suited for mapping such extracted traits. It visualizes associations both among and between the traits, the annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single-channel) and two-channel data, stemming from model organisms such as yeast and Drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion: Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details.
Statistical Literacy: Data Tell a Story
Sole, Marla A.
2016-01-01
Every day, students collect, organize, and analyze data to make decisions. In this data-driven world, people need to assess how much trust they can place in summary statistics. The results of every survey and the safety of every drug that undergoes a clinical trial depend on the correct application of appropriate statistics. Recognizing the…
Braun, Stefan; Pokorná, Šárka; Šachl, Radek; Hof, Martin; Heerklotz, Heiko; Hoernke, Maria
2018-01-23
The mode of action of membrane-active molecules, such as antimicrobial, anticancer, cell-penetrating, and fusion peptides and their synthetic mimics, transfection agents, drug permeation enhancers, and biological signaling molecules (e.g., quorum sensing), involves either the general or local destabilization of the target membrane or the formation of defined, rather stable pores. Some effects aim at killing the cell, while others need to be limited in space and time to avoid serious damage. Biological tests reveal translocation of compounds and cell death but do not provide a detailed, mechanistic, and quantitative understanding of the modes of action and their molecular basis. Model membrane studies of membrane leakage have been used for decades to tackle this issue, but their interpretation in terms of biology has remained challenging and often quite limited. Here we compare two recent, powerful protocols to study model membrane leakage: the microscopic detection of dye influx into giant liposomes and time-correlated single photon counting experiments to characterize dye efflux from large unilamellar vesicles. A statistical treatment of both data sets not only harmonizes apparent discrepancies but also makes us aware of fundamental issues that have so far confused the interpretation of model membrane leakage data. Moreover, our study reveals a fundamental difference between nano- and microscale systems that needs to be taken into account when conclusions about microscale objects, such as cells, are drawn from nanoscale models.
Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.
2014-12-01
An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in the literature yield single numbers that characterize the environmental impact of a process/product, typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is lack of data that can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjunction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.
Particle interpretations of the PVLAS data
International Nuclear Information System (INIS)
Ringwald, A.
2007-04-01
Recently the PVLAS collaboration reported the observation of a rotation of linearly polarized laser light induced by a transverse magnetic field, a signal not expected within standard QED. In this review, we emphasize two mechanisms which have been proposed to explain this result: production of a single light neutral spin-zero particle or pair production of light minicharged particles. We discuss a class of models involving, in addition to our familiar "visible" photon, further light "hidden paraphotons", which mix kinematically with the visible one, and further light paracharged particles. In these models, very strong astrophysical and cosmological bounds on the weakly interacting light particles mentioned above can be evaded. In the upcoming year, a number of decisive laboratory-based tests of the particle interpretation of the PVLAS anomaly will be done. More generally, such experiments, exploiting high fluxes of low-energy photons and/or large electromagnetic fields, will dig into previously unconstrained parameter space of the above-mentioned models. (orig.)
Interpretation of lunar heat flow data
International Nuclear Information System (INIS)
Conel, J.E.; Morton, J.B.
1975-01-01
Lunar heat flow observations at the Apollo 15 and 17 sites can be interpreted to imply bulk U concentrations for the Moon of 5 to 8 times those of normal chondrites and 2 to 4 times terrestrial values inferred from the Earth's heat flow and the assumption of thermal steady state between surface heat flow and heat production. A simple model of near-surface structure that takes into account the large difference in (highly insulating) regolith thickness between mare and highland provinces is considered. This model predicts atypically high local values of heat flow near the margins of mare regions, possibly a factor of 10 or so higher than the global average. A test of the proposed model using multifrequency microwave techniques appears possible, wherein heat flow traverse measurements are made across mare-highland contacts. The theoretical considerations discussed here urge caution in attributing global significance to point heat-flow measurements on the Moon.
Statistical Challenges in "Big Data" Human Neuroimaging.
Smith, Stephen M; Nichols, Thomas E
2018-01-17
Smith and Nichols discuss "big data" human neuroimaging studies, with very large subject numbers and amounts of data. These studies provide great opportunities for making new discoveries about the brain but raise many new analytical challenges and interpretational risks. Copyright © 2017 Elsevier Inc. All rights reserved.
Collecting operational event data for statistical analysis
Atwood, C.L.
1994-09-01
This report gives guidance for collecting operational data to be used for statistical analysis, especially analysis of event counts. It discusses how to define the purpose of the study, the unit (system, component, etc.) to be studied, events to be counted, and demand or exposure time. Examples are given of classification systems for events in the data sources. A checklist summarizes the essential steps in data collection for statistical analysis.
Statistical data filtration in neutron coincidence counting
International Nuclear Information System (INIS)
Beddingfield, D.H.; Menlove, H.O.
1992-11-01
We assessed the effectiveness of statistical data filtration in minimizing the contribution of matrix materials in 200-L drums to the nondestructive assay of plutonium. The following matrices were examined: polyethylene, concrete, aluminum, iron, cadmium, and lead. Statistical filtration of neutron coincidence data improved the low-end sensitivity of coincidence counters. Spurious data arising from electrical noise, matrix spallation, and geometric effects were smoothed in a predictable fashion by the statistical filter. The filter effectively lowers the minimum detectable mass limit that can be achieved for plutonium assay using passive neutron coincidence counting.
Statistical data fusion for cross-tabulation
Kamakura, W.A.; Wedel, M.
The authors address the situation in which a researcher wants to cross-tabulate two sets of discrete variables collected in independent samples, but a subset of the variables is common to both samples. The authors propose a statistical data-fusion model that allows for statistical tests of
Statistical Literacy in the Data Science Workplace
Grant, Robert
2017-01-01
Statistical literacy, the ability to understand and make use of statistical information including methods, has particular relevance in the age of data science, when complex analyses are undertaken by teams from diverse backgrounds. Communication is essential not only with the consumers of information but also within the team. Writing from the…
Gerrits, Reinie G; Kringos, Dionne S; van den Berg, Michael J; Klazinga, Niek S
2018-03-07
Policy-makers, managers, scientists, patients and the general public are confronted daily with figures on health and healthcare through public reporting in newspapers, webpages and press releases. However, information on the key characteristics of these figures necessary for their correct interpretation is often not adequately communicated, which can lead to misinterpretation and misinformed decision-making. The objective of this research was to map the key characteristics relevant to the interpretation of figures on health and healthcare, and to develop a Figure Interpretation Assessment Tool-Health (FIAT-Health) through which figures on health and healthcare can be systematically assessed, allowing for a better interpretation of these figures. The abovementioned key characteristics of figures on health and healthcare were identified through systematic expert consultations in the Netherlands on four topic categories of figures, namely morbidity, healthcare expenditure, healthcare outcomes and lifestyle. The identified characteristics were used as a frame for the development of the FIAT-Health. Development of the tool and its content was supported and validated through regular review by a sounding board of potential users. Identified characteristics relevant for the interpretation of figures in the four categories relate to the figures' origin, credibility, expression, subject matter, population and geographical focus, time period, and underlying data collection methods. The characteristics were translated into a set of 13 dichotomous and 4-point Likert scale questions constituting the FIAT-Health, and two final assessment statements. Users of the FIAT-Health were provided with a summary overview of their answers to support a final assessment of the correctness of a figure and the appropriateness of its reporting. FIAT-Health can support policy-makers, managers, scientists, patients and the general public to systematically assess the quality of publicly reported
An Optimization Framework for Travel Pattern Interpretation of Cellular Data
Sarit Freund
2013-09-01
This paper explores methods for identifying travel patterns from cellular data. A primary challenge in this research is to provide an interpretation of the raw data that distinguishes between activity durations and travel durations. A novel framework is proposed for this purpose, based on a grading scheme for candidate interpretations of the raw data. A genetic algorithm is used to find interpretations with high grades, which are considered the most reasonable ones. The proposed method is tested on a dataset of records covering 9454 cell-phone users over a period of one week. A preliminary evaluation of the resulting interpretations is presented.
Workshop statistics discovery with data and Minitab
Rossman, Allan J
1998-01-01
Shorn of all subtlety and led naked out of the protective fold of educational research literature, there comes a sheepish little fact: lectures don't work nearly as well as many of us would like to think. -George Cobb (1992) This book contains activities that guide students to discover statistical concepts, explore statistical principles, and apply statistical techniques. Students work toward these goals through the analysis of genuine data and through interaction with one another, with their instructor, and with technology. Providing a one-semester introduction to fundamental ideas of statistics for college and advanced high school students, Workshop Statistics is designed for courses that employ an interactive learning environment by replacing lectures with hands-on activities. The text contains enough expository material to stand alone, but it can also be used to supplement a more traditional textbook. Some distinguishing features of Workshop Statistics are its emphases on active learning, conceptu
Topology for statistical modeling of petascale data.
Pascucci, Valerio (University of Utah, Salt Lake City, UT); Mascarenhas, Ajith Arthur; Rusek, Korben (Texas A&M University, College Station, TX); Bennett, Janine Camille; Levine, Joshua (University of Utah, Salt Lake City, UT); Pebay, Philippe Pierre; Gyulassy, Attila (University of Utah, Salt Lake City, UT); Thompson, David C.; Rojas, Joseph Maurice (Texas A&M University, College Station, TX)
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
Autonomic Differentiation Map: A Novel Statistical Tool for Interpretation of Heart Rate Variability
Directory of Open Access Journals (Sweden)
Daniela Lucini
2018-04-01
In spite of the large body of evidence suggesting that Heart Rate Variability (HRV), alone or combined with blood pressure variability (providing an estimate of baroreflex gain), is a useful technique to assess the autonomic regulation of the cardiovascular system, there is still an ongoing debate about methodology, interpretation, and clinical applications. In the present investigation, we hypothesize that non-parametric and multivariate exploratory statistical manipulation of HRV data could provide a novel informational tool useful to differentiate normal controls from clinical groups, such as athletes or subjects affected by obesity, hypertension, or stress. With a data-driven protocol in 1,352 ambulant subjects, we compute HRV and baroreflex indices from short-term data series as proxies of autonomic (ANS) regulation. We apply a three-step statistical procedure, first removing age and gender effects. Subsequently, by factor analysis, we extract four ANS latent domains that retain the large majority of information (86.94%), subdivided into oscillatory (40.84%), amplitude (18.04%), pressure (16.48%), and pulse (11.58%) domains. Finally, we test the overall capacity to differentiate clinical groups vs. controls. To give more practical value and improve readability, statistical results concerning individual discriminant ANS proxies and ANS differentiation profiles are displayed through dedicated graphical tools, i.e., the significance diagram and the ANS differentiation map, respectively. This approach, which simultaneously uses all available information about the system, shows which domains make up the difference in ANS discrimination. For example, athletes differ from controls in all domains, but with graded strength: maximal in the (normalized) oscillatory and pulse domains, slightly less in the pressure domain, and minimal in the amplitude domain. The application of multiple (non-parametric) and exploratory statistical and graphical tools to ANS proxies defines
Basic statistical tools in research and data analysis
Directory of Open Access Journals (Sweden)
Zulfiqar Ali
2016-01-01
Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretations and reporting the research findings. Statistical analysis gives meaning to otherwise meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
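As a small illustration of the measures of central tendency mentioned above, the sketch below applies Python's standard statistics module to a made-up, right-skewed sample; the data and the helper name central_tendency are illustrative assumptions, not taken from the article. It shows how a single extreme value pulls the mean away from the median and mode, which is one reason the choice of summary statistic matters for skewed data.

```python
from statistics import mean, median, mode

def central_tendency(values):
    """Return the three classic measures of central tendency for a sample."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}

# Made-up right-skewed sample: the single large value (18) inflates the mean only
sample = [2, 3, 3, 4, 18]
print(central_tendency(sample))  # {'mean': 6, 'median': 3, 'mode': 3}
```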
Telemetry Boards Interpret Rocket, Airplane Engine Data
2009-01-01
For all the data gathered by the space shuttle while in orbit, NASA engineers are just as concerned about the information it generates on the ground. From the moment the shuttle's wheels touch the runway to the break of its electrical umbilical cord at 0.4 seconds before its next launch, sensors feed streams of data about the status of the vehicle and its various systems to Kennedy Space Center's shuttle crews. Even while the shuttle orbiter is refitted in Kennedy's orbiter processing facility, engineers constantly monitor everything from power levels to the testing of the mechanical arm in the orbiter's payload bay. On the launch pad and up until liftoff, the Launch Control Center, attached to the large Vehicle Assembly Building, screens all of the shuttle's vital data. (Once the shuttle clears its launch tower, this responsibility shifts to Mission Control at Johnson Space Center, with Kennedy in a backup role.) Ground systems for satellite launches also generate significant amounts of data. At Cape Canaveral Air Force Station, across the Banana River from Kennedy's location on Merritt Island, Florida, NASA rockets carrying precious satellite payloads into space flood the Launch Vehicle Data Center with sensor information on temperature, speed, trajectory, and vibration. The remote measurement and transmission of systems data called telemetry is essential to ensuring the safe and successful launch of the Agency's space missions. When a launch is unsuccessful, as it was for this year's Orbiting Carbon Observatory satellite, telemetry data also provides valuable clues as to what went wrong and how to remedy any problems for future attempts. All of this information is streamed from sensors in the form of binary code: strings of ones and zeros. One small company has partnered with NASA to provide technology that renders raw telemetry data intelligible not only for Agency engineers, but also for those in the private sector.
Classification, (big) data analysis and statistical learning
Conversano, Claudio; Vichi, Maurizio
2018-01-01
This edited book focuses on the latest developments in classification, statistical learning, data analysis and related areas of data science, including statistical analysis of large datasets, big data analytics, time series clustering, integration of data from different sources, as well as social networks. It covers both methodological aspects and applications to a wide range of areas such as economics, marketing, education, social sciences, medicine, environmental sciences and the pharmaceutical industry. In addition, it describes the basic features of the software behind the data analysis results, and provides links to the corresponding codes and data sets where necessary. This book is intended for researchers and practitioners who are interested in the latest developments and applications in the field. The peer-reviewed contributions were presented at the 10th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in Santa Margherita di Pul...
Component fragilities - data collection, analysis and interpretation
International Nuclear Information System (INIS)
Bandyopadhyay, K.K.; Hofmayer, C.H.
1986-01-01
As part of the component fragility research program sponsored by the US Nuclear Regulatory Commission, BNL is involved in establishing seismic fragility levels for various nuclear power plant equipment with emphasis on electrical equipment, by identifying, collecting and analyzing existing test data from various sources. BNL has reviewed approximately seventy test reports to collect fragility or high level test data for switchgears, motor control centers and similar electrical cabinets, valve actuators and numerous electrical and control devices of various manufacturers and models. Through a cooperative agreement, BNL has also obtained test data from EPRI/ANCO. An analysis of the collected data reveals that fragility levels can best be described by a group of curves corresponding to various failure modes. The lower bound curve indicates the initiation of malfunctioning or structural damage, whereas the upper bound curve corresponds to overall failure of the equipment based on known failure modes occurring separately or interactively. For some components, the upper and lower bound fragility levels are observed to vary appreciably depending upon the manufacturers and models. An extensive amount of additional fragility or high level test data exists. If completely collected and properly analyzed, the entire data bank is expected to greatly reduce the need for additional testing to establish fragility levels for most equipment.
Challenges in computational statistics and data mining
Mielniczuk, Jan
2016-01-01
This volume contains nineteen research papers belonging to the areas of computational statistics, data mining, and their applications. Those papers, all written specifically for this volume, are their authors’ contributions to honour and celebrate Professor Jacek Koronacki on the occasion of his 70th birthday. The book’s related and often interconnected topics represent Jacek Koronacki’s research interests and their evolution. They also clearly indicate how close the areas of computational statistics and data mining are.
Advances in statistical models for data analysis
Minerva, Tommaso; Vichi, Maurizio
2015-01-01
This edited volume focuses on recent research results in classification, multivariate statistics and machine learning and highlights advances in statistical models for data analysis. The volume provides both methodological developments and contributions to a wide range of application areas such as economics, marketing, education, social sciences and environment. The papers in this volume were first presented at the 9th biennial meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in September 2013 at the University of Modena and Reggio Emilia, Italy.
Component fragilities. Data collection, analysis and interpretation
Bandyopadhyay, K.K.; Hofmayer, C.H.
1985-01-01
As part of the component fragility research program sponsored by the US NRC, BNL is involved in establishing seismic fragility levels for various nuclear power plant equipment with emphasis on electrical equipment. To date, BNL has reviewed approximately seventy test reports to collect fragility or high level test data for switchgears, motor control centers and similar electrical cabinets, valve actuators and numerous electrical and control devices, e.g., switches, transmitters, potentiometers, indicators, relays, etc., of various manufacturers and models. BNL has also obtained test data from EPRI/ANCO. Analysis of the collected data reveals that fragility levels can best be described by a group of curves corresponding to various failure modes. The lower bound curve indicates the initiation of malfunctioning or structural damage, whereas the upper bound curve corresponds to overall failure of the equipment based on known failure modes occurring separately or interactively. For some components, the upper and lower bound fragility levels are observed to vary appreciably depending upon the manufacturers and models. For some devices, testing even at the shake table vibration limit does not exhibit any failure. Failure of a relay is observed to be a frequent cause of failure of an electrical panel or a system. An extensive amount of additional fragility or high level test data exists.
Statistical treatment of fatigue test data
Raske, D.T.
1980-01-01
This report discusses several aspects of fatigue data analysis in order to provide a basis for the development of statistically sound design curves. Included is a discussion on the choice of the dependent variable, the assumptions associated with least squares regression models, the variability of fatigue data, the treatment of data from suspended tests and outlying observations, and various strain-life relations.
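The role of the dependent variable in a least squares design-curve fit can be made concrete with an ordinary least squares fit of a power-law strain-life relation, log10(N) = a + b log10(strain), treating log life as the response. This is a minimal sketch, not the report's procedure: the power-law form, the function name fit_log_life and the synthetic, noise-free data are assumptions, and suspended (run-out) tests are simply not handled.

```python
from math import log10
from statistics import mean


def fit_log_life(strains, cycles):
    """OLS fit of log10(N) = a + b * log10(strain), with log life as the dependent variable.

    Run-out (suspended) tests are not handled: every point is treated as a failure.
    """
    x = [log10(s) for s in strains]
    y = [log10(n) for n in cycles]
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx  # slope of log life against log strain
    a = ybar - b * xbar  # intercept
    return a, b


# Synthetic data generated exactly from log10(N) = -1 - 3 * log10(strain)
strains = [0.001, 0.002, 0.005, 0.01]
cycles = [10 ** (-1 - 3 * log10(s)) for s in strains]
a, b = fit_log_life(strains, cycles)
print(round(a, 6), round(b, 6))
```

With scattered data, regressing log strain on log life instead would generally give a different line, which is why the choice of dependent variable matters for a design curve.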
Statistical analysis of next generation sequencing data
Nettleton, Dan
2014-01-01
This book offers a comprehensive approach to multivariate statistical analyses. It provides theoretical knowledge of the concepts underlying the most important multivariate techniques and an overview of actual applications.
Michelle Redman-MacLaren
2014-08-01
Background: Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. Objective: To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. Design: A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or ‘chunks’ of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. Results: New understandings of the data were evoked when women in interpretive focus groups analysed the data ‘chunks’. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Conclusions: Interpretive focus groups offer an opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action.
Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael
2014-01-01
Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or 'chunks' of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. New understandings of the data were evoked when women in interpretive focus groups analysed the data 'chunks'. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Interpretive focus groups offer an opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Suppose one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target, we partition the sample into V equal-size subsamples and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and a corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data-adaptive statistical target parameter as the average of these V sample-specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data-adaptive target parameter. This general methodology for generating data-adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference in problems that are increasingly being addressed by clever, yet ad hoc, pattern-finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules, are shown and give insight into how to structure such an approach. The results show that the data-adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
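The sample-splitting construction described above can be sketched in a few lines. This is an illustration under assumptions, not the authors' estimator: the data-adaptive rule (choose the column with the largest mean in the parameter-generating sample), the function names and the toy data are all hypothetical, and no variance estimate or confidence interval is computed.

```python
import random
from statistics import mean


def make_folds(n, v, seed=0):
    """Randomly partition indices 0..n-1 into v subsamples of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::v] for k in range(v)]


def select_target(rows):
    """Hypothetical data-adaptive rule: choose the column with the largest sample mean."""
    return max(range(len(rows[0])), key=lambda j: mean(r[j] for r in rows))


def sample_split_estimate(rows, v=5, seed=0):
    """Average over v splits of the estimation-sample mean of the adaptively chosen column."""
    estimates = []
    for fold in make_folds(len(rows), v, seed):
        held_out = set(fold)
        # The complementary parameter-generating sample defines the target ...
        param_gen = [r for i, r in enumerate(rows) if i not in held_out]
        j = select_target(param_gen)
        # ... and the held-out estimation sample estimates it.
        estimates.append(mean(rows[i][j] for i in fold))
    return mean(estimates)


rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0.5, 1), rng.gauss(0.2, 1)) for _ in range(200)]
print(sample_split_estimate(data, v=5))
```

Because each target is chosen on data disjoint from the data used to estimate it, the selection step does not bias the held-out estimate in the way that choosing and estimating on the same observations would.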
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 Publication Manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
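The dependence of p values on sample size that motivates "what if" analyses can be shown by holding a standardized mean difference d fixed and varying the per-group n. The sketch below is an assumption-laden illustration, not the authors' proposed method: it uses a large-sample normal approximation to the two-sample t test (z = d * sqrt(n / 2) for equal groups), and the function name what_if_p is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def what_if_p(d, n_per_group):
    """Two-sided p value for a fixed standardized mean difference d at a given per-group n.

    Large-sample normal approximation to the two-sample t test with equal group sizes:
    z = d * sqrt(n / 2).
    """
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same effect size, different hypothetical sample sizes
for n in (10, 20, 50, 100):
    print(n, round(what_if_p(0.5, n), 4))
```

The effect size never changes, yet the same d = 0.5 moves from "non-significant" to "highly significant" as n grows. At small n the normal approximation is somewhat optimistic relative to the t distribution.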
Statistics and analysis of scientific data
Bonamente, Massimiliano
2013-01-01
Statistics and Analysis of Scientific Data covers the foundations of probability theory and statistics, and a number of numerical and analytical methods that are essential for the present-day analyst of scientific data. Topics covered include probability theory, distribution functions of statistics, fits to two-dimensional datasheets and parameter estimation, Monte Carlo methods and Markov chains. Equal attention is paid to the theory and its practical application, and results from classic experiments in various fields are used to illustrate the importance of statistics in the analysis of scientific data. The main pedagogical method is a theory-then-application approach, where emphasis is placed first on a sound understanding of the underlying theory of a topic, which becomes the basis for an efficient and proactive use of the material for practical applications. The level is appropriate for undergraduates and beginning graduate students, and as a reference for the experienced researcher. Basic calculus is
Interpretation of elevated temperature fatigue data
Energy Technology Data Exchange (ETDEWEB)
Tomkins, B.
1976-06-15
The general problem of thermal cycling is examined in relation to component stress-strain response and failure development. The complexities of component failure analysis under thermal cycling conditions are clear. It is also clear that no single test procedure will enable good assessment and design rules to be delineated. More uniaxial plain specimen endurance data and simple crack propagation data are needed. But also more complex specimen test information is needed on specimens which simulate more closely the thermal stress-strain fields encountered in components. In addition, the role of environment and material metallurgical changes with time in long term failure development requires a better understanding. One inbuilt safety feature of many thermal cycling situations is the tendency to crack arrest conditions within the bulk material. The bounds on this factor are, however, not clear, particularly when additional mechanical loads or constraints are present, as these can produce accelerating rather than decelerating crack conditions.
HistFitter software framework for statistical data analysis
Energy Technology Data Exchange (ETDEWEB)
Baak, M. [CERN, Geneva (Switzerland)]; Besjes, G.J. [Radboud University Nijmegen, Nijmegen (Netherlands); Nikhef, Amsterdam (Netherlands)]; Cote, D. [University of Texas, Arlington (United States)]; Koutsman, A. [TRIUMF, Vancouver (Canada)]; Lorenz, J. [Ludwig-Maximilians-Universitaet Muenchen, Munich (Germany); Excellence Cluster Universe, Garching (Germany)]; Short, D. [University of Oxford, Oxford (United Kingdom)]
2015-04-15
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface. (orig.)
HistFitter software framework for statistical data analysis
International Nuclear Information System (INIS)
Baak, M.; Besjes, G.J.; Cote, D.; Koutsman, A.; Lorenz, J.; Short, D.
2015-01-01
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface. (orig.)
Bayesian methods for interpreting plutonium urinalysis data
Miller, G.; Inkret, W.C.
1995-01-01
The authors discuss an internal dosimetry problem, where measurements of plutonium in urine are used to calculate radiation doses. The authors have developed an algorithm using the MAXENT method. The method gives reasonable results; however, the role of the entropy prior distribution is to effectively fit the urine data using intakes occurring close in time to each measured urine result, which is unrealistic. A better approximation for the actual prior is the log-normal distribution; however, with the log-normal distribution another calculational approach must be used. Instead of calculating the most probable values, they turn to calculating expectation values directly from the posterior probability, which is feasible for a small number of intakes.
Interpretation of Genomic Data Questions and Answers
Simon, Richard
2008-01-01
Using a question and answer format we describe important aspects of using genomic technologies in cancer research. The main challenges are not managing the mass of data, but rather the design, analysis and accurate reporting of studies that result in increased biological knowledge and medical utility. Many analysis issues address the use of expression microarrays but are also applicable to other whole genome assays. Microarray based clinical investigations have generated both unrealistic hyperbole and excessive skepticism. Genomic technologies are tremendously powerful and will play instrumental roles in elucidating the mechanisms of oncogenesis and in developing an era of predictive medicine in which treatments are tailored to individual tumors. Achieving these goals involves challenges in re-thinking many paradigms for the conduct of basic and clinical cancer research and for the organization of interdisciplinary collaboration. PMID:18582627
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Udey, Ruth Norma [Michigan State Univ., East Lansing, MI (United States)]
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Statistical application of groundwater monitoring data at the Hanford Site
International Nuclear Information System (INIS)
Chou, C.J.; Johnson, V.G.; Hodges, F.N.
1993-09-01
Effective use of groundwater monitoring data requires both statistical and geohydrologic interpretations. At the Hanford Site in south-central Washington state such interpretations are used for (1) detection monitoring, assessment monitoring, and/or corrective action at Resource Conservation and Recovery Act sites; (2) compliance testing for operational groundwater surveillance; (3) impact assessments at active liquid-waste disposal sites; and (4) cleanup decisions at Comprehensive Environmental Response Compensation and Liability Act sites. Statistical tests such as the Kolmogorov-Smirnov two-sample test are used to test the hypothesis that chemical concentrations from spatially distinct subsets or populations are identical within the uppermost unconfined aquifer. Experience at the Hanford Site in applying groundwater background data indicates that background must be considered as a statistical distribution of concentrations, rather than a single value or threshold. The use of a single numerical value as a background-based standard ignores important information and may result in excessive or unnecessary remediation. Appropriate statistical evaluation techniques include the Wilcoxon rank sum test, the Quantile test, "hot spot" comparisons, and Kolmogorov-Smirnov types of tests. Application of such tests is illustrated with several case studies derived from Hanford groundwater monitoring programs. To avoid possible misuse of such data, an understanding of the limitations is needed. In addition to statistical test procedures, geochemical and hydrologic considerations are integral parts of the decision process. For this purpose a phased approach is recommended that proceeds from the simple to the more complex, and from an overview to detailed analysis.
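Treating background as a distribution rather than a single threshold is exactly what the two-sample Kolmogorov-Smirnov statistic does: it is the largest vertical gap between the two empirical distribution functions. A minimal pure-Python sketch, assuming two lists of concentration measurements; the function name ks_statistic and the numbers are hypothetical (not Hanford data), and only the statistic D is computed, not a p value.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(t) - F_y(t)|.

    The empirical CDFs only change at observed values, so it suffices
    to evaluate the difference at every pooled observation.
    """
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)  # fraction of x values <= t
        fy = bisect_right(ys, t) / len(ys)  # fraction of y values <= t
        d = max(d, abs(fx - fy))
    return d


# Hypothetical background vs. downgradient concentrations
background = [1.2, 0.8, 1.5, 1.1, 0.9]
downgradient = [1.9, 2.3, 1.4, 2.8, 2.0]
print(ks_statistic(background, downgradient))
```

In practice, scipy.stats.ks_2samp returns this statistic together with a p value; the hand-rolled version only shows what is being compared.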
Statistical Methods for Unusual Count Data
Guthrie, Katherine A.; Gammill, Hilary S.; Kamper-Jørgensen, Mads
2016-01-01
microchimerism data present challenges for statistical analysis, including a skewed distribution, excess zero values, and occasional large values. Methods for comparing microchimerism levels across groups while controlling for covariates are not well established. We compared statistical models for quantitative microchimerism values, applied to simulated data sets and 2 observed data sets, to make recommendations for analytic practice. Modeling the level of quantitative microchimerism as a rate via Poisson or negative binomial model with the rate of detection defined as a count of microchimerism genome equivalents per...
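Modeling microchimerism as a rate means dividing a count of detected genome equivalents by the amount of DNA examined. The simplest unadjusted version, sketched below, compares two groups with a pooled Poisson rate ratio and a Wald interval on the log scale. It is not the covariate-adjusted Poisson or negative binomial regression evaluated in the study; the function name rate_ratio and the example counts are hypothetical, and a negative binomial model would be preferable when counts are overdispersed.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def rate_ratio(counts_a, exposure_a, counts_b, exposure_b, level=0.95):
    """Poisson rate ratio (group A vs. B) with a Wald interval on the log scale.

    counts_*: detected microchimerism genome equivalents per subject;
    exposure_*: total genome equivalents tested per subject (the rate denominator).
    """
    ca, cb = sum(counts_a), sum(counts_b)
    if ca == 0 or cb == 0:
        raise ValueError("log-scale Wald interval needs nonzero totals in both groups")
    rr = (ca / sum(exposure_a)) / (cb / sum(exposure_b))
    se = sqrt(1 / ca + 1 / cb)  # standard error of log(rate ratio) under the Poisson model
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


# Hypothetical data: three subjects in group A, two in group B
rr, lo, hi = rate_ratio([2, 0, 3], [1000, 1000, 1000], [1, 1], [2000, 2000])
print(rr, lo, hi)
```

Pooling counts hides subject-to-subject variation such as the excess zeros described above, which is one reason regression models with an exposure offset are used instead.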
Topology for Statistical Modeling of Petascale Data
Energy Technology Data Exchange (ETDEWEB)
Bennett, Janine Camille [Sandia National Lab. (SNL-CA), Livermore, CA (United States)]; Pebay, Philippe Pierre [Sandia National Lab. (SNL-CA), Livermore, CA (United States)]; Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States)]; Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States)]; Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States)]; Rojas, Maurice [Texas A & M Univ., College Station, TX (United States)]
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
A Statistical Toolkit for Data Analysis
Donadio, S.; Guatelli, S.; Mascialino, B.; Pfeiffer, A.; Pia, M.G.; Ribon, A.; Viarengo, P.
2006-01-01
The present project aims to develop an open-source and object-oriented software Toolkit for statistical data analysis. Its statistical testing component contains a variety of goodness-of-fit tests, from chi-squared to Kolmogorov-Smirnov, to less known but generally much more powerful tests such as Anderson-Darling, Goodman, Fisz-Cramer-von Mises, Kuiper, and Tiku. Thanks to the component-based design and the use of standard abstract interfaces for data analysis, this tool can be used by other data analysis systems or integrated into experimental software frameworks. This Toolkit has been released and is downloadable from the web. In this paper we describe the statistical details of the algorithms and the computational features of the Toolkit, and discuss the code validation.
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes and proteins and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
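One widely used rule for large-scale simultaneous testing, such as screening thousands of genes at once, is the Benjamini-Hochberg step-up procedure, which controls the false discovery rate at level q. The sketch below illustrates that standard procedure under assumptions; the specific methods reviewed in the paper may differ, and the function name benjamini_hochberg and the p values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q.

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p value
    k = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose p value is under its threshold rank * q / m
        if pvalues[i] <= rank * q / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:  # reject every hypothesis up to that rank
        rejected[i] = True
    return rejected


# Hypothetical p values for three genes
print(benjamini_hochberg([0.02, 0.03, 0.06]))
```

The step-up logic matters here: 0.02 exceeds its own threshold (0.05/3), but it is still rejected because the larger p value 0.03 passes its threshold (2 * 0.05/3).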
Statistics and analysis of scientific data
Bonamente, Massimiliano
2017-01-01
The revised second edition of this textbook provides the reader with a solid foundation in probability theory and statistics as applied to the physical sciences, engineering and related fields. It covers a broad range of numerical and analytical methods that are essential for the correct analysis of scientific data, including probability theory, distribution functions of statistics, fits to two-dimensional data and parameter estimation, Monte Carlo methods and Markov chains. Features new to this edition include:
• a discussion of statistical techniques employed in business science, such as multiple regression analysis of multivariate datasets;
• a new chapter on the various measures of the mean including logarithmic averages;
• new chapters on systematic errors and intrinsic scatter, and on the fitting of data with bivariate errors;
• a new case study and additional worked examples;
• mathematical derivations and theoretical background material have been appropriately marked, to improve the readabili
The Seismic Analyzer: Interpreting and Illustrating 2D Seismic Data
Patel, Daniel; Giertsen, Christopher; Thurmond, John; Gjelberg, John; Gröller, Eduard
2008-01-01
We present a toolbox for quickly interpreting and illustrating 2D slices of seismic volumetric reflection data. Searching for oil and gas involves creating a structural overview of seismic reflection data to identify hydrocarbon reservoirs. We improve the search of seismic structures by precalculating the horizon structures of the seismic data prior to interpretation. We improve the annotation of seismic structures by applying novel illustrative rendering algorithms tailored to seism...
Model-Based Integration and Interpretation of Data
Petersen, Johannes
2004-01-01
Data integration and interpretation play a crucial role in supervisory control. The paper defines a set of generic inference steps for the data integration and interpretation process based on a three-layer model of system representations. The three-layer model is used to clarify the combination of constraint and object-centered representations of the work domain, throwing new light on the basic principles underlying the data integration and interpretation process of Rasmussen's abstraction hierarchy as well as other model-based approaches combining constraint and object-centered representations. Based...
Topology for Statistical Modeling of Petascale Data
Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States)]; Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States)]; Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States)]; Bremer, P. -T. [Univ. of Utah, Salt Lake City, UT (United States)]
2013-10-31
Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our 3 groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project; the Texas A&M group develops new algebraic geometry algorithms, in particular with fewnomial theory; and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videoconference every 2 weeks, so there is much synergy of ideas between the groups. The remainder of this document focuses on the contributions that had greater direct involvement from the team at the University of Utah in Salt Lake City.
Gas, electricity, coal: 1998 statistical data
International Nuclear Information System (INIS)
1999-01-01
This document brings together the main statistical data from the French Directorate of Gas, Electricity and Coal and presents a selection of the most significant figures: origin of production, breakdown of consumption, price levels, resources-employment status. These data are presented in a synthetic and accessible way in order to serve as useful references for the actors of the energy sector. (J.S.)
The seismic analyzer: interpreting and illustrating 2D seismic data.
Patel, Daniel; Giertsen, Christopher; Thurmond, John; Gjelberg, John; Gröller, M Eduard
2008-01-01
We present a toolbox for quickly interpreting and illustrating 2D slices of seismic volumetric reflection data. Searching for oil and gas involves creating a structural overview of seismic reflection data to identify hydrocarbon reservoirs. We improve the search of seismic structures by precalculating the horizon structures of the seismic data prior to interpretation. We improve the annotation of seismic structures by applying novel illustrative rendering algorithms tailored to seismic data, such as deformed texturing and line and texture transfer functions. The illustrative rendering results in multi-attribute and scale-invariant visualizations where features are represented clearly in both highly zoomed-in and zoomed-out views. Thumbnail views in combination with interactive appearance control allow for a quick overview of the data before detailed interpretation takes place. These techniques help reduce the work of seismic illustrators and interpreters.
Szabolcsi, Zoltán; Farkas, Zsuzsa; Borbély, Andrea; Bárány, Gusztáv; Varga, Dániel; Heinrich, Attila; Völgyi, Antónia; Pamjav, Horolma
2015-11-01
When the DNA profile from a crime scene matches that of a suspect, the weight of the DNA evidence depends on an unbiased estimate of the match probability of the profiles. For this reason, it is necessary to establish and expand databases that reflect the actual allele frequencies in the population to which they are applied. In total, 21,473 complete DNA profiles from Databank samples were used to establish the allele frequency database representing the population of Hungarian suspects. We used fifteen STR loci (PowerPlex ESI16), including five new ESS loci. The aim was to calculate the statistical and forensic efficiency parameters for the Databank samples and to compare the new data with the earlier report. The population substructure caused by relatedness may influence the estimated profile frequencies. As our Databank profiles were considered non-random samples, possible relationships between the suspects can be assumed. Therefore, the population inbreeding effect was estimated using the FIS calculation. The overall inbreeding parameter was found to be 0.0106. Furthermore, we tested the impact of the two allele frequency datasets on 101 randomly chosen STR profiles, including full and partial profiles. The 95% confidence interval estimates for the profile frequencies (pM) resulted in a tighter range when we used the new dataset compared to the previously published ones. We found that the FIS had less effect on frequency values in the 21,473 samples than the application of a minimum allele frequency. No genetic substructure was detected by STRUCTURE analysis. Due to the low level of inbreeding and the high number of samples, the new dataset provides unbiased and precise estimates of LR for the statistical interpretation of forensic casework and allows us to use lower allele frequencies. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
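The inbreeding parameter reported above is an F_IS estimate. As a hedged illustration of what such a calculation involves, the sketch below uses the textbook single-locus definition F_IS = 1 - H_obs/H_exp, with H_exp = 1 - sum(p_i^2) and no small-sample correction. The function name and the genotype encoding are assumptions; the study's overall value (0.0106) comes from the authors' own multi-locus computation, not from this code.

```python
from collections import Counter


def fis_single_locus(genotypes):
    """Estimate F_IS = 1 - H_obs / H_exp at a single STR locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    H_exp is the uncorrected expected heterozygosity 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")
    # Observed heterozygosity: share of individuals with two different alleles
    h_obs = sum(1 for a, b in genotypes if a != b) / n
    # Allele frequencies estimated from the 2n allele copies in the sample
    counts = Counter(allele for pair in genotypes for allele in pair)
    h_exp = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    if h_exp == 0.0:
        raise ValueError("monomorphic locus: F_IS is undefined")
    return 1.0 - h_obs / h_exp


# Hypothetical genotypes (repeat numbers) at one locus
sample = [(12, 12), (12, 13), (13, 13), (12, 13)]
print(fis_single_locus(sample))  # 0.0: observed and expected heterozygosity are both 0.5
```

Positive values indicate a deficit of heterozygotes (as expected with relatedness or inbreeding); negative values indicate an excess.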
Register-based statistics: statistical methods for administrative data
Wallgren, Anders
2014-01-01
This book provides a comprehensive and up-to-date treatment of theory and practical implementation in register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking.
Guler, Mustafa; Gursoy, Kadir; Guven, Bulent
2016-01-01
Understanding and interpreting biased data, decision-making in accordance with the data, and critically evaluating situations involving data are among the fundamental skills necessary in the modern world. To develop these required skills, emphasis on statistical literacy in school mathematics has been gradually increased in recent years. The…
Translation of EPA Research: Data Interpretation and Communication Strategies
Symposium Title: Social Determinants of Health, Environmental Exposures, and Disproportionately Impacted Communities: What We Know and How We Tell Others Topic 3: Community Engagement and Research Translation Title: Translation of EPA Research: Data Interpretation and Communicati...
Interpreting New Data from the High Energy Frontier
Energy Technology Data Exchange (ETDEWEB)
Thaler, Jesse [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)]
2016-09-26
This is the final technical report for DOE grant DE-SC0006389, "Interpreting New Data from the High Energy Frontier", describing research accomplishments by the PI in the field of theoretical high energy physics.
Performing Inferential Statistics Prior to Data Collection
Trafimow, David; MacDonald, Justin A.
2017-01-01
Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…
Lan, B.L.
2001-01-01
An alternative interpretation to Bohm's 'quantum force' and 'active information' is proposed. Numerical evidence is presented, which suggests that the time series of Bohm's 'quantum force' evaluated at the Bohmian position for non-stationary quantum states are typically non-Gaussian stable distributed with a flat power spectrum in classically chaotic Hamiltonian systems. An important implication of these statistical properties is briefly mentioned. (orig.)
Vapor Pressure Data Analysis and Statistics
2016-12-01
... near 8, 2000, and 200, respectively. The A (or a) value is directly related to vapor pressure and will be greater for high vapor pressure materials ... where n is the number of data points, Yi is the natural logarithm of the ith experimental vapor pressure value, and Xi is the ... Vapor Pressure Data Analysis and Statistics, ECBC-TR-1422, Ann Brozena, Research and Technology Directorate
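The excerpt above refers to a least-squares fit in which Yi is the natural logarithm of the i-th measured vapor pressure. The three values near 8, 2000 and 200 are consistent with a three-parameter Antoine-type equation, but the report's exact form is not recoverable from the excerpt. As a simpler, clearly labeled stand-in, the sketch below fits the two-parameter Clausius-Clapeyron form ln P = A - B/T by ordinary least squares, assuming Xi = 1/Ti with T in kelvin; it is not the ECBC-TR-1422 procedure.

```python
import math


def fit_clausius_clapeyron(temps_k, pressures):
    """Ordinary least-squares fit of ln P = A - B / T.

    Returns (A, B). Assumes positive pressures and at least two distinct temperatures.
    """
    xs = [1.0 / t for t in temps_k]          # X_i = 1 / T_i (assumed)
    ys = [math.log(p) for p in pressures]    # Y_i = ln P_i
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                        # equals -B
    intercept = mean_y - slope * mean_x      # equals A
    return intercept, -slope


# Synthetic, noise-free data generated from A = 20, B = 5000 (hypothetical values)
temps = [280.0, 290.0, 300.0, 310.0, 320.0]
press = [math.exp(20.0 - 5000.0 / t) for t in temps]
print(fit_clausius_clapeyron(temps, press))
```

Fitting on the log scale weights relative rather than absolute pressure errors, which is usually appropriate when pressures span orders of magnitude.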
Data Mining and Statistics for Decision Making
Tufféry, Stéphane
2011-01-01
Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized
Statistical Analysis of Data for Timber Strengths
Sørensen, John Dalsgaard
2003-01-01
Statistical analyses are performed for material strength parameters from a large number of specimens of structural timber. Non-parametric statistical analysis and fits have been investigated for the following distribution types: Normal, Lognormal, 2 parameter Weibull and 3-parameter Weibull...... fits to the data available, especially if tail fits are used whereas the Log Normal distribution generally gives a poor fit and larger coefficients of variation, especially if tail fits are used. The implications on the reliability level of typical structural elements and on partial safety factors...... for timber are investigated....
Interpretive Reporting of Protein Electrophoresis Data by Microcomputer
Talamo, Thomas S.; Losos, Frank J.; Kessler, G. Frederick
1982-01-01
A microcomputer-based system for interpretive reporting of protein electrophoretic data has been developed. Data for serum, urine and cerebrospinal fluid protein electrophoreses as well as immunoelectrophoresis can be entered. Patient demographic information is entered through the keyboard, followed by manual entry of total and fractionated protein levels obtained after densitometer scanning of the electrophoretic strip. The patterns are then coded, interpreted, and final reports generated. In most cases interpretation time is less than one second. Misinterpretation by computer is uncommon and can be corrected by edit functions within the system. These discrepancies between computer and pathologist interpretation are automatically stored in a data file for later review and possible program modification. Any or all previous tests on a patient may be reviewed with graphic display of the electrophoretic pattern. The system has been in use for several months and is presently well accepted by both laboratory and clinical staff. It also allows rapid storage, retrieval and analysis of protein electrophoretic data.
Dimensional enrichment of statistical linked open data
Varga, Jovan; Vaisman, Alejandro; Romero, Oscar
2016-01-01
On-Line Analytical Processing (OLAP) is a data analysis technique typically used for local and well-prepared data. However, initiatives like Open Data and Open Government bring new and publicly available data on the web that are to be analyzed in the same way. The use of semantic web technologies...... for this context is especially encouraged by the Linked Data initiative. There is already a considerable amount of statistical linked open data sets published using the RDF Data Cube Vocabulary (QB) which is designed for these purposes. However, QB lacks some essential schema constructs (e.g., dimension levels......) to support OLAP. Thus, the QB4OLAP vocabulary has been proposed to extend QB with the necessary constructs and be fully compliant with OLAP. In this paper, we focus on the enrichment of an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits...
Hunting Down Interpretations of the HERA Large-$Q^{2}$ data
Ellis, John R.
1999-01-01
Possible interpretations of the HERA large-Q^2 data are reviewed briefly. The possibility of statistical fluctuations cannot be ruled out, and it seems premature to argue that the H1 and ZEUS anomalies are incompatible. The data cannot be explained away by modifications of parton distributions, nor do contact interactions help. A leptoquark interpretation would need a large tau-q branching ratio. Several R-violating squark interpretations are still viable despite all the constraints, and offer interesting experimental signatures, but please do not hold your breath.
Uncertainty analysis with statistically correlated failure data
International Nuclear Information System (INIS)
Modarres, M.; Dezfuli, H.; Roush, M.L.
1987-01-01
Likelihood of occurrence of the top event of a fault tree or sequences of an event tree is estimated from the failure probabilities of the components that constitute the events of the fault/event tree. Component failure probabilities are subject to statistical uncertainties. In addition, there are cases where the failure data are statistically correlated. At present most fault tree calculations are based on uncorrelated component failure data. This chapter describes a methodology for assessing the probability intervals for the top event failure probability of fault trees or frequency of occurrence of event tree sequences when event failure data are statistically correlated. To estimate the mean and variance of the top event, a second-order system moment method is presented through Taylor series expansion, which provides an alternative to the normally used Monte Carlo method. For cases where component failure probabilities are statistically correlated, the Taylor expansion terms are treated properly. A moment-matching technique is used to obtain the probability distribution function of the top event by fitting the Johnson S_B distribution. The computer program CORRELATE was developed to perform the calculations necessary for the implementation of the method. (author)
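To make the moment idea concrete, here is a minimal sketch on a hypothetical three-component fault tree. It approximates the mean of the top-event probability to second order and its variance to first order, using finite-difference derivatives and a user-supplied covariance matrix of component failure probabilities, whose off-diagonal entries carry the correlation. The tree, the numbers and the function names are assumptions for illustration; this is not the CORRELATE program, and it omits the Johnson S_B fitting step.

```python
def top_event(p):
    """Hypothetical fault tree: TOP = (A AND B) OR C.

    For fixed component probabilities p = [pA, pB, pC] the basic events are
    treated as independent; correlation enters only through the uncertainty
    in the probability estimates themselves (the covariance matrix below).
    """
    pa, pb, pc = p
    return 1.0 - (1.0 - pa * pb) * (1.0 - pc)


def taylor_moments(f, mu, cov, h=1e-4):
    """Mean (second-order) and variance (first-order) of f(P) via Taylor series.

    mu:  mean component failure probabilities
    cov: covariance matrix of those probabilities; non-zero off-diagonal
         entries represent statistically correlated failure data.
    """
    n = len(mu)

    def f_shift(offsets):
        # offsets: list of (index, signed multiple of h) applied to mu
        x = list(mu)
        for idx, mult in offsets:
            x[idx] += mult * h
        return f(x)

    # Gradient by central differences
    grad = [(f_shift([(i, 1)]) - f_shift([(i, -1)])) / (2 * h) for i in range(n)]

    # Hessian by central differences (diagonal terms effectively use a step of 2h)
    hess = [[(f_shift([(i, 1), (j, 1)]) - f_shift([(i, 1), (j, -1)])
              - f_shift([(i, -1), (j, 1)]) + f_shift([(i, -1), (j, -1)])) / (4 * h * h)
             for j in range(n)] for i in range(n)]

    mean = f(list(mu)) + 0.5 * sum(hess[i][j] * cov[i][j]
                                   for i in range(n) for j in range(n))
    var = sum(grad[i] * grad[j] * cov[i][j] for i in range(n) for j in range(n))
    return mean, var


mu = [0.01, 0.02, 0.001]
cov = [[1e-6, 5e-7, 0.0],      # A and B failure data positively correlated
       [5e-7, 1e-6, 0.0],
       [0.0, 0.0, 1e-8]]
mean, var = taylor_moments(top_event, mu, cov)
print(mean, var)
```

For this multilinear tree the correlation term shifts the mean by (1 - pC)·cov(A, B), which an uncorrelated calculation would miss.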
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed the statistical mechanical Menzerath-Altmann model. The derived model allows the model parameters to be interpreted in terms of physical concepts. We also propose that many organizations presenting Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through a properly defined structure-dependent parameter and the energy associated with its states.
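The classical Menzerath-Altmann law is usually written y = a·x^b·exp(-c·x), where x is the size of a construct and y the mean size of its constituents. The sketch below fits that classical form, not the transformed statistical-mechanical model derived in the paper, by least squares on the log scale, where ln y = ln a + b·ln x - c·x is linear in the unknowns. Function names and data are illustrative assumptions.

```python
import math


def _solve_linear(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [val] for row, val in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit_menzerath_altmann(xs, ys):
    """Fit y = a * x**b * exp(-c * x); requires x > 0, y > 0 and >= 3 distinct x."""
    rows = [(1.0, math.log(x), -x) for x in xs]      # design matrix for (ln a, b, c)
    targets = [math.log(y) for y in ys]
    # Normal equations: (R^T R) beta = R^T t
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    att = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    ln_a, b, c = _solve_linear(ata, att)
    return math.exp(ln_a), b, c


# Noise-free synthetic data from a = 5, b = -0.3, c = 0.02 (hypothetical)
xs = list(range(1, 11))
ys = [5.0 * x ** -0.3 * math.exp(-0.02 * x) for x in xs]
print(fit_menzerath_altmann(xs, ys))
```

Least squares on ln y minimizes relative rather than absolute errors; a nonlinear fit on y itself would weight large constructs differently.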
Critical analysis of adsorption data statistically
Experimental data can be presented, computed, and critically analysed in a different way using statistics. A variety of statistical tests are used to make decisions about the significance and validity of the experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data were analysed statistically by hypothesis testing, applying the t test, paired t test and chi-square test to (a) test the optimum value of the process pH, (b) verify the success of the experiment and (c) study the effect of adsorbent dose on zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² showed results in favour of the data collected from the experiment, as shown on probability charts. The K value for the Langmuir isotherm was 0.8582 and the m value obtained for the Freundlich adsorption isotherm was 0.725, both for mango leaf powder.
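For readers who want to see the paired t test mentioned above spelled out, a minimal sketch follows: the statistic is the mean of the paired differences divided by its standard error, with n - 1 degrees of freedom, and it is then compared with a tabulated critical value as the authors describe. The before/after values are hypothetical, not the study's zinc-removal data.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for a paired t test on after - before differences."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    sd = statistics.stdev(diffs)          # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are equal; t is undefined")
    t = statistics.fmean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical paired measurements (e.g. removal % at two settings)
t, df = paired_t([1, 2, 3, 4], [2, 4, 4, 6])
print(t, df)   # t = 3*sqrt(3) ≈ 5.196 with df = 3
```

Here t exceeds the two-sided 5% critical value of about 3.182 for 3 degrees of freedom, so the difference would be called significant at that level.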
Statistical analysis of network data with R
Networks have permeated everyday life through such realities as the Internet, social networks, and viral marketing. As such, network analysis is an important growth area in the quantitative sciences, with roots in social network analysis going back to the 1930s and graph theory going back centuries. Measurement and analysis are integral components of network research. As a result, statistical methods play a critical role in network analysis. This book is the first of its kind in network research. It can be used as a stand-alone resource in which multiple R packages are used to illustrate how to conduct a wide range of network analyses, from basic manipulation and visualization, to summary and characterization, to modeling of network data. The central package is igraph, which provides extensive capabilities for studying network graphs in R. This text builds on Eric D. Kolaczyk’s book Statistical Analysis of Network Data (Springer, 2009).
Normative Data for Interpreting the BREAST-Q: Augmentation
Background The BREAST-Q is a rigorously developed, well-validated, patient-reported outcome (PRO) instrument with a module designed for evaluating breast augmentation outcomes. However, there are no published normative BREAST-Q scores, limiting interpretation. Methods Normative data were generated for the BREAST-Q Augmentation Module via the Army of Women (AOW), an online community of women (with and without breast cancer) engaged in breast-cancer-related research. Members were recruited via email, with women 18 years or older without a history of breast cancer or breast surgery invited to participate. Descriptive statistics and a linear multivariate regression were performed. A separate analysis compared normative scores to findings from previously published BREAST-Q augmentation studies. Results The preoperative BREAST-Q Augmentation Module was completed by 1,211 women. Mean age was 54 ±24 years, mean body mass index (BMI) was 27 ±6, and 39% (n=467) had a bra cup size ≥D. Mean scores were Satisfaction with Breasts (54 ±19), Psychosocial Well-being (66 ±20), Sexual Well-being (49 ±20), and Physical Well-being (86 ±15). Women with a BMI of 30 or greater and bra cup size D or greater had lower scores. In comparison to AOW scores, published BREAST-Q augmentation scores were lower before and higher after surgery for all scales except Physical Well-being. Conclusions The AOW normative data represent breast-related satisfaction and well-being in women not actively seeking breast augmentation. These data may be used as normative comparison values for those seeking and undergoing surgery, as we did, demonstrating the value of breast augmentation in this patient population. PMID:28350657
Interpretation of magnetotelluric data: Pasco Basin, south central Washington
The purpose of this project was to review, evaluate, and interpret magnetotelluric (MT) data collected in support of the Basalt Waste Isolation Project. The integrated interpretation presented is related to regional and site-specific geology and associated borehole, gravity, and magnetic data. The MT interpretation procedure placed strong reliance on computer models based upon the inferred physical parameters of the subsurface materials and their anticipated variability. Much of the MT data is of poor quality by current standards; however, significant qualitative observations can be made. The quantification of these observations, including the procedures and assumptions utilized, is discussed in detail. Problems related to ambiguities inherent in the MT method are discussed as related to the Pasco Basin MT data. 117 refs., 77 figs., 3 tabs
Interpreting biomarker data from the COPHES/DEMOCOPHES twin projects
implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were...... of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the...
Statistical analysis of dragline monitoring data
Dragline monitoring systems are normally the best tool used to collect data on the machine performance and operational parameters of a dragline operation. This paper discusses results of a time study using data from a dragline monitoring system captured over a four month period. Statistical summaries of the time study in terms of average values, standard deviation and frequency distributions showed that the mode of operation and the geological conditions have a significant influence on the dragline performance parameters. 6 refs., 14 figs., 3 tabs.
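The statistical summaries described above (averages, standard deviations and frequency distributions, broken down by mode of operation) can be sketched as follows. The record layout, the mode labels and the 5-second bin width are hypothetical stand-ins for a monitoring-system export, not the paper's data or software.

```python
import math
import statistics
from collections import defaultdict


def summarize_by_mode(records, bin_width=5.0):
    """Group (mode, cycle_time_s) records and report n, mean, SD and a histogram.

    Histogram keys are the lower edges of bins of width bin_width.
    """
    groups = defaultdict(list)
    for mode, seconds in records:
        groups[mode].append(seconds)
    summary = {}
    for mode, times in groups.items():
        hist = defaultdict(int)
        for s in times:
            hist[bin_width * math.floor(s / bin_width)] += 1
        summary[mode] = {
            "n": len(times),
            "mean": statistics.fmean(times),
            "sd": statistics.stdev(times) if len(times) > 1 else 0.0,
            "hist": dict(sorted(hist.items())),
        }
    return summary


# Hypothetical cycle times in seconds for two operating modes
records = [("mode_1", 50), ("mode_1", 54), ("mode_1", 58), ("mode_2", 70), ("mode_2", 74)]
for mode, stats in summarize_by_mode(records).items():
    print(mode, stats)
```

Grouping before summarizing matters here: pooling modes would blur exactly the mode-dependent differences the time study reports.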
Innovative statistical methods for public health data
The book brings together experts working in public health and multi-disciplinary areas to present recent issues in statistical methodological development and their applications. This timely book will impact model development and data analyses of public health research across a wide spectrum of analysis. Data and software used in the studies are available for the reader to replicate the models and outcomes. The fifteen chapters range in focus from techniques for dealing with missing data with Bayesian estimation, health surveillance and population definition and implications in applied latent class analysis, to multiple comparison and meta-analysis in public health data. Researchers in biomedical and public health research will find this book to be a useful reference, and it can be used in graduate level classes.
Common misconceptions about data analysis and statistics.
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1. P-hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. 2. Overemphasis on P values rather than on the actual size of the observed effect. 3. Overuse of statistical hypothesis testing, and being seduced by the word "significant". 4. Overreliance on standard errors, which are often misunderstood.
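The first mistake listed, adding replicates until the result is significant, can be made concrete with a small simulation. The sketch assumes normally distributed data with known unit variance (so a z test is exact), a true null effect, and a hypothetical schedule of 10 extra replicates per group per look; it estimates how often at least one look yields p < 0.05.

```python
import math
import random
import statistics


def z_test_p(xs, ys, sigma=1.0):
    """Two-sided p value for a difference in means with known sigma (z test)."""
    se = sigma * math.sqrt(1 / len(xs) + 1 / len(ys))
    z = (statistics.fmean(xs) - statistics.fmean(ys)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_start=10, n_add=10, max_looks=1, alpha=0.05, sims=4000, seed=1):
    """Share of null experiments declared 'significant' when the analyst
    keeps adding replicates and retesting, up to max_looks tests."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        xs = [rng.gauss(0, 1) for _ in range(n_start)]
        ys = [rng.gauss(0, 1) for _ in range(n_start)]
        for look in range(max_looks):
            if look > 0:  # not significant yet: add more replicates and retest
                xs += [rng.gauss(0, 1) for _ in range(n_add)]
                ys += [rng.gauss(0, 1) for _ in range(n_add)]
            if z_test_p(xs, ys) < alpha:
                hits += 1
                break
    return hits / sims


print(false_positive_rate(max_looks=1), false_positive_rate(max_looks=5))
```

With a single pre-planned test the rate stays near the nominal 0.05; allowing several looks inflates it well beyond that, even though no real effect exists.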
Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
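A control chart of the kind described can be sketched as a Shewhart individuals chart. The version below is a minimal, assumption-laden example: limits are set at the baseline mean plus or minus three sigma, with sigma estimated from the average moving range divided by d2 = 1.128, and only the "point beyond a limit" rule is checked (no run or trend rules, and no machine learning). It is not the AISC prototype, and the function name is hypothetical.

```python
import statistics


def individuals_chart(baseline, new_points):
    """Shewhart individuals (X) chart.

    Center line and 3-sigma limits are estimated from baseline data, with sigma
    estimated as mean moving range / 1.128 (d2 for moving ranges of size 2).
    Returns (center, lcl, ucl, indices of new points outside the limits).
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = statistics.fmean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = statistics.fmean(moving_ranges) / 1.128
    ucl = center + 3 * sigma
    lcl = center - 3 * sigma
    flagged = [i for i, x in enumerate(new_points) if x > ucl or x < lcl]
    return center, lcl, ucl, flagged


center, lcl, ucl, flagged = individuals_chart([10, 11, 10, 11, 10], [10.5, 14, 7])
print(center, lcl, ucl, flagged)   # points 14 and 7 fall outside the limits
```

Estimating sigma from moving ranges rather than the overall standard deviation keeps a process shift from inflating the limits it is supposed to detect.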
Statistical methods for data analysis in particle physics
This concise set of course-based notes provides the reader with the main concepts and tools to perform statistical analysis of experimental data, in particular in the field of high-energy physics (HEP). First, an introduction to probability theory and basic statistics is given, mainly as a reminder from advanced undergraduate studies, but also with a view to clearly distinguishing the Frequentist and Bayesian approaches and interpretations in subsequent applications. More advanced concepts and applications are gradually introduced, culminating in the chapter on upper limits, as many applications in HEP concern hypothesis testing, where often the main goal is to provide better and better limits so as to be able eventually to distinguish between competing hypotheses or to rule out some of them altogether. Many worked examples will help newcomers to the field and graduate students to understand the pitfalls in applying theoretical concepts to actual data.
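Since the notes culminate in upper limits, here is a hedged sketch of the simplest frequentist case: a classical one-sided upper limit on a Poisson mean with n observed events and no background, found by solving P(N <= n | mu) = 1 - CL with bisection. Background subtraction, Feldman-Cousins ordering and CLs are deliberately left out; the function names are illustrative.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def poisson_upper_limit(n_obs, cl=0.95, tol=1e-10):
    """Classical upper limit mu_up with P(N <= n_obs | mu_up) = 1 - cl (no background)."""
    alpha = 1.0 - cl
    lo, hi = 0.0, 10.0 * (n_obs + 1)
    while poisson_cdf(n_obs, hi) > alpha:   # make sure the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                    # the CDF decreases as mu grows
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(poisson_upper_limit(0))   # ≈ 2.996, i.e. -ln(0.05)
```

The familiar "about 3 events" limit for zero observed events at 95% CL drops out directly, and one observed event gives about 4.74.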
Software for statistical data analysis used in Higgs searches
The analysis and interpretation of data collected by the Large Hadron Collider (LHC) requires advanced statistical tools in order to quantify the agreement between observation and theoretical models. RooStats is a project providing a statistical framework for data analysis with the focus on discoveries, confidence intervals and combination of different measurements in both Bayesian and frequentist approaches. It employs the RooFit data modelling language, where mathematical concepts such as variables, (probability density) functions and integrals are represented as C++ objects. RooStats and RooFit rely on the persistency technology of the ROOT framework. The usage of a common data format enables the concept of digital publishing of complicated likelihood functions. The statistical tools have been developed in close collaboration with the LHC experiments to ensure their applicability to real-life use cases. Numerous physics results have been produced using the RooStats tools, with the discovery of the Higgs boson by the ATLAS and CMS experiments being certainly the most popular among them. We will discuss tools currently used by LHC experiments to set exclusion limits, to derive confidence intervals and to estimate discovery significances based on frequentist statistics and the asymptotic behaviour of likelihood functions. Furthermore, new developments in RooStats and performance optimisation necessary to cope with complex models depending on more than 1000 variables will be reviewed.
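As an illustration of the asymptotic approach mentioned above, the median discovery significance for a simple counting experiment with expected signal s and known expected background b has a closed form derived from the asymptotic distribution of the profile likelihood ratio (the "Asimov" formula of Cowan et al., 2011): Z = sqrt(2((s + b) ln(1 + s/b) - s)). The sketch below only evaluates that formula; it does not use RooStats, and it ignores background uncertainty.

```python
import math


def asimov_discovery_z(s, b):
    """Median discovery significance for s expected signal events on top of a
    known expected background b, in the asymptotic (large-sample) limit."""
    if s < 0 or b <= 0:
        raise ValueError("need s >= 0 and b > 0")
    q0 = 2.0 * ((s + b) * math.log1p(s / b) - s)
    return math.sqrt(max(q0, 0.0))   # guard against tiny negative rounding


print(asimov_discovery_z(10, 100))   # slightly below the naive s/sqrt(b) = 1.0
```

For s much smaller than b the formula reduces to the familiar s/sqrt(b); when s is comparable to b the naive estimate overstates the significance.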
Lumped parameter models for the interpretation of environmental tracer data
Principles of the lumped-parameter approach to the interpretation of environmental tracer data are given. The following models are considered: the piston flow model (PFM), exponential flow model (EM), linear model (LM), combined piston flow and exponential flow model (EPM), combined linear flow and piston flow model (LPM), and dispersion model (DM). The applicability of these models for the interpretation of different tracer data is discussed for a steady state flow approximation. Case studies are given to exemplify the applicability of the lumped-parameter approach. Description of a user-friendly computer program is given. (author). 68 refs, 25 figs, 4 tabs
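For the exponential model (EM), the transit-time distribution is g(τ) = (1/T)·exp(-τ/T), with T the mean transit time, and the output concentration is the convolution of the input history with g(τ)·exp(-λτ) for a radioactive tracer with decay constant λ. The sketch below discretizes that convolution with exactly integrated bin weights under the steady-state flow assumption noted in the abstract. It is a hypothetical illustration, not the user-friendly program the report describes, and it treats input before the start of the record as zero.

```python
import math


def em_weights(mean_transit_time, decay_const, n_bins, dt=1.0):
    """Weights w_k = integral over [k*dt, (k+1)*dt] of g(tau) * exp(-lambda*tau),
    with the exponential-model transit-time distribution g(tau) = exp(-tau/T) / T."""
    T = mean_transit_time
    a = 1.0 / T + decay_const
    return [(1.0 / T) / a * (math.exp(-a * k * dt) - math.exp(-a * (k + 1) * dt))
            for k in range(n_bins)]


def em_output(c_in, mean_transit_time, decay_const=0.0, dt=1.0):
    """Output concentration series for an input series c_in sampled every dt."""
    w = em_weights(mean_transit_time, decay_const, len(c_in), dt)
    # Water leaving at step t entered roughly k steps earlier, with weight w[k]
    return [sum(w[k] * c_in[t - k] for k in range(t + 1)) for t in range(len(c_in))]


# Tritium example: half-life about 12.32 years, constant input of 100 TU for 300 years
lam = math.log(2) / 12.32
out = em_output([100.0] * 300, mean_transit_time=10.0, decay_const=lam)
print(out[-1])   # approaches 100 / (1 + lam * T) for a long constant input
```

The steady-state value 100/(1 + λT) shows why a decaying tracer such as tritium carries information about the mean transit time T.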
BACKGROUND: Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification rate-variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate, is the gamma statistic. METHODOLOGY: Using simulations under varying conditions, I examine the sensitivity of gamma to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineages-through-time plots, tree deviation, I identified trees with a significant gamma statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the gamma statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of gamma to detect rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. CONCLUSIONS: The gamma statistic is extraordinarily sensitive to recent diversification rates, and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The gamma statistic had greater power to detect recent diversification rate decreases compared to early bursts of diversification. Caution should be exercised when interpreting the gamma statistic as an indication of early, rapid diversification.
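The gamma statistic itself is a simple function of the internode intervals. A minimal sketch of the Pybus and Harvey (2000) definition follows; it assumes an ultrametric tree summarized by its intervals g_2, ..., g_n (g_k being the time during which k lineages exist), and the function name and input layout are illustrative assumptions. Under a pure-birth process gamma is approximately standard normal, and negative values indicate internal nodes closer to the root than expected.

```python
import math


def gamma_statistic(intervals):
    """Pybus & Harvey gamma statistic.

    intervals[i] is the internode interval g_k during which k = i + 2 lineages
    exist, ordered from the root (k = 2) to the present (k = n).
    """
    n = len(intervals) + 1                       # number of tips
    if n < 3:
        raise ValueError("need at least 3 tips")
    weighted = [(i + 2) * g for i, g in enumerate(intervals)]   # k * g_k
    T = sum(weighted)
    # Average of the cumulative sums sum_{k=2}^{i} k*g_k for i = 2 .. n-1
    cumulative = 0.0
    acc = 0.0
    for kg in weighted[:-1]:
        cumulative += kg
        acc += cumulative
    mean_internal = acc / (n - 2)
    return (mean_internal - T / 2.0) / (T * math.sqrt(1.0 / (12.0 * (n - 2))))


# Intervals g_k = 1/k make k*g_k constant, matching the pure-birth expectation
print(gamma_statistic([1.0 / k for k in range(2, 21)]))   # 0 up to rounding
```

Because the final interval (from the last split to the present) enters T but none of the cumulative sums, a long recent interval pushes gamma negative, which illustrates the sensitivity to recent branching times reported above.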
Statistical methods and computing for big data
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with a focus on open-source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
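The online-updating idea can be illustrated in its simplest setting, ordinary least squares, where accumulating a few sufficient statistics per incoming chunk reproduces the full-data estimate exactly. This is a hedged, self-contained sketch: the article's extensions (logistic regression, variable selection) and its R packages are not reproduced, and the class name is hypothetical. Note that the raw-sum formulas can lose precision when x values are large and far from zero.

```python
class OnlineSimpleRegression:
    """Least-squares fit of y = a + b * x, updated chunk by chunk."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, xs, ys):
        # Only these running sums are kept; each chunk can be discarded afterwards.
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y

    def coefficients(self):
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        b = (self.n * self.sxy - self.sx * self.sy) / denom
        a = (self.sy - b * self.sx) / self.n
        return a, b


model = OnlineSimpleRegression()
for start in range(0, 100, 25):          # four chunks of a simulated stream
    xs = [float(i) for i in range(start, start + 25)]
    model.update(xs, [2.0 + 3.0 * x for x in xs])
print(model.coefficients())
```

Memory use is constant regardless of stream length, which is the point of online updating; for non-linear models such as logistic regression the update is only approximate.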
Building software tools to help contextualize and interpret monitoring data
Even modest monitoring efforts at landscape scales produce large volumes of data. These are most useful if they can be interpreted relative to land potential or other similar sites. However, for many ecological systems reference conditions may not be defined or are poorly described, which hinders und...
Functional MRI experiments : acquisition, analysis and interpretation of data
Functional MRI is widely used to address basic and clinical neuroscience questions. In the key domains of fMRI experiments, i.e. acquisition, processing and analysis, and interpretation of data, developments are ongoing. The main issues are sensitivity for changes in fMRI signal that are associated
Analysis of Preference Data Using Intermediate Test Statistic Abstract
Jun 1, 2013 ... West African Journal of Industrial and Academic Research, Vol. 7, No. 1, June ... Keywords: preference data, Friedman statistic, multinomial test statistic, intermediate test statistic. ... new method and consequently a new statistic ...
Securing cooperation from persons supplying statistical data.
Securing the co-operation of persons supplying information required for medical statistics is essentially a problem in human relations, and an understanding of the motivations, attitudes, and behaviour of the respondents is necessary. Before any new statistical survey is undertaken, it is suggested by Aubenque and Harris that a preliminary review be made so that the maximum use is made of existing information. Care should also be taken not to burden respondents with an overloaded questionnaire. Aubenque and Harris recommend simplified reporting. Complete population coverage is not necessary. Neurdenburg suggests that the co-operation and support of such organizations as medical associations and social security boards are important and that propaganda should be directed specifically to the groups whose co-operation is sought. Informal personal contacts are valuable and desirable, according to Blaikley, but may have adverse effects if the right kind of approach is not made. Financial payments as an incentive in securing co-operation are opposed by Neurdenburg, who proposes that only postage-free envelopes or similar small favours be granted. Blaikley and Harris, on the other hand, express the view that financial incentives may do much to gain the support of those required to furnish data; there are, however, other incentives, and full use should be made of the natural inclinations of respondents. Compulsion may be necessary in certain instances, but administrative rather than statutory measures should be adopted. Penalties, according to Aubenque, should be inflicted only when justified by imperative health requirements. The results of surveys should be made available as soon as possible to those who co-operated, and Aubenque and Harris point out that they should also be of practical value to the suppliers of the information. Greater co-operation can be secured from medical persons who have an understanding of the statistical principles involved; Aubenque and Neurdenburg
Statistical methods for data analysis in particle physics
This concise set of course-based notes provides the reader with the main concepts and tools needed to perform statistical analyses of experimental data, in particular in the field of high-energy physics (HEP). First, the book provides an introduction to probability theory and basic statistics, mainly intended as a refresher from readers’ advanced undergraduate studies, but also to help them clearly distinguish between the Frequentist and Bayesian approaches and interpretations in subsequent applications. More advanced concepts and applications are gradually introduced, culminating in the chapter on both discoveries and upper limits, as many applications in HEP concern hypothesis testing, where the main goal is often to provide better and better limits so as to eventually be able to distinguish between competing hypotheses, or to rule out some of them altogether. Many worked-out examples will help newcomers to the field and graduate students alike understand the pitfalls involved in applying theoretical co...
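As a small illustration of the upper limits mentioned above, here is a minimal sketch of a classical (frequentist) upper limit for a Poisson counting experiment with a known expected background. The limit is the signal s at which observing n_obs or fewer events becomes less probable than 1 − CL. The helpers `poisson_cdf` and `upper_limit` are hypothetical names and not code from the book.

```python
import math


def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson variable with mean mu."""
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def upper_limit(n_obs, background=0.0, cl=0.95, tol=1e-10):
    """Signal s at which P(N <= n_obs | s + background) falls to 1 - cl.

    The CDF decreases monotonically in s, so bisection is sufficient.
    """
    lo, hi = 0.0, 10.0 + 10.0 * (n_obs + 1)
    while hi - lo > tol:
        s = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, s + background) > 1 - cl:
            lo = s  # s is still compatible with the observation
        else:
            hi = s
    return 0.5 * (lo + hi)


print(round(upper_limit(0), 3))  # 2.996 events
print(round(upper_limit(1), 3))  # 4.744 events
```

When the expected background exceeds the observed count, this simple construction can return a limit near zero. That known weakness is one motivation for alternatives such as the CLs or Feldman-Cousins methods.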
Bayesian maximum posterior probability method for interpreting plutonium urinalysis data
A new internal dosimetry code for interpreting urinalysis data in terms of radionuclide intakes is described for the case of plutonium. The mathematical method is to maximise the Bayesian posterior probability using an entropy function as the prior probability distribution. A software package (MEMSYS) developed for image reconstruction is used. Some advantages of the new code are that it ensures positive calculated dose, it smooths out fluctuating data, and it provides an estimate of the propagated uncertainty in the calculated doses. (author)
Encoding Dissimilarity Data for Statistical Model Building.
We summarize, review and comment upon three papers that discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space that respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding new objects into this space. This allows the dissimilarity information to be incorporated into a Smoothing Spline ANOVA penalized likelihood model, a Support Vector Machine, or any model that admits Reproducing Kernel Hilbert Space components, for nonparametric regression, supervised learning, or semi-supervised learning. Future work and open questions are discussed. The papers are: F. Lu, S. Keles, S. Wright and G. Wahba (2005), A framework for kernel regularization with application to protein clustering, Proceedings of the National Academy of Sciences 102, 12332-1233; G. Corrada Bravo, G. Wahba, K. Lee, B. Klein, R. Klein and S. Iyengar (2009), Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models, Proceedings of the National Academy of Sciences 106, 8128-8133; F. Lu, Y. Lin and G. Wahba, Robust manifold unfolding with kernel regularization, TR 1008, Department of Statistics, University of Wisconsin-Madison.
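To show what "embedding objects into a Euclidean space from dissimilarities" means in the simplest case, the sketch below uses classical multidimensional scaling. This is a swapped-in technique: the papers use convex cone optimization (regularized kernel estimation), which handles noisy and incomplete data, and this sketch does not. Classical MDS double-centres the squared dissimilarities and keeps the top eigenvectors, which are found here by power iteration with deflation. All function names are hypothetical.

```python
import math
import random


def double_center(d):
    """B = -1/2 * J D^2 J, where J is the centering matrix."""
    n = len(d)
    sq = [[v * v for v in row] for row in d]
    row_mean = [sum(r) / n for r in sq]  # D^2 is symmetric: row means = column means
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigen(b, iters=500, seed=0):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    n = len(b)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * c for a, c in zip(v, bv)), v  # Rayleigh quotient


def classical_mds(d, k=2):
    """Embed n objects in k dimensions from an n x n dissimilarity matrix."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        lam, v = top_eigen(b, seed=c)
        if lam <= 0:  # no positive eigenvalue left to use
            break
        scale = math.sqrt(lam)
        for i in range(n):
            coords[i][c] = scale * v[i]
        # Deflate so the next power iteration finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Usage: corners of a 2 x 1 rectangle, given only their pairwise distances.
pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
print(classical_mds(dist, k=2))
```

The recovered coordinates are unique only up to rotation and reflection, but their pairwise distances reproduce the input when the dissimilarities are Euclidean.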
Statistical Analysis of Data for Timber Strengths
Sørensen, John Dalsgaard; Hoffmeyer, P.
Statistical analyses are performed for material strength parameters from approximately 6700 specimens of structural timber. Non-parametric statistical analyses and fits to the following distribution types have been investigated: Normal, Lognormal, 2-parameter Weibull and 3-parameter Weibull...
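To make the distribution fitting concrete, here is a minimal sketch of maximum-likelihood fits of the normal, lognormal and 2-parameter Weibull models. It compares them by log-likelihood, which is valid here because all three have two parameters. The 3-parameter Weibull (with a location parameter) is not covered. The function names and the synthetic data are hypothetical and do not reflect the paper's procedure or results.

```python
import math
import random


def normal_fit(x):
    """ML estimates (mu, sigma) and log-likelihood of a normal model."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)  # ML divides by n
    ll = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
             - (v - mu) ** 2 / (2 * sigma ** 2) for v in x)
    return (mu, sigma), ll


def lognormal_fit(x):
    """Normal fit on log(x); the log-likelihood gets the Jacobian term -sum(log x)."""
    logs = [math.log(v) for v in x]
    params, ll_logs = normal_fit(logs)
    return params, ll_logs - sum(logs)


def weibull2_fit(x, lo=1e-3, hi=100.0, tol=1e-10):
    """ML fit of the 2-parameter Weibull (shape k, scale lam), bisecting on k."""
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / n
    xmax = max(x)
    z = [v / xmax for v in x]  # rescaled to avoid overflow; the ratio below is scale-free

    def score(k):
        # Profile-likelihood equation for the shape; it is increasing in k.
        zk = [v ** k for v in z]
        return sum(a * b for a, b in zip(zk, logs)) / sum(zk) - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = xmax * (sum(v ** k for v in z) / n) ** (1.0 / k)
    ll = sum(math.log(k / lam) + (k - 1) * math.log(v / lam) - (v / lam) ** k
             for v in x)
    return (k, lam), ll


# Usage on synthetic strengths drawn from a Weibull with scale 30 and shape 4.
rng = random.Random(42)
sample = [rng.weibullvariate(30.0, 4.0) for _ in range(2000)]
for name, fit in [("normal", normal_fit), ("lognormal", lognormal_fit),
                  ("weibull-2p", weibull2_fit)]:
    params, ll = fit(sample)
    print(name, [round(p, 3) for p in params], round(ll, 1))
```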
A new algorithm has been developed to enable the interpretation of black box models. The algorithm is agnostic to the learning algorithm and open to all structure-based descriptors such as fragments, keys and hashed fingerprints. It has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation, in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen, as there is no change in the prediction; the interpretation is produced directly from the model's behaviour for the specific query. Models have been built using multiple learning algorithms, including support vector machine and random forest. The models were built on public Ames mutagenicity data, and a variety of fingerprint descriptors were used. These models performed well in both internal and external validation, with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The resulting interpretations link closely with the understood mechanisms of Ames mutagenicity. This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output of a (quantitative) structure-activity model. Additionally, the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
The concept of personalized nutrition and exercise prescription represents a topical and exciting progression for the discipline, given the large inter-individual variability that exists in response to virtually all performance- and health-related interventions. Appropriate interpretation of intervention-based data from an individual or group of individuals requires practitioners and researchers to consider a range of concepts, including the confounding influence of measurement error and biological variability. In addition, the means to quantify likely statistical and practical improvements are facilitated by concepts such as confidence intervals (CIs) and the smallest worthwhile change (SWC). The purpose of this review is to provide accessible and applicable recommendations for practitioners and researchers who interpret and report personalized data. To achieve this, the review is structured in three sections that progressively develop a statistical framework. Section 1 explores fundamental concepts related to measurement error and describes how typical error and CIs can be used to express uncertainty in baseline measurements. Section 2 builds upon these concepts and demonstrates how CIs can be combined with the concept of SWC to assess whether meaningful improvements occur post-intervention. Finally, section 3 introduces the concept of biological variability and discusses the subsequent challenges in identifying individual response and non-response to an intervention. Worked numerical examples and interactive Supplementary Material are incorporated to solidify concepts and assist with implementation in practice.
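The typical error, CI and SWC concepts combine into a short calculation, sketched below. The sketch makes several assumptions: it uses the normal critical value 1.96 rather than a t value, it takes SWC as 0.2 × the between-subject SD (a commonly used convention), and it treats higher scores as better. The function names and verdict labels are hypothetical and are not taken from the review.

```python
import statistics


def typical_error(trial1, trial2):
    """Typical error from a test-retest: SD of the differences divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    return statistics.stdev(diffs) / 2 ** 0.5


def baseline_ci(observed, te, crit=1.96):
    """CI for an individual's true baseline score (normal approximation)."""
    return observed - crit * te, observed + crit * te


def smallest_worthwhile_change(baseline_scores, factor=0.2):
    """SWC as a fraction of the between-subject SD (0.2 is an assumed convention)."""
    return factor * statistics.stdev(baseline_scores)


def assess_change(pre, post, te, swc, crit=1.96):
    """Compare the CI of an observed change with +/- SWC (higher = better)."""
    change = post - pre
    half = crit * te * 2 ** 0.5  # two measurements, each with error te
    lo, hi = change - half, change + half
    if lo > swc:
        verdict = "likely meaningful improvement"
    elif hi < -swc:
        verdict = "likely meaningful decline"
    else:
        verdict = "unclear"
    return (lo, hi), verdict


te = typical_error([10, 12, 14, 16], [11, 12, 15, 16])
print(round(te, 3))  # 0.408
print(assess_change(100, 110, te=1.0, swc=2.0))
```

With small samples, replacing 1.96 by the appropriate t quantile gives wider, more honest intervals.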
A Climate Statistics Tool and Data Repository
Researchers at Argonne National Laboratory and collaborating organizations have generated regional-scale, dynamically downscaled climate model output using the Weather Research and Forecasting (WRF) model version 3.3.1 at a 12 km horizontal spatial resolution over much of North America. The WRF model is driven by boundary conditions obtained from three independent global-scale climate models and two different future greenhouse gas emission scenarios, named representative concentration pathways (RCPs). The repository of results has a temporal resolution of three hours for all the simulations, includes more than 50 variables, is stored in Network Common Data Form (NetCDF) files, and has a data volume of nearly 600 TB. A condensed 800 GB set of NetCDF files was made for selected variables most useful for climate-related planning, including daily precipitation, relative humidity, solar radiation, maximum temperature, minimum temperature, and wind. The WRF model simulations are conducted for three 10-year time periods (1995-2004, 2045-2054, and 2085-2094) and two future scenarios (RCP4.5 and RCP8.5). An open-source tool was coded using Python 2.7.8 and ESRI ArcGIS 10.3.1 programming libraries to parse the NetCDF files, compute summary statistics, and output results as GIS layers. Eight sets of summary statistics were generated as examples for the contiguous U.S. states and much of Alaska, including number of days over 90°F, number of days with a heat index over 90°F, heat waves, monthly and annual precipitation, drought, extreme precipitation, multi-model averages, and model bias. This paper will provide an overview of the project to generate the main and condensed data repositories, describe the Python tool and how to use it, present the GIS results of the computed examples, and discuss some of the ways they can be used for planning. The condensed climate data, Python tool, computed GIS results, and documentation of the work are shared on the Internet.
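Two of the summary statistics listed above can be sketched for a single grid cell: days over 90°F and heat waves. The sketch assumes the daily maximum temperatures have already been read from the NetCDF files into a list. Reading them would need a NetCDF library, which is not shown. It also assumes WRF temperatures arrive in kelvin and defines a heat wave as at least three consecutive days above 90°F; the tool's actual definitions may differ. All names and the sample series are hypothetical.

```python
def kelvin_to_fahrenheit(t_k):
    return (t_k - 273.15) * 9.0 / 5.0 + 32.0


def days_over(tmax_f, threshold=90.0):
    """Number of days whose maximum temperature exceeds the threshold."""
    return sum(1 for t in tmax_f if t > threshold)


def heat_wave_count(tmax_f, threshold=90.0, min_len=3):
    """Count runs of at least min_len consecutive days above the threshold."""
    count = run = 0
    for t in tmax_f:
        if t > threshold:
            run += 1
            if run == min_len:  # count each run once, when it reaches min_len
                count += 1
        else:
            run = 0
    return count


# Hypothetical daily maxima (deg F) for one grid cell.
series = [91, 92, 93, 80, 95, 96, 97, 98, 70, 91]
print(days_over(series), heat_wave_count(series))  # 8 2
```

Applied per grid cell and per year, such counts map directly onto the GIS layers described above.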
STATISTICS, Program System for Statistical Analysis of Experimental Data
1 - Description of problem or function: The package is composed of 83 routines, the most important of which are the following:
BINDTR: Binomial distribution
HYPDTR: Hypergeometric distribution
POIDTR: Poisson distribution
GAMDTR: Gamma distribution
BETADTR: Beta-1 and Beta-2 distributions
NORDTR: Normal distribution
CHIDTR: Chi-square distribution
STUDTR: Distribution of Student's t
FISDTR: Distribution of F
EXPDTR: Exponential distribution
WEIDTR: Weibull distribution
FRAKTIL: Calculation of the fractiles of the normal, chi-square, Student's t, and F distributions
VARVGL: Test for equality of variance for several sample observations
ANPAST: Kolmogorov-Smirnov test and chi-square test of goodness of fit
MULIRE: Multiple linear regression analysis for a dependent variable and a set of independent variables
STPRG: Performs a stepwise multiple linear regression analysis for a dependent variable and a set of independent variables. At each step, the variable entered into the regression equation is the one which has the greatest amount of variance between it and the dependent variable. Any independent variable can be forced into or deleted from the regression equation, irrespective of its contribution to the equation.
LTEST: Tests the hypothesis of linearity of the data.
SPRANK: Calculates the Spearman rank correlation coefficient.
2 - Method of solution: VARVGL: Bartlett's test, Cochran's test and Hartley's test are performed in the program. MULIRE: The Gauss-Jordan method is used to solve the normal equations. STPRG: The abbreviated Doolittle method is used to (1) determine the variables to enter into the regression and (2) complete the regression coefficient calculation.
3 - Restrictions on the complexity of the problem: VARVGL: Hartley's test is only performed if the sample observations are all of the same size.
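The three variance-homogeneity tests named for VARVGL can be sketched from their textbook formulas. Bartlett's statistic is approximately chi-square with k − 1 degrees of freedom. Cochran's C is the largest variance divided by the sum of the variances. Hartley's Fmax is the largest variance divided by the smallest and, matching the restriction above, is only defined here for equal sample sizes. This computes the statistics only: p-values would need chi-square or tabulated critical values, which are not computed. The function names are hypothetical and do not reproduce VARVGL's interface.

```python
import math
import statistics


def bartlett(groups):
    """Bartlett's statistic and its degrees of freedom (k - 1)."""
    k = len(groups)
    n = [len(g) for g in groups]
    var = [statistics.variance(g) for g in groups]
    total = sum(n)
    pooled = sum((ni - 1) * vi for ni, vi in zip(n, var)) / (total - k)
    num = (total - k) * math.log(pooled) - sum(
        (ni - 1) * math.log(vi) for ni, vi in zip(n, var))
    corr = 1 + (sum(1 / (ni - 1) for ni in n) - 1 / (total - k)) / (3 * (k - 1))
    return num / corr, k - 1


def cochran_c(groups):
    """Largest sample variance as a share of the summed variances."""
    var = [statistics.variance(g) for g in groups]
    return max(var) / sum(var)


def hartley_fmax(groups):
    """Ratio of the largest to the smallest sample variance (equal sizes only)."""
    if len({len(g) for g in groups}) != 1:
        raise ValueError("Hartley's test requires samples of equal size")
    var = [statistics.variance(g) for g in groups]
    return max(var) / min(var)


a, b = [1, 2, 3], [2, 4, 6]
print(bartlett([a, b]))      # statistic about 0.714 with 1 degree of freedom
print(hartley_fmax([a, b]))  # 4.0
print(cochran_c([a, b]))     # 0.8
```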
Measurement of Osteogenic Exercise – How to Interpret Accelerometric Data?
Bone tissue adapts to its mechanical loading environment. We review here the accelerometric measurements with special emphasis on osteogenic exercise. The accelerometric method offers a unique opportunity to assess the intensity of mechanical loadings. We present methods to interpret accelerometric data, reducing it to the daily distributions of magnitude, slope, area, and energy of signal. These features represent the intensity level of physical activities, and were associated with the chang...
Network Data: Statistical Theory and New Models
...and with environmental scientists at JPL and Emory University to retrieve the aerosol index AOD from NASA MISR remote sensing images for air pollution ... Beijing, May 2013; Beijing Statistics Forum, Beijing, May 2013; Statistics Seminar, CREST-ENSAE, Paris, March 2013; Statistics Seminar, University ... to retrieve the aerosol index AOD from NASA MISR remote sensing images for air pollution monitoring and management. Satellite-retrieved Aerosol Optical
Interpretation of bioassay data from nuclear fuel fabrication workers
In nuclear fuel fabrication facilities, workers are exposed to different compounds of enriched uranium. Although in this kind of facility the main route of intake is inhalation, ingestion may occur in some situations. The interpretation of the bioassay data is very complex, since all the different parameters must be taken into account, which is a big challenge. Due to the high cost of individual monitoring programmes for internal dose assessment, usually only one type of measurement is assigned in routine monitoring programmes. In complex situations like the one described in this paper, where several parameters can compromise the accuracy of the bioassay interpretation, a combination of techniques is needed to evaluate the internal dose. According to ICRP 78 (1997), the general order of preference in terms of accuracy of interpretation is: body activity measurement, excreta analysis and personal air sampling. Results of monitoring of the working environment may provide information that assists in interpretation, on particle size, chemical form and solubility, and time of intake. A group of seventeen workers from the controlled area of the fuel fabrication facility was selected to evaluate the internal dose using all the different available techniques during a certain period. The workers were monitored for uranium content in daily urinary and faecal excretion (collected over a period of 3 consecutive days), by chest counting and by personal air sampling. The results have shown that at least two types of monitoring techniques must be used, since there are some sources of uncertainty in the bioassay interpretation, such as intake of a mixture of uranium compounds and different routes of intake. The combination of urine and faeces analysis was shown to be the most appropriate methodology for assessing internal dose in this situation. (author)
This book presents the proceedings of the 2nd Pacific Rim Statistical Conference for Production Engineering: Production Engineering, Big Data and Statistics, which took place at Seoul National University in Seoul, Korea in December, 2016. The papers included discuss a wide range of statistical challenges, methods and applications for big data in production engineering, and introduce recent advances in relevant statistical methods.
Computer science and machine learning in particular are increasingly lauded for their potential to aid medical practice. However, the highly technical nature of state-of-the-art techniques can be a major obstacle to their usability by health care professionals and thus to their adoption and actual practical benefit. In this paper we describe a software tool that focuses on the visualization of predictions made by a recently developed method, which leverages data in the form of large-scale electronic records for making diagnostic predictions. Guided by risk predictions, our tool allows the user to interactively explore different diagnostic trajectories, or to display cumulative long-term prognostics, in an intuitive and easily interpretable manner.
Statistical interpretation of the process of evolution and functioning of Audiovisual Archives
The article provides a characterization of the operating conditions of audiovisual archives, based on the interpretation of results obtained in a quantitative sample study. The study involved 43 institutions of different types and sizes, from national to foreign organizations, with questionnaires answered by communication services and cultural institutions. The analysis found a variety of guidelines on the management of information preservation, as well as in the typology of the record collections held by each archive. The data collected thus allowed an overview to be built of the operating model of each organization surveyed.
Symbolic Data Analysis Conceptual Statistics and Data Mining
With the advent of computers, very large datasets have become routine. Standard statistical methods don't have the power or flexibility to analyse these efficiently, and extract the required knowledge. An alternative approach is to summarize a large dataset in such a way that the resulting summary dataset is of a manageable size and yet retains as much of the knowledge in the original dataset as possible. One consequence of this is that the data may no longer be formatted as single values, but be represented by lists, intervals, distributions, etc. The summarized data have their own internal s
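The summarization step described above can be illustrated with one kind of symbolic description: collapsing many classical records into one interval [min, max] per group and variable. Intervals are only one of the symbolic forms mentioned; lists and distributions are others. The function name and the records are hypothetical.

```python
from collections import defaultdict


def to_interval_data(records, group_key, variables):
    """Summarise classical records into one interval (min, max) per group and variable."""
    summary = defaultdict(dict)
    for rec in records:
        group = summary[rec[group_key]]
        for var in variables:
            value = rec[var]
            lo, hi = group.get(var, (value, value))
            group[var] = (min(lo, value), max(hi, value))
    return dict(summary)


records = [
    {"region": "A", "age": 30, "income": 40},
    {"region": "A", "age": 45, "income": 35},
    {"region": "B", "age": 22, "income": 50},
]
print(to_interval_data(records, "region", ["age", "income"]))
# {'A': {'age': (30, 45), 'income': (35, 40)}, 'B': {'age': (22, 22), 'income': (50, 50)}}
```

The summary has one row per group no matter how many records it absorbs, which is what keeps the symbolic dataset manageable.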
On the statistical assessment of classifiers using DNA microarray data
Background: In this paper we present a method for the statistical assessment of cancer predictors that make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to answer some questions that are relevant for the automatic diagnosis of cancer, such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology, and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions while avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of genes used. The statistical significance of the error rate is measured using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing an error rate of e = 21% (p = 0.045). This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machine (SVM) classifiers can learn with only 15 training examples, with error rates of e = 19% (p = 0.035) and e = 18% (p = 0.037) respectively. Moreover, the error rate
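The permutation test for the significance of a cross-validated error rate works as follows: shuffle the class labels many times, recompute the error each time, and report how often chance labels do at least as well as the real ones. The sketch makes two substitutions, both labelled as such: a nearest-centroid classifier stands in for WVA, RLS and SVM, and leave-one-out stands in for leave-K-out. All names and data are hypothetical.

```python
import random


def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose mean profile is closest (squared Euclidean)."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids,
               key=lambda lab: sum((c - v) ** 2 for c, v in zip(centroids[lab], x)))


def loo_error(xs, ys):
    """Leave-one-out cross-validation error rate."""
    wrong = 0
    for i in range(len(xs)):
        pred = nearest_centroid_predict(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], xs[i])
        wrong += pred != ys[i]
    return wrong / len(xs)


def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Observed CV error and the share of label permutations doing at least as well."""
    rng = random.Random(seed)
    observed = loo_error(xs, ys)
    hits = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if loo_error(xs, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)  # +1 counts the observed labelling


# Hypothetical "expression profiles": two well-separated groups of 10 samples.
xs = [[0.0, 0.1 * i] for i in range(10)] + [[5.0, 0.1 * i] for i in range(10)]
ys = ["normal"] * 10 + ["tumor"] * 10
err, p = permutation_p_value(xs, ys)
print(err, round(p, 4))
```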
Empirical approach to interpreting card-sorting data
Since it was first published 30 years ago, the seminal paper of Chi et al. on expert and novice categorization of introductory problems has led to a plethora of follow-up studies within and outside of the area of physics [Cogn. Sci. 5, 121 (1981), doi:10.1207/s15516709cog0502_2]. These studies frequently encompass “card-sorting” exercises whereby the participants group problems. While this technique certainly allows insights into problem-solving approaches, simple descriptive statistics more often than not fail to find significant differences between experts and novices. In moving beyond descriptive statistics, we describe a novel microscopic approach that takes into account the individual identity of the cards and uses graph theory and models to visualize, analyze, and interpret problem categorization experiments. We apply these methods to an introductory physics (mechanics) problem categorization experiment and find that most of the variation in sorting outcome is not due to the sorter being an expert versus a novice, but rather to an independent characteristic that we named “stacker” versus “spreader.” The fact that the expert-novice distinction accounts for only a smaller amount of the variation may explain the frequent null results when conducting these experiments.
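The paper's specific graph models and its "stacker"/"spreader" measure are not described here, so the sketch below shows only a basic building block of a card-level analysis. It builds a card co-occurrence graph, whose edge weights are the share of sorters who put two cards in the same pile. It also computes a pairwise distance between two sorts, chosen here as 1 − Rand index, which is an assumption rather than the authors' choice. All names and sorts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sorts):
    """Edge weights of the card graph: share of sorters who put two cards in one pile."""
    counts = Counter()
    for piles in sorts:
        for pile in piles:
            for pair in combinations(sorted(pile), 2):
                counts[pair] += 1
    return {pair: c / len(sorts) for pair, c in counts.items()}


def same_pile_pairs(piles):
    return {pair for pile in piles for pair in combinations(sorted(pile), 2)}


def sort_distance(piles_a, piles_b, cards):
    """Share of card pairs on which two sorters disagree (1 - Rand index)."""
    a, b = same_pile_pairs(piles_a), same_pile_pairs(piles_b)
    pairs = list(combinations(sorted(cards), 2))
    return sum((p in a) != (p in b) for p in pairs) / len(pairs)


# Hypothetical sorts of four problem cards by two participants.
sorter_1 = [[1, 2], [3, 4]]
sorter_2 = [[1, 2, 3], [4]]
print(co_occurrence([sorter_1, sorter_2]))
print(sort_distance(sorter_1, sorter_2, cards=[1, 2, 3, 4]))  # 0.5
```

Because the cards keep their identities, such distances can be clustered or projected to see which sorter characteristics actually drive the variation.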
Human biomonitoring data interpretation and ethics; obstacles or surmountable challenges?
The use of human samples to assess environmental exposure and uptake of chemicals is more than an analytical exercise and requires consideration of the utility and interpretation of data as well as due consideration of ethical issues. These aspects are inextricably linked. In 2004 the EC expressed its commitment to the development of a harmonised approach to human biomonitoring (HBM) by including an action in the EU Environment and Health Strategy to develop a Human Biomonitoring Pilot Study. This further underlined the need for interpretation strategies as well as guidance on ethical issues. A two-day workshop held in December 2006 brought together stakeholders from academia, policy makers, non-governmental organisations and chemical industry associations to build a mutual understanding of the issues in an open and frank discussion forum. This paper describes the discussion and recommendations from the workshop. The workshop developed key recommendations for a Pan-European HBM Study:
1. A strategy for the interpretation of human biomonitoring data should be developed.
2. The pilot study should include the development of a strategy to integrate health data and environmental monitoring with human biomonitoring data at national and international levels.
3. Communication strategies should be developed when designing the study and evolve as the study continues.
4. Early communication with stakeholders is essential to achieve maximum efficacy of policy developments and facilitate subsequent monitoring.
5. Member states will have to apply individually for project approval from their National Research Ethics Committees.
6. The study population needs to have sufficient information on the way data will be gathered, interpreted and disseminated, and on how samples will be stored and used in the future (if biobanking), before they can give informed consent.
7. The participants must be given the option of anonymity. This has an impact
Data bases and statistical systems: demography
This article deals with the availability of large-scale data for demographic analysis. The main sources of data that demographers work with are census data, microcensus data, population registers, other administrative data, survey data, and big data. Data of this kind can be used to generate
47 CFR 1.363 - Introduction of statistical data.
... 47 Telecommunication 1 2010-10-01 Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall be...
Topics in statistical data analysis for high-energy physics
These lectures concern two topics that are becoming increasingly important in the analysis of high-energy physics data: Bayesian statistics and multivariate methods. In the Bayesian approach, we extend the interpretation of probability to cover not only the frequency of repeatable outcomes but also a degree of belief. In this way we are able to associate probability with a hypothesis and thus to answer directly questions that cannot be addressed easily with traditional frequentist methods. In multivariate analysis, we try to exploit as much information as possible from the characteristics that we measure for each event to distinguish between event types. In particular, we will look at a method that has gained popularity in high-energy physics in recent years: the boosted decision tree. Finally, we give a brief sketch of how multivariate methods may be applied in a search for a new signal process. (author)
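The core of a boosted decision tree can be sketched with discrete AdaBoost over decision stumps (trees of depth one). Events are labelled +1 (signal) or −1 (background). Each round fits the stump with the lowest weighted error, gives it a vote α = ½ ln((1 − err)/err), and up-weights the events it misclassified. This is an illustrative sketch, not the lectures' code; HEP toolkits typically use deeper trees and other boosting variants. All names and the toy data are hypothetical.

```python
import math


def fit_stump(xs, ys, w):
    """Best (weighted error, feature, threshold, sign) over all single-cut rules."""
    best = None
    for f in range(len(xs[0])):
        for thr in sorted({x[f] for x in xs}):
            for sign in (1, -1):
                err = sum(wi for x, y, wi in zip(xs, ys, w)
                          if (sign if x[f] <= thr else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, x):
    _, f, thr, sign = stump
    return sign if x[f] <= thr else -sign


def adaboost(xs, ys, rounds=10):
    n = len(xs)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(xs, ys, w)
        err = max(stump[0], 1e-12)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Misclassified events gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * y * stump_predict(stump, x))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    score = sum(alpha * stump_predict(s, x) for alpha, s in model)
    return 1 if score >= 0 else -1


# Toy 1-D data that no single cut can separate.
xs = [[0], [1], [2], [3]]
ys = [1, -1, -1, 1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])  # [1, -1, -1, 1]
```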
Infrared spectroscopy for geologic interpretation of TIMS data
The Portable Field Emission Spectrometer (PFES) was designed to collect meaningful spectra in the field under climatic, thermal, and sky conditions that approximate those at the time of the overflight. The specifications and procedures of PFES are discussed. Laboratory reflectance measurements of rocks and minerals were examined for the purpose of interpreting Thermal Infrared Multispectral Scanner (TIMS) data. The capability is currently being developed to perform direct laboratory measurement of the normal spectral radiance of Earth surface materials at low temperatures (20 to 30 °C) at the Jet Propulsion Laboratory.
Analysis and interpretation of diffraction data from complex, anisotropic materials
Most materials are elastically anisotropic and exhibit additional anisotropy beyond elastic deformation. For instance, in ferroelectric materials the main inelastic deformation mode is via domains, which are highly anisotropic crystallographic features. To quantify this anisotropy of ferroelectrics, advanced X-ray and neutron diffraction methods were employed. Extensive sets of data were collected from tetragonal BaTiO3, PZT and other ferroelectric ceramics. Data analysis was challenging due to the complex constitutive behavior of these materials. To quantify the elastic strain and texture evolution in ferroelectrics under loading, a number of data analysis techniques, such as the single peak and Rietveld methods, were used and their advantages and disadvantages compared. It was observed that the single peak analysis fails at low peak intensities, especially after domain switching, while the Rietveld method does not account for lattice strain anisotropy, although it overcomes the low intensity problem via whole pattern analysis. To better account for strain anisotropy, the constant stress (Reuss) approximation was employed within the Rietveld method and new formulations to estimate lattice strain were proposed. Along the way, new approaches for handling highly anisotropic lattice strain data were also developed and applied. All of the ceramics studied exhibited significant changes in their crystallographic texture after loading, indicating non-180° domain switching. For a full interpretation of domain switching, the spherical harmonics method was employed in Rietveld. A procedure for simultaneous refinement of multiple data sets was established for a complete texture analysis. To further interpret diffraction data, a solid mechanics model based on the self-consistent approach was used to calculate lattice strain and texture evolution during the loading of a polycrystalline ferroelectric. The model estimates both the macroscopic average response of a specimen and its hkl
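The basic quantity in the single peak analysis is the lattice strain of one hkl reflection, obtained from the shift of its peak position via Bragg's law, λ = 2d sin θ, and ε = (d − d0)/d0. The sketch below assumes angle-dispersive data, a known unstressed reference position 2θ0, and uses the Cu Kα1 wavelength (1.5406 Å) only as an example value. It does not attempt the Rietveld or Reuss-based modelling described above, and the function names are hypothetical.

```python
import math


def d_spacing(two_theta_deg, wavelength):
    """Bragg's law: d = lambda / (2 sin theta), with theta half the peak angle 2-theta."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def lattice_strain(two_theta_deg, two_theta0_deg, wavelength):
    """Strain of one reflection relative to its unstressed position 2-theta_0."""
    d = d_spacing(two_theta_deg, wavelength)
    d0 = d_spacing(two_theta0_deg, wavelength)
    return (d - d0) / d0


# A peak that moves from 40.00 to 39.98 degrees 2-theta under load:
print(lattice_strain(39.98, 40.00, 1.5406))  # positive (tensile), about 4.8e-4
```

A shift to lower angle means a larger d-spacing and therefore tensile strain. Repeating this for several hkl peaks exposes the strain anisotropy that the single peak method captures and the plain Rietveld fit averages out.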
New Cosmological Model and Its Implications on Observational Data Interpretation
The paradigm of ΛCDM cosmology works impressively well and, with the concept of inflation, explains the universe after the time of decoupling. However, there are still a few concerns: after much effort, dark matter has not been detected, and there are significant problems in the theoretical description of dark energy. We will consider a variant of the cosmological spherical shell model within the FRW formalism and compare it with the standard ΛCDM model. We will show that our new topological model satisfies cosmological principles and is consistent with all observable data, but that it may require a new interpretation of some data. We will also consider the constraints imposed on the model by the supernova luminosity distance and CMB data, for instance on the range for the size and the allowed thickness of the shell. In this model the propagation of light is confined along the shell, with the consequence that the observed CMB originated from one point or a limited region of space. This allows the uniformity of the CMB to be interpreted without an inflation scenario. In addition, it removes any constraints on the uniformity of the universe at the early stage and opens the possibility that the universe was not uniform and that the creation of galaxies and large structures is due to inhomogeneities that originated in the Big Bang.
Flexibility in data interpretation: effects of representational format.
Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well-understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design.
Statistical process control for serially correlated data
Statistical Process Control (SPC) aims at quality improvement through reduction of variation. The best known tool of SPC is the control chart. Over the years, the control chart has proved to be a successful practical technique for monitoring process measurements. However, its usefulness in practice
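As a minimal sketch of the control chart mentioned above (not this paper's method, whose details are cut off here), the Python below computes Shewhart individuals-chart limits, estimating sigma from the average moving range divided by the constant d2 = 1.128, plus a lag-1 autocorrelation check. The function names are hypothetical. Standard limits assume independent observations, so a large lag-1 autocorrelation is one simple warning sign that the data are serially correlated.

```python
import statistics

def individuals_chart_limits(x):
    """Shewhart individuals (X) chart: returns (LCL, centre line, UCL).

    Sigma is estimated as the mean moving range of consecutive points
    divided by d2 = 1.128 (the bias constant for ranges of size 2).
    """
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    sigma_hat = statistics.fmean(moving_ranges) / 1.128
    centre = statistics.fmean(x)
    return centre - 3 * sigma_hat, centre, centre + 3 * sigma_hat

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation; values far from 0 indicate serial
    correlation, which violates the independence behind the limits above."""
    m = statistics.fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den
```

For example, `individuals_chart_limits([10.1, 9.8, 10.3, 10.0, 9.9, 10.2])` returns limits centred on the sample mean; points outside them would be flagged as out of control.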
Statistical data of the uranium industry
Historical facts and figures of the uranium industry through 1975 are compiled. Areas covered are ore and concentrate purchases; uranium resources; distribution of $10, $15, and $30 reserves; drilling statistics; uranium exploration expenditures; land holdings for uranium mining and exploration; employment; commercial U3O8 sales and requirements; and processing mills.
Statistical Analysis Of Reconnaissance Geochemical Data From ...
, Co, Mo, Hg, Sb, Tl, Sc, Cr, Ni, La, W, V, U, Th, Bi, Sr and Ga in 56 stream sediment samples collected from Orle drainage system were subjected to univariate and multivariate statistical analyses. The univariate methods used include ...
Interpretation of self-potential data for dam seepage investigations
This book is one of a series on geophysical methods and their use in assessing seepage and internal erosion in embankment dams. This manual facilitates the interpretation of self-potential (SP) data generated by subsurface fluid flow, with an emphasis on dam seepage studies. It is intended for users with a background in geophysics or engineering having a general familiarity with both the SP and direct-current (DC) resistivity methods and their applications. It includes an extensive reference list covering all aspects of available SP interpretation techniques, including qualitative, analytical and numerical methods. Particular emphasis is placed on the use of geometric source analytical modeling methods to evaluate SP anomalies. These methods provide a simple yet efficient means of estimating the location and depth of current sources of observed SP data, which may be linked to fluid flow in the subsurface. The manual is primarily oriented toward embankment dams and earthen structures such as levees and dikes. SP methods have been used to investigate seepage through pervious zones and cracks in concrete and concrete-faced structures. The manual describes the nature of SP fields generated by both uniform and non-uniform dam seepage flow, as well as non-seepage sources of SP variations. These methods enable the study of more complex systems and require a more comprehensive analysis of a given field site. refs., tabs., figs.
Improved interpretation of satellite altimeter data using genetic algorithms
Genetic algorithms (GAs) are optimization techniques based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations), the population as a whole improves, simulating Darwin's 'survival of the fittest'. GAs have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors make interpretation significantly difficult. The GA approach to improved interpretation of satellite data represents the ocean surface model as a string of parameters or coefficients of the model. The GA searches, in parallel, a population of such representations (organisms) to find the individual best suited to 'survive', that is, the fittest as measured by some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
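To make the GA loop concrete, the sketch below fits a hypothetical three-coefficient surface model, y = a sin(x) + b cos(x) + c, to synthetic data. Each 'organism' is the coefficient string [a, b, c], fitness is the sum of squared errors (lower is fitter), and each generation applies elitism, tournament selection, blend crossover and Gaussian mutation. The model, the operators and all parameter values are illustrative assumptions, not the paper's ocean surface model.

```python
import math
import random

def model(params, x):
    # Hypothetical surface model: y = a*sin(x) + b*cos(x) + c
    a, b, c = params
    return a * math.sin(x) + b * math.cos(x) + c

def sse(params, xs, ys):
    # Fitness measure: sum of squared residuals (lower means fitter)
    return sum((model(params, x) - y) ** 2 for x, y in zip(xs, ys))

def genetic_fit(xs, ys, pop_size=60, generations=200, sigma=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate every organism once per generation and rank them.
        scored = sorted(((sse(p, xs, ys), p) for p in pop), key=lambda t: t[0])
        next_pop = [scored[0][1], scored[1][1]]  # elitism: keep the two fittest
        while len(next_pop) < pop_size:
            # Tournament selection: the fittest of 3 random organisms becomes a parent.
            p1 = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = min(rng.sample(scored, 3), key=lambda t: t[0])[1]
            child = []
            for g1, g2 in zip(p1, p2):
                w = rng.random()  # blend crossover weight
                child.append(w * g1 + (1 - w) * g2 + rng.gauss(0.0, sigma))  # + mutation
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=lambda p: sse(p, xs, ys))
```

Elitism guarantees that the best fitness never gets worse between generations; the fixed mutation width `sigma` trades exploration against final precision.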
Challenges in dental statistics: data and modelling
The aim of this work is to present the reflections and proposals derived from the first Workshop of the SISMEC STATDENT working group on statistical methods and applications in dentistry, held in Ancona (Italy) on 28 September 2011. STATDENT began as a forum for comparison and discussion among statisticians working in dental research, with the aim of proposing new biostatistical and clinical epidemiological methods and improving existing ones. During the meeting, we dealt with very important to...
Experimental software for modeling and interpreting educational data analysis processes
Problems, tasks and processes of educational data mining are considered in this article. The objective is to create a fundamentally new information system for the University that uses the results of educational data analysis. One function of such a system is to extract knowledge from the data accumulated during its operation. Creating a national system of this type is an iterative and time-consuming process that requires preliminary studies and incremental prototyping of modules. Such systems are novel in that few have been developed with this methodology; for this purpose, a number of experiments were carried out to collect data, choose appropriate methods of study, and interpret the results. As a result of the experiment, the authors identified the sources available for analysis in the information environment of their home university. The data comprised semester performance records obtained from the information system of the training department of the Institute of IT, MTU MIREA; data produced by students' independent work; and data collected with specially designed Google Forms. An experimental software package was created to automate the collection of information and the analysis of educational data. The experimental software complex was developed using the methodologies of rational-empirical complexes (REX) and single-experimentation program technologies (TPEI). The program implementation of the complex is described in detail, and conclusions are drawn about the availability of the data sources used and the prospects for further development.
Analysis of filament statistics in fast camera data on MAST
Coherent filamentary structures have been shown to play a dominant role in turbulent cross-field particle transport [D'Ippolito 2011]. An improved understanding of filaments is vital in order to control scrape-off layer (SOL) density profiles and thus control first wall erosion, impurity flushing and coupling of radio frequency heating in future devices. The Elzar code [T. Farley, 2017 in prep.] is applied to MAST data. The code uses information about the magnetic equilibrium to calculate the intensity of light emission along field lines as seen in the camera images, as a function of the field lines' radial and toroidal locations at the mid-plane. In this way a 'pseudo-inversion' of the intensity profiles in the camera images is achieved from which filaments can be identified and measured. In this work, a statistical analysis of the intensity fluctuations along field lines in the camera field of view is performed using techniques similar to those typically applied in standard Langmuir probe analyses. These filament statistics are interpreted in terms of the theoretical ergodic framework presented by F. Militello & J.T. Omotani, 2016, in order to better understand how time averaged filament dynamics produce the more familiar SOL density profiles. This work has received funding from the RCUK Energy programme (Grant Number EP/P012450/1), from Euratom (Grant Agreement No. 633053) and from the EUROfusion consortium.
Evaluating and interpreting forensic DNA mixture evidence is becoming more challenging as mixture evidence grows increasingly complex. Such challenges include casework involving low-quantity or degraded evidence, leading to allele and locus dropout; allele sharing among contributors, leading to allele stacking; and differentiation of PCR stutter artifacts from true alleles. The statistical approaches used to evaluate the strength of the evidence when inclusion of a specific known individual(s) is determined vary, and the approaches used must be supportable. There are concerns that methods used for interpreting complex forensic DNA mixtures may not be implemented properly in some casework. Similar questions are being raised in a number of U.S. jurisdictions, leading to some confusion about mixture interpretation for current and previous casework. Key elements necessary for the interpretation and statistical evaluation of forensic DNA mixtures are described. The most common method for statistical evaluation of DNA mixtures in many parts of the world, including the USA, is the Combined Probability of Inclusion/Exclusion (CPI/CPE). Exposition and elucidation of this method, and a protocol for its use, are the focus of this article. Formulae and other supporting materials are provided. Guidance and details of a DNA mixture interpretation protocol are provided for applying the CPI/CPE method to the analysis of more complex forensic DNA mixtures. This description, in turn, should help reduce the variability of interpretation with this methodology and thereby improve the quality of DNA mixture interpretation throughout the forensic community.
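The basic CPI arithmetic can be sketched as follows, assuming Hardy-Weinberg proportions, no allele dropout at the loci used, and no subpopulation (theta) correction. At each locus, the probability of inclusion is the squared sum of the population frequencies of all alleles observed in the mixture; the CPI is the product across loci, and CPE = 1 - CPI. The function names are illustrative and the allele frequencies in the example are made up; this is not the article's full protocol, which also governs which loci qualify.

```python
from math import prod

def probability_of_inclusion(allele_freqs):
    """PI at one locus: (p1 + p2 + ... + pk)^2, where p_i are the population
    frequencies of the k alleles observed in the mixture at that locus."""
    return sum(allele_freqs) ** 2

def combined_probability_of_inclusion(mixture):
    """CPI: product of per-locus PI values.
    `mixture` maps a locus name to the frequencies of its observed alleles."""
    return prod(probability_of_inclusion(f) for f in mixture.values())

def combined_probability_of_exclusion(mixture):
    """CPE: probability that a random person would be excluded."""
    return 1.0 - combined_probability_of_inclusion(mixture)
```

With hypothetical frequencies `{"D8S1179": [0.10, 0.20, 0.15], "D21S11": [0.30, 0.10]}`, the per-locus PI values are 0.45^2 = 0.2025 and 0.40^2 = 0.16, so CPI = 0.0324 and CPE = 0.9676.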
Theoretical interpretation of data from high-energy nuclear collisions
Nuclear collision data at energies ranging from medium to relativistic are interpreted theoretically. The major objective is a better understanding of high-energy heavy-ion collisions, with particular emphasis on the properties of excited nuclear matter. Further progress towards a satisfactory description of excited subsaturation nuclear matter is achieved. The mean free path of a nucleon in nuclear matter, which is a critical parameter in assessing the applicability of certain nuclear collision models, is investigated. Experimental information is used together with theoretical concepts, in collaboration with experimentalists, to learn about the reaction mechanism and the properties of excited nuclear matter. Within a more strictly theoretical program, subnuclear degrees of freedom and nonlinear phenomena in model field theories are studied.
Statistical analysis of medical data using SAS
Der, Geoff
Contents: An Introduction to SAS; Describing and Summarizing Data; Basic Inference; Scatterplots, Correlation, Simple Regression and Smoothing; Analysis of Variance and Covariance; Multiple Regression; Logistic Regression; The Generalized Linear Model; Generalized Additive Models; Nonlinear Regression Models; The Analysis of Longitudinal Data I; The Analysis of Longitudinal Data II: Models for Normal Response Variables; The Analysis of Longitudinal Data III: Non-Normal Response; Survival Analysis; Analysis of Multivariate Data: Principal Components and Cluster Analysis; References.
The Statistical Analysis of Failure Time Data
Contains additional discussion and examples on left truncation, as well as material on more general censoring and truncation patterns. Introduces the martingale and counting process formulation in a new chapter. Develops multivariate failure time data in a separate chapter and extends the material on Markov and semi-Markov formulations. Presents new examples and applications of data analysis.
Statistical Analysis of Research Data | Center for Cancer Research
Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview of the general principles of statistical analysis of research data. The first day will feature univariate data analysis, including descriptive statistics, probability distributions, and one- and two-sample inferential statistics.
We present a statistical analysis of time-resolved spontaneous emission decay curves from ensembles of emitters, such as semiconductor quantum dots, with the aim of interpreting ubiquitous non-single-exponential decay. Contrary to what is widely assumed, the density of excited emitters and the
Fetal Alcohol Spectrum Disorders (FASDs): Data and Statistics
... alcohol screening and counseling for all women ... Prevalence of ... conducted annually by the National Center for Health Statistics (NCHS), CDC, to produce national estimates for a ...
Statistical methods for categorical data analysis
Powers, Daniel
This book provides a comprehensive introduction to methods and models for categorical data analysis and their applications in social science research. Companion website also available, at https://webspace.utexas.edu/dpowers/www/
Statistical modeling and extrapolation of carcinogenesis data
Mathematical models of carcinogenesis are reviewed, including pharmacokinetic models for metabolic activation of carcinogenic substances. Maximum likelihood procedures for fitting these models to epidemiological data are discussed, including situations where the time to tumor occurrence is unobservable. The plausibility of different possible shapes of the dose-response curve at low doses is examined, and a robust method for linear extrapolation to low doses is proposed and applied to epidemiological data on radiation carcinogenesis.
Q-mode factor analysis was used to quantitate the distribution of the major aliphatic hydrocarbon (n-alkanes, pristane, phytane) systems in sediments from a variety of marine environments. The compositions of the pure end members of the systems were obtained from factor scores and the distribution of the systems within each sample was obtained from factor loadings. All the data, from the diverse environments sampled (estuarine (San Francisco Bay), fresh-water (San Francisco Peninsula), polar-marine (Antarctica) and geothermal-marine (Gorda Ridge) sediments), were reduced to three major systems: a terrestrial system (mostly high molecular weight aliphatics with odd-numbered-carbon predominance), a mature system (mostly low molecular weight aliphatics without predominance) and a system containing mostly high molecular weight aliphatics with even-numbered-carbon predominance. With this statistical approach, it is possible to assign the percentage contribution from various sources to the observed distribution of aliphatic hydrocarbons in each sediment sample. © 1991.
This contribution contains a brief presentation and comparison of the different Statistical Multistep Approaches, presently available for practical nuclear data calculations. (author). 46 refs, 5 figs
Phase 1 report on sensor technology, data fusion and data interpretation for site characterization
International Nuclear Information System (INIS)
In this report we discuss sensor technology, data fusion and data interpretation approaches of possible maximal usefulness for subsurface imaging and characterization of land-fill waste sites. Two sensor technologies, terrain conductivity using electromagnetic induction and ground penetrating radar, are described and the literature on the subject is reviewed. We identify the maximum entropy stochastic method as one providing a rigorously justifiable framework for fusing the sensor data, briefly summarize work done by us in this area, and examine some of the outstanding issues with regard to data fusion and interpretation. 25 refs., 17 figs
78 FR 10166 - Access Interpreting; Transfer of Data
2013-02-13
... regulations. Access Interpreting has been awarded a contract to perform work for OPP, and access to this information will enable Access Interpreting to fulfill the obligations of the contract. DATES: Access.... Contractor Requirements Under Contract No. EP10H000109, this contract is to provide the Environmental...
Statistical methods for handling incomplete data
Kim, Jae Kwang
2013-01-01
"… this book nicely blends the theoretical material and its application through examples, and will be of interest to students and researchers as a textbook or a reference book. Extensive coverage of recent advances in handling missing data provides resources and guidelines for researchers and practitioners in implementing the methods in new settings. … I plan to use this as a textbook for my teaching and highly recommend it." (Biometrics, September 2014)
Implementation of statistical analysis methods for medical physics data
The objective of biomedical research with radiation of different natures is to contribute to the understanding of the basic physics and biochemistry of biological systems, to disease diagnosis, and to the development of therapeutic techniques. The main benefits are the cure of tumors through therapy, the early detection of diseases through diagnosis, and use as a prophylactic means in blood transfusion, among others. A better understanding of the biological interactions that occur after exposure to radiation is therefore necessary to optimize therapeutic procedures and strategies for reducing radio-induced effects. The applied physics group of the Physics Institute of UERJ has been characterizing biological samples (human tissues, teeth, saliva, soil, plants, sediments, air, water, organic matrices, ceramics, fossil material, among others) using X-ray diffraction and X-ray fluorescence. The application of these techniques to the measurement, analysis and interpretation of the characteristics of biological tissues is attracting considerable interest in Medical and Environmental Physics. All quantitative data analysis must begin with the calculation of descriptive statistics (means and standard deviations) to obtain a preliminary notion of what the analysis will reveal. It is well known that the high standard deviations found in experimental measurements of biological samples can be attributed to biological factors arising from the specific characteristics of each individual (age, gender, environment, dietary habits, etc.). The main objective of this work is to develop a program that applies specific statistical methods to optimize experimental data analysis. Because the specialized programs for this analysis are proprietary, another objective is to implement free code that can be shared with other research groups. As the program developed since the
Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the
Plasma data analysis using statistical analysis system
Multivariate factor analysis has been applied to a plasma database of REPUTE-1. The characteristics of the reversed-field pinch plasma in REPUTE-1 are shown to be explained by four independent parameters, which are described in the report. The well-known scaling laws $F_\chi \propto I_p$, $T_e \propto I_p$, and $\tau_E \propto N_e$ are also confirmed. 4 refs., 8 figs., 1 tab.
GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data
Directory of Open Access Journals (Sweden)
2009-02-01
Abstract Background Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Results Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and thus lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where keywords can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiforme cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to the cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. Conclusion GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple-to-use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
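The abstract does not state which over-representation statistic GOEx uses; one common choice for GO term over-representation is a one-sided hypergeometric test, sketched below as an assumption. Here N is the number of identified proteins (the background), K of them carry a given GO term, n proteins are in the selected set (for example, those with large fold changes), and k of the selected proteins carry the term. The function name and example numbers are hypothetical.

```python
from math import comb

def overrepresentation_p(N, K, n, k):
    """One-sided hypergeometric p value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: background size, K: background members annotated to the GO term,
    n: size of the selected set, k: annotated members in the selected set.
    """
    total = comb(N, n)
    upper = min(K, n)  # X cannot exceed the number annotated or selected
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For instance, if 10 of 100 selected proteins carried a term that only 40 of 2600 background proteins carry (about 1.5 expected), `overrepresentation_p(2600, 40, 100, 10)` would be very small, flagging the term as over-represented. In practice such p values are usually corrected for testing many GO terms at once.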
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
2008-11-07
Kulldorff's spatial scan statistic and its software implementation, SaTScan, are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents composed of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal for identifying clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed through the analysis of
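To show what the scaling parameter controls, here is a simplified, hypothetical sketch of Kulldorff's Poisson scan in one dimension: regions lie on a line, candidate windows are contiguous runs of regions rather than SaTScan's circles, and a window may hold at most `max_fraction` of the total population. Each window is scored with the Poisson log-likelihood ratio; the Monte Carlo significance testing that SaTScan performs is omitted.

```python
import math

def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one window with c observed and
    e expected cases out of `total` cases; only elevated-risk windows
    (c > e) are scored."""
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if total > c:  # the outside term vanishes when every case is inside
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr

def scan_1d(cases, population, max_fraction=0.5):
    """Return (best LLR, (start, end)) over contiguous windows holding at
    most `max_fraction` of the total population (the scaling parameter)."""
    total_cases = sum(cases)
    total_pop = sum(population)
    best_llr, best_window = 0.0, None
    for start in range(len(cases)):
        c = pop = 0
        for end in range(start, len(cases)):
            c += cases[end]
            pop += population[end]
            if pop > max_fraction * total_pop:
                break  # window exceeds the maximum cluster size
            e = total_cases * pop / total_pop  # expected cases under uniform risk
            llr = poisson_llr(c, e, total_cases)
            if llr > best_llr:
                best_llr, best_window = llr, (start, end)
    return best_llr, best_window
```

With `cases = [5, 5, 5, 30, 30, 5, 5, 5]` and equal populations, the most likely cluster is regions 3 to 4. Re-running with different `max_fraction` values is a small-scale version of the stability check described above.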
Measurement of osteogenic exercise – How to interpret accelerometric data?
Directory of Open Access Journals (Sweden)
Bone tissue adapts to its mechanical loading environment. We review here accelerometric measurements, with special emphasis on osteogenic exercise. The accelerometric method offers a unique opportunity to assess the intensity of mechanical loading. We present methods for interpreting accelerometric data by reducing it to daily distributions of the magnitude, slope, area and energy of the signal. These features represent the intensity level of physical activities and were associated with changes in bone density, bone geometry, physical performance and metabolism in healthy premenopausal women. Bone adaptations showed a dose- and intensity-dependent relationship with impact loading. Changes in the hip were threshold dependent, indicating the importance of high impacts exceeding an acceleration of 4 g or a slope of 100 g/s as an osteogenic stimulus. The number of impacts needed was 60 per day. We also present the Daily Impact Score, which describes the osteogenic potential of daily mechanical loading with a single score. The methodology presented here can also be used to study musculoskeletal adaptation to exercise in other target groups.
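The thresholds quoted above (4 g, 100 g/s) can be applied to a raw trace with a simple counter, sketched here under assumptions: the trace is a list of resultant accelerations in g sampled at `fs_hz`, an "impact" is a local maximum above the magnitude threshold, and a "steep" event is each upward crossing of the slope threshold. This is not the Daily Impact Score, whose formula is not given here, and the function name is hypothetical.

```python
def count_impacts(acc_g, fs_hz, peak_threshold_g=4.0, slope_threshold_g_per_s=100.0):
    """Return (number of peaks above peak_threshold_g,
               number of upward crossings of slope_threshold_g_per_s)."""
    # Local maxima (plateaus count once, at their last sample) above the magnitude threshold.
    peaks = sum(
        1
        for i in range(1, len(acc_g) - 1)
        if acc_g[i] > peak_threshold_g
        and acc_g[i] >= acc_g[i - 1]
        and acc_g[i] > acc_g[i + 1]
    )
    # Slope in g/s between consecutive samples; count each entry into the steep region.
    steep_events = 0
    above = False
    for a, b in zip(acc_g, acc_g[1:]):
        slope = (b - a) * fs_hz
        if slope > slope_threshold_g_per_s and not above:
            steep_events += 1
        above = slope > slope_threshold_g_per_s
    return peaks, steep_events
```

Summing the peak count over a day's recording and comparing it with the roughly 60 high impacts per day reported above would be one simple way to use such a counter.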
Using Facebook Data to Turn Introductory Statistics Students into Consultants
Facebook provides businesses and organizations with copious data that describe how users are interacting with their page. These data afford an excellent opportunity to turn introductory statistics students into consultants who analyze the Facebook data using descriptive and inferential statistics. This paper details a semester-long project that…
Statistical data processing with automatic system for environmental radiation monitoring
The practice of statistical data processing for radiation monitoring is illustrated, and some of the results obtained are presented. Experience in applying methods of mathematical statistics to radiation monitoring data made it possible to develop a concrete algorithm for statistical processing, implemented on an M-6000 minicomputer. The algorithm is divided into three parts: parametric data processing and hypothesis testing, pairwise correlation analysis, and multiple correlation analysis. The statistical processing programs run in an interactive (dialogue) mode. The algorithm was used to process observational data from the control region of a radioactive waste disposal site. Results of processing surface-water monitoring data are presented.
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
The present paper reviews two motivations for conducting "what if" analyses in Excel and "R" to understand statistical significance tests in the context of sample size. "What if" analyses can be used to teach students what statistical significance tests really do and, in applied research, either prospectively to estimate what sample size…
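The paper works in Excel or R; as a language-neutral illustration, the Python sketch below holds a standardized effect size d fixed and shows how the two-sided p value shrinks as the per-group sample size grows. It uses a normal (z) approximation to the two-sample t test with equal group sizes, z = d * sqrt(n/2), which is an assumption; it slightly understates p for small samples because the t distribution has heavier tails.

```python
import math

def approx_two_sample_p(d, n_per_group):
    """Two-sided p value for standardized mean difference d with n per group,
    using the normal approximation z = |d| * sqrt(n / 2)."""
    z = abs(d) * math.sqrt(n_per_group / 2)
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))

# "What if" table: the same effect size d = 0.5 at increasing sample sizes.
for n in (10, 20, 40, 80):
    print(n, round(approx_two_sample_p(0.5, n), 4))
```

The effect size never changes in this loop; only n does, which is the point such "what if" tables make about statistical significance.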
On the statistical interpretation of quantum mechanics: evolution of the density matrix
Without attempting to identify an ontological interpretation with a mathematical structure, we reduce philosophical speculation to five theses. In discussing these, a central role is given to the mathematical problem of the evolution of the density matrix. This article addresses the first three of these five theses [fr]
Environmetric data interpretation to assess surface water quality
Two multivariate statistical methods, cluster analysis (CA) and principal components analysis (PCA), were applied to assess the water quality of the Maritsa and Tundja rivers on Bulgarian territory. The study used long-term monitoring data from many sampling sites characterized by various surface water quality indicators. Applying CA to the indicators produced clusters showing the impact of biological, anthropogenic and eutrophication sources. For further assessment of the monitoring data, PCA was implemented; it identified latent factors that, in principle, confirm the clustering output. The identified factors correspond well to the locations of real pollution sources along the river catchments. Linking the sampling sites along the river flow by CA identified several special patterns separated by specific tracer levels. The pollution apportionment models determined the contribution of each identified pollution factor to the total concentration of each water quality parameter. Thus, better risk management of surface water quality is achieved at both local and national levels.
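As a small illustration of the PCA step, the pure-Python sketch below standardizes each indicator (column), forms their correlation matrix, and extracts only the first principal component by power iteration, returning its eigenvalue, its loadings and the fraction of total variance it explains. The function names are hypothetical, and a real analysis would normally use a linear-algebra library and retain several components.

```python
import math
import statistics

def standardize(col):
    # z-scores so that indicators measured in different units are comparable
    m = statistics.fmean(col)
    s = statistics.stdev(col)
    return [(v - m) / s for v in col]

def correlation_matrix(columns):
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def pca_first_component(columns, iterations=500):
    """Return (eigenvalue, loadings, explained fraction) of the leading component."""
    R = correlation_matrix(columns)
    p = len(R)
    v = [1.0] * p  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in R]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Rv = [sum(r * x for r, x in zip(row, v)) for row in R]
    eigenvalue = sum(x * y for x, y in zip(v, Rv))  # Rayleigh quotient, |v| = 1
    return eigenvalue, v, eigenvalue / p  # a correlation matrix has trace p
```

Indicators with large loadings on the same component are the ones that vary together across sampling sites, which is how latent pollution factors are usually read off a PCA.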
A nonparametric spatial scan statistic for continuous data.
Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compare the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
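A rank-based scan can be sketched as follows. This is one plausible form, not the authors' implementation: values are replaced by their (average) ranks, each candidate window's rank sum W is standardized with its null mean m(N+1)/2 and variance m(N-m)(N+1)/12, and the window with the largest standardized W is reported as the most likely high-value cluster. Simplifying assumptions: windows are contiguous runs along one dimension rather than circles in space, the variance ignores the tie correction, and permutation-based significance testing is omitted.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_scan(values, max_size):
    """Return (best standardized rank sum, (start, end)) over contiguous
    windows of at most max_size observations."""
    N = len(values)
    ranks = average_ranks(values)
    best = (-math.inf, None)
    for start in range(N):
        w = 0.0
        for end in range(start, min(N, start + max_size)):
            w += ranks[end]
            m = end - start + 1
            if m == N:
                break  # a window containing everything has zero variance
            mean = m * (N + 1) / 2
            sd = math.sqrt(m * (N - m) * (N + 1) / 12)
            z = (w - mean) / sd
            if z > best[0]:
                best = (z, (start, end))
    return best
```

Because only ranks enter the statistic, a single extreme value cannot dominate the score the way it can with a normal-based statistic, which is the intuition behind the robustness to skewed or heavy-tailed data reported above.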
Statistics of meteorological data at Tokai Research Establishment in JAERI
The meteorological observation data at Tokai site were analyzed statistically based on a 'Guideline of meteorological statistics for the safety analysis of nuclear power reactor' (Nuclear Safety Commission on January 28, 1982; revised on March 29, 2001). This report shows the meteorological analysis of wind direction, wind velocity and atmospheric stability etc. to assess the public dose around the Tokai site caused by the released gaseous radioactivity. The statistical period of meteorological data is every 5 years from 1981 to 1995. (author)
Using Data from Climate Science to Teach Introductory Statistics
This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…
The value of statistical tools to detect data fabrication
We aim to investigate how statistical tools can help detect potential data fabrication in the social and medical sciences. In this proposal, we outline three projects to assess the value of such statistical tools for detecting potential data fabrication and take the first steps toward applying them.
Journal data sharing policies and statistical reporting inconsistencies in psychology.
In this paper, we present three retrospective observational studies that investigate the relation between data sharing and statistical reporting inconsistencies. Previous research found that reluctance to share data was related to a higher prevalence of statistical errors, often in the direction of
Simple statistical methods for software engineering data and patterns
Pandian, C Ravindranath
2015-01-01
National Vital Statistics System (NVSS) - National Cardiovascular Disease Surveillance Data
U.S. Department of Health & Human Services — 2000 forward. NVSS is a secure, web-based data management system that collects and disseminates the Nation's official vital statistics. Indicators from this data...
Experimental uncertainty estimation and statistics for data having interval uncertainty.
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
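For statistics that are monotone in every data value, bounds for interval data are easy to compute, as the sketch below shows for the mean and the median: the lower bound uses every interval's lower endpoint and the upper bound uses every upper endpoint. Not all statistics are this simple (the report itself summarizes how computability depends on sample size and the intervals' characteristics). The function names are hypothetical.

```python
import statistics

def mean_bounds(intervals):
    """Tightest (lower, upper) bounds on the sample mean of interval data."""
    lows = [lo for lo, _ in intervals]
    highs = [hi for _, hi in intervals]
    return statistics.fmean(lows), statistics.fmean(highs)

def median_bounds(intervals):
    """Bounds on the median: the median is non-decreasing in each value,
    so it is smallest at all lower endpoints and largest at all upper ones."""
    return (statistics.median(lo for lo, _ in intervals),
            statistics.median(hi for _, hi in intervals))
```

For example, the measurements [1, 2], [2, 4] and [3, 3] (a point value written as a degenerate interval) give a mean somewhere in [2, 3] and a median in [2, 3].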
Mining gene expression data by interpreting principal components
Directory of Open Access Journals (Sweden)
2006-04-01
Background: There are many methods for analyzing microarray data that group together genes having similar patterns of expression over all conditions tested. However, in many instances the biologically important goal is to identify relatively small sets of genes that share coherent expression across only some conditions, rather than all or most conditions as required in traditional clustering, e.g. genes that are highly up-regulated and/or down-regulated similarly across only a subset of conditions. Equally important is the need to learn which conditions are the decisive ones in forming such gene sets of interest, and how they relate to diverse conditional covariates, such as disease diagnosis or prognosis. Results: We present a method for automatically identifying such candidate sets of biologically relevant genes using a combination of principal components analysis and information-theoretic metrics. To make our methods easy to use, we have developed a data analysis package that facilitates visualization and subsequent data mining of the independent sources of significant variation present in gene microarray expression datasets (or in any other similarly structured high-dimensional dataset). We applied these tools to two public datasets and highlight sets of genes most affected by specific subsets of conditions (e.g. tissues, treatments, samples). Statistically significant associations for highlighted gene sets were shown via global analysis for Gene Ontology term enrichment. Together with covariate associations, the tool provides a basis for building testable hypotheses about the biological or experimental causes of observed variation. Conclusion: We provide an unsupervised data mining technique for diverse microarray expression datasets that is distinct from major methods now in routine use. In test uses, this method, based on publicly available gene annotations, appears to identify numerous sets of biologically relevant genes. It...
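The abstract combines principal components analysis with information-theoretic metrics to pick out gene sets and the conditions that drive them. The sketch below shows only the PCA step on a hypothetical genes-by-conditions matrix: it ranks genes by their score on one component and conditions by their loading on it. The function name, its parameters and the toy data are our own assumptions; the information-theoretic scoring and Gene Ontology enrichment from the abstract are not reproduced.

```python
import numpy as np


def pca_candidate_set(expr: np.ndarray, component: int = 0,
                      n_genes: int = 2, n_conditions: int = 2):
    """Rank genes and conditions by their weight on one principal component.

    expr: genes x conditions matrix of expression values (hypothetical layout).
    Returns (top gene indices, top condition indices, fraction of variance explained).
    """
    # Centre each condition (column) so the components describe variation across genes.
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Thin SVD: rows of vt are component loadings on the conditions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    gene_scores = u[:, component] * s[component]
    condition_loadings = vt[component]
    # Component signs are arbitrary, so rank by absolute value.
    top_genes = np.argsort(-np.abs(gene_scores))[:n_genes]
    top_conditions = np.argsort(-np.abs(condition_loadings))[:n_conditions]
    explained = s[component] ** 2 / np.sum(s ** 2)
    return top_genes, top_conditions, float(explained)
```

With eight genes and four conditions where only genes 0 and 1 are raised, and only in conditions 0 and 1, the first component singles out exactly those genes and conditions.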
Computationally inexpensive interpretation of magnetic data for finite spin clusters
We show that high-temperature expansion of the partition function is a computationally convenient tool for interpreting the magnetic properties of spin clusters in which the spin centers interact via an isotropic Heisenberg exchange operator. High-temperature expansions up to order 12 are u...
Directory of Open Access Journals (Sweden)
Ergodic theory, interpretations of probability and the foundations of statistical mechanics
The traditional use of ergodic theory in the foundations of equilibrium statistical mechanics is that it provides a link between thermodynamic observables and microcanonical probabilities. First of all, the ergodic theorem demonstrates the equality of microcanonical phase averages and infinite time
2012-01-01
We, herein, present a statistical method for diagnostics of the outliers in phase equilibrium data (dissociation data) of simple clathrate hydrates. The applied algorithm is performed on the basis of the Leverage mathematical approach, in which the statistical Hat matrix, Williams Plot, and the r[…] in exponential form is used to represent/predict the hydrate dissociation pressures for three-phase equilibrium conditions (liquid water/ice–vapor–hydrate). The investigated hydrate formers are methane, ethane, propane, carbon dioxide, nitrogen, and hydrogen sulfide. It is interpreted from the obtained results...
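The Leverage approach named above plots each point's hat-matrix leverage against its standardized residual (a Williams plot). A minimal sketch of that screen follows. The warning leverage h* = 3p/n and the ±3 residual cutoff are common conventions that we assume here; the truncated abstract does not state the paper's exact choices, and the function names are hypothetical.

```python
import numpy as np


def leverages(X: np.ndarray) -> np.ndarray:
    """Diagonal h_ii of the hat matrix H = X (X^T X)^{-1} X^T."""
    # Solve (X^T X) M = X^T instead of forming the inverse explicitly.
    m = np.linalg.solve(X.T @ X, X.T)      # shape (p, n)
    return np.einsum("ij,ji->i", X, m)     # h_i = x_i^T (X^T X)^{-1} x_i


def williams_flags(X: np.ndarray, residuals: np.ndarray,
                   h_factor: float = 3.0, r_cut: float = 3.0):
    """Return leverages, the warning leverage h*, and a mask of suspect points.

    A point is flagged if h_i > h* = h_factor * p / n (p = columns of X,
    including the intercept) or if |standardized residual| > r_cut.
    """
    n, p = X.shape
    h = leverages(X)
    h_star = h_factor * p / n
    # Residual standard error with n - p degrees of freedom. This is a simple
    # standardization, not the studentized residual r_i / (s * sqrt(1 - h_i)).
    s = np.sqrt(np.sum(residuals ** 2) / (n - p))
    std_res = residuals / s
    return h, h_star, (h > h_star) | (np.abs(std_res) > r_cut)
```

Points beyond h* lie outside the model's applicability domain even when they fit well; points beyond the residual cutoff are candidate outliers in the response.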
Numeric computation and statistical data analysis on the Java platform
Numerical computation, knowledge discovery and statistical data analysis integrated with powerful 2D and 3D graphics for visualization are the key topics of this book. The Python code examples powered by the Java platform can easily be transformed to other programming languages, such as Java, Groovy, Ruby and BeanShell. This book equips the reader with a computational platform which, unlike other statistical programs, is not limited by a single programming language. The author focuses on practical programming aspects and covers a broad range of topics, from basic introduction to the Python language on the Java platform (Jython), to descriptive statistics, symbolic calculations, neural networks, non-linear regression analysis and many other data-mining topics. He discusses how to find regularities in real-world data, how to classify data, and how to process data for knowledge discoveries. The code snippets are so short that they easily fit into single pages. Numeric Computation and Statistical Data Analysis ...
Research Purpose: establish a health management ontology for the analysis of health statistics data. Proposed Methods: This paper established a health management ontology based on an analysis of the concepts in the China Health Statistics Yearbook, and used Protégé to define the syntactic and semantic structure of health statistical data. Six classes of top-level ontology concepts and their subclasses were extracted, and object properties and data properties were defined to construct these classes. By ontology instantiation, we can integrate multi-source heterogeneous data and enable administrators to gain an overall understanding and analysis of the health statistics data. Ontology technology provides a comprehensive and unified information integration structure for the health management domain and lays a foundation for the efficient analysis of multi-source, heterogeneous health system management data and for improved management efficiency.
2008-01-01
Although many books currently available describe statistical models and methods for analyzing longitudinal data, they do not highlight connections between various research threads in the statistical literature. Responding to this void, Longitudinal Data Analysis provides a clear, comprehensive, and unified overview of state-of-the-art theory and applications. It also focuses on the assorted challenges that arise in analyzing longitudinal data. After discussing historical aspects, leading researchers explore four broad themes: parametric modeling, nonparametric and semiparametric methods, joint
Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense, and such groupings will not be useful as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, and then interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the
Complex Data Modeling and Computationally Intensive Statistical Methods
Recent years have seen the advent and development of many devices able to record and store an ever-increasing amount of complex and high-dimensional data: 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real-time financial data, and system control datasets. The analysis of these data poses challenging new problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast-growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici
Using Data Mining to Teach Applied Statistics and Correlation
This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…
Insights in Experimental Data : Interactive Statistics with the ILLMO Program
Empirical researchers turn to statistics to assist them in drawing conclusions, also called inferences, from their collected data. Often, this data is experimental data, i.e., it consists of (repeated) measurements collected in one or more distinct conditions. The observed data can hence be
Explorations in Statistics: The Analysis of Ratios and Normalized Data
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of "Explorations in Statistics" explores the analysis of ratios and normalized--or standardized--data. As researchers, we compute a ratio--a numerator divided by a denominator--to compute a…
STATCAT, Statistical Analysis of Parametric and Non-Parametric Data
1 - Description of program or function: A suite of 26 programs designed to facilitate the appropriate statistical analysis and data handling of parametric and non-parametric data, using classical and modern univariate and multivariate methods. 2 - Method of solution: Data are read entry by entry, using a choice of input formats, and the resulting data bank is checked for out-of-range, rare, extreme or missing data. The completed STATCAT data bank can be treated by a variety of descriptive and inferential statistical methods, and modified, using other standard programs as required.
Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) Status Data
Office of Personnel Management — The Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) is a statistically cleansed sub-set of the data contained in the EHRI data warehouse. It...
Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) Dynamics Data
Office of Personnel Management — The Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) is a statistically cleansed sub-set of the data contained in the EHRI data warehouse. It...
2011-12-01
High-resolution total-field magnetic data can be collected rapidly and relatively cheaply over large archaeological sites due to recent advances in data collection. However, interpretation of these datasets still generally comprises a sequence of data correction and filtering operations prior to a 2D visual interpretation based on pattern recognition. In contrast, current developments in aero-magnetic interpretation have led to several tools for identifying location, shape and depth information of anomalous sources. These methods often fail when directly applied to archaeo-magnetic data, due to the particular noise content typical of very near-surface surveys. Here, techniques are explored that allow these aero-magnetic interpretation tools to be applied to archaeological problems, without the need for extensive, often biased user input. It is shown that full 3D quantitative interpretation of the subsurface is possible from just the magnetic data alone. Inversion of magnetic data is increasingly being applied to aero-magnetic surveys to produce 3D models of the subsurface magnetisation. Typically, an objective function is minimised in order to create a smooth distribution of magnetisation away from a reference model (or halfspace if no a-priori information is available). Often, although a good fit to the observed values may be obtained, the final model will be non-unique and biased by the reference model. Testing of synthetic data shows that when archaeo-magnetic datasets are inverted without applying a-priori information, large discrepancies between the true and modelled depths can occur. Where no a-priori information is available, information regarding the horizontal location of sources can be obtained from derivative-based methods such as the absolute horizontal gradient, tilt-angle and theta-map. Using pseudogravity data with these techniques overcomes the problem of noise amplification that has previously hampered archaeological uses of these techniques. Depth
Introduction. There is an urgent need for a method of analysing FECRT data that is computationally simple and statistically robust. A method for evaluating the statistical power of a proposed FECRT study would also greatly enhance the current guidelines. Methods. A novel statistical framework has been developed that evaluates observed FECRT data against two null hypotheses: (1) the observed efficacy is consistent with the expected efficacy, and (2) the observed efficacy is inferior to the expected efficacy. The method requires only four simple summary statistics of the observed data. Power […] that the notional type 1 error rate of the new statistical test is accurate. Power calculations demonstrate a power of only 65% with a sample size of 20 treatment and control animals, which increases to 69% with 40 control animals or 79% with 40 treatment animals. Discussion. The method proposed is simple...
Data-driven inference for the spatial scan statistic
Background: Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not made evenly for all possible cluster sizes. Results: A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, because it bears on the correctness of the decision based on this inference. Conclusions: A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
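The modified inference question can be sketched as a Monte Carlo p-value that compares the observed statistic only against null replicates whose most likely cluster has the same size k. The sketch assumes the Poisson version of Kulldorff's log-likelihood ratio and a precomputed list of (statistic, cluster size) pairs from simulations under the null; that input format and all function names are hypothetical, and this is our reading of the abstract rather than the authors' exact procedure.

```python
import math
from typing import Iterable, Sequence, Tuple


def poisson_llr(c: float, expected: float, total: float) -> float:
    """Kulldorff's Poisson log-likelihood ratio for one candidate zone.

    c: observed cases inside the zone; expected: cases expected inside under H0;
    total: total cases on the map. Only excesses (c > expected) score above zero.
    """
    if c <= expected:
        return 0.0
    inside = c * math.log(c / expected)
    outside_cases = total - c
    outside = (outside_cases * math.log(outside_cases / (total - expected))
               if outside_cases > 0 else 0.0)
    return inside + outside


def mc_p_value(observed: float, null_stats: Sequence[float]) -> float:
    """Usual Monte Carlo p-value: (1 + #{null >= observed}) / (1 + #replicates)."""
    exceed = sum(1 for v in null_stats if v >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


def size_conditional_p_value(observed: float, observed_size: int,
                             null_results: Iterable[Tuple[float, int]]) -> float:
    """Compare only against null replicates whose most likely cluster has the same size."""
    same_size = [llr for llr, size in null_results if size == observed_size]
    if not same_size:
        raise ValueError("no null replicate produced a most likely cluster of this size")
    return mc_p_value(observed, same_size)
```

With null results [(5.0, 1), (2.0, 1), (9.0, 3), (8.5, 3), (1.0, 2)] and an observed statistic of 6.0, the usual p-value is 0.5, while conditioning on size 3 gives 1.0 and conditioning on size 1 gives 1/3, illustrating how the decision can shift near the significance level.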
2015-01-01
Because of the increasing importance of heavy and unconventional crude oil as an energy source, there is a growing need for petroleomics: the pursuit of more complete and detailed knowledge of the chemical compositions of crude oil. Crude oil has an extremely complex nature; hence, techniques with ultra-high resolving capabilities, such as Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), are necessary. FT-ICR MS has been successfully applied to the study of heavy and unconventional crude oils such as bitumen and shale oil. However, the analysis of crude oil with FT-ICR MS is not trivial, and it has pushed analysis to the limits of instrumental and methodological capabilities. For example, high-resolution mass spectra of crude oils may contain over 100,000 peaks that require interpretation. To visualize large data sets more effectively, data processing methods such as Kendrick mass defect analysis and statistical analyses have been developed. The successful application of FT-ICR MS to the study of crude oil has been critically dependent on key developments in FT-ICR MS instrumentation and data processing methods. This review offers an introduction to the basic principles, FT-ICR MS instrumentation development, ionization techniques, and data interpretation methods for petroleomics and is intended for readers having no prior experience in this field of study. © 2014 Wiley Periodicals, Inc.
Maximum entropy prior uncertainty and correlation of statistical economic data
Empirical estimates of source statistical economic data such as trade flows, greenhouse gas emissions or employment figures are always subject to uncertainty (stemming from measurement errors or confidentiality) but information concerning that uncertainty is often missing. This paper uses concepts
Simpson's Paradox in the Interpretation of "Leaky Pipeline" Data
The traditional "leaky pipeline" plots are widely used to inform gender equality policy and practice. Herein, we demonstrate how a statistical phenomenon known as Simpson's paradox can obscure trends in gender "leaky pipeline" plots. Our approach has been to use Excel spreadsheets to generate hypothetical "leaky…
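The abstract describes generating hypothetical "leaky pipeline" data in Excel to show Simpson's paradox. The same effect can be sketched in a few lines of Python with made-up head counts (the fields and numbers below are invented for illustration, not taken from the paper): women are retained at a higher rate than men within each field, yet at a lower rate overall, because most women are in the field with low retention.

```python
from fractions import Fraction

# Hypothetical (retained, starting) head counts for two fields; not real data.
COUNTS = {
    "field A": {"women": (18, 20), "men": (80, 100)},
    "field B": {"women": (30, 100), "men": (4, 20)},
}


def subgroup_rates(counts):
    """Retention rate of each group within each field."""
    return {field: {g: Fraction(r, n) for g, (r, n) in groups.items()}
            for field, groups in counts.items()}


def pooled_rates(counts):
    """Retention rate of each group after pooling all fields together."""
    totals = {}
    for groups in counts.values():
        for g, (r, n) in groups.items():
            kept, started = totals.get(g, (0, 0))
            totals[g] = (kept + r, started + n)
    return {g: Fraction(r, n) for g, (r, n) in totals.items()}


if __name__ == "__main__":
    for field, rates in subgroup_rates(COUNTS).items():
        print(field, {g: float(v) for g, v in rates.items()})
    print("pooled", {g: float(v) for g, v in pooled_rates(COUNTS).items()})
```

Within field A the rates are 0.9 (women) versus 0.8 (men), within field B 0.3 versus 0.2, but pooled they are 0.4 versus 0.7.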
2006-01-01
The paper discusses gamma-ray spectrum interpretation technology for nuclear logging. The principles of familiar quantitative interpretation methods, including the average content method and the traditional spectrum stripping method, are introduced, and their limitations in determining the contents of radioactive elements in unsaturated ledges (where radioactive elements are distributed unevenly) are presented. Building on intensity gamma-logging quantitative interpretation with the deconvolution method, a new quantitative interpretation method that separates radioactive elements is presented for interpreting gamma spectrum logs. This point-by-point spectrum stripping deconvolution technique gives the logging data a quantitative interpretation. (authors)
2007-01-01
We present a statistical analysis of time-resolved spontaneous emission decay curves from ensembles of emitters, such as semiconductor quantum dots, with the aim of interpreting ubiquitous non-single-exponential decay. Contrary to what is widely assumed, the density of excited emitters and the intensity in an emission decay curve are not proportional, but the density is a time integral of the intensity. The integral relation is crucial to correctly interpret non-single-exponential decay. We derive the proper normalization for both a discrete and a continuous distribution of rates, where every decay component is multiplied by its radiative decay rate. A central result of our paper is the derivation of the emission decay curve when both radiative and nonradiative decays are independently distributed. In this case, the well-known emission quantum efficiency can no longer be expressed...
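The abstract's two claims, that each decay component is weighted by its radiative rate and that the excited-state population is a time integral of the intensity, can be made concrete with a short derivation. It assumes discrete classes i of emitters, each with population n_i(t) decaying single-exponentially, and perfect photon collection; these simplifications are ours, and the paper's treatment of continuous rate distributions is not reproduced.

```latex
\begin{aligned}
n_i(t) &= n_i(0)\,e^{-\gamma_i t}, \qquad \gamma_i = \gamma_{\mathrm{rad},i} + \gamma_{\mathrm{nrad},i},\\
I(t) &= \sum_i \gamma_{\mathrm{rad},i}\, n_i(t) = \sum_i \gamma_{\mathrm{rad},i}\, n_i(0)\, e^{-\gamma_i t},\\
\text{if } \gamma_{\mathrm{nrad},i} = 0 \text{ for all } i:\quad
I(t) &= -\frac{d}{dt}\sum_i n_i(t)
\quad\Longrightarrow\quad
n(t) \equiv \sum_i n_i(t) = \int_t^{\infty} I(t')\,dt'.
\end{aligned}
```

Even in the purely radiative case, I(t) is proportional to n(t) only when all γ_i are equal, that is, for single-exponential decay.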
Improved custom statistics visualization for CA Performance Center data
The main goal of my project is to understand and experiment with the possibilities that CA Performance Center (CA PC) offers for creating custom applications that display stored information through interesting visual means, such as maps. In particular, I have rewritten some of the network statistics web pages to fetch data from new statistics modules in CA PC, which has its own API, and to stop using the RRD data.
Statistical and Methodological Considerations for the Interpretation of Intranasal Oxytocin Studies.
Over the last decade, oxytocin (OT) has received focus in numerous studies associating intranasal administration of this peptide with various aspects of human social behavior. These studies in humans are inspired by animal research, especially in rodents, showing that central manipulations of the OT system affect behavioral phenotypes related to social cognition, including parental behavior, social bonding, and individual recognition. Taken together, these studies in humans appear to provide compelling, but sometimes bewildering, evidence for the role of OT in influencing a vast array of complex social cognitive processes in humans. In this article, we investigate to what extent the human intranasal OT literature lends support to the hypothesis that intranasal OT consistently influences a wide spectrum of social behavior in humans. We do this by considering statistical features of studies within this field, including factors like statistical power, prestudy odds, and bias. Our conclusion is that intranasal OT studies are generally underpowered and that there is a high probability that most of the published intranasal OT findings do not represent true effects. Thus, the remarkable reports that intranasal OT influences a large number of human social behaviors should be viewed with healthy skepticism, and we make recommendations to improve the reliability of human OT studies in the future. Copyright © 2016 Society of Biological Psychiatry. Published by Elsevier Inc. All rights reserved.
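The abstract weighs statistical power, pre-study odds and bias. These are the inputs of the positive predictive value (PPV) expression from Ioannidis (2005), which gives the probability that a statistically significant finding reflects a true effect. The sketch below assumes that framework; the paper's own calculations may differ, and the example values are illustrative rather than taken from the paper.

```python
def positive_predictive_value(power: float, prior_odds: float,
                              alpha: float = 0.05, bias: float = 0.0) -> float:
    """Probability that a claimed (statistically significant) finding is true.

    Uses the PPV expression of Ioannidis (2005):
        PPV = ((1 - beta) R + u beta R) / (R + alpha - beta R + u - u alpha + u beta R)
    where beta = 1 - power, R = pre-study odds that a probed effect is real, and
    u = proportion of analyses that would not have been findings but are
    reported as such because of bias.
    """
    if not (0.0 < power <= 1.0 and prior_odds > 0.0
            and 0.0 < alpha < 1.0 and 0.0 <= bias <= 1.0):
        raise ValueError("parameter out of range")
    beta = 1.0 - power
    r, u = prior_odds, bias
    numerator = (1.0 - beta) * r + u * beta * r
    denominator = r + alpha - beta * r + u - u * alpha + u * beta * r
    return numerator / denominator
```

With 80% power and even pre-study odds, PPV is about 0.94; with 20% power and odds of 1:4 it drops to 0.5, and adding a modest bias (u = 0.2) pushes it below 0.3.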
Statistical methods of combining information: Applications to sensor data fusion
This paper reviews some statistical approaches to combining information from multiple sources. Promising new approaches will be described, and potential applications to combining not-so-different data sources such as sensor data will be discussed. Experiences with one real data set are described.
LSD Dimensions: Use and Reuse of Linked Statistical Data
RDF Data Cube (QB) has boosted the publication of Linked Statistical Data (LSD) on the Web, making them linkable to other related datasets and concepts following the Linked Data paradigm. In this demo we present LSD Dimensions, a web based application that monitors the usage of dimensions and codes
Big Data as a Source for Official Statistics
2015-06-01
More and more data are being produced by an increasing number of electronic devices physically surrounding us and on the internet. The large amount of data and the high frequency at which they are produced have resulted in the introduction of the term ‘Big Data’. Because these data reflect many different aspects of our daily lives and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. This article explores both the opportunities and the challenges for official statistics associated with the application of Big Data. Experiences gained with analyses of large amounts of Dutch traffic loop detection records and Dutch social media messages are described to illustrate the topics characteristic of the statistical analysis and use of Big Data.
Testing the statistical compatibility of independent data sets
We discuss a goodness-of-fit method which tests the compatibility between statistically independent data sets. The method gives sensible results even in cases where the χ² minima of the individual data sets are very low or when several parameters are fitted to a large number of data points. In particular, it avoids the problem that a possible disagreement between data sets becomes diluted by data points which are insensitive to the crucial parameters. A formal derivation of the probability distribution function for the proposed test statistic is given, based on standard theorems of statistics. The application of the method is illustrated on data from neutrino oscillation experiments, and its complementarity to the standard goodness-of-fit test is discussed.
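The abstract does not spell out the statistic. As we read this work, it compares the minimum χ² of the combined fit with the sum of the minima of the separate fits, χ²_PG = χ²_min(all) − Σ_r χ²_min(r), with Σ_r P_r − P_combined degrees of freedom. The sketch assumes a deliberately tiny model in which every data set measures one shared parameter with Gaussian errors, so the statistic has len(datasets) − 1 degrees of freedom; the p-value helper covers only the one-degree-of-freedom case. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One measurement of a shared scalar parameter: (value, standard deviation).
Point = Tuple[float, float]


def chi2_min(points: Sequence[Point]) -> Tuple[float, float]:
    """Best-fit common value (inverse-variance weighted mean) and its minimum chi^2."""
    weights = [1.0 / s ** 2 for _, s in points]
    theta = sum(w * y for w, (y, _) in zip(weights, points)) / sum(weights)
    return theta, sum(((y - theta) / s) ** 2 for y, s in points)


def parameter_goodness_of_fit(datasets: List[Sequence[Point]]) -> Tuple[float, int]:
    """chi2_PG = chi2_min(all data) - sum over r of chi2_min(data set r).

    In this toy model each data set constrains the same single parameter, so the
    degrees of freedom are sum_r P_r - P_combined = len(datasets) - 1.
    """
    separate = sum(chi2_min(d)[1] for d in datasets)
    _, combined = chi2_min([p for d in datasets for p in d])
    return combined - separate, len(datasets) - 1


def chi2_sf_one_dof(x: float) -> float:
    """Survival function of a chi^2 variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))
```

For two data sets of four unit-error points at 1.0 and at 3.0, the combined fit gives χ²_min = 8 over 7 degrees of freedom, which looks acceptable (p roughly one in three), whereas χ²_PG = 8 on one degree of freedom gives p ≈ 0.005 and exposes the tension.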
Statistical summaries of selected Iowa streamflow data through September 2013
Statistical summaries of streamflow data collected at 184 streamgages in Iowa are presented in this report. All streamgages included for analysis have at least 10 years of continuous record collected before or through September 2013. This report is an update to two previously published reports that presented statistical summaries of selected Iowa streamflow data through September 1988 and September 1996. The statistical summaries include (1) monthly and annual flow durations, (2) annual exceedance probabilities of instantaneous peak discharges (flood frequencies), (3) annual exceedance probabilities of high discharges, and (4) annual nonexceedance probabilities of low discharges and seasonal low discharges. Also presented for each streamgage are graphs of the annual mean discharges, mean annual mean discharges, 50-percent annual flow-duration discharges (median flows), harmonic mean flows, mean daily mean discharges, and flow-duration curves. Two sets of statistical summaries are presented for each streamgage, which include (1) long-term statistics for the entire period of streamflow record and (2) recent-term statistics for or during the 30-year period of record from 1984 to 2013. The recent-term statistics are only calculated for streamgages with streamflow records pre-dating the 1984 water year and with at least 10 years of record during 1984–2013. The streamflow statistics in this report are not adjusted for the effects of water use; although some of this water is used consumptively, most of it is returned to the streams.
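Two of the simpler summaries listed above, the exceedance probability behind a flow-duration curve and the harmonic mean flow, can be sketched as follows. The Weibull plotting position m/(n + 1) is one common convention that we assume here; the report's actual procedures, particularly for flood frequencies, are more involved and are not reproduced. Function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def exceedance_table(flows: Sequence[float]) -> List[Tuple[float, float]]:
    """Rank flows largest to smallest and attach Weibull exceedance probabilities m / (n + 1)."""
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(q, m / (n + 1)) for m, q in enumerate(ranked, start=1)]


def summary(flows: Sequence[float]) -> dict:
    """Median (50-percent flow-duration) flow and harmonic mean flow."""
    if any(q <= 0 for q in flows):
        # Zero flows need special handling for the harmonic mean,
        # which this sketch does not attempt.
        raise ValueError("harmonic mean requires strictly positive flows")
    return {"median": statistics.median(flows),
            "harmonic_mean": statistics.harmonic_mean(flows)}
```

For daily flows of 10, 40, 20 and 30, the largest flow is assigned an exceedance probability of 0.2, the median flow is 25 and the harmonic mean flow is 19.2.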
Statistical interpretation of transient current power-law decay in colloidal quantum dot arrays
The forensic fingerprint community has faced increasing amounts of criticism by scientific and legal commentators, challenging the validity and reliability of fingerprint evidence due to the lack of an empirically demonstrable basis to evaluate and report the strength of the evidence in a given case. This paper presents a method, developed as a stand-alone software application, FRStat, which provides a statistical assessment of the strength of fingerprint evidence. The performance was evaluated using a variety of mated and non-mated datasets. The results show strong performance characteristics, often with values supporting specificity rates greater than 99%. This method provides fingerprint experts the capability to demonstrate the validity and reliability of fingerprint evidence in a given case and report the findings in a more transparent and standardized fashion with clearly defined criteria for conclusions and known error rate information thereby responding to concerns raised by the scientific and legal communities. Published by Elsevier B.V.
A Comprehensive Statistically-Based Method to Interpret Real-Time Flowing Measurements
2007-01-15
With the recent development of temperature measurement systems, continuous temperature profiles can be obtained with high precision. Small temperature changes can be detected by modern temperature measuring instruments such as fiber optic distributed temperature sensors (DTS) in intelligent completions and will potentially aid the diagnosis of downhole flow conditions. In vertical wells, since elevational geothermal changes make the wellbore temperature sensitive to the amount and the type of fluids produced, temperature logs can be used successfully to diagnose downhole flow conditions. However, because geothermal temperature changes along the wellbore are small in horizontal wells, interpreting a temperature log becomes difficult. The primary temperature differences for each phase (oil, water, and gas) are caused by frictional effects. Therefore, in developing a thermal model for a horizontal wellbore, subtle temperature changes must be accounted for. In this project, we have rigorously derived governing equations for a producing horizontal wellbore and developed a model that predicts temperature and pressure by coupling the wellbore and reservoir equations. We also applied Ramey's model (1962) to the build section and used an energy balance to infer the temperature profile at the junction. The multilateral wellbore temperature model was applied to a wide range of cases with varying fluid thermal properties, absolute values of temperature and pressure, geothermal gradients, flow rates from each lateral, and trajectories of each build section. With the prediction models developed, we present inversion studies of synthetic and field examples. These results are essential to identify water or gas entry, to guide flow control devices in intelligent completions, and to decide whether reservoir stimulation is needed in particular horizontal sections. This study will complete and validate these inversion studies.
This paper addresses the problems and promises of microindentation testing of thin solid films. It discusses the basic philosophy of penetration hardness testing and the peculiarities of low-load, shallow-penetration tests of uncoated metals, and it compares coated with uncoated behavior so that some of the unique responses of coatings can be distinguished from typical hardness-versus-load behavior. As the uses of thin solid coatings of technological interest continue to proliferate, microindentation testing methodology will increasingly be challenged to provide useful tools for their characterization. The understanding of microindentation response must go hand in hand with machine design so that the capability of measurement precision does not outstrip our ability to interpret test results in a meaningful way.
Using demographic data to better interpret pitfall trap catches
2011-05-01
The results of pitfall trapping are often interpreted as abundance in a particular habitat. At the same time, there are numerous cases of almost unrealistically high catches of ground beetles in seemingly unsuitable sites. The correlation of catches by pitfall trapping with the true distribution and abundance of Carabidae needs corroboration. During a full-year survey in 2006/07 in the Lake Elton region (Volgograd Area, Russia), 175 species of ground beetles were trapped. Considering the differences in demographic structure of the local populations, and not their abundances, three groups of species were recognized: residents, migrants and sporadic. In residents, the demographic structure of local populations is complete, and their habitats can be considered “residential”. In migrants and sporadic species, the demographic structure of the local populations is incomplete, and their habitats can be considered “transit”. Residents interact both with their prey and with each other in a particular habitat. Sporadic species are hardly important to a carabid community because of their low abundances. The contribution of migrants to the structure of carabid communities is not apparent and requires additional research. Migrants and sporadic species represent a “labile” component in ground beetle communities, as opposed to a “stable” component, represented by residents. The variability of the labile component substantially limits our interpretation of species diversity in carabid communities. Thus, the criteria for determining the most abundant or dominant species inevitably vary because the abundance of migrants in some cases can be one order of magnitude higher than that of residents. The results of pitfall trapping adequately reflect the state of carabid communities only in zonal habitats, while azonal and disturbed habitats are merely transit ones for many species of ground beetles. A study of the demographic structure of local populations and assessment of the
Using as much administrative data as possible is a general trend among most national statistical institutes. Different kinds of administrative sources, from tax authorities or other administrative bodies, are very helpful material in the production of business statistics. However, these sources often have to be supplemented by information collected through statistical surveys. This article describes how Insee has implemented such a strategy in order to produce French structural business statistics. The originality of the French procedure is that administrative and survey variables are used jointly for the same enterprises, unlike the majority of multisource systems, in which the two kinds of sources generally complement each other for different categories of units. The idea is to use, as much as possible, the richness of the administrative sources combined with the timeliness of a survey, even if the latter is conducted only on a sample of enterprises. One main issue is the classification of enterprises within the NACE nomenclature, which is a cornerstone variable in producing the breakdown of the results by industry. At a given date, two values of the corresponding code may coexist: the value in the register, not necessarily up to date, and the value resulting from the data collected via the survey, available only for a sample of enterprises. Using all this information together requires specific statistical estimators that combine some properties of difference estimators with calibration techniques. This article presents these estimators and their statistical properties, and compares them with those of other methods.
1981-01-01
Epidemiologic studies to evaluate the occupational risks associated with employment in the nuclear industry are currently being conducted by the Department of Energy. Data that have potential value in evaluating any long-term health effects of occupational exposure to low levels of radiation are obtained for each individual at a given facility. We propose a general data structure for statistical analysis that is used to define transformations from the data management system into the data analysis system. Statistical methods of interest in epidemiologic studies include contingency table analysis and survival analysis procedures that can be used to evaluate potential associations between occupational radiation exposure and mortality. The purposes of this paper are to discuss (1) the adequacy of this data structure for single- and multiple-facility analysis and (2) the statistical computing problems encountered in dealing with large populations over extended periods of time.
Estimation of global network statistics from incomplete data.
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We develop scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate our analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying the proportion of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging their limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis by detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
Statistical Data Processing with R – Metadata Driven Approach
2016-06-01
In recent years the Statistical Office of the Republic of Slovenia has put a lot of effort into redesigning its statistical process. We replaced the classical stove-pipe oriented production system with general software solutions based on the metadata-driven approach. This means that one general program code, parametrized with process metadata, is used for data processing for a particular survey. Currently, the general program code is entirely based on SAS macros, but in the future we would like to explore how successfully the statistical software R can be used for this approach. The paper describes the metadata-driven principle for data validation, the generic software solution, and the main issues connected with the use of the statistical software R for this approach.
Heuristics of the algorithm: Big Data, user interpretation and institutional translation
2015-10-01
Intelligence on mass media audiences was founded on representative statistical samples, analysed by statisticians at the market departments of media corporations. The techniques for aggregating user data in the age of pervasive and ubiquitous personal media (e.g. laptops, smartphones, credit cards/swipe cards and radio-frequency identification) build on large aggregates of information (Big Data) analysed by algorithms that transform data into commodities. While the former technologies were built on socio-economic variables such as age, gender, ethnicity, education and media preferences (i.e. categories recognisable to media users and industry representatives alike), Big Data technologies register consumer choice, geographical position, web movement and behavioural information in technologically complex ways whose full consequences are too abstract for most lay people to appreciate. The data mined for pattern recognition privilege relational rather than demographic qualities. We argue that the agency of interpretation underlying market decisions within media companies nevertheless introduces a ‘heuristics of the algorithm’, in which the data inevitably become translated into social categories. In the paper we argue that although the promise of algorithmically generated data is often implemented in automated systems where human agency becomes increasingly distanced from the data collected (it is our technological gadgets that are being surveyed, rather than us as social beings), one can observe a felt need among media users and among industry actors to ‘translate back’ the algorithmically produced relational statistics into ‘traditional’ social parameters. The tenacious social structures within the advertising industries work against the techno-economically driven tendencies within the Big Data economy.
Statistical distributions as applied to environmental surveillance data
Application of normal, log-normal, and Weibull distributions to environmental surveillance data was investigated for approximately 300 nuclide-medium-year-location combinations. Corresponding W test calculations were made to determine the probability of a particular data set falling within the distribution of interest. Conclusions are drawn as to the fit of each data group to the various distributions. The significance of fitting statistical distributions to the data is discussed.
The application of Bayesian statistics in data fitting
International Nuclear Information System (INIS)
The rationale and drawbacks of least-squares fitting, which is commonly used in data processing, are analyzed, and the theory and common methods for applying Bayesian statistics to data processing are presented in detail. The analysis shows that the Bayesian approach avoids the restrictive assumptions that least-squares fitting imposes in data processing, and that its results are more scientific and more easily understood, so it may replace least-squares fitting in data processing. (authors)
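A minimal sketch of the contrast drawn above, assuming a straight line through the origin, Gaussian noise with known standard deviation and a Gaussian prior on the slope. All three are assumptions for illustration; the abstract does not state the model it uses. With this conjugate prior the posterior has a closed form, and as the prior becomes very wide its mean reduces to the least-squares estimate.

```python
def least_squares_slope(xs, ys):
    """Least-squares slope for the model y = a * x (line through the origin)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


def bayesian_slope(xs, ys, sigma, prior_mean, prior_sd):
    """Posterior mean and standard deviation of a in y = a * x + noise.

    Assumes Gaussian noise with known standard deviation `sigma` and a
    Gaussian prior a ~ N(prior_mean, prior_sd**2); the posterior is then
    Gaussian (conjugate update), so it has a closed form.
    """
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Posterior precision = prior precision + precision contributed by the data.
    precision = 1.0 / prior_sd ** 2 + sxx / sigma ** 2
    mean = (prior_mean / prior_sd ** 2 + sxy / sigma ** 2) / precision
    return mean, precision ** -0.5


# Hypothetical measurements, for illustration only.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
print("least squares:", least_squares_slope(xs, ys))
print("posterior (mean, sd):", bayesian_slope(xs, ys, sigma=0.2, prior_mean=2.0, prior_sd=1.0))
```

A very tight prior pulls the estimate toward `prior_mean`, while a very wide one recovers least squares; the posterior standard deviation comes with the estimate at no extra cost.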
Statistics and data analysis for financial engineering with R examples
The new edition of this influential textbook, geared towards graduate or advanced undergraduate students, teaches the statistics necessary for financial engineering. In doing so, it illustrates concepts using financial markets and economic data, R Labs with real-data exercises, and graphical and analytic methods for modeling and diagnosing modeling errors. Financial engineers now have access to enormous quantities of data. To make use of these data, the powerful methods in this book, particularly about volatility and risks, are essential. Strengths of this fully-revised edition include major additions to the R code and the advanced topics covered. Individual chapters cover, among other topics, multivariate distributions, copulas, Bayesian computations, risk management, multivariate volatility and cointegration. Suggested prerequisites are basic knowledge of statistics and probability, matrices and linear algebra, and calculus. There is an appendix on probability, statistics and linear algebra. Practicing fina...
Variation in benthic long-term data of transitional waters: Is interpretation more than speculation?
Biological long-term data series in marine habitats are often used to identify anthropogenic impacts on the environment or climate-induced regime shifts. However, particularly in transitional waters, environmental properties such as water mass dynamics, salinity variability and the occurrence of oxygen minima, which are not necessarily caused by either human activities or climate change, can attenuate or mask apparent signals. At first glance it very often seems impossible to interpret the strong fluctuations of, e.g., abundances or species richness, since abiotic variables such as salinity and oxygen content vary simultaneously and in apparently erratic ways. The long-term development of major macrozoobenthic parameters (abundance, biomass, species numbers) and derived macrozoobenthic indices (Shannon diversity, Margalef, Pielou's evenness and Hurlbert) has been successfully interpreted and related to long-term fluctuations in salinity and oxygen, incorporating the North Atlantic Oscillation index (NAO index), based on statistical analysis of modelled and measured data from 35 years of observation at three stations in the south-western Baltic Sea. Our results suggest that even at a restricted spatial scale the benthic system does not appear to be tightly controlled by any single environmental driver, and they highlight the complexity of spatially varying temporal responses.
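The indices named above have standard textbook definitions: Shannon H' = -Σ p_i ln p_i, Pielou's J' = H'/ln S, Margalef's d = (S - 1)/ln N, and Hurlbert's expected number of species in a random subsample of n individuals, ES(n). The sketch below computes them for one sample of species counts. The natural logarithm and the rarefaction size `n_rarefy` are assumptions, since the abstract does not give the values used in the study.

```python
import math


def diversity_indices(counts, n_rarefy):
    """Shannon H', Pielou's J', Margalef's d and Hurlbert's ES(n) for one sample.

    `counts` holds the number of individuals of each species (zeros are ignored).
    """
    counts = [c for c in counts if c > 0]
    total = sum(counts)          # N, total number of individuals
    s = len(counts)              # S, species richness
    # Shannon diversity with natural logarithms: H' = -sum p_i ln p_i
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    # Pielou's evenness: H' relative to its maximum possible value ln S
    pielou = shannon / math.log(s) if s > 1 else float("nan")
    # Margalef richness: (S - 1) / ln N
    margalef = (s - 1) / math.log(total)
    # Hurlbert's rarefaction: sum over species of P(species present in a subsample of n)
    hurlbert = sum(
        1 - math.comb(total - c, n_rarefy) / math.comb(total, n_rarefy)
        for c in counts
    )
    return {"H": shannon, "J": pielou, "d": margalef, "ES": hurlbert}


# Hypothetical counts for one station and year, not data from the study.
print(diversity_indices([120, 45, 30, 5, 1], n_rarefy=50))
```

Rarefaction makes ES(n) comparable across samples of different total size, which matters when abundances fluctuate as strongly as described above.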
Data bases and environmental research
Concern for the environment has grown rapidly in the last decade(s). Many potential threats to our surroundings have been recognized. The precise effects of these threats are often not known, and much research is needed. Considerable effort is put into the
Qualitative Data Collection and Interpretation: A Turkish Social Studies Lesson
2016-05-01
The classroom, with its teaching-learning dynamics, creates a kind of “embryonic society” in which the micro-policies of collective social knowledge construction and meaning can be reconstructed; it can therefore be considered a kind of “mirror” of political culture. Thus, comparative lesson research, which requires in-depth classroom observation, has been attracting much attention in the educational community. However, few studies in this research tradition have addressed social studies, and civics in particular. This research tradition is naturally based on the qualitative research paradigm, which has likewise been attracting increasing attention in the educational community. The first purpose of this article is therefore to explain the entire documentation and pre-interpretation process for this lesson, so that it can serve as an example for qualitative researchers. The second purpose is to provide an example of a political education lesson from Turkey, so that educators worldwide can compare this example of social studies practice in Turkey with practice in their own countries.
Imputing historical statistics, soils information, and other land-use data to crop area
In foreign crop condition monitoring, satellite-acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map that represent, at 60° latitude, an area nominally 25 by 25 nautical miles in size. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.
Data Warehousing: How To Make Your Statistics Meaningful.
Examines how one school district turned its data collection from a disparate mountain of statistics into more useful information by using its Instructional Decision Support System. The system software is explained, as is how the district solved some data management challenges. (GR)
Using Carbon Emissions Data to "Heat Up" Descriptive Statistics
This article illustrates the use of carbon emissions data in an introductory statistics assignment. The carbon emissions data have desirable characteristics, including choice of measure, skewness and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)
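A short sketch of the kind of descriptive exercise the article describes, using only the Python standard library. The emission figures are hypothetical placeholders, not data from the article; they are chosen to be right-skewed with one extreme value so that the mean, median, skewness and an outlier rule visibly disagree. Outliers are flagged with Tukey's fences, using quartiles from `statistics.quantiles` with its default method.

```python
import statistics


def describe(values):
    """Location, spread, skewness and Tukey-fence outliers for one variable."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)            # sample standard deviation
    # Adjusted Fisher-Pearson skewness (G1) from the second and third central moments
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skew = (m3 / m2 ** 1.5) * (n * (n - 1)) ** 0.5 / (n - 2)
    # Tukey's fences: flag points more than 1.5 IQR beyond the quartiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    return {"mean": mean, "median": median, "sd": sd, "skew": skew, "outliers": outliers}


# Hypothetical per-capita emissions (tonnes CO2 per person), not data from the article.
emissions = [1.2, 1.5, 1.9, 2.3, 2.8, 3.1, 4.0, 4.4, 5.2, 16.5]
summary = describe(emissions)
print(summary)
```

The single extreme value pulls the mean well above the median, which is exactly where the choice of summary measure starts to matter.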
Statistical mechanics of learning: A variational approach for real data
Using a variational technique, we generalize the statistical physics approach of learning from random examples to make it applicable to real data. We demonstrate the validity and relevance of our method by computing approximate estimators for generalization errors that are based on training data alone.
The role of metrology in the interpretation of analytical data
Disposition:
Science of measurement
Definition of the measurand
Specification of the measurand
Measurement uncertainty
Result of measurement
  a. Correction for bias
  b. Verification of uncertainty
Type A: Sources of variability – uncertainty components
Type B: All other sources of uncertainty
Treatment of data
  c. Comparison between observed and expected variability
  d. Distributions, i.e. Poisson, Gaussian, log-normal
Examples from the literature
  e. JRNC 183
  f. Hair data from 151
  g. Irina data from 187...
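One common way to carry out the comparison between observed and expected variability listed in the outline is sketched below, assuming that each replicate result comes with an a priori standard uncertainty (for example, a counting-statistics component combined in quadrature with other components). The statistic T is then referred to a chi-square distribution with n - 1 degrees of freedom. This is an illustration of the idea, not necessarily the procedure used in the presentation; the replicate values and the 1 % component are hypothetical.

```python
import math


def combined_uncertainty(*components):
    """Combine independent standard-uncertainty components in quadrature."""
    return math.sqrt(sum(u * u for u in components))


def precision_check(values, uncertainties):
    """Compare the observed scatter of replicates with their expected uncertainties.

    Returns (weighted mean, T, degrees of freedom), where
    T = sum(((x_i - mean) / u_i) ** 2). If the stated uncertainties are
    realistic, T approximately follows a chi-square distribution with
    n - 1 degrees of freedom, so T should be of the order of n - 1.
    """
    weights = [1.0 / u ** 2 for u in uncertainties]
    mean = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    t_stat = sum(((x - mean) / u) ** 2 for x, u in zip(values, uncertainties))
    return mean, t_stat, len(values) - 1


# Hypothetical replicate counts: Poisson counting uncertainty sqrt(c)
# combined with an assumed 1 % calibration component.
counts = [1012.0, 985.0, 1040.0]
u = [combined_uncertainty(math.sqrt(c), 0.01 * c) for c in counts]
print(precision_check(counts, u))
```

A T much larger than n - 1 points to unrecognized sources of variability; a T much smaller suggests the stated uncertainties are too generous.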
[Do we always correctly interpret the results of nonparametric statistical tests?]
The Mann-Whitney, Wilcoxon, Kruskal-Wallis and Friedman tests form a group of commonly used tests for analyzing the results of clinical and laboratory data. These tests are considered to be extremely flexible, and their asymptotic relative efficiency exceeds 95 percent. Compared with the corresponding parametric tests, they do not require checking conditions such as normality of the data distribution, homogeneity of variance, or the absence of correlation between means and standard deviations. They can be used with both interval and ordinal scales. Using the Mann-Whitney test as an example, the article shows that choosing one of these four nonparametric tests, treated as a kind of gold standard, does not in every case lead to correct inference.
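For reference, the Mann-Whitney U statistic can be computed from mid-ranks as below; this is a minimal sketch, and the article's own example is not reproduced. U_x / (n_x n_y) estimates P(X > Y) + 0.5 P(X = Y), so a significant result is a statement about that probability, not automatically about a difference in medians, which is one place where interpretation can go wrong.

```python
def midranks(values):
    """Rank values 1..n, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1      # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y); U_x counts pairs with x_i > y_j, ties counting 1/2."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


print(mann_whitney_u([1.2, 3.4, 2.2], [4.1, 3.4, 5.0]))
```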
Data management and statistical analysis for environmental assessment
Data management and statistical analysis for environmental assessment are important issues at the interface of computer science and statistics. Data collection for environmental decision making can generate large quantities of various types of data. A database/GIS system is described that provides efficient data storage as well as visualization tools that may be integrated into the data analysis process. FIMAD is a living database and GIS system. The system has changed and developed over time to meet the needs of the Los Alamos National Laboratory Restoration Program. The system provides a repository for data that may be accessed by different individuals for different purposes. The database structure is driven by the large amount and varied types of data required for environmental assessment. The integration of the database with the GIS system provides the foundation for powerful visualization and analysis capabilities.
Improved Power Quality Monitoring through Phasor Measurement Unit Data Interpretation
and wind power production on the voltage unbalance was analyzed. PMU data and NTP-synchronized data from two different MV networks were used. It has been found that PV production has only a minor negative impact on the voltage unbalance whereas the wind power production has a great positive impact...
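Voltage unbalance is commonly quantified from phasors, which is what a PMU reports, as the voltage unbalance factor VUF = |V2| / |V1|: the ratio of the negative- to the positive-sequence magnitude from the symmetrical-component transform. A minimal sketch, assuming this definition; the excerpt does not say which unbalance index the study used.

```python
import cmath
import math

# Rotation operator a = 1 at 120 degrees, used by the symmetrical-component transform.
A = cmath.exp(2j * math.pi / 3)


def voltage_unbalance_factor(va, vb, vc):
    """Return |V2| / |V1| for three phase-voltage phasors given as complex numbers."""
    v1 = (va + A * vb + A * A * vc) / 3     # positive-sequence component
    v2 = (va + A * A * vb + A * vc) / 3     # negative-sequence component
    return abs(v2) / abs(v1)


# Balanced positive-sequence set (phase b lags a by 120 degrees): VUF close to 0.
balanced = voltage_unbalance_factor(
    cmath.rect(230.0, 0.0),
    cmath.rect(230.0, -2 * math.pi / 3),
    cmath.rect(230.0, 2 * math.pi / 3),
)
print(balanced)
```

Applied to time-synchronized phasor streams, the same function gives a VUF time series that can be set against PV or wind production.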
Effects of galvanic distortions on magnetotelluric data: Interpretation ...
But in the case of field data the problem is ... The distorted data set is corrected using the MT response for DRS model and further ... ments, the apparent resistivity and phase at differ- ... from the telluric field, which is of galvanic or inductive ...
Human Milk Biomonitoring Data: Interpretation and Risk Assessment Issues
Journal Data Sharing Policies and Statistical Reporting Inconsistencies in Psychology
Directory of Open Access Journals (Sweden)
In this paper, we present three retrospective observational studies that investigate the relation between data sharing and statistical reporting inconsistencies. Previous research found that reluctance to share data was related to a higher prevalence of statistical errors, often in the direction of statistical significance (Wicherts, Bakker, & Molenaar, 2011). We therefore hypothesized that journal policies about data sharing, and data sharing itself, would reduce these inconsistencies. In Study 1, we compared the prevalence of reporting inconsistencies in two similar journals on decision making with different data sharing policies. In Study 2, we compared reporting inconsistencies in psychology articles published in PLOS journals (with a data sharing policy) and Frontiers in Psychology (without a stipulated data sharing policy). In Study 3, we looked at papers published in the journal Psychological Science to check whether papers with or without an Open Practice Badge differed in the prevalence of reporting errors. Overall, we found no relationship between data sharing and reporting inconsistencies. We did find that journal policies on data sharing seem extremely effective in promoting data sharing. We argue that open data is essential to improving the quality of psychological science, and we discuss ways to detect and reduce reporting inconsistencies in the literature.
A Review of Statistical Techniques for 2x2 and RxC Categorical Data Tables in SPSS
This study reviews statistical techniques for RxC categorical data tables in detail. Emphasis is placed on the association between techniques and their corresponding data considerations. Suggestions are given on how to handle specific categorical data tables in SPSS, and common mistakes in interpreting SPSS output are shown.
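The core computation behind the Pearson chi-square test of independence for an R x C table is short enough to sketch with the standard library. This illustrates the statistic that SPSS reports in its crosstabulation output; it is not SPSS itself. No continuity correction is applied, and the expected counts are returned so that the usual rule of thumb (expected counts of at least 5) can be checked. The table is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and expected counts for an R x C table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


# Hypothetical 2 x 2 table.
chi2, dof, expected = chi_square_independence([[10, 20], [30, 40]])
print(chi2, dof, expected)
```

The p value then comes from the chi-square distribution with `dof` degrees of freedom; a small statistic like this one indicates no evidence of association.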
Artificial neural systems for interpretation and inversion of seismic data
Calderon-Macias, Carlos
The goal of this work is to investigate the feasibility of using neural network (NN) models for solving geophysical exploration problems. First, a feedforward neural network (FNN) is used to solve inverse problems. The operational characteristics of an FNN are primarily controlled by a set of weights and a nonlinear function that performs a mapping between two sets of data. In a process known as training, the FNN weights are iteratively adjusted to perform the mapping. After training, the computed weights encode important features of the data that enable one pattern to be distinguished from another. Synthetic data computed from an ensemble of earth models, together with the corresponding models, provide the training data. Two training methods are studied: the backpropagation method, which is a gradient scheme, and a global optimization method called very fast simulated annealing (VFSA). A trained network is then used to predict models from new data (e.g., data from a new location) in a one-step procedure. Applying this method to the problems of obtaining formation resistivities and layer thicknesses from resistivity sounding data, and 1D velocity models from seismic data, shows that trained FNNs produce reasonably accurate earth models when observed data are input to the FNNs. In a second application, an FNN is used to automate the NMO correction process for seismic reflection data. The task of the FNN is to map CMP data at control locations along a seismic line into subsurface velocities. The network is trained while the velocity analyses are performed at the control locations. Once trained, the computed weights are used as an operator that acts on the remaining CMP data as a velocity interpolator, resulting in a fast method for NMO correction. The second part of this dissertation describes the application of a Hopfield neural network (HNN) to the problems of deconvolution and multiple attenuation. In these applications, the unknown parameters (reflection coefficients
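The NMO correction that the network supports rests on the standard hyperbolic moveout relation t(x) = sqrt(t0² + x²/v²) for a flat reflector with stacking velocity v. The sketch below evaluates that relation only; it does not implement the FNN velocity interpolator described above.

```python
import math


def nmo_time(t0, offset, velocity):
    """Hyperbolic reflection traveltime t(x) = sqrt(t0^2 + x^2 / v^2)."""
    return math.sqrt(t0 ** 2 + (offset / velocity) ** 2)


def nmo_shift(t0, offset, velocity):
    """Normal-moveout correction: time to remove so the event aligns at t0."""
    return nmo_time(t0, offset, velocity) - t0


# Zero-offset time 1 s, offset 1500 m, stacking velocity 2000 m/s.
print(nmo_time(1.0, 1500.0, 2000.0), nmo_shift(1.0, 1500.0, 2000.0))
```

With these numbers the event arrives at 1.25 s, so 0.25 s of moveout must be removed. An error in v leaves residual moveout across the gather, which is why the velocities supplied at each CMP matter.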
One of the observational study designs most commonly employed in veterinary science is the cross-sectional study with binary outcomes. To measure an association with exposure, either prevalence ratios (PR) or odds ratios (OR) can be used. In human epidemiology, much has been written about using the OR exclusively for case–control studies, and some authors have reported that there is no good justification for fitting logistic regression when the prevalence of the disease is high, in which case the OR overestimates the PR. Moreover, the OR is difficult to interpret, since confusing risk with odds can lead to incorrect quantitative interpretations of data such as “the risk is X times greater,” a statement commonly reported in studies that use the OR. The aims of this study were (1) to review articles with cross-sectional designs to assess the statistical method used and the appropriateness of the interpretation of the estimated measure of association and (2) to illustrate the use of alternative statistical methods that estimate the PR directly. An overview of statistical methods and their interpretation following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines was conducted and included a diverse set of peer-reviewed journals in the veterinary science field, using PubMed as the search engine. For each article, the statistical method used and the appropriateness of the interpretation of the estimated measure of association were recorded. Additionally, four alternatives to logistic regression that directly estimate the PR were tested using our own dataset from a cross-sectional study on bovine viral diarrhea virus. The initial search strategy found 62 articles, of which 6 were excluded, so 56 studies were used for the overall analysis. The review showed that, regardless of the reported prevalence level, 96% of articles employed logistic regression, thus estimating the OR. Results of the multivariate models
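The divergence between the two measures is easy to see from a 2 x 2 table. The counts below are hypothetical and serve only to show that, when the outcome is common, the OR is much larger than the PR, while for a rare outcome the two nearly coincide. The model-based alternatives tested in the study are not reproduced here.

```python
def prevalence_and_odds_ratios(a, b, c, d):
    """Prevalence ratio and odds ratio from a 2 x 2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    prevalence_ratio = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a / b) / (c / d)
    return prevalence_ratio, odds_ratio


common = prevalence_and_odds_ratios(60, 40, 30, 70)   # 60 % vs 30 % prevalence
rare = prevalence_and_odds_ratios(6, 994, 3, 997)     # 0.6 % vs 0.3 % prevalence
print("common outcome (PR, OR):", common)
print("rare outcome (PR, OR):", rare)
```

With a common outcome, reading the OR of 3.5 as "the risk is 3.5 times greater" overstates a prevalence ratio of 2.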
Statistical and Visualization Data Mining Tools for Foundry Production
Recent years have seen the rapid development of a new interdisciplinary field of knowledge called data mining. Its main task is to extract useful information from large amounts of previously collected data. The main possibilities and potential applications of data mining in the manufacturing industry are characterized. The main types of data mining techniques are briefly discussed, including statistical, artificial intelligence, database and visualization tools. The statistical and visualization methods are presented in more detail, showing their general possibilities and advantages as well as characteristic examples of applications in foundry production. Results of the author’s research are presented, aimed at validating selected statistical tools that can be easily and effectively used in the manufacturing industry. A performance analysis was made of ANOVA and contingency-table-based methods dedicated to determining the most significant process parameters and detecting possible interactions among them. Several numerical tests were performed using simulated data sets with assumed hidden relationships, as well as some real data related to the strength of ductile cast iron, collected in a foundry. It is concluded that the statistical methods offer relatively easy and fairly reliable tools for extracting this type of knowledge about foundry manufacturing processes. However, further research is needed to explain some imperfections of the investigated tools and to assess their validity for more complex tasks.
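A one-way ANOVA of the kind evaluated above reduces to comparing between-group and within-group variation. A minimal standard-library sketch of the F statistic follows; the study's actual software and data are not known. In a foundry setting, each group might hold strength measurements at one level of a process parameter, which is a hypothetical arrangement used only for illustration.

```python
def one_way_anova(groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
```

The F statistic is then referred to an F distribution with the returned degrees of freedom; a large value indicates that the parameter level matters relative to the scatter within levels.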
Given a multiplicity distribution belonging to the class of probability distributions which are superpositions of Poisson distributions whose two components are independently (binomially) distributed, we derive joint and conditional probabilities for the two components. Specializing to the negative binomial case, we can explain the linearity and magnitude of slope and intercept of the forward-backward correlation in a way compatible with the KNO plot for the multiplicity data provided that the final particles are produced in clusters. Generalization to allow for coherent emission allows one to put limits on the amount of coherence, a result not known from high precision fits to multiplicity. (orig.)
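As a worked illustration of why binomial splitting produces a forward-backward correlation, consider the standard calculation below; it is a sketch, not the paper's derivation. If each of n particles, whose total multiplicity has mean N and variance D², is independently assigned to the forward or backward hemisphere with probability 1/2, then Var(n_F) = (D² + N)/4 and Cov(n_F, n_B) = (D² - N)/4, so the regression slope is b = (D² - N)/(D² + N). For a negative binomial, D² = N + N²/k, which gives b = N/(N + 2k); for a Poisson distribution b = 0.

```python
def forward_backward_slope(mean, variance):
    """Slope b = Cov(n_F, n_B) / Var(n_F) under symmetric binomial splitting.

    Each of the n particles (with the given mean and variance of n) is
    assigned to the forward or backward hemisphere independently with
    probability 1/2, so Var(n_F) = (variance + mean) / 4 and
    Cov(n_F, n_B) = (variance - mean) / 4.
    """
    return (variance - mean) / (variance + mean)


def nbd_variance(mean, k):
    """Variance of a negative binomial distribution with the given mean and parameter k."""
    return mean + mean ** 2 / k


print(forward_backward_slope(10.0, nbd_variance(10.0, 10.0)))
```

The slope grows with the excess of the variance over the Poisson value, so independent (Poisson) emission alone yields no forward-backward correlation.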
Interpretation of high resolution aeromagnetic data over southern ...
analyzed in order to estimate the depth of magnetic sources and to map the
Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?
Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R
2013-01-01
The aim of this paper is to address issues in research that may be missing from statistics classes and are important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty of systematically addressing the many challenges that can arise in the course of a study. This article can help to bridge this gap between research questions and formal statistical inference by discussing an illustrative case study. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and help achieve optimal conduct, quality control, analysis, and interpretation of a study.
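A small illustration of the data review and verification step discussed above: each record is checked against a codebook of allowed ranges and codes before analysis. The variable names and limits are hypothetical and would come from a study's own protocol and codebook.

```python
# Hypothetical codebook: an allowed (low, high) range or a set of valid codes per variable.
CODEBOOK = {
    "age_years": (18, 90),
    "bmi": (12.0, 70.0),
    "sex": {"F", "M"},
}


def review_record(record, codebook=CODEBOOK):
    """Return a list of problems found in one data record (missing or invalid values)."""
    problems = []
    for var, rule in codebook.items():
        value = record.get(var)
        if value is None:
            problems.append(f"{var}: missing")
        elif isinstance(rule, set):
            if value not in rule:
                problems.append(f"{var}: invalid code {value!r}")
        else:
            low, high = rule
            if not low <= value <= high:
                problems.append(f"{var}: {value} outside [{low}, {high}]")
    return problems


print(review_record({"age_years": 150, "sex": "X"}))
```

Keeping the rules in data rather than code makes the checks easy to document and to revise as the codebook evolves.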
Data analysis using the GNU R system for statistical computation
R is a language system for statistical computation. It is widely used in statistics, bioinformatics, machine learning, data mining, quantitative finance, and the analysis of clinical drug trials. Among the advantages of R are that it has become the standard language for developing statistical techniques, it is actively developed by a large and growing global user community, it is open-source software, it is highly portable (Linux, OS X and Windows), it has a built-in documentation system, it produces high-quality graphics, and it is easily extensible, with over four thousand extension packages available covering statistics and applications. This report gives a very brief introduction to R with some examples using lattice QCD simulation results. It then discusses the development of R packages designed for chi-square minimization fits of lattice n-point correlation functions.
Statistical methods for longitudinal data with agricultural applications
The PhD study focuses on modeling two kinds of longitudinal data arising in agricultural applications: continuous time series data and discrete longitudinal data. Firstly, two statistical methods, neural networks and generalized additive models, are applied to predict mastitis using a multivariate algorithm. This was found to compare favourably with the algorithm implemented in the well-known Beagle software. Finally, an R package for applying APFA models, developed as part of the PhD project, is described.
Reducing bias in the analysis of counting statistics data
In the analysis of counting statistics data it is common practice to estimate the variance of the measured data points as the data points themselves. This practice introduces a bias into the results of further analysis which may be significant, and under certain circumstances lead to false conclusions. In the case of normal weighted least squares fitting this bias is quantified and methods to avoid it are proposed. (orig.)
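The bias is easy to demonstrate for the simplest possible fit, a constant. Weighting each count by the reciprocal of its own value turns the weighted mean into the harmonic mean, which never exceeds the arithmetic mean, so low counts pull the estimate down. The counts are hypothetical, and the paper's own remedies are not reproduced; weighting by the fitted model value instead is shown only as one familiar alternative.

```python
def weighted_mean(values, variances):
    """Inverse-variance weighted mean of `values`."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


counts = [4, 16]                       # hypothetical counts from a constant source
# Practice criticised above: each count is used as its own variance.
# The weights 1/y make the weighted mean equal to the harmonic mean.
naive = weighted_mean(counts, counts)
# Alternative: take the variance from the fitted model value. For a constant
# model all weights are then equal and the estimate is the arithmetic mean.
fitted = sum(counts) / len(counts)
model_based = weighted_mean(counts, [fitted] * len(counts))
print(naive, model_based)
```

Here the naive estimate is 6.4 against 10.0 for the model-based one, a bias far larger than the statistical uncertainty would suggest.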
QB2OLAP: enabling OLAP on statistical linked open data
Varga, Jovan; Etcheverry, Lorena; Vaisman, Alejandro; Romero Moral, Óscar; Bach Pedersen, Torben; Thomsen, Christian
2016-01-01
Publication and sharing of multidimensional (MD) data on the Semantic Web (SW) opens new opportunities for the use of On-Line Analytical Processing (OLAP). The RDF Data Cube (QB) vocabulary, the current standard for statistical data publishing, however, lacks key MD concepts such as dimension hierarchies and aggregate functions. QB4OLAP was proposed to remedy this. However, QB4OLAP requires extensive manual annotation and users must still write queries in SPARQL, the standard query language f
Some statistical properties of gene expression clustering for array data
DNA array data without a corresponding statistical error measure. We propose an easy-to-implement and simple-to-use technique that uses bootstrap re-sampling to evaluate the statistical error of the nodes provided by SOM-based clustering. Comparisons between SOM and parametric clustering are presented for simulated as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM, that improves the false discovery ratio of differentially expressed genes. Code in Matlab is freely available, as well as some supplementary material, at the following address: https
This article presents a reflection on an aspect of research methodology, particularly on the interpretation strategy of data from a Science and Indigenous Knowledge Systems Project (SIKSP) in a South African university. The data interpretation problem arose while we were analysing the effects of a series of SIKSP-based workshops on the views of a…
Interpretation of neutron activation analysis data of ancient silver
Results from work on Sasanian silver and objects from related periods and geographic provenances are used to demonstrate that analytical data in combination with other properties can be used with reasonable success in establishing groups of objects of common geographic provenances, in providing information on the production, use, and distribution of silver metal, and on ancient metal working techniques
Interpretation of reflection seismic data from the Usangu Basin, east…
basin, where the ages of the sedimentary sequences have been established on the basis of drill hole data. The Karoo beds, deposited on an undulating weathered basement surface, are relatively thin (- 200m). The Red Sandstone Group reach a maximum thickness of up to 420 m while the Lake Beds are up to 289 m thick.
Interpreting behavioural data from Radio-Acoustic Positioning ...
To detect behavioural patterns of individually tagged squid Loligo vulgaris reynaudii in a Radio-Acoustic Positioning Telemetry (RAPT) buoy array, trajectories reflecting the four dimensions of latitude, longitude, depth and time were plotted from data collected during field experiments in South Africa. Finding a continuous ...
Analyzing sickness absence with statistical models for survival data
OBJECTIVES: Sickness absence is the outcome in many epidemiologic studies and is often based on summary measures such as the number of sickness absences per year. In this study the use of modern statistical methods was examined by making better use of the available information. Since sickness absence data deal with events occurring over time, the use of statistical models for survival data has been reviewed, and the use of frailty models has been proposed for the analysis of such data. METHODS: Three methods for analyzing data on sickness absences were compared using a simulation study involving the following: (i) Poisson regression using a single outcome variable (number of sickness absences), (ii) analysis of time to first event using the Cox proportional hazards model, and (iii) frailty models, which are random effects proportional hazards models. Data from a study of the relation...
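A small simulation shows why a frailty term matters. Suppose each worker's absence rate is scaled by an unobserved gamma-distributed frailty with mean 1 and variance theta. The yearly counts are then overdispersed relative to the Poisson model in (i). The sketch assumes a constant baseline rate and individual gamma frailties. It does not reproduce the study's simulation design, and the function names are hypothetical.

```python
import numpy as np


def simulate_absence_counts(n_workers, base_rate, theta, follow_up=1.0, seed=0):
    """Sickness-absence counts per worker under an individual gamma frailty.

    Each worker gets a frailty Z ~ Gamma(shape=1/theta, scale=theta), so
    E[Z] = 1 and Var[Z] = theta; counts are Poisson(base_rate * follow_up * Z).
    theta = 0 switches the frailty off (plain Poisson counts).
    """
    rng = np.random.default_rng(seed)
    if theta > 0:
        frailty = rng.gamma(shape=1.0 / theta, scale=theta, size=n_workers)
    else:
        frailty = np.ones(n_workers)
    return rng.poisson(base_rate * follow_up * frailty)


def dispersion_index(counts):
    """Variance-to-mean ratio: about 1 for Poisson data, larger under frailty."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()
```

With mean count mu = base_rate * follow_up, the marginal variance is mu + theta * mu^2. For mu = 2 and theta = 1 the expected dispersion index is therefore about 3. A Poisson model would treat it as 1, which understates the uncertainty.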
A statistical study on fracture toughness data of Japanese RPVs
In a cooperative study investigating the fracture toughness of pressure vessel steels produced in Japan, a number of heats of ASTM A533B cl.1 and A508 cl.3 steels have been studied. Approximately 3000 fracture toughness data points and 8000 mechanical property data points were obtained and filed in a computer data bank. Statistical characterization of toughness data in the transition region has been carried out using the computer data bank. A curve-fitting technique for toughness data has been examined, applying a function that models the transition behaviour of each toughness measure. The aims of the curve-fitting technique were as follows: (1) to summarize an enormous toughness database so as to permit comparison of heats, materials and testing methods; (2) to investigate the relationships among static, dynamic and arrest toughness; (3) to examine the ASME K(IR) curve statistically. The methodology used in this study for analyzing a large quantity of fracture toughness data was found to be useful for formulating a statistically based K(IR) curve. (orig./HP)
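The abstract does not name the fitting function. A common choice for ductile-to-brittle transition data is a hyperbolic tangent, K(T) = A + B*tanh((T - T0)/C). The sketch assumes that form and fits it by separable least squares: A and B are solved linearly, while T0 and C are scanned on grids. It is an illustration, not the study's procedure. Units such as MPa*sqrt(m) and degrees C are assumed.

```python
import numpy as np


def tanh_transition(temp, a, b, t0, c):
    """Toughness model: lower/upper shelves at a - b and a + b, midpoint t0, width c."""
    return a + b * np.tanh((np.asarray(temp, dtype=float) - t0) / c)


def fit_tanh_transition(temp, toughness, t0_grid, c_grid):
    """Separable least squares: for each (t0, c) the model is linear in (a, b).

    Returns (a, b, t0, c, sse) for the grid point with the smallest residual
    sum of squares.
    """
    temp = np.asarray(temp, dtype=float)
    toughness = np.asarray(toughness, dtype=float)
    best = None
    for t0 in t0_grid:
        for c in c_grid:
            design = np.column_stack([np.ones_like(temp), np.tanh((temp - t0) / c)])
            coef, *_ = np.linalg.lstsq(design, toughness, rcond=None)
            sse = float(np.sum((toughness - design @ coef) ** 2))
            if best is None or sse < best[4]:
                best = (float(coef[0]), float(coef[1]), float(t0), float(c), sse)
    return best
```

Fitting each heat separately gives comparable shelf and transition parameters. Summarizing a large database so that heats, materials and test methods can be compared is the first aim the abstract lists.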
Statistical methods to evaluate thermoluminescence ionizing radiation dosimetry data
Ionizing radiation levels, evaluated through the exposure of CaF₂:Dy thermoluminescence dosimeters (TLD-200), have been monitored at Centro Experimental Aramar (CEA), located at Ipero in Sao Paulo state, Brazil, since 1991, resulting in a large number of measurements (more than 2,000) up to 2009. The volume of data, together with the dispersion inherent in any measurement process, calls for statistical tools to evaluate the results, a procedure also required by the Brazilian Standard CNEN-NN-3.01/PR-3.01-008, which regulates radiometric environmental monitoring. Thermoluminescence ionizing radiation dosimetry data are statistically compared in order to evaluate the potential environmental impact of CEA's activities. The statistical tools discussed in this work are box plots, control charts and analysis of variance. (author)
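The abstract lists control charts without specifying the type. When there is one dosimeter reading per monitoring period, an individuals (X) chart is a typical choice, with limits derived from the average moving range. The sketch assumes that chart and conventional 3-sigma limits, with sigma estimated as MR-bar / 1.128. The function name is hypothetical.

```python
def individuals_chart(values):
    """Centre line, control limits and out-of-control indices for an X chart.

    sigma is estimated from the mean moving range of consecutive readings
    divided by d2 = 1.128 (the constant for subgroups of size 2).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    centre = sum(values) / n
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma_hat = mr_bar / 1.128
    ucl = centre + 3.0 * sigma_hat
    lcl = centre - 3.0 * sigma_hat
    flagged = [i for i, v in enumerate(values) if v > ucl or v < lcl]
    return centre, lcl, ucl, flagged
```

A reading flagged this way indicates a period worth investigating. It is not by itself evidence of an environmental impact.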
Statistical data for the tensile properties of natural fibre composites
This article features a large statistical database on the tensile properties of natural fibre reinforced composite laminates. The data presented here correspond to a comprehensive experimental testing program covering several composite systems, including different material constituents (epoxy and vinyl ester resins; flax, jute and carbon fibres), different fibre configurations (short-fibre mats, unidirectional, and plain, twill and satin woven fabrics) and different fibre orientations (0°, 90°, and [0,90] angle plies). For each material, ~50 specimens were tested under uniaxial tensile loading. Here, we provide the complete set of stress–strain curves together with the statistical distributions of their calculated elastic modulus, strength and failure strain. The data are also provided as support material for the research article: “The mechanical properties of natural fibre composite laminates: A statistical study” [1].
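The three reported quantities can be extracted from each stress–strain curve in a few lines. The sketch computes the elastic modulus as a chord modulus between 0.05% and 0.25% strain, the window used in ISO 527-style tensile testing. That window is an assumption: the article may define the modulus differently. Strength is taken as the peak stress, and failure strain as the last recorded strain. Function names are hypothetical.

```python
import numpy as np


def chord_modulus(strain, stress, eps1=0.0005, eps2=0.0025):
    """Chord modulus between two strain levels, by linear interpolation.

    Units follow `stress` (e.g. MPa). Strain must increase monotonically.
    """
    strain = np.asarray(strain, dtype=float)
    stress = np.asarray(stress, dtype=float)
    if np.any(np.diff(strain) <= 0):
        raise ValueError("strain must increase monotonically for interpolation")
    s1, s2 = np.interp([eps1, eps2], strain, stress)
    return (s2 - s1) / (eps2 - eps1)


def tensile_summary(strain, stress):
    """(modulus, strength, failure strain) for one specimen.

    Strength is the peak stress; failure strain is the last recorded strain,
    assuming the curve is truncated at fracture.
    """
    strain = np.asarray(strain, dtype=float)
    stress = np.asarray(stress, dtype=float)
    return chord_modulus(strain, stress), float(stress.max()), float(strain[-1])
```

Applying this to each of the ~50 curves per material yields the samples whose distributions the data set summarizes.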
Quantitative interpretation of great lakes remote sensing data
Remote sensing has been applied in the past to the surveillance of Great Lakes water quality, but it has been only partially successful because of the completely empirical approach taken in relating the multispectral scanning data at visible and near-infrared wavelengths to water parameters. Any remote sensing approach using water color information must take into account (1) the existence of many different organic and inorganic species throughout the Great Lakes, (2) the occurrence of a mixture of species in most locations, and (3) spatial (intra- and interlake as well as vertical) variations in types and concentrations of species. The radiative transfer model provides a potential method for an orderly analysis of remote sensing data and a physical basis for developing quantitative algorithms. Predictions and field measurements of volume reflectances are presented which clearly show the advantage of using a radiative transfer model. Spectral absorptance and backscattering coefficients for two inorganic sediments are reported.
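Mixtures of species fit naturally into a radiative-transfer treatment, because absorption and backscattering coefficients add across water and each constituent. The sketch uses a widely cited single-wavelength approximation, R = f*b_b/(a + b_b) with f of about 0.33. This approximation is an assumption and not necessarily the model used in the report. All coefficient values in the tests are hypothetical.

```python
def mixture_iops(water_a, water_bb, species):
    """Total absorption and backscattering coefficients (1/m) at one wavelength.

    `species` is an iterable of (concentration, specific_absorption,
    specific_backscatter); inherent optical properties add linearly.
    """
    species = list(species)
    a = water_a + sum(conc * sa for conc, sa, _ in species)
    bb = water_bb + sum(conc * sb for conc, _, sb in species)
    return a, bb


def volume_reflectance(a, bb, f=0.33):
    """Approximate subsurface volume reflectance R = f * bb / (a + bb)."""
    return f * bb / (a + bb)
```

Repeating the calculation per wavelength, with measured specific coefficients such as those reported for the two sediments, gives a predicted reflectance spectrum. That prediction can be compared with scanner data.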
Artefact Mobile Data Model to support cultural heritage data collection and interpretation
This paper discusses the limitations of existing data structures in mobile mapping applications for supporting archaeologists in managing the details of artefacts (any object made or modified by a human culture, and later recovered by an archaeological endeavor) excavated at a cultural heritage site. Current limitations of the data structure in the mobile mapping application allow archaeologists to record only one artefact per test pit location. In reality, more than one artefact can be excavated from the same test pit location. A spatial data model called the Artefact Mobile Data Model (AMDM) was developed by applying an existing Relational Data Base Management System (RDBMS) technique to overcome this limitation. The data model was implemented in a mobile database environment called SprintDB Pro, which was in turn connected to the ArcPad 7.1 mobile mapping application through Open Data Base Connectivity (ODBC). In addition, the paper discusses the design of a user-friendly application built on top of AMDM to interpret and record the technology associated with each artefact excavated in the field. In summary, the paper discusses the design and implementation of a data model to facilitate the collection of artefacts in the field using an integrated mobile mapping and database approach.
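The core of the fix is a one-to-many relation between test pits and artefacts. The sketch below uses SQLite, through Python's standard `sqlite3` module, in place of SprintDB Pro. The table and column names (`test_pit`, `artefact`, `technology`, and so on) are hypothetical and are not the AMDM schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE test_pit (
    pit_id    INTEGER PRIMARY KEY,
    site_name TEXT NOT NULL,
    x         REAL NOT NULL,
    y         REAL NOT NULL
);
CREATE TABLE artefact (
    artefact_id INTEGER PRIMARY KEY,
    pit_id      INTEGER NOT NULL REFERENCES test_pit(pit_id),
    depth_cm    REAL,
    material    TEXT,
    technology  TEXT
);
"""


def create_db(path=":memory:"):
    """Open a database with foreign-key enforcement and the two tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def add_test_pit(conn, site_name, x, y):
    cur = conn.execute(
        "INSERT INTO test_pit (site_name, x, y) VALUES (?, ?, ?)", (site_name, x, y))
    return cur.lastrowid


def add_artefact(conn, pit_id, depth_cm, material, technology):
    # Any number of artefacts may reference the same pit (one-to-many).
    cur = conn.execute(
        "INSERT INTO artefact (pit_id, depth_cm, material, technology) "
        "VALUES (?, ?, ?, ?)", (pit_id, depth_cm, material, technology))
    return cur.lastrowid


def artefacts_in_pit(conn, pit_id):
    return conn.execute(
        "SELECT artefact_id, material, technology FROM artefact "
        "WHERE pit_id = ? ORDER BY artefact_id", (pit_id,)).fetchall()
```

Keeping pit geometry in one table and artefact attributes in another removes the one-artefact-per-location limit. The foreign key also rejects artefacts recorded against a pit that does not exist.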
Interpretation of fracture system geometry using well test data
This report presents three methods of determining fracture geometry and interconnection from well test information. Method 1 uses evidence for boundary effects in the well test to determine the distance to and type of fracture boundary. Method 2 uses the spatial dimension of the well test to infer the geometry of the fracture-conduit system. Method 3 obtains information on the spacing and transmissivity distribution of individual conductive fractures from fixed-interval-length (FIL) well tests. The three methods are applied to data from the Site Characterization and Validation (SCV) area at the 360 m level of the Stripa Mine. The focus of the technology development is the constant-pressure well test, although the general approaches apply to constant-rate well tests and, to a much lesser extent, to slug or pulse tests, which are relatively insensitive to boundaries and spatial dimension. Application of the techniques to the N and W holes in the SCV area shows that there is little evidence for boundary effects in the well test results. There is, on the other hand, considerable variation in the spatial dimension of the well test data, ranging from sub-linear (fractures which decrease in conductivity with distance from the hole) to spherical, for three-dimensional fracture systems. The absence of boundary effects suggests that the rock mass in the SCV area contains a well-connected fracture system. Major uncertainties in the analysis of well test data limit the use of single-borehole measurements. Without assuming the value of specific storage, one can reliably determine only the spatial dimension and, for two-dimensional flow only, the transmissivity. Among the uncertainties are the effective well radius, the degree to which the fracture conduits fill the n-dimensional space in which flow occurs, and the cross-sectional area of the conduits at the wellbore. This report presents a complete development of constant-pressure well test methods for cylindrical flow and flow of arbitrary
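The idea behind Method 2 can be illustrated with the generalized radial flow relation for a constant-rate test. Under that relation, the late-time derivative of drawdown with respect to ln(t) scales as t^(1 - n/2), so n = 2(1 - slope) on a log-log plot. The sketch assumes that constant-rate relation. The report concentrates on constant-pressure tests, whose response has its own form, so this illustrates the principle rather than the report's method.

```python
import numpy as np


def log_derivative(t, drawdown):
    """Derivative of drawdown with respect to ln(t), by finite differences."""
    t = np.asarray(t, dtype=float)
    return np.gradient(np.asarray(drawdown, dtype=float), np.log(t))


def flow_dimension(t, derivative, t_min=None, t_max=None):
    """Flow dimension n from the log-log slope of the derivative.

    Assumes derivative ~ t**(1 - n/2): n = 1 linear, n = 2 radial,
    n = 3 spherical; non-integer values are allowed.
    """
    t = np.asarray(t, dtype=float)
    derivative = np.asarray(derivative, dtype=float)
    mask = np.ones_like(t, dtype=bool)
    if t_min is not None:
        mask &= t >= t_min
    if t_max is not None:
        mask &= t <= t_max
    slope, _ = np.polyfit(np.log(t[mask]), np.log(derivative[mask]), 1)
    return 2.0 * (1.0 - slope)
```

Restricting the fit to the late-time window with `t_min` and `t_max` avoids wellbore-storage and skin effects at early times.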
Interpret with caution: multicollinearity in multiple regression of cognitive data.
Shibihara and Kondo in 2002 reported a reanalysis of the 1997 Kanji picture-naming data of Yamazaki, Ellis, Morrison, and Lambon-Ralph in which independent variables were highly correlated. Their addition of the variable visual familiarity altered the previously reported pattern of results, indicating that visual familiarity, but not age of acquisition, was important in predicting Kanji naming speed. The present paper argues that caution should be taken when drawing conclusions from multiple regression analyses in which the independent variables are so highly correlated, as such multicollinearity can lead to unreliable output.
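A standard diagnostic for this problem is the variance inflation factor, VIF_j = 1 / (1 - R_j^2). Here R_j^2 comes from regressing predictor j on all the other predictors. The sketch is generic and does not use the Kanji naming data. The often-quoted warning level (VIF above about 10) is a rule of thumb rather than a threshold taken from the paper.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column of the predictor matrix X (rows = observations).

    Each predictor is regressed, with an intercept, on the remaining
    predictors; VIF = 1 / (1 - R^2) of that auxiliary regression.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs.append(np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return np.array(vifs)
```

A large VIF means the coefficient's variance is inflated by that factor relative to uncorrelated predictors. Adding a variable such as visual familiarity can then reshuffle which predictors appear important.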
Statistical analysis and interpolation of compositional data in materials science.
Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
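The basic tool for such data is a log-ratio transform. The sketch shows the centred log-ratio (clr) transform and its inverse. It also interpolates between two compositions in clr space, so the result stays on the simplex. This is a generic illustration with hypothetical function names, not the paper's interpolation scheme for the sputtered film. Simple log-ratios of two parts are unchanged when other parts are dropped, whereas clr coordinates depend on which parts are included.

```python
import numpy as np


def closure(x, total=1.0):
    """Rescale positive parts so each composition sums to `total`."""
    x = np.asarray(x, dtype=float)
    return total * x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centred log-ratio: log of each part over the geometric mean of the parts."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log-ratio transforms need strictly positive parts")
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)


def clr_inverse(z, total=1.0):
    """Back-transform clr coordinates to a closed composition."""
    return closure(np.exp(np.asarray(z, dtype=float)), total)


def interpolate_composition(c0, c1, w):
    """Composition a fraction w of the way from c0 to c1, interpolated in clr space."""
    return clr_inverse((1.0 - w) * clr(c0) + w * clr(c1))
```

Because clr removes the scale, atomic percentages and atomic fractions give identical coordinates. Interpolated values can never become negative or fail to sum to the constant.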
Multivariate statistical analysis of major and trace element data for ...
Multivariate statistical analysis of major and trace element data for niobium exploration in the peralkaline granites of the anorogenic ring-complex province of Nigeria. P. O. Ogunleye, E. C. Ike, I. Garba. Journal of Mining and Geology Vol.40(2) 2004: 107-117. No abstract available.
Exploring Foundation Concepts in Introductory Statistics Using Dynamic Data Points
This paper analyses introductory statistics students' verbal and gestural expressions as they interacted with a dynamic sketch (DS) designed using "Sketchpad" software. The DS involved numeric data points built on the number line whose values changed as the points were dragged along the number line. The study is framed on aggregate…
Quick Access: Find Statistical Data on the Internet.
Provides an annotated list of Internet sources (World Wide Web, ftp, and gopher sites) for current and historical statistical business data, including selected interest rates, the Consumer Price Index, the Producer Price Index, foreign currency exchange rates, noon buying rates, per diem rates, the special drawing right, stock quotes, and mutual…
Data on education: from population statistics to epidemiological research
BACKGROUND: In many fields of research, level of education is used as an indicator of social status. METHODS: Using Statistics Denmark's register for education and employment of the population, we examined highest completed education with a birth-cohort perspective focusing on people born between … of population trends by use of extrapolated values, solutions are less obvious in epidemiological research using individual level data.
The Use of Advanced Transportation Monitoring Data for Official Statistics
Traffic and transportation statistics are mainly published as aggregated information, and are traditionally based on surveys or secondary data sources, like public registers and companies' administrations. Nowadays, advanced monitoring systems are installed in the road network, offering
Applications of spatial statistical network models to stream data
Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for...
Statistical Physics in the Era of Big Data
With the wealth of data provided by a wide range of high-throughput measurement tools and technologies, statistical physics of complex systems is entering a new phase, impacting in a meaningful fashion a wide range of fields, from cell biology to computer science to economics. In this dissertation, by applying tools and techniques developed in…
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical
WPS criterion proposition based on experimental data base interpretation
This article gives the background to, and the methodology developed for, defining a K_J-based criterion for brittle fracture of a Reactor Pressure Vessel (RPV) subjected to Pressurized Thermal Shock (PTS), taking into account the Warm Pre-Stressing (WPS) effect. The first step of this methodology is the constitution of an experimental database. This work was performed through bibliography and partnerships, and allows experimental results to be merged covering: various ferritic steels; various material states (as received, thermally aged, irradiated...); various modes of fracture (cleavage, intergranular, mixed mode); various specimen geometries and sizes (CT, SENB, mock-ups); and various thermo-mechanical transients. Based on this experimental database, a simple K_J-based limit is proposed and compared with experimental results. Parametric studies are performed in order to identify the main parameters of the problem. Finally, a simple proposition based on a detailed analysis of the test results is made. Since this proposition gives satisfactory results in every case, it is a good candidate for integration into the French RSE-M code for in-service assessment. (authors)
A statistical test for outlier identification in data envelopment analysis
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these “deterministic” frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the presented method, each observation is deleted from the sample once and the resulting linear program is solved, leading to a distribution of efficiency estimates. Based on the resulting distribution, a pared test is designed to identify the potential outlier(s). We illustrate the method with a real data set. The method could be used as a first step, as an exploratory data analysis, before using any frontier estimation.
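In the simplest case, one input and one output under constant returns to scale, DEA efficiency needs no linear program. That makes the leave-one-out idea easy to show. The sketch deletes each unit once and records how much the remaining units' efficiencies change. It is a simplified stand-in for the paper's pared test statistic, which is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def ccr_efficiency_1x1(x, y):
    """Constant-returns (CCR) efficiency with one input x and one output y.

    With a single input and output the DEA linear program reduces to
    comparing output/input ratios with the best observed ratio.
    """
    ratio = np.asarray(y, dtype=float) / np.asarray(x, dtype=float)
    return ratio / ratio.max()


def pared_sample_influence(x, y):
    """Delete each unit once, re-estimate the others' efficiencies, and report
    the mean change; a unit that alone defines the frontier stands out."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    full = ccr_efficiency_1x1(x, y)
    influence = np.empty(len(x))
    for k in range(len(x)):
        keep = np.arange(len(x)) != k
        influence[k] = np.mean(ccr_efficiency_1x1(x[keep], y[keep]) - full[keep])
    return full, influence
```

With several inputs and outputs, each deletion requires re-solving the DEA linear programs. The loop structure stays the same.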
The Blackboard Model of Computer Programming Applied to the Interpretation of Passive Sonar Data
... (location, course, speed, classification, etc.). At present the potential volume of data produced by modern sonar systems is so large that unless some form of computer assistance is provided with the interpretation of this data, information...
Exploration for geothermal resources is often challenging because there are no geophysical techniques that provide direct images of the parameters of interest, such as porosity, permeability and fluid content. Magnetotelluric (MT) and seismic tomography methods yield information about subsurface distribution of resistivity and seismic velocity on similar scales and resolution. The lack of a fundamental law linking the two parameters, however, has limited joint interpretation to a qualitative analysis. By using a statistical approach in which the resistivity and velocity models are investigated in the joint parameter space, we are able to identify regions of high correlation and map these classes (or structures) back onto the spatial domain. This technique, applied to a seismic tomography-MT profile in the area of the Gross Schoenebeck geothermal site, allows us to identify a number of classes in accordance with the local geology. In particular, a high-velocity, low-resistivity class is interpreted as related to areas with thinner layers of evaporites; regions where these sedimentary layers are highly fractured may be of higher permeability. (author)
Preliminary interpretation of thermal data from the Nevada Test Site
Analysis of data from 60 wells in and around the Nevada Test Site, including 16 in the Yucca Mountain area, indicates a thermal regime characterized by large vertical and lateral gradients in heat flow. Estimates of heat flow indicate considerable variation on both regional and local scales. The variations are attributable primarily to hydrologic processes involving interbasin flow with a vertical component of (seepage) velocity (volume flux) of a few mm/yr. Apart from indicating a general downward movement of water at a few mm/yr, the results from Yucca Mountain are as yet inconclusive. The purpose of the study was to determine the suitability of the area for proposed repository sites.
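Heat-flow estimates of this kind rest on Fourier's law for conduction, q = k * dT/dz. The sketch assumes a single thermal conductivity over the logged interval and fits a linear temperature profile by least squares. Where water moves vertically, measured heat flow departs from such purely conductive estimates, which is the signature the study attributes to seepage.

```python
import numpy as np


def geothermal_gradient(depth_m, temp_c):
    """Least-squares temperature gradient in K per metre."""
    slope, _ = np.polyfit(np.asarray(depth_m, dtype=float),
                          np.asarray(temp_c, dtype=float), 1)
    return slope


def conductive_heat_flow(depth_m, temp_c, conductivity_w_mk):
    """q = k * dT/dz for purely conductive heat transport, in mW/m^2."""
    return conductivity_w_mk * geothermal_gradient(depth_m, temp_c) * 1000.0
```

For example, a gradient of 30 K/km in rock with k = 2 W/(m K) gives 60 mW/m^2.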
Theoretical interpretation of experimental data from direct dark matter detection
I derive expressions that allow one to reconstruct the normalized one-dimensional velocity distribution function of halo WIMPs and to determine its moments from the recoil energy spectrum as well as directly from experimental data. The reconstruction of the velocity distribution function is further extended to take into account the annual modulation of the event rate. All these expressions are independent of the as yet unknown WIMP density near the Earth as well as of the WIMP-nucleus cross section. The only information about the nature of halo WIMPs which one needs is the WIMP mass. I also present a method for the determination of the WIMP mass by combining two (or more) experiments with different detector materials. This method is independent not only of the Galactic halo model but also of the WIMP model. (orig.)
Feature-Based Statistical Analysis of Combustion Simulation Data
We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information and hence fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Distribution Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion
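Here is a one-dimensional toy version of the merge-tree idea. Sweeping the field from its highest to its lowest value with a union-find structure records where superlevel-set components are born and where they merge. The number of features at any threshold then follows from those events alone, without touching the raw data again. The paper's framework works on 3-D simulation grids, streams the computation, and attaches statistics such as moments and length scales to each branch. None of that is reproduced here, and the function names are hypothetical.

```python
def superlevel_merge_events(values):
    """Sweep a 1-D scalar field from high to low values with union-find.

    Returns (births, merges): the values at which a new component of the
    superlevel set {f >= t} appears (local maxima) and at which two
    components join (saddles). These are the nodes of the field's merge tree.
    """
    n = len(values)
    parent = list(range(n))
    active = [False] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    births, merges = [], []
    for i in sorted(range(n), key=lambda k: values[k], reverse=True):
        active[i] = True
        roots = {find(j) for j in (i - 1, i + 1) if 0 <= j < n and active[j]}
        if not roots:
            births.append(values[i])
        elif len(roots) == 2:
            merges.append(values[i])
        for r in roots:
            parent[r] = i                   # i becomes the root of the joined component
    return births, merges


def count_features(births, merges, threshold):
    """Number of connected components of {f >= threshold}, for any threshold."""
    return sum(b >= threshold for b in births) - sum(m >= threshold for m in merges)
```

Once the events are stored, changing the threshold is a cheap query over a short list. This is the property that allows interactive, arbitrary-threshold exploration.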
Kissling, Grace E; Haseman, Joseph K; Zeiger, Errol
2015-09-02
A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP's statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP, 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800×0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP's decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus' conclusion that such obvious responses merely "generate a hypothesis" rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors. Published by Elsevier Ireland Ltd.
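The point about discrete tests can be checked numerically. Consider a one-sided Fisher exact test comparing tumour counts in a control group and a dosed group. When there is no effect, the probability of a significant result is below the nominal 0.05, often well below it when tumours are rare. Multiplying the number of comparisons by 0.05 therefore overstates the expected number of chance findings. The sketch assumes 50 animals per group and a common background rate. It illustrates the general property and is not the NTP's own decision procedure.

```python
from math import comb


def fisher_one_sided_p(x_ctrl, x_trt, n_ctrl, n_trt):
    """One-sided Fisher exact p-value for an excess of responders in the treated group.

    Conditional on the total number of responders, the treated count is
    hypergeometric; the p-value is P(X_trt >= x_trt).
    """
    total = x_ctrl + x_trt
    tail = sum(comb(n_trt, k) * comb(n_ctrl, total - k)
               for k in range(x_trt, min(total, n_trt) + 1))
    return tail / comb(n_ctrl + n_trt, total)


def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def actual_size(n_per_group, background_rate, alpha=0.05):
    """Probability that the test rejects at level alpha when both groups share
    the same tumour rate, i.e. the test's true false-positive rate."""
    n = n_per_group
    size = 0.0
    for x_ctrl in range(n + 1):
        p_ctrl = binom_pmf(x_ctrl, n, background_rate)
        for x_trt in range(n + 1):
            if fisher_one_sided_p(x_ctrl, x_trt, n, n) <= alpha:
                size += p_ctrl * binom_pmf(x_trt, n, background_rate)
    return size
```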
Statistical data on butane and kerosene in West Africa
This book gives statistical, technical and economic information on butane and kerosene used in West Africa in 1990. The first part gives information on gas and its use: market, energy efficiency, performance, safety, distribution, storage, transport and commercialization. Statistical data on petroleum and natural gas production and consumption are also described. Natural gas and petroleum reserves in Africa are also studied. In the second part, thirty country entries give an economic analysis of each African country. 21 figs., 19 tabs., 5 maps
In a retrospective 18-month study, the infusion therapy applied in a large anesthesia institute is examined. For this purpose, the anesthesia course data routinely recorded on magnetic tape are analysed by computer with the statistical program SPSS. It was shown that the practice of the individual anesthetists differs considerably. Various correlations are discussed.
Interpretation of ponded infiltration data using numerical experiments
The ponded infiltration experiment is a simple test used for in-situ determination of soil hydraulic properties, particularly saturated hydraulic conductivity and sorptivity. It is known that the infiltration process in natural soils is strongly affected by the presence of macropores, soil layering, initial and experimental conditions, etc. As a result, the infiltration record encompasses a complex of mutually compensating effects that are difficult to separate from each other, and the determination of sorptivity and saturated hydraulic conductivity from such infiltration data is complicated. In the present study we use numerical simulation to examine the impact of selected experimental conditions and soil profile properties on the ponded infiltration experiment results, specifically in terms of the hydraulic conductivity and sorptivity evaluation. The effect of the following factors was considered: depth of ponding, ring insertion depth, initial soil water content, presence of preferential pathways, hydraulic conductivity anisotropy, soil layering, surface layer retention capacity and hydraulic conductivity, and presence of soil pipes or stones under the infiltration ring. Results were compared with a large database of infiltration curves measured at the experimental site Liz (Bohemian Forest, Czech Republic). Reasonably good agreement between simulated and observed infiltration curves was achieved by combining several of the factors tested. Moreover, the ring insertion effect was recognized as one of the major causes of uncertainty in the determination of soil hydraulic parameters.
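A common starting point for reducing a ponded infiltration record is Philip's two-term equation, I(t) = S*sqrt(t) + A*t. Here S is sorptivity and A is a parameter related to, but not equal to, the saturated hydraulic conductivity. The sketch fits S and A by linear least squares. It is an assumption, not the study's inversion, and a fit this simple cannot separate effects such as ring insertion, layering or macropores.

```python
import numpy as np


def fit_philip_two_term(t, cumulative):
    """Least-squares fit of I(t) = S*sqrt(t) + A*t (no intercept).

    With t in s and cumulative infiltration I in m, S is in m/s**0.5 and
    A is in m/s.
    """
    t = np.asarray(t, dtype=float)
    design = np.column_stack([np.sqrt(t), t])
    (s, a), *_ = np.linalg.lstsq(design, np.asarray(cumulative, dtype=float), rcond=None)
    return float(s), float(a)
```

Fitting early and late portions of a record separately, and comparing the results, is one quick way to see how unstable these parameters can be.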
How to Measure and Interpret Quality Improvement Data.
McQuillan, Rory Francis; Silver, Samuel Adam; Harel, Ziv; Weizman, Adam; Thomas, Alison; Bell, Chaim; Chertow, Glenn M; Chan, Christopher T; Nesrallah, Gihad
2016-05-06
This article will demonstrate how to conduct a quality improvement project using the change idea generated in "How To Use Quality Improvement Tools in Clinical Practice: How To Diagnose Solutions to a Quality of Care Problem" by Dr. Ziv Harel and colleagues in this Moving Points feature. This change idea involves the introduction of a nurse educator into a CKD clinic with a goal of increasing rates of patients performing dialysis independently at home (home hemodialysis or peritoneal dialysis). Using this example, we will illustrate a Plan-Do-Study-Act (PDSA) cycle in action and highlight the principles of rapid cycle change methodology. We will then discuss the selection of outcome, process, and balancing measures, and the practicalities of collecting these data in the clinic environment. We will also introduce the PDSA worksheet as a practical way to oversee the progress of a quality improvement project. Finally, we will demonstrate how run charts are used to visually illustrate improvement in real time, and how this information can be used to validate achievement, respond appropriately to challenges the project may encounter, and prove the significance of results. This article aims to provide readers with a clear and practical framework upon which to trial their own ideas for quality improvement in the clinical setting. Copyright © 2016 by the American Society of Nephrology.
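Run charts are usually read with a few simple rules. The sketch implements one of them, the "shift" rule, as commonly stated in quality-improvement guidance. Under this rule, six or more consecutive points all above or all below the median signal a shift, and points exactly on the median are skipped. The threshold of six and the example data are assumptions, not values from the article. The function name is hypothetical.

```python
from statistics import median


def run_chart_shifts(values, min_run=6):
    """Return (median, shifts) for a run chart.

    A shift is `min_run` or more consecutive points on the same side of the
    median; points exactly on the median neither extend nor break a run.
    Each shift is reported as (start_index, length, above_median).
    """
    centre = median(values)
    shifts = []
    side, start, length = None, None, 0
    for i, v in enumerate(values):
        if v == centre:
            continue
        above = v > centre
        if above == side:
            length += 1
            continue
        if side is not None and length >= min_run:
            shifts.append((start, length, side))
        side, start, length = above, i, 1
    if side is not None and length >= min_run:
        shifts.append((start, length, side))
    return centre, shifts
```

Applied to, say, weekly percentages of patients choosing home dialysis, a shift above the baseline median after the nurse educator starts would be a signal of non-random improvement.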
Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard
2017-11-01
Data for fractional solid waste composition provide relative magnitudes of individual waste fractions, the percentages of which always sum to 100, thereby connecting them intrinsically. Due to this sum constraint, waste composition data represent closed data, and their interpretation and analysis require statistical methods other than classical statistics, which are suitable only for unconstrained data such as absolute values. However, the closed characteristics of waste composition data are often ignored when analysed. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of biased, negative proportions. A Pearson's correlation test, applied to waste fraction generation (kg mass), indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association in this regard, thus demonstrating that statistical analyses applied to compositional waste fraction data, without addressing the closed characteristics of these data, have the potential to generate spurious or misleading results. Therefore, compositional data should be transformed adequately prior to any statistical analysis, such as computing the mean, standard deviation and correlation coefficients. Copyright © 2017 Elsevier Ltd. All rights reserved.
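For a single waste fraction against everything else, an adequate transformation can be as simple as a two-part isometric log-ratio (ilr), which is a scaled logit. Computing mean ± 2 SD on that scale and back-transforming keeps the interval inside (0, 1), unlike an interval such as (-4.03; 8.45). This is a sketch under the two-part assumption. The study's own transformation choices may differ, and zero fractions would need separate treatment.

```python
import math
from statistics import mean, stdev


def ilr_two_part(p):
    """ilr coordinate of the two-part composition (p, 1 - p): a scaled logit."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportions must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p)) / math.sqrt(2.0)


def ilr_two_part_inverse(z):
    """Map an ilr coordinate back to the proportion p."""
    return 1.0 / (1.0 + math.exp(-math.sqrt(2.0) * z))


def interval_in_ilr_space(proportions, k=2.0):
    """(lower, centre, upper) for centre +/- k standard deviations computed on
    ilr coordinates and mapped back, so all three lie strictly in (0, 1)."""
    z = [ilr_two_part(p) for p in proportions]
    centre, spread = mean(z), stdev(z)
    return (ilr_two_part_inverse(centre - k * spread),
            ilr_two_part_inverse(centre),
            ilr_two_part_inverse(centre + k * spread))
```

The back-transformed interval is asymmetric around its centre, which is expected for a proportion bounded by 0.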
Data and statistical methods for analysis of trends and patterns
This report summarizes topics considered at a working meeting on data and statistical methods for analysis of trends and patterns in US commercial nuclear power plants. This meeting was sponsored by the Office of Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC). Three data sets are briefly described: Nuclear Plant Reliability Data System (NPRDS), Licensee Event Report (LER) data, and Performance Indicator data. Two types of study are emphasized: screening studies, to see if any trends or patterns appear to be present; and detailed studies, which are more concerned with checking the analysis assumptions, modeling any patterns that are present, and searching for causes. A prescription is given for a screening study, and ideas are suggested for a detailed study, when the data take any of three forms: counts of events per time, counts of events per demand, and non-event data.
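For the "counts of events per time" form, one standard screening check is a chi-square test of whether all units (plants or years) share a single Poisson event rate. The Python sketch below is an illustration under that assumption, not the report's own prescription; it uses `scipy.stats.chi2`, the function name is hypothetical, and the usual caveat applies that the chi-square approximation needs expected counts that are not too small (often quoted as at least 5).

```python
from scipy.stats import chi2


def poisson_homogeneity(counts, exposures):
    """Chi-square screening test that event counts share one constant rate.

    counts[i] is the number of events observed in unit i, exposures[i] the
    corresponding operating time. Returns (statistic, p_value).
    """
    total_events = sum(counts)
    if total_events == 0:
        raise ValueError("no events observed: the test is undefined")
    rate = total_events / sum(exposures)          # pooled events per unit time
    expected = [rate * t for t in exposures]
    stat = sum((n - e) ** 2 / e for n, e in zip(counts, expected))
    return stat, chi2.sf(stat, len(counts) - 1)   # k - 1 degrees of freedom


# Hypothetical event counts for three plants over different operating years
stat, p_value = poisson_homogeneity([3, 5, 14], [2.0, 2.5, 3.0])
```

A small p-value flags a pattern worth a detailed study; it does not by itself identify a cause.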
STATISTICS. The reusable holdout: Preserving validity in adaptive data analysis.
Dwork, Cynthia; Feldman, Vitaly; Hardt, Moritz; Pitassi, Toniann; Reingold, Omer; Roth, Aaron
2015-08-07
Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. In common practice, however, data analysis is an intrinsically adaptive process, with new analyses generated on the basis of data exploration, as well as the results of previous analyses on the same data. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis. As an application, we show how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses. Copyright © 2015, American Association for the Advancement of Science.
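The idea behind the reusable holdout can be sketched as follows: answer each query from the training set unless it disagrees with the holdout by more than a noisy threshold, and only then consult the holdout, adding noise and spending budget. The Python class below is a simplified sketch modelled on the Thresholdout mechanism described in the paper; the noise scales (2σ, 4σ and σ), the default threshold, σ and budget are assumptions to check against the paper, and each query is assumed to map a record to a value in [0, 1].

```python
import random


def laplace(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


class Thresholdout:
    """Simplified reusable-holdout sketch (illustrative parameters)."""

    def __init__(self, train, holdout, threshold=0.04, sigma=0.01,
                 budget=100, seed=0):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma, self.budget = threshold, sigma, budget
        self.rng = random.Random(seed)
        self.noisy_threshold = threshold + laplace(2 * sigma, self.rng)

    def query(self, phi):
        if self.budget <= 0:
            raise RuntimeError("holdout budget exhausted")
        train_mean = sum(map(phi, self.train)) / len(self.train)
        holdout_mean = sum(map(phi, self.holdout)) / len(self.holdout)
        gap = abs(train_mean - holdout_mean)
        if gap > self.noisy_threshold + laplace(4 * self.sigma, self.rng):
            # Training answer looks overfit: pay one unit of budget, refresh
            # the noisy threshold, and answer from the holdout with noise.
            self.budget -= 1
            self.noisy_threshold = self.threshold + laplace(2 * self.sigma, self.rng)
            return holdout_mean + laplace(self.sigma, self.rng)
        return train_mean  # training and holdout agree: holdout is not revealed
```

Only queries on which the two sets disagree consume budget, which is what allows the same holdout to be reused across many adaptively chosen analyses.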
Klose, C. D.; Giese, R.; Löw, S.; Borm, G.
Especially for deep underground excavations, predicting the locations of small-scale hazardous geotechnical structures is nearly impossible when exploration is restricted to surface-based methods. Hence, for the AlpTransit base tunnels, exploration ahead of the tunnel face has become an essential component of the excavation plan. The project described in this talk aims at improving the technology for the geological interpretation of reflection seismic data. The discovered geological-seismic relations will be used to develop an interpretation system based on artificial intelligence to predict hazardous geotechnical structures of the advancing tunnel face. This talk first gives an overview of the data mining of geological and seismic properties of metamorphic rocks within the Penninic gneiss zone in Southern Switzerland. The data result from measurements of a specific geophysical prediction system developed by the GFZ Potsdam, Germany, along the 2600 m long and 1400 m deep Faido access tunnel. The goal is to find those seismic features (i.e. compression and shear wave velocities, velocity ratios and velocity gradients) that show a significant relation to geological properties (i.e. fracturing and fabric features). The seismic properties were acquired from different tomograms, whereas the geological features derive from tunnel face maps. The features are statistically compared with the seismic rock properties, taking into account the different methods used for the tunnel excavation (TBM and drill/blast). Fracturing and mica content are positively related to the velocity values. Both P- and S-wave velocities near the tunnel surface describe the petrology better, whereas in the interior of the rock mass they correlate with natural micro- and macroscopic fractures surrounding tectonites, i.e. cataclasites. The latter lie outside the excavation damage zone and the tunnel loosening zone. The shear wave velocities are better indicators for rock
Measuring the data universe data integration using statistical data and metadata exchange
Stahl, Reinhold
2018-01-01
This richly illustrated book provides an easy-to-read introduction to the challenges of organizing and integrating modern data worlds, explaining the contribution of public statistics and the ISO standard SDMX (Statistical Data and Metadata Exchange). As such, it is a must for data experts as well as those aspiring to become one. Today, exponentially growing data worlds are increasingly determining our professional and private lives. The rapid increase in the amount of globally available data, fueled by search engines and social networks but also by new technical possibilities such as Big Data, offers great opportunities. But whatever the undertaking – driving the blockchain revolution or making smartphones even smarter – success will be determined by how well it is possible to integrate, i.e. to collect, link and evaluate, the required data. One crucial factor in this is the introduction of a cross-domain order system in combination with a standardization of the data structure. Using everyday examples, th...
Research on the Construction of Remote Sensing Automatic Interpretation Symbol Big Data
Gao, Y.; Liu, R.; Liu, J.; Cheng, T.
2018-04-01
Remote sensing automatic interpretation symbol (RSAIS) is an inexpensive and fast method for providing precise in-situ information for image interpretation and accuracy assessment. This study designed a scientific and precise RSAIS data characterization method, as well as a distributed, cloud-based storage method for massive data. Additionally, it introduced an offline and online data update mode and a dynamic data evaluation mechanism, with the aim of creating an efficient approach to RSAIS big data construction. Finally, a national RSAIS database with more than 3 million samples covering 86 land types was constructed during 2013-2015 based on the National Geographic Conditions Monitoring Project of China, and it has been updated annually since 2016. The RSAIS big data has proven to be a good method for large-scale image interpretation and field validation. Notably, it also has the potential to address automatic image interpretation with the assistance of deep learning technology in the remote sensing big data era.
At the Nuclear Engineering and Analytics Inc. Rossendorf near Dresden (Germany), occupationally exposed persons work with uranium and thorium. In accordance with German guidelines, urine and faecal analyses are carried out. However, to interpret the data in terms of dose or intake, it is important to know what portion of the measured activity is caused by natural sources. For this reason, the excretion data of 16 occupationally exposed persons who did not have any history of occupational exposure to thorium or uranium have been checked since 1994. The excretion data in mBq per day for all persons cover the following ranges: Faeces: U-234 1 to 310 mBq/d, U-235 0.2 to 3.7 mBq/d, U-238 1.3 to 72 mBq/d, Th-228 7 to 89 mBq/d, Th-230 0.7 to 19 mBq/d, Th-232 0.7 to 16 mBq/d. Urine: all values below the detection limits of about 1 mBq/l. The large variation results from differences between the individual excretion rates but also from variation in the excretion rate of a single person; for example, the U-234 faecal excretion of one person ranges from 77 to 310 mBq per day. The paper gives the faecal excretion of some individuals as a function of time. These excretion data caused by natural sources are taken into account when interpreting faecal excretion data of occupationally exposed persons working with uranium or thorium. If the measured faecal excretion per day is within the range caused by natural sources, no interpretation is done. If these values are exceeded, additional faeces and urine samples are collected and measured. Depending on these additional results, intake and dose are assessed, sometimes using lung counter or whole body counter measurements. Some examples are described in the paper. (author)
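The screening step described above amounts to a comparison against the natural background. The Python sketch below assumes, for illustration, that the upper ends of the reported ranges serve as the decision thresholds; given the large intra-individual variation the paper reports, a laboratory might instead use person-specific reference values. The constant and function names are hypothetical.

```python
# Upper ends of the faecal excretion ranges (mBq/d) reported for persons
# without occupational exposure; used here as screening thresholds
# (an illustrative assumption, not a prescribed limit).
NATURAL_FAECAL_MAX_MBQ_PER_DAY = {
    "U-234": 310.0,
    "U-235": 3.7,
    "U-238": 72.0,
    "Th-228": 89.0,
    "Th-230": 19.0,
    "Th-232": 16.0,
}


def nuclides_needing_follow_up(measured):
    """Return, sorted, the nuclides whose measured daily faecal excretion
    (mBq/d) exceeds the natural-background maximum.

    A nuclide with no reference value raises KeyError instead of being
    silently treated as natural.
    """
    return sorted(
        nuclide
        for nuclide, value in measured.items()
        if value > NATURAL_FAECAL_MAX_MBQ_PER_DAY[nuclide]
    )
```

An empty result corresponds to "no interpretation"; any listed nuclide would trigger the additional faeces and urine sampling described above.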
Information systems for marine protected areas: How do users interpret desirable data attributes?
Carballo Cárdenas, E.C.; Mol, A.P.J.; Tobi, H.
2013-01-01
The purpose of this paper is to provide empirical evidence on how various user groups related to Marine Protected Areas (MPAs) interpret desirable data attributes, whether their interpretations differ and to what extent. Moreover, this study aims to make a methodological contribution to the
Interpretable decision-tree induction in a big data parallel framework
When running data-mining algorithms on big data platforms, a parallel, distributed framework, such as MAPREDUCE, may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms, such as majority voting, aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model that is most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases, the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
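The selection step can be illustrated with a stand-in similarity measure. In the Python sketch below each tree is reduced to a set of root-to-leaf rules, and Jaccard similarity between rule sets replaces the paper's syntactic similarity measure, which is not reproduced here; the rule encoding and function names are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of rules."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_representative(models):
    """Index of the model with the highest average similarity to the others.

    Each model is a frozenset of hashable rules, e.g. a tuple of
    (feature, operator, threshold) conditions plus the predicted class.
    Summing instead of averaging gives the same winner, because every
    model is compared with the same number of peers; ties go to the
    lowest index.
    """
    totals = [0.0] * len(models)
    for i, j in combinations(range(len(models)), 2):
        s = jaccard(models[i], models[j])
        totals[i] += s
        totals[j] += s
    return max(range(len(models)), key=lambda k: totals[k])
```

Unlike majority voting, the output is a single tree, which keeps the result compact and interpretable.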
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
The kappa statistic is widely used to assess the agreement between two procedures in independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
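For reference, the kappa point estimate is the observed agreement corrected for the agreement expected by chance, κ = (p_o - p_e)/(1 - p_e). The Python sketch below computes it for two raters or procedures; it does not implement the authors' cluster-robust variance estimator, and a variance formula that treats the pairs as independent would ignore the within-cluster dependence the paper addresses. The function name is illustrative.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters (or procedures) rating the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # p_o: proportion of items on which the two ratings agree
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # p_e: agreement expected if the raters were independent
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

Kappa is 1 for perfect agreement and 0 when agreement is exactly what chance would produce.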
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patient dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patient dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research dataset and two simulated clustered physician-patient dichotomous datasets are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
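The cluster bootstrap mentioned above resamples whole clusters rather than individual pairs, so within-cluster dependence carries over into each replicate. A minimal Python sketch for 0/1 ratings follows; the number of replicates, the seed and the choice to skip degenerate resamples (where chance agreement is 1 and kappa is undefined) are assumptions, and this is not the semi-parametric estimator proposed in the paper.

```python
import random
from statistics import stdev


def kappa_binary(pairs):
    """Cohen's kappa for a list of (rating_a, rating_b) pairs coded 0/1."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    pa = sum(a for a, _ in pairs) / n
    pb = sum(b for _, b in pairs) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    return (p_obs - p_exp) / (1 - p_exp)


def cluster_bootstrap_se(clusters, n_boot=2000, seed=1):
    """Bootstrap standard error of kappa, resampling whole clusters
    (e.g. physicians, each with a list of patient pairs) with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        chosen = rng.choices(clusters, k=len(clusters))
        sample = [pair for cluster in chosen for pair in cluster]
        try:
            estimates.append(kappa_binary(sample))
        except ZeroDivisionError:
            continue  # degenerate resample: chance agreement equals 1
    return stdev(estimates)
```

Resampling individual pairs instead would treat patients of the same physician as independent, which is the naive approach the paper finds generally inappropriate.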
Using statistical correlation to compare geomagnetic data sets
Stanton, T.
2009-04-01
The major features of data curves are often matched, to a first order, by bump and wiggle matching to arrive at an offset between data sets. This poster describes a simple statistical correlation program that has proved useful during this stage by determining the optimal correlation between geomagnetic curves using a variety of fixed and floating windows. Its utility is suggested by the fact that it is simple to run, yet generates meaningful data comparisons, often when data noise precludes the obvious matching of curve features. Data sets can be scaled, smoothed, normalised and standardised, before all possible correlations are carried out between selected overlapping portions of each curve. Best-fit offset curves can then be displayed graphically. The program was used to cross-correlate directional and palaeointensity data from Holocene lake sediments (Stanton et al., submitted) and Holocene lava flows. Some example curve matches are shown, including some that illustrate the potential of this technique when examining particularly sparse data sets. Stanton, T., Snowball, I., Zillén, L. and Wastegård, S., submitted. Detecting potential errors in varve chronology and 14C ages using palaeosecular variation curves, lead pollution history and statistical correlation. Quaternary Geochronology.
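The core of such a program is a search over offsets for the highest correlation between overlapping portions of two curves. The Python sketch below is not Stanton's program: it assumes both records are already resampled onto the same evenly spaced depth or age grid, uses a single fixed window (the full overlap), and relies on NumPy's `corrcoef`. Standardising the curves beforehand would not change the Pearson correlation, so that step is omitted.

```python
import numpy as np


def best_offset(reference, target, max_lag):
    """Return (lag, r) maximising the Pearson correlation between
    reference[i] and target[i + lag] over the overlapping samples."""
    best = (0, -np.inf)
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = reference[: len(reference) - lag], target[lag:]
        else:
            a, b = reference[-lag:], target[: len(target) + lag]
        m = min(len(a), len(b))  # trim to the common overlap
        if m < 3:
            continue
        r = np.corrcoef(a[:m], b[:m])[0, 1]
        if r > best[1]:  # NaN (constant segment) never wins
            best = (lag, r)
    return best


# Synthetic example: the target is the reference record shifted by 10 samples
rng = np.random.default_rng(0)
signal = np.cumsum(rng.normal(size=300))
lag, r = best_offset(signal[10:260], signal[:250], max_lag=20)
```

Floating windows, as described above, would repeat this search on sub-segments to detect offsets that change along the record.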
Applied systems ecology: models, data, and statistical methods
Energy Technology Data Exchange (ETDEWEB)
Eberhardt, L L
1976-01-01
In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.
Statistics in experimental design, preprocessing, and analysis of proteomics data.
Jung, Klaus
2011-01-01
High-throughput experiments in proteomics, such as 2-dimensional gel electrophoresis (2-DE) and mass spectrometry (MS), usually yield high-dimensional data sets of expression values for hundreds or thousands of proteins, which are, however, observed on only a relatively small number of biological samples. Statistical methods for the planning and analysis of experiments are important to avoid false conclusions and to obtain tenable results. In this chapter, the most frequent experimental designs for proteomics experiments are illustrated. In particular, the focus is on studies for the detection of differentially regulated proteins. Furthermore, issues of sample size planning, statistical analysis of expression levels, and methods for data preprocessing are covered.
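In studies looking for differentially regulated proteins, testing each protein yields hundreds or thousands of p-values, so some control of multiplicity is needed. One widely used option, not necessarily the one recommended in the chapter, is the Benjamini-Hochberg procedure for controlling the false discovery rate, sketched below in Python.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * q / m ...
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # ... and reject all hypotheses with the k smallest p-values.
    return sorted(order[:k_max])
```

The step-up rule rejects every hypothesis up to the largest qualifying rank, even if some smaller p-values individually miss their own thresholds.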
Patterns of ureteral motion: Data compression and statistics
Images of ureteral peristalsis (ureteral kinetography) have been recorded at Tuebingen University Hospital since 1978. These images give a synoptic picture of ureteral motion in highly compressed form. Possibilities of data compression are discussed on the basis of functional path-time images, the ROI series, the in the path-time matrix, and the background subtraction. Particular attention is paid to problems of ureteral activity statistics.
Statistical Approaches to Assess Biosimilarity from Analytical Data.
Burdick, Richard; Coffey, Todd; Gutka, Hiten; Gratzl, Gyöngyi; Conlon, Hugh D; Huang, Chi-Ting; Boyne, Michael; Kuehne, Henriette
2017-01-01
Protein therapeutics have unique critical quality attributes (CQAs) that define their purity, potency, and safety. The analytical methods used to assess CQAs must be able to distinguish clinically meaningful differences in comparator products, and the most important CQAs should be evaluated with the most statistical rigor. High-risk CQA measurements assess the most important attributes that directly impact the clinical mechanism of action or have known implications for safety, while the moderate- to low-risk characteristics may have a lower direct impact and thereby may have a broader range to establish similarity. Statistical equivalence testing is applied for high-risk CQA measurements to establish the degree of similarity (e.g., highly similar fingerprint, highly similar, or similar) of selected attributes. Notably, some high-risk CQAs (e.g., primary sequence or disulfide bonding) are qualitative (e.g., the same as the originator or not the same) and therefore not amenable to equivalence testing. For biosimilars, an important step is the acquisition of a sufficient number of unique originator drug product lots to measure the variability in the originator drug manufacturing process and provide sufficient statistical power for the analytical data comparisons. Together, these analytical evaluations, along with PK/PD and safety data (immunogenicity), provide the data necessary to determine if the totality of the evidence warrants a designation of biosimilarity and subsequent licensure for marketing in the USA. In this paper, a case study approach is used to provide examples of analytical similarity exercises and the appropriateness of statistical approaches for the example data.
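Equivalence testing for a quantitative high-risk CQA can be sketched as a two one-sided tests (TOST) procedure, which at level α is equivalent to checking that the (1 - 2α) confidence interval for the mean difference lies within ±margin. In the Python sketch below, the margin of 1.5 times the reference-lot standard deviation is an assumption (a value often cited in regulatory discussions of analytical similarity, but not stated in the abstract), the Welch-Satterthwaite degrees of freedom are one of several reasonable choices, and the lot values are hypothetical.

```python
import statistics

from scipy import stats


def equivalence_test(test_lots, reference_lots, margin_factor=1.5, alpha=0.05):
    """TOST-style equivalence check on the mean difference (test - reference).

    Equivalence is concluded when the (1 - 2*alpha) confidence interval for
    the difference lies strictly inside +/- margin, where
    margin = margin_factor * (sample SD of the reference lots).
    Returns (equivalent, (lower, upper), margin).
    """
    n_t, n_r = len(test_lots), len(reference_lots)
    var_t = statistics.variance(test_lots)
    var_r = statistics.variance(reference_lots)
    margin = margin_factor * var_r ** 0.5
    # Welch standard error and Welch-Satterthwaite degrees of freedom
    a, b = var_t / n_t, var_r / n_r
    se = (a + b) ** 0.5
    df = (a + b) ** 2 / (a ** 2 / (n_t - 1) + b ** 2 / (n_r - 1))
    t_crit = stats.t.ppf(1 - alpha, df)
    diff = statistics.fmean(test_lots) - statistics.fmean(reference_lots)
    lower, upper = diff - t_crit * se, diff + t_crit * se
    return bool(-margin < lower and upper < margin), (lower, upper), margin


# Hypothetical potency values (% of label claim) for ten lots of each product
reference = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100]
similar = [v + 0.2 for v in reference]
shifted = [v + 5 for v in reference]
print(equivalence_test(similar, reference)[0], equivalence_test(shifted, reference)[0])
```

With these made-up lots the call prints `True False`: a 0.2-unit shift stays inside the margin of about 2.6 units, a 5-unit shift does not. The number of reference lots directly drives both the margin estimate and the power, which is why acquiring enough originator lots matters.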
Statistical Challenges of Big Data Analysis in Medicine
Vol. 3, No. 1 (2015), pp. 24-27. ISSN 1805-8698. R&D projects: GA ČR GA13-23940S. Other grants: CESNET Development Fund (CZ) 494/2013. Institutional support: RVO:67985807. Keywords: big data; variable selection; classification; cluster analysis. Subject (RIV): BB - Applied Statistics, Operational Research. http://www.ijbh.org/ijbh2015-1.pdf
Maximum Likelihood, Consistency and Data Envelopment Analysis: A Statistical Foundation
Rajiv D. Banker
1993-01-01
This paper provides a formal statistical basis for the efficiency evaluation techniques of data envelopment analysis (DEA). DEA estimators of the best practice monotone increasing and concave production function are shown to be also maximum likelihood estimators if the deviation of actual output from the efficient output is regarded as a stochastic variable with a monotone decreasing probability density function. While the best practice frontier estimator is biased below the theoretical front...
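The DEA estimator referred to here can be computed with a small linear program. The Python sketch below assumes one input and one output and variable returns to scale, and uses `scipy.optimize.linprog` with the HiGHS solver: the frontier at input level x0 is the largest convex combination of observed outputs whose combined input does not exceed x0, which yields a monotone increasing, concave, piecewise-linear envelope of the data. It illustrates the estimator only; Banker's maximum-likelihood and consistency arguments are not reproduced, and the function names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def dea_frontier(x, y, x0):
    """Estimated best-practice output at input level x0 (single input,
    single output, variable returns to scale)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    res = linprog(
        c=-y,                                # maximise sum(lambda * y)
        A_ub=x.reshape(1, n), b_ub=[x0],     # sum(lambda * x) <= x0
        A_eq=np.ones((1, n)), b_eq=[1.0],    # convexity: sum(lambda) = 1
        bounds=[(0, None)] * n,
        method="highs",
    )
    if not res.success:
        raise ValueError("x0 is below every observed input; frontier undefined")
    return -res.fun


def output_efficiency(x, y):
    """Observed output divided by frontier output (1.0 = on the frontier)."""
    return [yi / dea_frontier(x, y, xi) for xi, yi in zip(x, y)]


# Hypothetical units: inputs and outputs
inputs, outputs = [1, 2, 3, 2], [1, 3, 4, 2]
print(output_efficiency(inputs, outputs))
```

The gap between a unit's output and the frontier is the deviation that, in Banker's formulation, is treated as a stochastic variable.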
Analysis of spectral data with rare events statistics
The case is considered of analyzing experimental data when the results of individual experimental runs cannot be summed due to large systematic errors. A statistical analysis of the hypothesis of persistent peaks in the spectra has been performed by means of the Neyman-Pearson test. The computations demonstrate that the confidence level for the hypothesis of a persistent peak in the spectrum is proportional to the square root of the number of independent experimental runs, K.
Statistical analysis of proteomics, metabolomics, and lipidomics data using mass spectrometry
Mertens, Bart
2017-01-01
This book presents an overview of computational and statistical design and analysis of mass spectrometry-based proteomics, metabolomics, and lipidomics data. This contributed volume provides an introduction to the special aspects of statistical design and analysis with mass spectrometry data for the new omic sciences. The text discusses common aspects of design and analysis between and across all (or most) forms of mass spectrometry, while also providing special examples of application with the most common forms of mass spectrometry. Also covered are applications of computational mass spectrometry not only in clinical study but also in the interpretation of omics data in plant biology studies. Omics research fields are expected to revolutionize biomolecular research by the ability to simultaneously profile many compounds within either patient blood, urine, tissue, or other biological samples. Mass spectrometry is one of the key analytical techniques used in these new omic sciences. Liquid chromatography mass ...
Markert, K. N.; Ashmall, W.; Johnson, G.; Saah, D. S.; Anderson, E.; Flores Cordova, A. I.; Díaz, A. S. P.; Mollicone, D.; Griffin, R.
2017-12-01
Collect Earth Online (CEO) is a free and open online implementation of the FAO Collect Earth system for collaboratively collecting environmental data through the visual interpretation of Earth observation imagery. The primary collection mechanism in CEO is human interpretation of land surface characteristics in imagery served via Web Map Services (WMS). However, interpreters may not have enough contextual information to classify samples by only viewing the imagery served via WMS, be they high resolution or otherwise. To assist in the interpretation and collection processes in CEO, SERVIR, a joint NASA-USAID initiative that brings Earth observations to improve environmental decision making in developing countries, developed the GeoDash system, an embedded and critical component of CEO. GeoDash leverages Google Earth Engine (GEE) by allowing users to set up custom browser-based widgets that pull from GEE's massive public data catalog. These widgets can be quick looks of other satellite imagery, time series graphs of environmental variables, and statistics panels of the same. Users can customize widgets with any of GEE's image collections, such as the historical Landsat collection with data available since the 1970s, select date ranges, image stretch parameters, graph characteristics, and create custom layouts, all on-the-fly to support plot interpretation in CEO. This presentation focuses on the implementation and potential applications, including the back-end links to GEE and the user interface with custom widget building. GeoDash takes large data volumes and condenses them into meaningful, relevant information for interpreters. While designed initially with national and global forest resource assessments in mind, the system will complement disaster assessments, agriculture management, project monitoring and evaluation, and more.
Markert, Kel; Ashmall, William; Johnson, Gary; Saah, David; Mollicone, Danilo; Diaz, Alfonso Sanchez-Paus; Anderson, Eric; Flores, Africa; Griffin, Robert
2017-01-01
Model-independent plot of dynamic PET data facilitates data interpretation and model selection.
Munk, Ole Lajord
2012-02-21
When testing new PET radiotracers or new applications of existing tracers, the blood-tissue exchange and the metabolism need to be examined. However, conventional plots of measured time-activity curves from dynamic PET do not reveal the inherent kinetic information. A novel model-independent volume-influx plot (vi-plot) was developed and validated. The new vi-plot shows the time course of the instantaneous distribution volume and the instantaneous influx rate. The vi-plot visualises physiological information that facilitates model selection, and it reveals when a quasi-steady state is reached, which is a prerequisite for the use of the graphical analyses by Logan and Gjedde-Patlak. Both axes of the vi-plot have a direct physiological interpretation, and the plot shows kinetic parameters in close agreement with estimates obtained by non-linear kinetic modelling. The vi-plot is equally useful for analyses of PET data based on a plasma input function or a reference region input function. The vi-plot is a model-independent and informative plot for data exploration that facilitates the selection of an appropriate method for data analysis. Copyright © 2011 Elsevier Ltd. All rights reserved.
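The vi-plot formulas are not given in the abstract, so they are not reproduced here. For context, the Gjedde-Patlak analysis it mentions can be sketched in Python: after quasi-steady state, C_T(t)/C_p(t) plotted against ∫₀ᵗ C_p dτ / C_p(t) becomes linear, with slope K_i (net influx rate) and intercept V (apparent distribution volume). The start of the linear phase, `t_start`, must be chosen by the user, which is where a diagnostic plot such as the vi-plot helps; trapezoidal integration and the function names are assumptions.

```python
import numpy as np


def cumulative_trapezoid(values, times):
    """Running trapezoidal integral of values over times, starting at 0."""
    increments = 0.5 * (values[1:] + values[:-1]) * np.diff(times)
    return np.concatenate(([0.0], np.cumsum(increments)))


def patlak(times, c_plasma, c_tissue, t_start):
    """Gjedde-Patlak fit using samples with time >= t_start.

    Returns (Ki, V): the slope (net influx rate) and the intercept
    (apparent distribution volume) of the Patlak plot.
    """
    times, c_plasma, c_tissue = (
        np.asarray(a, dtype=float) for a in (times, c_plasma, c_tissue)
    )
    x = cumulative_trapezoid(c_plasma, times) / c_plasma  # "normalised time"
    y = c_tissue / c_plasma
    late = times >= t_start
    slope, intercept = np.polyfit(x[late], y[late], 1)
    return slope, intercept
```

Choosing `t_start` before quasi-steady state is reached would bend the early part of the plot and bias both estimates.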
SAS and R data management, statistical analysis, and graphics
Kleinman, Ken
2009-01-01
An All-in-One Resource for Using SAS and R to Carry out Common Tasks. Provides a path between languages that is easier than reading complete documentation. SAS and R: Data Management, Statistical Analysis, and Graphics presents an easy way to learn how to perform an analytical task in both SAS and R, without having to navigate through the extensive, idiosyncratic, and sometimes unwieldy software documentation. The book covers many common tasks, such as data management, descriptive summaries, inferential procedures, regression analysis, and the creation of graphics, along with more complex applicat
Using R for Data Management, Statistical Analysis, and Graphics
Horton, Nicholas J
2010-01-01
This title offers quick and easy access to key elements of documentation. It includes worked examples across a wide variety of applications, tasks, and graphics. "Using R for Data Management, Statistical Analysis, and Graphics" presents an easy way to learn how to perform an analytical task in R, without having to navigate through the extensive, idiosyncratic, and sometimes unwieldy software documentation and vast number of add-on packages. Organized by short, clear descriptive entries, the book covers many common tasks, such as data management, descriptive summaries, inferential proc
Accidents in Malaysian construction industry: statistical data and court cases.
Chong, Heap Yih; Low, Thuan Siang
2014-01-01
Safety and health issues remain critical to the construction industry due to its working environment and the complexity of working practices. This research adopts two research approaches, using statistical data and court cases, to identify the causes and behavior underlying construction safety and health issues in Malaysia. Factual data for the period 2000-2009 were retrieved to identify the causes and agents that contributed to health issues. Moreover, court cases were tabulated and analyzed to identify legal patterns of parties involved in construction site accidents. The two approaches produced consistent results and highlighted a significant reduction in the rate of accidents per construction project in Malaysia.
Diagnostic Interpretation of Array Data Using Public Databases and Internet Sources
de Leeuw, Nicole; Dijkhuizen, Trijnie; Hehir-Kwa, Jayne Y.; Carter, Nigel P.; Feuk, Lars; Firth, Helen V.; Kuhn, Robert M.; Ledbetter, David H.; Martin, Christa Lese; van Ravenswaaij-Arts, Conny M. A.; Scherer, Steven W.; Shams, Soheil; Van Vooren, Steven; Sijmons, Rolf; Swertz, Morris; Hastings, Ros
The range of commercially available array platforms and analysis software packages is expanding and their utility is improving, making reliable detection of copy-number variants (CNVs) relatively straightforward. Reliable interpretation of CNV data, however, is often difficult and requires
Adobe Illustrator drawing showing geophysical and topographical survey data and interpretations
Wallace, Lacey; Ferraby, Rose
2016-01-01
Adobe Illustrator drawing at 1:2000 that shows the rasters and interpretations of the geophysics, the topographical contours, and the survey areas, with British National Grid coordinates and Ordnance Survey Master Map data included.
The emphasis of the mission was the provision of training to the staff of the Department of Agriculture, Government of Thailand, in the analysis and interpretation of data from experiments concerning fertilizer applications in agriculture.
A Statistical Toolbox For Mining And Modeling Spatial Data
Most data mining projects in spatial economics start with an evaluation of a set of attribute variables on a sample of spatial entities, looking for the existence and strength of spatial autocorrelation based on Moran's and Geary's coefficients, the adequacy of which is rarely challenged, even though many users, when reporting on their properties, seem likely to make mistakes and to foster confusion. My paper begins with a critical appraisal of the classical definition and rationale of these indices. I argue that, while intuitively founded, they are plagued by an inconsistency in their conception. Then, I propose a principled small change leading to corrected spatial autocorrelation coefficients, which strongly simplifies their relationship and opens the way to an augmented toolbox of statistical methods of dimension reduction and data visualization, also useful for modeling purposes. A second section presents a formal framework, adapted from recent work in statistical learning, which gives theoretical support to our definition of corrected spatial autocorrelation coefficients. More specifically, the multivariate data mining methods presented here are easily implementable on existing (free) software and yield methods useful for exploiting the proposed corrections in spatial data analysis practice; from a mathematical point of view, their asymptotic behavior, already studied in a series of papers by Belkin & Niyogi, suggests that they possess qualities of robustness and a limited sensitivity to the Modifiable Areal Unit Problem (MAUP), valuable in exploratory spatial data analysis.
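For reference, the classical coefficients that the paper critiques can be computed directly from a spatial weight matrix W = (w_ij) with weight total S₀ = Σᵢⱼ wᵢⱼ: Moran's I = (n / S₀) Σᵢⱼ wᵢⱼ (xᵢ - x̄)(xⱼ - x̄) / Σᵢ (xᵢ - x̄)², and Geary's C = (n - 1) Σᵢⱼ wᵢⱼ (xᵢ - xⱼ)² / (2 S₀ Σᵢ (xᵢ - x̄)²). The Python sketch below implements these textbook definitions only; the author's corrected coefficients are not reproduced, and the four-unit example is hypothetical.

```python
def morans_i(values, weights):
    """Classical Moran's I for n spatial units and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def gearys_c(values, weights):
    """Classical Geary's C for n spatial units and an n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    den = sum((v - mean) ** 2 for v in values)
    return (n - 1) * num / (2 * s0 * den)


# Four units on a line, each a neighbour of the next (binary contiguity)
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1, 2, 3, 4], W), gearys_c([1, 2, 3, 4], W))
```

On this small example, a smooth trend gives I = 1/3 and C = 0.3 (positive autocorrelation), whereas an alternating pattern such as [1, -1, 1, -1] gives I = -1 and C = 1.5. The two coefficients do not map onto each other by a simple formula, which is part of the relationship the paper seeks to simplify.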
Solar radiation data - statistical analysis and simulation models
The activities consisted of collecting meteorological data on magnetic tape for ten European locations (with latitudes ranging from 42° to 56° N), analysing the multi-year sequences, developing mathematical models to generate synthetic sequences having the same statistical properties as the original data sets, and producing one or more Short Reference Years (SRYs) for each location. The meteorological parameters examined were (for all locations) global and diffuse radiation on a horizontal surface, dry bulb temperature, and sunshine duration. For some locations additional parameters were available, namely global, beam and diffuse radiation on surfaces other than horizontal, wet bulb temperature, wind velocity, cloud type, and cloud cover. The statistical properties investigated were the mean, variance, autocorrelation, cross-correlation with selected parameters, and probability density function. For all meteorological parameters, various mathematical models were built: linear regression and stochastic models of the AR and DAR types. In each case, the model with the best statistical behaviour was selected for the production of an SRY for the relevant parameter/location.
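The idea of generating synthetic sequences that share the mean, variance and autocorrelation of an observed record can be sketched with the simplest AR model, AR(1). This is an illustrative assumption: the project's actual model orders and its DAR models are not shown. The "observed" series here is itself simulated and hypothetical.

```python
import math
import random


def fit_ar1(series):
    """Estimate mean, standard deviation and lag-1 autocorrelation."""
    n = len(series)
    mu = sum(series) / n
    dev = [x - mu for x in series]
    var = sum(d * d for d in dev) / n
    phi = sum(dev[t] * dev[t - 1] for t in range(1, n)) / (n * var)
    return mu, math.sqrt(var), phi


def simulate_ar1(mu, sigma, phi, length, rng):
    """Generate a synthetic AR(1) sequence with the given mean, sd and phi."""
    innov_sd = sigma * math.sqrt(1.0 - phi * phi)  # keeps the marginal sd at sigma
    x = rng.gauss(mu, sigma)                       # start in the stationary distribution
    out = [x]
    for _ in range(length - 1):
        x = mu + phi * (x - mu) + rng.gauss(0.0, innov_sd)
        out.append(x)
    return out


# Hypothetical "observed" daily series, then a synthetic one with matched statistics.
rng = random.Random(1)
observed = simulate_ar1(15.0, 5.0, 0.7, 20000, rng)
params = fit_ar1(observed)
synthetic = simulate_ar1(*params, 20000, random.Random(2))
print(params, fit_ar1(synthetic))
```

The innovation variance is scaled by 1 - phi² so that the synthetic series keeps the observed marginal variance, not just the observed autocorrelation.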
Sources of Safety Data and Statistical Strategies for Design and Analysis: Clinical Trials.
Zink, Richard C; Marchenko, Olga; Sanchez-Kam, Matilde; Ma, Haijun; Jiang, Qi
2018-03-01
There has been an increased emphasis on the proactive and comprehensive evaluation of safety endpoints to ensure patient well-being throughout the medical product life cycle. In fact, depending on the severity of the underlying disease, it is important to plan for a comprehensive safety evaluation at the start of any development program. Statisticians should be intimately involved in this process and contribute their expertise to study design, safety data collection, analysis, reporting (including data visualization), and interpretation. In this manuscript, we review the challenges associated with the analysis of safety endpoints and describe the safety data that are available to influence the design and analysis of premarket clinical trials. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from clinical trials compared to other sources. Clinical trials are an important source of safety data that contribute to the totality of safety information available to generate evidence for regulators, sponsors, payers, physicians, and patients. This work is a result of the efforts of the American Statistical Association Biopharmaceutical Section Safety Working Group.
Statistical distributions as applied to environmental surveillance data
Application of normal, lognormal, and Weibull distributions to radiological environmental surveillance data was investigated for approximately 300 nuclide-medium-year-location combinations. The fit of data to distributions was compared through probability plotting (special graph paper provides a visual check) and W test calculations. Results show that 25% of the data fit the normal distribution, 50% fit the lognormal, and 90% fit the Weibull. A demonstration of how to plot each distribution shows that the normal and lognormal distributions are comparatively easy to use, while the Weibull distribution is complicated and difficult to use. Although current practice is to use normal distribution statistics, the normal distribution fit the fewest data groups considered in this study.
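The probability-paper comparison can be reproduced numerically by correlating sorted data with each distribution's plotting axis. A straight probability plot gives a correlation near 1. Note that this probability-plot correlation coefficient is a swapped-in technique, related to but not the same as the W test used in the study. The Hazen plotting positions and the lognormal sample are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def probability_plot_fits(data):
    """Correlation of each probability plot; closer to 1 means straighter."""
    x = sorted(data)                           # data must be strictly positive
    n = len(x)
    p = [(i + 0.5) / n for i in range(n)]      # Hazen plotting positions
    z = [NormalDist().inv_cdf(pi) for pi in p]
    log_x = [math.log(v) for v in x]
    weibull_axis = [math.log(-math.log(1.0 - pi)) for pi in p]
    return {
        "normal": pearson(x, z),                   # x vs normal quantiles
        "lognormal": pearson(log_x, z),            # ln x vs normal quantiles
        "weibull": pearson(log_x, weibull_axis),   # ln x vs ln(-ln(1 - p))
    }


# Hypothetical skewed concentrations drawn from a lognormal distribution.
rng = random.Random(0)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
print(probability_plot_fits(sample))
```

The Weibull axis uses ln(-ln(1 - p)) against ln x, which is the same linearization that Weibull probability paper performs graphically.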
Outpatient health care statistics data warehouse--implementation.
Zilli, D
1999-01-01
Data warehouse implementation is assumed to be a very knowledge-demanding, expensive, and long-lasting process. As such, it requires senior management sponsorship, the involvement of experts, a big budget, and probably years of development time. The Outpatient Health Care Statistics Data Warehouse implementation research presented here provides ample evidence against the infallibility of these assumptions. New, inexpensive, but powerful technology, which provides an outstanding platform for On-Line Analytical Processing (OLAP), has emerged recently. Presumably, it will be the basis for the estimated future growth of the data warehouse market, both in the medical and in other business fields. Methods and tools for building, maintaining, and exploiting data warehouses are also briefly discussed in the paper.
Explorations in statistics: the analysis of ratios and normalized data.
Curran-Everett, Douglas
2013-09-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of Explorations in Statistics explores the analysis of ratios and normalized (or standardized) data. As researchers, we compute a ratio (a numerator divided by a denominator) to compute a proportion for some biological response or to derive some standardized variable. In each situation, we want to control for differences in the denominator when the thing we really care about is the numerator. But there is peril lurking in a ratio: only if the relationship between numerator and denominator is a straight line through the origin will the ratio be meaningful. If not, the ratio will misrepresent the true relationship between numerator and denominator. In contrast, regression techniques (these include analysis of covariance) are versatile: they can accommodate an analysis of the relationship between numerator and denominator when a ratio is useless.
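The warning about ratios can be checked with a tiny worked example. When the numerator follows a straight line that misses the origin, the ratio drifts with the denominator even though the per-unit relationship (the slope) is constant. Regression recovers both the intercept and the slope. The numbers below are hypothetical and noise-free, and ANCOVA itself is not shown.

```python
def ols(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: numerator = 2 + 0.5 * denominator (a line that misses the origin).
denominator = [10, 20, 30, 40]
numerator = [2 + 0.5 * d for d in denominator]

ratios = [y / x for x, y in zip(denominator, numerator)]
intercept, slope = ols(denominator, numerator)
print(ratios)            # falls from 0.7 to 0.55 although the slope never changes
print(intercept, slope)  # 2.0 0.5
```

If the intercept were zero, every ratio would equal the slope (0.5), which is exactly the straight-line-through-the-origin condition stated above.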
Common misconceptions about data analysis and statistics
Motulsky, Harvey J
2015-01-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word “significant”. (4) Overreliance on standard errors, which are often misunderstood. PMID:25692012
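The first mistake, reanalyzing "with additional replicates, until you get the result you want," can be demonstrated by simulation. The null hypothesis is true throughout, yet repeatedly adding samples and retesting inflates the false-positive rate well above the nominal 5%. This is a minimal sketch assuming a one-sample z test with known sigma, a starting n of 10, and batches of 5 up to n = 50. These settings are hypothetical choices.

```python
import random
from statistics import NormalDist


def z_test_p(sample, sigma=1.0):
    """Two-sided p value for H0: mean = 0 with known sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def false_positive_rate(optional_stopping, n_sims=4000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        data = [rng.gauss(0.0, 1.0) for _ in range(10)]   # H0 is true
        p = z_test_p(data)
        # "Add replicates until significant": 5 more at a time, up to n = 50.
        while optional_stopping and p >= 0.05 and len(data) < 50:
            data.extend(rng.gauss(0.0, 1.0) for _ in range(5))
            p = z_test_p(data)
        hits += p < 0.05
    return hits / n_sims


print(false_positive_rate(False), false_positive_rate(True))
```

The fixed-n analysis stays near 0.05. The optional-stopping analysis gives the data up to nine chances to cross the threshold, so its error rate is markedly higher.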
Statistical analysis of field data for aircraft warranties
Air Force and Navy maintenance data collection systems were researched to determine their scientific applicability to the warranty process. New and unique algorithms were developed to extract failure distributions, which were then used to characterize how selected families of equipment typically fail. Families of similar equipment were identified in terms of function, technology, and failure patterns. Statistical analyses and applications such as goodness-of-fit tests, maximum likelihood estimation, and derivation of confidence intervals for the probability density function parameters were applied to characterize the distributions and their failure patterns. Statistical and reliability theory relevant to equipment design and operational failures were also determining factors in characterizing the failure patterns of the equipment families. Inferences about the families relevant to warranty needs were then made.
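As an illustration of the maximum likelihood step, here is a sketch that fits a two-parameter Weibull to complete (uncensored) failure times. It solves the standard profile equation for the shape by bisection. The abstract does not say which distributions or algorithms were used, so the Weibull choice, the shape bracket [0.01, 50], and the simulated failure times are assumptions. Confidence intervals are not shown.

```python
import math
import random


def weibull_mle(times, iters=60):
    """Maximum likelihood shape k and scale for complete (uncensored) failure times."""
    m = max(times)
    y = [t / m for t in times]            # rescale to (0, 1] so y**k cannot overflow
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Profile score in k; increasing in k, zero at the MLE.
        w = [v ** k for v in y]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 0.01, 50.0                   # assumed bracket for the shape parameter
    for _ in range(iters):                # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = m * (sum(v ** k for v in y) / len(y)) ** (1.0 / k)
    return k, scale


# Hypothetical failure times (hours) from a Weibull with shape 1.5, scale 100.
rng = random.Random(42)
times = [rng.weibullvariate(100.0, 1.5) for _ in range(5000)]
print(weibull_mle(times))
```

Rescaling by the largest time leaves the shape equation unchanged and keeps every power at or below 1, which avoids overflow for large shapes.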
Statistical mechanics of complex neural systems and high dimensional data
Recent experimental advances in neuroscience have opened new vistas into the immense complexity of neuronal networks. This proliferation of data challenges us on two parallel fronts. First, how can we form adequate theoretical frameworks for understanding how dynamical network processes cooperate across widely disparate spatiotemporal scales to solve important computational problems? Second, how can we extract meaningful models of neuronal systems from high-dimensional datasets? To aid in these challenges, we give a pedagogical review of a collection of ideas and theoretical methods arising at the intersection of statistical physics, computer science and neurobiology. We introduce the interrelated replica and cavity methods, which originated in statistical physics as powerful ways to quantitatively analyze large, highly heterogeneous systems of many interacting degrees of freedom. We also introduce the closely related notion of message passing in graphical models, which originated in computer science as a distributed algorithm capable of solving large inference and optimization problems involving many coupled variables. We then show how both the statistical physics and computer science perspectives can be applied in a wide diversity of contexts to problems arising in theoretical neuroscience and data analysis. Along the way we discuss spin glasses, learning theory, illusions of structure in noise, random matrices, dimensionality reduction and compressed sensing, all within the unified formalism of the replica method. Moreover, we review recent conceptual connections between message passing in graphical models and neural computation and learning. Overall, these ideas illustrate how statistical physics and computer science might provide a lens through which we can uncover emergent computational functions buried deep within the dynamical complexities of neuronal networks.
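To make "message passing in graphical models" concrete, here is a minimal sum-product sketch on a chain of discrete variables. On a chain (a tree), the forward and backward messages give exact marginals. A brute-force enumeration confirms this. The potentials are hypothetical, and neither the replica nor the cavity method is shown.

```python
from itertools import product


def chain_marginals(unary, pair):
    """Sum-product on a chain: unary[i][s] potentials, shared pair[a][b] coupling."""
    n, k = len(unary), len(unary[0])
    fwd = [[1.0] * k for _ in range(n)]            # message into node i from the left
    for i in range(1, n):
        for b in range(k):
            fwd[i][b] = sum(fwd[i - 1][a] * unary[i - 1][a] * pair[a][b] for a in range(k))
    bwd = [[1.0] * k for _ in range(n)]            # message into node i from the right
    for i in range(n - 2, -1, -1):
        for a in range(k):
            bwd[i][a] = sum(pair[a][b] * unary[i + 1][b] * bwd[i + 1][b] for b in range(k))
    marginals = []
    for i in range(n):
        m = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(k)]
        z = sum(m)
        marginals.append([v / z for v in m])
    return marginals


def brute_force_marginals(unary, pair):
    """Exact marginals by enumerating every joint configuration."""
    n, k = len(unary), len(unary[0])
    acc = [[0.0] * k for _ in range(n)]
    for xs in product(range(k), repeat=n):
        w = 1.0
        for i, s in enumerate(xs):
            w *= unary[i][s]
        for i in range(n - 1):
            w *= pair[xs[i]][xs[i + 1]]
        for i, s in enumerate(xs):
            acc[i][s] += w
    return [[v / sum(row) for v in row] for row in acc]


# Hypothetical binary chain with a coupling that favours equal neighbours.
unary = [[1.0, 2.0], [3.0, 1.0], [1.0, 1.0], [2.0, 5.0]]
pair = [[2.0, 1.0], [1.0, 2.0]]
print(chain_marginals(unary, pair))
```

Each node only combines messages from its immediate neighbours, which is what makes the algorithm distributed. Enumeration, by contrast, grows exponentially with chain length.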
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
Statistical analysis of hydrologic data for Yucca Mountain
The geologic formations in the unsaturated zone at Yucca Mountain are currently being studied as the host rock for a potential radioactive waste repository. Data from several drill holes have been collected to provide the preliminary information needed for planning site characterization for the Yucca Mountain Project. Hydrologic properties have been measured on the core samples, and the variables analyzed here are thought to be important in the determination of groundwater travel times. This report presents a statistical analysis of four hydrologic variables: saturated-matrix hydraulic conductivity, maximum moisture content, suction head, and calculated groundwater travel time. It is important for modelers to have as much information about the distribution of values of these variables as can be obtained from the data. The approach taken in this investigation is to (1) identify regions at the Yucca Mountain site that, according to the data, are distinctly different; (2) estimate the means and variances within these regions; (3) examine the relationships among the variables; and (4) investigate alternative statistical methods that might be applicable when more data become available. The five different functional stratigraphic units at three different locations are compared and grouped into relatively homogeneous regions. Within these regions, the expected values and variances associated with core samples of different sizes are estimated. The results provide a rough estimate of the distribution of hydrologic variables for small core sections within each region.
Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data
Hu, Ming; Deng, Ke; Qin, Zhaohui; Liu, Jun S.
2015-01-01
Understanding how chromosomes fold provides insights into the transcription regulation, hence, the functional state of the cell. Using the next generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of biases, noises and uncertainties buried in the protocol of Hi-C experiments, analyzing and interpreting Hi-C data poses great challenges, and requires novel statistical methods to be developed. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in statistical analysis of Hi-C data, and discusses some perspectives for future research. PMID:26124977
Metaviz: interactive statistical and visual analysis of metagenomic data.
Wagner, Justin; Chelaru, Florin; Kancherla, Jayaram; Paulson, Joseph N; Zhang, Alexander; Felix, Victor; Mahurkar, Anup; Elmqvist, Niklas; Corrada Bravo, Héctor
2018-04-06
Large studies profiling microbial communities and their association with healthy or disease phenotypes are now commonplace. Processed data from many of these studies are publicly available, but significant effort is required for users to effectively organize, explore and integrate them, limiting the utility of these rich data resources. Effective integrative and interactive visual and statistical tools to analyze many metagenomic samples can greatly increase the value of these data for researchers. We present Metaviz, a tool for interactive exploratory data analysis of annotated microbiome taxonomic community profiles derived from marker gene or whole metagenome shotgun sequencing. Metaviz is uniquely designed to address the challenge of browsing the hierarchical structure of metagenomic data features while rendering visualizations of data values that are dynamically updated in response to user navigation. We use Metaviz to provide the UMD Metagenome Browser web service, allowing users to browse and explore data for more than 7000 microbiomes from published studies. Users can also deploy Metaviz as a web service, or use it to analyze data through the metavizr package to interoperate with state-of-the-art analysis tools available through Bioconductor. Metaviz is free and open source with the code, documentation and tutorials publicly accessible.
Statistical methods in regression and calibration analysis of chromosome aberration data
The method of iteratively reweighted least squares for the regression analysis of Poisson-distributed chromosome aberration data is reviewed in the context of other fit procedures used in the cytogenetic literature. As an application of the resulting regression curves, methods for calculating confidence intervals on dose from aberration yield are described and compared, and a confidence interval is given for the linear-quadratic model. Emphasis is placed on the rational interpretation and the limitations of various methods from a statistical point of view.
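Here is a sketch of iteratively reweighted least squares for the linear-quadratic yield model. The model assumed is: aberration count ~ Poisson with mean = cells × (c + alpha·D + beta·D²). With an identity link, each iteration is a weighted least-squares solve with weights 1/mu, because the Poisson variance equals the mean. The parameter names, starting values, and the noise-free calibration data are hypothetical. The dose confidence intervals discussed in the paper are not shown.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def irls_linear_quadratic(doses, cells, counts, iters=25):
    """Fit E[count] = cells * (c + alpha*D + beta*D^2) for Poisson counts."""
    X = [[n, n * d, n * d * d] for d, n in zip(doses, cells)]
    theta = [sum(counts) / sum(cells), 0.0, 0.0]       # start from a constant yield
    for _ in range(iters):
        mu = [max(sum(x * t for x, t in zip(row, theta)), 1e-8) for row in X]
        w = [1.0 / m for m in mu]                      # Poisson: variance = mean
        xtwx = [[sum(w[i] * X[i][a] * X[i][b] for i in range(len(X))) for b in range(3)]
                for a in range(3)]
        xtwy = [sum(w[i] * X[i][a] * counts[i] for i in range(len(X))) for a in range(3)]
        theta = solve(xtwx, xtwy)                      # identity link: working response = counts
    return theta


# Hypothetical calibration: 1000 cells scored per dose (Gy), noise-free counts.
doses = [0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
cells = [1000.0] * len(doses)
true = (0.001, 0.02, 0.06)
counts = [n * (true[0] + true[1] * d + true[2] * d * d) for d, n in zip(doses, cells)]
print(irls_linear_quadratic(doses, cells, counts))
```

With real integer counts, the weights matter: high-dose points with large variances are down-weighted relative to ordinary least squares.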
Statistical Approaches Accommodating Uncertainty in Modern Genomic Data
the contributed method applicable to case-control studies as well as mapping of quantitative traits. The contributed method provides a needed association test for quantitative traits in the presence of uncertain genotypes, and it further allows correction for population structure in association tests for disease ... the potential of the technological advances. The first of the four papers included in this thesis describes a new method for association mapping that accommodates uncertain genotypes from low-coverage re-sequencing data. The method allows for uncertain genotypes using a score statistic based on the joint likelihood ... of the observed phenotypes and the observed sequencing data. This joint likelihood accounts for the genotype uncertainties via the posterior probabilities of each genotype given the observed sequencing data, and the phenotype distributions are modelled using a generalised linear model framework which makes ...
Statistical analysis of environmental dose data for Trombay environment
The microprocessor-based environmental dose logging system has been functioning at six stations at Trombay for the past couple of years. The site emergency control centre (SECC) at the modular laboratory receives telemetered data every five minutes from the main guard house (South Site), Bhabha point (top of the hill), the Cirus reactor, the Mod Lab terrace, Hall No. 7, and the Training School Hostel. The collected data are stored in dBASE III+ format for easy processing on a PC. Various statistical parameters and distributions of environmental gamma dose are determined from the hourly dose data. On the basis of the reactor operation status, an attempt has been made to separate the natural background from the gamma dose contribution of the operating research reactors at each of these monitoring stations. Similar investigations are being carried out for the Tarapur environment. 2 refs., 3 tabs., 2 figs.
Summary Statistics for Homemade 'Play Dough': Data Acquired at LLNL
Using x-ray computerized tomography (CT), we have characterized the x-ray linear attenuation coefficients (LAC) of a homemade Play Dough™-like material, designated as PDA. Table 1 gives the first-order statistics for each of four CT measurements, estimated with a Gaussian kernel density estimator (KDE) analysis. The mean values of the LAC range from a high of about 2700 LMHU_D at 100 kVp to a low of about 1200 LMHU_D at 300 kVp. The standard deviation of each measurement is around 10% to 15% of the mean. The entropy covers the range from 6.0 to 7.4. Ordinarily, we would model the LAC of the material and compare the modeled values to the measured values. In this case, however, we did not have the detailed chemical composition of the material and therefore did not model the LAC. Using a method recently proposed by Lawrence Livermore National Laboratory (LLNL), we estimate the value of the effective atomic number, Z_eff, to be near 10. LLNL prepared about 50 mL of the homemade 'Play Dough' in a polypropylene vial and firmly compressed it immediately prior to the x-ray measurements. We used the computer program IMGREC to reconstruct the CT images. The values of the key parameters used in the data capture and image reconstruction are given in this report. Additional details may be found in the experimental SOP and a separate document. To characterize the statistical distribution of LAC values in each CT image, we first isolated an 80% central-core segment of volume elements ('voxels') lying completely within the specimen, away from the walls of the polypropylene vial. All of the voxels within this central core, including those composed of voids and inclusions, are included in the statistics. We then calculated the mean value, standard deviation and entropy for (a) the four image segments and for (b) their digital gradient images. (A digital gradient image of a given image was obtained by taking the absolute value of the difference
In stochastic damages, the numbers of events (e.g., the persons who are affected by or have died of cancer), and thus the relative frequencies (incidence or mortality), are binomially distributed random variables. Their statistical fluctuations can be characterized by confidence intervals. For epidemiologic questions, especially for the analysis of stochastic damages in the low dose range, the following issues are of interest: - Is a sample (a group of persons) with a definite observed damage frequency part of the whole population? - Is an observed frequency difference between two groups of persons random or statistically significant? - Is an observed increase or decrease of the frequencies with increasing dose random or statistically significant, and how large is the regression coefficient (= risk coefficient) in this case? These problems can be solved by statistical tests. So-called distribution-free tests, and tests that are not bound to the assumption of a normal distribution, are of particular interest, such as: - the χ²-independence test (test in contingency tables); - the Fisher-Yates test; - the trend test according to Cochran; - the rank correlation test given by Spearman. These tests are explained in terms of selected epidemiologic data, e.g. leukaemia clusters, the cancer mortality of the Japanese A-bomb survivors (especially in the low dose range), and the cancer mortality in the high background area in Yangjiang (China).
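The question "is an observed increase of the frequencies with increasing dose random or statistically significant?" is what a trend test answers. Here is a minimal sketch of the Cochran-Armitage form of the test for binomial counts in ordered dose groups. It uses the normal approximation and the common variance without the N/(N-1) finite-sample factor; that choice is an assumption, since some texts include the factor. The dose groups and counts are hypothetical.

```python
import math
from statistics import NormalDist


def cochran_armitage(cases, totals, scores):
    """Trend z statistic and two-sided p value for binomial counts by dose group."""
    n_total = sum(totals)
    p_bar = sum(cases) / n_total                       # pooled event frequency
    t = sum(x * (r - n * p_bar) for r, n, x in zip(cases, totals, scores))
    sx = sum(n * x for n, x in zip(totals, scores))
    sxx = sum(n * x * x for n, x in zip(totals, scores))
    var = p_bar * (1.0 - p_bar) * (sxx - sx * sx / n_total)
    z = t / math.sqrt(var)
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical: 100 persons in each of three dose groups scored 0, 1, 2.
z, p = cochran_armitage(cases=[10, 20, 30], totals=[100, 100, 100], scores=[0, 1, 2])
print(z, p)
```

Here the pooled frequency is 0.2, the statistic T is 20, and its variance is 32, so z = 20/√32 ≈ 3.54. Equal frequencies in every group give z = 0.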
Role of Melt Curve Analysis in Interpretation of Nutrigenomics' MicroRNA Expression Data.
Ahmed, Farid E; Gouda, Mostafa M; Hussein, Laila A; Ahmed, Nancy C; Vos, Paul W; Mohammad, Mahmoud A
2017-01-01
This article illustrates the importance of melt curve analysis (MCA) in interpreting mild nutrigenomic micro(mi)RNA expression data. It does so by measuring the magnitude of expression of key miRNA molecules in the stool of healthy human adults, as molecular markers, following the intake of pomegranate juice (PGJ), functional fermented sobya (FS) rich in potential probiotic lactobacilli, or their combination. Total small RNA was isolated from the stool of 25 volunteers before and after a three-week dietary intervention trial. Expression of 88 miRNA genes was evaluated using Qiagen's 96-well plate RT2 miRNA qPCR arrays. Parallel coordinates plots of the gene expression (Cq) values obtained with the Roche LightCycler® 480 PCR instrument used in this study showed no significant separation, and none of the miRNAs showed statistically significant expression after controlling for the false discovery rate. On the other hand, the melting temperature profiles produced during the PCR amplification run identified seven significant genes (miR-184, miR-203, miR-373, miR-124, miR-96, miR-373 and miR-301a). These separated candidate miRNAs that could function as novel molecular markers relevant to oxidative stress and immunoglobulin function for the intake of polyphenol (PP)-rich PGJ, functional fermented foods rich in lactobacilli (FS), or their combination. We elaborate on these data and present a detailed review of the use of melt curves for analyzing nutrigenomic miRNA expression data that initially appear to show no significant expression but are actually more subtle than this simplistic view suggests; understanding the role of MCA is necessary for a comprehensive understanding of what the expression and MCA data collectively imply. Copyright© 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
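"Controlling for the false discovery rate" across 88 miRNAs is most often done with the Benjamini-Hochberg step-up procedure. The abstract does not name the procedure, so BH is an assumption here, and the p-values below are hypothetical. The sketch returns a reject/keep flag for each test in its original order.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure; True means 'discovery' at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])   # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:                   # compare p_(k) with k*q/m
            k_max = rank                                 # keep the largest passing rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        reject[i] = rank <= k_max                        # reject all tests up to k_max
    return reject


# Hypothetical p values for four miRNAs.
print(benjamini_hochberg([0.04, 0.001, 0.5, 0.02]))  # [False, True, False, True]
```

The step-up rule rejects every hypothesis ranked at or below the largest rank that passes, even if a smaller rank in between narrowly failed its own threshold.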
Inferential Statistics from Black Hispanic Breast Cancer Survival Data
Directory of Open Access Journals (Sweden)
Hafiz M. R. Khan
2014-01-01
In this paper we test statistical probability models for breast cancer survival data by race and ethnicity. Data were collected from breast cancer patients diagnosed in the United States during the years 1973–2009. We selected a stratified random sample of Black Hispanic female patients from the Surveillance, Epidemiology, and End Results (SEER) database to derive the statistical probability models. We used three common model-building criteria, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Deviance Information Criterion (DIC), to assess goodness of fit, and found that the survival data of Black Hispanic female patients better fit the exponentiated exponential probability model. A novel Bayesian method was used to derive the posterior density function for the model parameters as well as the predictive inference for future responses. We specifically focused on the Black Hispanic race. The Markov chain Monte Carlo (MCMC) method was used to obtain summary results for the posterior parameters. Additionally, we reported predictive intervals for future survival times. These findings would be of great significance in treatment planning and healthcare resource allocation.
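The model comparison can be sketched with maximum likelihood and AIC, comparing the exponentiated exponential (EE) model against the plain exponential. The EE density is f(x) = αλ(1 - e^{-λx})^{α-1} e^{-λx}. For fixed λ the shape MLE has the closed form α = -n / Σ ln(1 - e^{-λx}), so only λ needs a one-dimensional search. This is a frequentist stand-in: the paper's Bayesian/MCMC fit and DIC are not reproduced. A unimodal profile likelihood in λ is assumed, and the survival times are simulated and hypothetical.

```python
import math
import random


def ee_profile(lam, xs):
    """Profile log-likelihood of the EE model at rate lam, with the matching alpha."""
    n = len(xs)
    s = sum(math.log(-math.expm1(-lam * x)) for x in xs)   # sum of ln(1 - e^{-lam x})
    alpha = -n / s                                          # closed-form MLE given lam
    ll = n * math.log(alpha) + n * math.log(lam) + (alpha - 1.0) * s - lam * sum(xs)
    return ll, alpha


def fit_ee(xs, iters=100):
    """Golden-section search over ln(lam), assuming a unimodal profile."""
    mean = sum(xs) / len(xs)
    lo, hi = math.log(1e-3 / mean), math.log(1e2 / mean)
    g = (math.sqrt(5.0) - 1.0) / 2.0

    def f(t):
        return ee_profile(math.exp(t), xs)[0]

    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = f(a), f(b)
    for _ in range(iters):
        if fa < fb:                      # maximum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = f(b)
        else:                            # maximum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = f(a)
    lam = math.exp((lo + hi) / 2.0)
    ll, alpha = ee_profile(lam, xs)
    return alpha, lam, ll


def aic(loglik, k):
    return 2 * k - 2 * loglik


# Hypothetical survival times drawn from EE(alpha=3, lam=1) by inverse CDF.
rng = random.Random(7)
times = []
for _ in range(2000):
    u = rng.uniform(1e-12, 1 - 1e-12)
    times.append(-math.log1p(-u ** (1 / 3.0)))

alpha, lam, ll_ee = fit_ee(times)
lam_exp = len(times) / sum(times)                       # exponential MLE
ll_exp = len(times) * math.log(lam_exp) - lam_exp * sum(times)
print(alpha, lam, aic(ll_ee, 2), aic(ll_exp, 1))
```

A lower AIC favours a model after penalizing its extra parameter. BIC would replace the penalty 2k with k·ln n.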
Application of statistical dynamical turbulence closures to data assimilation
We describe the development of an accurate yet computationally tractable statistical dynamical closure theory for general inhomogeneous turbulent flows, coined the quasi-diagonal direct interaction approximation closure (QDIA), and its application to problems in data assimilation. The QDIA provides prognostic equations for evolving mean fields, covariances and higher-order non-Gaussian terms, all of which are also required in the formulation of data assimilation schemes for nonlinear geophysical flows. The QDIA is a generalization of the class of direct interaction approximation theories, initially developed by Kraichnan (1959 J. Fluid Mech. 5 497) for isotropic turbulence, to fully inhomogeneous flows and has been further generalized to allow for both inhomogeneous and non-Gaussian initial conditions and long integrations. A regularization procedure or empirical vertex renormalization that ensures correct inertial range spectra is also described. The aim of this paper is to provide a coherent mathematical description of the QDIA turbulence closure and closure-based data assimilation scheme we have labeled the statistical dynamical Kalman filter. The mathematical formalism presented has been synthesized from recent works of the authors with some additional material and is presented in sufficient detail that the paper is of a pedagogical nature.
A statistically self-consistent type Ia supernova data analysis
Type Ia supernovae are one of the main cosmological probes today and are used as standardized candles in distance measurements. The standardization processes, of which SALT2 and MLCS2k2 are the most widely used, are based on empirical relations and leave room for a residual dispersion in the light curves of the supernovae. This dispersion is introduced into the chi-squared used to fit the model parameters, in the expression for the variance of the data, as an attempt to quantify our ignorance in modeling the supernovae properly. The procedure used to assign a value to this dispersion is statistically inconsistent and excludes the possibility of comparing different cosmological models. In addition, the SALT2 light curve fitter introduces parameters in the model for the variance that are also used in the model for the data. In the chi-squared context, minimizing such a quantity yields, in the best-case scenario, a bias. An iterative method has been developed to perform the minimization of this chi-squared, but it is not well grounded, although it is used by several groups. We propose an analysis of the type Ia supernova data that is based on the likelihood itself and makes it possible to address both inconsistencies mentioned above in a straightforward way.
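The core difficulty, a dispersion parameter that sits in the variance, can be illustrated with a generic Gaussian example. Chi-squared alone keeps decreasing as the intrinsic dispersion grows, so it cannot determine that dispersion. The full -2 ln L adds a ln(variance) term that produces a finite minimum. This is a minimal sketch assuming independent Gaussian residuals with known measurement errors. It is not SALT2 or the authors' analysis, and all numbers are hypothetical.

```python
import math
import random


def neg2_log_likelihood(sig_int, residuals, sig_obs):
    """-2 ln L for Gaussian residuals with variance sig_obs**2 + sig_int**2."""
    total = 0.0
    for r, s in zip(residuals, sig_obs):
        var = s * s + sig_int * sig_int
        total += r * r / var + math.log(2.0 * math.pi * var)  # normalisation term kept
    return total


def chi_squared(sig_int, residuals, sig_obs):
    """Chi-squared alone: drops the log-variance term."""
    return sum(r * r / (s * s + sig_int * sig_int) for r, s in zip(residuals, sig_obs))


def fit_sig_int(residuals, sig_obs, grid_max=1.0, steps=500):
    """Grid search for the sig_int that minimises -2 ln L."""
    grid = [grid_max * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda g: neg2_log_likelihood(g, residuals, sig_obs))


# Hypothetical residuals: measurement error 0.10 mag, true intrinsic scatter 0.15 mag.
rng = random.Random(3)
sig_obs = [0.10] * 1000
residuals = [rng.gauss(0.0, math.sqrt(0.10**2 + 0.15**2)) for _ in sig_obs]
print(fit_sig_int(residuals, sig_obs))
print(chi_squared(0.0, residuals, sig_obs) > chi_squared(1.0, residuals, sig_obs))  # True
```

Because the likelihood is a proper probability model, the same -2 ln L values can also be used to compare competing models, which chi-squared with a hand-tuned dispersion cannot do.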
The United Nations recommendations and data efforts: international migration statistics.
Simmons, A B
1987-01-01
This article reviews the UN's efforts to improve international migration statistics. The review addresses the challenges faced by the UN, the direction in which this effort is going, gaps in the current approach, and priorities for future action. The content of the UN recommendations has changed in the past and seems to be moving toward further changes. At each stage, the direction of change corresponds broadly to earlier shifts in the overall context of world social-economic affairs and related transformations in international travel and migration patterns. Early (1953) objectives were vaguely stated in terms of social, economic, and demographic impacts of long term settlement. The 1976 recommendations continued the focus on long term resettlement and, at the same time, gave more attention to at least one kind of short term (work-related) movement. The most recent recommendations have given more attention to other classes of short term travellers, such as refugees and contract workers. Recommendations on the measures and data sources have also changed over time. The 1953 recommendations were limited to flow data from international border statistics. The 1976 recommendations drew attention to stock data and the use of civil registration data to supplement border crossing data. Recent UN reflections recognize that the volume of border crossings has now reached the point where many countries simply refuse to gather data on all travellers, choosing instead to make estimates. It is implied that sample surveys at border points and/or visas and entry permits may be the best way of counting various specific kinds of migrants. Future recommendations corresponding to contemporary and emerging concerns will require that the guidelines be restructured: 1) to give more explicit attention in international migration statistics to citizenship and access to political and welfare benefits; 2) to distinguish more carefully various sub-classes of movers; 3) to expand objectives of data
Information gathering for the Transportation Statistics Data Bank
The Transportation Statistics Data Bank (TSDB) was developed in 1974 to collect information on the transport of Department of Energy (DOE) materials. This computer program may be used to provide the framework for collecting more detailed information on DOE shipments of radioactive materials. This report describes the type of information that is needed in this area and concludes that the existing system could be readily modified to collect and process it. The additional needed information, available from bills of lading and similar documents, could be gathered from DOE field offices and transferred in a standard format to the TSDB system. Costs of the system are also discussed briefly.
Evaluation of the Wishart test statistics for polarimetric SAR data
A test statistic for equality of two covariance matrices following the complex Wishart distribution has previously been used in new algorithms for change detection, edge detection and segmentation in polarimetric SAR images. Previously, the results for change detection and edge detection have been ... quantitatively evaluated. This paper deals with the evaluation of segmentation. A segmentation performance measure originally developed for single-channel SAR images has been extended to polarimetric SAR images and used to evaluate segmentation for a merge-using-moment algorithm for polarimetric SAR data ...
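For orientation, here is a sketch of the complex-Wishart equality test. The form below is the likelihood-ratio statistic as it is commonly written in the polarimetric change-detection literature, taken here as an assumption rather than from this paper: ln Q = p[(n+m)ln(n+m) - n ln n - m ln m] + n ln|X| + m ln|Y| - (n+m) ln|X+Y|. X and Y are the summed p×p covariance matrices over n and m looks. The example matrix is hypothetical, and the p-value approximation is not shown.

```python
import math


def complex_det(mat):
    """Determinant of a small complex matrix by Gaussian elimination."""
    a = [list(row) for row in mat]
    n = len(a)
    det = complex(1.0)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return complex(0.0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                     # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= f * a[col][k]
    return det


def wishart_ln_q(x, y, n, m):
    """ln Q for H0: equal covariance, with x, y the summed covariance matrices."""
    p = len(x)
    s = [[x[i][j] + y[i][j] for j in range(p)] for i in range(p)]

    def ln_det(mat):
        return math.log(complex_det(mat).real)   # Hermitian positive definite: real det > 0

    return (p * ((n + m) * math.log(n + m) - n * math.log(n) - m * math.log(m))
            + n * ln_det(x) + m * ln_det(y) - (n + m) * ln_det(s))


# Hypothetical 3x3 Hermitian positive definite matrix (summed over 9 looks).
X = [[2.0, 0.5 + 0.5j, 0.1j],
     [0.5 - 0.5j, 1.5, 0.2],
     [-0.1j, 0.2, 1.0]]
Y4 = [[4 * v for v in row] for row in X]
print(wishart_ln_q(X, X, 9, 9), wishart_ln_q(X, Y4, 9, 9))
```

Identical matrices with equal looks give ln Q = 0, the largest possible value. Any difference pushes ln Q negative. For Y = 4X with p = 3 and n = m = 9, the value is 2pn·ln(4/5) = 54 ln 0.8.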
JAWS data collection, analysis highlights, and microburst statistics
Mccarthy, J.; Roberts, R.; Schreiber, W.
1983-01-01
Organization, equipment, and the current status of the Joint Airport Weather Studies project initiated in relation to the microburst phenomenon are summarized. Some data collection techniques and preliminary statistics on microburst events recorded by Doppler radar are discussed as well. Radar studies show that microbursts occur much more often than expected, with the majority of the events being potentially dangerous to landing or departing aircraft. Seventy events were registered, with differential velocities ranging from 10 to 48 m/s; headwind/tailwind velocity differentials over 20 m/s are considered seriously hazardous. It is noted that a correlation is yet to be established between the velocity differential and incoherent radar reflectivity.
Isocount scintillation scanner with preset statistical data reliability
A scintillation detector scans an object, such as a live body, along horizontal straight scanning lines in such a manner that the detector is stopped at each scanning point for the time interval T required to count a predetermined number N of pulses. The rate R_N = N/T is then calculated, and output signal pulses whose number represents the rate R_N (or a corresponding output signal) are used as the recording signal for forming the scintigram. In contrast to the usual scanner, the isocount scanner scans an object stepwise in order to gather data with statistically uniform reliability.
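Why preset counts give "statistically uniform reliability" can be checked by simulation. If pulses arrive as a Poisson process, the dwell time for N pulses is a sum of N exponential intervals. The relative spread of R_N = N/T is then about 1/√(N-2) whatever the true rate. A fixed-time scanner, by contrast, has a relative error of 1/√(rate·T), which is worse at cold points. The preset N = 100 and the two rates are hypothetical.

```python
import random
import statistics


def isocount_rate(true_rate, n_preset, rng):
    """Dwell until n_preset pulses arrive, return R_N = N / T."""
    dwell = sum(rng.expovariate(true_rate) for _ in range(n_preset))  # Poisson arrivals
    return n_preset / dwell


def relative_spread(true_rate, n_preset=100, trials=2000, seed=0):
    """Coefficient of variation of the isocount rate estimate."""
    rng = random.Random(seed)
    estimates = [isocount_rate(true_rate, n_preset, rng) for _ in range(trials)]
    return statistics.stdev(estimates) / statistics.mean(estimates)


# A cold point (5 counts/s) and a hot point (500 counts/s), hypothetical.
print(relative_spread(5.0), relative_spread(500.0))  # both close to 1/sqrt(N - 2)
```

The price of uniform precision is a dwell time that varies from point to point: a cold point takes about 100 times longer than a point with 100 times the count rate.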
77 FR 65177 - Swap Data Repositories: Interpretative Statement Regarding the Confidentiality and...
2012-10-25
... COMMODITY FUTURES TRADING COMMISSION Swap Data Repositories: Interpretative Statement Regarding ... which requires all swaps, whether cleared or uncleared, to be reported to swap data repositories ... of the CEA to add a definition of the term "swap data repository." Pursuant to CEA section 1a(48...
A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data
Directory of Open Access Journals (Sweden)
Scherer Stephen W
2011-05-01
Background: Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. Results: We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. Conclusions: The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
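The abstract does not give the form of the scan statistic. The sketch below uses a Bernoulli log-likelihood ratio in the style of Kulldorff's spatial scan statistic from epidemiology and replaces spatial windows with connected gene sets in a pathway graph. This is an assumption, and the authors' exact statistic may differ. The per-gene counts of case and control CNVs, the pathway, and all function names are hypothetical:

```python
import math
from typing import Dict, FrozenSet, Set, Tuple

Graph = Dict[str, Set[str]]


def _xlog(x: float, total: float) -> float:
    """x * ln(x / total), with the convention 0 * ln(0) = 0."""
    return x * math.log(x / total) if x > 0 else 0.0


def bernoulli_llr(c: float, n: float, C: float, N: float) -> float:
    """Bernoulli log-likelihood ratio for one cluster (Kulldorff-style).

    c, n: case CNVs and total CNVs inside the cluster; C, N: overall totals.
    Only clusters with an elevated case proportion score above zero.
    """
    if n == 0 or n == N or c / n <= (C - c) / (N - n):
        return 0.0
    inside = _xlog(c, n) + _xlog(n - c, n)
    outside = _xlog(C - c, N - n) + _xlog((N - n) - (C - c), N - n)
    null = _xlog(C, N) + _xlog(N - C, N)
    return inside + outside - null


def connected_clusters(graph: Graph, max_size: int) -> Set[FrozenSet[str]]:
    """All connected gene sets with up to max_size genes (feasible for small pathways)."""
    frontier = {frozenset([g]) for g in graph}
    seen = set(frontier)
    while frontier:
        grown = set()
        for cluster in frontier:
            if len(cluster) >= max_size:
                continue
            neighbours = set().union(*(graph[g] for g in cluster)) - cluster
            for nb in neighbours:
                candidate = cluster | {nb}
                if candidate not in seen:
                    seen.add(candidate)
                    grown.add(candidate)
        frontier = grown
    return seen


def best_cluster(graph: Graph, counts: Dict[str, Tuple[int, int]], max_size: int):
    """Return (cluster, LLR) maximising the scan statistic over the pathway."""
    C = sum(case for case, _ in counts.values())
    N = sum(case + ctrl for case, ctrl in counts.values())
    scored = []
    for cluster in connected_clusters(graph, max_size):
        c = sum(counts[g][0] for g in cluster)
        n = sum(sum(counts[g]) for g in cluster)
        scored.append((bernoulli_llr(c, n, C, N), cluster))
    llr, cluster = max(scored, key=lambda item: item[0])
    return cluster, llr


# Hypothetical linear pathway A-B-C-D with (case CNVs, control CNVs) per gene.
pathway = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
cnv_counts = {"A": (1, 5), "B": (8, 1), "C": (7, 2), "D": (1, 6)}
print(best_cluster(pathway, cnv_counts, max_size=3))
```

In a real analysis, the significance of the maximum LLR would be judged against its distribution under randomly permuted case/control labels. This is how scan statistics typically control the overall false-positive probability when only the best cluster is reported.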
Data analysis for radiological characterisation: Geostatistical and statistical complementarity
Radiological characterisation may cover a wide range of evaluation objectives during a decommissioning and dismantling (D and D) project: removal of doubt, delineation of contaminated materials, monitoring of the decontamination work, and the final survey. At each stage, collecting the data needed to draw the required conclusions is a considerable challenge. Two radiological characterisation stages in particular require an advanced sampling process and data analysis: the initial categorisation and optimisation of the materials to be removed, and the final survey to demonstrate compliance with clearance levels. On the one hand, the final survey is well established in national guides and norms, which rely on random sampling designs and statistical data analysis. On the other hand, the initial radiological characterisation requires a more complex evaluation methodology, for both sampling design and data analysis. The geostatistical framework is an efficient way to satisfy these characterisation requirements and provides a sound decision-making approach for the decommissioning and dismantling of nuclear premises. The relevance of the geostatistical methodology depends on the spatial continuity of the radiological contamination. Where such continuity exists, geostatistics provides reliable methods for activity estimation, uncertainty quantification and risk analysis, leading to a sound classification of radiological waste (surfaces and volumes). Accordingly, the radiological characterisation of contaminated premises can be divided into three steps. First, the most exhaustive facility analysis possible provides historical and qualitative information. Then, a systematic (exhaustive or not) survey of the surface contamination is carried out on a regular grid. Finally, to assess activity levels and contamination depths, destructive samples are collected at several locations within the premises (chosen on the basis of the surface survey results) and analysed. Combined with
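Spatial continuity can be checked with an empirical semivariogram, which is the usual first step before kriging. Below is a minimal sketch using the classical estimator γ(h) = Σ(z_i - z_j)² / (2N(h)) over point pairs in distance class h. In practice the coordinates and activity values would come from the grid survey. The binning scheme and function name are assumptions:

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def empirical_semivariogram(
    points: List[Tuple[float, float]],
    values: List[float],
    lag_width: float,
) -> Dict[float, float]:
    """Classical (Matheron) estimator gamma(h) = sum (z_i - z_j)^2 / (2 N(h)).

    Each pair is assigned to the nearest multiple of lag_width; the returned
    keys are those lag-class centres.
    """
    sq_diffs: Dict[int, float] = defaultdict(float)
    n_pairs: Dict[int, int] = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            lag = round(math.dist(points[i], points[j]) / lag_width)
            sq_diffs[lag] += (values[i] - values[j]) ** 2
            n_pairs[lag] += 1
    return {lag * lag_width: sq_diffs[lag] / (2 * n_pairs[lag]) for lag in sorted(n_pairs)}


# Hypothetical 1-D transect of surface measurements with a smooth trend.
transect = [(float(x), 0.0) for x in range(10)]
activity = [float(x) for x in range(10)]
print(empirical_semivariogram(transect, activity, lag_width=1.0))
```

Semivariance that rises with distance before levelling off indicates the spatial continuity that geostatistical estimation relies on. A flat semivariogram (pure nugget) would instead suggest that classical random-sampling statistics are the more appropriate tool.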
Sources of Safety Data and Statistical Strategies for Design and Analysis: Postmarket Surveillance.
Izem, Rima; Sanchez-Kam, Matilde; Ma, Haijun; Zink, Richard; Zhao, Yueqin
2018-03-01
Safety data are continuously evaluated throughout the life cycle of a medical product to accurately assess and characterize the risks associated with the product. The knowledge about a medical product's safety profile continually evolves as safety data accumulate. This paper discusses data sources and analysis considerations for safety signal detection after a medical product is approved for marketing. This manuscript is the second in a series of papers from the American Statistical Association Biopharmaceutical Section Safety Working Group. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from passive postmarketing surveillance systems compared to other sources. Signal detection has traditionally relied on spontaneous reporting databases that have been available worldwide for decades. However, current regulatory guidelines and ease of reporting have increased the size of these databases exponentially over the last few years. With such large databases, data-mining tools using disproportionality analysis and helpful graphics are often used to detect potential signals. Although the data sources have many limitations, analyses of these data have been successful at identifying safety signals postmarketing. Experience analyzing these dynamic data is useful in understanding the potential and limitations of analyses with new data sources such as social media, claims, or electronic medical records data.
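The abstract mentions disproportionality analysis but does not name a measure. Two widely used measures are the proportional reporting ratio (PRR) and the reporting odds ratio (ROR). Both are computed from a 2×2 table of spontaneous reports. Below is a minimal sketch, assuming the usual Wald interval on the log ROR; the counts in the usage line are invented:

```python
import math
from typing import Tuple


def disproportionality(a: int, b: int, c: int, d: int) -> Tuple[float, float, Tuple[float, float]]:
    """PRR, ROR and a 95% Wald CI for the ROR from a 2x2 report table.

    a: reports with the drug of interest AND the event of interest
    b: reports with the drug of interest, other events
    c: reports with other drugs, the event of interest
    d: reports with other drugs, other events
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(ror) - 1.96 * se_log_ror),
          math.exp(math.log(ror) + 1.96 * se_log_ror))
    return prr, ror, ci


# Hypothetical counts from a spontaneous-reporting database.
prr, ror, ci = disproportionality(10, 90, 20, 880)
print(round(prr, 2), round(ror, 2), tuple(round(v, 2) for v in ci))
```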
Fuzzy logic and image processing techniques for the interpretation of seismic data
Since interpretation of seismic data is usually a tedious and repetitive task, the ability to perform it automatically or semi-automatically has become an important objective of recent research. We believe that the vagueness and uncertainty in the interpretation process make fuzzy logic an appropriate tool for dealing with seismic data. In this work we developed a semi-automated fuzzy inference system to detect the internal architecture of a mass transport complex (MTC) in seismic images. We propose that the observed characteristics of an MTC can be expressed as fuzzy if-then rules consisting of linguistic values associated with fuzzy membership functions. The construction of the fuzzy inference system and various image processing techniques are presented. We conclude that this problem is well suited to fuzzy logic, since the proposed methodology yields a semi-automatically interpreted MTC that closely resembles the MTC from expert manual interpretation.
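Interpretation rules can be encoded as fuzzy if-then rules in a few lines. The attributes used here are a normalised amplitude and a reflector-continuity measure, both in [0, 1]. These attributes, the breakpoints, and the two rules are hypothetical stand-ins, not the authors' rule base. Fuzzy AND is taken as the minimum, as in Mamdani-style systems:

```python
from typing import Dict


def low(x: float, full: float, zero: float) -> float:
    """'LOW' membership: 1 at or below `full`, 0 at or above `zero`, linear between."""
    return min(1.0, max(0.0, (zero - x) / (zero - full)))


def high(x: float, zero: float, full: float) -> float:
    """'HIGH' membership: 0 at or below `zero`, 1 at or above `full`, linear between."""
    return min(1.0, max(0.0, (x - zero) / (full - zero)))


def fire_rules(amplitude: float, continuity: float) -> Dict[str, float]:
    """Firing strength of each hypothetical rule (fuzzy AND taken as min)."""
    # Rule 1: IF amplitude is LOW AND continuity is LOW THEN facies is CHAOTIC (MTC-like)
    chaotic = min(low(amplitude, 0.2, 0.6), low(continuity, 0.3, 0.7))
    # Rule 2: IF continuity is HIGH THEN facies is LAYERED (undisturbed strata)
    layered = high(continuity, 0.3, 0.7)
    return {"chaotic": chaotic, "layered": layered}


def classify(amplitude: float, continuity: float) -> str:
    """Label a sample with the class of its strongest-firing rule."""
    strengths = fire_rules(amplitude, continuity)
    return max(strengths, key=strengths.get)


print(classify(0.1, 0.2), classify(0.5, 0.9))
```

A full inference system would aggregate the rule outputs and defuzzify them. Taking the strongest rule, as here, is the simplest possible decision step.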
Computer-aided structure elucidation Pt. 2. ¹H-NMR data interpretation
A computerized ¹H-NMR data interpretation system has been developed using the artificial intelligence approach. An attempt has been made to overcome the difficulties of interpreting higher-order spin systems. Proton-containing functional groups are divided into subgroups according to their spectroscopic behaviour and the information they bear. Spin simulation is used to study the effect of substituents on the higher-order splitting patterns. Illustrative examples are given.
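In higher-order spin systems, line positions and intensities no longer follow simple first-order multiplet rules. The smallest example is the two-spin AB system, which has a closed-form solution. The sketch below computes it, giving positions in Hz and relative intensities that sum to 4. It is a textbook case chosen for illustration, not the described system's own spin-simulation routine:

```python
import math
from typing import List, Tuple


def ab_quartet(nu_a: float, nu_b: float, j: float) -> List[Tuple[float, float]]:
    """Line positions (Hz) and relative intensities for a two-spin AB system.

    With centre nu0, shift difference delta = nu_a - nu_b and
    D = sqrt(delta^2 + J^2), lines lie at nu0 +/- D/2 +/- J/2; the lines at
    nu0 +/- (D - J)/2 have intensity 1 + J/D and the others 1 - J/D.
    """
    nu0 = (nu_a + nu_b) / 2
    d = math.hypot(nu_a - nu_b, j)
    if d == 0:
        return [(nu0, 4.0)]  # equivalent, uncoupled spins: one line
    inner = 1 + j / d
    outer = 1 - j / d
    return sorted([
        (nu0 + d / 2 + j / 2, outer),
        (nu0 + d / 2 - j / 2, inner),
        (nu0 - d / 2 + j / 2, inner),
        (nu0 - d / 2 - j / 2, outer),
    ])


# Hypothetical shifts (Hz from the reference) and coupling constant.
for pos, intensity in ab_quartet(110.0, 90.0, 10.0):
    print(f"{pos:8.2f} Hz  {intensity:.3f}")
```

When the shift difference is large compared with J, the result approaches two first-order doublets. As the difference shrinks, the inner lines grow at the expense of the outer ones (the "roof effect") until, for equal shifts, a single line remains.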
Bayesian Sensitivity Analysis of Statistical Models with Missing Data.
Zhu, Hongtu; Ibrahim, Joseph G; Tang, Niansheng
2014-04-01
Methods for handling missing data depend strongly on the mechanism that generated the missing values, such as missing completely at random (MCAR) or missing at random (MAR), as well as other distributional and modeling assumptions at various stages. It is well known that the resulting estimates and tests may be sensitive to these assumptions as well as to outlying observations. In this paper, we introduce various perturbations to modeling assumptions and individual observations, and then develop a formal sensitivity analysis to assess these perturbations in the Bayesian analysis of statistical models with missing data. We develop a geometric framework, called the Bayesian perturbation manifold, to characterize the intrinsic structure of these perturbations. We propose several intrinsic influence measures to perform sensitivity analysis and quantify the effect of various perturbations to statistical models. We use the proposed sensitivity analysis procedure to systematically investigate the tenability of the non-ignorable, not-missing-at-random (NMAR) assumption. Simulation studies are conducted to evaluate our methods, and a dataset is analyzed to illustrate the use of our diagnostic measures.
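The abstract does not spell out the perturbation manifold or the intrinsic influence measures. The sketch below is a much simpler illustration of the same idea: perturb the model, then measure how far the posterior moves. It uses case-weight perturbations in a conjugate normal model without missing data and scores each perturbation by the Kullback-Leibler divergence between posteriors. The model, prior, and data are assumptions:

```python
import math
from typing import List, Tuple


def weighted_normal_posterior(
    y: List[float], weights: List[float], sigma: float, prior_mean: float, prior_sd: float
) -> Tuple[float, float]:
    """Posterior (mean, variance) of mu for y_i ~ N(mu, sigma^2), mu ~ N(m0, s0^2).

    Each observation's log-likelihood is multiplied by weights[i]
    (weight 1 = unperturbed, weight 0 = case deletion).
    """
    precision = 1 / prior_sd**2 + sum(weights) / sigma**2
    weighted_sum = sum(w * yi for w, yi in zip(weights, y))
    mean = (prior_mean / prior_sd**2 + weighted_sum / sigma**2) / precision
    return mean, 1 / precision


def kl_normal(m1: float, v1: float, m2: float, v2: float) -> float:
    """KL( N(m1, v1) || N(m2, v2) ) for univariate normals."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)


def case_deletion_influence(
    y: List[float], sigma: float, prior_mean: float = 0.0, prior_sd: float = 10.0
) -> List[float]:
    """KL divergence between the full posterior and each case-deleted posterior."""
    base = weighted_normal_posterior(y, [1.0] * len(y), sigma, prior_mean, prior_sd)
    influence = []
    for i in range(len(y)):
        w = [1.0] * len(y)
        w[i] = 0.0
        perturbed = weighted_normal_posterior(y, w, sigma, prior_mean, prior_sd)
        influence.append(kl_normal(*base, *perturbed))
    return influence


# Hypothetical data with one outlying observation.
print([round(v, 3) for v in case_deletion_influence([1.0, 1.2, 0.9, 1.1, 6.0], sigma=1.0)])
```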
Right-sizing statistical models for longitudinal data.
Wood, Phillip K; Steinley, Douglas; Jackson, Kristina M
2015-12-01
Arguments are proposed that researchers using longitudinal data should consider more and less complex statistical model alternatives to their initially chosen techniques in an effort to "right-size" the model to the data at hand. Such model comparisons may alert researchers who use poorly fitting, overly parsimonious models to more complex, better-fitting alternatives and, alternatively, may identify more parsimonious alternatives to overly complex (and perhaps empirically underidentified and/or less powerful) statistical models. A general framework is proposed for considering (often nested) relationships between a variety of psychometric and growth curve models. A 3-step approach is proposed in which models are evaluated based on the number and patterning of variance components prior to selection of better-fitting growth models that explain both mean and variation-covariation patterns. The orthogonal free curve slope intercept (FCSI) growth model is considered a general model that includes, as special cases, many models, including the factor mean (FM) model (McArdle & Epstein, 1987), McDonald's (1967) linearly constrained factor model, hierarchical linear models (HLMs), repeated-measures multivariate analysis of variance (MANOVA), and the linear slope intercept (linearSI) growth model. The FCSI model, in turn, is nested within the Tuckerized factor model. The approach is illustrated by comparing alternative models in a longitudinal study of children's vocabulary and by comparing several candidate parametric growth and chronometric models in a Monte Carlo study. © 2015 APA, all rights reserved.
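The "right-sizing" idea, comparing a chosen model against simpler and more complex alternatives, can be illustrated with a generic information criterion. This sketch is a deliberate simplification. It scores nested polynomial growth curves for a single trajectory by BIC and does not fit the structural-equation growth and psychometric models (such as FCSI) discussed in the paper. The trajectory and function names are hypothetical:

```python
import math

import numpy as np


def polynomial_bic(t, y, degree: int) -> float:
    """BIC (up to an additive constant) of a least-squares polynomial growth curve.

    With Gaussian errors and the ML variance RSS/n, -2 ln L = n ln(RSS/n) + const;
    the model has degree + 1 coefficients plus one variance parameter.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(t, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, t)) ** 2))
    n = len(y)
    k = degree + 2
    return n * math.log(rss / n) + k * math.log(n)


def right_size(t, y, max_degree: int) -> int:
    """Polynomial degree (0 = flat, 1 = linear, 2 = quadratic, ...) with the lowest BIC."""
    return min(range(max_degree + 1), key=lambda d: polynomial_bic(t, y, d))


# Hypothetical trajectory: quadratic growth plus a small alternating disturbance.
t = np.arange(10)
y = 1 + 2 * t + 0.5 * t**2 + 0.1 * (-1.0) ** t
print(right_size(t, y, max_degree=2))
```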
Assessing Research Data Deposits and Usage Statistics within IDEALS
Directory of Open Access Journals (Sweden)
Christie A. Wiley
2017-12-01
Objectives: This study follows up on previous work that began examining data deposited in an institutional repository. The work here extends the earlier study by answering the following research questions: (1) What is the file composition of datasets ingested into the University of Illinois at Urbana-Champaign (UIUC) campus repository? Are datasets more likely to be single-file or multiple-file items? (2) What usage data are associated with these datasets? Which items are most popular? Methods: The dataset records collected in this study were identified by filtering item types categorized as “data” or “dataset” using the advanced search function in IDEALS. Returned search results were collected in an Excel spreadsheet, including data such as the Handle identifier, date ingested, file formats, composition code, and the download count from the item’s statistics report. The Handle identifier is the dataset record’s persistent identifier. Composition codes categorize items as single-file or multiple-file deposits. Date available is the date the dataset record was published in the campus repository. Download statistics were collected via a website link for each dataset record and indicate the number of times the dataset record has been downloaded. Once collected, the data were used to evaluate datasets deposited into IDEALS. Results: A total of 522 datasets were identified for analysis, covering the period between January 2007 and August 2016. This study revealed two influxes, in 2008-2009 and in 2014. During the first timeframe, a large number of PDFs were deposited by the Illinois Department of Agriculture, whereas in 2014 Microsoft Excel files were deposited by the Rare Books and Manuscript Library. Single-file datasets clearly dominate the deposits in the campus repository. The total download count for all datasets was 139,663 and the average downloads per month per
Securing co-operation from persons supplying statistical data
Aubenque, M. J.; Blaikley, R. M.; Harris, F. Fraser; Lal, R. B.; Neurdenburg, M. G.; Hernández, R. de Shelly
1954-01-01
Securing the co-operation of persons supplying information required for medical statistics is essentially a problem in human relations, and an understanding of the motivations, attitudes, and behaviour of the respondents is necessary. Before any new statistical survey is undertaken, it is suggested by Aubenque and Harris that a preliminary review be made so that the maximum use is made of existing information. Care should also be taken not to burden respondents with an overloaded questionnaire. Aubenque and Harris recommend simplified reporting. Complete population coverage is not necessary. Neurdenburg suggests that the co-operation and support of such organizations as medical associations and social security boards are important and that propaganda should be directed specifically to the groups whose co-operation is sought. Informal personal contacts are valuable and desirable, according to Blaikley, but may have adverse effects if the right kind of approach is not made. Financial payments as an incentive in securing co-operation are opposed by Neurdenburg, who proposes that only postage-free envelopes or similar small favours be granted. Blaikley and Harris, on the other hand, express the view that financial incentives may do much to gain the support of those required to furnish data; there are, however, other incentives, and full use should be made of the natural inclinations of respondents. Compulsion may be necessary in certain instances, but administrative rather than statutory measures should be adopted. Penalties, according to Aubenque, should be inflicted only when justified by imperative health requirements. The results of surveys should be made available as soon as possible to those who co-operated, and Aubenque and Harris point out that they should also be of practical value to the suppliers of the information. Greater co-operation can be secured from medical persons who have an understanding of the statistical principles involved; Aubenque and
Multivariate statistical analysis of atom probe tomography data
The application of spectrum imaging multivariate statistical analysis methods, specifically principal component analysis (PCA), to atom probe tomography (APT) data has been investigated. The mathematical method of analysis is described and the results for two example datasets are analyzed and presented. The first dataset is from the analysis of a PM 2000 Fe-Cr-Al-Ti steel containing two different ultrafine precipitate populations. PCA properly describes the matrix and precipitate phases in a simple and intuitive manner. A second APT example is from the analysis of an irradiated reactor pressure vessel steel. Fine, nm-scale Cu-enriched precipitates having a core-shell structure were identified and qualitatively described by PCA. Advantages, disadvantages, and future prospects for implementing these data analysis methodologies for APT datasets, particularly with regard to quantitative analysis, are also discussed.
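Below is a minimal sketch of PCA on spectrum-image data. It assumes the APT reconstruction has already been binned into a matrix of voxels × mass-to-charge channels, and the two synthetic "phases" are invented for illustration. Practical spectrum-imaging workflows often rescale counts for Poisson noise before the decomposition, a step omitted here:

```python
import numpy as np


def pca_spectrum_image(data: np.ndarray, n_components: int):
    """PCA of a (voxels x mass-channels) matrix via SVD of the mean-centred data.

    Returns per-voxel scores, component loadings (spectral signatures), and
    the fraction of total variance explained by each retained component.
    """
    centred = data - data.mean(axis=0)
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained[:n_components]


# Hypothetical 4-channel spectra for matrix and precipitate voxels, plus noise.
rng = np.random.default_rng(0)
matrix_spec = np.array([10.0, 1.0, 0.0, 0.0])
precip_spec = np.array([2.0, 1.0, 8.0, 3.0])
data = np.vstack([
    matrix_spec + rng.normal(0.0, 0.1, size=(20, 4)),
    precip_spec + rng.normal(0.0, 0.1, size=(20, 4)),
])
scores, loadings, explained = pca_spectrum_image(data, n_components=2)
print(np.round(explained, 4))
```

The first component's loading vector acts as a difference spectrum between the phases, and its scores separate matrix voxels from precipitate voxels.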
Teschendorff, Andrew E; Sollich, Peter; Kuehn, Reimer
2014-06-01
A key challenge in systems biology is the elucidation of the underlying principles, or fundamental laws, which determine the cellular phenotype. Understanding how these fundamental principles are altered in diseases like cancer is important for translating basic scientific knowledge into clinical advances. While significant progress is being made, with the identification of novel drug targets and treatments by means of systems biological methods, our fundamental systems level understanding of why certain treatments succeed and others fail is still lacking. We here advocate a novel methodological framework for systems analysis and interpretation of molecular omic data, which is based on statistical mechanical principles. Specifically, we propose the notion of cellular signalling entropy (or uncertainty), as a novel means of analysing and interpreting omic data, and more fundamentally, as a means of elucidating systems-level principles underlying basic biology and disease. We describe the power of signalling entropy to discriminate cells according to differentiation potential and cancer status. We further argue the case for an empirical cellular entropy-robustness correlation theorem and demonstrate its existence in cancer cell line drug sensitivity data. Specifically, we find that high signalling entropy correlates with drug resistance and further describe how entropy could be used to identify the Achilles heels of cancer cells. In summary, signalling entropy is a deep and powerful concept, based on rigorous statistical mechanical principles, which, with improved data quality and coverage, will allow a much deeper understanding of the systems biological principles underlying normal and disease physiology. Copyright © 2014 Elsevier Inc. All rights reserved.
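The abstract does not define signalling entropy. One common formulation, assumed here, is the entropy rate of a random walk on a protein interaction network in which the walker moves to a neighbour with probability proportional to that neighbour's expression. The network and expression values below are hypothetical:

```python
import math
from typing import Dict, Set


def signalling_entropy_rate(adjacency: Dict[str, Set[str]], expression: Dict[str, float]) -> float:
    """Entropy rate of an expression-weighted random walk on an interaction network.

    Transition p_ij is proportional to the expression x_j of neighbour j.
    This chain is reversible with stationary distribution
    pi_i proportional to x_i * sum_{j in N(i)} x_j.
    Assumes every node has at least one neighbour and positive expression.
    """
    local_entropy = {}
    stationary = {}
    for i, neighbours in adjacency.items():
        total = sum(expression[j] for j in neighbours)
        probs = [expression[j] / total for j in neighbours]
        local_entropy[i] = -sum(p * math.log(p) for p in probs if p > 0)
        stationary[i] = expression[i] * total
    norm = sum(stationary.values())
    return sum(stationary[i] / norm * local_entropy[i] for i in adjacency)


# Hypothetical fully connected four-protein network.
k4 = {n: {m for m in "ABCD" if m != n} for n in "ABCD"}
uniform = {n: 1.0 for n in "ABCD"}
skewed = dict(uniform, A=100.0)
print(signalling_entropy_rate(k4, uniform), signalling_entropy_rate(k4, skewed))
```

Uniform expression on this network gives the maximum value, ln 3. Concentrating expression on one node makes the walk more predictable and lowers the entropy rate.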
Siska, William; Gupta, Aradhana; Tomlinson, Lindsay; Tripathi, Niraj; von Beust, Barbara
Clinical pathology testing is routinely performed in target animal safety studies in order to identify potential toxicity associated with administration of an investigational veterinary pharmaceutical product. Regulatory and other testing guidelines that address such studies provide recommendations for clinical pathology testing but occasionally contain outdated analytes and do not take into account interspecies physiologic differences that affect the practical selection of appropriate clinical pathology tests. Additionally, strong emphasis is often placed on statistical analysis and use of reference intervals for interpretation of test article-related clinical pathology changes, with limited attention given to the critical scientific review of clinically, toxicologically, or biologically relevant changes. The purpose of this communication from the Regulatory Affairs Committee of the American Society for Veterinary Clinical Pathology is to provide current recommendations for clinical pathology testing and data interpretation in target animal safety studies and thereby enhance the value of clinical pathology testing in these studies.
Statistical analysis and data display: an intermediate course with examples in R
Heiberger, Richard M
2015-01-01
This contemporary presentation of statistical methods features extensive use of graphical displays for exploring data and for displaying the analysis. The authors demonstrate how to analyze data—showing code, graphics, and accompanying tabular listings—for all the methods they cover. They emphasize how to construct and interpret graphs. They discuss principles of graphical design. They identify situations where visual impressions from graphs may need confirmation from traditional tabular results. All chapters have exercises. The authors provide and discuss R functions for all the new graphical display formats. All graphs and tabular output in the book were constructed using these functions. Complete R scripts for all examples and figures are provided for readers to use as models for their own analyses. This book can serve as a standalone text for statistics majors at the master’s level and for other quantitatively oriented disciplines at the doctoral level, and as a reference book for researchers. In-de...
Multiple point statistical simulation using uncertain (soft) conditional data
Hansen, Thomas Mejer; Vu, Le Thanh; Mosegaard, Klaus; Cordua, Knud Skou
2018-05-01
Geostatistical simulation methods have been used to quantify spatial variability of reservoir models since the 1980s. In the last two decades, state-of-the-art simulation methods have changed from being based on covariance-based 2-point statistics to multiple-point statistics (MPS), which allow simulation of more realistic Earth structures. In addition, increasing amounts of geo-information (geophysical, geological, etc.) from multiple sources are being collected. This poses the problem of integrating these different sources of information, such that decisions related to reservoir models can be made on as informed a basis as possible. In principle, though with difficulty in practice, this can be achieved using computationally expensive Monte Carlo methods. Here we investigate the use of sequential-simulation-based MPS methods conditional to uncertain (soft) data as a computationally efficient alternative. First, it is demonstrated that current implementations of sequential simulation based on MPS (e.g. SNESIM, ENESIM and Direct Sampling) do not properly account for uncertain conditional information, owing to a combination of using only co-located information and a random simulation path. Then, we suggest two approaches that better account for the available uncertain information. The first makes use of a preferential simulation path, where more informed model parameters are visited before less informed ones. The second approach involves using non-co-located uncertain information. For different types of available data, these approaches are demonstrated to produce simulation results similar to those obtained by the general Monte Carlo based approach. These methods allow MPS simulation to condition properly to uncertain (soft) data, and hence provide a computationally attractive approach for integrating information about a reservoir model.
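The preferential simulation path can be sketched as an ordering rule. Nodes whose soft data are most informative (lowest Shannon entropy) are visited first, and uninformed nodes follow in random order. The entropy criterion and data layout are assumptions for illustration; the paper's exact ordering may differ:

```python
import math
import random
from typing import Dict, List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy (nats) of a local category distribution; lower means more informative."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def preferential_path(soft_data: Dict[int, Sequence[float]], n_nodes: int, seed: int = 0) -> List[int]:
    """Simulation order: soft-data nodes by increasing entropy, then the rest at random."""
    rng = random.Random(seed)
    informed = sorted(soft_data, key=lambda node: shannon_entropy(soft_data[node]))
    uninformed = [node for node in range(n_nodes) if node not in soft_data]
    rng.shuffle(uninformed)
    return informed + uninformed


# Hypothetical soft data: P(sand), P(shale) at three of ten grid nodes.
soft = {3: [0.5, 0.5], 7: [0.95, 0.05], 1: [0.8, 0.2]}
print(preferential_path(soft, n_nodes=10)[:3])  # [7, 1, 3]
```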
Statistical Analysis of 30 Years Rainfall Data: A Case Study
Arvind, G.; Ashok Kumar, P.; Girish Karthi, S.; Suribabu, C. R.
2017-07-01
Rainfall is a prime input for various engineering designs, such as hydraulic structures, bridges and culverts, canals, storm water sewers and road drainage systems. A detailed statistical analysis of each region is essential to estimate the relevant input values for the design and analysis of engineering structures and also for crop planning. A rain gauge station in Trichy district, where agriculture is the prime occupation, is selected for statistical analysis. Daily rainfall data for a period of 30 years are used to characterise normal, deficit, excess and seasonal rainfall at the selected circle headquarters. Further, various available plotting position formulae are used to evaluate the return periods of monthly, seasonal and annual rainfall. This analysis will provide useful information for water resources planners, farmers and urban engineers to assess the availability of water and create storage accordingly. The mean, standard deviation and coefficient of variation of monthly and annual rainfall were calculated to check rainfall variability. From the calculated results, the rainfall pattern is found to be erratic. The best-fit probability distribution was identified based on the minimum deviation between actual and estimated values. The results of the analysis made it possible to determine the onset and withdrawal of the monsoon, which were used for land preparation and sowing.
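The summary statistics and return periods described above reduce to a few lines. The sketch assumes the Weibull plotting position P = m/(n + 1), which is one of several available formulae; the abstract does not say which were used. The annual totals, in millimetres, are hypothetical:

```python
import statistics
from typing import List, Tuple


def rainfall_summary(annual: List[float]) -> Tuple[float, float, float]:
    """Mean, sample standard deviation, and coefficient of variation (%)."""
    mean = statistics.mean(annual)
    sd = statistics.stdev(annual)
    return mean, sd, 100 * sd / mean


def weibull_return_periods(annual: List[float]) -> List[Tuple[float, float]]:
    """Rank values in descending order; P = m / (n + 1) and return period T = 1 / P."""
    n = len(annual)
    ranked = sorted(annual, reverse=True)
    return [(value, (n + 1) / m) for m, value in enumerate(ranked, start=1)]


# Hypothetical annual rainfall totals (mm).
totals = [800, 1000, 1200, 900, 1100]
print(rainfall_summary(totals))
print(weibull_return_periods(totals))
```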
A spatial scan statistic for compound Poisson data.
Rosychuk, Rhonda J; Chang, Hsing-Ming
2013-12-20
The topic of spatial cluster detection gained attention in statistics during the late 1980s and early 1990s. Effort has been devoted to the development of methods for detecting spatial clustering of cases and events in the biological sciences, astronomy and epidemiology. More recently, research has examined detecting clusters of correlated count data associated with health conditions of individuals. Such a method allows researchers to examine spatial relationships of disease-related events rather than just incident or prevalent cases. We introduce a spatial scan test that identifies clusters of events in a study region. Because an individual case may have multiple (repeated) events, we base the test on a compound Poisson model. We illustrate our method for cluster detection on emergency department visits, where individuals may make multiple disease-related visits. Copyright © 2013 John Wiley & Sons, Ltd.
The Need for the Dissemination of Statistical Data and Information
Directory of Open Access Journals (Sweden)
Anna-Alexandra Frunza
2016-01-01
There is an emphasis nowadays on knowledge, so access to information has increased in relevance in modern economies, which have developed their competitive advantage through their dynamic response to market changes. The effort towards transparency has increased tremendously within the last decades, which have also been influenced by the weight of digital support. The need for the dissemination of statistical data and information has met new challenges in terms of aggregating the practices that both private and public organizations use in order to ensure optimum access for end users. The article raises some key questions that can ease the process of collecting and presenting the results subject to dissemination.
Statistical Analysis of Data with Non-Detectable Values
Environmental exposure measurements are, in general, positive and may be subject to left censoring, i.e. the measured value is less than a "limit of detection". In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit. A basic problem of interest in environmental risk assessment is to determine if the mean concentration of an analyte is less than a prescribed action level. Parametric methods, used to determine acceptable levels of exposure, are often based on a two-parameter lognormal distribution. The mean exposure level and/or an upper percentile (e.g. the 95th percentile) are used to characterize exposure levels, and upper confidence limits are needed to describe the uncertainty in these estimates. In certain situations it is of interest to estimate the probability of observing a future (or "missed") value of a lognormal variable. Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations. In this report, methods for estimating these quantities based on the maximum likelihood method for randomly left-censored lognormal data are described, and graphical methods are used to evaluate the lognormal assumption. If the lognormal model is in doubt and an alternative distribution for the exposure profile of a similar exposure group is not available, then nonparametric methods for left-censored data are used. The mean exposure level, along with the upper confidence limit, is obtained using the product limit estimate, and the upper confidence limit on the 95th percentile (i.e. the upper tolerance limit) is obtained using a nonparametric approach. All of these methods are well known, but computational complexity has limited their use in routine data analysis with left-censored data. The recent development of the R environment for statistical
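The report describes maximum likelihood estimation for randomly left-censored lognormal data but not the optimisation method. The sketch below reaches the ML estimates of the log-scale mean and standard deviation with the EM algorithm. EM is a technique chosen here and not necessarily the report's. It uses the conditional moments of a normal variable truncated above at its detection limit. The function name and data are hypothetical:

```python
import math
import statistics
from typing import List, Tuple


def _phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def _Phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def censored_lognormal_mle(
    detects: List[float], nd_limits: List[float], iterations: int = 500
) -> Tuple[float, float]:
    """ML estimates (mu, sigma) of log-concentration for left-censored lognormal data.

    detects: measured concentrations; nd_limits: detection limits of the
    non-detects (each known only to lie below its own limit). Needs >= 2 detects.
    """
    y = [math.log(v) for v in detects]
    limits = [math.log(dl) for dl in nd_limits]
    n = len(y) + len(limits)
    mu = statistics.fmean(y)
    sigma = statistics.pstdev(y) or 1.0
    for _ in range(iterations):
        # E-step: expected sufficient statistics (sum X, sum X^2).
        s1 = sum(y)
        s2 = sum(v * v for v in y)
        for limit in limits:
            a = (limit - mu) / sigma
            lam = _phi(a) / _Phi(a)                    # inverse Mills ratio for X < limit
            m = mu - sigma * lam                       # E[X | X < limit]
            var = sigma**2 * (1 - a * lam - lam**2)    # Var[X | X < limit]
            s1 += m
            s2 += var + m * m
        # M-step: normal ML estimates from the expected statistics.
        mu = s1 / n
        sigma = math.sqrt(s2 / n - mu * mu)
    return mu, sigma


# Hypothetical exposures (mg/m3): five detects and two non-detects below 1.0.
print(censored_lognormal_mle([2.0, 3.0, 5.0, 8.0, 4.0], [1.0, 1.0]))
```

On the original scale, the lognormal mean is exp(mu + sigma²/2). Confidence limits and the 95th percentile would build on these estimates.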
A3S (Arwin-Adang-Aciek-Sembiring) is a method of information fusion for a single observation, and OMA3S (Observation Multi-time A3S) is a method of information fusion for time-series data. This paper proposes an OMA3S-based Cognitive Artificial-Intelligence method for interpreting transformer condition, calculated from maintenance data from the Indonesia National Electric Company (PLN). First, the proposed method is tested using previously published data, and it is then applied to maintenance data. Maintenance data are fused to obtain part conditions, and part conditions are fused to obtain the transformer condition. Results show that the proposed method is valid for DGA fault identification, with an average accuracy of 91.1%. The proposed method can not only interpret the major fault but also identify minor faults occurring along with the major fault, enabling an early warning feature. Results also show that part conditions can be interpreted using information fusion on maintenance data, and the transformer condition can be interpreted using information fusion on part conditions. Future work on this research is to gather more data, to incorporate more factors to be fused, and to design a cognitive processor that can implement this concept of intelligent instrumentation.
RESEARCH ON THE CONSTRUCTION OF REMOTE SENSING AUTOMATIC INTERPRETATION SYMBOL BIG DATA
Directory of Open Access Journals (Sweden)
Y. Gao
2018-04-01
Remote sensing automatic interpretation symbol (RSAIS) is an inexpensive and fast method for providing precise in-situ information for image interpretation and accuracy. This study designed a scientific and precise RSAIS data characterization method, as well as a distributed, cloud-based architecture for massive data storage. Additionally, it introduced offline and online data update modes and a dynamic data evaluation mechanism, with the aim of creating an efficient approach to RSAIS big data construction. Finally, a national RSAIS database with more than 3 million samples covering 86 land types was constructed during 2013–2015 based on the National Geographic Conditions Monitoring Project of China, and it has been updated annually since 2016. RSAIS big data has proven to be a good method for large-scale image interpretation and field validation. Notably, it also has the potential to address automatic image interpretation with the assistance of deep learning technology in the remote sensing big data era.
Statistical Clustering and Compositional Modeling of Iapetus VIMS Spectral Data
Pinilla-Alonso, N.; Roush, T. L.; Marzo, G.; Dalle Ore, C. M.; Cruikshank, D. P.
2009-12-01
It has long been known that Saturn's major satellites are predominantly icy objects [e.g. 1 and references therein]. Since 2004, these bodies have been the subject of observations by the Cassini-VIMS (Visual and Infrared Mapping Spectrometer) experiment [2]. Iapetus has the unique property that the hemisphere centered on the apex of its locked synchronous orbital motion around Saturn has a very low geometrical albedo of 2-6%, while the opposite hemisphere is about 10 times more reflective. The nature and origin of the dark material of Iapetus have remained an open question since its discovery [3 and references therein]. The nature of this material and how it is distributed on the surface can shed new light on the Saturnian system. We apply statistical clustering [4] and theoretical modeling [5,6] to address the surface composition of Iapetus. The VIMS data evaluated were obtained during the second flyby of Iapetus, in September 2007. This close approach allowed VIMS to obtain spectra at relatively high spatial resolution, ~1-22 km/pixel. The data we study sampled the trailing hemisphere and part of the dark leading one. The statistical clustering [4] is used to identify statistically distinct spectra on Iapetus. The composition of these distinct spectra is evaluated using theoretical models [5,6]. We thank Allan Meyer for his help. This research was supported by an appointment to the NASA Postdoctoral Program at the Ames Research Center, administered by Oak Ridge Associated Universities through a contract with NASA.
[1] Coradini, A. et al., 2009, Earth, Moon & Planets, 105, 289-310.
[2] Brown et al., 2004, Space Science Reviews, 115, 111-168.
[3] Cruikshank, D. et al., 2008, Icarus, 193, 334-343.
[4] Marzo, G. et al., 2008, Journal of Geophysical Research, 113, E12009.
[5] Hapke, B., 1993, Theory of Reflectance and Emittance Spectroscopy, Cambridge University Press.
[6] Shkuratov, Y. et al., 1999, Icarus, 137, 235-246.
The International Coal Statistics Data Base program maintenance guide
Data Analysis & Statistical Methods for Command File Errors
Meshkat, Leila; Waggoner, Bruce; Bryant, Larry
2014-01-01
This paper explains current work on modeling for managing the risk of command file errors. It focuses on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors and the number of files radiated, including the number of commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We assessed these data using different curve-fitting and distribution-fitting techniques, such as multiple regression analysis and maximum likelihood estimation, to see how much of the variability in the error rates they can explain. We also used goodness-of-fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on what these statistics revealed to be the critical drivers of the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as to anticipate future error rates.
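As a minimal illustration of one technique named above (maximum likelihood estimation of an error rate), the sketch below assumes that the number of errors in each file follows a Poisson distribution whose mean is proportional to the number of commands in that file. The function name `poisson_rate_mle` and all data values are hypothetical; this is not the model the authors built.

```python
import math
from typing import Sequence, Tuple


def poisson_rate_mle(errors: Sequence[int],
                     exposure: Sequence[float]) -> Tuple[float, float, float]:
    """Return (rate, lower, upper): the MLE of a constant error rate per unit
    of exposure and an approximate (Wald) 95% confidence interval.

    Model assumption: errors[i] ~ Poisson(rate * exposure[i]), independently.
    """
    if not errors or len(errors) != len(exposure):
        raise ValueError("errors and exposure must be non-empty and equally long")
    total_errors = sum(errors)
    total_exposure = float(sum(exposure))
    # Closed-form MLE: total events divided by total exposure.
    rate = total_errors / total_exposure
    # Estimated standard error: sqrt(rate / total_exposure) = sqrt(total_errors) / total_exposure.
    se = math.sqrt(total_errors) / total_exposure
    return rate, max(0.0, rate - 1.96 * se), rate + 1.96 * se


# Hypothetical data: errors found per radiated file and commands in each file.
errors = [0, 1, 0, 2, 1]
commands = [120, 300, 80, 450, 250]
rate, lower, upper = poisson_rate_mle(errors, commands)
expected_errors = rate * 600  # expected errors for a hypothetical 600-command load
print(f"rate={rate:.5f}/command, 95% CI=({lower:.5f}, {upper:.5f}), "
      f"expected={expected_errors:.2f}")
```

Extending this to several drivers (workload, novelty, blocks per file) would typically mean a Poisson regression with the number of commands as an exposure offset; that extension is not shown.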
Statistical inference for imperfect maintenance models with missing data
The paper considers complex industrial systems with incomplete maintenance histories. Corrective maintenance is performed after the occurrence of a failure, and its efficiency is assumed to be imperfect. In maintenance analysis, the databases are not necessarily complete. Specifically, the observations are assumed to be window-censored. This situation arises relatively frequently after the purchase of a second-hand unit or in the absence of maintenance records during the burn-in phase. The joint assessment of the wear-out of the system and the maintenance efficiency is investigated under missing data. A review, along with extensions, of statistical inference procedures from an observation window is proposed for the cases of perfect and minimal repair, using renewal and Poisson theory, respectively. Virtual age models are employed to model imperfect repair. In this framework, new estimation procedures are developed. In particular, maximum likelihood estimation methods are derived for the most classical virtual age models. The benefits of the new estimation procedures are highlighted by numerical simulations and an application to a real data set. - Highlights: • New estimation procedures for window-censored observations and imperfect repair. • Extensions of inference methods for perfect and minimal repair with missing data. • Overview of the maximum likelihood method with complete and incomplete observations. • Benefits of the new procedures highlighted by simulation studies and a real application.
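The paper's estimators for virtual age models are not reproduced here. As a much simpler, hypothetical illustration of likelihood inference from an observation window, the sketch below assumes minimal repair (so failures form a non-homogeneous Poisson process) with a power-law intensity (β/η)(t/η)^(β−1), and estimates β and η from failure times observed only inside a window (s, T]. For fixed β the quantity η^β has a closed-form conditional MLE, (T^β − s^β)/n, so only β is searched numerically. All function names and data are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple


def profile_loglik(beta: float, times: Sequence[float], s: float, T: float) -> float:
    """Power-law NHPP log-likelihood on the window (s, T], with the scale
    parameter profiled out (theta = eta**beta set to its conditional MLE)."""
    n = len(times)
    theta = (T ** beta - s ** beta) / n
    return (n * math.log(beta) - n * math.log(theta)
            + (beta - 1.0) * sum(math.log(t) for t in times) - n)


def fit_power_law_window(times: Sequence[float], s: float, T: float,
                         lo: float = 0.05, hi: float = 20.0,
                         grid: int = 400) -> Tuple[float, float]:
    """MLE (beta, eta) of the intensity (beta/eta)*(t/eta)**(beta-1) from
    failure times observed only inside the window (s, T]."""
    if not times or not all(0.0 <= s < t <= T for t in times):
        raise ValueError("need at least one failure time inside (s, T]")
    f = lambda b: profile_loglik(b, times, s, T)
    # Coarse search on a log-spaced grid, then golden-section refinement.
    betas = [lo * (hi / lo) ** (i / (grid - 1)) for i in range(grid)]
    k = max(range(grid), key=lambda i: f(betas[i]))
    a, b = betas[max(k - 1, 0)], betas[min(k + 1, grid - 1)]
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    beta = 0.5 * (a + b)
    eta = ((T ** beta - s ** beta) / len(times)) ** (1.0 / beta)
    return beta, eta


# Hypothetical failure times (e.g. operating hours) seen during observation.
times = [10.0, 25.0, 40.0, 70.0, 90.0]
beta_full, eta_full = fit_power_law_window(times, s=0.0, T=100.0)  # complete history
beta_win, eta_win = fit_power_law_window(times, s=5.0, T=100.0)    # window-censored
```

With s = 0 the estimate reduces to the familiar closed form β̂ = n / Σ ln(T/tᵢ), which is a convenient check. The grid-then-refine search assumes the profile likelihood has a single peak in the searched range.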
A number of (p,n), (n,p), and (³He,p) reactions have been interpreted on the basis of the statistical multistep compound emission mechanism. Good agreement with experiment is found both in spectrum shape and in the value of the coherence widths.
Bayesian inference – a way to combine statistical data and semantic analysis meaningfully
Directory of Open Access Journals (Sweden)
Eila Lindfors
2011-11-01
This article presents the possibilities of Bayesian modelling (Finite Mixture Modelling) in the semantic analysis of statistically modelled data. The probability of a hypothesis in relation to the available data is an important question in inductive reasoning. Bayesian modelling allows the researcher to use many models at a time and provides tools to evaluate the goodness of different models. The researcher should always be aware that there is no such thing as the exact probability of an exact event. This is the reason for using probabilistic models. Each model presents a different perspective on the phenomenon in focus, and the researcher has to choose the most probable model in view of previous research and the knowledge available. The idea of Bayesian modelling is illustrated here by presenting two different sets of data, one from craft science research (n=167) and the other (n=63) from educational research (Lindfors, 2007, 2002). The principles of how to build models and how to combine different profiles are described in the light of the research mentioned. Bayesian modelling is an analysis based on calculating probabilities in relation to a specific set of quantitative data. It is a tool for handling data and interpreting them semantically. The reliability of the analysis arises from arguing which model can be selected from the model space as the basis for an interpretation, and on what grounds. Keywords: method, sloyd, Bayesian modelling, student teachers. URN:NBN:no-29959
Interpretation of Ground Penetrating Radar data at the Hanford Site, Richland, Washington
Ground Penetrating Radar (GPR) is being used extensively during characterization and remediation of chemical and radioactive waste sites at the Hanford Site in Washington State. Time and money for GPR investigations are often not included during the planning and budgeting phase. Therefore, GPR investigations must be inexpensive and quick to minimize the impact on already established budgets and schedules. An approach to survey design, data collection, and interpretation has been developed that emphasizes speed and budget with minimal impact on the integrity of the interpretation or the quality of the data. The following simple rules of thumb can be applied: (1) Assemble as much pre-survey information as possible, (2) Clearly define survey objectives prior to designing the survey and determine which combination of geophysical methods will best meet the objectives, (3) Continuously communicate with the client before, during and after the investigation, (4) Only experienced GPR interpreters should acquire the field data, (5) Use real-time monitoring of the data to determine where and how much data to collect and to assist in the interpretation, (6) Always err in favor of collecting too much data, (7) Surveys should have closely spaced (preferably 5 feet, no more than 10 feet), orthogonal profiles, (8) When possible, pull the antenna by hand.
The purpose of this project was to reprocess, evaluate, and reinterpret 14 line miles of seismic reflection data acquired at the Hanford Site. Regional and area-specific geology has been reviewed, the data acquisition parameters have been discussed as they relate to the limitations inherent in the data, and the reprocessing procedures have been described in detail along with an evaluation of the original processing. After initial testing, the reprocessing focused on resolving the geologic horizons at and near the top of the basalt. The reprocessed seismic data show significant improvement over the original processing. The improvement results from the integrated processing and interpretation approach, in which each processing step was tested in sequence and the intermediate results were examined carefully in accordance with the project goals. The interpretation procedure relied strongly on synthetic seismograms and models calculated from the physical parameters of the subsurface materials, and on associated geophysical (reflection, gravity, magnetic) data. The final interpretation of the seismic data agrees with the structural contour maps based primarily on borehole information. The seismic interpretation has added important detail concerning areas that should be considered for further study. 60 figs., 1 tab.
What defines an Expert? - Uncertainty in the interpretation of seismic data
Bond, C. E.
2008-12-01
Studies focusing on the elicitation of information from experts are concentrated primarily in economics and world markets, medical practice and expert witness testimony. Expert elicitation theory has been applied in the natural sciences, most notably in the prediction of fluid flow in hydrological studies. In the geological sciences, expert elicitation has been limited to theoretical analysis, with studies focusing on the elicitation element (gaining expert opinion) rather than necessarily understanding the basis behind the expert view. In these cases experts are defined in a traditional sense, based for example on standing in the field, number of years of experience, number of peer-reviewed publications, or the expert's position in a company hierarchy or academia. Here, traditional indicators of expertise have been compared for their significance on effective seismic interpretation. Polytomous regression analysis has been used to assess the relative significance of length and type of experience on the outcome of a seismic interpretation exercise. Following the initial analysis, the techniques used by participants to interpret the seismic image were added as additional variables to the analysis. Specific technical skills and techniques were found to be more important for the effective geological interpretation of seismic data than the traditional indicators of expertise. The results of a seismic interpretation exercise, the techniques used to interpret the seismic data and the participants' prior experience have been combined and analysed to answer the question: who is an expert, and what defines one?
SEDA: A software package for the Statistical Earthquake Data Analysis
Lombardi, A. M.
2017-03-01
In this paper, the first version of the software SEDA (SEDAv1.0), designed to help seismologists statistically analyze earthquake data, is presented. The package consists of a user-friendly Matlab-based interface, which allows the user to interact easily with the application, and a computational core of Fortran codes, to guarantee maximum speed. The primary factor driving the development of SEDA is to guarantee research reproducibility, a growing movement among scientists that is strongly recommended by the most important scientific journals. SEDAv1.0 is mainly devoted to producing accurate and fast outputs. Less care has been taken over graphic appeal, which will be improved in the future. The main part of SEDAv1.0 is devoted to ETAS modeling. SEDAv1.0 contains a set of consistent tools for ETAS, allowing the estimation of parameters, the testing of the model on data, the simulation of catalogs, the identification of sequences and the calculation of forecasts. The peculiarities of the routines inside SEDAv1.0 are discussed in this paper. More specific details on the software are presented in the manual accompanying the program package.
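SEDA itself is written in Matlab and Fortran, and its routines are not reproduced here. For readers unfamiliar with ETAS, the sketch below evaluates the standard temporal ETAS conditional intensity (Ogata's form, with Omori-Utsu decay of aftershock triggering). The function name `etas_intensity`, the catalog, and all parameter values are hypothetical.

```python
import math
from typing import Sequence


def etas_intensity(t: float, times: Sequence[float], mags: Sequence[float],
                   mu: float, K: float, alpha: float, c: float, p: float,
                   m0: float) -> float:
    """Temporal ETAS conditional intensity (events per unit time) at time t.

    lambda(t) = mu + sum_{t_i < t} K * exp(alpha * (m_i - m0)) / (t - t_i + c)**p
    where mu is the background rate, m0 the reference (cut-off) magnitude,
    and the sum runs over all earlier events in the catalog.
    """
    rate = mu
    for ti, mi in zip(times, mags):
        if ti < t:
            # Omori-Utsu decay scaled by a magnitude-dependent productivity term.
            rate += K * math.exp(alpha * (mi - m0)) / (t - ti + c) ** p
    return rate


# Hypothetical catalog (times in days, magnitudes) and parameter values.
times = [0.0, 0.5, 2.0]
mags = [5.1, 3.4, 3.0]
params = dict(mu=0.2, K=0.02, alpha=1.8, c=0.01, p=1.1, m0=3.0)
before = etas_intensity(-1.0, times, mags, **params)  # no prior events: equals mu
after = etas_intensity(0.6, times, mags, **params)    # shortly after the M5.1 event
```

Parameter estimation, as offered by SEDA, would maximize the log-likelihood built from this intensity (the sum of log-intensities at event times minus its integral over the observation period); that step is not shown.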
Dozmorov, Mikhail G
2017-10-15
One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data. The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context. mikhail.dozmorov@vcuhealth.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved.
Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J; Ullmann, Jeremy F P; Janke, Andrew L
2013-01-01
Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objectives of the system are to (1) automate the direct data tiling, conversion, and pre-tiling of brain images from Medical Imaging NetCDF (MINC) and Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurements; and (3) display high-level images in three dimensions in real-world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation, using web-based protocols. M-DIP implements a three-level architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software can display biological imaging data at multiple zoom levels and increase their quality to meet users' expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services, using real-world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed from any network environment, such as a portable mobile or tablet device. In addition, this system, in combination with mobile applications, establishes a virtualization tool in the neuroinformatics field to speed up interpretation services.
Lee, Alexandra J; Chang, Ivan; Burel, Julie G; Lindestam Arlehamn, Cecilia S; Mandava, Aishwarya; Weiskopf, Daniela; Peters, Bjoern; Sette, Alessandro; Scheuermann, Richard H; Qian, Yu
2018-04-17
Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks to a cytometry lab adopting the data clustering approach for cell population identification in routine use. We found that combining recursive data filtering and clustering with constraints converted from the user's manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. The design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists by mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters that are otherwise difficult to resolve in a single run of the data clustering method because of the statistical interference of the irrelevant major clusters. Our experimental results showed that the proportions of the cell populations identified by DAFi, while consistent with those from expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and from nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided abrupt cut-offs at the boundaries. DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and
Advanced data analysis in neuroscience integrating statistical and computational models
Durstewitz, Daniel
2017-01-01
This book is intended for use in advanced graduate courses in statistics / machine learning, as well as for all experimental neuroscientists seeking to understand statistical methods at a deeper level, and theoretical neuroscientists with a limited background in statistics. It reviews almost all areas of applied statistics, from basic statistical estimation and test theory, linear and nonlinear approaches for regression and classification, to model selection and methods for dimensionality reduction, density estimation and unsupervised clustering. Its focus, however, is linear and nonlinear time series analysis from a dynamical systems perspective, on the basis of which it also aims to convey an understanding of the dynamical mechanisms that could have generated observed time series. Further, it integrates computational modeling of behavioral and neural dynamics with statistical estimation and hypothesis testing. In this way, computational models in neuroscience are not only explanatory frameworks, but become powerfu...
Enerdata statistical yearbook: "The key-data of energy worldwide". 1999 data
Kennedy, R R; Merry, A F
2011-09-01
Anaesthesia involves processing large amounts of information over time. One task of the anaesthetist is to detect substantive changes in physiological variables promptly and reliably. It has previously been demonstrated that a graphical trend display of historical data leads to more rapid detection of such changes. We examined the effect of a graphical indication of the magnitude of Trigg's Tracking Variable, a simple statistically based trend detection algorithm, on the accuracy and latency of the detection of changes in a micro-simulation. Ten anaesthetists each viewed 20 simulations with four variables displayed as the current value with a simple graphical trend display. Values for these variables were generated by a computer model and updated every second; after a period of stability, a change occurred to a new random value at least 10 units from baseline. In 50% of the simulations, an indication of the rate of change was given by a five-level graphical representation of the value of Trigg's Tracking Variable. Participants were asked to indicate when they thought a change was occurring. Changes were detected 10.9% faster with the trend indicator present (mean 13.1 [SD 3.1] cycles vs 14.6 [SD 3.4] cycles; 95% confidence interval 0.4 to 2.5 cycles; P = 0.013). There was no difference in accuracy of detection (median with trend detection 97% [interquartile range 95 to 100%], without trend detection 100% [98 to 100%]; P = 0.8). We conclude that simple statistical trend detection may speed detection of changes during routine anaesthesia, even when a graphical trend display is present.
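To make the algorithm concrete, the sketch below computes Trigg's tracking signal in its commonly used form: the exponentially smoothed forecast error divided by the exponentially smoothed absolute error, with forecasts from simple exponential smoothing. The smoothing constant (0.2), the function name, and the data are assumptions; the study's exact settings and its mapping to a five-level display are not given in the abstract.

```python
from typing import List, Sequence


def trigg_tracking_signal(xs: Sequence[float], alpha: float = 0.2) -> List[float]:
    """Trigg's tracking signal for each observation in xs.

    Forecasts come from simple exponential smoothing. The signal is the
    exponentially smoothed forecast error divided by the exponentially
    smoothed absolute error, so it lies in [-1, 1] (up to rounding); values
    near +1 or -1 indicate a sustained upward or downward change.
    """
    if not xs:
        return []
    forecast = xs[0]      # initialise the forecast at the first value
    smoothed_err = 0.0
    smoothed_abs = 0.0
    signal = []
    for x in xs:
        err = x - forecast
        smoothed_err = alpha * err + (1 - alpha) * smoothed_err
        smoothed_abs = alpha * abs(err) + (1 - alpha) * smoothed_abs
        signal.append(smoothed_err / smoothed_abs if smoothed_abs > 0 else 0.0)
        forecast += alpha * err  # update the exponential-smoothing forecast
    return signal


# Hypothetical variable: stable at 50, then a step change to 60.
series = [50.0] * 10 + [60.0] * 5
print([round(v, 2) for v in trigg_tracking_signal(series)])
```

Because the signal is a ratio, it is scale-free, which is one reason it suits a simple graphical indicator: a noisy but stable variable keeps the signal near zero, while a persistent shift drives it towards ±1.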
From Aggregation to Interpretation: How Assessors Judge Complex Data in a Competency-Based Portfolio
Oudkerk Pool, Andrea; Govaerts, Marjan J. B.; Jaarsma, Debbie A. D. C.; Driessen, Erik W.
2018-01-01
While portfolios are increasingly used to assess competence, the validity of such portfolio-based assessments has hitherto remained unconfirmed. The purpose of the present research is therefore to further our understanding of how assessors form judgments when interpreting the complex data included in a competency-based portfolio. Eighteen…
Walther, Joachim; Sochacka, Nicola W.; Pawley, Alice L.
2016-01-01
This article explores challenges and opportunities associated with sharing qualitative data in engineering education research. This exploration is theoretically informed by an existing framework of interpretive research quality with a focus on the concept of Communicative Validation. Drawing on practice anecdotes from the authors' work, the…
Qualitative Data Analysis and Interpretation in Counseling Psychology: Strategies for Best Practices
Yeh, Christine J.; Inman, Arpana G.
2007-01-01
This article presents an overview of various strategies and methods of engaging in qualitative data interpretations and analyses in counseling psychology. The authors explore the themes of self, culture, collaboration, circularity, trustworthiness, and evidence deconstruction from multiple qualitative methodologies. Commonalities and differences…
Interpreting Evidence-of-Learning: Educational Research in the Era of Big Data
Cope, Bill; Kalantzis, Mary
2015-01-01
In this article, we argue that big data can offer new opportunities and roles for educational researchers. In the traditional model of evidence-gathering and interpretation in education, researchers are independent observers, who pre-emptively create instruments of measurement, and insert these into the educational process in specialized times and…
Taylor, P. T.; Kis, K. I.; Wittmann, G.
2013-01-01
The ESA SWARM mission will have three Earth-orbiting, magnetometer-bearing satellites: one in a high orbit and two side by side in lower orbits. These latter satellites will record a horizontal magnetic gradient. To determine how these gradient measurements can be used to interpret large geologic units, we used ten years of CHAMP data to compute a horizontal gradient map over a section of southeastern Europe, with the goal of interpreting these data over the Pannonian Basin of Hungary.
/preference responses or ties in choice experiments. Food Quality and Preference, 23, 13–17) noted that this proportion can depend on the product category, have proposed that the expected proportion of preference responses within a given category be called an identicality norm, and have argued that knowledge of such norms is valuable for more complete interpretation of 2-Alternative Choice (2-AC) data. For instance, these norms can be used to indicate consumer segmentation even with non-replicated data. In this paper, we show that the statistical test suggested by Ennis and Ennis (2012a) behaves poorly and has too … when ingredient changes are considered for cost-reduction or health initiative purposes.
PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data.
Hart, Steven N; Moore, Raymond M; Zimmermann, Michael T; Oliver, Gavin R; Egan, Jan B; Bryce, Alan H; Kocher, Jean-Pierre A
2015-01-01
Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations, and we are now able to generate data faster than they can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways such as KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns of the annotation table). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. A second use case describes how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with the context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.
Interpretation of TLD data measured in the vicinity of nuclear power plants
It is shown that incorporating the location-specific characteristics of natural radiation into the interpretation of the surrounding measurements makes a valuable contribution to improving the measurement quality of thermoluminescent environmental dosimetry. This makes it possible to determine the net dose of additional man-made radiation (e.g. caused by the nuclear power plant) with better accuracy. The authors propose a method of analysing the measured results that enables the data measured during the evidence-finding phase to be included in the interpretation of the environmental monitoring TLD measurements.
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
Criminal victimization in Ukraine: analysis of statistical data
The article is based on an analysis of statistical data provided by the law-enforcement, judicial and other bodies of Ukraine. The analysis makes it possible to give an accurate quantitative picture of the current state of crime victimization in Ukraine and to characterize its basic features (level, rate, structure, dynamics, etc.).
Statistical Modelling of Wind Profiles - Data Analysis and Modelling
Sensitivity analysis of ranked data: from order statistics to quantiles
Heidergott, B.F.; Volk-Makarewicz, W.
2015-01-01
In this paper we provide the mathematical theory for sensitivity analysis of order statistics of continuous random variables, where the sensitivity is with respect to a distributional parameter. Sensitivity analysis of order statistics over a finite number of observations is discussed before
Applying Statistical Process Control to Clinical Data: An Illustration.
Pfadt, Al; And Others
1992-01-01
Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
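The abstract does not say which chart type the authors used. As a hedged example of the logic it describes (limits derived from a baseline phase, then used to judge later observations), the sketch below computes an individuals (XmR) chart, where sigma is estimated from the average moving range divided by d2 = 1.128, giving the familiar factor 2.66. The function names and the clinical data are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def xmr_limits(baseline: Sequence[float]) -> Tuple[float, float, float]:
    """Centre line and 3-sigma limits (centre, lower, upper) of an
    individuals (XmR) chart computed from baseline observations.

    Sigma is estimated as (average moving range) / 1.128, so the limits are
    centre +/- 2.66 * average moving range.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    centre = mean(baseline)
    mr_bar = mean(abs(b - a) for a, b in zip(baseline, baseline[1:]))
    return centre, centre - 2.66 * mr_bar, centre + 2.66 * mr_bar


def out_of_control(points: Sequence[float], lcl: float, ucl: float) -> List[int]:
    """Indices of points outside the control limits (the simplest signal rule)."""
    return [i for i, x in enumerate(points) if x < lcl or x > ucl]


# Hypothetical clinical data: daily counts of a target behaviour.
baseline = [5, 6, 5, 7, 6, 5, 6]
centre, lcl, ucl = xmr_limits(baseline)
treatment = [6, 9, 2]
flags = out_of_control(treatment, lcl, ucl)
```

In practice, additional run rules (for example, several consecutive points on one side of the centre line) are often used alongside the single-point rule shown here.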
Khan, Haseeb Ahmad
2004-01-01
The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-groups comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for tra...
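The abstract does not describe the tool's interface, so the following is only a generic, hypothetical sketch of a per-gene two-group comparison. It uses Welch's t test (an assumed choice, because it does not require equal group variances) and the Welch-Satterthwaite degrees of freedom, using only the Python standard library. Gene names and values are invented.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for comparing the means of two groups with possibly unequal variances."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two replicates")
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 expression values: gene -> (group 1 replicates, group 2 replicates).
expression = {
    "gene_A": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]),
    "gene_B": ([5.1, 4.9, 5.0], [5.2, 4.8, 5.1, 4.9]),
}
results = {g: welch_t(x, y) for g, (x, y) in expression.items()}
# Rank genes by |t|; p-values would come from a t distribution with df degrees
# of freedom (e.g. scipy.stats.t.sf), followed by a multiple-testing correction.
ranked = sorted(results, key=lambda g: abs(results[g][0]), reverse=True)
```

With thousands of genes tested at once, some form of multiple-testing control (such as a false discovery rate procedure) is normally applied before calling genes differentially expressed.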
EBprot: Statistical analysis of labeling-based quantitative proteomics data.
Koh, Hiromi W L; Swa, Hannah L F; Fermin, Damian; Ler, Siok Ghee; Gunaratne, Jayantha; Choi, Hyungwon
2015-08-01
Labeling-based proteomics is a powerful method for detection of differentially expressed proteins (DEPs). The current data analysis platform typically relies on protein-level ratios, which are obtained by summarizing peptide-level ratios for each protein. In shotgun proteomics, however, some proteins are quantified with more peptides than others, and this reproducibility information is not incorporated into the differential expression (DE) analysis. Here, we propose a novel probabilistic framework, EBprot, that directly models the peptide-protein hierarchy and rewards proteins with reproducible evidence of DE over multiple peptides. To evaluate its performance with known DE states, we conducted a simulation study showing that the peptide-level analysis of EBprot provides a better receiver-operating characteristic and more accurate estimation of false discovery rates than methods based on protein-level ratios. We also demonstrate superior classification performance of peptide-level EBprot analysis in a spike-in dataset. To illustrate the wide applicability of EBprot in different experimental designs, we applied EBprot to a dataset for lung cancer subtype analysis with biological replicates and to another dataset for time course phosphoproteome analysis of EGF-stimulated HeLa cells with multiplexed labeling. Through these examples, we show that the peptide-level analysis of EBprot is a robust alternative to the existing statistical methods for the DE analysis of labeling-based quantitative datasets. The software suite is freely available on the Sourceforge website http://ebprot.sourceforge.net/. All MS data have been deposited in the ProteomeXchange with identifier PXD001426 (http://proteomecentral.proteomexchange.org/dataset/PXD001426/). © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A statistical framework for differential network analysis from microarray data
Directory of Open Access Journals (Sweden)
Datta Somnath
2010-02-01
Background: It has long been known that genes do not act alone; rather, groups of genes act in concert during a biological process. Consequently, the expression levels of genes are dependent on each other. Experimental techniques to detect such interacting pairs of genes have been in place for quite some time. With the advent of microarray technology, newer computational techniques are being proposed to detect such interaction or association between gene expressions, leading to an association network. While most microarray analyses look for genes that are differentially expressed, it is potentially of greater significance to identify how entire association network structures change between two or more biological settings, for example normal versus diseased cell types. Results: We provide a recipe for conducting a differential analysis of networks constructed from microarray data under two experimental settings. At the core of our approach lies a connectivity score that represents the strength of genetic association or interaction between two genes. We use this score to propose formal statistical tests for each of the following queries: (i) whether the overall modular structures of the two networks are different, (ii) whether the connectivity of a particular set of "interesting genes" has changed between the two networks, and (iii) whether the connectivity of a given single gene has changed between the two networks. A number of examples of this score are provided. We applied our method to two types of simulated data: Gaussian networks and networks based on differential equations. We show that, for appropriate choices of the connectivity scores and tuning parameters, our method works well on simulated data. We also analyze a real data set involving normal versus heavy mice and identify an interesting set of genes that may play key roles in obesity. Conclusions: Examining changes in network structure can provide valuable information about the
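The single-gene query (iii) can be illustrated with a label-permutation test. This is a minimal sketch under stated assumptions, not the authors' method: the connectivity score is assumed to be the sum of absolute Pearson correlations between one gene and all other genes, and the function names are hypothetical.

```python
import numpy as np


def connectivity(expr: np.ndarray, gene: int) -> float:
    """Hypothetical connectivity score: sum of |Pearson r| between `gene` and every other gene.

    expr has shape (n_samples, n_genes).
    """
    corr = np.corrcoef(expr, rowvar=False)
    return float(np.sum(np.abs(corr[gene])) - 1.0)  # drop the self-correlation of 1


def connectivity_change_test(expr_a, expr_b, gene, n_perm=500, seed=0):
    """Permutation test of whether `gene` has different connectivity in two conditions.

    Returns (observed |difference|, permutation p-value with the add-one correction).
    """
    rng = np.random.default_rng(seed)
    observed = abs(connectivity(expr_a, gene) - connectivity(expr_b, gene))
    pooled = np.vstack([expr_a, expr_b])
    n_a = expr_a.shape[0]
    exceed = 0
    for _ in range(n_perm):
        # Reassign samples to the two conditions at random, keeping group sizes fixed.
        idx = rng.permutation(pooled.shape[0])
        stat = abs(connectivity(pooled[idx[:n_a]], gene) - connectivity(pooled[idx[n_a:]], gene))
        exceed += stat >= observed
    return observed, (exceed + 1) / (n_perm + 1)
```

Permuting sample labels preserves the within-sample dependence between genes, which is why it is a natural null for a network score; the paper's own scores and tuning parameters may differ.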
The interpretation of Charpy impact test data using hyper-logistic fitting functions
International Nuclear Information System (INIS)
Helm, J.L.
1996-01-01
The hyperbolic tangent function is used almost exclusively for computer-assisted curve fitting of Charpy impact test data. Unfortunately, there is no physical basis to justify the use of this function, and it cannot be generalized to test data that exhibit asymmetry. Using simple physical arguments, a semi-empirical model is derived and identified as a special case of the so-called hyper-logistic equation. Although one solution of this equation is the hyperbolic tangent, other, more physically interpretable solutions are provided. From the mathematics of the family of functions derived from the hyper-logistic equation, several useful generalizations are made such that asymmetric and wavy Charpy data can be physically interpreted.
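For reference, the conventional tanh fit of Charpy energy E against temperature T is commonly written E(T) = A + B·tanh((T − T0)/C), with A + B and A − B the upper and lower shelf energies. The sketch below (parameter names and example values are illustrative assumptions, not taken from the paper) shows why this form cannot represent asymmetric transitions: the curve is point-symmetric about (T0, A).

```python
import math


def tanh_charpy(t, lower_shelf, upper_shelf, t0, c):
    """Conventional tanh transition curve for Charpy impact energy.

    lower_shelf, upper_shelf: shelf energies (e.g. J); t0: mid-transition
    temperature; c: half-width of the transition region (same units as t).
    """
    a = 0.5 * (upper_shelf + lower_shelf)  # mid-transition energy
    b = 0.5 * (upper_shelf - lower_shelf)  # half the shelf-to-shelf span
    return a + b * math.tanh((t - t0) / c)


def symmetry_residual(d, **params):
    """E(t0 + d) + E(t0 - d) - 2 E(t0): zero for every d because tanh is odd."""
    t0 = params["t0"]
    return (tanh_charpy(t0 + d, **params) + tanh_charpy(t0 - d, **params)
            - 2.0 * tanh_charpy(t0, **params))
```

Any data whose lower-shelf approach differs in shape from the upper-shelf approach leaves a systematic residual against this curve, which is the limitation the hyper-logistic family is meant to remove.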
Chronic Obstructive Pulmonary Disease (COPD): Data and Statistics
... COPD Death Rates in the United States ... Ohio and Mississippi Rivers. ... COPD Prevalence in the United States ...
Use of Statistics for Data Evaluation in Environmental Radioactivity Measurements
International Nuclear Information System (INIS)
Sutarman
2001-01-01
Counting statistics provide a correction to environmental radioactivity measurement results. Statistics provides formulas, based on the Poisson distribution, to determine the standard deviation (S_B) and the minimum detectable concentration (MDC). Both formulas depend on the background count rate, counting time, counting efficiency, gamma intensity, and sample size. A long background counting time yields relatively low S_B and MDC, and hence relatively accurate measurement results. (author)
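The abstract does not reproduce its formulas. As an illustration only, the sketch below assumes S_B is the Poisson standard deviation of the background count rate and uses the widely cited Currie-style detection limit L_D = 2.71 + 4.65·sqrt(B) counts (B = expected background counts), divided by efficiency, gamma intensity, counting time, and sample size to give an MDC. The function names and example numbers are hypothetical.

```python
import math


def background_rate_sd(background_counts: float, background_time_s: float) -> float:
    """Poisson standard deviation of a background count rate (counts/s).

    For N recorded counts Var(N) = N, so the rate N / t has SD sqrt(N) / t.
    """
    return math.sqrt(background_counts) / background_time_s


def currie_mdc(background_rate_cps, count_time_s, efficiency, gamma_intensity, sample_size):
    """MDC assuming a Currie-style detection limit (5% false-positive and false-negative rates).

    Returns activity per unit of sample_size (e.g. Bq/kg when sample_size is in kg).
    """
    expected_background = background_rate_cps * count_time_s
    detection_limit_counts = 2.71 + 4.65 * math.sqrt(expected_background)
    return detection_limit_counts / (efficiency * gamma_intensity * count_time_s * sample_size)
```

With a background of 0.1 counts/s, counting for 1000 s gives S_B = 0.01 counts/s, while 10000 s gives about 0.0032 counts/s; with an assumed efficiency of 0.05, gamma intensity of 0.85, and a 1 kg sample, the MDC drops from about 1.16 to about 0.35 Bq/kg, matching the qualitative claim that longer counting lowers both quantities.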
Network similarity and statistical analysis of earthquake seismic data
Deyasi, Krishanu; Chakraborty, Abhijit; Banerjee, Anirban
2016-01-01
We study the structural similarity of earthquake networks constructed from seismic catalogs of different geographical regions. A hierarchical clustering of underlying undirected earthquake networks is shown using Jensen-Shannon divergence in graph spectra. The directed nature of links indicates that each earthquake network is strongly connected, which motivates us to study the directed version statistically. Our statistical analysis of each earthquake region identifies the hub regions. We cal...
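A spectral comparison of two networks can be sketched as follows. The choice of the normalized Laplacian spectrum, the 20-bin histogram, and base-2 logarithms are assumptions made for illustration; the abstract states only that Jensen-Shannon divergence in graph spectra was used, and the function names are hypothetical.

```python
import numpy as np


def normalized_laplacian_spectrum(adj):
    """Eigenvalues of L = I - D^{-1/2} A D^{-1/2} for an undirected graph (they lie in [0, 2])."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    # Isolated nodes get a zero row, following the usual convention.
    lap = np.diag(nonzero.astype(float)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.clip(np.linalg.eigvalsh(lap), 0.0, 2.0)  # clip round-off outside [0, 2]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def spectral_distance(adj_a, adj_b, bins=20):
    """JS divergence between the binned spectral densities of two networks."""
    hist_a, _ = np.histogram(normalized_laplacian_spectrum(adj_a), bins=bins, range=(0.0, 2.0))
    hist_b, _ = np.histogram(normalized_laplacian_spectrum(adj_b), bins=bins, range=(0.0, 2.0))
    return js_divergence(hist_a, hist_b)
```

The resulting pairwise distances between regional networks could then be fed to any standard hierarchical clustering routine; because the JS divergence is symmetric and bounded, it is convenient for that purpose.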
A Novel Approach to Asynchronous MVP Data Interpretation Based on Elliptical-Vectors
Kruglyakov, M.; Trofimov, I.; Korotaev, S.; Shneyer, V.; Popova, I.; Orekhova, D.; Scshors, Y.; Zhdanov, M. S.
2014-12-01
We suggest a novel approach to the interpretation of asynchronous magnetic-variation profiling (MVP) data. The standard method in MVP is based on interpreting the coefficients of the linear relation between the vertical and horizontal components of the measured magnetic field. From a mathematical point of view, this pair of linear coefficients is not a vector, which leads to significant difficulties in interpreting asynchronous data. Our approach allows us to treat such a pair of complex numbers as a special vector called an ellipse-vector (EV). By choosing particular definitions of complex length and direction, the basic relation of MVP can be considered as a dot product. This considerably simplifies the interpretation of asynchronous data. The EV is described by four real numbers: the values of the major and minor semiaxes, the angular direction of the major semiaxis, and the phase. The notation choice is motivated by historical reasons. Importantly, the EV's components have different sensitivities to the field sources and to local heterogeneities. Namely, the value of the major semiaxis and the angular direction are mostly determined by the field source and the normal cross-section, whereas the value of the minor semiaxis and the phase are responsive to local heterogeneities. Since the EV is the general form of a complex vector, the traditional Schmucker vectors can be expressed explicitly through its components. The proposed approach was successfully applied to interpreting the results of asynchronous measurements obtained in the Arctic Ocean at the "North Pole" drift stations in 1962-1976.
Accountability scale data from the Global Nuclear Fuels (GNF) fuel fabrication facility in Wilmington, NC has been collected and analyzed as a part of the Cylinder Accountability and Tracking System (CATS) field trial in 2009. The purpose of the data collection was to demonstrate an authentication method for safeguards applications, and the use of load cell data in cylinder accountability. The scale data was acquired using a commercial off-the-shelf communication server with authentication and encryption capabilities. The authenticated weight data was then analyzed to determine facility operating activities. The data allowed for the determination of the number of full and empty cylinders weighed and the respective weights along with other operational activities. Data authentication concepts, practices and methods, the details of the GNF weight data authentication implementation and scale data interpretation results will be presented.
Singamsetti, Rao
2007-01-01
This paper highlights some issues in the interpretation of statistical concepts and of results as taught in undergraduate business statistics courses. The use of modern technology in the classroom is shown to have increased the efficiency and ease of learning and teaching statistics. The importance of…
Azad Henareh Khalyani; William A. Gould; Eric Harmsen; Adam Terando; Maya Quinones; Jaime A. Collazo
2016-01-01
statistically downscaled general circulation models (GCMs) taking Puerto Rico as a test case. Two model selection/model averaging strategies were used: the average of all available GCMs and the average of the models that are able to...
Bersimis, Sotiris; Panaretos, John; Psarakis, Stelios
2005-01-01
In a discussion paper, Woodall and Montgomery [35] state that multivariate process control is one of the most rapidly developing sections of statistical process control. Nowadays, in industry, there are many situations in which the simultaneous monitoring or control of two or more related quality or process characteristics is necessary. Process monitoring problems in which several related variables are of interest are collectively known as Multivariate Statistical Process Control (MSPC). This ...
Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben
2017-09-15
Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure the statistical power to identify these variants with small effects. However, a research group can often only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or a few thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available, and the sample sizes associated with these summary statistics datasets are usually quite large. How to make the most efficient use of these abundant existing data resources remains largely an open question. In this study, we propose a statistical approach, IGESS, to increase the statistical power of identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods that take either individual-level data or summary statistics data as input. We applied IGESS to perform an integrative analysis of Crohn's disease data from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS. Contact: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Overview of sampling, analysis and data interpretation from sumps and pits
International Nuclear Information System (INIS)
Banks, J.C.; Banks, S.J.
1999-01-01
Aspects of sampling, environmental analysis and data interpretation for sumps and pits are discussed. According to the regulatory requirements of the Alberta Energy and Utilities Board (EUB) and Alberta Environmental Protection (AEP), if a sump or pit is affecting the surrounding environment, the situation must be assessed for remediation. An impact on the environment occurs when chemicals or compounds are introduced at a level significant enough to cause a chemical imbalance. The immediate goal in remediating an impacted site should be to contain the released chemical so that it does not move through the environment by dispersion, evaporation, capillary action, bioaccumulation or transfer to groundwater. The paper also discusses some of the key issues that should be considered in properly interpreting analytical data regarding spills and remedial action. 2 refs
Experimental data at high PT and its interpretation: the role of theory
Experiments relevant to planetary science are often performed under extreme conditions of pressure and temperature, which makes them technically difficult. The results are often difficult to interpret correctly, especially when experimental data are scarce and experimental trends are difficult to establish. Theory, while normally inferior in the precision of the data it delivers, is superior in providing the big picture and the details behind materials behavior. We consider the experiments performed for deuterium, Mo, and Fe, and demonstrate that when experimental data are verified by theory, significant insight can be gained. (Author) 26 refs.
Statistical Analysis of CMC Constituent and Processing Data
Fornuff, Jonathan
2004-01-01
Ceramic Matrix Composites (CMCs) are the next "big thing" in high-temperature structural materials. In the case of jet engines, it is widely believed that the metallic superalloys currently used for hot structures (combustors, shrouds, turbine vanes and blades) are nearing their potential limits of improvement. To allow increased turbine temperatures and thus higher engine efficiency, materials scientists have begun looking toward advanced CMCs, and SiC/SiC composites in particular. Ceramic composites provide greater strength-to-weight ratios at higher temperatures than metallic alloys, but at the same time pose greater challenges in microstructural optimization, which in turn increase the cost of the material as well as the risk of variability in the material's thermo-structural behavior. to model various potential CMC engine materials and examines the current variability in these properties due to variability in component processing conditions and constituent materials; then, to see how processing and constituent variations affect key strength, stiffness, and thermal properties of the finished components. Basically, this means trying to model variations in the component's behavior by knowing what went into creating it. inter-phase and manufactured by chemical vapor infiltration (CVI) and melt infiltration (MI) were considered. Examinations of (1) the percent constituents by volume, (2) the inter-phase thickness, (3) variations in the total porosity, and (4) variations in the chemical composition of the SiC fiber are carried out and modeled using various codes used here at NASA-Glenn (PCGina, NASALife, CEMCAN, etc.). The effects of these variations and the ranking of their respective influences on the various thermo-mechanical material properties are studied and compared to available test data. The properties of the materials as well as minor changes to geometry are then made to the computer model and the detrimental effects
Categorical and nonparametric data analysis: choosing the best statistical technique
Nussbaum, E Michael
2014-01-01
Featuring in-depth coverage of categorical and nonparametric statistics, this book provides a conceptual framework for choosing the most appropriate type of test in various research scenarios. Class tested at the University of Nevada, the book's clear explanations of the underlying assumptions, computer simulations, and Exploring the Concept boxes help reduce reader anxiety. Problems inspired by actual studies provide meaningful illustrations of the techniques. The underlying assumptions of each test and the factors that impact validity and statistical power are reviewed so readers can explain
Lintott, Paul R; Davison, Sophie; van Breda, John; Kubasiewicz, Laura; Dowse, David; Daisley, Jonathan; Haddy, Emily; Mathews, Fiona
2018-01-01
Acoustic surveys of bats are one of the techniques most commonly used by ecological practitioners. The results are used in Ecological Impact Assessments to assess the likely impacts of future developments on species that are widely protected in law, and to monitor developments after construction. However, there is no standardized methodology for analyzing or interpreting these data, which can make the assessment of the ecological value of a site very subjective. Comparisons of sites and projects are therefore difficult for ecologists and decision-makers, for example, when trying to identify the best location for a new road based on relative bat activity levels along alternative routes. Here, we present a new web-based, data-driven tool, Ecobat, which addresses the need for a more robust way of interpreting ecological data. Ecobat offers users an easy, standardized, and objective method for analyzing bat activity data. It allows ecological practitioners to compare bat activity data at regional and national scales and to generate a numerical indicator of the relative importance of a night's worth of bat activity. The tool is free and open-source; because the underlying algorithms are already developed, it could easily be expanded to new geographical regions and species. Data donation is required to ensure the robustness of the analyses; we use a positive feedback mechanism to encourage ecological practitioners to share data by providing in return high-quality, contextualized data analysis and graphical visualizations for direct use in ecological reports.
Development of small scale soft x-ray lasers: Aspects of data interpretation
The widespread application of soft x-ray laser technology is contingent on the development of small scale soft x-ray lasers that do not require large laser facilities. Progress in the development of soft x-ray lasers pumped by a Nd laser of energy 6-12J is reported below. Some aspects of data interpretation and gain measurements in such systems are discussed. 11 refs., 11 figs
Bias and sensitivity in the placement of fossil taxa resulting from interpretations of missing data.
Sansom, Robert S
2015-03-01
The utility of fossils in evolutionary contexts depends on their accurate placement in phylogenetic frameworks, yet intrinsic and widespread missing data make this problematic. The complex taphonomic processes occurring during fossilization can make it difficult to distinguish absence from non-preservation, especially in the case of exceptionally preserved soft-tissue fossils: is a particular morphological character (e.g., appendage, tentacle, or nerve) missing from a fossil because it was never there (phylogenetic absence), or did it just happen not to be preserved (taphonomic loss)? Missing data have not been tested in the context of the interpretation of non-present anatomy, nor in the context of directional shifts and biases in affinity. Here, complete taxa, both simulated and empirical, are subjected to data loss through the replacement of present entries (1s) with either missing (?s) or absent (0s) entries. Both cause taxa to drift down trees from their original position toward the root. Absolute thresholds at which downshift is significant are extremely low for introduced absences (two entries replaced, 6% of present characters). The opposite threshold in empirical fossil taxa is also found to be low; replacing two absent entries with presences causes fossil taxa to drift up trees. As such, only a few instances of non-preserved characters interpreted as absences will cause fossil organisms to be erroneously interpreted as more primitive than they were in life. This observed sensitivity to coding non-present morphology presents a problem for all evolutionary studies that attempt to use fossils to reconstruct rates of evolution or unlock sequences of morphological change. Stem-ward slippage, whereby fossilization processes cause organisms to appear artificially primitive, appears to be a ubiquitous and problematic phenomenon inherent to missing data, even when no decay biases exist. Absent characters therefore require explicit justification and taphonomic
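The data-loss step can be sketched as follows, assuming each taxon is coded as a string of '0', '1', and '?' states. The function name and the random choice of which present characters to recode are illustrative assumptions; re-running the phylogenetic analysis on the degraded matrix, which is where the reported downshift is measured, is not shown.

```python
import random


def degrade_taxon(row: str, n_replace: int, replacement: str, seed=None) -> str:
    """Return a copy of a character row with n_replace present states ('1') recoded.

    replacement='?' simulates non-preservation coded as missing data;
    replacement='0' simulates non-preservation coded as true absence.
    """
    if replacement not in ("?", "0"):
        raise ValueError("replacement must be '?' or '0'")
    present = [i for i, state in enumerate(row) if state == "1"]
    if n_replace > len(present):
        raise ValueError("not enough present characters to replace")
    chosen = random.Random(seed).sample(present, n_replace)  # distinct positions
    states = list(row)
    for i in chosen:
        states[i] = replacement
    return "".join(states)
```

Calling it with the same seed for both modes recodes the same positions, so the '?' and '0' treatments of a taxon can be compared like for like.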
Humans can acquire the statistical features of the external world and employ them to control behaviors. Some external events occur in harmony with an agent's action, and thus humans should also be able to acquire the statistical features between an action and its external outcome. We report that the acquired action-outcome statistical features alter the visual appearance of the action outcome. Pressing either of two assigned keys triggered visual motion whose direction was statistically biased either upward or downward, and observers judged the stimulus motion direction. Points of subjective equality (PSEs) for judging motion direction were shifted repulsively from the mean of the distribution associated with each key. Our Bayesian model accounted for the PSE shifts, indicating optimal acquisition of the action-effect statistical relation. The PSE shifts were moderately attenuated when the action-outcome contingency was reduced, and the Bayesian model again accounted for the attenuated shifts. On the other hand, when the action-outcome contiguity was greatly reduced, the PSE shifts were greatly attenuated; however, the Bayesian model could not account for these shifts. The results indicate that visual appearance can be modified by prediction based on the optimal acquisition of the action-effect causal relation.
Bayesian statistical analysis of censored data in geotechnical engineering
The geotechnical engineer is often faced with the problem of how to assess the statistical properties of a soil parameter on the basis of a sample measured in-situ or in the laboratory with the defect that some values have been replaced by interval bounds because the corresponding soil parameter values...
Conducting tests for statistically significant differences using forest inventory data
James A. Westfall; Scott A. Pugh; John W. Coulston
2013-01-01
Many forest inventory and monitoring programs are based on a sample of ground plots from which estimates of forest resources are derived. In addition to evaluating metrics such as number of trees or amount of cubic wood volume, it is often desirable to make comparisons between resource attributes. To properly conduct statistical tests for differences, it is imperative...
Extreme value theory and statistics for heavy tail data
CORDSPW - Windows computer program package for graphical interpretation of CORD-2 data
The CORD-2 package, developed at the Jozef Stefan Institute, enables determination of the core power distribution and reactivity. Core distribution data generated during the calculation process are stored in CORlib files. The CORDSP code, which is part of the CORD-2 package, displays and compares data contained in CORlib files. Because it runs in the DOS environment, it has several limitations in presenting the desired data. The CORDSPW package runs in the Windows environment and offers better graphical interpretation of the CORlib data. Core distributions can be displayed, compared, rewritten to new files and sent to the printer. The user can select the appropriate display of the presented data, such as core symmetry, colour and fonts. Core radial and axial distributions can be presented and compared. There are several options to store and print data; the user can choose between standard ASCII and graphical JPG format. (author)
Gönci, Balázs; Németh, Valéria; Balogh, Emeric; Szabó, Bálint; Dénes, Ádám; Környei, Zsuzsanna; Vicsek, Tamás
2010-12-20
Because of its relevance to everyday life, the spreading of viral infections has been of central interest in a variety of scientific communities involved in fighting, preventing and theoretically interpreting epidemic processes. Recent large-scale observations have resulted in major discoveries concerning the overall features of the spreading process in systems with highly mobile susceptible units, but virtually no data are available about observations of infection spreading for a very large number of immobile units. Here we present the first detailed quantitative documentation of percolation-type viral epidemics in a highly reproducible in vitro system consisting of tens of thousands of virtually motionless cells. We use a confluent astroglial monolayer in a Petri dish and induce productive infection in a limited number of cells with a genetically modified herpesvirus strain. This approach allows extremely high-resolution tracking of the spatio-temporal development of the epidemic. We show that a simple model is capable of reproducing the basic features of our observations, i.e., the observed behaviour is likely to be applicable to many different kinds of systems. Statistical-physics-inspired approaches to our data, such as the fractal dimension of the infected clusters as well as their size distribution, seem to fit into a percolation-theory-based interpretation. We suggest that our observations may be used to model epidemics in more complex systems, which are difficult to study in isolation.
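One of the statistical-physics measures mentioned, the fractal dimension of infected clusters, is commonly estimated by box counting. The sketch below assumes the infection pattern is available as a 2-D boolean grid of cell states; the grid representation, box sizes, and function name are assumptions, not the authors' pipeline.

```python
import numpy as np


def box_counting_dimension(occupied, box_sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting (fractal) dimension of a 2-D binary pattern.

    occupied: 2-D boolean array, True where a cell is infected.
    For each box size s, count the s-by-s boxes containing at least one
    occupied cell, then fit log N(s) against log(1/s); the slope is the estimate.
    """
    grid = np.asarray(occupied, dtype=bool)
    counts = []
    for s in box_sizes:
        h, w = (grid.shape[0] // s) * s, (grid.shape[1] // s) * s  # trim to a multiple of s
        boxes = grid[:h, :w].reshape(h // s, s, w // s, s).any(axis=(1, 3))
        counts.append(np.count_nonzero(boxes))
    # Every box size must hit at least one occupied cell, otherwise log(0) is undefined.
    slope, _intercept = np.polyfit(np.log(1.0 / np.asarray(box_sizes, dtype=float)),
                                   np.log(counts), 1)
    return float(slope)
```

A filled region gives a dimension near 2 and a straight line near 1; percolation clusters at threshold are expected to fall in between, which is the kind of comparison the abstract alludes to.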
Inversion interpretation of the mise-a-la-masse data; Denryu den`i ho data no inversion kaiseki
A program was developed for the inversion interpretation of mise-a-la-masse data and was applied to a numerical model experiment and to data obtained from an actual survey. In developing this program, a program that calculates, by finite difference approximation, the potential produced by a linear current source was used, and studies were made of forward interpretation, inversion of the acquired apparent resistivity data, comparison with the true solution, accuracy and tendencies, and limitations. In the simulation of a horizontal two-layer model, the parameter values converged after 20 iterations with a deviation of 1% or less. The program was applied to survey data from the Hatchobara district, Oita Prefecture, using a model in which the target area was divided into 5 parts from east to west and into 2 in depth. The results suggested a large-scale low-resistivity body deep underground in the southeastern part of the investigated area. Furthermore, a feature was detected toward the east-northeast suggesting an electrical structure continuous in depth and a fault-like structure discontinuous in the transverse direction. 7 refs., 9 figs.