Equivalent statistics and data interpretation.
Francis, Gregory
2017-08-01
Recent reform efforts in psychological science have led to a plethora of choices for scientists to analyze their data. A scientist making an inference about their data must now decide whether to report a p value, summarize the data with a standardized effect size and its confidence interval, report a Bayes Factor, or use other model comparison methods. To make good choices among these options, it is necessary for researchers to understand the characteristics of the various statistics used by the different analysis frameworks. Toward that end, this paper makes two contributions. First, it shows that for the case of a two-sample t test with known sample sizes, many different summary statistics are mathematically equivalent in the sense that they are based on the very same information in the data set. When the sample sizes are known, the p value provides as much information about a data set as the confidence interval of Cohen's d or a JZS Bayes factor. Second, this equivalence means that different analysis methods differ only in their interpretation of the empirical data. At first glance, it might seem that mathematical equivalence of the statistics suggests that it does not matter much which statistic is reported, but the opposite is true because the appropriateness of a reported statistic is relative to the inference it promotes. Accordingly, scientists should choose an analysis method appropriate for their scientific investigation. A direct comparison of the different inferential frameworks provides some guidance for scientists to make good choices and improve scientific practice.
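The equivalence described in the abstract above can be illustrated with a minimal Python sketch. It assumes an equal-variance two-sample t test, and the function names and example numbers are illustrative, not taken from the paper. Given the sample sizes, the t statistic, Cohen's d, and the two-sided p value can each be computed from the others.

```python
import math

def cohens_d_from_t(t, n1, n2):
    """Cohen's d implied by a two-sample t statistic and the sample sizes."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)

def t_from_cohens_d(d, n1, n2):
    """Inverse mapping: recover t from Cohen's d and the sample sizes."""
    return d / math.sqrt(1.0 / n1 + 1.0 / n2)

def t_pdf(x, df):
    """Density of Student's t distribution (lgamma avoids overflow for large df)."""
    log_c = math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2.0)

def two_sided_p(t, df, steps=20000):
    """Two-sided p value via Simpson's rule on the density over [0, |t|]."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3.0
    return max(0.0, 1.0 - 2.0 * area)

# Hypothetical example: t = 2.0 with 20 observations per group.
n1, n2 = 20, 20
t = 2.0
d = cohens_d_from_t(t, n1, n2)
p = two_sided_p(t, n1 + n2 - 2)
```

Because each mapping is one-to-one for fixed sample sizes, reporting any one of these values lets a reader recover the others; what differs is the inference each statistic invites.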
Statistics and Data Interpretation for Social Work
Rosenthal, James
2011-01-01
"Without question, this text will be the most authoritative source of information on statistics in the human services. From my point of view, it is a definitive work that combines a rigorous pedagogy with a down to earth (commonsense) exploration of the complex and difficult issues in data analysis (statistics) and interpretation. I welcome its publication.". -Praise for the First Edition. Written by a social worker for social work students, this is a nuts and bolts guide to statistics that presents complex calculations and concepts in clear, easy-to-understand language. It includes
Statistical transformation and the interpretation of inpatient glucose control data.
Saulnier, George E; Castro, Janna C; Cook, Curtiss B
2014-03-01
To introduce a statistical method of assessing hospital-based non-intensive care unit (non-ICU) inpatient glucose control. Point-of-care blood glucose (POC-BG) data from hospital non-ICUs were extracted for January 1 through December 31, 2011. Glucose data distribution was examined before and after Box-Cox transformations and compared to normality. Different subsets of data were used to establish upper and lower control limits, and exponentially weighted moving average (EWMA) control charts were constructed from June, July, and October data as examples to determine if out-of-control events were identified differently in nontransformed versus transformed data. A total of 36,381 POC-BG values were analyzed. In all 3 monthly test samples, glucose distributions in nontransformed data were skewed but approached a normal distribution once transformed. Interpretation of out-of-control events from EWMA control chart analyses also revealed differences. In the June test data, an out-of-control process was identified at sample 53 with nontransformed data, whereas the transformed data remained in control for the duration of the observed period. Analysis of July data demonstrated an out-of-control process sooner in the transformed (sample 55) than nontransformed (sample 111) data, whereas for October, transformed data remained in control longer than nontransformed data. Statistical transformations increase the normal behavior of inpatient non-ICU glycemic data sets. The decision to transform glucose data could influence the interpretation and conclusions about the status of inpatient glycemic control. Further study is required to determine whether transformed versus nontransformed data influence clinical decisions or evaluation of interventions.
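The two techniques named in the abstract above, the Box-Cox transformation and the EWMA control chart, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the smoothing weight `lam = 0.2` and limit width `width = 3.0` are common textbook defaults, and the in-control mean and standard deviation are assumed to be known from a baseline data subset.

```python
import math

def box_cox(x, lmbda):
    """Box-Cox transform of a positive value x with power parameter lmbda."""
    if x <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda

def first_out_of_control(values, mean, sigma, lam=0.2, width=3.0):
    """Return the 1-based index of the first EWMA point outside the
    control limits, or None if the process stays in control.

    mean and sigma describe the in-control process (assumed known here);
    lam and width are illustrative defaults, not values from the study.
    """
    z = mean  # the EWMA statistic starts at the in-control mean
    for i, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        # Time-varying limit half-width for the EWMA statistic.
        half = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        )
        if z > mean + half or z < mean - half:
            return i
    return None

# Hypothetical series: stable values followed by a sustained upward shift.
series = [100.0] * 5 + [140.0] * 10
signal_at = first_out_of_control(series, mean=100.0, sigma=10.0)
```

To compare transformed and nontransformed charts in the spirit of the abstract, the same series would be passed through `box_cox` first, with the baseline mean and standard deviation recomputed on the transformed scale.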
Statistical Literacy: High School Students in Reading, Interpreting and Presenting Data
Hafiyusholeh, M.; Budayasa, K.; Siswono, T. Y. E.
2018-01-01
One of the foundations for high school students in statistics is the ability to read data, present data in the form of tables and diagrams, and interpret them. The purpose of this study is to describe high school students' competencies in reading, interpreting and presenting data. The subjects were male and female students with high levels of mathematical ability. Data were collected through task formulation and analyzed by reducing, presenting and verifying the data. Results showed that the students read the data based on explicit information in the diagram, such as explaining the points in the diagram as the relation between the x and y axes and determining the simple trend of a graph, including the maximum and minimum points. In interpreting and summarizing the data, both subjects paid attention to general data trends and used them to predict increases or decreases in data. The male student estimated the (n+1)th value of the weight data by using the mode of the data, while the female student estimated the weight by using the average. The male student tended not to consider the characteristics of the data, while the female student considered the characteristics of the data more carefully.
Misuse of statistics in the interpretation of data on low-level radiation
International Nuclear Information System (INIS)
Hamilton, L.D.
1982-01-01
Four misuses of statistics in the interpretation of data on low-level radiation are reviewed: (1) post-hoc analysis and aggregation of data leading to faulty conclusions in the reanalysis of genetic effects of the atomic bomb, and premature conclusions on the Portsmouth Naval Shipyard data; (2) inappropriate adjustment for age and ignoring differences between urban and rural areas leading to potentially spurious increase in incidence of cancer at Rocky Flats; (3) hazard of summary statistics based on ill-conditioned individual rates leading to spurious association between childhood leukemia and fallout in Utah; and (4) the danger of prematurely published preliminary work with inadequate consideration of epidemiological problems - censored data - leading to inappropriate conclusions, needless alarm at the Portsmouth Naval Shipyard, and diversion of scarce research funds.
Statistics translated a step-by-step guide to analyzing and interpreting data
Terrell, Steven R
2012-01-01
Written in a humorous and encouraging style, this text shows how the most common statistical tools can be used to answer interesting real-world questions, presented as mysteries to be solved. Engaging research examples lead the reader through a series of six steps, from identifying a researchable problem to stating a hypothesis, identifying independent and dependent variables, and selecting and interpreting appropriate statistical tests. All techniques are demonstrated both manually and with the help of SPSS software. The book provides students and others who may need to read and interpret sta
Saulnier, George E; Castro, Janna C; Cook, Curtiss B
2014-05-01
Glucose control can be problematic in critically ill patients. We evaluated the impact of statistical transformation on interpretation of intensive care unit inpatient glucose control data. Point-of-care blood glucose (POC-BG) data derived from patients in the intensive care unit for 2011 were obtained. Box-Cox transformation of POC-BG measurements was performed, and distribution of data was determined before and after transformation. Different data subsets were used to establish statistical upper and lower control limits. Exponentially weighted moving average (EWMA) control charts constructed from April, October, and November data determined whether out-of-control events could be identified differently in transformed versus nontransformed data. A total of 8679 POC-BG values were analyzed. POC-BG distributions in nontransformed data were skewed but approached normality after transformation. EWMA control charts revealed differences in projected detection of out-of-control events. In April, an out-of-control process resulting in the lower control limit being exceeded was identified at sample 116 in nontransformed data but not in transformed data. October transformed data detected an out-of-control process exceeding the upper control limit at sample 27 that was not detected in nontransformed data. Nontransformed November results remained in control, but transformation identified an out-of-control event less than 10 samples into the observation period. Using statistical methods to assess population-based glucose control in the intensive care unit could alter conclusions about the effectiveness of care processes for managing hyperglycemia. Further study is required to determine whether transformed versus nontransformed data change clinical decisions about the interpretation of care or intervention results. © 2014 Diabetes Technology Society.
Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall
2016-01-01
Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well-recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to analysis of medical time series data: (1) classical statistical approach, such as the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center over 10 years. The main outcomes of our comparison are cervical cancer risk assessments produced by the three approaches. However, our analysis discusses also several aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, results interpretation, and model validation. Our study shows that the Bayesian approach is (1) much more flexible in terms of modeling effort, and (2) it offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
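As a concrete reference point for the classical side of the comparison above, here is a minimal Python sketch of the Kaplan-Meier product-limit estimator. The function name and the toy follow-up data are hypothetical; this is not the study's code or data.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events at time t
        leaving = 0    # all subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

# Hypothetical data: four subjects, one censored at time 2.
curve = kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1])
```

The estimator yields one population-level curve; the individualized risk assessment that the abstract attributes to the dynamic Bayesian network would require conditioning on each patient's history, which this sketch does not attempt.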
Does environmental data collection need statistics?
Pulles, M.P.J.
1998-01-01
The term 'statistics' with reference to environmental science and policymaking might mean different things: the development of statistical methodology, the methodology developed by statisticians to interpret and analyse such data, or the statistical data that are needed to understand environmental
International Nuclear Information System (INIS)
Tadaki, Kohtaro
2010-01-01
The statistical mechanical interpretation of algorithmic information theory (AIT, for short) was introduced and developed in our earlier works [K. Tadaki, Local Proceedings of CiE 2008, pp. 425-434, 2008] and [K. Tadaki, Proceedings of LFCS'09, Springer's LNCS, vol. 5407, pp. 422-440, 2009], where we introduced the notion of thermodynamic quantities, such as partition function Z(T), free energy F(T), energy E(T), statistical mechanical entropy S(T), and specific heat C(T), into AIT. We then discovered that, in the interpretation, the temperature T equals the partial randomness of the values of all these thermodynamic quantities, where the notion of partial randomness is a stronger representation of the compression rate by means of program-size complexity. Furthermore, we showed that this situation holds for the temperature T itself, which is one of the most typical thermodynamic quantities. Namely, we showed that, for each of the thermodynamic quantities Z(T), F(T), E(T), and S(T) above, the computability of its value at temperature T gives a sufficient condition for T ∈ (0,1) to satisfy the condition that the partial randomness of T equals T. In this paper, based on a physical argument on the same level of mathematical strictness as normal statistical mechanics in physics, we develop a total statistical mechanical interpretation of AIT which actualizes a perfect correspondence to normal statistical mechanics. We do this by identifying a microcanonical ensemble in the framework of AIT. As a result, we clarify the statistical mechanical meaning of the thermodynamic quantities of AIT.
Statistical interpretation of geochemical data
International Nuclear Information System (INIS)
Carambula, M.
1990-01-01
Statistical results were obtained from geochemical research covering the following four aerial photographs: Zapican, Carape, Las Canias, and Alferez. In total, 3020 samples were analyzed for 22 chemical elements using plasma emission spectrometry methods.
Theoretical, analytical, and statistical interpretation of environmental data
International Nuclear Information System (INIS)
Lombard, S.M.
1974-01-01
The reliability of data from radiochemical analyses of environmental samples cannot be determined from nuclear counting statistics alone. The rigorous application of the principles of propagation of errors, an understanding of the physics and chemistry of the species of interest in the environment, and the application of information from research on the analytical procedure are all necessary for a valid estimation of the errors associated with analytical results. The specific case of the determination of plutonium in soil is considered in terms of analytical problems and data reliability. (U.S.)
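The point made above, that counting statistics are only one term in the error budget, can be sketched with standard propagation-of-errors formulas. The Python below is a minimal illustration; the counts, counting times, and the extra relative uncertainties (for example chemical yield and tracer calibration) are hypothetical numbers, not values from the report.

```python
import math

def net_rate_with_uncertainty(gross_counts, gross_time, bkg_counts, bkg_time):
    """Net count rate and its standard deviation from Poisson counting
    statistics alone (gross and background counted separately)."""
    net = gross_counts / gross_time - bkg_counts / bkg_time
    sigma = math.sqrt(gross_counts / gross_time ** 2 + bkg_counts / bkg_time ** 2)
    return net, sigma

def combine_relative(*relative_uncertainties):
    """Combine independent relative uncertainties in quadrature."""
    return math.sqrt(sum(r * r for r in relative_uncertainties))

# Hypothetical measurement: 400 gross counts and 100 background counts,
# each collected over 100 s.
net, sigma_counting = net_rate_with_uncertainty(400, 100.0, 100, 100.0)
rel_counting = sigma_counting / net

# Hypothetical additional independent terms: 5% chemical yield, 3% calibration.
rel_total = combine_relative(rel_counting, 0.05, 0.03)
```

With these illustrative numbers the combined relative uncertainty is larger than the counting term alone, which is the abstract's argument in miniature.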
Handbook of univariate and multivariate data analysis and interpretation with SPSS
Ho, Robert
2006-01-01
Many statistics texts tend to focus more on the theory and mathematics underlying statistical tests than on their applications and interpretation. This can leave readers with little understanding of how to apply statistical tests or how to interpret their findings. While the SPSS statistical software has done much to alleviate the frustrations of social science professionals and students who must analyze data, they still face daunting challenges in selecting the proper tests, executing the tests, and interpreting the test results. With emphasis firmly on such practical matters, this handbook se
Application of Statistical Tools for Data Analysis and Interpretation in Rice Plant Pathology
Directory of Open Access Journals (Sweden)
Parsuram Nayak
2018-01-01
There has been a significant advancement in the application of statistical tools in plant pathology during the past four decades. These tools include multivariate analysis of disease dynamics involving principal component analysis, cluster analysis, factor analysis, pattern analysis, discriminant analysis, multivariate analysis of variance, correspondence analysis, canonical correlation analysis, redundancy analysis, genetic diversity analysis, and stability analysis, which involves joint regression, additive main effects and multiplicative interactions, and genotype-by-environment interaction biplot analysis. The advanced statistical tools, such as non-parametric analysis of disease association, meta-analysis, Bayesian analysis, and decision theory, take an important place in analysis of disease dynamics. Disease forecasting methods by simulation models for plant diseases have great potential in practical disease control strategies. Common mathematical tools such as monomolecular, exponential, logistic, Gompertz and linked differential equations take an important place in growth curve analysis of disease epidemics. The highly informative means of displaying a range of numerical data through construction of box and whisker plots has been suggested. The probable applications of recent advanced tools of linear and non-linear mixed models like the linear mixed model, generalized linear model, and generalized linear mixed models have been presented. The most recent technologies such as micro-array analysis, though cost effective, provide estimates of gene expressions for thousands of genes simultaneously and need attention by the molecular biologists. Some of these advanced tools can be well applied in different branches of rice research, including crop improvement, crop production, crop protection, social sciences as well as agricultural engineering. Rice research scientists should take advantage of these new opportunities adequately in
Advanced statistics to improve the physical interpretation of atomization processes
International Nuclear Information System (INIS)
Panão, Miguel R.O.; Radu, Lucian
2013-01-01
Highlights: ► Finite pdf mixtures improve physical interpretation of sprays. ► Bayesian approach using MCMC algorithm is used to find the best finite mixture. ► Statistical method identifies multiple droplet clusters in a spray. ► Multiple drop clusters eventually associated with multiple atomization mechanisms. ► Spray described by drop size distribution and not only its moments. Abstract: This paper reports an analysis of the physics of atomization processes using advanced statistical tools, namely finite mixtures of probability density functions, whose best fit is found using a Bayesian approach based on a Markov chain Monte Carlo (MCMC) algorithm. This approach takes into account possible multimodality and heterogeneities in drop size distributions. Therefore, it provides information about the complete probability density function of multimodal drop size distributions and allows the identification of subgroups in the heterogeneous data. This improves the physical interpretation of atomization processes. Moreover, it also overcomes the limitations induced by analyzing the spray droplet characteristics through moments alone, in particular the masking of the different natures of droplet formation. Finally, the method is applied to physically interpret a case study based on multijet atomization processes.
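To make the idea of a finite mixture of probability density functions concrete, the following Python sketch evaluates a two-component mixture and the probability that a droplet belongs to each component. Everything specific here is an assumption for illustration: the log-normal component family, the weights, and the parameters are hypothetical, and the sketch only evaluates a given mixture; it does not perform the Bayesian MCMC fitting described in the abstract.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Log-normal density with log-scale mean mu and log-scale std sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, components):
    """Density of a finite mixture; components is a list of (weight, mu, sigma)."""
    return sum(w * lognormal_pdf(x, mu, s) for w, mu, s in components)

def membership(x, components):
    """Probability that a droplet of size x belongs to each component."""
    parts = [w * lognormal_pdf(x, mu, s) for w, mu, s in components]
    total = sum(parts)
    return [p / total for p in parts]

# Hypothetical bimodal drop size distribution (sizes in micrometres):
# a cluster of small droplets near 20 and a cluster of large ones near 80.
components = [
    (0.6, math.log(20.0), 0.30),
    (0.4, math.log(80.0), 0.25),
]

small = membership(20.0, components)
large = membership(80.0, components)
```

Membership probabilities of this kind are what allow droplets to be grouped into clusters, which the abstract then relates to distinct atomization mechanisms.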
Combinatorial interpretation of Haldane-Wu fractional exclusion statistics.
Aringazin, A K; Mazhitov, M I
2002-08-01
Assuming that the maximal allowed number of identical particles in a state is an integer parameter, $q$, we derive the statistical weight and analyze the associated equation that defines the statistical distribution. The derived distribution covers Fermi-Dirac and Bose-Einstein ones in the particular cases $q=1$ and $q \to \infty$ ($n_i/q \to 1$), respectively. We show that the derived statistical weight provides a natural combinatorial interpretation of Haldane-Wu fractional exclusion statistics, and present exact solutions of the distribution equation.
HistFitter software framework for statistical data analysis
Baak, M.; Côte, D.; Koutsman, A.; Lorenz, J.; Short, D.
2015-01-01
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fitted to data and interpreted with statistical tests. A key innovation of HistFitter is its design, which is rooted in core analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its very fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with mu...
Method for statistical data analysis of multivariate observations
Gnanadesikan, R
1997-01-01
A practical guide for multivariate statistical techniques, now updated and revised. In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte
Structural interpretation of seismic data and inherent uncertainties
Bond, Clare
2013-04-01
Geoscience is perhaps unique in its reliance on incomplete datasets and building knowledge from their interpretation. This interpretation basis for the science is fundamental at all levels, from creation of a geological map to interpretation of remotely sensed data. To teach and understand better the uncertainties in dealing with incomplete data we need to understand the strategies individual practitioners deploy that make them effective interpreters. The nature of interpretation is such that the interpreter needs to use their cognitive ability in the analysis of the data to propose a sensible solution in their final output that is consistent not only with the original data but also with other knowledge and understanding. In a series of experiments Bond et al. (2007, 2008, 2011, 2012) investigated the strategies and pitfalls of expert and non-expert interpretation of seismic images. These studies focused on large numbers of participants to provide a statistically sound basis for analysis of the results. The outcome of these experiments showed that a wide variety of conceptual models were applied to single seismic datasets, highlighting not only spatial variations in fault placements, but also whether interpreters thought faults existed at all, or had the same sense of movement. Further, statistical analysis suggests that the strategies an interpreter employs are more important than expert knowledge per se in developing successful interpretations. Experts are successful because of their application of these techniques. A new set of experiments focuses on a small number of experts to determine how they use their cognitive and reasoning skills in the interpretation of 2D seismic profiles. Live video and practitioner commentary were used to track the evolving interpretation and to gain insight on their decision processes. The outputs of the study allow us to create an educational resource of expert interpretation through online video footage and commentary with
Hunting Down Interpretations of the HERA Large-$Q^{2}$ data
Ellis, John R.
1999-01-01
Possible interpretations of the HERA large-Q^2 data are reviewed briefly. The possibility of statistical fluctuations cannot be ruled out, and it seems premature to argue that the H1 and ZEUS anomalies are incompatible. The data cannot be explained away by modifications of parton distributions, nor do contact interactions help. A leptoquark interpretation would need a large tau-q branching ratio. Several R-violating squark interpretations are still viable despite all the constraints, and offer interesting experimental signatures, but please do not hold your breath.
Alternative interpretations of statistics on health effects of low-level radiation
International Nuclear Information System (INIS)
Hamilton, L.D.
1983-01-01
Four examples of the interpretation of statistics of data on low-level radiation are reviewed: (a) genetic effects of the atomic bombs at Hiroshima and Nagasaki, (b) cancer at Rocky Flats, (c) childhood leukemia and fallout in Utah, and (d) cancer among workers at the Portsmouth Naval Shipyard. Aggregation of data, adjustment for age, and other problems related to the determination of health effects of low-level radiation are discussed. Troublesome issues related to post hoc analysis are considered
HistFitter software framework for statistical data analysis
Energy Technology Data Exchange (ETDEWEB)
Baak, M. [CERN, Geneva (Switzerland); Besjes, G.J. [Radboud University Nijmegen, Nijmegen (Netherlands); Nikhef, Amsterdam (Netherlands); Cote, D. [University of Texas, Arlington (United States); Koutsman, A. [TRIUMF, Vancouver (Canada); Lorenz, J. [Ludwig-Maximilians-Universitaet Muenchen, Munich (Germany); Excellence Cluster Universe, Garching (Germany); Short, D. [University of Oxford, Oxford (United Kingdom)
2015-04-15
We present a software framework for statistical data analysis, called HistFitter, that has been used extensively by the ATLAS Collaboration to analyze big datasets originating from proton-proton collisions at the Large Hadron Collider at CERN. Since 2012 HistFitter has been the standard statistical tool in searches for supersymmetric particles performed by ATLAS. HistFitter is a programmable and flexible framework to build, book-keep, fit, interpret and present results of data models of nearly arbitrary complexity. Starting from an object-oriented configuration, defined by users, the framework builds probability density functions that are automatically fit to data and interpreted with statistical tests. Internally HistFitter uses the statistics packages RooStats and HistFactory. A key innovation of HistFitter is its design, which is rooted in analysis strategies of particle physics. The concepts of control, signal and validation regions are woven into its fabric. These are progressively treated with statistically rigorous built-in methods. Being capable of working with multiple models at once that describe the data, HistFitter introduces an additional level of abstraction that allows for easy bookkeeping, manipulation and testing of large collections of signal hypotheses. Finally, HistFitter provides a collection of tools to present results with publication quality style through a simple command-line interface. (orig.)
Directory of Open Access Journals (Sweden)
Elżbieta Biernat
2014-12-01
Background: The aim of this paper is to assess whether basic descriptive statistics are sufficient to interpret the data on physical activity of Poles within the occupational domain of life. Material and Methods: The study group consisted of 964 randomly selected Polish working professionals. The long version of the International Physical Activity Questionnaire (IPAQ) was used. Descriptive statistics included characteristics of variables using: mean (M), median (Me), maximal and minimal values (max–min.), standard deviation (SD) and percentile values. Statistical inference was based on the comparison of variables with the significance level of 0.05 (Kruskal-Wallis and Pearson's chi-squared tests). Results: Occupational physical activity (OPA) was declared by 46.4% of respondents (vigorous – 23.5%, moderate – 30.2%, walking – 39.5%). The total OPA amounted to 2751.1 MET-min/week (Metabolic Equivalent of Task) with very high standard deviation (SD = 5302.8) and max = 35 511 MET-min/week. It concerned different types of activities. Approximately 10% (90th percentile) overstated the average. However, there was no significant difference depending on the character of the profession, or the type of activity. The average time of sitting was 256 min/day. As many as 39% of the respondents met the World Health Organization standards only due to OPA (42.5% of white-collar workers, 38% of administrative and technical employees and only 37.9% of physical workers). Conclusions: In the data analysis it is necessary to define quantiles to provide a fuller picture of the distributions of OPA in MET-min/week. It is also crucial to update the guidelines for data processing and analysis of the long version of the IPAQ. It seems that 16 h of activity/day is not a sufficient criterion for excluding the results from further analysis. Med Pr 2014;65(6):743–753
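The conclusion above, that quantiles give a fuller picture than the mean and standard deviation for strongly skewed activity data, can be illustrated with Python's standard `statistics` module. The MET-min/week values below are made up for the example and are not the study's data.

```python
import statistics

def summarize(values):
    """Mean, SD, median, and deciles for a list of numbers."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "p90": deciles[-1],  # 90th percentile
        "deciles": deciles,
    }

# Hypothetical occupational activity in MET-min/week for ten respondents:
# many report none, and one reports an extreme value.
met_minutes = [0, 0, 0, 0, 120, 240, 480, 960, 2400, 35000]

summary = summarize(met_minutes)
```

In this toy sample a single extreme respondent pulls the mean far above the median, so the mean alone would misdescribe the typical respondent, while the median and deciles show where most of the distribution actually lies.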
Vocational students' learning preferences: the interpretability of ipsative data.
Smith, P J
2000-02-01
A number of researchers have argued that ipsative data are not suitable for statistical procedures designed for normative data. Others have argued that the interpretability of such analyses of ipsative data is little affected where the number of variables and the sample size are sufficiently large. The research reported here represents a factor analysis of the scores on the Canfield Learning Styles Inventory for 1,252 students in vocational education. The results of the factor analysis of these ipsative data were examined in a context of existing theory and research on vocational students and lend support to the argument that the factor analysis of ipsative data can provide sensibly interpretable results.
Data analysis and interpretation for environmental surveillance
International Nuclear Information System (INIS)
1992-06-01
The Data Analysis and Interpretation for Environmental Surveillance Conference was held in Lexington, Kentucky, February 5-7, 1990. The conference was sponsored by what is now the Office of Environmental Compliance and Documentation, Oak Ridge National Laboratory. Participants included technical professionals from all Martin Marietta Energy Systems facilities, Westinghouse Materials Company of Ohio, Pacific Northwest Laboratory, and several technical support contractors. Presentations at the conference ranged over the full spectrum of issues that affect the analysis and interpretation of environmental data. Topics included tracking systems for samples and schedules associated with ongoing programs; coalescing data from a variety of sources and pedigrees into integrated data bases; methods for evaluating the quality of environmental data through empirical estimates of parameters such as charge balance, pH, and specific conductance; statistical applications to the interpretation of environmental information; and uses of environmental information in risk and dose assessments. Hearing about and discussing this wide variety of topics provided an opportunity to capture the subtlety of each discipline and to appreciate the continuity that is required among the disciplines in order to perform high-quality environmental information analysis.
Farrell, Mary Beth
2018-06-01
This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic is, the higher is the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability (P) value. Calculation of the P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being
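As an illustration of the chance-corrected agreement described in this abstract, the following is a minimal sketch of Cohen's κ for two readers, using only the Python standard library. The function name `cohen_kappa` and the example readings are hypothetical and are not taken from the article; the formula is the standard κ = (p_observed - p_chance) / (1 - p_chance).

```python
from collections import Counter


def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers who rated the same cases."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("both readers must rate the same, non-empty set of cases")
    n = len(reader_a)
    # Observed agreement: fraction of cases where both readers gave the same rating.
    p_observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of the two readers' marginal proportions,
    # summed over rating categories.
    count_a = Counter(reader_a)
    count_b = Counter(reader_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when both readers use a single category")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical readings of ten studies by two readers (illustrative data only).
reader_a = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
reader_b = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "pos"]

percent_agreement = sum(a == b for a, b in zip(reader_a, reader_b)) / len(reader_a)
kappa = cohen_kappa(reader_a, reader_b)
print(f"percent agreement = {percent_agreement:.2f}, kappa = {kappa:.2f}")
```

In this made-up example the readers agree on 8 of 10 cases, yet κ is noticeably lower than the raw percentage because part of that agreement would be expected by chance alone.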
Application of descriptive statistics in analysis of experimental data
Mirilović Milorad; Pejin Ivana
2008-01-01
Statistics today represent a group of scientific methods for the quantitative and qualitative investigation of variations in mass appearances. In fact, statistics present a group of methods that are used for the accumulation, analysis, presentation and interpretation of data necessary for reaching certain conclusions. Statistical analysis is divided into descriptive statistical analysis and inferential statistics. The values which represent the results of an experiment, and which are the subj...
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures
Energy Technology Data Exchange (ETDEWEB)
Udey, Ruth Norma [Michigan State Univ., East Lansing, MI (United States)
2013-01-01
Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Basic statistical tools in research and data analysis
Directory of Open Access Journals (Sweden)
Zulfiqar Ali
2016-01-01
Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.
A statistical model for interpreting computerized dynamic posturography data
Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.
2002-01-01
Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.
Localized Smart-Interpretation
Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas; Bach, Torben; Pallesen, Tom
2014-05-01
The complex task of setting up a geological model consists not only of combining available geological information into a conceptually plausible model, but also requires consistency with available data, e.g. geophysical data. However, in many cases the direct geological information, e.g. borehole samples, is very sparse, so in order to create a geological model, the geologist needs to rely on the geophysical data. The problem, however, is that the amount of geophysical data is in many cases so vast that it is practically impossible to integrate all of it in the manual interpretation process. This means that a lot of the information available from the geophysical surveys is unexploited, which is a problem because the resulting geological model does not fulfill its full potential and hence is less trustworthy. We suggest an approach to geological modeling that (1) allows all geophysical data to be considered when building the geological model, (2) is fast, and (3) allows quantification of geological modeling. The method is constructed to build a statistical model, f(d,m), describing the relation between what the geologist interprets, d, and what the geologist knows, m. The parameter m reflects any available information that can be quantified, such as geophysical data, the result of a geophysical inversion, elevation maps, etc. The parameter d reflects an actual interpretation, such as the depth to the base of a groundwater reservoir. First we infer a statistical model f(d,m) by examining sets of actual interpretations made by a geological expert, [d1, d2, ...], and the information used to perform the interpretation, [m1, m2, ...]. This makes it possible to quantify how the geological expert performs interpolation through f(d,m). As the geological expert proceeds interpreting, the number of interpreted datapoints from which the statistical model is inferred increases, and therefore the accuracy of the statistical model increases. When a model f
Tuuli, Methodius G; Odibo, Anthony O
2011-08-01
The objective of this article is to discuss the rationale for common statistical tests used for the analysis and interpretation of prenatal diagnostic imaging studies. Examples from the literature are used to illustrate descriptive and inferential statistics. The uses and limitations of linear and logistic regression analyses are discussed in detail.
Robust statistics and geochemical data analysis
International Nuclear Information System (INIS)
Di, Z.
1987-01-01
Advantages of robust procedures over ordinary least-squares procedures in geochemical data analysis are demonstrated using NURE data from the Hot Springs Quadrangle, South Dakota, USA. Robust principal components analysis with 5% multivariate trimming successfully guarded the analysis against perturbations by outliers and increased the number of interpretable factors. Regression with SINE estimates significantly increased the goodness-of-fit of the regression and improved the correspondence of delineated anomalies with known uranium prospects. Because of the ubiquitous existence of outliers in geochemical data, robust statistical procedures are suggested as routine procedures to replace ordinary least-squares procedures.
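The general point of this abstract, that outliers distort classical estimates while robust ones resist them, can be illustrated with a much simpler univariate analogue. The sketch below is not the multivariate trimming or SINE regression used in the study; it only compares the ordinary mean with the median and a trimmed mean on hypothetical concentration values containing one outlier. The helper `trimmed_mean` and the data are assumptions made for illustration.

```python
from statistics import mean, median


def trimmed_mean(values, proportion):
    """Mean after discarding the given proportion of values from each tail."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # number of values dropped per tail
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


# Hypothetical element concentrations (arbitrary units) with a single outlier.
concentrations = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 50.0]

plain_mean = mean(concentrations)
robust_median = median(concentrations)
robust_trimmed = trimmed_mean(concentrations, 0.125)  # drops one value per tail

print(f"mean = {plain_mean:.3f}")
print(f"median = {robust_median:.3f}")
print(f"trimmed mean = {robust_trimmed:.3f}")
```

The single extreme value pulls the ordinary mean far from the bulk of the data, while the median and trimmed mean stay close to it. That is the behaviour the robust procedures in the abstract exploit in the multivariate setting.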
Statistical Data Editing in Scientific Articles.
Habibzadeh, Farrokh
2017-07-01
Scientific journals are important scholarly forums for sharing research findings. Editors have important roles in safeguarding standards of scientific publication and should be familiar with correct presentation of results, among other core competencies. Editors do not have access to the raw data and should thus rely on clues in the submitted manuscripts. To identify probable errors, they should look for inconsistencies in presented results. Common statistical problems that can be picked up by a knowledgeable manuscript editor are discussed in this article. Manuscripts should contain a detailed section on statistical analyses of the data. Numbers should be reported with appropriate precision. Standard error of the mean (SEM) should not be reported as an index of data dispersion. Mean (standard deviation [SD]) and median (interquartile range [IQR]) should be used for description of normally and non-normally distributed data, respectively. If possible, it is better to report the 95% confidence interval (CI) for statistics, at least for main outcome variables. P values should be presented, and interpreted with caution, if there is a hypothesis. To advance the knowledge and skills of their members, associations of journal editors would do well to develop training courses on basic statistics and research methodology for non-experts. This would in turn improve research reporting and safeguard the body of scientific evidence. © 2017 The Korean Academy of Medical Sciences.
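The distinction this abstract draws between SD (dispersion of the data) and SEM (precision of the mean) can be made concrete with a short sketch. The measurements below are hypothetical, and the 95% CI uses a normal-approximation quantile from `statistics.NormalDist`, which is an assumption made to keep the example in the standard library; for a sample this small a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, median, quantiles, stdev

# Hypothetical measurements (illustrative data only).
data = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1, 4.6, 5.2, 4.8, 5.3]

n = len(data)
sample_mean = mean(data)
sd = stdev(data)        # sample standard deviation: spread of the data
sem = sd / sqrt(n)      # standard error of the mean: precision of the mean

# Approximate 95% CI for the mean (normal approximation, see lead-in).
z = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * sem
ci_high = sample_mean + z * sem

# Median and interquartile range, the summary suggested for non-normal data.
q1, q2, q3 = quantiles(data, n=4)

print(f"mean (SD) = {sample_mean:.2f} ({sd:.2f})")
print(f"SEM = {sem:.2f}, approx. 95% CI = {ci_low:.2f} to {ci_high:.2f}")
print(f"median (IQR) = {median(data):.2f} ({q1:.2f} to {q3:.2f})")
```

Because SEM shrinks with the square root of the sample size, it is always smaller than SD for n > 1, which is one reason reporting it as if it described dispersion makes data look less variable than they are.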
Systematic interpretation of microarray data using experiment annotations
Directory of Open Access Journals (Sweden)
Frohme Marcus
2006-12-01
Background: Up to now, microarray data are mostly assessed in context with only one or few parameters characterizing the experimental conditions under study. More explicit experiment annotations, however, are highly useful for interpreting microarray data, when available in a statistically accessible format. Results: We provide means to preprocess these additional data, and to extract relevant traits corresponding to the transcription patterns under study. We found correspondence analysis particularly well-suited for mapping such extracted traits. It visualizes associations both among and between the traits, the hereby annotated experiments, and the genes, revealing how they are all interrelated. Here, we apply our methods to the systematic interpretation of radioactive (single channel) and two-channel data, stemming from model organisms such as yeast and drosophila up to complex human cancer samples. Inclusion of technical parameters allows for identification of artifacts and flaws in experimental design. Conclusion: Biological and clinical traits can act as landmarks in transcription space, systematically mapping the variance of large datasets from the predominant changes down toward intricate details.
Guler, Mustafa; Gursoy, Kadir; Guven, Bulent
2016-01-01
Understanding and interpreting biased data, decision-making in accordance with the data, and critically evaluating situations involving data are among the fundamental skills necessary in the modern world. To develop these required skills, emphasis on statistical literacy in school mathematics has been gradually increased in recent years. The…
Statistical application of groundwater monitoring data at the Hanford Site
International Nuclear Information System (INIS)
Chou, C.J.; Johnson, V.G.; Hodges, F.N.
1993-09-01
Effective use of groundwater monitoring data requires both statistical and geohydrologic interpretations. At the Hanford Site in south-central Washington state such interpretations are used for (1) detection monitoring, assessment monitoring, and/or corrective action at Resource Conservation and Recovery Act sites; (2) compliance testing for operational groundwater surveillance; (3) impact assessments at active liquid-waste disposal sites; and (4) cleanup decisions at Comprehensive Environmental Response Compensation and Liability Act sites. Statistical tests such as the Kolmogorov-Smirnov two-sample test are used to test the hypothesis that chemical concentrations from spatially distinct subsets or populations are identical within the uppermost unconfined aquifer. Experience at the Hanford Site in applying groundwater background data indicates that background must be considered as a statistical distribution of concentrations, rather than a single value or threshold. The use of a single numerical value as a background-based standard ignores important information and may result in excessive or unnecessary remediation. Appropriate statistical evaluation techniques include the Wilcoxon rank sum test, the Quantile test, "hot spot" comparisons, and Kolmogorov-Smirnov types of tests. Application of such tests is illustrated with several case studies derived from Hanford groundwater monitoring programs. To avoid possible misuse of such data, an understanding of the limitations is needed. In addition to statistical test procedures, geochemical and hydrologic considerations are integral parts of the decision process. For this purpose a phased approach is recommended that proceeds from the simple to the more complex, and from an overview to detailed analysis.
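The two-sample Kolmogorov-Smirnov comparison mentioned in this abstract rests on one quantity: the largest vertical gap between the empirical distribution functions of the two samples. The following minimal sketch computes only that statistic, D, in pure Python. The function name and the concentration values are hypothetical, and no p value is computed here; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used to obtain both.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions that only change at observed
    # values, so it is enough to compare them at every observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical concentrations from two spatially distinct well groups.
background_wells = [1.0, 2.0, 3.0, 4.0, 5.0]
downgradient_wells = [4.0, 5.0, 6.0, 7.0, 8.0]

d_stat = ks_two_sample_statistic(background_wells, downgradient_wells)
print(f"KS statistic D = {d_stat:.2f}")
```

Because D compares whole distributions rather than a single summary value, it fits the abstract's argument that background should be treated as a distribution of concentrations and not as one threshold.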
Distributed data collection for a database of radiological image interpretations
Long, L. Rodney; Ostchega, Yechiam; Goh, Gin-Hua; Thoma, George R.
1997-01-01
The National Library of Medicine, in collaboration with the National Center for Health Statistics and the National Institute for Arthritis and Musculoskeletal and Skin Diseases, has built a system for collecting radiological interpretations for a large set of x-ray images acquired as part of the data gathered in the second National Health and Nutrition Examination Survey. This system is capable of delivering across the Internet 5- and 10-megabyte x-ray images to Sun workstations equipped with X Window-based 2048 × 2560 image displays, for the purpose of having these images interpreted for the degree of presence of particular osteoarthritic conditions in the cervical and lumbar spines. The collected interpretations can then be stored in a database at the National Library of Medicine, under control of the Illustra DBMS. This system is a client/server database application which integrates (1) distributed server processing of client requests, (2) a customized image transmission method for faster Internet data delivery, (3) distributed client workstations with high resolution displays, image processing functions and an on-line digital atlas, and (4) relational database management of the collected data.
Statistical processing of technological and radiochemical data
International Nuclear Information System (INIS)
Lahodova, Zdena; Vonkova, Kateřina
2011-01-01
The project described in this article had two goals. The main goal was to compare technological and radiochemical data from two units of a nuclear power plant. The other goal was to check the collection, organization and interpretation of routinely measured data. Monitoring of analytical and radiochemical data is a very valuable source of knowledge for some processes in the primary circuit. Exploratory analysis of one-dimensional data was performed to estimate location and variability and to find extreme values, data trends, distribution, autocorrelation etc. This process allowed for the cleaning and completion of raw data. Then multiple analyses such as multiple comparisons, multiple correlation, variance analysis, and so on were performed. Measured data were organized into a data matrix. The results and graphs such as Box plots, Mahalanobis distance, Biplot, Correlation, and Trend graphs are presented in this article as statistical analysis tools. Tables of data were replaced with graphs because graphs condense large amounts of information into easy-to-understand formats. The significant conclusion of this work is that the collection and comprehension of data is a very substantial part of statistical processing. With well-prepared and well-understood data, accurate evaluation is possible. Cooperation between the technicians who collect data and the statistician who processes them is also very important.
Interpretation of the results of statistical measurements. [search for basic probability model
Olshevskiy, V. V.
1973-01-01
For random processes, the calculated probability characteristic and the measured statistical estimate are used in a quality functional, which defines the difference between the two functions. Based on the assumption that the statistical measurement procedure is organized so that the parameters for a selected model are optimized, it is shown that the interpretation of experimental research is a search for a basic probability model.
The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
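The dependence of "p" on sample size noted in this abstract can be sketched by holding a standardized effect size fixed and asking what the p value would be at different sample sizes. This is only an illustration of the general "what if" idea, not the new method the authors propose. It assumes two equal-sized groups, a hypothetical effect size of d = 0.3, and a large-sample normal approximation to the t test (test statistic d·√(n/2)), chosen so that the example needs only the standard library.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sided_p(effect_size_d, n_per_group):
    """Approximate two-sided p value for a two-group comparison.

    Uses the normal approximation to the t statistic, which for two
    equal groups of size n is d * sqrt(n / 2).
    """
    if n_per_group < 2:
        raise ValueError("need at least two observations per group")
    z = abs(effect_size_d) * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical fixed effect size; only the sample size changes.
effect_size = 0.3
sample_sizes = (10, 20, 50, 100, 200)
p_values = [approx_two_sided_p(effect_size, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n per group = {n:3d}  approx. p = {p:.4f}")
```

The same standardized difference moves from clearly non-significant to clearly significant as n grows, which is why a p value is hard to interpret without its sample size context.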
Smith, Joseph M.; Mather, Martha E.
2012-01-01
Ecological indicators are science-based tools used to assess how human activities have impacted environmental resources. For monitoring and environmental assessment, existing species assemblage data can be used to make these comparisons through time or across sites. An impediment to using assemblage data, however, is that these data are complex and need to be simplified in an ecologically meaningful way. Because multivariate statistics are mathematical relationships, statistical groupings may not make ecological sense and will not have utility as indicators. Our goal was to define a process to select defensible and ecologically interpretable statistical simplifications of assemblage data in which researchers and managers can have confidence. For this, we chose a suite of statistical methods, compared the groupings that resulted from these analyses, identified convergence among groupings, then we interpreted the groupings using species and ecological guilds. When we tested this approach using a statewide stream fish dataset, not all statistical methods worked equally well. For our dataset, logistic regression (Log), detrended correspondence analysis (DCA), cluster analysis (CL), and non-metric multidimensional scaling (NMDS) provided consistent, simplified output. Specifically, the Log, DCA, CL-1, and NMDS-1 groupings were ≥60% similar to each other, overlapped with the fluvial-specialist ecological guild, and contained a common subset of species. Groupings based on number of species (e.g., Log, DCA, CL and NMDS) outperformed groupings based on abundance [e.g., principal components analysis (PCA) and Poisson regression]. Although the specific methods that worked on our test dataset have generality, here we are advocating a process (e.g., identifying convergent groupings with redundant species composition that are ecologically interpretable) rather than the automatic use of any single statistical tool. We summarize this process in step-by-step guidance for the
Workplace Statistical Literacy for Teachers: Interpreting Box Plots
Pierce, Robyn; Chick, Helen
2013-01-01
As a consequence of the increased use of data in workplace environments, there is a need to understand the demands that are placed on users to make sense of such data. In education, teachers are being increasingly expected to interpret and apply complex data about student and school performance, and, yet it is not clear that they always have the…
Statistical methods for data analysis in particle physics
AUTHOR|(CDS)2070643
2015-01-01
This concise set of course-based notes provides the reader with the main concepts and tools to perform statistical analysis of experimental data, in particular in the field of high-energy physics (HEP). First, an introduction to probability theory and basic statistics is given, mainly as a reminder from advanced undergraduate studies, yet also with a view to clearly distinguishing the Frequentist versus Bayesian approaches and interpretations in subsequent applications. More advanced concepts and applications are gradually introduced, culminating in the chapter on upper limits, as many applications in HEP concern hypothesis testing, where often the main goal is to provide better and better limits so as to be able to distinguish eventually between competing hypotheses or to rule out some of them altogether. Many worked examples will help newcomers to the field and graduate students to understand the pitfalls in applying theoretical concepts to actual data.
Software for statistical data analysis used in Higgs searches
International Nuclear Information System (INIS)
Gumpert, Christian; Moneta, Lorenzo; Cranmer, Kyle; Kreiss, Sven; Verkerke, Wouter
2014-01-01
The analysis and interpretation of data collected by the Large Hadron Collider (LHC) require advanced statistical tools in order to quantify the agreement between observation and theoretical models. RooStats is a project providing a statistical framework for data analysis with the focus on discoveries, confidence intervals and combination of different measurements in both Bayesian and frequentist approaches. It employs the RooFit data modelling language where mathematical concepts such as variables, (probability density) functions and integrals are represented as C++ objects. RooStats and RooFit rely on the persistency technology of the ROOT framework. The usage of a common data format enables the concept of digital publishing of complicated likelihood functions. The statistical tools have been developed in close collaboration with the LHC experiments to ensure their applicability to real-life use cases. Numerous physics results have been produced using the RooStats tools, with the discovery of the Higgs boson by the ATLAS and CMS experiments being certainly the most popular among them. We will discuss tools currently used by LHC experiments to set exclusion limits, to derive confidence intervals and to estimate discovery significances based on frequentist statistics and the asymptotic behaviour of likelihood functions. Furthermore, new developments in RooStats and performance optimisation necessary to cope with complex models depending on more than 1000 variables will be reviewed.
International Nuclear Information System (INIS)
Shafieloo, Arman
2012-01-01
By introducing Crossing functions and hyper-parameters I show that the Bayesian interpretation of the Crossing Statistics [1] can be used trivially for the purpose of model selection among cosmological models. In this approach, to falsify a cosmological model there is no need to compare it with other models or assume any particular form of parametrization for the cosmological quantities like luminosity distance, Hubble parameter or equation of state of dark energy. Instead, hyper-parameters of Crossing functions perform as discriminators between correct and wrong models. Using this approach one can falsify any assumed cosmological model without putting priors on the underlying actual model of the universe and its parameters, hence the issue of dark energy parametrization is resolved. It will also be shown that the sensitivity of the method to the intrinsic dispersion of the data is small, which is another important characteristic of the method in testing cosmological models dealing with data with high uncertainties.
Model-Based Integration and Interpretation of Data
DEFF Research Database (Denmark)
Petersen, Johannes
2004-01-01
Data integration and interpretation plays a crucial role in supervisory control. The paper defines a set of generic inference steps for the data integration and interpretation process based on a three-layer model of system representations. The three-layer model is used to clarify the combination...... of constraint and object-centered representations of the work domain throwing new light on the basic principles underlying the data integration and interpretation process of Rasmussen's abstraction hierarchy as well as other model-based approaches combining constraint and object-centered representations. Based...
Analysis of Visual Interpretation of Satellite Data
Svatonova, H.
2016-06-01
Millions of people of all ages and expertise are using satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a new place in education, especially in the fields of geography and in environmental issues. The article presents the results of extensive research in the area of visual interpretation of image data carried out in the years 2013 - 2015 in the Czech Republic. The research was aimed at comparing the success rate of the interpretation of satellite data in relation to a) the substrates (to the selected colourfulness, the type of depicted landscape or special elements in the landscape) and b) selected characteristics of users (expertise, gender, age). The results of the research showed that (1) false colour images have a slightly higher percentage of successful interpretation than natural colour images, (2) colourfulness of an element expected or rehearsed by the user (regardless of the real natural colour) increases the success rate of identifying the element, (3) experts are faster in interpreting visual data than non-experts, with the same degree of accuracy of solving the task, and (4) men and women are equally successful in the interpretation of visual image data.
ANALYSIS OF VISUAL INTERPRETATION OF SATELLITE DATA
Directory of Open Access Journals (Sweden)
H. Svatonova
2016-06-01
Millions of people of all ages and expertise are using satellite and aerial data as an important input for their work in many different fields. Satellite data are also gradually finding a new place in education, especially in the fields of geography and in environmental issues. The article presents the results of extensive research in the area of visual interpretation of image data carried out in the years 2013 - 2015 in the Czech Republic. The research was aimed at comparing the success rate of the interpretation of satellite data in relation to a) the substrates (to the selected colourfulness, the type of depicted landscape or special elements in the landscape) and b) selected characteristics of users (expertise, gender, age). The results of the research showed that (1) false colour images have a slightly higher percentage of successful interpretation than natural colour images, (2) colourfulness of an element expected or rehearsed by the user (regardless of the real natural colour) increases the success rate of identifying the element, (3) experts are faster in interpreting visual data than non-experts, with the same degree of accuracy of solving the task, and (4) men and women are equally successful in the interpretation of visual image data.
Sources of Safety Data and Statistical Strategies for Design and Analysis: Clinical Trials.
Zink, Richard C; Marchenko, Olga; Sanchez-Kam, Matilde; Ma, Haijun; Jiang, Qi
2018-03-01
There has been an increased emphasis on the proactive and comprehensive evaluation of safety endpoints to ensure patient well-being throughout the medical product life cycle. In fact, depending on the severity of the underlying disease, it is important to plan for a comprehensive safety evaluation at the start of any development program. Statisticians should be intimately involved in this process and contribute their expertise to study design, safety data collection, analysis, reporting (including data visualization), and interpretation. In this manuscript, we review the challenges associated with the analysis of safety endpoints and describe the safety data that are available to influence the design and analysis of premarket clinical trials. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from clinical trials compared to other sources. Clinical trials are an important source of safety data that contribute to the totality of safety information available to generate evidence for regulators, sponsors, payers, physicians, and patients. This work is a result of the efforts of the American Statistical Association Biopharmaceutical Section Safety Working Group.
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed the statistical mechanical Menzerath-Altmann model. The derived model allows the model parameters to be interpreted in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy-associated states.
International Nuclear Information System (INIS)
Lan, B.L.
2001-01-01
An alternative interpretation to Bohm's 'quantum force' and 'active information' is proposed. Numerical evidence is presented, which suggests that the time series of Bohm's 'quantum force' evaluated at the Bohmian position for non-stationary quantum states are typically non-Gaussian stable distributed with a flat power spectrum in classically chaotic Hamiltonian systems. An important implication of these statistical properties is briefly mentioned.
Statistical data analysis using SAS intermediate statistical methods
Marasinghe, Mervyn G
2018-01-01
The aim of this textbook (previously titled SAS for Data Analytics) is to teach the use of SAS for statistical analysis of data for advanced undergraduate and graduate students in statistics, data science, and disciplines involving analyzing data. The book begins with an introduction beyond the basics of SAS, illustrated with non-trivial, real-world, worked examples. It proceeds to SAS programming and applications, SAS graphics, statistical analysis of regression models, analysis of variance models, analysis of variance with random and mixed effects models, and then takes the discussion beyond regression and analysis of variance to conclude. Pedagogically, the authors introduce theory and methodological basis topic by topic, present a problem as an application, followed by a SAS analysis of the data provided and a discussion of results. The text focuses on applied statistical problems and methods. Key features include: end of chapter exercises, downloadable SAS code and data sets, and advanced material suitab...
Autonomic Differentiation Map: A Novel Statistical Tool for Interpretation of Heart Rate Variability
Directory of Open Access Journals (Sweden)
Daniela Lucini
2018-04-01
In spite of the large body of evidence suggesting Heart Rate Variability (HRV), alone or combined with blood pressure variability (providing an estimate of baroreflex gain), as a useful technique to assess the autonomic regulation of the cardiovascular system, there is still an ongoing debate about methodology, interpretation, and clinical applications. In the present investigation, we hypothesize that non-parametric and multivariate exploratory statistical manipulation of HRV data could provide a novel informational tool useful to differentiate normal controls from clinical groups, such as athletes, or subjects affected by obesity, hypertension, or stress. With a data-driven protocol in 1,352 ambulant subjects, we compute HRV and baroreflex indices from short-term data series as proxies of autonomic (ANS) regulation. We apply a three-step statistical procedure, by first removing age and gender effects. Subsequently, by factor analysis, we extract four ANS latent domains that detain the large majority of information (86.94%), subdivided in oscillatory (40.84%), amplitude (18.04%), pressure (16.48%), and pulse domains (11.58%). Finally, we test the overall capacity to differentiate clinical groups vs. control. To give more practical value and improve readability, statistical results concerning individual discriminant ANS proxies and ANS differentiation profiles are displayed through peculiar graphical tools, i.e., significance diagram and ANS differentiation map, respectively. This approach, which simultaneously uses all available information about the system, shows what domains make up the difference in ANS discrimination; e.g., athletes differ from controls in all domains, but with a graded strength: maximal in the (normalized) oscillatory and in the pulse domains, slightly less in the pressure domain and minimal in the amplitude domain. The application of multiple (non-parametric and exploratory) statistical and graphical tools to ANS proxies defines
[Big data in official statistics].
Zwick, Markus
2015-08-01
The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.
DEFF Research Database (Denmark)
Eslamimanesh, Ali; Gharagheizi, Farhad; Mohammadi, Amir H.
2012-01-01
We, herein, present a statistical method for diagnostics of the outliers in phase equilibrium data (dissociation data) of simple clathrate hydrates. The applied algorithm is performed on the basis of the Leverage mathematical approach, in which the statistical Hat matrix, Williams Plot, and the r... in exponential form is used to represent/predict the hydrate dissociation pressures for three-phase equilibrium conditions (liquid water/ice–vapor-hydrate). The investigated hydrate formers are methane, ethane, propane, carbon dioxide, nitrogen, and hydrogen sulfide. It is interpreted from the obtained results...
Baseline Statistics of Linked Statistical Data
Scharnhorst, Andrea; Meroño-Peñuela, Albert; Guéret, Christophe
2014-01-01
We are surrounded by an ever increasing ocean of information; everybody will agree on that. We build sophisticated strategies to govern this information: we design data models, develop infrastructures for data sharing, and build tools for data analysis. Statistical datasets curated by National
A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data
Directory of Open Access Journals (Sweden)
Scherer Stephen W
2011-05-01
Background: Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. Results: We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. Conclusions: The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
School Violence: Data & Statistics
Interpreting Data: The Hybrid Mind
Heisterkamp, Kimberly; Talanquer, Vicente
2015-01-01
The central goal of this study was to characterize major patterns of reasoning exhibited by college chemistry students when analyzing and interpreting chemical data. Using a case study approach, we investigated how a representative student used chemical models to explain patterns in the data based on structure-property relationships. Our results…
Cancer Data and Statistics Tools
Tuberculosis Data and Statistics
Braun, Stefan; Pokorná, Šárka; Šachl, Radek; Hof, Martin; Heerklotz, Heiko; Hoernke, Maria
2018-01-23
The mode of action of membrane-active molecules, such as antimicrobial, anticancer, cell penetrating, and fusion peptides and their synthetic mimics, transfection agents, drug permeation enhancers, and biological signaling molecules (e.g., quorum sensing), involves either the general or local destabilization of the target membrane or the formation of defined, rather stable pores. Some effects aim at killing the cell, while others need to be limited in space and time to avoid serious damage. Biological tests reveal translocation of compounds and cell death but do not provide a detailed, mechanistic, and quantitative understanding of the modes of action and their molecular basis. Model membrane studies of membrane leakage have been used for decades to tackle this issue, but their interpretation in terms of biology has remained challenging and often quite limited. Here we compare two recent, powerful protocols to study model membrane leakage: the microscopic detection of dye influx into giant liposomes and time-correlated single photon counting experiments to characterize dye efflux from large unilamellar vesicles. A statistical treatment of both data sets does not only harmonize apparent discrepancies but also makes us aware of principal issues that have been confusing the interpretation of model membrane leakage data so far. Moreover, our study reveals a fundamental difference between nano- and microscale systems that needs to be taken into account when conclusions about microscale objects, such as cells, are drawn from nanoscale models.
Pattern recognition approach to data interpretation
National Research Council Canada - National Science Library
Wolff, Diane D; Parsons, M. L
1983-01-01
An attempt is made in this book to give scientists a detailed working knowledge of the powerful mathematical tools available to aid in data interpretation, especially when confronted with large data...
Bieber, Frederick R; Buckleton, John S; Budowle, Bruce; Butler, John M; Coble, Michael D
2016-08-31
The evaluation and interpretation of forensic DNA mixture evidence faces greater interpretational challenges due to increasingly complex mixture evidence. Such challenges include: casework involving low quantity or degraded evidence leading to allele and locus dropout; allele sharing of contributors leading to allele stacking; and differentiation of PCR stutter artifacts from true alleles. There is variation in statistical approaches used to evaluate the strength of the evidence when inclusion of a specific known individual(s) is determined, and the approaches used must be supportable. There are concerns that methods utilized for interpretation of complex forensic DNA mixtures may not be implemented properly in some casework. Similar questions are being raised in a number of U.S. jurisdictions, leading to some confusion about mixture interpretation for current and previous casework. Key elements necessary for the interpretation and statistical evaluation of forensic DNA mixtures are described. Given that the most common method for statistical evaluation of DNA mixtures in many parts of the world, including the USA, is the Combined Probability of Inclusion/Exclusion (CPI/CPE), exposition and elucidation of this method and a protocol for its use are the focus of this article. Formulae and other supporting materials are provided. Guidance on and details of a DNA mixture interpretation protocol are provided for application of the CPI/CPE method in the analysis of more complex forensic DNA mixtures. This description, in turn, should help reduce the variability of interpretation with application of this methodology and thereby improve the quality of DNA mixture interpretation throughout the forensic community.
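The CPI/CPE statistic named in this abstract can be illustrated with a short sketch. It assumes the commonly cited textbook form, in which the probability of inclusion at one locus is the square of the summed population frequencies of all alleles observed in the mixture, CPI is the product over loci, and CPE = 1 - CPI. The locus names and allele frequencies below are hypothetical, and the sketch does not reproduce the article's protocol (for example, it does not decide which loci qualify for the calculation).

```python
from math import prod

def locus_probability_of_inclusion(allele_freqs):
    """PI for one locus: (sum of frequencies of alleles seen in the mixture) squared."""
    total = sum(allele_freqs)
    if total < 0.0 or total > 1.0 + 1e-9:
        raise ValueError("summed allele frequencies must lie in [0, 1]")
    return total ** 2

def combined_probability_of_inclusion(loci):
    """CPI: product of the per-locus probabilities of inclusion."""
    return prod(locus_probability_of_inclusion(freqs) for freqs in loci.values())

# Hypothetical frequencies of the alleles observed at three made-up loci.
mixture = {
    "locus_A": [0.10, 0.20, 0.05],   # sums to 0.35
    "locus_B": [0.15, 0.25],         # sums to 0.40
    "locus_C": [0.30, 0.10, 0.10],   # sums to 0.50
}

cpi = combined_probability_of_inclusion(mixture)
cpe = 1.0 - cpi  # combined probability of exclusion
print(f"CPI = {cpi:.4f}, CPE = {cpe:.4f}")
```

With these made-up numbers the per-locus values are 0.1225, 0.16 and 0.25, so the product is small even for only three loci.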
Directory of Open Access Journals (Sweden)
Michelle Redman-MacLaren
2014-08-01
Background: Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. Objective: To facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. Design: A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or ‘chunks’ of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. Results: New understandings of the data were evoked when women in interpretive focus groups analysed the data ‘chunks’. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Conclusions: Interpretive focus groups offer opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action.
Redman-MacLaren, Michelle; Mills, Jane; Tommbe, Rachael
2014-01-01
Participatory approaches to qualitative research practice constantly change in response to evolving research environments. Researchers are increasingly encouraged to undertake secondary analysis of qualitative data, despite epistemological and ethical challenges. Interpretive focus groups can be described as a more participative method for groups to analyse qualitative data. The objective was to facilitate interpretive focus groups with women in Papua New Guinea to extend analysis of existing qualitative data and co-create new primary data. The purpose of this was to inform a transformational grounded theory and subsequent health promoting action. A two-step approach was used in a grounded theory study about how women experience male circumcision in Papua New Guinea. Participants analysed portions or 'chunks' of existing qualitative data in story circles and built upon this analysis by using the visual research method of storyboarding. New understandings of the data were evoked when women in interpretive focus groups analysed the data 'chunks'. Interpretive focus groups encouraged women to share their personal experiences about male circumcision. The visual method of storyboarding enabled women to draw pictures to represent their experiences. This provided an additional focus for whole-of-group discussions about the research topic. Interpretive focus groups offer opportunity to enhance trustworthiness of findings when researchers undertake secondary analysis of qualitative data. The co-analysis of existing data and co-generation of new data between research participants and researchers informed an emergent transformational grounded theory and subsequent health promoting action.
Statistical methods for data analysis in particle physics
Lista, Luca
2017-01-01
This concise set of course-based notes provides the reader with the main concepts and tools needed to perform statistical analyses of experimental data, in particular in the field of high-energy physics (HEP). First, the book provides an introduction to probability theory and basic statistics, mainly intended as a refresher from readers’ advanced undergraduate studies, but also to help them clearly distinguish between the Frequentist and Bayesian approaches and interpretations in subsequent applications. More advanced concepts and applications are gradually introduced, culminating in the chapter on both discoveries and upper limits, as many applications in HEP concern hypothesis testing, where the main goal is often to provide better and better limits so as to eventually be able to distinguish between competing hypotheses, or to rule out some of them altogether. Many worked-out examples will help newcomers to the field and graduate students alike understand the pitfalls involved in applying theoretical co...
Statistical analysis of proteomics, metabolomics, and lipidomics data using mass spectrometry
Mertens, Bart
2017-01-01
This book presents an overview of computational and statistical design and analysis of mass spectrometry-based proteomics, metabolomics, and lipidomics data. This contributed volume provides an introduction to the special aspects of statistical design and analysis with mass spectrometry data for the new omic sciences. The text discusses common aspects of design and analysis between and across all (or most) forms of mass spectrometry, while also providing special examples of application with the most common forms of mass spectrometry. Also covered are applications of computational mass spectrometry not only in clinical study but also in the interpretation of omics data in plant biology studies. Omics research fields are expected to revolutionize biomolecular research by the ability to simultaneously profile many compounds within either patient blood, urine, tissue, or other biological samples. Mass spectrometry is one of the key analytical techniques used in these new omic sciences. Liquid chromatography mass ...
Official statistics and Big Data
Directory of Open Access Journals (Sweden)
Peter Struijs
2014-07-01
The rise of Big Data changes the context in which organisations producing official statistics operate. Big Data provides opportunities, but in order to make optimal use of Big Data, a number of challenges have to be addressed. This stimulates increased collaboration between National Statistical Institutes, Big Data holders, businesses and universities. In time, this may lead to a shift in the role of statistical institutes in the provision of high-quality and impartial statistical information to society. In this paper, the changes in context, the opportunities, the challenges and the way to collaborate are addressed. The collaboration between the various stakeholders will involve each partner building on and contributing different strengths. For national statistical offices, traditional strengths include, on the one hand, the ability to collect data and combine data sources with statistical products and, on the other hand, their focus on quality, transparency and sound methodology. In the Big Data era of competing and multiplying data sources, they continue to have a unique knowledge of official statistical production methods. And their impartiality and respect for privacy as enshrined in law uniquely position them as a trusted third party. Based on this, they may advise on the quality and validity of information of various sources. By thus positioning themselves, they will be able to play their role as key information providers in a changing society.
Directory of Open Access Journals (Sweden)
Kristjan Korjus
Supervised machine learning methods typically require splitting data into multiple chunks for training, validating, and finally testing classifiers. For finding the best parameters of a classifier, training and validation are usually carried out with cross-validation. This is followed by application of the classifier with optimized parameters to a separate test set for estimating the classifier's generalization performance. With limited data, this separation of test data creates a difficult trade-off between having more statistical power in estimating generalization performance versus choosing better parameters and fitting a better model. We propose a novel approach that we term "Cross-validation and cross-testing" improving this trade-off by re-using test data without biasing classifier performance. The novel approach is validated using simulated data and electrophysiological recordings in humans and rodents. The results demonstrate that the approach has a higher probability of discovering significant results than the standard approach of cross-validation and testing, while maintaining the nominal alpha level. In contrast to nested cross-validation, which is maximally efficient in re-using data, the proposed approach additionally maintains the interpretability of individual parameters. Taken together, we suggest an addition to currently used machine learning approaches which may be particularly useful in cases where model weights do not require interpretation, but parameters do.
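The standard baseline this abstract describes (choose parameters by cross-validation, then estimate generalization on a separate test set) can be sketched as follows. This is not the authors' proposed "cross-validation and cross-testing" procedure, whose details the abstract does not give. The 1-D k-nearest-neighbour classifier, the synthetic Gaussian data, the 150/50 split and the candidate values of k are all assumptions made for illustration.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes_for_one = sum(label for _, label in neighbours)
    return 1 if 2 * votes_for_one > k else 0

def accuracy(train, held_out, k):
    """Fraction of held-out points classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

def cross_validated_accuracy(data, k, n_folds=5):
    """Mean validation accuracy over n_folds train/validation splits."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, validation in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(accuracy(train, validation, k))
    return sum(scores) / n_folds

# Synthetic data (assumption): class 0 ~ N(0, 1), class 1 ~ N(2, 1).
rng = random.Random(0)
data = [(rng.gauss(2.0 * label, 1.0), label) for label in (0, 1) for _ in range(100)]
rng.shuffle(data)

# The test set is set aside and is not touched while choosing k.
development, test = data[:150], data[150:]

candidate_ks = [1, 5, 15]  # odd values avoid tied votes
best_k = max(candidate_ks, key=lambda k: cross_validated_accuracy(development, k))

# Generalization estimate: fit on all development data, score once on the test set.
test_accuracy = accuracy(development, test, best_k)
print(f"selected k = {best_k}, test accuracy = {test_accuracy:.3f}")
```

The trade-off mentioned in the abstract is visible in the split: every point moved into `test` sharpens the generalization estimate but leaves less data for selecting `k` and fitting the model.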
Directory of Open Access Journals (Sweden)
Paul A. Swinton
2018-05-01
The concept of personalized nutrition and exercise prescription represents a topical and exciting progression for the discipline given the large inter-individual variability that exists in response to virtually all performance and health related interventions. Appropriate interpretation of intervention-based data from an individual or group of individuals requires practitioners and researchers to consider a range of concepts including the confounding influence of measurement error and biological variability. In addition, the means to quantify likely statistical and practical improvements are facilitated by concepts such as confidence intervals (CIs) and smallest worthwhile change (SWC). The purpose of this review is to provide accessible and applicable recommendations for practitioners and researchers who interpret and report personalized data. To achieve this, the review is structured in three sections that progressively develop a statistical framework. Section 1 explores fundamental concepts related to measurement error and describes how typical error and CIs can be used to express uncertainty in baseline measurements. Section 2 builds upon these concepts and demonstrates how CIs can be combined with the concept of SWC to assess whether meaningful improvements occur post-intervention. Finally, section 3 introduces the concept of biological variability and discusses the subsequent challenges in identifying individual response and non-response to an intervention. Worked numerical examples and interactive Supplementary Material are incorporated to solidify concepts and assist with implementation in practice.
Understanding spatial organizations of chromosomes via statistical analysis of Hi-C data
Hu, Ming; Deng, Ke; Qin, Zhaohui; Liu, Jun S.
2015-01-01
Understanding how chromosomes fold provides insights into the transcription regulation, hence, the functional state of the cell. Using the next generation sequencing technology, the recently developed Hi-C approach enables a global view of spatial chromatin organization in the nucleus, which substantially expands our knowledge about genome organization and function. However, due to multiple layers of biases, noises and uncertainties buried in the protocol of Hi-C experiments, analyzing and interpreting Hi-C data poses great challenges, and requires novel statistical methods to be developed. This article provides an overview of recent Hi-C studies and their impacts on biomedical research, describes major challenges in statistical analysis of Hi-C data, and discusses some perspectives for future research. PMID:26124977
Statistical methods for ranking data
Alvo, Mayer
2014-01-01
This book introduces advanced undergraduate, graduate students and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, EM algorithm and factor analysis. This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach for hypotheses testing. The techniques described in the book are illustrated with examples and the statistical software is provided on the authors’ website.
Muscular Dystrophy: Data and Statistics
The following data and statistics come from MD STARnet. Data from the MD ...
Statistical modeling for degradation data
Lio, Yuhlong; Ng, Hon; Tsai, Tzong-Ru
2017-01-01
This book focuses on the statistical aspects of the analysis of degradation data. In recent years, degradation data analysis has come to play an increasingly important role in different disciplines such as reliability, public health sciences, and finance. For example, information on products’ reliability can be obtained by analyzing degradation data. In addition, statistical modeling and inference techniques have been developed on the basis of different degradation measures. The book brings together experts engaged in statistical modeling and inference, presenting and discussing important recent advances in degradation data analysis and related applications. The topics covered are timely and have considerable potential to impact both statistics and reliability engineering.
Data Acquisition and Preprocessing in Studies on Humans: What Is Not Taught in Statistics Classes?
Zhu, Yeyi; Hernandez, Ladia M; Mueller, Peter; Dong, Yongquan; Forman, Michele R
2013-01-01
The aim of this paper is to address issues in research that may be missing from statistics classes and important for (bio-)statistics students. In the context of a case study, we discuss data acquisition and preprocessing steps that fill the gap between research questions posed by subject matter scientists and statistical methodology for formal inference. Issues include participant recruitment, data collection training and standardization, variable coding, data review and verification, data cleaning and editing, and documentation. Despite the critical importance of these details in research, most of these issues are rarely discussed in an applied statistics program. One reason for the lack of more formal training is the difficulty in addressing the many challenges that can possibly arise in the course of a study in a systematic way. This article can help to bridge this gap between research questions and formal statistical inference by using an illustrative case study for a discussion. We hope that reading and discussing this paper and practicing data preprocessing exercises will sensitize statistics students to these important issues and achieve optimal conduct, quality control, analysis, and interpretation of a study.
UN Data- Environmental Statistics: Waste
World Wide Human Geography Data Working Group — The Environment Statistics Database contains selected water and waste statistics by country. Statistics on water and waste are based on official statistics supplied...
Gerrits, Reinie G; Kringos, Dionne S; van den Berg, Michael J; Klazinga, Niek S
2018-03-07
Policy-makers, managers, scientists, patients and the general public are confronted daily with figures on health and healthcare through public reporting in newspapers, webpages and press releases. However, information on the key characteristics of these figures necessary for their correct interpretation is often not adequately communicated, which can lead to misinterpretation and misinformed decision-making. The objective of this research was to map the key characteristics relevant to the interpretation of figures on health and healthcare, and to develop a Figure Interpretation Assessment Tool-Health (FIAT-Health) through which figures on health and healthcare can be systematically assessed, allowing for a better interpretation of these figures. The abovementioned key characteristics of figures on health and healthcare were identified through systematic expert consultations in the Netherlands on four topic categories of figures, namely morbidity, healthcare expenditure, healthcare outcomes and lifestyle. The identified characteristics were used as a frame for the development of the FIAT-Health. Development of the tool and its content was supported and validated through regular review by a sounding board of potential users. Identified characteristics relevant for the interpretation of figures in the four categories relate to the figures' origin, credibility, expression, subject matter, population and geographical focus, time period, and underlying data collection methods. The characteristics were translated into a set of 13 dichotomous and 4-point Likert scale questions constituting the FIAT-Health, and two final assessment statements. Users of the FIAT-Health were provided with a summary overview of their answers to support a final assessment of the correctness of a figure and the appropriateness of its reporting. FIAT-Health can support policy-makers, managers, scientists, patients and the general public to systematically assess the quality of publicly reported
Data Systems and Reports as Active Participants in Data Interpretation
Rankin, Jenny Grant
2016-01-01
Most data-informed decision-making in education is undermined by flawed interpretations. Educator-driven interventions to improve data use are beneficial but not omnipotent, as data misunderstandings persist at schools and school districts commended for ideal data use support. Meanwhile, most data systems and reports display figures without…
Beginning statistics with data analysis
Mosteller, Frederick; Rourke, Robert EK
2013-01-01
This introduction to the world of statistics covers exploratory data analysis, methods for collecting data, formal statistical inference, and techniques of regression and analysis of variance. 1983 edition.
Advanced statistical methods in data science
Chen, Jiahua; Lu, Xuewen; Yi, Grace; Yu, Hao
2016-01-01
This book gathers invited presentations from the 2nd Symposium of the ICSA-CANADA Chapter held at the University of Calgary from August 4-6, 2015. The aim of this Symposium was to promote advanced statistical methods in big-data sciences, to allow researchers to exchange ideas on statistics and data science, and to embrace the challenges and opportunities of statistics and data science in the modern world. It addresses diverse themes in advanced statistical analysis in big-data sciences, including methods for administrative data analysis, survival data analysis, missing data analysis, high-dimensional and genetic data analysis, longitudinal and functional data analysis, the design and analysis of studies with response-dependent and multi-phase designs, time series and robust statistics, statistical inference based on likelihood, empirical likelihood and estimating functions. The editorial group selected 14 high-quality presentations from this successful symposium and invited the presenters to prepare a fu...
Interpretive Reporting of Protein Electrophoresis Data by Microcomputer
Talamo, Thomas S.; Losos, Frank J.; Kessler, G. Frederick
1982-01-01
A microcomputer-based system for interpretive reporting of protein electrophoretic data has been developed. Data for serum, urine and cerebrospinal fluid protein electrophoreses as well as immunoelectrophoresis can be entered. Patient demographic information is entered through the keyboard followed by manual entry of total and fractionated protein levels obtained after densitometer scanning of the electrophoretic strip. The patterns are then coded, interpreted, and final reports generated. In most cases interpretation time is less than one second. Misinterpretation by computer is uncommon and can be corrected by edit functions within the system. These discrepancies between computer and pathologist interpretation are automatically stored in a data file for later review and possible program modification. Any or all previous tests on a patient may be reviewed with graphic display of the electrophoretic pattern. The system has been in use for several months and is presently well accepted by both laboratory and clinical staff. It also allows rapid storage, retrieval and analysis of protein electrophoretic data.
Statistical methods for astronomical data analysis
Chattopadhyay, Asis Kumar
2014-01-01
This book introduces “Astrostatistics” as a subject in its own right with rewarding examples, including work by the authors with galaxy and Gamma Ray Burst data to engage the reader. This includes a comprehensive blending of Astrophysics and Statistics. The first chapter’s coverage of preliminary concepts and terminologies for astronomical phenomena will appeal to both Statistics and Astrophysics readers as helpful context. Statistics concepts covered in the book provide a methodological framework. A unique feature is the inclusion of different possible sources of astronomical data, as well as software packages for converting the raw data into appropriate forms for data analysis. Readers can then use the appropriate statistical packages for their particular data analysis needs. The ideas of statistical inference discussed in the book help readers determine how to apply statistical tests. The authors cover different applications of statistical techniques already developed or specifically introduced for ...
Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard
2017-11-01
Data for fractional solid waste composition provide relative magnitudes of individual waste fractions, the percentages of which always sum to 100, thereby connecting them intrinsically. Due to this sum constraint, waste composition data represent closed data, and their interpretation and analysis require statistical methods, other than classical statistics that are suitable only for non-constrained data such as absolute values. However, the closed characteristics of waste composition data are often ignored when analysed. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of the biased negative proportions. A Pearson's correlation test, applied to waste fraction generation (kg mass), indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association in this regard, thus demonstrating that statistical analyses applied to compositional waste fraction data, without addressing the closed characteristics of these data, have the potential to generate spurious or misleading results. Therefore, compositional data should be transformed adequately prior to any statistical analysis, such as computing mean, standard deviation and correlation coefficients.
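The closure effect described in this abstract can be reproduced with a small sketch. The masses below are made-up illustrative numbers, not data from the study, and the centred log-ratio (clr) transform is shown only as one commonly used transformation for compositional data; the abstract does not say which transformation the authors applied.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 100 (percent composition)."""
    total = sum(parts)
    return [100.0 * p / total for p in parts]

def clr(composition):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical masses (kg) for five households; NOT data from the study.
vegetable = [2.0, 3.0, 4.0, 5.0, 6.0]
plastic = [1.0, 1.2, 1.4, 1.6, 1.8]
other = [1.0, 1.0, 1.0, 1.0, 1.0]

# Convert each household's masses to percentages (closed data).
percent = [closure(row) for row in zip(vegetable, plastic, other)]
veg_pct = [row[0] for row in percent]
plastic_pct = [row[1] for row in percent]

r_mass = pearson(vegetable, plastic)    # correlation on absolute masses
r_pct = pearson(veg_pct, plastic_pct)   # correlation on percentages
print("mass:", r_mass, "percent:", r_pct)
print("clr of first household:", clr(percent[0]))
```

In this toy data set the two fractions rise together in mass, yet their percentages move in opposite directions, because a larger share for one fraction forces a smaller share for the others. The clr coordinates sum to zero by construction.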
Data Literacy is Statistical Literacy
Gould, Robert
2017-01-01
Past definitions of statistical literacy should be updated in order to account for the greatly amplified role that data now play in our lives. Experience working with high-school students in an innovative data science curriculum has shown that teaching statistical literacy, augmented by data literacy, can begin early.
Feiveson, Alan H.; Foy, Millennia; Ploutz-Snyder, Robert; Fiedler, James
2014-01-01
Do you have elevated p-values? Is the data analysis process getting you down? Do you experience anxiety when you need to respond to criticism of statistical methods in your manuscript? You may be suffering from Insufficient Statistical Support Syndrome (ISSS). For symptomatic relief of ISSS, come for a free consultation with JSC biostatisticians at our help desk during the poster sessions at the HRP Investigators Workshop. Get answers to common questions about sample size, missing data, multiple testing, when to trust the results of your analyses and more. Side effects may include sudden loss of statistics anxiety, improved interpretation of your data, and increased confidence in your results.
A Review of Statistical Techniques for 2x2 and RxC Categorical Data Tables In SPSS
Directory of Open Access Journals (Sweden)
Cengiz BAL
2009-11-01
In this study, a review of statistical techniques for RxC categorical data tables is explained in detail. The emphasis is given to the association of techniques and their corresponding data considerations. Some suggestions on how to handle specific categorical data tables in SPSS and common mistakes in the interpretation of the SPSS outputs are shown.
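As a minimal illustration of one technique commonly applied to RxC tables, the sketch below computes the Pearson chi-square statistic and its degrees of freedom. The review itself works in SPSS; this Python function and the example counts are assumptions made here for illustration only.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an RxC table.

    `table` is a list of rows of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical 2x2 table of counts.
stat, dof = chi_square([[10, 20], [20, 10]])
print(stat, dof)
```

The statistic is only a reasonable approximation when expected counts are not too small, which is one of the data considerations a review of this kind typically stresses.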
Statistical data analysis handbook
National Research Council Canada - National Science Library
Wall, Francis J
1986-01-01
It must be emphasized that this is not a text book on statistics. Instead it is a working tool that presents data analysis in clear, concise terms which can be readily understood even by those without formal training in statistics...
Applied statistics for social and management sciences
Miah, Abdul Quader
2016-01-01
This book addresses the application of statistical techniques and methods across a wide range of disciplines. While its main focus is on the application of statistical methods, theoretical aspects are also provided as fundamental background information. It offers a systematic interpretation of results often discovered in general descriptions of methods and techniques such as linear and non-linear regression. SPSS is also used in all the application aspects. The presentation of data in the form of tables and graphs throughout the book not only guides users, but also explains the statistical application and assists readers in interpreting important features. The analysis of statistical data is presented consistently throughout the text. Academic researchers, practitioners and other users who work with statistical data will benefit from reading Applied Statistics for Social and Management Sciences.
Heuristics of the algorithm: Big Data, user interpretation and institutional translation
Directory of Open Access Journals (Sweden)
Göran Bolin
2015-10-01
Intelligence on mass media audiences was founded on representative statistical samples, analysed by statisticians at the market departments of media corporations. The techniques for aggregating user data in the age of pervasive and ubiquitous personal media (e.g. laptops, smartphones, credit cards/swipe cards and radio-frequency identification) build on large aggregates of information (Big Data) analysed by algorithms that transform data into commodities. While the former technologies were built on socio-economic variables such as age, gender, ethnicity, education, media preferences (i.e. categories recognisable to media users and industry representatives alike), Big Data technologies register consumer choice, geographical position, web movement, and behavioural information in technologically complex ways that for most lay people are too abstract to appreciate the full consequences of. The data mined for pattern recognition privileges relational rather than demographic qualities. We argue that the agency of interpretation at the bottom of market decisions within media companies nevertheless introduces a ‘heuristics of the algorithm’, where the data inevitably becomes translated into social categories. In the paper we argue that although the promise of algorithmically generated data is often implemented in automated systems where human agency gets increasingly distanced from the data collected (it is our technological gadgets that are being surveyed, rather than us as social beings), one can observe a felt need among media users and among industry actors to ‘translate back’ the algorithmically produced relational statistics into ‘traditional’ social parameters. The tenacious social structures within the advertising industries work against the techno-economically driven tendencies within the Big Data economy.
Statistical methods in regression and calibration analysis of chromosome aberration data
International Nuclear Information System (INIS)
Merkle, W.
1983-01-01
The method of iteratively reweighted least squares for the regression analysis of Poisson distributed chromosome aberration data is reviewed in the context of other fit procedures used in the cytogenetic literature. As an application of the resulting regression curves, methods for calculating confidence intervals on dose from aberration yield are described and compared, and, for the linear quadratic model, a confidence interval is given. Emphasis is placed on the rational interpretation and the limitations of various methods from a statistical point of view. (orig./MG)
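A minimal sketch of iteratively reweighted least squares for Poisson counts with a linear-quadratic dose response follows. The abstract does not give the exact parameterisation, so the model here (expected count = cells × (α·D + β·D²), no background term, identity link) and the example numbers are assumptions for illustration only.

```python
def fit_linear_quadratic(doses, cells, counts, iterations=20):
    """Fit expected count = cells * (alpha*D + beta*D**2) by IRLS.

    For Poisson data the variance equals the mean, so each weighted
    least-squares pass uses weights 1/mu from the previous fit.
    """
    mu = [max(float(c), 0.5) for c in counts]  # start from observed counts
    alpha = beta = 0.0
    for _ in range(iterations):
        s11 = s12 = s22 = t1 = t2 = 0.0
        for d, n, y, m in zip(doses, cells, counts, mu):
            x1, x2, w = n * d, n * d * d, 1.0 / m
            s11 += w * x1 * x1
            s12 += w * x1 * x2
            s22 += w * x2 * x2
            t1 += w * x1 * y
            t2 += w * x2 * y
        # Solve the 2x2 weighted normal equations by Cramer's rule.
        det = s11 * s22 - s12 * s12
        alpha = (t1 * s22 - s12 * t2) / det
        beta = (s11 * t2 - s12 * t1) / det
        # Update fitted means; keep them positive so the weights stay finite.
        mu = [max(n * (alpha * d + beta * d * d), 1e-9)
              for d, n in zip(doses, cells)]
    return alpha, beta

# Hypothetical data: dose (Gy), cells scored, aberrations counted.
doses = [0.5, 1.0, 2.0, 3.0, 4.0]
cells = [1000, 1000, 1000, 1000, 1000]
counts = [25, 80, 280, 600, 1040]
alpha, beta = fit_linear_quadratic(doses, cells, counts)
print(alpha, beta)
```

The example counts were constructed to lie exactly on the curve with α = 0.02 and β = 0.06, so the fit should return those values; real data would scatter around the curve and the reweighting would then matter.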
Khan, Haseeb Ahmad
2004-01-01
The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-groups comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for tra...
An Optimization Framework for Travel Pattern Interpretation of Cellular Data
Directory of Open Access Journals (Sweden)
Sarit Freund
2013-09-01
This paper explores methods for identifying travel patterns from cellular data. A primary challenge in this research is to provide an interpretation of the raw data that distinguishes between activity durations and travel durations. A novel framework is proposed for this purpose, based on a grading scheme for candidate interpretations of the raw data. A genetic algorithm is used to find interpretations with high grades, which are considered as the most reasonable ones. The proposed method is tested on a dataset of records covering 9454 cell-phone users over a period of one week. Preliminary evaluation of the resulting interpretations is presented.
Statistical analysis and data management
International Nuclear Information System (INIS)
Anon.
1981-01-01
This report provides an overview of the history of the WIPP Biology Program. The recommendations of the American Institute of Biological Sciences (AIBS) for the WIPP biology program are summarized. The data sets available for statistical analyses and problems associated with these data sets are also summarized. Biological studies base maps are presented. A statistical model is presented to evaluate any correlation between climatological data and small mammal captures. No statistically significant relationship between variance in small mammal captures on Dr. Gennaro's 90m x 90m grid and precipitation records from the Duval Potash Mine was found.
Interpretation of magnetotelluric data: Pasco Basin, south central Washington
International Nuclear Information System (INIS)
Orange, A.; Berkman, E.
1985-01-01
The purpose of this project was to review, evaluate, and interpret magnetotelluric (MT) data collected in support of the Basalt Waste Isolation Project. The integrated interpretation presented is related to regional and site-specific geology and associated borehole, gravity, and magnetic data. The MT interpretation procedure placed strong reliance on computer models based upon the inferred physical parameters of the subsurface materials and their anticipated variability. Much of the MT data is of poor quality by current standards; however, significant qualitative observations can be made. The quantification of these observations, including the procedures and assumptions utilized, is discussed in detail. Problems related to ambiguities inherent in the MT method are discussed as related to the Pasco Basin MT data. 117 refs., 77 figs., 3 tabs
Boulesteix, Anne-Laure; Wilson, Rory; Hapfelmeier, Alexander
2017-09-09
The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research. In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments. We suggest that benchmark studies, a method of assessment of statistical methods using real-world datasets, might benefit from adopting (some) concepts from evidence-based medicine towards the goal of more evidence-based statistical research.
Applied statistics in ecology: common pitfalls and simple solutions
E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick
2013-01-01
The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...
The seismic analyzer: interpreting and illustrating 2D seismic data.
Patel, Daniel; Giertsen, Christopher; Thurmond, John; Gjelberg, John; Gröller, M Eduard
2008-01-01
We present a toolbox for quickly interpreting and illustrating 2D slices of seismic volumetric reflection data. Searching for oil and gas involves creating a structural overview of seismic reflection data to identify hydrocarbon reservoirs. We improve the search of seismic structures by precalculating the horizon structures of the seismic data prior to interpretation. We improve the annotation of seismic structures by applying novel illustrative rendering algorithms tailored to seismic data, such as deformed texturing and line and texture transfer functions. The illustrative rendering results in multi-attribute and scale-invariant visualizations where features are represented clearly in both highly zoomed-in and zoomed-out views. Thumbnail views in combination with interactive appearance control allow for a quick overview of the data before detailed interpretation takes place. These techniques help reduce the work of seismic illustrators and interpreters.
Statistical Methods for Fuzzy Data
Viertl, Reinhard
2011-01-01
Statistical data are not always precise numbers, or vectors, or categories. Real data are frequently what is called fuzzy. Examples where this fuzziness is obvious are quality of life data, environmental, biological, medical, sociological and economics data. Also the results of measurements can be best described by using fuzzy numbers and fuzzy vectors respectively. Statistical analysis methods have to be adapted for the analysis of fuzzy data. In this book, the foundations of the description of fuzzy data are explained, including methods on how to obtain the characterizing function of fuzzy m
... Sickle cell ... 1999 through 2002. This drop coincided with the introduction in 2000 of a vaccine that protects against ...
DEFF Research Database (Denmark)
Christensen, Rune Haubo Bojesen; Ennis, John M.; Ennis, Daniel M.
2014-01-01
/preference responses or ties in choice experiments. Food Quality and Preference, 23, 13–17) noted that this proportion can depend on the product category, have proposed that the expected proportion of preference responses within a given category be called an identicality norm, and have argued that knowledge...... of such norms is valuable for more complete interpretation of 2-Alternative Choice (2-AC) data. For instance, these norms can be used to indicate consumer segmentation even with non-replicated data. In this paper, we show that the statistical test suggested by Ennis and Ennis (2012a) behaves poorly and has too...... when ingredient changes are considered for cost-reduction or health initiative purposes....
Developments in statistical evaluation of clinical trials
Oud, Johan; Ghidey, Wendimagegn
2014-01-01
This book describes various ways of approaching and interpreting the data produced by clinical trial studies, with a special emphasis on the essential role that biostatistics plays in clinical trials. Over the past few decades the role of statistics in the evaluation and interpretation of clinical data has become of paramount importance. As a result the standards of clinical study design, conduct and interpretation have undergone substantial improvement. The book includes 18 carefully reviewed chapters on recent developments in clinical trials and their statistical evaluation, with each chapter providing one or more examples involving typical data sets, enabling readers to apply the proposed procedures. The chapters employ a uniform style to enhance comparability between the approaches.
Probabilistic interpretation of data: a physicist's approach
Miller, Guthrie
2013-01-01
This book is a physicist's approach to interpretation of data using Markov Chain Monte Carlo (MCMC). The concepts are derived from first principles using a style of mathematics that quickly elucidates the basic ideas, sometimes with the aid of examples. Probabilistic data interpretation is a straightforward problem involving conditional probability. A prior probability distribution is essential, and examples are given. In this small book (200 pages) the reader is led from the most basic concepts of mathematical probability all the way to parallel processing algorithms for Markov Chain Monte Carlo. Fortran source code (for eigenvalue analysis of finite discrete Markov Chains, for MCMC, and for nonlinear least squares) is included with the supplementary material for this book (available online).
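To make the idea of MCMC-based data interpretation concrete, here is a minimal random-walk Metropolis sketch. It is not the book's Fortran code: the model (normal data with known standard deviation, wide normal prior on the mean), the measurements, and all function names are assumptions chosen for illustration.

```python
import math
import random

def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalised log posterior: normal likelihood times normal prior."""
    log_lik = -0.5 * sum((x - mu) ** 2 for x in data) / sigma ** 2
    log_prior = -0.5 * (mu / prior_sd) ** 2
    return log_lik + log_prior

def metropolis(data, n_samples=20000, step=0.5, start=0.0, seed=1):
    """Random-walk Metropolis sampler for the mean `mu`."""
    rng = random.Random(seed)
    mu, lp = start, log_posterior(start, data)
    samples = []
    for _ in range(n_samples):
        proposal = mu + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio);
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < lp_new - lp:
            mu, lp = proposal, lp_new
        samples.append(mu)
    return samples

# Hypothetical measurements.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
samples = metropolis(data)[2000:]  # discard burn-in
posterior_mean = sum(samples) / len(samples)
print(posterior_mean)
```

For this conjugate toy problem the posterior mean is known in closed form (about 5.09), which gives a simple check that the sampler behaves sensibly before it is applied to models without analytic answers.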
Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.
2014-12-01
An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in literature yield single numbers that characterize the environmental impact of a process/product - typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is lack of data that can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjunction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.
SOCR: Statistics Online Computational Resource
Dinov, Ivo D.
2006-01-01
The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis...
International Nuclear Information System (INIS)
Smolders, R.; Den Hond, E.; Koppen, G.; Govarts, E.; Willems, H.; Casteleyn, L.; Kolossa-Gehring, M.; Fiddicke, U.; Castaño, A.; Koch, H.M.; Angerer, J.; Esteban, M.; Sepai, O.; Exley, K.; Bloemen, L.; Horvat, M.; Knudsen, L.E.; Joas, A.; Joas, R.; Biot, P.
2015-01-01
In 2011 and 2012, the COPHES/DEMOCOPHES twin projects performed the first ever harmonized human biomonitoring survey in 17 European countries. In more than 1800 mother–child pairs, individual lifestyle data were collected and cadmium, cotinine and certain phthalate metabolites were measured in urine. Total mercury was determined in hair samples. While the main goal of the COPHES/DEMOCOPHES twin projects was to develop and test harmonized protocols and procedures, the goal of the current paper is to investigate whether the observed differences in biomarker values among the countries implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were not available for reporting or were not in line with predefined specifications. Therefore, only part of the external information could be included in the statistical analyses. Nonetheless, there was a highly significant correlation between national levels of fish consumption and mercury in hair, the strength of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the evaluation of) evidence-informed policy making. - Highlights: • External data was collected to interpret HBM data from DEMOCOPHES. • Hg in hair could be related to fish consumption across different countries. • Urinary cotinine was related to strictness of anti-smoking legislation. • Urinary Cd was borderline significantly related to air and food quality. • Lack of comparable data among countries hampered the analysis
Applied statistics for economists
Lewis, Margaret
2012-01-01
This book is an undergraduate text that introduces students to commonly-used statistical methods in economics. Using examples based on contemporary economic issues and readily-available data, it not only explains the mechanics of the various methods, it also guides students to connect statistical results to detailed economic interpretations. Because the goal is for students to be able to apply the statistical methods presented, online sources for economic data and directions for performing each task in Excel are also included.
Powerful Statistical Inference for Nested Data Using Sufficient Summary Statistics
Dowding, Irene; Haufe, Stefan
2018-01-01
Hierarchically-organized data arise naturally in many psychology and neuroscience studies. As the standard assumption of independent and identically distributed samples does not hold for such data, two important problems are to accurately estimate group-level effect sizes, and to obtain powerful statistical tests against group-level null hypotheses. A common approach is to summarize subject-level data by a single quantity per subject, which is often the mean or the difference between class means, and treat these as samples in a group-level t-test. This “naive” approach is, however, suboptimal in terms of statistical power, as it ignores information about the intra-subject variance. To address this issue, we review several approaches to deal with nested data, with a focus on methods that are easy to implement. With what we call the sufficient-summary-statistic approach, we highlight a computationally efficient technique that can improve statistical power by taking into account within-subject variances, and we provide step-by-step instructions on how to apply this approach to a number of frequently-used measures of effect size. The properties of the reviewed approaches and the potential benefits over a group-level t-test are quantitatively assessed on simulated data and demonstrated on EEG data from a simulated-driving experiment. PMID:29615885
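The idea of using within-subject variances at the group level can be sketched with inverse-variance weighting. This is a fixed-effect simplification assumed here for illustration; the abstract does not spell out the paper's exact estimator, and the function names and trial values below are hypothetical.

```python
import math
import statistics

def summarise(trials):
    """Per-subject summary: mean effect and the variance of that mean."""
    mean = statistics.mean(trials)
    var_of_mean = statistics.variance(trials) / len(trials)
    return mean, var_of_mean

def pool(means, variances):
    """Inverse-variance weighted group effect, its variance, and a z score.

    Subjects with noisier estimates get less weight. Between-subject
    variance is ignored in this fixed-effect sketch.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    pooled_var = 1.0 / sum(weights)
    z = pooled / math.sqrt(pooled_var)
    return pooled, pooled_var, z

# Hypothetical per-trial effects for three subjects.
subjects = [
    [0.9, 1.1, 1.0, 1.2],
    [0.2, 1.8, -0.5, 2.5],
    [1.4, 1.5, 1.3, 1.6],
]
means, variances = zip(*(summarise(t) for t in subjects))
print(pool(means, variances))
```

In contrast, the "naive" approach mentioned in the abstract would feed only the three subject means into a t-test, treating the noisy second subject as being as informative as the other two.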
GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data.
Carvalho, Paulo C; Fischer, Juliana Sg; Chen, Emily I; Domont, Gilberto B; Carvalho, Maria Gc; Degrave, Wim M; Yates, John R; Barbosa, Valmir C
2009-02-24
Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
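Over-representation of a GO term among a selected set of proteins is often assessed with a hypergeometric tail probability. The abstract does not state which statistic GOEx uses, so the sketch below is an assumption for illustration, with hypothetical function and parameter names.

```python
from math import comb

def enrichment_p_value(population, annotated, selected, hits):
    """P(at least `hits` annotated proteins among `selected` drawn at random).

    population: total number of identified proteins
    annotated:  proteins in the population carrying the GO term
    selected:   proteins flagged as differentially expressed
    hits:       flagged proteins that carry the GO term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 2600 proteins, 120 with the term,
# 200 differentially expressed, 25 of those with the term.
print(enrichment_p_value(2600, 120, 200, 25))
```

A small p-value suggests the term appears among the selected proteins more often than chance would explain; when many terms are tested, a multiple-testing correction would also be needed.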
Data Interpretation: Using Probability
Drummond, Gordon B.; Vowler, Sarah L.
2011-01-01
Experimental data are analysed statistically to allow researchers to draw conclusions from a limited set of measurements. The hard fact is that researchers can never be certain that measurements from a sample will exactly reflect the properties of the entire group of possible candidates available to be studied (although using a sample is often the…
Statistical concepts a second course
Lomax, Richard G
2012-01-01
Statistical Concepts consists of the last 9 chapters of An Introduction to Statistical Concepts, 3rd ed. Designed for the second course in statistics, it is one of the few texts that focuses just on intermediate statistics. The book highlights how statistics work and what they mean to better prepare students to analyze their own data and interpret SPSS and research results. As such it offers more coverage of non-parametric procedures used when standard assumptions are violated since these methods are more frequently encountered when working with real data. Determining appropriate sample sizes
Birth Defects Data and Statistics
... and critical. Read below for the latest national statistics on the occurrence of birth defects in the ...
Imputing historical statistics, soils information, and other land-use data to crop area
Perry, C. R., Jr.; Willis, R. W.; Lautenschlager, L.
1982-01-01
In foreign crop condition monitoring, satellite-acquired imagery is routinely used. To facilitate interpretation of this imagery, it is advantageous to have estimates of the crop types and their extent for small area units, i.e., grid cells on a map that each represent, at 60 deg latitude, an area of nominally 25 by 25 nautical miles. The feasibility of imputing historical crop statistics, soils information, and other ancillary data to crop area for a province in Argentina is studied.
Spina Bifida Data and Statistics
Spina bifida ... the spine. Read below for the latest national statistics on spina bifida in the United States. In ...
Singamsetti, Rao
2007-01-01
In this paper an attempt is made to highlight some issues in the interpretation of statistical concepts and of results as taught in undergraduate business statistics courses. The use of modern technology in the classroom is shown to have increased the efficiency and the ease of learning and teaching in statistics. The importance of…
Statistical Models and Methods for Lifetime Data
Lawless, Jerald F
2011-01-01
Praise for the First Edition: "An indispensable addition to any serious collection on lifetime data analysis and . . . a valuable contribution to the statistical literature. Highly recommended . . ." -Choice. "This is an important book, which will appeal to statisticians working on survival analysis problems." -Biometrics. "A thorough, unified treatment of statistical models and methods used in the analysis of lifetime data . . . this is a highly competent and agreeable statistical textbook." -Statistics in Medicine. The statistical analysis of lifetime or response time data is a key tool in engineering,
Interpreting biomarker data from the COPHES/DEMOCOPHES twin projects
DEFF Research Database (Denmark)
Smolders, R; Den Hond, E; Koppen, G
2015-01-01
implementing DEMOCOPHES can be interpreted using information from external databases on environmental quality and lifestyle. In general, 13 countries having implemented DEMOCOPHES provided high-quality data from external sources that were relevant for interpretation purposes. However, some data were...... of antismoking legislation was significantly related to urinary cotinine levels, and we were able to show indications that also urinary cadmium levels were associated with environmental quality and food quality. These results again show the potential of biomonitoring data to provide added value for (the...
van Driel, A.F.; Nikolaev, I.; Vergeer, P.; Lodahl, P.; Vanmaekelbergh, D.; Vos, Willem L.
2007-01-01
We present a statistical analysis of time-resolved spontaneous emission decay curves from ensembles of emitters, such as semiconductor quantum dots, with the aim of interpreting ubiquitous non-single-exponential decay. Contrary to what is widely assumed, the density of excited emitters and the
Markert, K. N.; Ashmall, W.; Johnson, G.; Saah, D. S.; Anderson, E.; Flores Cordova, A. I.; Díaz, A. S. P.; Mollicone, D.; Griffin, R.
2017-12-01
Collect Earth Online (CEO) is a free and open online implementation of the FAO Collect Earth system for collaboratively collecting environmental data through the visual interpretation of Earth observation imagery. The primary collection mechanism in CEO is human interpretation of land surface characteristics in imagery served via Web Map Services (WMS). However, interpreters may not have enough contextual information to classify samples by only viewing the imagery served via WMS, be they high resolution or otherwise. To assist in the interpretation and collection processes in CEO, SERVIR, a joint NASA-USAID initiative that brings Earth observations to improve environmental decision making in developing countries, developed the GeoDash system, an embedded and critical component of CEO. GeoDash leverages Google Earth Engine (GEE) by allowing users to set up custom browser-based widgets that pull from GEE's massive public data catalog. These widgets can be quick looks of other satellite imagery, time series graphs of environmental variables, and statistics panels of the same. Users can customize widgets with any of GEE's image collections, such as the historical Landsat collection with data available since the 1970s, select date ranges, image stretch parameters, graph characteristics, and create custom layouts, all on-the-fly to support plot interpretation in CEO. This presentation focuses on the implementation and potential applications, including the back-end links to GEE and the user interface with custom widget building. GeoDash takes large data volumes and condenses them into meaningful, relevant information for interpreters. While designed initially with national and global forest resource assessments in mind, the system will complement disaster assessments, agriculture management, project monitoring and evaluation, and more.
GO Explorer: A gene-ontology tool to aid in the interpretation of shotgun proteomics data
Directory of Open Access Journals (Sweden)
Domont Gilberto B
2009-02-01
Background: Spectral counting is a shotgun proteomics approach comprising the identification and relative quantitation of thousands of proteins in complex mixtures. However, this strategy generates bewildering amounts of data whose biological interpretation is a challenge. Results: Here we present a new algorithm, termed GO Explorer (GOEx), that leverages the gene ontology (GO) to aid in the interpretation of proteomic data. GOEx stands out because it combines data from protein fold changes with GO over-representation statistics to help draw conclusions. Moreover, it is tightly integrated within the PatternLab for Proteomics project and, thus, lies within a complete computational environment that provides parsers and pattern recognition tools designed for spectral counting. GOEx offers three independent methods to query data: an interactive directed acyclic graph, a specialist mode where key words can be searched, and an automatic search. Its usefulness is demonstrated by applying it to help interpret the effects of perillyl alcohol, a natural chemotherapeutic agent, on glioblastoma multiform cell lines (A172). We used a new multi-surfactant shotgun proteomic strategy and identified more than 2600 proteins; GOEx pinpointed key sets of differentially expressed proteins related to cell cycle, alcohol catabolism, the Ras pathway, apoptosis, and stress response, to name a few. Conclusion: GOEx facilitates organism-specific studies by leveraging GO and providing a rich graphical user interface. It is a simple to use tool, specialized for biologists who wish to analyze spectral counting data from shotgun proteomics. GOEx is available at http://pcarvalho.com/patternlab.
Directory of Open Access Journals (Sweden)
Morreau Hans
2010-01-01
Background: Multiplex Ligation-Dependent Probe Amplification (MLPA) is an application that can be used for the detection of multiple chromosomal aberrations in a single experiment. In one reaction, up to 50 different genomic sequences can be analysed. For a reliable work-flow, tools are needed for administrative support, data management, normalisation, visualisation, reporting and interpretation. Results: Here, we developed a data management system, MLPAinter for MLPA interpretation, that is Windows executable and has a stand-alone database for monitoring and interpreting the MLPA data stream that is generated from the experimental setup to analysis, quality control and visualisation. A statistical approach is applied for the normalisation and analysis of large series of MLPA traces, making use of multiple control samples and internal controls. Conclusions: MLPAinter visualises MLPA data in plots with information about sample replicates, normalisation settings, and sample characteristics. This integrated approach helps in the automated handling of large series of MLPA data and guarantees a quick and streamlined dataflow from the beginning of an experiment to an authorised report.
The Seismic Analyzer: Interpreting and Illustrating 2D Seismic Data
Patel, Daniel; Giertsen, Christopher; Thurmond, John; Gjelberg, John; Gröller, Eduard
2008-01-01
We present a toolbox for quickly interpreting and illustrating 2D slices of seismic volumetric reflection data. Searching for oil and gas involves creating a structural overview of seismic reflection data to identify hydrocarbon reservoirs. We improve the search of seismic structures by precalculating the horizon structures of the seismic data prior to interpretation. We improve the annotation of seismic structures by applying novel illustrative rendering algorithms tailored to seism...
Interpretable Categorization of Heterogeneous Time Series Data
Lee, Ritchie; Kochenderfer, Mykel J.; Mengshoel, Ole J.; Silbermann, Joshua
2017-01-01
We analyze data from simulated aircraft encounters to validate and inform the development of a prototype aircraft collision avoidance system. The high-dimensional and heterogeneous time series dataset is analyzed to discover properties of near mid-air collisions (NMACs) and categorize the NMAC encounters. Domain experts use these properties to better organize and understand NMAC occurrences. Existing solutions either are not capable of handling high-dimensional and heterogeneous time series datasets or do not provide explanations that are interpretable by a domain expert. The latter is critical to the acceptance and deployment of safety-critical systems. To address this gap, we propose grammar-based decision trees along with a learning algorithm. Our approach extends decision trees with a grammar framework for classifying heterogeneous time series data. A context-free grammar is used to derive decision expressions that are interpretable, application-specific, and support heterogeneous data types. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to a simulated aircraft encounter dataset and evaluate the performance of four variants of our learning algorithm. The best algorithm is used to analyze and categorize near mid-air collisions in the aircraft encounter dataset. We describe each discovered category in detail and discuss its relevance to aircraft collision avoidance.
Data-driven inference for the spatial scan statistic.
Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C
2011-08-02
Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, as it bears on the correctness of the decision based on this inference. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
Teschendorff, Andrew E; Sollich, Peter; Kuehn, Reimer
2014-06-01
A key challenge in systems biology is the elucidation of the underlying principles, or fundamental laws, which determine the cellular phenotype. Understanding how these fundamental principles are altered in diseases like cancer is important for translating basic scientific knowledge into clinical advances. While significant progress is being made, with the identification of novel drug targets and treatments by means of systems biological methods, our fundamental systems level understanding of why certain treatments succeed and others fail is still lacking. We here advocate a novel methodological framework for systems analysis and interpretation of molecular omic data, which is based on statistical mechanical principles. Specifically, we propose the notion of cellular signalling entropy (or uncertainty), as a novel means of analysing and interpreting omic data, and more fundamentally, as a means of elucidating systems-level principles underlying basic biology and disease. We describe the power of signalling entropy to discriminate cells according to differentiation potential and cancer status. We further argue the case for an empirical cellular entropy-robustness correlation theorem and demonstrate its existence in cancer cell line drug sensitivity data. Specifically, we find that high signalling entropy correlates with drug resistance and further describe how entropy could be used to identify the Achilles heels of cancer cells. In summary, signalling entropy is a deep and powerful concept, based on rigorous statistical mechanical principles, which, with improved data quality and coverage, will allow a much deeper understanding of the systems biological principles underlying normal and disease physiology. Copyright © 2014 Elsevier Inc. All rights reserved.
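To make the notion of an entropy over signalling interactions concrete, the sketch below computes the Shannon entropy of one node's outgoing interaction weights after normalising them to probabilities. How the weights are derived from omic data is not stated in the abstract, so the weights and the function name here are purely hypothetical.

```python
import math

def local_entropy(weights):
    """Shannon entropy (in nats) of non-negative interaction weights normalised to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)

# Hypothetical nodes: one signals evenly to four neighbours, one to a single neighbour.
promiscuous = local_entropy([1.0, 1.0, 1.0, 1.0])
committed = local_entropy([5.0, 0.0, 0.0, 0.0])
```

Evenly spread weights give the maximum value for that number of neighbours, while a single dominant interaction gives zero.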
Phase 1 report on sensor technology, data fusion and data interpretation for site characterization
International Nuclear Information System (INIS)
Beckerman, M.
1991-10-01
In this report we discuss sensor technology, data fusion and data interpretation approaches of possible maximal usefulness for subsurface imaging and characterization of land-fill waste sites. Two sensor technologies, terrain conductivity using electromagnetic induction and ground penetrating radar, are described and the literature on the subject is reviewed. We identify the maximum entropy stochastic method as one providing a rigorously justifiable framework for fusing the sensor data, briefly summarize work done by us in this area, and examine some of the outstanding issues with regard to data fusion and interpretation. 25 refs., 17 figs
The statistical interpretations of counting data from measurements of low-level radioactivity
International Nuclear Information System (INIS)
Donn, J.J.; Wolke, R.L.
1977-01-01
The statistical model appropriate to measurements of low-level or background-dominant radioactivity is examined and the derived relationships are applied to two practical problems involving hypothesis testing: 'Does the sample exhibit a net activity above background' and 'Is the activity of the sample below some preselected limit'. In each of these cases, the appropriate decision rule is formulated, procedures are developed for estimating the preset count which is necessary to achieve a desired probability of detection, and a specific sequence of operations is provided for the worker in the field. (author)
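The first question posed in the abstract above ("does the sample exhibit a net activity above background") can be illustrated with a common textbook decision rule. This sketch assumes equal counting times and a Gaussian approximation to Poisson counts, which becomes poor at very low counts; it is not the rule derived in the report, and the function name and default quantile are assumptions.

```python
import math

def exceeds_background(gross_counts, background_counts, k=1.645):
    """Compare net counts with a critical level k * sqrt(2 * B).

    Under the null hypothesis of no net activity, the net count (gross - background)
    has variance of roughly 2 * B when both measurements use the same counting time.
    k = 1.645 corresponds to a one-sided 5% false-positive rate.
    """
    net = gross_counts - background_counts
    critical_level = k * math.sqrt(2 * background_counts)
    return net, critical_level, net > critical_level

# Hypothetical measurements against a background of 100 counts.
clear_signal = exceeds_background(130, 100)
marginal = exceeds_background(115, 100)
```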
Empirical approach to interpreting card-sorting data
Directory of Open Access Journals (Sweden)
Steven F. Wolf
2012-05-01
Since it was first published 30 years ago, the seminal paper of Chi et al. on expert and novice categorization of introductory problems has led to a plethora of follow-up studies within and outside of the area of physics [Cogn. Sci. 5, 121 (1981)]. These studies frequently encompass “card-sorting” exercises whereby the participants group problems. While this technique certainly allows insights into problem solving approaches, simple descriptive statistics more often than not fail to find significant differences between experts and novices. In moving beyond descriptive statistics, we describe a novel microscopic approach that takes into account the individual identity of the cards and uses graph theory and models to visualize, analyze, and interpret problem categorization experiments. We apply these methods to an introductory physics (mechanics) problem categorization experiment, and find that most of the variation in sorting outcome is not due to the sorter being an expert versus a novice, but rather due to an independent characteristic that we named “stacker” versus “spreader.” The fact that the expert-novice distinction only accounts for a smaller amount of the variation may explain the frequent null results when conducting these experiments.
Chen, Jin; Roth, Robert E; Naito, Adam T; Lengerich, Eugene J; Maceachren, Alan M
2008-11-07
Kulldorff's spatial scan statistic and its software implementation - SaTScan - are widely used for detecting and evaluating geographic clusters. However, two issues make using the method and interpreting its results non-trivial: (1) the method lacks cartographic support for understanding the clusters in geographic context and (2) results from the method are sensitive to parameter choices related to cluster scaling (abbreviated as scaling parameters), but the system provides no direct support for making these choices. We employ both established and novel geovisual analytics methods to address these issues and to enhance the interpretation of SaTScan results. We demonstrate our geovisual analytics approach in a case study analysis of cervical cancer mortality in the U.S. We address the first issue by providing an interactive visual interface to support the interpretation of SaTScan results. Our research to address the second issue prompted a broader discussion about the sensitivity of SaTScan results to parameter choices. Sensitivity has two components: (1) the method can identify clusters that, while being statistically significant, have heterogeneous contents comprised of both high-risk and low-risk locations and (2) the method can identify clusters that are unstable in location and size as the spatial scan scaling parameter is varied. To investigate cluster result stability, we conducted multiple SaTScan runs with systematically selected parameters. The results, when scanning a large spatial dataset (e.g., U.S. data aggregated by county), demonstrate that no single spatial scan scaling value is known to be optimal to identify clusters that exist at different scales; instead, multiple scans that vary the parameters are necessary. We introduce a novel method of measuring and visualizing reliability that facilitates identification of homogeneous clusters that are stable across analysis scales. Finally, we propose a logical approach to proceed through the analysis of
Topology for statistical modeling of petascale data.
Energy Technology Data Exchange (ETDEWEB)
Pascucci, Valerio (University of Utah, Salt Lake City, UT); Mascarenhas, Ajith Arthur; Rusek, Korben (Texas A&M University, College Station, TX); Bennett, Janine Camille; Levine, Joshua (University of Utah, Salt Lake City, UT); Pebay, Philippe Pierre; Gyulassy, Attila (University of Utah, Salt Lake City, UT); Thompson, David C.; Rojas, Joseph Maurice (Texas A&M University, College Station, TX)
2011-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled 'Topology for Statistical Modeling of Petascale Data', funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program. Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is thus to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, our approach is based on the complementary techniques of combinatorial topology and statistical modeling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modeling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. This document summarizes the technical advances we have made to date that were made possible in whole or in part by MAPD funding. These technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modeling, and (3) new integrated topological and statistical methods.
On the statistical assessment of classifiers using DNA microarray data
Directory of Open Access Journals (Sweden)
Carella M
2006-08-01
Background: In this paper we present a method for the statistical assessment of cancer predictors which make use of gene expression profiles. The methodology is applied to a new data set of microarray gene expression data collected in Casa Sollievo della Sofferenza Hospital, Foggia, Italy. The data set is made up of normal (22) and tumor (25) specimens extracted from 25 patients affected by colon cancer. We propose to give answers to some questions which are relevant for the automatic diagnosis of cancer such as: Is the size of the available data set sufficient to build accurate classifiers? What is the statistical significance of the associated error rates? In what ways can accuracy be considered dependent on the adopted classification scheme? How many genes are correlated with the pathology and how many are sufficient for an accurate colon cancer classification? The method we propose answers these questions whilst avoiding the potential pitfalls hidden in the analysis and interpretation of microarray data. Results: We estimate the generalization error, evaluated through the Leave-K-Out Cross Validation error, for three different classification schemes by varying the number of training examples and the number of the genes used. The statistical significance of the error rate is measured by using a permutation test. We provide a statistical analysis in terms of the frequencies of the genes involved in the classification. Using the whole set of genes, we found that the Weighted Voting Algorithm (WVA) classifier learns the distinction between normal and tumor specimens with 25 training examples, providing e = 21% (p = 0.045) as an error rate. This remains constant even when the number of examples increases. Moreover, Regularized Least Squares (RLS) and Support Vector Machines (SVM) classifiers can learn with only 15 training examples, with an error rate of e = 19% (p = 0.035) and e = 18% (p = 0.037), respectively. Moreover, the error rate
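The idea of attaching a p value to a cross-validated error rate via a permutation test can be sketched as follows. This is a toy illustration under stated assumptions: a nearest-class-mean classifier on one-dimensional features with leave-one-out validation stands in for the WVA, RLS and SVM classifiers and the Leave-K-Out scheme of the paper, and all names and data are hypothetical.

```python
import random

def loo_error(xs, labels):
    """Leave-one-out error of a nearest-class-mean classifier on 1-D features."""
    errors = 0
    for i in range(len(xs)):
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, labels)) if j != i]
        means = {}
        for cls in set(y for _, y in train):
            vals = [x for x, y in train if y == cls]
            means[cls] = sum(vals) / len(vals)
        predicted = min(means, key=lambda cls: abs(xs[i] - means[cls]))
        errors += predicted != labels[i]
    return errors / len(xs)

def permutation_p_value(xs, labels, n_perm=200, seed=0):
    """Fraction of label shufflings whose error is at least as low as the observed one."""
    rng = random.Random(seed)
    observed = loo_error(xs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_error(xs, shuffled) <= observed:
            hits += 1
    # The +1 terms count the observed labelling itself, so p is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical, well-separated expression values for two classes.
values = [0.1, 0.2, 0.3, 0.4, 0.5, 2.1, 2.2, 2.3, 2.4, 2.5]
classes = [0] * 5 + [1] * 5
error, p = permutation_p_value(values, classes)
```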
Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) Status Data
Office of Personnel Management — The Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) is a statistically cleansed sub-set of the data contained in the EHRI data warehouse. It...
Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) Dynamics Data
Office of Personnel Management — The Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) is a statistically cleansed sub-set of the data contained in the EHRI data warehouse. It...
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
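The sample-splitting scheme described above can be sketched with a deliberately simple target. In this illustration the "algorithm" applied to each parameter-generating sample picks the feature with the largest absolute covariance with the outcome, and the held-out estimation sample then estimates that covariance; the choice of target, the interleaved fold assignment, and all names are assumptions, not the authors' estimator.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def data_adaptive_estimate(features, y, v=4):
    """Average, over V splits, of a target chosen on one part and estimated on the other."""
    n = len(y)
    folds = [list(range(k, n, v)) for k in range(v)]  # V equal-size subsamples
    estimates = []
    for est_idx in folds:
        held_out = set(est_idx)
        gen_idx = [i for i in range(n) if i not in held_out]
        # Parameter-generating step: choose the feature most associated with y.
        chosen = max(
            range(len(features)),
            key=lambda j: abs(covariance([features[j][i] for i in gen_idx],
                                         [y[i] for i in gen_idx])),
        )
        # Estimation step: estimate the chosen covariance on the held-out subsample.
        estimates.append(covariance([features[chosen][i] for i in est_idx],
                                    [y[i] for i in est_idx]))
    return mean(estimates)

# Hypothetical data: feature 0 is exactly twice the outcome, feature 1 just alternates sign.
outcome = [float(t) for t in range(12)]
feats = [[2.0 * t for t in outcome], [1.0, -1.0] * 6]
estimate = data_adaptive_estimate(feats, outcome)
```

Because the target is chosen and estimated on disjoint parts of the data, the selection step does not bias the estimate obtained on each held-out subsample.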
Directory of Open Access Journals (Sweden)
Brayan Alexander Fonseca Martinez
2017-11-01
One of the most common observational study designs employed in veterinary science is the cross-sectional study with binary outcomes. To measure an association with exposure, either prevalence ratios (PR) or odds ratios (OR) can be used. In human epidemiology, much has been discussed about the use of the OR exclusively for case–control studies, and some authors have reported that there is no good justification for fitting logistic regression when the prevalence of the disease is high, in which case the OR overestimates the PR. Nonetheless, interpretation of the OR is difficult, since confusion between risk and odds can lead to incorrect quantitative interpretation of data such as “the risk is X times greater,” commonly reported in studies that use the OR. The aims of this study were (1) to review articles with cross-sectional designs to assess the statistical method used and the appropriateness of the interpretation of the estimated measure of association and (2) to illustrate the use of alternative statistical methods that estimate the PR directly. An overview of statistical methods and their interpretation using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines was conducted and included a diverse set of peer-reviewed journals in the veterinary science field, using PubMed as the search engine. From each article, the statistical method used and the appropriateness of the interpretation of the estimated measure of association were recorded. Additionally, four alternative models to logistic regression that estimate the PR directly were tested using our own dataset from a cross-sectional study on bovine viral diarrhea virus. The initial search strategy found 62 articles, of which 6 were excluded; therefore, 56 studies were used for the overall analysis. The review showed that, independent of the level of prevalence reported, 96% of articles employed logistic regression, thus estimating the OR. Results of the multivariate models
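The gap between the two measures is easy to see with a worked 2x2 table. The counts below are hypothetical and chosen only to contrast a common outcome with a rare one; they are not data from the study.

```python
def prevalence_ratio(a, b, c, d):
    """a, b: exposed with/without the outcome; c, d: unexposed with/without the outcome."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Cross-product ratio of the same 2x2 table."""
    return (a * d) / (b * c)

# Common outcome: prevalence 60% in the exposed and 30% in the unexposed.
pr_common = prevalence_ratio(60, 40, 30, 70)   # 2.0
or_common = odds_ratio(60, 40, 30, 70)         # 3.5

# Rare outcome: prevalence 6% in the exposed and 3% in the unexposed.
pr_rare = prevalence_ratio(6, 94, 3, 97)       # 2.0
or_rare = odds_ratio(6, 94, 3, 97)             # about 2.06
```

With the same twofold prevalence ratio, the odds ratio is 3.5 when the outcome is common but close to 2 when it is rare, so reading an OR as "the risk is X times greater" is only approximately right at low prevalence.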
Statistical Literacy: Data Tell a Story
Sole, Marla A.
2016-01-01
Every day, students collect, organize, and analyze data to make decisions. In this data-driven world, people need to assess how much trust they can place in summary statistics. The results of every survey and the safety of every drug that undergoes a clinical trial depend on the correct application of appropriate statistics. Recognizing the…
Cho, Yunju; Ahmed, Arif; Islam, Annana; Kim, Sunghwan
2015-01-01
Because of the increasing importance of heavy and unconventional crude oil as an energy source, there is a growing need for petroleomics: the pursuit of more complete and detailed knowledge of the chemical compositions of crude oil. Crude oil has an extremely complex nature; hence, techniques with ultra-high resolving capabilities, such as Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS), are necessary. FT-ICR MS has been successfully applied to the study of heavy and unconventional crude oils such as bitumen and shale oil. However, the analysis of crude oil with FT-ICR MS is not trivial, and it has pushed analysis to the limits of instrumental and methodological capabilities. For example, high-resolution mass spectra of crude oils may contain over 100,000 peaks that require interpretation. To visualize large data sets more effectively, data processing methods such as Kendrick mass defect analysis and statistical analyses have been developed. The successful application of FT-ICR MS to the study of crude oil has been critically dependent on key developments in FT-ICR MS instrumentation and data processing methods. This review offers an introduction to the basic principles, FT-ICR MS instrumentation development, ionization techniques, and data interpretation methods for petroleomics and is intended for readers having no prior experience in this field of study. © 2014 Wiley Periodicals, Inc.
DATA ON YOUTH, 1967, A STATISTICAL DOCUMENT.
SCHEIDER, GEORGE
THE DATA IN THIS REPORT ARE STATISTICS ON YOUTH THROUGHOUT THE UNITED STATES AND IN NEW YORK STATE. INCLUDED ARE DATA ON POPULATION, SCHOOL STATISTICS, EMPLOYMENT, FAMILY INCOME, JUVENILE DELINQUENCY AND YOUTH CRIME (INCLUDING NEW YORK CITY FIGURES), AND TRAFFIC ACCIDENTS. THE STATISTICS ARE PRESENTED IN THE TEXT AND IN TABLES AND CHARTS. (NH)
Khan, Haseeb Ahmad
2004-01-01
The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-group comparison of microarray data is still lacking, and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann-Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-group comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform.
Data-driven inference for the spatial scan statistic
Directory of Open Access Journals (Sweden)
Duczmal Luiz H
2011-08-01
Background: Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. Results: A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is given, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important when the p-value computed by the usual inference process is near the alpha significance level, regarding the correctness of the decision based on this inference. Conclusions: A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
Statistics and analysis of scientific data
Bonamente, Massimiliano
2013-01-01
Statistics and Analysis of Scientific Data covers the foundations of probability theory and statistics, and a number of numerical and analytical methods that are essential for the present-day analyst of scientific data. Topics covered include probability theory, distribution functions of statistics, fits to two-dimensional datasheets and parameter estimation, Monte Carlo methods and Markov chains. Equal attention is paid to the theory and its practical application, and results from classic experiments in various fields are used to illustrate the importance of statistics in the analysis of scientific data. The main pedagogical method is a theory-then-application approach, where emphasis is placed first on a sound understanding of the underlying theory of a topic, which becomes the basis for an efficient and proactive use of the material for practical applications. The level is appropriate for undergraduates and beginning graduate students, and as a reference for the experienced researcher. Basic calculus is us...
Sources of Safety Data and Statistical Strategies for Design and Analysis: Postmarket Surveillance.
Izem, Rima; Sanchez-Kam, Matilde; Ma, Haijun; Zink, Richard; Zhao, Yueqin
2018-03-01
Safety data are continuously evaluated throughout the life cycle of a medical product to accurately assess and characterize the risks associated with the product. The knowledge about a medical product's safety profile continually evolves as safety data accumulate. This paper discusses data sources and analysis considerations for safety signal detection after a medical product is approved for marketing. This manuscript is the second in a series of papers from the American Statistical Association Biopharmaceutical Section Safety Working Group. We share our recommendations for the statistical and graphical methodologies necessary to appropriately analyze, report, and interpret safety outcomes, and we discuss the advantages and disadvantages of safety data obtained from passive postmarketing surveillance systems compared to other sources. Signal detection has traditionally relied on spontaneous reporting databases that have been available worldwide for decades. However, current regulatory guidelines and ease of reporting have increased the size of these databases exponentially over the last few years. With such large databases, data-mining tools using disproportionality analysis and helpful graphics are often used to detect potential signals. Although the data sources have many limitations, analyses of these data have been successful at identifying safety signals postmarketing. Experience analyzing these dynamic data is useful in understanding the potential and limitations of analyses with new data sources such as social media, claims, or electronic medical records data.
Cheyney, S.; Hill, I. A.; Linford, N.; Fishwick, S.; Leech, C.
2011-12-01
High-resolution total-field magnetic data can be collected rapidly and relatively cheaply over large archaeological sites due to recent advances in data collection. However, interpretation of these datasets still generally comprises a sequence of data correction and filtering operations prior to a 2D visual interpretation based on pattern recognition. In contrast, current developments in aero-magnetic interpretation have led to several tools for identifying location, shape and depth information of anomalous sources. These methods often fail when directly applied to archaeo-magnetic data, due to the particular noise content typical in very near-surface surveys. Here techniques are explored that allow these aero-magnetic interpretation tools to be applied to archaeological problems, without the need for extensive, often biased user input. It is shown that full 3D quantitative interpretation of the subsurface is possible from the magnetic data alone. Inversion of magnetic data is increasingly being applied to aero-magnetic surveys to produce 3D models of the subsurface magnetisation. Typically, an objective function is minimised in order to create a smooth distribution of magnetisation away from a reference model (or halfspace if no a-priori information is available). Often, although a good fit to the observed values may be obtained, the final model will be non-unique and biased by the reference model. Testing of synthetic data shows that when archaeo-magnetic datasets are inverted without applying a-priori information, large discrepancies between the true and modelled depths can occur. Where no a-priori information is available, information regarding the horizontal location of sources can be obtained from derivative-based methods such as the absolute horizontal gradient, tilt-angle and theta-map. Using pseudogravity data with these techniques overcomes the problem of noise amplification that has previously hampered archaeological uses of these techniques. Depth
A Statistical Toolkit for Data Analysis
International Nuclear Information System (INIS)
Donadio, S.; Guatelli, S.; Mascialino, B.; Pfeiffer, A.; Pia, M.G.; Ribon, A.; Viarengo, P.
2006-01-01
The present project aims to develop an open-source and object-oriented software Toolkit for statistical data analysis. Its statistical testing component contains a variety of Goodness-of-Fit tests, from Chi-squared to Kolmogorov-Smirnov, to less known, but generally much more powerful tests such as Anderson-Darling, Goodman, Fisz-Cramer-von Mises, Kuiper, Tiku. Thanks to the component-based design and the usage of the standard abstract interfaces for data analysis, this tool can be used by other data analysis systems or integrated in experimental software frameworks. This Toolkit has been released and is downloadable from the web. In this paper we describe the statistical details of the algorithms, the computational features of the Toolkit and describe the code validation
47 CFR 1.363 - Introduction of statistical data.
2010-10-01
... 47 Telecommunication 1 2010-10-01 2010-10-01 false Introduction of statistical data. 1.363 Section... Proceedings Evidence § 1.363 Introduction of statistical data. (a) All statistical studies, offered in... analyses, and experiments, and those parts of other studies involving statistical methodology shall be...
Fordyce, James A
2010-07-23
Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification rate-variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate, is the gamma statistic. Using simulations under varying conditions, I examine the sensitivity of gamma to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineages through time plots, tree deviation, I identified trees with a significant gamma statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the gamma statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of gamma to detect rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. The gamma statistic is extraordinarily sensitive to recent diversification rates, and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The gamma statistic had greater power to detect recent diversification rate decreases compared to early bursts of diversification. Caution should be exercised when interpreting the gamma statistic as an indication of early, rapid diversification.
Directory of Open Access Journals (Sweden)
James A Fordyce
BACKGROUND: Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification rate-variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate, is the gamma statistic. METHODOLOGY: Using simulations under varying conditions, I examine the sensitivity of gamma to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineages through time plots, tree deviation, I identified trees with a significant gamma statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the gamma statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of gamma to detect rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. CONCLUSIONS: The gamma statistic is extraordinarily sensitive to recent diversification rates, and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and completeness of taxon sampling. The gamma statistic had greater power to detect recent diversification rate decreases compared to early bursts of diversification. Caution should be exercised when interpreting the gamma statistic as an indication of early, rapid diversification.
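For readers unfamiliar with the gamma statistic discussed in this abstract, the following is a minimal sketch of how it is commonly computed from internode intervals, following the usual Pybus and Harvey formulation. The abstract itself does not give the formula, so the function name, the input layout, and the example intervals are assumptions made for illustration. Under a pure-birth null, gamma is approximately standard normal, and negative values indicate that branching events sit closer to the root than expected.

```python
import math


def gamma_statistic(internode_intervals):
    """Gamma statistic from internode intervals g_2, ..., g_n.

    internode_intervals[i] is the length of time during which the
    reconstructed tree had k = i + 2 lineages (assumed input layout).
    """
    g = list(internode_intervals)
    n = len(g) + 1  # number of tips
    if n < 3:
        raise ValueError("need at least three tips")
    # k * g_k for k = 2..n
    weighted = [k * gk for k, gk in zip(range(2, n + 1), g)]
    total = sum(weighted)  # T in the usual notation
    running = 0.0
    sum_of_partial_sums = 0.0
    for w in weighted[:-1]:  # partial sums for i = 2..n-1
        running += w
        sum_of_partial_sums += running
    mean_partial = sum_of_partial_sums / (n - 2)
    return (mean_partial - total / 2) / (total * math.sqrt(1 / (12 * (n - 2))))


# Hypothetical intervals: k * g_k constant is the pure-birth expectation.
pure_birth_like = [1 / k for k in range(2, 7)]
# Hypothetical early burst: short early intervals, one long recent interval.
early_burst_like = [0.1, 0.1, 0.1, 5.0]

print(gamma_statistic(pure_birth_like))
print(gamma_statistic(early_burst_like))
```

Because every partial sum is compared against the total, the long final interval in the second example dominates the result, which is consistent with the abstract's point that the statistic is highly sensitive to the most recent branching times.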
Data and Statistics: Women and Heart Disease
RESEARCH ON THE CONSTRUCTION OF REMOTE SENSING AUTOMATIC INTERPRETATION SYMBOL BIG DATA
Directory of Open Access Journals (Sweden)
Y. Gao
2018-04-01
Remote sensing automatic interpretation symbol (RSAIS) is an inexpensive and fast method in providing precise in-situ information for image interpretation and accuracy. This study designed a scientific and precise RSAIS data characterization method, as well as a distributed and cloud architecture massive data storage method. Additionally, it introduced an offline and online data update mode and a dynamic data evaluation mechanism, with the aim to create an efficient approach for RSAIS big data construction. Finally, a national RSAIS database with more than 3 million samples covering 86 land types was constructed during 2013–2015 based on the National Geographic Conditions Monitoring Project of China and then annually updated since the 2016 period. The RSAIS big data has proven to be a good method for large scale image interpretation and field validation. It is also notable that it has the potential to solve image automatic interpretation with the assistance of deep learning technology in the remote sensing big data era.
Research on the Construction of Remote Sensing Automatic Interpretation Symbol Big Data
Gao, Y.; Liu, R.; Liu, J.; Cheng, T.
2018-04-01
Remote sensing automatic interpretation symbol (RSAIS) is an inexpensive and fast method in providing precise in-situ information for image interpretation and accuracy. This study designed a scientific and precise RSAIS data characterization method, as well as a distributed and cloud architecture massive data storage method. Additionally, it introduced an offline and online data update mode and a dynamic data evaluation mechanism, with the aim to create an efficient approach for RSAIS big data construction. Finally, a national RSAIS database with more than 3 million samples covering 86 land types was constructed during 2013-2015 based on the National Geographic Conditions Monitoring Project of China and then annually updated since the 2016 period. The RSAIS big data has proven to be a good method for large scale image interpretation and field validation. It is also notable that it has the potential to solve image automatic interpretation with the assistance of deep learning technology in the remote sensing big data era.
Hemophilia Data and Statistics
... genetic testing is done to diagnose hemophilia before birth. For the one-third ... rates and hospitalization rates for bleeding complications from hemophilia ...
Empirical approach to interpreting card-sorting data
Directory of Open Access Journals (Sweden)
Steven F. Wolf
2012-05-01
Since it was first published 30 years ago, the seminal paper of Chi et al. on expert and novice categorization of introductory problems led to a plethora of follow-up studies within and outside of the area of physics [Cogn. Sci. 5, 121 (1981)]. These studies frequently encompass “card-sorting” exercises whereby the participants group problems. While this technique certainly allows insights into problem solving approaches, simple descriptive statistics more often than not fail to find significant differences between experts and novices. In moving beyond descriptive statistics, we describe a novel microscopic approach that takes into account the individual identity of the cards and uses graph theory and models to visualize, analyze, and interpret problem categorization experiments. We apply these methods to an introductory physics (mechanics) problem categorization experiment, and find that most of the variation in sorting outcome is not due to the sorter being an expert versus a novice, but rather due to an independent characteristic that we named “stacker” versus “spreader.” The fact that the expert-novice distinction only accounts for a smaller amount of the variation may explain the frequent null results when conducting these experiments.
A nonparametric spatial scan statistic for continuous data.
Jung, Inkyung; Cho, Ho Jin
2015-10-20
Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
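The building block named in this abstract, the Wilcoxon rank-sum statistic, can be sketched for a single candidate window as follows. This is only an illustration under stated assumptions: the function names and the example values are hypothetical, the normal approximation omits the tie correction to the variance, and the scan over all candidate windows and the Monte Carlo inference used by scan statistics are not reproduced here.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_z(inside, outside):
    """Rank sum of the 'inside' group and its standardized value.

    Uses the large-sample mean and variance of the rank sum without
    a tie correction (a simplifying assumption of this sketch).
    """
    n1, n2 = len(inside), len(outside)
    ranks = midranks(list(inside) + list(outside))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    var_w = n1 * n2 * (n1 + n2 + 1) / 12
    return w, (w - mean_w) / math.sqrt(var_w)


# Hypothetical observations inside and outside one candidate window.
w, z = rank_sum_z([10, 11, 12], [1, 2, 3, 4])
print(w, round(z, 4))
```

Because the statistic depends only on ranks, a single extreme observation changes it no more than any other largest value would, which is the property that makes a rank-based approach attractive for skewed or heavy-tailed data.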
Search Databases and Statistics
DEFF Research Database (Denmark)
Refsgaard, Jan C; Munk, Stephanie; Jensen, Lars J
2016-01-01
having strengths and weaknesses that must be considered for the individual needs. These are reviewed in this chapter. Equally critical for generating highly confident output datasets is the application of sound statistical criteria to limit the inclusion of incorrect peptide identifications from database...... searches. Additionally, careful filtering and use of appropriate statistical tests on the output datasets affects the quality of all downstream analyses and interpretation of the data. Our considerations and general practices on these aspects of phosphoproteomics data processing are presented here....
Topics in statistical data analysis for high-energy physics
International Nuclear Information System (INIS)
Cowan, G.
2011-01-01
These lectures concern two topics that are becoming increasingly important in the analysis of high-energy physics data: Bayesian statistics and multivariate methods. In the Bayesian approach, we extend the interpretation of probability not only to cover the frequency of repeatable outcomes but also to include a degree of belief. In this way we are able to associate probability with a hypothesis and thus to answer directly questions that cannot be addressed easily with traditional frequentist methods. In multivariate analysis, we try to exploit as much information as possible from the characteristics that we measure for each event to distinguish between event types. In particular we will look at a method that has gained popularity in high-energy physics in recent years: the boosted decision tree. Finally, we give a brief sketch of how multivariate methods may be applied in a search for a new signal process. (author)
Statistical Methods for Unusual Count Data
DEFF Research Database (Denmark)
Guthrie, Katherine A.; Gammill, Hilary S.; Kamper-Jørgensen, Mads
2016-01-01
microchimerism data present challenges for statistical analysis, including a skewed distribution, excess zero values, and occasional large values. Methods for comparing microchimerism levels across groups while controlling for covariates are not well established. We compared statistical models for quantitative...... microchimerism values, applied to simulated data sets and 2 observed data sets, to make recommendations for analytic practice. Modeling the level of quantitative microchimerism as a rate via Poisson or negative binomial model with the rate of detection defined as a count of microchimerism genome equivalents per...
Interpreting New Data from the High Energy Frontier
Energy Technology Data Exchange (ETDEWEB)
Thaler, Jesse [Massachusetts Inst. of Technology (MIT), Cambridge, MA (United States)
2016-09-26
This is the final technical report for DOE grant DE-SC0006389, "Interpreting New Data from the High Energy Frontier", describing research accomplishments by the PI in the field of theoretical high energy physics.
Statistical data filtration in neutron coincidence counting
International Nuclear Information System (INIS)
Beddingfield, D.H.; Menlove, H.O.
1992-11-01
We assessed the effectiveness of statistical data filtration to minimize the contribution of matrix materials in 200-L drums to the nondestructive assay of plutonium. The following matrices were examined: polyethylene, concrete, aluminum, iron, cadmium, and lead. Statistical filtration of neutron coincidence data improved the low-end sensitivity of coincidence counters. Spurious data arising from electrical noise, matrix spallation, and geometric effects were smoothed in a predictable fashion by the statistical filter. The filter effectively lowers the minimum detectable mass limit that can be achieved for plutonium assay using passive neutron coincidence counting.
Normative Data for Interpreting the BREAST-Q: Augmentation
Mundy, Lily R.; Homa, Karen; Klassen, Anne F.; Pusic, Andrea L.; Kerrigan, Carolyn L.
2016-01-01
Background: The BREAST-Q is a rigorously developed, well-validated, patient-reported outcome (PRO) instrument with a module designed for evaluating breast augmentation outcomes. However, there are no published normative BREAST-Q scores, limiting interpretation. Methods: Normative data were generated for the BREAST-Q Augmentation Module via the Army of Women (AOW), an online community of women (with and without breast cancer) engaged in breast-cancer related research. Members were recruited via email, with women 18 years or older without a history of breast cancer or breast surgery invited to participate. Descriptive statistics and a linear multivariate regression were performed. A separate analysis compared normative scores to findings from previously published BREAST-Q augmentation studies. Results: The preoperative BREAST-Q Augmentation Module was completed by 1,211 women. Mean age was 54 ± 24 years, mean body mass index (BMI) was 27 ± 6, and 39% (n=467) had a bra cup size ≥D. Mean scores were Satisfaction with Breasts (54 ± 19), Psychosocial Well-being (66 ± 20), Sexual Well-being (49 ± 20), and Physical Well-being (86 ± 15). Women with a BMI of 30 or greater and bra cup size D or greater had lower scores. In comparison to AOW scores, published BREAST-Q augmentation scores were lower before and higher after surgery for all scales except Physical Well-being. Conclusions: The AOW normative data represent breast-related satisfaction and well-being in women not actively seeking breast augmentation. These data may be used as normative comparison values for those seeking and undergoing surgery, as we did, demonstrating the value of breast augmentation in this patient population. PMID:28350657
Marviken test-data interpretation, second project
International Nuclear Information System (INIS)
Collen, J.; Johansson, A.
1978-12-01
A brief description is given of the investigations carried out and the conclusions drawn within the MARTIN-II project, which involved the evaluation and interpretation of the data from the full scale containment response tests at the Marviken Power Station. The data from the tests, which were completed in 1976, provide information about the periodic pressure oscillations and rapid pressure spikes induced in the pressure-suppression containment during study comprise the following items: - Influence of test parameters on pressure oscillations and pressure spikes - Pressure spikes in the wetwell pool - High frequency oscillations - Comparisons between single-pipe and multi-pipe data The study was carried out by Studsvik Energiteknik AB with consulting efforts from AB ASEA-ATOM. It was financed by the Swedish Nuclear Power Inspectorate. (Auth.)
Critical analysis of adsorption data statistically
Kaushal, Achla; Singh, S. K.
2017-10-01
Experimental data can be presented, computed, and critically analysed in a different way using statistics. A variety of statistical tests are used to make decisions about the significance and validity of the experimental data. In the present study, adsorption was carried out to remove zinc ions from contaminated aqueous solution using mango leaf powder. The experimental data were analysed statistically by hypothesis testing, applying the t test, paired t test and Chi-square test to (a) test the optimum value of the process pH, (b) verify the success of the experiment and (c) study the effect of adsorbent dose in zinc ion removal from aqueous solutions. Comparison of calculated and tabulated values of t and χ² showed the results in favour of the data collected from the experiment and this has been shown on probability charts. K value for Langmuir isotherm was 0.8582 and m value for Freundlich adsorption isotherm obtained was 0.725, both are mango leaf powder.
Directory of Open Access Journals (Sweden)
John R. Speakman
2013-03-01
The epidemics of obesity and diabetes have aroused great interest in the analysis of energy balance, with the use of organisms ranging from nematode worms to humans. Although generating energy-intake or -expenditure data is relatively straightforward, the most appropriate way to analyse the data has been an issue of contention for many decades. In the last few years, a consensus has been reached regarding the best methods for analysing such data. To facilitate using these best-practice methods, we present here an algorithm that provides a step-by-step guide for analysing energy-intake or -expenditure data. The algorithm can be used to analyse data from either humans or experimental animals, such as small mammals or invertebrates. It can be used in combination with any commercial statistics package; however, to assist with analysis, we have included detailed instructions for performing each step for three popular statistics packages (SPSS, MINITAB and R). We also provide interpretations of the results obtained at each step. We hope that this algorithm will assist in the statistically appropriate analysis of such data, a field in which there has been much confusion and some controversy.
Shewhart, Mark
1991-01-01
Statistical Process Control (SPC) charts are one of several tools used in quality control. Other tools include flow charts, histograms, cause and effect diagrams, check sheets, Pareto diagrams, graphs, and scatter diagrams. A control chart is simply a graph which indicates process variation over time. The purpose of drawing a control chart is to detect any changes in the process signalled by abnormal points or patterns on the graph. The Artificial Intelligence Support Center (AISC) of the Acquisition Logistics Division has developed a hybrid machine learning expert system prototype which automates the process of constructing and interpreting control charts.
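The idea of a control chart described in this abstract, a plot of process variation over time with limits that flag abnormal points, can be sketched as below. This is a minimal individuals (X) chart using the conventional moving-range estimate of sigma; it is not the AISC expert system, and the function names and measurement values are hypothetical.

```python
def individuals_chart_limits(samples):
    """Return (lower limit, center line, upper limit) for an individuals chart.

    Sigma is estimated from the average moving range divided by 1.128,
    the standard d2 constant for subgroups of size 2.
    """
    if len(samples) < 2:
        raise ValueError("need at least two observations")
    center = sum(samples) / len(samples)
    moving_ranges = [abs(b - a) for a, b in zip(samples, samples[1:])]
    sigma = (sum(moving_ranges) / len(moving_ranges)) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


def points_outside(samples, limits):
    """Indices of observations falling outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical in-control baseline used to set the limits.
baseline = [10, 11, 10, 11, 10, 11, 10, 11]
limits = individuals_chart_limits(baseline)

# Hypothetical new measurements checked against those limits.
new_points = [10.2, 14.0, 9.9]
print(limits, points_outside(new_points, limits))
```

A single point beyond the three-sigma limits is only the simplest signal; detecting runs and trends, the "patterns" mentioned in the abstract, requires additional rules that this sketch leaves out.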
Statistical Analysis and Data Display: An Intermediate Course with Examples in R
Heiberger, Richard M
2015-01-01
This contemporary presentation of statistical methods features extensive use of graphical displays for exploring data and for displaying the analysis. The authors demonstrate how to analyze data—showing code, graphics, and accompanying tabular listings—for all the methods they cover. They emphasize how to construct and interpret graphs. They discuss principles of graphical design. They identify situations where visual impressions from graphs may need confirmation from traditional tabular results. All chapters have exercises. The authors provide and discuss R functions for all the new graphical display formats. All graphs and tabular output in the book were constructed using these functions. Complete R scripts for all examples and figures are provided for readers to use as models for their own analyses. This book can serve as a standalone text for statistics majors at the master’s level and for other quantitatively oriented disciplines at the doctoral level, and as a reference book for researchers. In-de...
Variation in benthic long-term data of transitional waters: Is interpretation more than speculation?
Directory of Open Access Journals (Sweden)
Michael Lothar Zettler
Biological long-term data series in marine habitats are often used to identify anthropogenic impacts on the environment or climate induced regime shifts. However, particularly in transitional waters, environmental properties like water mass dynamics, salinity variability and the occurrence of oxygen minima not necessarily caused by either human activities or climate change can attenuate or mask apparent signals. At first glance it very often seems impossible to interpret the strong fluctuations of e.g. abundances or species richness, since abiotic variables like salinity and oxygen content vary simultaneously as well as in apparently erratic ways. The long-term development of major macrozoobenthic parameters (abundance, biomass, species numbers) and derivative macrozoobenthic indices (Shannon diversity, Margalef, Pielou's evenness and Hurlbert) has been successfully interpreted and related to the long-term fluctuations of salinity and oxygen, incorporating the North Atlantic Oscillation index (NAO index) and relying on the statistical analysis of modelled and measured data during 35 years of observation at three stations in the south-western Baltic Sea. Our results suggest that even at a restricted spatial scale the benthic system does not appear to be tightly controlled by any single environmental driver and highlight the complexity of spatially varying temporal response.
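Three of the indices named in this abstract have simple closed forms, sketched below from species abundance counts using their standard textbook definitions. The function names and the example counts are hypothetical, natural logarithms are assumed, and the Hurlbert rarefaction index is omitted for brevity.

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species present."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def margalef_richness(counts):
    """Margalef index d = (S - 1) / ln N, with S species and N individuals."""
    species = sum(1 for c in counts if c > 0)
    return (species - 1) / math.log(sum(counts))


def pielou_evenness(counts):
    """Pielou's evenness J' = H' / ln S (requires at least two species)."""
    species = sum(1 for c in counts if c > 0)
    return shannon_diversity(counts) / math.log(species)


# Hypothetical abundance counts for two samples with four species each.
even_sample = [10, 10, 10, 10]
dominated_sample = [37, 1, 1, 1]

for sample in (even_sample, dominated_sample):
    print(round(shannon_diversity(sample), 4),
          round(margalef_richness(sample), 4),
          round(pielou_evenness(sample), 4))
```

Both hypothetical samples contain the same number of species and individuals, so Margalef richness is identical, while Shannon diversity and Pielou's evenness drop sharply for the sample dominated by one species.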
Interpretable decision-tree induction in a big data parallel framework
Directory of Open Access Journals (Sweden)
Weinberg Abraham Itzhak
2017-12-01
When running data-mining algorithms on big data platforms, a parallel, distributed framework, such as MapReduce, may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms, such as majority voting, aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model which is most similar to others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
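The selection step described in this abstract, choosing the local model that is most similar to all the others, can be sketched as a simple medoid search. The abstract does not specify how SySM measures syntactic similarity, so this sketch substitutes a stand-in: each tree is reduced to a set of hypothetical (feature, threshold) splits and compared by Jaccard similarity. All names and data below are assumptions for illustration only.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of split descriptors."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def most_representative(models):
    """Return (index, mean similarity) of the model most similar to the rest.

    `models` is a list of sets; each set is a stand-in description of one
    locally induced tree. Ties are resolved in favour of the earliest model.
    """
    if len(models) < 2:
        raise ValueError("need at least two models to compare")
    best_index, best_score = -1, -1.0
    for i, model in enumerate(models):
        similarities = [jaccard(model, other)
                        for j, other in enumerate(models) if j != i]
        score = sum(similarities) / len(similarities)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Hypothetical trees from four computing nodes, as sets of splits.
node_trees = [
    {("age", 30), ("income", 50)},
    {("age", 30), ("income", 50), ("zip", 7)},
    {("height", 170)},
    {("age", 30), ("zip", 7)},
]
print(most_representative(node_trees))
```

Unlike an ensemble, this returns one of the original trees unchanged, which is what allows the selected model to stay compact and directly interpretable.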
Spatial Statistical Data Fusion (SSDF)
Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel
2013-01-01
As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is
ROOT: A C++ framework for petabyte data storage, statistical analysis and visualization
International Nuclear Information System (INIS)
Antcheva, I.; Ballintijn, M.; Bellenot, B.; Biskup, M.; Brun, R.; Buncic, N.; Couet, O.; Franco, L.; Canal, Ph.; Casadei, D.; Fine, V.
2009-01-01
ROOT is an object-oriented C++ framework conceived in the high-energy physics (HEP) community, designed for storing and analyzing petabytes of data in an efficient way. Any instance of a C++ class can be stored into a ROOT file in a machine-independent compressed binary format. In ROOT the TTree object container is optimized for statistical data analysis over very large data sets by using vertical data storage techniques. These containers can span a large number of files on local disks, the web or a number of different shared file systems. In order to analyze this data, the user can choose from a wide set of mathematical and statistical functions, including linear algebra classes, numerical algorithms such as integration and minimization, and various methods for performing regression analysis (fitting). In particular, the RooFit package allows the user to perform complex data modeling and fitting while the RooStats library provides abstractions and implementations for advanced statistical tools. Multivariate classification methods based on machine learning techniques are available via the TMVA package. Central to these analysis tools are the histogram classes, which provide binning of one- and multi-dimensional data. Results can be saved in high-quality graphical formats like Postscript and PDF or in bitmap formats like JPG or GIF. The result can also be stored into ROOT macros that allow a full recreation and rework of the graphics. Users typically create their analysis macros step by step, making use of the interactive C++ interpreter CINT, while running over small data samples. Once the development is finished, they can run these macros at full compiled speed over large data sets, using on-the-fly compilation, or by creating a stand-alone batch program. Finally, if processing farms are available, the user can reduce the execution time of intrinsically parallel tasks - e.g. data mining in HEP - by using PROOF, which will take care of optimally
International Nuclear Information System (INIS)
Podorozhnyi, D.M.; Postnikov, E.B.; Sveshnikova, L.G.; Turundaevsky, A.N.
2005-01-01
A multivariate statistical procedure for solving problems of estimating physical parameters on the basis of data from measurements with multichannel equipment is described. Within the multivariate procedure, an algorithm is constructed for estimating the energy of primary cosmic rays and the exponent in their power-law spectrum. They are investigated by using the KLEM spectrometer (NUCLEON project) as a specific example of measuring equipment. The results of computer experiments simulating the operation of the multivariate procedure for this equipment are given, the proposed approach being compared in these experiments with the one-parameter approach presently used in data processing
Classification, (big) data analysis and statistical learning
Conversano, Claudio; Vichi, Maurizio
2018-01-01
This edited book focuses on the latest developments in classification, statistical learning, data analysis and related areas of data science, including statistical analysis of large datasets, big data analytics, time series clustering, integration of data from different sources, as well as social networks. It covers both methodological aspects as well as applications to a wide range of areas such as economics, marketing, education, social sciences, medicine, environmental sciences and the pharmaceutical industry. In addition, it describes the basic features of the software behind the data analysis results, and provides links to the corresponding codes and data sets where necessary. This book is intended for researchers and practitioners who are interested in the latest developments and applications in the field. The peer-reviewed contributions were presented at the 10th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in Santa Margherita di Pul...
Statistical Challenges in "Big Data" Human Neuroimaging.
Smith, Stephen M; Nichols, Thomas E
2018-01-17
Smith and Nichols discuss "big data" human neuroimaging studies, with very large subject numbers and amounts of data. These studies provide great opportunities for making new discoveries about the brain but raise many new analytical challenges and interpretational risks.
The insignificance of statistical significance testing
Johnson, Douglas H.
1999-01-01
Despite their use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.
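The contrast this abstract draws between P-values and estimation with confidence intervals can be made concrete with a small sketch. It assumes a one-sample z test with a known standard deviation, which is a simplification chosen for illustration and not a procedure taken from the paper; the function names are hypothetical. For a fixed, tiny observed difference the P-value shrinks toward zero as the sample grows, while the interval keeps reporting how small the difference is.

```python
import math


def two_sided_p(mean_diff: float, sd: float, n: int) -> float:
    """Two-sided p-value of a z test of 'true difference = 0' (known sd assumed)."""
    z = mean_diff / (sd / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2.0))


def ci95(mean_diff: float, sd: float, n: int) -> tuple:
    """Approximate 95% confidence interval for the mean difference."""
    half_width = 1.96 * sd / math.sqrt(n)
    return (mean_diff - half_width, mean_diff + half_width)


# Same observed difference (0.1 sd units), two very different sample sizes.
p_small = two_sided_p(0.1, 1.0, 25)
p_large = two_sided_p(0.1, 1.0, 10_000)
interval_large = ci95(0.1, 1.0, 10_000)
```

With the larger sample the test is "highly significant" even though the interval still brackets a difference of about a tenth of a standard deviation, which is the gap between statistical and biological significance that the abstract refers to.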
Statistics and analysis of scientific data
Bonamente, Massimiliano
2017-01-01
The revised second edition of this textbook provides the reader with a solid foundation in probability theory and statistics as applied to the physical sciences, engineering and related fields. It covers a broad range of numerical and analytical methods that are essential for the correct analysis of scientific data, including probability theory, distribution functions of statistics, fits to two-dimensional data and parameter estimation, Monte Carlo methods and Markov chains. Features new to this edition include: • a discussion of statistical techniques employed in business science, such as multiple regression analysis of multivariate datasets; • a new chapter on the various measures of the mean including logarithmic averages; • new chapters on systematic errors and intrinsic scatter, and on the fitting of data with bivariate errors; • a new case study and additional worked examples; • mathematical derivations and theoretical background material have been appropriately marked, to improve the readabili...
Statistically significant relational data mining
Energy Technology Data Exchange (ETDEWEB)
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
Statistical data fusion for cross-tabulation
Kamakura, W.A.; Wedel, M.
The authors address the situation in which a researcher wants to cross-tabulate two sets of discrete variables collected in independent samples, but a subset of the variables is common to both samples. The authors propose a statistical data-fusion model that allows for statistical tests of
Statistical Analysis of Research Data | Center for Cancer Research
Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview on the general principles of statistical analysis of research data. The first day will feature univariate data analysis, including descriptive statistics, probability distributions, one- and two-sample inferential statistics.
Statistical analysis of next generation sequencing data
Nettleton, Dan
2014-01-01
Next Generation Sequencing (NGS) is the latest high throughput technology to revolutionize genomic research. NGS generates massive genomic datasets that play a key role in the big data phenomenon that surrounds us today. To extract signals from high-dimensional NGS data and make valid statistical inferences and predictions, novel data analytic and statistical techniques are needed. This book contains 20 chapters written by prominent statisticians working with NGS data. The topics range from basic preprocessing and analysis with NGS data to more complex genomic applications such as copy number variation and isoform expression detection. Research statisticians who want to learn about this growing and exciting area will find this book useful. In addition, many chapters from this book could be included in graduate-level classes in statistical bioinformatics for training future biostatisticians who will be expected to deal with genomic data in basic biomedical research, genomic clinical trials and personalized med...
Bayesian inference – a way to combine statistical data and semantic analysis meaningfully
Directory of Open Access Journals (Sweden)
Eila Lindfors
2011-11-01
This article focuses on presenting the possibilities of Bayesian modelling (Finite Mixture Modelling) in the semantic analysis of statistically modelled data. The probability of a hypothesis in relation to the data available is an important question in inductive reasoning. Bayesian modelling allows the researcher to use many models at a time and provides tools to evaluate the goodness of different models. The researcher should always be aware that there is no such thing as the exact probability of an exact event. This is the reason for using probabilistic models. Each model presents a different perspective on the phenomenon in focus, and the researcher has to choose the most probable model with a view to previous research and the knowledge available. The idea of Bayesian modelling is illustrated here by presenting two different sets of data, one from craft science research (n=167) and the other (n=63) from educational research (Lindfors, 2007, 2002). The principles of how to build models and how to combine different profiles are described in the light of the research mentioned. Bayesian modelling is an analysis based on calculating probabilities in relation to a specific set of quantitative data. It is a tool for handling data and interpreting it semantically. The reliability of the analysis arises from an argumentation of which model can be selected from the model space as the basis for an interpretation, and on which arguments. Keywords: method, sloyd, Bayesian modelling, student teachers. URN:NBN:no-29959
Collecting operational event data for statistical analysis
International Nuclear Information System (INIS)
Atwood, C.L.
1994-09-01
This report gives guidance for collecting operational data to be used for statistical analysis, especially analysis of event counts. It discusses how to define the purpose of the study, the unit (system, component, etc.) to be studied, events to be counted, and demand or exposure time. Examples are given of classification systems for events in the data sources. A checklist summarizes the essential steps in data collection for statistical analysis
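The quantities this report asks the analyst to define (events counted, and demand or exposure time) feed directly into simple rate estimates. The sketch below is an illustration under stated assumptions and is not taken from the report: it treats event counts as Poisson, uses a normal approximation for the interval, and the function names and numbers are hypothetical.

```python
import math


def event_rate(event_count: int, exposure_time: float) -> float:
    """Point estimate of the event rate: events per unit of exposure time."""
    if exposure_time <= 0:
        raise ValueError("exposure_time must be positive")
    return event_count / exposure_time


def approx_rate_interval(event_count: int, exposure_time: float) -> tuple:
    """Rough 95% interval for the rate (Poisson counts, normal approximation).

    The approximation is poor for very small counts; it is used here only
    to show why both the count and the exposure must be recorded.
    """
    rate = event_rate(event_count, exposure_time)
    half_width = 1.96 * math.sqrt(event_count) / exposure_time
    return (max(0.0, rate - half_width), rate + half_width)


def failure_probability_per_demand(failures: int, demands: int) -> float:
    """Point estimate for demand-based data: failures per demand."""
    if demands <= 0:
        raise ValueError("demands must be positive")
    return failures / demands


# Hypothetical data: 4 events in 200 component-years, 3 failures in 150 demands.
rate = event_rate(4, 200.0)
low, high = approx_rate_interval(4, 200.0)
per_demand = failure_probability_per_demand(3, 150)
```

Neither estimate can be formed if only the events were recorded, which is why the choice of unit and of demand or exposure time has to be settled before data collection begins.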
Statistical methods in quality assurance
International Nuclear Information System (INIS)
Eckhard, W.
1980-01-01
During the different phases of a production process - planning, development and design, manufacturing, assembling, etc. - most decisions rest on a statistical basis: the collection, analysis and interpretation of data. Statistical methods can be thought of as a kit of tools that help to solve problems in the quality functions of the quality loop, with the aim of producing quality products and reducing quality costs. Various statistical methods are presented, and typical examples of their practical application are demonstrated. (RW)
Statistical Analysis of Big Data on Pharmacogenomics
Fan, Jianqing; Liu, Han
2013-01-01
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrix for understanding correlation structure, inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differently expressed genes and proteins and genetic markers for complex diseases, and high dimensional variable selection for identifying important molecules for understanding molecule mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905
Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan
2017-08-28
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical
Advances in statistical models for data analysis
Minerva, Tommaso; Vichi, Maurizio
2015-01-01
This edited volume focuses on recent research results in classification, multivariate statistics and machine learning and highlights advances in statistical models for data analysis. The volume provides both methodological developments and contributions to a wide range of application areas such as economics, marketing, education, social sciences and environment. The papers in this volume were first presented at the 9th biannual meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in September 2013 at the University of Modena and Reggio Emilia, Italy.
International Nuclear Information System (INIS)
Boerner, E.; Drexler, G.; Scheibe, D.; Schraube, H.
1994-01-01
The report consists of a summary of relevant statistical data in the official personal dosimetry in 1988-1990 for the Federal States of Bavaria, Hesse, Schleswig-Holstein, and since 1989, Baden-Wuerttemberg. The data are based on the survey of more than 8000 institutions with over 100000 occupationally exposed persons and are derived from more than one million single measurements. The report covers information on the institutions and persons as well as dosimetric values. The measuring method is described briefly with respect to the dosimeters used, their range and the interpretation of values. Information on notional doses and the interpolation of values near the detection limits is given. (HP) [de
Statistical analysis of environmental data
International Nuclear Information System (INIS)
Beauchamp, J.J.; Bowman, K.O.; Miller, F.L. Jr.
1975-10-01
This report summarizes the analyses of data obtained by the Radiological Hygiene Branch of the Tennessee Valley Authority from samples taken around the Browns Ferry Nuclear Plant located in Northern Alabama. The data collection was begun in 1968 and a wide variety of types of samples have been gathered on a regular basis. The statistical analysis of environmental data involving very low-levels of radioactivity is discussed. Applications of computer calculations for data processing are described
Statistics Poster Challenge for Schools
Payne, Brad; Freeman, Jenny; Stillman, Eleanor
2013-01-01
The analysis and interpretation of data are important life skills. A poster challenge for schoolchildren provides an innovative outlet for these skills and demonstrates their relevance to daily life. We discuss our Statistics Poster Challenge and the lessons we have learned.
Directory of Open Access Journals (Sweden)
Karel Octavianus Bachri
2017-07-01
A3S (Arwin-Adang-Aciek-Sembiring) is a method of information fusion at a single observation, and OMA3S (Observation Multi-time A3S) is a method of information fusion for time-series data. This paper proposes an OMA3S-based Cognitive Artificial-Intelligence method for interpreting transformer condition, which is calculated based on maintenance data from the Indonesia National Electric Company (PLN). First, the proposed method is tested using previously published data, followed by implementation on maintenance data. Maintenance data are fused to obtain part condition, and part conditions are fused to obtain transformer condition. Results show that the proposed method is valid for DGA fault identification with an average accuracy of 91.1%. The proposed method can not only interpret the major fault but also identify the minor fault occurring along with it, allowing an early warning feature. Results also show that part conditions can be interpreted using information fusion on maintenance data, and that the transformer condition can be interpreted using information fusion on part conditions. Future work on this research is to gather more data, to elaborate more factors to be fused, and to design a cognitive processor that can be used to implement this concept of intelligent instrumentation.
Waller, Derek L
2008-01-01
Statistical analysis is essential to business decision-making and management, but the underlying theory of data collection, organization and analysis is one of the most challenging topics for business students and practitioners. This user-friendly text and CD-ROM package will help you to develop strong skills in presenting and interpreting statistical information in a business or management environment. Based entirely on using Microsoft Excel rather than more complicated applications, it includes a clear guide to using Excel with the key functions employed in the book, a glossary of terms and
Directory of Open Access Journals (Sweden)
Kleinjans Jos
2008-09-01
Background: In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. Results: In this paper we introduce a Bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (CASh for short), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for testing differential gene expression. Both lists of genes provided by CASh and t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than t-test for the detection of differential gene expression variability. Conclusion: CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a
Moretti, Stefano; van Leeuwen, Danitsja; Gmuender, Hans; Bonassi, Stefano; van Delft, Joost; Kleinjans, Jos; Patrone, Fioravante; Merlo, Domenico Franco
2008-09-02
In gene expression analysis, statistical tests for differential gene expression provide lists of candidate genes having, individually, a sufficiently low p-value. However, the interpretation of each single p-value within complex systems involving several interacting genes is problematic. In parallel, in the last sixty years, game theory has been applied to political and social problems to assess the power of interacting agents in forcing a decision and, more recently, to represent the relevance of genes in response to certain conditions. In this paper we introduce a Bootstrap procedure to test the null hypothesis that each gene has the same relevance between two conditions, where the relevance is represented by the Shapley value of a particular coalitional game defined on a microarray data-set. This method, which is called Comparative Analysis of Shapley value (shortly, CASh), is applied to data concerning the gene expression in children differentially exposed to air pollution. The results provided by CASh are compared with the results from a parametric statistical test for testing differential gene expression. Both lists of genes provided by CASh and t-test are informative enough to discriminate exposed subjects on the basis of their gene expression profiles. While many genes are selected in common by CASh and the parametric test, it turns out that the biological interpretation of the differences between these two selections is more interesting, suggesting a different interpretation of the main biological pathways in gene expression regulation for exposed individuals. A simulation study suggests that CASh offers more power than t-test for the detection of differential gene expression variability. CASh is successfully applied to gene expression analysis of a data-set where the joint expression behavior of genes may be critical to characterize the expression response to air pollution. We demonstrate a synergistic effect between coalitional games and statistics that
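The relevance measure used by CASh is the Shapley value of a coalitional game. As a hedged illustration of that concept only, the sketch below computes exact Shapley values for a toy three-player game by averaging marginal contributions over all orderings. The toy characteristic function, the gene labels and the function names are hypothetical; the microarray game and the Bootstrap test from the paper are not reproduced here.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str],
    value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: mean marginal contribution over all player orderings.

    Enumerates n! orderings, so it is only practical for very small games.
    """
    totals = {player: 0.0 for player in players}
    for order in permutations(players):
        coalition: FrozenSet[str] = frozenset()
        for player in order:
            enlarged = coalition | {player}
            totals[player] += value(enlarged) - value(coalition)
            coalition = enlarged
    n_orders = factorial(len(players))
    return {player: total / n_orders for player, total in totals.items()}


def toy_game(coalition: FrozenSet[str]) -> float:
    """Hypothetical game: a coalition 'wins' if it has g1 plus g2 or g3."""
    return 1.0 if "g1" in coalition and ({"g2", "g3"} & coalition) else 0.0


phi = shapley_values(["g1", "g2", "g3"], toy_game)
```

Here `g1` receives two thirds of the total worth and `g2` and `g3` one sixth each, which shows how the value rewards a player for what it adds to coalitions of the others, not for its behaviour in isolation.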
A synthetic interpretation: the double-preparation theory
International Nuclear Information System (INIS)
Gondran, Michel; Gondran, Alexandre
2014-01-01
In the 1927 Solvay conference, three apparently irreconcilable interpretations of the quantum mechanics wave function were presented: the pilot-wave interpretation by de Broglie, the soliton wave interpretation by Schrödinger and the Born statistical rule by Born and Heisenberg. In this paper, we demonstrate the complementarity of these interpretations corresponding to quantum systems that are prepared differently and we deduce a synthetic interpretation: the double-preparation theory. We first introduce in quantum mechanics the concept of semi-classical statistically prepared particles, and we show that in the Schrödinger equation these particles converge, when h→0, to the equations of a statistical set of classical particles. These classical particles are undiscerned, and if we assume continuity between classical mechanics and quantum mechanics, we conclude the necessity of the de Broglie–Bohm interpretation for the semi-classical statistically prepared particles (statistical wave). We then introduce in quantum mechanics the concept of a semi-classical deterministically prepared particle, and we show that in the Schrödinger equation this particle converges, when h→0, to the equations of a single classical particle. This classical particle is discerned and assuming continuity between classical mechanics and quantum mechanics, we conclude the necessity of the Schrödinger interpretation for the semi-classical deterministically prepared particle (the soliton wave). Finally we propose, in the semi-classical approximation, a new interpretation of quantum mechanics, the ‘theory of the double preparation’, which depends on the preparation of the particles. (paper)
Tourette Syndrome (TS): Data and Statistics
... The data ... Behavioral or conduct problems, 26%; Anxiety problems, 49%; Depression, 25%; Autism spectrum disorder, 35%; Learning disability, 47%; ...
Pawlowsky-Glahn, Vera; Buccianti, Antonella
In the investigation of fluid samples of a volcanic system, collected during a given period of time, one of the main goals is to discover cause-effect relationships that allow us to explain changes in the chemical composition. They might be caused by physicochemical factors, such as temperature, pressure, or non-conservative behavior of some chemical constituents (addition or subtraction of material), among others. The presence of subgroups of observations showing different behavior is evidence of unusually complex situations, which might render the analysis and interpretation of observed phenomena even more difficult. These cases require appropriate statistical techniques as well as sound a priori hypotheses concerning underlying geological processes. The purpose of this article is to present the state of the art in the methodology for a better visualization of compositional data, as well as for detecting statistically significant sub-populations. The scheme of this article is to present first the application, and then the underlying methodology, with the aim of the first motivating the second. Thus, the first part has the goal of illustrating how to understand and interpret results, whereas the second is devoted to explaining how to perform a study of this kind. The case study is related to the chemical composition of a fumarole of Vulcano Island (southern Italy), called F14. The volcanic activity at Vulcano Island has been subject to a continuous program of geochemical surveillance since 1978, and the large data set of observations contains the main chemical composition of volcanic gases as well as trace element concentrations in the condensates of fumarolic gases. Out of the complete set of measured components, the variables H2S, HF and As, determined in samples collected from 1978 to 1993 (As is not available in recent samples), are used to characterize two groups in the original population, which proved to be statistically distinct. The choice of the variables is
Lumped parameter models for the interpretation of environmental tracer data
Energy Technology Data Exchange (ETDEWEB)
Maloszewski, P [GSF-Inst. for Hydrology, Oberschleissheim (Germany); Zuber, A [Institute of Nuclear Physics, Cracow (Poland)
1996-10-01
Principles of the lumped-parameter approach to the interpretation of environmental tracer data are given. The following models are considered: the piston flow model (PFM), exponential flow model (EM), linear model (LM), combined piston flow and exponential flow model (EPM), combined linear flow and piston flow model (LPM), and dispersion model (DM). The applicability of these models for the interpretation of different tracer data is discussed for a steady state flow approximation. Case studies are given to exemplify the applicability of the lumped-parameter approach. Description of a user-friendly computer program is given. (author). 68 refs, 25 figs, 4 tabs.
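As an illustration of the lumped-parameter idea, the sketch below evaluates two of the models named above for a radioactive tracer under steady flow. It assumes the commonly quoted forms (they are not spelled out in the abstract): the output concentration is the convolution of the input with a transit-time distribution and a decay term, the exponential model uses g(τ) = exp(−τ/T)/T, and the piston flow model delays the input by exactly T. The function names, the tritium half-life value and the numerical settings are illustrative assumptions and are unrelated to the computer program mentioned in the abstract.

```python
import math

# Assumed half-life used only for this illustration (years).
TRITIUM_HALF_LIFE_YEARS = 12.32
DECAY = math.log(2) / TRITIUM_HALF_LIFE_YEARS


def piston_flow_output(c_in, t, mean_transit, decay=DECAY):
    """Piston flow model: the input is delayed by T and decays during transit."""
    return c_in(t - mean_transit) * math.exp(-decay * mean_transit)


def exponential_model_output(c_in, t, mean_transit, decay=DECAY,
                             dt=0.01, horizon_factor=20):
    """Exponential model: trapezoidal approximation of the convolution
    C_out(t) = integral of C_in(t - tau) * g(tau) * exp(-decay * tau) d tau,
    with g(tau) = exp(-tau / T) / T, truncated at horizon_factor * T."""
    n = int(horizon_factor * mean_transit / dt)
    total = 0.0
    for i in range(n + 1):
        tau = i * dt
        weight = 0.5 if i in (0, n) else 1.0  # trapezoid end-point weights
        g = math.exp(-tau / mean_transit) / mean_transit
        total += weight * c_in(t - tau) * g * math.exp(-decay * tau) * dt
    return total


def constant_input(_t):
    """Hypothetical constant tracer input of 10 concentration units."""
    return 10.0


if __name__ == "__main__":
    T = 5.0  # mean transit time in years (assumed)
    print("PFM:", piston_flow_output(constant_input, 0.0, T))
    print("EM: ", exponential_model_output(constant_input, 0.0, T))
```

For a constant input C0 the exponential model has the closed form C0 / (1 + λT), which gives a quick check on the numerical convolution.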
HEP Software Foundation Community White Paper Working Group - Data Analysis and Interpretation
Energy Technology Data Exchange (ETDEWEB)
Bauerdick, Lothar; et al.
2018-04-09
At the heart of experimental high energy physics (HEP) is the development of facilities and instrumentation that provide sensitivity to new phenomena. Our understanding of nature at its most fundamental level is advanced through the analysis and interpretation of data from sophisticated detectors in HEP experiments. The goal of data analysis systems is to realize the maximum possible scientific potential of the data within the constraints of computing and human resources in the least time. To achieve this goal, future analysis systems should empower physicists to access the data with a high level of interactivity, reproducibility and throughput capability. As part of the HEP Software Foundation Community White Paper process, a working group on Data Analysis and Interpretation was formed to assess the challenges and opportunities in HEP data analysis and develop a roadmap for activities in this area over the next decade. In this report, the key findings and recommendations of the Data Analysis and Interpretation Working Group are presented.
Topology for Statistical Modeling of Petascale Data
Energy Technology Data Exchange (ETDEWEB)
Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Bremer, P. -T. [Univ. of Utah, Salt Lake City, UT (United States)
2013-10-31
Many commonly used algorithms for mathematical analysis do not scale well enough to accommodate the size or complexity of petascale data produced by computational simulations. The primary goal of this project is to develop new mathematical tools that address both the petascale size and uncertain nature of current data. At a high level, the approach of the entire team involving all three institutions is based on the complementary techniques of combinatorial topology and statistical modelling. In particular, we use combinatorial topology to filter out spurious data that would otherwise skew statistical modelling techniques, and we employ advanced algorithms from algebraic statistics to efficiently find globally optimal fits to statistical models. The overall technical contributions can be divided loosely into three categories: (1) advances in the field of combinatorial topology, (2) advances in statistical modelling, and (3) new integrated topological and statistical methods. Roughly speaking, the division of labor between our three groups (Sandia Labs in Livermore, Texas A&M in College Station, and U Utah in Salt Lake City) is as follows: the Sandia group focuses on statistical methods and their formulation in algebraic terms, and finds the application problems (and data sets) most relevant to this project; the Texas A&M group develops new algebraic geometry algorithms, in particular with fewnomial theory; and the Utah group develops new algorithms in computational topology via Discrete Morse Theory. However, we hasten to point out that our three groups stay in tight contact via videoconference every two weeks, so there is much synergy of ideas between the groups. The remainder of this document focuses on the contributions that had greater direct involvement from the team at the University of Utah in Salt Lake City.
The disagreeable behaviour of the kappa statistic.
Flight, Laura; Julious, Steven A
2015-01-01
It is often of interest to measure the agreement between a number of raters when an outcome is nominal or ordinal. The kappa statistic is used as a measure of agreement. The statistic is highly sensitive to the distribution of the marginal totals and can produce unreliable results. Other statistics such as the proportion of concordance, maximum attainable kappa and prevalence and bias adjusted kappa should be considered to indicate how well the kappa statistic represents agreement in the data. Each kappa should be considered and interpreted based on the context of the data being analysed. Copyright © 2014 John Wiley & Sons, Ltd.
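The sensitivity to marginal totals described above can be seen in a small numerical sketch. It assumes the two-rater case (Cohen's kappa) and the usual definitions: kappa = (p_o − p_e) / (1 − p_e), and the prevalence and bias adjusted kappa (PABAK) = (k·p_o − 1) / (k − 1) for k categories. The function names and the two agreement tables are invented for illustration and are not taken from the paper.

```python
def observed_agreement(table):
    """Proportion of items on which the two raters agree (the diagonal)."""
    n = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / n


def cohen_kappa(table):
    """Cohen's kappa for a square rater-by-rater contingency table."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = observed_agreement(table)
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Agreement expected by chance from the marginal distributions.
    p_e = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)


def pabak(table):
    """Prevalence and bias adjusted kappa: depends only on observed agreement."""
    k = len(table)
    return (k * observed_agreement(table) - 1) / (k - 1)


# Two hypothetical tables with the same observed agreement (0.8)
# but very different marginal totals.
balanced = [[40, 10], [10, 40]]
skewed = [[80, 10], [10, 0]]

if __name__ == "__main__":
    for name, tab in (("balanced", balanced), ("skewed", skewed)):
        print(name, observed_agreement(tab), cohen_kappa(tab), pabak(tab))
```

Both tables show 80% raw agreement, yet kappa is 0.6 for the balanced table and slightly negative for the skewed one, while PABAK is the same for both. Reporting the supplementary statistics alongside kappa makes this kind of discrepancy visible.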
Statistics and probability with applications for engineers and scientists
Gupta, Bhisham C
2013-01-01
Introducing the tools of statistics and probability from the ground up. An understanding of statistical tools is essential for engineers and scientists who often need to deal with data analysis over the course of their work. Statistics and Probability with Applications for Engineers and Scientists walks readers through a wide range of popular statistical techniques, explaining step-by-step how to generate, analyze, and interpret data for diverse applications in engineering and the natural sciences. Unique among books of this kind, Statistics and Prob
Lee, Alexandra J; Chang, Ivan; Burel, Julie G; Lindestam Arlehamn, Cecilia S; Mandava, Aishwarya; Weiskopf, Daniela; Peters, Bjoern; Sette, Alessandro; Scheuermann, Richard H; Qian, Yu
2018-04-17
Computational methods for identification of cell populations from polychromatic flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. However, interpretation of the identified data clusters is labor-intensive. Certain types of user-defined cell populations are also difficult to identify by fully automated data clustering analysis. Both are roadblocks before a cytometry lab can adopt the data clustering approach for cell population identification in routine use. We found that combining recursive data filtering and clustering with constraints converted from the user manual gating strategy can effectively address these two issues. We named this new approach DAFi: Directed Automated Filtering and Identification of cell populations. Design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell subsets, but also makes the results interpretable to experimental scientists through mapping and merging the multidimensional data clusters into the user-defined two-dimensional gating hierarchy. The recursive data filtering process in DAFi helped identify small data clusters which are otherwise difficult to resolve by a single run of the data clustering method due to the statistical interference of the irrelevant major clusters. Our experiment results showed that the proportions of the cell populations identified by DAFi, while being consistent with those by expert centralized manual gating, have smaller technical variances across samples than those from individual manual gating analysis and the nonrecursive data clustering analysis. Compared with manual gating segregation, DAFi-identified cell populations avoided the abrupt cut-offs on the boundaries. DAFi has been implemented to be used with multiple data clustering methods including K-means, FLOCK, FlowSOM, and
Interpretation of Ground Penetrating Radar data at the Hanford Site, Richland, Washington
International Nuclear Information System (INIS)
Bergstrom, K.A.; Mitchell, T.H.; Kunk, J.R.
1993-07-01
Ground Penetrating Radar (GPR) is being used extensively during characterization and remediation of chemical and radioactive waste sites at the Hanford Site in Washington State. Time and money for GPR investigations are often not included during the planning and budgeting phase. Therefore GPR investigations must be inexpensive and quick to minimize impact on already established budgets and schedules. An approach to survey design, data collection, and interpretation has been developed which emphasizes speed and budget with minimal impact on the integrity of the interpretation or quality of the data. The following simple rules of thumb can be applied: (1) Assemble as much pre-survey information as possible, (2) Clearly define survey objectives prior to designing the survey and determine which combination of geophysical methods will best meet the objectives, (3) Continuously communicate with the client, before, during and after the investigation, (4) Only experienced GPR interpreters should acquire the field data, (5) Use real-time monitoring of the data to determine where and how much data to collect and assist in the interpretation, (6) Always "error" in favor of collecting too much data, (7) Surveys should have closely spaced (preferably 5 feet, no more than 10 feet), orthogonal profiles, (8) When possible, pull the antenna by hand.
Testing the statistical compatibility of independent data sets
International Nuclear Information System (INIS)
Maltoni, M.; Schwetz, T.
2003-01-01
We discuss a goodness-of-fit method which tests the compatibility between statistically independent data sets. The method gives sensible results even in cases where the χ² minima of the individual data sets are very low or when several parameters are fitted to a large number of data points. In particular, it avoids the problem that a possible disagreement between data sets becomes diluted by data points which are insensitive to the crucial parameters. A formal derivation of the probability distribution function for the proposed test statistics is given, based on standard theorems of statistics. The application of the method is illustrated on data from neutrino oscillation experiments, and its complementarity to the standard goodness-of-fit is discussed.
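A toy version of such a compatibility test is sketched below. The abstract does not give the statistic, so the sketch assumes it is the minimum of the combined χ² minus the sum of the individual χ² minima, compared with a χ² distribution whose degrees of freedom equal the number of parameters shared between the data sets. The example uses two hypothetical Gaussian measurements of a single common parameter, so each individual minimum is zero and there is one degree of freedom. All names and numbers are illustrative.

```python
import math


def compatibility_chi2(x1, sigma1, x2, sigma2):
    """Combined chi-square minimum for two Gaussian measurements of one
    shared parameter. Each data set alone fits perfectly (minimum 0), so
    this equals the assumed test statistic: total minimum minus the sum
    of the individual minima."""
    w1, w2 = 1.0 / sigma1 ** 2, 1.0 / sigma2 ** 2
    best = (w1 * x1 + w2 * x2) / (w1 + w2)  # weighted-mean best fit
    return w1 * (x1 - best) ** 2 + w2 * (x2 - best) ** 2


def chi2_sf_one_dof(chi2):
    """Survival function of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical measurements: 1.0 +/- 0.3 and 2.0 +/- 0.4.
    stat = compatibility_chi2(1.0, 0.3, 2.0, 0.4)
    print("statistic:", stat, "p-value:", chi2_sf_one_dof(stat))
```

Because the statistic depends only on the shared parameter, adding many data points that are insensitive to it would not change the result, which is the dilution problem the abstract refers to.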
Measuring the data universe data integration using statistical data and metadata exchange
Stahl, Reinhold
2018-01-01
This richly illustrated book provides an easy-to-read introduction to the challenges of organizing and integrating modern data worlds, explaining the contribution of public statistics and the ISO standard SDMX (Statistical Data and Metadata Exchange). As such, it is a must for data experts as well as those aspiring to become one. Today, exponentially growing data worlds are increasingly determining our professional and private lives. The rapid increase in the amount of globally available data, fueled by search engines and social networks but also by new technical possibilities such as Big Data, offers great opportunities. But whatever the undertaking – driving the blockchain revolution or making smartphones even smarter – success will be determined by how well it is possible to integrate, i.e. to collect, link and evaluate, the required data. One crucial factor in this is the introduction of a cross-domain order system in combination with a standardization of the data structure. Using everyday examples, th...
Campanya, J. L.; Ogaya, X.; Jones, A. G.; Rath, V.; McConnell, B.; Haughton, P.; Prada, M.
2016-12-01
The Science Foundation Ireland funded IRECCSEM project (www.ireccsem.ie) aims to evaluate Ireland's potential for onshore carbon sequestration in saline aquifers by integrating new electromagnetic geophysical data with existing geophysical and geological data. One of the objectives of this component of IRECCSEM is to characterise the subsurface beneath the Loop Head Peninsula (part of Clare Basin, Co. Clare, Ireland), and identify major electrical resistivity structures that can guide an interpretation of the carbon sequestration potential of this area. During the summer of 2014, a magnetotelluric (MT) survey was carried out on the Loop Head Peninsula, and data from a total of 140 sites were acquired, including audio-magnetotelluric (AMT) and broadband magnetotelluric (BBMT) data. The dataset was used to generate shallow three-dimensional (3-D) electrical resistivity models constraining the subsurface to depths of up to 3.5 km. The 3-D joint inversions were performed using three different types of electromagnetic data: MT impedance tensor (Z), geomagnetic transfer functions (T), and inter-station horizontal magnetic transfer functions (H). The interpretation of the results was complemented with second-derivative models of the resulting electrical resistivity models, and a quantitative comparison with borehole data using multivariate statistical methods. Second-derivative models were used to define the main interfaces between the geoelectrical structures, facilitating superior comparison with geological and seismic results, and also reducing the influence of the colour scale when interpreting the results. Specific analysis was performed to compare the extant borehole data with the electrical resistivity model, identifying those structures that are better characterised by the resistivity model. Finally, the electrical resistivity model was also used to propagate some of the physical properties measured in the borehole, when a good relation was
Complex Data Modeling and Computationally Intensive Statistical Methods
Mantovan, Pietro
2010-01-01
Recent years have seen the advent and development of many devices able to record and store an ever-increasing amount of complex and high-dimensional data: 3D images generated by medical scanners or satellite remote sensing, DNA microarrays, real-time financial data, system control datasets. The analysis of these data poses challenging new problems and requires the development of novel statistical models and computational methods, fueling many fascinating and fast-growing research areas of modern statistics. The book offers a wide variety of statistical methods and is addressed to statistici
Translation of EPA Research: Data Interpretation and Communication Strategies
Symposium Title: Social Determinants of Health, Environmental Exposures, and Disproportionately Impacted Communities: What We Know and How We Tell Others Topic 3: Community Engagement and Research Translation Title: Translation of EPA Research: Data Interpretation and Communicati...
Big Data as a Source for Official Statistics
Directory of Open Access Journals (Sweden)
Daas Piet J.H.
2015-06-01
More and more data are being produced by an increasing number of electronic devices physically surrounding us and on the internet. The large amount of data and the high frequency at which they are produced have resulted in the introduction of the term ‘Big Data’. Because these data reflect many different aspects of our daily lives and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. This article discusses the exploration of both opportunities and challenges for official statistics associated with the application of Big Data. Experiences gained with analyses of large amounts of Dutch traffic loop detection records and Dutch social media messages are described to illustrate the topics characteristic of the statistical analysis and use of Big Data.
Kappa statistic for clustered matched-pair data.
Yang, Zhao; Zhou, Ming
2014-07-10
Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
DEFF Research Database (Denmark)
Van Driel, A.F.; Nikolaev, I.S.; Vergeer, P.
2007-01-01
We present a statistical analysis of time-resolved spontaneous emission decay curves from ensembles of emitters, such as semiconductor quantum dots, with the aim of interpreting ubiquitous non-single-exponential decay. Contrary to what is widely assumed, the density of excited emitters...... and the intensity in an emission decay curve are not proportional, but the density is a time integral of the intensity. The integral relation is crucial to correctly interpret non-single-exponential decay. We derive the proper normalization for both a discrete and a continuous distribution of rates, where every...... decay component is multiplied by its radiative decay rate. A central result of our paper is the derivation of the emission decay curve when both radiative and nonradiative decays are independently distributed. In this case, the well-known emission quantum efficiency can no longer be expressed...
Jang, Daeheung; Lai, Tze; Lee, Youngjo; Lu, Ying; Ni, Jun; Qian, Peter; Qiu, Peihua; Tiao, George
2018-01-01
This book presents the proceedings of the 2nd Pacific Rim Statistical Conference for Production Engineering: Production Engineering, Big Data and Statistics, which took place at Seoul National University in Seoul, Korea in December, 2016. The papers included discuss a wide range of statistical challenges, methods and applications for big data in production engineering, and introduce recent advances in relevant statistical methods.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
International Nuclear Information System (INIS)
Bonetti, R.; Milazzo, L.C.; Melanotte, M.
1983-01-01
A number of (p,n), (n,p), and (³He,p) reactions have been interpreted on the basis of the statistical multistep compound emission mechanism. Good agreement with experiment is found both in spectrum shape and in the value of the coherence widths.
Implementation of statistical analysis methods for medical physics data
International Nuclear Information System (INIS)
Teixeira, Marilia S.; Pinto, Nivia G.P.; Barroso, Regina C.; Oliveira, Luis F.
2009-01-01
Biomedical research with different types of radiation aims to contribute to the understanding of the basic physics and biochemistry of biological systems, to disease diagnosis, and to the development of therapeutic techniques. The main benefits are the cure of tumors through therapy, the early detection of diseases through diagnosis, use as a prophylactic measure in blood transfusion, etc. A better understanding of the biological interactions occurring after exposure to radiation is therefore necessary for optimizing therapeutic procedures and strategies for reducing radioinduced effects. The applied physics group of the Physics Institute of UERJ has been working on the characterization of biological samples (human tissues, teeth, saliva, soil, plants, sediments, air, water, organic matrixes, ceramics, fossil material, among others) using X-ray diffraction and X-ray fluorescence. The application of these techniques to the measurement, analysis and interpretation of the characteristics of biological tissues is attracting considerable interest in medical and environmental physics. All quantitative data analysis must begin with the calculation of descriptive statistics (means and standard deviations) in order to obtain a preliminary notion of what the analysis will reveal. It is well known that the high values of standard deviation found in experimental measurements of biological samples can be attributed to biological factors, due to the specific characteristics of each individual (age, gender, environment, dietary habits, etc). The main objective of this work is the development of a program that uses specific statistical methods to optimize the analysis of experimental data. Because the specialized programs for this analysis are proprietary, another objective of this work is the implementation of a code which is free and can be shared by other research groups. As the program developed since the
Experimental statistics for biological sciences.
Bang, Heejung; Davidian, Marie
2010-01-01
In this chapter, we cover basic and fundamental principles and methods in statistics - from "What are Data and Statistics?" to "ANOVA and linear regression," which are the basis of any statistical thinking and undertaking. Readers can easily find the selected topics in most introductory statistics textbooks, but we have tried to assemble and structure them in a succinct and reader-friendly manner in a stand-alone chapter. This text has long been used in real classroom settings for both undergraduate and graduate students who do or do not major in statistical sciences. We hope that from this chapter, readers would understand the key statistical concepts and terminologies, how to design a study (experimental or observational), how to analyze the data (e.g., describe the data and/or estimate the parameter(s) and make inference), and how to interpret the results. This text would be most useful if it is used as a supplemental material, while the readers take their own statistical courses or it would serve as a great reference text associated with a manual for any statistical software as a self-teaching guide.
Combination and interpretation of observables in Cosmology
Directory of Open Access Journals (Sweden)
Virey Jean-Marc
2010-04-01
The standard cosmological model has deep theoretical foundations but needs the introduction of two major unknown components, dark matter and dark energy, to be in agreement with various observations. Dark matter describes a non-relativistic collisionless fluid of (non-baryonic) matter which amounts to 25% of the total density of the universe. Dark energy is a new kind of fluid, not of matter type, representing 70% of the total density, which should explain the recent acceleration of the expansion of the universe. Alternatively, one can reject this idea of adding one or two new components and argue instead that the equations used to make the interpretation should be modified at cosmological scales. Instead of dark matter one can invoke a failure of Newton's laws. Instead of dark energy, two approaches are proposed: general relativity (in terms of the Einstein equation) should be modified, or the cosmological principle, which fixes the metric used for cosmology, should be abandoned. One of the main objectives of the community is to find the path to the relevant interpretations thanks to the next generation of experiments, which should provide large statistics of observational data. Unfortunately, cosmological information is difficult to pin down directly from the measurements, and it is mandatory to combine the various observables to get the cosmological parameters. This is not problematic from the statistical point of view, but assumptions and approximations made for the analysis may bias our interpretation of the data. Consequently, close attention should be paid to the statistical methods used for parameter estimation and for model testing. After a review of the basics of cosmology where the cosmological parameters are introduced, we discuss the various cosmological probes and their associated observables used to extract cosmological information. We present the results obtained from several statistical analyses combining data of different nature but
Uncertainty analysis with statistically correlated failure data
International Nuclear Information System (INIS)
Modarres, M.; Dezfuli, H.; Roush, M.L.
1987-01-01
The likelihood of occurrence of the top event of a fault tree or of sequences of an event tree is estimated from the failure probability of components that constitute the events of the fault/event tree. Component failure probabilities are subject to statistical uncertainties. In addition, there are cases where the failure data are statistically correlated. At present most fault tree calculations are based on uncorrelated component failure data. This chapter describes a methodology for assessing the probability intervals for the top event failure probability of fault trees or frequency of occurrence of event tree sequences when event failure data are statistically correlated. To estimate the mean and variance of the top event, a second-order system moment method is presented through Taylor series expansion, which provides an alternative to the normally used Monte Carlo method. For cases where component failure probabilities are statistically correlated, the Taylor expansion terms are treated properly. A moment matching technique is used to obtain the probability distribution function of the top event by fitting the Johnson S_B distribution. The computer program CORRELATE was developed to perform the calculations necessary for the implementation of the method developed. (author)
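The Taylor-expansion idea can be sketched for a small fault tree as follows. This is a minimal illustration and not the CORRELATE program: it assumes the top-event probability is a smooth function f of the component failure probabilities, approximates the mean to second order as f(μ) + ½ Σ H_ij·Cov_ij, and, as a simplification, keeps only the first-order terms Σ g_i·g_j·Cov_ij for the variance. Derivatives are taken by central finite differences. The function names, the AND-gate example and all numbers are hypothetical.

```python
def taylor_moments(f, mean, cov, h=1e-4):
    """Approximate mean (second order) and variance (first order) of f(X)
    for a random vector X with the given mean vector and covariance matrix."""
    n = len(mean)

    def shifted(deltas):
        # Evaluate f at the mean with selected coordinates perturbed.
        x = list(mean)
        for idx, d in deltas:
            x[idx] += d
        return f(x)

    f0 = f(list(mean))
    grad = [(shifted([(i, h)]) - shifted([(i, -h)])) / (2 * h) for i in range(n)]

    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                hess[i][j] = (shifted([(i, h)]) - 2 * f0 + shifted([(i, -h)])) / h ** 2
            else:
                hess[i][j] = (shifted([(i, h), (j, h)]) - shifted([(i, h), (j, -h)])
                              - shifted([(i, -h), (j, h)]) + shifted([(i, -h), (j, -h)])) / (4 * h ** 2)

    mean_top = f0 + 0.5 * sum(hess[i][j] * cov[i][j] for i in range(n) for j in range(n))
    var_top = sum(grad[i] * grad[j] * cov[i][j] for i in range(n) for j in range(n))
    return mean_top, var_top


def and_gate(q):
    """Top event of a two-component AND gate: both components must fail."""
    return q[0] * q[1]


if __name__ == "__main__":
    mu = [0.01, 0.02]                        # mean failure probabilities
    sd = [0.005, 0.01]                       # standard deviations
    rho = 0.5                                # assumed correlation
    c12 = rho * sd[0] * sd[1]
    cov = [[sd[0] ** 2, c12], [c12, sd[1] ** 2]]
    print("correlated:  ", taylor_moments(and_gate, mu, cov))
    cov0 = [[sd[0] ** 2, 0.0], [0.0, sd[1] ** 2]]
    print("uncorrelated:", taylor_moments(and_gate, mu, cov0))
```

For the AND gate the mean picks up the covariance term directly (μ1·μ2 + Cov12), so ignoring a positive correlation underestimates both the mean and the variance of the top event.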
Challenges in computational statistics and data mining
Mielniczuk, Jan
2016-01-01
This volume contains nineteen research papers belonging to the areas of computational statistics, data mining, and their applications. Those papers, all written specifically for this volume, are their authors’ contributions to honour and celebrate Professor Jacek Koronacki on the occasion of his 70th birthday. The book’s related and often interconnected topics represent Jacek Koronacki’s research interests and their evolution. They also clearly indicate how close the areas of computational statistics and data mining are.
Analysis of filament statistics in fast camera data on MAST
Farley, Tom; Militello, Fulvio; Walkden, Nick; Harrison, James; Silburn, Scott; Bradley, James
2017-10-01
Coherent filamentary structures have been shown to play a dominant role in turbulent cross-field particle transport [D'Ippolito 2011]. An improved understanding of filaments is vital in order to control scrape off layer (SOL) density profiles and thus control first wall erosion, impurity flushing and coupling of radio frequency heating in future devices. The Elzar code [T. Farley, 2017 in prep.] is applied to MAST data. The code uses information about the magnetic equilibrium to calculate the intensity of light emission along field lines as seen in the camera images, as a function of the field lines' radial and toroidal locations at the mid-plane. In this way a `pseudo-inversion' of the intensity profiles in the camera images is achieved from which filaments can be identified and measured. In this work, a statistical analysis of the intensity fluctuations along field lines in the camera field of view is performed using techniques similar to those typically applied in standard Langmuir probe analyses. These filament statistics are interpreted in terms of the theoretical ergodic framework presented by F. Militello & J.T. Omotani, 2016, in order to better understand how time averaged filament dynamics produce the more familiar SOL density profiles. This work has received funding from the RCUK Energy programme (Grant Number EP/P012450/1), from Euratom (Grant Agreement No. 633053) and from the EUROfusion consortium.
Enerdata statistical yearbook. "The key-data of energy worldwide". 1999 data
International Nuclear Information System (INIS)
2000-01-01
The new edition of the Enerdata statistical yearbook provides the most recent statistical data on energy (oil, gas, coal and power production) and CO2 emissions worldwide for the 1994-1999 period. These data cover 52 countries and 12 geographic areas and are presented in the form of tables and graphs (production, foreign exchanges, consumption, market shares, sectoral consumption, 1999 energy status, long-term tendencies). More data for a longer period (1970-1999) and for all countries worldwide are available on the CD-ROM version of the yearbook. (J.S.)
Application of Ontology Technology in Health Statistic Data Analysis.
Guo, Minjiang; Hu, Hongpu; Lei, Xingyun
2017-01-01
Research purpose: to establish a health management ontology for the analysis of health statistic data. Proposed methods: this paper established a health management ontology based on an analysis of the concepts in the China Health Statistics Yearbook, and used Protégé to define the syntactic and semantic structure of health statistical data. Six classes of top-level ontology concepts and their subclasses were extracted, and the object properties and data properties were defined to establish the construction of these classes. By ontology instantiation, we can integrate multi-source heterogeneous data and enable administrators to have an overall understanding and analysis of the health statistic data. Ontology technology provides a comprehensive and unified information integration structure for the health management domain and lays a foundation for the efficient analysis of multi-source and heterogeneous health system management data and for improving management efficiency.
Fuzzy logic and image processing techniques for the interpretation of seismic data
International Nuclear Information System (INIS)
Orozco-del-Castillo, M G; Ortiz-Alemán, C; Rodríguez-Castellanos, A; Urrutia-Fucugauchi, J
2011-01-01
Since interpretation of seismic data is usually a tedious and repetitive task, the ability to do so automatically or semi-automatically has become an important objective of recent research. We believe that the vagueness and uncertainty in the interpretation process makes fuzzy logic an appropriate tool to deal with seismic data. In this work we developed a semi-automated fuzzy inference system to detect the internal architecture of a mass transport complex (MTC) in seismic images. We propose that the observed characteristics of a MTC can be expressed as fuzzy if-then rules consisting of linguistic values associated with fuzzy membership functions. The constructions of the fuzzy inference system and various image processing techniques are presented. We conclude that this is a well-suited problem for fuzzy logic since the application of the proposed methodology yields a semi-automatically interpreted MTC which closely resembles the MTC from expert manual interpretation
Analysis of Preference Data Using Intermediate Test Statistic Abstract
African Journals Online (AJOL)
PROF. O. E. OSUAGWU
2013-06-01
Jun 1, 2013 ... West African Journal of Industrial and Academic Research Vol.7 No. 1 June ... Keywords:-Preference data, Friedman statistic, multinomial test statistic, intermediate test statistic. ... new method and consequently a new statistic ...
Statistical learning from a regression perspective
Berk, Richard A
2016-01-01
This textbook considers statistical learning applications when interest centers on the conditional distribution of the response variable, given a set of predictors, and when it is important to characterize how the predictors are related to the response. As a first approximation, this can be seen as an extension of nonparametric regression. This fully revised new edition includes important developments over the past 8 years. Consistent with modern data analytics, it emphasizes that a proper statistical learning data analysis derives from sound data collection, intelligent data management, appropriate statistical procedures, and an accessible interpretation of results. A continued emphasis on the implications for practice runs through the text. Among the statistical learning procedures examined are bagging, random forests, boosting, support vector machines and neural networks. Response variables may be quantitative or categorical. As in the first edition, a unifying theme is supervised learning that can be trea...
Statistical treatment of fatigue test data
International Nuclear Information System (INIS)
Raske, D.T.
1980-01-01
This report discussed several aspects of fatigue data analysis in order to provide a basis for the development of statistically sound design curves. Included is a discussion on the choice of the dependent variable, the assumptions associated with least squares regression models, the variability of fatigue data, the treatment of data from suspended tests and outlying observations, and various strain-life relations
Statistics As Principled Argument
Abelson, Robert P
2012-01-01
In this illuminating volume, Robert P. Abelson delves into the too-often dismissed problems of interpreting quantitative data and then presenting them in the context of a coherent story about one's research. Unlike too many books on statistics, this is a remarkably engaging read, filled with fascinating real-life (and real-research) examples rather than with recipes for analysis. It will be of true interest and lasting value to beginning graduate students and seasoned researchers alike. The focus of the book is that the purpose of statistics is to organize a useful argument from quantitative
Estimation of global network statistics from incomplete data.
Directory of Open Access Journals (Sweden)
Catherine A Bliss
Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week.
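As a rough illustration of the scaling idea (not the authors' estimators): when nodes are sampled uniformly at random and only links between sampled nodes are observed, each link survives with the probability that both of its endpoints were sampled, so dividing the observed link count by that probability rescales it toward the true count. The sketch below uses a hypothetical random graph; the function names, graph size and sampling fraction are all assumptions made for the example.

```python
import random

def random_graph(n_nodes, n_edges, rng):
    """Build a simple undirected graph as a set of (u, v) pairs with u < v."""
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.sample(range(n_nodes), 2)
        edges.add((min(u, v), max(u, v)))
    return edges

def induced_edge_count(edges, sampled_nodes):
    """Count edges whose two endpoints were both sampled."""
    return sum(1 for u, v in edges if u in sampled_nodes and v in sampled_nodes)

def estimate_total_edges(observed_edges, n_sampled, n_total):
    """Rescale by the probability that both endpoints of an edge are sampled."""
    p_both = (n_sampled * (n_sampled - 1)) / (n_total * (n_total - 1))
    return observed_edges / p_both

rng = random.Random(0)          # fixed seed so the run is repeatable
N, M, q = 2000, 10000, 0.3      # hypothetical nodes, links, sampled fraction
edges = random_graph(N, M, rng)
sampled = set(rng.sample(range(N), round(q * N)))
observed = induced_edge_count(edges, sampled)
estimate = estimate_total_edges(observed, len(sampled), N)
print(observed, round(estimate))
```

The estimate fluctuates from sample to sample, and the spread grows as the sampled fraction shrinks, which is one way to read the abstract's remark about acknowledging limits.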
Siska, William; Gupta, Aradhana; Tomlinson, Lindsay; Tripathi, Niraj; von Beust, Barbara
Clinical pathology testing is routinely performed in target animal safety studies in order to identify potential toxicity associated with administration of an investigational veterinary pharmaceutical product. Regulatory and other testing guidelines that address such studies provide recommendations for clinical pathology testing but occasionally contain outdated analytes and do not take into account interspecies physiologic differences that affect the practical selection of appropriate clinical pathology tests. Additionally, strong emphasis is often placed on statistical analysis and use of reference intervals for interpretation of test article-related clinical pathology changes, with limited attention given to the critical scientific review of clinically, toxicologically, or biologically relevant changes. The purpose of this communication from the Regulatory Affairs Committee of the American Society for Veterinary Clinical Pathology is to provide current recommendations for clinical pathology testing and data interpretation in target animal safety studies and thereby enhance the value of clinical pathology testing in these studies.
International Nuclear Information System (INIS)
Berkman, E.
1983-01-01
The purpose of this project was to reprocess, evaluate, and reinterpret 14 line miles of seismic reflection data acquired at the Hanford Site. Regional and area-specific geology has been reviewed, the data acquisition parameters as they relate to the limitations inherent in the data have been discussed, and the reprocessing procedures have been described in detail along with an evaluation of the original processing. After initial testing, the focus of the reprocessing was placed on resolution of the geologic horizons at and near the top of the basalt. The reprocessed seismic data shows significant improvement over the original processing. The improvement is the result of the integrated processing and interpretation approach where each processing step has been tested in sequence and the intermediate results examined carefully in accordance with the project goals. The interpretation procedure placed strong reliance upon synthetic seismograms and models calculated based upon the physical parameters of the subsurface materials, and upon associated geophysical (reflection, gravity, magnetic) data. The final interpretation of the seismic data is in agreement with the structural contour maps based primarily on borehole information. The seismic interpretation has added important detail concerning areas which should be considered for further study. 60 figs., 1 tab
Gas, electricity, coal: 1998 statistical data
International Nuclear Information System (INIS)
1999-01-01
This document brings together the main statistical data from the French direction of gas, electricity and coal and presents a selection of the most significant numbered data: origin of production, share of the consumption, price levels, resources-employment status. These data are presented in a synthetic and accessible way in order to make useful references for the actors of the energy sector. (J.S.)
Statistical Literacy in the Data Science Workplace
Grant, Robert
2017-01-01
Statistical literacy, the ability to understand and make use of statistical information including methods, has particular relevance in the age of data science, when complex analyses are undertaken by teams from diverse backgrounds. Not only is it essential to communicate to the consumers of information but also within the team. Writing from the…
Functional MRI experiments : acquisition, analysis and interpretation of data
Ramsey, NF; Hoogduin, H; Jansma, JM
2002-01-01
Functional MRI is widely used to address basic and clinical neuroscience questions. In the key domains of fMRI experiments, i.e. acquisition, processing and analysis, and interpretation of data, developments are ongoing. The main issues are sensitivity for changes in fMRI signal that are associated
STATISTICS IN SERVICE QUALITY ASSESSMENT
Directory of Open Access Journals (Sweden)
Dragana Gardašević
2012-09-01
For any quality evaluation in sports, science, education, and so on, it is useful to collect data in order to construct a strategy for improving the quality of services offered to the user. For this purpose, we use statistical software packages to process the collected data in order to increase customer satisfaction. The principle is demonstrated by the example of student satisfaction ratings at Belgrade Polytechnic (students as users rating the quality of the institution). Here, the emphasis is on statistical analysis as a tool for quality control and improvement, not on the interpretation of results. Therefore, the above can be used as a model in sport to improve overall results.
Kissling, Grace E; Haseman, Joseph K; Zeiger, Errol
2015-09-02
A recent article by Gaus (2014) demonstrates a serious misunderstanding of the NTP's statistical analysis and interpretation of rodent carcinogenicity data as reported in Technical Report 578 (Ginkgo biloba) (NTP, 2013), as well as a failure to acknowledge the abundant literature on false positive rates in rodent carcinogenicity studies. The NTP reported Ginkgo biloba extract to be carcinogenic in mice and rats. Gaus claims that, in this study, 4800 statistical comparisons were possible, and that 209 of them were statistically significant (p<0.05) compared with 240 (4800×0.05) expected by chance alone; thus, the carcinogenicity of Ginkgo biloba extract cannot be definitively established. However, his assumptions and calculations are flawed since he incorrectly assumes that the NTP uses no correction for multiple comparisons, and that significance tests for discrete data operate at exactly the nominal level. He also misrepresents the NTP's decision making process, overstates the number of statistical comparisons made, and ignores the fact that the mouse liver tumor effects were so striking (e.g., p<0.0000000000001) that it is virtually impossible that they could be false positive outcomes. Gaus' conclusion that such obvious responses merely "generate a hypothesis" rather than demonstrate a real carcinogenic effect has no scientific credibility. Moreover, his claims regarding the high frequency of false positive outcomes in carcinogenicity studies are misleading because of his methodological misconceptions and errors. Published by Elsevier Ireland Ltd.
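The point about discrete data can be illustrated with a small calculation. The sketch below assumes a hypothetical design (50 animals per group, a 2% background tumor rate, a one-sided Fisher exact test at the 0.05 level); the test and the numbers are chosen for illustration and are not taken from the NTP report. It computes the true false positive rate of that test when both groups share the same tumor rate.

```python
from math import comb

def fisher_one_sided_p(x_treated, x_control, n):
    """Hypergeometric tail: P(treated count >= observed | total count)."""
    total = x_treated + x_control
    denom = comb(2 * n, total)
    upper = min(n, total)
    tail = sum(comb(n, k) * comb(n, total - k) for k in range(x_treated, upper + 1))
    return tail / denom

def binom_pmf(k, n, p):
    """Binomial probability of k events in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def actual_size(n, rate, alpha=0.05):
    """Probability of rejecting when both groups have the same tumor rate."""
    size = 0.0
    for xt in range(n + 1):
        p_treated = binom_pmf(xt, n, rate)
        for xc in range(n + 1):
            if fisher_one_sided_p(xt, xc, n) <= alpha:
                size += p_treated * binom_pmf(xc, n, rate)
    return size

size = actual_size(n=50, rate=0.02)   # hypothetical group size and background rate
print(round(size, 4))
```

Because the attainable p values are discrete, the rejection probability under the null stays below the nominal 0.05, so multiplying the number of comparisons by 0.05 (as in 4800 × 0.05 = 240) overstates the number of significant results expected by chance under such a design.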
Yoshioka, S; Aso, Y; Takeda, Y
1990-06-01
Accelerated stability data obtained at a single temperature is statistically evaluated, and the utility of such data for assessment of stability is discussed focussing on the chemical stability of solution-state dosage forms. The probability that the drug content of a product is observed to be within the lower specification limit in the accelerated test is interpreted graphically. This probability depends on experimental errors in the assay and temperature control, as well as the true degradation rate and activation energy. Therefore, the observation that the drug content meets the specification in the accelerated testing can provide only limited information on the shelf-life of the drug, without the knowledge of the activation energy and the accuracy and precision of the assay and temperature control.
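To see why a single-temperature result says little without the activation energy, consider a sketch that assumes first-order degradation and Arrhenius temperature dependence. Every number here (the rate at 40 °C, storage at 25 °C, a 90% specification limit, the candidate activation energies) is hypothetical and is not taken from the paper.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def rate_at(k_ref, t_ref_c, t_c, ea_j_per_mol):
    """Arrhenius extrapolation of a rate constant from t_ref_c to t_c (Celsius)."""
    t_ref = t_ref_c + 273.15
    t = t_c + 273.15
    return k_ref * math.exp(-ea_j_per_mol / R * (1.0 / t - 1.0 / t_ref))

def shelf_life_days(k_per_day, limit_fraction=0.90):
    """Time for first-order decay to fall to the specification limit."""
    return -math.log(limit_fraction) / k_per_day

k40 = 0.001  # assumed first-order rate (per day) observed at 40 C
shelf_lives = {}
for ea_kj in (50, 80, 110):  # assumed activation energies, kJ/mol
    k25 = rate_at(k40, 40.0, 25.0, ea_kj * 1000.0)
    shelf_lives[ea_kj] = shelf_life_days(k25)
    print(ea_kj, round(shelf_lives[ea_kj]))
```

The same accelerated-test rate maps to very different storage shelf lives depending on the assumed activation energy, which is the limitation the abstract describes.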
Taylor, P. T.; Kis, K. I.; Wittmann, G.
2013-01-01
The ESA SWARM mission will have three Earth-orbiting, magnetometer-bearing satellites: one in a high orbit and two side by side in lower orbits. These latter satellites will record a horizontal magnetic gradient. In order to determine how we can use these gradient measurements for the interpretation of large geologic units, we used ten years of CHAMP data to compute a horizontal gradient map over a section of southeastern Europe, with the goal of interpreting these data over the Pannonian Basin of Hungary.
SOCR: Statistics Online Computational Resource
Directory of Open Access Journals (Sweden)
Ivo D. Dinov
2006-10-01
The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result, a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning.
Ratner, Bruce
2011-01-01
The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has
Fetal Alcohol Spectrum Disorders (FASDs): Data and Statistics
... alcohol screening and counseling for all women Data & Statistics Prevalence of ... conducted annually by the National Center for Health Statistics (NCHS), CDC, to produce national estimates for a ...
Statistical data processing with automatic system for environmental radiation monitoring
International Nuclear Information System (INIS)
Zarkh, V.G.; Ostroglyadov, S.V.
1986-01-01
The practice of statistical data processing for radiation monitoring is exemplified, and some of the results obtained are presented. Experience in the practical application of mathematical statistics methods to radiation monitoring data made it possible to develop a concrete statistical processing algorithm, implemented on the M-6000 minicomputer. The suggested algorithm is divided into 3 parts: parametric data processing and hypothesis testing, pair and multiple correlation analysis. The statistical processing programs run in dialogue mode. The algorithm was used to process data observed over a radioactive waste disposal control region. Results of processing surface water monitoring data are presented.
Dozmorov, Mikhail G
2017-10-15
One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data. The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context. Contact: mikhail.dozmorov@vcuhealth.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Human biomonitoring data interpretation and ethics; obstacles or surmountable challenges?
Directory of Open Access Journals (Sweden)
Sepai Ovnair
2008-01-01
The use of human samples to assess environmental exposure and uptake of chemicals is more than an analytical exercise and requires consideration of the utility and interpretation of data as well as due consideration of ethical issues. These aspects are inextricably linked. In 2004 the EC expressed its commitment to the development of a harmonised approach to human biomonitoring (HBM) by including an action in the EU Environment and Health Strategy to develop a Human Biomonitoring Pilot Study. This further underlined the need for interpretation strategies as well as guidance on ethical issues. A two-day workshop held in December 2006 brought together stakeholders from academia, policy makers, non-governmental organisations and chemical industry associations, and built a mutual understanding of the issues in an open and frank discussion forum. This paper describes the discussion and recommendations from the workshop. The workshop developed key recommendations for a Pan-European HBM Study: 1. A strategy for the interpretation of human biomonitoring data should be developed. 2. The pilot study should include the development of a strategy to integrate health data and environmental monitoring with human biomonitoring data at national and international levels. 3. Communication strategies should be developed when designing the study and evolve as the study continues. 4. Early communication with stakeholders is essential to achieve maximum efficacy of policy developments and facilitate subsequent monitoring. 5. Member states will have to apply individually for project approval from their National Research Ethics Committees. 6. The study population needs to have sufficient information on the way data will be gathered, interpreted and disseminated and how samples will be stored and used in the future (if biobanking) before they can give informed consent. 7. The participants must be given the option of anonymity. This has an impact
Workshop statistics discovery with data and Minitab
Rossman, Allan J
1998-01-01
Shorn of all subtlety and led naked out of the protective fold of educational research literature, there comes a sheepish little fact: lectures don't work nearly as well as many of us would like to think. -George Cobb (1992) This book contains activities that guide students to discover statistical concepts, explore statistical principles, and apply statistical techniques. Students work toward these goals through the analysis of genuine data and through interaction with one another, with their instructor, and with technology. Providing a one-semester introduction to fundamental ideas of statistics for college and advanced high school students, Workshop Statistics is designed for courses that employ an interactive learning environment by replacing lectures with hands-on activities. The text contains enough expository material to stand alone, but it can also be used to supplement a more traditional textbook. Some distinguishing features of Workshop Statistics are its emphases on active learning, conceptu...
Statistical ecology comes of age
Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric
2014-01-01
The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151
Statistical ecology comes of age.
Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric
2014-12-01
The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.
Ogunnaike, Babatunde A; Gelmi, Claudio A; Edwards, Jeremy S
2010-05-21
Gene expression studies generate large quantities of data with the defining characteristic that the number of genes (whose expression profiles are to be determined) exceeds the number of available replicates by several orders of magnitude. Standard spot-by-spot analysis still seeks to extract useful information for each gene on the basis of the number of available replicates, and thus plays to the weakness of microarrays. On the other hand, because of the data volume, treating the entire data set as an ensemble and developing theoretical distributions for these ensembles provides a framework that plays instead to the strength of microarrays. We present theoretical results showing that, under reasonable assumptions, the distribution of microarray intensities follows the Gamma model, with the biological interpretations of the model parameters emerging naturally. We subsequently establish that for each microarray data set, the fractional intensities can be represented as a mixture of Beta densities, and develop a procedure for using these results to draw statistical inference regarding differential gene expression. We illustrate the results with experimental data from gene expression studies on Deinococcus radiodurans following DNA damage using cDNA microarrays. Copyright (c) 2010 Elsevier Ltd. All rights reserved.
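The step from Gamma-distributed intensities to Beta-distributed fractional intensities rests on a standard result: if X and Y are independent Gamma variables with a common scale, then X/(X+Y) follows a Beta distribution whose parameters are the two Gamma shapes. The simulation below checks the mean and variance for hypothetical shape and scale values; it is a sketch of that basic relation only and does not reproduce the authors' mixture model or inference procedure.

```python
import random

def fractional_intensities(shape_x, shape_y, scale, n, seed=0):
    """Simulate x / (x + y) for independent Gamma intensities with a shared scale."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.gammavariate(shape_x, scale)
        y = rng.gammavariate(shape_y, scale)
        out.append(x / (x + y))
    return out

def beta_mean_var(a, b):
    """Theoretical mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

a, b = 2.0, 5.0  # hypothetical Gamma shape parameters for the two channels
f = fractional_intensities(a, b, scale=300.0, n=50000)
sample_mean = sum(f) / len(f)
sample_var = sum((v - sample_mean) ** 2 for v in f) / (len(f) - 1)
print(sample_mean, sample_var, beta_mean_var(a, b))
```

The common scale cancels in the ratio, so only the shape parameters determine the distribution of the fractional intensity.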
Sensitivity of goodness-of-fit statistics to rainfall data rounding off
Deidda, Roberto; Puliga, Michelangelo
An analysis based on L-moments theory suggests adopting the generalized Pareto distribution to interpret daily rainfall depths recorded by the rain-gauge network of the Hydrological Survey of the Sardinia Region. Nevertheless, a major problem, not yet completely resolved, arises in estimating a left-censoring threshold able to ensure a good fit of the rainfall data to the generalized Pareto distribution. In order to detect an optimal threshold while keeping the largest possible number of data, we chose to apply a “failure-to-reject” method based on goodness-of-fit tests, as proposed by Choulakian and Stephens [Choulakian, V., Stephens, M.A., 2001. Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 43, 478-484]. Unfortunately, the application of the test, using percentage points provided by Choulakian and Stephens (2001), did not succeed in detecting a useful threshold value in most of the analyzed time series. A deeper analysis revealed that these failures are mainly due to the presence of large quantities of rounded-off values among the sample data, which affect the distribution of the goodness-of-fit statistics and lead to significant departures from the percentage points expected for continuous random variables. A procedure based on Monte Carlo simulations is thus proposed to overcome these problems.
International Conference on Robust Statistics 2015
Basu, Ayanendranath; Filzmoser, Peter; Mukherjee, Diganta
2016-01-01
This book offers a collection of recent contributions and emerging ideas in the areas of robust statistics presented at the International Conference on Robust Statistics 2015 (ICORS 2015) held in Kolkata during 12–16 January, 2015. The book explores the applicability of robust methods in other non-traditional areas, which include the use of new techniques such as skew and mixture of skew distributions, scaled Bregman divergences, and multilevel functional data methods; application areas include circular data models and prediction of mortality and life expectancy. The contributions are both theoretical and applied in nature. Robust statistics is a relatively young branch of statistical sciences that is rapidly emerging as the bedrock of statistical analysis in the 21st century due to its flexible nature and wide scope. Robust statistics supports the application of parametric and other inference techniques over a broader domain than the strictly interpreted model scenarios employed in classical statis...
Development of statistical analysis code for meteorological data (W-View)
International Nuclear Information System (INIS)
Tachibana, Haruo; Sekita, Tsutomu; Yamaguchi, Takenori
2003-03-01
A computer code (W-View: Weather View) was developed to analyze meteorological data statistically based on 'the guideline of meteorological statistics for the safety analysis of nuclear power reactor' (Nuclear Safety Commission on January 28, 1982; revised on March 29, 2001). The code provides the statistical meteorological data needed to assess the public dose under normal operation and severe accident conditions for the licensing of nuclear reactor operation. The code was revised from an original code that ran on a large office computer, so that a personal computer user can analyze the meteorological data simply and conveniently and produce statistical tables and figures of meteorology. (author)
A Novel Approach to Asynchronous MVP Data Interpretation Based on Elliptical-Vectors
Kruglyakov, M.; Trofimov, I.; Korotaev, S.; Shneyer, V.; Popova, I.; Orekhova, D.; Scshors, Y.; Zhdanov, M. S.
2014-12-01
We suggest a novel approach to asynchronous magnetic-variation profiling (MVP) data interpretation. The standard method in MVP is based on the interpretation of the coefficients of the linear relation between the vertical and horizontal components of the measured magnetic field. From a mathematical point of view this pair of linear coefficients is not a vector, which leads to significant difficulties in asynchronous data interpretation. Our approach allows us to actually treat such a pair of complex numbers as a special vector called an ellipse-vector (EV). By choosing particular definitions of complex length and direction, the basic relation of MVP can be considered as a dot product. This considerably simplifies the interpretation of asynchronous data. The EV is described by four real numbers: the values of the major and minor semiaxes, the angular direction of the major semiaxis, and the phase. The notation choice is motivated by historical reasons. It is important that the different EV components have different sensitivity with respect to the field sources and the local heterogeneities. Namely, the value of the major semiaxis and the angular direction are mostly determined by the field source and the normal cross-section. On the other hand, the value of the minor semiaxis and the phase are responsive to local heterogeneities. Since the EV is the general form of a complex vector, the traditional Schmucker vectors can be explicitly expressed through its components. The proposed approach was successfully applied to the interpretation of the results of asynchronous measurements that had been obtained in the Arctic Ocean at the drift stations "North Pole" in 1962-1976.
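One way to picture how a pair of complex numbers maps to four real ellipse parameters is the standard polarization-ellipse decomposition of a complex 2-vector. The sketch below uses that textbook decomposition as a stand-in; the abstract does not give the authors' exact definitions of complex length, direction and phase, so the function name and conventions here are assumptions.

```python
import cmath
import math

def ellipse_parameters(a, b):
    """Return (major, minor, direction_rad, phase_rad) for the complex pair (a, b)."""
    s = a * a + b * b                  # complex dot product of the pair with itself
    p = abs(a) ** 2 + abs(b) ** 2      # ordinary squared norm
    major = math.sqrt((p + abs(s)) / 2.0)
    minor = math.sqrt(max(p - abs(s), 0.0) / 2.0)
    phase = 0.5 * cmath.phase(s) if abs(s) > 0 else 0.0
    rot = cmath.exp(-1j * phase)       # remove the common phase
    px, py = (a * rot).real, (b * rot).real
    direction = math.atan2(py, px) % math.pi  # axis direction, defined modulo pi
    return major, minor, direction, phase

print(ellipse_parameters(3, 4j))
print(ellipse_parameters(1, 1))
```

For a circular case such as (1, 1j) the two semiaxes are equal and the direction is not uniquely defined; the function then returns an arbitrary angle.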
Systems Analysis for Interpretation of Phosphoproteomics Data
DEFF Research Database (Denmark)
Munk, Stephanie; Refsgaard, Jan C; Olsen, Jesper V
2016-01-01
Global phosphoproteomics investigations yield overwhelming datasets with up to tens of thousands of quantified phosphosites. The main challenge after acquiring such large-scale data is to extract the biological meaning and relate this to the experimental question at hand. Systems level analysis...... provides the best means for extracting functional insights from such types of datasets, and this has primed a rapid development of bioinformatics tools and resources over the last decade. Many of these tools are specialized databases that can be mined for annotation and pathway enrichment, whereas others...... provide a platform to generate functional protein networks and explore the relations between proteins of interest. The use of these tools requires careful consideration with regard to the input data, and the interpretation demands a critical approach. This chapter provides a summary of the most...
Development of statistical analysis code for meteorological data (W-View)
Energy Technology Data Exchange (ETDEWEB)
Tachibana, Haruo; Sekita, Tsutomu; Yamaguchi, Takenori [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment
2003-03-01
A computer code (W-View: Weather View) was developed to analyze meteorological data statistically, based on 'the guideline of meteorological statistics for the safety analysis of nuclear power reactor' (Nuclear Safety Commission on January 28, 1982; revised on March 29, 2001). The code produces the statistical meteorological data needed to assess the public dose under normal operation and severe accident conditions when applying for a nuclear reactor operating license. It was revised from an original code that ran on a large office computer so that personal computer users can analyze meteorological data simply and conveniently and produce statistical tables and figures of meteorology. (author)
Statistics for scientists and engineers
Shanmugam, Ramalingam
2015-01-01
This book provides the theoretical framework needed to build, analyze and interpret various statistical models. It helps readers choose the correct model, distinguish among various choices that best captures the data, or solve the problem at hand. This is an introductory textbook on probability and statistics. The authors explain theoretical concepts in a step-by-step manner and provide practical examples. The introductory chapter in this book presents the basic concepts. Next, the authors discuss the measures of location, popular measures of spread, and measures of skewness and kurtosis. Prob
The value of statistical tools to detect data fabrication
Hartgerink, C.H.J.; Wicherts, J.M.; van Assen, M.A.L.M.
2016-01-01
We aim to investigate how statistical tools can help detect potential data fabrication in the social- and medical sciences. In this proposal we outline three projects to assess the value of such statistical tools to detect potential data fabrication and make the first steps in order to apply them
Topology for Statistical Modeling of Petascale Data
Energy Technology Data Exchange (ETDEWEB)
Bennett, Janine Camille [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pebay, Philippe Pierre [Sandia National Lab. (SNL-CA), Livermore, CA (United States); Pascucci, Valerio [Univ. of Utah, Salt Lake City, UT (United States); Levine, Joshua [Univ. of Utah, Salt Lake City, UT (United States); Gyulassy, Attila [Univ. of Utah, Salt Lake City, UT (United States); Rojas, Maurice [Texas A & M Univ., College Station, TX (United States)
2014-07-01
This document presents current technical progress and dissemination of results for the Mathematics for Analysis of Petascale Data (MAPD) project titled "Topology for Statistical Modeling of Petascale Data", funded by the Office of Science Advanced Scientific Computing Research (ASCR) Applied Math program.
PANDA: pathway and annotation explorer for visualizing and interpreting gene-centric data.
Hart, Steven N; Moore, Raymond M; Zimmermann, Michael T; Oliver, Gavin R; Egan, Jan B; Bryce, Alan H; Kocher, Jean-Pierre A
2015-01-01
Objective. Bringing together genomics, transcriptomics, proteomics, and other -omics technologies is an important step towards developing highly personalized medicine. However, instrumentation has advanced far beyond expectations and we are now able to generate data faster than it can be interpreted. Materials and Methods. We have developed PANDA (Pathway AND Annotation) Explorer, a visualization tool that integrates gene-level annotation in the context of biological pathways to help interpret complex data from disparate sources. PANDA is a web-based application that displays data in the context of well-studied pathways like KEGG, BioCarta, and PharmGKB. PANDA represents data/annotations as icons in the graph while maintaining the other data elements (i.e., other columns for the table of annotations). Custom pathways from underrepresented diseases can be imported when existing data sources are inadequate. PANDA also allows sharing annotations among collaborators. Results. In our first use case, we show how easy it is to view supplemental data from a manuscript in the context of a user's own data. A second use case describes how PANDA was leveraged to design a treatment strategy from the somatic variants found in the tumor of a patient with metastatic sarcomatoid renal cell carcinoma. Conclusion. PANDA facilitates the interpretation of gene-centric annotations by visually integrating this information with the context of biological pathways. The application can be downloaded or used directly from our website: http://bioinformaticstools.mayo.edu/research/panda-viewer/.
STATCAT, Statistical Analysis of Parametric and Non-Parametric Data
International Nuclear Information System (INIS)
David, Hugh
1990-01-01
1 - Description of program or function: A suite of 26 programs designed to facilitate the appropriate statistical analysis and data handling of parametric and non-parametric data, using classical and modern univariate and multivariate methods. 2 - Method of solution: Data is read entry by entry, using a choice of input formats, and the resultant data bank is checked for out-of-range, rare, extreme or missing data. The completed STATCAT data bank can be treated by a variety of descriptive and inferential statistical methods, and modified, using other standard programs as required
Information systems for marine protected areas: How do users interpret desirable data attributes?
Carballo Cárdenas, E.C.; Mol, A.P.J.; Tobi, H.
2013-01-01
The purpose of this paper is to provide empirical evidence on how various user groups related to Marine Protected Areas (MPAs) interpret desirable data attributes, whether their interpretations differ and to what extent. Moreover, this study aims to make a methodological contribution to the
Using Data from Climate Science to Teach Introductory Statistics
Witt, Gary
2013-01-01
This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…
The interpretation of Charpy impact test data using hyper-logistic fitting functions
International Nuclear Information System (INIS)
Helm, J.L.
1996-01-01
The hyperbolic tangent function is used almost exclusively for computer assisted curve fitting of Charpy impact test data. Unfortunately, there is no physical basis to justify the use of this function and it cannot be generalized to test data that exhibits asymmetry. Using simple physical arguments, a semi-empirical model is derived and identified as a special case of the so called hyper-logistic equation. Although one solution of this equation is the hyperbolic tangent, other more physically interpretable solutions are provided. From the mathematics of the family of functions derived from the hyper-logistic equation, several useful generalizations are made such that asymmetric and wavy Charpy data can be physically interpreted
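The limitation described above can be made concrete with a minimal sketch of the conventional tanh transition curve. The parameter names and values below are hypothetical, chosen only for illustration, and the hyper-logistic generalization derived in the paper is not reproduced here. The sketch shows that the tanh form is point-symmetric about its mid-transition temperature, which is why it cannot follow asymmetric Charpy data.

```python
import math

def charpy_tanh(temp_c, lower_shelf, upper_shelf, t_mid, half_width):
    """Conventional symmetric tanh transition curve (absorbed energy vs. temperature)."""
    mean = (upper_shelf + lower_shelf) / 2.0
    half_range = (upper_shelf - lower_shelf) / 2.0
    return mean + half_range * math.tanh((temp_c - t_mid) / half_width)

# Hypothetical parameters: shelves in joules, temperatures in degrees Celsius.
params = dict(lower_shelf=5.0, upper_shelf=105.0, t_mid=20.0, half_width=30.0)

mid = charpy_tanh(20.0, **params)     # energy at the mid-transition temperature
below = charpy_tanh(-10.0, **params)  # 30 degrees below the midpoint
above = charpy_tanh(50.0, **params)   # 30 degrees above the midpoint

# Point symmetry: the two offsets from the midpoint energy cancel exactly.
symmetry_error = (below - mid) + (above - mid)
print(mid, below, above, symmetry_error)
```

Any data set whose lower and upper transition regions differ in shape will leave systematic residuals under this model, whatever parameter values are chosen.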
Symbolic Data Analysis Conceptual Statistics and Data Mining
Billard, Lynne
2012-01-01
With the advent of computers, very large datasets have become routine. Standard statistical methods don't have the power or flexibility to analyse these efficiently, and extract the required knowledge. An alternative approach is to summarize a large dataset in such a way that the resulting summary dataset is of a manageable size and yet retains as much of the knowledge in the original dataset as possible. One consequence of this is that the data may no longer be formatted as single values, but be represented by lists, intervals, distributions, etc. The summarized data have their own internal s
Feature-Based Statistical Analysis of Combustion Simulation Data
Energy Technology Data Exchange (ETDEWEB)
Bennett, J; Krishnamoorthy, V; Liu, S; Grout, R; Hawkes, E; Chen, J; Pascucci, V; Bremer, P T
2011-11-18
We present a new framework for feature-based statistical analysis of large-scale scientific data and demonstrate its effectiveness by analyzing features from Direct Numerical Simulations (DNS) of turbulent combustion. Turbulent flows are ubiquitous and account for transport and mixing processes in combustion, astrophysics, fusion, and climate modeling among other disciplines. They are also characterized by coherent structure or organized motion, i.e. nonlocal entities whose geometrical features can directly impact molecular mixing and reactive processes. While traditional multi-point statistics provide correlative information, they lack nonlocal structural information, and hence, fail to provide mechanistic causality information between organized fluid motion and mixing and reactive processes. Hence, it is of great interest to capture and track flow features and their statistics together with their correlation with relevant scalar quantities, e.g. temperature or species concentrations. In our approach we encode the set of all possible flow features by pre-computing merge trees augmented with attributes, such as statistical moments of various scalar fields, e.g. temperature, as well as length-scales computed via spectral analysis. The computation is performed in an efficient streaming manner in a pre-processing step and results in a collection of meta-data that is orders of magnitude smaller than the original simulation data. This meta-data is sufficient to support a fully flexible and interactive analysis of the features, allowing for arbitrary thresholds, providing per-feature statistics, and creating various global diagnostics such as Cumulative Density Functions (CDFs), histograms, or time-series. We combine the analysis with a rendering of the features in a linked-view browser that enables scientists to interactively explore, visualize, and analyze the equivalent of one terabyte of simulation data. We highlight the utility of this new framework for combustion
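As an illustration of how per-feature statistical moments can be accumulated in a streaming fashion and later combined, the following sketch uses Welford's online update together with the standard pairwise merge rule. This is a swapped-in, generic technique and an assumption on our part; the abstract does not say which algorithm the authors use, and the class name `RunningMoments` is hypothetical.

```python
class RunningMoments:
    """Streaming count, mean and population variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other):
        """Combine with statistics accumulated elsewhere (e.g. a merged feature)."""
        if other.n == 0:
            return
        total = self.n + other.n
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / total
        self.mean += delta * other.n / total
        self.n = total

    @property
    def variance(self):
        return self.m2 / self.n if self.n else float("nan")


# Two hypothetical sub-features whose temperature samples arrive separately.
a, b = RunningMoments(), RunningMoments()
for value in [1.0, 2.0, 3.0, 4.0]:
    a.update(value)
for value in [5.0, 6.0, 7.0, 8.0]:
    b.update(value)
a.merge(b)
print(a.n, a.mean, a.variance)
```

Because only three numbers are stored per feature, statistics of this kind stay small relative to the raw samples, and merging makes it possible to re-aggregate them when a threshold change joins two features.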
Application of multivariate statistical techniques in microbial ecology.
Paliy, O; Shankar, V
2016-03-01
Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, notable progress has been made in the field of microbial ecology, where new experimental approaches have provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces a large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
Interpretation of bioassay data from nuclear fuel fabrication workers
International Nuclear Information System (INIS)
Melo, D.; Xavier, M.
2005-01-01
In nuclear fuel fabrication facilities, workers are exposed to different compounds of enriched uranium. Although the main route of intake in this kind of facility is inhalation, ingestion may occur in some situations. The interpretation of the bioassay data is very complex, since all the different parameters must be taken into account, which is a big challenge. Because of the high cost of individual monitoring for internal dose assessment, routine monitoring programmes usually assign only one type of measurement. In complex situations like the one described in this paper, where several parameters can compromise the accuracy of the bioassay interpretation, a combination of techniques is needed to evaluate the internal dose. According to ICRP 78 (1997), the general order of preference in terms of accuracy of interpretation is: body activity measurement, excreta analysis and personal air sampling. Results of monitoring of the working environment may provide information on particle size, chemical form and solubility, and time of intake that assists in interpretation. A group of seventeen workers from the controlled area of the fuel fabrication facility was selected to evaluate the internal dose using all the available techniques during a certain period. The workers were monitored for uranium content in the daily urinary and faecal excretion (collected over a period of 3 consecutive days), and by chest counting and personal air sampling. The results have shown that at least two types of sensitive techniques must be used, since there are several sources of uncertainty in the bioassay interpretation, such as intake of a mixture of uranium compounds and different routes of intake. The combination of urine and faeces analysis has been shown to be the most appropriate methodology for assessing internal dose in this situation. (author)
Vapor Pressure Data Analysis and Statistics
2016-12-01
near 8, 2000, and 200, respectively. The A (or a) value is directly related to vapor pressure and will be greater for high vapor pressure materials...1, (10) where n is the number of data points, Yi is the natural logarithm of the i th experimental vapor pressure value, and Xi is the...VAPOR PRESSURE DATA ANALYSIS AND STATISTICS ECBC-TR-1422 Ann Brozena RESEARCH AND TECHNOLOGY DIRECTORATE
Data management and statistical analysis for environmental assessment
International Nuclear Information System (INIS)
Wendelberger, J.R.; McVittie, T.I.
1995-01-01
Data management and statistical analysis for environmental assessment are important issues on the interface of computer science and statistics. Data collection for environmental decision making can generate large quantities of various types of data. A database/GIS system developed is described which provides efficient data storage as well as visualization tools which may be integrated into the data analysis process. FIMAD is a living database and GIS system. The system has changed and developed over time to meet the needs of the Los Alamos National Laboratory Restoration Program. The system provides a repository for data which may be accessed by different individuals for different purposes. The database structure is driven by the large amount and varied types of data required for environmental assessment. The integration of the database with the GIS system provides the foundation for powerful visualization and analysis capabilities
77 FR 65177 - Swap Data Repositories: Interpretative Statement Regarding the Confidentiality and...
2012-10-25
... COMMODITY FUTURES TRADING COMMISSION Swap Data Repositories: Interpretative Statement Regarding...\\ which requires all swaps-- whether cleared or uncleared--to be reported to swap data repositories... of the CEA to add a definition of the term ``swap data repository.'' Pursuant to CEA section 1a(48...
The Statistical Interpretation of Entropy: An Activity
Timberlake, Todd
2010-01-01
The second law of thermodynamics, which states that the entropy of an isolated macroscopic system can increase but will not decrease, is a cornerstone of modern physics. Ludwig Boltzmann argued that the second law arises from the motion of the atoms that compose the system. Boltzmann's statistical mechanics provides deep insight into the…
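Boltzmann's link between entropy and the number of microstates, S = k ln W, can be illustrated with a toy model. The sketch below is an assumption for illustration and not necessarily the activity described in the article: N distinguishable particles are distributed between the two halves of a box, W is the number of ways to place a given number in the left half, and the entropy is reported in units of k.

```python
import math

def multiplicity(n_total, n_left):
    """Number of microstates with n_left of n_total particles in the left half."""
    return math.comb(n_total, n_left)

def entropy_over_k(n_total, n_left):
    """Dimensionless Boltzmann entropy S/k = ln W for that macrostate."""
    return math.log(multiplicity(n_total, n_left))

N = 100  # hypothetical particle count
entropies = {n_left: entropy_over_k(N, n_left) for n_left in range(N + 1)}
most_probable = max(entropies, key=entropies.get)

print(most_probable)            # macrostate with the largest entropy
print(entropies[0], entropies[most_probable])
```

The evenly split macrostate has by far the most microstates, so a system started with every particle on one side (W = 1, S = 0) overwhelmingly tends to evolve toward it, which is the statistical content of the second law.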
Szabolcsi, Zoltán; Farkas, Zsuzsa; Borbély, Andrea; Bárány, Gusztáv; Varga, Dániel; Heinrich, Attila; Völgyi, Antónia; Pamjav, Horolma
2015-11-01
When the DNA profile from a crime scene matches that of a suspect, the weight of DNA evidence depends on the unbiased estimation of the match probability of the profiles. For this reason, it is required to establish and expand the databases that reflect the actual allele frequencies in the population applied. 21,473 complete DNA profiles from Databank samples were used to establish the allele frequency database to represent the population of Hungarian suspects. We used fifteen STR loci (PowerPlex ESI16) including five new ESS loci. The aim was to calculate the statistical, forensic efficiency parameters for the Databank samples and compare the newly detected data to the earlier report. The population substructure caused by relatedness may influence the estimated frequency of profiles. As our Databank profiles were considered non-random samples, possible relationships between the suspects can be assumed. Therefore, the population inbreeding effect was estimated using the FIS calculation. The overall inbreeding parameter was found to be 0.0106. Furthermore, we tested the impact of the two allele frequency datasets on 101 randomly chosen STR profiles, including full and partial profiles. The 95% confidence interval estimates for the profile frequencies (pM) resulted in a tighter range when we used the new dataset compared to the previously published ones. We found that the FIS had less effect on frequency values in the 21,473 samples than the application of minimum allele frequency. No genetic substructure was detected by STRUCTURE analysis. Due to the low level of inbreeding effect and the high number of samples, the new dataset provides unbiased and precise estimates of LR for statistical interpretation of forensic casework and allows us to use lower allele frequencies. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
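To show how an inbreeding coefficient of this size enters a profile frequency, the sketch below applies the textbook inbreeding-adjusted Hardy-Weinberg genotype frequencies and multiplies them across loci. It is an illustrative assumption, not the authors' procedure: the allele frequencies are hypothetical, the loci are treated as independent, and casework formulas with additional corrections are not reproduced. Only the value 0.0106 is taken from the abstract.

```python
def genotype_frequency(p, q=None, f=0.0):
    """Expected genotype frequency with inbreeding coefficient f.

    Homozygote (q omitted): p^2 + f * p * (1 - p)
    Heterozygote:           2 * p * q * (1 - f)
    """
    if q is None:
        return p * p + f * p * (1.0 - p)
    return 2.0 * p * q * (1.0 - f)

def profile_frequency(genotypes, f=0.0):
    """Product of per-locus genotype frequencies (loci assumed independent)."""
    freq = 1.0
    for alleles in genotypes:
        freq *= genotype_frequency(*alleles, f=f)
    return freq

F_IS = 0.0106  # overall inbreeding parameter reported in the abstract

# Hypothetical three-locus profile: one homozygote, two heterozygotes.
profile = [(0.10,), (0.10, 0.20), (0.05, 0.30)]

without_f = profile_frequency(profile, f=0.0)
with_f = profile_frequency(profile, f=F_IS)
print(without_f, with_f, with_f / without_f)
```

With a coefficient this small the adjusted and unadjusted frequencies differ only modestly, which is in line with the abstract's remark that FIS had less effect than the minimum allele frequency.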
Inversion interpretation of the mise-a-la-masse data; Denryu den`i ho data no inversion kaiseki
Energy Technology Data Exchange (ETDEWEB)
Okuno, M; Hatanaka, H; Mizunaga, H; Ushijima, K [Kyushu University, Fukuoka (Japan). Faculty of Engineering
1996-05-01
A program was developed for the inversion interpretation of the mise-a-la-masse data, and was applied to a numerical model experiment and to the study of data obtained by actual probing. For the development of this program, a program was used that calculated by finite difference approximation the potential produced by a linear current source, and studies were made through forward interpretation, inversion interpretation of the acquired apparent resistivity data, comparison with the true solution, accuracy and tendency, and the limitations. In the simulation of a horizontal 2-layer model, the parametric value after 20 repetitions converged with deviation of 1% or lower. This program was applied to the data from probing the Hatchobara district, Oita Prefecture, using a model wherein the target area was divided into 5 from east to west, and into 2 in the direction of depth. The result suggested that there was a large-scale low-resistivity body deep in the ground in the southeastern part of the investigated area. Furthermore, there was a spot detected in the direction of east-northeast that suggested an electric structure continuous in the direction of depth and a fault-like structure discontinuous in the transverse direction. 7 refs., 9 figs.
Numeric computation and statistical data analysis on the Java platform
Chekanov, Sergei V
2016-01-01
Numerical computation, knowledge discovery and statistical data analysis integrated with powerful 2D and 3D graphics for visualization are the key topics of this book. The Python code examples powered by the Java platform can easily be transformed to other programming languages, such as Java, Groovy, Ruby and BeanShell. This book equips the reader with a computational platform which, unlike other statistical programs, is not limited by a single programming language. The author focuses on practical programming aspects and covers a broad range of topics, from basic introduction to the Python language on the Java platform (Jython), to descriptive statistics, symbolic calculations, neural networks, non-linear regression analysis and many other data-mining topics. He discusses how to find regularities in real-world data, how to classify data, and how to process data for knowledge discoveries. The code snippets are so short that they easily fit into single pages. Numeric Computation and Statistical Data Analysis ...
Experimental uncertainty estimation and statistics for data having interval uncertainty.
Energy Technology Data Exchange (ETDEWEB)
Kreinovich, Vladik (Applied Biomathematics, Setauket, New York); Oberkampf, William Louis (Applied Biomathematics, Setauket, New York); Ginzburg, Lev (Applied Biomathematics, Setauket, New York); Ferson, Scott (Applied Biomathematics, Setauket, New York); Hajagos, Janos (Applied Biomathematics, Setauket, New York)
2007-05-01
This report addresses the characterization of measurements that include epistemic uncertainties in the form of intervals. It reviews the application of basic descriptive statistics to data sets which contain intervals rather than exclusively point estimates. It describes algorithms to compute various means, the median and other percentiles, variance, interquartile range, moments, confidence limits, and other important statistics and summarizes the computability of these statistics as a function of sample size and characteristics of the intervals in the data (degree of overlap, size and regularity of widths, etc.). It also reviews the prospects for analyzing such data sets with the methods of inferential statistics such as outlier detection and regressions. The report explores the tradeoff between measurement precision and sample size in statistical results that are sensitive to both. It also argues that an approach based on interval statistics could be a reasonable alternative to current standard methods for evaluating, expressing and propagating measurement uncertainties.
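Two of the simplest descriptive statistics for interval-valued data can be sketched as follows. Because the mean and the median are non-decreasing in every observation, their bounds are obtained by evaluating them on all lower endpoints and on all upper endpoints. The function names and data are hypothetical, and statistics such as the variance, whose bounds need more involved algorithms, are deliberately left out.

```python
from statistics import median

def interval_mean(intervals):
    """Bounds on the sample mean when each datum is only known as [lo, hi]."""
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n,
            sum(hi for _, hi in intervals) / n)

def interval_median(intervals):
    """Bounds on the sample median, using the same endpoint argument."""
    return (median(lo for lo, _ in intervals),
            median(hi for _, hi in intervals))

# Hypothetical measurements, each reported as an interval.
data = [(1.0, 2.0), (2.5, 3.0), (0.5, 4.0)]

mean_bounds = interval_mean(data)
median_bounds = interval_median(data)
print(mean_bounds, median_bounds)
```

Wider input intervals translate directly into wider output intervals, which is one way to see the tradeoff between measurement precision and sample size that the report discusses.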
Webb, Samuel J; Hanser, Thierry; Howlin, Brendan; Krause, Paul; Vessey, Jonathan D
2014-03-25
A new algorithm has been developed to enable the interpretation of black box models. The developed algorithm is agnostic to learning algorithm and open to all structural based descriptors such as fragments, keys and hashed fingerprints. The algorithm has provided meaningful interpretation of Ames mutagenicity predictions from both random forest and support vector machine models built on a variety of structural fingerprints. A fragmentation algorithm is utilised to investigate the model's behaviour on specific substructures present in the query. An output is formulated summarising causes of activation and deactivation. The algorithm is able to identify multiple causes of activation or deactivation in addition to identifying localised deactivations where the prediction for the query is active overall. No loss in performance is seen as there is no change in the prediction; the interpretation is produced directly on the model's behaviour for the specific query. Models have been built using multiple learning algorithms including support vector machine and random forest. The models were built on public Ames mutagenicity data and a variety of fingerprint descriptors were used. These models produced a good performance in both internal and external validation with accuracies around 82%. The models were used to evaluate the interpretation algorithm. The interpretations revealed link closely with understood mechanisms for Ames mutagenicity. This methodology allows for a greater utilisation of the predictions made by black box models and can expedite further study based on the output for a (quantitative) structure activity model. Additionally the algorithm could be utilised for chemical dataset investigation and knowledge extraction/human SAR development.
Statistical and Visualization Data Mining Tools for Foundry Production
Directory of Open Access Journals (Sweden)
M. Perzyk
2007-07-01
In recent years a rapid development of a new, interdisciplinary knowledge area, called data mining, has been observed. Its main task is extracting useful information from large amounts of previously collected data. The main possibilities and potential applications of data mining in manufacturing industry are characterized. The main types of data mining techniques are briefly discussed, including statistical, artificial intelligence, data base and visualization tools. The statistical methods and visualization methods are presented in more detail, showing their general possibilities, advantages as well as characteristic examples of applications in foundry production. Results of the author’s research are presented, aimed at validation of selected statistical tools which can be easily and effectively used in manufacturing industry. A performance analysis of ANOVA and contingency tables based methods, dedicated for determination of the most significant process parameters as well as for detection of possible interactions among them, has been made. Several numerical tests have been performed using simulated data sets with assumed hidden relationships, as well as some real data, related to the strength of ductile cast iron, collected in a foundry. It is concluded that the statistical methods offer relatively easy and fairly reliable tools for extraction of that type of knowledge about foundry manufacturing processes. However, further research is needed, aimed at explanation of some imperfections of the investigated tools as well as assessment of their validity for more complex tasks.
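The contingency-table approach mentioned above can be sketched with Pearson's chi-square statistic, which measures how far the observed counts depart from those expected if the process parameter and the outcome were independent. The table below is hypothetical (two settings of a process parameter against acceptable and defective castings) and does not come from the paper.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Hypothetical counts: rows = parameter setting, columns = (acceptable, defective).
observed = [[30, 10],
            [20, 40]]

stat, dof = chi_square_statistic(observed)
print(stat, dof)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom suggests that the parameter setting and the defect rate are related; ranking parameters by this statistic is one simple way to flag the most significant ones.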
Register-based statistics statistical methods for administrative data
Wallgren, Anders
2014-01-01
This book provides a comprehensive and up to date treatment of theory and practical implementation in Register-based statistics. It begins by defining the area, before explaining how to structure such systems, as well as detailing alternative approaches. It explains how to create statistical registers, how to implement quality assurance, and the use of IT systems for register-based statistics. Further to this, clear details are given about the practicalities of implementing such statistical methods, such as protection of privacy and the coordination and coherence of such an undertaking. Thi
Statistical significance versus clinical relevance.
van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G
2017-04-01
In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the times upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
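The interpretation given above, that a P-value describes how often results at least this extreme would occur if the null hypothesis were true, can be checked with a small simulation. The sketch below is an illustration under stated assumptions: every simulated study draws from a population in which the null hypothesis holds (mean 0, known standard deviation 1), and a two-sided z-test is used for simplicity. The function names, sample size and seed are hypothetical.

```python
import math
import random

def two_sided_p_from_z(z):
    """Two-sided P-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def simulate_null_rejections(n_studies=2000, n=30, alpha=0.05, seed=1):
    """Fraction of studies with P < alpha when the null hypothesis is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # known sigma = 1
        if two_sided_p_from_z(z) < alpha:
            rejections += 1
    return rejections / n_studies

rate = simulate_null_rejections()
print(rate)  # should be close to alpha
```

Roughly one study in twenty crosses the 0.05 threshold even though no effect exists in any of them, so a single P < 0.05 cannot by itself show that the null hypothesis is false.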
Using Facebook Data to Turn Introductory Statistics Students into Consultants
Childers, Adam F.
2017-01-01
Facebook provides businesses and organizations with copious data that describe how users are interacting with their page. This data affords an excellent opportunity to turn introductory statistics students into consultants to analyze the Facebook data using descriptive and inferential statistics. This paper details a semester-long project that…
Building software tools to help contextualize and interpret monitoring data
Even modest monitoring efforts at landscape scales produce large volumes of data. These are most useful if they can be interpreted relative to land potential or other similar sites. However, for many ecological systems reference conditions may not be defined or are poorly described, which hinders und...
Simple statistical methods for software engineering data and patterns
Pandian, C Ravindranath
2015-01-01
Although there are countless books on statistics, few are dedicated to the application of statistical methods to software engineering. Simple Statistical Methods for Software Engineering: Data and Patterns fills that void. Instead of delving into overly complex statistics, the book details simpler solutions that are just as effective and connect with the intuition of problem solvers. Sharing valuable insights into software engineering problems and solutions, the book not only explains the required statistical methods, but also provides many examples, review questions, and case studies that prov
Statistical Data Processing with R – Metadata Driven Approach
Directory of Open Access Journals (Sweden)
Rudi SELJAK
2016-06-01
In recent years the Statistical Office of the Republic of Slovenia has put a lot of effort into re-designing its statistical process. We replaced the classical stove-pipe oriented production system with general software solutions, based on the metadata driven approach. This means that one general program code, which is parametrized with process metadata, is used for data processing for a particular survey. Currently, the general program code is entirely based on SAS macros, but in the future we would like to explore how successfully the statistical software R can be used for this approach. The paper describes the metadata driven principle for data validation, the generic software solution, and the main issues connected with the use of the statistical software R for this approach.
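The metadata driven principle for data validation can be sketched as one generic routine whose behaviour is determined entirely by a table of rules. The example is written in Python purely for illustration (the office's system is described as SAS-based, with R under consideration), and the rule vocabulary, variable names and region codes are all hypothetical.

```python
# Process metadata: one entry per validation rule. Supporting a new survey
# means supplying a different rule table, not writing new code.
RULES = [
    {"variable": "turnover", "check": "min", "value": 0},
    {"variable": "employees", "check": "range", "value": (0, 100000)},
    {"variable": "region", "check": "in_set", "value": {"SI01", "SI02"}},
]

# The generic program code: a fixed set of check types.
CHECKS = {
    "min": lambda x, v: x >= v,
    "range": lambda x, v: v[0] <= x <= v[1],
    "in_set": lambda x, v: x in v,
}

def validate(record, rules):
    """Return the (variable, check) pairs that a record fails."""
    failures = []
    for rule in rules:
        value = record.get(rule["variable"])
        if value is None or not CHECKS[rule["check"]](value, rule["value"]):
            failures.append((rule["variable"], rule["check"]))
    return failures

good = {"turnover": 1200.0, "employees": 12, "region": "SI01"}
bad = {"turnover": -5.0, "employees": 12, "region": "SI01"}

print(validate(good, RULES))
print(validate(bad, RULES))
```

Keeping the rules as data rather than code is what allows a single program to serve many surveys, at the cost of restricting validation to the check types the generic code understands.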
Computer-aided structure elucidation Pt. 2. ¹H-NMR data interpretation
Energy Technology Data Exchange (ETDEWEB)
Szalontai, G; Recsey, Zs; Csapo, Z [Nehezvegyipari Kutato Intezet, Veszprem (Hungary)
1982-01-01
A computerized ¹H-NMR data interpretation system has been developed using the artificial intelligence approach. An attempt has been made to overcome the difficulties of interpreting higher order spin systems. Proton-containing functional groups are divided into subgroups according to their spectroscopic behaviour and the information they bear. Spin simulation is used to study the effect of substituents on the higher order splitting patterns. Illustrative examples are given.
Longitudinal data analysis a handbook of modern statistical methods
Fitzmaurice, Garrett; Verbeke, Geert; Molenberghs, Geert
2008-01-01
Although many books currently available describe statistical models and methods for analyzing longitudinal data, they do not highlight connections between various research threads in the statistical literature. Responding to this void, Longitudinal Data Analysis provides a clear, comprehensive, and unified overview of state-of-the-art theory and applications. It also focuses on the assorted challenges that arise in analyzing longitudinal data. After discussing historical aspects, leading researchers explore four broad themes: parametric modeling, nonparametric and semiparametric methods, joint
A Framework for Assessing High School Students' Statistical Reasoning.
Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang
2016-01-01
Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.
Perception in statistical graphics
VanderPlas, Susan Ruth
There has been quite a bit of research on statistical graphics and visualization, generally focused on new types of graphics, new software to create graphics, interactivity, and usability studies. Our ability to interpret and use statistical graphics hinges on the interface between the graph itself and the brain that perceives and interprets it, and there is substantially less research on the interplay between graph, eye, brain, and mind than is sufficient to understand the nature of these relationships. The goal of the work presented here is to further explore the interplay between a static graph, the translation of that graph from paper to mental representation (the journey from eye to brain), and the mental processes that operate on that graph once it is transferred into memory (mind). Understanding the perception of statistical graphics should allow researchers to create more effective graphs which produce fewer distortions and viewer errors while reducing the cognitive load necessary to understand the information presented in the graph. Taken together, these experiments should lay a foundation for exploring the perception of statistical graphics. There has been considerable research into the accuracy of numerical judgments viewers make from graphs, and these studies are useful, but it is more effective to understand how errors in these judgments occur so that the root cause of the error can be addressed directly. Understanding how visual reasoning relates to the ability to make judgments from graphs allows us to tailor graphics to particular target audiences. In addition, understanding the hierarchy of salient features in statistical graphics allows us to clearly communicate the important message from data or statistical models by constructing graphics which are designed specifically for the perceptual system.
What defines an Expert? - Uncertainty in the interpretation of seismic data
Bond, C. E.
2008-12-01
Studies focusing on the elicitation of information from experts are concentrated primarily in economics and world markets, medical practice and expert witness testimonies. Expert elicitation theory has been applied in the natural sciences, most notably in the prediction of fluid flow in hydrological studies. In the geological sciences expert elicitation has been limited to theoretical analysis, with studies focusing on the elicitation element, gaining expert opinion rather than necessarily understanding the basis behind the expert view. In these cases experts are defined in a traditional sense, based for example on standing in the field, number of years of experience, number of peer-reviewed publications, or the expert's position in a company hierarchy or academia. Here traditional indicators of expertise have been compared for their significance for effective seismic interpretation. Polytomous regression analysis has been used to assess the relative significance of length and type of experience on the outcome of a seismic interpretation exercise. Following the initial analysis, the techniques used by participants to interpret the seismic image were added as additional variables to the analysis. Specific technical skills and techniques were found to be more important for the effective geological interpretation of seismic data than the traditional indicators of expertise. The results of a seismic interpretation exercise, the techniques used to interpret the seismic data and the participants' prior experience have been combined and analysed to answer the question: who is, and what defines, an expert?
Use of demonstrations and experiments in teaching business statistics
Johnson, D. G.; John, J. A.
2003-01-01
The aim of a business statistics course should be to help students think statistically and to interpret and understand data, rather than to focus on mathematical detail and computation. To achieve this, students must be thoroughly involved in the learning process and encouraged to discover for themselves the meaning, importance and relevance of statistical concepts. In this paper we advocate the use of experiments and demonstrations as aids to achieving these goals. A number of demonstrations...
Adobe Illustrator drawing showing geophysical and topographical survey data and interpretations
Wallace, Lacey; Ferraby, Rose
2016-01-01
Adobe Illustrator drawing at 1:2000 that shows the rasters and interpretations of the geophysics, the topographical contours, and the survey areas, with British National Grid coordinates and Ordnance Survey Master Map data included.
An approach to the interpretation of backpropagation neural network models in QSAR studies.
Baskin, I I; Ait, A O; Halberstam, N M; Palyulin, V A; Zefirov, N S
2002-03-01
An approach to the interpretation of backpropagation neural network models for quantitative structure-activity and structure-property relationships (QSAR/QSPR) studies is proposed. The method is based on analyzing the first and second moments of the distribution of the values of the first and second partial derivatives of neural network outputs with respect to inputs, calculated at data points. The use of such statistics makes it possible not only to obtain essentially the same characteristics as for traditional "interpretable" statistical methods, such as linear regression analysis, but also to reveal important additional information regarding the non-linear character of QSAR/QSPR relationships. The approach is illustrated by an example of interpreting a backpropagation neural network model for predicting the position of the long-wave absorption band of cyanine dyes.
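The statistics described in the abstract above can be illustrated with a minimal Python sketch. It assumes a tiny hand-specified network (the weights, the tanh activation, the data points and all function names are hypothetical, not taken from the paper), estimates first and second partial derivatives by central finite differences rather than analytically, and summarizes them by their mean and variance over the data points.

```python
import math
from statistics import mean, pvariance

# Hypothetical tiny network: 2 inputs -> 2 tanh hidden units -> 1 linear output.
# The weights are illustrative only; they do not come from the paper.
W_HIDDEN = [[0.8, -0.5], [0.3, 1.2]]
B_HIDDEN = [0.1, -0.2]
W_OUT = [1.5, -0.7]
B_OUT = 0.05


def network(x):
    """Forward pass of the illustrative network."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_HIDDEN, B_HIDDEN)
    ]
    return sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT


def first_derivative(f, x, i, h=1e-4):
    """Central-difference estimate of df/dx_i at point x."""
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    return (f(up) - f(down)) / (2 * h)


def second_derivative(f, x, i, h=1e-3):
    """Central-difference estimate of d2f/dx_i2 at point x."""
    up, down = list(x), list(x)
    up[i] += h
    down[i] -= h
    return (f(up) - 2 * f(x) + f(down)) / (h * h)


def sensitivity_summary(f, data, i):
    """Mean and variance of the derivatives with respect to input i."""
    d1 = [first_derivative(f, x, i) for x in data]
    d2 = [second_derivative(f, x, i) for x in data]
    return {
        "mean_d1": mean(d1),      # average slope, comparable to a regression coefficient
        "var_d1": pvariance(d1),  # spread of the slope across data points
        "mean_d2": mean(d2),      # average curvature
        "var_d2": pvariance(d2),
    }


DATA = [[0.0, 0.0], [0.5, -0.5], [1.0, 0.2], [-0.3, 0.8]]  # made-up data points
summary = sensitivity_summary(network, DATA, 0)
```

For a purely linear model the mean first derivative equals the regression coefficient and the remaining statistics vanish, so non-zero values of `var_d1` or `mean_d2` indicate non-linear behaviour in that input.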
Overview of sampling, analysis and data interpretation from sumps and pits
International Nuclear Information System (INIS)
Banks, J.C.; Banks, S.J.
1999-01-01
Aspects of sampling, environmental analysis and data interpretation for sumps and pits are discussed. According to regulatory requirements of the Alberta Energy and Utilities Board (EUB) and Alberta Environmental Protection (AEP), if a sump or pit is impacting the surrounding environment, the situation must be assessed for remediation. An impact on the environment occurs when chemicals or compounds are introduced at a level that is significant enough to cause a chemical imbalance. The immediate goal in remediating an impacted site should be to contain the released chemical to avoid its movement through the environment by dispersion, evaporation, capillary action, bioaccumulation or transfer to groundwater. The paper also discusses some of the key issues that should be considered in properly interpreting analytical data regarding spills and remedial action. 2 refs
Energy Technology Data Exchange (ETDEWEB)
Nivlet, Ph.
2001-10-01
Qualitative interpretation of data of different nature and sources, based on segmentation techniques such as discriminant analysis, is useful to characterize and monitor hydrocarbon reservoirs. To make this interpretation more reliable, it is necessary to characterize the uncertainties attached to the data and then to propagate them through the interpretation workflow. In this thesis, uncertainties are represented by intervals, because usually little is known about input data errors. The uncertainty characterization issue is dealt with specifically for each case study. The uncertainty propagation issue is treated by a new technique, based on interval analysis, which consists in extending to intervals various popular approaches (nonparametric, quadratic and linear) to discriminant analysis. First, a learning phase calibrates an imprecise classifying model on the basis of pre-interpreted data. If the quality of this model is good enough, it is used to interpret the whole set of imprecise recorded data. The resulting interpreted model is thus imprecise, but it is also more reliable. A validation study on a synthetic data set is first carried out, which compares the developed algorithms with more traditional, simulation-based uncertainty propagation techniques. Finally, two real case studies are presented. The first consists of a rock-type interpretation of borehole data recorded on the Alwyn field (North Sea). The second is concerned with using 4-D seismic to monitor the physical changes occurring in the East-Senlac heavy oil pool (Canada) due to steam injection during hydrocarbon production. (author)
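The idea of propagating interval-valued measurements through a classifier can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's algorithm: it uses a linear discriminant score with fixed, hypothetical weights, a hypothetical `Interval` type, and two made-up class labels. When the interval of the score straddles the decision boundary, the sample keeps both labels, which is what makes the result imprecise but more reliable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] representing an uncertain measurement."""
    lo: float
    hi: float

    def scale(self, k):
        """Multiply the interval by a scalar (bounds swap when k < 0)."""
        a, b = k * self.lo, k * self.hi
        return Interval(min(a, b), max(a, b))

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)


def linear_score(weights, bias, features):
    """Interval extension of the linear score w . x + b."""
    total = Interval(bias, bias)
    for w, x in zip(weights, features):
        total = total + x.scale(w)
    return total


def classify(score):
    """Return the set of classes compatible with the interval score."""
    if score.lo > 0:
        return {"A"}
    if score.hi < 0:
        return {"B"}
    return {"A", "B"}  # the interval straddles the boundary: undecided


# Hypothetical weights and measurements, for illustration only.
WEIGHTS, BIAS = [2.0, -1.0], -1.0
precise_sample = [Interval(1.0, 1.2), Interval(0.1, 0.3)]
vague_sample = [Interval(0.2, 0.8), Interval(0.0, 0.5)]

precise_label = classify(linear_score(WEIGHTS, BIAS, precise_sample))
vague_label = classify(linear_score(WEIGHTS, BIAS, vague_sample))
```

In this sketch the first sample is assigned to a single class, while the second, whose measurements are wider, is left with both candidate classes.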
Security of statistical data bases: invasion of privacy through attribute correlational modeling
Energy Technology Data Exchange (ETDEWEB)
Palley, M.A.
1985-01-01
This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
Statistical interpretation of the process of evolution and functioning of Audiovisual Archives
Directory of Open Access Journals (Sweden)
Nuno Miguel Epifânio
2013-03-01
The article provides a typology of the operating conditions of audiovisual archives, based on an interpretation of the results of a quantitative sample study. The study involved 43 institutions of different nature and size, both national and foreign, drawing on the answers given to a set of questions by communication services and cultural institutions. The analysis found a variety of guidelines on the management of information preservation and characterized the typology of the record collections of each archive. The data collected thus allowed an overview to be built of the operating model of each organization surveyed in this study.
Journal data sharing policies and statistical reporting inconsistencies in psychology.
Nuijten, M.B.; Borghuis, J.; Veldkamp, C.L.S.; Dominguez Alvarez, L.; van Assen, M.A.L.M.; Wicherts, J.M.
2018-01-01
In this paper, we present three retrospective observational studies that investigate the relation between data sharing and statistical reporting inconsistencies. Previous research found that reluctance to share data was related to a higher prevalence of statistical errors, often in the direction of
Statistical analysis with Excel for dummies
Schmuller, Joseph
2013-01-01
Take the mystery out of statistical terms and put Excel to work! If you need to create and interpret statistics in business or classroom settings, this easy-to-use guide is just what you need. It shows you how to use Excel's powerful tools for statistical analysis, even if you've never taken a course in statistics. Learn the meaning of terms like mean and median, margin of error, standard deviation, and permutations, and discover how to interpret the statistics of everyday life. You'll learn to use Excel formulas, charts, PivotTables, and other tools to make sense of everything fro
Statistical Power Analysis with Missing Data: A Structural Equation Modeling Approach
Davey, Adam
2009-01-01
Statistical power analysis has revolutionized the ways in which we conduct and evaluate research. Similar developments in the statistical analysis of incomplete (missing) data are gaining more widespread applications. This volume brings statistical power and incomplete data together under a common framework, in a way that is readily accessible to those with only an introductory familiarity with structural equation modeling. It answers many practical questions such as: How missing data affects the statistical power in a study; How much power is likely with different amounts and types
Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben
2017-09-15
Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure the statistical power to identify these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g. a few hundred or a few thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, for increasing the statistical power of identifying risk variants and improving the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods that take either individual-level data or summary statistics data as input. We applied IGESS to perform an integrative analysis of Crohn's Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS. Contact: zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
Software development for statistical handling of dosimetric and epidemiological data base
International Nuclear Information System (INIS)
Amaro, M.
1990-01-01
The dose records from different groups of occupationally exposed workers are available in a computerized data base whose main purpose is the individual dose follow-up. Apart from this objective, such a dosimetric data base can be useful for statistical analysis. The statistical information that can be extracted from the data base serves mainly two kinds of objectives: individual and collective dose distributions and statistics, and epidemiological statistics. The report describes the software developed to obtain the statistical reports required by the Regulatory Body, as well as any other type of dose distributions or statistics to be included in epidemiological studies. A User's Guide for the operators who handle this software package, and the code listings, are also included in the report. (Author) 2 refs
Hawe, David; Hernández Fernández, Francisco R; O'Suilleabháin, Liam; Huang, Jian; Wolsztynski, Eric; O'Sullivan, Finbarr
2012-05-01
In dynamic mode, positron emission tomography (PET) can be used to track the evolution of injected radio-labelled molecules in living tissue. This is a powerful diagnostic imaging technique that provides a unique opportunity to probe the status of healthy and pathological tissue by examining how it processes substrates. The spatial aspect of PET is well established in the computational statistics literature. This article focuses on its temporal aspect. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue. In statistical terms, the residue function is essentially a survival function - a familiar life-time data construct. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as flow, flux, volume of distribution and transit time summaries. This review emphasises a nonparametric approach to the estimation of the residue based on a piecewise linear form. Rapid implementation of this by quadratic programming is described. The approach provides a reference for statistical assessment of widely used one- and two-compartmental model forms. We illustrate the method with data from two of the most well-established PET radiotracers, ¹⁵O-H₂O and ¹⁸F-fluorodeoxyglucose, used for assessment of blood perfusion and glucose metabolism respectively. The presentation illustrates the use of two open-source tools, AMIDE and R, for PET scan manipulation and model inference.
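The convolution relationship stated in the abstract above can be sketched numerically. The following Python fragment is a minimal illustration, assuming a uniform time grid, a made-up arterial input curve and an exponential residue (a survival-type function that starts at 1 and never increases); the function name and all numerical values are hypothetical and are not taken from the article, which estimates the residue nonparametrically rather than assuming its shape.

```python
import math


def convolve_tissue_curve(arterial, residue, dt):
    """Riemann-sum approximation of C_T(t) = integral of C_A(s) * R(t - s) ds."""
    tissue = []
    for k in range(len(arterial)):
        acc = 0.0
        for j in range(k + 1):
            acc += arterial[j] * residue[k - j]
        tissue.append(acc * dt)
    return tissue


dt = 0.5                                    # sampling interval (arbitrary units)
times = [i * dt for i in range(40)]

# Hypothetical arterial input: rises quickly, then washes out.
arterial = [t * math.exp(-t / 2.0) for t in times]

# Hypothetical residue: exponential survival function with R(0) = 1.
rate = 0.3
residue = [math.exp(-rate * t) for t in times]

tissue = convolve_tissue_curve(arterial, residue, dt)
```

Kinetic analysis runs in the opposite direction: given measured `arterial` and `tissue` curves, it seeks a non-increasing `residue` that reproduces the tissue curve under this forward model.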
Directory of Open Access Journals (Sweden)
Brion Philippe
2015-12-01
Using as much administrative data as possible is a general trend among most national statistical institutes. Different kinds of administrative sources, from tax authorities or other administrative bodies, are very helpful material in the production of business statistics. However, these sources often have to be completed by information collected through statistical surveys. This article describes the way Insee has implemented such a strategy in order to produce French structural business statistics. The originality of the French procedure is that administrative and survey variables are used jointly for the same enterprises, unlike the majority of multisource systems, in which the two kinds of sources generally complement each other for different categories of units. The idea is to use, as much as possible, the richness of the administrative sources combined with the timeliness of a survey, even if the latter is conducted only on a sample of enterprises. One main issue is the classification of enterprises within the NACE nomenclature, which is a cornerstone variable in producing the breakdown of the results by industry. At a given date, two values of the corresponding code may coexist: the value of the register, not necessarily up to date, and the value resulting from the data collected via the survey, but only from a sample of enterprises. Using all this information together requires the implementation of specific statistical estimators combining some properties of the difference estimators with calibration techniques. This article presents these estimators, as well as their statistical properties, and compares them with those of other methods.
International Nuclear Information System (INIS)
Ranaivo Nomenjanahary, F.; Rakoto, H.; Ratsimbazafy, J.B.
1994-08-01
This paper is concerned with resistivity sounding measurements performed at a single site (vertical sounding) or at several sites (profiles) within a bounded area. The objective is to present accurate information about the study area and to estimate the likelihood of the quantitative models produced. Achieving this objective obviously requires relevant data and processing methods. It also requires interpretation methods that take into account the probable effect of a heterogeneous structure. Faced with such difficulties, the interpretation of resistivity sounding data inevitably involves the use of inversion methods. We suggest starting the interpretation in a simple situation (1-D approximation) and using the rough but correct model obtained as an a priori model for any more refined interpretation. From this point of view, special attention should be paid to the inverse problem applied to the resistivity sounding data. This inverse problem is nonlinear, while linearity is inherent in the functional response used to describe the physical experiment. Two different approaches are used to build an approximate but higher-dimensional inversion of geoelectrical data: the linear approach and the Bayesian statistical approach. Some illustrations of their application to resistivity sounding data acquired at Tritrivakely volcanic lake (single site) and in the Mahitsy area (several sites) are given. (author). 28 refs, 7 figs
Wind Statistics from a Forested Landscape
DEFF Research Database (Denmark)
Arnqvist, Johan; Segalini, Antonio; Dellwik, Ebba
2015-01-01
An analysis and interpretation of measurements from a 138-m tall tower located in a forested landscape is presented. Measurement errors and statistical uncertainties are carefully evaluated to ensure high data quality. A 40° wide wind-direction sector is selected as the most...... representative for large-scale forest conditions, and from that sector first-, second- and third-order statistics, as well as analyses regarding the characteristic length scale, the flux-profile relationship and surface roughness are presented for a wide range of stability conditions. The results are discussed...
DEFF Research Database (Denmark)
Denwood, M.J.; McKendrick, I.J.; Matthews, L.
Introduction. There is an urgent need for a method of analysing FECRT data that is computationally simple and statistically robust. A method for evaluating the statistical power of a proposed FECRT study would also greatly enhance the current guidelines. Methods. A novel statistical framework has...... been developed that evaluates observed FECRT data against two null hypotheses: (1) the observed efficacy is consistent with the expected efficacy, and (2) the observed efficacy is inferior to the expected efficacy. The method requires only four simple summary statistics of the observed data. Power...... that the notional type 1 error rate of the new statistical test is accurate. Power calculations demonstrate a power of only 65% with a sample size of 20 treatment and control animals, which increases to 69% with 40 control animals or 79% with 40 treatment animals. Discussion. The method proposed is simple...
Nhalevilo, Emilia Afonso; Ogunniyi, Meshach
2014-01-01
This article presents a reflection on an aspect of research methodology, particularly on the interpretation strategy of data from a Science and Indigenous Knowledge Systems Project (SIKSP) in a South African university. The data interpretation problem arose while we were analysing the effects of a series of SIKSP-based workshops on the views of a…
INTERPRETATION OF AIRBORNE ELECTROMAGNETIC AND MAGNETIC DATA IN THE 600 AREA
Energy Technology Data Exchange (ETDEWEB)
CUMMINS GD
2010-11-11
As part of the 200-PO-1 Phase I geophysical surveys, Fugro Airborne Surveys was contracted to collect airborne electromagnetic (EM) and magnetic surveys of the Hanford Site 600 Area. Two helicopter survey systems were used: the HeliGEOTEM® time domain portion was flown between June 19th and June 20th, 2008, and the RESOLVE® frequency domain portion was flown from June 29th to July 1st, 2008. Magnetic data were acquired contemporaneously with the electromagnetic surveys using a total-field cesium vapor magnetometer. Approximately 925 line kilometers (km) were flown using the HeliGEOTEM® II system and 412 line kilometers were flown using the RESOLVE® system. The HeliGEOTEM system has an effective penetration of roughly 250 meters into the ground and the RESOLVE system has an effective penetration of roughly 60 meters. Acquisition parameters and preliminary results are provided in SGW-39674, Airborne Electromagnetic Survey Report, 200-PO-1 Groundwater Operable Unit, 600 Area, Hanford Site. Airborne data are interpreted in this report in an attempt to identify areas of likely preferential groundwater flow within the aquifer system based on the presence of paleochannels or fault zones. The premise for the interpretation is that coarser-grained intervals have filled in scour channels created by episodic catastrophic flood events during the late Pleistocene. The interpretation strategy used the magnetic field anomaly data and existing bedrock maps to identify likely fault or lineament zones. Combined analysis of the magnetic, 60-Hz noise monitor, and flight-altitude (radar) data was used to identify zones where the EM response is more likely due to cultural interference and/or bedrock structures. Cross-sectional and map view presentations of the EM data were used to identify more electrically resistive zones that likely correlate with coarser-grained intervals. The resulting interpretation identifies one major northwest-southeast trending
International Nuclear Information System (INIS)
Tang Bin; Liu Ling; Zhou Shumin; Zhou Rongsheng
2006-01-01
The paper discusses gamma-ray spectrum interpretation technology in nuclear logging. The principles of common quantitative interpretation methods, including the average content method and the traditional spectrum stripping method, are introduced, and their limitations in determining the contents of radioactive elements on unsaturated ledges (where radioactive elements are distributed unevenly) are presented. Building on the quantitative interpretation of gamma-ray intensity logs by the deconvolution method, a new quantitative interpretation method that separates the radioactive elements is presented for interpreting gamma spectrum logs. This is a point-by-point spectrum stripping deconvolution technique that gives the logging data a quantitative interpretation. (authors)
Infrared spectroscopy for geologic interpretation of TIMS data
Bartholomew, Mary Jane
1986-01-01
The Portable Field Emission Spectrometer (PFES) was designed to collect meaningful spectra in the field under climatic, thermal, and sky conditions that approximate those at the time of the overflight. The specifications and procedures of PFES are discussed. Laboratory reflectance measurements of rocks and minerals were examined for the purpose of interpreting Thermal Infrared Multispectral Scanner (TIMS) data. The capability is currently being developed to perform direct laboratory measurement of the normal spectral radiance of Earth surface materials at low temperatures (20 to 30 °C) at the Jet Propulsion Laboratory.
The interpretation of quantitative microbial data
DEFF Research Database (Denmark)
Ribeiro Duarte, Ana Sofia
, there are several distribution alternatives available to describe concentrations and several methods to fit distributions to bacterial data; on the other hand predictive models are built based on controlled laboratory experiments of microbial behaviour, and may not be appropriate to apply in the context of real food...... zeroes as censored below a quantification threshold. The method that is presented estimates the prevalence of contamination within a food lot and the parameters (mean and standard deviation) characterizing the within-lot distribution of concentrations, without assuming a LOQ, and using raw plate count....... Perspectives of future work include the validation of the method developed in manuscript I with real data, and its presentation as a tool made available to the scientific community by developing, for example, a working package for the statistical software R. Also, the author expects that a standardized way...
Statistical Approaches to Assess Biosimilarity from Analytical Data.
Burdick, Richard; Coffey, Todd; Gutka, Hiten; Gratzl, Gyöngyi; Conlon, Hugh D; Huang, Chi-Ting; Boyne, Michael; Kuehne, Henriette
2017-01-01
Protein therapeutics have unique critical quality attributes (CQAs) that define their purity, potency, and safety. The analytical methods used to assess CQAs must be able to distinguish clinically meaningful differences in comparator products, and the most important CQAs should be evaluated with the most statistical rigor. High-risk CQA measurements assess the most important attributes that directly impact the clinical mechanism of action or have known implications for safety, while the moderate- to low-risk characteristics may have a lower direct impact and thereby may have a broader range to establish similarity. Statistical equivalence testing is applied for high-risk CQA measurements to establish the degree of similarity (e.g., highly similar fingerprint, highly similar, or similar) of selected attributes. Notably, some high-risk CQAs (e.g., primary sequence or disulfide bonding) are qualitative (e.g., the same as the originator or not the same) and therefore not amenable to equivalence testing. For biosimilars, an important step is the acquisition of a sufficient number of unique originator drug product lots to measure the variability in the originator drug manufacturing process and provide sufficient statistical power for the analytical data comparisons. Together, these analytical evaluations, along with PK/PD and safety data (immunogenicity), provide the data necessary to determine if the totality of the evidence warrants a designation of biosimilarity and subsequent licensure for marketing in the USA. In this paper, a case study approach is used to provide examples of analytical similarity exercises and the appropriateness of statistical approaches for the example data.
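The statistical equivalence testing mentioned in the abstract above is commonly carried out as two one-sided tests (TOST). The Python sketch below is a simplified illustration under stated assumptions: it uses a large-sample normal approximation instead of the t distribution, the lot values are invented, and the equivalence margin is an arbitrary choice made for the example rather than a value taken from the paper or from any regulatory guidance.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_equivalence(reference, test, margin, alpha=0.05):
    """Two one-sided tests for equivalence of means (normal approximation).

    Equivalence is concluded when the mean difference is shown to lie
    inside (-margin, +margin) at significance level alpha.
    """
    diff = mean(test) - mean(reference)
    se = sqrt(stdev(test) ** 2 / len(test)
              + stdev(reference) ** 2 / len(reference))
    z = NormalDist()
    p_lower = 1.0 - z.cdf((diff + margin) / se)  # H0: diff <= -margin
    p_upper = z.cdf((diff - margin) / se)        # H0: diff >= +margin
    p_value = max(p_lower, p_upper)              # both null hypotheses must be rejected
    return diff, p_value, p_value < alpha


# Invented attribute measurements (e.g. relative potency, %) for illustration.
REFERENCE_LOTS = [100.1, 99.8, 100.4, 99.9, 100.2, 100.0, 99.7, 100.3]
TEST_LOTS = [100.0, 100.2, 99.9, 100.1, 100.3, 99.8, 100.0, 100.2]

diff, p_value, equivalent = tost_equivalence(REFERENCE_LOTS, TEST_LOTS, margin=0.5)
```

With only a handful of lots per product, as in this example, a t-based version would be more appropriate; the normal approximation is used here only to keep the sketch within the standard library. The number of originator lots matters because it drives the standard error and therefore the power to demonstrate equivalence.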
Improved custom statistics visualization for CA Performance Center data
Talevi, Iacopo
2017-01-01
The main goal of my project is to understand and experiment with the possibilities that CA Performance Center (CA PC) offers for creating custom applications that display stored information through interesting visual means, such as maps. In particular, I have rewritten some of the network statistics web pages so that they fetch data from the new statistics modules in CA PC, which has its own API, and stop using the RRD data.
Statistical analysis of dragline monitoring data
Energy Technology Data Exchange (ETDEWEB)
Mirabediny, H.; Baafi, E.Y. [University of Tehran, Tehran (Iran)]
1998-07-01
Dragline monitoring systems are normally the best tool used to collect data on the machine performance and operational parameters of a dragline operation. This paper discusses results of a time study using data from a dragline monitoring system captured over a four month period. Statistical summaries of the time study in terms of average values, standard deviation and frequency distributions showed that the mode of operation and the geological conditions have a significant influence on the dragline performance parameters. 6 refs., 14 figs., 3 tabs.
Diagnostic Interpretation of Array Data Using Public Databases and Internet Sources
de Leeuw, Nicole; Dijkhuizen, Trijnie; Hehir-Kwa, Jayne Y.; Carter, Nigel P.; Feuk, Lars; Firth, Helen V.; Kuhn, Robert M.; Ledbetter, David H.; Martin, Christa Lese; van Ravenswaaij-Arts, Conny M. A.; Scherer, Steven W.; Shams, Soheil; Van Vooren, Steven; Sijmons, Rolf; Swertz, Morris; Hastings, Ros
The range of commercially available array platforms and analysis software packages is expanding and their utility is improving, making reliable detection of copy-number variants (CNVs) relatively straightforward. Reliable interpretation of CNV data, however, is often difficult and requires
Energy Technology Data Exchange (ETDEWEB)
Munoz, Gerard; Bauer, Klaus; Moeck, Inga; Schulze, Albrecht; Ritter, Oliver [Deutsches GeoForschungsZentrum (GFZ), Telegrafenberg, 14473 Potsdam (Germany)
2010-03-15
Exploration for geothermal resources is often challenging because there are no geophysical techniques that provide direct images of the parameters of interest, such as porosity, permeability and fluid content. Magnetotelluric (MT) and seismic tomography methods yield information about subsurface distribution of resistivity and seismic velocity on similar scales and resolution. The lack of a fundamental law linking the two parameters, however, has limited joint interpretation to a qualitative analysis. By using a statistical approach in which the resistivity and velocity models are investigated in the joint parameter space, we are able to identify regions of high correlation and map these classes (or structures) back onto the spatial domain. This technique, applied to a seismic tomography-MT profile in the area of the Gross Schoenebeck geothermal site, allows us to identify a number of classes in accordance with the local geology. In particular, a high-velocity, low-resistivity class is interpreted as related to areas with thinner layers of evaporites; regions where these sedimentary layers are highly fractured may be of higher permeability. (author)
Rumsey, Deborah
2011-01-01
The fun and easy way to get down to business with statistics. Stymied by statistics? No fear: this friendly guide offers clear, practical explanations of statistical ideas, techniques, formulas, and calculations, with lots of examples that show you how these concepts apply to your everyday life. Statistics For Dummies shows you how to interpret and critique graphs and charts, determine the odds with probability, guesstimate with confidence using confidence intervals, set up and carry out a hypothesis test, compute statistical formulas, and more. Tracks to a typical first semester statistics cou
Improved interpretation of satellite altimeter data using genetic algorithms
Messa, Kenneth; Lybanon, Matthew
1992-01-01
Genetic algorithms (GAs) are optimization techniques that are based on the mechanics of evolution and natural selection. They take advantage of the power of cumulative selection, in which successive incremental improvements in a solution structure become the basis for continued development. A GA is an iterative procedure that maintains a 'population' of 'organisms' (candidate solutions). Through successive 'generations' (iterations) the population as a whole improves in simulation of Darwin's 'survival of the fittest'. GAs have been shown to be successful where noise significantly reduces the ability of other search techniques to work effectively. Satellite altimetry provides useful information about oceanographic phenomena. It provides rapid global coverage of the oceans and is not as severely hampered by cloud cover as infrared imagery. Despite these and other benefits, several factors lead to significant difficulty in interpretation. The GA approach to the improved interpretation of satellite data involves the representation of the ocean surface model as a string of parameters or coefficients from the model. The GA searches, in parallel, a population of such representations (organisms) to obtain the individual that is best suited to 'survive', that is, the fittest as measured with respect to some 'fitness' function. The fittest organism is the one that best represents the ocean surface model with respect to the altimeter data.
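The loop described in this abstract (a population of parameter strings, a fitness function, and successive generations) can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: the two-coefficient model `y = a + b*x`, the tournament selection, the Gaussian mutation and all numeric settings are assumptions chosen for brevity, and the data are synthetic rather than altimeter measurements.

```python
import random


def fitness(params, xs, ys):
    """Negative sum of squared residuals for the toy model y = a + b*x.

    Higher is fitter. The linear model is a stand-in (assumption) for the
    ocean surface model mentioned in the abstract.
    """
    a, b = params
    return -sum((a + b * x - y) ** 2 for x, y in zip(xs, ys))


def evolve(xs, ys, pop_size=40, generations=60, mutation_sd=0.2, seed=0):
    """Run a small genetic algorithm; return (best_params, best_fitness_history)."""
    rng = random.Random(seed)
    # Each organism is a string (list) of model coefficients.
    population = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(pop_size)]
    history = []

    def score(p):
        return fitness(p, xs, ys)

    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        next_gen = [ranked[0][:]]  # elitism: the fittest organism always survives
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of three random organisms becomes a parent.
            parent1 = max(rng.sample(ranked, 3), key=score)
            parent2 = max(rng.sample(ranked, 3), key=score)
            cut = rng.randint(1, len(parent1) - 1)   # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            child = [gene + rng.gauss(0, mutation_sd) for gene in child]  # mutation
            next_gen.append(child)
        population = next_gen

    best = max(population, key=score)
    history.append(score(best))
    return best, history


# Synthetic data generated from a = 1.5, b = -0.7 (illustrative values only).
xs = [float(i) for i in range(10)]
ys = [1.5 - 0.7 * x for x in xs]
best, history = evolve(xs, ys)
print("best parameters:", best)
print("fitness, first vs last generation:", history[0], history[-1])
```

Because the fittest organism is copied unchanged into each new generation, the best fitness can never get worse from one generation to the next, which is the 'cumulative selection' property the abstract refers to.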
Hearing Loss in Children: Data and Statistics
Statistics corner: A guide to appropriate use of correlation coefficient in medical research.
Mukaka, M M
2012-09-01
Correlation is a statistical method used to assess a possible linear association between two continuous variables. It is simple both to calculate and to interpret. However, misuse of correlation is so common among researchers that some statisticians have wished that the method had never been devised at all. The aim of this article is to provide a guide to appropriate use of correlation in medical research and to highlight some misuse. Examples of applications of the correlation coefficient are provided using data from statistical simulations as well as real data. A rule of thumb for interpreting the size of a correlation coefficient is also provided.
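As a small illustration of why the word "linear" matters in this abstract, the sketch below computes the Pearson correlation coefficient from its definition and applies it to two invented data sets: one exactly linear and one exactly quadratic. The function name and the data are assumptions made for this example; they do not come from the article.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [-2, -1, 0, 1, 2]
linear = [2 * v + 1 for v in x]   # exact linear relationship
quadratic = [v * v for v in x]    # exact, but non-linear, relationship

print("linear:", pearson_r(x, linear))
print("quadratic:", pearson_r(x, quadratic))
```

The quadratic series is perfectly determined by `x`, yet its coefficient is zero, because the coefficient only measures linear association. Reading a low coefficient as "no relationship" is one of the misuses a guide like this one warns about.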
Statistical Analysis of Data for Timber Strengths
DEFF Research Database (Denmark)
Sørensen, John Dalsgaard
2003-01-01
Statistical analyses are performed for material strength parameters from a large number of specimens of structural timber. Non-parametric statistical analysis and fits have been investigated for the following distribution types: Normal, Lognormal, 2-parameter Weibull and 3-parameter Weibull...... fits to the data available, especially if tail fits are used whereas the Log Normal distribution generally gives a poor fit and larger coefficients of variation, especially if tail fits are used. The implications on the reliability level of typical structural elements and on partial safety factors...... for timber are investigated....
Lineament interpretation. Short review and methodology
Energy Technology Data Exchange (ETDEWEB)
Tiren, Sven (GEOSIGMA AB (Sweden))
2010-11-15
interpretation, and the skill of the interpreter. Images and digital terrain models that display the relief of the studied area should, if possible, be illuminated in at least four directions to reduce biases regarding the orientation of structures. The resolution in the source data should be fully used and extrapolation of structures avoided in the primary interpretation of the source data. The interpretation of lineaments should be made in steps: a. Interpretation of each data set/image/terrain model is conducted separately; b. Compilation of all interpretations in a base lineament map and classification of the lineaments; and c. Construction of thematic maps, e.g. structural maps, rock block maps, and statistical presentation of lineaments. Generalisations and extrapolations of lineaments/structures may be made when producing the thematic maps. The construction of thematic maps should be supported by auxiliary information (geological and geomorphologic data and information on human impact in the area). Inferred tectonic structures should be verified in the field
Lineament interpretation. Short review and methodology
International Nuclear Information System (INIS)
Tiren, Sven
2010-11-01
the interpreter. Images and digital terrain models that display the relief of the studied area should, if possible, be illuminated in at least four directions to reduce biases regarding the orientation of structures. The resolution in the source data should be fully used and extrapolation of structures avoided in the primary interpretation of the source data. The interpretation of lineaments should be made in steps: a. Interpretation of each data set/image/terrain model is conducted separately; b. Compilation of all interpretations in a base lineament map and classification of the lineaments; and c. Construction of thematic maps, e.g. structural maps, rock block maps, and statistical presentation of lineaments. Generalisations and extrapolations of lineaments/structures may be made when producing the thematic maps. The construction of thematic maps should be supported by auxiliary information (geological and geomorphologic data and information on human impact in the area). Inferred tectonic structures should be verified in the field
Statistics of meteorological data at Tokai Research Establishment in JAERI
International Nuclear Information System (INIS)
Sekita, Tsutomu; Tachibana, Haruo; Matsuura, Kenichi; Yamaguchi, Takenori
2003-12-01
The meteorological observation data at Tokai site were analyzed statistically based on a 'Guideline of meteorological statistics for the safety analysis of nuclear power reactor' (Nuclear Safety Commission on January 28, 1982; revised on March 29, 2001). This report shows the meteorological analysis of wind direction, wind velocity and atmospheric stability etc. to assess the public dose around the Tokai site caused by the released gaseous radioactivity. The statistical period of meteorological data is every 5 years from 1981 to 1995. (author)
Bayesian maximum posterior probability method for interpreting plutonium urinalysis data
International Nuclear Information System (INIS)
Miller, G.; Inkret, W.C.
1996-01-01
A new internal dosimetry code for interpreting urinalysis data in terms of radionuclide intakes is described for the case of plutonium. The mathematical method is to maximise the Bayesian posterior probability using an entropy function as the prior probability distribution. A software package (MEMSYS) developed for image reconstruction is used. Some advantages of the new code are that it ensures positive calculated dose, it smooths out fluctuating data, and it provides an estimate of the propagated uncertainty in the calculated doses. (author)
Three-dimensional interpretation of MT data in volcanic environments (computer simulation)
Energy Technology Data Exchange (ETDEWEB)
Spichak, V. [Geoelectromagnetic Research Institute RAS, Troitsk, Moscow (Russian Federation)
2001-04-01
The research aims, first, to find components of MT-fields and their transforms that facilitate the imaging of the internal structure of volcanoes and, second, to study the detectability of conductivity variations in a magma chamber due to alterations of other physical parameters. The resolving power of MT data with respect to the electric structure of volcanic zones is studied using software developed by the author for three-dimensional (3D) numerical modeling, analysis and imaging. A set of 3D volcano models is constructed and synthetic MT data on the Earth's relief surface are analysed. It is found that impedance phases as well as in-phase and quadrature parts of the electric field type transforms enable the best imaging of the volcanic interior. The determinant impedance is, however, the most suitable for adequate interpretation of measurements carried out for the purpose of monitoring conductivity variations in a magma chamber. A way of removing the geological noise from the MT data by means of its upward analytical continuation to an artificial reference plane is discussed. Interpretation methodologies are suggested aimed at 3D imaging and monitoring volcanic interiors by MT data.
Drug safety data mining with a tree-based scan statistic.
Kulldorff, Martin; Dashevsky, Inna; Avery, Taliser R; Chan, Arnold K; Davis, Robert L; Graham, David; Platt, Richard; Andrade, Susan E; Boudreau, Denise; Gunter, Margaret J; Herrinton, Lisa J; Pawloski, Pamala A; Raebel, Marsha A; Roblin, Douglas; Brown, Jeffrey S
2013-05-01
In post-marketing drug safety surveillance, data mining can potentially detect rare but serious adverse events. Assessing an entire collection of drug-event pairs is traditionally performed on a predefined level of granularity. It is unknown a priori whether a drug causes a very specific or a set of related adverse events, such as mitral valve disorders, all valve disorders, or different types of heart disease. This methodological paper evaluates the tree-based scan statistic data mining method to enhance drug safety surveillance. We use a three-million-member electronic health records database from the HMO Research Network. Using the tree-based scan statistic, we assess the safety of selected antifungal and diabetes drugs, simultaneously evaluating overlapping diagnosis groups at different granularity levels, adjusting for multiple testing. Expected and observed adverse event counts were adjusted for age, sex, and health plan, producing a log likelihood ratio test statistic. Out of 732 evaluated disease groupings, 24 were statistically significant, divided among 10 non-overlapping disease categories. Five of the 10 signals are known adverse effects, four are likely due to confounding by indication, while one may warrant further investigation. The tree-based scan statistic can be successfully applied as a data mining tool in drug safety surveillance using observational data. The total number of statistical signals was modest and does not imply a causal relationship. Rather, data mining results should be used to generate candidate drug-event pairs for rigorous epidemiological studies to evaluate the individual and comparative safety profiles of drugs. Copyright © 2013 John Wiley & Sons, Ltd.
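To make the idea of scanning a diagnosis tree at several levels of granularity more concrete, the sketch below scores every node of a tiny tree with one commonly used Poisson-based log likelihood ratio. This is a hedged illustration rather than the paper's procedure: the abstract does not state the formula, the node names and counts are invented, the expected counts are assumed to be already adjusted and scaled to sum to the total observed count, and the multiple-testing adjustment is omitted.

```python
import math


def poisson_llr(observed, expected, total):
    """Log likelihood ratio for an excess of events in one tree node.

    Assumed form (conditional Poisson): c*ln(c/e) + (C-c)*ln((C-c)/(C-e))
    when c > e, otherwise 0. Expected counts are assumed to sum to `total`.
    """
    if expected <= 0 or total <= 0 or observed < 0 or observed > total:
        raise ValueError("invalid counts")
    if observed <= expected:
        return 0.0  # only excess risk is of interest
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside


# Hypothetical tree: parent -> children. Leaves carry (observed, expected) counts.
children = {
    "heart disease": ["valve disorders", "arrhythmia"],
    "valve disorders": ["mitral valve disorder", "aortic valve disorder"],
}
leaf_counts = {
    "mitral valve disorder": (8, 2.0),
    "aortic valve disorder": (3, 2.5),
    "arrhythmia": (9, 15.5),
}


def node_counts(node):
    """Sum observed and expected counts over all leaves below `node`."""
    if node in leaf_counts:
        return leaf_counts[node]
    obs = exp = 0.0
    for child in children[node]:
        child_obs, child_exp = node_counts(child)
        obs += child_obs
        exp += child_exp
    return obs, exp


total_observed, _ = node_counts("heart disease")
scores = {
    node: poisson_llr(*node_counts(node), total_observed)
    for node in list(children) + list(leaf_counts)
}
best_node = max(scores, key=scores.get)
print("highest-scoring node:", best_node)
```

In a full analysis the maximum score would be compared against its distribution under the null hypothesis (for example by simulation) so that a single test covers all nodes at once. That step is what "adjusting for multiple testing" refers to and it is not shown here.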
Data and Statistics on New York's Mining Resources - NYS Dept. of
Statistics on New York's Mining Resources: Mines in New York - Information on active mines in New York State
Flexibility in data interpretation: effects of representational format.
Braithwaite, David W; Goldstone, Robert L
2013-01-01
Graphs and tables differentially support performance on specific tasks. For tasks requiring reading off single data points, tables are as good as or better than graphs, while for tasks involving relationships among data points, graphs often yield better performance. However, the degree to which graphs and tables support flexibility across a range of tasks is not well-understood. In two experiments, participants detected main and interaction effects in line graphs and tables of bivariate data. Graphs led to more efficient performance, but also lower flexibility, as indicated by a larger discrepancy in performance across tasks. In particular, detection of main effects of variables represented in the graph legend was facilitated relative to detection of main effects of variables represented in the x-axis. Graphs may be a preferable representational format when the desired task or analytical perspective is known in advance, but may also induce greater interpretive bias than tables, necessitating greater care in their use and design.
Data Mining and Statistics for Decision Making
Tufféry, Stéphane
2011-01-01
Data mining is the process of automatically searching large volumes of data for models and patterns using computational techniques from statistics, machine learning and information theory; it is the ideal tool for such an extraction of knowledge. Data mining is usually associated with a business or an organization's need to identify trends and profiles, allowing, for example, retailers to discover patterns on which to base marketing objectives. This book looks at both classical and recent techniques of data mining, such as clustering, discriminant analysis, logistic regression, generalized lin
International Nuclear Information System (INIS)
Pirkle, F.L.
1981-04-01
STAARS is a new series which is being published to disseminate information concerning statistical procedures for interpreting aerial radiometric data. The application of a particular data interpretation technique to geologic understanding for delineating regions favorable to uranium deposition is the primary concern of STAARS. Statements concerning the utility of a technique on aerial reconnaissance data as well as detailed aerial survey data will be included
Directory of Open Access Journals (Sweden)
Mark W Perlin
Mixtures are a commonly encountered form of biological evidence that contain DNA from two or more contributors. Laboratory analysis of mixtures produces data signals that usually cannot be separated into distinct contributor genotypes. Computer modeling can resolve the genotypes up to probability, reflecting the uncertainty inherent in the data. Human analysts address the problem by simplifying the quantitative data in a threshold process that discards considerable identification information. Elevated stochastic threshold levels potentially discard more information. This study examines three different mixture interpretation methods. In 72 criminal cases, 111 genotype comparisons were made between 92 mixture items and relevant reference samples. TrueAllele computer modeling was done on all the evidence samples, and documented in DNA match reports that were provided as evidence for each case. Threshold-based Combined Probability of Inclusion (CPI) and stochastically modified CPI (mCPI) analyses were performed as well. TrueAllele's identification information in 101 positive matches was used to assess the reliability of its modeling approach. Comparison was made with 81 CPI and 53 mCPI DNA match statistics that were manually derived from the same data. There were statistically significant differences between the DNA interpretation methods. TrueAllele gave an average match statistic of 113 billion, CPI averaged 6.68 million, and mCPI averaged 140. The computer was highly specific, with a false positive rate under 0.005%. The modeling approach was precise, having a factor of two within-group standard deviation. TrueAllele accuracy was indicated by having uniformly distributed match statistics over the data set. The computer could make genotype comparisons that were impossible or impractical using manual methods. TrueAllele computer interpretation of DNA mixture evidence is sensitive, specific, precise, accurate and more informative than manual
Data Warehousing: How To Make Your Statistics Meaningful.
Flaherty, William
2001-01-01
Examines how one school district found a way to turn data collection from a disparate mountain of statistics into more useful information by using their Instructional Decision Support System. System software is explained as is how the district solved some data management challenges. (GR)
Directory of Open Access Journals (Sweden)
Meng Kuan Lin
2013-07-01
Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and digital imaging processing service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, pre-tiling of brain images from Medical Imaging NetCDF (MINC), Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurement; and (3) display high level of images with three dimensions in real world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation using web-based protocols. M-DIP implements three levels of architecture with a relational middle-layer database, a stand-alone DIP server and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users' expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed by any network environment, such as a portable mobile or tablet device. In addition, this system and combination with mobile applications are establishing a virtualization tool in the neuroinformatics field to speed interpretation services.
Lin, Meng Kuan; Nicolini, Oliver; Waxenegger, Harald; Galloway, Graham J; Ullmann, Jeremy F P; Janke, Andrew L
2013-01-01
Digital Imaging Processing (DIP) requires data extraction and output from a visualization tool to be consistent. Data handling and transmission between the server and a user is a systematic process in service interpretation. The use of integrated medical services for management and viewing of imaging data in combination with a mobile visualization tool can be greatly facilitated by data analysis and interpretation. This paper presents an integrated mobile application and DIP service, called M-DIP. The objective of the system is to (1) automate the direct data tiling, conversion, pre-tiling of brain images from Medical Imaging NetCDF (MINC), Neuroimaging Informatics Technology Initiative (NIFTI) to RAW formats; (2) speed up querying of imaging measurement; and (3) display high-level of images with three dimensions in real world coordinates. In addition, M-DIP provides the ability to work on a mobile or tablet device without any software installation using web-based protocols. M-DIP implements three levels of architecture with a relational middle-layer database, a stand-alone DIP server, and a mobile application logic middle level realizing user interpretation for direct querying and communication. This imaging software has the ability to display biological imaging data at multiple zoom levels and to increase its quality to meet users' expectations. Interpretation of bioimaging data is facilitated by an interface analogous to online mapping services using real world coordinate browsing. This allows mobile devices to display multiple datasets simultaneously from a remote site. M-DIP can be used as a measurement repository that can be accessed by any network environment, such as a portable mobile or tablet device. In addition, this system and combination with mobile applications are establishing a virtualization tool in the neuroinformatics field to speed interpretation services.
The Nirex Sellafield site investigation: the role of geophysical interpretation
International Nuclear Information System (INIS)
Muir Wood, R.; Woo, G.; MacMillan, G.
1992-01-01
This report reviews the methods by which geophysical data are interpreted, and used to characterize the 3-D geology of a site for potential storage of radioactive waste. The report focuses on the NIREX site investigation at Sellafield, for which geophysical observations provide a significant component of the structural geological understanding. In outlining the basic technical principles of seismic data processing and interpretation, and borehole logging, an attempt has been made to identify errors, uncertainties, and the implicit use of expert judgement. To enhance the reliability of a radiological probabilistic risk assessment, recommendations are proposed for independent use of the primary NIREX geophysical site investigation data in characterizing the site geology. These recommendations include quantitative procedures for undertaking an uncertainty audit using a combination of statistical analysis and expert judgement. (author)
Directory of Open Access Journals (Sweden)
Rochelle E. Tractenberg
2016-12-01
Statistical literacy is essential to an informed citizenry; and two emerging trends highlight a growing need for training that achieves this literacy. The first trend is towards “big” data: while automated analyses can exploit massive amounts of data, the interpretation (and possibly more importantly, the replication) of results is challenging without adequate statistical literacy. The second trend is that science and scientific publishing are struggling with insufficient/inappropriate statistical reasoning in writing, reviewing, and editing. This paper describes a model for statistical literacy (SL) and its development that can support modern scientific practice. An established curriculum development and evaluation tool, the Mastery Rubric, is integrated with a new, developmental, model of statistical literacy that reflects the complexity of reasoning and habits of mind that scientists need to cultivate in order to recognize, choose, and interpret statistical methods. This developmental model provides actionable evidence, and explicit opportunities for consequential assessment that serves students, instructors, developers/reviewers/accreditors of a curriculum, and institutions. By supporting the enrichment, rather than increasing the amount, of statistical training in the basic and life sciences, this approach supports curriculum development, evaluation, and delivery to promote statistical literacy for students and a collective quantitative proficiency more broadly.
A statistical study on fracture toughness data of Japanese RPVS
International Nuclear Information System (INIS)
Sakai, Y.; Ogura, N.
1987-01-01
In a cooperative study investigating the fracture toughness of pressure vessel steels produced in Japan, a number of heats of ASTM A533B cl.1 and A508 cl.3 steels have been studied. Approximately 3000 fracture toughness data and 8000 mechanical properties data were obtained and filed in a computer data bank. Statistical characterization of toughness data in the transition region has been carried out using the computer data bank. A curve-fitting technique for toughness data has been examined, using a function to model the transition behaviour of each toughness. The aims of the curve-fitting technique were as follows: (1) summarizing an enormous toughness data base to permit comparison of heats, materials and testing methods; (2) investigating the relationships among static, dynamic and arrest toughness; (3) examining the ASME K(IR) curve statistically. The methodology used in this study for analyzing a large quantity of fracture toughness data was found to be useful for formulating a statistically based K(IR) curve. (orig./HP)
ARTEFACT MOBILE DATA MODEL TO SUPPORT CULTURAL HERITAGE DATA COLLECTION AND INTERPRETATION
Directory of Open Access Journals (Sweden)
Z. S. Mohamed-Ghouse
2012-07-01
This paper discusses the limitation of existing data structures in mobile mapping applications to support archaeologists to manage the artefact (any object made or modified by a human culture, and later recovered by an archaeological endeavor) details excavated at a cultural heritage site. Current limitations of data structure in the mobile mapping application allow archaeologists to record only one artefact per test pit location. In reality, more than one artefact can be excavated from the same test pit location. A spatial data model called Artefact Mobile Data Model (AMDM) was developed applying existing Relational Data Base Management System (RDBMS) technique to overcome the limitation. The data model was implemented in a mobile database environment called SprintDB Pro which was in turn connected to ArcPad 7.1 mobile mapping application through Open Data Base Connectivity (ODBC). In addition, the design of a user friendly application built on top of AMDM to interpret and record the technology associated with each artefact excavated in the field is also discussed in the paper. In summary, the paper discusses the design and implementation of a data model to facilitate the collection of artefacts in the field using integrated mobile mapping and database approach.
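The core limitation described here, one artefact per test pit, is what a one-to-many relational design removes. The sketch below shows that general idea with Python's built-in `sqlite3` module. It is not the AMDM schema: the table names, column names and sample rows are hypothetical, and SQLite stands in for the SprintDB Pro / ODBC setup named in the abstract.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a one-to-many pit -> artefact model."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE test_pit (
            pit_id   INTEGER PRIMARY KEY,
            easting  REAL NOT NULL,
            northing REAL NOT NULL
        );
        -- Many artefact rows may reference the same test pit.
        CREATE TABLE artefact (
            artefact_id INTEGER PRIMARY KEY,
            pit_id      INTEGER NOT NULL REFERENCES test_pit(pit_id),
            description TEXT NOT NULL,
            technology  TEXT
        );
        """
    )
    # Hypothetical sample data: one pit location, three artefacts.
    conn.execute(
        "INSERT INTO test_pit (pit_id, easting, northing) VALUES (1, 500.0, 1200.0)"
    )
    conn.executemany(
        "INSERT INTO artefact (pit_id, description, technology) VALUES (?, ?, ?)",
        [
            (1, "stone flake", "knapping"),
            (1, "pottery sherd", "wheel-thrown"),
            (1, "bone point", "grinding"),
        ],
    )
    conn.commit()
    return conn


def artefacts_per_pit(conn):
    """Return (pit_id, artefact_count) pairs, including pits with no artefacts."""
    return conn.execute(
        """
        SELECT p.pit_id, COUNT(a.artefact_id)
        FROM test_pit AS p
        LEFT JOIN artefact AS a ON a.pit_id = p.pit_id
        GROUP BY p.pit_id
        ORDER BY p.pit_id
        """
    ).fetchall()


conn = build_demo_db()
print(artefacts_per_pit(conn))
```

Keeping the location in `test_pit` and the finds in `artefact` means the coordinates are stored once, while any number of artefacts, each with its own technology note, can be attached to that location.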
Statistical methods to evaluate thermoluminescence ionizing radiation dosimetry data
International Nuclear Information System (INIS)
Segre, Nadia; Matoso, Erika; Fagundes, Rosane Correa
2011-01-01
Ionizing radiation levels, evaluated through the exposure of CaF2:Dy thermoluminescence dosimeters (TLD-200), have been monitored at Centro Experimental Aramar (CEA), located at Ipero in Sao Paulo state, Brazil, since 1991, resulting in a large number of measurements up to 2009 (more than 2,000). The volume of data, together with the dispersion of the measurements (every process has some deviation), reinforces the use of statistical tools to evaluate the results, a procedure also required by the Brazilian Standard CNEN-NN-3.01/PR-3.01-008, which regulates radiometric environmental monitoring. Thermoluminescence ionizing radiation dosimetry data are statistically compared in order to evaluate the potential environmental impact of CEA's activities. The statistical tools discussed in this work are box plots, control charts and analysis of variance. (author)
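Two of the tools named in this abstract, box plots and control charts, reduce to a handful of summary numbers. The sketch below computes them with Python's standard `statistics` module. It is an illustration under stated assumptions: the readings are invented and unitless, the control limits use the simple "mean plus or minus three sample standard deviations" rule rather than whatever chart design the authors used, and analysis of variance is not shown.

```python
import statistics


def box_plot_summary(data):
    """Five-number summary underlying a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": min(data), "q1": q1, "median": median, "q3": q3, "max": max(data)}


def control_limits(baseline, k=3.0):
    """Centre line and lower/upper limits from a baseline period (mean +/- k*sd)."""
    centre = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return centre - k * sd, centre, centre + k * sd


def out_of_control(new_values, baseline, k=3.0):
    """Return the new readings that fall outside the baseline control limits."""
    lower, _, upper = control_limits(baseline, k)
    return [v for v in new_values if v < lower or v > upper]


# Invented baseline readings and two new readings (no real dosimetry data).
baseline = [10, 11, 9, 10, 12, 10, 11, 9, 10, 10]
new_readings = [10, 25]

print(box_plot_summary(baseline))
print(control_limits(baseline))
print(out_of_control(new_readings, baseline))
```

The box plot summary describes the spread of a period of measurements, while the control limits turn that spread into a rule for flagging later readings that differ from the baseline by more than ordinary process variation.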
General statistical data structure for epidemiologic studies of DOE workers
International Nuclear Information System (INIS)
Frome, E.L.; Hudson, D.R.
1981-01-01
Epidemiologic studies to evaluate the occupational risks associated with employment in the nuclear industry are currently being conducted by the Department of Energy. Data that have potential value in evaluating any long-term health effects of occupational exposure to low levels of radiation are obtained for each individual at a given facility. We propose a general data structure for statistical analysis that is used to define transformations from the data management system into the data analysis system. Statistical methods of interest in epidemiologic studies include contingency table analysis and survival analysis procedures that can be used to evaluate potential associations between occupational radiation exposure and mortality. The purposes of this paper are to discuss (1) the adequacy of this data structure for single- and multiple-facility analysis and (2) the statistical computing problems encountered in dealing with large populations over extended periods of time
DEFF Research Database (Denmark)
Lynge, E; Juel, K
1987-01-01
Completeness of company records is a crucial point in interpretation of follow-up studies of industrial cohorts. The Employers Quarterly Reports on Earnings (EQRE) are valuable for control of company records in the United States. The public Supplementary Pension Scheme (ATP) and the industrial...... statistics questionnaires are of equal value in Denmark. The authors report on a cohort study from a chemical plant, where cohort members were identified from company records, which were checked against ATP and industrial statistics data. The mortality analysis shows employees known only from the ATP data...... to have an excess mortality, relative risk (RR) = 1.45. Inclusion in the cohort of these additional employees only changed the relative risk in overall mortality from 1.01 to 1.04, but turned out to be decisive in the study of a rare disease....
Statistical analysis of network data with R
Kolaczyk, Eric D
2014-01-01
Networks have permeated everyday life through everyday realities like the Internet, social networks, and viral marketing. As such, network analysis is an important growth area in the quantitative sciences, with roots in social network analysis going back to the 1930s and graph theory going back centuries. Measurement and analysis are integral components of network research. As a result, statistical methods play a critical role in network analysis. This book is the first of its kind in network research. It can be used as a stand-alone resource in which multiple R packages are used to illustrate how to conduct a wide range of network analyses, from basic manipulation and visualization, to summary and characterization, to modeling of network data. The central package is igraph, which provides extensive capabilities for studying network graphs in R. This text builds on Eric D. Kolaczyk’s book Statistical Analysis of Network Data (Springer, 2009).
Using statistical correlation to compare geomagnetic data sets
Stanton, T.
2009-04-01
The major features of data curves are often matched, to a first order, by bump and wiggle matching to arrive at an offset between data sets. This poster describes a simple statistical correlation program that has proved useful during this stage by determining the optimal correlation between geomagnetic curves using a variety of fixed and floating windows. Its utility is suggested by the fact that it is simple to run, yet generates meaningful data comparisons, often when data noise precludes the obvious matching of curve features. Data sets can be scaled, smoothed, normalised and standardised, before all possible correlations are carried out between selected overlapping portions of each curve. Best-fit offset curves can then be displayed graphically. The program was used to cross-correlate directional and palaeointensity data from Holocene lake sediments (Stanton et al., submitted) and Holocene lava flows. Some example curve matches are shown, including some that illustrate the potential of this technique when examining particularly sparse data sets. Stanton, T., Snowball, I., Zillén, L. and Wastegård, S., submitted. Detecting potential errors in varve chronology and 14C ages using palaeosecular variation curves, lead pollution history and statistical correlation. Quaternary Geochronology.
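The windowed matching described in this abstract can be illustrated with a minimal sketch. It is not the author's program: the function names `pearson` and `best_offset`, the fixed-lag search, and the `min_overlap` parameter are assumptions made for illustration, using only the Python standard library.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def best_offset(a, b, max_lag, min_overlap=3):
    """Try every lag in [-max_lag, max_lag], pairing a[i] with b[i + lag],
    and return (lag, r) for the overlapping portion with the highest correlation."""
    best_lag, best_r = None, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xa, xb = a, b[lag:]
        else:
            xa, xb = a[-lag:], b
        n = min(len(xa), len(xb))
        if n < min_overlap:
            continue  # hypothetical guard: skip overlaps too short to be meaningful
        r = pearson(xa[:n], xb[:n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r

# Curve b is curve a delayed by two samples.
curve_a = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
curve_b = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0]
lag, r = best_offset(curve_a, curve_b, max_lag=4)
print(lag, round(r, 3))
```

Scaling, smoothing, or standardising the curves, as the abstract mentions, would be applied to the inputs before calling `best_offset`.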
Interpretation of self-potential data for dam seepage investigations
Energy Technology Data Exchange (ETDEWEB)
Corwin, R.F.; Sheffer, M.R.; Salmon, G. [BC Hydro, Burnaby, BC (Canada)
2007-04-15
This book represents one of a series on the subject of geophysical methods and their use in assessing seepage and internal erosion in embankment dams. This manual facilitates the interpretation of self-potential (SP) data generated by subsurface fluid flow, with an emphasis on dam seepage studies. It is intended for users with a background in geophysics or engineering having a general familiarity with both the SP and direct-current (DC) resistivity methods and their applications. It includes an extensive reference list covering all aspects of available SP interpretation techniques, including qualitative, analytical and numerical methods. Particular emphasis is placed on the use of geometric source analytical modeling methods to evaluate SP anomalies. These methods provide a simple yet efficient means of estimating the location and depth of current sources of observed SP data, which may be linked to fluid flow in the subsurface. The manual is primarily oriented toward embankment dams and earthen structures such as levees and dikes. SP methods have been used to investigate seepage through pervious zones and cracks in concrete and concrete-faced structures. The manual describes the nature of SP fields generated by both uniform and non-uniform dam seepage flow, as well as non-seepage sources of SP variations. These methods enable the study of more complex systems and require a more comprehensive analysis of a given field site. refs., tabs., figs.
Conversion factors and oil statistics
International Nuclear Information System (INIS)
Karbuz, Sohbet
2004-01-01
World oil statistics, in scope and accuracy, are often far from perfect. They can easily lead to misguided conclusions regarding the state of market fundamentals. Without proper attention directed at statistic caveats, the ensuing interpretation of oil market data opens the door to unnecessary volatility, and can distort perception of market fundamentals. Among the numerous caveats associated with the compilation of oil statistics, conversion factors, used to produce aggregated data, play a significant role. Interestingly enough, little attention is paid to conversion factors, i.e. to the relation between different units of measurement for oil. Additionally, the underlying information regarding the choice of a specific factor when trying to produce measurements of aggregated data remains scant. The aim of this paper is to shed some light on the impact of conversion factors for two commonly encountered issues, mass to volume equivalencies (barrels to tonnes) and for broad energy measures encountered in world oil statistics. This paper will seek to demonstrate how inappropriate and misused conversion factors can yield wildly varying results and ultimately distort oil statistics. Examples will show that while discrepancies in commonly used conversion factors may seem trivial, their impact on the assessment of a world oil balance is far from negligible. A unified and harmonised convention for conversion factors is necessary to achieve accurate comparisons and aggregate oil statistics for the benefit of both end-users and policy makers
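A small sketch can show why the mass-to-volume factor matters. The barrel volume is the standard definition (42 US gallons); the two crude densities are illustrative assumptions chosen for this example, not values taken from the paper.

```python
# One oil barrel is 42 US gallons = 158.987294928 litres.
BARREL_M3 = 0.158987294928

def barrels_to_tonnes(barrels, density_kg_m3):
    """Convert a volume in barrels to mass in tonnes for a given density."""
    return barrels * BARREL_M3 * density_kg_m3 / 1000.0

def barrels_per_tonne(density_kg_m3):
    """Conversion factor implied by a given density."""
    return 1000.0 / (BARREL_M3 * density_kg_m3)

# Hypothetical densities for a light and a heavy crude (kg/m^3).
for label, density in [("light", 820.0), ("heavy", 950.0)]:
    print(label,
          round(barrels_per_tonne(density), 2), "bbl/t;",
          round(barrels_to_tonnes(1_000_000, density)), "t per million bbl")
```

Under these assumed densities the same million barrels differs by roughly 20,000 tonnes, which is the kind of discrepancy that a single fixed factor applied to aggregated data would hide.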
Interpretation of TLD data measured in the vicinity of nuclear power plants
International Nuclear Information System (INIS)
Czarnecki, J.; Baggenstos, M.; Schuler, J.; Voelkle, H.
1981-01-01
It is shown that incorporating the location-specific characteristics of natural radiation into the interpretation of the surrounding measurements improves the measuring quality of thermoluminescent environmental dosimetry. This makes it possible to determine the net dose of additional man-made radiation (e.g. caused by the nuclear power plant) with better accuracy. The authors propose a method of analysing the measured results which enables one to include the measured data from the evidence finding phase in the interpretation of the environmental monitoring TLD measurements.
A log-Weibull spatial scan statistic for time to event data.
Usman, Iram; Rosychuk, Rhonda J
2018-06-13
Spatial scan statistics have been used for the identification of geographic clusters of elevated numbers of cases of a condition such as disease outbreaks. These statistics accompanied by the appropriate distribution can also identify geographic areas with either longer or shorter time to events. Other authors have proposed the spatial scan statistics based on the exponential and Weibull distributions. We propose the log-Weibull as an alternative distribution for the spatial scan statistic for time to events data and compare and contrast the log-Weibull and Weibull distributions through simulation studies. The effect of type I differential censoring and power have been investigated through simulated data. Methods are also illustrated on time to specialist visit data for discharged patients presenting to emergency departments for atrial fibrillation and flutter in Alberta during 2010-2011. We found northern regions of Alberta had longer times to specialist visit than other areas. We proposed the spatial scan statistic for the log-Weibull distribution as a new approach for detecting spatial clusters for time to event data. The simulation studies suggest that the test performs well for log-Weibull data.
Statistics and data analysis for financial engineering with R examples
Ruppert, David
2015-01-01
The new edition of this influential textbook, geared towards graduate or advanced undergraduate students, teaches the statistics necessary for financial engineering. In doing so, it illustrates concepts using financial markets and economic data, R Labs with real-data exercises, and graphical and analytic methods for modeling and diagnosing modeling errors. Financial engineers now have access to enormous quantities of data. To make use of these data, the powerful methods in this book, particularly about volatility and risks, are essential. Strengths of this fully-revised edition include major additions to the R code and the advanced topics covered. Individual chapters cover, among other topics, multivariate distributions, copulas, Bayesian computations, risk management, multivariate volatility and cointegration. Suggested prerequisites are basic knowledge of statistics and probability, matrices and linear algebra, and calculus. There is an appendix on probability, statistics and linear algebra. Practicing fina...
Statistical distributions as applied to environmental surveillance data
International Nuclear Information System (INIS)
Speer, D.R.; Waite, D.A.
1975-09-01
Application of normal, log normal, and Weibull distributions to environmental surveillance data was investigated for approximately 300 nuclide-medium-year-location combinations. Corresponding W test calculations were made to determine the probability of a particular data set falling within the distribution of interest. Conclusions are drawn as to the fit of any data group to the various distributions. The significance of fitting statistical distributions to the data is discussed
Congedo, Marco; Barachant, Alexandre
2015-01-01
Currently the Riemannian geometry of symmetric positive definite (SPD) matrices is gaining momentum as a powerful tool in a wide range of engineering applications such as image, radar and biomedical data signal processing. If the data are not natively represented in the form of SPD matrices, typically we may summarize them in such form by estimating covariance matrices of the data. However, once we manipulate such covariance matrices on the Riemannian manifold we lose the representation in the original data space. For instance, we can evaluate the geometric mean of a set of covariance matrices, but not the geometric mean of the data generating the covariance matrices, the space of interest in which the geometric mean can be interpreted. As a consequence, Riemannian information geometry is often perceived by non-experts as a "black-box" tool and this perception prevents a wider adoption in the scientific community. Here we show that we can overcome this limitation by constructing a special form of SPD matrix embedding both the covariance structure of the data and the data itself. Incidentally, whenever the original data can be represented in the form of a generic data matrix (not even square), this special SPD matrix enables an exhaustive and unique description of the data up to second-order statistics. This is achieved by embedding the covariance structure of both the rows and columns of the data matrix, which naturally allows a wide range of possible applications and goes beyond the interpretability issue alone. We demonstrate the method by manipulating satellite images (pansharpening) and event-related potentials (ERPs) of an electroencephalography brain-computer interface (BCI) study. The first example illustrates the effect of moving along geodesics in the original data space and the second provides a novel estimation of ERP average (geometric mean), showing that, in contrast to the usual arithmetic mean, this estimation is robust to outliers. In
Development of Screening Tools for the Interpretation of Chemical Biomonitoring Data
Directory of Open Access Journals (Sweden)
Richard A. Becker
2012-01-01
Evaluation of a larger number of chemicals in commerce from the perspective of potential human health risk has become a focus of attention in North America and Europe. Screening-level chemical risk assessment evaluations consider both exposure and hazard. Exposures are increasingly being evaluated through biomonitoring studies in humans. Interpreting human biomonitoring results requires comparison to toxicity guidance values. However, conventional chemical-specific risk assessments result in identification of toxicity-based exposure guidance values such as tolerable daily intakes (TDIs) as applied doses that cannot directly be used to evaluate exposure information provided by biomonitoring data in a health risk context. This paper describes a variety of approaches for development of screening-level exposure guidance values with translation from an external dose to a biomarker concentration framework for interpreting biomonitoring data in a risk context. Applications of tools and concepts including biomonitoring equivalents (BEs), the threshold of toxicologic concern (TTC), and generic toxicokinetic and physiologically based toxicokinetic models are described. These approaches employ varying levels of existing chemical-specific data, chemical class-specific assessments, and generic modeling tools in response to varying levels of available data in order to allow assessment and prioritization of chemical exposures for refined assessment in a risk management context.
Advances in Statistical Methods for Substance Abuse Prevention Research
MacKinnon, David P.; Lockwood, Chondra M.
2010-01-01
The paper describes advances in statistical methods for prevention research with a particular focus on substance abuse prevention. Standard analysis methods are extended to the typical research designs and characteristics of the data collected in prevention research. Prevention research often includes longitudinal measurement, clustering of data in units such as schools or clinics, missing data, and categorical as well as continuous outcome variables. Statistical methods to handle these features of prevention data are outlined. Developments in mediation, moderation, and implementation analysis allow for the extraction of more detailed information from a prevention study. Advancements in the interpretation of prevention research results include more widespread calculation of effect size and statistical power, the use of confidence intervals as well as hypothesis testing, detailed causal analysis of research findings, and meta-analysis. The increased availability of statistical software has contributed greatly to the use of new methods in prevention research. It is likely that the Internet will continue to stimulate the development and application of new methods. PMID:12940467
New Cosmological Model and Its Implications on Observational Data Interpretation
Directory of Open Access Journals (Sweden)
Vlahovic Branislav
2013-09-01
The paradigm of ΛCDM cosmology works impressively well and with the concept of inflation it explains the universe after the time of decoupling. However, there are still a few concerns: after much effort there is no detection of dark matter, and there are significant problems in the theoretical description of dark energy. We will consider a variant of the cosmological spherical shell model within the FRW formalism and compare it with the standard ΛCDM model. We will show that our new topological model satisfies cosmological principles and is consistent with all observable data, but that it may require a new interpretation of some data. We will consider the constraints imposed on the model by the supernovae luminosity distance and CMB data, for instance the range for the size and allowed thickness of the shell. In this model the propagation of light is confined along the shell, with the consequence that the observed CMB originated from one point or a limited region of space. This allows the uniformity of the CMB to be interpreted without an inflation scenario. In addition, it removes any constraints on the uniformity of the universe at the early stage and opens the possibility that the universe was not uniform and that the creation of galaxies and large structures is due to inhomogeneities that originated in the Big Bang.
Tips and Tricks for Successful Application of Statistical Methods to Biological Data.
Schlenker, Evelyn
2016-01-01
This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, and normally distributed, more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data, allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (randomized clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increases the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.
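The remark that relative risks and odds ratios share a 2 × 2 table but are calculated differently can be made concrete with a short sketch. The cell labels `a`, `b`, `c`, `d` and the counts are hypothetical, chosen only to illustrate the two formulas.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group.
    a, b = exposed with / without the outcome; c, d = unexposed with / without."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of the outcome among exposed divided by odds among unexposed."""
    return (a * d) / (b * c)

# Hypothetical table: 20 of 100 exposed and 10 of 100 unexposed have the outcome.
a, b, c, d = 20, 80, 10, 90
print("RR =", relative_risk(a, b, c, d))  # 0.20 / 0.10
print("OR =", odds_ratio(a, b, c, d))     # (20 * 90) / (80 * 10)
```

The two measures diverge as the outcome becomes more common. In a case-control design the row totals are fixed by the investigator, so only the odds ratio is directly estimable.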
Journal Data Sharing Policies and Statistical Reporting Inconsistencies in Psychology
Directory of Open Access Journals (Sweden)
Michèle B. Nuijten
2017-12-01
In this paper, we present three retrospective observational studies that investigate the relation between data sharing and statistical reporting inconsistencies. Previous research found that reluctance to share data was related to a higher prevalence of statistical errors, often in the direction of statistical significance (Wicherts, Bakker, & Molenaar, 2011). We therefore hypothesized that journal policies about data sharing and data sharing itself would reduce these inconsistencies. In Study 1, we compared the prevalence of reporting inconsistencies in two similar journals on decision making with different data sharing policies. In Study 2, we compared reporting inconsistencies in psychology articles published in PLOS journals (with a data sharing policy) and Frontiers in Psychology (without a stipulated data sharing policy). In Study 3, we looked at papers published in the journal Psychological Science to check whether papers with or without an Open Practice Badge differed in the prevalence of reporting errors. Overall, we found no relationship between data sharing and reporting inconsistencies. We did find that journal policies on data sharing seem extremely effective in promoting data sharing. We argue that open data is essential in improving the quality of psychological science, and we discuss ways to detect and reduce reporting inconsistencies in the literature.
Solar radiation data - statistical analysis and simulation models
Energy Technology Data Exchange (ETDEWEB)
Mustacchi, C; Cena, V; Rocchi, M; Haghigat, F
1984-01-01
The activities consisted of collecting meteorological data on magnetic tape for ten European locations (with latitudes ranging from 42° to 56° N), analysing the multi-year sequences, developing mathematical models to generate synthetic sequences having the same statistical properties as the original data sets, and producing one or more Short Reference Years (SRYs) for each location. The meteorological parameters examined were (for all the locations) global and diffuse radiation on a horizontal surface, dry bulb temperature, and sunshine duration. For some of the locations additional parameters were available, namely global, beam and diffuse radiation on surfaces other than horizontal, wet bulb temperature, wind velocity, cloud type, and cloud cover. The statistical properties investigated were mean, variance, autocorrelation, cross-correlation with selected parameters, and probability density function. For all the meteorological parameters, various mathematical models were built: linear regression and stochastic models of the AR and the DAR type. In each case, the model with the best statistical behaviour was selected for the production of an SRY for the relevant parameter/location.
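Generating a synthetic sequence with the same mean, variance, and autocorrelation as an observed one can be sketched with the simplest autoregressive model, AR(1). This is an illustrative assumption: the abstract does not state the model order used, and the function names `fit_ar1` and `simulate_ar1` are hypothetical.

```python
import random
from math import sqrt

def fit_ar1(series):
    """Estimate mean, (population) variance and lag-1 autocorrelation of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    ss = sum(d * d for d in dev)
    phi = sum(dev[t] * dev[t + 1] for t in range(n - 1)) / ss
    return mean, ss / n, phi

def simulate_ar1(mean, var, phi, n, seed=None):
    """Generate n values of a stationary Gaussian AR(1) process with the given
    mean, variance and lag-1 autocorrelation phi (|phi| < 1)."""
    rng = random.Random(seed)
    noise_sd = sqrt(var * (1.0 - phi * phi))  # innovation s.d. that preserves var
    x = rng.gauss(0.0, sqrt(var))             # start from the stationary distribution
    out = []
    for _ in range(n):
        out.append(mean + x)
        x = phi * x + rng.gauss(0.0, noise_sd)
    return out

observed = [1, 2, 3, 4, 5]                    # toy stand-in for a measured sequence
mean, var, phi = fit_ar1(observed)
synthetic = simulate_ar1(mean, var, phi, n=365, seed=1)
print(mean, var, phi, len(synthetic))
```

A Gaussian AR(1) reproduces only second-order properties. Matching a non-Gaussian probability density, as the study also examined, would need a transformation step or a different model class.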
Testing independence of bivariate interval-censored data using modified Kendall's tau statistic.
Kim, Yuneung; Lim, Johan; Park, DoHwan
2015-11-01
In this paper, we study a nonparametric procedure to test independence of bivariate interval-censored data, for both current status data (case 1 interval-censored data) and case 2 interval-censored data. To do so, we propose a score-based modification of the Kendall's tau statistic for bivariate interval-censored data. Our modification defines the Kendall's tau statistic with expected numbers of concordant and discordant pairs of data. The performance of the modified approach is illustrated by simulation studies and application to the AIDS study. We compare our method to alternative approaches such as the two-stage estimation method by Sun et al. (Scandinavian Journal of Statistics, 2006) and the multiple imputation method by Betensky and Finkelstein (Statistics in Medicine, 1999b).
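For orientation, the ordinary Kendall's tau that the authors modify can be sketched for fully observed data. The sketch below counts concordant and discordant pairs directly. It is not the score-based modification for interval-censored data, which replaces these counts with their expected values, and the function name `kendall_tau_a` is chosen here for illustration.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Six pairs in total: five concordant, one discordant (the middle two swap order).
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```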
Energy Technology Data Exchange (ETDEWEB)
Marcussen, C.; Skaarup, N.; Chalmers, J.A.
2002-07-01
Data acquisition of project NuussuaqSeis2000 (GEUS2000G survey) was very successful. Due to very favourable weather and ice conditions, 2743 km of good quality data were acquired, nearly 20% more than originally planned. High concentrations of icebergs prevented acquisition of data in the eastern part of Uummannaq Fjord and east of Svartenhuk Halvoe. Data from project NuussuaqSeis 2000 have therefore considerably increased the seismic coverage in the region. Interpretation of the new data has given information on the size and geometry of the individual fault blocks in areas with Cretaceous sediments. Furthermore the new data confirm that the seeps in the Vaigat region are structurally controlled and mainly occur in the block-faulted area north of the Disko gneiss ridge. Data from the GEUS2000G survey confirm the overall structural interpretation published in Chalmers et al. (1999). The interpretation of the structural style in western Vaigat is confirmed as being substantially correct, although the fault patterns have been shown to be much more complex than realised previously, from either the onshore data or from the single line GGU/95-06. The main area of revision has been in eastern Vaigat, but even there the essentials of the Chalmers et al. (1999) interpretation remain. In the area north of Nuussuaq in Uummannaq Fjord and Illorssuit Sund the fault trend is more N-S than in Vaigat, but here too the complexity of the fault pattern prohibits correlation over long distances. The general fault pattern from the structural interpretation of Chalmers et al. (1999) remains valid in this area as well. Some of the data acquired under project NuussuaqSeis2000 are highly relevant for an assessment of the petroleum prospectivity of the Disko-Nuussuaq region, and the data and their interpretation will be included in the next revised GEUS Note to the Bureau of Minerals and Petroleum on this matter. The new data and the interpretations are particularly important for
78 FR 10166 - Access Interpreting; Transfer of Data
2013-02-13
... regulations. Access Interpreting has been awarded a contract to perform work for OPP, and access to this information will enable Access Interpreting to fulfill the obligations of the contract. DATES: Access.... Contractor Requirements Under Contract No. EP10H000109, this contract is to provide the Environmental...
Analysis of statistical misconception in terms of statistical reasoning
Maryati, I.; Priatna, N.
2018-05-01
Reasoning skills are needed by everyone in the era of globalization, because every person has to be able to manage and use information that can easily be obtained from all over the world. Statistical reasoning skill is the ability to collect, group, process, interpret, and draw conclusions from information. This skill can be developed at various levels of education. However, the skill remains low because many people, including students, assume that statistics is merely the ability to count and use formulas. Students also still have a negative attitude toward courses related to research. The purpose of this research is to analyze students' misconceptions in a descriptive statistics course in relation to statistical reasoning skill. The observation was done by analyzing the results of the misconception test and the statistical reasoning skill test, and by observing the effect of students' misconceptions on statistical reasoning skill. The sample consisted of 32 students of a mathematics education department who had taken a descriptive statistics course. The mean score on the misconception test was 49.7 with a standard deviation of 10.6, whereas the mean score on the statistical reasoning skill test was 51.8 with a standard deviation of 8.5. If a minimum score of 65 is taken as the standard of achievement for course competence, the students' mean scores fall below the standard. The results of the misconception study highlight the subtopics that need attention. Based on the assessment results, students' misconceptions occur in: 1) writing mathematical sentences and symbols correctly, 2) understanding basic definitions, and 3) determining the concept to be used in solving a problem. Statistical reasoning skill was assessed by measuring reasoning about: 1) data, 2) representation, 3) statistic format, 4) probability, 5) sample, and 6) association.
Analyzing sickness absence with statistical models for survival data
DEFF Research Database (Denmark)
Christensen, Karl Bang; Andersen, Per Kragh; Smith-Hansen, Lars
2007-01-01
OBJECTIVES: Sickness absence is the outcome in many epidemiologic studies and is often based on summary measures such as the number of sickness absences per year. In this study the use of modern statistical methods was examined by making better use of the available information. Since sickness...... absence data deal with events occurring over time, the use of statistical models for survival data has been reviewed, and the use of frailty models has been proposed for the analysis of such data. METHODS: Three methods for analyzing data on sickness absences were compared using a simulation study...... involving the following: (i) Poisson regression using a single outcome variable (number of sickness absences), (ii) analysis of time to first event using the Cox proportional hazards model, and (iii) frailty models, which are random effects proportional hazards models. Data from a study of the relation...
Laterally constrained inversion for CSAMT data interpretation
Wang, Ruo; Yin, Changchun; Wang, Miaoyue; Di, Qingyun
2015-10-01
Laterally constrained inversion (LCI) has been successfully applied to the inversion of dc resistivity, TEM and airborne EM data. However, it has not yet been applied to the interpretation of controlled-source audio-frequency magnetotelluric (CSAMT) data. In this paper, we apply the LCI method to CSAMT data inversion by preconditioning the Jacobian matrix. We apply a weighting matrix to the Jacobian to balance the sensitivity of model parameters, so that the resolution with respect to different model parameters becomes more uniform. Numerical experiments confirm that this can improve the convergence of the inversion. We first invert a synthetic dataset with and without noise to investigate the effect of applying LCI to CSAMT data. For the noise-free data, the results show that the LCI method recovers the true model better than the traditional single-station inversion. For the noisy data, the true model is recovered even with a noise level of 8%, indicating that LCI inversions are to some extent insensitive to noise. Then, we re-invert two CSAMT datasets collected in a watershed and a coal mine area in Northern China, respectively, and compare our results with those from previous inversions. The comparison with the previous inversion in the coal mine shows that the LCI method delivers smoother layer interfaces that correlate well with seismic data, while the comparison with a global search algorithm, simulated annealing (SA), in the watershed shows that although both methods deliver very similar results, the LCI algorithm presented in this paper runs much faster. The inversion results for the coal mine CSAMT survey show that the LCI identifies a conductive water-bearing zone that was not revealed by the previous inversions. This further demonstrates that the method presented in this paper works for CSAMT data inversion.
Statistical yearbook 2005. Data available as of March 2006. 50 ed
International Nuclear Information System (INIS)
2006-08-01
The Statistical Yearbook is an annual compilation of a wide range of international economic, social and environmental statistics on over 200 countries and areas, compiled from sources including UN agencies and other international, national and specialized organizations. The 50th issue contains data available to the Statistics Division as of March 2006 and presents them in 76 tables. The number of years of data shown in the tables varies from one to ten, with the ten-year tables covering 1994 to 2003 or 1995 to 2004. Accompanying the tables are technical notes providing brief descriptions of major statistical concepts, definitions and classifications
Performing Inferential Statistics Prior to Data Collection
Trafimow, David; MacDonald, Justin A.
2017-01-01
Typically, in education and psychology research, the investigator collects data and subsequently performs descriptive and inferential statistics. For example, a researcher might compute group means and use the null hypothesis significance testing procedure to draw conclusions about the populations from which the groups were drawn. We propose an…
Statistical yearbook. 2000. Data available as of 31 January 2003. 47 ed
International Nuclear Information System (INIS)
2003-01-01
This is the forty-seventh issue of the United Nations Statistical Yearbook, prepared by the Statistics Division, Department of Economic and Social Affairs of the United Nations Secretariat, since 1948. The present issue contains series covering, in general, 1989-1998 or 1990-1999, using statistics available to the Statistics Division up to 30 November 2000. The Yearbook is based on data compiled by the Statistics Division from over 40 different international and national sources. These include the United Nations Statistics Division in the fields of national accounts, industry, energy, transport and international trade; the United Nations Statistics Division and Population Division in the field of demographic statistics; and data provided by over 20 offices of the United Nations system and international organizations in other specialized fields. United Nations agencies and other international organizations which furnished data are listed under 'Statistical sources and references' at the end of the Yearbook. Acknowledgement is gratefully made for their generous cooperation in providing data. The Statistics Division also publishes the Monthly Bulletin of Statistics, which provides a valuable complement to the Yearbook covering current international economic statistics for most countries and areas of the world and quarterly world and regional aggregates. Subscribers to the Monthly Bulletin of Statistics may also access the Bulletin on-line via the World Wide Web on the Internet. MBS On-line allows time-sensitive statistics to reach users much faster than the traditional print publication. For further information see . The present issue of the Yearbook reflects a phased programme of major changes in its organization and presentation undertaken in 1990, which until then had been relatively unchanged since the first issue was released in 1948. The Yearbook has also been published on CD-ROM for IBM-compatible microcomputers since the thirty-eighth issue
Consequences of Not Interpreting Structure Coefficients in Published CFA Research: A Reminder
Graham, James M.; Guthrie, Abbie C.; Thompson, Bruce
2003-01-01
Confirmatory factor analysis (CFA) is a statistical procedure frequently used to test the fit of data to measurement models. Published CFA studies typically report factor pattern coefficients. Few reports, however, also present factor structure coefficients, which can be essential for the accurate interpretation of CFA results. The interpretation…
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
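As a rough illustration of the K-M idea for left-censored data described above, the following Python sketch estimates P(X < x) at each distinct detected value. It is not the authors' S-language software; the function name `km_left_censored` and the sample concentrations are hypothetical, and a nondetect is assumed to be recorded as its detection limit with a censoring flag.

```python
def km_left_censored(values, censored):
    """Kaplan-Meier estimate of P(X < x) at each distinct detected value.

    values[i] is a detected concentration, or the detection limit when
    censored[i] is True (the true value lies somewhere below that limit).
    """
    # Distinct detected values, processed from largest to smallest
    detected = sorted({v for v, c in zip(values, censored) if not c},
                      reverse=True)
    estimate = []
    prob_below = 1.0
    for x in detected:
        # "At risk": every observation known to be at or below x,
        # including nondetects whose detection limit is <= x
        at_risk = sum(1 for v in values if v <= x)
        # "Events": detected observations exactly equal to x
        events = sum(1 for v, c in zip(values, censored) if not c and v == x)
        prob_below *= 1.0 - events / at_risk
        estimate.append((x, prob_below))
    return estimate


# Hypothetical sample with two detection limits (<1 and <5)
concentrations = [1, 1, 2, 3, 5, 4, 6]
is_nondetect = [True, True, False, False, True, False, False]
for x, p in km_left_censored(concentrations, is_nondetect):
    print(f"P(X < {x}) = {p:.3f}")
```

The estimate is a step function defined only at observed detected values, which matches the abstract's point that K-M methods neither extrapolate nor interpolate.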
Statistical analysis of quality control of automatic processor
International Nuclear Information System (INIS)
Niu Yantao; Zhao Lei; Zhang Wei; Yan Shulin
2002-01-01
Objective: To strengthen the scientific management of automatic processors and promote QC by statistically analyzing the QC management chart for an automatic processor and by evaluating and interpreting the data and trends in the chart. Method: The speed, contrast, and minimum density of the step wedge on a film strip were measured every day and recorded on the QC chart. The mean (x-bar), standard deviation (s), and range (R) were calculated. The data and the working trend were evaluated and interpreted for management decisions. Results: From the relative frequency distribution curve constructed from the measured data, the authors can judge whether or not it is a symmetric bell-shaped curve. If it is not, a few extreme values beyond the control limits may be pulling the curve to the left or right. If the distribution is normal, the standard deviation (s) is examined. When x-bar ± 2s lies within the upper and lower control limits of the relative performance indexes, the processor was working in a stable state during that period. Conclusion: Guided by statistical methods, QC work becomes more scientific and quantitative. The authors can deepen their understanding and application of the trend chart and raise quality management to a new level
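The stability check described in the Results can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `qc_summary`, the daily readings, and the control limits are all hypothetical, and the sample standard deviation is assumed for s.

```python
import statistics


def qc_summary(readings, lower_limit, upper_limit):
    """Summarise QC readings and test whether x-bar +/- 2s stays in limits."""
    mean = statistics.mean(readings)
    s = statistics.stdev(readings)        # sample standard deviation (assumed)
    r = max(readings) - min(readings)     # range R
    stable = lower_limit <= mean - 2 * s and mean + 2 * s <= upper_limit
    return {"mean": mean, "s": s, "range": r, "stable": stable}


# Hypothetical daily speed-index readings and control limits
daily = [1.20, 1.22, 1.19, 1.21, 1.18, 1.20, 1.23]
print(qc_summary(daily, lower_limit=1.05, upper_limit=1.35))
```

A tighter lower limit (for example 1.19 with the same readings) would make the check fail, because x-bar - 2s then falls below the limit.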
Citizen Data and Official Statistics: Background Document to a Collaborative Workshop
DEFF Research Database (Denmark)
Grommé, Francisca; Ustek, Funda; Ruppert, Evelyn
2017-01-01
This working paper was written in preparation for a collaborative workshop organised for statisticians, social scientists, information and app designers and other participants inside and outside academia. The autumn 2017 workshop aimed to develop the main principles for a citizen data app...... for official statistics. Through this work we sought to conceive of a new regime of data collection in official statistics through different devices. How can we capture citizens’ meanings and intentions when they produce data? Can we develop ‘smart’ methods that do not rely on cooperating with, and data...... generated by, large tech companies, but by developing methods and data co-produced with citizens? Towards addressing these issues we developed four key concepts outlined in this document: experimentalism, citizen data, smart statistics and privacy by design. We introduced these concepts to facilitate shared...
Combined interpretation of SkyTEM and high-resolution seismic data
DEFF Research Database (Denmark)
Høyer, Anne-Sophie; Lykke-Andersen, Holger; Jørgensen, Flemming Voldum
2011-01-01
made based on AEM (SkyTEM) and high-resolution seismic data from an area covering 10 km2 in the western part of Denmark. As support for the interpretations, an exploration well was drilled to provide lithological and logging information in the form of resistivity and vertical seismic profiling. Based...... on the resistivity log, synthetic SkyTEM responses were calculated with a varying number of gate-times in order to illustrate the effect of the noise-level. At the exploration well geophysical data were compared to the lithological log; in general there is good agreement. The same tendency was recognised when Sky...
Gönci, Balázs; Németh, Valéria; Balogh, Emeric; Szabó, Bálint; Dénes, Ádám; Környei, Zsuzsanna; Vicsek, Tamás
2010-12-20
Because of its relevance to everyday life, the spreading of viral infections has been of central interest in a variety of scientific communities involved in fighting, preventing and theoretically interpreting epidemic processes. Recent large scale observations have resulted in major discoveries concerning the overall features of the spreading process in systems with highly mobile susceptible units, but virtually no data are available about observations of infection spreading for a very large number of immobile units. Here we present the first detailed quantitative documentation of percolation-type viral epidemics in a highly reproducible in vitro system consisting of tens of thousands of virtually motionless cells. We use a confluent astroglial monolayer in a Petri dish and induce productive infection in a limited number of cells with a genetically modified herpesvirus strain. This approach allows extreme high resolution tracking of the spatio-temporal development of the epidemic. We show that a simple model is capable of reproducing the basic features of our observations, i.e., the observed behaviour is likely to be applicable to many different kinds of systems. Statistical physics inspired approaches to our data, such as fractal dimension of the infected clusters as well as their size distribution, seem to fit into a percolation theory based interpretation. We suggest that our observations may be used to model epidemics in more complex systems, which are difficult to study in isolation.
Measurement of Osteogenic Exercise – How to Interpret Accelerometric Data?
Jämsä, Timo; Ahola, Riikka; Korpelainen, Raija
2011-01-01
Bone tissue adapts to its mechanical loading environment. We review here the accelerometric measurements with special emphasis on osteogenic exercise. The accelerometric method offers a unique opportunity to assess the intensity of mechanical loadings. We present methods to interpret accelerometric data, reducing it to the daily distributions of magnitude, slope, area, and energy of signal. These features represent the intensity level of physical activities, and were associated with the chang...
International Nuclear Information System (INIS)
Seeliger, D.
1993-01-01
This contribution contains a brief presentation and comparison of the different Statistical Multistep Approaches, presently available for practical nuclear data calculations. (author). 46 refs, 5 figs
Savalei, Victoria
2010-01-01
Incomplete nonnormal data are common occurrences in applied research. Although these 2 problems are often dealt with separately by methodologists, they often cooccur. Very little has been written about statistics appropriate for evaluating models with such data. This article extends several existing statistics for complete nonnormal data to…
Nuclear material statistical accountancy system
International Nuclear Information System (INIS)
Argentest, F.; Casilli, T.; Franklin, M.
1979-01-01
The statistical accountancy system developed at JRC Ispra is referred to as 'NUMSAS', i.e., Nuclear Material Statistical Accountancy System. The principal feature of NUMSAS is that, in addition to an ordinary material balance calculation, it can calculate an estimate of the standard deviation of the measurement error accumulated in the material balance calculation. The purpose of the report is to describe in detail the statistical model on which the standard deviation calculation is based, the computational formula which is used by NUMSAS in calculating the standard deviation, and the information about nuclear material measurements and the plant measurement system which is required as data for NUMSAS. The material balance records require processing and interpretation before the material balance calculation is begun. The material balance calculation is the last of four phases of data processing undertaken by NUMSAS. Each of these phases is implemented by a different computer program. The activities which are carried out in each phase can be summarised as follows: the pre-processing phase; the selection and up-date phase; the transformation phase; and the computation phase
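The abstract does not give the NUMSAS formula, so the following Python sketch is only a generic illustration of how a measurement-error standard deviation can accumulate in a material balance. It assumes every measurement error is independent (a real accountancy model would typically also handle correlated, systematic errors); the function name, the quantities, and the kilogram units are hypothetical.

```python
import math


def material_balance(beginning, receipts, shipments, ending):
    """Return (balance, standard deviation) from (value, sigma) pairs.

    Assumption: all measurement errors are independent, so variances add.
    """
    def total(items):
        return sum(value for value, _ in items)

    def variance(items):
        return sum(sigma ** 2 for _, sigma in items)

    balance = (total(beginning) + total(receipts)
               - total(shipments) - total(ending))
    sd = math.sqrt(variance(beginning) + variance(receipts)
                   + variance(shipments) + variance(ending))
    return balance, sd


# Hypothetical measurements in kg: (measured value, measurement sigma)
balance, sd = material_balance(
    beginning=[(100.0, 0.1)],
    receipts=[(50.0, 0.2)],
    shipments=[(40.0, 0.2)],
    ending=[(109.0, 0.4)],
)
print(balance, sd)
```

Comparing the balance with its standard deviation gives a rough sense of whether an apparent discrepancy is larger than measurement error alone would explain.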
International Nuclear Information System (INIS)
Denham, D.H.; Kathren, R.L.
1989-02-01
Current reductions in "allowable" levels of radiation and radioactive materials in the environment and an increased public awareness of naturally occurring radioactive materials have reinforced the need for consistency in evaluating the radiological environment. A key concern is the identification and interpretation of environmental levels of radiation and radioactive materials resulting from nuclear facility operations. If these levels can be detected and their source(s) identified, then corrective actions can be taken to eliminate or greatly reduce the environmental impacts of the facility operations. In this paper we address the lack of definitive guidance necessary to determine incremental levels of significance (or insignificance), and we propose a series of protocols to achieve more consistent collection and interpretation of radiological environmental data. 8 refs
Using Data Mining to Teach Applied Statistics and Correlation
Hartnett, Jessica L.
2016-01-01
This article describes two class activities that introduce the concept of data mining and very basic data mining analyses. Assessment data suggest that students learned some of the conceptual basics of data mining, understood some of the ethical concerns related to the practice, and were able to perform correlations via the Statistical Package for…
Use of keyword hierarchies to interpret gene expression patterns.
Masys, D R; Welsh, J B; Lynn Fink, J; Gribskov, M; Klacansky, I; Corbeil, J
2001-04-01
High-density microarray technology permits the quantitative and simultaneous monitoring of thousands of genes. The interpretation challenge is to extract relevant information from this large amount of data. A growing variety of statistical analysis approaches are available to identify clusters of genes that share common expression characteristics, but provide no information regarding the biological similarities of genes within clusters. The published literature provides a potential source of information to assist in interpretation of clustering results. We describe a data mining method that uses indexing terms ('keywords') from the published literature linked to specific genes to present a view of the conceptual similarity of genes within a cluster or group of interest. The method takes advantage of the hierarchical nature of Medical Subject Headings used to index citations in the MEDLINE database, and the registry numbers applied to enzymes.
LSD Dimensions: Use and Reuse of Linked Statistical Data
Meroño-Peñuela, Albert
2014-01-01
RDF Data Cube (QB) has boosted the publication of Linked Statistical Data (LSD) on the Web, making them linkable to other related datasets and concepts following the Linked Data paradigm. In this demo we present LSD Dimensions, a web based application that monitors the usage of dimensions and codes
Interpretation of environmental isotopic groundwater data. Arid and semi-arid zones
International Nuclear Information System (INIS)
Geyh, M.A.
1980-01-01
Various hydrodynamic aspects are discussed in order to show their implication for the hydrogeological interpretation of environmental isotope and hydrochemical groundwater data. Special attention is drawn to radiocarbon and tritium studies carried out in arid and semi-arid zones. An exponential model has been utilized to determine the mean residence time of the long-term water from springs in karst and crystalline regions. Hydrogeological parameters such as the porosity can be checked by this result. In addition, the exponential model offers the possibility of determining the initial 14C content of spring water, which is sensitively dependent on the soil of the recharge area. A base-flow model has been introduced to interpret the 14C and 3H data of groundwater samples from older karst regions. Differences between pumped and drawn samples exist with respect to the groundwater budget. Owing to pumping, the old base flow is accelerated and becomes enriched in pumped groundwater in comparison to the short-term water. Radiocarbon ages of groundwater in alluvium may be dubious because of isotope exchange with the CO2 in the root zone along the river bank. Under confined conditions 14C groundwater ages are diminished if the hydraulic head of the confined aquifer is lower than that of the shallow one. This is due to the downward transport of radiocarbon by convection of shallow groundwater. The same effect occurs, though much faster, if the groundwater table is depleted by groundwater withdrawal. The decrease of the radiocarbon groundwater ages in time can be used to determine the hydraulic transmissibility coefficient of the aquitard. According to the practical and theoretical results obtained, the hydrodynamic aspects require at least the same attention for the interpretation of environmental isotope and hydrochemical data of groundwater as do hydrochemical and isotope fractionation processes. (author)
Directory of Open Access Journals (Sweden)
I. M. Ulbrich
2009-05-01
The organic aerosol (OA) dataset from an Aerodyne Aerosol Mass Spectrometer (Q-AMS) collected at the Pittsburgh Air Quality Study (PAQS) in September 2002 was analyzed with Positive Matrix Factorization (PMF). Three components – hydrocarbon-like organic aerosol OA (HOA), a highly-oxygenated OA (OOA-1) that correlates well with sulfate, and a less-oxygenated, semi-volatile OA (OOA-2) that correlates well with nitrate and chloride – are identified and interpreted as primary combustion emissions, aged SOA, and semivolatile, less aged SOA, respectively. The complexity of interpreting the PMF solutions of unit mass resolution (UMR) AMS data is illustrated by a detailed analysis of the solutions as a function of number of components and rotational forcing. A public web-based database of AMS spectra has been created to aid this type of analysis. Realistic synthetic data is also used to characterize the behavior of PMF for choosing the best number of factors, and evaluating the rotations of non-unique solutions. The ambient and synthetic data indicate that the variation of the PMF quality of fit parameter (Q, a normalized chi-squared metric) vs. number of factors in the solution is useful to identify the minimum number of factors, but more detailed analysis and interpretation are needed to choose the best number of factors. The maximum value of the rotational matrix is not useful for determining the best number of factors. In synthetic datasets, factors are "split" into two or more components when solving for more factors than were used in the input. Elements of the "splitting" behavior are observed in solutions of real datasets with several factors. Significant structure remains in the residual of the real dataset after physically-meaningful factors have been assigned and an unrealistic number of factors would be required to explain the remaining variance. This residual structure appears to be due to variability in the spectra of the components
Carter, Jackie; Noble, Susan; Russell, Andrew; Swanson, Eric
2011-01-01
Increasing volumes of statistical data are being made available on the open web, including from the World Bank. This "data deluge" provides both opportunities and challenges. Good use of these data requires statistical literacy. This paper presents results from a project that set out to better understand how socioeconomic secondary data…
Application of Multivariable Statistical Techniques in Plant-wide WWTP Control Strategies Analysis
DEFF Research Database (Denmark)
Flores Alsina, Xavier; Comas, J.; Rodríguez-Roda, I.
2007-01-01
The main objective of this paper is to present the application of selected multivariable statistical techniques in plant-wide wastewater treatment plant (WWTP) control strategies analysis. In this study, cluster analysis (CA), principal component analysis/factor analysis (PCA/FA) and discriminant...... analysis (DA) are applied to the evaluation matrix data set obtained by simulation of several control strategies applied to the plant-wide IWA Benchmark Simulation Model No 2 (BSM2). These techniques allow i) to determine natural groups or clusters of control strategies with a similar behaviour, ii......) to find and interpret hidden, complex and casual relation features in the data set and iii) to identify important discriminant variables within the groups found by the cluster analysis. This study illustrates the usefulness of multivariable statistical techniques for both analysis and interpretation...
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
Statistical summaries of selected Iowa streamflow data through September 2013
Eash, David A.; O'Shea, Padraic S.; Weber, Jared R.; Nguyen, Kevin T.; Montgomery, Nicholas L.; Simonson, Adrian J.
2016-01-04
Statistical summaries of streamflow data collected at 184 streamgages in Iowa are presented in this report. All streamgages included for analysis have at least 10 years of continuous record collected before or through September 2013. This report is an update to two previously published reports that presented statistical summaries of selected Iowa streamflow data through September 1988 and September 1996. The statistical summaries include (1) monthly and annual flow durations, (2) annual exceedance probabilities of instantaneous peak discharges (flood frequencies), (3) annual exceedance probabilities of high discharges, and (4) annual nonexceedance probabilities of low discharges and seasonal low discharges. Also presented for each streamgage are graphs of the annual mean discharges, mean annual mean discharges, 50-percent annual flow-duration discharges (median flows), harmonic mean flows, mean daily mean discharges, and flow-duration curves. Two sets of statistical summaries are presented for each streamgage, which include (1) long-term statistics for the entire period of streamflow record and (2) recent-term statistics for or during the 30-year period of record from 1984 to 2013. The recent-term statistics are only calculated for streamgages with streamflow records pre-dating the 1984 water year and with at least 10 years of record during 1984–2013. The streamflow statistics in this report are not adjusted for the effects of water use; although some of this water is used consumptively, most of it is returned to the streams.
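To make a few of the summary statistics named above concrete, here is a small Python sketch of a flow-duration table, a median flow, and a harmonic mean flow. The report's actual computational methods are not given in the abstract; the Weibull plotting position rank/(n + 1) used here is an assumption, and the discharge values are hypothetical.

```python
import statistics


def flow_duration(flows):
    """Return (discharge, exceedance probability) pairs, largest flow first.

    Assumption: Weibull plotting position, rank / (n + 1). Tied discharges
    receive consecutive ranks rather than a shared probability.
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(q, rank / (n + 1)) for rank, q in enumerate(ranked, start=1)]


# Hypothetical daily mean discharges (cubic feet per second)
flows = [12.0, 30.0, 8.0, 20.0]
for q, p in flow_duration(flows):
    print(f"{q:6.1f} equalled or exceeded about {100 * p:.0f}% of the time")

print("median flow:", statistics.median(flows))
print("harmonic mean flow:", statistics.harmonic_mean(flows))
```

The harmonic mean is pulled toward the low flows, which is why it is often reported alongside the arithmetic mean for low-flow characterisation.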
Parker, Loran Carleton; Gleichsner, Alyssa M.; Adedokun, Omolola A.; Forney, James
2016-01-01
Transformation of research in all biological fields necessitates the design, analysis, and interpretation of large data sets. Preparing students with the requisite skills in experimental design, statistical analysis and interpretation, and mathematical reasoning will require both curricular reform and faculty who are willing and able to integrate…
The Blackboard Model of Computer Programming Applied to the Interpretation of Passive Sonar Data
National Research Council Canada - National Science Library
Liebing, David
1997-01-01
... (location, course, speed, classification, etc.). At present the potential volume of data produced by modern sonar systems is so large that unless some form of computer assistance is provided with the interpretation of this data, information...
Nevedrova, N. N.; Pospeeva, E. V.; Sanchaa, A. M.
2011-01-01
A procedure for the simultaneous interpretation of magnetotelluric and near-field transient electromagnetic sounding (MTS and NF TEMS, respectively) data is proposed. The advantages of the complex interpretation are demonstrated by specific examples. In accordance with the interpretation of the field data, geoelectrical sections of the lithosphere in the western part of the Chuya Depression are constructed. A reduction in the depth to the conductive crustal layer in the epicentral zone is found, and the geoelectrical boundary in the upper part of the paleozoic basement is revealed.
Tools to support interpreting multiple regression in the face of multicollinearity.
Kraha, Amanda; Turner, Heather; Nimon, Kim; Zientek, Linda Reichwein; Henson, Robin K
2012-01-01
While multicollinearity may increase the difficulty of interpreting multiple regression (MR) results, it should not cause undue problems for the knowledgeable researcher. In the current paper, we argue that rather than using one technique to investigate regression results, researchers should consider multiple indices to understand the contributions that predictors make not only to a regression model, but to each other as well. Some of the techniques to interpret MR effects include, but are not limited to, correlation coefficients, beta weights, structure coefficients, all possible subsets regression, commonality coefficients, dominance weights, and relative importance weights. This article will review a set of techniques to interpret MR effects, identify the elements of the data on which the methods focus, and identify statistical software to support such analyses.
Qualitative Data Analysis and Interpretation in Counseling Psychology: Strategies for Best Practices
Yeh, Christine J.; Inman, Arpana G.
2007-01-01
This article presents an overview of various strategies and methods of engaging in qualitative data interpretations and analyses in counseling psychology. The authors explore the themes of self, culture, collaboration, circularity, trustworthiness, and evidence deconstruction from multiple qualitative methodologies. Commonalities and differences…
Outpatient health care statistics data warehouse--implementation.
Zilli, D
1999-01-01
Data warehouse implementation is assumed to be a very knowledge-demanding, expensive and long-lasting process. As such it requires senior management sponsorship, involvement of experts, a big budget and probably years of development time. The Outpatient Health Care Statistics Data Warehouse implementation research presented here provides ample evidence against the infallibility of the above statements. New, inexpensive, but powerful technology, which provides an outstanding platform for On-Line Analytical Processing (OLAP), has emerged recently. Presumably, it will be the basis for the estimated future growth of the data warehouse market, both in the medical and in other business fields. Methods and tools for building, maintaining and exploiting data warehouses are also briefly discussed in the paper.
42 CFR 417.806 - Financial records, statistical data, and cost finding.
2010-10-01
... 42 Public Health 3 2010-10-01 2010-10-01 false Financial records, statistical data, and cost... MEDICAL PLANS, AND HEALTH CARE PREPAYMENT PLANS Health Care Prepayment Plans § 417.806 Financial records, statistical data, and cost finding. (a) The principles specified in § 417.568 apply to HCPPs, except those in...
Common errors in statistics (and how to avoid them)
Good, Phillip I
2012-01-01
The Fourth Edition of this tried-and-true book elaborates on many key topics such as epidemiological studies, distribution of data; baseline data incorporation; case control studies; simulations; statistical theory publication; biplots; instrumental variables; ecological regression; result reporting, survival analysis; etc. Including new modifications and figures, the book also covers such topics as research plan creation; data collection; hypothesis formulation and testing; coefficient estimates; sample size specifications; assumption checking; p-values interpretations and confidence interval
Calkins, D. S.
1998-01-01
When the dependent (or response) variable in an experiment has direction and magnitude, one approach that has been used for statistical analysis involves splitting magnitude and direction and applying univariate statistical techniques to the components. However, such treatment of quantities with direction and magnitude is not justifiable mathematically and can lead to incorrect conclusions about relationships among variables and, as a result, to flawed interpretations. This note discusses a problem with that practice and recommends mathematically correct procedures to be used with dependent variables that have direction and magnitude for 1) computation of mean values, 2) statistical contrasts of and confidence intervals for means, and 3) correlation methods.
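The abstract does not spell out the recommended procedures, but the problem with averaging magnitude and direction separately can be shown with a short Python sketch. It assumes the usual approach of averaging Cartesian components; the function name `vector_mean` and the two sample vectors are hypothetical.

```python
import math


def vector_mean(magnitudes, directions_deg):
    """Mean of 2-D vectors given as magnitudes and directions in degrees.

    The mean is taken over Cartesian components, then converted back to
    a magnitude and a direction in [0, 360).
    """
    n = len(magnitudes)
    mean_x = sum(m * math.cos(math.radians(d))
                 for m, d in zip(magnitudes, directions_deg)) / n
    mean_y = sum(m * math.sin(math.radians(d))
                 for m, d in zip(magnitudes, directions_deg)) / n
    magnitude = math.hypot(mean_x, mean_y)
    direction = math.degrees(math.atan2(mean_y, mean_x)) % 360.0
    return magnitude, direction


# Two unit vectors pointing roughly the same way, either side of 0 degrees
mags, dirs = [1.0, 1.0], [350.0, 30.0]
print("component-wise mean:", vector_mean(mags, dirs))
print("naive mean direction:", sum(dirs) / len(dirs))
```

The naive average of the two directions is 190 degrees, almost opposite to both vectors, whereas the component-wise mean points at about 10 degrees with a magnitude slightly below 1.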
Statistical analysis and interpolation of compositional data in materials science.
Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M
2015-02-09
Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
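The abstract does not say which specialized tools the authors use. As one standard illustration from compositional data analysis, the Python sketch below shows closure to a constant sum, the centered log-ratio (clr) transform, and the fact that a pairwise log-ratio is unchanged when a subcomposition is re-closed. All function names and the three-part composition are hypothetical.

```python
import math


def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def log_ratio(a, b):
    """Log-ratio of two parts; unaffected by rescaling both parts."""
    return math.log(a / b)


# Hypothetical three-element composition (atomic fractions)
full = closure([2.0, 3.0, 5.0])
sub = closure(full[:2])   # subcomposition: third element not measured

print(clr(full))                                   # sums to ~0
print(log_ratio(full[0], full[1]), log_ratio(sub[0], sub[1]))
```

The raw fractions of the first two elements change when the third is dropped and the data are re-closed, but their log-ratio does not, which is the kind of invariance the abstract asks of a physically meaningful analysis.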
Statistical Reform in School Psychology Research: A Synthesis
Swaminathan, Hariharan; Rogers, H. Jane
2007-01-01
Statistical reform in school psychology research is discussed in terms of research designs, measurement issues, statistical modeling and analysis procedures, interpretation and reporting of statistical results, and finally statistics education.
Interpretation of seismic reflection data, Central Palo Duro Basin: Technical report
International Nuclear Information System (INIS)
1986-11-01
Seismic reflection data from the Central Palo Duro Basin, Texas, were studied to identify and characterize geologic structure, potential hydrocarbon traps, and anomalies suggesting adverse features such as salt dissolution or diapirism. The data included seismic reflection data, geologic and geophysical data controlled by Stone and Webster Engineering Corporation, and data from the literature. These data comprised approximately 590 line-mi of seismic profiles over approximately 4000 mi², plus well logs from 308 wells. The study addressed the section from shallow reflectors down to basement. Structural contour maps were prepared for the Upper San Andres, Near Top of Glorieta, Wolfcamp, and Precambrian horizons. Isopach maps were prepared for intervals between the Upper and Lower San Andres and between the Upper San Andres and the Wolfcamp. Interpretation indicates southeasterly dips in the northwest part of the mapped area and southwesterly dips in the southwest part. Faults at the Precambrian level and geologic structures show a generally northwest alignment. Faulting in the area is largely limited to the Precambrian, but interpretation is uncertain. Evidence of post-Wolfcampian faulting is not recognized. Seismic data delineating the San Andres section indicate a stable section throughout the area. Anomalous reflection events possibly associated with subsurface salt dissolution were seen at the 800- to 1200-ft level in Swisher County. Other anomalies include an overthickened zone northwest of Westway and carbonate buildup in the Wolfcamp and Pennsylvanian in Randall County. Mississippian to Middle Pennsylvanian diastrophism resulting in the Amarillo Uplift and Matador Arch is not manifested structurally in the central Palo Duro Basin. Subsidence or gentle uplift contributed to some structural deformation
Analysis and interpretation of diffraction data from complex, anisotropic materials
Tutuncu, Goknur
Most materials are elastically anisotropic and exhibit additional anisotropy beyond elastic deformation. For instance, in ferroelectric materials the main inelastic deformation mode is via domains, which are highly anisotropic crystallographic features. To quantify this anisotropy of ferroelectrics, advanced X-ray and neutron diffraction methods were employed. Extensive sets of data were collected from tetragonal BaTiO3, PZT and other ferroelectric ceramics. Data analysis was challenging due to the complex constitutive behavior of these materials. To quantify the elastic strain and texture evolution in ferroelectrics under loading, a number of data analysis techniques such as the single peak and Rietveld methods were used and their advantages and disadvantages compared. It was observed that the single peak analysis fails at low peak intensities especially after domain switching while the Rietveld method does not account for lattice strain anisotropy although it overcomes the low intensity problem via whole pattern analysis. To better account for strain anisotropy the constant stress (Reuss) approximation was employed within the Rietveld method and new formulations to estimate lattice strain were proposed. Along the way, new approaches for handling highly anisotropic lattice strain data were also developed and applied. All of the ceramics studied exhibited significant changes in their crystallographic texture after loading indicating non-180° domain switching. For a full interpretation of domain switching the spherical harmonics method was employed in Rietveld. A procedure for simultaneous refinement of multiple data sets was established for a complete texture analysis. To further interpret diffraction data, a solid mechanics model based on the self-consistent approach was used in calculating lattice strain and texture evolution during the loading of a polycrystalline ferroelectric. The model estimates both the macroscopic average response of a specimen and its hkl
International Nuclear Information System (INIS)
Sahre, P.; Schoenmuth, Th.; Helling, K.
2000-01-01
At the Nuclear Engineering and Analytics Inc. Rossendorf near Dresden (Germany), occupationally exposed persons work with Uranium and Thorium. In accordance with German guides, urine and faecal analysis is carried out. For the interpretation of the data in terms of dose or intake, however, it is important to know what portion of the measured activity is caused by natural sources. For this reason, 16 occupationally exposed persons who did not have any history of occupational exposure to Thorium or Uranium have been checked concerning the excretion data since 1994. The excretion data in mBq per day for all persons cover the following ranges: Faeces: U-234 1 to 310 mBq/d, U-235 0.2 to 3.7 mBq/d, U-238 1.3 to 72 mBq/d, Th-228 7 to 89 mBq/d, Th-230 0.7 to 19 mBq/d, Th-232 0.7 to 16 mBq/d. Urine: all values below the detection limits of about 1 mBq/l. The large variation results from differences between the individual excretion rates but also from the variation of the excretion rate of one person. For example, the U-234 faecal excretion of one person ranges from 77 to 310 mBq per day. The paper gives the faecal excretion of some individuals as a function of time. These excretion data caused by natural sources are taken into account when interpreting faecal excretion data of occupationally exposed persons working with Uranium or Thorium. If the measured faecal excretion per day is within the range caused by natural sources, no interpretation is done. If these values are exceeded, additional faeces and urine samples are collected and measured. Depending on these additional results, intake and dose are assessed, sometimes using lung counter or whole body counter measurements. In the paper some examples are described. (author)
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
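As a point of reference for the abstract above, the kappa statistic itself can be sketched in a few lines. This is a minimal illustration of the plain (unclustered) Cohen's kappa for two raters and a yes/no outcome; it does not implement the semi-parametric variance estimator or any of the bootstrap methods the authors compare, and the 2x2 counts are hypothetical.

```python
def cohen_kappa(table):
    """Cohen's kappa for a 2x2 table [[a, b], [c, d]].

    Rows are rater 1 (yes, no); columns are rater 2 (yes, no).
    Clustering of patients within physicians is ignored here.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    p_observed = (a + d) / n                    # proportion of agreement
    p_yes = ((a + b) / n) * ((a + c) / n)       # chance agreement on "yes"
    p_no = ((c + d) / n) * ((b + d) / n)        # chance agreement on "no"
    p_expected = p_yes + p_no
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts, for illustration only.
kappa = cohen_kappa([[20, 5], [10, 15]])
print(round(kappa, 3))
```

Because this point estimate treats every rating pair as independent, any variance attached to it would suffer from the problem the abstract describes for estimators that ignore within-cluster dependence.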
Symmetry, Invariance and Ontology in Physics and Statistics
Directory of Open Access Journals (Sweden)
Julio Michael Stern
2011-09-01
Full Text Available This paper has three main objectives: (a) Discuss the formal analogy between some important symmetry-invariance arguments used in physics, probability and statistics. Specifically, we will focus on Noether’s theorem in physics, the maximum entropy principle in probability theory, and de Finetti-type theorems in Bayesian statistics; (b) Discuss the epistemological and ontological implications of these theorems, as they are interpreted in physics and statistics. Specifically, we will focus on the positivist (in physics) or subjective (in statistics) interpretations vs. objective interpretations that are suggested by symmetry and invariance arguments; (c) Introduce the cognitive constructivism epistemological framework as a solution that overcomes the realism-subjectivism dilemma and its pitfalls. The work of the physicist and philosopher Max Born will be particularly important in our discussion.
Statistics & probability for dummies
Rumsey, Deborah J
2013-01-01
Two complete eBooks for one low price! Created and compiled by the publisher, this Statistics I and Statistics II bundle brings together two math titles in one, e-only bundle. With this special bundle, you'll get the complete text of the following two titles: Statistics For Dummies, 2nd Edition Statistics For Dummies shows you how to interpret and critique graphs and charts, determine the odds with probability, guesstimate with confidence using confidence intervals, set up and carry out a hypothesis test, compute statistical formulas, and more. Tra
Using Carbon Emissions Data to "Heat Up" Descriptive Statistics
Brooks, Robert
2012-01-01
This article illustrates using carbon emissions data in an introductory statistics assignment. The carbon emissions data has desirable characteristics including: choice of measure; skewness; and outliers. These complexities allow research and public policy debate to be introduced. (Contains 4 figures and 2 tables.)
Statistical methods for longitudinal data with agricultural applications
DEFF Research Database (Denmark)
Anantharama Ankinakatte, Smitha
The PhD study focuses on modeling two kinds of longitudinal data arising in agricultural applications: continuous time series data and discrete longitudinal data. Firstly, two statistical methods, neural networks and generalized additive models, are applied to predict mastitis using multivariate...... algorithm. This was found to compare favourably with the algorithm implemented in the well-known Beagle software. Finally, an R package to apply APFA models developed as part of the PhD project is described...
Kanji, Gopal K
2006-01-01
This expanded and updated Third Edition of Gopal K. Kanji's best-selling resource on statistical tests covers all the most commonly used tests with information on how to calculate and interpret results with simple datasets. Each entry begins with a short summary statement about the test's purpose, and contains details of the test objective, the limitations (or assumptions) involved, a brief outline of the method, a worked example, and the numerical calculation. 100 Statistical Tests, Third Edition is the one indispensable guide for users of statistical materials and consumers of statistical information at all levels and across all disciplines.
Statistical modelling of transcript profiles of differentially regulated genes
Directory of Open Access Journals (Sweden)
Sergeant Martin J
2008-07-01
Full Text Available Abstract Background The vast quantities of gene expression profiling data produced in microarray studies, and the more precise quantitative PCR, are often not statistically analysed to their full potential. Previous studies have summarised gene expression profiles using simple descriptive statistics, basic analysis of variance (ANOVA) and the clustering of genes based on simple models fitted to their expression profiles over time. We report the novel application of statistical non-linear regression modelling techniques to describe the shapes of expression profiles for the fungus Agaricus bisporus, quantified by PCR, and for E. coli and Rattus norvegicus, using microarray technology. The use of parametric non-linear regression models provides a more precise description of expression profiles, reducing the "noise" of the raw data to produce a clear "signal" given by the fitted curve, and describing each profile with a small number of biologically interpretable parameters. This approach then allows the direct comparison and clustering of the shapes of response patterns between genes and potentially enables a greater exploration and interpretation of the biological processes driving gene expression. Results Quantitative reverse transcriptase PCR-derived time-course data of genes were modelled. "Split-line" or "broken-stick" regression identified the initial time of gene up-regulation, enabling the classification of genes into those with primary and secondary responses. Five-day profiles were modelled using the biologically-oriented, critical exponential curve, y(t) = A + (B + Ct)R^t + ε. This non-linear regression approach allowed the expression patterns for different genes to be compared in terms of curve shape, time of maximal transcript level and the decline and asymptotic response levels. Three distinct regulatory patterns were identified for the five genes studied. Applying the regression modelling approach to microarray-derived time course data
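The critical exponential curve mentioned above can be illustrated with a short sketch. It assumes the standard form y(t) = A + (B + Ct)R^t with 0 < R < 1 and C ≠ 0; the parameter values are hypothetical and are not taken from the study. Setting dy/dt = R^t (C + (B + Ct) ln R) to zero gives the turning point t* = -1/ln R - B/C, which corresponds to the "time of maximal transcript level" when C > 0, and A is the asymptotic level as t grows.

```python
import math


def critical_exponential(t, A, B, C, R):
    """Mean response of the critical exponential curve A + (B + C*t) * R**t."""
    return A + (B + C * t) * R ** t


def turning_point(B, C, R):
    """Time at which dy/dt = 0, assuming 0 < R < 1 and C != 0.

    From dy/dt = R**t * (C + (B + C*t) * ln R) = 0.
    """
    return -1.0 / math.log(R) - B / C


# Hypothetical parameter values, for illustration only.
A, B, C, R = 0.5, 1.0, 2.0, 0.5
t_star = turning_point(B, C, R)
print(t_star, critical_exponential(t_star, A, B, C, R))
```

In practice the four parameters would be estimated by non-linear least squares; the closed-form turning point then lets profiles be compared on a biologically interpretable quantity rather than on raw coefficients.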
Applied systems ecology: models, data, and statistical methods
Energy Technology Data Exchange (ETDEWEB)
Eberhardt, L L
1976-01-01
In this report, systems ecology is largely equated to mathematical or computer simulation modelling. The need for models in ecology stems from the necessity to have an integrative device for the diversity of ecological data, much of which is observational, rather than experimental, as well as from the present lack of a theoretical structure for ecology. Different objectives in applied studies require specialized methods. The best predictive devices may be regression equations, often non-linear in form, extracted from much more detailed models. A variety of statistical aspects of modelling, including sampling, are discussed. Several aspects of population dynamics and food-chain kinetics are described, and it is suggested that the two presently separated approaches should be combined into a single theoretical framework. It is concluded that future efforts in systems ecology should emphasize actual data and statistical methods, as well as modelling.
Some statistical issues important to future developments in human radiation research
International Nuclear Information System (INIS)
Vaeth, Michael
1991-01-01
Using his two years experience at the Radiation Effects Research Foundation at Hiroshima, the author tries to outline some of the areas of statistics where methodologies relevant to the future developments in human radiation research are likely to be found. Problems related to statistical analysis of existing data are discussed, together with methodological developments in non-parametric and semi-parametric regression modelling, and interpretation and presentation of results. (Author)
Data base of accident and agricultural statistics for transportation risk assessment
Energy Technology Data Exchange (ETDEWEB)
Saricks, C.L.; Williams, R.G.; Hopf, M.R.
1989-11-01
A state-level data base of accident and agricultural statistics has been developed to support risk assessment for transportation of spent nuclear fuels and high-level radioactive wastes. This data base will enhance the modeling capabilities for more route-specific analyses of potential risks associated with transportation of these wastes to a disposal site. The data base and methodology used to develop state-specific accident and agricultural data bases are described, and summaries of accident and agricultural statistics are provided. 27 refs., 9 tabs.
Data base of accident and agricultural statistics for transportation risk assessment
International Nuclear Information System (INIS)
Saricks, C.L.; Williams, R.G.; Hopf, M.R.
1989-11-01
A state-level data base of accident and agricultural statistics has been developed to support risk assessment for transportation of spent nuclear fuels and high-level radioactive wastes. This data base will enhance the modeling capabilities for more route-specific analyses of potential risks associated with transportation of these wastes to a disposal site. The data base and methodology used to develop state-specific accident and agricultural data bases are described, and summaries of accident and agricultural statistics are provided. 27 refs., 9 tabs
Role of Melt Curve Analysis in Interpretation of Nutrigenomics' MicroRNA Expression Data.
Ahmed, Farid E; Gouda, Mostafa M; Hussein, Laila A; Ahmed, Nancy C; Vos, Paul W; Mohammad, Mahmoud A
2017-01-01
This article illustrates the importance of melt curve analysis (MCA) in interpretation of mild nutrigenomic micro(mi)RNA expression data, by measuring the magnitude of the expression of key miRNA molecules in stool of healthy human adults as molecular markers, following the intake of Pomegranate juice (PGJ), functional fermented sobya (FS), rich in potential probiotic lactobacilli, or their combination. Total small RNA was isolated from stool of 25 volunteers before and following a three-week dietary intervention trial. Expression of 88 miRNA genes was evaluated using Qiagen's 96 well plate RT² miRNA qPCR arrays. Employing parallel coordinates plots, there was no observed significant separation for the gene expression (Cq) values obtained with the Roche 480® PCR LightCycler instrument used in this study, and none of the miRNAs showed statistically significant expression after controlling for the false discovery rate. On the other hand, melting temperature profiles produced during the PCR amplification run found seven significant genes (miR-184, miR-203, miR-373, miR-124, miR-96, miR-373 and miR-301a), which separated candidate miRNAs that could function as novel molecular markers of relevance to oxidative stress and immunoglobulin function, for the intake of polyphenol (PP)-rich, functional fermented foods rich in lactobacilli (FS), or their combination. We elaborate on these data, and present a detailed review on use of melt curves for analyzing nutrigenomic miRNA expression data, which initially appear to show no significant expressions, but are actually more subtle than this simplistic view, necessitating an understanding of the role of MCA in order to grasp what the expression and MCA data collectively imply. Copyright© 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
Energy Technology Data Exchange (ETDEWEB)
Wjihi, Sarra [Unité de Recherche de Physique Quantique, 11 ES 54, Faculté des Science de Monastir (Tunisia); Dhaou, Houcine [Laboratoire des Etudes des Systèmes Thermiques et Energétiques (LESTE), ENIM, Route de Kairouan, 5019 Monastir (Tunisia); Yahia, Manel Ben; Knani, Salah [Unité de Recherche de Physique Quantique, 11 ES 54, Faculté des Science de Monastir (Tunisia); Jemni, Abdelmajid [Laboratoire des Etudes des Systèmes Thermiques et Energétiques (LESTE), ENIM, Route de Kairouan, 5019 Monastir (Tunisia); Lamine, Abdelmottaleb Ben, E-mail: abdelmottaleb.benlamine@gmail.com [Unité de Recherche de Physique Quantique, 11 ES 54, Faculté des Science de Monastir (Tunisia)
2015-12-15
Statistical physics treatment is used to study the desorption of hydrogen on LaNi4.75Fe0.25, in order to obtain new physicochemical interpretations at the molecular level. Experimental desorption isotherms of hydrogen on LaNi4.75Fe0.25 are fitted at three temperatures (293 K, 303 K and 313 K), using a monolayer desorption model. Six parameters of the model are fitted, namely the number of molecules per site n_α and n_β, the receptor site densities N_αM and N_βM, and the energetic parameters P_α and P_β. The behaviors of these parameters are discussed in relationship with the desorption process. A dynamic study of the α and β phases in the desorption process was then carried out. Finally, the different thermodynamical potential functions are derived by statistical physics calculations from our adopted model.
Directory of Open Access Journals (Sweden)
Susmaga Robert
2018-03-01
Full Text Available The paper considers particular interestingness measures, called confirmation measures (also known as Bayesian confirmation measures), used for the evaluation of “if evidence, then hypothesis” rules. The agreement of such measures with a statistically sound (significant) dependency between the evidence and the hypothesis in data is thoroughly investigated. The popular confirmation measures were not defined to possess such form of agreement. However, in error-prone environments, potential lack of agreement may lead to undesired effects, e.g. when a measure indicates either strong confirmation or strong disconfirmation, while in fact there is only weak dependency between the evidence and the hypothesis. In order to detect and prevent such situations, the paper employs a coefficient allowing to assess the level of dependency between the evidence and the hypothesis in data, and introduces a method of quantifying the level of agreement (referred to as a concordance) between this coefficient and the measure being analysed. The concordance is characterized and visualised using specialized histograms, scatter-plots, etc. Moreover, risk-related interpretations of the concordance are introduced. Using a set of 12 confirmation measures, the paper presents experiments designed to establish the actual concordance as well as other useful characteristics of the measures.
Innovative statistical methods for public health data
Wilson, Jeffrey
2015-01-01
The book brings together experts working in public health and multi-disciplinary areas to present recent issues in statistical methodological development and their applications. This timely book will impact model development and data analyses of public health research across a wide spectrum of analysis. Data and software used in the studies are available for the reader to replicate the models and outcomes. The fifteen chapters range in focus from techniques for dealing with missing data with Bayesian estimation, health surveillance and population definition and implications in applied latent class analysis, to multiple comparison and meta-analysis in public health data. Researchers in biomedical and public health research will find this book to be a useful reference, and it can be used in graduate level classes.
In situ impulse test: an experimental and analytical evaluation of data interpretation procedures
International Nuclear Information System (INIS)
1975-08-01
Special experimental field testing and analytical studies were undertaken at Fort Lawton in Seattle, Washington, to study "close-in" wave propagation and evaluate data interpretation procedures for a new in situ impulse test. This test was developed to determine the shear wave velocity and dynamic modulus of soils underlying potential nuclear power plant sites. The test is different from conventional geophysical testing in that the velocity variation with strain is determined for each test. In general, strains between 10^-1 and 10^-3 percent are achieved. The experimental field work consisted of performing special tests in a large test sand fill to obtain detailed "close-in" data. Six recording transducers were placed at various points on the energy source, while approximately 37 different transducers were installed within the soil fill, all within 7 feet of the energy source. Velocity measurements were then taken simultaneously under controlled test conditions to study shear wave propagation phenomenology and help evaluate data interpretation procedures. Typical test data are presented along with detailed descriptions of the results
Identifying Reflectors in Seismic Images via Statistic and Syntactic Methods
Directory of Open Access Journals (Sweden)
Carlos A. Perez
2010-04-01
Full Text Available In geologic interpretation of seismic reflection data, accurate identification of reflectors is the foremost step to ensure proper subsurface structural definition. Reflector information, along with other data sets, is a key factor in predicting the presence of hydrocarbons. In this work, mathematical and pattern recognition theory was adapted to design two statistical and two syntactic algorithms which constitute a tool for semiautomatic reflector identification. The interpretive power of these four schemes was evaluated in terms of prediction accuracy and computational speed. Among these, the semblance method was confirmed to render the greatest accuracy and speed. Syntactic methods offer an interesting alternative due to their inherently structural search method.
Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu
2015-05-27
Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. The ability of predicting functional regions as well as its generalizable statistical framework makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
Insights in Experimental Data : Interactive Statistics with the ILLMO Program
Martens, J.B.O.S.
2017-01-01
Empirical researchers turn to statistics to assist them in drawing conclusions, also called inferences, from their collected data. Often, this data is experimental data, i.e., it consists of (repeated) measurements collected in one or more distinct conditions. The observed data can hence be
Semenov, Alexander V; Elsas, Jan Dirk; Glandorf, Debora C M; Schilthuizen, Menno; Boer, Willem F
2013-08-01
To fulfill existing guidelines, applicants that aim to place their genetically modified (GM) insect-resistant crop plants on the market are required to provide data from field experiments that address the potential impacts of the GM plants on nontarget organisms (NTO's). Such data may be based on varied experimental designs. The recent EFSA guidance document for environmental risk assessment (2010) does not provide clear and structured suggestions that address the statistics of field trials on effects on NTO's. This review examines existing practices in GM plant field testing such as the way of randomization, replication, and pseudoreplication. Emphasis is placed on the importance of design features used for the field trials in which effects on NTO's are assessed. The importance of statistical power and the positive and negative aspects of various statistical models are discussed. Equivalence and difference testing are compared, and the importance of checking the distribution of experimental data is stressed to decide on the selection of the proper statistical model. While for continuous data (e.g., pH and temperature) classical statistical approaches - for example, analysis of variance (ANOVA) - are appropriate, for discontinuous data (counts) only generalized linear models (GLM) are shown to be efficient. There is no golden rule as to which statistical test is the most appropriate for any experimental situation. In particular, in experiments in which block designs are used and covariates play a role GLMs should be used. Generic advice is offered that will help in both the setting up of field testing and the interpretation and data analysis of the data obtained in this testing. The combination of decision trees and a checklist for field trials, which are provided, will help in the interpretation of the statistical analyses of field trials and to assess whether such analyses were correctly applied. We offer generic advice to risk assessors and applicants that will
Dimensional enrichment of statistical linked open data
DEFF Research Database (Denmark)
Varga, Jovan; Vaisman, Alejandro; Romero, Oscar
2016-01-01
On-Line Analytical Processing (OLAP) is a data analysis technique typically used for local and well-prepared data. However, initiatives like Open Data and Open Government bring new and publicly available data on the web that are to be analyzed in the same way. The use of semantic web technologies...... for this context is especially encouraged by the Linked Data initiative. There is already a considerable amount of statistical linked open data sets published using the RDF Data Cube Vocabulary (QB) which is designed for these purposes. However, QB lacks some essential schema constructs (e.g., dimension levels......) to support OLAP. Thus, the QB4OLAP vocabulary has been proposed to extend QB with the necessary constructs and be fully compliant with OLAP. In this paper, we focus on the enrichment of an existing QB data set with QB4OLAP semantics. We first thoroughly compare the two vocabularies and outline the benefits...
Gray, Alistair; Veale, Jaimie F.; Binson, Diane; Sell, Randell L.
2013-01-01
Objective. Effectively addressing health disparities experienced by sexual minority populations requires high-quality official data on sexual orientation. We developed a conceptual framework of sexual orientation to improve the quality of sexual orientation data in New Zealand's Official Statistics System. Methods. We reviewed conceptual and methodological literature, culminating in a draft framework. To improve the framework, we held focus groups and key-informant interviews with sexual minority stakeholders and producers and consumers of official statistics. An advisory board of experts provided additional guidance. Results. The framework proposes working definitions of the sexual orientation topic and measurement concepts, describes dimensions of the measurement concepts, discusses variables framing the measurement concepts, and outlines conceptual grey areas. Conclusion. The framework proposes standard definitions and concepts for the collection of official sexual orientation data in New Zealand. It presents a model for producers of official statistics in other countries, who wish to improve the quality of health data on their citizens. PMID:23840231
Linear mixed models a practical guide using statistical software
West, Brady T; Galecki, Andrzej T
2006-01-01
Simplifying the often confusing array of software programs for fitting linear mixed models (LMMs), Linear Mixed Models: A Practical Guide Using Statistical Software provides a basic introduction to primary concepts, notation, software implementation, model interpretation, and visualization of clustered and longitudinal data. This easy-to-navigate reference details the use of procedures for fitting LMMs in five popular statistical software packages: SAS, SPSS, Stata, R/S-plus, and HLM. The authors introduce basic theoretical concepts, present a heuristic approach to fitting LMMs based on bo
Interpretation of Spirometry: Selection of Predicted Values and Defining Abnormality.
Chhabra, S K
2015-01-01
Spirometry is the most frequently performed investigation to evaluate pulmonary function. It provides clinically useful information on the mechanical properties of the lung and the thoracic cage and aids in taking management-related decisions in a wide spectrum of diseases and disorders. Few measurements in medicine are so dependent on factors related to equipment, operator and the patient. Good spirometry requires quality assured measurements and a systematic approach to interpretation. Standard guidelines on the technical aspects of equipment and their calibration as well as the test procedure have been developed and revised from time-to-time. Strict compliance with standardisation guidelines ensures quality control. Interpretation of spirometry data is based only on two basic measurements--the forced vital capacity (FVC) and the forced expiratory volume in 1 second (FEV1) and their ratio, FEV1/FVC. A meaningful and clinically useful interpretation of the measured data requires a systematic approach and consideration of several important issues. Central to interpretation is the understanding of the development and application of prediction equations. Selection of prediction equations that are appropriate for the ethnic origin of the patient is vital to avoid erroneous interpretation. Defining abnormal values is a debatable but critical aspect of spirometry. A statistically valid definition of the lower limits of normal has been advocated as the better method over the more commonly used approach of defining abnormality as a fixed percentage of the predicted value. Spirometry rarely provides a specific diagnosis. Examination of the flow-volume curve and the measured data provides information to define patterns of ventilatory impairment. Spirometry must be interpreted in conjunction with clinical information including results of other investigations.
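The contrast drawn above between a statistically defined lower limit of normal (LLN) and a fixed percentage of the predicted value can be made concrete with a small sketch. It assumes that reference values are normally distributed around the predicted value, so that the 5th percentile sits 1.645 residual standard deviations below it, and it uses 80% of predicted as the fixed cutoff; the predicted value, residual standard deviation, and measurement below are hypothetical and do not come from any published prediction equation.

```python
def lower_limit_of_normal(predicted, residual_sd, z=1.645):
    """5th-percentile LLN, assuming normally distributed residuals."""
    return predicted - z * residual_sd


def below_fixed_percentage(measured, predicted, fraction=0.80):
    """Fixed-percentage rule: abnormal if under `fraction` of predicted."""
    return measured < fraction * predicted


# Hypothetical FEV1 values in litres, for illustration only.
predicted_fev1 = 3.00
residual_sd = 0.40
measured_fev1 = 2.38

lln = lower_limit_of_normal(predicted_fev1, residual_sd)
flagged_by_lln = measured_fev1 < lln
flagged_by_fixed = below_fixed_percentage(measured_fev1, predicted_fev1)
print(lln, flagged_by_lln, flagged_by_fixed)
```

With these made-up numbers the two rules disagree: the measurement falls under 80% of predicted yet stays above the LLN. Which way the disagreement runs depends on the scatter of the reference population, which is why the choice of definition matters.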
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2014-11-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: 1. P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. 2. Overemphasis on P values rather than on the actual size of the observed effect. 3. Overuse of statistical hypothesis testing, and being seduced by the word "significant". 4. Overreliance on standard errors, which are often misunderstood.
Common misconceptions about data analysis and statistics.
Motulsky, Harvey J
2015-02-01
Ideally, any experienced investigator with the right tools should be able to reproduce a finding published in a peer-reviewed biomedical science journal. In fact, the reproducibility of a large percentage of published findings has been questioned. Undoubtedly, there are many reasons for this, but one reason may be that investigators fool themselves due to a poor understanding of statistical concepts. In particular, investigators often make these mistakes: (1) P-Hacking. This is when you reanalyze a data set in many different ways, or perhaps reanalyze with additional replicates, until you get the result you want. (2) Overemphasis on P values rather than on the actual size of the observed effect. (3) Overuse of statistical hypothesis testing, and being seduced by the word "significant". (4) Overreliance on standard errors, which are often misunderstood.
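The first mistake listed above, reanalyzing with additional replicates until the desired result appears, can be demonstrated with a small simulation. This is a minimal sketch under stated assumptions rather than anything taken from the article: the data are pure noise (the null hypothesis is true), the test is a two-sided z test with known standard deviation 1, and the starting sample size, step, and maximum are arbitrary choices.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p value for H0: mean = 0, with known standard deviation."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def run_experiment(rng, peek, start=10, step=5, max_n=50, alpha=0.05):
    """Return True if the experiment ends with p < alpha.

    With peek=True the test is repeated after every `step` added
    replicates and data collection stops at the first p < alpha.
    With peek=False all `max_n` observations are tested once.
    """
    n_initial = start if peek else max_n
    sample = [rng.gauss(0.0, 1.0) for _ in range(n_initial)]
    while peek and len(sample) < max_n and z_test_p_value(sample) >= alpha:
        sample.extend(rng.gauss(0.0, 1.0) for _ in range(step))
    return z_test_p_value(sample) < alpha


def false_positive_rate(n_experiments, peek, rng):
    """Fraction of null experiments that are declared significant."""
    hits = sum(run_experiment(rng, peek) for _ in range(n_experiments))
    return hits / n_experiments


rng = random.Random(0)
print("single test:", false_positive_rate(2000, peek=False, rng=rng))
print("with peeking:", false_positive_rate(2000, peek=True, rng=rng))
```

Every "significant" result in this simulation is a false positive by construction, so comparing the two printed rates shows how much repeated testing inflates the error rate beyond the nominal 5%.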
Interpretation of Confidence Interval Facing the Conflict
Andrade, Luisa; Fernández, Felipe
2016-01-01
As literature has reported, it is usual that university students in statistics courses, and even statistics teachers, interpret the confidence level associated with a confidence interval as the probability that the parameter value will be between the lower and upper interval limits. To confront this misconception, class activities have been…
Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P
1999-01-01
Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview of some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149
Energy Technology Data Exchange (ETDEWEB)
Voutay, O.
2003-02-01
Seismic data contain more geological information than well data, owing to their good spatial coverage. However, the seismic measurement is band-pass limited, and the contrasts in acoustic or elastic properties derived from seismic data are not directly linked to the reservoir properties. Thus, it is difficult to give a geological interpretation to seismic data. Typically, relevant seismic attributes are extracted at the reservoir level and then calibrated with the information available at wells by using pattern recognition and statistical estimation techniques. These methods are successfully used in the post-stack domain. For multi-cube seismic information such as pre-stack or 4D data, however, the number of attributes can increase considerably, and statistical methods are not often used. It is necessary to find a parameterization allowing an optimal description of the seismic variability in the time window of interest. We propose to extract new attributes from seismic multi-cube data with Generalised Principal Analysis and to use them for reservoir interpretation with statistical techniques. The new attributes can be clearly related to the initial data set, and then be physically interpreted, while optimally summarizing the initial seismic information. By applying the Generalised Principal Analysis to 3D pre-stack surveys, the contribution of the pre-stack seismic information to reservoir characterisation is compared to that of the post-stack seismic information, in both synthetic and real cases. By applying the Generalised Principal Analysis to real 4D surveys, the seismic repeatability is quantified and the seismic changes in the reservoir with calendar time are highlighted and interpreted. A coherency cube has also been defined, based on the Generalised Principal Analysis. This attribute is a coherence measurement in three dimensions representing the local similarity between 4D or AVO surveys. (author)
Study of the effects of photoelectron statistics on Thomson scattering data
International Nuclear Information System (INIS)
Hart, G.W.; Levinton, F.M.; McNeill, D.H.
1986-01-01
A computer code has been developed which simulates a Thomson scattering measurement, from the counting statistics of the input channels through the mathematical analysis of the data. The scattered and background signals in each of the wavelength channels are assumed to obey Poisson statistics, and the spectral data are fitted to a Gaussian curve using a nonlinear least-squares fitting algorithm. This method goes beyond the usual calculation of the signal-to-noise ratio for the hardware and gives a quantitative measure of the effect of the noise on the final measurement. This method is applicable to Thomson scattering measurements in which the signal-to-noise ratio is low due to either low signal or high background. Thomson scattering data from the S-1 spheromak have been compared to this simulation, and they have been found to be in good agreement. This code has proven to be useful in assessing the effects of counting statistics relative to shot-to-shot variability in producing the observed spread in the data. It was also useful for designing improvements for the S-1 Thomson scattering system, and this method would be applicable to any measurement affected by counting statistics.
Directory of Open Access Journals (Sweden)
Balázs Gönci
2010-12-01
Because of its relevance to everyday life, the spreading of viral infections has been of central interest in a variety of scientific communities involved in fighting, preventing and theoretically interpreting epidemic processes. Recent large scale observations have resulted in major discoveries concerning the overall features of the spreading process in systems with highly mobile susceptible units, but virtually no data are available about observations of infection spreading for a very large number of immobile units. Here we present the first detailed quantitative documentation of percolation-type viral epidemics in a highly reproducible in vitro system consisting of tens of thousands of virtually motionless cells. We use a confluent astroglial monolayer in a Petri dish and induce productive infection in a limited number of cells with a genetically modified herpesvirus strain. This approach allows extremely high-resolution tracking of the spatio-temporal development of the epidemic. We show that a simple model is capable of reproducing the basic features of our observations, i.e., the observed behaviour is likely to be applicable to many different kinds of systems. Statistical physics inspired approaches to our data, such as fractal dimension of the infected clusters as well as their size distribution, seem to fit into a percolation theory based interpretation. We suggest that our observations may be used to model epidemics in more complex systems, which are difficult to study in isolation.
Explorations in Statistics: The Analysis of Ratios and Normalized Data
Curran-Everett, Douglas
2013-01-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of "Explorations in Statistics" explores the analysis of ratios and normalized--or standardized--data. As researchers, we compute a ratio--a numerator divided by a denominator--to compute a…
A new method to determine the number of experimental data using statistical modeling methods
Energy Technology Data Exchange (ETDEWEB)
Jung, Jung-Ho; Kang, Young-Jin; Lim, O-Kaung; Noh, Yoojeong [Pusan National University, Busan (Korea, Republic of)
2017-06-15
For analyzing the statistical performance of physical systems, statistical characteristics of physical parameters such as material properties need to be estimated by collecting experimental data. For accurate statistical modeling, many such experiments may be required, but data are usually quite limited owing to the cost and time constraints of experiments. In this study, a new method for determining a reasonable number of experimental data is proposed using an area metric, after obtaining statistical models using the information on the underlying distribution, the Sequential statistical modeling (SSM) approach, and the Kernel density estimation (KDE) approach. The area metric is used as a convergence criterion to determine the necessary and sufficient number of experimental data to be acquired. The proposed method is validated in simulations, using different statistical modeling methods, different true models, and different convergence criteria. An example data set with 29 data describing the fatigue strength coefficient of SAE 950X is used for demonstrating the performance of the obtained statistical models that use a pre-determined number of experimental data in predicting the probability of failure for a target fatigue life.
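The idea of an area metric as a convergence criterion can be sketched as follows. This is only an illustration under stated assumptions: the abstract compares fitted statistical models (known distribution, SSM, KDE), whereas this sketch substitutes plain empirical CDFs, and the function name `ecdf_area` and the sample values are hypothetical.

```python
from bisect import bisect_right


def ecdf_area(sample_a, sample_b):
    """Area between the empirical CDFs of two samples, i.e. the integral of |F_a - F_b|.

    Assumption: empirical CDFs stand in for the fitted models used in the paper.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    points = sorted(set(a) | set(b))
    area = 0.0
    # Both step functions are constant on each interval [left, right).
    for left, right in zip(points, points[1:]):
        f_a = bisect_right(a, left) / len(a)
        f_b = bisect_right(b, left) / len(b)
        area += abs(f_a - f_b) * (right - left)
    return area


if __name__ == "__main__":
    # Hypothetical measurements: compare the first n values with the first n + 2.
    data = [10.2, 9.8, 10.5, 9.9, 10.1, 10.0, 10.3, 9.7]
    for n in range(2, len(data) - 1):
        print(n, round(ecdf_area(data[:n], data[:n + 2]), 4))
```

Under this reading, data collection could stop once the area between successive models stays below a chosen tolerance; the tolerance itself would have to come from the application.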
Klose, C. D.; Giese, R.; Löw, S.; Borm, G.
Especially for deep underground excavations, the prediction of the locations of small-scale hazardous geotechnical structures is nearly impossible when exploration is restricted to surface-based methods. Hence, for the AlpTransit base tunnels, exploration ahead has become an essential component of the excavation plan. The project described in this talk aims at improving the technology for the geological interpretation of reflection seismic data. The discovered geological-seismic relations will be used to develop an interpretation system based on artificial intelligence to predict hazardous geotechnical structures of the advancing tunnel face. This talk first gives an overview of the data mining of geological and seismic properties of metamorphic rocks within the Penninic gneiss zone in Southern Switzerland. The data result from measurements of a specific geophysical prediction system developed by the GFZ Potsdam, Germany, along the 2600 m long and 1400 m deep Faido access tunnel. The goal is to find those seismic features (i.e. compression and shear wave velocities, velocity ratios and velocity gradients) which show a significant relation to geological properties (i.e. fracturing and fabric features). The seismic properties were acquired from different tomograms, whereas the geological features derive from tunnel face maps. The features are statistically compared with the seismic rock properties taking into account the different methods used for the tunnel excavation (TBM and Drill/Blast). Fracturing and the mica content are positively related to the velocity values. Both P- and S-wave velocities near the tunnel surface describe the petrology better, whereas in the interior of the rock mass they correlate to natural micro- and macro-scopic fractures surrounding tectonites, i.e. cataclasites. The latter lie outside of the excavation damage zone and the tunnel loosening zone. The shear wave velocities are better indicators for rock
Stanzel, Sven; Weimer, Marc; Kopp-Schneider, Annette
2013-06-01
High-throughput screening approaches are carried out for the toxicity assessment of a large number of chemical compounds. In such large-scale in vitro toxicity studies several hundred or thousand concentration-response experiments are conducted. The automated evaluation of concentration-response data using statistical analysis scripts saves time and yields more consistent results in comparison to data analysis performed by the use of menu-driven statistical software. Automated statistical analysis requires that concentration-response data are available in a standardised data format across all compounds. To obtain consistent data formats, a standardised data management workflow must be established, including guidelines for data storage, data handling and data extraction. In this paper two procedures for data management within large-scale toxicological projects are proposed. Both procedures are based on Microsoft Excel files as the researcher's primary data format and use a computer programme to automate the handling of data files. The first procedure assumes that data collection has not yet started whereas the second procedure can be used when data files already exist. Successful implementation of the two approaches into the European project ACuteTox is illustrated. Copyright © 2012 Elsevier Ltd. All rights reserved.
Wu, Chong; Pan, Wei
2018-04-01
Many genetic variants affect complex traits through gene expression, which can be exploited to boost statistical power and enhance interpretation in genome-wide association studies (GWASs) as demonstrated by the transcriptome-wide association study (TWAS) approach. Furthermore, due to polygenic inheritance, a complex trait is often affected by multiple genes with similar functions as annotated in gene pathways. Here, we extend TWAS from gene-based analysis to pathway-based analysis: we integrate public pathway collections, expression quantitative trait locus (eQTL) data and GWAS summary association statistics (or GWAS individual-level data) to identify gene pathways associated with complex traits. The basic idea is to weight the SNPs of the genes in a pathway based on their estimated cis-effects on gene expression, then adaptively test for association of the pathway with a GWAS trait by effectively aggregating possibly weak association signals across the genes in the pathway. The P values can be calculated analytically and thus fast. We applied our proposed test with the KEGG and GO pathways to two schizophrenia (SCZ) GWAS summary association data sets, denoted by SCZ1 and SCZ2 with about 20,000 and 150,000 subjects, respectively. Most of the significant pathways identified by analyzing the SCZ1 data were reproduced by the SCZ2 data. Importantly, we identified 15 novel pathways associated with SCZ, such as GABA receptor complex (GO:1902710), which could not be uncovered by the standard single SNP-based analysis or gene-based TWAS. The newly identified pathways may help us gain insights into the biological mechanism underlying SCZ. Our results showcase the power of incorporating gene expression information and gene functional annotations into pathway-based association testing for GWAS. © 2018 WILEY PERIODICALS, INC.
Statistical processing of experimental data
NAVRÁTIL, Pavel
2012-01-01
This thesis covers the theory of probability and statistical sets. It includes solved and unsolved problems on probability, random variables and their distributions, random vectors, statistical sets, and regression and correlation analysis. Solutions are provided for the unsolved problems.
Statistical yearbook. 1998. Data available as of 30 November 2000. 45 ed
International Nuclear Information System (INIS)
2001-01-01
This is the forty-fifth issue of the United Nations Statistical Yearbook, prepared by the Statistics Division, Department of Economic and Social Affairs of the United Nations Secretariat, since 1948. The present issue contains series covering, in general, 1989-1998 or 1990-1999, using statistics available to the Statistics Division up to 30 November 2000. The Yearbook is based on data compiled by the Statistics Division from over 40 different international and national sources. These include the United Nations Statistics Division in the fields of national accounts, industry, energy, transport and international trade; the United Nations Statistics Division and Population Division in the field of demographic statistics; and data provided by over 20 offices of the United Nations system and international organizations in other specialized fields. United Nations agencies and other international organizations which furnished data are listed under 'Statistical sources and references' at the end of the Yearbook. Acknowledgement is gratefully made for their generous cooperation in providing data. The Statistics Division also publishes the Monthly Bulletin of Statistics, which provides a valuable complement to the Yearbook covering current international economic statistics for most countries and areas of the world and quarterly world and regional aggregates. Subscribers to the Monthly Bulletin of Statistics may also access the Bulletin on-line via the World Wide Web on the Internet. MBS On-line allows time-sensitive statistics to reach users much faster than the traditional print publication. For further information see . The present issue of the Yearbook reflects a phased programme of major changes in its organization and presentation undertaken in 1990 which until then was relatively unchanged since the first issue was released in 1948. One result of this process has been to reduce the total number of tables from 140 in the 37th issue to 80 in the present issue and to include
Interpreting Evidence-of-Learning: Educational Research in the Era of Big Data
Cope, Bill; Kalantzis, Mary
2015-01-01
In this article, we argue that big data can offer new opportunities and roles for educational researchers. In the traditional model of evidence-gathering and interpretation in education, researchers are independent observers, who pre-emptively create instruments of measurement, and insert these into the educational process in specialized times and…
National Vital Statistics System (NVSS) - National Cardiovascular Disease Surveillance Data
U.S. Department of Health & Human Services — 2000 forward. NVSS is a secure, web-based data management system that collects and disseminates the Nation's official vital statistics. Indicators from this data...
Statistical methods of combining information: Applications to sensor data fusion
Energy Technology Data Exchange (ETDEWEB)
Burr, T.
1996-12-31
This paper reviews some statistical approaches to combining information from multiple sources. Promising new approaches will be described, and potential applications to combining not-so-different data sources such as sensor data will be discussed. Experiences with one real data set are described.
Hendikawati, P.; Arifudin, R.; Zahid, M. Z.
2018-03-01
This study aims to design an Android Statistics Data Analysis application that can be accessed through mobile devices, making it easier for users to access. The Statistics Data Analysis application includes various topics in basic statistics along with a parametric statistics data analysis application. The output of this application system is a parametric statistical data analysis that can be used by students, lecturers, and other users who need the results of statistical calculations quickly and in an easily understood form. The Android application is developed in the Java programming language. The server side uses PHP with the CodeIgniter framework, and the database is MySQL. The system development methodology used is the Waterfall methodology, with the stages of analysis, design, coding, testing, implementation and system maintenance. This statistical data analysis application is expected to support statistics lecturing activities and make it easier for students to understand statistical analysis on mobile devices.
Some statistical properties of gene expression clustering for array data
DEFF Research Database (Denmark)
Abreu, G C G; Pinheiro, A; Drummond, R D
2010-01-01
DNA array data without a corresponding statistical error measure. We propose an easy-to-implement and simple-to-use technique that uses bootstrap re-sampling to evaluate the statistical error of the nodes provided by SOM-based clustering. Comparisons between SOM and parametric clustering are presented...... for simulated as well as for two real data sets. We also implement a bootstrap-based pre-processing procedure for SOM, that improves the false discovery ratio of differentially expressed genes. Code in Matlab is freely available, as well as some supplementary material, at the following address: https...
Experimental data at high PT and its interpretation: the role of theory
Energy Technology Data Exchange (ETDEWEB)
Belonoshko, A. B.; Rosengren, A.
2011-07-01
Experiments relevant to planetary science are often performed under extreme conditions of pressure and temperature. This makes them technically difficult. The results are often difficult to interpret correctly, especially when experimental data are scarce and experimental trends are difficult to establish. Theory, while normally inferior in the precision of the data it delivers, is superior in providing the big picture and the details behind material behavior. We consider the experiments performed for deuterium, Mo, and Fe. We demonstrate that when experimental data are verified by theory, significant insight can be gained. (Author) 26 refs.
Radiologic head CT interpretation errors in pediatric abusive and non-abusive head trauma patients
International Nuclear Information System (INIS)
Kralik, Stephen F.; Finke, Whitney; Wu, Isaac C.; Ho, Chang Y.; Hibbard, Roberta A.; Hicks, Ralph A.
2017-01-01
Pediatric head trauma, including abusive head trauma, is a significant cause of morbidity and mortality. The purpose of this research was to identify and evaluate radiologic interpretation errors of head CTs performed on abusive and non-abusive pediatric head trauma patients from a community setting referred for a secondary interpretation at a tertiary pediatric hospital. A retrospective search identified 184 patients <5 years of age with head CT for known or potential head trauma who had a primary interpretation performed at a referring community hospital by a board-certified radiologist. Two board-certified fellowship-trained neuroradiologists at an academic pediatric hospital independently interpreted the head CTs, compared their interpretations to determine inter-reader discrepancy rates, and resolved discrepancies to establish a consensus second interpretation. The primary interpretation was compared to the consensus second interpretation using the RADPEER™ scoring system to determine the primary interpretation-second interpretation overall and major discrepancy rates. MRI and/or surgical findings were used to validate the primary interpretation or second interpretation when possible. The diagnosis of abusive head trauma was made using clinical and imaging data by a child abuse specialist to separate patients into abusive head trauma and non-abusive head trauma groups. Discrepancy rates were compared for both groups. Lastly, primary interpretations and second interpretations were evaluated for discussion of imaging findings concerning for abusive head trauma. There were statistically significant differences between primary interpretation-second interpretation versus inter-reader overall and major discrepancy rates (28% vs. 6%, P=0.0001; 16% vs. 1%, P=0.0001). There were significant differences in the primary interpretation-second interpretation overall and major discrepancy rates for abusive head trauma patients compared to non-abusive head trauma
Radiologic head CT interpretation errors in pediatric abusive and non-abusive head trauma patients
Energy Technology Data Exchange (ETDEWEB)
Kralik, Stephen F.; Finke, Whitney; Wu, Isaac C.; Ho, Chang Y. [Indiana University School of Medicine, Department of Radiology and Imaging Sciences, Indianapolis, IN (United States); Hibbard, Roberta A.; Hicks, Ralph A. [Indiana University School of Medicine, Department of Pediatrics, Section of Child Protection Programs, Indianapolis, IN (United States)
2017-07-15
Pediatric head trauma, including abusive head trauma, is a significant cause of morbidity and mortality. The purpose of this research was to identify and evaluate radiologic interpretation errors of head CTs performed on abusive and non-abusive pediatric head trauma patients from a community setting referred for a secondary interpretation at a tertiary pediatric hospital. A retrospective search identified 184 patients <5 years of age with head CT for known or potential head trauma who had a primary interpretation performed at a referring community hospital by a board-certified radiologist. Two board-certified fellowship-trained neuroradiologists at an academic pediatric hospital independently interpreted the head CTs, compared their interpretations to determine inter-reader discrepancy rates, and resolved discrepancies to establish a consensus second interpretation. The primary interpretation was compared to the consensus second interpretation using the RADPEER™ scoring system to determine the primary interpretation-second interpretation overall and major discrepancy rates. MRI and/or surgical findings were used to validate the primary interpretation or second interpretation when possible. The diagnosis of abusive head trauma was made using clinical and imaging data by a child abuse specialist to separate patients into abusive head trauma and non-abusive head trauma groups. Discrepancy rates were compared for both groups. Lastly, primary interpretations and second interpretations were evaluated for discussion of imaging findings concerning for abusive head trauma. There were statistically significant differences between primary interpretation-second interpretation versus inter-reader overall and major discrepancy rates (28% vs. 6%, P=0.0001; 16% vs. 1%, P=0.0001). There were significant differences in the primary interpretation-second interpretation overall and major discrepancy rates for abusive head trauma patients compared to non-abusive head trauma
Procedure for statistical analysis of one-parameter discrepant experimental data
International Nuclear Information System (INIS)
Badikov, Sergey A.; Chechev, Valery P.
2012-01-01
A new, Mandel–Paule-type procedure for statistical processing of one-parameter discrepant experimental data is described. The procedure enables one to estimate the contribution of unrecognized experimental errors to the total experimental uncertainty and to include it in the analysis. A definition of discrepant experimental data for an arbitrary number of measurements is introduced as an accompanying result. In the case of negligible unrecognized experimental errors, the procedure simply reduces to the calculation of the weighted average and its internal uncertainty. The procedure was applied to the statistical analysis of half-life experimental data. Mean half-lives for 20 actinides were calculated and the results were compared to the ENSDF and DDEP evaluations. On the whole, the calculated half-lives are consistent with the ENSDF and DDEP evaluations. However, the uncertainties calculated in this work substantially exceed those of the ENSDF and DDEP evaluations for discrepant experimental data. This effect can be explained by adequately taking into account unrecognized experimental errors. - Highlights: ► A new statistical procedure for processing one-parameter discrepant experimental data has been presented. ► The procedure estimates the contribution of unrecognized errors to the total experimental uncertainty. ► The procedure was applied to processing discrepant half-life experimental data. ► Results of the calculations are compared to the ENSDF and DDEP evaluations.
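The general shape of such a calculation can be sketched with the classical Mandel–Paule iteration. This is not the authors' new procedure, whose details the abstract does not give: the function names, the bisection solver and the example numbers below are assumptions for illustration. A common extra variance is added to every measurement until the chi-square of the weighted mean equals its expected value n - 1; when the data are already consistent, the result reduces to the ordinary weighted average and its internal uncertainty.

```python
import math


def weighted_mean(values, sigmas, extra_var=0.0):
    """Weighted mean and its uncertainty with weights 1 / (sigma_i^2 + extra_var)."""
    weights = [1.0 / (s * s + extra_var) for s in sigmas]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


def chi2_excess(values, sigmas, extra_var):
    """Chi-square of the weighted mean minus its expected value (n - 1)."""
    mean, _ = weighted_mean(values, sigmas, extra_var)
    chi2 = sum((x - mean) ** 2 / (s * s + extra_var)
               for x, s in zip(values, sigmas))
    return chi2 - (len(values) - 1)


def mandel_paule(values, sigmas, iterations=200):
    """Return (mean, uncertainty, extra_var) from the classical Mandel-Paule scheme."""
    if chi2_excess(values, sigmas, 0.0) <= 0.0:
        # Consistent data: plain weighted average with internal uncertainty.
        mean, unc = weighted_mean(values, sigmas)
        return mean, unc, 0.0
    low, high = 0.0, max(s * s for s in sigmas)
    while chi2_excess(values, sigmas, high) > 0.0:
        high *= 2.0  # widen the bracket until the excess changes sign
    for _ in range(iterations):  # bisection on the extra variance
        mid = 0.5 * (low + high)
        if chi2_excess(values, sigmas, mid) > 0.0:
            low = mid
        else:
            high = mid
    extra_var = 0.5 * (low + high)
    mean, unc = weighted_mean(values, sigmas, extra_var)
    return mean, unc, extra_var


if __name__ == "__main__":
    # Hypothetical discrepant measurements with small stated uncertainties.
    print(mandel_paule([10.0, 12.0, 8.0], [0.1, 0.1, 0.1]))
```

For discrepant inputs the reported uncertainty grows well beyond the internal one, which matches the qualitative behaviour the abstract describes.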
Assistive Technologies for Second-Year Statistics Students Who Are Blind
Erhardt, Robert J.; Shuman, Michael P.
2015-01-01
At Wake Forest University, a student who is blind enrolled in a second course in statistics. The course covered simple and multiple regression, model diagnostics, model selection, data visualization, and elementary logistic regression. These topics required that the student both interpret and produce three sets of materials: mathematical writing,…
Feature relevance assessment for the semantic interpretation of 3D point cloud data
Directory of Open Access Journals (Sweden)
M. Weinmann
2013-10-01
The automatic analysis of large 3D point clouds represents a crucial task in photogrammetry, remote sensing and computer vision. In this paper, we propose a new methodology for the semantic interpretation of such point clouds which involves feature relevance assessment in order to reduce both processing time and memory consumption. Given a standard benchmark dataset with 1.3 million 3D points, we first extract a set of 21 geometric 3D and 2D features. Subsequently, we apply a classifier-independent ranking procedure which involves a general relevance metric in order to derive compact and robust subsets of versatile features which are generally applicable for a large variety of subsequent tasks. This metric is based on 7 different feature selection strategies and thus addresses different intrinsic properties of the given data. For the example of semantically interpreting 3D point cloud data, we demonstrate the great potential of smaller subsets consisting of only the most relevant features with 4 different state-of-the-art classifiers. The results reveal that, instead of including as many features as possible in order to compensate for lack of knowledge, a crucial task such as scene interpretation can be carried out with only a few versatile features and even improved accuracy.
AutoBayes: A System for Generating Data Analysis Programs from Statistical Models
Fischer, Bernd; Schumann, Johann
2003-01-01
Data analysis is an important scientific task which is required whenever information needs to be extracted from raw data. Statistical approaches to data analysis, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires substantial knowledge and experience in several areas. In this paper, we describe AutoBayes, a program synthesis...
Methods for interpreting lists of affected genes obtained in a DNA microarray experiment
Directory of Open Access Journals (Sweden)
Hedegaard Jakob
2009-07-01
Background: The aim of this paper was to describe and compare the methods used and the results obtained by the participants in a joint EADGENE (European Animal Disease Genomic Network of Excellence) and SABRE (Cutting Edge Genomics for Sustainable Animal Breeding) workshop focusing on post analysis of microarray data. The participating groups were provided with identical lists of microarray probes, including test statistics for three different contrasts, and the normalised log-ratios for each array, to be used as the starting point for interpreting the affected probes. The data originated from a microarray experiment conducted to study the host reactions in broilers occurring shortly after a secondary challenge with either a homologous or heterologous species of Eimeria. Results: Several conceptually different analytical approaches, using both commercial and publicly available software, were applied by the participating groups. The following tools were used: Ingenuity Pathway Analysis, MAPPFinder, LIMMA, GOstats, GOEAST, GOTM, Globaltest, TopGO, ArrayUnlock, Pathway Studio, GIST and AnnotationDbi. The main focus of the approaches was to utilise the relation between probes/genes and their gene ontology and pathways to interpret the affected probes/genes. The lack of a well-annotated chicken genome did, however, limit the possibilities to fully explore the tools. The main results from these analyses showed that the biological interpretation is highly dependent on the statistical method used but that some common biological conclusions could be reached. Conclusion: It is highly recommended to test different analytical methods on the same data set and compare the results to obtain a reliable biological interpretation of the affected genes in a DNA microarray experiment.
A statistical test for outlier identification in data envelopment analysis
Directory of Open Access Journals (Sweden)
Morteza Khodabin
2010-09-01
In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these "deterministic" frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the presented method, each observation is deleted from the sample once and the resulting linear program is solved, leading to a distribution of efficiency estimates. Based on the achieved distribution, a pared test is designed to identify the potential outlier(s). We illustrate the method through a real data set. The method could be used in a first step, as an exploratory data analysis, before using any frontier estimation.
Association testing for next-generation sequencing data using score statistics
DEFF Research Database (Denmark)
Skotte, Line; Korneliussen, Thorfinn Sand; Albrechtsen, Anders
2012-01-01
computationally feasible due to the use of score statistics. As part of the joint likelihood, we model the distribution of the phenotypes using a generalized linear model framework, which works for both quantitative and discrete phenotypes. Thus, the method presented here is applicable to case-control studies...... of genotype calls into account have been proposed; most require numerical optimization which for large-scale data is not always computationally feasible. We show that using a score statistic for the joint likelihood of observed phenotypes and observed sequencing data provides an attractive approach...... to association testing for next-generation sequencing data. The joint model accounts for the genotype classification uncertainty via the posterior probabilities of the genotypes given the observed sequencing data, which gives the approach higher power than methods based on called genotypes. This strategy remains...
Statistical methods and computing for big data
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing
2016-01-01
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay. PMID:27695593
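The online-updating class of methods mentioned here can be illustrated with the simplest possible case. The sketch below is an assumption-laden illustration rather than the article's method (which extends online updating to variable selection and is demonstrated with logistic regression): it fits a simple least-squares line by accumulating sufficient statistics chunk by chunk, so that no earlier chunk has to be kept in memory. The class name `OnlineLeastSquares` and the example chunks are hypothetical.

```python
class OnlineLeastSquares:
    """Simple linear regression y = a + b*x updated one data chunk at a time."""

    def __init__(self):
        # Sufficient statistics: count and running sums.
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def update(self, xs, ys):
        """Fold a new chunk into the running sums; the chunk can then be discarded."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y

    def coefficients(self):
        """Return (intercept, slope) from the accumulated statistics."""
        denom = self.n * self.sum_xx - self.sum_x ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denom
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return intercept, slope


if __name__ == "__main__":
    model = OnlineLeastSquares()
    # Hypothetical stream arriving in two chunks.
    model.update([0, 1, 2], [1, 3, 5])
    model.update([3, 4], [7, 9])
    print(model.coefficients())
```

The estimates after the last chunk equal those of a single fit on all the data, which is what makes this style of updating attractive when the full data set exceeds memory.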
Statistical methods and computing for big data.
Wang, Chun; Chen, Ming-Hui; Schifano, Elizabeth; Wu, Jing; Yan, Jun
2016-01-01
Big data are data on a massive scale in terms of volume, intensity, and complexity that exceed the capacity of standard analytic tools. They present opportunities as well as challenges to statisticians. The role of computational statisticians in scientific discovery from big data analyses has been under-recognized even by peer statisticians. This article summarizes recent methodological and software developments in statistics that address the big data challenges. Methodologies are grouped into three classes: subsampling-based, divide and conquer, and online updating for stream data. As a new contribution, the online updating approach is extended to variable selection with commonly used criteria, and their performances are assessed in a simulation study with stream data. Software packages are summarized with focuses on the open source R and R packages, covering recent tools that help break the barriers of computer memory and computing power. Some of the tools are illustrated in a case study with a logistic regression for the chance of airline delay.
Statistical data for the tensile properties of natural fibre composites
Directory of Open Access Journals (Sweden)
J.P. Torres
2017-06-01
This article features a large statistical database on the tensile properties of natural fibre reinforced composite laminates. The data presented here correspond to a comprehensive experimental testing program of several composite systems including: different material constituents (epoxy and vinyl ester resins; flax, jute and carbon fibres), different fibre configurations (short-fibre mats, unidirectional, and plain, twill and satin woven fabrics) and different fibre orientations (0°, 90°, and [0,90] angle plies). For each material, ~50 specimens were tested under uniaxial tensile loading. Here, we provide the complete set of stress–strain curves together with the statistical distributions of their calculated elastic modulus, strength and failure strain. The data are also provided as support material for the research article: “The mechanical properties of natural fibre composite laminates: A statistical study” [1].
Radar Derived Spatial Statistics of Summer Rain. Volume 2; Data Reduction and Analysis
Konrad, T. G.; Kropfli, R. A.
1975-01-01
Data reduction and analysis procedures are discussed along with the physical and statistical descriptors used. The statistical modeling techniques are outlined and examples of the derived statistical characterization of rain cells in terms of the several physical descriptors are presented. Recommendations concerning analyses which can be pursued using the data base collected during the experiment are included.
HOW TO SELECT APPROPRIATE STATISTICAL TEST IN SCIENTIFIC ARTICLES
Directory of Open Access Journals (Sweden)
Vladimir TRAJKOVSKI
2016-09-01
Statistics is a mathematical science dealing with the collection, analysis, interpretation, and presentation of masses of numerical data in order to draw relevant conclusions. It is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies. Students and young researchers in biomedical sciences and in special education and rehabilitation often declare that they chose to enroll in that study program because they lack knowledge of or interest in mathematics. This is a sad statement, but there is much truth in it. The aim of this editorial is to help young researchers select statistics or statistical techniques and statistical software appropriate for the purposes and conditions of a particular analysis. The most important statistical tests are reviewed in the article. Knowing how to choose the right statistical test is an important asset in research data processing and in the writing of scientific papers. Young researchers and authors should know how to choose and how to use statistical methods. The competent researcher will need knowledge of statistical procedures. That might include an introductory statistics course, and it most certainly includes using a good statistics textbook. For this purpose, Statistics needs to return as a mandatory subject in the curriculum of the Institute of Special Education and Rehabilitation at the Faculty of Philosophy in Skopje. Young researchers need additional courses in statistics. They need to train themselves to use statistical software in an appropriate way.
A method for statistical comparison of data sets and its uses in analysis of nuclear physics data
International Nuclear Information System (INIS)
Bityukov, S.I.; Smirnova, V.V.; Krasnikov, N.V.; Maksimushkina, A.V.; Nikitenko, A.N.
2014-01-01
The authors propose a method for the statistical comparison of two data sets. The method is based on the statistical comparison of histograms. As an estimator of the quality of the decision made, they propose using a value that can be called the probability that the decision (the data sets are different) is correct [ru
Statistical yearbook. 1995 Data available as of 30 June 1997. 42. ed.
International Nuclear Information System (INIS)
1997-01-01
This is the forty-second issue of the United Nations Statistical Yearbook, prepared by the Statistics Division, Department of Economic and Social Affairs of the United Nations Secretariat, since 1948. The present issue contains series covering, in general, 1985-1994 or 1986-1995, using statistics available to the Statistics Division up to 30 June 1997. The Yearbook is based on data compiled by the Statistics Division from over 40 different international and national sources
Statistical yearbook. 1996. Data available as of 30 September 1998. 43 ed.
International Nuclear Information System (INIS)
1999-01-01
This is the forty-third issue of the United Nations Statistical Yearbook, prepared by the Statistics Division, Department of Economic and Social Affairs of the United Nations Secretariat, since 1948. The present issue contains series covering, in general, 1986-1995 or 1987-1996, using statistics available to the Statistics Division up to 30 September 1998. The Yearbook is based on data compiled by the Statistics Division from over 40 different international and national sources
Analysis of biomarker data a practical guide
Looney, Stephen W
2015-01-01
A "how to" guide for applying statistical methods to biomarker data analysis. Presenting a solid foundation for the statistical methods that are used to analyze biomarker data, Analysis of Biomarker Data: A Practical Guide features preferred techniques for biomarker validation. The authors provide descriptions of select elementary statistical methods that are traditionally used to analyze biomarker data with a focus on the proper application of each method, including necessary assumptions, software recommendations, and proper interpretation of computer output. In addition, the book discusses
Directory of Open Access Journals (Sweden)
Frank Pega
2013-01-01
Objective. Effectively addressing health disparities experienced by sexual minority populations requires high-quality official data on sexual orientation. We developed a conceptual framework of sexual orientation to improve the quality of sexual orientation data in New Zealand’s Official Statistics System. Methods. We reviewed conceptual and methodological literature, culminating in a draft framework. To improve the framework, we held focus groups and key-informant interviews with sexual minority stakeholders and producers and consumers of official statistics. An advisory board of experts provided additional guidance. Results. The framework proposes working definitions of the sexual orientation topic and measurement concepts, describes dimensions of the measurement concepts, discusses variables framing the measurement concepts, and outlines conceptual grey areas. Conclusion. The framework proposes standard definitions and concepts for the collection of official sexual orientation data in New Zealand. It presents a model for producers of official statistics in other countries, who wish to improve the quality of health data on their citizens.
Reducing bias in the analysis of counting statistics data
International Nuclear Information System (INIS)
Hammersley, A.P.; Antoniadis, A.
1997-01-01
In the analysis of counting statistics data it is common practice to estimate the variance of the measured data points as the data points themselves. This practice introduces a bias into the results of further analysis which may be significant, and under certain circumstances lead to false conclusions. In the case of normal weighted least squares fitting this bias is quantified and methods to avoid it are proposed. (orig.)
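The bias described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a constant true count rate (`true_rate`, a made-up value), Poisson-distributed counts, and the simplest possible fit, a constant level. With variance estimated as the data point itself, the weighted least-squares estimate of a constant reduces to the harmonic mean of the counts, which falls below the true rate, while the unweighted mean does not.

```python
import math
import random


def poisson_sample(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def weighted_mean_data_variance(counts):
    """Weighted least-squares fit of a constant, taking variance_i = count_i."""
    # max(c, 1) guards against division by zero for empty bins (an assumption).
    weights = [1.0 / max(c, 1) for c in counts]
    return sum(w * c for w, c in zip(weights, counts)) / sum(weights)


def unweighted_mean(counts):
    """Ordinary arithmetic mean, the unbiased estimate of a constant Poisson rate."""
    return sum(counts) / len(counts)


rng = random.Random(12345)   # fixed seed so the run is repeatable
true_rate = 20.0             # hypothetical true counts per bin
counts = [poisson_sample(rng, true_rate) for _ in range(20000)]

biased = weighted_mean_data_variance(counts)
unbiased = unweighted_mean(counts)
print(f"weighted (variance = data): {biased:.3f}")
print(f"unweighted mean:            {unbiased:.3f}")
```

The weighted estimate sits roughly one count below the true rate here, because low fluctuations receive larger weights than high ones.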
Data Model Performance in Data Warehousing
Rorimpandey, G. C.; Sangkop, F. I.; Rantung, V. P.; Zwart, J. P.; Liando, O. E. S.; Mewengkang, A.
2018-02-01
Data warehouses have become increasingly important in organizations that hold large amounts of data. A data warehouse is not a product but part of a solution for the decision support system in those organizations. The data model is the starting point for designing and developing data warehouse architectures. Thus, the data model needs stable interfaces and must remain consistent over a long period of time. The aim of this research is to determine which data model in data warehousing has the best performance. The research method is descriptive analysis, which has three main tasks: data collection and organization, analysis of data, and interpretation of data. The results, examined with a statistical analysis method, show that there is no statistical difference among the data models used in data warehousing. An organization can utilize any of the four data models proposed when designing and developing a data warehouse.
Conformity and statistical tolerancing
Leblond, Laurent; Pillet, Maurice
2018-02-01
Statistical tolerancing was first proposed by Shewhart (Economic Control of Quality of Manufactured Product, 1931, reprinted 1980 by ASQC). In spite of this long history, its use remains moderate. One of the probable reasons for this low utilization is undoubtedly the difficulty for designers to anticipate the risks of this approach. The arithmetic tolerance (worst case) allows a simple interpretation: conformity is defined by the presence of the characteristic in an interval. Statistical tolerancing is more complex in its definition. An interval is not sufficient to define the conformance. To justify the statistical tolerancing formula used by designers, a tolerance interval should be interpreted as the interval where most of the parts produced should probably be located. This tolerance is justified by considering a conformity criterion of the parts guaranteeing low offsets on the latter characteristics. Unlike traditional arithmetic tolerancing, statistical tolerancing requires a sustained exchange of information between design and manufacture to be used safely. This paper proposes a formal definition of the conformity, which we apply successively to the quadratic and arithmetic tolerancing. We introduce a concept of concavity, which helps us to demonstrate the link between tolerancing approach and conformity. We use this concept to demonstrate the various acceptable propositions of statistical tolerancing (in the space decentring, dispersion).
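The contrast between arithmetic (worst-case) and quadratic tolerancing mentioned in this abstract can be made concrete with the usual textbook stack-up formulas. The sketch below is only an illustration under stated assumptions: a linear stack of independent, centred characteristics, with hypothetical part tolerances. It does not implement the paper's formal conformity definition.

```python
import math


def arithmetic_tolerance(tolerances):
    """Worst-case stack-up: the assembly tolerance is the plain sum."""
    return sum(tolerances)


def quadratic_tolerance(tolerances):
    """Statistical (root-sum-square) stack-up for independent, centred parts."""
    return math.sqrt(sum(t * t for t in tolerances))


# Hypothetical symmetric tolerances (in mm) of four stacked parts.
part_tolerances = [0.10, 0.10, 0.10, 0.10]

worst_case = arithmetic_tolerance(part_tolerances)
statistical = quadratic_tolerance(part_tolerances)
print(f"arithmetic (worst case): +/-{worst_case:.2f} mm")
print(f"quadratic (statistical): +/-{statistical:.2f} mm")
```

For n equal tolerances the quadratic result is smaller than the arithmetic one by a factor of sqrt(n), which is why the statistical approach is attractive to designers and also why it depends on the parts actually being centred, as the abstract stresses.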
Quick Access: Find Statistical Data on the Internet.
Su, Di
1999-01-01
Provides an annotated list of Internet sources (World Wide Web, ftp, and gopher sites) for current and historical statistical business data, including selected interest rates, the Consumer Price Index, the Producer Price Index, foreign currency exchange rates, noon buying rates, per diem rates, the special drawing right, stock quotes, and mutual…
Statistical yearbook 1993. Data available as of 31 December 1994. 40 ed.
International Nuclear Information System (INIS)
1995-01-01
This is the fortieth issue of the United Nations Statistical Yearbook, prepared by the Statistical Division, Department for Economic and Social Information and Policy Analysis of the United Nations Secretariat, since 1948. The present issue contains series covering, in general, 1983-1992 or 1984-1993, using statistics available to the Statistical Division up to 31 December 1994. The Yearbook is based on data compiled by the Statistical Division from over 40 different international and national sources
Statistical yearbook 1994. Data available as of 31 March 1996. 41 ed.
International Nuclear Information System (INIS)
1996-01-01
This is the forty-first issue of the United Nations Statistical Yearbook, prepared by the Statistics Division, Department for Economic and Social Information and Policy Analysis of the United Nations Secretariat, since 1948. The present issue contains series covering, in general, 1984-1993 or 1985-1994, using statistics available to the Statistics Division up to 31 December 1995. The Yearbook is based on data compiled by the Statistics Division from over 40 different international and national sources
SEISVIZ3D: Stereoscopic system for the representation of seismic data - Interpretation and Immersion
von Hartmann, Hartwig; Rilling, Stefan; Bogen, Manfred; Thomas, Rüdiger
2015-04-01
The seismic method is a valuable tool for getting 3D-images from the subsurface. Seismic data acquisition today is not only a topic for oil and gas exploration but is also used for geothermal exploration, inspections of nuclear waste sites and for scientific investigations. The system presented in this contribution may also have an impact on the visualization of 3D-data of other geophysical methods. 3D-seismic data can be displayed in different ways to give a spatial impression of the subsurface. They are a combination of individual vertical cuts, possibly linked to a cubical portion of the data volume, and the stereoscopic view of the seismic data. By these methods, the spatial perception of the structures and thus of the processes in the subsurface should be increased. Stereoscopic techniques are implemented, e.g., in the CAVE and the WALL, both of which require a lot of space and high technical effort. The aim of the interpretation system shown here is stereoscopic visualization of seismic data at the workplace, i.e. at the personal workstation and monitor. The system was developed with the following criteria in mind: • Fast rendering of large amounts of data so that a continuous view of the data when changing the viewing angle and the data section is possible, • defining areas in stereoscopic view to translate the spatial impression directly into an interpretation, • the development of an appropriate user interface, including head-tracking, for handling the increased degrees of freedom, • the possibility of collaboration, i.e. teamwork and idea exchange with the simultaneous viewing of a scene at remote locations. The possibilities offered by the use of a stereoscopic system do not replace a conventional interpretation workflow. Rather they have to be implemented into it as an additional step. The amplitude distribution of the seismic data is a challenge for the stereoscopic display because the opacity level and the scaling and selection of the data have to
Bayesian statistics applied to neutron activation data for reactor flux spectrum analysis
International Nuclear Information System (INIS)
Chiesa, Davide; Previtali, Ezio; Sisti, Monica
2014-01-01
Highlights: • Bayesian statistics to analyze the neutron flux spectrum from activation data. • Rigorous statistical approach for accurate evaluation of the neutron flux groups. • Cross section and activation data uncertainties included for the problem solution. • Flexible methodology applied to analyze different nuclear reactor flux spectra. • The results are in good agreement with the MCNP simulations of neutron fluxes. - Abstract: In this paper, we present a statistical method, based on Bayesian statistics, to analyze the neutron flux spectrum from the activation data of different isotopes. The experimental data were acquired during a neutron activation experiment performed at the TRIGA Mark II reactor of Pavia University (Italy) in four irradiation positions characterized by different neutron spectra. In order to evaluate the neutron flux spectrum, subdivided into energy groups, a system of linear equations, containing the group effective cross sections and the activation rate data, has to be solved. However, since the system’s coefficients are experimental data affected by uncertainties, a rigorous statistical approach is fundamental for an accurate evaluation of the neutron flux groups. For this purpose, we applied Bayesian statistical analysis, which allows the uncertainties of the coefficients and the a priori information about the neutron flux to be included. A program for the analysis of Bayesian hierarchical models, based on Markov Chain Monte Carlo (MCMC) simulations, was used to define the statistical model of the problem and solve it. The first analysis involved the determination of the thermal, resonance-intermediate and fast flux components, and the dependence of the results on the choice of prior distribution was investigated to confirm the reliability of the Bayesian analysis. After that, the main resonances of the activation cross sections were analyzed to implement multi-group models with finer energy subdivisions that would allow determining the
Statistical data on butane and kerosene in West Africa
International Nuclear Information System (INIS)
Masse, R.
1990-01-01
This book gives statistical, technical and economic information on butane and kerosene used in West Africa in 1990. The first part gives information on gas and its use: market, energy efficiency, performance, safety, distribution, storage, transport and commercialization. Statistical data on petroleum and natural gas production or consumption are also described. Natural gas and petroleum reserves in Africa are also studied. In the second part, thirty country entries give an economic analysis of each African country. 21 figs., 19 tabs., 5 maps
Statistical analysis of hydrologic data for Yucca Mountain
International Nuclear Information System (INIS)
Rutherford, B.M.; Hall, I.J.; Peters, R.R.; Easterling, R.G.; Klavetter, E.A.
1992-02-01
The geologic formations in the unsaturated zone at Yucca Mountain are currently being studied as the host rock for a potential radioactive waste repository. Data from several drill holes have been collected to provide the preliminary information needed for planning site characterization for the Yucca Mountain Project. Hydrologic properties have been measured on the core samples and the variables analyzed here are thought to be important in the determination of groundwater travel times. This report presents a statistical analysis of four hydrologic variables: saturated-matrix hydraulic conductivity, maximum moisture content, suction head, and calculated groundwater travel time. It is important to modelers to have as much information about the distribution of values of these variables as can be obtained from the data. The approach taken in this investigation is to (1) identify regions at the Yucca Mountain site that, according to the data, are distinctly different; (2) estimate the means and variances within these regions; (3) examine the relationships among the variables; and (4) investigate alternative statistical methods that might be applicable when more data become available. The five different functional stratigraphic units at three different locations are compared and grouped into relatively homogeneous regions. Within these regions, the expected values and variances associated with core samples of different sizes are estimated. The results provide a rough estimate of the distribution of hydrologic variables for small core sections within each region
Electromagnetic SAMPO monitoring soundings at OLKILUOTO in 2012 with updated interpretations
International Nuclear Information System (INIS)
Korhonen, K.
2013-11-01
The Geological Survey of Finland (GTK) has carried out electromagnetic depth soundings annually at fixed stations at Olkiluoto since 2004 as part of a monitoring programme. The goal of the programme is to detect and monitor changes in the electrical properties of the bedrock above and in the vicinity of the ONKALO tunnel which will serve as a part of the future underground nuclear waste disposal facility. A new Sampo monitoring survey was carried out during October 2012. The survey plan of 2011 was slightly modified and 36 soundings at 16 measurement stations were carried out. The nominal coil separations of 200, 400, 500, 600 and 800 meters were used. Interpretations at eight selected stations were updated with the new data. The interpretations indicate consistent statistically significant changes. Annual increases in resistivity were detected at stations to the East of ONKALO while annual decreases in resistivity were detected to the West of ONKALO. However, these changes need to be considered keeping in mind the high degree of uncertainty associated with the data and their interpretations. (orig.)
The application of bayesian statistic in data fit processing
International Nuclear Information System (INIS)
Guan Xingyin; Li Zhenfu; Song Zhaohui
2010-01-01
The rationale and the disadvantages of least squares fitting, which is commonly used in data processing, are analyzed, and the theory and common methods for applying Bayesian statistics in data processing are presented in detail. As the analysis shows, the Bayesian approach avoids the restrictive hypotheses that least squares fitting imposes in data processing, and its results are more scientific and more easily understood; it may replace least squares fitting in data processing. (authors)
Michaela Kreyenfeld; Rembrandt D. Scholz; Frederik Peters; Ines Wlosnewski
2010-01-01
Until 2008, Germany’s vital statistics did not include information on the biological order of each birth. This resulted in a dearth of important demographic indicators, such as the mean age at first birth and the level of childlessness. Researchers have tried to fill this gap by generating order-specific birth rates from survey data, and by combining survey data with vital statistics. This paper takes a different approach by using hospital statistics on births to generate birth order-specific...
2010-05-05
...] Guidance for Industry on Documenting Statistical Analysis Programs and Data Files; Availability AGENCY... documenting statistical analyses and data files submitted to the Center for Veterinary Medicine (CVM) for the... on Documenting Statistical Analysis Programs and Data Files; Availability'' giving interested persons...
A spatial scan statistic for survival data based on Weibull distribution.
Bhatt, Vijaya; Tiwari, Neeraj
2014-05-20
The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004-2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.
Which statistics should tropical biologists learn?
Loaiza Velásquez, Natalia; González Lutz, María Isabel; Monge-Nájera, Julián
2011-09-01
Tropical biologists study the richest and most endangered biodiversity on the planet, and in these times of climate change and mega-extinctions, the need for efficient, good quality research is more pressing than in the past. However, the statistical component in research published by tropical authors sometimes suffers from poor quality in data collection, mediocre or bad experimental design, and a rigid and outdated view of data analysis. To suggest improvements in their statistical education, we listed all the statistical tests and other quantitative analyses used in two leading tropical journals, the Revista de Biología Tropical and Biotropica, during a year. The 12 most frequent tests in the articles were: Analysis of Variance (ANOVA), Chi-Square Test, Student's T Test, Linear Regression, Pearson's Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon's Diversity Index, Tukey's Test, Cluster Analysis, Spearman's Rank Correlation Test and Principal Component Analysis. We conclude that statistical education for tropical biologists must abandon the old syllabus based on the mathematical side of statistics and concentrate on the correct selection of these and other procedures and tests, on their biological interpretation and on the use of reliable and friendly freeware. We think that their time will be better spent understanding and protecting tropical ecosystems than trying to learn the mathematical foundations of statistics: in most cases, a well designed one-semester course should be enough for their basic requirements.
Special study for the statistical evaluation of groundwater data trends. Final report
International Nuclear Information System (INIS)
1993-05-01
Analysis of trends over time in the concentrations of chemicals in groundwater at Uranium Mill Tailings Remedial Action (UMTRA) Project sites can provide valuable information for monitoring the performance of disposal cells and the effectiveness of groundwater restoration activities. Random variation in data may obscure real trends or may produce the illusion of a trend where none exists, so statistical methods are needed to reliably detect and estimate trends. Trend analysis includes both trend detection and estimation. Trend detection uses statistical hypothesis testing and provides a yes or no answer regarding the existence of a trend. Hypothesis tests try to reach a balance between false negative and false positive conclusions. To quantify the magnitude of a trend, estimation is required. This report presents the statistical concepts that are necessary for understanding trend analysis. The types of patterns most likely to occur in UMTRA data sets are emphasized. Two general approaches to analyzing data for trends are proposed and recommendations are given to assist UMTRA Project staff in selecting an appropriate method for their site data. Trend analysis is much more difficult when data contain values less than the reported laboratory detection limit. The complications that arise are explained. This report also discusses the impact of data collection procedures on statistical trend methods and offers recommendations to improve the efficiency of the methods and reduce sampling costs. Guidance for determining how many sampling rounds might be needed by statistical methods to detect trends of various magnitudes is presented. This information could be useful in planning site monitoring activities
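The two steps this abstract distinguishes, trend detection (a hypothesis test with a yes or no answer) and trend estimation (the magnitude), can be sketched as follows. The abstract does not name the methods the report recommends, so the choice here is an assumption: the Mann-Kendall test for detection and the Theil-Sen slope for estimation, both common nonparametric options. The sketch assumes no tied values and no results below a detection limit, and the concentrations are invented for illustration.

```python
import math
import statistics


def mann_kendall(values):
    """Mann-Kendall trend test (no-ties form, normal approximation).

    Returns the S statistic, the continuity-corrected Z score and the
    two-sided p value.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)   # +1, -1 or 0 per pair
    variance = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(variance)
    elif s < 0:
        z = (s + 1) / math.sqrt(variance)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value


def sen_slope(times, values):
    """Theil-Sen estimator: the median of all pairwise slopes."""
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(values) - 1)
        for j in range(i + 1, len(values))
    ]
    return statistics.median(slopes)


# Hypothetical concentrations (mg/L) from ten consecutive sampling rounds.
rounds = list(range(10))
concentration = [1.0, 1.3, 1.2, 1.6, 1.9, 1.8, 2.2, 2.5, 2.4, 2.9]

s_stat, z_score, p_value = mann_kendall(concentration)
slope = sen_slope(rounds, concentration)
print(f"S = {s_stat}, Z = {z_score:.2f}, p = {p_value:.4f}")
print(f"estimated trend: {slope:.3f} mg/L per round")
```

Comparing `p_value` with a chosen significance level gives the detection decision; that level sets the false positive rate, while the number of sampling rounds largely controls the false negative rate.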
A practical guide to scientific data analysis
Livingstone, David J
2009-01-01
Inspired by the author's need for practical guidance in the processes of data analysis, A Practical Guide to Scientific Data Analysis has been written as a statistical companion for the working scientist. This handbook of data analysis with worked examples focuses on the application of mathematical and statistical techniques and the interpretation of their results. Covering the most common statistical methods for examining and exploring relationships in data, the text includes extensive examples from a variety of scientific disciplines. The chapters are organised logically, from pl
Introduction to statistics and data analysis with exercises, solutions and applications in R
Heumann, Christian; Shalabh
2016-01-01
This introductory statistics textbook conveys the essential concepts and tools needed to develop and nurture statistical thinking. It presents descriptive, inductive and explorative statistical methods and guides the reader through the process of quantitative data analysis. In the experimental sciences and interdisciplinary research, data analysis has become an integral part of any scientific study. Issues such as judging the credibility of data, analyzing the data, evaluating the reliability of the obtained results and finally drawing the correct and appropriate conclusions from the results are vital. The text is primarily intended for undergraduate students in disciplines like business administration, the social sciences, medicine, politics, macroeconomics, etc. It features a wealth of examples, exercises and solutions with computer code in the statistical programming language R as well as supplementary material that will enable the reader to quickly adapt all methods to their own applications.
Computer processing of 14C data; statistical tests and corrections of data
International Nuclear Information System (INIS)
Obelic, B.; Planinic, J.
1977-01-01
The described computer program calculates the age of samples and performs statistical tests and corrections of data. Data are obtained from the proportional counter that measures anticoincident pulses per 20 minute intervals. After every 9th interval the counter measures the total number of counts per interval. Input data are punched on cards. The output list contains the input data schedule and the following results: mean CPM value, correction of CPM for normal pressure and temperature (NTP), sample age calculation based on 14C half-lives of 5570 and 5730 years, age correction for NTP, dendrochronological corrections and the relative radiocarbon concentration. All results are given with one standard deviation. Input data test (Chauvenet's criterion), gas purity test, standard deviation test and test of the data processor are also included in the program. (author)
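Two of the calculations listed in this abstract can be sketched in a few lines. This is not the original program, which is not reproduced in the record: the function names, the sample counts, and the use of the activity ratio relative to a modern standard are all assumptions, and the NTP and dendrochronological corrections are left out. The age follows from the exponential decay law, and Chauvenet's criterion rejects a reading when the expected number of equally extreme readings among N is below one half.

```python
import math
import statistics


def radiocarbon_age(relative_concentration, half_life_years):
    """Age in years from the sample/modern activity ratio (0 < ratio <= 1)."""
    if not 0.0 < relative_concentration <= 1.0:
        raise ValueError("relative concentration must be in (0, 1]")
    mean_life = half_life_years / math.log(2.0)
    return -mean_life * math.log(relative_concentration)


def chauvenet_keep_mask(values):
    """True for each reading that passes Chauvenet's criterion."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    mask = []
    for value in values:
        z = abs(value - mean) / sd
        tail_probability = math.erfc(z / math.sqrt(2.0))  # two-sided normal tail
        mask.append(n * tail_probability >= 0.5)
    return mask


# Hypothetical counts per interval; the last reading is a deliberate outlier.
counts_per_interval = [100, 102, 98, 101, 99, 103, 97, 100, 150]
mask = chauvenet_keep_mask(counts_per_interval)
kept = [c for c, keep in zip(counts_per_interval, mask) if keep]
print("kept readings:", kept)

# Age for a sample with half the modern activity, under both half-lives.
for half_life in (5570.0, 5730.0):
    age = radiocarbon_age(0.5, half_life)
    print(f"half-life {half_life:.0f} y -> age {age:.0f} y")
```

A ratio of 0.5 returns exactly one half-life, which is a convenient sanity check for either half-life convention.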
Statistical distributions as applied to environmental surveillance data
International Nuclear Information System (INIS)
Speer, D.R.; Waite, D.A.
1976-01-01
Application of normal, lognormal, and Weibull distributions to radiological environmental surveillance data was investigated for approximately 300 nuclide-medium-year-location combinations. The fit of data to distributions was compared through probability plotting (special graph paper provides a visual check) and W test calculations. Results show that 25% of the data fit the normal distribution, 50% fit the lognormal, and 90% fit the Weibull. A demonstration of how to plot each distribution shows that the normal and lognormal distributions are comparatively easy to use, while the Weibull distribution is complicated and difficult to use. Although current practice is to use normal distribution statistics, the normal distribution fit the fewest data groups considered in this study.
Use of the dynamic stiffness method to interpret experimental data from a nonlinear system
Tang, Bin; Brennan, M. J.; Gatti, G.
2018-05-01
The interpretation of experimental data from nonlinear structures is challenging, primarily because of dependency on types and levels of excitation, and coupling issues with test equipment. In this paper, the dynamic stiffness method, which is commonly used in the analysis of linear systems, is used to interpret the data from a vibration test of a controllable compressed beam structure coupled to a test shaker. For a single mode of the system, this method facilitates the separation of mass, stiffness and damping effects, including nonlinear stiffness effects. It also allows the separation of the dynamics of the shaker from the structure under test. The approach needs to be used with care, and is only suitable if the nonlinear system has a response that is predominantly at the excitation frequency. For the structure under test, the raw experimental data revealed little about the underlying causes of the dynamic behaviour. However, the dynamic stiffness approach allowed the effects due to the nonlinear stiffness to be easily determined.
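The separation of mass, stiffness and damping effects that this abstract describes can be seen from the standard single-degree-of-freedom form of the dynamic stiffness. The expressions below are a textbook sketch, not the paper's derivation: the symbols m, c, k are the usual mass, viscous damping and stiffness, and the cubic stiffness coefficient k_3 is an assumed form of nonlinearity, treated with a first-order harmonic balance.

```latex
% Linear single mode under harmonic force F e^{j\omega t} with response X e^{j\omega t}:
\[
  K_d(\omega) \;=\; \frac{F}{X} \;=\; k - m\omega^{2} + j\,c\,\omega ,
\]
\[
  \operatorname{Re} K_d(\omega) = k - m\omega^{2},
  \qquad
  \operatorname{Im} K_d(\omega) = c\,\omega .
\]
% Plotting Re K_d against \omega^2 gives a straight line with intercept k and slope -m;
% Im K_d against \omega gives a line of slope c.

% Assumed cubic stiffness force k_3 x^3, response taken as predominantly at \omega:
\[
  \operatorname{Re} K_d(\omega, |X|) \;\approx\; k + \tfrac{3}{4}\,k_3\,|X|^{2} - m\omega^{2} .
\]
```

Under these assumptions the nonlinear stiffness shows up as an amplitude-dependent shift of the real part, while the mass and damping terms keep their linear form, which is consistent with the abstract's caveat that the response must be predominantly at the excitation frequency.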
Statistical Physics in the Era of Big Data
Wang, Dashun
2013-01-01
With the wealth of data provided by a wide range of high-throughput measurement tools and technologies, statistical physics of complex systems is entering a new phase, impacting in a meaningful fashion a wide range of fields, from cell biology to computer science to economics. In this dissertation, by applying tools and techniques developed in…
On the statistical comparison of climate model output and climate data
International Nuclear Information System (INIS)
Solow, A.R.
1991-01-01
Some broad issues arising in the statistical comparison of the output of climate models with the corresponding climate data are reviewed. Particular attention is paid to the question of detecting climate change. The purpose of this paper is to review some statistical approaches to the comparison of the output of climate models with climate data. There are many statistical issues arising in such a comparison. The author will focus on some of the broader issues, although some specific methodological questions will arise along the way. One important potential application of the approaches discussed in this paper is the detection of climate change. Although much of the discussion will be fairly general, he will try to point out the appropriate connections to the detection question. 9 refs
On the statistical comparison of climate model output and climate data
International Nuclear Information System (INIS)
Solow, A.R.
1990-01-01
Some broad issues arising in the statistical comparison of the output of climate models with the corresponding climate data are reviewed. Particular attention is paid to the question of detecting climate change. The purpose of this paper is to review some statistical approaches to the comparison of the output of climate models with climate data. There are many statistical issues arising in such a comparison. The author will focus on some of the broader issues, although some specific methodological questions will arise along the way. One important potential application of the approaches discussed in this paper is the detection of climate change. Although much of the discussion will be fairly general, he will try to point out the appropriate connections to the detection question
International Nuclear Information System (INIS)
Nelson, L.A.
1991-01-01
The emphasis of the mission was the provision of training to the staff of the Department of Agriculture, Government of Thailand, in the analysis and interpretation of data from experiments concerning fertilizer applications in agriculture
Guner, Huseyin; Close, Patrick L; Cai, Wenxuan; Zhang, Han; Peng, Ying; Gregorich, Zachery R; Ge, Ying
2014-03-01
The rapid advancements in mass spectrometry (MS) instrumentation, particularly in Fourier transform (FT) MS, have made the acquisition of high-resolution and high-accuracy mass measurements routine. However, the software tools for the interpretation of high-resolution MS data are underdeveloped. Although several algorithms for the automatic processing of high-resolution MS data are available, there is still an urgent need for a user-friendly interface with functions that allow users to visualize and validate the computational output. Therefore, we have developed MASH Suite, a user-friendly and versatile software interface for processing high-resolution MS data. MASH Suite contains a wide range of features that allow users to easily navigate through data analysis, visualize complex high-resolution MS data, and manually validate automatically processed results. Furthermore, it provides easy, fast, and reliable interpretation of top-down, middle-down, and bottom-up MS data. MASH Suite is convenient, easily operated, and freely available. It can greatly facilitate the comprehensive interpretation and validation of high-resolution MS data with high accuracy and reliability.
Hierarchical modelling for the environmental sciences: statistical methods and applications
Clark, James S
2006-01-01
New statistical tools are changing the way in which scientists analyze and interpret data and models. Hierarchical Bayes and Markov Chain Monte Carlo methods for analysis provide a consistent framework for inference and prediction where information is heterogeneous and uncertain, processes are complicated, and responses depend on scale. Nowhere are these methods more promising than in the environmental sciences.
Model-independent plot of dynamic PET data facilitates data interpretation and model selection.
Munk, Ole Lajord
2012-02-21
When testing new PET radiotracers or new applications of existing tracers, the blood-tissue exchange and the metabolism need to be examined. However, conventional plots of measured time-activity curves from dynamic PET do not reveal the inherent kinetic information. A novel model-independent volume-influx plot (vi-plot) was developed and validated. The new vi-plot shows the time course of the instantaneous distribution volume and the instantaneous influx rate. The vi-plot visualises physiological information that facilitates model selection and it reveals when a quasi-steady state is reached, which is a prerequisite for the use of the graphical analyses by Logan and Gjedde-Patlak. Both axes of the vi-plot have direct physiological interpretation, and the plot shows kinetic parameters in close agreement with estimates obtained by non-linear kinetic modelling. The vi-plot is equally useful for analyses of PET data based on a plasma input function or a reference region input function. The vi-plot is a model-independent and informative plot for data exploration that facilitates the selection of an appropriate method for data analysis.
Robbin, Alice
1981-01-01
In recent decades there has been a notable expansion of statistical data produced by the public and private sectors for administrative, research, policy and evaluation programs. This is due to advances in relatively inexpensive and efficient data collection and management of computer-readable statistical data. Corresponding changes have not occurred in the management of data collection, preservation, description and dissemination. As a result, the process by which data become accessible to so...
Statistical mechanics of learning: A variational approach for real data
International Nuclear Information System (INIS)
Malzahn, Doerthe; Opper, Manfred
2002-01-01
Using a variational technique, we generalize the statistical physics approach of learning from random examples to make it applicable to real data. We demonstrate the validity and relevance of our method by computing approximate estimators for generalization errors that are based on training data alone