WorldWideScience

Sample records for data analysis

  1. Data analysis workbench

    International Nuclear Information System (INIS)

    Goetz, A.; Gerring, M.; Svensson, O.; Brockhauser, S.

    2012-01-01

    Data Analysis Workbench (DAWB) is a new software tool being developed at the ESRF. Its goal is to provide a single tool for both online data analysis on the beamlines and offline data analysis that users can run during experiments or take home. The tool includes support for data visualization and work-flows. Work-flows allow algorithms that exploit parallel architectures to be built from existing high-level data analysis modules and combined with data collection. The workbench uses Passerelle as the work-flow engine and EDNA plug-ins for data analysis. Actors talking to Tango send commands to a limited set of hardware to start existing data collection algorithms, and a Tango server allows work-flows to be executed from existing applications. Scripting interfaces to Python, Javascript and SPEC are provided. At present the workbench is being tested at the ESRF on a selected number of beamlines. (authors)
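
    The abstract notes that work-flows can be triggered from existing applications through a Tango server. A minimal sketch of that pattern using PyTango is shown below; the device name, command name and work-flow name are hypothetical placeholders, not the actual DAWB interface.

        # Sketch: starting a server-side work-flow over Tango (hypothetical names)
        import tango

        # Connect to a (hypothetical) workbench Tango device
        workflow_server = tango.DeviceProxy("esrf/dawb/workflow_1")

        # Ask the server to run a named work-flow; a plain string argument is
        # assumed here purely for illustration.
        workflow_server.command_inout("StartWorkflow", "powder_diffraction_reduction")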

  2. Practical data analysis

    CERN Document Server

    Cuesta, Hector

    2013-01-01

    Each chapter of the book quickly introduces a key 'theme' of data analysis before immersing you in its practical aspects, so you learn quickly how to perform all aspects of data analysis. Practical Data Analysis is ideal for home and small-business users who want to slice and dice the data they have on hand with minimum hassle.

  3. Functional data analysis

    CERN Document Server

    Ramsay, J O

    1997-01-01

    Scientists today collect samples of curves and other functional observations. This monograph presents many ideas and techniques for such data. Included are expressions in the functional domain of such classics as linear regression, principal components analysis, linear modelling, and canonical correlation analysis, as well as specifically functional techniques such as curve registration and principal differential analysis. Data arising in real applications are used throughout for both motivation and illustration, showing how functional approaches allow us to see new things, especially by exploiting the smoothness of the processes generating the data. The data sets exemplify the wide scope of functional data analysis; they are drawn from growth analysis, meteorology, biomechanics, equine science, economics, and medicine. The book presents novel statistical technology while keeping the mathematical level widely accessible. It is designed to appeal to students, to applied data analysts, and to experienced researc...

  4. CADDIS Volume 4. Data Analysis: Exploratory Data Analysis

    Science.gov (United States)

    Introduction to exploratory data analysis: overview of variable distributions, scatter plots, correlation analysis, and GIS datasets; use of conditional probability to examine stressor levels and impairment; exploring correlations among multiple stressors.
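
    The conditional-probability idea mentioned above can be made concrete with a few lines of Python; the stressor and impairment values below are invented for illustration.

        # Sketch: P(impairment | stressor above a threshold), plus a simple correlation
        import numpy as np

        stressor = np.array([1.2, 3.4, 0.8, 5.1, 2.2, 4.7, 0.5, 3.9])  # e.g. a concentration
        impaired = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=bool)      # site impairment flags

        high = stressor > 3.0
        p_impaired_given_high = impaired[high].mean()                  # conditional probability
        r = np.corrcoef(stressor, impaired.astype(float))[0, 1]        # stressor/impairment correlation
        print(p_impaired_given_high, r)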

  5. DataSHIELD: taking the analysis to the data, not the data to the analysis.

    Science.gov (United States)

    Gaye, Amadou; Marcon, Yannick; Isaeva, Julia; LaFlamme, Philippe; Turner, Andrew; Jones, Elinor M; Minion, Joel; Boyd, Andrew W; Newby, Christopher J; Nuotio, Marja-Liisa; Wilson, Rebecca; Butters, Oliver; Murtagh, Barnaby; Demir, Ipek; Doiron, Dany; Giepmans, Lisette; Wallace, Susan E; Budin-Ljøsne, Isabelle; Oliver Schmidt, Carsten; Boffetta, Paolo; Boniol, Mathieu; Bota, Maria; Carter, Kim W; deKlerk, Nick; Dibben, Chris; Francis, Richard W; Hiekkalinna, Tero; Hveem, Kristian; Kvaløy, Kirsti; Millar, Sean; Perry, Ivan J; Peters, Annette; Phillips, Catherine M; Popham, Frank; Raab, Gillian; Reischl, Eva; Sheehan, Nuala; Waldenberger, Melanie; Perola, Markus; van den Heuvel, Edwin; Macleod, John; Knoppers, Bartha M; Stolk, Ronald P; Fortier, Isabel; Harris, Jennifer R; Woffenbuttel, Bruce H R; Murtagh, Madeleine J; Ferretti, Vincent; Burton, Paul R

    2014-12-01

    Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy relating to the UK's proposed 'care.data' initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in facilitating the access of researchers and other healthcare professionals to individual-level data. Commands are sent from a central analysis computer (AC) to several data computers (DCs) storing the data to be co-analysed. The data sets are analysed simultaneously but in parallel. The separate parallelized analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC. Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, and this illustrates the opportunities and challenges presented by the DataSHIELD approach. DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary but governance restrictions prohibit the release or sharing of some of the required data, and/or render data access unacceptably slow; (ii) a
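
    The division of labour between the analysis computer and the data computers can be illustrated with a toy Python sketch. This is not the real DataSHIELD Opal/R API, only the underlying idea: each study returns non-disclosive summary statistics, which the analysis computer pools.

        # Toy federated mean/variance via summary statistics (illustrative only)
        def dc_summary(values):
            """Runs at a data computer (DC): returns aggregates, never raw records."""
            return {"n": len(values),
                    "sum": sum(values),
                    "sum_sq": sum(v * v for v in values)}

        def ac_pooled_mean_var(summaries):
            """Runs at the analysis computer (AC): pools the per-study summaries."""
            n = sum(s["n"] for s in summaries)
            mean = sum(s["sum"] for s in summaries) / n
            var = (sum(s["sum_sq"] for s in summaries) - n * mean ** 2) / (n - 1)
            return mean, var

        # Three studies' data stay behind their own firewalls:
        studies = [[5.1, 6.2, 5.8], [4.9, 5.5], [6.0, 6.3, 5.7, 5.2]]
        print(ac_pooled_mean_var([dc_summary(s) for s in studies]))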

  6. Fuzzy data analysis

    CERN Document Server

    Bandemer, Hans

    1992-01-01

    Fuzzy data, such as marks, scores, verbal evaluations, imprecise observations, experts' opinions and grey tone pictures, are quite common. In Fuzzy Data Analysis the authors collect their recent results, providing the reader with ideas, approaches and methods for processing such data when looking for sub-structures in knowledge bases for the evaluation of functional relationships, e.g. in order to specify diagnostic or control systems. The modelling presented uses ideas from fuzzy set theory, and the suggested methods solve problems usually tackled by data analysis when the data are real numbers. Fuzzy Data Analysis is self-contained and is addressed to mathematicians oriented towards applications and to practitioners in any field of application who have some background in mathematics and statistics.

  7. An Array of Qualitative Data Analysis Tools: A Call for Data Analysis Triangulation

    Science.gov (United States)

    Leech, Nancy L.; Onwuegbuzie, Anthony J.

    2007-01-01

    One of the most important steps in the qualitative research process is analysis of data. The purpose of this article is to provide elements for understanding multiple types of qualitative data analysis techniques available and the importance of utilizing more than one type of analysis, thus utilizing data analysis triangulation, in order to…

  8. Statistical data analysis

    International Nuclear Information System (INIS)

    Hahn, A.A.

    1994-11-01

    The complexity of instrumentation sometimes requires data analysis to be done before the result is presented to the control room. This tutorial reviews some of the theoretical assumptions underlying the more popular forms of data analysis and presents simple examples to illuminate the advantages and hazards of different techniques.
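
    As a concrete instance of the kind of hazard such a tutorial covers (the example is ours, not the tutorial's): a single bad instrument reading biases the mean far more than the median.

        import statistics

        readings = [10.1, 9.8, 10.3, 9.9, 10.0, 97.0]  # last value: instrument glitch
        print(statistics.mean(readings))    # ~24.5, badly biased by the outlier
        print(statistics.median(readings))  # ~10.05, robust to the outlier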

  9. Data-variant kernel analysis

    CERN Document Server

    Motai, Yuichi

    2015-01-01

    Describes and discusses the variants of kernel analysis methods for data types that have been intensely studied in recent years. This book covers kernel analysis topics ranging from the fundamental theory of kernel functions to its applications. The book surveys the current status, popular trends, and developments in kernel analysis studies. The author discusses multiple kernel learning algorithms and how to choose the appropriate kernels during the learning phase. Data-Variant Kernel Analysis is a new pattern analysis framework for different types of data configurations. The chapters include

  10. Mastering Clojure data analysis

    CERN Document Server

    Rochester, Eric

    2014-01-01

    This book takes a practical, example-oriented approach that aims to help you learn how to use Clojure for data analysis quickly and efficiently. This book is great for those who have experience with Clojure and who need to use it to perform data analysis. It will also be hugely beneficial for readers with basic experience in data analysis and statistics.

  11. Conducting Qualitative Data Analysis: Qualitative Data Analysis as a Metaphoric Process

    Science.gov (United States)

    Chenail, Ronald J.

    2012-01-01

    In the second of a series of "how-to" essays on conducting qualitative data analysis, Ron Chenail argues the process can best be understood as a metaphoric process. From this orientation he suggests researchers follow Kenneth Burke's notion of metaphor and see qualitative data analysis as the analyst systematically considering the "this-ness" of…

  12. Analysis of event-mode data with Interactive Data Language

    International Nuclear Information System (INIS)

    De Young, P.A.; Hilldore, B.B.; Kiessel, L.M.; Peaslee, G.F.

    2003-01-01

    We have developed an analysis package for event-mode data based on Interactive Data Language (IDL) from Research Systems Inc. This high-level language is high-speed, array-oriented, object-oriented, and has extensive visual (multi-dimensional plotting) and mathematical functions. We have developed a general framework, written in IDL, for the analysis of a variety of experimental data that does not require significant customization for each analysis. Unlike many traditional analysis packages, spectra and gates are applied after the data are read and are easily changed as the analysis proceeds, without rereading the data. The events are not sequentially processed into predetermined arrays subject to predetermined gates.
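
    The gate-after-read approach described above can be sketched in Python/NumPy rather than IDL (toy data; parameter names invented): events are read once into arrays, and gates are boolean masks that can be redefined without rereading.

        import numpy as np

        rng = np.random.default_rng(0)
        energy = rng.normal(660.0, 30.0, 100_000)   # toy event-mode data,
        timing = rng.uniform(0.0, 100.0, 100_000)   # one entry per event

        # A gate is just a mask; histogramming happens after the read
        gate = (energy > 600) & (energy < 720) & (timing < 50)
        spectrum, edges = np.histogram(energy[gate], bins=256)

        # Changing the gate is cheap: the raw events stay in memory
        gate = (energy > 630) & (energy < 690)
        spectrum, edges = np.histogram(energy[gate], bins=256)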

  13. Bayesian data analysis for newcomers.

    Science.gov (United States)

    Kruschke, John K; Liddell, Torrin M

    2018-02-01

    This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
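
    A minimal worked example in the article's spirit (ours, not the article's): the posterior for a coin's bias after 9 heads in 12 flips, under a uniform Beta(1, 1) prior, is Beta(10, 4), which can be interrogated directly.

        from scipy import stats

        heads, flips = 9, 12
        posterior = stats.beta(1 + heads, 1 + flips - heads)  # Beta(10, 4)
        print(posterior.mean())          # posterior mean, ~0.714
        print(posterior.interval(0.95))  # 95% credible interval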

  14. Abstract interfaces for data analysis - component architecture for data analysis tools

    International Nuclear Information System (INIS)

    Barrand, G.; Binko, P.; Doenszelmann, M.; Pfeiffer, A.; Johnson, A.

    2001-01-01

    The fast turnover of software technologies, in particular in the domain of interactivity (covering user interface and visualisation), makes it difficult for a small group of people to produce complete and polished software tools before the underlying technologies make them obsolete. At the HepVis'99 workshop, a working group was formed to improve the production of software tools for data analysis in HENP. Besides promoting a distributed development organisation, one goal of the group is to systematically design a set of abstract interfaces using modern OO analysis and OO design techniques. An initial domain analysis has come up with several categories (components) found in typical data analysis tools: Histograms, Ntuples, Functions, Vectors, Fitter, Plotter, Analyzer and Controller. Special emphasis was put on reducing the couplings between the categories to a minimum, thus optimising re-use and maintainability of any component individually. The interfaces have been defined in Java and C++ and implementations exist in the form of libraries and tools using C++ (Anaphe/Lizard, OpenScientist) and Java (Java Analysis Studio). A special implementation aims at accessing the Java libraries (through their Abstract Interfaces) from C++. The authors give an overview of the architecture and design of the various components for data analysis as discussed in AIDA.
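
    The decoupling between the AIDA categories can be pictured with abstract interfaces. The real interfaces are defined in Java and C++; the Python sketch below only illustrates the pattern, with method names of our choosing.

        from abc import ABC, abstractmethod

        class IHistogram1D(ABC):
            @abstractmethod
            def fill(self, x: float, weight: float = 1.0) -> None: ...

        class IPlotter(ABC):
            @abstractmethod
            def plot(self, histogram: IHistogram1D) -> None: ...

        # A plotter depends only on the histogram interface, so any
        # implementation (in-memory, file-backed, remote) can be swapped in
        # without touching the plotting code.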

  15. Accounting and Financial Data Analysis Data Mining Tools

    Directory of Open Access Journals (Sweden)

    Diana Elena Codreanu

    2011-05-01

    Computerized accounting systems have seen an increase in complexity in recent years due to the competitive economic environment, but with the help of data analysis solutions such as OLAP and data mining, multidimensional data analysis can be performed, fraud can be detected, and knowledge hidden in the data can be discovered, ensuring that such information is useful for decision making within the organization. In the literature there are many definitions of data mining, but they all boil down to the same idea: a process that extracts new information from large data collections, information that would be very difficult to obtain without data mining tools. Information obtained through data mining has the advantage that it not only responds to the question of what is happening but at the same time argues and shows why certain things are happening. In this paper we wish to present advanced techniques for the analysis and exploitation of data stored in a multidimensional database.

  16. Panel data analysis using EViews

    CERN Document Server

    Agung, I Gusti Ngurah

    2013-01-01

    A comprehensive and accessible guide to panel data analysis using EViews software This book explores the use of EViews software in creating panel data analysis using appropriate empirical models and real datasets. Guidance is given on developing alternative descriptive statistical summaries for evaluation and providing policy analysis based on pool panel data. Various alternative models based on panel data are explored, including univariate general linear models, fixed effect models and causal models, and guidance on the advantages and disadvantages of each one is given. Panel Data Analysis

  17. Virtual data in CMS analysis

    International Nuclear Information System (INIS)

    Arbree, A.

    2003-01-01

    The use of virtual data for enhancing the collaboration between large groups of scientists is explored in several ways: by defining "virtual" parameter spaces which can be searched and shared in an organized way by a collaboration of scientists in the course of their analysis; by providing a mechanism to log the provenance of results and the ability to trace them back to the various stages in the analysis of real or simulated data; by creating "check points" in the course of an analysis to permit collaborators to explore their own analysis branches by refining selections, improving the signal to background ratio, varying the estimation of parameters, etc.; by facilitating the audit of an analysis and the reproduction of its results by a different group, or in a peer review context. We describe a prototype for the analysis of data from the CMS experiment based on the virtual data system Chimera and the object-oriented data analysis framework ROOT. The Chimera system is used to chain together several steps in the analysis process including the Monte Carlo generation of data, the simulation of detector response, the reconstruction of physics objects and their subsequent analysis, histogramming and visualization using the ROOT framework.

  18. Longitudinal categorical data analysis

    CERN Document Server

    Sutradhar, Brajendra C

    2014-01-01

    This is the first book in longitudinal categorical data analysis with parametric correlation models developed based on dynamic relationships among repeated categorical responses. This book is a natural generalization of the longitudinal binary data analysis to the multinomial data setup with more than two categories. Thus, unlike the existing books on cross-sectional categorical data analysis using log linear models, this book uses multinomial probability models both in cross-sectional and longitudinal setups. A theoretical foundation is provided for the analysis of univariate multinomial responses, by developing models systematically for the cases with no covariates as well as categorical covariates, both in cross-sectional and longitudinal setups. In the longitudinal setup, both stationary and non-stationary covariates are considered. These models have also been extended to the bivariate multinomial setup along with suitable covariates. For the inferences, the book uses the generalized quasi-likelihood as w...

  19. Data-base tools for enhanced analysis of TMX-U data

    International Nuclear Information System (INIS)

    Stewart, M.E.; Carter, M.R.; Casper, T.A.; Meyer, W.H.; Perkins, D.E.; Whitney, D.M.

    1986-01-01

    The authors use a commercial data-base software package to create several data-base products that enhance the ability of experimental physicists to analyze data from the TMX-U experiment. This software resides on a DEC-20 computer in M-Division's user service center (USC), where data can be analyzed separately from the main acquisition computers. When these data-base tools are combined with interactive data analysis programs, physicists can perform automated (batch-style) processing or interactive data analysis on the computers in the USC or on the supercomputers of the NMFECC, in addition to the normal processing done on the acquisition system. One data-base tool provides highly reduced data for searching and correlation analysis of several diagnostic signals for a single shot or many shots. A second data-base tool provides retrieval and storage of unreduced data for detailed analysis of one or more diagnostic signals. The authors report how these data-base tools form the core of an evolving off-line data-analysis environment on the USC computers.

  20. High-dimensional data analysis

    CERN Document Server

    Cai, Tony

    2010-01-01

    Over the last few years, significant developments have been taking place in high-dimensional data analysis, driven primarily by a wide range of applications in many fields such as genomics and signal processing. In particular, substantial advances have been made in the areas of feature selection, covariance estimation, classification and regression. This book intends to examine important issues arising from high-dimensional data analysis to explore key ideas for statistical inference and prediction. It is structured around topics on multiple hypothesis testing, feature selection, regression, cla...
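
    One of the listed topics, multiple hypothesis testing, admits a compact illustration. The sketch below is the standard Benjamini-Hochberg step-up procedure for FDR control, not code from the book.

        import numpy as np

        def benjamini_hochberg(pvals, alpha=0.05):
            """Return a boolean mask of hypotheses rejected at FDR level alpha."""
            p = np.asarray(pvals)
            m = len(p)
            order = np.argsort(p)
            below = p[order] <= alpha * np.arange(1, m + 1) / m
            rejected = np.zeros(m, dtype=bool)
            if below.any():
                k = np.nonzero(below)[0].max()   # largest i with p_(i) <= alpha*i/m
                rejected[order[:k + 1]] = True
            return rejected

        print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))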

  1. Exascale Data Analysis

    CERN Multimedia

    CERN. Geneva; Fitch, Blake

    2011-01-01

    Traditionally, the primary role of supercomputers was to create data, primarily for simulation applications. Due to usage and technology trends, supercomputers are increasingly also used for data analysis. Some of this data is from simulations, but there is also a rapidly increasing amount of real-world science and business data to be analyzed. We briefly overview Blue Gene and other current supercomputer architectures. We outline future architectures, up to the Exascale supercomputers expected in the 2020 time frame. We focus on the data analysis challenges and opportunities, especially those concerning Flash and other up-and-coming storage class memory. About the speakers: Blake G. Fitch has been with IBM Research, Yorktown Heights, NY since 1987, mainly pursuing interests in parallel systems. He joined the Scalable Parallel Systems Group in 1990, contributing to research and development that culminated in the IBM scalable parallel system (SP*) product. His research interests have focused on applicatio...

  2. Python data analysis

    CERN Document Server

    Idris, Ivan

    2014-01-01

    This book is for programmers, scientists, and engineers who have knowledge of the Python language and know the basics of data science. It is for those who wish to learn different data analysis methods using Python and its libraries. This book contains all the basic ingredients you need to become an expert data analyst.

  3. Virtual Data in CMS Analysis

    CERN Document Server

    Arbree, A; Bourilkov, D; Cavanaugh, R J; Graham, G; Rodríguez, J; Wilde, M; Zhao, Y

    2003-01-01

    The use of virtual data for enhancing the collaboration between large groups of scientists is explored in several ways: - by defining "virtual" parameter spaces which can be searched and shared in an organized way by a collaboration of scientists in the course of their analysis - by providing a mechanism to log the provenance of results and the ability to trace them back to the various stages in the analysis of real or simulated data - by creating "check points" in the course of an analysis to permit collaborators to explore their own analysis branches by refining selections, improving the signal to background ratio, varying the estimation of parameters, etc. - by facilitating the audit of an analysis and the reproduction of its results by a different group, or in a peer review context. We describe a prototype for the analysis of data from the CMS experiment based on the virtual data system Chimera and the object-oriented data analysis framework ROOT. The Chimera system is used to chain together several s...

  4. Bayesian nonparametric data analysis

    CERN Document Server

    Müller, Peter; Jara, Alejandro; Hanson, Tim

    2015-01-01

    This book reviews nonparametric Bayesian methods and models that have proven useful in the context of data analysis. Rather than providing an encyclopedic review of probability models, the book’s structure follows a data analysis perspective. As such, the chapters are organized by traditional data analysis problems. In selecting specific nonparametric models, simpler and more traditional models are favored over specialized ones. The discussed methods are illustrated with a wealth of examples, including applications ranging from stylized examples to case studies from recent literature. The book also includes an extensive discussion of computational methods and details on their implementation. R code for many examples is included in on-line software pages.

  5. The data analysis handbook

    CERN Document Server

    Frank, IE

    1994-01-01

    Analyzing observed or measured data is an important step in applied sciences. The recent increase in computer capacity has resulted in a revolution both in data collection and data analysis. An increasing number of scientists, researchers and students are venturing into statistical data analysis; hence the need for more guidance in this field, which was previously dominated mainly by statisticians. This handbook fills the gap in the range of textbooks on data analysis. Written in a dictionary format, it will serve as a comprehensive reference book in a rapidly growing field. However, this book is more structured than an ordinary dictionary, where each entry is a separate, self-contained entity. The authors provide not only definitions and short descriptions, but also offer an overview of the different topics. Therefore, the handbook can also be used as a companion to textbooks for undergraduate or graduate courses. 1700 entries are given in alphabetical order grouped into 20 topics and each topic is organized...

  6. Data near processing support for climate data analysis

    Science.gov (United States)

    Kindermann, Stephan; Ehbrecht, Carsten; Hempelmann, Nils

    2016-04-01

    Climate data repositories grow in size exponentially. Scalable data-near processing capabilities are required to meet future data analysis requirements and to replace current "download the data and process at home" workflows and approaches. On the one hand, these processing capabilities should be accessible via standardized interfaces (e.g. OGC WPS); on the other hand, a large variety of processing tools, toolboxes and deployment alternatives have to be supported and maintained at the data/processing center. We present a community approach of a modular and flexible system supporting the development, deployment and maintenance of OGC-WPS-based web processing services. This approach is organized in an open source GitHub project (called "bird-house") supporting individual processing services ("birds", e.g. climate index calculations, model data ensemble calculations), which rely on basic common infrastructural components (e.g. installation and deployment recipes, management of analysis code dependencies). To support easy deployment at data centers as well as at home institutes (e.g. for testing and development), the system supports the management of the often very complex package dependency chain of climate data analysis packages, as well as Docker-based packaging and installation. We present a concrete deployment scenario at the German Climate Computing Center (DKRZ). DKRZ hosts, on the one hand, a multi-petabyte climate archive which is integrated into the European ENES and worldwide ESGF data infrastructures, and, on the other hand, an HPC center supporting (model) data production and data analysis. The deployment scenario also includes OpenStack-based data cloud services to support data import and data distribution for bird-house-based WPS web processing services. Current challenges for inter-institutional deployments of web processing services supporting the European and international climate modeling community, as well as the climate impact community, are highlighted.
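
    Such WPS services can be invoked from Python with OWSLib; the sketch below assumes a reachable endpoint, and the URL and process identifier are hypothetical placeholders.

        from owslib.wps import WebProcessingService

        wps = WebProcessingService("https://example.org/wps")  # hypothetical endpoint
        for process in wps.processes:
            print(process.identifier, process.title)

        # Execute a (hypothetical) climate-index process on a remote NetCDF file
        execution = wps.execute("icclim_SU",
                                inputs=[("tas", "https://example.org/tas.nc")])
        print(execution.status)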

  7. Dynamic data analysis modeling data with differential equations

    CERN Document Server

    Ramsay, James

    2017-01-01

    This text focuses on the use of smoothing methods for developing and estimating differential equations following recent developments in functional data analysis and building on techniques described in Ramsay and Silverman (2005) Functional Data Analysis. The central concept of a dynamical system as a buffer that translates sudden changes in input into smooth controlled output responses has led to applications of previously analyzed data, opening up entirely new opportunities for dynamical systems. The technical level has been kept low so that those with little or no exposure to differential equations as modeling objects can be brought into this data analysis landscape. There are already many texts on the mathematical properties of ordinary differential equations, or dynamic models, and there is a large literature distributed over many fields on models for real world processes consisting of differential equations. However, a researcher interested in fitting such a model to data, or a statistician interested in...

  8. Abstract Interfaces for Data Analysis Component Architecture for Data Analysis Tools

    CERN Document Server

    Barrand, G; Dönszelmann, M; Johnson, A; Pfeiffer, A

    2001-01-01

    The fast turnover of software technologies, in particular in the domain of interactivity (covering user interface and visualisation), makes it difficult for a small group of people to produce complete and polished software-tools before the underlying technologies make them obsolete. At the HepVis '99 workshop, a working group has been formed to improve the production of software tools for data analysis in HENP. Beside promoting a distributed development organisation, one goal of the group is to systematically design a set of abstract interfaces based on using modern OO analysis and OO design techniques. An initial domain analysis has come up with several categories (components) found in typical data analysis tools: Histograms, Ntuples, Functions, Vectors, Fitter, Plotter, Analyzer and Controller. Special emphasis was put on reducing the couplings between the categories to a minimum, thus optimising re-use and maintainability of any component individually. The interfaces have been defined in Java and C++ and i...

  9. Beginning statistics with data analysis

    CERN Document Server

    Mosteller, Frederick; Rourke, Robert EK

    2013-01-01

    This introduction to the world of statistics covers exploratory data analysis, methods for collecting data, formal statistical inference, and techniques of regression and analysis of variance. 1983 edition.

  10. Workbook on data analysis

    International Nuclear Information System (INIS)

    Hopke, P.K.

    2000-01-01

    As a consequence of various IAEA programmes to sample airborne particulate matter and determine its elemental composition, the participating research groups are accumulating data on the composition of the atmospheric aerosol. It is necessary to consider ways in which these data can be utilized in order to be certain that the data obtained are correct and that the information then being transmitted to others who may make decisions based on such information is as representative and correct as possible. In order to both examine the validity of those data and extract appropriate information from them, it is necessary to utilize a variety of data analysis methods. The objective of this workbook is to provide a guide with examples of utilizing data analysis on airborne particle composition data using a spreadsheet program (EXCEL) and a personal computer based statistical package (StatGraphics)

  11. Workbook on data analysis

    Energy Technology Data Exchange (ETDEWEB)

    Hopke, P K [Department of Chemistry, Clarkson Univ., Potsdam, NY (United States)

    2000-07-01

    As a consequence of various IAEA programmes to sample airborne particulate matter and determine its elemental composition, the participating research groups are accumulating data on the composition of the atmospheric aerosol. It is necessary to consider ways in which these data can be utilized in order to be certain that the data obtained are correct and that the information then being transmitted to others who may make decisions based on such information is as representative and correct as possible. In order to both examine the validity of those data and extract appropriate information from them, it is necessary to utilize a variety of data analysis methods. The objective of this workbook is to provide a guide with examples of utilizing data analysis on airborne particle composition data using a spreadsheet program (EXCEL) and a personal computer based statistical package (StatGraphics)

  12. Qualitative data analysis: conceptual and practical considerations.

    Science.gov (United States)

    Liamputtong, Pranee

    2009-08-01

    Qualitative inquiry requires that collected data is organised in a meaningful way, and this is referred to as data analysis. Through analytic processes, researchers turn what can be voluminous data into understandable and insightful analysis. This paper sets out the different approaches that qualitative researchers can use to make sense of their data including thematic analysis, narrative analysis, discourse analysis and semiotic analysis and discusses the ways that qualitative researchers can analyse their data. I first discuss salient issues in performing qualitative data analysis, and then proceed to provide some suggestions on different methods of data analysis in qualitative research. Finally, I provide some discussion on the use of computer-assisted data analysis.

  13. SP mountain data analysis

    Science.gov (United States)

    Rawson, R. F.; Hamilton, R. E.; Liskow, C. L.; Dias, A. R.; Jackson, P. L.

    1981-01-01

    An analysis of synthetic aperture radar data of SP Mountain was undertaken to demonstrate the use of digital image processing techniques to aid in geologic interpretation of SAR data. These data were collected with the ERIM X- and L-band airborne SAR using like- and cross-polarizations. The resulting signal films were used to produce computer compatible tapes, from which four-channel imagery was generated. Slant range-to-ground range and range-azimuth-scale corrections were made in order to facilitate image registration; intensity corrections were also made. Manual interpretation of the imagery showed that L-band represented the geology of the area better than X-band. Several differences between the various images were also noted. Further digital analysis of the corrected data was done for enhancement purposes. This analysis included application of an MSS differencing routine and development of a routine for removal of relief displacement. It was found that accurate registration of the SAR channels is critical to the effectiveness of the differencing routine. Use of the relief displacement algorithm on the SP Mountain data demonstrated the feasibility of the technique.

  14. Expediting Scientific Data Analysis with Reorganization of Data

    Energy Technology Data Exchange (ETDEWEB)

    Byna, Surendra; Wu, Kesheng

    2013-08-19

    Data producers typically optimize the layout of data files to minimize the write time. In most cases, data analysis tasks read these files in access patterns different from the write patterns causing poor read performance. In this paper, we introduce Scientific Data Services (SDS), a framework for bridging the performance gap between writing and reading scientific data. SDS reorganizes data to match the read patterns of analysis tasks and enables transparent data reads from the reorganized data. We implemented a HDF5 Virtual Object Layer (VOL) plugin to redirect the HDF5 dataset read calls to the reorganized data. To demonstrate the effectiveness of SDS, we applied two parallel data organization techniques: a sort-based organization on a plasma physics data and a transpose-based organization on mass spectrometry imaging data. We also extended the HDF5 data access API to allow selection of data based on their values through a query interface, called SDS Query. We evaluated the execution time in accessing various subsets of data through existing HDF5 Read API and SDS Query. We showed that reading the reorganized data using SDS is up to 55X faster than reading the original data.
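
    The two core ideas, reorganizing data to match reads and selecting by value, can be illustrated with plain h5py. This is a conceptual sketch, not the SDS API; file and dataset names are invented.

        import h5py
        import numpy as np

        # Reorganization step: store a copy sorted on the queried attribute
        with h5py.File("particles.h5", "r") as f:
            energy = f["energy"][...]
        with h5py.File("particles_sorted.h5", "w") as f:
            f.create_dataset("energy", data=np.sort(energy))

        # Query step: on sorted data, a value-range selection becomes two
        # binary searches plus one contiguous read instead of a full scan
        with h5py.File("particles_sorted.h5", "r") as f:
            data = f["energy"][...]
            lo = np.searchsorted(data, 2.0, side="left")
            hi = np.searchsorted(data, 3.0, side="right")
            selected = data[lo:hi]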

  15. DATA ANALYSIS BY SQL-MAPREDUCE PLATFORM

    Directory of Open Access Journals (Sweden)

    A. A. A. Dergachev

    2014-01-01

    The paper deals with problems related to the usage of relational database management systems (RDBMS), mainly in the analysis of large data content, including data analysis based on web services in the Internet. A solution to these problems can be represented as a web-oriented distributed data analysis system with a service-request processor as its executive kernel. The functions of such a system are similar to those of a relational DBMS, only using web services. The service-request processor is responsible for planning the calls to data analysis web services and for their execution. The efficiency of such a web-oriented system depends on the efficiency of the web service call plan and its program implementation, where the basic element is the storage facility for the analyzed data: a relational DBMS. The main attention is given to extending the functionality of relational DBMS for the analysis of large data content, in particular, a prospective assessment of implementing data analysis web services on the basis of the SQL/MapReduce platform. To obtain this result, an analytical task typical of data analysis in various social networks and web portals, based on analysis of users' attendance data, was chosen as the application-oriented part. In the practical part of this research, the algorithm for planning web service calls was implemented for the application-oriented task. The efficiency of the SQL/MapReduce platform is confirmed by experimental results that show the opportunity for effective application of data analysis web services.
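
    The map/reduce style of aggregation that SQL/MapReduce builds on, reduced to its essentials in Python with a toy attendance log (user and page names invented):

        from collections import defaultdict

        log = [("alice", "/home"), ("bob", "/news"), ("alice", "/news"),
               ("alice", "/home"), ("bob", "/home")]

        # Map: emit (user, 1) for every page visit
        mapped = ((user, 1) for user, _page in log)

        # Shuffle + reduce: sum the emitted counts per key
        counts = defaultdict(int)
        for user, one in mapped:
            counts[user] += one
        print(dict(counts))  # {'alice': 3, 'bob': 2}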

  16. Learning Haskell data analysis

    CERN Document Server

    Church, James

    2015-01-01

    If you are a developer, analyst, or data scientist who wants to learn data analysis methods using Haskell and its libraries, then this book is for you. Prior experience with Haskell and a basic knowledge of data science will be beneficial.

  17. Analysis of neural data

    CERN Document Server

    Kass, Robert E; Brown, Emery N

    2014-01-01

    Continual improvements in data collection and processing have had a huge impact on brain research, producing data sets that are often large and complicated. By emphasizing a few fundamental principles, and a handful of ubiquitous techniques, Analysis of Neural Data provides a unified treatment of analytical methods that have become essential for contemporary researchers. Throughout the book ideas are illustrated with more than 100 examples drawn from the literature, ranging from electrophysiology, to neuroimaging, to behavior. By demonstrating the commonality among various statistical approaches the authors provide the crucial tools for gaining knowledge from diverse types of data. Aimed at experimentalists with only high-school level mathematics, as well as computationally-oriented neuroscientists who have limited familiarity with statistics, Analysis of Neural Data serves as both a self-contained introduction and a reference work.

  18. Computerized ECT data analysis system

    International Nuclear Information System (INIS)

    Miyake, Y.; Fukui, S.; Iwahashi, Y.; Matsumoto, M.; Koyama, K.

    1988-01-01

    For the analytical method of the eddy current testing (ECT) of steam generator tubes in nuclear power plants, the authors have developed a computerized ECT data analysis system using a large-scale computer with a high-resolution color graphic display. This system can store acquired ECT data for up to 15 steam generators, and ECT data can be analyzed immediately on the monitor in dialogue with the computer. Analyzed results of ECT data are stored and registered in the database. This system enables an analyst to sort and collect data under various conditions, obtain the results automatically, and also make a plan for tube repair work. The system has completed its test run and has been used for data analysis at the annual inspection of domestic plants. This paper describes an outline, features and examples of the computerized eddy current data analysis system for steam generator tubes in PWR nuclear power plants.

  19. Excel data analysis for dummies

    CERN Document Server

    Nelson, Stephen L

    2014-01-01

    Harness the power of Excel to discover what your numbers are hiding Excel Data Analysis For Dummies, 2nd Edition is the ultimate guide to getting the most out of your data. Veteran Dummies author Stephen L. Nelson guides you through the basic and not-so-basic features of Excel to help you discover the gems hidden in your rough data. From input, to analysis, to visualization, the book walks you through the steps that lead to superior data analysis. Excel is the number-one spreadsheet application, with ever-expanding capabilities. If you're only using it to balance the books, you're missing out

  20. On Survey Data Analysis in Corporate Finance

    OpenAIRE

    Serita, Toshio

    2008-01-01

    Recently, survey data analysis has emerged as a new method for testing hypotheses and for clarifying the relative importance of different factors in corporate finance decisions. This paper investigates the advantages and drawbacks of survey data analysis, methodology of survey data analysis such as questionnaire design, and analytical methods for survey data, in comparison with traditional large sample analysis. We show that survey data analysis does not replace traditional large sample analysi...

  1. Reporting Data with "Over-the-Counter" Data Analysis Supports Increases Educators' Analysis Accuracy

    Science.gov (United States)

    Rankin, Jenny Grant

    2013-01-01

    There is extensive research on the benefits of making data-informed decisions to improve learning, but these benefits rely on the data being effectively interpreted. Despite educators' above-average intellect and education levels, there is evidence many educators routinely misinterpret student data. Data analysis problems persist even at districts…

  2. Collective Analysis of Qualitative Data

    DEFF Research Database (Denmark)

    Simonsen, Jesper; Friberg, Karin

    2014-01-01

    What. Many students and practitioners do not know how to systematically process qualitative data once it is gathered—at least not as a collective effort. This chapter presents two workshop techniques, affinity diagramming and diagnostic mapping, that support collective analysis of large amounts of qualitative data. Affinity diagramming is used to make collective analysis and interpretations of qualitative data to identify core problems that need to be addressed in the design process. Diagnostic mapping supports collective interpretation and description of these problems and how to intervene in them. … In particular, collective analysis can be used to identify, understand, and act on complex design problems that emerge, for example, after the introduction of new technologies. Such problems might be hard to clarify, and the basis for the analysis often involves large amounts of unstructured qualitative data...

  3. Statistical analysis of medical data using SAS

    CERN Document Server

    Der, Geoff

    2005-01-01

    Contents: An Introduction to SAS; Describing and Summarizing Data; Basic Inference; Scatterplots, Correlation: Simple Regression and Smoothing; Analysis of Variance and Covariance; Multiple Regression; Logistic Regression; The Generalized Linear Model; Generalized Additive Models; Nonlinear Regression Models; The Analysis of Longitudinal Data I; The Analysis of Longitudinal Data II: Models for Normal Response Variables; The Analysis of Longitudinal Data III: Non-Normal Response; Survival Analysis; Analysis of Multivariate Data: Principal Components and Cluster Analysis; References.

  4. Mobile networks for biometric data analysis

    CERN Document Server

    Madrid, Natividad; Seepold, Ralf; Orcioni, Simone

    2016-01-01

    This book showcases new and innovative approaches to biometric data capture and analysis, focusing especially on those that are characterized by non-intrusiveness, reliable prediction algorithms, and high user acceptance. It comprises the peer-reviewed papers from the international workshop on the subject that was held in Ancona, Italy, in October 2014 and featured sessions on ICT for health care, biometric data in automotive and home applications, embedded systems for biometric data analysis, biometric data analysis: EMG and ECG, and ICT for gait analysis. The background to the book is the challenge posed by the prevention and treatment of common, widespread chronic diseases in modern, aging societies. Capture of biometric data is a cornerstone for any analysis and treatment strategy. The latest advances in sensor technology allow accurate data measurement in a non-intrusive way, and in many cases it is necessary to provide online monitoring and real-time data capturing to support a patient’s prevention pl...

  5. Data Analysis in Experimental Biomedical Research

    DEFF Research Database (Denmark)

    Markovich, Dmitriy

    This thesis covers two non-related topics in experimental biomedical research: data analysis in thrombin generation experiments (collaboration with Novo Nordisk A/S), and analysis of images and physiological signals in the context of neurovascular signalling and blood flow regulation in the brain. … to critically assess and compare obtained results. We reverse engineered the data analysis performed by CAT, a de facto standard assay in the field. This revealed a number of possibilities to improve its methods of data analysis. We found that experimental calibration data is described well with textbook

  6. Open Data and Data Analysis Preservation Services for LHC Experiments

    CERN Document Server

    Cowton, J; Fokianos, P; Rueda, L; Herterich, P; Kunčar, J; Šimko, T; Smith, T

    2015-01-01

    In this paper we present newly launched services for open data and for long-term preservation and reuse of high-energy-physics data analyses, based on the digital library software Invenio. We track the "data continuum" practices through several progressive data analysis phases up to the final publication. The aim is to capture for subsequent generations all digital assets and associated knowledge inherent in the data analysis process, and to make a subset available rapidly to the public. The ultimate goal of the analysis preservation platform is to capture enough information about the processing steps to facilitate reproduction of an analysis even many years after its initial publication, permitting the impact of preserved analyses to be extended through future revalidation and recasting services. A related "open data" service was launched for the benefit of the general public.

  7. Functional and shape data analysis

    CERN Document Server

    Srivastava, Anuj

    2016-01-01

    This textbook for courses on functional data analysis and shape data analysis describes how to define, compare, and mathematically represent shapes, with a focus on statistical modeling and inference. It is aimed at graduate students in statistics, engineering, applied mathematics, neuroscience, biology, bioinformatics, and other related areas. The interdisciplinary nature of the broad range of ideas covered—from introductory theory to algorithmic implementations and some statistical case studies—is meant to familiarize graduate students with an array of tools that are relevant in developing computational solutions for shape and related analyses. These tools, gleaned from geometry, algebra, statistics, and computational science, are traditionally scattered across different courses, departments, and disciplines; Functional and Shape Data Analysis offers a unified, comprehensive solution by integrating the registration problem into shape analysis, better preparing graduate students for handling fu...

  8. The Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT): Data Analysis and Visualization for Geoscience Data

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Dean [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Doutriaux, Charles [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Patchett, John [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Williams, Sean [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Shipman, Galen [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Miller, Ross [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Steed, Chad [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Krishnan, Harinarayan [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Silva, Claudio [NYU Polytechnic School of Engineering, New York, NY (United States); Chaudhary, Aashish [Kitware, Inc., Clifton Park, NY (United States); Bremer, Peer-Timo [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Pugmire, David [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Bethel, E. Wes [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Childs, Hank [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Prabhat, Mr. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Geveci, Berk [Kitware, Inc., Clifton Park, NY (United States); Bauer, Andrew [Kitware, Inc., Clifton Park, NY (United States); Pletzer, Alexander [Tech-X Corp., Boulder, CO (United States); Poco, Jorge [NYU Polytechnic School of Engineering, New York, NY (United States); Ellqvist, Tommy [NYU Polytechnic School of Engineering, New York, NY (United States); Santos, Emanuele [Federal Univ. of Ceara, Fortaleza (Brazil); Potter, Gerald [NASA Johnson Space Center, Houston, TX (United States); Smith, Brian [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Maxwell, Thomas [NASA Johnson Space Center, Houston, TX (United States); Kindig, David [Tech-X Corp., Boulder, CO (United States); Koop, David [NYU Polytechnic School of Engineering, New York, NY (United States)

    2013-05-01

    To support interactive visualization and analysis of complex, large-scale climate data sets, UV-CDAT integrates a powerful set of scientific computing libraries and applications to foster more efficient knowledge discovery. Connected through a provenance framework, the UV-CDAT components can be loosely coupled for fast integration or tightly coupled for greater functionality and communication with other components. This framework addresses many challenges in the interactive visual analysis of distributed large-scale data for the climate community.

  9. Exploring functional data analysis and wavelet principal component analysis on ecstasy (MDMA) wastewater data

    Directory of Open Access Journals (Sweden)

    Stefania Salvatore

    2016-07-01

    Background: Wastewater-based epidemiology (WBE) is a novel approach in drug use epidemiology which aims to monitor the extent of use of various drugs in a community. In this study, we investigate functional principal component analysis (FPCA) as a tool for analysing WBE data and compare it to traditional principal component analysis (PCA) and to wavelet principal component analysis (WPCA), which is more flexible temporally. Methods: We analysed temporal wastewater data from 42 European cities collected daily over one week in March 2013. The main temporal features of ecstasy (MDMA) were extracted using FPCA with both Fourier and B-spline basis functions and three different smoothing parameters, along with PCA and WPCA with different mother wavelets and shrinkage rules. The stability of FPCA was explored through bootstrapping and analysis of sensitivity to missing data. Results: The first three principal components (PCs), functional principal components (FPCs) and wavelet principal components (WPCs) explained 87.5-99.6% of the temporal variation between cities, depending on the choice of basis and smoothing. The extracted temporal features from PCA, FPCA and WPCA were consistent. FPCA using a Fourier basis and common-optimal smoothing was the most stable and least sensitive to missing data. Conclusion: FPCA is a flexible and analytically tractable method for analysing temporal changes in wastewater data, and is robust to missing data. WPCA did not reveal any rapid temporal changes in the data not captured by FPCA. Overall the results suggest FPCA with Fourier basis functions and a common-optimal smoothing parameter as the most accurate approach when analysing WBE data.
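
    A simplified sketch of the FPCA idea used in the study: project each city's one-week curve onto a small Fourier basis (the smoothing step), then run PCA on the basis coefficients. The data below are invented; the actual analysis used dedicated FPCA tooling.

        import numpy as np

        rng = np.random.default_rng(1)
        days = np.arange(7)
        cities = rng.lognormal(0.0, 0.5, (42, 7))  # toy loads: 42 cities x 7 days

        # Small Fourier basis on one week: constant, one sine, one cosine
        basis = np.column_stack([np.ones(7),
                                 np.sin(2 * np.pi * days / 7),
                                 np.cos(2 * np.pi * days / 7)])
        coef, *_ = np.linalg.lstsq(basis, cities.T, rcond=None)  # smoothing step

        # PCA on the centered coefficients yields the functional components
        c = coef.T - coef.T.mean(axis=0)
        _, s, vt = np.linalg.svd(c, full_matrices=False)
        fpcs = basis @ vt.T              # functional PCs evaluated on the 7 days
        explained = s**2 / np.sum(s**2)
        print(explained)                 # share of variation per component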

  10. Computer-assisted qualitative data analysis software.

    Science.gov (United States)

    Cope, Diane G

    2014-05-01

    Advances in technology have provided new approaches for data collection methods and analysis for researchers. Data collection is no longer limited to paper-and-pencil format, and numerous methods are now available through Internet and electronic resources. With these techniques, researchers are not burdened with entering data manually, and data analysis is facilitated by software programs. Quantitative research is supported by the use of computer software and provides ease in the management of large data sets and rapid analysis of numeric statistical methods. New technologies are emerging to support qualitative research with the availability of computer-assisted qualitative data analysis software (CAQDAS). CAQDAS will be presented with a discussion of advantages, limitations, controversial issues, and recommendations for this type of software use.

  11. AGR-1 Thermocouple Data Analysis

    International Nuclear Information System (INIS)

    Einerson, Jeff

    2012-01-01

    This report documents an effort to analyze measured and simulated data obtained in the Advanced Gas Reactor (AGR) fuel irradiation test program conducted in the INL's Advanced Test Reactor (ATR) to support the Next Generation Nuclear Plant (NGNP) R and D program. The work follows up on a previous study (Pham and Einerson, 2010), in which statistical analysis methods were applied for AGR-1 thermocouple data qualification. The present work exercises the idea that, while recognizing uncertainties inherent in physics and thermal simulations of the AGR-1 test, results of the numerical simulations can be used in combination with the statistical analysis methods to further improve qualification of measured data. Additionally, the combined analysis of measured and simulation data can generate insights about simulation model uncertainty that can be useful for model improvement. This report also describes an experimental control procedure to maintain fuel target temperature in the future AGR tests using regression relationships that include simulation results. The report is organized into four chapters. Chapter 1 introduces the AGR Fuel Development and Qualification program, AGR-1 test configuration and test procedure, overview of AGR-1 measured data, and overview of physics and thermal simulation, including modeling assumptions and uncertainties. A brief summary of statistical analysis methods developed in (Pham and Einerson 2010) for AGR-1 measured data qualification within NGNP Data Management and Analysis System (NDMAS) is also included for completeness. Chapters 2-3 describe and discuss cases, in which the combined use of experimental and simulation data is realized. A set of issues associated with measurement and modeling uncertainties resulted from the combined analysis are identified. This includes demonstration that such a combined analysis led to important insights for reducing uncertainty in presentation of AGR-1 measured data (Chapter 2) and interpretation of

  12. AGR-1 Thermocouple Data Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Jeff Einerson

    2012-05-01

    This report documents an effort to analyze measured and simulated data obtained in the Advanced Gas Reactor (AGR) fuel irradiation test program conducted in the INL's Advanced Test Reactor (ATR) to support the Next Generation Nuclear Plant (NGNP) R&D program. The work follows up on a previous study (Pham and Einerson, 2010), in which statistical analysis methods were applied for AGR-1 thermocouple data qualification. The present work exercises the idea that, while recognizing uncertainties inherent in physics and thermal simulations of the AGR-1 test, results of the numerical simulations can be used in combination with the statistical analysis methods to further improve qualification of measured data. Additionally, the combined analysis of measured and simulation data can generate insights about simulation model uncertainty that can be useful for model improvement. This report also describes an experimental control procedure to maintain fuel target temperature in the future AGR tests using regression relationships that include simulation results. The report is organized into four chapters. Chapter 1 introduces the AGR Fuel Development and Qualification program, AGR-1 test configuration and test procedure, overview of AGR-1 measured data, and overview of physics and thermal simulation, including modeling assumptions and uncertainties. A brief summary of statistical analysis methods developed in (Pham and Einerson 2010) for AGR-1 measured data qualification within NGNP Data Management and Analysis System (NDMAS) is also included for completeness. Chapters 2-3 describe and discuss cases, in which the combined use of experimental and simulation data is realized. A set of issues associated with measurement and modeling uncertainties resulted from the combined analysis are identified. This includes demonstration that such a combined analysis led to important insights for reducing uncertainty in presentation of AGR-1 measured data (Chapter 2) and interpretation of

  13. Integrative Analysis of Omics Big Data.

    Science.gov (United States)

    Yu, Xiang-Tian; Zeng, Tao

    2018-01-01

    The diversity and sheer volume of omics data have taken biology and biomedicine research and application into a big data era, much as happened across human society a decade ago. They open a new challenge, from horizontal data ensembles (e.g., similar types of data collected by different labs or companies) to vertical data ensembles (e.g., different types of data collected for a group of persons with matched information), which requires integrative analysis in biology and biomedicine and calls for the urgent development of data integration to address the shift from population-guided to individual-guided investigations. Data integration is an effective concept for solving complex problems and understanding complicated systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective, reproducible way. Current integration approaches for biological data have two modes: one is the "bottom-up integration" mode with follow-up manual integration, and the other is the "top-down integration" mode with follow-up in silico integration. This paper first summarizes the combinatory analysis approaches to give a candidate protocol for biological experiment design for effective integrative studies on genomics, and then surveys the data fusion approaches to give helpful instruction on computational model development for detecting biological significance; these approaches have also provided new data resources and analysis tools to support precision medicine, which depends on big biomedical data. Finally, the problems and future directions are highlighted for integrative analysis of omics big data.

  14. Data management, archiving, visualization and analysis of space physics data

    Science.gov (United States)

    Russell, C. T.

    1995-01-01

    A series of programs for the visualization and analysis of space physics data has been developed at UCLA. In the course of those developments, a number of lessons have been learned regarding data management and data archiving, as well as data analysis. The issues now facing those wishing to develop such software, as well as the lessons learned, are reviewed. Modern media have eased many of the earlier problems of the physical volume required to store data, the speed of access, and the permanence of the records. However, the ultimate longevity of these media is still a question of debate. Finally, while software development has become easier, cost is still a limiting factor in developing visualization and analysis software.

  15. Data Analysis and Data Mining: Current Issues in Biomedical Informatics

    Science.gov (United States)

    Bellazzi, Riccardo; Diomidous, Marianna; Sarkar, Indra Neil; Takabayashi, Katsuhiko; Ziegler, Andreas; McCray, Alexa T.

    2011-01-01

    Summary Background Medicine and biomedical sciences have become data-intensive fields, which, at the same time, enable the application of data-driven approaches and require sophisticated data analysis and data mining methods. Biomedical informatics provides a proper interdisciplinary context to integrate data and knowledge when processing available information, with the aim of giving effective decision-making support in clinics and translational research. Objectives To reflect on different perspectives related to the role of data analysis and data mining in biomedical informatics. Methods On the occasion of the 50th year of Methods of Information in Medicine a symposium was organized that reflected on opportunities, challenges and priorities of organizing, representing and analysing data, information and knowledge in biomedicine and health care. The contributions of experts with a variety of backgrounds in the area of biomedical data analysis have been collected as one outcome of this symposium, in order to provide a broad, though coherent, overview of some of the most interesting aspects of the field. Results The paper presents sections on data accumulation and data-driven approaches in medical informatics, data and knowledge integration, statistical issues for the evaluation of data mining models, translational bioinformatics and bioinformatics aspects of genetic epidemiology. Conclusions Biomedical informatics represents a natural framework to properly and effectively apply data analysis and data mining methods in a decision-making context. In the future, it will be necessary to preserve the inclusive nature of the field and to foster an increasing sharing of data and methods between researchers. PMID:22146916

  16. Exact analysis of discrete data

    CERN Document Server

    Hirji, Karim F

    2005-01-01

    Researchers in fields ranging from biology and medicine to the social sciences, law, and economics regularly encounter variables that are discrete or categorical in nature. While there is no dearth of books on the analysis and interpretation of such data, these generally focus on large sample methods. When sample sizes are not large or the data are otherwise sparse, exact methods--methods not based on asymptotic theory--are more accurate and therefore preferable. This book introduces the statistical theory, analysis methods, and computation techniques for exact analysis of discrete data. After reviewing the relevant discrete distributions, the author develops the exact methods from the ground up in a conceptually integrated manner. The topics covered range from univariate discrete data analysis, a single and several 2 x 2 tables, a single and several 2 x K tables, incidence density and inverse sampling designs, unmatched and matched case-control studies, paired binary and trinomial response models, and Markov...
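
    As a concrete taste of the exact methods the book develops, Fisher's exact test for a single 2 x 2 table avoids asymptotic approximations entirely. A minimal sketch with invented counts (generic scipy usage, not code from the book):

        # Exact (non-asymptotic) analysis of a single 2 x 2 table with
        # Fisher's exact test; the counts are invented for illustration.
        from scipy.stats import fisher_exact

        table = [[8, 2],   # exposed:   8 events, 2 non-events
                 [1, 9]]   # unexposed: 1 event,  9 non-events
        odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
        print(odds_ratio, p_value)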

  17. 2nd European Conference on Data Analysis

    CERN Document Server

    Wilhelm, Adalbert FX

    2016-01-01

    This book offers a snapshot of the state-of-the-art in classification at the interface between statistics, computer science and application fields. The contributions span a broad spectrum, from theoretical developments to practical applications; they all share a strong computational component. The topics addressed are from the following fields: Statistics and Data Analysis; Machine Learning and Knowledge Discovery; Data Analysis in Marketing; Data Analysis in Finance and Economics; Data Analysis in Medicine and the Life Sciences; Data Analysis in the Social, Behavioural, and Health Care Sciences; Data Analysis in Interdisciplinary Domains; Classification and Subject Indexing in Library and Information Science. The book presents selected papers from the Second European Conference on Data Analysis, held at Jacobs University Bremen in July 2014. This conference unites diverse researchers in the pursuit of a common topic, creating truly unique synergies in the process.

  18. Grain-A Java data analysis system for Total Data Readout

    International Nuclear Information System (INIS)

    Rahkila, P.

    2008-01-01

    Grain is a data analysis system developed to be used with the novel Total Data Readout data acquisition system. In Total Data Readout all the electronics channels are read out asynchronously in singles mode and each data item is timestamped. Event building and analysis has to be done entirely in the software post-processing the data stream. A flexible and efficient event parser and the accompanying software system have been written entirely in Java. The design and implementation of the software are discussed along with experiences gained in running real-life experiments
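
    Grain itself is written in Java, but the core Total Data Readout idea, grouping timestamped singles data into events entirely in software, can be sketched generically; the coincidence window and hit data below are invented:

        # Generic sketch of software event building from timestamped singles
        # data, as in Total Data Readout; times are in arbitrary ticks.
        def build_events(items, window):
            """Group (timestamp, channel) items into events: a new event
            starts whenever the gap to the previous item exceeds `window`."""
            events, current = [], []
            for t, ch in sorted(items):
                if current and t - current[-1][0] > window:
                    events.append(current)
                    current = []
                current.append((t, ch))
            if current:
                events.append(current)
            return events

        hits = [(3, "ge1"), (1, "ge0"), (120, "si2"), (4, "si0"), (125, "ge1")]
        print(build_events(hits, window=10))
        # -> [[(1, 'ge0'), (3, 'ge1'), (4, 'si0')], [(120, 'si2'), (125, 'ge1')]]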

  19. Chapter 8. Data Analysis

    Science.gov (United States)

    Lyman L. McDonald; Christina D. Vojta; Kevin S. McKelvey

    2013-01-01

    Perhaps the greatest barrier between monitoring and management is data analysis. Data languish in drawers and spreadsheets because those who collect or maintain monitoring data lack training in how to effectively summarize and analyze their findings. This chapter serves as a first step to surmounting that barrier by empowering any monitoring team with the basic...

  20. Project MOHAVE data analysis plan

    International Nuclear Information System (INIS)

    Watson, J.G.; Green, M.; Hoffer, T.E.; Lawson, D.R.; Pitchford, M.; Eatough, D.J.; Farber, R.J.; Malm, W.C.; McDade, C.E.

    1993-01-01

    Project MOHAVE is intended to develop ambient and source emissions data for use with source models, receptor models, and data analysis methods in order to explain the nature and causes of visibility degradation in the Grand Canyon. Approximately 50% of the modeling and data analysis effort will be directed toward understanding the contributions from the Mohave Power Project to haze in the Grand Canyon and other nearby Class I areas; the remaining resources will be used to understand the contribution from other sources. The major goals of Project MOHAVE modeling and data analysis are: to evaluate the measurements for applicability to modeling and data analysis activities; to describe the visibility, air quality and meteorology during the field study period and to determine the degree to which these measurements represent typical visibility events at the Grand Canyon; to further develop conceptual models of physical and chemical processes which affect visibility impairment at the Grand Canyon; to estimate the contributions from different emission sources to visibility impairment at the Grand Canyon, and to quantitatively evaluate the uncertainties of those estimates; and to reconcile different scientific interpretations of the same data and to present this reconciliation to decision-makers. Several different approaches will be applied. Each approach will involve explicit examination of measurement uncertainties, compliance with implicit and explicit assumptions, and representativeness of the measurements. Scientific disagreements will be sought, expressed, explained, quantified, and presented. Data which can be used to verify methods will be withheld for independent evaluation of the validity of those methods. All assumptions will be stated and evaluated against reality. Data analysis results not supporting hypotheses will be presented alongside those results supporting the hypotheses. Uncertainty statements will be quantitative and consistent with decision-making needs

  1. Tornado detection data reduction and analysis

    Science.gov (United States)

    Davisson, L. D.

    1977-01-01

    Data processing and analysis was provided in support of tornado detection by analysis of radio frequency interference in various frequency bands. Sea state determination data from short pulse radar measurements were also processed and analyzed. A backscatter simulation was implemented to predict radar performance as a function of wind velocity. Computer programs were developed for the various data processing and analysis goals of the effort.

  2. Emotion Analysis on Social Big Data

    Institute of Scientific and Technical Information of China (English)

    REN Fuji; Kazuyuki Matsumoto

    2017-01-01

    In this paper, we describe a method of emotion analysis on social big data. Social big data means text data emerging on Internet social networking services. We collect multilingual web corpora and annotate them with emotion tags for the purpose of emotion analysis. Because these data are constructed by manual annotation, their quality is high but their quantity is low. If we create an emotion analysis model based on this high-quality corpus and use the model for the analysis of social big data, we might be able to statistically analyze the emotional senses and behavior of people in Internet communications, which we could not know before. In this paper, we create an emotion analysis model that integrates the high-quality emotion corpus and the automatically constructed corpus that we created in our past studies, and then analyze a large-scale corpus consisting of Twitter tweets based on the model. Through the results of time-series analysis on the large-scale corpus and the results of model evaluation, we show the effectiveness of our proposed method.
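
    A toy stand-in for the pipeline described above: train a text-emotion classifier on a small annotated corpus, then label unannotated tweets. The corpus, labels and model choice are all invented; the paper's actual model differs:

        # Toy stand-in for corpus-based emotion analysis: learn from a tiny
        # hand-annotated corpus, then label new "social big data" text.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        corpus = ["what a wonderful day", "this is terrible news",
                  "I love this place", "I am so angry right now"]
        labels = ["joy", "sadness", "joy", "anger"]

        model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        model.fit(corpus, labels)
        print(model.predict(["awful weather today", "great game last night"]))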

  3. Selected topics on data analysis in astronomy

    International Nuclear Information System (INIS)

    Scarsi, L.

    1987-01-01

    The contents of this book are: General Lectures Given at the Erice II Workshop on Data Analysis in Astronomy: Fundamentals in Data Analysis in Astronomy; Computational Techniques; Evolution of Architectures for Data Processing; Hardware for Graphics and Image Display; and Data Analysis Systems

  4. Roadside video data analysis deep learning

    CERN Document Server

    Verma, Brijesh; Stockwell, David

    2017-01-01

    This book highlights the methods and applications for roadside video data analysis, with a particular focus on the use of deep learning to solve roadside video data segmentation and classification problems. It describes system architectures and methodologies that are specifically built upon learning concepts for roadside video data processing, and offers a detailed analysis of the segmentation, feature extraction and classification processes. Lastly, it demonstrates the applications of roadside video data analysis including scene labelling, roadside vegetation classification and vegetation biomass estimation in fire risk assessment.

  5. NVivo 8 and consistency in data analysis: reflecting on the use of a qualitative data analysis program.

    Science.gov (United States)

    Bergin, Michael

    2011-01-01

    Qualitative data analysis is a complex process and demands clear thinking on the part of the analyst. However, a number of deficiencies may obstruct the research analyst during the process, leading to inconsistencies. This paper is a reflection on the use of a qualitative data analysis program, NVivo 8, and its usefulness in identifying consistency and inconsistency during the coding process. The author was conducting a large-scale study of providers and users of mental health services in Ireland. He used NVivo 8 to store, code and analyse the data, and this paper reflects some of his observations during the study. The demands placed on the analyst in trying to balance the mechanics of working through a qualitative data analysis program, while simultaneously remaining conscious of the value of all sources, are highlighted. NVivo 8 as a qualitative data analysis program is a challenging but valuable means for advancing the robustness of qualitative research. Pitfalls can be avoided during analysis by running queries as the analyst progresses from tree node to tree node, rather than leaving it to a stage at which data analysis is well advanced.

  6. Analysis of irradiation disordering data

    Energy Technology Data Exchange (ETDEWEB)

    Schwartz, D L [Jet Propulsion Lab., Pasadena, CA (USA); Schwartz, D M

    1978-08-01

    The analysis of irradiation disordering data in ordered Ni₃Mn is discussed. An analytical expression relating observed irradiation-induced magnetic changes in this material to the number of alternating-site <110> replacements is derived. This expression is then employed to analyze previous experimental results. This analysis gives results which appear to be consistent with a previous Monte Carlo data analysis and indicates that the expected number of alternating-site <110> replacements is 66.4 per 450 eV recoil.

  7. A Disciplined Architectural Approach to Scaling Data Analysis for Massive, Scientific Data

    Science.gov (United States)

    Crichton, D. J.; Braverman, A. J.; Cinquini, L.; Turmon, M.; Lee, H.; Law, E.

    2014-12-01

    Data collections across remote sensing and ground-based instruments in astronomy, Earth science, and planetary science are outpacing scientists' ability to analyze them. Furthermore, the distribution, structure, and heterogeneity of the measurements themselves pose challenges that limit the scalability of data analysis using traditional approaches. Methods for developing science data processing pipelines, distributing scientific datasets, and performing analysis will require innovative approaches that integrate cyber-infrastructure, algorithms, and data into more systematic approaches that can more efficiently compute and reduce data, particularly distributed data. This requires the integration of computer science, machine learning, statistics and domain expertise to identify scalable architectures for data analysis. The size of data returned from Earth science observing satellites and the magnitude of data from climate model output is predicted to grow into the tens of petabytes, challenging current data analysis paradigms. This same kind of growth is present in astronomy and planetary science data. One of the major challenges in data science and related disciplines is defining new approaches to scaling systems and analysis in order to increase scientific productivity and yield. Specific needs include: 1) identification of optimized system architectures for analyzing massive, distributed data sets; 2) algorithms for systematic analysis of massive data sets in distributed environments; and 3) the development of software infrastructures that are capable of performing massive, distributed data analysis across a comprehensive data science framework. NASA/JPL has begun an initiative in data science to address these challenges. Our goal is to evaluate how scientific productivity can be improved through optimized architectural topologies that identify how to deploy and manage the access, distribution, computation, and reduction of massive, distributed data, while

  8. The ACIGA data analysis programme

    International Nuclear Information System (INIS)

    Scott, Susan M; Searle, Antony C; Cusack, Benedict J; McClelland, David E

    2004-01-01

    The data analysis programme of the Australian Consortium for Interferometric Gravitational Astronomy (ACIGA) was set up in 1998 by Scott to complement the then existing ACIGA programmes working on suspension systems, lasers and optics and detector configurations. The ACIGA data analysis programme continues to contribute significantly in the field; we present an overview of our activities

  9. Scientific data analysis on data-parallel platforms.

    Energy Technology Data Exchange (ETDEWEB)

    Ulmer, Craig D.; Bayer, Gregory W.; Choe, Yung Ryn; Roe, Diana C.

    2010-09-01

    As scientific computing users migrate to petaflop platforms that promise to generate multi-terabyte datasets, there is a growing need in the community to be able to embed sophisticated analysis algorithms in the computing platforms' storage systems. Data Warehouse Appliances (DWAs) are attractive for this work, due to their ability to store and process massive datasets efficiently. While DWAs have been utilized effectively in data-mining and informatics applications, they remain largely unproven in scientific workloads. In this paper we present our experiences in adapting two mesh analysis algorithms to function on five different DWA architectures: two Netezza database appliances, an XtremeData dbX database, a LexisNexis DAS, and multiple Hadoop MapReduce clusters. The main contribution of this work is insight into the differences between these DWAs from a user's perspective. In addition, we present performance measurements for ten DWA systems to help understand the impact of different architectural trade-offs in these systems.

  10. Analysis of successive data sets

    NARCIS (Netherlands)

    Spreeuwers, Lieuwe Jan; Breeuwer, Marcel; Haselhoff, Eltjo Hans

    2008-01-01

    The invention relates to the analysis of successive data sets. A local intensity variation is formed from such successive data sets, that is, from data values in successive data sets at corresponding positions in each of the data sets. A region of interest is localized in the individual data sets on

  11. Analysis of successive data sets

    NARCIS (Netherlands)

    Spreeuwers, Lieuwe Jan; Breeuwer, Marcel; Haselhoff, Eltjo Hans

    2002-01-01

    The invention relates to the analysis of successive data sets. A local intensity variation is formed from such successive data sets, that is, from data values in successive data sets at corresponding positions in each of the data sets. A region of interest is localized in the individual data sets on
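
    The local intensity variation described in these patent records can be sketched with plain arrays: stack the successive data sets and compute a per-position spread, then threshold it to localize a region of interest. Shapes, data and the threshold rule below are invented:

        # Stack successive data sets and compute the per-position intensity
        # variation across them; high-variation positions hint at a region
        # of interest. Data are random stand-ins.
        import numpy as np

        rng = np.random.default_rng(0)
        datasets = rng.normal(size=(5, 64, 64))        # 5 successive 64x64 data sets
        datasets[:, 30:34, 30:34] += rng.normal(scale=3.0, size=(5, 4, 4))

        variation = datasets.std(axis=0)               # local variation over the series
        roi = variation > variation.mean() + 2 * variation.std()
        print(np.argwhere(roi))                        # candidate region-of-interest pixels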

  12. Analysis of Hydrologic Properties Data

    International Nuclear Information System (INIS)

    Liu, H.H.; Ahlers, C.F.

    2001-01-01

    The purpose of this Analysis/Model Report (AMR) is to describe the methods used to determine hydrologic properties based on the available field data from the unsaturated zone at Yucca Mountain, Nevada. This is in accordance with the AMR Development Plan (DP) for U0090 Analysis of Hydrologic Properties Data (CRWMS M&O 1999c). Fracture and matrix properties are developed by compiling and analyzing available survey data from the Exploratory Studies Facility (ESF), Cross Drift of Enhanced Characterization of Repository Block (ECRB), and/or boreholes; air injection testing data from surface boreholes and from boreholes in the ESF; in-situ measurements of water potential; and data from laboratory testing of core samples

  13. Clinical trial data analysis using R

    National Research Council Canada - National Science Library

    Chen, Ding-Geng; Peace, Karl E

    2011-01-01

    .... Case studies demonstrate how to select the appropriate clinical trial data. The authors introduce the corresponding biostatistical analysis methods, followed by the step-by-step data analysis using R...

  14. 40 CFR 92.131 - Smoke, data analysis.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Smoke, data analysis. 92.131 Section... analysis. The following procedure shall be used to analyze the smoke test data: (a) Locate each throttle... performed by direct analysis of the recorder traces, or by computer analysis of data collected by automatic...

  15. 40 CFR 86.884-13 - Data analysis.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 19 2010-07-01 2010-07-01 false Data analysis. 86.884-13 Section 86... New Diesel Heavy-Duty Engines; Smoke Exhaust Test Procedure § 86.884-13 Data analysis. The following... linearity check may be performed by direct analysis of the recorder traces, or by computer analysis of data...

  16. European Conference on Data Analysis

    CERN Document Server

    Krolak-Schwerdt, Sabine; Böhmer, Matthias; Data Science, Learning by Latent Structures, and Knowledge Discovery; ECDA 2013

    2015-01-01

    This volume comprises papers dedicated to data science and the extraction of knowledge from many types of data: structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering, and pattern recognition methods; strategies for modeling complex data and mining large data sets; applications of advanced methods in specific domains of practice. The contributions offer interesting applications to various disciplines such as psychology, biology, medical and health sciences; economics, marketing, banking, and finance; engineering; geography and geology;  archeology, sociology, educational sciences, linguistics, and musicology; library science. The book contains the selected and peer-reviewed papers presented during the European Conference on Data Analysis (ECDA 2013) which was jointly held by the German Classification Society (GfKl) and the French-speaking Classification Society (SFC) in July 2013 at the University of Luxembourg.

  17. DataSHIELD : taking the analysis to the data, not the data to the analysis

    NARCIS (Netherlands)

    Gaye, Amadou; Marcon, Yannick; Isaeva, Julia; LaFlamme, Philippe; Turner, Andrew; Jones, Elinor M.; Minion, Joel; Boyd, Andrew W.; Newby, Christopher J.; Nuotio, Marja-Liisa; Wilson, Rebecca; Butters, Oliver; Murtagh, Barnaby; Demir, Ipek; Doiron, Dany; Giepmans, Lisette; Wallace, Susan E.; Budin-Ljosne, Isabelle; Schmidt, Carsten Oliver; Boffetta, Paolo; Boniol, Mathieu; Bota, Maria; Carter, Kim W.; deKlerk, Nick; Dibben, Chris; Francis, Richard W.; Hiekkalinna, Tero; Hveem, Kristian; Kvaloy, Kirsti; Millar, Sean; Perry, Ivan J.; Peters, Annette; Phillips, Catherine M.; Popham, Frank; Raab, Gillian; Reischl, Eva; Sheehan, Nuala; Waldenberger, Melanie; Perola, Markus; van den Heuvel, Edwin; Macleod, John; Knoppers, Bartha M.; Stolk, Ronald P.; Fortier, Isabel; Harris, Jennifer R.; Woffenbuttel, Bruce H. R.; Murtagh, Madeleine J.; Ferretti, Vincent; Burton, Paul R.

    2014-01-01

    Background: Research in modern biomedicine and social science requires sample sizes so large that they can often only be achieved through a pooled co-analysis of data from several studies. But the pooling of information from individuals in a central database that may be queried by researchers raises

  18. Nuclear data needs for material analysis

    International Nuclear Information System (INIS)

    Molnar, Gabor L.

    2001-01-01

    Nuclear data for material analysis using neutron-based methods are examined. Besides a critical review of the available data, emphasis is given to emerging application areas and new experimental techniques. Neutron scattering and reaction data, as well as decay data for delayed and prompt gamma activation analysis, are all discussed in detail. Conclusions are formed concerning the need for new measurement, calculation, evaluation and dissemination activities. (author)

  19. Correspondence analysis of longitudinal data

    NARCIS (Netherlands)

    Van der Heijden, P.G.M.|info:eu-repo/dai/nl/073087998

    2005-01-01

    Correspondence analysis is an exploratory tool for the analysis of associations between categorical variables, the results of which may be displayed graphically. For longitudinal data with two time points, an analysis of the transition matrix (showing the relative frequencies for pairs of
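
    For longitudinal data with two time points, the analysis starts from the transition (contingency) matrix; simple correspondence analysis amounts to an SVD of the standardized residuals. A compact numpy sketch with an invented 3 x 3 transition table:

        # Simple correspondence analysis of a transition matrix: SVD of the
        # standardized residuals gives row/column coordinates for plotting.
        # The counts are invented.
        import numpy as np

        N = np.array([[30.0, 10.0, 5.0],
                      [ 8.0, 25.0, 7.0],
                      [ 4.0,  6.0, 20.0]])
        P = N / N.sum()
        r, c = P.sum(axis=1), P.sum(axis=0)
        S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
        U, sing, Vt = np.linalg.svd(S, full_matrices=False)

        rows = U * sing / np.sqrt(r)[:, None]     # principal row coordinates
        cols = Vt.T * sing / np.sqrt(c)[:, None]  # principal column coordinates
        print(rows[:, :2])
        print(cols[:, :2])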

  20. Data analysis using a data base driven graphics animation system

    International Nuclear Information System (INIS)

    Schwieder, D.H.; Stewart, H.D.; Curtis, J.N.

    1985-01-01

    A graphics animation system has been developed at the Idaho National Engineering Laboratory (INEL) to assist engineers in the analysis of large amounts of time series data. Most prior attempts at computer animation of data involve the development of large and expensive problem-specific systems. This paper discusses a generalized interactive computer animation system designed to be used in a wide variety of data analysis applications. By using relational data base storage of graphics and control information, considerable flexibility in design and development of animated displays is achieved

  1. Efficient Incremental Data Analysis

    OpenAIRE

    Nikolic, Milos

    2016-01-01

    Many data-intensive applications require real-time analytics over streaming data. In a growing number of domains -- sensor network monitoring, social web applications, clickstream analysis, high-frequency algorithmic trading, and fraud detections to name a few -- applications continuously monitor stream events to promptly react to certain data conditions. These applications demand responsive analytics even when faced with high volume and velocity of incoming changes, large numbers of users, a...

  2. Online Interactive Data Analysis of Multi-Sensor Data Using Giovanni

    Science.gov (United States)

    Berrick, S.; Leptoukh, G.; Liu, Z.; Rui, H.; Shen, S.; Teng, W.; Zhu, T.

    2005-12-01

    The goal of the GES-DISC Interactive Online Visualization and Analysis System (Giovanni) is to provide earth science users a means for performing data analysis on data in the Goddard Earth Sciences (GES) Distributed Active Archive Center (DAAC) without having to download the data. Through Giovanni, users are able to apply statistical analysis on many individual gridded global data products across multiple instruments and even inter-compare parameters from more than one instrument. Giovanni currently allows users to select a time window and a region of interest to generate many graphical output types including area plots (time-averaged), time-series (area-averaged), Hovmoller (latitude vs. time, longitude vs. time), and animations for area plots. A number of graphical output types are also available for parameter inter-comparisons. ASCII output is also available for those who want to apply their own analysis software. Using the knowledge gained from Giovanni, a user can minimize the amount of data they need to download while maximizing the amount of relevant content in those data. The design challenges of Giovanni are (1) to successfully balance a simple, intuitive Web interface with the complexity and heterogeneity of our data, (2) to have a simple and flexible configuration so that new data sets and parameters can be added and organized for particular user communities, (3) to be agnostic with respect to the analysis and graphing software, and (4) to be scalable. In a short time, the original Giovanni (Giovanni 1) has grown from two instances to eight (Giovanni 2), each tailored for a specific user community. The demand, however, for Giovanni and its capabilities continues to increase and in order to meet those demands, a redesign effort of Giovanni, which we call Giovanni 3, is being undertaken.
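
    Conceptually, Giovanni's "time-series (area-averaged)" output is a reduction of a gridded variable over a latitude/longitude box. A sketch of that reduction with xarray follows; the file name, variable name and box are hypothetical, and plotting requires matplotlib:

        # Conceptual stand-in for Giovanni's area-averaged time series:
        # average a gridded variable over a lat/lon box, leaving a 1-D
        # time series. File and variable names are hypothetical.
        import xarray as xr

        ds = xr.open_dataset("aerosol_monthly.nc")              # hypothetical file
        box = ds["aod_550nm"].sel(lat=slice(30, 45), lon=slice(-10, 20))
        series = box.mean(dim=("lat", "lon"))                   # area average per time step
        series.plot()                                           # needs matplotlib installed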

  3. IAGA Geomagnetic Data Analysis format - Analysis_IAGA

    Science.gov (United States)

    Toader, Victorin-Emilian; Marmureanu, Alexandru

    2013-04-01

    Geomagnetic research involves continuous monitoring of the Earth's magnetic field and software for processing large amounts of data. The Analysis_IAGA program reads and analyses files in the IAGA2002 format used within the INTERMAGNET observatory network. The data are made available by INTERMAGNET (http://www.intermagnet.org/Data_e.php) and NOAA - National Geophysical Data Center (ftp://ftp.ngdc.noaa.gov/wdc/geomagnetism/data/observatories/definitive) free of cost for scientific use. The users of this software are those who study geomagnetism or use these data along with other atmospheric or seismic factors. Analysis_IAGA allows the visualization of files for the same station, with the feature of merging data for analyzing longer time intervals. Each file contains data collected within a 24 hour time interval with a sampling rate of 60 seconds or 1 second. Adding a large number of files may be done by dividing the sampling frequency. Also, the program has the feature of combining data files gathered from multiple stations as long as the sampling rate and time intervals are the same. Different channels may be selected, visualized and filtered individually. Channel properties can be saved and edited in a file. Data can be processed (spectral power, P/F, estimated frequency, Bz/Bx, Bz/By, convolutions and correlations on pairs of axes, discrete differentiation) and visualized along with the original signals on the same panel. With the help of cursors/magnifiers, time differences can be calculated. Each channel can be analyzed separately. Signals can be filtered using bandpass, lowpass and highpass filters (Butterworth, Chebyshev, Inverse Chebyshev, Elliptic, Bessel, Median, ZeroPath). Separate graphics visualize the spectral power, the frequency spectrum histogram, the evolution of the estimated frequency, and P/H. Adaptive JTFA spectrograms can be selected: CSD (Cone-Shaped Distribution), CWD (Choi-Williams Distribution), Gabor, STFT (short-time Fourier transform), WVD (Wigner
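
    Two of the operations listed above, zero-phase bandpass filtering and spectral-power estimation, are easy to sketch on a synthetic 1 Hz channel; this is generic scipy usage, not Analysis_IAGA's own algorithms:

        # Sketch of two Analysis_IAGA-style operations on a 1 Hz geomagnetic
        # channel: a zero-phase Butterworth bandpass and a Welch spectral-
        # power estimate. The signal is synthetic.
        import numpy as np
        from scipy.signal import butter, filtfilt, welch

        fs = 1.0                                   # 1-second sampling
        t = np.arange(0, 86400)                    # one day of data
        bz = np.sin(2 * np.pi * 0.01 * t) + 0.3 * np.random.randn(t.size)

        b, a = butter(4, [0.005, 0.05], btype="bandpass", fs=fs)
        bz_filtered = filtfilt(b, a, bz)           # zero-phase bandpass

        freqs, power = welch(bz_filtered, fs=fs, nperseg=4096)
        print(freqs[np.argmax(power)])             # dominant frequency (~0.01 Hz)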

  4. Topological data analysis for scientific visualization

    CERN Document Server

    Tierny, Julien

    2017-01-01

    Combining theoretical and practical aspects of topology, this book delivers a comprehensive and self-contained introduction to topological methods for the analysis and visualization of scientific data. Theoretical concepts are presented in a thorough but intuitive manner, with many high-quality color illustrations. Key algorithms for the computation and simplification of topological data representations are described in details, and their application is carefully illustrated in a chapter dedicated to concrete use cases. With its fine balance between theory and practice, "Topological Data Analysis for Scientific Visualization" constitutes an appealing introduction to the increasingly important topic of topological data analysis, for lecturers, students and researchers.

  5. Columbia River Component Data Gap Analysis

    Energy Technology Data Exchange (ETDEWEB)

    L. C. Hulstrom

    2007-10-23

    This Data Gap Analysis report documents the results of a study conducted by Washington Closure Hanford (WCH) to compile and review the currently available surface water and sediment data for the Columbia River near and downstream of the Hanford Site. This Data Gap Analysis study was conducted to review the adequacy of the existing surface water and sediment data set from the Columbia River, with specific reference to the use of the data in future site characterization and screening level risk assessments.

  6. Privacy protected text analysis in DataSHIELD

    Directory of Open Access Journals (Sweden)

    Rebecca Wilson

    2017-04-01

    Whilst it is possible to analyse free text within a DataSHIELD infrastructure, the challenge is creating generalised and resilient anti-disclosure methods for free text analysis. There are a range of biomedical and health sciences applications for DataSHIELD methods of privacy protected analysis of free text including analysis of electronic health records and analysis of qualitative data e.g. from social media.

  7. Adaptive Analysis of Functional MRI Data

    International Nuclear Information System (INIS)

    Friman, Ola

    2003-01-01

    Functional Magnetic Resonance Imaging (fMRI) is a recently developed neuro-imaging technique with capacity to map neural activity with high spatial precision. To locate active brain areas, the method utilizes local blood oxygenation changes which are reflected as small intensity changes in a special type of MR images. The ability to non-invasively map brain functions provides new opportunities to unravel the mysteries and advance the understanding of the human brain, as well as to perform pre-surgical examinations in order to optimize surgical interventions. This dissertation introduces new approaches for the analysis of fMRI data. The detection of active brain areas is a challenging problem due to high noise levels and artifacts present in the data. A fundamental tool in the developed methods is Canonical Correlation Analysis (CCA). CCA is used in two novel ways. First as a method with the ability to fully exploit the spatio-temporal nature of fMRI data for detecting active brain areas. Established analysis approaches mainly focus on the temporal dimension of the data and they are for this reason commonly referred to as being mass-univariate. The new CCA detection method encompasses and generalizes the traditional mass-univariate methods and can in this terminology be viewed as a mass-multivariate approach. The concept of spatial basis functions is introduced as a spatial counterpart of the temporal basis functions already in use in fMRI analysis. The spatial basis functions implicitly perform an adaptive spatial filtering of the fMRI images, which significantly improves detection performance. It is also shown how prior information can be incorporated into the analysis by imposing constraints on the temporal and spatial models and a constrained version of CCA is devised to this end. A general Principal Component Analysis technique for generating and constraining temporal and spatial subspace models is proposed to be used in combination with the constrained CCA
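
    The detection idea is easy to caricature: canonical correlation between a set of temporal basis functions and the time courses of a small voxel neighbourhood, with a large leading canonical correlation suggesting activation. A synthetic-data sketch using scikit-learn's CCA (not the dissertation's implementation) follows:

        # Generic sketch of CCA-based fMRI detection: correlate temporal
        # basis functions (X) with the time courses of a 3x3 voxel
        # neighbourhood (Y). Data are synthetic; not the thesis's code.
        import numpy as np
        from sklearn.cross_decomposition import CCA

        rng = np.random.default_rng(1)
        n_scans = 200
        paradigm = (np.arange(n_scans) // 20) % 2            # boxcar on/off design
        X = np.column_stack([paradigm, np.roll(paradigm, 3)])  # basis + lagged copy
        Y = rng.normal(size=(n_scans, 9))                    # voxel time courses
        Y[:, 4] += 0.8 * paradigm                            # centre voxel is "active"

        cca = CCA(n_components=1).fit(X, Y)
        u, v = cca.transform(X, Y)
        print(np.corrcoef(u[:, 0], v[:, 0])[0, 1])           # leading canonical correlation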

  8. A Multimodal Data Analysis Approach for Targeted Drug Discovery Involving Topological Data Analysis (TDA).

    Science.gov (United States)

    Alagappan, Muthuraman; Jiang, Dadi; Denko, Nicholas; Koong, Albert C

    In silico drug discovery refers to a combination of computational techniques that augment our ability to discover drug compounds from compound libraries. Many such techniques exist, including virtual high-throughput screening (vHTS), high-throughput screening (HTS), and mechanisms for data storage and querying. However, at present these tools are often used independently of one another. In this chapter, we describe a new multimodal in silico technique for the hit identification and lead generation phases of traditional drug discovery. Our technique leverages the benefits of three independent methods (virtual high-throughput screening, high-throughput screening, and structural fingerprint analysis) by using a fourth technique called topological data analysis (TDA). We describe how a compound library can be independently tested with vHTS, HTS, and fingerprint analysis, and how the results can be transformed into a topological data analysis network to identify compounds from a diverse group of structural families. This process of using TDA or similar clustering methods to identify drug leads is advantageous because it provides a mechanism for choosing structurally diverse compounds while maintaining the unique advantages of already established techniques such as vHTS and HTS.

  9. Modeling data irregularities and structural complexities in data envelopment analysis

    CERN Document Server

    Zhu, Joe

    2007-01-01

    In a relatively short period of time, Data Envelopment Analysis (DEA) has grown into a powerful quantitative, analytical tool for measuring and evaluating performance. It has been successfully applied to a whole variety of problems in many different contexts worldwide. This book deals with the micro aspects of handling and modeling data issues in modeling DEA problems. DEA's use has grown with its capability of dealing with complex "service industry" and the "public service domain" types of problems that require modeling of both qualitative and quantitative data. This handbook treatment deals with specific data problems including: imprecise or inaccurate data; missing data; qualitative data; outliers; undesirable outputs; quality data; statistical analysis; software and other data aspects of modeling complex DEA problems. In addition, the book will demonstrate how to visualize DEA results when the data is more than 3-dimensional, and how to identify efficiency units quickly and accurately.
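
    DEA scores come from solving one small linear program per decision-making unit. A minimal input-oriented CCR sketch using scipy's linprog is shown below, with invented inputs and outputs; production DEA software handles the data irregularities this book is about, while this sketch does not:

        # Minimal input-oriented CCR DEA: one LP per decision-making unit.
        # X holds inputs (units x m), Y holds outputs (units x s); the
        # numbers are invented.
        import numpy as np
        from scipy.optimize import linprog

        X = np.array([[2.0, 4.0], [3.0, 2.0], [4.0, 5.0]])   # inputs of 3 DMUs
        Y = np.array([[1.0], [1.0], [1.0]])                  # single unit output

        def ccr_efficiency(o):
            n, m = X.shape                                   # units, inputs
            s = Y.shape[1]                                   # outputs
            c = np.r_[1.0, np.zeros(n)]                      # minimize theta
            A_in = np.hstack([-X[o][:, None], X.T])          # sum_j l_j x_ij <= theta x_io
            A_out = np.hstack([np.zeros((s, 1)), -Y.T])      # sum_j l_j y_rj >= y_ro
            A_ub = np.vstack([A_in, A_out])
            b_ub = np.r_[np.zeros(m), -Y[o]]
            res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                          bounds=[(None, None)] + [(0, None)] * n)
            return res.fun                                   # efficiency score theta*

        print([round(ccr_efficiency(o), 3) for o in range(3)])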

  10. Integrated analysis of genetic data with R

    Directory of Open Access Journals (Sweden)

    Zhao Jing

    2006-01-01

    Genetic data are now widely available. There is, however, an apparent lack of concerted effort to produce software systems for statistical analysis of genetic data compared with other fields of statistics. It is often a tremendous task for end-users to tailor these systems for particular data, especially when genetic data are analysed in conjunction with a large number of covariates. Here, R http://www.r-project.org, a free, flexible and platform-independent environment for statistical modelling and graphics, is explored as an integrated system for genetic data analysis. An overview of some packages currently available for analysis of genetic data is given. This is followed by examples of package development and practical applications. With clear advantages in data management, graphics, statistical analysis, programming, internet capability and use of available codes, it is a feasible, although challenging, task to develop it into an integrated platform for genetic analysis; this will require the joint efforts of many researchers.

  11. Advanced Excel for scientific data analysis

    CERN Document Server

    De Levie, Robert

    2004-01-01

    Excel is by far the most widely distributed data analysis software but few users are aware of its full powers. Advanced Excel For Scientific Data Analysis takes off from where most books dealing with scientific applications of Excel end. It focuses on three areas-least squares, Fourier transformation, and digital simulation-and illustrates these with extensive examples, often taken from the literature. It also includes and describes a number of sample macros and functions to facilitate common data analysis tasks. These macros and functions are provided in uncompiled, computer-readable, easily

  12. Panel data analysis of cardiotocograph (CTG) data.

    Science.gov (United States)

    Horio, Hiroyuki; Kikuchi, Hitomi; Ikeda, Tomoaki

    2013-01-01

    Panel data analysis is a statistical method, widely used in econometrics, which deals with two-dimensional panel data collected over time and over individuals. The cardiotocograph (CTG), which monitors fetal heart rate (FHR) using Doppler ultrasound and uterine contraction by strain gage, is commonly used in the intrapartum treatment of pregnant women. Although the relationship between FHR waveform patterns and outcomes such as umbilical blood gas data at delivery has long been analyzed, no accumulated database of FHR patterns from a large number of cases exists. Just as time-series economic fluctuations, such as consumption trends, have been studied in econometrics using panel data consisting of time-series and cross-sectional data, we tried to apply this method to CTG data. Panel data composed of symbolized segments of the FHR pattern can be easily handled, and a perinatologist can get a whole-FHR-pattern view from the microscopic level of time-series FHR data.
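
    The arrangement the authors describe, symbolized FHR segments indexed by case and by time, maps naturally onto a two-level index; a small pandas sketch with invented codes:

        # Sketch of the panel idea: symbolized FHR-segment codes arranged
        # as (case, time) panel data, summarized across cases per time
        # step. Codes and cases are invented.
        import pandas as pd

        records = [
            ("case01", 0, "A"), ("case01", 1, "B"), ("case01", 2, "B"),
            ("case02", 0, "A"), ("case02", 1, "A"), ("case02", 2, "C"),
        ]
        panel = pd.DataFrame(records, columns=["case", "t", "fhr_code"])
        panel = panel.set_index(["case", "t"])               # two-dimensional panel

        # Cross-sectional view: distribution of FHR codes at each time step.
        print(panel.groupby(level="t")["fhr_code"].value_counts())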

  13. Haskell data analysis cookbook

    CERN Document Server

    Shukla, Nishant

    2014-01-01

    Step-by-step recipes filled with practical code samples and engaging examples demonstrate Haskell in practice, and then the concepts behind the code. This book shows functional developers and analysts how to leverage their existing knowledge of Haskell specifically for high-quality data analysis. A good understanding of data sets and functional programming is assumed.

  14. DataToText: A Consumer-Oriented Approach to Data Analysis

    Science.gov (United States)

    Kenny, David A.

    2010-01-01

    DataToText is a project in which the user communicates the relevant information for an analysis and a DataToText computer routine produces text output that describes, in words, tables, and figures, the results of the analyses. Two extended examples are given: one an example of a moderator analysis and the other an example of a dyadic data…

  15. Secondary data analysis of large data sets in urology: successes and errors to avoid.

    Science.gov (United States)

    Schlomer, Bruce J; Copp, Hillary L

    2014-03-01

    Secondary data analysis is the use of data collected for research by someone other than the investigator. In the last several years there has been a dramatic increase in the number of these studies being published in urological journals and presented at urological meetings, especially involving secondary data analysis of large administrative data sets. Along with this expansion, skepticism for secondary data analysis studies has increased for many urologists. In this narrative review we discuss the types of large data sets that are commonly used for secondary data analysis in urology, and discuss the advantages and disadvantages of secondary data analysis. A literature search was performed to identify urological secondary data analysis studies published since 2008 using commonly used large data sets, and examples of high quality studies published in high impact journals are given. We outline an approach for performing a successful hypothesis or goal driven secondary data analysis study and highlight common errors to avoid. More than 350 secondary data analysis studies using large data sets have been published on urological topics since 2008 with likely many more studies presented at meetings but never published. Nonhypothesis or goal driven studies have likely constituted some of these studies and have probably contributed to the increased skepticism of this type of research. However, many high quality, hypothesis driven studies addressing research questions that would have been difficult to conduct with other methods have been performed in the last few years. Secondary data analysis is a powerful tool that can address questions which could not be adequately studied by another method. Knowledge of the limitations of secondary data analysis and of the data sets used is critical for a successful study. There are also important errors to avoid when planning and performing a secondary data analysis study. Investigators and the urological community need to strive to use

  16. Advantages of Integrative Data Analysis for Developmental Research

    Science.gov (United States)

    Bainter, Sierra A.; Curran, Patrick J.

    2015-01-01

    Amid recent progress in cognitive development research, high-quality data resources are accumulating, and data sharing and secondary data analysis are becoming increasingly valuable tools. Integrative data analysis (IDA) is an exciting analytical framework that can enhance secondary data analysis in powerful ways. IDA pools item-level data across…

  17. GaggleBridge: collaborative data analysis.

    Science.gov (United States)

    Battke, Florian; Symons, Stephan; Herbig, Alexander; Nieselt, Kay

    2011-09-15

    Tools aiding in collaborative data analysis are becoming ever more important as researchers work together over long distances. We present an extension to the Gaggle framework, which has been widely adopted as a tool to enable data exchange between different analysis programs on one computer. Our program, GaggleBridge, transparently extends this functionality to allow data exchange between Gaggle users at different geographic locations using network communication. GaggleBridge can automatically set up SSH tunnels to traverse firewalls while adding some security features to the Gaggle communication. GaggleBridge is available as open-source software implemented in the Java language at http://it.inf.uni-tuebingen.de/gb. florian.battke@uni-tuebingen.de Supplementary data are available at Bioinformatics online.

  18. 40 CFR 51.366 - Data analysis and reporting.

    Science.gov (United States)

    2010-07-01

    ... 40 Protection of Environment 2 2010-07-01 2010-07-01 false Data analysis and reporting. 51.366... Requirements § 51.366 Data analysis and reporting. Data analysis and reporting are required to allow for..., including the results of an analysis of the registration data base; (ii) The percentage of motorist...

  19. Long-term Preservation of Data Analysis Capabilities

    Science.gov (United States)

    Gabriel, C.; Arviset, C.; Ibarra, A.; Pollock, A.

    2015-09-01

    While the long-term preservation of scientific data obtained by large astrophysics missions is ensured through science archives, the issue of data analysis software preservation has hardly been addressed. Efforts by large data centres have contributed so far to maintain some instrument or mission-specific data reduction packages on top of high-level general purpose data analysis software. However, it is always difficult to keep software alive without support and maintenance once the active phase of a mission is over. This is especially difficult in the budgetary model followed by space agencies. We discuss the importance of extending the lifetime of dedicated data analysis packages and review diverse strategies under development at ESA using new paradigms such as Virtual Machines, Cloud Computing, and Software as a Service for making possible full availability of data analysis and calibration software for decades at minimal cost.

  20. An Automated Data Analysis Tool for Livestock Market Data

    Science.gov (United States)

    Williams, Galen S.; Raper, Kellie Curry

    2011-01-01

    This article describes an automated data analysis tool that allows Oklahoma Cooperative Extension Service educators to disseminate results in a timely manner. Primary data collected at Oklahoma Quality Beef Network (OQBN) certified calf auctions across the state results in a large amount of data per sale site. Sale summaries for an individual sale…

  1. Common Data Format (CDF) and Coordinated Data Analysis Web (CDAWeb)

    Science.gov (United States)

    Candey, Robert M.

    2010-01-01

    The Coordinated Data Analysis Web (CDAWeb) data browsing system provides plotting, listing and open access via FTP, HTTP, and web services (REST, SOAP, OPeNDAP) for data from most NASA Heliophysics missions and is heavily used by the community. Combining data from many instruments and missions enables broad research analysis and correlation and coordination with other experiments and missions. Crucial to its effectiveness is the use of a standard self-describing data format, in this case, the Common Data Format (CDF), also developed at the Space Physics Data Facility, and the use of metadata standards (easily edited with SKTeditor). CDAWeb is based on a set of IDL routines, CDAWlib. The CDF project also maintains software and services for translating between many standard formats (CDF, netCDF, HDF, FITS, XML).

  2. Robust statistics and geochemical data analysis

    International Nuclear Information System (INIS)

    Di, Z.

    1987-01-01

    Advantages of robust procedures over ordinary least-squares procedures in geochemical data analysis are demonstrated using NURE data from the Hot Springs Quadrangle, South Dakota, USA. Robust principal components analysis with 5% multivariate trimming successfully guarded the analysis against perturbations by outliers and increased the number of interpretable factors. Regression with SINE estimates significantly increased the goodness-of-fit of the regression and improved the correspondence of delineated anomalies with known uranium prospects. Because of the ubiquitous existence of outliers in geochemical data, robust statistical procedures are suggested as routine procedures to replace ordinary least-squares procedures
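
    The "5% multivariate trimming" idea can be approximated with a robust covariance estimate whose eigenvectors give robust principal components; a sketch with scikit-learn's Minimum Covariance Determinant estimator (not the study's own procedure):

        # Approximate robust PCA: estimate a covariance matrix that ignores
        # roughly 5% of the most outlying samples (Minimum Covariance
        # Determinant), then take its eigenvectors. Data are synthetic.
        import numpy as np
        from sklearn.covariance import MinCovDet

        rng = np.random.default_rng(42)
        X = rng.normal(size=(200, 5))
        X[:10] += 15.0                                # 5% gross outliers

        mcd = MinCovDet(support_fraction=0.95, random_state=0).fit(X)
        eigvals, eigvecs = np.linalg.eigh(mcd.covariance_)
        print(eigvals[::-1])                          # robust principal variances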

  3. TREX13 Data Analysis/Modeling

    Science.gov (United States)

    2018-03-29

    From: Dajun Tang, Principal Investigator. Subj: ONR Grant# N00014-14-1-0239 & N00014-16-1-2371, "TREX 13 Data Analysis/Modeling". Encl: (1) Final Performance/Technical Report with SF298. The attached enclosures constitute the final technical report for ONR Grant# N00014-14-1-0239 & N00014-16-1-2371, "TREX 13 Data Analysis/Modeling". cc: Grant & Contract Administrator, APL-UW Office of Sponsor Programs, UW ONR Seattle – Robert Rice and

  4. Data engineering systems: Computerized modeling and data bank capabilities for engineering analysis

    Science.gov (United States)

    Kopp, H.; Trettau, R.; Zolotar, B.

    1984-01-01

    The Data Engineering System (DES) is a computer-based system that organizes technical data and provides automated mechanisms for storage, retrieval, and engineering analysis. The DES combines the benefits of a structured data base system with automated links to large-scale analysis codes. While the DES provides the user with many of the capabilities of a computer-aided design (CAD) system, the systems are actually quite different in several respects. A typical CAD system emphasizes interactive graphics capabilities and organizes data in a manner that optimizes these graphics. On the other hand, the DES is a computer-aided engineering system intended for the engineer who must operationally understand an existing or planned design or who desires to carry out additional technical analysis based on a particular design. The DES emphasizes data retrieval in a form that not only provides the engineer access to search and display the data but also links the data automatically with the computer analysis codes.

  5. Wavelets in functional data analysis

    CERN Document Server

    Morettin, Pedro A; Vidakovic, Brani

    2017-01-01

    Wavelet-based procedures are key in many areas of statistics, applied mathematics, engineering, and science. This book presents wavelets in functional data analysis, offering a glimpse of problems in which they can be applied, including tumor analysis, functional magnetic resonance and meteorological data. Starting with the Haar wavelet, the authors explore myriad families of wavelets and how they can be used. High-dimensional data visualization (using Andrews' plots), wavelet shrinkage (a simple, yet powerful, procedure for nonparametric models) and a selection of estimation and testing techniques (including a discussion on Stein’s Paradox) make this a highly valuable resource for graduate students and experienced researchers alike.
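
    Wavelet shrinkage, one of the procedures the book covers, fits in a few lines: decompose, soft-threshold the detail coefficients, reconstruct. A sketch with PyWavelets and the common universal threshold (a generic recipe, not code from the book):

        # Wavelet shrinkage in brief: decompose, soft-threshold the detail
        # coefficients, reconstruct. Uses the universal threshold with a
        # median-based noise estimate; the signal is synthetic.
        import numpy as np
        import pywt

        rng = np.random.default_rng(7)
        t = np.linspace(0, 1, 1024)
        signal = np.sin(8 * np.pi * t) + 0.4 * rng.normal(size=t.size)

        coeffs = pywt.wavedec(signal, "db4", level=5)
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise estimate
        thresh = sigma * np.sqrt(2 * np.log(signal.size))     # universal threshold
        coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft")
                                for c in coeffs[1:]]
        denoised = pywt.waverec(coeffs, "db4")[:t.size]
        print(np.mean((denoised - np.sin(8 * np.pi * t)) ** 2))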

  6. Enabling High-performance Interactive Geoscience Data Analysis Through Data Placement and Movement Optimization

    Science.gov (United States)

    Zhu, F.; Yu, H.; Rilee, M. L.; Kuo, K. S.; Yu, L.; Pan, Y.; Jiang, H.

    2017-12-01

    Since the establishment of data archive centers and the standardization of file formats, scientists have been required to search metadata catalogs for the data they need and download the data files to their local machines to carry out data analysis. This approach has facilitated data discovery and access for decades, but it inevitably leads to data transfer from data archive centers to scientists' computers through low-bandwidth Internet connections. Data transfer becomes a major performance bottleneck in such an approach. Combined with generally constrained local compute/storage resources, these factors limit the extent of scientists' studies and deprive them of timely outcomes. Thus, this conventional approach is not scalable with respect to both the volume and variety of geoscience data. A much more viable solution is to couple analysis and storage systems to minimize data transfer. In our study, we compare loosely coupled approaches (exemplified by Spark and Hadoop) and tightly coupled approaches (exemplified by parallel distributed database management systems, e.g., SciDB). In particular, we investigate the optimization of data placement and movement to effectively tackle the variety challenge, and broaden the use of parallelization to address the volume challenge. Our goal is to enable high-performance interactive analysis for a good portion of geoscience data analysis exercises. We show that tightly coupled approaches can concentrate data traffic between local storage systems and compute units, thereby optimizing bandwidth utilization to achieve a better throughput. Based on our observations, we develop a geoscience data analysis system that tightly couples analysis engines with storage and has direct access to the detailed map of data partition locations. Through an innovative data partitioning and distribution scheme, our system has demonstrated scalable and interactive performance in real-world geoscience data analysis applications.

  7. Data Analysis Facility (DAF)

    Science.gov (United States)

    1991-01-01

    NASA-Dryden's Data Analysis Facility (DAF) provides a variety of support services to the entire Dryden community. It provides state-of-the-art hardware and software systems, available to any Dryden engineer for pre- and post-flight data processing and analysis, plus support for all archival and general computer use. The Flight Data Access System (FDAS) is one of the advanced computer systems in the DAF, providing fast engineering unit conversion and archival processing of flight data delivered from the Western Aeronautical Test Range. Engineering unit conversion and archival formatting of flight data are performed by the DRACO program on a Sun 690MP and an E-5000 computer. Time history files produced by DRACO are then moved to a permanent magneto-optical archive, where they are network-accessible 24 hours a day, 7 days a week. Pertinent information about the individual flights is maintained in a relational (Sybase) database. The DAF also houses all general computer services, including: the Compute Server 1 and 2 (CS1 and CS2), the server for the World Wide Web, overall computer operations support, courier service, a CD-ROM Writer system, a Technical Support Center, the NASA Dryden Phone System (NDPS), and Hardware Maintenance.

  8. An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework.

    Science.gov (United States)

    Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji

    2016-01-01

    Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press.

  9. Classification, (big) data analysis and statistical learning

    CERN Document Server

    Conversano, Claudio; Vichi, Maurizio

    2018-01-01

    This edited book focuses on the latest developments in classification, statistical learning, data analysis and related areas of data science, including statistical analysis of large datasets, big data analytics, time series clustering, integration of data from different sources, as well as social networks. It covers both methodological aspects and applications to a wide range of areas such as economics, marketing, education, social sciences, medicine, environmental sciences and the pharmaceutical industry. In addition, it describes the basic features of the software behind the data analysis results, and provides links to the corresponding codes and data sets where necessary. This book is intended for researchers and practitioners who are interested in the latest developments and applications in the field. The peer-reviewed contributions were presented at the 10th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in Santa Margherita di Pul...

  10. Survival analysis using S analysis of time-to-event data

    CERN Document Server

    Tableman, Mara

    2003-01-01

    Survival Analysis Using S: Analysis of Time-to-Event Data is designed as a text for a one-semester or one-quarter course in survival analysis for upper-level or graduate students in statistics, biostatistics, and epidemiology. Prerequisites are a standard pre-calculus first course in probability and statistics, and a course in applied linear regression models. No prior knowledge of S or R is assumed. A wide choice of exercises is included, some intended for more advanced students with a first course in mathematical statistics. The authors emphasize parametric log-linear models, while also detailing nonparametric procedures along with model building and data diagnostics. Medical and public health researchers will find the discussion of cut point analysis with bootstrap validation, competing risks and the cumulative incidence estimator, and the analysis of left-truncated and right-censored data invaluable. The bootstrap procedure checks robustness of cut point analysis and determines cut point(s). In a chapter ...

  11. NGNP Data Management and Analysis System Analysis and Web Delivery Capabilities

    Energy Technology Data Exchange (ETDEWEB)

    Cynthia D. Gentillon

    2011-09-01

    Projects for the Very High Temperature Reactor (VHTR) Technology Development Office provide data in support of Nuclear Regulatory Commission licensing of the very high temperature reactor. Fuel and materials to be used in the reactor are tested and characterized to quantify performance in high-temperature and high-fluence environments. The NGNP Data Management and Analysis System (NDMAS) at the Idaho National Laboratory has been established to ensure that VHTR data are (1) qualified for use, (2) stored in a readily accessible electronic form, and (3) analyzed to extract useful results. This document focuses on the third NDMAS objective. It describes capabilities for displaying the data in meaningful ways and for data analysis to identify useful relationships among the measured quantities. The capabilities are described from the perspective of NDMAS users, starting with those who just view experimental data and analytical results on the INL NDMAS web portal. Web display and delivery capabilities are described in detail. Also, the current web pages that show Advanced Gas Reactor, Advanced Graphite Capsule, and High Temperature Materials test results are itemized. Capabilities available to NDMAS developers are more extensive, and are described using a second series of examples. Much of the data analysis effort focuses on understanding how thermocouple measurements relate to simulated temperatures and other experimental parameters. Statistical control charts and correlation monitoring provide an ongoing assessment of instrument accuracy. Data analysis capabilities are virtually unlimited for those who use the NDMAS web data download capabilities and the analysis software of their choice. Overall, the NDMAS provides convenient data analysis and web delivery capabilities for studying a very large and rapidly increasing database of well-documented, pedigreed data.

  12. VESUVIO Data Analysis Goes MANTID

    International Nuclear Information System (INIS)

    Jackson, S; Krzystyniak, M; Seel, A G; Gigg, M; Richards, S E; Fernandez-Alonso, F

    2014-01-01

    This paper describes ongoing efforts to implement the reduction and analysis of neutron Compton scattering data within the MANTID framework. Recently, extensive work has been carried out to integrate the bespoke data reduction and analysis routines written for VESUVIO with the MANTID framework. While the programs described in this document are designed to replicate the functionality of the Fortran and Genie routines already in use, most of them have been written from scratch and are not based on the original code base.

  13. VESUVIO Data Analysis Goes MANTID

    Science.gov (United States)

    Jackson, S.; Krzystyniak, M.; Seel, A. G.; Gigg, M.; Richards, S. E.; Fernandez-Alonso, F.

    2014-12-01

    This paper describes ongoing efforts to implement the reduction and analysis of neutron Compton scattering data within the MANTID framework. Recently, extensive work has been carried out to integrate the bespoke data reduction and analysis routines written for VESUVIO with the MANTID framework. While the programs described in this document are designed to replicate the functionality of the Fortran and Genie routines already in use, most of them have been written from scratch and are not based on the original code base.

  14. An Analysis of the Climate Data Initiative's Data Collection

    Science.gov (United States)

    Ramachandran, R.; Bugbee, K.

    2015-12-01

    The Climate Data Initiative (CDI) is a broad multi-agency effort of the U.S. government that seeks to leverage the extensive existing federal climate-relevant data to stimulate innovation and private-sector entrepreneurship to support national climate-change preparedness. The CDI project is a systematic effort to manually curate and share openly available climate data from various federal agencies. To date, the CDI has curated seven themes, or topics, relevant to climate change resiliency. These themes include Coastal Flooding, Food Resilience, Water, Ecosystem Vulnerability, Human Health, Energy Infrastructure, and Transportation. Each theme was curated by subject matter experts who selected datasets relevant to the topic at hand. An analysis of the entire Climate Data Initiative data collection and the data curated for each theme offers insights into which datasets are considered most relevant in addressing climate resiliency. Other aspects of the data collection will be examined including which datasets were the most visited or popular and which datasets were the most sought after for curation by the theme teams. Results from the analysis of the CDI collection will be presented in this talk.

  15. Critical Data Analysis Precedes Soft Computing Of Medical Data

    DEFF Research Database (Denmark)

    Keyserlingk, Diedrich Graf von; Jantzen, Jan; Berks, G.

    2000-01-01

    extracted. The factors had different relationships (loadings) to the symptoms. Although the factors were gained only by computations, they seemed to express some modular features of the language disturbances. This phenomenon, that factors represent superior aspects of data, is well known in factor analysis...... the deficits in communication. Sets of symptoms corresponding to the traditional symptoms in Broca and Wernicke aphasia may be represented in the factors, but the factor itself does not represent a syndrome. It is assumed that this kind of data analysis shows a new approach to the understanding of language...

  16. Substituting missing data in compositional analysis

    Energy Technology Data Exchange (ETDEWEB)

    Real, Carlos, E-mail: carlos.real@usc.es [Area de Ecologia, Departamento de Biologia Celular y Ecologia, Escuela Politecnica Superior, Universidad de Santiago de Compostela, 27002 Lugo (Spain); Angel Fernandez, J.; Aboal, Jesus R.; Carballeira, Alejo [Area de Ecologia, Departamento de Biologia Celular y Ecologia, Facultad de Biologia, Universidad de Santiago de Compostela, 15782 Santiago de Compostela (Spain)

    2011-10-15

    Multivariate analysis of environmental data sets requires the absence of missing values or their substitution by small values. However, if the data is transformed logarithmically prior to the analysis, this solution cannot be applied because the logarithm of a small value might become an outlier. Several methods for substituting the missing values can be found in the literature although none of them guarantees that no distortion of the structure of the data set is produced. We propose a method for the assessment of these distortions which can be used for deciding whether to retain or not the samples or variables containing missing values and for the investigation of the performance of different substitution techniques. The method analyzes the structure of the distances among samples using Mantel tests. We present an application of the method to PCDD/F data measured in samples of terrestrial moss as part of a biomonitoring study. - Highlights: > Missing values in multivariate data sets must be substituted prior to analysis. > The substituted values can modify the structure of the data set. > We developed a method to estimate the magnitude of the alterations. > The method is simple and based on the Mantel test. > The method allowed the identification of problematic variables in a sample data set. - A method is presented for the assessment of the possible distortions in multivariate analysis caused by the substitution of missing values.
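    The core of the proposed method is a Mantel test between the matrix of inter-sample distances computed from complete data and the matrix computed after substitution. Below is a hedged sketch of that comparison on synthetic data; the substitution value (0.01) and the data are invented, and the permutation Mantel test is a generic implementation, not the authors' code.

```python
# Assess whether substituting missing values distorts the distance
# structure among samples, via a permutation Mantel test. Synthetic data.
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
complete = rng.lognormal(size=(20, 6))               # hypothetical compositions
with_nans = complete.copy()
with_nans[rng.random(complete.shape) < 0.05] = np.nan
substituted = np.where(np.isnan(with_nans), 0.01, with_nans)

d_true = pdist(np.log(complete))                     # log-transform, then distances
d_subst = pdist(np.log(substituted))

def mantel(x, y, n_perm=999, seed=1):
    """Correlation between two condensed distance matrices; significance
    from jointly permuting rows/columns of one matrix."""
    rng = np.random.default_rng(seed)
    xs = squareform(x)                               # square form for permutation
    n = xs.shape[0]
    obs = np.corrcoef(x, y)[0, 1]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        xp = squareform(xs[np.ix_(p, p)], checks=False)
        hits += abs(np.corrcoef(xp, y)[0, 1]) >= abs(obs)
    return obs, (hits + 1) / (n_perm + 1)

r, p = mantel(d_true, d_subst)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")            # r near 1: structure preserved
```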

  17. A Proposed Data Fusion Architecture for Micro-Zone Analysis and Data Mining

    Energy Technology Data Exchange (ETDEWEB)

    Kevin McCarthy; Milos Manic

    2012-08-01

    Data Fusion requires the ability to combine or “fuse” data from multiple data sources. Time Series Analysis is a data mining technique used to predict future values from a data set based upon past values. Unlike other data mining techniques, however, Time Series places special emphasis on periodicity and how seasonal and other time-based factors tend to affect trends over time. One of the difficulties encountered in developing generic time series techniques is the wide variability of the data sets available for analysis. This presents challenges all the way from the data gathering stage to results presentation. This paper presents an architecture designed and used to facilitate the collection of disparate data sets well suited to Time Series analysis as well as other predictive data mining techniques. Results show this architecture provides a flexible, dynamic framework for the capture and storage of a myriad of dissimilar data sets and can serve as a foundation from which to build a complete data fusion architecture.
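    As a concrete illustration of the fusion step, the sketch below aligns two synthetic feeds with different sampling rates onto a common hourly index, so that downstream time series methods see one coherent data set. The column names and cadences are invented, not taken from this architecture.

```python
# Fuse two disparate time series onto one hourly index with pandas.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
meter = pd.Series(rng.normal(50, 5, 48 * 4), name="kwh",
                  index=pd.date_range("2012-01-01", periods=48 * 4, freq="15min"))
weather = pd.Series(rng.normal(20, 3, 48), name="temp_c",
                    index=pd.date_range("2012-01-01", periods=48, freq="h"))

fused = pd.concat(
    [meter.resample("h").sum(),      # aggregate fine-grained meter readings
     weather.resample("h").mean()],  # hourly sensor passes straight through
    axis=1,
)
print(fused.head())
```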

  18. CMS Data Analysis School Model

    CERN Document Server

    Malik, Sudhir; Cavanaugh, R; Bloom, K; Chan, Kai-Feng; D'Hondt, J; Klima, B; Narain, M; Palla, F; Rolandi, G; Schörner-Sadenius, T

    2014-01-01

    To impart hands-on training in physics analysis, the CMS experiment initiated the concept of the CMS Data Analysis School (CMSDAS). It was born three years ago at the LPC (LHC Physics Center) at Fermilab and is based on earlier workshops held at the LPC and the CLEO experiment. As CMS transitioned from construction to data-taking mode, the nature of the earlier training also evolved to include more analysis tools, software tutorials and physics analysis. This effort, epitomized as CMSDAS, has proven to be key for new and young physicists to jump-start and contribute to the physics goals of CMS by looking for new physics with the collision data. With over 400 physicists trained in six CMSDAS events around the globe, CMS is trying to engage the collaboration's discovery potential and maximize the physics output. As a bigger goal, CMS is striving to nurture and increase engagement of the myriad talents of CMS in the development of physics, service, upgrade, education of those new to CMS and the caree...

  19. Post-Flight Data Analysis Tool

    Science.gov (United States)

    George, Marina

    2018-01-01

    A software tool that facilitates the retrieval and analysis of post-flight data. This allows our team and other teams to effectively and efficiently analyze and evaluate post-flight data in order to certify commercial providers.

  20. Factor analysis of multivariate data

    Digital Repository Service at National Institute of Oceanography (India)

    Fernandes, A.A.; Mahadevan, R.

    A brief introduction to factor analysis is presented. A FORTRAN program that can perform Q-mode and R-mode factor analysis and the singular value decomposition of a given data matrix is presented in Appendix B. This computer program uses...
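    For readers without access to the FORTRAN listing in Appendix B, both modes fall out of a single SVD of the standardized data matrix. This is a generic sketch of that relationship on random data, not a translation of the program itself.

```python
# R-mode and Q-mode factor loadings from one SVD of a standardized matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))                      # 30 samples x 5 variables
Z = (X - X.mean(0)) / X.std(0, ddof=1)            # standardize columns

U, s, Vt = np.linalg.svd(Z, full_matrices=False)

r_loadings = Vt.T * s / np.sqrt(Z.shape[0] - 1)   # R-mode: variables vs. factors
q_loadings = U * s                                # Q-mode: samples vs. factors
explained = s**2 / np.sum(s**2)

print("variance explained:", np.round(explained, 3))
print("R-mode loadings, factor 1:", np.round(r_loadings[:, 0], 2))
```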

  1. NeoAnalysis: a Python-based toolbox for quick electrophysiological data processing and analysis.

    Science.gov (United States)

    Zhang, Bo; Dai, Ji; Zhang, Tao

    2017-11-13

    In a typical electrophysiological experiment, especially one that includes studying animal behavior, the data collected normally contain spikes, local field potentials, behavioral responses and other associated data. In order to obtain informative results, the data must be analyzed simultaneously with the experimental settings. However, most open-source toolboxes currently available for data analysis were developed to handle only a portion of the data and did not take into account the sorting of experimental conditions. Additionally, these toolboxes require that the input data be in a specific format, which can be inconvenient to users. Therefore, the development of a highly integrated toolbox that can process multiple types of data regardless of input data format and perform basic analysis for general electrophysiological experiments is incredibly useful. Here, we report the development of a Python-based open-source toolbox, referred to as NeoAnalysis, to be used for quick electrophysiological data processing and analysis. The toolbox can import data from different data acquisition systems regardless of their formats and automatically combine different types of data into a single file with a standardized format. In cases where additional spike sorting is needed, NeoAnalysis provides a module to perform efficient offline sorting with a user-friendly interface. Then, NeoAnalysis can perform regular analog signal processing, spike train and local field potential analysis, and behavioral response (e.g. saccade) detection and extraction, with several options available for data plotting and statistics. Particularly, it can automatically generate sorted results without requiring users to manually sort data beforehand. In addition, NeoAnalysis can organize all of the relevant data into an informative table on a trial-by-trial basis for data visualization. Finally, NeoAnalysis supports analysis at the population level. With the multitude of general-purpose functions provided

  2. A practical guide to scientific data analysis

    CERN Document Server

    Livingstone, David J

    2009-01-01

    Inspired by the author's need for practical guidance in the processes of data analysis, A Practical Guide to Scientific Data Analysis has been written as a statistical companion for the working scientist.  This handbook of data analysis with worked examples focuses on the application of mathematical and statistical techniques and the interpretation of their results. Covering the most common statistical methods for examining and exploring relationships in data, the text includes extensive examples from a variety of scientific disciplines. The chapters are organised logically, from pl

  3. Data analysis for the LISA Technology Package

    Energy Technology Data Exchange (ETDEWEB)

    Hewitson, M; Danzmann, K; Diepholz, I; GarcIa, A [Albert-Einstein-Institut, Max-Planck-Institut fuer Gravitationsphysik und Universitaet Hannover, 30167 Hannover (Germany); Armano, M; Fauste, J [European Space Agency, ESAC, Villanueva de la Canada, 28692 Madrid (Spain); Benedetti, M [Dipartimento di Ingegneria dei Materiali e Tecnologie Industriali, Universita di Trento and INFN, Gruppo Collegato di Trento, Mesiano, Trento (Italy); Bogenstahl, J [Department of Physics and Astronomy, University of Glasgow, Glasgow (United Kingdom); Bortoluzzi, D; Bosetti, P; Cristofolini, I [Dipartimento di Ingegneria Meccanica e Strutturale, Universita di Trento and INFN, Gruppo Collegato di Trento, Mesiano, Trento (Italy); Brandt, N [Astrium GmbH, 88039 Friedrichshafen (Germany); Cavalleri, A; Ciani, G; Dolesi, R; Ferraioli, L [Dipartimento di Fisica, Universita di Trento and INFN, Gruppo Collegato di Trento, 38050 Povo, Trento (Italy); Cruise, M [Department of Physics and Astronomy, University of Birmingham, Birmingham (United Kingdom); Fertin, D; GarcIa, C [European Space Agency, ESTEC, 2200 AG Noordwijk (Netherlands); Fichter, W, E-mail: martin.hewitson@aei.mpg.d [Institut fuer Flugmechanik und Flugregelung, 70569 Stuttgart (Germany)

    2009-05-07

    The LISA Technology Package (LTP) on board the LISA Pathfinder mission aims to demonstrate some key concepts for LISA which cannot be tested on ground. The mission consists of a series of preplanned experimental runs. The data analysis for each experiment must be designed in advance of the mission. During the mission, the analysis must be carried out promptly so that the results can be fed forward into subsequent experiments. As such, a robust and flexible data analysis environment needs to be put in place. Since this software is used during mission operations and affects the mission timeline, it must be very robust and tested to a high degree. This paper presents the requirements, design and implementation of the data analysis environment (LTPDA) that will be used for analysing the data from LTP. The use of the analysis software to perform mock data challenges (MDC) is also discussed, and some highlights from the first MDC are presented.

  4. Data analysis for the LISA Technology Package

    International Nuclear Information System (INIS)

    Hewitson, M; Danzmann, K; Diepholz, I; GarcIa, A; Armano, M; Fauste, J; Benedetti, M; Bogenstahl, J; Bortoluzzi, D; Bosetti, P; Cristofolini, I; Brandt, N; Cavalleri, A; Ciani, G; Dolesi, R; Ferraioli, L; Cruise, M; Fertin, D; GarcIa, C; Fichter, W

    2009-01-01

    The LISA Technology Package (LTP) on board the LISA Pathfinder mission aims to demonstrate some key concepts for LISA which cannot be tested on ground. The mission consists of a series of preplanned experimental runs. The data analysis for each experiment must be designed in advance of the mission. During the mission, the analysis must be carried out promptly so that the results can be fed forward into subsequent experiments. As such, a robust and flexible data analysis environment needs to be put in place. Since this software is used during mission operations and affects the mission timeline, it must be very robust and tested to a high degree. This paper presents the requirements, design and implementation of the data analysis environment (LTPDA) that will be used for analysing the data from LTP. The use of the analysis software to perform mock data challenges (MDC) is also discussed, and some highlights from the first MDC are presented.

  5. Quantitative Data Analysis--In the Graduate Curriculum

    Science.gov (United States)

    Albers, Michael J.

    2017-01-01

    A quantitative research study collects numerical data that must be analyzed to help draw the study's conclusions. Teaching quantitative data analysis is not teaching number crunching, but teaching a way of critical thinking for how to analyze the data. The goal of data analysis is to reveal the underlying patterns, trends, and relationships of a…

  6. Collecting operational event data for statistical analysis

    International Nuclear Information System (INIS)

    Atwood, C.L.

    1994-09-01

    This report gives guidance for collecting operational data to be used for statistical analysis, especially analysis of event counts. It discusses how to define the purpose of the study, the unit (system, component, etc.) to be studied, events to be counted, and demand or exposure time. Examples are given of classification systems for events in the data sources. A checklist summarizes the essential steps in data collection for statistical analysis
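    Event counts and exposure times collected this way typically feed a Poisson rate estimate. The sketch below shows one common convention (an exact chi-square-based confidence interval); the counts and exposure are invented, and the report itself does not prescribe this particular interval.

```python
# Failure rate from an event count and exposure time, with an exact
# (chi-square based) 95% confidence interval. Numbers are illustrative only.
import scipy.stats as st

n_events = 3            # observed events
exposure = 1.2e4        # component-hours of exposure
alpha = 0.05

rate = n_events / exposure
lower = st.chi2.ppf(alpha / 2, 2 * n_events) / (2 * exposure)
upper = st.chi2.ppf(1 - alpha / 2, 2 * (n_events + 1)) / (2 * exposure)

print(f"rate = {rate:.2e}/h, 95% CI = ({lower:.2e}, {upper:.2e})")
```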

  7. EXPLORATORY DATA ANALYSIS AND MULTIVARIATE STRATEGIES FOR REVEALING MULTIVARIATE STRUCTURES IN CLIMATE DATA

    Directory of Open Access Journals (Sweden)

    2016-12-01

    Full Text Available This paper is on data analysis strategy in a complex, multidimensional, and dynamic domain. The focus is on the use of data mining techniques to explore the importance of multivariate structures, using climate variables which influence climate change. Techniques involved in a data mining exercise vary according to the data structure. The multivariate analysis strategy considered here involved choosing an appropriate tool to analyze a process. Factor analysis is introduced into the data mining technique in order to reveal the impacts of the factors involved, as well as to address the multicollinearity effect among the variables. The temporal nature and multidimensionality of the target variables are revealed in the model using multidimensional regression estimates. The strategy of integrating several statistical techniques, using climate variables in Nigeria, was employed. An R2 of 0.518 was obtained from the ordinary least squares regression analysis carried out, and the test was not significant at the 5% level of significance. However, the factor analysis regression strategy gave a good fit with an R2 of 0.811, and the test was significant at the 5% level of significance. Based on this study, model building should go beyond the usual confirmatory data analysis (CDA); rather, it should be complemented with exploratory data analysis (EDA) in order to achieve the desired result.
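    The mechanics of the factor-analysis regression strategy can be sketched in a few lines: extract factor scores from the collinear predictors, then regress the response on the scores. The snippet below uses synthetic data purely to show the workflow; it does not reproduce the R2 values reported above.

```python
# OLS on raw collinear predictors vs. regression on factor scores.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))                         # two hidden factors
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(200, 6))
y = latent @ np.array([1.5, -2.0]) + rng.normal(size=200)

ols_r2 = LinearRegression().fit(X, y).score(X, y)

scores = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)
fa_r2 = LinearRegression().fit(scores, y).score(scores, y)

print(f"R^2, OLS on raw variables:  {ols_r2:.3f}")
print(f"R^2, regression on factors: {fa_r2:.3f}")
```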

  8. Power analysis of trials with multilevel data

    CERN Document Server

    Moerbeek, Mirjam

    2015-01-01

    Power Analysis of Trials with Multilevel Data covers using power and sample size calculations to design trials that involve nested data structures. The book gives a thorough overview of power analysis that details terminology and notation, outlines key concepts of statistical power and power analysis, and explains why they are necessary in trial design. It guides you in performing power calculations with hierarchical data, which enables more effective trial design.The authors are leading experts in the field who recognize that power analysis has attracted attention from applied statisticians i

  9. Multiple-user data acquisition and analysis system

    International Nuclear Information System (INIS)

    Manzella, V.; Chrien, R.E.; Gill, R.L.; Liou, H.I.; Stelts, M.L.

    1981-01-01

    The nuclear physics program at the Brookhaven National Laboratory High Flux Beam Reactor (HFBR) employs a pair of PDP-11 computers for the dual functions of data acquisition and analysis. The data acquisition is accomplished through CAMAC and features a microprogrammed branch driver to accommodate various experimental inputs. The acquisition computer performs the functions of multi-channel analyzers, multiscaling and time-sequenced multichannel analyzers and gamma-ray coincidence analyzers. The data analysis computer is available for rapid processing of data tapes written by the acquisition computer. The ability to accommodate many users is facilitated by separating the data acquisition and analysis functions, and allowing each user to tailor the analysis to the specific requirements of his own experiment. The system is to be upgraded soon by the introduction of a dual port disk to allow a data base to be available to each computer

  10. Anticipated Changes in Conducting Scientific Data-Analysis Research in the Big-Data Era

    Science.gov (United States)

    Kuo, Kwo-Sen; Seablom, Michael; Clune, Thomas; Ramachandran, Rahul

    2014-05-01

    A Big-Data environment is one that is capable of orchestrating quick-turnaround analyses involving large volumes of data for numerous simultaneous users. Based on our experiences with a prototype Big-Data analysis environment, we anticipate some important changes in research behaviors and processes while conducting scientific data-analysis research in the near future as such Big-Data environments become the mainstream. The first anticipated change will be the reduced effort and difficulty in most parts of the data management process. A Big-Data analysis environment is likely to house most of the data required for a particular research discipline along with appropriate analysis capabilities. This will reduce the need for researchers to download local copies of data. In turn, this also reduces the need for compute and storage procurement by individual researchers or groups, as well as associated maintenance and management afterwards. It is almost certain that Big-Data environments will require a different "programming language" to fully exploit the latent potential. In addition, the process of extending the environment to provide new analysis capabilities will likely be more involved than, say, compiling a piece of new or revised code. We thus anticipate that researchers will require support from dedicated organizations associated with the environment that are composed of professional software engineers and data scientists. A major benefit will likely be that such extensions are of higher-quality and broader applicability than ad hoc changes by physical scientists. Another anticipated significant change is improved collaboration among the researchers using the same environment. Since the environment is homogeneous within itself, many barriers to collaboration are minimized or eliminated. For example, data and analysis algorithms can be seamlessly shared, reused and re-purposed. In conclusion, we will be able to achieve a new level of scientific productivity in the

  11. Anticipated Changes in Conducting Scientific Data-Analysis Research in the Big-Data Era

    Science.gov (United States)

    Kuo, Kwo-Sen; Seablom, Michael; Clune, Thomas; Ramachandran, Rahul

    2014-01-01

    A Big-Data environment is one that is capable of orchestrating quick-turnaround analyses involving large volumes of data for numerous simultaneous users. Based on our experiences with a prototype Big-Data analysis environment, we anticipate some important changes in research behaviors and processes while conducting scientific data-analysis research in the near future as such Big-Data environments become the mainstream. The first anticipated change will be the reduced effort and difficulty in most parts of the data management process. A Big-Data analysis environment is likely to house most of the data required for a particular research discipline along with appropriate analysis capabilities. This will reduce the need for researchers to download local copies of data. In turn, this also reduces the need for compute and storage procurement by individual researchers or groups, as well as associated maintenance and management afterwards. It is almost certain that Big-Data environments will require a different "programming language" to fully exploit the latent potential. In addition, the process of extending the environment to provide new analysis capabilities will likely be more involved than, say, compiling a piece of new or revised code. We thus anticipate that researchers will require support from dedicated organizations associated with the environment that are composed of professional software engineers and data scientists. A major benefit will likely be that such extensions are of higher quality and broader applicability than ad hoc changes by physical scientists. Another anticipated significant change is improved collaboration among the researchers using the same environment. Since the environment is homogeneous within itself, many barriers to collaboration are minimized or eliminated. For example, data and analysis algorithms can be seamlessly shared, reused and re-purposed. In conclusion, we will be able to achieve a new level of scientific productivity in the Big-Data

  12. Data analysis with Mplus

    CERN Document Server

    Geiser, Christian

    2012-01-01

    A practical introduction to using Mplus for the analysis of multivariate data, this volume provides step-by-step guidance, complete with real data examples, numerous screen shots, and output excerpts. The author shows how to prepare a data set for import in Mplus using SPSS. He explains how to specify different types of models in Mplus syntax and address typical caveats--for example, assessing measurement invariance in longitudinal SEMs. Coverage includes path and factor analytic models as well as mediational, longitudinal, multilevel, and latent class models. Specific programming tips an

  13. Data Analysis Strategies in Medical Imaging.

    Science.gov (United States)

    Parmar, Chintan; Barry, Joseph D; Hosny, Ahmed; Quackenbush, John; Aerts, Hugo Jwl

    2018-03-26

    Radiographic imaging continues to be one of the most effective and clinically useful tools within oncology. Sophistication of artificial intelligence (AI) has allowed for detailed quantification of radiographic characteristics of tissues using predefined engineered algorithms or deep learning methods. Precedents in radiology as well as a wealth of research studies hint at the clinical relevance of these characteristics. However, there are critical challenges associated with the analysis of medical imaging data. While some of these challenges are specific to the imaging field, many others like reproducibility and batch effects are generic and have already been addressed in other quantitative fields such as genomics. Here, we identify these pitfalls and provide recommendations for analysis strategies of medical imaging data including data normalization, development of robust models, and rigorous statistical analyses. Adhering to these recommendations will not only improve analysis quality, but will also enhance precision medicine by allowing better integration of imaging data with other biomedical data sources. Copyright ©2018, American Association for Cancer Research.

  14. Intelligent data-acquisition instrumentation for special nuclear material assay data analysis

    International Nuclear Information System (INIS)

    Ethridge, C.D.

    1980-01-01

    The Detection, Surveillance, Verification, and Recovery Group of the Los Alamos Scientific Laboratory Energy Division/Nuclear Safeguards Programs is now utilizing intelligent data-acquisition instrumentation for assay data analysis of special nuclear material. The data acquisition and analysis are enabled by the incorporation of a number-crunching microprocessor sequenced by a single-component microcomputer. Microcomputer firmware provides the capability to compute several selected functions as well as to perform instrumentation self-diagnostics

  15. Substituting missing data in compositional analysis

    International Nuclear Information System (INIS)

    Real, Carlos; Angel Fernandez, J.; Aboal, Jesus R.; Carballeira, Alejo

    2011-01-01

    Multivariate analysis of environmental data sets requires the absence of missing values or their substitution by small values. However, if the data is transformed logarithmically prior to the analysis, this solution cannot be applied because the logarithm of a small value might become an outlier. Several methods for substituting the missing values can be found in the literature although none of them guarantees that no distortion of the structure of the data set is produced. We propose a method for the assessment of these distortions which can be used for deciding whether to retain or not the samples or variables containing missing values and for the investigation of the performance of different substitution techniques. The method analyzes the structure of the distances among samples using Mantel tests. We present an application of the method to PCDD/F data measured in samples of terrestrial moss as part of a biomonitoring study. - Highlights: → Missing values in multivariate data sets must be substituted prior to analysis. → The substituted values can modify the structure of the data set. → We developed a method to estimate the magnitude of the alterations. → The method is simple and based on the Mantel test. → The method allowed the identification of problematic variables in a sample data set. - A method is presented for the assessment of the possible distortions in multivariate analysis caused by the substitution of missing values.

  16. Advances in Risk Analysis with Big Data.

    Science.gov (United States)

    Choi, Tsan-Ming; Lambert, James H

    2017-08-01

    With cloud computing, Internet-of-things, wireless sensors, social media, fast storage and retrieval, etc., organizations and enterprises have access to unprecedented amounts and varieties of data. Current risk analysis methodology and applications are experiencing related advances and breakthroughs. For example, highway operations data are readily available, and making use of them reduces risks of traffic crashes and travel delays. Massive data of financial and enterprise systems support decision making under risk by individuals, industries, regulators, etc. In this introductory article, we first discuss the meaning of big data for risk analysis. We then examine recent advances in risk analysis with big data in several topic areas. For each area, we identify and introduce the relevant articles that are featured in the special issue. We conclude with a discussion on future research opportunities. © 2017 Society for Risk Analysis.

  17. Computer system for environmental sample analysis and data storage and analysis

    International Nuclear Information System (INIS)

    Brauer, F.P.; Fager, J.E.

    1976-01-01

    A minicomputer-based environmental sample analysis and data storage system has been developed. The system is used for analytical data acquisition, computation, storage of analytical results, and tabulation of selected or derived results for data analysis, interpretation and reporting. This paper discusses the structure, performance and applications of the system

  18. QUAGOL: a guide for qualitative data analysis.

    Science.gov (United States)

    Dierckx de Casterlé, Bernadette; Gastmans, Chris; Bryon, Els; Denier, Yvonne

    2012-03-01

    Data analysis is a complex and contested part of the qualitative research process, which has received limited theoretical attention. Researchers are often in need of useful instructions or guidelines on how to analyze the mass of qualitative data, but face the lack of clear guidance for using particular analytic methods. The aim of this paper is to propose and discuss the Qualitative Analysis Guide of Leuven (QUAGOL), a guide that was developed in order to be able to truly capture the rich insights of qualitative interview data. The article describes six major problems researchers are often struggling with during the process of qualitative data analysis. Consequently, the QUAGOL is proposed as a guide to facilitate the process of analysis. Challenges emerged and lessons learned from our own extensive experiences with qualitative data analysis within the Grounded Theory Approach, as well as from those of other researchers (as described in the literature), were discussed and recommendations were presented. Strengths and pitfalls of the proposed method were discussed in detail. The Qualitative Analysis Guide of Leuven (QUAGOL) offers a comprehensive method to guide the process of qualitative data analysis. The process consists of two parts, each consisting of five stages. The method is systematic but not rigid. It is characterized by iterative processes of digging deeper, constantly moving between the various stages of the process. As such, it aims to stimulate the researcher's intuition and creativity as optimally as possible. The QUAGOL guide is a theory and practice-based guide that supports and facilitates the process of analysis of qualitative interview data. Although the method can facilitate the process of analysis, it cannot guarantee automatic quality. The skills of the researcher and the quality of the research team remain the most crucial components of a successful process of analysis. Additionally, the importance of constantly moving between the various stages

  19. Geographical data structures supporting regional analysis

    International Nuclear Information System (INIS)

    Edwards, R.G.; Durfee, R.C.

    1978-01-01

    In recent years the computer has become a valuable aid in solving regional environmental problems. Over a hundred different geographic information systems have been developed to digitize, store, analyze, and display spatially distributed data. One important aspect of these systems is the data structure (e.g. grids, polygons, segments) used to model the environment being studied. This paper presents eight common geographic data structures and their use in studies of coal resources, power plant siting, population distributions, LANDSAT imagery analysis, and landuse analysis

  20. msBiodat analysis tool, big data analysis for high-throughput experiments.

    Science.gov (United States)

    Muñoz-Torres, Pau M; Rokć, Filip; Belužic, Robert; Grbeša, Ivana; Vugrek, Oliver

    2016-01-01

    Mass spectrometry (MS) covers a group of high-throughput techniques used to increase knowledge about biomolecules. These techniques produce a large amount of data, presented as lists of hundreds or thousands of proteins. Filtering those data efficiently is the first step for extracting biologically relevant information. The filtering may be enriched by merging previous data with data obtained from public databases, resulting in an accurate list of proteins which meet the predetermined conditions. In this article we present msBiodat Analysis Tool, a web-based application designed to bring proteomics closer to big data analysis. With this tool, researchers can easily select the most relevant information from their MS experiments using an easy-to-use web interface. An interesting feature of the msBiodat analysis tool is the possibility of selecting proteins by their Gene Ontology annotation using Gene Id, Ensembl or UniProt codes. The msBiodat analysis tool is a web-based application that allows researchers, regardless of programming experience, to benefit from efficient database querying. Its versatility and user-friendly interface make it easy to perform fast and accurate data screening using complex queries. Once the analysis is finished, the result is delivered by e-mail. msBiodat analysis tool is freely available at http://msbiodata.irb.hr.
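    The kind of query msBiodat runs can be imitated with a join between the MS hit list and an annotation table. The sketch below is a plain pandas rendition with made-up accessions, scores and GO term; it is not the msBiodat code or database schema.

```python
# Filter an MS protein list by Gene Ontology annotation and score.
import pandas as pd

hits = pd.DataFrame({
    "uniprot": ["P12345", "Q67890", "P11111", "Q22222"],
    "score":   [88.1, 42.7, 95.3, 17.9],
})
annotations = pd.DataFrame({
    "uniprot": ["P12345", "P11111", "Q22222"],
    "go_term": ["GO:0006915", "GO:0008152", "GO:0006915"],
})

selected = (hits.merge(annotations, on="uniprot")
                .query("go_term == 'GO:0006915' and score > 30")
                .sort_values("score", ascending=False))
print(selected)
```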

  1. The Data Party: Involving Stakeholders in Meaningful Data Analysis

    Science.gov (United States)

    Franz, Nancy K.

    2013-01-01

    A hallmark of Extension includes the involvement of stakeholders in research and program needs assessment, design, implementation, evaluation, and reporting. A data party can be used to enhance this stakeholder involvement specifically in data analysis. This type of event can not only increase client participation in Extension programming and…

  2. Qualitative Data Analysis Strategies

    OpenAIRE

    Greaves, Kristoffer

    2014-01-01

    A set of concept maps for qualitative data analysis strategies, inspired by Corbin, JM & Strauss, AL 2008, Basics of qualitative research: Techniques and procedures for developing grounded theory, 3rd edn, Sage Publications, Inc, Thousand Oaks, California.

  3. A method for data base management and analysis for wind tunnel data

    Science.gov (United States)

    Biser, Aileen O.

    1987-01-01

    To respond to the need for improved data base management and analysis capabilities for wind-tunnel data at the Langley 16-Foot Transonic Tunnel, research was conducted into current methods of managing wind-tunnel data and a method was developed as a solution to this need. This paper describes the development of the data base management and analysis method for wind-tunnel data. The design and implementation of the software system are discussed and examples of its use are shown.

  4. Methods for Mediation Analysis with Missing Data

    Science.gov (United States)

    Zhang, Zhiyong; Wang, Lijuan

    2013-01-01

    Despite wide applications of both mediation models and missing data techniques, formal discussion of mediation analysis with missing data is still rare. We introduce and compare four approaches to dealing with missing data in mediation analysis including listwise deletion, pairwise deletion, multiple imputation (MI), and a two-stage maximum…
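    Of the four approaches, listwise deletion is the simplest to show: drop incomplete cases, then estimate the indirect effect a*b from the two mediation regressions. The sketch below simulates data with a known indirect effect of 0.20; it illustrates the estimator, not the paper's comparison results.

```python
# Mediation analysis (indirect effect a*b) under listwise deletion.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)              # a-path: X -> M
y = 0.4 * m + 0.2 * x + rng.normal(size=n)    # b-path: M -> Y, plus direct effect
m[rng.random(n) < 0.1] = np.nan               # inject missingness in M

keep = ~np.isnan(m)                           # listwise deletion
xc, mc, yc = x[keep], m[keep], y[keep]

def ols(design, target):
    """Least-squares coefficients with an intercept column prepended."""
    design = np.column_stack([np.ones(len(target)), design])
    return np.linalg.lstsq(design, target, rcond=None)[0]

a = ols(xc, mc)[1]                            # slope of M on X
b = ols(np.column_stack([mc, xc]), yc)[1]     # slope of Y on M, controlling for X
print(f"indirect effect a*b = {a * b:.3f} (true value 0.20)")
```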

  5. Graph-Based Analysis of Nuclear Smuggling Data

    International Nuclear Information System (INIS)

    Cook, Diane; Holder, Larry; Thompson, Sandra E.; Whitney, Paul D.; Chilton, Lawrence

    2009-01-01

    Much of the data that is collected and analyzed today is structural, consisting not only of entities but also of relationships between the entities. As a result, analysis applications rely upon automated structural data mining approaches to find patterns and concepts of interest. This ability to analyze structural data has become a particular challenge in many security-related domains. In these domains, focusing on the relationships between entities in the data is critical to detect important underlying patterns. In this study we apply structural data mining techniques to automate analysis of nuclear smuggling data. In particular, we choose to model the data as a graph and use graph-based relational learning to identify patterns and concepts of interest in the data. In this paper, we identify the analysis questions that are of importance to security analysts and describe the knowledge representation and data mining approach that we adopt for this challenge. We analyze the results using the Russian nuclear smuggling event database.
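    The modelling step, entities as nodes and typed relationships as edges, can be shown with a toy graph. The actual study applies graph-based relational learning (e.g., substructure discovery) to the Russian database, which the short query below does not attempt; all incidents here are fictional.

```python
# Entities and typed relationships as a graph; walk it for a simple pattern.
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("person_A", "org_X", relation="member_of")
g.add_edge("person_A", "material_1", relation="possessed")
g.add_edge("person_B", "material_1", relation="transported")
g.add_edge("material_1", "city_Y", relation="seized_in")

# Pattern: persons linked to any material that was seized somewhere.
for material, place, d in g.edges(data=True):
    if d["relation"] == "seized_in":
        linked = [u for u, _, dd in g.in_edges(material, data=True)
                  if dd["relation"] in ("possessed", "transported")]
        print(f"{material} seized in {place}; linked persons: {linked}")
```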

  6. A Statistical Toolkit for Data Analysis

    International Nuclear Information System (INIS)

    Donadio, S.; Guatelli, S.; Mascialino, B.; Pfeiffer, A.; Pia, M.G.; Ribon, A.; Viarengo, P.

    2006-01-01

    The present project aims to develop an open-source and object-oriented software Toolkit for statistical data analysis. Its statistical testing component contains a variety of Goodness-of-Fit tests, from Chi-squared and Kolmogorov-Smirnov to lesser-known but generally much more powerful tests such as Anderson-Darling, Goodman, Fisz-Cramer-von Mises, Kuiper and Tiku. Thanks to the component-based design and the usage of the standard abstract interfaces for data analysis, this tool can be used by other data analysis systems or integrated in experimental software frameworks. This Toolkit has been released and is downloadable from the web. In this paper we describe the statistical details of the algorithms, the computational features of the Toolkit and describe the code validation
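    The Toolkit itself is a compiled library, but the same families of tests are easy to exercise from scipy to get a feel for what they report. The sketch below compares two simulated samples; it assumes scipy >= 1.7 for the two-sample Cramer-von Mises test and is not the Toolkit's API.

```python
# Two-sample goodness-of-fit style tests on simulated data with scipy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 300)
b = rng.normal(0.1, 1.0, 300)                 # slightly shifted sample

ks = stats.ks_2samp(a, b)                     # Kolmogorov-Smirnov
ad = stats.anderson_ksamp([a, b])             # k-sample Anderson-Darling
cvm = stats.cramervonmises_2samp(a, b)        # Cramer-von Mises

print(f"KS : D = {ks.statistic:.3f}, p = {ks.pvalue:.3f}")
print(f"AD : A = {ad.statistic:.3f}, p ~ {ad.significance_level:.3f} (capped)")
print(f"CvM: T = {cvm.statistic:.3f}, p = {cvm.pvalue:.3f}")
```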

  7. Thematic mapper data analysis

    Science.gov (United States)

    Settle, M.; Chavez, P.; Kieffer, H. H.; Everett, J. R.; Kahle, A. B.; Kitcho, C. A.; Milton, N. M.; Mouat, D. A.

    1983-01-01

    The geological applications of remote sensing technology are discussed, with emphasis given to the analysis of data from the Thematic Mapper (TM) instrument onboard the Landsat 4 satellite. The flight history and design characteristics of the Landsat 4/TM are reviewed, and some difficulties encountered in the interpretation of raw TM data are discussed, including: the volume of data; residual noise; detector-to-detector striping; and spatial misregistration between measurements. Preliminary results of several geological, lithological, and geobotanical mapping experiments are presented as examples of the geological applications of the TM, and some areas for improving the quality of TM imagery are identified.

  8. XML-based analysis interface for particle physics data analysis

    International Nuclear Information System (INIS)

    Hu Jifeng; Lu Xiaorui; Zhang Yangheng

    2011-01-01

    The letter focuses on an XML-based interface and its framework for particle physics data analysis. The interface uses a concise XML syntax to describe the basic tasks of a data analysis: event selection, kinematic fitting, particle identification, etc., and a basic processing logic: the next step goes on if and only if the current step succeeds. The framework can perform an analysis without compiling, by loading the XML interface file, setting parameters at run time and running dynamically. An analysis coded in XML instead of C++ is easy to understand and use, effectively reduces the workload, and enables users to carry out their analyses quickly. The framework has been developed on the BESⅢ offline software system (BOSS) with object-oriented C++ programming. The functions required by the regular tasks and the basic processing logic are implemented either as standard modules or inherited from the modules in BOSS. The interface and its framework have been tested to perform physics analysis. (authors)
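    The "next step goes on if and only if the current step succeeds" logic is easy to picture with a small driver. The XML tags, step names and Python stand-in modules below are invented for illustration, not the BOSS schema or implementation.

```python
# A minimal XML-driven analysis chain: run steps in order, stop on failure.
import xml.etree.ElementTree as ET

CONFIG = """
<analysis>
  <step name="event-selection" module="select_events"/>
  <step name="kinematic-fit"   module="fit_kinematics"/>
  <step name="particle-id"     module="identify_particles"/>
</analysis>
"""

def select_events(evt):      return evt["n_tracks"] >= 2
def fit_kinematics(evt):     return evt["chi2"] < 10.0
def identify_particles(evt): return evt["pid_ok"]

MODULES = {f.__name__: f for f in (select_events, fit_kinematics, identify_particles)}

def run(event):
    for step in ET.fromstring(CONFIG).findall("step"):
        if not MODULES[step.get("module")](event):
            return f"stopped at {step.get('name')}"   # chain halts here
    return "accepted"

print(run({"n_tracks": 3, "chi2": 4.2, "pid_ok": True}))  # accepted
print(run({"n_tracks": 1, "chi2": 4.2, "pid_ok": True}))  # stopped at event-selection
```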

  9. Control, data acquisition, data analysis and remote participation in LHD

    International Nuclear Information System (INIS)

    Nagayama, Y.; Emoto, M.; Nakanishi, H.; Sudo, S.; Imazu, S.; Inagaki, S.; Iwata, C.; Kojima, M.; Nonomura, M.; Ohsuna, M.; Tsuda, K.; Yoshida, M.; Chikaraishi, H.; Funaba, H.; Horiuchi, R.; Ishiguro, S.; Ito, Y.; Kubo, S.; Mase, A.; Mito, T.

    2008-01-01

    This paper presents the control, data acquisition, data analysis and remote participation facilities of the Large Helical Device (LHD), which is designed to confine the plasma in steady state. In LHD the plasma duration exceeds 3000 s by controlling the plasma position, the density and the ICRF heating. The 'LABCOM' data acquisition system takes both the short-pulse and the steady-state data. A two-layer Mass Storage System with RAIDs and Blu-ray Disk jukeboxes in a storage area network has been developed to increase capacity of storage. The steady-state data can be monitored with a Web browser in real time. A high-level data analysis system with Web interfaces is being developed in order to provide easier usage of LHD data and large FORTRAN codes in a supercomputer. A virtual laboratory system for the Japanese fusion community has been developed with Multi-protocol Label Switching Virtual Private Network Technology. Collaborators at remote sites can join the LHD experiment or use the NIFS supercomputer system as if they were working in the LHD control room

  10. Structural Dynamics and Data Analysis

    Science.gov (United States)

    Luthman, Briana L.

    2013-01-01

    This project consists of two parts, the first will be the post-flight analysis of data from a Delta IV launch vehicle, and the second will be a Finite Element Analysis of a CubeSat. Shock and vibration data was collected on WGS-5 (Wideband Global SATCOM- 5) which was launched on a Delta IV launch vehicle. Using CAM (CAlculation with Matrices) software, the data is to be plotted into Time History, Shock Response Spectrum, and SPL (Sound Pressure Level) curves. In this format the data is to be reviewed and compared to flight instrumentation data from previous flights of the same launch vehicle. This is done to ensure the current mission environments, such as shock, random vibration, and acoustics, are not out of family with existing flight experience. In family means the peaks on the SRS curve for WGS-5 are similar to the peaks from the previous flights and there are no major outliers. The curves from the data will then be compiled into a useful format so that is can be peer reviewed then presented before an engineering review board if required. Also, the reviewed data will be uploaded to the Engineering Review Board Information System (ERBIS) to archive. The second part of this project is conducting Finite Element Analysis of a CubeSat. In 2010, Merritt Island High School partnered with NASA to design, build and launch a CubeSat. The team is now called StangSat in honor of their mascot, the mustang. Over the past few years, the StangSat team has built a satellite and has now been manifested for flight on a SpaceX Falcon 9 launch in 2014. To prepare for the final launch, a test flight was conducted in Mojave, California. StangSat was launched on a Prospector 18D, a high altitude rocket made by Garvey Spacecraft Corporation, along with their sister satellite CP9 built by California Polytechnic University. However, StangSat was damaged during an off nominal landing and this project will give beneficial insights into what loads the CubeSat experienced during the crash

  11. Gravity Probe B data analysis: II. Science data and their handling prior to the final analysis

    International Nuclear Information System (INIS)

    Silbergleit, A S; Conklin, J W; Heifetz, M I; Holmes, T; Li, J; Mandel, I; Solomonik, V G; Stahl, K; P W Worden Jr; Everitt, C W F; Adams, M; Berberian, J E; Bencze, W; Clarke, B; Al-Jadaan, A; Keiser, G M; Kozaczuk, J A; Al-Meshari, M; Muhlfelder, B; Salomon, M

    2015-01-01

    The results of the Gravity Probe B relativity science mission published in Everitt et al (2011 Phys. Rev. Lett. 106 221101) required a rather sophisticated analysis of experimental data due to several unexpected complications discovered on-orbit. We give a detailed description of the Gravity Probe B data reduction. In the first paper (Silbergleit et al Class. Quantum Grav. 22 224018) we derived the measurement models, i.e., mathematical expressions for all the signals to analyze. In the third paper (Conklin et al Class. Quantum Grav. 22 224020) we explain the estimation algorithms and their program implementation, and discuss the experiment results obtained through data reduction. This paper deals with the science data preparation for the main analysis yielding the relativistic drift estimates. (paper)

  12. Statistical data analysis using SAS intermediate statistical methods

    CERN Document Server

    Marasinghe, Mervyn G

    2018-01-01

    The aim of this textbook (previously titled SAS for Data Analytics) is to teach the use of SAS for statistical analysis of data for advanced undergraduate and graduate students in statistics, data science, and disciplines involving analyzing data. The book begins with an introduction beyond the basics of SAS, illustrated with non-trivial, real-world, worked examples. It proceeds to SAS programming and applications, SAS graphics, statistical analysis of regression models, analysis of variance models, analysis of variance with random and mixed effects models, and then takes the discussion beyond regression and analysis of variance to conclude. Pedagogically, the authors introduce theory and methodological basis topic by topic, present a problem as an application, followed by a SAS analysis of the data provided and a discussion of results. The text focuses on applied statistical problems and methods. Key features include: end of chapter exercises, downloadable SAS code and data sets, and advanced material suitab...

  13. Enhancing yeast transcription analysis through integration of heterogeneous data

    DEFF Research Database (Denmark)

    Grotkjær, Thomas; Nielsen, Jens

    2004-01-01

    of Saccharomyces cerevisiae whole genome transcription data. A special focus is on the quantitative aspects of normalisation and mathematical modelling approaches, since they are expected to play an increasing role in future DNA microarray analysis studies. Data analysis is exemplified with cluster analysis......DNA microarray technology enables the simultaneous measurement of the transcript level of thousands of genes. Primary analysis can be done with basic statistical tools and cluster analysis, but effective and in depth analysis of the vast amount of transcription data requires integration with data...... from several heterogeneous data Sources, such as upstream promoter sequences, genome-scale metabolic models, annotation databases and other experimental data. In this review, we discuss how experimental design, normalisation, heterogeneous data and mathematical modelling can enhance analysis...
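    As a small illustration of the cluster-analysis step discussed here, the sketch below standardizes each gene's profile and cuts a hierarchical tree into co-expression clusters. The expression matrix is simulated, not S. cerevisiae data.

```python
# Hierarchical clustering of standardized expression profiles.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
patterns = ([0] * 8, list(range(8)), list(range(8, 0, -1)))   # flat, up, down
profiles = np.vstack([rng.normal(mu, 1.0, size=(50, 8)) for mu in patterns])

# Standardize each gene (row) so clustering follows profile shape, not level.
z = (profiles - profiles.mean(1, keepdims=True)) / profiles.std(1, keepdims=True)
tree = linkage(z, method="ward")
labels = fcluster(tree, t=3, criterion="maxclust")

print("cluster sizes:", np.bincount(labels)[1:])   # roughly 50 / 50 / 50
```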

  14. Statistical analysis of environmental data

    International Nuclear Information System (INIS)

    Beauchamp, J.J.; Bowman, K.O.; Miller, F.L. Jr.

    1975-10-01

    This report summarizes the analyses of data obtained by the Radiological Hygiene Branch of the Tennessee Valley Authority from samples taken around the Browns Ferry Nuclear Plant located in Northern Alabama. The data collection was begun in 1968 and a wide variety of types of samples have been gathered on a regular basis. The statistical analysis of environmental data involving very low levels of radioactivity is discussed. Applications of computer calculations for data processing are described

  15. Statistical data analysis handbook

    National Research Council Canada - National Science Library

    Wall, Francis J

    1986-01-01

    It must be emphasized that this is not a text book on statistics. Instead it is a working tool that presents data analysis in clear, concise terms which can be readily understood even by those without formal training in statistics...

  16. FERRET data analysis code

    International Nuclear Information System (INIS)

    Schmittroth, F.

    1979-09-01

    A documentation of the FERRET data analysis code is given. The code provides a way to combine related measurements and calculations in a consistent evaluation. Basically a very general least-squares code, it is oriented towards problems frequently encountered in nuclear data and reactor physics. A strong emphasis is on the proper treatment of uncertainties and correlations and in providing quantitative uncertainty estimates. Documentation includes a review of the method, structure of the code, input formats, and examples
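    At its heart, a least-squares combination with full covariances is a short linear-algebra exercise. The numbers below are invented; the sketch shows a generic generalized least-squares estimate and its propagated covariance, not FERRET's input formats or internals.

```python
# Generalized least squares with a full measurement covariance matrix.
import numpy as np

G = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])                # sensitivities: 3 measurements, 2 parameters
y = np.array([1.02, 1.97, 3.10])          # measured values
C = np.array([[0.010, 0.002, 0.000],
              [0.002, 0.020, 0.004],
              [0.000, 0.004, 0.015]])     # measurement covariance (correlated)

Ci = np.linalg.inv(C)
cov_est = np.linalg.inv(G.T @ Ci @ G)     # propagated parameter covariance
params = cov_est @ G.T @ Ci @ y           # GLS estimate

print("estimates:    ", np.round(params, 3))
print("uncertainties:", np.round(np.sqrt(np.diag(cov_est)), 3))
```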

  17. Teaching Data Analysis with Interactive Visual Narratives

    Science.gov (United States)

    Saundage, Dilal; Cybulski, Jacob L.; Keller, Susan; Dharmasena, Lasitha

    2016-01-01

    Data analysis is a major part of business analytics (BA), which refers to the skills, methods, and technologies that enable managers to make swift, quality decisions based on large amounts of data. BA has become a major component of Information Systems (IS) courses all over the world. The challenge for IS educators is to teach data analysis--the…

  18. Data archiving and analysis for CWDD

    International Nuclear Information System (INIS)

    Coleman, T.A.; Novick, A.H.; Meystrik, C.C.; Marselle, J.R.

    1992-01-01

    A computer system has been developed to handle archiving and analysis of data acquired during operations of the Continuous Wave Deuterium Demonstrator (CWDD). Data files generated by the CWDD Instrumentation and Control system are transferred across a local area network to the CWDD Archive system where they are enlisted into the archive and stored on removeable media optical disk drives. A relational database management system maintains an on-line database catalog of all archived files. This database contains information about file contents and formats, and holds signal parameter configuration tables needed to extract and interpret data from the files. Software has been developed to assist the selection and retrieval of data on demand based upon references in the catalog. Data retrieved from the archive is transferred to commercial data visualization applications for viewing, plotting and analysis
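    The on-line catalog is, in essence, a table keyed by file name that records where each archived file lives and enough metadata to retrieve and interpret it. The sketch below mimics that with SQLite; the schema and values are hypothetical, not the CWDD production design.

```python
# A toy archive catalog: which removable volume holds which data file?
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE catalog (
    file_name TEXT PRIMARY KEY,
    run_id    INTEGER,
    volume    TEXT,     -- removable optical disk holding the file
    format    TEXT,
    acquired  TEXT
)""")
db.executemany("INSERT INTO catalog VALUES (?,?,?,?,?)", [
    ("run0042_beam.dat", 42, "OD-007", "binary/v2", "1992-03-14"),
    ("run0043_beam.dat", 43, "OD-007", "binary/v2", "1992-03-15"),
    ("run0044_rf.dat",   44, "OD-008", "binary/v2", "1992-03-15"),
])

# Retrieval on demand: which volume must be mounted for run 44?
for volume, name in db.execute(
        "SELECT volume, file_name FROM catalog WHERE run_id = 44"):
    print(volume, name)        # OD-008 run0044_rf.dat
```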

  19. Beyond Constant Comparison Qualitative Data Analysis: Using NVivo

    Science.gov (United States)

    Leech, Nancy L.; Onwuegbuzie, Anthony J.

    2011-01-01

    The purposes of this paper are to outline seven types of qualitative data analysis techniques, to present step-by-step guidance for conducting these analyses via a computer-assisted qualitative data analysis software program (i.e., NVivo9), and to present screenshots of the data analysis process. Specifically, the following seven analyses are…

  20. Spectral map-analysis: a method to analyze gene expression data

    OpenAIRE

    Bijnens, Luc J.M.; Lewi, Paul J.; Göhlmann, Hinrich W.; Molenberghs, Geert; Wouters, Luc

    2004-01-01

    bioinformatics; biplot; correspondence factor analysis; data mining; data visualization; gene expression data; microarray data; multivariate exploratory data analysis; principal component analysis; Spectral map analysis

  1. A Hierarchical Visualization Analysis Model of Power Big Data

    Science.gov (United States)

    Li, Yongjie; Wang, Zheng; Hao, Yang

    2018-01-01

    Based on the concept of integrating VR scenes with power big data analysis, a hierarchical visualization analysis model of power big data is proposed, in which levels are designed targeting different abstraction modules such as transaction, engine, computation, control and storage. The normally separate modules for power data storage, data mining and analysis, and data visualization are integrated into one platform by this model. It provides a visual analysis solution for power big data.

  2. Pengembangan Aplikasi Antarmuka Layanan Big Data Analysis

    Directory of Open Access Journals (Sweden)

    Gede Karya

    2017-11-01

    Full Text Available In the 2016 Higher Competitive Grants Research (Hibah Bersaing Dikti), we successfully developed the models, infrastructure and application modules of a Hadoop-based big data analysis application. We also developed a virtual private network (VPN) that allows integration with, and access to, the infrastructure from outside the FTIS Computer Laboratory. The infrastructure and analysis application modules are now to be offered as services to small and medium enterprises (SMEs) in Indonesia. This research aims to develop an application interface for big data analysis services integrated with a Hadoop cluster. The research begins with finding appropriate methods and techniques for scheduling jobs, calling ready-made Java Map-Reduce (MR) application modules, and tunneling input/output and constructing the meta-data of service requests (input) and service outputs. These methods and techniques are then developed into a web-based service application, as well as an executable module that runs in a Java and J2EE based programming environment and can access the Hadoop cluster in the FTIS Computer Lab. The resulting application can be accessed by the public through the site http://bigdata.unpar.ac.id. Based on the test results, the application functions well in accordance with the specifications and can be used to perform big data analysis. Keywords: web based service, big data analysis, Hadoop, J2EE

  3. A Web Services Data Analysis Grid

    Energy Technology Data Exchange (ETDEWEB)

    William A Watson III; Ian Bird; Jie Chen; Bryan Hess; Andy Kowalski; Ying Chen

    2002-07-01

    The trend in large-scale scientific data analysis is to exploit compute, storage and other resources located at multiple sites, and to make those resources accessible to the scientist as if they were a single, coherent system. Web technologies driven by the huge and rapidly growing electronic commerce industry provide valuable components to speed the deployment of such sophisticated systems. Jefferson Lab, where several hundred terabytes of experimental data are acquired each year, is in the process of developing a web-based distributed system for data analysis and management. The essential aspects of this system are a distributed data grid (site independent access to experiment, simulation and model data) and a distributed batch system, augmented with various supervisory and management capabilities, and integrated using Java and XML-based web services.

  4. A Web Services Data Analysis Grid

    International Nuclear Information System (INIS)

    William A Watson III; Ian Bird; Jie Chen; Bryan Hess; Andy Kowalski; Ying Chen

    2002-01-01

    The trend in large-scale scientific data analysis is to exploit compute, storage and other resources located at multiple sites, and to make those resources accessible to the scientist as if they were a single, coherent system. Web technologies driven by the huge and rapidly growing electronic commerce industry provide valuable components to speed the deployment of such sophisticated systems. Jefferson Lab, where several hundred terabytes of experimental data are acquired each year, is in the process of developing a web-based distributed system for data analysis and management. The essential aspects of this system are a distributed data grid (site independent access to experiment, simulation and model data) and a distributed batch system, augmented with various supervisory and management capabilities, and integrated using Java and XML-based web services

  5. Geotechnical field data and analysis report

    International Nuclear Information System (INIS)

    1990-09-01

    The geotechnical Field Data and Analysis Report documents the geomechanical data collected at the Waste Isolation Pilot Plant up to June 30, 1989 and describes the conditions of underground openings from July 1, 1988 to June 30, 1989. The data is required to understand performance during operations and does not include data from tests performed to support performance assessment. In summary, the underground openings have performed in a satisfactory manner during the reporting period. This analysis is based primarily on the evaluation of instrumentation data, in particular the comparison of measured convergence with predictions, and the observations of exposed rock surfaces. The main concerns during this period have been the deterioration found in Site Preliminary Design Validation Test Rooms 1 and 2 and some spalling found in Panel 1. 14 refs., 45 figs., 11 tabs

  6. Analysis of longitudinal data from animals where some data are missing in SPSS

    Science.gov (United States)

    Duricki, DA; Soleman, S; Moon, LDF

    2017-01-01

    Testing of therapies for disease or injury often involves analysis of longitudinal data from animals. Modern analytical methods have advantages over conventional methods (particularly where some data are missing), yet are not used widely by pre-clinical researchers. We provide here an easy-to-use protocol for analysing longitudinal data from animals and present a click-by-click guide for performing suitable analyses using the statistical package SPSS. We guide readers through analysis of a real-life data set obtained when testing a therapy for brain injury (stroke) in elderly rats. We show that repeated-measures analysis of covariance failed to detect a treatment effect when a few data points were missing (due to animal drop-out), whereas analysis using an alternative method detected a beneficial effect of treatment; specifically, we demonstrate the superiority of linear models (with various covariance structures) analysed using Restricted Maximum Likelihood estimation (to include all available data). This protocol takes two hours to follow. PMID:27196723
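
    For readers working outside SPSS, the same kind of model can be sketched in Python with statsmodels, whose MixedLM class fits linear mixed models by REML and uses every available observation rather than discarding whole animals; the data frame, column names and effect sizes below are invented for illustration:

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Long-format data: one row per animal per week; a few scores are
        # set to NaN to mimic drop-out. All names and values are invented.
        rng = np.random.default_rng(0)
        df = pd.DataFrame({
            "animal": np.repeat(np.arange(20), 4),
            "week":   np.tile([1, 2, 3, 4], 20),
            "group":  np.repeat(["treated", "control"], 40),
        })
        df["score"] = (10 - 2 * (df["group"] == "treated") * df["week"]
                       + rng.normal(0, 1, len(df)))
        df.loc[rng.choice(len(df), 5, replace=False), "score"] = np.nan

        # Random-intercept linear mixed model fitted by REML (the default),
        # using all available observations despite the missing data points.
        result = smf.mixedlm("score ~ group * week", data=df.dropna(),
                             groups="animal").fit(reml=True)
        print(result.summary())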

  7. SIMONE: Tool for Data Analysis and Simulation

    International Nuclear Information System (INIS)

    Chudoba, V.; Hnatio, B.; Sharov, P.; Papka, Paul

    2013-06-01

    SIMONE is a software tool based on the ROOT Data Analysis Framework, developed in a collaboration between FLNR JINR and iThemba LABS. It is intended for physicists planning experiments and analysing experimental data. The goal of the SIMONE framework is to provide a flexible, user-friendly, efficient and well documented system for the simulation of a wide range of Nuclear Physics experiments. The most significant conditions and physical processes can be taken into account during simulation of the experiment. The user can create his own experimental setup through access to predefined detector geometries. Simulated data are made available in the same format as for the real experiment, allowing identical analysis of both experimental and simulated data. A significant time reduction is expected during experiment planning and data analysis. (authors)

  8. KDD for science data analysis: Issues and examples

    International Nuclear Information System (INIS)

    Fayyad, U.; Haussler, D.; Stolorz, P.

    1996-01-01

    The analysis of the massive data sets collected by scientific instruments demands automation as a prerequisite to analysis. There is an urgent need to create an intermediate level at which scientists can operate effectively, isolating them from the massive data sizes and harnessing human analysis capabilities to focus on tasks in which machines do not even remotely approach humans, namely creative data analysis, theory and hypothesis formation, and drawing insights into underlying phenomena. We give an overview of the main issues in the exploitation of scientific datasets, present five case studies where KDD tools play important and enabling roles, and conclude with future challenges for data mining and KDD techniques in science data analysis.

  9. Lipidomic data analysis: Tutorial, practical guidelines and applications

    Energy Technology Data Exchange (ETDEWEB)

    Checa, Antonio [Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Scheeles väg 2, SE-171 77 Stockholm (Sweden); Bedia, Carmen [Department of Environmental Chemistry, IDAEA-CSIC, Jordi Girona 18–26, Barcelona 08034 (Spain); Jaumot, Joaquim, E-mail: joaquim.jaumot@idaea.csic.es [Department of Environmental Chemistry, IDAEA-CSIC, Jordi Girona 18–26, Barcelona 08034 (Spain)

    2015-07-23

    Highlights: • An overview of chemometric methods applied to lipidomic data analysis is presented. • A lipidomic data set is analyzed showing the strengths of the introduced methods. • Practical guidelines for lipidomic data analysis are discussed. • Examples of applications of lipidomic data analysis in different fields are provided. Abstract: Lipids are a broad group of biomolecules involved in diverse critical biological roles such as cellular membrane structure, energy storage or cell signaling and homeostasis. Lipidomics is the -omics science that pursues the comprehensive characterization of lipids present in a biological sample. Different analytical strategies such as nuclear magnetic resonance or mass spectrometry with or without previous chromatographic separation are currently used to analyze the lipid composition of a sample. However, current analytical techniques provide a vast amount of data, which complicates the interpretation of results without the use of advanced data analysis tools. The choice of the appropriate chemometric method is essential to extract valuable information from the crude data as well as to interpret the lipidomic results in the biological context studied. The present work summarizes the diverse methods of analysis that can be used to study lipidomic data, from statistical inference tests to more sophisticated multivariate analysis methods. In addition to the theoretical description of the methods, application of various methods to a particular lipidomic data set as well as literature examples are presented.
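
    As a small example of one exploratory step within this scope, autoscaling followed by principal component analysis, the following Python sketch uses scikit-learn on a simulated samples-by-lipids intensity matrix; it is illustrative only and is not the tutorial's own code:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        # Simulated intensity matrix: 30 samples, 200 lipid features.
        rng = np.random.default_rng(1)
        X = rng.lognormal(mean=2.0, sigma=0.5, size=(30, 200))

        X_log = np.log2(X)                              # stabilize variance
        X_auto = StandardScaler().fit_transform(X_log)  # autoscale each lipid

        pca = PCA(n_components=2)
        scores = pca.fit_transform(X_auto)              # sample scores for plotting
        print("explained variance:", pca.explained_variance_ratio_)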

  10. Contracting Data Analysis: Assessment of Government-Wide Trends

    Science.gov (United States)

    2017-03-01

    Contracting Data Analysis: Assessment of Government-wide Trends. Report to congressional addressees, March 2017 (GAO-17-244SP). What GAO found: GAO's analysis of government-wide contracting data found that while defense obligations to buy products and services

  11. Archiving, Distribution and Analysis of Solar-B Data

    Science.gov (United States)

    Shimojo, M.

    2007-10-01

    The Solar-B Mission Operation and Data Analysis (MODA) working group has been discussing the data analysis system for Solar-B data since 2001. In this paper, based on the Solar-B MODA document and recent work in Japan, we introduce the data flow from Solar-B to scientists, the data formats and data levels of Solar-B data, and the data search and distribution system.

  12. The JASMIN Analysis Platform - bridging the gap between traditional climate data practices and data-centric analysis paradigms

    Science.gov (United States)

    Pascoe, Stephen; Iwi, Alan; kershaw, philip; Stephens, Ag; Lawrence, Bryan

    2014-05-01

    The advent of large-scale data and the consequential analysis problems have led to two new challenges for the research community: how to share such data to get the maximum value and how to carry out efficient analysis. Solving both challenges requires a form of parallelisation: the first is social parallelisation (involving trust and information sharing), the second data parallelisation (involving new algorithms and tools). The JASMIN infrastructure supports both kinds of parallelism by providing a multi-tenant environment with petabyte-scale storage, VM provisioning and batch cluster facilities. The JASMIN Analysis Platform (JAP) is an analysis software layer for JASMIN which emphasises ease of transition from a researcher's local environment to JASMIN. JAP brings together tools traditionally used by multiple communities and configures them to work together, enabling users to move analysis from their local environment to JASMIN without rewriting code. JAP also provides facilities to exploit JASMIN's parallel capabilities whilst maintaining a familiar analysis environment wherever possible. Modern open-source analysis tools typically have multiple dependent packages, increasing the installation burden on system administrators. When you consider a suite of tools, often with both common and conflicting dependencies, analysis pipelines can become locked to a particular installation simply because of the effort required to reconstruct the dependency tree. JAP addresses this problem by providing a consistent suite of RPMs compatible with RedHat Enterprise Linux and CentOS 6.4. Researchers can install JAP locally, either as RPMs or through a pre-built VM image, giving them the confidence that moving analysis to JASMIN will not disrupt their environment. Analysis parallelisation is in its infancy in climate sciences, with few tools capable of exploiting any parallel environment beyond manual scripting of the use of multiple processors. JAP begins to bridge this

  13. EBT data acquisition and analysis system

    International Nuclear Information System (INIS)

    Burris, R.D.; Greenwood, D.E.; Stanton, J.S.; Geoffroy, K.A.

    1980-10-01

    This document describes the design and implementation of a data acquisition and analysis system for the EBT fusion experiment. The system includes data acquisition on five computers, automatic transmission of that data to a large, central database, and a powerful data retrieval system. The system is flexible and easy to use, and it provides a fully documented record of the experiments.

  14. Analysis of mixed data methods & applications

    CERN Document Server

    de Leon, Alexander R

    2013-01-01

    A comprehensive source on mixed data analysis, Analysis of Mixed Data: Methods & Applications summarizes the fundamental developments in the field. Case studies are used extensively throughout the book to illustrate interesting applications from economics, medicine and health, marketing, and genetics. Carefully edited for smooth readability and seamless transitions between chapters. All chapters follow a common structure, with an introduction and a concluding summary, and include illustrative examples from real-life case studies in developmental toxicology

  15. On vehicular traffic data analysis

    Energy Technology Data Exchange (ETDEWEB)

    Brics, Martins; Mahnke, Reinhard [Institute of Physics, Rostock University (Germany)

    2011-07-01

    This contribution presents an analysis of empirical vehicular traffic flow data. The main focus lies on the Next Generation Simulation (NGSIM) data. The first findings show that there are artificial structures within the data, due to monitoring errors as well as smoothing of the position measurement data. As a result, the speed data show a discretisation in steps of 5 feet per second. The aim of this investigation is to construct microscopic traffic flow models which are in agreement with the analysed empirical data. The ongoing work follows the subject of research summarized by Christof Liebe in his PhD thesis entitled ''Physics of traffic flow: Empirical data and dynamical models'' (Rostock, 2010).
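
    A discretisation artifact of this kind can be exposed simply by inspecting the spacing between distinct recorded values; the Python sketch below simulates the effect rather than reading real NGSIM files, which have their own columns and units:

        import numpy as np

        # Simulate a 5 ft/s quantization and reveal it via the spacing
        # between distinct recorded speed values.
        rng = np.random.default_rng(2)
        true_speed = rng.uniform(0, 60, 10_000)          # ft/s, continuous
        recorded = np.round(true_speed / 5.0) * 5.0      # 5 ft/s discretisation

        levels = np.unique(recorded)
        print("distinct speed values:", len(levels))                  # only 13 levels
        print("spacing between values:", np.median(np.diff(levels)))  # 5.0 ft/s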

  16. Object-oriented data analysis framework for neutron scattering experiments

    International Nuclear Information System (INIS)

    Suzuki, Jiro; Nakatani, Takeshi; Ohhara, Takashi; Inamura, Yasuhiro; Yonemura, Masao; Morishima, Takahiro; Aoyagi, Tetsuo; Manabe, Atsushi; Otomo, Toshiya

    2009-01-01

    The Materials and Life Science Facility (MLF) of the Japan Proton Accelerator Research Complex (J-PARC) is one of the facilities providing the highest-intensity pulsed neutron and muon beams. The MLF computing environment design group organizes the computing environments of MLF and its instruments. It is important that the computing environment be provided by the facility side, because meta-data formats, analysis functions and the data analysis strategy should be shared among the many instruments in MLF. The C++ class library named Manyo-lib is a framework for developing data reduction and analysis software. The framework is composed of the class library for data reduction and analysis operators, network-distributed data processing modules, and data containers. The class library is wrapped by a Python interface created with SWIG. All classes of the framework can be called from the Python language, and Manyo-lib cooperates with the data acquisition and data visualization components through the MLF platform, a user interface unified in MLF, which runs on the Python language. Raw data in the event-data format obtained by the data acquisition systems are converted into histogram-format data in Manyo-lib with high performance, and data reduction and analysis are performed with user application software developed on top of Manyo-lib. We enforce standardization of data containers with Manyo-lib, and many additional fundamental data containers in Manyo-lib have been designed and developed. Experimental and analysis data in the data containers can be converted into NeXus files. Manyo-lib is the standard framework for developing analysis software in MLF, and prototypes of data analysis software for each instrument are being developed by the instrument teams.

  17. Analysis of mass spectrometry data in proteomics

    DEFF Research Database (Denmark)

    Matthiesen, Rune; Jensen, Ole N

    2008-01-01

    The systematic study of proteins and protein networks, that is, proteomics, calls for qualitative and quantitative analysis of proteins and peptides. Mass spectrometry (MS) is a key analytical technology in current proteomics, and modern mass spectrometers generate large amounts of high-quality data that in turn allow protein identification, annotation of secondary modifications, and determination of the absolute or relative abundance of individual proteins. Advances in mass spectrometry-driven proteomics rely on robust bioinformatics tools that enable large-scale data analysis. This chapter describes some of the basic concepts and current approaches to the analysis of MS and MS/MS data in proteomics.

  18. Distributed Data Analysis in ATLAS

    CERN Document Server

    Nilsson, P; The ATLAS collaboration

    2012-01-01

    Data analysis using grid resources is one of the fundamental challenges to be addressed before the start of LHC data taking. The ATLAS detector will produce petabytes of data per year, and roughly one thousand users will need to run physics analyses on this data. Appropriate user interfaces and helper applications have been made available to ensure that the grid resources can be used without requiring expertise in grid technology. These tools enlarge the number of grid users from a few production administrators to potentially all participating physicists. ATLAS makes use of three grid infrastructures for the distributed analysis: the EGEE sites, the Open Science Grid, and NorduGrid. These grids are managed by the gLite workload management system, the PanDA workload management system, and ARC middleware; many sites can be accessed via both the gLite WMS and PanDA. Users can choose between two front-end tools to access the distributed resources. Ganga is a tool co-developed with LHCb to provide a common interfa...

  19. Analysis of biomarker data a practical guide

    CERN Document Server

    Looney, Stephen W

    2015-01-01

    A "how to" guide for applying statistical methods to biomarker data analysis Presenting a solid foundation for the statistical methods that are used to analyze biomarker data, Analysis of Biomarker Data: A Practical Guide features preferred techniques for biomarker validation. The authors provide descriptions of select elementary statistical methods that are traditionally used to analyze biomarker data with a focus on the proper application of each method, including necessary assumptions, software recommendations, and proper interpretation of computer output. In addition, the book discusses

  20. Intelligent Data Analysis in the 21st Century

    Science.gov (United States)

    Cohen, Paul; Adams, Niall

    When IDA began, data sets were small and clean, data provenance and management were not significant issues, workflows, grid computing and cloud computing didn't exist, and the world was not populated with billions of cellphone and computer users. The original conception of intelligent data analysis, automating some of the reasoning of skilled data analysts, has not been updated to account for the dramatic changes in what skilled data analysis means today. IDA might update its mission to address pressing problems in areas such as climate change, habitat loss, education, and medicine. It might anticipate data analysis opportunities five to ten years out, such as customizing educational trajectories to individual students and personalizing medical protocols. Such developments will elevate the conference and our community by shifting our focus from arbitrary measures of the performance of isolated algorithms to the practical, societal value of intelligent data analysis systems.

  1. The ASDEX integrated data analysis system AIDA

    International Nuclear Information System (INIS)

    Grassie, K.; Gruber, O.; Kardaun, O.; Kaufmann, M.; Lackner, K.; Martin, P.; Mast, K.F.; McCarthy, P.J.; Mertens, V.; Pohl, D.; Rang, U.; Wunderlich, R.

    1989-11-01

    For about two years, the ASDEX integrated data analysis system (AIDA), which combines the database (DABA) and the statistical analysis system (SAS), has been successfully in operation. Besides a considerable, but meaningful, reduction of the 'raw' shot data, it offers the advantage of carefully selected and precisely defined datasets, which are easily accessible for informative tabular data overviews (DABA) and multi-shot analysis (SAS). Even rather complicated statistical analyses can be performed efficiently within this system. In this report, we summarise AIDA's main features and give some details on its set-up and on the physical models which have been used for the derivation of the processed data. We also give a short introduction on how to use DABA and SAS. (orig.)

  2. Coupling Visualization and Data Analysis for Knowledge Discovery from Multi-dimensional Scientific Data

    International Nuclear Information System (INIS)

    Rubel, Oliver; Ahern, Sean; Bethel, E. Wes; Biggin, Mark D.; Childs, Hank; Cormier-Michel, Estelle; DePace, Angela; Eisen, Michael B.; Fowlkes, Charless C.; Geddes, Cameron G.R.; Hagen, Hans; Hamann, Bernd; Huang, Min-Yu; Keranen, Soile V.E.; Knowles, David W.; Hendriks, Chris L. Luengo; Malik, Jitendra; Meredith, Jeremy; Messmer, Peter; Prabhat; Ushizima, Daniela; Weber, Gunther H.; Wu, Kesheng

    2010-01-01

    Knowledge discovery from large and complex scientific data is a challenging task. With the ability to measure and simulate more processes at increasingly finer spatial and temporal scales, the growing number of data dimensions and data objects presents tremendous challenges for effective data analysis and data exploration methods and tools. The combination and close integration of methods from scientific visualization, information visualization, automated data analysis, and other enabling technologies, such as efficient data management, supports knowledge discovery from multi-dimensional scientific data. This paper surveys two distinct applications in developmental biology and accelerator physics, illustrating the effectiveness of the described approach.

  3. A SWOT Analysis of Big Data

    Science.gov (United States)

    Ahmadi, Mohammad; Dileepan, Parthasarati; Wheatley, Kathleen K.

    2016-01-01

    This is the decade of data analytics and big data, but not everyone agrees with the definition of big data. Some researchers see it as the future of data analysis, while others consider it as hype and foresee its demise in the near future. No matter how it is defined, big data for the time being is having its glory moment. The most important…

  4. Information, Privacy and Stability in Adaptive Data Analysis

    OpenAIRE

    Smith, Adam

    2017-01-01

    Traditional statistical theory assumes that the analysis to be performed on a given data set is selected independently of the data themselves. This assumption breaks down when data are re-used across analyses and the analysis to be performed at a given stage depends on the results of earlier stages. Such dependency can arise when the same data are used by several scientific studies, or when a single analysis consists of multiple stages. How can we draw statistically valid conclusions when da...

  5. Data Analysis of Cybercrimes in Businesses

    Directory of Open Access Journals (Sweden)

    Balan Shilpa

    2017-12-01

    Full Text Available In the current digital age, most people have become very dependent on technology for their daily work tasks. With the rise of technological advancements, cyber-attacks have also increased. Over the past few years, there have been several security breaches. When sensitive data are breached, both organisations and consumers are affected. In the present research, we analyse cyber security risks and their impact on organisations. To perform the analysis, a big data technology such as R programming is used. For example, using a big data analysis, it was found that the majority of businesses detected at least one incident involving a local area network (LAN) breach.

  6. SWISH DataLab: A Web Interface for Data Exploration and Analysis

    NARCIS (Netherlands)

    T. Bogaard (Tessel); J. Wielemaker (Jan); L. Hollink (Laura); J.R. van Ossenbruggen (Jacco)

    2017-01-01

    textabstractSWISH DataLab is a single integrated collaborative environment for data processing, exploration and analysis combining Prolog and R. The web interface makes it possible to share the data, the code of all processing steps and the results among researchers; and a versioning system

  7. On the Impact of Inhomogeneities in Meteorological Data on VLBI Data Analysis

    Science.gov (United States)

    Balidakis, Kyriakos; Heinkelmann, Robert; Phogat, Apurva; Soja, Benedikt; Glaser, Susanne; Nilsson, Tobias; Karbon, Maria; Schuh, Harald

    2016-12-01

    In this study, we address the issue of the quality of meteorological data employed for VLBI data analysis. We use data from six numerical weather models (NWMs) to form references on which the homogenization process is based. We explore the impact of the choice of NWM as well as the way to extract data from it. Among our findings is that data from the surface fields of NWMs are not suitable for either geodetic analysis or homogenization efforts, whether they are in their original form or after they have been compensated for the height difference between the orography of the NWM and the actual elevation. The reason lies in the fact that for 77% of the VLBI stations a height bias larger than 2.5 mm appears, as well as an average bias in the zenith wet delay estimates of 12.2 mm. Should the proposed extraction approach be followed, the difference between operational and reanalysis NWMs is not significant for such an application. Our conclusions are based on the analysis of VLBI data over 13 years.

  8. SOLE: enhanced FIA data analysis capabilities

    Science.gov (United States)

    Michael Spinney; Paul Van Deusen

    2009-01-01

    The Southern On Line Estimator (SOLE) is an Internet-based annual forest inventory and analysis (FIA) data analysis tool developed cooperatively by the National Council for Air and Stream Improvement and the Forest Service, U.S. Department of Agriculture's Forest Inventory and Analysis program at the Southern Research Station. Recent development of SOLE has...

  9. Leveraging Data Analysis for Domain Experts: An Embeddable Framework for Basic Data Science Tasks

    Science.gov (United States)

    Lohrer, Johannes-Y.; Kaltenthaler, Daniel; Kröger, Peer

    2016-01-01

    In this paper, we describe a framework for data analysis that can be embedded into a base application. Since it is important to analyze the data directly inside the application where the data are entered, a tool that allows scientists to easily work with their data supports and motivates the execution of further analyses of their data, which…

  10. Spatiotemporal Data Mining, Analysis, and Visualization of Human Activity Data

    Science.gov (United States)

    Li, Xun

    2012-01-01

    This dissertation addresses the research challenge of developing efficient new methods for discovering useful patterns and knowledge in large volumes of electronically collected spatiotemporal activity data. I propose to analyze three types of such spatiotemporal activity data in a methodological framework that integrates spatial analysis, data…

  11. Proteomics wants cRacker: automated standardized data analysis of LC-MS derived proteomic data.

    Science.gov (United States)

    Zauber, Henrik; Schulze, Waltraud X

    2012-11-02

    The large-scale analysis of thousands of proteins under various experimental conditions or in mutant lines has gained more and more importance in hypothesis-driven scientific research and systems biology in the past years. Quantitative analysis by large-scale proteomics using modern mass spectrometry usually results in long lists of peptide ion intensities. The main interest for most researchers, however, is to draw conclusions on the protein level. Postprocessing and combining peptide intensities of a proteomic data set requires expert knowledge, and the often repetitive and standardized manual calculations can be time-consuming. The analysis of complex samples can result in very large data sets (lists with several thousand to 100,000 entries of different peptides) that cannot easily be analyzed using standard spreadsheet programs. To improve the speed and consistency of the data analysis of LC-MS derived proteomic data, we developed cRacker. cRacker is an R-based program for automated downstream proteomic data analysis, including data normalization strategies for metabolic labeling and label-free quantitation. In addition, cRacker includes basic statistical analysis, such as clustering of data, or ANOVA and t-tests for comparison between treatments. Results are presented in editable graphic formats and in list files.
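
    To make this kind of postprocessing concrete, the following Python sketch mimics two steps such a pipeline automates for label-free data: per-sample median normalization of peptide intensities, aggregation to the protein level, and a t-test between treatments. This is not cRacker's own R code; all names and numbers are invented:

        import numpy as np
        import pandas as pd
        from scipy import stats

        # Simulated peptide ion intensities: 1000 peptides, 6 samples.
        rng = np.random.default_rng(3)
        peptides = pd.DataFrame(
            rng.lognormal(10, 1, size=(1000, 6)),
            columns=["ctrl_1", "ctrl_2", "ctrl_3", "trt_1", "trt_2", "trt_3"],
        )
        peptides["protein"] = rng.integers(0, 200, 1000)  # peptide -> protein map

        samples = peptides.columns[:6]
        norm = peptides[samples] / peptides[samples].median()  # median-normalize
        norm["protein"] = peptides["protein"]
        proteins = norm.groupby("protein").mean()              # protein intensities

        t, p = stats.ttest_ind(proteins[["ctrl_1", "ctrl_2", "ctrl_3"]],
                               proteins[["trt_1", "trt_2", "trt_3"]], axis=1)
        print("proteins with p < 0.05:", int((p < 0.05).sum()))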

  12. Integrated Data Collection Analysis (IDCA) Program — RDX Standard Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Sandstrom, Mary M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Brown, Geoffrey W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Preston, Daniel N. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Pollard, Colin J. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Warner, Kirstin F. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Sorensen, Daniel N. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Remmers, Daniel L. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Phillips, Jason J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Shelley, Timothy J. [Bureau of Alcohol, Tobacco and Firearms, Huntsville, AL (United States); Reyes, Jose A. [Applied Research Associates, Tyndall AFB, FL (United States); Hsu, Peter C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Reynolds, John G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2013-03-04

    The Integrated Data Collection Analysis (IDCA) program is conducting a proficiency study for Small-Scale Safety and Thermal (SSST) testing of homemade explosives (HMEs). Described here are the results for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of the RDX Type II Class 5 standard, measured for the third and fourth times in the proficiency test and averaged with the analysis results from the first and second rounds. The results from averaging all four data sets (1, 2, 3 and 4) suggest a material with slightly more impact sensitivity, more BAM friction sensitivity, less ABL friction sensitivity, similar ESD sensitivity, and the same DSC sensitivity, compared to the results from Set 1, which was previously used as the reference values for the RDX standard in IDCA Analysis Reports.

  13. Analysis of high-fold gamma data

    International Nuclear Information System (INIS)

    Radford, D. C.; Cromaz, M.; Beyer, C. J.

    1999-01-01

    Historically, γ-γ and γ-γ-γ coincidence spectra were utilized to build nuclear level schemes. With the development of large detector arrays, it has become possible to analyze higher-fold coincidence data sets. This paper briefly reports on software to analyze 4-fold coincidence data sets, which allows the creation of 4-fold histograms (hypercubes) of at least 1024 channels per side (corresponding to a 43-gigachannel data space) that will fit onto a few gigabytes of disk space, and the extraction of triple-gated spectra in a few seconds. Future detector arrays may have even much higher efficiencies and detect as many as 15 or 20 γ rays simultaneously; such data will require very different algorithms for storage and analysis. Difficulties inherent in the analysis of such data are discussed, and two possible new solutions are presented, namely adaptive list-mode systems and 'list-list-mode' storage.
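
    The storage problem can be made concrete with a toy sparse histogram: a dense 1024^4 hypercube is far too large to allocate, but most channels are empty, so a map over symmetrized (sorted) 4-fold channel combinations stays small. The Python sketch below shows the idea only; production codes of this kind use compressed on-disk formats:

        from collections import defaultdict
        from itertools import combinations

        hypercube = defaultdict(int)   # sparse 4-fold coincidence histogram

        def add_event(energies, nbins=1024, kev_per_bin=2.0):
            """Histogram every gamma 4-tuple from one event into the cube."""
            chans = sorted(min(int(e / kev_per_bin), nbins - 1) for e in energies)
            for quad in combinations(chans, 4):   # all 4-fold coincidences
                hypercube[quad] += 1              # sorted tuple = symmetrized

        add_event([121.8, 244.7, 344.3, 778.9, 964.1])  # one 5-fold event
        print(len(hypercube), "non-empty channels")      # 5 choose 4 = 5 quads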

  14. Statistical methods for astronomical data analysis

    CERN Document Server

    Chattopadhyay, Asis Kumar

    2014-01-01

    This book introduces “Astrostatistics” as a subject in its own right with rewarding examples, including work by the authors with galaxy and Gamma Ray Burst data to engage the reader. This includes a comprehensive blending of Astrophysics and Statistics. The first chapter’s coverage of preliminary concepts and terminologies for astronomical phenomena will appeal to both Statistics and Astrophysics readers as helpful context. Statistics concepts covered in the book provide a methodological framework. A unique feature is the inclusion of different possible sources of astronomical data, as well as software packages for converting the raw data into appropriate forms for data analysis. Readers can then use the appropriate statistical packages for their particular data analysis needs. The ideas of statistical inference discussed in the book help readers determine how to apply statistical tests. The authors cover different applications of statistical techniques already developed or specifically introduced for ...

  15. Textbooks for Responsible Data Analysis in Excel

    Science.gov (United States)

    Garrett, Nathan

    2015-01-01

    With 27 million users, Excel (Microsoft Corporation, Redmond, WA) is the most common business data analysis software. However, audits show that almost all complex spreadsheets have errors. The author examined textbooks to understand how responsible data analysis is taught. A purposeful sample of 10 textbooks was coded, and then compared against…

  16. Data analysis techniques for gravitational wave observations

    Indian Academy of Sciences (India)

    Astrophysical sources of gravitational waves fall broadly into three categories: (i) transient and bursts, (ii) periodic or continuous wave and (iii) stochastic. Each type of source requires a different type of data analysis strategy. In this talk various data analysis strategies will be reviewed. Optimal filtering is used for extracting ...

  17. Compositional Data Analysis Theory and Applications

    CERN Document Server

    Pawlowsky-Glahn, Vera

    2011-01-01

    This book presents the state-of-the-art in compositional data analysis and will feature a collection of papers covering theory, applications to various fields of science and software. Areas covered will range from geology, biology, environmental sciences, forensic sciences, medicine and hydrology. Key features: provides the state-of-the-art text in compositional data analysis; covers a variety of subject areas, from geology to medicine; written by leading researchers in the field; supported by a website featuring R code

  18. Analysis of Ordinal Categorical Data

    CERN Document Server

    Agresti, Alan

    2012-01-01

    Statistical science's first coordinated manual of methods for analyzing ordered categorical data, now fully revised and updated, continues to present applications and case studies in fields as diverse as sociology, public health, ecology, marketing, and pharmacy. Analysis of Ordinal Categorical Data, Second Edition provides an introduction to basic descriptive and inferential methods for categorical data, giving thorough coverage of new developments and recent methods. Special emphasis is placed on interpretation and application of methods including an integrated comparison of the available st

  19. Chaotic data analysis of heart R-R interval EKG data

    International Nuclear Information System (INIS)

    Frison, T.W.; Peng, C.K.; Goldberger, A.; Katz, R.A.

    1996-01-01

    Cardiac beat-to-beat interval data are analyzed with a chaotic data analysis toolkit. The embedding dimension of ten data sets from healthy subjects is 7 or at most 8. Ten of the eleven pathological data sets have an embedding dimension of 9 or greater. Statistically, the first local minimum of average mutual information for healthy hearts is larger than in the pathological cases. However, there is a large standard deviation for this metric that blurs the distinction between the healthy and pathological data. copyright 1996 American Institute of Physics
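
    The average mutual information metric used here can be sketched compactly: estimate the joint distribution of the series and its lagged copy from a 2-D histogram, then locate the first local minimum over the lag. The Python code below does this on a synthetic R-R series and is illustrative only, not the toolkit described above:

        import numpy as np

        def ami(x, lag, bins=16):
            """Average mutual information between x(t) and x(t+lag), in nats."""
            h, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
            p = h / h.sum()
            px, py = p.sum(axis=1), p.sum(axis=0)
            nz = p > 0
            return np.sum(p[nz] * np.log(p[nz] / np.outer(px, py)[nz]))

        rng = np.random.default_rng(4)   # synthetic R-R intervals, seconds
        rr = 0.8 + 0.05 * np.sin(np.arange(2000) / 10) + rng.normal(0, 0.01, 2000)

        ami_curve = [ami(rr, lag) for lag in range(1, 40)]
        first_min = next(i + 1 for i in range(1, len(ami_curve) - 1)
                         if ami_curve[i] < ami_curve[i - 1]
                         and ami_curve[i] < ami_curve[i + 1])
        print("first local minimum of AMI at lag", first_min)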

  20. Lectures on categorical data analysis

    CERN Document Server

    Rudas, Tamás

    2018-01-01

    This book offers a relatively self-contained presentation of the fundamental results in categorical data analysis, which plays a central role among the statistical techniques applied in the social, political and behavioral sciences, as well as in marketing and medical and biological research. The methods applied are mainly aimed at understanding the structure of associations among variables and the effects of other variables on these interactions. A great advantage of studying categorical data analysis is that many concepts in statistics become transparent when discussed in a categorical data context, and, in many places, the book takes this opportunity to comment on general principles and methods in statistics, addressing not only the “how” but also the “why.” Assuming minimal background in calculus, linear algebra, probability theory and statistics, the book is designed to be used in upper-undergraduate and graduate-level courses in the field and in more general statistical methodology courses, as w...

  1. LISA Pathfinder instrument data analysis

    Science.gov (United States)

    Guzman, Felipe

    LISA Pathfinder (LPF) is an ESA-launched demonstration mission of key technologies required for the joint NASA-ESA gravitational wave observatory in space, LISA. As part of the LPF interferometry investigations, analytic models of noise sources and corresponding noise subtraction techniques have been developed to correct for effects like the coupling of test mass jitter into displacement readout, and fluctuations of the laser frequency or optical pathlength difference. Ground testing of pre-flight hardware of the Optical Metrology Subsystem is currently ongoing at the Albert Einstein Institute Hannover. In collaboration with NASA Goddard Space Flight Center, the LPF mission data analysis tool LTPDA is being used to analyze the data product of these tests. Furthermore, the noise subtraction techniques and in-flight experiment runs for noise characterization are being defined as part of the mission experiment master plan. We will present the data analysis outcome of pre-flight hardware ground tests and possible noise subtraction strategies for in-flight instrument operations.

  2. Structural-Vibration-Response Data Analysis

    Science.gov (United States)

    Smith, W. R.; Hechenlaible, R. N.; Perez, R. C.

    1983-01-01

    Computer program developed as structural-vibration-response data analysis tool for use in dynamic testing of Space Shuttle. Program provides fast and efficient time-domain least-squares curve-fitting procedure for reducing transient response data to obtain structural model frequencies and dampings from free-decay records. Procedure simultaneously identifies frequencies, damping values, and participation factors for noisy multiple-response records.
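
    The core idea, least-squares fitting of a damped sinusoid to a free-decay record to recover frequency and damping, can be sketched in a few lines of Python; the actual program described above is far more general (multiple responses, participation factors), and the data and starting values here are invented:

        import numpy as np
        from scipy.optimize import curve_fit

        def free_decay(t, amp, zeta, f, phase):
            """Single-mode free decay: exponentially damped sinusoid."""
            wn = 2 * np.pi * f
            wd = wn * np.sqrt(1 - zeta ** 2)   # damped natural frequency
            return amp * np.exp(-zeta * wn * t) * np.cos(wd * t + phase)

        rng = np.random.default_rng(5)
        t = np.linspace(0, 2, 2000)
        y = free_decay(t, 1.0, 0.02, 12.0, 0.3) + rng.normal(0, 0.02, t.size)

        popt, _ = curve_fit(free_decay, t, y, p0=[1.0, 0.01, 11.0, 0.0],
                            bounds=([0, 0, 1, -np.pi], [10, 0.5, 50, np.pi]))
        print(f"frequency = {popt[2]:.2f} Hz, damping ratio = {popt[1]:.3f}")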

  3. Radiation and environmental data analysis computer (REDAC) hardware, software and analysis procedures

    International Nuclear Information System (INIS)

    Hendricks, T.J.

    1985-01-01

    The REDAC was conceived originally as a tape verifier for the Radiation and Environmental Data Acquisition Recorder (REDAR). From that simple beginning in 1971, the REDAC has evolved into a family of systems used for complete analysis of data obtained by the REDAR and other acquisition systems. Portable or mobile REDACs are deployed to support checkout and analysis tasks in the field. Laboratory systems are additionally used for software development, physics investigations, data base management and graphics. System configurations range from man-portable systems to a large laboratory-based system which supports time-shared analysis and development tasks. Custom operating software allows the analyst to process data either interactively or by batch procedures. Analysis packages are provided for numerous necessary functions. All these analysis procedures can be performed even on the smallest man-portable REDAC. Examples of the multi-isotope stripping and radiation isopleth mapping are presented. Techniques utilized for these operations are also presented

  4. CMS Analysis and Data Reduction with Apache Spark

    Energy Technology Data Exchange (ETDEWEB)

    Gutsche, Oliver [Fermilab; Canali, Luca [CERN; Cremer, Illia [Magnetic Corp., Waltham; Cremonesi, Matteo [Fermilab; Elmer, Peter [Princeton U.; Fisk, Ian [Flatiron Inst., New York; Girone, Maria [CERN; Jayatilaka, Bo [Fermilab; Kowalkowski, Jim [Fermilab; Khristenko, Viktor [CERN; Motesnitsalis, Evangelos [CERN; Pivarski, Jim [Princeton U.; Sehrish, Saba [Fermilab; Surdy, Kacper [CERN; Svyatkovskiy, Alexey [Princeton U.

    2017-10-31

    Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called "Big Data" technologies, have emerged from industry and open source projects to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HEP have not changed (filtering and transforming experiment-specific data formats), these new technologies use different approaches and tools, promising a fresh look at analysis of very large datasets that could potentially reduce the time-to-physics with increased interactivity. Moreover, these new tools are typically actively developed by large communities, often profiting from industry resources, and under open source licensing. These factors result in a boost for adoption and maturity of the tools and for the communities supporting them, at the same time helping to reduce the cost of ownership for the end-users. In this talk, we are presenting studies of using Apache Spark for end user data analysis. We are studying the HEP analysis workflow separated into two thrusts: the reduction of centrally produced experiment datasets and the end-analysis up to the publication plot. Studying the first thrust, CMS is working together with CERN openlab and Intel on the CMS Big Data Reduction Facility. The goal is to reduce 1 PB of official CMS data to 1 TB of ntuple output for analysis. We are presenting the progress of this 2-year project with first results of scaling up Spark-based HEP analysis. Studying the second thrust, we are presenting studies on using Apache Spark for a CMS Dark Matter physics search, comparing Spark's feasibility, usability and performance to the ROOT-based analysis.
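
    The reduction thrust can be illustrated schematically with Spark's DataFrame API: filter a large columnar dataset and project out a handful of columns into an analysis ntuple. The paths, column names and cuts below are invented for illustration; this is not the CMS facility's actual code:

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("toy-reduction").getOrCreate()

        # Read a large columnar dataset (path and schema are hypothetical).
        events = spark.read.parquet("hdfs:///data/events.parquet")

        ntuple = (events
                  .filter(events.muon_pt > 25.0)              # event selection
                  .filter(events.n_jets >= 2)
                  .select("run", "event", "muon_pt", "met"))  # keep few columns

        # Write the much smaller ntuple back out for end-analysis.
        ntuple.write.mode("overwrite").parquet("hdfs:///user/me/ntuple.parquet")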

  5. Gaussian process regression analysis for functional data

    CERN Document Server

    Shi, Jian Qing

    2011-01-01

    Gaussian Process Regression Analysis for Functional Data presents nonparametric statistical methods for functional regression analysis, specifically the methods based on a Gaussian process prior in a functional space. The authors focus on problems involving functional response variables and mixed covariates of functional and scalar variables.Covering the basics of Gaussian process regression, the first several chapters discuss functional data analysis, theoretical aspects based on the asymptotic properties of Gaussian process regression models, and new methodological developments for high dime

  6. Guide on reflectivity data analysis

    International Nuclear Information System (INIS)

    Lee, Jeong Soo; Ku, Ja Seung; Seong, Baek Seok; Lee, Chang Hee; Hong, Kwang Pyo; Choi, Byung Hoon

    2004-09-01

    This report describes the reduction and fitting of neutron reflectivity data with REFLRED and REFLFIT from NIST. Because the details of data reduction, such as background (BKG), footprint and data normalization, are described, it will be useful to users with no experience in this field. Also, the reflectivity and BKG of a d-PS thin film were measured with the HANARO neutron reflectometer. From these, the structure of the d-PS thin film was analyzed with REFLRED and REFLFIT. Because thin-film structure parameters such as thickness, roughness and SLD were obtained in this work, the feasibility of data analysis with REFLRED and REFLFIT was verified.

  7. DATA ENVELOPMENT ANALYSIS OF BANKING SECTOR IN BANGLADESH

    Directory of Open Access Journals (Sweden)

    Md. Rashedul Hoque

    2012-05-01

    Full Text Available The banking sector of Bangladesh is flourishing and contributing to its economy. In this respect, measuring efficiency is important. The Data Envelopment Analysis technique is used for this purpose. The data are collected from the annual reports of twenty-four different banks in Bangladesh. Data Envelopment Analysis is mainly of two types: constant returns to scale and variable returns to scale. Since this study attempts to maximize output, the output-oriented Data Envelopment Analysis is used. The most efficient bank is the one that obtains the highest efficiency score.
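
    The output-oriented, constant-returns-to-scale envelopment model behind such a study is a small linear program per bank; the Python sketch below solves it with scipy for three invented banks, where a score of 1 marks an efficient bank and larger values an output shortfall:

        import numpy as np
        from scipy.optimize import linprog

        # Output-oriented CCR envelopment model, one LP per bank.
        # phi = 1: efficient; phi > 1: factor by which outputs could expand.
        X = np.array([[100.0, 120.0, 150.0]])   # inputs:  rows x banks (invented)
        Y = np.array([[ 80.0, 100.0, 110.0]])   # outputs: rows x banks (invented)
        n = X.shape[1]

        for o in range(n):                      # evaluate each bank in turn
            c = np.r_[-1.0, np.zeros(n)]        # maximize phi <=> minimize -phi
            A_in = np.hstack([np.zeros((X.shape[0], 1)), X])   # sum l_j x_j <= x_o
            A_out = np.hstack([Y[:, [o]], -Y])                 # phi*y_o <= sum l_j y_j
            res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                          b_ub=np.r_[X[:, o], np.zeros(Y.shape[0])],
                          bounds=[(0, None)] * (n + 1))
            print(f"bank {o}: phi = {res.x[0]:.3f}")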

  8. A Probabilistic Analysis of Data Popularity in ATLAS Data Caching

    International Nuclear Information System (INIS)

    Titov, M; Záruba, G; De, K; Klimentov, A

    2012-01-01

    One of the most important aspects of any distributed computing system is efficient data replication over storage or computing centers, which guarantees high data availability and low cost of resource utilization. In this paper we propose a data distribution scheme for the production and distributed analysis system PanDA at the ATLAS experiment. Our proposed scheme is based on the investigation of data usage. Thus, the paper is focused on the main concepts of data popularity in the PanDA system and their utilization. Data popularity is represented as the set of parameters that are used to predict the future data state in terms of popularity levels.

  9. ADAGE signature analysis: differential expression analysis with data-defined gene sets.

    Science.gov (United States)

    Tan, Jie; Huyck, Matthew; Hu, Dongbo; Zelaya, René A; Hogan, Deborah A; Greene, Casey S

    2017-11-22

    Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data. Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and ∆anr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr. We designed

  10. Bridging ImmunoGenomic Data Analysis Workflow Gaps (BIGDAWG): An integrated case-control analysis pipeline.

    Science.gov (United States)

    Pappas, Derek J; Marin, Wesley; Hollenbach, Jill A; Mack, Steven J

    2016-03-01

    Bridging ImmunoGenomic Data-Analysis Workflow Gaps (BIGDAWG) is an integrated data-analysis pipeline designed for the standardized analysis of highly-polymorphic genetic data, specifically for the HLA and KIR genetic systems. Most modern genetic analysis programs are designed for the analysis of single nucleotide polymorphisms, but the highly polymorphic nature of HLA and KIR data require specialized methods of data analysis. BIGDAWG performs case-control data analyses of highly polymorphic genotype data characteristic of the HLA and KIR loci. BIGDAWG performs tests for Hardy-Weinberg equilibrium, calculates allele frequencies and bins low-frequency alleles for k×2 and 2×2 chi-squared tests, and calculates odds ratios, confidence intervals and p-values for each allele. When multi-locus genotype data are available, BIGDAWG estimates user-specified haplotypes and performs the same binning and statistical calculations for each haplotype. For the HLA loci, BIGDAWG performs the same analyses at the individual amino-acid level. Finally, BIGDAWG generates figures and tables for each of these comparisons. BIGDAWG obviates the error-prone reformatting needed to traffic data between multiple programs, and streamlines and standardizes the data-analysis process for case-control studies of highly polymorphic data. BIGDAWG has been implemented as the bigdawg R package and as a free web application at bigdawg.immunogenomics.org. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
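
    The per-allele case-control comparison such a pipeline automates reduces, for one allele, to a 2×2 contingency test. The Python sketch below (not the bigdawg package itself; the counts are invented) computes the chi-squared test, the odds ratio and a 95% confidence interval:

        import numpy as np
        from scipy import stats

        # 2x2 table for one allele: counts in cases vs. controls (invented).
        table = np.array([[42, 18],     # allele present: cases, controls
                          [158, 182]])  # allele absent:  cases, controls

        chi2, p, dof, _ = stats.chi2_contingency(table, correction=False)

        a, b, c, d = table.ravel().astype(float)
        or_ = (a * d) / (b * c)                      # odds ratio
        se = np.sqrt(1/a + 1/b + 1/c + 1/d)          # std. error of log(OR)
        lo, hi = np.exp(np.log(or_) + np.array([-1.96, 1.96]) * se)
        print(f"chi2={chi2:.2f} p={p:.4f} OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")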

  11. Qualitative case study data analysis: an example from practice.

    Science.gov (United States)

    Houghton, Catherine; Murphy, Kathy; Shaw, David; Casey, Dympna

    2015-05-01

    To illustrate an approach to data analysis in qualitative case study methodology. There is often little detail in case study research about how data were analysed. However, it is important that comprehensive analysis procedures are used because there are often large sets of data from multiple sources of evidence. Furthermore, the ability to describe in detail how the analysis was conducted ensures rigour in reporting qualitative research. The research example used is a multiple case study that explored the role of the clinical skills laboratory in preparing students for the real world of practice. Data analysis was conducted using a framework guided by the four stages of analysis outlined by Morse (1994): comprehending, synthesising, theorising and recontextualising. The specific strategies for analysis in these stages centred on the work of Miles and Huberman (1994), which has been successfully used in case study research. The data were managed using NVivo software. Literature examining qualitative data analysis was reviewed and strategies illustrated by the case study example provided. Each stage of the analysis framework is described with illustration from the research example for the purpose of highlighting the benefits of a systematic approach to handling large data sets from multiple sources. By providing an example of how each stage of the analysis was conducted, it is hoped that researchers will be able to consider the benefits of such an approach to their own case study analysis. This paper illustrates specific strategies that can be employed when conducting data analysis in case study research and other qualitative research designs.

  12. Hurricane Data Analysis Tool

    Science.gov (United States)

    Liu, Zhong; Ostrenga, Dana; Leptoukh, Gregory

    2011-01-01

    In order to facilitate Earth science data access, the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) has developed a web prototype, the Hurricane Data Analysis Tool (HDAT; URL: http://disc.gsfc.nasa.gov/HDAT), to allow users to conduct online visualization and analysis of several remote sensing and model datasets for educational activities and studies of tropical cyclones and other weather phenomena. With a web browser and a few mouse clicks, users have full access to terabytes of data and can generate 2-D or time-series plots and animations without downloading any software or data. HDAT includes data from the NASA Tropical Rainfall Measuring Mission (TRMM), the NASA Quick Scatterometer (QuikSCAT), the NCEP Reanalysis, and the NCEP/CPC half-hourly, 4-km Global (60 N - 60 S) IR Dataset. The GES DISC archives TRMM data. The daily global rainfall product derived from the 3-hourly multi-satellite precipitation product (3B42 V6) is available in HDAT. The TRMM Microwave Imager (TMI) sea surface temperature from Remote Sensing Systems is in HDAT as well. The NASA QuikSCAT ocean surface wind and the NCEP Reanalysis provide ocean surface and atmospheric conditions, respectively. The global merged IR product, also known as the NCEP/CPC half-hourly, 4-km Global (60 N - 60 S) IR Dataset, is one of the TRMM ancillary datasets. It consists of globally merged, pixel-resolution IR brightness temperature data (equivalent blackbody temperatures), merged from all available geostationary satellites (GOES-8/10, METEOSAT-7/5 & GMS). The GES DISC has collected over 10 years of the data, beginning in February 2000. This high temporal resolution (every 30 minutes) dataset not only provides additional background information to TRMM and other satellite missions, but also allows observing a wide range of meteorological phenomena from space, such as hurricanes, typhoons, tropical cyclones and mesoscale convective systems. Basic functions include selection of area of

  13. Analysis of Home Health Sensor Data

    NARCIS (Netherlands)

    Kröse, B.; van Hoof, J.; Demiris, G.; Wouters, E.J.M.

    2014-01-01

    This chapter focuses on the analysis of data that is collected from sensors in the home environment. First we discuss the need for a good model that relates sensor data (or features derived from the data) to indicators of health and well-being. Then we present several methods for model building. We

  14. Processing of pulse oximeter data using discrete wavelet analysis.

    Science.gov (United States)

    Lee, Seungjoon; Ibey, Bennett L; Xu, Weijian; Wilson, Mark A; Ericson, M Nance; Coté, Gerard L

    2005-07-01

    A wavelet-based signal processing technique was employed to improve an implantable blood perfusion monitoring system. Data were acquired from both in vitro and in vivo sources: a perfusion model and the proximal jejunum of an adult pig. Results showed that wavelet analysis could isolate perfusion signals from raw, periodic, in vitro data as well as fast Fourier transform (FFT) methods could. However, for the quasi-periodic in vivo data segments, wavelet analysis provided more consistent results than the FFT analysis for data segments of 50, 10, and 5 s in length. Wavelet analysis has thus been shown to require fewer data points for quasi-periodic data than FFT analysis, making it a good choice for an indwelling perfusion monitor where power consumption and reaction time are paramount.
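
    As a rough illustration of the technique this abstract describes, the sketch below isolates a quasi-periodic component from a short noisy segment with a discrete wavelet transform (via the PyWavelets package) and compares the result against a plain FFT peak estimate. The sampling rate, pulse frequency, wavelet choice ('db4') and decomposition depth are illustrative assumptions, not the authors' settings.

    ```python
    import numpy as np
    import pywt  # PyWavelets

    fs = 100.0                                   # assumed sampling rate, Hz
    t = np.arange(0, 5.0, 1.0 / fs)              # a 5 s data segment, as in the study
    pulse = np.sin(2 * np.pi * 1.2 * t)          # ~1.2 Hz perfusion-like component
    rng = np.random.default_rng(0)
    signal = pulse + 0.5 * rng.normal(size=t.size)

    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)

    # FFT estimate on the raw segment (skip the DC bin).
    raw_peak = freqs[np.argmax(np.abs(np.fft.rfft(signal))[1:]) + 1]

    # Wavelet estimate: decompose, keep only the approximation band that
    # contains the ~1.2 Hz component, and reconstruct a denoised signal.
    coeffs = pywt.wavedec(signal, "db4", level=5)
    # coeffs = [cA5, cD5, ..., cD1]; at fs = 100 Hz, cA5 spans roughly 0-1.56 Hz.
    kept = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    denoised = pywt.waverec(kept, "db4")[: signal.size]
    wav_peak = freqs[np.argmax(np.abs(np.fft.rfft(denoised))[1:]) + 1]

    print(f"raw FFT peak: {raw_peak:.2f} Hz; wavelet-filtered peak: {wav_peak:.2f} Hz")
    ```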

  15. Data formats design of laser irradiation experiments in view of data analysis

    International Nuclear Information System (INIS)

    Su Chunxiao; Yu Xiaoqi; Yang Cunbang; Guo Su; Chen Hongsu

    2002-01-01

    The design rules for new data file formats used in laser irradiation experiments are introduced. Object-oriented programs were designed for studying experimental data from the laser facilities. The new-format data files combine experimental data with diagnostic configuration data, and they are applied in data processing and analysis. The editing of diagnostic configuration data within the data acquisition program is also described

  16. CMS Data Analysis: Current Status and Future Strategy

    CERN Document Server

    Innocente, V

    2003-01-01

    We present the current status of CMS data analysis architecture and describe work on future Grid-based distributed analysis prototypes. CMS has two main software frameworks related to data analysis: COBRA, the main framework, and IGUANA, the interactive visualisation framework. Software using these frameworks is used today in the world-wide production and analysis of CMS data. We describe their overall design and present examples of their current use with emphasis on interactive analysis. CMS is currently developing remote analysis prototypes, including one based on Clarens, a Grid-enabled client-server tool. Use of the prototypes by CMS physicists will guide us in forming a Grid-enriched analysis strategy. The status of this work is presented, as is an outline of how we plan to leverage the power of our existing frameworks in the migration of CMS software to the Grid.

  17. Quality Analysis of Open Street Map Data

    Science.gov (United States)

    Wang, M.; Li, Q.; Hu, Q.; Zhou, M.

    2013-05-01

    Crowd-sourced geographic data are open-source geographic data contributed by large numbers of non-professionals and provided to the public. Typical crowd-sourced geographic data include GPS track data such as OpenStreetMap, collaborative map data such as Wikimapia, social websites such as Twitter and Facebook, and POIs tagged by Jiepang users. After processing, these data provide canonical geographic information for the public. Compared with conventional geographic data collection and update methods, crowd-sourced geographic data from non-professionals offer large data volumes, high currency, abundant information and low cost, and they have become a research hotspot in international geographic information science in recent years. Large volumes of highly current crowd-sourced geographic data provide a new solution for geospatial database updating, provided the quality problems of data obtained from non-professionals can be solved. In this paper, a quality analysis model for OpenStreetMap crowd-sourced geographic data is proposed. Firstly, a quality analysis framework is designed based on an analysis of the characteristics of OSM data. Secondly, a quality assessment model for OSM data is presented using three quality elements: completeness, thematic accuracy and positional accuracy. Finally, taking the OSM data of Wuhan as an instance, the paper analyses and assesses the quality of OSM data against a 2011 navigation map as reference. The results show that the high-level roads and urban traffic network of the OSM data have high positional accuracy and completeness, so these OSM data can be used for updating an urban road network database.
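
    The paper's quality elements lend themselves to simple numeric summaries. The toy sketch below computes two of the three: completeness as a road-length ratio and positional accuracy as the RMSE of matched-point offsets. It assumes OSM features have already been matched to reference features; all numbers are invented for illustration.

    ```python
    import numpy as np

    ref_length_km = 1480.0      # total reference road length (hypothetical)
    osm_length_km = 1362.5      # matched OSM road length (hypothetical)
    completeness = osm_length_km / ref_length_km

    # Matched point pairs (metres, in a local projected coordinate system).
    osm_pts = np.array([[10.0, 4.0], [250.3, 88.1], [501.7, 122.9]])
    ref_pts = np.array([[12.1, 5.5], [248.9, 90.0], [503.0, 121.5]])
    offsets = np.linalg.norm(osm_pts - ref_pts, axis=1)
    rmse = np.sqrt(np.mean(offsets ** 2))

    print(f"completeness: {completeness:.1%}, positional RMSE: {rmse:.2f} m")
    ```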

  18. Inspection, visualisation and analysis of quantitative proteomics data

    OpenAIRE

    Gatto, Laurent

    2016-01-01

    Material from the Quantitative Proteomics and Data Analysis Course, 4-5 April 2016, Queen Hotel, Chester, UK. Table D - Inspection, visualisation and analysis of quantitative proteomics data, Laurent Gatto (University of Cambridge)

  19. Data Analysis and reduction in Hanford's corrosion monitoring systems

    International Nuclear Information System (INIS)

    EDGEMON, G.L.

    1999-01-01

    A project to improve the Hanford Site's corrosion monitoring strategy was started in 1995. The project is designed to integrate EN-based corrosion monitoring into the site's corrosion monitoring strategy. In order to monitor multiple tanks, a major focus of this project has been to automate the data collection and analysis process. Data collection and analysis from the early EN corrosion monitoring equipment (241-AZ-101 and 241-AN-107) was primarily performed manually by a trained operator skilled in the analysis of EN data. Thousands of raw data files were collected, manually sorted and stored. Further statistical analysis of these files was performed by manually stripping out data from thousands of raw data files and calculating statistics in a spreadsheet format. Plotting and other graphical display analyses were performed by manually exporting data from the data files or spreadsheet into another plotting or presentation software package. In 1999, an Amulet/PRP system was procured and employed on the 241-AN-102 corrosion monitoring system. A duplicate system was purchased for use on the upcoming 241-AN-105 system. A third system has been procured and will eventually be used to upgrade the 241-AN-107 system. The Amulet software has greatly improved the automation of waste tank EN data analysis. In contrast with previous systems, the Amulet operator no longer has to manually collect, sort, store, and analyze thousands of raw EN data files. Amulet writes all data to a single database. Statistical analysis, uniform corrosion rate, and other derived parameters are automatically calculated in Amulet from the raw data while the raw data are being collected. Other improvements in plotting and presentation make inspection of the data a much quicker and relatively easy task. These and other improvements have greatly improved the speed at which EN data can be analyzed in addition to improving the quality of the final interpretation. The increase in data automation offered

  20. Earth Science Data Analysis in the Era of Big Data

    Science.gov (United States)

    Kuo, K.-S.; Clune, T. L.; Ramachandran, R.

    2014-01-01

    Anyone with even a cursory interest in information technology cannot help but recognize that "Big Data" is one of the most fashionable catchphrases of late. From accurate voice and facial recognition, language translation, and airfare prediction and comparison, to monitoring the real-time spread of flu, Big Data techniques have been applied to many seemingly intractable problems with spectacular successes. They appear to be a rewarding way to approach many currently unsolved problems. Few fields of research can claim a longer history with problems involving voluminous data than Earth science. The problems we are facing today with our Earth's future are more complex and carry potentially graver consequences than the examples given above. How has our climate changed? Besides natural variations, what is causing these changes? What are the processes involved and through what mechanisms are these connected? How will they impact life as we know it? In attempts to answer these questions, we have resorted to observations and numerical simulations with ever-finer resolutions, which continue to feed the "data deluge." Plausibly, many Earth scientists are wondering: How will Big Data technologies benefit Earth science research? As an example from the global water cycle, one subdomain among many in Earth science, how would these technologies accelerate the analysis of decades of global precipitation to ascertain the changes in its characteristics, to validate these changes in predictive climate models, and to infer the implications of these changes to ecosystems, economies, and public health? Earth science researchers need a viable way to harness the power of Big Data technologies to analyze large volumes and varieties of data with velocity and veracity. Beyond providing speedy data analysis capabilities, Big Data technologies can also play a crucial, albeit indirect, role in boosting scientific productivity by facilitating effective collaboration within an analysis environment

  1. Abnormal traffic flow data detection based on wavelet analysis

    Directory of Open Access Journals (Sweden)

    Xiao Qian

    2016-01-01

    Because traffic flow data are non-stationary, abnormal data are difficult to detect. This paper proposes a method for detecting abnormal traffic flow data based on wavelet analysis and the least squares method. First, wavelet analysis is used to separate the traffic flow data into high-frequency and low-frequency components; then, the least squares method is combined with this decomposition to find abnormal points in the reconstructed signal data. Simulation results show that detecting abnormal traffic flow data using wavelet analysis effectively reduces both the misjudgment rate and the false negative rate of the detection results.
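
    A rough sketch of the two-step idea, under assumed settings (a synthetic daily flow series, the 'db4' wavelet, a harmonic least-squares baseline, and a robust 3-sigma residual rule), might look as follows; it is not the authors' exact algorithm.

    ```python
    import numpy as np
    import pywt  # PyWavelets

    rng = np.random.default_rng(1)
    t = np.arange(288)                            # one day of 5-minute counts (assumed)
    flow = 200 + 150 * np.sin(2 * np.pi * t / 288) + rng.normal(0, 10, t.size)
    flow[100] += 120                              # injected anomaly

    # Step 1: wavelet separation; keep only the low-frequency content.
    coeffs = pywt.wavedec(flow, "db4", level=4)
    coeffs[1:] = [np.zeros_like(c) for c in coeffs[1:]]
    lowfreq = pywt.waverec(coeffs, "db4")[: flow.size]

    # Step 2: least-squares fit of a daily-cycle model to the low-frequency
    # part, then flag raw observations with large residuals.
    A = np.column_stack([np.ones(t.size),
                         np.sin(2 * np.pi * t / 288),
                         np.cos(2 * np.pi * t / 288)])
    beta, *_ = np.linalg.lstsq(A, lowfreq, rcond=None)
    residuals = flow - A @ beta
    med = np.median(residuals)
    sigma = np.median(np.abs(residuals - med)) / 0.6745   # robust scale estimate
    print("flagged indices:", np.flatnonzero(np.abs(residuals - med) > 3 * sigma))
    ```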

  2. Full second order chromatographic/spectrometric data matrices for automated sample identification and component analysis by non-data-reducing image analysis

    DEFF Research Database (Denmark)

    Nielsen, Niles-Peter Vest; Smedsgaard, Jørn; Frisvad, Jens Christian

    1999-01-01

    A data analysis method is proposed for identification and for confirmation of classification schemes, based on single- or multiple-wavelength chromatographic profiles. The proposed method works directly on the chromatographic data without data reduction procedures such as peak area or retention … classes from the reference chromatograms. This feature is a valuable aid in selecting components for further analysis. The identification method is demonstrated on two data sets: 212 isolates from 41 food-borne Penicillium species and 61 isolates from 6 soil-borne Penicillium species. Both data sets …

  3. A program for activation analysis data processing

    International Nuclear Information System (INIS)

    Janczyszyn, J.; Loska, L.; Taczanowski, S.

    1978-01-01

    An ALGOL program for activation analysis data handling is presented. The program may be used either for single channel spectrometry data or for multichannel spectrometry. The calculation of instrumental error and of analysis standard deviation is carried out. The outliers are tested, and the regression line diagram with the related observations are plotted by the program. (author)

  4. Functional data analysis of sleeping energy expenditure.

    Science.gov (United States)

    Lee, Jong Soo; Zakeri, Issa F; Butte, Nancy F

    2017-01-01

    Adequate sleep is crucial during childhood for metabolic health, and physical and cognitive development. Inadequate sleep can disrupt metabolic homeostasis and alter sleeping energy expenditure (SEE). Functional data analysis methods were applied to SEE data to elucidate the population structure of SEE and to discriminate SEE between obese and non-obese children. Minute-by-minute SEE in 109 children, ages 5-18, was measured in room respiration calorimeters. A smoothing spline method was applied to the calorimetric data to extract the true smoothing function for each subject. Functional principal component analysis was used to capture the important modes of variation of the functional data and to identify differences in SEE patterns. Combinations of functional principal component analysis and classifier algorithms were used to classify SEE. Smoothing effectively removed instrumentation noise inherent in the room calorimeter data, providing more accurate data for analysis of the dynamics of SEE. SEE exhibited declining but subtly undulating patterns throughout the night. Mean SEE was markedly higher in obese than non-obese children, as expected due to their greater body mass. SEE was higher among the obese than non-obese children (p < 0.1, after post hoc testing). Functional principal component scores for the first two components explained 77.8% of the variance in SEE and also differed between groups (p = 0.037). Logistic regression, support vector machine or random forest classification methods were able to distinguish weight-adjusted SEE between obese and non-obese participants with good classification rates (62-64%). Our results implicate other factors, yet to be uncovered, that affect the weight-adjusted SEE of obese and non-obese children. Functional data analysis revealed differences in the structure of SEE between obese and non-obese children that may contribute to disruption of metabolic homeostasis.
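
    A condensed sketch of the pipeline described above, on simulated curves, could look like the following: spline-smooth each minute-by-minute curve, reduce the curves to principal component scores, and classify the scores with logistic regression. The curve shapes, smoothing factor and component count are assumptions for illustration.

    ```python
    import numpy as np
    from scipy.interpolate import UnivariateSpline
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    minutes = np.arange(480.0)                 # an 8 h sleep period (assumed)
    n = 60
    labels = rng.integers(0, 2, n)             # 0 = non-obese, 1 = obese (simulated)
    # Simulated SEE curves: declining overnight, higher mean in the "obese" group.
    curves = (1.2 + 0.3 * labels[:, None]
              - 0.0005 * minutes
              + 0.05 * rng.normal(size=(n, minutes.size)))

    # Smoothing splines remove instrumentation noise from each curve;
    # s is matched to the assumed noise level (480 * 0.05**2).
    smoothed = np.array([UnivariateSpline(minutes, c, s=1.2)(minutes) for c in curves])

    # Principal component scores summarize each curve in a few numbers ...
    scores = PCA(n_components=2).fit_transform(smoothed)
    # ... and feed an ordinary classifier.
    clf = LogisticRegression().fit(scores, labels)
    print(f"training accuracy: {clf.score(scores, labels):.2f}")
    ```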

  5. Visual modeling in an analysis of multidimensional data

    Science.gov (United States)

    Zakharova, A. A.; Vekhter, E. V.; Shklyar, A. V.; Pak, A. J.

    2018-01-01

    The article proposes an approach to solving visualization problems and to the subsequent analysis of multidimensional data. Requirements on the properties of visual models created to solve analysis problems are described. The active use of subjective-perception factors and dynamic visualization is suggested as a promising direction for developing visual analysis tools for multidimensional and voluminous data. Practical results of solving a multidimensional data analysis problem are shown using the example of a visual model of empirical data on the current state of research into producing silicon carbide by an electric arc method. Solving this problem yielded several results: first, an idea of the possibilities for determining a development strategy for the domain; second, an assessment of the reliability of the published data on this subject; and third, a view of how the areas of attention of researchers have changed over time.

  6. Data management and statistical analysis for environmental assessment

    International Nuclear Information System (INIS)

    Wendelberger, J.R.; McVittie, T.I.

    1995-01-01

    Data management and statistical analysis for environmental assessment are important issues on the interface of computer science and statistics. Data collection for environmental decision making can generate large quantities of various types of data. A database/GIS system is described which provides efficient data storage as well as visualization tools that may be integrated into the data analysis process. FIMAD is a living database and GIS system. The system has changed and developed over time to meet the needs of the Los Alamos National Laboratory Restoration Program. The system provides a repository for data which may be accessed by different individuals for different purposes. The database structure is driven by the large amount and varied types of data required for environmental assessment. The integration of the database with the GIS system provides the foundation for powerful visualization and analysis capabilities

  7. Method for statistical data analysis of multivariate observations

    CERN Document Server

    Gnanadesikan, R

    1997-01-01

    A practical guide for multivariate statistical techniques-- now updated and revised In recent years, innovations in computer technology and statistical methodologies have dramatically altered the landscape of multivariate data analysis. This new edition of Methods for Statistical Data Analysis of Multivariate Observations explores current multivariate concepts and techniques while retaining the same practical focus of its predecessor. It integrates methods and data-based interpretations relevant to multivariate analysis in a way that addresses real-world problems arising in many areas of inte

  8. ADAS: Atomic data, modelling and analysis for fusion

    International Nuclear Information System (INIS)

    Summers, H. P.; O'Mullane, M. G.; Whiteford, A. D.; Badnell, N. R.; Loch, S. D.

    2007-01-01

    The Atomic Data and Analysis Structure, ADAS, comprises extensive fundamental and derived atomic data collections, interactive codes for the manipulation and generation of collisional-radiative data and models, off-line codes for large scale fundamental atomic data production and codes for diagnostic analysis in the fusion and astrophysical environments. ADAS data are organized according to precise specifications, tuned to application and are assigned to numbered ADAS data formats. Some of these formats contain very large quantities of data and some have achieved wide-scale adoption in the fusion community. The paper focuses on recent extensions of ADAS designed to orient ADAS to the needs of ITER. The issue of heavy atomic species, expected to be present as ITER wall and divertor materials, dopants or control species, will be addressed with a view to the economized handling of the emission and ionisation state data needed for diagnostic spectral analysis. Charge exchange and beam emission spectroscopic capabilities and developments in ADAS will be reviewed from an ITER perspective and in the context of a shared analysis between fusion laboratories. Finally an overview and summary of current large scale fundamental data production in the framework of the ADAS project will be given and its intended availability in both fusion and astrophysics noted

  9. Status of MTP Data Analysis for TCSP

    Science.gov (United States)

    Mahoney, Michael J.

    2006-01-01

    Topics covered include: a) MTP temperature calibration and data analysis; b) Background for interpreting MTP data; c) Large amplitude temperature structure; d) Gravity waves (GWs) in MTP data; and e) Subsidence over hurricanes.

  10. A data skimming service for locally resident analysis data

    International Nuclear Information System (INIS)

    Cranshaw, J; Gieraltowski, J; Malon, D; May, E; Gardner, R W; Mambelli, M

    2008-01-01

    A Data Skimming Service (DSS) is a site-level service for rapid event filtering and selection from locally resident datasets based on metadata queries to associated 'tag' databases. In US ATLAS, we expect most if not all of the AOD-based datasets to be replicated to each of the five Tier 2 regional facilities in the US Tier 1 'cloud' coordinated by Brookhaven National Laboratory. Entire datasets will consist of on the order of several terabytes of data, and providing easy, quick access to skimmed subsets of these data will be vital to physics working groups. Typically, physicists will be interested in portions of the complete datasets, selected according to event-level attributes (number of jets, missing Et, etc) and content (specific analysis objects for subsequent processing). In this paper we describe methods used to classify data (metadata tag generation) and to store these results in a local database. Next we discuss a general framework which includes methods for accessing this information, defining skims, specifying event output content, accessing locally available storage through a variety of interfaces (SRM, dCache/dccp, gridftp), accessing remote storage elements as specified, and user job submission tools through local or grid schedulers. The advantages of the DSS are the ability to quickly 'browse' datasets and design skims, for example, pre-adjusting cuts to get to a desired skim level with minimal use of compute resources, and to encode these analysis operations in a database for re-analysis and archival purposes. Additionally the framework has provisions to operate autonomously in the event that external, central resources are not available, and to provide, as a reduced package, a minimal skimming service tailored to the needs of small Tier 3 centres or individual users
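
    The skim-definition idea reduces to a declarative query over an event-level tag database. The toy sketch below, with an invented schema and cut values, shows the pattern: select (file, offset) pairs by event attributes before touching any bulk data.

    ```python
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE tags (
        run INTEGER, event INTEGER, n_jets INTEGER,
        missing_et REAL, file TEXT, offset INTEGER)""")
    con.executemany(
        "INSERT INTO tags VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 1, 2, 35.0, "aod_001.root", 0),
         (1, 2, 4, 82.5, "aod_001.root", 1),
         (2, 7, 3, 64.1, "aod_002.root", 0)])

    # Define a skim by event-level attributes, the way a physicist would
    # "pre-adjust cuts" before launching any compute-heavy processing.
    skim = con.execute(
        "SELECT file, offset FROM tags WHERE n_jets >= 3 AND missing_et > 50"
    ).fetchall()
    print(skim)   # -> [('aod_001.root', 1), ('aod_002.root', 0)]
    ```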

  11. Hierarchical modeling and analysis for spatial data

    CERN Document Server

    Banerjee, Sudipto; Gelfand, Alan E

    2003-01-01

    Among the many uses of hierarchical modeling, their application to the statistical analysis of spatial and spatio-temporal data from areas such as epidemiology and environmental science has proven particularly fruitful. Yet to date, the few books that address the subject have been either too narrowly focused on specific aspects of spatial analysis, or written at a level often inaccessible to those lacking a strong background in mathematical statistics.Hierarchical Modeling and Analysis for Spatial Data is the first accessible, self-contained treatment of hierarchical methods, modeling, and dat

  12. Conducting Qualitative Data Analysis: Managing Dynamic Tensions within

    Science.gov (United States)

    Chenail, Ronald J.

    2012-01-01

    In the third of a series of "how-to" essays on conducting qualitative data analysis, Ron Chenail examines the dynamic tensions within the process of qualitative data analysis that qualitative researchers must manage in order to produce credible and creative results. These tensions include (a) the qualities of the data and the qualitative data…

  13. The Analysis of Polyploid Genetic Data.

    Science.gov (United States)

    Meirmans, Patrick G; Liu, Shenglin; van Tienderen, Peter H

    2018-03-16

    Though polyploidy is an important aspect of the evolutionary genetics of both plants and animals, the development of population genetic theory of polyploids has seriously lagged behind that of diploids. This is unfortunate since the analysis of polyploid genetic data, and the interpretation of the results, requires even more scrutiny than with diploid data. This is because of several polyploidy-specific complications in segregation and genotyping such as tetrasomy, double reduction, and missing dosage information. Here, we review the theoretical and statistical aspects of the population genetics of polyploids. We discuss several widely used types of inferences, including genetic diversity, Hardy-Weinberg equilibrium, population differentiation, genetic distance, and detecting population structure. For each, we point out how the statistical approach, expected result, and interpretation differ between different ploidy levels. We also discuss for each type of inference what biases may arise from the polyploid-specific complications and how these biases can be overcome. From our overview, it is clear that the statistical toolbox that is available for the analysis of genetic data is flexible and still expanding. Modern sequencing techniques will soon be able to overcome some of the current limitations to the analysis of polyploid data, though the techniques are lagging behind those available for diploids. Furthermore, the availability of more data may aggravate the biases that can arise, and increase the risk of false inferences. Therefore, simulations such as we used throughout this review are an important tool to verify the results of analyses of polyploid genetic data.
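
    One of the polyploid-specific points above, tetrasomic segregation, can be made concrete with a few lines of arithmetic: under tetrasomic Hardy-Weinberg equilibrium (ignoring double reduction), genotype frequencies follow a fourth-power expansion of the allele frequencies. The allele frequency below is an arbitrary example value.

    ```python
    from math import comb

    p = 0.3                       # frequency of allele A at a tetraploid locus
    q = 1.0 - p
    for i in range(5):            # i copies of A in a tetrasomic genotype
        freq = comb(4, i) * p**i * q**(4 - i)
        print(f"A{i}a{4 - i}: {freq:.4f}")

    # Expected heterozygosity: any genotype carrying both alleles.
    print(f"H_exp = {1.0 - p**4 - q**4:.4f}")
    ```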

  14. Statistical Analysis of Research Data | Center for Cancer Research

    Science.gov (United States)

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. The Statistical Analysis of Research Data (SARD) course will be held on April 5-6, 2018 from 9 a.m.-5 p.m. at the National Institutes of Health's Natcher Conference Center, Balcony C on the Bethesda Campus. SARD is designed to provide an overview of the general principles of statistical analysis of research data. The first day will feature univariate data analysis, including descriptive statistics, probability distributions, and one- and two-sample inferential statistics.

  15. Mining survey data for SWOT analysis

    OpenAIRE

    Phadermrod, Boonyarat

    2016-01-01

    Strengths, Weaknesses, Opportunities and Threats (SWOT) analysis is one of the most important tools for strategic planning. The traditional method of conducting SWOT analysis does not prioritize and is likely to hold subjective views that may result in an improper strategic action. Accordingly, this research exploits Importance-Performance Analysis (IPA), a technique for measuring customers’ satisfaction based on survey data, to systematically generate prioritized SWOT factors based on custom...
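
    A schematic sketch of the core idea, reading SWOT-like categories off an Importance-Performance grid, is shown below. The items, scores, quadrant cut-offs (grand means) and the quadrant-to-SWOT mapping are all illustrative assumptions, not the thesis' exact scheme.

    ```python
    import numpy as np

    items = ["delivery speed", "price", "support", "brand awareness"]
    importance = np.array([4.6, 4.2, 3.1, 2.8])     # mean survey ratings (1-5 scale)
    performance = np.array([3.0, 4.5, 4.1, 2.2])

    imp_cut, perf_cut = importance.mean(), performance.mean()
    for item, imp, perf in zip(items, importance, performance):
        if imp >= imp_cut:
            # High-importance items map naturally onto internal S/W factors.
            label = "Strength" if perf >= perf_cut else "Weakness"
        else:
            # Standard IPA names for the low-importance quadrants.
            label = "Possible overkill" if perf >= perf_cut else "Low priority"
        print(f"{item:>15}: importance={imp:.1f} performance={perf:.1f} -> {label}")
    ```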

  16. Data Analysis for the LISA Pathfinder Mission

    Science.gov (United States)

    Thorpe, James Ira

    2009-01-01

    The LTP (LISA Technology Package) is the core part of the Laser Interferometer Space Antenna (LISA) Pathfinder mission. The main goal of the mission is to study the sources of any disturbances that perturb the motion of the freely-falling test masses from their geodesic trajectories as well as to test various technologies needed for LISA. The LTP experiment is designed as a sequence of experimental runs in which the performance of the instrument is studied and characterized under different operating conditions. In order to best optimize subsequent experimental runs, each run must be promptly analysed to ensure that the following ones make best use of the available knowledge of the instrument. In order to do this, all analyses must be designed and tested in advance of the mission and have sufficient built-in flexibility to account for unexpected results or behaviour. To support this activity, a robust and flexible data analysis software package is also required. This poster presents two of the main components that make up the data analysis effort: the data analysis software and the mock-data challenges used to validate analysis procedures and experiment designs.

  17. Matrix-based introduction to multivariate data analysis

    CERN Document Server

    Adachi, Kohei

    2016-01-01

    This book enables readers who may not be familiar with matrices to understand a variety of multivariate analysis procedures in matrix forms. Another feature of the book is that it emphasizes what model underlies a procedure and what objective function is optimized for fitting the model to data. The author believes that the matrix-based learning of such models and objective functions is the fastest way to comprehend multivariate data analysis. The text is arranged so that readers can intuitively capture the purposes for which multivariate analysis procedures are utilized: plain explanations of the purposes with numerical examples precede mathematical descriptions in almost every chapter. This volume is appropriate for undergraduate students who already have studied introductory statistics. Graduate students and researchers who are not familiar with matrix-intensive formulations of multivariate data analysis will also find the book useful, as it is based on modern matrix formulations with a special emphasis on ...

  18. A Probabilistic Analysis of Data Popularity in ATLAS Data Caching

    CERN Document Server

    Titov, M; The ATLAS collaboration; Záruba, G; De, K

    2012-01-01

    Efficient distribution of physics data over ATLAS grid sites is one of the most important tasks for user data processing. ATLAS' initial static data distribution model over-replicated some unpopular data and under-replicated popular data, creating heavy disk space loads while underutilizing some processing resources due to low data availability. Thus, a new data distribution mechanism was implemented, PD2P (PanDA Dynamic Data Placement) within the production and distributed analysis system PanDA that dynamically reacts to user data needs, basing dataset distribution principally on user demand. Data deletion is also demand driven, reducing replica counts for unpopular data. This dynamic model has led to substantial improvements in efficient utilization of storage and processing resources. Based on this experience, in this work we seek to further improve data placement policy by investigating in detail how data popularity is calculated. For this it is necessary to precisely define what data popularity means, wh...
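
    Stripped of all the production machinery, demand-driven placement of this kind can be caricatured in a few lines: track accesses per dataset, add replicas for popular data, trim replicas for unpopular data. All thresholds and counts below are invented for illustration.

    ```python
    from collections import Counter

    accesses = Counter({"dsA": 120, "dsB": 3, "dsC": 45})   # recent access counts
    replicas = {"dsA": 2, "dsB": 3, "dsC": 1}               # current replica counts
    ADD_AT, TRIM_AT, MAX_R, MIN_R = 100, 10, 5, 1           # hypothetical policy

    for ds, hits in accesses.items():
        if hits >= ADD_AT and replicas[ds] < MAX_R:
            replicas[ds] += 1          # popular: replicate toward more sites
        elif hits < TRIM_AT and replicas[ds] > MIN_R:
            replicas[ds] -= 1          # unpopular: reclaim disk space
    print(replicas)                     # -> {'dsA': 3, 'dsB': 2, 'dsC': 1}
    ```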

  19. Data-driven security analysis, visualization and dashboards

    CERN Document Server

    Jacobs, Jay

    2014-01-01

    Uncover hidden patterns of data and respond with countermeasures. Security professionals need all the tools at their disposal to increase their visibility in order to prevent security breaches and attacks. This careful guide explores two of the most powerful: data analysis and visualization. You'll soon understand how to harness and wield data, from collection and storage to management and analysis as well as visualization and presentation. Using a hands-on approach with real-world examples, this book shows you how to gather feedback, measure the effectiveness of your security methods, and ma

  20. Data structures and algorithm analysis in C++

    CERN Document Server

    Shaffer, Clifford A

    2011-01-01

    With its focus on creating efficient data structures and algorithms, this comprehensive text helps readers understand how to select or design the tools that will best solve specific problems. It uses Microsoft C++ as the programming language and is suitable for second-year data structure courses and computer science courses in algorithm analysis.Techniques for representing data are presented within the context of assessing costs and benefits, promoting an understanding of the principles of algorithm analysis and the effects of a chosen physical medium. The text also explores tradeoff issues, f

  1. Data structures and algorithm analysis in Java

    CERN Document Server

    Shaffer, Clifford A

    2011-01-01

    With its focus on creating efficient data structures and algorithms, this comprehensive text helps readers understand how to select or design the tools that will best solve specific problems. It uses Java as the programming language and is suitable for second-year data structure courses and computer science courses in algorithm analysis. Techniques for representing data are presented within the context of assessing costs and benefits, promoting an understanding of the principles of algorithm analysis and the effects of a chosen physical medium. The text also explores tradeoff issues, familiari

  2. Simulation and analysis of plutonium reprocessing plant data

    International Nuclear Information System (INIS)

    Burr, T.; Coulter, A.; Wangen, L.

    1996-01-01

    It will be difficult for large-throughput reprocessing plants to meet International Atomic Energy Agency (IAEA) detection goals for protracted diversion of plutonium by materials accounting alone. Therefore, the IAEA is considering supplementing traditional material balance analysis with analysis of solution monitoring data (frequent snapshots of such solution parameters as level, density, and temperature for all major process vessels). Analysis of solution monitoring data will enhance safeguards by improving anomaly detection and resolution, maintaining continuity of knowledge, and validating and improving measurement error models. However, there are costs associated with accessing and analyzing the data. To minimize these costs, analysis methods should be as complete as possible, simple to implement, and require little human effort. As a step toward that goal, the authors have implemented simple analysis methods for use in an off-line situation. These methods use solution level to recognize major tank activities, such as tank-to-tank transfers and sampling. In this paper, the authors describe their application to realistic simulated data (the methods were developed by using both real and simulated data), and they present some quantifiable benefits of solution monitoring
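
    In the spirit of the level-based activity recognition described above, the simplified sketch below flags a sustained solution-level change beyond a noise threshold as a transfer. The simulated level trace and thresholds are invented for illustration.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    hours = np.arange(0, 240)                          # 10 days of hourly snapshots
    level = 500.0 + rng.normal(0, 0.05, hours.size)    # cm; static tank plus noise
    level[100:120] -= np.linspace(0, 30, 20)           # simulated transfer out
    level[120:] -= 30

    rate = np.diff(level)                              # cm per hour
    threshold = 5 * 0.05                               # well above sensor noise
    in_transfer = np.abs(rate) > threshold
    edges = np.diff(in_transfer.astype(int))
    starts = np.flatnonzero(edges == 1) + 1
    stops = np.flatnonzero(edges == -1) + 1
    print("transfer intervals (h):", list(zip(starts, stops)))
    ```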

  3. Approaches to data analysis of multiple-choice questions

    OpenAIRE

    Lin Ding; Robert Beichner

    2009-01-01

    This paper introduces five commonly used approaches to analyzing multiple-choice test data. They are classical test theory, factor analysis, cluster analysis, item response theory, and model analysis. Brief descriptions of the goals and algorithms of these approaches are provided, together with examples illustrating their applications in physics education research. We minimize mathematics, instead placing emphasis on data interpretation using these approaches.

  4. High-Level Overview of Data Needs for RE Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Lopez, Anthony

    2016-12-22

    This presentation provides a high-level overview of analysis topics and associated data needs. Types of renewable energy analysis are grouped into two buckets: first, analysis of renewable energy potential; and second, analysis for other goals. Data requirements are similar, and they build upon one another.

  5. R data analysis without programming

    CERN Document Server

    Gerbing, David W

    2013-01-01

    This book prepares readers to analyze data and interpret statistical results using R more quickly than other texts. R is a challenging program to learn because code must be created to get started. To alleviate that challenge, Professor Gerbing developed lessR. LessR extensions remove the need to program. By introducing R through lessR, readers learn how to organize data for analysis, read the data into R, and produce output without performing numerous functions and programming exercises first. With lessR, readers can select the necessary procedure and change the relevant variables without pro

  6. Probabilistic Principal Component Analysis for Metabolomic Data.

    LENUS (Irish Health Repository)

    Nyamundanda, Gift

    2010-11-23

    Background: Data from metabolomic studies are typically complex and high-dimensional. Principal component analysis (PCA) is currently the most widely used statistical technique for analyzing metabolomic data. However, PCA is limited by the fact that it is not based on a statistical model. Results: Here, probabilistic principal component analysis (PPCA), which addresses some of the limitations of PCA, is reviewed and extended. A novel extension of PPCA, called probabilistic principal component and covariates analysis (PPCCA), is introduced which provides a flexible approach to jointly model metabolomic data and additional covariate information. The use of a mixture of PPCA models for discovering the number of inherent groups in metabolomic data is demonstrated. The jackknife technique is employed to construct confidence intervals for estimated model parameters throughout. The optimal number of principal components is determined through the use of the Bayesian Information Criterion model selection tool, which is modified to address the high dimensionality of the data. Conclusions: The methods presented are illustrated through an application to metabolomic data sets. Jointly modeling metabolomic data and covariates was successfully achieved and has the potential to provide deeper insight into the underlying data structure. Examination of confidence intervals for the model parameters, such as loadings, allows for principled and clear interpretation of the underlying data structure. A software package called MetabolAnalyze, freely available through the R statistical software, has been developed to facilitate implementation of the presented methods in the metabolomics field.
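
    For readers who want to experiment with the model-based flavour of PCA discussed here, scikit-learn's PCA fits the probabilistic PCA likelihood, so its score() method can drive model selection. The sketch below selects the number of components with a naive BIC whose parameter count is a deliberate simplification of the paper's modified criterion; the data are simulated, and the MetabolAnalyze R package itself is not used here.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(4)
    n, d, latent = 50, 20, 3                      # samples, metabolites, true dimension
    W = rng.normal(size=(d, latent))
    X = rng.normal(size=(n, latent)) @ W.T + 0.3 * rng.normal(size=(n, d))

    best = None
    for q in range(1, 8):
        ll = PCA(n_components=q).fit(X).score(X) * n   # total PPCA log-likelihood
        k = d * q - q * (q - 1) / 2 + d + 1            # naive PPCA parameter count
        bic = -2 * ll + k * np.log(n)
        if best is None or bic < best[0]:
            best = (bic, q)
    print("selected number of components:", best[1])
    ```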

  7. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-25

    Since the human genome draft sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human genome variations: 1) HapMap Data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) Data (940 individuals) (http://www.hagsc.org/hgdp/files.html), 3) 1000 Genomes Data (2,504 individuals) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). If we can integrate all three into a single volume of data, we should be able to conduct a more detailed analysis of human genome variations for a total of 4,861 individuals (= 1,417 + 940 + 2,504). In fact, we successfully integrated these three data sets by using information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that could not be identified by analysing each of the three data sets separately. Here, we report the outcome of this kind of big data analysis and discuss the evolutionary significance of human genomic variations. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.

  8. Licensing Support System: Preliminary data scope analysis

    International Nuclear Information System (INIS)

    1989-01-01

    The purpose of this analysis is to determine the content and scope of the Licensing Support System (LSS) data base. Both user needs and currently available data bases that, at least in part, address those needs have been analyzed. This analysis, together with the Preliminary Needs Analysis (DOE, 1988d) is a first effort under the LSS Design and Implementation Contract toward developing a sound requirements foundation for subsequent design work. These reports are preliminary. Further refinements must be made before requirements can be specified in sufficient detail to provide a basis for suitably specific system specifications. This document provides a baseline for what is known at this time. Additional analyses, currently being conducted, will provide more precise information on the content and scope of the LSS data base. 23 refs., 4 figs., 8 tabs

  9. Complex Visual Data Analysis, Uncertainty, and Representation

    National Research Council Canada - National Science Library

    Schunn, Christian D; Saner, Lelyn D; Kirschenbaum, Susan K; Trafton, J. G; Littleton, Eliza B

    2007-01-01

    ... (weather forecasting, submarine target motion analysis, and fMRI data analysis). Internal spatial representations are coded from spontaneous gestures made during cued-recall summaries of problem solving activities...

  10. The Role of Microsimulation in Longitudinal Data Analysis

    Directory of Open Access Journals (Sweden)

    Douglas A. Wolf

    2001-12-01

    microsimulation also has the potential to contribute to longitudinal data analysis in several ways, including extending the range of outputs generated by a model, addressing several defective-data problems, and serving as a vehicle for missing-data imputation. This paper discusses microsimulation procedures suitable for several commonly-used statistical models applied to longitudinal data. It also addresses the unique role that can be played by microsimulation in longitudinal data analysis, and the problem of accounting for the several sources of variability associated with microsimulation procedures.

  11. Utilization of Integrated Process Control, Data Capture, and Data Analysis in Construction of Accelerator Systems

    International Nuclear Information System (INIS)

    Bonnie Madre; Charles Reece; Joseph Ozelis; Valerie Bookwalter

    2003-01-01

    Jefferson Lab has developed a web-based system that integrates commercial database, data analysis, document archiving and retrieval, and user interface software, into a coherent knowledge management product (Pansophy). This product provides important tools for the successful pursuit of major projects such as accelerator system development and construction, by offering elements of process and procedure control, data capture and review, and data mining and analysis. After a period of initial development, Pansophy is now being used in Jefferson Lab's SNS superconducting linac construction effort, as a means for structuring and implementing the QA program, for process control and tracking, and for cryomodule test data capture and presentation/analysis. Development of Pansophy is continuing, in particular data queries and analysis functions that are the cornerstone of its utility

  12. Estimating Most Productive Scale Size in Data Envelopment Analysis with Integer Value Data

    Science.gov (United States)

    Dwi Sari, Yunita; Angria S, Layla; Efendi, Syahril; Zarlis, Muhammad

    2018-01-01

    The most productive scale size (MPSS) is a measure of how resources should be organized and utilized to achieve optimal results, and it can serve as a benchmark for the success of an industry or company in producing goods or services. To estimate MPSS, each decision making unit (DMU) must attend to its level of input-output efficiency; with the data envelopment analysis (DEA) method, a DMU can identify the units used as references, which helps to find the causes of and solutions to inefficiencies and to optimize productivity, the main advantage in managerial applications. Therefore, DEA is chosen here for estimating MPSS, focusing on integer-valued input data with the CCR model and the BCC model. The purpose of this research is to find the best solution for estimating MPSS with integer-valued input data in the DEA method.
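
    The input-oriented CCR envelopment model the paper builds on is a small linear program per DMU. The sketch below solves it with SciPy for an invented input-output table; note that the paper's integer-valued inputs would add integrality constraints that this plain LP relaxation ignores.

    ```python
    import numpy as np
    from scipy.optimize import linprog

    X = np.array([[2.0, 4.0], [3.0, 2.0], [4.0, 5.0]])   # inputs: one row per DMU
    Y = np.array([[1.0], [1.0], [1.0]])                  # a single unit output

    def ccr_efficiency(o):
        """Input-oriented CCR efficiency of DMU o; variables are [theta, lambdas]."""
        n, m = X.shape
        c = np.zeros(n + 1)
        c[0] = 1.0                                        # minimise theta
        # Inputs:  sum_j lambda_j x_ij - theta * x_io <= 0   (one row per input i)
        A_in = np.hstack([-X[o].reshape(m, 1), X.T])
        # Outputs: -sum_j lambda_j y_rj <= -y_ro             (one row per output r)
        A_out = np.hstack([np.zeros((Y.shape[1], 1)), -Y.T])
        res = linprog(c,
                      A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.concatenate([np.zeros(m), -Y[o]]),
                      bounds=[(None, None)] + [(0, None)] * n)
        return res.x[0]

    for o in range(X.shape[0]):
        print(f"DMU {o}: theta = {ccr_efficiency(o):.3f}")
    ```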

  13. RCRA groundwater data analysis protocol for the Hanford Site, Washington

    International Nuclear Information System (INIS)

    Chou, C.J.; Jackson, R.L.

    1992-04-01

    The Resource Conservation and Recovery Act of 1976 (RCRA) groundwater monitoring program currently involves site-specific monitoring of 20 facilities on the Hanford Site in southeastern Washington. The RCRA groundwater monitoring program has collected abundant data on groundwater quality. These data are used to assess the impact of a facility on groundwater quality or whether remediation efforts under RCRA corrective action programs are effective. Both evaluations rely on statistical analysis of groundwater monitoring data. The need for information on groundwater quality by regulators and environmental managers makes statistical analysis of monitoring data an important part of RCRA groundwater monitoring programs. The complexity of groundwater monitoring programs and variabilities (spatial, temporal, and analytical) exhibited in groundwater quality variables indicate the need for a data analysis protocol to guide statistical analysis. A data analysis protocol was developed from the perspective of addressing regulatory requirements, data quality, and management information needs. This data analysis protocol contains four elements: data handling methods; graphical evaluation techniques; statistical tests for trend, central tendency, and excursion analysis; and reporting procedures for presenting results to users

  14. DARHT Multi-intelligence Seismic and Acoustic Data Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Stevens, Garrison Nicole [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Van Buren, Kendra Lu [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Hemez, Francois M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-07-21

    The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein is obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected external of the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two stage method, first utilizing signatures in the frequency domain to identify outliers and second extracting short duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets are combined to generate more information than available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where
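
    The report's event-detection route, fitting an autoregressive time-series model and screening its residuals, can be caricatured as follows. The AR order, threshold and synthetic channel are illustrative, and the exogenous inputs of a full ARX model are omitted.

    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    n, order = 2000, 8
    x = rng.normal(0, 1, n)
    for i in range(2, n):                       # coloured background noise
        x[i] += 0.6 * x[i - 1] - 0.2 * x[i - 2]
    x[1200:1205] += 8.0                         # injected short-duration "event"

    # Least-squares AR(order) fit: predict x[t] from x[t-1] ... x[t-order].
    A = np.column_stack([x[order - k - 1 : n - k - 1] for k in range(order)])
    a, *_ = np.linalg.lstsq(A, x[order:], rcond=None)
    resid = x[order:] - A @ a

    # Flag samples whose one-step residual exceeds a robust 5-sigma threshold.
    sigma = np.median(np.abs(resid)) / 0.6745
    events = np.flatnonzero(np.abs(resid) > 5 * sigma)
    print("candidate event samples:", events + order)
    ```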

  15. Network analysis for the visualization and analysis of qualitative data.

    Science.gov (United States)

    Pokorny, Jennifer J; Norman, Alex; Zanesco, Anthony P; Bauer-Wu, Susan; Sahdra, Baljinder K; Saron, Clifford D

    2018-03-01

    We present a novel manner in which to visualize the coding of qualitative data that enables representation and analysis of connections between codes using graph theory and network analysis. Network graphs are created from codes applied to a transcript or audio file using the code names and their chronological location. The resulting network is a representation of the coding data that characterizes the interrelations of codes. This approach enables quantification of qualitative codes using network analysis and facilitates examination of associations of network indices with other quantitative variables using common statistical procedures. Here, as a proof of concept, we applied this method to a set of interview transcripts that had been coded in 2 different ways and the resultant network graphs were examined. The creation of network graphs allows researchers an opportunity to view and share their qualitative data in an innovative way that may provide new insights and enhance transparency of the analytical process by which they reach their conclusions. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
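
    A minimal sketch of the construction described above: turn a chronologically ordered sequence of qualitative codes into a weighted graph whose edges connect consecutively applied codes, then compute standard network indices with networkx. The code sequence is a made-up example.

    ```python
    import networkx as nx

    coded_transcript = ["stress", "coping", "family", "coping",
                        "work", "stress", "coping"]

    G = nx.Graph()
    for a, b in zip(coded_transcript, coded_transcript[1:]):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1       # repeated transitions strengthen ties
        else:
            G.add_edge(a, b, weight=1)

    print("degree centrality:", nx.degree_centrality(G))
    print("density:", nx.density(G))
    ```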

  16. Integrative sparse principal component analysis of gene expression data.

    Science.gov (United States)

    Liu, Mengque; Fan, Xinyan; Fang, Kuangnan; Zhang, Qingzhao; Ma, Shuangge

    2017-12-01

    In the analysis of gene expression data, dimension reduction techniques have been extensively adopted. The most popular one is perhaps the PCA (principal component analysis). To generate more reliable and more interpretable results, the SPCA (sparse PCA) technique has been developed. With the "small sample size, high dimensionality" characteristic of gene expression data, the analysis results generated from a single dataset are often unsatisfactory. Under contexts other than dimension reduction, integrative analysis techniques, which jointly analyze the raw data of multiple independent datasets, have been developed and shown to outperform "classic" meta-analysis and other multidatasets techniques and single-dataset analysis. In this study, we conduct integrative analysis by developing the iSPCA (integrative SPCA) method. iSPCA achieves the selection and estimation of sparse loadings using a group penalty. To take advantage of the similarity across datasets and generate more accurate results, we further impose contrasted penalties. Different penalties are proposed to accommodate different data conditions. Extensive simulations show that iSPCA outperforms the alternatives under a wide spectrum of settings. The analysis of breast cancer and pancreatic cancer data further shows iSPCA's satisfactory performance. © 2017 WILEY PERIODICALS, INC.
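
    As a baseline for the sparse building block that iSPCA extends, scikit-learn's SparsePCA drives loadings to exactly zero via an L1 penalty. The group and contrasted penalties that make the method integrative across datasets are beyond this sketch; the data are simulated so that only the first five variables carry signal.

    ```python
    import numpy as np
    from sklearn.decomposition import SparsePCA

    rng = np.random.default_rng(6)
    n, d = 40, 30                    # small n, larger d, as in expression data
    loading = np.r_[np.ones(5), np.zeros(d - 5)]
    X = rng.normal(size=(n, 1)) @ loading[None, :] + 0.5 * rng.normal(size=(n, d))

    spca = SparsePCA(n_components=1, alpha=1.0, random_state=0).fit(X)
    print("nonzero loadings:", np.flatnonzero(spca.components_[0]))
    ```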

  17. Time series analysis methods and applications for flight data

    CERN Document Server

    Zhang, Jianye

    2017-01-01

    This book focuses on different facets of flight data analysis, including the basic goals, methods, and implementation techniques. As mass flight data possesses the typical characteristics of time series, the time series analysis methods and their application for flight data have been illustrated from several aspects, such as data filtering, data extension, feature optimization, similarity search, trend monitoring, fault diagnosis, and parameter prediction, etc. An intelligent information-processing platform for flight data has been established to assist in aircraft condition monitoring, training evaluation and scientific maintenance. The book will serve as a reference resource for people working in aviation management and maintenance, as well as researchers and engineers in the fields of data analysis and data mining.

  18. Big data analysis new algorithms for a new society

    CERN Document Server

    Stefanowski, Jerzy

    2016-01-01

    This edited volume is devoted to Big Data Analysis from a Machine Learning standpoint as presented by some of the most eminent researchers in this area. It demonstrates that Big Data Analysis opens up new research problems which were either never considered before, or were only considered within a limited range. In addition to providing methodological discussions on the principles of mining Big Data and the difference between traditional statistical data analysis and newer computing frameworks, this book presents recently developed algorithms affecting such areas as business, financial forecasting, human mobility, the Internet of Things, information networks, bioinformatics, medical systems and life science. It explores, through a number of specific examples, how the study of Big Data Analysis has evolved and how it has started and will most likely continue to affect society. While the benefits brought upon by Big Data Analysis are underlined, the book also discusses some of the warnings that have been issued...

  19. Bayesian networks for omics data analysis

    NARCIS (Netherlands)

    Gavai, A.K.

    2009-01-01

    This thesis focuses on two aspects of high throughput technologies, i.e. data storage and data analysis, in particular in transcriptomics and metabolomics. Both technologies are part of a research field that is generally called ‘omics’ (or ‘-omics’, with a leading hyphen), which refers to genomics,

  20. Analyzing how we do Analysis and Consume Data, Results from the SciDAC-Data Project

    Science.gov (United States)

    Ding, P.; Aliaga, L.; Mubarak, M.; Tsaris, A.; Norman, A.; Lyon, A.; Ross, R.

    2017-10-01

    One of the main goals of the Dept. of Energy funded SciDAC-Data project is to analyze the more than 410,000 high energy physics datasets that have been collected, generated and defined over the past two decades by experiments using the Fermilab storage facilities. These datasets have been used as the input to over 5.6 million recorded analysis projects, for which detailed analytics have been gathered. The analytics and meta information for these datasets and analysis projects are being combined with knowledge of their part of the HEP analysis chains for major experiments to understand how modern computing and data delivery is being used. We present the first results of this project, which examine in detail how the CDF, D0, NOvA, MINERvA and MicroBooNE experiments have organized, classified and consumed petascale datasets to produce their physics results. The results include analysis of the correlations in dataset/file overlap, data usage patterns, data popularity, dataset dependency and temporary dataset consumption. The results provide critical insight into how workflows and data delivery schemes can be combined with different caching strategies to more efficiently perform the work required to mine these large HEP data volumes and to understand the physics analysis requirements for the next generation of HEP computing facilities. In particular we present a detailed analysis of the NOvA data organization and consumption model corresponding to their first and second oscillation results (2014-2016) and the first look at the analysis of the Tevatron Run II experiments. We present statistical distributions for the characterization of these data and data driven models describing their consumption.

  1. Approaches to data analysis of multiple-choice questions

    Directory of Open Access Journals (Sweden)

    Lin Ding

    2009-09-01

    This paper introduces five commonly used approaches to analyzing multiple-choice test data. They are classical test theory, factor analysis, cluster analysis, item response theory, and model analysis. Brief descriptions of the goals and algorithms of these approaches are provided, together with examples illustrating their applications in physics education research. We minimize mathematics, instead placing emphasis on data interpretation using these approaches.

  2. Toward improved analysis of concentration data: Embracing nondetects.

    Science.gov (United States)

    Shoari, Niloofar; Dubé, Jean-Sébastien

    2018-03-01

    Various statistical tests on concentration data serve to support decision-making regarding characterization and monitoring of contaminated media, assessing exposure to a chemical, and quantifying the associated risks. However, the routine statistical protocols cannot be directly applied because of challenges arising from nondetects or left-censored observations, which are concentration measurements below the detection limit of measuring instruments. Despite the existence of techniques based on survival analysis that can adjust for nondetects, these are seldom taken into account properly. A comprehensive review of the literature showed that managing policies regarding analysis of censored data do not always agree and that guidance from regulatory agencies may be outdated. Therefore, researchers and practitioners commonly resort to the most convenient way of tackling the censored data problem by substituting nondetects with arbitrary constants prior to data analysis, although this is generally regarded as a bias-prone approach. Hoping to improve the interpretation of concentration data, the present article aims to familiarize researchers in different disciplines with the significance of left-censored observations and provides theoretical and computational recommendations (under both frequentist and Bayesian frameworks) for adequate analysis of censored data. In particular, the present article synthesizes key findings from previous research with respect to 3 noteworthy aspects of inferential statistics: estimation of descriptive statistics, hypothesis testing, and regression analysis. Environ Toxicol Chem 2018;37:643-656. © 2017 SETAC. © 2017 SETAC.
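
    One recommended alternative to substitution can be shown in a few lines: maximum-likelihood estimation of a lognormal model in which each nondetect contributes P(X < DL) to the likelihood rather than an arbitrary constant. The concentrations and detection limit below are invented for illustration.

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    conc = np.array([0.5, 0.9, 1.7, 2.3, 4.1, 6.0])   # detected values
    n_nd, dl = 4, 0.4                                 # 4 nondetects below the DL

    def neg_loglik(theta):
        mu, log_sigma = theta
        sigma = np.exp(log_sigma)
        # Detected values contribute the density of their logs;
        # nondetects contribute the probability mass below the DL.
        ll_det = norm.logpdf(np.log(conc), mu, sigma).sum()
        ll_cens = n_nd * norm.logcdf((np.log(dl) - mu) / sigma)
        return -(ll_det + ll_cens)

    fit = minimize(neg_loglik, x0=[0.0, 0.0])
    mu, sigma = fit.x[0], np.exp(fit.x[1])
    print(f"lognormal mean estimate: {np.exp(mu + sigma**2 / 2):.2f}")
    ```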

  3. Data analysis and approximate models model choice, location-scale, analysis of variance, nonparametric regression and image analysis

    CERN Document Server

    Davies, Patrick Laurie

    2014-01-01

    Introduction: Approximate Models; Notation; Two Modes of Statistical Analysis; Towards One Mode of Analysis; Approximation, Randomness, Chaos, Determinism. Approximation: A Concept of Approximation; Approximating a Data Set by a Model; Approximation Regions; Functionals and Equivariance; Regularization and Optimality; Metrics and Discrepancies; Strong and Weak Topologies; On Being (almost) Honest; Simulations and Tables; Degree of Approximation and p-values; Scales; Stability of Analysis; The Choice of En(α, P); Independence; Procedures, Approximation and Vagueness. Discrete Models: The Empirical Density; Metrics and Discrepancies; The Total Variation Metric; The Kullback-Leibler and Chi-Squared Discrepancies; The Po(λ) Model; The b(k, p) and nb(k, p) Models; The Flying Bomb Data; The Student Study Times Data; Outliers. Outliers: Outliers, Data Analysis and Models; Breakdown Points and Equivariance; Identifying Outliers and Breakdown; Outliers in Multivariate Data; Outliers in Linear Regression; Outliers in Structured Data; The Location...

  4. Analysis of the real EADGENE data set:

    DEFF Research Database (Denmark)

    Jaffrézic, Florence; de Koning, Dirk-Jan; Boettcher, Paul J

    2007-01-01

    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical...... methods for the detection of differentially expressed genes in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE funded microarray study looking at the gene expression changes following artificial infection with two...... quarters. Very little transcriptional variation was observed for the bacteria S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised...

  5. Data analysis of event tape and connection

    International Nuclear Information System (INIS)

    Gong Huili

    1995-01-01

    The data analysis on the VAX-11/780 computer is briefly described; the data come from the recorded event tapes of the JUHU data acquisition system on the PDP-11/44 computer. The connection of the recorded event tapes with the XSYS data acquisition system on the VAX computer is also introduced.

  6. Multiregion analysis of creep rupture data of 316 stainless steel

    International Nuclear Information System (INIS)

    Maruyama, Kouichi; Armaki, Hassan Ghassemi; Yoshimi, Kyosuke

    2007-01-01

    A creep rupture data set of 316 stainless steel containing 319 data points from nine heats was subjected to a conventional single-region analysis and a multiregion analysis. In the former, the conventional Larson-Miller analysis was applied to the whole data set. In the latter, the data set of a single heat is divided into several data sets, so that the Orr-Sherby-Dorn (OSD) constant Q takes a unique value in each data set, and the conventional OSD analysis was applied to each divided data set. A region with a low value of Q appears in long-term creep of eight heats. Predicted values of the 10^5 h creep rupture stress of three heats were lower than the 99% confidence limit evaluated by the single-region analysis, suggesting that the single-region analysis is error prone. The multiregion analysis is necessary for the correct evaluation of the long-term creep properties of 316 stainless steel
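
    For readers unfamiliar with the single-region approach being criticized, the sketch below computes the standard Larson-Miller parameter, LMP = T(C + log10 t_r), and fits one master curve to all data before extrapolating to 10^5 h. The rupture data and the constant C = 20 are illustrative, not taken from the paper.

        import numpy as np

        # Hypothetical rupture data: temperature T (K), rupture time t_r (h),
        # and applied stress (MPa).
        T  = np.array([873, 873, 923, 923, 973, 973], dtype=float)
        tr = np.array([1e3, 1e4, 5e2, 5e3, 1e2, 1e3])
        stress = np.array([220, 180, 200, 150, 170, 130], dtype=float)

        C = 20.0                                   # conventional Larson-Miller constant
        lmp = T * (C + np.log10(tr))               # Larson-Miller parameter

        # Single-region analysis: one master curve log(stress) vs. LMP for all data.
        coef = np.polyfit(lmp, np.log10(stress), 2)

        # Extrapolate: stress giving a 1e5 h rupture life at 873 K.
        target = 873.0 * (C + 5.0)
        print("predicted 1e5 h rupture stress:",
              10 ** np.polyval(coef, target), "MPa")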

  7. Remote Sensing Data Visualization, Fusion and Analysis via Giovanni

    Science.gov (United States)

    Leptoukh, G.; Zubko, V.; Gopalan, A.; Khayat, M.

    2007-01-01

    We describe Giovanni, the NASA Goddard-developed online visualization and analysis tool that allows users to explore various phenomena without learning remote sensing data formats or downloading voluminous data. Using MODIS aerosol data as an example, we formulate an approach to data fusion for Giovanni to further enrich online multi-sensor remote sensing data comparison and analysis.

  8. Python for data analysis data wrangling with Pandas, NumPy, and IPython

    CERN Document Server

    McKinney, Wes

    2017-01-01

    Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub. Use the IPython shell and Jupyter notebook for exploratory computing Learn basic and advanced features in NumPy (Numerical Python) Get started with data analysis tools in the pandas library Use flexible tools to load, clean, transform, merge, and reshape data Create informative visualizations with matplotlib ...
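
    As a flavor of the load/clean/merge/reshape workflow the book teaches, here is a self-contained pandas snippet; the tables and column names are invented for the example.

        import numpy as np
        import pandas as pd

        # Illustrative data: measurements with gaps, plus a lookup table to merge in.
        df = pd.DataFrame({
            "site": ["A", "A", "B", "B"],
            "day":  [1, 2, 1, 2],
            "value": [1.2, np.nan, 0.7, 0.9],
        })
        sites = pd.DataFrame({"site": ["A", "B"], "region": ["north", "south"]})

        clean = df.dropna(subset=["value"])            # drop missing measurements
        merged = clean.merge(sites, on="site")         # attach site metadata
        wide = merged.pivot(index="day", columns="site", values="value")  # reshape
        print(merged.groupby("region")["value"].mean())
        print(wide)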

  9. Data Mining and Analysis

    Science.gov (United States)

    Samms, Kevin O.

    2015-01-01

    The Data Mining project seeks to bring the capability of data visualization to NASA anomaly and problem reporting systems for the purpose of improving data trending, evaluations, and analyses. Currently NASA systems are tailored to meet the specific needs of its organizations. This tailoring has led to a variety of nomenclatures and levels of annotation for procedures, parts, and anomalies, making it difficult to recognize the common causes of anomalies. Making significant observations, and realizing the connections between these causes, without a common way to view large data sets is difficult to impossible. In the first phase of the Data Mining project, a portal was created to present a common visualization of normalized sensitive data to customers with the appropriate security access. The visualization tool itself was also developed and fine-tuned. In the second phase of the project, we took on the difficult task of searching and analyzing the target data set for common causes between anomalies. In the final part of the second phase, we learned more about how much of the analysis work will be the job of the Data Mining team, how to perform that work, and how that work may be used by different customers in different ways. In this paper I detail how our perspective has changed after gaining more insight into how the customers wish to interact with the output, and how that has changed the product.

  10. Data analysis and interpretation for environmental surveillance

    International Nuclear Information System (INIS)

    1992-06-01

    The Data Analysis and Interpretation for Environmental Surveillance Conference was held in Lexington, Kentucky, February 5--7, 1990. The conference was sponsored by what is now the Office of Environmental Compliance and Documentation, Oak Ridge National Laboratory. Participants included technical professionals from all Martin Marietta Energy Systems facilities, Westinghouse Materials Company of Ohio, Pacific Northwest Laboratory, and several technical support contractors. Presentations at the conference covered the full spectrum of issues that affect the analysis and interpretation of environmental data. Topics included tracking systems for samples and schedules associated with ongoing programs; coalescing data from a variety of sources and pedigrees into integrated data bases; methods for evaluating the quality of environmental data through empirical estimates of parameters such as charge balance, pH, and specific conductance; statistical applications to the interpretation of environmental information; and uses of environmental information in risk and dose assessments. Hearing about and discussing this wide variety of topics provided an opportunity to capture the subtlety of each discipline and to appreciate the continuity that is required among the disciplines in order to perform high-quality environmental information analysis.

  11. Using influence diagrams for data worth analysis

    International Nuclear Information System (INIS)

    Sharif Heger, A.; White, Janis E.

    1997-01-01

    Decision-making under uncertainty describes most environmental remediation and waste management problems. Inherent limitations in knowledge concerning contaminants, environmental fate and transport, remedies, and risks force decision-makers to select a course of action based on uncertain and incomplete information. Because uncertainties can be reduced by collecting additional data, uncertainty and sensitivity analysis techniques have received considerable attention. When costs associated with reducing uncertainty are considered in a decision problem, the objective changes; rather than determine what data to collect to reduce overall uncertainty, the goal is to determine what data to collect to best differentiate between possible courses of action or decision alternatives. Environmental restoration and waste management requires cost-effective methods for characterization and monitoring, and these methods must also satisfy regulatory requirements. Characterization and monitoring activities imply that, sooner or later, a decision must be made about collecting new field data. Limited fiscal resources for data collection should be committed only to those data that have the most impact on the decision at lowest possible cost. Applying influence diagrams in combination with data worth analysis produces a method which not only satisfies these requirements but also gives rise to an intuitive representation of complex structures not possible in the more traditional decision tree representation. This paper demonstrates the use of influence diagrams in data worth analysis by applying them to a monitor-and-treat problem often encountered in environmental decision problems

  12. Data Processing and Analysis Systems for JT-60U

    International Nuclear Information System (INIS)

    Matsuda, T.; Totsuka, T.; Tsugita, T.; Oshima, T.; Sakata, S.; Sato, M.; Iwasaki, K.

    2002-01-01

    The JT-60U data processing system is a large computer complex gradually modernized by utilizing progressive computer and network technology. A main computer using state-of-the-art CMOS technology can handle ∼550 MB of data per discharge. A gigabit ethernet switch with FDDI ports has been introduced to cope with the increase of handling data. Workstation systems with VMEbus serial highway drivers for CAMAC have been developed and used to replace many minicomputer systems. VMEbus-based fast data acquisition systems have also been developed to enlarge and replace a minicomputer system for mass data. The JT-60U data analysis system is composed of a JT-60U database server and a JT-60U analysis server, which are distributed UNIX servers. The experimental database is stored in the 1 TB RAID disk of the JT-60U database server and is composed of ZENKEI and diagnostic databases. Various data analysis tools are available on the JT-60U analysis server. For the remote collaboration, technical features of the data analysis system have been applied to the computer system to access JT-60U data via the Internet. Remote participation in JT-60U experiments has been successfully conducted since 1996

  13. System for the analysis of cohort mortality data

    International Nuclear Information System (INIS)

    McLain, R.; Frome, E.L.

    1986-01-01

    A system is developed for the analysis of cohort mortality data. This Mortality Analysis System (MAS) is designed as a research tool in epidemiologic studies. The system allows a researcher to investigate the effect of one or more factors on the mortality of a study cohort. Variables can be categorized as factors to allow for stratification in the analysis. DATA steps and PROC MATRIX are incorporated in the system to produce the output. Person-years, observed deaths, and expected deaths are calculated and cross-classified by the levels of the factors. The resulting data set can be used to compute the standardized mortality ratios (SMR) for each stratum level. Poisson regression models can then be used for further statistical analysis
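
    MAS itself is built on SAS (DATA steps and PROC MATRIX). As a rough modern rendering of the same computation, the sketch below derives stratum-level SMRs (observed/expected deaths) and fits a Poisson regression with log(expected) as an offset, using statsmodels; the counts and the exposure indicator are invented.

        import numpy as np
        import statsmodels.api as sm

        # Hypothetical stratified cohort data: observed deaths, expected deaths
        # (reference rates times person-years), and an exposure indicator.
        observed = np.array([12, 30, 8, 18])
        expected = np.array([10.0, 24.0, 9.5, 14.0])
        exposed  = np.array([0, 1, 0, 1])

        print("SMR per stratum:", observed / expected)

        # Poisson regression of observed counts with log(expected) as offset;
        # exp(coefficient) estimates the SMR ratio between exposure groups.
        X = sm.add_constant(exposed.astype(float))
        fit = sm.GLM(observed, X, family=sm.families.Poisson(),
                     offset=np.log(expected)).fit()
        print(np.exp(fit.params))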

  14. Parallel interactive data analysis with PROOF

    International Nuclear Information System (INIS)

    Ballintijn, Maarten; Biskup, Marek; Brun, Rene; Canal, Philippe; Feichtinger, Derek; Ganis, Gerardo; Kickinger, Guenter; Peters, Andreas; Rademakers, Fons

    2006-01-01

    The Parallel ROOT Facility, PROOF, enables the analysis of much larger data sets on a shorter time scale. It exploits the inherent parallelism in data of uncorrelated events via a multi-tier architecture that optimizes I/O and CPU utilization in heterogeneous clusters with distributed storage. The system provides transparent and interactive access to gigabytes of data today. Being part of the ROOT framework, PROOF inherits the benefits of a performant object storage system and a wealth of statistical and visualization tools. This paper describes the data analysis model of ROOT and the latest developments on closer integration of PROOF into that model and the ROOT user environment, e.g. support for PROOF-based browsing of trees stored remotely, and the popular TTree::Draw() interface. We also outline the ongoing developments aimed at improving the flexibility and user-friendliness of the system
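
    The TTree::Draw() interface mentioned above is easy to illustrate with plain ROOT from Python (PROOF itself additionally needs a cluster session). This assumes a local ROOT installation with PyROOT; the file, tree, and branch names here are hypothetical.

        import ROOT  # PyROOT bindings, assuming a local ROOT installation

        # Hypothetical file and tree names; TTree::Draw() fills a histogram
        # directly from a branch expression with an optional selection cut.
        f = ROOT.TFile.Open("events.root")
        tree = f.Get("events")
        tree.Draw("pt >> hpt(100, 0, 50)", "abs(eta) < 2.5")
        hist = ROOT.gDirectory.Get("hpt")
        print("entries passing cut:", hist.GetEntries())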

  15. Advances in Moessbauer data analysis

    International Nuclear Information System (INIS)

    Souza, Paulo A. de

    1998-01-01

    The whole Moessbauer community has generated a huge amount of data in several fields of human knowledge since the first publication of Rudolf Moessbauer. Interlaboratory measurements of the same substance may result in minor differences in the Moessbauer Parameters (MP) of isomer shift, quadrupole splitting and internal magnetic field. Therefore, a conventional data bank of published MP will be of limited help in the identification of substances. A data bank search for exact information is incapable of differentiating values of Moessbauer parameters within the experimental errors (e.g., IS = 0.22 mm/s from IS = 0.23 mm/s), although physically both values may be considered the same. An artificial neural network (ANN) is able to identify a substance and its crystalline structure from measured MP, and slight variations in the parameters do not represent an obstacle for the ANN identification. A barrier to the popularization of Moessbauer spectroscopy as an analytical technique is the absence of fully automated equipment, since the analysis of a Moessbauer spectrum is normally time-consuming and requires a specialist. In this work, the fitting process of a Moessbauer spectrum was completely automated through the use of genetic algorithms and fuzzy logic. Both software and hardware systems were implemented, resulting in a fully automated Moessbauer data analysis system. The developed system will be presented
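
    The paper's automation relies on genetic algorithms and fuzzy logic; the underlying fitting task it automates can be sketched more simply. Below, a synthetic transmission spectrum containing a quadrupole doublet (two Lorentzian dips) is fitted by nonlinear least squares; all parameter values are illustrative and the model is deliberately minimal.

        import numpy as np
        from scipy.optimize import curve_fit

        def doublet(v, base, depth, center, split, width):
            # Transmission spectrum of a quadrupole doublet: two Lorentzian
            # dips at center +/- split/2 (isomer shift, quadrupole splitting).
            lor = lambda v0: depth / (1 + ((v - v0) / (width / 2)) ** 2)
            return base - lor(center - split / 2) - lor(center + split / 2)

        v = np.linspace(-4, 4, 256)                  # velocity axis (mm/s)
        rng = np.random.default_rng(1)
        y = doublet(v, 1.0, 0.08, 0.22, 0.60, 0.25) + rng.normal(0, 0.002, v.size)

        popt, _ = curve_fit(doublet, v, y, p0=(1.0, 0.05, 0.0, 0.5, 0.3))
        print("isomer shift %.3f mm/s, quadrupole splitting %.3f mm/s"
              % (popt[2], popt[3]))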

  16. PATTER, Pattern Recognition Data Analysis

    International Nuclear Information System (INIS)

    Cox, L.C. Jr.; Bender, C.F.

    1986-01-01

    1 - Description of program or function: PATTER is an interactive program with extensive facilities for modeling analytical processes and solving complex data analysis problems using statistical methods, spectral analysis, and pattern recognition techniques. PATTER addresses the type of problem generally stated as follows: given a set of objects and a list of measurements made on these objects, is it possible to find or predict a property of the objects which is not directly measurable but is known to be related to the measurements through some unknown relationship? When employed intelligently, PATTER will act upon a data set in such a way that it becomes apparent whether useful information, beyond that already discerned, is contained in the data. 2 - Method of solution: In order to solve the general problem, PATTER contains preprocessing techniques to produce new variables that are related to the values of the measurements, which may reduce the number of variables and/or reveal useful information about the 'obscure' property; display techniques to represent the variable space in some way that can be easily projected onto a two- or three-dimensional plot for human observation, to see if any significant clustering of points occurs; and learning techniques, based on both unsupervised and supervised methods, to extract as much information from the data as possible so that the optimum solution can be found
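
    PATTER's own implementation is not reproduced here, but its pipeline (preprocessing, two-dimensional display, unsupervised learning) maps directly onto modern tools. A hedged sketch of the same steps with scikit-learn on synthetic data:

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.cluster import KMeans

        # Synthetic "objects x measurements" table with two hidden classes.
        rng = np.random.default_rng(2)
        X = np.vstack([rng.normal(0, 1, (50, 6)), rng.normal(3, 1, (50, 6))])

        Xs = StandardScaler().fit_transform(X)          # preprocessing: autoscaling
        coords = PCA(n_components=2).fit_transform(Xs)  # display: project to 2D
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xs)

        # If the 2D projection separates the clusters, the data likely carry
        # information beyond what the individual variables show.
        for k in (0, 1):
            print(f"cluster {k}: mean 2D position {coords[labels == k].mean(axis=0)}")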

  17. Data analysis facility at LAMPF

    International Nuclear Information System (INIS)

    Perry, D.G.; Amann, J.F.; Butler, H.S.; Hoffman, C.J.; Mischke, R.E.; Shera, E.B.; Thiessen, H.A.

    1977-11-01

    This report documents the discussions and conclusions of a study held in July 1977 to develop the requirements for a data analysis facility to support the experimental program in medium-energy physics at the Clinton P. Anderson Meson Physics Facility (LAMPF). 2 tables

  18. An evaluation of Oracle for persistent data storage and analysis of LHC physics data

    International Nuclear Information System (INIS)

    Grancher, E.; Marczukajtis, M.

    2001-01-01

    CERN's IT/DB group is currently exploring the possibility of using Oracle to store LHC physics data. This paper presents preliminary results from this work, concentrating on two aspects: the storage of RAW data and the analysis of TAG data. The RAW data part of the study discusses the throughput that one can achieve with the Oracle database system, the options for storing the data, and an estimation of the associated overheads. The TAG data analysis focuses on the use of new and extended indexing features of Oracle to perform efficient cuts on the data. The tests were performed with Oracle 8.1.7

  19. Microrheology with optical tweezers: data analysis

    International Nuclear Information System (INIS)

    Tassieri, Manlio; Warren, Rebecca L; Cooper, Jonathan M; Evans, R M L; Bailey, Nicholas J

    2012-01-01

    We present a data analysis procedure that provides the solution to a long-standing issue in microrheology studies, i.e. the evaluation of the fluids' linear viscoelastic properties from the analysis of a finite set of experimental data, describing (for instance) the time-dependent mean-square displacement of suspended probe particles experiencing Brownian fluctuations. We report, for the first time in the literature, the linear viscoelastic response of an optically trapped bead suspended in a Newtonian fluid, over the entire range of experimentally accessible frequencies. The general validity of the proposed method makes it transferable to the majority of microrheology and rheology techniques. (paper)

  20. Game data analysis tools and methods

    CERN Document Server

    Coupart, Thibault

    2013-01-01

    This book features an introduction to the basic theoretical tenets of data analysis from a game developer's point of view, as well as a practical guide to performing gameplay analysis on a real-world game.This book is ideal for video game developers who want to try and experiment with the game analytics approach for their own productions. It will provide a good overview of the themes you need to pay attention to, and will pave the way for success. Furthermore, the book also provides a wide range of concrete examples that will be useful for any game data analysts or scientists who want to impro

  1. [Preliminarily application of content analysis to qualitative nursing data].

    Science.gov (United States)

    Liang, Shu-Yuan; Chuang, Yeu-Hui; Wu, Shu-Fang

    2012-10-01

    Content analysis is a methodology for objectively and systematically studying the content of communication in various formats. Content analysis in nursing research and nursing education is called qualitative content analysis. Qualitative content analysis is frequently applied to nursing research, as it allows researchers to determine categories inductively and deductively. This article examines qualitative content analysis in nursing research from theoretical and practical perspectives. We first describe how content analysis concepts such as unit of analysis, meaning unit, code, category, and theme are used. Next, we describe the basic steps involved in using content analysis, including data preparation, data familiarization, analysis unit identification, creating tentative coding categories, category refinement, and establishing category integrity. Finally, this paper introduces the concept of content analysis rigor, including dependability, confirmability, credibility, and transferability. This article elucidates the content analysis method in order to help professionals conduct systematic research that generates data that are informative and useful in practical application.

  2. Application of Ontology Technology in Health Statistic Data Analysis.

    Science.gov (United States)

    Guo, Minjiang; Hu, Hongpu; Lei, Xingyun

    2017-01-01

    Research purpose: to establish a health management ontology for the analysis of health statistic data. Proposed methods: this paper established a health management ontology based on an analysis of the concepts in the China Health Statistics Yearbook and used Protégé to define the syntactic and semantic structure of health statistical data. Six classes of top-level ontology concepts and their subclasses were extracted, and object properties and data properties were defined to establish the construction of these classes. By ontology instantiation, we can integrate multi-source heterogeneous data and enable administrators to have an overall understanding and analysis of the health statistic data. Ontology technology provides a comprehensive and unified information integration structure for the health management domain and lays a foundation for the efficient analysis of multi-source and heterogeneous health system management data and enhancement of management efficiency.

  3. Design database for quantitative trait loci (QTL) data warehouse, data mining, and meta-analysis.

    Science.gov (United States)

    Hu, Zhi-Liang; Reecy, James M; Wu, Xiao-Lin

    2012-01-01

    A database can be used to warehouse quantitative trait loci (QTL) data from multiple sources for comparison, genomic data mining, and meta-analysis. A robust database design involves sound data structure logistics, meaningful data transformations, normalization, and proper user interface designs. This chapter starts with a brief review of relational database basics and concentrates on issues associated with curation of QTL data into a relational database, with emphasis on the principles of data normalization and structure optimization. In addition, some simple examples of QTL data mining and meta-analysis are included. These examples are provided to help readers better understand the potential and importance of sound database design.
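
    A minimal illustration of the normalization principle discussed in the chapter: QTL records reference a trait table by key rather than repeating trait names, so curated data stay consistent and joins drive the meta-analysis queries. The schema and values below are invented for the example, not the actual QTLdb schema.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE trait (trait_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
            CREATE TABLE qtl (
                qtl_id     INTEGER PRIMARY KEY,
                trait_id   INTEGER REFERENCES trait(trait_id),
                chromosome TEXT, start_cm REAL, end_cm REAL
            );
        """)
        con.execute("INSERT INTO trait VALUES (1, 'backfat thickness')")
        con.execute("INSERT INTO qtl VALUES (1, 1, '7', 52.0, 61.5)")

        # Meta-analysis style query: all mapped intervals per trait.
        for row in con.execute("""
                SELECT t.name, q.chromosome, q.start_cm, q.end_cm
                FROM qtl q JOIN trait t USING (trait_id)"""):
            print(row)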

  4. Genetic data analysis for plant and animal breeding

    Science.gov (United States)

    This book is an advanced textbook covering the application of quantitative genetics theory to analysis of actual data (both trait and DNA marker information) for breeding populations of crops, trees, and animals. Chapter 1 is an introduction to basic software used for trait data analysis. Chapter 2 ...

  5. Statistical analysis of network data with R

    CERN Document Server

    Kolaczyk, Eric D

    2014-01-01

    Networks have permeated everyday life through realities like the Internet, social networks, and viral marketing. As such, network analysis is an important growth area in the quantitative sciences, with roots in social network analysis going back to the 1930s and graph theory going back centuries. Measurement and analysis are integral components of network research. As a result, statistical methods play a critical role in network analysis. This book is the first of its kind in network research. It can be used as a stand-alone resource in which multiple R packages are used to illustrate how to conduct a wide range of network analyses, from basic manipulation and visualization, to summary and characterization, to modeling of network data. The central package is igraph, which provides extensive capabilities for studying network graphs in R. This text builds on Eric D. Kolaczyk’s book Statistical Analysis of Network Data (Springer, 2009).
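
    The book's examples are in R with igraph; as a Python analogue, the same basic manipulation and characterization steps look like this with networkx (using the Zachary karate club graph bundled with the library).

        import networkx as nx

        # Zachary's karate club: a classic sociogram shipped with networkx.
        G = nx.karate_club_graph()

        print("nodes:", G.number_of_nodes(), "edges:", G.number_of_edges())
        print("average clustering:", nx.average_clustering(G))

        # Basic characterization: the five most central actors by degree.
        deg = nx.degree_centrality(G)
        top = sorted(deg, key=deg.get, reverse=True)[:5]
        print("most central nodes:", top)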

  6. Challenges of Big Data Analysis.

    Science.gov (United States)

    Fan, Jianqing; Han, Fang; Liu, Han

    2014-06-01

    Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require a new computational and statistical paradigm. This article gives an overview of the salient features of Big Data and of how these features force changes in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. Such violations can lead to wrong statistical inferences and consequently wrong scientific conclusions.
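
    The spurious-correlation challenge is simple to demonstrate: with a fixed sample size, the largest absolute sample correlation between a response and a growing set of completely irrelevant features keeps increasing. A small numpy experiment (sizes chosen purely for illustration):

        import numpy as np

        rng = np.random.default_rng(3)
        n = 100                                  # sample size
        y = rng.normal(size=n)                   # response, independent of everything

        for p in (10, 1_000, 100_000):           # number of irrelevant features
            X = rng.normal(size=(n, p))
            # Correlation of y with each column; all true correlations are zero.
            Xc = (X - X.mean(0)) / X.std(0)
            yc = (y - y.mean()) / y.std()
            r = Xc.T @ yc / n
            print(f"p={p:>7}: max |sample correlation| = {np.abs(r).max():.2f}")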

  7. MAGMA: generalized gene-set analysis of GWAS data.

    Science.gov (United States)

    de Leeuw, Christiaan A; Mooij, Joris M; Heskes, Tom; Posthuma, Danielle

    2015-04-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical power for most methods is strongly affected by linkage disequilibrium between markers, multi-marker associations are often hard to detect, and the reliance on permutation to compute p-values tends to make the analysis computationally very expensive. To address these issues we have developed MAGMA, a novel tool for gene and gene-set analysis. The gene analysis is based on a multiple regression model, to provide better statistical performance. The gene-set analysis is built as a separate layer around the gene analysis for additional flexibility. This gene-set analysis also uses a regression structure to allow generalization to analysis of continuous properties of genes and simultaneous analysis of multiple gene sets and other gene properties. Simulations and an analysis of Crohn's Disease data are used to evaluate the performance of MAGMA and to compare it to a number of other gene and gene-set analysis tools. The results show that MAGMA has significantly more power than other tools for both the gene and the gene-set analysis, identifying more genes and gene sets associated with Crohn's Disease while maintaining a correct type 1 error rate. Moreover, the MAGMA analysis of the Crohn's Disease data was found to be considerably faster as well.
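
    The regression idea behind MAGMA's competitive gene-set test can be caricatured in a few lines: regress gene-level association scores on a set-membership indicator plus gene-level covariates. The sketch below is a toy illustration of that idea, not MAGMA itself; all data are simulated.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(4)
        n_genes = 2000
        in_set = rng.random(n_genes) < 0.05          # gene-set membership indicator
        gene_size = rng.lognormal(1, 0.5, n_genes)   # confounder, e.g. gene length

        # Gene-level association Z-scores; genes in the set get a small shift.
        z = rng.normal(size=n_genes) + 0.3 * in_set

        # Competitive test: does membership predict gene scores beyond covariates?
        X = sm.add_constant(np.column_stack([in_set.astype(float),
                                             np.log(gene_size)]))
        fit = sm.OLS(z, X).fit()
        print("set effect: %.3f (p=%.2g)" % (fit.params[1], fit.pvalues[1]))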

  8. SECIMTools: a suite of metabolomics data analysis tools.

    Science.gov (United States)

    Kirpich, Alexander S; Ibarra, Miguel; Moskalenko, Oleksandr; Fear, Justin M; Gerken, Joseph; Mi, Xinlei; Ashrafi, Ali; Morse, Alison M; McIntyre, Lauren M

    2018-04-20

    Metabolomics has the promise to transform the area of personalized medicine with the rapid development of high throughput technology for untargeted analysis of metabolites. Open access, easy to use, analytic tools that are broadly accessible to the biological community need to be developed. While technology used in metabolomics varies, most metabolomics studies have a set of features identified. Galaxy is an open access platform that enables scientists at all levels to interact with big data. Galaxy promotes reproducibility by saving histories and enabling the sharing workflows among scientists. SECIMTools (SouthEast Center for Integrated Metabolomics) is a set of Python applications that are available both as standalone tools and wrapped for use in Galaxy. The suite includes a comprehensive set of quality control metrics (retention time window evaluation and various peak evaluation tools), visualization techniques (hierarchical cluster heatmap, principal component analysis, modular modularity clustering), basic statistical analysis methods (partial least squares - discriminant analysis, analysis of variance, t-test, Kruskal-Wallis non-parametric test), advanced classification methods (random forest, support vector machines), and advanced variable selection tools (least absolute shrinkage and selection operator LASSO and Elastic Net). SECIMTools leverages the Galaxy platform and enables integrated workflows for metabolomics data analysis made from building blocks designed for easy use and interpretability. Standard data formats and a set of utilities allow arbitrary linkages between tools to encourage novel workflow designs. The Galaxy framework enables future data integration for metabolomics studies with other omics data.
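
    As a flavor of one building block from the suite's variable-selection tools, here is LASSO feature selection sketched directly with scikit-learn rather than through the Galaxy wrappers; the data matrix is synthetic.

        import numpy as np
        from sklearn.linear_model import LassoCV

        # Synthetic metabolomics-like matrix: 60 samples x 200 features, where
        # only the first three features carry signal about the response.
        rng = np.random.default_rng(5)
        X = rng.normal(size=(60, 200))
        y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.5, 60)

        lasso = LassoCV(cv=5).fit(X, y)        # penalty chosen by cross-validation
        selected = np.flatnonzero(lasso.coef_)
        print("selected features:", selected)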

  9. Big climate data analysis

    Science.gov (United States)

    Mudelsee, Manfred

    2015-04-01

    The Big Data era has begun in the climate sciences too, not only in economics or molecular biology. We measure climate at increasing spatial resolution by means of satellites and look farther back in time at increasing temporal resolution by means of natural archives and proxy data. We use powerful supercomputers to run climate models. The model output of the calculations made for the IPCC's Fifth Assessment Report amounts to ~650 TB. The 'scientific evolution' of grid computing has started, and the 'scientific revolution' of quantum computing is being prepared. This will increase computing power, and the amount of data, by several orders of magnitude in the future. However, more data does not automatically mean more knowledge. We need statisticians, who are at the core of transforming data into knowledge. Statisticians notably also explore the limits of our knowledge (uncertainties, that is, confidence intervals and P-values). Mudelsee (2014, Climate Time Series Analysis: Classical Statistical and Bootstrap Methods, second edition, Springer, Cham, xxxii + 454 pp.) coined the term 'optimal estimation'. Consider the hyperspace of climate estimation. It has many, but not infinite, dimensions. It consists of the three subspaces Monte Carlo design, method and measure. The Monte Carlo design describes the data generating process. The method subspace describes the estimation and confidence interval construction. The measure subspace describes how to detect the optimal estimation method for the Monte Carlo experiment. The envisaged large increase in computing power may bring the following idea of optimal climate estimation into existence. Given a data sample, some prior information (e.g. measurement standard errors) and a set of questions (parameters to be estimated), the first task is simple: perform an initial estimation on the basis of existing knowledge and experience with such types of estimation problems. The second task requires the computing power: explore the hyperspace to

  10. Database tools for enhanced analysis of TMX-U data

    International Nuclear Information System (INIS)

    Stewart, M.E.; Carter, M.R.; Casper, T.A.; Meyer, W.H.; Perkins, D.E.; Whitney, D.M.

    1986-01-01

    A commercial database software package has been used to create several databases and tools that assist and enhance the ability of experimental physicists to analyze data from the Tandem Mirror Experiment-Upgrade (TMX-U) experiment. This software runs on a DEC-20 computer in M-Division's User Service Center at Lawrence Livermore National Laboratory (LLNL), where data can be analyzed off line from the main TMX-U acquisition computers. When combined with interactive data analysis programs, these tools provide the capability to do batch-style processing or interactive data analysis on the computers in the USC or the supercomputers of the National Magnetic Fusion Energy Computer Center (NMFECC) in addition to the normal processing done by the TMX-U acquisition system. One database tool provides highly reduced data for searching and correlation analysis of several diagnostic signals within a single shot or over many shots. A second database tool provides retrieval and storage of unreduced data for use in detailed analysis of one or more diagnostic signals. We will show how these database tools form the core of an evolving off-line data analysis environment on the USC computers

  12. Analysis of facility-monitoring data

    Energy Technology Data Exchange (ETDEWEB)

    Howell, J.A.

    1996-09-01

    This paper discusses techniques for analysis of data collected from nuclear-safeguards facility-monitoring systems. These methods can process information gathered from sensors and make interpretations that are in the best interests of the facility or agency, thereby enhancing safeguards while shortening inspection time.

  13. Handbook on data envelopment analysis

    CERN Document Server

    Cooper, William W; Zhu, Joe

    2011-01-01

    Focusing on extensively used Data Envelopment Analysis topics, this volume aims to both describe the state of the field and extend the frontier of DEA research. New chapters include DEA models for DMUs, network DEA, models for supply chain operations and applications, and new developments.

  14. Big data analysis for smart farming

    NARCIS (Netherlands)

    Kempenaar, C.; Lokhorst, C.; Bleumer, E.J.B.; Veerkamp, R.F.; Been, Th.; Evert, van F.K.; Boogaardt, M.J.; Ge, L.; Wolfert, J.; Verdouw, C.N.; Bekkum, van Michael; Feldbrugge, L.; Verhoosel, Jack P.C.; Waaij, B.D.; Persie, van M.; Noorbergen, H.

    2016-01-01

    In this report we describe results of a one-year TO2 institutes project on the development of big data technologies within the milk production chain. The goal of this project is to ‘create’ an integration platform for big data analysis for smart farming and to develop a show case. This includes both

  15. A novel water quality data analysis framework based on time-series data mining.

    Science.gov (United States)

    Deng, Weihui; Wang, Guoyin

    2017-07-01

    The rapid development of time-series data mining provides an emerging method for water resource management research. In this paper, based on the time-series data mining methodology, we propose a novel and general analysis framework for water quality time-series data. It consists of two parts: implementation components and common tasks of time-series data mining in water quality data. In the first part, we propose to granulate the time series into several two-dimensional normal clouds and to calculate similarities at the granule level. On the basis of the similarity matrix, the similarity search, anomaly detection, and pattern discovery tasks on the water quality time-series dataset can be easily implemented in the second part. We present a case study of this analysis framework on weekly dissolved oxygen time-series data collected from five monitoring stations on the upper reaches of the Yangtze River, China. The analysis discovered the relationship between water quality in the mainstream and its tributaries, as well as the main changing patterns of DO. The experimental results show that the proposed analysis framework is a feasible and efficient method to mine hidden and valuable knowledge from historical water quality time-series data. Copyright © 2017 Elsevier Ltd. All rights reserved.
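
    The first part of the framework can be caricatured as follows: windows of the series are summarized by (mean, std) pairs standing in for the paper's two-dimensional normal clouds, and a granule-level similarity matrix then supports similarity search and anomaly detection. The Bhattacharyya coefficient between normal distributions used below is an illustrative choice of similarity, and the weekly DO series is simulated.

        import numpy as np

        def granulate(series, width):
            # Summarize consecutive windows by (mean, std), a crude stand-in
            # for the paper's two-dimensional normal clouds.
            g = series[: len(series) // width * width].reshape(-1, width)
            return g.mean(axis=1), g.std(axis=1) + 1e-9

        def similarity(m1, s1, m2, s2):
            # Bhattacharyya coefficient between two normal distributions.
            return np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * \
                np.exp(-((m1 - m2) ** 2) / (4 * (s1**2 + s2**2)))

        rng = np.random.default_rng(6)
        do = 8 + np.sin(np.arange(104) * 2 * np.pi / 52) + rng.normal(0, 0.3, 104)

        m, s = granulate(do, width=4)          # ~monthly granules from weekly data
        sim = similarity(m[:, None], s[:, None], m[None, :], s[None, :])
        # Granules least similar to all others are anomaly candidates.
        print("most anomalous granule:", sim.mean(axis=1).argmin())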

  16. Enabling Analysis of Big, Thick, Long, and Wide Data: Data Management for the Analysis of a Large Longitudinal and Cross-National Narrative Data Set.

    Science.gov (United States)

    Winskell, Kate; Singleton, Robyn; Sabben, Gaelle

    2018-03-01

    Distinctive longitudinal narrative data, collected during a critical 18-year period in the history of the HIV epidemic, offer a unique opportunity to examine how young Africans are making sense of evolving developments in HIV prevention and treatment. More than 200,000 young people from across sub-Saharan Africa took part in HIV-themed scriptwriting contests held at eight discrete time points between 1997 and 2014, creating more than 75,000 narratives. This article describes the data reduction and management strategies developed for our cross-national and longitudinal study of these qualitative data. The study aims to inform HIV communication practice by identifying cultural meanings and contextual factors that inform sexual behaviors and social practices, and also to help increase understanding of processes of sociocultural change. We describe our sampling strategies and our triangulating methodologies, combining in-depth narrative analysis, thematic qualitative analysis, and quantitative analysis, which are designed to enable systematic comparison without sacrificing ethnographic richness.

  17. Planning representation for automated exploratory data analysis

    Science.gov (United States)

    St. Amant, Robert; Cohen, Paul R.

    1994-03-01

    Igor is a knowledge-based system for exploratory statistical analysis of complex systems and environments. Igor has two related goals: to help automate the search for interesting patterns in data sets, and to help develop models that capture significant relationships in the data. We outline a language for Igor, based on techniques of opportunistic planning, which balances control and opportunism. We describe the application of Igor to the analysis of the behavior of Phoenix, an artificial intelligence planning system.

  18. NGNP Data Management and Analysis System Analysis and Web Delivery Capabilities

    Energy Technology Data Exchange (ETDEWEB)

    Cynthia D. Gentillon

    2010-09-01

    Projects for the Very High Temperature Reactor Technology Development Office provide data in support of Nuclear Regulatory Commission licensing of the very high temperature reactor. Fuel and materials to be used in the reactor are tested and characterized to quantify performance in high-temperature and high-fluence environments. In addition, thermal-hydraulic experiments are conducted to validate codes used to assess reactor safety. The Very High Temperature Reactor Technology Development Office has established the NGNP Data Management and Analysis System (NDMAS) at the Idaho National Laboratory to ensure that very high temperature reactor data are (1) qualified for use, (2) stored in a readily accessible electronic form, and (3) analyzed to extract useful results. This document focuses on the third NDMAS objective. It describes capabilities for displaying the data in meaningful ways and for data analysis to identify useful relationships among the measured quantities.

  19. Multivariate statistical analysis of atom probe tomography data

    International Nuclear Information System (INIS)

    Parish, Chad M.; Miller, Michael K.

    2010-01-01

    The application of spectrum imaging multivariate statistical analysis methods, specifically principal component analysis (PCA), to atom probe tomography (APT) data has been investigated. The mathematical method of analysis is described and the results for two example datasets are analyzed and presented. The first dataset is from the analysis of a PM 2000 Fe-Cr-Al-Ti steel containing two different ultrafine precipitate populations. PCA properly describes the matrix and precipitate phases in a simple and intuitive manner. A second APT example is from the analysis of an irradiated reactor pressure vessel steel. Fine, nm-scale Cu-enriched precipitates having a core-shell structure were identified and qualitatively described by PCA. Advantages, disadvantages, and future prospects for implementing these data analysis methodologies for APT datasets, particularly with regard to quantitative analysis, are also discussed.

  20. Using MetaboAnalyst 3.0 for Comprehensive Metabolomics Data Analysis.

    Science.gov (United States)

    Xia, Jianguo; Wishart, David S

    2016-09-07

    MetaboAnalyst (http://www.metaboanalyst.ca) is a comprehensive Web application for metabolomic data analysis and interpretation. MetaboAnalyst handles most of the common metabolomic data types from most kinds of metabolomics platforms (MS and NMR) for most kinds of metabolomics experiments (targeted, untargeted, quantitative). In addition to providing a variety of data processing and normalization procedures, MetaboAnalyst also supports a number of data analysis and data visualization tasks using a range of univariate and multivariate methods such as PCA (principal component analysis), PLS-DA (partial least squares discriminant analysis), heatmap clustering and machine learning methods. MetaboAnalyst also offers a variety of tools for metabolomic data interpretation including MSEA (metabolite set enrichment analysis), MetPA (metabolite pathway analysis), and biomarker selection via ROC (receiver operating characteristic) curve analysis, as well as time series and power analysis. This unit provides an overview of the main functional modules and the general workflow of the latest version of MetaboAnalyst (MetaboAnalyst 3.0), followed by eight detailed protocols. © 2016 by John Wiley & Sons, Inc.

  1. Data Analysis with Open Source Tools

    CERN Document Server

    Janert, Philipp

    2010-01-01

    Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications. Along the way, you'll experiment with conce

  2. Qualitative data analysis a methods sourcebook

    CERN Document Server

    Miles, Matthew B; Saldana, Johnny

    2014-01-01

    The Third Edition of Miles & Huberman's classic research methods text is updated and streamlined by Johnny Saldaña, author of The Coding Manual for Qualitative Researchers. Several of the data display strategies from previous editions are now presented in re-envisioned and reorganized formats to enhance reader accessibility and comprehension. The Third Edition's presentation of the fundamentals of research design and data management is followed by five distinct methods of analysis: exploring, describing, ordering, explaining, and predicting. Miles and Huberman's original research studies are profiled and accompanied with new examples from Saldaña's recent qualitative work. The book's most celebrated chapter, "Drawing and Verifying Conclusions," is retained and revised, and the chapter on report writing has been greatly expanded, and is now called "Writing About Qualitative Research." Comprehensive and authoritative, Qualitative Data Analysis has been elegantly revised for a new generation of qualitative r...

  3. Statistics and analysis of scientific data

    CERN Document Server

    Bonamente, Massimiliano

    2017-01-01

    The revised second edition of this textbook provides the reader with a solid foundation in probability theory and statistics as applied to the physical sciences, engineering and related fields. It covers a broad range of numerical and analytical methods that are essential for the correct analysis of scientific data, including probability theory, distribution functions of statistics, fits to two-dimensional data and parameter estimation, Monte Carlo methods and Markov chains. Features new to this edition include: • a discussion of statistical techniques employed in business science, such as multiple regression analysis of multivariate datasets. • a new chapter on the various measures of the mean including logarithmic averages. • new chapters on systematic errors and intrinsic scatter, and on the fitting of data with bivariate errors. • a new case study and additional worked examples. • mathematical derivations and theoretical background material have been appropriately marked, to improve the readabili...

  4. Essentials of multivariate data analysis

    CERN Document Server

    Spencer, Neil H

    2013-01-01

    ""… this text provides an overview at an introductory level of several methods in multivariate data analysis. It contains in-depth examples from one data set woven throughout the text, and a free [Excel] Add-In to perform the analyses in Excel, with step-by-step instructions provided for each technique. … could be used as a text (possibly supplemental) for courses in other fields where researchers wish to apply these methods without delving too deeply into the underlying statistics.""-The American Statistician, February 2015

  5. Analysis of Functional Data with Focus on Multinomial Regression and Multilevel Data

    DEFF Research Database (Denmark)

    Mousavi, Seyed Nourollah

    Functional data analysis (FDA) is a fast growing area in statistical research with an increasingly diverse range of applications from economics, medicine, agriculture, chemometrics, etc. Functional regression is the area of FDA which has received the most attention, both in aspects of application...... and methodological development. Our main......

  6. Language workbench user interfaces for data analysis

    Directory of Open Access Journals (Sweden)

    Victoria M. Benson

    2015-02-01

    Full Text Available Biological data analysis is frequently performed with command line software. While this practice provides considerable flexibility for computationally savvy individuals, such as investigators trained in bioinformatics, this also creates a barrier to the widespread use of data analysis software by investigators trained as biologists and/or clinicians. Workflow systems such as Galaxy and Taverna have been developed to try and provide generic user interfaces that can wrap command line analysis software. These solutions are useful for problems that can be solved with workflows, and that do not require specialized user interfaces. However, some types of analyses can benefit from custom user interfaces. For instance, developing biomarker models from high-throughput data is a type of analysis that can be expressed more succinctly with specialized user interfaces. Here, we show how Language Workbench (LW) technology can be used to model the biomarker development and validation process. We developed a language that models the concepts of Dataset, Endpoint, Feature Selection Method and Classifier. These high-level language concepts map directly to abstractions that analysts who develop biomarker models are familiar with. We found that user interfaces developed in the Meta-Programming System (MPS) LW provide convenient means to configure a biomarker development project, to train models and view the validation statistics. We discuss several advantages of developing user interfaces for data analysis with a LW, including increased interface consistency, portability and extension by language composition. The language developed during this experiment is distributed as an MPS plugin (available at http://campagnelab.org/software/bdval-for-mps/).

  7. Language workbench user interfaces for data analysis

    Science.gov (United States)

    Benson, Victoria M.

    2015-01-01

    Biological data analysis is frequently performed with command line software. While this practice provides considerable flexibility for computationally savvy individuals, such as investigators trained in bioinformatics, this also creates a barrier to the widespread use of data analysis software by investigators trained as biologists and/or clinicians. Workflow systems such as Galaxy and Taverna have been developed to try and provide generic user interfaces that can wrap command line analysis software. These solutions are useful for problems that can be solved with workflows, and that do not require specialized user interfaces. However, some types of analyses can benefit from custom user interfaces. For instance, developing biomarker models from high-throughput data is a type of analysis that can be expressed more succinctly with specialized user interfaces. Here, we show how Language Workbench (LW) technology can be used to model the biomarker development and validation process. We developed a language that models the concepts of Dataset, Endpoint, Feature Selection Method and Classifier. These high-level language concepts map directly to abstractions that analysts who develop biomarker models are familiar with. We found that user interfaces developed in the Meta-Programming System (MPS) LW provide convenient means to configure a biomarker development project, to train models and view the validation statistics. We discuss several advantages of developing user interfaces for data analysis with a LW, including increased interface consistency, portability and extension by language composition. The language developed during this experiment is distributed as an MPS plugin (available at http://campagnelab.org/software/bdval-for-mps/). PMID:25755929

  8. Federal metering data analysis needs and existing tools

    Energy Technology Data Exchange (ETDEWEB)

    Henderson, Jordan W. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Fowler, Kimberly M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-07-01

    Agencies have been working to improve their metering data collection, management, and analysis efforts over the last decade (since EPAct 2005) and will continue to address these challenges as new requirements and data needs come into place. Unfortunately there is no “one-size-fits-all” solution. As agencies continue to expand their capabilities to use metered consumption data to reduce resource use and improve operations, the hope is that shared knowledge will empower others to follow suit. This paper discusses the Federal metering data analysis needs and some existing tools.

  9. User analysis of LHCb data with Ganga

    CERN Document Server

    Maier, A; Cowan, G; Egede, U; Elmsheuser, J; Gaidioz, B; Harrison, K; Lee, H -C; Liko, D; Moscicki, J; Muraru, A; Pajchel, K; Reece, W; Samset, B; Slater, M; Soroko, A; van der Ster, D; Williams, M; Tan, C L

    2010-01-01

    GANGA (http://cern.ch/ganga) is a job-management tool that offers a simple, efficient and consistent user analysis tool in a variety of heterogeneous environments: from local clusters to global Grid systems. Experiment specific plug-ins allow GANGA to be customised for each experiment. For LHCb users GANGA is the officially supported and advertised tool for job submission to the Grid. The LHCb specific plug-ins allow support for end-to-end analysis helping the user to perform his complete analysis with the help of GANGA. This starts with the support for data selection, where a user can select data sets from the LHCb Bookkeeping system. Next comes the set up for large analysis jobs: with tailored plug-ins for the LHCb core software, jobs can be managed by the splitting of these analysis jobs with the subsequent merging of the resulting files. Furthermore, GANGA offers support for Toy Monte-Carlos to help the user tune their analysis. In addition to describing the GANGA architecture, typical usage patterns with...

  10. User analysis of LHCb data with Ganga

    International Nuclear Information System (INIS)

    Maier, Andrew; Gaidioz, Benjamin; Moscicki, Jakub; Muraru, Adrian; Ster, Daniel van der; Brochu, Frederic; Cowan, Greg; Egede, Ulrik; Reece, Will; Williams, Mike; Elmsheuser, Johannes; Harrison, Karl; Slater, Mark; Tan, Chun Lik; Lee, Hurng-Chun; Liko, Dietrich; Pajchel, Katarina; Samset, Bjoern; Soroko, Alexander

    2010-01-01

    GANGA (http://cern.ch/ganga) is a job-management tool that offers a simple, efficient and consistent user analysis tool in a variety of heterogeneous environments: from local clusters to global Grid systems. Experiment specific plug-ins allow GANGA to be customised for each experiment. For LHCb users GANGA is the officially supported and advertised tool for job submission to the Grid. The LHCb specific plug-ins allow support for end-to-end analysis helping the user to perform his complete analysis with the help of GANGA. This starts with the support for data selection, where a user can select data sets from the LHCb Bookkeeping system. Next comes the set up for large analysis jobs: with tailored plug-ins for the LHCb core software, jobs can be managed by the splitting of these analysis jobs with the subsequent merging of the resulting files. Furthermore, GANGA offers support for Toy Monte-Carlos to help the user tune their analysis. In addition to describing the GANGA architecture, typical usage patterns within LHCb and experience with the updated LHCb DIRAC workload management system are presented.
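
    For orientation, here is a schematic of the Ganga job interface as typically typed at the ganga prompt, where the GPI names are predefined. The application and backend shown are placeholders, and the LHCb-specific plug-ins (e.g. the Grid backend and bookkeeping-based data selection) follow the same pattern. This is a sketch, not a verified session transcript.

        # Schematic Ganga session; run inside the ganga prompt, where the GPI
        # names Job, Executable and Local are predefined (no imports needed).
        j = Job(name="toy-analysis")
        j.application = Executable(exe="echo", args=["hello"])  # placeholder app
        j.backend = Local()    # a Grid backend would be set here for real analyses
        j.submit()

        print(jobs)            # the registry of jobs and their statuses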

  11. RECOG-ORNL, Pattern Recognition Data Analysis

    International Nuclear Information System (INIS)

    Begovich, C.L.; Larson, N.M.

    2000-01-01

    Description of program or function: RECOG-ORNL, a general-purpose pattern recognition code, is a modification of the RECOG program, written at Lawrence Livermore National Laboratory. RECOG-ORNL contains techniques for preprocessing, analyzing, and displaying data, and for unsupervised and supervised learning. Data preprocessing routines transform the data into useful representations by autoscaling, selecting important variables, and/or adding products or transformations of the variables of the data set. Data analysis routines use correlations to evaluate the data and interrelationships among the data. Display routines plot the multidimensional patterns in two dimensions or plot histograms, patterns, or one variable versus another. Unsupervised learning techniques search for classes contained inherently in the data. Supervised learning techniques use known information about some of the data to generate predicted properties for an unknown set

  12. Integrating Data Clustering and Visualization for the Analysis of 3D Gene Expression Data

    Energy Technology Data Exchange (ETDEWEB)

    Rubel, Oliver; Weber, Gunther H.; Huang, Min-Yu; Bethel, E. Wes; Biggin, Mark D.; Fowlkes, Charless C.; Hendriks, Cris L. Luengo; Keranen, Soile V. E.; Eisen, Michael B.; Knowles, David W.; Malik, Jitendra; Hagen, Hans; Hamann, Bernd

    2008-05-12

    The recent development of methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data opens the way for new analyses of the complex gene regulatory networks controlling animal development. We present an integrated visualization and analysis framework that supports user-guided data clustering to aid exploration of these new complex datasets. The interplay of data visualization and clustering-based data classification leads to improved visualization and enables a more detailed analysis than previously possible. We discuss (i) integration of data clustering and visualization into one framework; (ii) application of data clustering to 3D gene expression data; (iii) evaluation of the number of clusters k in the context of 3D gene expression clustering; and (iv) improvement of overall analysis quality via dedicated post-processing of clustering results based on visualization. We discuss the use of this framework to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.
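
    Point (iii), evaluating the number of clusters k, can be illustrated with a generic stand-alone sketch. The framework in the paper does this interactively with visualization support; the code below simply scores candidate k values with the silhouette criterion on synthetic stand-in data, one common generic approach rather than the authors' method:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))   # stand-in for per-cell expression vectors

# Cluster for a range of k and score each partition.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```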

  13. Analysis of CERN computing infrastructure and monitoring data

    Science.gov (United States)

    Nieke, C.; Lassnig, M.; Menichetti, L.; Motesnitsalis, E.; Duellmann, D.

    2015-12-01

    Optimizing a computing infrastructure on the scale of the LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments collect a multitude of logs and performance probes, which are already used successfully for short-term analysis (e.g. operational dashboards) within each group. The IT analytics working group was created with the goal of bringing together data sources from different services and at different abstraction levels, and of implementing a suitable infrastructure for mid- to long-term statistical analysis. It further provides a forum for joint optimization across single-service boundaries and for the exchange of analysis methods and tools. To simplify access to the collected data, we implemented an automated repository for cleaned and aggregated data sources based on the Hadoop ecosystem. This contribution describes some of the challenges encountered, such as dealing with heterogeneous data formats and selecting a storage format efficient for both map-reduce processing and external access, and describes the repository user interface. Using this infrastructure we were able to quantitatively analyze the relationship between the CPU/wall-clock fraction, the latency/throughput constraints of network and disk, and the effective job throughput. In this contribution we first describe the design of the shared analysis infrastructure and then present a summary of the first analysis results from the combined data sources.

  14. Analysis of ChIP-seq Data in R/Bioconductor.

    Science.gov (United States)

    de Santiago, Ines; Carroll, Thomas

    2018-01-01

    The development of novel high-throughput sequencing methods for ChIP (chromatin immunoprecipitation) has provided a very powerful tool for studying gene regulation in multiple conditions at unprecedented resolution and scale. Proactive quality control and appropriate data analysis techniques are of critical importance for extracting the most meaningful results from the data. Over recent years, an array of R/Bioconductor tools has been developed that allows researchers to process and analyze ChIP-seq data. This chapter provides an overview of the methods available to analyze ChIP-seq data, based primarily on software packages from the open-source Bioconductor project. The protocols described in this chapter cover basic steps, including data alignment, peak calling, quality control and data visualization, as well as more complex methods such as the identification of differentially bound regions and functional analyses to annotate regulatory regions. The steps in the data analysis process are demonstrated on publicly available data sets and serve as a demonstration of the computational procedures routinely used for the analysis of ChIP-seq data in R/Bioconductor, from which readers can construct their own analysis pipelines.

  15. Big and complex data analysis methodologies and applications

    CERN Document Server

    2017-01-01

    This volume conveys some of the surprises, puzzles and success stories in high-dimensional and complex data analysis and related fields. Its peer-reviewed contributions showcase recent advances in variable selection, estimation and prediction strategies for a host of useful models, as well as essential new developments in the field. The continued and rapid advancement of modern technology now allows scientists to collect data of increasingly unprecedented size and complexity. Examples include epigenomic data, genomic data, proteomic data, high-resolution image data, high-frequency financial data, functional and longitudinal data, and network data. Simultaneous variable selection and estimation is one of the key statistical problems involved in analyzing such big and complex data. The purpose of this book is to stimulate research and foster interaction between researchers in the area of high-dimensional data analysis. More concretely, its goals are to: 1) highlight and expand the breadth of existing methods in...

  16. Data analysis & probability drill sheets : grades 6-8

    CERN Document Server

    Forest, Chris

    2011-01-01

    For grades 6-8, our Common Core State Standards-based resource meets the data analysis & probability concepts addressed by the NCTM standards and encourages your students to review the concepts in unique ways. Each drill sheet contains warm-up and timed drill activities for the student to practice data analysis & probability concepts.

  17. Integrating Data Transformation in Principal Components Analysis

    KAUST Repository

    Maadooliat, Mehdi; Huang, Jianhua Z.; Hu, Jianhua

    2015-01-01

    Principal component analysis (PCA) is a popular dimension reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior
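
    The abstract contrasts the proposed integrated approach with the conventional two-step practice of transforming skewed data before applying PCA. A minimal sketch of that conventional baseline, on synthetic log-normal data with scikit-learn (this is the practice the paper improves on, not the paper's own method):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 10))   # skewed data

X_t = np.log(X)                            # variance-stabilizing transform
X_t = (X_t - X_t.mean(0)) / X_t.std(0)     # standardize before PCA

pca = PCA(n_components=2).fit(X_t)
print(pca.explained_variance_ratio_)       # informative low-dim summary
```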

  18. Data Collection, Collaboration, Analysis, and Publication Using the Open Data Repository's (ODR) Data Publisher

    Science.gov (United States)

    Lafuente, B.; Stone, N.; Bristow, T.; Keller, R. M.; Blake, D. F.; Downs, R. T.; Pires, A.; Dateo, C. E.; Fonda, M.

    2017-12-01

    In development for nearly four years, the Open Data Repository's (ODR) Data Publisher software has become a useful tool for researchers' data needs. Data Publisher facilitates the creation of customized databases with flexible permission sets that allow researchers to share data collaboratively while improving data discovery and maintaining ownership rights. The open source software provides an end-to-end solution from collection to final repository publication. A web-based interface allows researchers to enter data, view data, and conduct analysis using any programming language supported by JupyterHub (http://www.jupyterhub.org). This toolset makes it possible for a researcher to store and manipulate their data in the cloud from any internet capable device. Data can be embargoed in the system until a date selected by the researcher. For instance, open publication can be set to a date that coincides with publication of data analysis in a third party journal. In conjunction with teams at NASA Ames and the University of Arizona, a number of pilot studies are being conducted to guide the software development so that it allows them to publish and share their data. These pilots include (1) the Astrobiology Habitable Environments Database (AHED), a central searchable repository designed to promote and facilitate the integration and sharing of all the data generated by the diverse disciplines in astrobiology; (2) a database containing the raw and derived data products from the CheMin instrument on the MSL rover Curiosity (http://odr.io/CheMin), featuring a versatile graphing system, instructions and analytical tools to process the data, and a capability to download data in different formats; and (3) the Mineral Evolution project, which by correlating the diversity of mineral species with their ages, localities, and other measurable properties aims to understand how the episodes of planetary accretion and differentiation, plate tectonics, and origin of life lead to a

  19. Data Farming Process and Initial Network Analysis Capabilities

    Directory of Open Access Journals (Sweden)

    Gary Horne

    2016-01-01

    Full Text Available Data Farming, network applications, and approaches for integrating network analysis and processes into the data farming paradigm are presented as ways to address complex-system questions. Data Farming is a quantified approach that examines questions in large possibility spaces using modeling and simulation. It evaluates whole landscapes of outcomes to draw insights from outcome distributions and outliers. Social network analysis and graph theory are widely used techniques for the evaluation of social systems. Incorporating these techniques into the data farming process provides analysts examining complex systems with a powerful new suite of tools for more fully exploring and understanding the effect of interactions in complex systems. The integration of network analysis with data farming techniques gives modelers the capability to gain insight into the effect of network attributes, whether the network is explicitly defined or emergent, on the breadth of the model outcome space and the effect of model inputs on the resultant network statistics.
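
    The combination described, many simulation runs swept across a possibility space with network statistics recorded per run, can be sketched with NetworkX. The model and parameters below are invented for illustration and are not from the article:

```python
import networkx as nx
import numpy as np

# Sweep one input dimension of the "possibility space" (the rewiring
# probability of a small-world network) and record network statistics
# for each simulated excursion, data-farming style.
results = []
for p in np.linspace(0.0, 1.0, 11):
    for rep in range(30):                    # many runs per design point
        g = nx.watts_strogatz_graph(n=100, k=6, p=p, seed=rep)
        results.append((p, nx.average_clustering(g), nx.density(g)))

# `results` now holds a landscape of outcomes whose distributions and
# outliers can be analyzed across the swept input.
print(len(results))
```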

  20. The Run 2 ATLAS Analysis Event Data Model

    CERN Document Server

    SNYDER, S; The ATLAS collaboration; NOWAK, M; EIFERT, T; BUCKLEY, A; ELSING, M; GILLBERG, D; MOYSE, E; KOENEKE, K; KRASZNAHORKAY, A

    2014-01-01

    During the LHC's first Long Shutdown (LS1) ATLAS set out to establish a new analysis model, based on the experience gained during Run 1. A key component of this is a new Event Data Model (EDM), called the xAOD. This format, which is now in production, provides the following features: A separation of the EDM into interface classes that the user code directly interacts with, and data storage classes that hold the payload data. The user sees an Array of Structs (AoS) interface, while the data is stored in a Struct of Arrays (SoA) format in memory, thus making it possible to efficiently auto-vectorise reconstruction code. A simple way of augmenting and reducing the information saved for different data objects. This makes it possible to easily decorate objects with new properties during data analysis, and to remove properties that the analysis does not need. A persistent file format that can be explored directly with ROOT, either with or without loading any additional libraries. This allows fast interactive naviga...
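
    The AoS-interface-over-SoA-storage idea is language independent. The following Python sketch is an analogue of that idea, not the ATLAS C++ implementation: user code sees object-like access while the payload lives in contiguous per-property arrays.

```python
import numpy as np

class Electrons:
    """Struct-of-Arrays storage with an Array-of-Structs view,
    mimicking the xAOD interface/payload separation described above."""
    def __init__(self, pt, eta):
        self.pt = np.asarray(pt)     # contiguous per-property arrays (SoA)
        self.eta = np.asarray(eta)

    def __getitem__(self, i):        # AoS-style access for user code
        return ElectronProxy(self, i)

class ElectronProxy:
    def __init__(self, store, i):
        self._store, self._i = store, i
    @property
    def pt(self):
        return self._store.pt[self._i]
    @property
    def eta(self):
        return self._store.eta[self._i]

elecs = Electrons(pt=[42.0, 17.5], eta=[0.3, -1.1])
print(elecs[0].pt)        # object-like interface for analysis code
print(elecs.pt.mean())    # vectorisable access to the stored payload
```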

  1. CMS Analysis and Data Reduction with Apache Spark

    OpenAIRE

    Gutsche, Oliver; Canali, Luca; Cremer, Illia; Cremonesi, Matteo; Elmer, Peter; Fisk, Ian; Girone, Maria; Jayatilaka, Bo; Kowalkowski, Jim; Khristenko, Viktor; Motesnitsalis, Evangelos; Pivarski, Jim; Sehrish, Saba; Surdy, Kacper; Svyatkovskiy, Alexey

    2017-01-01

    Experimental Particle Physics has been at the forefront of analyzing the world's largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called "Big Data" technologies have emerged from industry and open source projects to support the analysis of Petabyte and Exabyte datasets in industry. While the principles of data analysis in HE...
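
    A typical Spark-based reduction of columnar event data might look like the following PySpark sketch. The file paths and column names are invented for illustration; this is not the CMS code itself.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hep-reduction-sketch").getOrCreate()

# Hypothetical columnar copy of event data; schema is invented.
events = spark.read.parquet("/data/cms/events.parquet")

reduced = (events
           .where(F.col("n_muons") >= 2)        # event selection (skim)
           .select("run", "lumi", "event",
                   "muon_pt", "muon_eta"))      # keep only needed columns (slim)

# Write the much smaller analysis-ready dataset back out.
reduced.write.mode("overwrite").parquet("/data/cms/dimuon_skim.parquet")
```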

  2. Oasis: online analysis of small RNA deep sequencing data.

    Science.gov (United States)

    Capece, Vincenzo; Garcia Vizcaino, Julio C; Vidal, Ramon; Rahman, Raza-Ur; Pena Centeno, Tonatiuh; Shomroni, Orr; Suberviola, Irantzu; Fischer, Andre; Bonn, Stefan

    2015-07-01

    Oasis is a web application that allows for the fast and flexible online analysis of small-RNA-seq (sRNA-seq) data. It was designed for the end user in the lab, providing an easy-to-use web frontend including video tutorials, demo data and best practice step-by-step guidelines on how to analyze sRNA-seq data. Oasis' exclusive selling points are a differential expression module that allows for the multivariate analysis of samples, a classification module for robust biomarker detection and an advanced programming interface that supports the batch submission of jobs. Both modules include the analysis of novel miRNAs, miRNA targets and functional analyses including GO and pathway enrichment. Oasis generates downloadable interactive web reports for easy visualization, exploration and analysis of data on a local system. Finally, Oasis' modular workflow enables for the rapid (re-) analysis of data. Oasis is implemented in Python, R, Java, PHP, C++ and JavaScript. It is freely available at http://oasis.dzne.de. stefan.bonn@dzne.de Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  3. [The meta-analysis of data from individual patients].

    NARCIS (Netherlands)

    Rovers, M.M.; Reitsma, J.B.

    2012-01-01

    - An IPD (Individual Participant Data) meta-analysis requires collecting original individual patient data and calculating an estimated effect based on these data.
    - The use of individual patient data has various advantages: the original data and the results of published analyses are verified,
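
    For context, a common way to combine per-trial effects recomputed from individual patient data is the two-stage approach with fixed-effect inverse-variance pooling. The sketch below is a generic illustration with invented numbers, not necessarily the method discussed in this article:

```python
import numpy as np

# Stage 1 (not shown): re-estimate the treatment effect within each
# trial from the individual patient data. Values here are invented.
theta = np.array([0.42, 0.30, 0.55])   # per-trial effect estimates
se = np.array([0.10, 0.15, 0.12])      # their standard errors

# Stage 2: fixed-effect inverse-variance pooling.
w = 1.0 / se**2
pooled = np.sum(w * theta) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
print(pooled, pooled_se)
```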

  4. National survey on dose data analysis in computed tomography.

    Science.gov (United States)

    Heilmaier, Christina; Treier, Reto; Merkle, Elmar Max; Alkhadi, Hatem; Weishaupt, Dominik; Schindera, Sebastian

    2018-05-28

    A nationwide survey was performed to assess the current practice of dose data analysis in computed tomography (CT). All radiological departments in Switzerland were asked to participate in the online survey composed of 19 questions (16 multiple choice, 3 free text). It consisted of four sections: (1) general information on the department, (2) dose data analysis, (3) use of a dose management software (DMS) and (4) radiation protection activities. In total, 152 of 241 Swiss radiological departments completed the whole questionnaire (return rate, 63%). Seventy-nine per cent of the departments (n = 120/152) analyse dose data on a regular basis, with considerable heterogeneity in the frequency (1-2 times per year, 45%, n = 54/120; every month, 35%, n = 42/120) and method of analysis. Manual analysis is carried out by 58% (n = 70/120), compared with 42% (n = 50/120) of departments using a DMS. Purchase of a DMS is planned by 43% (n = 30/70) of the departments with manual analysis. Real-time analysis of dose data is performed by 42% (n = 21/50) of the departments with a DMS; however, residents can access the DMS in clinical routine in only 20% (n = 10/50) of the departments. An interdisciplinary dose team, which among other things communicates dose data internally (63%, n = 76/120) and externally, is already implemented in 57% (n = 68/120) of departments. Swiss radiological departments are committed to radiation safety; however, there is high heterogeneity among them regarding the frequency and method of dose data analysis as well as the use of a DMS and radiation protection activities. • Swiss radiological departments are committed to and interested in radiation safety, as shown by the survey's 63% return rate. • Seventy-nine per cent of departments analyse dose data on a regular basis, with differences in the frequency and method of analysis: 42% use a dose management software, while 58% currently perform manual dose data analysis. Of the latter, 43% plan to buy a dose

  5. Statistics and analysis of scientific data

    CERN Document Server

    Bonamente, Massimiliano

    2013-01-01

    Statistics and Analysis of Scientific Data covers the foundations of probability theory and statistics, and a number of numerical and analytical methods that are essential for the present-day analyst of scientific data. Topics covered include probability theory, distribution functions of statistics, fits to two-dimensional data sets and parameter estimation, Monte Carlo methods and Markov chains. Equal attention is paid to the theory and its practical application, and results from classic experiments in various fields are used to illustrate the importance of statistics in the analysis of scientific data. The main pedagogical method is a theory-then-application approach, where emphasis is placed first on a sound understanding of the underlying theory of a topic, which becomes the basis for an efficient and proactive use of the material for practical applications. The level is appropriate for undergraduates and beginning graduate students, and as a reference for the experienced researcher. Basic calculus is us...

  6. Secondary Data Analysis in Family Research

    Science.gov (United States)

    Hofferth, Sandra L.

    2005-01-01

    This article first provides an overview of the part that secondary data analysis plays in the field of family studies in the early 21st century. It addresses changes over time in the use of existing omnibus data sets and discusses their advantages and disadvantages. The second part of the article focuses on the elements that make a study a…

  7. QUALITATIVE DATA AND ERROR MEASUREMENT IN INPUT-OUTPUT-ANALYSIS

    NARCIS (Netherlands)

    NIJKAMP, P; OOSTERHAVEN, J; OUWERSLOOT, H; RIETVELD, P

    1992-01-01

    This paper is a contribution to the rapidly emerging field of qualitative data analysis in economics. Ordinal data techniques and error measurement in input-output analysis are here combined in order to test the reliability of a low level of measurement and precision of data by means of a stochastic

  8. Plasma data analysis using statistical analysis system

    International Nuclear Information System (INIS)

    Yoshida, Z.; Iwata, Y.; Fukuda, Y.; Inoue, N.

    1987-01-01

    Multivariate factor analysis has been applied to a plasma data base of REPUTE-1. The characteristics of the reverse field pinch plasma in REPUTE-1 are shown to be explained by four independent parameters which are described in the report. The well known scaling laws F_χ ∝ I_p, T_e ∝ I_p, and τ_E ∝ N_e are also confirmed. 4 refs., 8 figs., 1 tab

  9. SEURAT: visual analytics for the integrated analysis of microarray data.

    Science.gov (United States)

    Gribov, Alexander; Sill, Martin; Lück, Sonja; Rücker, Frank; Döhner, Konstanze; Bullinger, Lars; Benner, Axel; Unwin, Antony

    2010-06-03

    In translational cancer research, gene expression data is collected together with clinical data and genomic data arising from other chip based high throughput technologies. Software tools for the joint analysis of such high dimensional data sets together with clinical data are required. We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis the software provides unsupervised data analytics like clustering, seriation algorithms and biclustering algorithms. The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomical and clinical data.

  10. SEURAT: Visual analytics for the integrated analysis of microarray data

    Directory of Open Access Journals (Sweden)

    Bullinger Lars

    2010-06-01

    Full Text Available Abstract Background In translational cancer research, gene expression data is collected together with clinical data and genomic data arising from other chip based high throughput technologies. Software tools for the joint analysis of such high dimensional data sets together with clinical data are required. Results We have developed an open source software tool which provides interactive visualization capability for the integrated analysis of high-dimensional gene expression data together with associated clinical data, array CGH data and SNP array data. The different data types are organized by a comprehensive data manager. Interactive tools are provided for all graphics: heatmaps, dendrograms, barcharts, histograms, eventcharts and a chromosome browser, which displays genetic variations along the genome. All graphics are dynamic and fully linked so that any object selected in a graphic will be highlighted in all other graphics. For exploratory data analysis the software provides unsupervised data analytics like clustering, seriation algorithms and biclustering algorithms. Conclusions The SEURAT software meets the growing needs of researchers to perform joint analysis of gene expression, genomical and clinical data.

  11. Application of Open Source Technologies for Oceanographic Data Analysis

    Science.gov (United States)

    Huang, T.; Gangl, M.; Quach, N. T.; Wilson, B. D.; Chang, G.; Armstrong, E. M.; Chin, T. M.; Greguska, F.

    2015-12-01

    NEXUS is a data-intensive analysis solution developed with a new approach for handling science data that enables large-scale data analysis by leveraging open source technologies such as Apache Cassandra, Apache Spark, Apache Solr, and Webification. NEXUS has been selected to provide on-the-fly time-series and histogram generation for the Soil Moisture Active Passive (SMAP) mission for Level 2 and Level 3 Active, Passive, and Active Passive products. It also provides an on-the-fly data subsetting capability. NEXUS is designed to scale horizontally, enabling it to handle massive amounts of data in parallel. It takes a new approach to managing time- and geo-referenced array data by dividing data artifacts into chunks and storing them in an industry-standard, horizontally scaled NoSQL database. This approach enables the development of scalable data analysis services that can infuse and leverage the elastic computing infrastructure of the Cloud. It is equipped with a high-performance geospatial and indexed data search solution, coupled with a high-performance data Webification solution free from file I/O bottlenecks, as well as a high-performance, in-memory data analysis engine. In this talk, we focus on the recently funded AIST 2014 project that uses NEXUS as the core of an oceanographic anomaly detection service and web portal, which we call OceanXtremes.

  12. Opening up to Big Data: Computer-Assisted Analysis of Textual Data in Social Sciences

    Directory of Open Access Journals (Sweden)

    Gregor Wiedemann

    2013-05-01

    Full Text Available Two developments in computational text analysis may change the way qualitative data analysis in the social sciences is performed: 1. the amount of digital text worth investigating is growing rapidly, and 2. the improvement of algorithmic information extraction approaches, also called text mining, allows for further bridging the gap between qualitative and quantitative text analysis. The key factor here is the inclusion of context into computational linguistic models, which extends conventional computational content analysis towards the extraction of meaning. To clarify the methodological differences between various computer-assisted text analysis approaches, the article suggests a typology from the perspective of a qualitative researcher. This typology shows compatibilities between manual qualitative data analysis methods and computational, rather quantitative approaches for large-scale mixed-method text analysis designs. URN: http://nbn-resolving.de/urn:nbn:de:0114-fqs1302231

  13. Data analysis through interactive computer animation method (DATICAM)

    International Nuclear Information System (INIS)

    Curtis, J.N.; Schwieder, D.H.

    1983-01-01

    DATICAM is an interactive computer animation method designed to aid in the analysis of nuclear research data. DATICAM was developed at the Idaho National Engineering Laboratory (INEL) by EG&G Idaho, Inc. INEL analysts use DATICAM to produce computer codes that are better able to predict the behavior of nuclear power reactors. In addition to increased code accuracy, DATICAM has saved manpower and computer costs. DATICAM has been generalized to assist in the data analysis of virtually any data-producing dynamic process

  14. Visualization and Data Analysis for High-Performance Computing

    Energy Technology Data Exchange (ETDEWEB)

    Sewell, Christopher Meyer [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-09-27

    This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.

  15. Network-Based Visual Analysis of Tabular Data

    Science.gov (United States)

    Liu, Zhicheng

    2012-01-01

    Tabular data is pervasive in the form of spreadsheets and relational databases. Although tables often describe multivariate data without explicit network semantics, it may be advantageous to explore the data modeled as a graph or network for analysis. Even when a given table design conveys some static network semantics, analysts may want to look…

  16. Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration.

    Science.gov (United States)

    Cambiaghi, Alice; Ferrario, Manuela; Masseroli, Marco

    2017-05-01

    Metabolomics is a rapidly growing field consisting of the analysis of a large number of metabolites at a system scale. The two major goals of metabolomics are the identification of the metabolites characterizing each organism state and the measurement of their dynamics under different situations (e.g. pathological conditions, environmental factors). Knowledge about metabolites is crucial for the understanding of most cellular phenomena, but this information alone is not sufficient to gain a comprehensive view of all the biological processes involved. Integrated approaches combining metabolomics with transcriptomics and proteomics are thus required to obtain much deeper insights than any of these techniques alone. Although this information is available, multilevel integration of different 'omics' data is still a challenge. The handling, processing, analysis and integration of these data require specialized mathematical, statistical and bioinformatics tools, and several technical problems hampering rapid progress in the field exist. Here, we review four of the main tools available for metabolomic data analysis and its integration with other 'omics' data, selected on the basis of their user base and provided features (MetaCore™, MetaboAnalyst, InCroMAP and 3Omics), highlighting their strong and weak aspects; a number of related issues affecting data analysis and integration are also identified and discussed. Overall, we provide an objective description of how some of the main currently available software packages work, which may help the experimental practitioner choose a robust pipeline for metabolomic data analysis and integration. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  17. Identification of noise in linear data sets by factor analysis

    International Nuclear Information System (INIS)

    Roscoe, B.A.; Hopke, Ph.K.

    1982-01-01

    A technique which has the ability to identify bad data points after the data have been generated is classical factor analysis. The ability of classical factor analysis to identify two different types of data errors makes it ideally suited for scanning large data sets. Since the results yielded by factor analysis indicate correlations between parameters, one must know something about the nature of the data set and the analytical techniques used to obtain it to confidently isolate errors. (author)
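
    A modern analogue of this idea (not the classical implementation described in the report) is to fit a factor model and flag observations with large reconstruction residuals, e.g. with scikit-learn:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
X[7, 3] += 12.0                        # plant one bad data point

fa = FactorAnalysis(n_components=3).fit(X)
Z = fa.transform(X)                    # latent factor scores
X_hat = Z @ fa.components_ + fa.mean_  # reconstruction from the factors

# Data points poorly explained by the common factors are error candidates.
resid = np.abs(X - X_hat)
bad = np.argwhere(resid > 5 * resid.std())
print(bad)                             # flags (row 7, column 3)
```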

  18. Best practice for analysis of shared clinical trial data

    Directory of Open Access Journals (Sweden)

    Sally Hollis

    2016-07-01

    Full Text Available Abstract Background Greater transparency, including sharing of patient-level data for further research, is an increasingly important topic for organisations who sponsor, fund and conduct clinical trials. This is a major paradigm shift with the aim of maximising the value of patient-level data from clinical trials for the benefit of future patients and society. We consider the analysis of shared clinical trial data in three broad categories: (1) reanalysis: further investigation of the efficacy and safety of the randomized intervention; (2) meta-analysis; and (3) supplemental analysis for a research question that is not directly assessing the randomized intervention. Discussion In order to support appropriate interpretation and limit the risk of misleading findings, analysis of shared clinical trial data should have a pre-specified analysis plan. However, it is not generally possible to limit bias and control multiplicity to the extent that is possible in the original trial design, conduct and analysis, and this should be acknowledged and taken into account when interpreting results. We highlight a number of areas where specific considerations arise in planning, conducting, interpreting and reporting analyses of shared clinical trial data. A key issue is that these analyses essentially share many of the limitations of any post hoc analyses beyond the original specified analyses. The use of individual patient data in meta-analysis can provide increased precision and reduce bias. Supplemental analyses are subject to many of the same issues that arise in broader epidemiological analyses. Specific discussion topics are addressed within each of these areas. Summary Increased provision of patient-level data from industry and academic-led clinical trials for secondary research can benefit future patients and society. Responsible data sharing, including transparency of the research objectives, analysis plans and of the results will support appropriate

  19. Application of Workflow Technology for Big Data Analysis Service

    Directory of Open Access Journals (Sweden)

    Bin Zhang

    2018-04-01

    Full Text Available This study presents a lightweight representational state transfer (REST)-based cloud workflow system used to construct a big data intelligent software-as-a-service (SaaS) platform. The system supports the dynamic construction and operation of an intelligent data analysis application, and realizes rapid development and flexible deployment of the business analysis process, improving the interactivity and response time of the process. The proposed system integrates offline-batch and online-streaming analysis models that allow users to conduct batch and streaming computing simultaneously. Users can rent cloud capabilities and customize a set of big data analysis applications in the form of workflow processes. This study elucidates the architecture and the application modeling, customization, dynamic construction, and scheduling of a cloud workflow system. A chain workflow foundation mechanism is proposed to combine several analysis components into a chain component in order to improve efficiency. Four practical application cases are provided to verify the analysis capability of the system. Experimental results show that the proposed system can support multiple users in accessing the system concurrently and effectively uses data analysis algorithms. The proposed SaaS workflow system has been used by network operators and has achieved good results.
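
    The chain mechanism is not specified in detail in the abstract; in spirit it amounts to composing several analysis components into one callable chain component, as in this minimal sketch (component names and logic are invented):

```python
from functools import reduce

def chain(*components):
    """Combine several analysis components into one chain component,
    in the spirit of the chain-workflow mechanism described above."""
    return lambda data: reduce(lambda d, f: f(d), components, data)

# Hypothetical components of a batch analysis step.
clean = lambda rows: [r for r in rows if r is not None]
enrich = lambda rows: [{"value": r, "squared": r * r} for r in rows]
summarise = lambda rows: sum(r["squared"] for r in rows)

pipeline = chain(clean, enrich, summarise)
print(pipeline([1, 2, None, 3]))   # -> 14
```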

  20. Analysis strategies for high-resolution UHF-fMRI data.

    Science.gov (United States)

    Polimeni, Jonathan R; Renvall, Ville; Zaretskaya, Natalia; Fischl, Bruce

    2018-03-01

    Functional MRI (fMRI) benefits from both increased sensitivity and specificity with increasing magnetic field strength, making it a key application for Ultra-High Field (UHF) MRI scanners. Most UHF-fMRI studies utilize the dramatic increases in sensitivity and specificity to acquire high-resolution data reaching sub-millimeter scales, which enable new classes of experiments to probe the functional organization of the human brain. This review article surveys advanced data analysis strategies developed for high-resolution fMRI at UHF. These include strategies designed to mitigate distortion and artifacts associated with higher fields in ways that attempt to preserve spatial resolution of the fMRI data, as well as recently introduced analysis techniques that are enabled by these extremely high-resolution data. Particular focus is placed on anatomically-informed analyses, including cortical surface-based analysis, which are powerful techniques that can guide each step of the analysis from preprocessing to statistical analysis to interpretation and visualization. New intracortical analysis techniques for laminar and columnar fMRI are also reviewed and discussed. Prospects for single-subject individualized analyses are also presented and discussed. Altogether, there are both specific challenges and opportunities presented by UHF-fMRI, and the use of proper analysis strategies can help these valuable data reach their full potential. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Data fusion qualitative sensitivity analysis

    International Nuclear Information System (INIS)

    Clayton, E.A.; Lewis, R.E.

    1995-09-01

    Pacific Northwest Laboratory was tasked with testing, debugging, and refining the Hanford Site data fusion workstation (DFW), with the assistance of Coleman Research Corporation (CRC), before delivering the DFW to the environmental restoration client at the Hanford Site. Data fusion is the mathematical combination (or fusion) of disparate data sets into a single interpretation. The data fusion software used in this study was developed by CRC. The data fusion software developed by CRC was initially demonstrated on a data set collected at the Hanford Site where three types of data were combined. These data were (1) seismic reflection, (2) seismic refraction, and (3) depth to geologic horizons. The fused results included a contour map of the top of a low-permeability horizon. This report discusses the results of a sensitivity analysis of data fusion software to variations in its input parameters. The data fusion software developed by CRC has a large number of input parameters that can be varied by the user and that influence the results of data fusion. Many of these parameters are defined as part of the earth model. The earth model is a series of 3-dimensional polynomials with horizontal spatial coordinates as the independent variables and either subsurface layer depth or values of various properties within these layers (e.g., compression wave velocity, resistivity) as the dependent variables

  2. Frontier Assignment for Sensitivity Analysis of Data Envelopment Analysis

    Science.gov (United States)

    Naito, Akio; Aoki, Shingo; Tsuji, Hiroshi

    To extend the sensitivity-analysis capability of DEA (Data Envelopment Analysis), this paper proposes frontier assignment based DEA (FA-DEA). The basic idea of FA-DEA is to allow a decision maker to choose the frontier intentionally, while traditional DEA and Super-DEA determine the frontier computationally. The features of FA-DEA are as follows: (1) it provides the chance to exclude extra-influential DMUs (Decision Making Units) and to find extra-ordinal DMUs, and (2) it includes the functionality of traditional DEA and Super-DEA, so that it can deal with sensitivity analysis more flexibly. A simple numerical study has shown the effectiveness of the proposed FA-DEA and its difference from traditional DEA.
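
    The abstract does not give FA-DEA's formulation, but its core idea, letting the analyst fix the reference set that defines the frontier, can be sketched on top of the standard input-oriented CCR model solved as a linear program. The single-input/single-output data below are invented:

```python
import numpy as np
from scipy.optimize import linprog

# Inputs X (m x n) and outputs Y (s x n) for n DMUs; values invented.
X = np.array([[2.0, 3.0, 4.0, 5.0]])   # one input
Y = np.array([[1.0, 2.0, 3.0, 3.5]])   # one output

def ccr_efficiency(o, frontier):
    """Input-oriented CCR efficiency of DMU `o`, evaluated only against
    the DMUs in `frontier` (here: the analyst-assigned reference set)."""
    s, m = Y.shape[0], X.shape[0]
    c = np.concatenate([-Y[:, o], np.zeros(m)])               # maximise u.y_o
    A_ub = np.hstack([Y[:, frontier].T, -X[:, frontier].T])   # u.y_j - v.x_j <= 0
    b_ub = np.zeros(len(frontier))
    A_eq = np.concatenate([np.zeros(s), X[:, o]]).reshape(1, -1)  # v.x_o = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (s + m))
    return -res.fun

print(ccr_efficiency(0, frontier=[0, 1, 2, 3]))   # classic computed frontier
print(ccr_efficiency(0, frontier=[1, 3]))         # analyst-assigned frontier
```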

  3. Automating dChip: toward reproducible sharing of microarray data analysis

    Directory of Open Access Journals (Sweden)

    Li Cheng

    2008-05-01

    Full Text Available Abstract Background During the past decade, many software packages have been developed for the analysis and visualization of various types of microarrays. We have developed and maintained the widely used dChip as a microarray analysis software package accessible to both biologists and data analysts. However, challenges arise when dChip users want to analyze large numbers of arrays automatically and share data analysis procedures and parameters. Improvement is also needed when the dChip user-support team tries to identify the causes of analysis errors or bugs reported by users. Results We report here the implementation and application of the dChip automation module. Through this module, dChip automation files can be created to include menu steps, parameters, and data viewpoints to run automatically. A data-packaging function allows convenient transfer from one user to another of the dChip software, microarray data, and analysis procedures, so that the second user can reproduce the entire analysis session of the first user. An analysis report file can also be generated during an automated run, including analysis logs, user comments, and viewpoint screenshots. Conclusion The dChip automation module is a step toward reproducible research, and it can prompt a more convenient and reproducible mechanism for sharing microarray software, data, and analysis procedures and results. Automation data packages can also be used as publication supplements. Similar automation mechanisms could be valuable to the research community if implemented in other genomics and bioinformatics software packages.

  4. Dealing with poor data quality of OSINT data in fraud risk analysis

    NARCIS (Netherlands)

    van Keulen, Maurice

    2015-01-01

    Governmental organizations responsible for keeping certain types of fraud under control often use data-driven methods both for immediate detection of fraud and for fraud risk analysis aimed at more effectively targeting inspections. A blind spot in such methods is that the source data often

  5. Application of computer intensive data analysis methods to the analysis of digital images and spatial data

    DEFF Research Database (Denmark)

    Windfeld, Kristian

    1992-01-01

    Computer-intensive methods for data analysis in a traditional setting has developed rapidly in the last decade. The application of and adaption of some of these methods to the analysis of multivariate digital images and spatial data are explored, evaluated and compared to well established classical...... into the projection pursuit is presented. Examples from remote sensing are given. The ACE algorithm for computing non-linear transformations for maximizing correlation is extended and applied to obtain a non-linear transformation that maximizes autocorrelation or 'signal' in a multivariate image....... This is a generalization of the minimum /maximum autocorrelation factors (MAF's) which is a linear method. The non-linear method is compared to the linear method when analyzing a multivariate TM image from Greenland. The ACE method is shown to give a more detailed decomposition of the image than the MAF-transformation...

  6. Interactive analysis of systems biology molecular expression data

    Directory of Open Access Journals (Sweden)

    Prabhakar Sunil

    2008-02-01

    Full Text Available Abstract Background Systems biology aims to understand biological systems on a comprehensive scale, such that the components that make up the whole are connected to one another and work through dependent interactions. Molecular correlations and comparative studies of molecular expression are crucial to establishing interdependent connections in systems biology. Existing software packages provide limited data mining capability: the user must first generate visualization data with a preferred data mining algorithm and then upload the resulting data into the visualization package for graphic visualization of molecular relations. Results Presented is a novel interactive visual data mining application, SysNet, that provides an interactive environment for the analysis of high-volume molecular expression data of almost any type from biological systems. It integrates interactive graphic visualization and statistical data mining into a single package. SysNet interactively presents intermolecular correlation information with circular and heatmap layouts. It is also applicable to comparative analysis of molecular expression data, such as time-course data. Conclusion The SysNet program has been utilized to analyze elemental profile changes in response to an increasing concentration of iron (Fe) in growth media (an ionomics dataset). This case study demonstrates that the SysNet software is an effective platform for interactive analysis of molecular expression information in systems biology.

  7. airborne data analysis/monitor system

    Science.gov (United States)

    Stephison, D. B.

    1981-01-01

    An Airborne Data Analysis/Monitor System (ADAMS), a ROLM 1666 computer-based system installed onboard test airplanes used during experimental testing, is evaluated. In addition to the 1666 computer, the ADAMS hardware includes a DDC System 90 fixed-head disk and a Miltape DD400 floppy disk. Boeing designed a DMA interface to the data acquisition system and an intelligent terminal to reduce system overhead and simplify operator commands. The ADAMS software includes RMX/RTOS, and both ROLM FORTRAN and assembly language are used. ADAMS provides real-time displays that enable onboard test engineers to make rapid decisions about test conduct, reducing the cost and time required to certify new model airplanes and improving the quality of data derived from the tests, leading to more rapid development of improvements and hence quieter, safer, and more efficient airplanes. The availability of airborne data processing removes most of the weather and geographical restrictions imposed by telemetered flight-test data systems. A data base is maintained to describe the airplane, the data acquisition system, the type of testing, and the conditions under which the tests are performed.

  8. Functional data analysis of sleeping energy expenditure

    Science.gov (United States)

    Adequate sleep is crucial during childhood for metabolic health, and physical and cognitive development. Inadequate sleep can disrupt metabolic homeostasis and alter sleeping energy expenditure (SEE). Functional data analysis methods were applied to SEE data to elucidate the population structure of ...

  9. PIVOT: platform for interactive analysis and visualization of transcriptomics data.

    Science.gov (United States)

    Zhu, Qin; Fisher, Stephen A; Dueck, Hannah; Middleton, Sarah; Khaladkar, Mugdha; Kim, Junhyong

    2018-01-05

    Many R packages have been developed for transcriptome analysis but their use often requires familiarity with R and integrating results of different packages requires scripts to wrangle the datatypes. Furthermore, exploratory data analyses often generate multiple derived datasets such as data subsets or data transformations, which can be difficult to track. Here we present PIVOT, an R-based platform that wraps open source transcriptome analysis packages with a uniform user interface and graphical data management that allows non-programmers to interactively explore transcriptomics data. PIVOT supports more than 40 popular open source packages for transcriptome analysis and provides an extensive set of tools for statistical data manipulations. A graph-based visual interface is used to represent the links between derived datasets, allowing easy tracking of data versions. PIVOT further supports automatic report generation, publication-quality plots, and program/data state saving, such that all analysis can be saved, shared and reproduced. PIVOT will allow researchers with broad background to easily access sophisticated transcriptome analysis tools and interactively explore transcriptome datasets.

  10. Advances in statistical models for data analysis

    CERN Document Server

    Minerva, Tommaso; Vichi, Maurizio

    2015-01-01

    This edited volume focuses on recent research results in classification, multivariate statistics and machine learning and highlights advances in statistical models for data analysis. The volume provides both methodological developments and contributions to a wide range of application areas such as economics, marketing, education, social sciences and environment. The papers in this volume were first presented at the 9th biennial meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society, held in September 2013 at the University of Modena and Reggio Emilia, Italy.

  11. Lag profile inversion method for EISCAT data analysis

    Directory of Open Access Journals (Sweden)

    I. I. Virtanen

    2008-03-01

    Full Text Available The present standard EISCAT incoherent scatter experiments are based on alternating codes that are decoded in the power domain by simple summation and subtraction operations. The signal is first digitised, and then the different lagged products are calculated and decoded in real time. Only the decoded lagged products are saved for further analysis, so that both the original data samples and the undecoded lagged products are lost. A fit of plasma parameters can later be performed using the recorded lagged products. In this paper we describe a different analysis method, which makes use of statistical inversion to remove range ambiguities from the lag profiles. An analysis program carrying out both the lag profile inversion and the fit of the plasma parameters has been constructed. Because recording the received signal itself instead of the lagged products allows very flexible data analysis, the program is built to use raw data, i.e. the IQ-sampled signal recorded from an IF stage of the radar. The program is now capable of analysing standard alternating-coded EISCAT experiments as well as experiments with any other kind of radar modulation, provided that raw data are available. The program calculates the ambiguous lag profiles and is capable of inverting them as such but, for analysis in real time, time integration is needed before inversion. We demonstrate the method using alternating code experiments on the EISCAT UHF radar and specific hardware connected to the second IF stage of the receiver. This hardware produces a data stream of complex samples, which are stored for later processing. The raw data are analysed with lag profile inversion and the results are compared to those given by the standard method.
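
    At its core, lag profile inversion treats each measured, range-ambiguous lag profile as a linear model m = A x + noise and recovers the unambiguous profile x by statistical inversion. The toy sketch below uses an invented smearing matrix and plain regularised least squares; the real analysis builds A from the modulation's ambiguity functions and uses proper error statistics.

```python
import numpy as np

rng = np.random.default_rng(3)
n_gates = 60
x_true = np.exp(-np.linspace(0, 3, n_gates))        # invented lag profile

# Invented range-ambiguity operator: each gate leaks into neighbours.
A = np.eye(n_gates)
for k in range(1, 4):
    A += 0.3 * np.eye(n_gates, k=k)

m = A @ x_true + 0.05 * rng.normal(size=n_gates)    # measured, ambiguous profile

# Regularised least-squares inversion (Tikhonov), a simple stand-in for
# the full statistical inversion with measurement error covariances.
lam = 0.1
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(n_gates), A.T @ m)
print(np.abs(x_hat - x_true).max())
```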

  12. Meta-Analysis for Primary and Secondary Data Analysis: The Super-Experiment Metaphor.

    Science.gov (United States)

    Jackson, Sally

    1991-01-01

    Considers the relation between meta-analysis statistics and analysis of variance statistics. Discusses advantages and disadvantages as a primary data analysis tool. Argues that the two approaches are partial paraphrases of one another. Advocates an integrative approach that introduces the best of meta-analytic thinking into primary analysis…

  13. Statistical methods for categorical data analysis

    CERN Document Server

    Powers, Daniel

    2008-01-01

    This book provides a comprehensive introduction to methods and models for categorical data analysis and their applications in social science research. Companion website also available, at https://webspace.utexas.edu/dpowers/www/

  14. Basic statistical tools in research and data analysis

    Directory of Open Access Journals (Sweden)

    Zulfiqar Ali

    2016-01-01

    Full Text Available Statistical methods involved in carrying out a study include planning, designing, collecting data, analysing, drawing meaningful interpretation and reporting of the research findings. The statistical analysis gives meaning to the meaningless numbers, thereby breathing life into a lifeless data. The results and inferences are precise only if proper statistical tests are used. This article will try to acquaint the reader with the basic research tools that are utilised while conducting various studies. The article covers a brief outline of the variables, an understanding of quantitative and qualitative variables and the measures of central tendency. An idea of the sample size estimation, power analysis and the statistical errors is given. Finally, there is a summary of parametric and non-parametric tests used for data analysis.

  15. [Quantitative data analysis for live imaging of bone].

    Science.gov (United States)

    Seno, Shigeto

    Bone is a hard tissue, and it has long been difficult to observe the interior of living bone tissue. With recent progress in microscopy and fluorescent-probe technology, it has become possible to observe the various activities of the many kinds of cells that make up bone. On the other hand, the quantitative growth of the data and the increasing diversity and complexity of the images make quantitative analysis by visual inspection difficult, and methodologies for microscopy image processing and data analysis have therefore been awaited. In this article, we introduce the research field of bioimage informatics, the boundary area between biology and information science, and then outline basic image-processing technology for the quantitative analysis of live-imaging data of bone.
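
    A first step in such quantitative analysis is typically segmenting cells from a microscopy frame and measuring per-cell properties. The following generic scikit-image sketch (random stand-in image, not bone data) illustrates the basic threshold-label-measure pattern:

```python
import numpy as np
from skimage import filters, measure

img = np.random.rand(128, 128)          # stand-in for one microscopy frame

thresh = filters.threshold_otsu(img)    # global intensity threshold
mask = img > thresh
labels = measure.label(mask)            # connected components = candidate objects

# Quantify each segmented object instead of inspecting it by eye.
for region in measure.regionprops(labels):
    if region.area > 20:                # discard tiny specks
        print(region.label, region.area, region.centroid)
```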

  16. User's manual of JT-60 experimental data analysis system

    International Nuclear Information System (INIS)

    Hirayama, Takashi; Morishima, Soichi; Yoshioka, Yuji

    2010-02-01

    At the Japan Atomic Energy Agency's Naka Fusion Institute, many experiments have been conducted with the large tokamak device JT-60, aiming at the realization of a fusion power plant. In order to optimize the JT-60 experiments and to investigate the complex characteristics of plasma, the JT-60 experimental data analysis system was developed and is used for collecting, browsing and analyzing the JT-60 experimental data. The main components of the system are a data analysis server and a database server, for the analysis and accumulation of the experimental data respectively. Other peripheral devices of the system are magnetic disk units, a NAS (Network Attached Storage) device, and a backup tape drive. This is the user's manual of the JT-60 experimental data analysis system. (author)

  17. Rethinking Meta-Analysis: Applications for Air Pollution Data and Beyond

    Science.gov (United States)

    Goodman, Julie E; Petito Boyce, Catherine; Sax, Sonja N; Beyer, Leslie A; Prueitt, Robyn L

    2015-01-01

    Meta-analyses offer a rigorous and transparent systematic framework for synthesizing data that can be used for a wide range of research areas, study designs, and data types. Both the outcome of meta-analyses and the meta-analysis process itself can yield useful insights for answering scientific questions and making policy decisions. Development of the National Ambient Air Quality Standards illustrates many potential applications of meta-analysis. These applications demonstrate the strengths and limitations of meta-analysis, issues that arise in various data realms, how meta-analysis design choices can influence interpretation of results, and how meta-analysis can be used to address bias and heterogeneity. Reviewing available data from a meta-analysis perspective can provide a useful framework and impetus for identifying and refining strategies for future research. Moreover, increased pervasiveness of a meta-analysis mindset—focusing on how the pieces of the research puzzle fit together—would benefit scientific research and data syntheses regardless of whether or not a quantitative meta-analysis is undertaken. While an individual meta-analysis can only synthesize studies addressing the same research question, the results of separate meta-analyses can be combined to address a question encompassing multiple data types. This observation applies to any scientific or policy area where information from a variety of disciplines must be considered to address a broader research question. PMID:25969128

  18. Integration, warehousing, and analysis strategies of Omics data.

    Science.gov (United States)

    Gedela, Srinubabu

    2011-01-01

    "-Omics" is a current suffix for numerous types of large-scale biological data generation procedures, which naturally demand the development of novel algorithms for data storage and analysis. With next generation genome sequencing burgeoning, it is pivotal to decipher a coding site on the genome, a gene's function, and information on transcripts next to the pure availability of sequence information. To explore a genome and downstream molecular processes, we need umpteen results at the various levels of cellular organization by utilizing different experimental designs, data analysis strategies and methodologies. Here comes the need for controlled vocabularies and data integration to annotate, store, and update the flow of experimental data. This chapter explores key methodologies to merge Omics data by semantic data carriers, discusses controlled vocabularies as eXtensible Markup Languages (XML), and provides practical guidance, databases, and software links supporting the integration of Omics data.

  19. Status of data and data needs for XRF and PIXE based element analysis

    International Nuclear Information System (INIS)

    Kapoor, S.S.; Choudhary, R.K.

    1986-01-01

    The status of data and data needs for X-ray fluorescence (XRF) and particle-induced X-ray emission (PIXE) based element analysis is examined to determine the areas where additional and improved data are required to improve the accuracy, precision and sensitivity of quantitative element analysis by these techniques. (author)

  20. Astronomical Image and Data Analysis

    CERN Document Server

    Starck, J.-L

    2006-01-01

    With information and scale as central themes, this comprehensive survey explains how to handle real problems in astronomical data analysis using a modern arsenal of powerful techniques. It treats those innovative methods of image, signal, and data processing that are proving to be both effective and widely relevant. The authors are leaders in this rapidly developing field and draw upon decades of experience. They have been playing leading roles in international projects such as the Virtual Observatory and the Grid. The book addresses not only students and professional astronomers and astrophysicists, but also serious amateur astronomers and specialists in earth observation, medical imaging, and data mining. The coverage includes chapters or appendices on: detection and filtering; image compression; multichannel, multiscale, and catalog data analytical methods; wavelets transforms, Picard iteration, and software tools. This second edition of Starck and Murtagh's highly appreciated reference again deals with to...

  1. Multi-Dimensional Customer Data Analysis in Online Auctions

    Institute of Scientific and Technical Information of China (English)

    LAO Guoling; XIONG Kuan; QIN Zheng

    2007-01-01

    In this paper, we design a customer-centered data warehouse system with five subjects: listing, bidding, transaction, accounts, and customer contact, based on the business process of online auction companies. For each subject, we analyze its fact indexes and dimensions. Then, taking the transaction subject as an example, we analyze the data warehouse model in detail and obtain the multi-dimensional analysis structure of the transaction subject. Finally, using data mining for customer segmentation, we divide customers into four types: impulse customers, prudent customers, potential customers, and ordinary customers. With the results of multi-dimensional customer data analysis, online auction companies can better target their marketing and increase customer loyalty.

  2. Data analysis for physical scientists featuring Excel

    CERN Document Server

    Kirkup, Les

    2012-01-01

    Summarising data, comparing models and applying computer-based analysis tools are vital skills for studying and working in the physical sciences. This textbook supports undergraduate students as they develop and enhance these skills. Introducing data analysis techniques, this textbook pays particular attention to the internationally recognised guidelines for calculating and expressing measurement uncertainty. This new edition has been revised to incorporate Excel® 2010. It also provides a practical approach to fitting models to data using non-linear least squares, a powerful technique which can be applied to many types of model. Worked examples using actual experimental data help students understand how the calculations apply to real situations. Over 200 in-text exercises and end-of-chapter problems give students the opportunity to use the techniques themselves and gain confidence in applying them. Answers to the exercises and problems are given at the end of the book.
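
    The non-linear least squares technique the book covers can be sketched in a few lines of Python as a stand-in for the Excel-based workflow; here scipy.optimize.curve_fit fits an exponential-decay model to synthetic data and reports 1-sigma parameter uncertainties from the covariance matrix, in the spirit of the measurement-uncertainty guidelines the book emphasizes.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, a, tau, c):
    """Exponential decay with offset: a*exp(-t/tau) + c."""
    return a * np.exp(-t / tau) + c

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 50)
y = model(t, 5.0, 2.0, 1.0) + rng.normal(0, 0.2, t.size)

popt, pcov = curve_fit(model, t, y, p0=(4.0, 1.0, 0.0))
perr = np.sqrt(np.diag(pcov))          # 1-sigma parameter uncertainties
for name, val, err in zip(("a", "tau", "c"), popt, perr):
    print(f"{name} = {val:.2f} +/- {err:.2f}")
```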

  3. Data preparation for functional data analysis of PM10 in Peninsular Malaysia

    Science.gov (United States)

    Shaadan, Norshahida; Jemain, Abdul Aziz; Deni, Sayang Mohd

    2014-07-01

    The use of curves or functional data in study analysis is gaining momentum in various fields of research. The statistical methodology for analyzing such data is known as functional data analysis (FDA). The first step in FDA is to convert the observed data points, recorded repeatedly over a period of time or space, into either a rough (raw) or a smooth curve. For a smooth curve, basis function expansion is one of the methods used for the conversion. The data can be converted into a smooth curve using either the regression smoothing or the roughness penalty smoothing approach. With regression smoothing, the degree of the curve's smoothness depends on the number k of basis functions; with the roughness penalty approach, the smoothness depends on a roughness coefficient given by the parameter λ. Based on previous studies, researchers have often used the rather time-consuming trial-and-error or cross-validation method to estimate the appropriate number of basis functions. This paper therefore proposes a statistical procedure to construct functional data or curves from hourly and daily recorded data. The Bayesian Information Criterion is used to determine the number of basis functions, while the Generalized Cross Validation criterion is used to identify the parameter λ. The proposed procedure is then applied to a ten-year (2001-2010) period of PM10 data from 30 air quality monitoring stations located in Peninsular Malaysia. The number of basis functions required for the construction of the PM10 daily curve in Peninsular Malaysia was found to lie between 14 and 20, with an average value of 17, a first quartile of 15 and a third quartile of 19. The initial value of the roughness coefficient was in the interval between 10^-5 and 10^-7, with a mode of 10^-6. An example of functional descriptive analysis is also shown.
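
    A compact illustration of the selection step for the number of basis functions, assuming an ordinary least squares fit of a Fourier basis to one day of hourly readings; the Bayesian Information Criterion is evaluated over a range of basis sizes and its minimizer chosen, mirroring the paper's procedure (the data here are synthetic, not PM10 measurements).

```python
import numpy as np

def fourier_basis(t, k):
    """Design matrix of k Fourier basis functions (constant, sin/cos pairs)."""
    cols = [np.ones_like(t)]
    j = 1
    while len(cols) < k:
        cols.append(np.sin(2 * np.pi * j * t))
        if len(cols) < k:
            cols.append(np.cos(2 * np.pi * j * t))
        j += 1
    return np.column_stack(cols)

def bic(t, y, k):
    """BIC of an OLS fit with k basis functions."""
    B = fourier_basis(t, k)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss = np.sum((y - B @ coef) ** 2)
    n = len(y)
    return n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 24)                 # one day of hourly readings
y = 60 + 15 * np.sin(2 * np.pi * t) + rng.normal(0, 4, size=t.size)

best_k = min(range(3, 22), key=lambda k: bic(t, y, k))
print("BIC-selected number of basis functions:", best_k)
```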

  4. Compass 2011 data analysis and reporting.

    Science.gov (United States)

    2013-05-01

    Past efforts include data analysis and reporting performance and outcomes for signs, pavement, shoulders, roadsides, drainage, traffic, and bridges. In : the 2005 Compass report, measures for bridge inspection and maintenance were added, and historic...

  5. Compass 2012 data analysis and reporting.

    Science.gov (United States)

    2014-05-01

    Past efforts include data analysis and reporting performance and outcomes for signs, pavement, shoulders, roadsides, drainage, traffic, and bridges. In : the 2005 Compass report, measures for bridge inspection and maintenance were added, and historic...

  6. The EADGENE Microarray Data Analysis Workshop

    DEFF Research Database (Denmark)

    de Koning, Dirk-Jan; Jaffrézic, Florence; Lund, Mogens Sandø

    2007-01-01

    Microarray analyses have become an important tool in animal genomics. While their use is becoming widespread, there is still a lot of ongoing research regarding the analysis of microarray data. In the context of a European Network of Excellence, 31 researchers representing 14 research groups from...... 10 countries performed and discussed the statistical analyses of real and simulated 2-colour microarray data that were distributed among participants. The real data consisted of 48 microarrays from a disease challenge experiment in dairy cattle, while the simulated data consisted of 10 microarrays...... statistical weights, to omitting a large number of spots or omitting entire slides. Surprisingly, these very different approaches gave quite similar results when applied to the simulated data, although not all participating groups analysed both real and simulated data. The workshop was very successful...

  7. Further Development of Rotating Rake Mode Measurement Data Analysis

    Science.gov (United States)

    Dahl, Milo D.; Hixon, Ray; Sutliff, Daniel L.

    2013-01-01

    The Rotating Rake mode measurement system was designed to measure the acoustic duct modes generated by a fan stage. After analysis of the measured data, the mode amplitudes and phases were quantified. For low-speed fans within axisymmetric ducts, mode power levels computed from rotating rake measured data would agree with the far-field power levels on a tone-by-tone basis. However, this agreement required that the sound from the noise sources within the duct propagate outward from the duct exit without reflection, and previous studies suggested that conditions could exist where significant reflections occur. To directly measure the modes propagating in both directions within a duct, a second rake was mounted to the rotating system with an offset in both the axial and the azimuthal directions. The rotating rake data analysis technique was extended to include the data measured by the second rake. The analysis resulted in a set of circumferential mode levels at each of the two rake microphone locations. Radial basis functions were then least-squares fit to these data to obtain the radial mode amplitudes for the modes propagating in both directions within the duct. The fit equations were also modified to allow evanescent mode amplitudes to be computed. This extension of the rotating rake data analysis technique was tested using simulated data, data produced by a numerical code, and preliminary in-duct measured data.
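
    The least-squares step of the extended analysis can be sketched generically: circumferential mode amplitudes measured at the rake microphone radii are fit with radial basis shapes by linear least squares. The Bessel-function basis and eigenvalues below are illustrative assumptions for a hard-walled circular duct, as are all the variable names and values; they are not taken from the paper.

```python
import numpy as np
from scipy.special import jv  # Bessel function of the first kind

m = 2                                   # circumferential mode order
radii = np.linspace(0.1, 1.0, 12)       # normalized microphone radii
rng = np.random.default_rng(3)
true_amps = np.array([1.0 + 0.5j, 0.3 - 0.2j, 0.1 + 0.0j])

# Radial basis: Bessel functions J_m(k_mn * r) for a hard-walled duct,
# with illustrative (not exact) eigenvalues k_mn.
k_mn = np.array([3.05, 6.71, 9.97])
B = jv(m, np.outer(radii, k_mn)).astype(complex)   # design matrix (12, 3)

# Hypothetical measured complex mode pressures, with a little noise.
p = B @ true_amps + 0.02 * (rng.normal(size=12) + 1j * rng.normal(size=12))

# Linear least-squares fit recovers the radial mode amplitudes.
amps, *_ = np.linalg.lstsq(B, p, rcond=None)
print(np.round(amps, 3))
```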

  8. Computerized diagnostic data analysis and 3-D visualization

    International Nuclear Information System (INIS)

    Schuhmann, D.; Haubner, M.; Krapichler, C.; Englmeier, K.H.; Seemann, M.; Schoepf, U.J.; Gebicke, K.; Reiser, M.

    1998-01-01

    Purpose: To survey methods for 3D data visualization and image analysis which can be used for computer-based diagnostics. Material and methods: The available methods are explained briefly, with links to the literature. Methods which allow basic manipulation of 3D data are windowing, rotation and clipping. More complex methods for the visualization of 3D data are multiplanar reformation, volume projections (MIP, semi-transparent projections) and surface projections. Methods for image analysis comprise local data transformation (e.g. filtering) and the definition and application of complex models (e.g. deformable models). Results: Volume projections produce an impression of the 3D data set without reducing the amount of data. This supports the interpretation of the 3D data set and saves time in comparison to any investigation which requires examination of all slice images. More advanced techniques for visualization, e.g. surface projections and hybrid rendering, visualize anatomical information to a very detailed extent, but both techniques require the segmentation of the structures of interest. Image analysis methods can be used to extract these structures (e.g. an organ) from the image data. Discussion: At the present time volume projections are robust and fast enough to be used routinely. Surface projections can be used to visualize complex and presegmented anatomical features. (orig.)
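
    Of the volume-projection methods surveyed, the maximum intensity projection (MIP) has the simplest expression: each output pixel takes the brightest voxel along the viewing ray. A minimal sketch along one axis of a synthetic volume:

```python
import numpy as np

rng = np.random.default_rng(4)
volume = rng.normal(0, 1, size=(64, 128, 128))   # synthetic CT-like volume
volume[30:34, 40:80, 40:80] += 5.0               # a bright embedded structure

# Maximum intensity projection along the first (axial) direction:
# each output pixel is the maximum voxel value along the ray.
mip = volume.max(axis=0)                         # shape (128, 128)
print(mip.shape, float(mip[60, 60]) > float(mip[0, 0]))
```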

  9. Topological data analysis: A promising big data exploration tool in biology, analytical chemistry and physical chemistry.

    Science.gov (United States)

    Offroy, Marc; Duponchel, Ludovic

    2016-03-03

    An important feature of experimental science is that data of various kinds are being produced at an unprecedented rate. This is mainly due to the development of new instrumental concepts and experimental methodologies. It is also clear that the nature of the acquired data is significantly different. Indeed, in every area of science, data take the form of ever bigger tables, where all but a few of the columns (i.e. variables) turn out to be irrelevant to the questions of interest, and we do not necessarily know in advance which coordinates are the interesting ones. Big data in a biology, analytical chemistry or physical chemistry laboratory is a future that might be closer than any of us supposes. It is in this sense that new tools have to be developed to explore and valorize such data sets. Topological data analysis (TDA) is one of these. It was developed recently by topologists who discovered that topological concepts could be useful for data analysis. The main objective of this paper is to explain why topology is well suited for the analysis of big data sets in many areas, and can even be more efficient than conventional data analysis methods. Raman analysis of single bacteria provides a good opportunity to demonstrate the potential of TDA for the exploration of various spectroscopic data sets under different experimental conditions (high noise level, with/without spectral preprocessing, wavelength shift, different spectral resolutions, missing data). Copyright © 2016 Elsevier B.V. All rights reserved.
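
    One of the simplest topological summaries, zero-dimensional persistence (the birth and death of connected components as a distance threshold grows), can be computed with a union-find pass over sorted pairwise distances; the sketch below is a toy version of the machinery TDA tools build on, not the authors' pipeline.

```python
import numpy as np
from itertools import combinations

def persistence0(points):
    """Death times of 0-dim features: the merge heights of single linkage."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    edges = sorted((np.linalg.norm(points[i] - points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    deaths = []
    for d, i, j in edges:                   # Kruskal-style sweep
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)                # one component dies at threshold d
    return deaths

rng = np.random.default_rng(5)
# Two well-separated clusters: expect one outstanding death value (the gap).
pts = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
print(max(persistence0(pts)))               # ~3, the inter-cluster distance
```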

  10. GET electronics samples data analysis

    International Nuclear Information System (INIS)

    Giovinazzo, J.; Goigoux, T.; Anvar, S.; Baron, P.; Blank, B.; Delagnes, E.; Grinyer, G.F.; Pancin, J.; Pedroza, J.L.; Pibernat, J.; Pollacco, E.; Rebii, A.

    2016-01-01

    The General Electronics for TPCs (GET) system has been developed to equip a generation of time projection chamber detectors for nuclear physics, and may also be used for a wider range of detector types. The goal of this paper is to propose first analysis procedures to be applied to raw data samples from the GET system, in order to correct for systematic effects observed in test measurements. We also present a method to estimate the response function of the GET system channels. The response function is required in analyses where the input signal needs to be reconstructed, in terms of time distribution, from the registered output samples.

  11. Analysis of ISEE-3/ICE solar wind data

    Science.gov (United States)

    Coplan, Michael A.

    1989-01-01

    Under the grant that ended November 11, 1988 work was accomplished in a number of areas, as follows: (1) Analysis of solar wind data; (2) Analysis of Giacobini/Zinner encounter data; (3) Investigation of solar wind and magnetospheric electron velocity distributions; and (4) Experimental investigation of the electronic structure of clusters. Reprints and preprints of publications resulting from this work are included in the appendices.

  12. Vapor Pressure Data Analysis and Statistics

    Science.gov (United States)

    2016-12-01

    This report (ECBC-TR-1422, Vapor Pressure Data Analysis and Statistics, Ann Brozena, Research and Technology Directorate) presents the regression analysis of vapor pressure data; only fragments of the abstract survive in this record. In the regression treatment, n is the number of data points and Yi is the natural logarithm of the i-th experimental vapor pressure value; the fitted A (or a) value is directly related to vapor pressure and will be greater for high vapor pressure materials.
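
    The report's exact regression model is not recoverable from the fragmentary abstract above; purely as an illustration, the sketch below fits the two-parameter form ln p = A - B/T, assuming that Xi is inverse temperature, which is consistent with the recovered definition of Yi as the natural logarithm of vapor pressure.

```python
import numpy as np

# Synthetic vapor pressure data (Pa) following ln p = A - B/T.
A_true, B_true = 21.0, 4000.0
T = np.linspace(280.0, 360.0, 15)                 # temperatures, K
rng = np.random.default_rng(6)
p = np.exp(A_true - B_true / T) * np.exp(rng.normal(0, 0.02, T.size))

# Linear regression of Y = ln(p) against X = 1/T: slope = -B, intercept = A.
Y, X = np.log(p), 1.0 / T
slope, intercept = np.polyfit(X, Y, 1)
print(f"A = {intercept:.2f}, B = {-slope:.0f} K")
```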

  13. Data analysis with the DIANA meta-scheduling approach

    International Nuclear Information System (INIS)

    Anjum, A; McClatchey, R; Willers, I

    2008-01-01

    The concepts, design and evaluation of the Data Intensive and Network Aware (DIANA) meta-scheduling approach for solving the challenges of data analysis faced by CERN experiments are discussed in this paper. Our results suggest that data analysis can be made robust by employing the fault-tolerant and decentralized meta-scheduling algorithms supported in our DIANA meta-scheduler. The DIANA meta-scheduler supports data-intensive bulk scheduling, is network aware, and follows a policy-centric approach to meta-scheduling. In this paper, we demonstrate that a decentralized and dynamic meta-scheduling approach is an effective strategy to cope with increasing numbers of users, jobs and datasets. We present quality-of-service related statistics for physics analysis obtained through the application of a policy-centric fair-share scheduling model. The DIANA meta-schedulers create a peer-to-peer hierarchy of schedulers to accomplish resource management that changes with evolving loads, is dynamic, and adapts to the volatile nature of the resources.

  14. Data analysis algorithms for gravitational-wave experiments

    International Nuclear Information System (INIS)

    Bonifazi, P.; Ferrari, V.; Frasca, S.; Pallottino, G.V.; Pizzella, G.

    1978-01-01

    The analysis of the sensitivity of a gravitational-wave antenna system shows that the role of the algorithms used for the analysis of the experimental data is comparable to that of the experimental apparatus. After a discussion of the processing performed on the input signals by the antenna and the electronic instrumentation, we derive a mathematical model of the system. This model is then used as a basis for the discussion of a number of data analysis algorithms that include also the Wiener-Kolmogoroff optimum filter; the performances of the algorithms are presented in terms of signal-to-noise ratio and sensitivity to short bursts of resonant gravitational waves. The theoretical results are in good agreement with the experimental results obtained with a small cryogenic antenna (24 kg)
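
    As a rough illustration of the optimum-filter idea, the sketch below applies a frequency-domain Wiener filter, attenuating each frequency bin by the ratio of signal power to total power; the burst shape and spectra here are synthetic placeholders (with the true spectra assumed known), not the paper's antenna model.

```python
import numpy as np

rng = np.random.default_rng(7)
n, dt = 4096, 1e-3
t = np.arange(n) * dt

# Synthetic "burst": a short oscillation buried in white noise.
signal = np.exp(-((t - 2.0) / 0.05) ** 2) * np.sin(2 * np.pi * 50 * t)
data = signal + rng.normal(0, 0.5, n)

# Wiener filter H = S / (S + N): needs the signal and noise power spectra.
# Here we cheat and use the true spectra; in practice they are estimated.
S = np.abs(np.fft.rfft(signal)) ** 2
N = np.full_like(S, 0.5 ** 2 * n)          # flat (white) noise power per bin
H = S / (S + N)

filtered = np.fft.irfft(H * np.fft.rfft(data), n)
print("residual std before: %.2f  after: %.2f"
      % ((data - signal).std(), (filtered - signal).std()))
```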

  15. Multivariate Analysis of Industrial Scale Fermentation Data

    DEFF Research Database (Denmark)

    Mears, Lisa; Nørregård, Rasmus; Stocks, Stuart M.

    2015-01-01

    Multivariate analysis allows process understanding to be gained from the vast and complex datasets recorded from fermentation processes; however, the application of such techniques to this field can be limited by the data pre-processing requirements and data handling. In this work many iterations...

  16. Statistical and Machine-Learning Data Mining Techniques for Better Predictive Modeling and Analysis of Big Data

    CERN Document Server

    Ratner, Bruce

    2011-01-01

    The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has

  17. WEATHER FORECAST DATA SEMANTIC ANALYSIS IN F-LOGIC

    Directory of Open Access Journals (Sweden)

    Ana Meštrović

    2007-06-01

    This paper addresses the semantic analysis problem in a spoken dialog system developed for the domain of weather forecasts. The main goal of semantic analysis is to extract the meaning from spoken utterances and to transform it into a domain database format. In this work a semantic database for the domain of weather forecasts is represented using the F-logic formalism. Semantic knowledge is captured through semantic categories, a semantic dictionary of phrases, and output templates. Procedures for the semantic analysis of Croatian weather data combine parsing techniques for the Croatian language with a slot-filling approach. Semantic analysis is conducted in three phases. In the first phase the main semantic category for the input utterance is determined; lattices are used for the hierarchical representation of semantic relations and for deriving the main category. In the second phase semantic units are analyzed and knowledge slots in the database are filled. Since some slot values of the input data are missing, in the third phase the incomplete data are updated with the missing values. All rules for semantic analysis are defined in F-logic and implemented using the FLORA-2 system. The results of the semantic analysis evaluation, in terms of frame and slot error rates, are presented.

  18. Real-time data analysis at the LHC: present and future

    CERN Document Server

    Gligorov, V.V.

    2015-01-01

    The Large Hadron Collider (LHC), which collides protons at a centre-of-mass energy of 14 TeV, produces hundreds of exabytes of data per year, making it one of the largest sources of data in the world today. At present it is not possible to even transfer most of this data from the four main particle detectors at the LHC to "offline" data facilities, much less to permanently store it for future processing. For this reason the LHC detectors are equipped with real-time analysis systems, called triggers, which process this volume of data and select the most interesting proton-proton collisions. The LHC experiment triggers reduce the data produced by the LHC to between 1/1000 and 1/100000 of its original volume, i.e. tens of petabytes per year, allowing its economical storage and further analysis. The bulk of the data reduction is performed by custom electronics which ignores most of the data in its decision making, and is therefore unable to exploit the most powerful known data analysis strategies. I cover the present status of real-time data analysis ...

  19. A data analysis framework for biomedical big data: Application on mesoderm differentiation of human pluripotent stem cells.

    Science.gov (United States)

    Ulfenborg, Benjamin; Karlsson, Alexander; Riveiro, Maria; Améen, Caroline; Åkesson, Karolina; Andersson, Christian X; Sartipy, Peter; Synnergren, Jane

    2017-01-01

    The development of high-throughput biomolecular technologies has resulted in generation of vast omics data at an unprecedented rate. This is transforming biomedical research into a big data discipline, where the main challenges relate to the analysis and interpretation of data into new biological knowledge. The aim of this study was to develop a framework for biomedical big data analytics, and apply it for analyzing transcriptomics time series data from early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. To this end, transcriptome profiling by microarray was performed on differentiating human pluripotent stem cells sampled at eleven consecutive days. The gene expression data was analyzed using the five-stage analysis framework proposed in this study, including data preparation, exploratory data analysis, confirmatory analysis, biological knowledge discovery, and visualization of the results. Clustering analysis revealed several distinct expression profiles during differentiation. Genes with an early transient response were strongly related to embryonic- and mesendoderm development, for example CER1 and NODAL. Pluripotency genes, such as NANOG and SOX2, exhibited substantial downregulation shortly after onset of differentiation. Rapid induction of genes related to metal ion response, cardiac tissue development, and muscle contraction were observed around day five and six. Several transcription factors were identified as potential regulators of these processes, e.g. POU1F1, TCF4 and TBP for muscle contraction genes. Pathway analysis revealed temporal activity of several signaling pathways, for example the inhibition of WNT signaling on day 2 and its reactivation on day 4. This study provides a comprehensive characterization of biological events and key regulators of the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. The proposed analysis framework can be used to structure

  20. An overview of data acquisition, signal coding and data analysis techniques for MST radars

    Science.gov (United States)

    Rastogi, P. K.

    1986-01-01

    An overview is given of the data acquisition, signal processing, and data analysis techniques that are currently in use with high power MST/ST (mesosphere stratosphere troposphere/stratosphere troposphere) radars. This review supplements the works of Rastogi (1983) and Farley (1984) presented at previous MAP workshops. A general description is given of data acquisition and signal processing operations, characterized on the basis of their disparate time scales. Signal coding is then considered, with a brief description of frequently used codes and their limitations. Finally, several aspects of statistical data processing are discussed, such as signal statistics, power spectrum and autocovariance analysis, and outlier removal techniques.
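
    Two of the statistical operations reviewed, the periodogram power-spectrum estimate and the biased autocovariance estimate, take only a few lines for a single range gate's complex voltage series; the data below are synthetic and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
n, dt = 256, 1e-3
t = np.arange(n) * dt
# Synthetic complex voltage series for one range gate: a Doppler-shifted
# echo at 40 Hz plus complex white noise.
v = np.exp(2j * np.pi * 40 * t) + 0.7 * (rng.normal(size=n)
                                         + 1j * rng.normal(size=n))

# Periodogram power spectrum estimate.
spec = np.abs(np.fft.fftshift(np.fft.fft(v))) ** 2 / n
freqs = np.fft.fftshift(np.fft.fftfreq(n, dt))
print("Doppler peak near %.0f Hz" % freqs[np.argmax(spec)])

# Biased autocovariance estimate at lags 0..3 (mean removed).
v0 = v - v.mean()
acov = [np.sum(v0[:n - k] * np.conj(v0[k:])) / n for k in range(4)]
print(np.round(np.abs(acov), 2))
```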

  1. Analyzing coastal environments by means of functional data analysis

    Science.gov (United States)

    Sierra, Carlos; Flor-Blanco, Germán; Ordoñez, Celestino; Flor, Germán; Gallego, José R.

    2017-07-01

    Here we used Functional Data Analysis (FDA) to examine particle-size distributions (PSDs) in a beach/shallow marine sedimentary environment in Gijón Bay (NW Spain). The work involved both Functional Principal Components Analysis (FPCA) and Functional Cluster Analysis (FCA). The grain size of the sand samples was characterized by means of laser dispersion spectroscopy. Within this framework, FPCA was used as a dimension-reduction technique to explore and uncover patterns in grain-size frequency curves. This procedure proved useful for describing variability in the structure of the data set. An alternative approach, FCA, was applied to identify clusters and to interpret their spatial distribution. Results obtained with this latter technique were compared with those obtained by means of two vector approaches that combine PCA with Cluster Analysis (CA). The first method, the point density function (PDF), was employed after fitting a log-normal distribution to each PSD and summarizing each of the density functions by its mean, sorting, skewness and kurtosis. The second applied a centered log-ratio (clr) transform to the original data; PCA was then applied to the transformed data, and finally CA to the retained principal component scores. The study revealed functional data analysis, specifically FPCA and FCA, as a suitable alternative with considerable advantages over traditional vector analysis techniques in sedimentary geology studies.
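
    The centered log-ratio transform used in the second vector approach has a closed form, clr(x)_i = ln(x_i / g(x)) with g(x) the geometric mean of the composition; a minimal sketch on hypothetical grain-size compositions:

```python
import numpy as np

def clr(X):
    """Centered log-ratio transform, row-wise: ln(x_i / geometric_mean(x))."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)

# Hypothetical PSDs: proportions of sediment in 4 grain-size bins per sample.
X = np.array([[0.10, 0.30, 0.40, 0.20],
              [0.05, 0.25, 0.50, 0.20],
              [0.30, 0.40, 0.20, 0.10]])

Z = clr(X)
print(np.round(Z, 3))
print("rows sum to zero:", np.allclose(Z.sum(axis=1), 0))
```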

  2. MAGMA: Generalized Gene-Set Analysis of GWAS Data

    NARCIS (Netherlands)

    de Leeuw, C.A.; Mooij, J.M.; Heskes, T.; Posthuma, D.

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical

  3. MAGMA: generalized gene-set analysis of GWAS data.

    NARCIS (Netherlands)

    de Leeuw, C.A.; Mooij, J.M.; Heskes, T.; Posthuma, D.

    2015-01-01

    By aggregating data for complex traits in a biologically meaningful way, gene and gene-set analysis constitute a valuable addition to single-marker analysis. However, although various methods for gene and gene-set analysis currently exist, they generally suffer from a number of issues. Statistical

  4. Data-flow Analysis of Programs with Associative Arrays

    Directory of Open Access Journals (Sweden)

    David Hauzar

    2014-05-01

    Dynamic programming languages, such as PHP, JavaScript, and Python, provide built-in data structures including associative arrays and objects with similar semantics: object properties can be created at run-time and accessed via arbitrary expressions. While a high level of security and safety can be of particular importance for applications written in these languages (consider a web application that stores sensitive data and provides its functionality worldwide), dynamic data structures pose significant challenges for data-flow analysis, making traditional static verification methods both unsound and imprecise. In this paper, we propose a sound and precise approach to value and points-to analysis of programs with associative-array-like data structures, upon which data-flow analyses can be built. We implemented our approach in the web-application domain, in an analyzer of PHP code.

  5. Analysis of capture-recapture data

    CERN Document Server

    McCrea, Rachel S

    2014-01-01

    An important first step in studying the demography of wild animals is to identify the animals uniquely by applying markings, such as rings, tags, and bands. Once the animals are encountered again, researchers can study different forms of capture-recapture data to estimate features such as the mortality and size of the populations. Capture-recapture methods are also used in other areas, including epidemiology and sociology. With an emphasis on ecology, Analysis of Capture-Recapture Data covers many modern developments of capture-recapture and related models and methods and places them in the historical context of research from the past 100 years. The book presents both classical and Bayesian methods. A range of real data sets motivates and illustrates the material and many examples illustrate biometry and applied statistics at work. In particular, the authors demonstrate several of the modeling approaches using one substantial data set from a population of great cormorants. The book also discusses which co...
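
    The simplest capture-recapture estimate, the Lincoln-Petersen index (here with Chapman's small-sample correction), conveys the idea behind the book's models: if m2 of the n2 animals in a second sample carry marks from a first sample of n1 marked animals, the population size is estimated from the proportion of marked recaptures. The numbers below are invented.

```python
def chapman_estimate(n1, n2, m2):
    """Chapman's bias-corrected Lincoln-Petersen population estimate."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

# 120 animals marked; 150 caught later, of which 30 carry marks.
print(round(chapman_estimate(120, 150, 30)))   # ~588 animals
```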

  6. An Introduction to Analysis of Financial Data with R

    DEFF Research Database (Denmark)

    Nielsen, Søren Feodor

    2014-01-01

    A review of: An introduction to analysis of financial data with R / by Ruey S. Tsay. (Hoboken: John Wiley & Sons, 2012)

  7. Collection and analysis of 2013-2014 travel time data.

    Science.gov (United States)

    2017-07-04

    This report documents the findings of Planning Study 27, Collection and Analysis of 2013-2014 Travel Time Data, which is a continuation of Planning Study 24, Analysis of Historical Travel Time Data. The main scope is to analyze newly acquired link-re...

  8. Trends in Planetary Data Analysis. Executive summary of the Planetary Data Workshop

    Science.gov (United States)

    Evans, N.

    1984-09-01

    Planetary data include non-imaging remote sensing data, comprising spectrometric, radiometric, and polarimetric remote sensing observations, as well as in-situ data, radio/radar data, and Earth-based observations. The development of a planetary data system is also discussed. A catalog to identify observations will be the initial entry point for all levels of users into the data system. There are seven distinct data support services: encyclopedia, data index, data inventory, browse, search, sample, and acquire. Data systems for planetary science users must provide access to data and the means to process, store, and display them. Two standards will be incorporated into the planetary data system: a standard communications protocol and a standard format data unit. The data system configuration must combine the features of a distributed system with those of a centralized system. Fiscal constraints have made prioritization important. Activities include saving previous mission data, planning/cost analysis, and publishing of proceedings.

  9. Analysis of Nigerian Hydrometeorological Data | Dike | Nigerian ...

    African Journals Online (AJOL)

    Missing records were estimated by mass curve analysis for rainfall and by regression analysis for runoff, the latter involving runoff data at a neighbouring site. Tests of time homogeneity showed that the annual rainfall records at Port Harcourt, Enugu and Lokoja were stationary and random, the annual runoff records of River Niger at ...

  10. Spacecraft Interactions Modeling and Post-Mission Data Analysis

    National Research Council Canada - National Science Library

    Bonito, N

    1996-01-01

    Software systems were designed and developed for data management, data acquisition, interactive visualization and analysis of solar arrays, tethered objects, and large object space plasma interactions...

  11. Visualizing data for environmental analysis

    International Nuclear Information System (INIS)

    Benson, J.

    1997-01-01

    The Environmental Restoration Project at Los Alamos National Laboratory (LANL) has over 11,000 sampling locations in a 44 square mile area. The sample analyses contain raw analytical chemistry values for over 2,300 analytes and compounds used to define and remediate contaminated areas at LANL. The data consist of 2.5 million records in an Oracle database. Maps are often used to visualize the data. Problems arise when a client specifies a particular kind of map without fully understanding the limitations of the data or the map. The ability of maps to convey information is dependent on many factors, though all maps are data dependent. The quantity, spatial distribution, and numerical range of the data can limit use with certain kinds of maps. To address these issues and educate the clients, several types of statistical maps (e.g., choropleth, isarithm, and graduated symbol such as bubble and spike) used for environmental analysis were chosen to show the advantages, disadvantages, and data limitations of each. By examining both the complexity of the analytical data and the limitations of the map type, it is possible to consider how reality has been transformed through the map, and if that transformation accurately conveys the information present

  12. Visualizing data for environmental analysis

    Energy Technology Data Exchange (ETDEWEB)

    Benson, J.

    1997-04-01

    The Environmental Restoration Project at Los Alamos National Laboratory (LANL) has over 11,000 sampling locations in a 44 square mile area. The sample analyses contain raw analytical chemistry values for over 2,300 analytes and compounds used to define and remediate contaminated areas at LANL. The data consist of 2.5 million records in an Oracle database. Maps are often used to visualize the data. Problems arise when a client specifies a particular kind of map without fully understanding the limitations of the data or the map. The ability of maps to convey information is dependent on many factors, though all maps are data dependent. The quantity, spatial distribution, and numerical range of the data can limit use with certain kinds of maps. To address these issues and educate the clients, several types of statistical maps (e.g., choropleth, isarithm, and graduated symbol such as bubble and spike) used for environmental analysis were chosen to show the advantages, disadvantages, and data limitations of each. By examining both the complexity of the analytical data and the limitations of the map type, it is possible to consider how reality has been transformed through the map, and if that transformation accurately conveys the information present.

  13. An Experimental Metagenome Data Management and Analysis System

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Korzeniewski, Frank; Palaniappan, Krishna; Szeto, Ernest; Ivanova, Natalia N.; Kyrpides, Nikos C.; Hugenholtz, Philip

    2006-03-01

    The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of a microbial community, needs to be conducted in the context of a comprehensive data management and analysis system. We present in this paper IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.

  14. STATISTICS. The reusable holdout: Preserving validity in adaptive data analysis.

    Science.gov (United States)

    Dwork, Cynthia; Feldman, Vitaly; Hardt, Moritz; Pitassi, Toniann; Reingold, Omer; Roth, Aaron

    2015-08-07

    Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. In common practice, however, data analysis is an intrinsically adaptive process, with new analyses generated on the basis of data exploration, as well as the results of previous analyses on the same data. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis. As an application, we show how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses. Copyright © 2015, American Association for the Advancement of Science.
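
    The reusable holdout can be sketched as a simplified Thresholdout mechanism: a query is answered from the training set unless it disagrees with the holdout by more than a noisy threshold, in which case a noised holdout value is returned instead. The parameters below are illustrative, and the full algorithm also tracks a leakage budget, which this toy version omits.

```python
import numpy as np

rng = np.random.default_rng(9)

def thresholdout(train_vals, holdout_vals, threshold=0.04, sigma=0.01):
    """Answer mean-value queries while limiting holdout information leakage."""
    t_mean, h_mean = np.mean(train_vals), np.mean(holdout_vals)
    if abs(t_mean - h_mean) < threshold + rng.normal(0, sigma):
        return t_mean                      # training answer is trustworthy
    return h_mean + rng.normal(0, sigma)   # answer from holdout, with noise

train = rng.normal(0.50, 0.1, 5000)
holdout = rng.normal(0.50, 0.1, 5000)
print(round(thresholdout(train, holdout), 3))
```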

  15. The PUMA test program and data analysis

    International Nuclear Information System (INIS)

    Han, J.T.; Morrison, D.L.

    1997-01-01

    The PUMA test program is sponsored by the U.S. Nuclear Regulatory Commission to provide data that are relevant to various Boiling Water Reactor phenomena. The authors briefly describe the PUMA test program and facility, present the objective of the program, provide data analysis for a large-break loss-of-coolant accident test, and compare the data with a RELAP5/MOD 3.1.2 calculation.

  16. Data Analysis for the Behavioral Sciences Using SPSS

    Science.gov (United States)

    Lawner Weinberg, Sharon; Knapp Abramowitz, Sarah

    2002-04-01

    This book is written from the perspective that statistics is an integrated set of tools used together to uncover the story contained in numerical data. Accordingly, the book comes with a disk containing a series of real data sets to motivate discussions of appropriate methods of analysis. The presentation is based on a conceptual approach supported by an understanding of underlying mathematical foundations. Students learn that more than one method of analysis is typically needed and that an ample characterization of results is a critical component of any data analytic plan. The use of real data and SPSS to perform computations and create graphical summaries enables a greater emphasis on conceptual understanding and interpretation.

  17. Trip generation and data analysis study.

    Science.gov (United States)

    2015-09-01

    Through the Trip Generation and Data Analysis Study, the District of Columbia Department of : Transportation (DDOT) is undertaking research to better understand multimodal urban trip generation : at mixed-use sites in the District. The study is helpi...

  18. The GIS and data solution for advanced business analysis

    Directory of Open Access Journals (Sweden)

    Carmen RADUT

    2009-12-01

    The GIS Business Analyst is a suite of Geographic Information System (GIS) enabled tools, wizards, and data that provides business professionals with a complete solution for site evaluation, selective customer profiling, and trade area market analysis. Running simple reports, mapping the results, and performing complex probability models are among the capabilities The GIS Business Analyst offers in one affordable desktop analysis solution. Data and analyses produced by The GIS Business Analyst can be shared across departments, reducing redundant research and marketing efforts, speeding the analysis of results, and increasing employee efficiency. The GIS Business Analyst is the first suite of tools for unlocking the intelligence of geographic, demographic, consumer lifestyle, and business data. It is a valuable asset for business decision making, such as analyzing market share and competition, determining new site expansions or reductions, and targeting new customers. The ability to analyze and visualize the geographic component of business data reveals trends, patterns, and opportunities hidden in tabular data. By combining information such as the organization's sales data, customer information, and competitor locations with geographic data such as demographics, territories, or store locations, the GIS Business Analyst helps users better understand their organization's market, customers, and competition. Business intelligence systems bring geographic information systems, marketing analysis tools, and demographic data products together to offer the user powerful ways to compete in today's business environment.

  19. Spectral analysis of Floating Car Data

    OpenAIRE

    Gössel, F.; Michler, E.; Wrase, B.

    2003-01-01

    Floating Car Data (FCD) are an important data source in traffic telematics systems. The original variable in these systems is the vehicle velocity. The paper analyses the measured quantity "vehicle velocity" by methods of information technology. Consequences for the processing, transmission and storage of FCD under conditions of limited resources are discussed. The starting point of the investigation is the analysis of the spectral characteristics of velocity-time profiles. The spectra are determined by...

  20. Analysis of HARP TPC krypton data

    CERN Document Server

    Dydak, F

    2004-01-01

    This memo describes the procedure which was adopted to equalize the response of the 3972 pads of the HARP TPC, using radioactive 83mKr gas. The results obtained from the study of reconstructed krypton clusters in the calibration data taken in 2002 are reported. Two complementary methods were employed in the data analysis. Compatible results were obtained for channel-to-channel equalization constants. An estimate of the overall systematic uncertainty was derived.

  1. An Atomic Data and Analysis Structure

    International Nuclear Information System (INIS)

    Summers, Hugh P.

    2000-01-01

    The Atomic Data and Analysis Structure (ADAS) Project is a shared activity of a world-wide consortium of fusion and astrophysical laboratories directed at developing and maintaining a common approach to analysing and modelling the radiating properties of plasmas. The origin and objectives of ADAS and the organization of its codes and data collections are outlined. Current special projects in the ADAS Project work-plans are listed and an illustration is given of ADAS at work. (author)

  2. Data analysis and source modelling for LISA

    International Nuclear Information System (INIS)

    Shang, Yu

    2014-01-01

    Gravitational waves (GWs) are one of the most important predictions of general relativity. Besides the indirect evidence for the existence of GWs, there are already several ground-based detectors (such as LIGO and GEO) and a planned future space mission (LISA) that aim to detect GWs directly. A GW carries a large amount of information about its source; extracting this information can reveal the physical properties of the source and even open a new window for understanding the Universe. GW data analysis is therefore a central and challenging task in the search for GWs. In this thesis, I present two works on data analysis for LISA. In the first work, we introduce an extended multimodal genetic algorithm which utilizes the properties of the signal and the detector response function to analyze the data from the third round of the Mock LISA Data Challenge. We found all five sources present in the data and recovered the coalescence time, chirp mass, mass ratio and sky location with reasonable accuracy. As for the orbital angular momentum and the two spins of the black holes, we found a large number of widely separated modes in parameter space with similar maximum likelihood values. The performance of this method is comparable, if not superior, to that of already existing algorithms. In the second work, we introduce a new phenomenological waveform model for extreme mass ratio inspiral (EMRI) systems. This waveform consists of a set of harmonics with constant amplitude and slowly evolving phase, which we decompose in a Taylor series. We use these phenomenological templates to detect the signal in simulated data and then, assuming a particular EMRI model, estimate the physical parameters of the binary with high precision. The results show that our phenomenological waveform is well suited to the data analysis of EMRI signals.

  3. FIND--a unified framework for neural data analysis.

    Science.gov (United States)

    Meier, Ralph; Egert, Ulrich; Aertsen, Ad; Nawrot, Martin P

    2008-10-01

    The complexity of neurophysiology data has increased tremendously over the last years, especially due to the widespread availability of multi-channel recording techniques. With adequate computing power the current limit for computational neuroscience is the effort and time it takes for scientists to translate their ideas into working code. Advanced analysis methods are complex and often lack reproducibility on the basis of published descriptions. To overcome this limitation we develop FIND (Finding Information in Neural Data) as a platform-independent, open source framework for the analysis of neuronal activity data based on Matlab (Mathworks). Here, we outline the structure of the FIND framework and describe its functionality, our measures of quality control, and the policies for developers and users. Within FIND we have developed a unified data import from various proprietary formats, simplifying standardized interfacing with tools for analysis and simulation. The toolbox FIND covers a steadily increasing number of tools. These analysis tools address various types of neural activity data, including discrete series of spike events, continuous time series and imaging data. Additionally, the toolbox provides solutions for the simulation of parallel stochastic point processes to model multi-channel spiking activity. We illustrate two examples of complex analyses with FIND tools: First, we present a time-resolved characterization of the spiking irregularity in an in vivo extracellular recording from a mushroom-body extrinsic neuron in the honeybee during odor stimulation. Second, we describe layer specific input dynamics in the rat primary visual cortex in vivo in response to visual flash stimulation on the basis of multi-channel spiking activity.

  4. Combining triggers in HEP data analysis

    International Nuclear Information System (INIS)

    Lendermann, Victor; Herbst, Michael; Krueger, Katja; Schultz-Coulon, Hans-Christian; Stamen, Rainer; Haller, Johannes

    2009-01-01

    Modern high-energy physics experiments collect data using dedicated complex multi-level trigger systems which perform an online selection of potentially interesting events. In general, this selection suffers from inefficiencies. A further loss of statistics occurs when the rate of accepted events is artificially scaled down in order to meet bandwidth constraints. An offline analysis of the recorded data must correct for the resulting losses in order to determine the original statistics of the analysed data sample. This is particularly challenging when data samples recorded by several triggers are combined. In this paper we present methods for the calculation of the offline corrections and study their statistical performance. Implications on building and operating trigger systems are discussed. (orig.)

  5. Combining triggers in HEP data analysis

    Energy Technology Data Exchange (ETDEWEB)

    Lendermann, Victor; Herbst, Michael; Krueger, Katja; Schultz-Coulon, Hans-Christian; Stamen, Rainer [Heidelberg Univ. (Germany). Kirchhoff-Institut fuer Physik; Haller, Johannes [Hamburg Univ. (Germany). Institut fuer Experimentalphysik

    2009-01-15

    Modern high-energy physics experiments collect data using dedicated complex multi-level trigger systems which perform an online selection of potentially interesting events. In general, this selection suffers from inefficiencies. A further loss of statistics occurs when the rate of accepted events is artificially scaled down in order to meet bandwidth constraints. An offline analysis of the recorded data must correct for the resulting losses in order to determine the original statistics of the analysed data sample. This is particularly challenging when data samples recorded by several triggers are combined. In this paper we present methods for the calculation of the offline corrections and study their statistical performance. Implications on building and operating trigger systems are discussed. (orig.)
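
    One common correction for combining prescaled triggers (sketched here as an illustration, not necessarily the weighting scheme the paper recommends) assigns each accepted event the inverse of the probability that at least one of the triggers whose raw condition fired actually kept it, given the prescale factors d_i:

```python
import numpy as np

def event_weight(prescales_fired):
    """Weight for an event whose raw trigger conditions fired for the given
    triggers: inverse probability that at least one prescaled trigger kept it.
    """
    p_kept = 1.0 - np.prod([1.0 - 1.0 / d for d in prescales_fired])
    return 1.0 / p_kept

# Event satisfying two triggers with prescale factors 10 and 4:
# P(kept) = 1 - (1 - 1/10)(1 - 1/4) = 0.325, so weight ~ 3.08.
print(round(event_weight([10, 4]), 2))
```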

  6. An automated data management/analysis system for space shuttle orbiter tiles. [stress analysis

    Science.gov (United States)

    Giles, G. L.; Ballas, M.

    1982-01-01

    An engineering data management system was combined with a nonlinear stress analysis program to provide a capability for analyzing a large number of tiles on the space shuttle orbiter. Tile geometry data and all data necessary to define the tile loads environment are accessed automatically as needed for the analysis of a particular tile or set of tiles. User documentation provided includes: (1) a description of the computer programs and data files contained in the system; (2) definitions of all engineering data stored in the data base; (3) characteristics of the tile analytical model; (4) instructions for the preparation of user input; and (5) a sample problem to illustrate use of the system. Descriptions of the data, computer programs, and analytical models of the tile are sufficiently detailed to guide extension of the system to include additional zones of tiles and/or additional types of analyses.

  7. Towards adaptive, streaming analysis of x-ray tomography data

    Energy Technology Data Exchange (ETDEWEB)

    Thomas, Mathew; Kleese van Dam, Kerstin; Marshall, Matthew J.; Kuprat, Andrew P.; Carson, James P.; Lansing, Carina S.; Guillen, Zoe C.; Miller, Erin A.; Lanekoff, Ingela; Laskin, Julia

    2015-03-04

    Temporal and spatial resolution of chemical imaging methodologies such as x-ray tomography are rapidly increasing, leading to more complex experimental procedures and fast-growing data volumes. Automated analysis pipelines and big data analytics are becoming essential to effectively evaluate the results of such experiments. Offering those data techniques in an adaptive, streaming environment can further substantially improve the scientific discovery process, by enabling experimental control and steering based on the evaluation of emerging phenomena as they are observed by the experiment. Pacific Northwest National Laboratory's (PNNL) Chemical Imaging Initiative (CII, http://imaging.pnnl.gov/) has worked since 2011 towards developing a framework that allows users to rapidly compose and customize high-throughput experimental analysis pipelines for multiple instrument types. The framework, named the 'Rapid Experimental Analysis' (REXAN) Framework [1], is based on the idea of reusable component libraries and utilizes the PNNL-developed collaborative data management and analysis environment 'Velo' to provide a user-friendly analysis and data management environment for experimental facilities. This article discusses the capabilities established for x-ray tomography, discusses lessons learned, and provides an overview of our more recent work in the Analysis in Motion Initiative (AIM, http://aim.pnnl.gov/) at PNNL to provide REXAN capabilities in a streaming environment.

  8. Calibration data Analysis Package (CAP): An IDL based widget application for analysis of X-ray calibration data

    Science.gov (United States)

    Vaishali, S.; Narendranath, S.; Sreekumar, P.

    An IDL (Interactive Data Language) based widget application developed for the calibration of the C1XS instrument (Narendranath et al., 2010) on Chandrayaan-1 has been modified to provide a generic package for the analysis of data from X-ray detectors. The package supports files in ASCII as well as FITS format. Data can be fitted with a list of built-in functions to derive the spectral redistribution function (SRF). We have incorporated functions such as 'HYPERMET' (Philips & Marlow 1976), including non-Gaussian components in the SRF such as a low-energy tail, a low-energy shelf and an escape peak. In addition, users can incorporate additional models that may be required for detector-specific features. Spectral fits use the routine 'mpfit', which implements the Levenberg-Marquardt least squares fitting method. The SRF derived from this tool can be fed into an accompanying program to generate a redistribution matrix file (RMF) compatible with the X-ray spectral analysis package XSPEC. The tool provides a user-friendly interface with help for beginners, and offers transparency and advanced features for experts.
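
    A reduced version of the HYPERMET line shape, a Gaussian plus a low-energy exponential tail and a shelf modeled with the complementary error function, can be fit with standard Python tools; the functional form below is a common simplification and not necessarily the exact SRF model used in CAP, and all peak parameters are invented for the sketch.

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import curve_fit

def hypermet(x, area, mu, sigma, tail, beta, shelf):
    """Simplified Hypermet: Gaussian + low-energy tail + shelf (step)."""
    z = (x - mu) / sigma
    gauss = area / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * z ** 2)
    step = shelf * erfc(z / np.sqrt(2))
    tail_term = tail * np.exp((x - mu) / beta) * erfc(
        z / np.sqrt(2) + sigma / (beta * np.sqrt(2)))
    return gauss + step + tail_term

x = np.linspace(5.0, 7.0, 400)                  # energy axis, keV
true = (100.0, 5.9, 0.06, 30.0, 0.08, 2.0)      # e.g. an Fe-55 Mn K-alpha peak
rng = np.random.default_rng(10)
y = hypermet(x, *true) + rng.normal(0, 2.0, x.size)

popt, _ = curve_fit(hypermet, x, y, p0=(80.0, 5.8, 0.05, 20.0, 0.1, 1.0))
print(np.round(popt, 3))
```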

  9. Discuss on luminescence dose data analysis technology

    International Nuclear Information System (INIS)

    Ma Xinhua; Xiao Wuyun; Ai Xianyun; Shi Zhilan; Liu Ying

    2009-01-01

    This article describes the development of luminescence dose data measurement and processing technology. An overall design plan for luminescence dose data measurement and processing is put forward to meet diverse demands. The emphasis is on the dose data processing method, the luminescence curve analysis method, the use of networks, the communication mechanism among computers, and the database management system for individual dose records. The main methods and skills used in this technology, as well as their advantages, are also discussed, and general design references are offered for the development of luminescence dose data processing software. (authors)

  10. Uncertain data envelopment analysis

    CERN Document Server

    Wen, Meilin

    2014-01-01

    This book is intended to present the milestones in the progression of uncertain data envelopment analysis (DEA). Chapter 1 gives a basic introduction to uncertainty theories, including probability theory, credibility theory, uncertainty theory and chance theory. Chapter 2 presents a comprehensive review and discussion of basic DEA models. Stochastic DEA is introduced in Chapter 3, in which the inputs and outputs are assumed to be random variables. To obtain the probability distribution of a random variable, many samples are needed for the statistical inference approach. Chapter 4

  11. Analysis of Runway Incursion Data

    Science.gov (United States)

    Green, Lawrence L.

    2013-01-01

    A statistical analysis of runway incursion (RI) events was conducted to ascertain relevance to the top ten challenges of the National Aeronautics and Space Administration Aviation Safety Program (AvSP). The information contained in the RI database was found to contain data that may be relevant to several of the AvSP top ten challenges. When combined with other data from the FAA documenting air traffic volume from calendar year 2000 through 2011, the structure of a predictive model emerges that can be used to forecast the frequency of RI events at various airports for various classes of aircraft and under various environmental conditions.

  12. Bioinformatics and Microarray Data Analysis on the Cloud.

    Science.gov (United States)

    Calabrese, Barbara; Cannataro, Mario

    2016-01-01

    High-throughput platforms such as microarray, mass spectrometry, and next-generation sequencing are producing an increasing volume of omics data that needs large data storage and computing power. Cloud computing offers massive scalable computing and storage, data sharing, on-demand anytime and anywhere access to resources and applications, and thus may represent the key technology for facing those issues. In fact, in recent years it has been adopted for the deployment of different bioinformatics solutions and services, both in academia and in industry. Despite this, cloud computing presents several issues regarding the security and privacy of data, which are particularly important when analyzing patients' data, such as in personalized medicine. This chapter reviews the main academic and industrial cloud-based bioinformatics solutions, with a special focus on microarray data analysis solutions, and underlines the main issues and problems related to the use of such platforms for the storage and analysis of patients' data.

  13. Expediting Combinatorial Data Set Analysis by Combining Human and Algorithmic Analysis.

    Science.gov (United States)

    Stein, Helge Sören; Jiao, Sally; Ludwig, Alfred

    2017-01-09

    A challenge in combinatorial materials science remains the efficient analysis of X-ray diffraction (XRD) data and its correlation to functional properties. Rapid identification of phase regions and proper assignment of corresponding crystal structures are necessary to keep pace with the improved methods for synthesizing and characterizing materials libraries. Therefore, a new modular software package called htAx (high-throughput analysis of X-ray and functional properties data) is presented that couples human intelligence tasks used for "ground-truth" phase-region identification with subsequent unbiased verification by an algorithm to efficiently analyze which phases are present in a materials library. Identified phases and phase regions may then be correlated to functional properties in an expedited manner. To prove the functionality of htAx, two previously published XRD benchmark data sets of the materials systems Al-Cr-Fe-O and Ni-Ti-Cu are analyzed with htAx. The analysis of ∼1000 XRD patterns takes less than 1 day with htAx. The proposed method reliably identifies phase-region boundaries and robustly identifies multiphase structures. The method also addresses the problem of identifying regions with previously unpublished crystal structures using a special daisy ternary plot.

  14. Model Selection in Data Analysis Competitions

    DEFF Research Database (Denmark)

    Wind, David Kofoed; Winther, Ole

    2014-01-01

    The use of data analysis competitions for selecting the most appropriate model for a problem is a recent innovation in the field of predictive machine learning. Two of the most well-known examples of this trend were the Netflix Competition and, more recently, the competitions hosted on the online platform...... performers from Kaggle and use previous personal experiences from competing in Kaggle competitions. The stated hypotheses about feature engineering, ensembling, overfitting, model complexity and evaluation metrics give indications and guidelines on how to select a proper model for performing well...... Kaggle. In this paper, we will state and try to verify a set of qualitative hypotheses about predictive modelling, both in general and in the scope of data analysis competitions. To verify our hypotheses we will look at previous competitions and their outcomes, use qualitative interviews with top...

  15. Approaches to Data Analysis of Multiple-Choice Questions

    Science.gov (United States)

    Ding, Lin; Beichner, Robert

    2009-01-01

    This paper introduces five commonly used approaches to analyzing multiple-choice test data. They are classical test theory, factor analysis, cluster analysis, item response theory, and model analysis. Brief descriptions of the goals and algorithms of these approaches are provided, together with examples illustrating their applications in physics…

  16. Preparing data for analysis using microsoft Excel.

    Science.gov (United States)

    Elliott, Alan C; Hynan, Linda S; Reisch, Joan S; Smith, Janet P

    2006-09-01

    A critical component essential to good research is the accurate and efficient collection and preparation of data for analysis. Most medical researchers have little or no training in data management, often causing not only excessive time spent cleaning data but also a risk that the data set contains collection or recording errors. The implementation of simple guidelines based on techniques used by professional data management teams will save researchers time and money and result in a data set better suited to answer research questions. Because Microsoft Excel is often used by researchers to collect data, specific techniques that can be implemented in Excel are presented.

  17. WFIRST: Microlensing Analysis Data Challenge

    Science.gov (United States)

    Street, Rachel; WFIRST Microlensing Science Investigation Team

    2018-01-01

    WFIRST will produce thousands of high cadence, high photometric precision lightcurves of microlensing events, from which a wealth of planetary and stellar systems will be discovered. However, the analysis of such lightcurves has historically been very time consuming and expensive in both labor and computing facilities. This poses a potential bottleneck to deriving the full science potential of the WFIRST mission. To address this problem, the WFIRST Microlensing Science Investigation Team is designing a series of data challenges to stimulate research addressing outstanding problems of microlensing analysis. These range from the classification and modeling of triple lens events to methods to efficiently yet thoroughly search a high-dimensional parameter space for the best fitting models.
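
    For orientation, the sketch below evaluates the standard point-source point-lens (Paczynski) magnification curve on which single-lens event modelling is built; the parameter values are illustrative, and the challenges described above extend to much harder cases such as triple lenses.

        import numpy as np

        def pspl_magnification(t, t0, u0, tE):
            """Point-source point-lens magnification A(t)."""
            u = np.sqrt(u0**2 + ((t - t0) / tE) ** 2)  # lens-source separation
            return (u**2 + 2) / (u * np.sqrt(u**2 + 4))

        t = np.linspace(-30.0, 30.0, 601)  # days
        A = pspl_magnification(t, t0=0.0, u0=0.1, tE=12.0)
        print(A.max())  # ~10x peak magnification for u0 = 0.1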

  18. An integrated data-analysis and database system for AMS 14C

    International Nuclear Information System (INIS)

    Kjeldsen, Henrik; Olsen, Jesper; Heinemeier, Jan

    2010-01-01

    AMSdata is the name of a combined database and data-analysis system for AMS 14C and stable-isotope work that has been developed at Aarhus University. The system (1) contains routines for data analysis of AMS and MS data, (2) allows a flexible and accurate description of sample extraction and pretreatment, also when samples are split into several fractions, and (3) keeps track of all measured, calculated and attributed data. The structure of the database is flexible and allows an unlimited number of measurement and pretreatment procedures. The AMS 14C data analysis routine is fairly advanced and flexible, and it can be easily optimized for different kinds of measuring processes. Technically, the system is based on a Microsoft SQL server and includes stored SQL procedures for the data analysis. Microsoft Office Access is used for the (graphical) user interface, and in addition Excel, Word and Origin are exploited for input and output of data, e.g. for plotting data during data analysis.

  19. Reducing bias in the analysis of counting statistics data

    International Nuclear Information System (INIS)

    Hammersley, A.P.; Antoniadis, A.

    1997-01-01

    In the analysis of counting statistics data it is common practice to estimate the variance of the measured data points as the data points themselves. This practice introduces a bias into the results of further analysis which may be significant, and under certain circumstances lead to false conclusions. In the case of normal weighted least squares fitting this bias is quantified and methods to avoid it are proposed. (orig.)
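
    A minimal numerical illustration of the bias (our example, not the authors' derivation): taking each measured count as its own variance estimate and fitting a constant by weighted least squares yields the harmonic mean, which is systematically low for Poisson data.

        import numpy as np

        rng = np.random.default_rng(0)
        y = rng.poisson(10.0, size=100_000)
        y = y[y > 0]  # 1/y weights are undefined for zero counts

        # Minimising sum((y_i - mu)**2 / y_i), i.e. taking var(y_i) ~ y_i,
        # gives mu = n / sum(1/y_i), the harmonic mean.
        mu_weighted = len(y) / np.sum(1.0 / y)
        print(f"unweighted mean : {y.mean():.3f}")    # close to the true rate of 10
        print(f"1/y-weighted fit: {mu_weighted:.3f}")  # biased low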

  20. Initial implementation of a comparative data analysis ontology.

    Science.gov (United States)

    Prosdocimi, Francisco; Chisham, Brandon; Pontelli, Enrico; Thompson, Julie D; Stoltzfus, Arlin

    2009-07-03

    Comparative analysis is used throughout biology. When entities under comparison (e.g. proteins, genomes, species) are related by descent, evolutionary theory provides a framework that, in principle, allows N-ary comparisons of entities, while controlling for non-independence due to relatedness. Powerful software tools exist for specialized applications of this approach, yet it remains under-utilized in the absence of a unifying informatics infrastructure. A key step in developing such an infrastructure is the definition of a formal ontology. The analysis of use cases and existing formalisms suggests that a significant component of evolutionary analysis involves a core problem of inferring a character history, relying on key concepts: "Operational Taxonomic Units" (OTUs), representing the entities to be compared; "character-state data" representing the observations compared among OTUs; "phylogenetic tree", representing the historical path of evolution among the entities; and "transitions", the inferred evolutionary changes in states of characters that account for observations. Using the Web Ontology Language (OWL), we have defined these and other fundamental concepts in a Comparative Data Analysis Ontology (CDAO). CDAO has been evaluated for its ability to represent token data sets and to support simple forms of reasoning. With further development, CDAO will provide a basis for tools (for semantic transformation, data retrieval, validation, integration, etc.) that make it easier for software developers and biomedical researchers to apply evolutionary methods of inference to diverse types of data, so as to integrate this powerful framework for reasoning into their research.

  1. Analysis of Hydrologic Properties Data

    Energy Technology Data Exchange (ETDEWEB)

    H. H. Liu

    2003-04-03

    This Model Report describes the methods used to determine hydrologic properties based on the available field data from the unsaturated zone (UZ) at Yucca Mountain, Nevada, and documents validation of the active fracture model (AFM). This work was planned in "Technical Work Plan (TWP) for: Performance Assessment Unsaturated Zone" (BSC 2002 [160819], Sections 1.10.2, 1.10.3, and 1.10.8). Fracture and matrix properties are developed by analyzing available survey data from the Exploratory Studies Facility (ESF), Cross Drift for Enhanced Characterization of Repository Block (ECRB), and/or boreholes; air injection testing data from surface boreholes and from boreholes in the ESF; and data from laboratory testing of core samples. The AFM is validated on the basis of experimental observations and theoretical developments. This report is a revision of an Analysis Model Report, under the same title, as a scientific analysis with Document Identifier number ANL-NBS-HS-000002 (BSC 2001 [159725]) that did not document activities to validate the AFM. The principal purpose of this work is to provide representative uncalibrated estimates of fracture and matrix properties for use in the model report "Calibrated Properties Model" (BSC 2003 [160240]). The present work also provides fracture geometry properties for generating dual permeability grids as documented in the Scientific Analysis Report, "Development of Numerical Grids for UZ Flow and Transport Modeling" (BSC 2003 [160109]). The resulting calibrated property sets and numerical grids from these reports will be used in the Unsaturated Zone Flow and Transport Process Model (UZ Model), and Total System Performance Assessment (TSPA) models. The fracture and matrix properties developed in this Model Report include: (1) Fracture properties (frequency, permeability, van Genuchten α and m parameters, aperture, porosity, and interface area) for each UZ Model layer; (2

  2. Data envelopment analysis with uncertain data: An application for Iranian electricity distribution companies

    International Nuclear Information System (INIS)

    Sadjadi, S.J.; Omrani, H.

    2008-01-01

    This paper presents a Data Envelopment Analysis (DEA) model with uncertain data for performance assessment of electricity distribution companies. During the past two decades, DEA has been widely used for benchmarking electricity distribution companies. However, among the many existing DEA approaches there is no study in which uncertainty in the data is allowed and, at the same time, the distribution of the random data is permitted to be unknown. The method proposed in this paper develops a new DEA model that takes uncertainty in the output parameters into consideration. The method is based on the adaptation of recently developed robust optimization approaches proposed by Ben-Tal and Nemirovski [2000. Robust solutions of linear programming problems contaminated with uncertain data. Mathematical Programming 88, 411-421] and Bertsimas et al. [2004. Robust linear optimization under general norms. Operations Research Letters 32, 510-516]. The results are compared with an existing parametric Stochastic Frontier Analysis (SFA) using data from 38 electricity distribution companies in Iran to show the effects of the data uncertainties on the performance of DEA outputs. The results indicate that the robust DEA approach can be a relatively more reliable method for efficiency estimation and ranking strategies.
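
    For orientation, the sketch below solves the basic input-oriented CCR DEA model with scipy; the robust formulation developed in the paper adds uncertainty sets on the outputs, and the numbers here are made up.

        import numpy as np
        from scipy.optimize import linprog

        def dea_ccr_efficiency(X, Y, o):
            """Input-oriented CCR efficiency of unit o.
            X: (m inputs x n units), Y: (s outputs x n units)."""
            m, n = X.shape
            s, _ = Y.shape
            c = np.r_[1.0, np.zeros(n)]            # minimise theta
            # inputs:  sum_j X[:, j] * lam_j <= theta * X[:, o]
            A_in = np.hstack([-X[:, [o]], X])
            # outputs: sum_j Y[:, j] * lam_j >= Y[:, o]
            A_out = np.hstack([np.zeros((s, 1)), -Y])
            res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                          b_ub=np.r_[np.zeros(m), -Y[:, o]],
                          bounds=[(0, None)] * (n + 1))
            return res.x[0]

        # Toy data: 2 inputs, 1 output, 4 hypothetical distribution companies.
        X = np.array([[20., 30., 40., 20.], [150., 200., 300., 250.]])
        Y = np.array([[100., 130., 160., 90.]])
        for o in range(4):
            print(f"unit {o}: efficiency = {dea_ccr_efficiency(X, Y, o):.3f}")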

  3. Effective and efficient analysis of spatio-temporal data

    Science.gov (United States)

    Zhang, Zhongnan

    Spatio-temporal data mining, i.e., mining knowledge from large amounts of spatio-temporal data, is a highly demanding field because huge amounts of spatio-temporal data have been collected in various applications, ranging from remote sensing to geographical information systems (GIS), computer cartography, environmental assessment and planning, etc. The collected data far exceed humans' ability to analyze them, which makes it crucial to develop analysis tools. Recent studies on data mining have extended its scope from relational and transactional datasets to spatial and temporal datasets. Among the various forms of spatio-temporal data, remote sensing images play an important role, owing to the growing deployment of Earth-observing satellites. In this dissertation, we propose two approaches to analyzing remote sensing data. The first applies association rule mining to image processing. Each image is divided into a number of image blocks, and a spatial relationship is built for these blocks during the dividing process. Because the images are shot as a time series, this turns a large number of images into a spatio-temporal dataset. The second approach implements co-occurrence pattern discovery from these images; the generated patterns represent subsets of spatial features that are located together in space and time. A weather analysis is composed of individual analyses of several meteorological variables, including temperature, pressure, dew point, wind, clouds, visibility and so on. Local-scale models provide detailed analysis and forecasts of meteorological phenomena ranging from a few kilometers to about 100 kilometers in size. When some of these meteorological variables show particular change tendencies, severe weather follows in most cases. Using association rule discovery, we found that changes in certain meteorological variables are tightly related to the severe weather that subsequently occurs.

  4. Big data scalability for high throughput processing and analysis of vehicle engineering data

    OpenAIRE

    Lu, Feng

    2017-01-01

    "Sympathy for Data" is a platform that is utilized for Big Data automation analytics. It is based on visual interface and workflow configurations. The main purpose of the platform is to reuse parts of code for structured analysis of vehicle engineering data. However, there are some performance issues on a single machine for processing a large amount of data in Sympathy for Data. There are also disk and CPU IO intensive issues when the data is oversized and the platform need fits comfortably i...

  5. Integrative data analysis of male reproductive disorders

    DEFF Research Database (Denmark)

    Edsgard, Stefan Daniel

    The aim of this thesis is the identification of the molecular basis of male reproductive disorders, with a special focus on testicular cancer. To this end, clinical samples were characterized by microarray-based transcription and genomic variation assays, and molecular entities were identified by computational analysis of such data in conjunction with data from publicly available repositories. The thesis presents an introduction to disease genetics and molecular systems biology, followed by four studies that each provide detailed clues to the etiology of male reproductive disorders; a fifth study illustrates the analysis of genome-wide association data with respect to copy number variation and shows that the aggregated effect of rare variants can influence the risk for testicular cancer. Paper V provides an example of the application of RNA-Seq for expression analysis of a species with an unsequenced genome; we analysed the plant...

  6. Truck Roll Stability Data Collection and Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Stevens, SS

    2001-07-02

    The principal objective of this project was to collect and analyze vehicle and highway data that are relevant to the problem of truck rollover crashes, and in particular to the subset of rollover crashes that are caused by the driver error of entering a curve at a speed too great to allow safe completion of the turn. The data are of two sorts--vehicle dynamic performance data, and highway geometry data as revealed by vehicle behavior in normal driving. Vehicle dynamic performance data are relevant because the roll stability of a tractor trailer depends both on inherent physical characteristics of the vehicle and on the weight and distribution of the particular cargo that is being carried. Highway geometric data are relevant because the set of crashes of primary interest to this study are caused by lateral acceleration demand in a curve that exceeds the instantaneous roll stability of the vehicle. An analysis of data quality requires an evaluation of the equipment used to collect the data because the reliability and accuracy of both the equipment and the data could profoundly affect the safety of the driver and other highway users. Therefore, a concomitant objective was an evaluation of the performance of the set of data-collection equipment on the truck and trailer. The objective concerning evaluation of the equipment was accomplished, but the results were not entirely positive. Significant engineering apparently remains to be done before a reliable system can be fielded. Problems were identified with the trailer to tractor fiber optic connector used for this test. In an over-the-road environment, the communication between the trailer instrumentation and the tractor must be dependable. In addition, the computer in the truck must be able to withstand the rigors of the road. The major objective--data collection and analysis--was also accomplished. Using data collected by instruments on the truck, a "bad-curve" database can be generated. Using

  7. One-Click Data Analysis Software for Science Operations

    Science.gov (United States)

    Navarro, Vicente

    2015-12-01

    One of the important activities of the ESA Science Operations Centre is to provide Data Analysis Software (DAS) to enable users and scientists to process data further to higher levels. During operations and post-operations, Data Analysis Software (DAS) is fully maintained and updated for new OS and library releases. Nonetheless, once a mission goes into the "legacy" phase, funds are very limited and long-term preservation becomes more and more difficult. Building on Virtual Machine (VM), cloud computing and Software as a Service (SaaS) technologies, this project has aimed at providing long-term preservation of Data Analysis Software for the following missions: PIA for ISO (1995), SAS for XMM-Newton (1999), Hipe for Herschel (2009) and EXIA for EXOSAT (1983). The following goals have guided the architecture: support for all operations, post-operations and archive/legacy phases; support for local (user's computer) and cloud environments (ESAC-Cloud, Amazon AWS); support for expert users, requiring full capabilities; and provision of a simple web-based interface. This talk describes the architecture, challenges, results and lessons learnt in this project.

  8. Exploring charge density analysis in crystals at high pressure: data collection, data analysis and advanced modelling.

    Science.gov (United States)

    Casati, Nicola; Genoni, Alessandro; Meyer, Benjamin; Krawczuk, Anna; Macchi, Piero

    2017-08-01

    The possibility to determine electron-density distribution in crystals has been an enormous breakthrough, stimulated by a favourable combination of equipment for X-ray and neutron diffraction at low temperature, by the development of simplified, though accurate, electron-density models refined from the experimental data and by the progress in charge density analysis often in combination with theoretical work. Many years after the first successful charge density determination and analysis, scientists face new challenges, for example: (i) determination of the finer details of the electron-density distribution in the atomic cores, (ii) simultaneous refinement of electron charge and spin density or (iii) measuring crystals under perturbation. In this context, the possibility of obtaining experimental charge density at high pressure has recently been demonstrated [Casati et al. (2016). Nat. Commun. 7, 10901]. This paper reports on the necessities and pitfalls of this new challenge, focusing on the species syn-1,6:8,13-biscarbonyl[14]annulene. The experimental requirements, the expected data quality and data corrections are discussed in detail, including warnings about possible shortcomings. At the same time, new modelling techniques are proposed, which could enable specific information to be extracted, from the limited and less accurate observations, like the degree of localization of double bonds, which is fundamental to the scientific case under examination.

  9. Data warehousing as a basis for web-based documentation of data mining and analysis.

    Science.gov (United States)

    Karlsson, J; Eklund, P; Hallgren, C G; Sjödin, J G

    1999-01-01

    In this paper we present a case study for data warehousing intended to support data mining and analysis. We also describe a prototype for data retrieval. Further we discuss some technical issues related to a particular choice of a patient record environment.

  10. Managing Electrochemical Noise Data by Exception: Application of an On-Line EN Data Analysis Technique to Data From a High-Level Nuclear Waste Tank

    International Nuclear Information System (INIS)

    EDGEMON, G.L.

    2003-01-01

    Electrochemical noise has been used at the Hanford Site for a number of years to monitor in real time for pitting corrosion and stress corrosion cracking (SCC) mechanisms in high level nuclear waste tanks. Currently the monitoring technique has only been implemented on three of the 177 underground storage tanks on the site. Widespread implementation of the technique has been held back for a number of reasons, including issues around managing the large volume of data associated with electrochemical noise and the complexity of data analysis. Expert review of raw current and potential measurements is the primary form of data analysis currently used at the Hanford Site. This paper demonstrates the application of an on-line data filtering and analysis technique that could allow data from field applications of electrochemical noise to be managed by exception, transforming electrochemical noise data into a process parameter and focusing data analysis efforts on the important data. Results of the analysis demonstrate a data compression rate of 95%; that is, only 5% of the data would require expert analysis if such a technique were implemented. It is also demonstrated that this technique is capable of identifying key periods where localized corrosion activity is apparent.

  11. Sentiment analysis in twitter data using data analytic techniques for predictive modelling

    Science.gov (United States)

    Razia Sulthana, A.; Jaithunbi, A. K.; Sai Ramesh, L.

    2018-04-01

    Sentiment analysis refers to the natural language processing task of determining whether a piece of text contains subjective information and what kind of subjective information it expresses. The subjective information represents the attitude behind the text: positive, negative or neutral. Automatically understanding the opinions behind user-generated content is of great interest. We analyse a huge volume of tweets as big data, classifying the polarity of words, sentences or entire documents. We use linear regression to model the relationship between a scalar dependent variable Y and one or more explanatory (independent) variables denoted X. We conduct a series of experiments to test the performance of the system.
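
    A minimal sketch of the regression setup described above (the corpus and polarity scores are invented): TF-IDF term weights serve as the explanatory variables X and a hand-assigned polarity score as the dependent variable Y.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LinearRegression

        tweets = ["great service, very happy",
                  "terrible delay, awful support",
                  "okay experience, nothing special",
                  "love this, absolutely fantastic"]
        polarity = [0.8, -0.9, 0.1, 1.0]   # hand-assigned scores in [-1, 1]

        X = TfidfVectorizer().fit_transform(tweets)   # explanatory variables
        model = LinearRegression().fit(X, polarity)   # scalar dependent variable Y
        print(model.predict(X))   # in-sample polarity predictions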

  12. Collective Thomson scattering data analysis for Wendelstein 7-X

    DEFF Research Database (Denmark)

    Abramovic, I.; Pavone, A.; Svensson, J.

    2017-01-01

    A collective Thomson scattering (CTS) diagnostic is being installed on the Wendelstein 7-X stellarator to measure the bulk ion temperature in the upcoming experimental campaign. In order to prepare for the data analysis, a forward model of the diagnostic (eCTS) has been developed and integrated into the Bayesian data analysis framework Minerva. Synthetic spectra have been calculated with the forward model and inverted using Minerva in order to demonstrate the feasibility of measuring the ion temperature in the presence of nuisance parameters that also influence CTS spectra. In this paper we report on the results of this analysis and discuss the main sources of uncertainty in the CTS data analysis.

  13. Programs for nuclear data analysis

    International Nuclear Information System (INIS)

    Bell, R.A.I.

    1975-01-01

    The following report details a number of programs and subroutines which are useful for analysis of data from nuclear physics experiments. Most of them are available from pool pack 005 on the IBM1800 computer. All of these programs are stored there as core loads, and the subroutines and functions in relocatable format. The nature and location of other programs are specified as appropriate. (author)

  14. Advancing data management and analysis in different scientific disciplines

    Science.gov (United States)

    Fischer, M.; Gasthuber, M.; Giesler, A.; Hardt, M.; Meyer, J.; Prabhune, A.; Rigoll, F.; Schwarz, K.; Streit, A.

    2017-10-01

    Over the past several years, rapid growth of data has affected many fields of science. This has often resulted in the need for overhauling or exchanging the tools and approaches in the disciplines' data life cycles. However, it also allows the application of new data analysis methods and facilitates improved data sharing. The project Large-Scale Data Management and Analysis (LSDMA) of the German Helmholtz Association has been successfully addressing both specific and generic requirements in the data life cycle since 2012. Its data scientists work together with researchers from fields such as climatology, energy and neuroscience to improve the community-specific data life cycles, in several cases covering all stages of the data life cycle, i.e. from data acquisition to data archival. LSDMA scientists also study methods and tools that are of importance to many communities, e.g. data repositories and authentication and authorization infrastructure.

  15. Serial Expression Analysis: a web tool for the analysis of serial gene expression data

    Science.gov (United States)

    Nueda, Maria José; Carbonell, José; Medina, Ignacio; Dopazo, Joaquín; Conesa, Ana

    2010-01-01

    Serial transcriptomics experiments investigate the dynamics of gene expression changes associated with a quantitative variable such as time or dosage. The statistical analysis of these data implies the study of global and gene-specific expression trends, the identification of significant serial changes, the comparison of expression profiles and the assessment of transcriptional changes in terms of cellular processes. We have created the SEA (Serial Expression Analysis) suite to provide a complete web-based resource for the analysis of serial transcriptomics data. SEA offers five different algorithms based on univariate, multivariate and functional profiling strategies framed within a user-friendly interface and a project-oriented architecture to facilitate the analysis of serial gene expression data sets from different perspectives. SEA is available at sea.bioinfo.cipf.es. PMID:20525784

  16. Planning, Conducting, and Documenting Data Analysis for Program Improvement

    Science.gov (United States)

    Winer, Abby; Taylor, Cornelia; Derrington, Taletha; Lucas, Anne

    2015-01-01

    This 2015 document was developed to help technical assistance (TA) providers and state staff define and limit the scope of data analysis for program improvement efforts, including the State Systemic Improvement Plan (SSIP); develop a plan for data analysis; document alternative hypotheses and additional analyses as they are generated; and…

  17. GenePublisher: automated analysis of DNA microarray data

    DEFF Research Database (Denmark)

    Knudsen, Steen; Workman, Christopher; Sicheritz-Ponten, T.

    2003-01-01

    GenePublisher, a system for automatic analysis of data from DNA microarray experiments, has been implemented with a web interface at http://www.cbs.dtu.dk/services/GenePublisher. Raw data are uploaded to the server together with a specification of the data. The server performs normalization

  18. Advances in Mössbauer data analysis

    Science.gov (United States)

    de Souza, Paulo A.

    1998-08-01

    The whole Mössbauer community has generated a huge amount of data in several fields of human knowledge since the first publication of Rudolf Mössbauer. Interlaboratory measurements of the same substance may result in minor differences in the Mössbauer parameters (MP) of isomer shift, quadrupole splitting and internal magnetic field. Therefore, a conventional data bank of published MP will be of limited help in the identification of substances. An exact-match data bank search cannot differentiate values of Mössbauer parameters that lie within the experimental errors (e.g., IS = 0.22 mm/s from IS = 0.23 mm/s), although physically both values may be considered the same. An artificial neural network (ANN), in contrast, is able to identify a substance and its crystalline structure from measured MP, and such slight variations do not represent an obstacle for the ANN identification. A barrier to the popularization of Mössbauer spectroscopy as an analytical technique is the absence of fully automated equipment, since the analysis of a Mössbauer spectrum is normally time-consuming and requires a specialist. In this work, the fitting process of a Mössbauer spectrum was completely automated through the use of genetic algorithms and fuzzy logic. Both software and hardware systems were implemented, turning out to be a fully automated Mössbauer data analysis system. The developed system will be presented.
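
    As a rough illustration of automated spectrum fitting (not the authors' implementation), the sketch below fits a Lorentzian doublet to a synthetic transmission spectrum with scipy's differential evolution, an evolutionary optimizer in the same family as the genetic algorithms described; all parameter values are invented.

        import numpy as np
        from scipy.optimize import differential_evolution

        v = np.linspace(-4, 4, 256)   # velocity axis, mm/s

        def doublet(p, v):
            # p = (baseline, depth, isomer shift, quadrupole splitting, linewidth)
            base, depth, iso, qs, w = p
            lor = lambda c: depth / (1 + ((v - c) / (w / 2)) ** 2)
            return base - lor(iso - qs / 2) - lor(iso + qs / 2)

        rng = np.random.default_rng(1)
        true_p = (1.0, 0.08, 0.22, 0.9, 0.3)   # e.g. IS = 0.22 mm/s
        data = doublet(true_p, v) + rng.normal(0, 0.003, v.size)

        bounds = [(0.8, 1.2), (0.0, 0.3), (-1, 1), (0.1, 2.0), (0.1, 1.0)]
        res = differential_evolution(
            lambda p: np.sum((doublet(p, v) - data) ** 2), bounds, seed=1)
        print(res.x)   # recovered (baseline, depth, IS, QS, linewidth)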

  19. Neural network for automatic analysis of motility data

    DEFF Research Database (Denmark)

    Jakobsen, Erik; Kruse-Andersen, S; Kolberg, Jens Godsk

    1994-01-01

    Continuous recording of intraluminal pressures for extended periods of time is currently regarded as a valuable method for detection of esophageal motor abnormalities. A subsequent automatic analysis of the resulting motility data relies on strict mathematical criteria for recognition of pressure peaks; results from the neural network and a conventional program were comparable. However, the neural network recognized pressure peaks clearly generated by muscular activity that had escaped detection by the conventional program. In conclusion, we believe that neurocomputing has potential advantages for automatic analysis of gastrointestinal motility data.

  20. Representative Sampling for reliable data analysis

    DEFF Research Database (Denmark)

    Petersen, Lars; Esbensen, Kim Harry

    2005-01-01

    regime in order to secure the necessary reliability of: samples (which must be representative, from the primary sampling onwards), analysis (which will not mean anything outside the miniscule analytical volume without representativity ruling all mass reductions involved, also in the laboratory) and data...

  1. Sensitivity analysis of the nuclear data for MYRRHA reactor modelling

    International Nuclear Information System (INIS)

    Stankovskiy, Alexey; Van den Eynde, Gert; Cabellos, Oscar; Diez, Carlos J.; Schillebeeckx, Peter; Heyse, Jan

    2014-01-01

    A global sensitivity analysis of the effective neutron multiplication factor k_eff to the change of nuclear data library revealed that the JEFF-3.2T2 neutron-induced evaluated data library produces results closer to ENDF/B-VII.1 than does JEFF-3.1.2. The analysis of the contributions of individual evaluations to the k_eff sensitivity allowed establishing a priority list of nuclides for which uncertainties on nuclear data must be improved. Detailed sensitivity analysis has been performed for two nuclides from this list, 56Fe and 238Pu. The analysis was based on a detailed survey of the evaluations and experimental data. To track the origin of the differences in the evaluations and their impact on k_eff, the reaction cross-sections and multiplicities in one evaluation have been substituted by the corresponding data from other evaluations. (authors)

  2. Analysis of data as information: quality assurance approach.

    Science.gov (United States)

    Ivankovic, D; Kern, J; Bartolic, A; Vuletic, S

    1993-01-01

    Describes a prototype module for data analysis of the healthcare delivery system. It consists of three main parts: data/variable selection; algorithms for the analysis of quantitative and qualitative changes in the system; and interpretation and explanation of the results. Such a module, designed for primary health care, has been installed on a PC in the health manager's office. Data enter the information system through standard DBMS procedures, followed by the calculation of a number of different indicators and of time series, as ordered sequences of indicators, according to the demands of the manager. The last procedure is "the change analysis", with estimation of unexpected differences between and within units, e.g. health-care teams, as well as of unexpected variabilities and trends. As an example, presents and discusses the diagnostic pattern of neurotic cases, referral patterns and the preventive behaviour of GPs' teams.

  3. Explorative data analysis of two-dimensional electrophoresis gels

    DEFF Research Database (Denmark)

    Schultz, J.; Gottlieb, D.M.; Petersen, Marianne Kjerstine

    2004-01-01

    Methods for classification of two-dimensional (2-DE) electrophoresis gels based on multivariate data analysis are demonstrated. Two-dimensional gels of ten wheat varieties are analyzed, and it is demonstrated how to classify the wheat varieties into two qualities; a method for initial screening of gels is presented. First, an approach is demonstrated in which no prior knowledge of the separated proteins is used. Alignment of the gels followed by a simple transformation of data makes it possible to analyze the gels in an automated explorative manner by principal component analysis, to determine if the gels should be further analyzed. A more detailed approach analyzes spot volume lists by principal component analysis and partial least squares regression. The use of spot volume data offers a means to investigate the spot pattern and link the classified protein patterns to distinct spots.
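
    A toy version of the explorative step (synthetic numbers; the paper works on aligned gel images and matched spot-volume lists): principal component analysis of a spot-volume matrix separates two hypothetical quality classes along the first component.

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        # Rows = gels (wheat samples), columns = matched spot volumes.
        spot_volumes = np.vstack([rng.normal(1.0, 0.1, (10, 50)),   # quality class A
                                  rng.normal(1.2, 0.1, (10, 50))])  # quality class B

        scores = PCA(n_components=2).fit_transform(spot_volumes)
        print(scores[:, 0])  # the two classes separate along the first component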

  4. In situ visualization and data analysis for turbidity currents simulation

    Science.gov (United States)

    Camata, Jose J.; Silva, Vítor; Valduriez, Patrick; Mattoso, Marta; Coutinho, Alvaro L. G. A.

    2018-01-01

    Turbidity currents are underflows responsible for sediment deposits that generate geological formations of interest for the oil and gas industry. LibMesh-sedimentation is an application built upon the libMesh library to simulate turbidity currents. In this work, we present the integration of libMesh-sedimentation with in situ visualization and in transit data analysis tools. DfAnalyzer is a solution based on provenance data to extract and relate strategic simulation data in transit from multiple data sources for online queries. We integrate libMesh-sedimentation and ParaView Catalyst to perform in situ data analysis and visualization. We present a parallel performance analysis for two turbidity current simulations showing that the overhead for both in situ visualization and in transit data analysis is negligible. We show that our tools enable monitoring the sediments' appearance at runtime and steering the simulation based on the solver convergence and visual information on the sediment deposits, thus enhancing the analytical power of turbidity current simulations.

  5. Delve: A Data Set Retrieval and Document Analysis System

    KAUST Repository

    Akujuobi, Uchenna Thankgod

    2017-12-29

    Academic search engines (e.g., Google scholar or Microsoft academic) provide a medium for retrieving various information on scholarly documents. However, most of these popular scholarly search engines overlook the area of data set retrieval, which should provide information on relevant data sets used for academic research. Due to the increasing volume of publications, it has become a challenging task to locate suitable data sets on a particular research area for benchmarking or evaluations. We propose Delve, a web-based system for data set retrieval and document analysis. This system is different from other scholarly search engines as it provides a medium for both data set retrieval and real time visual exploration and analysis of data sets and documents.

  6. BASE - 2nd generation software for microarray data management and analysis

    Directory of Open Access Journals (Sweden)

    Nordborg Nicklas

    2009-10-01

    Background: Microarray experiments are increasing in size and samples are collected asynchronously over long time. Available data are re-analysed as more samples are hybridized. Systematic use of collected data requires tracking of biomaterials, array information, raw data, and assembly of annotations. To meet the information tracking and data analysis challenges in microarray experiments we reimplemented and improved BASE version 1.2. Results: The new BASE presented in this report is a comprehensive annotable local microarray data repository and analysis application providing researchers with an efficient information management and analysis tool. The information management system tracks all material from biosource, via sample and through extraction and labelling to raw data and analysis. All items in BASE can be annotated and the annotations can be used as experimental factors in downstream analysis. BASE stores all microarray experiment related data regardless if analysis tools for specific techniques or data formats are readily available. The BASE team is committed to continue improving and extending BASE to make it usable for even more experimental setups and techniques, and we encourage other groups to target their specific needs leveraging on the infrastructure provided by BASE. Conclusion: BASE is a comprehensive management application for information, data, and analysis of microarray experiments, available as free open source software at http://base.thep.lu.se under the terms of the GPLv3 license.

  7. BASE--2nd generation software for microarray data management and analysis.

    Science.gov (United States)

    Vallon-Christersson, Johan; Nordborg, Nicklas; Svensson, Martin; Häkkinen, Jari

    2009-10-12

    Microarray experiments are increasing in size and samples are collected asynchronously over long time. Available data are re-analysed as more samples are hybridized. Systematic use of collected data requires tracking of biomaterials, array information, raw data, and assembly of annotations. To meet the information tracking and data analysis challenges in microarray experiments we reimplemented and improved BASE version 1.2. The new BASE presented in this report is a comprehensive annotable local microarray data repository and analysis application providing researchers with an efficient information management and analysis tool. The information management system tracks all material from biosource, via sample and through extraction and labelling to raw data and analysis. All items in BASE can be annotated and the annotations can be used as experimental factors in downstream analysis. BASE stores all microarray experiment related data regardless if analysis tools for specific techniques or data formats are readily available. The BASE team is committed to continue improving and extending BASE to make it usable for even more experimental setups and techniques, and we encourage other groups to target their specific needs leveraging on the infrastructure provided by BASE. BASE is a comprehensive management application for information, data, and analysis of microarray experiments, available as free open source software at http://base.thep.lu.se under the terms of the GPLv3 license.

  8. ACCURACY ANALYSIS OF KINECT DEPTH DATA

    Directory of Open Access Journals (Sweden)

    K. Khoshelham

    2012-09-01

    This paper presents an investigation of the geometric quality of depth data obtained by the Kinect sensor. Based on the mathematical model of depth measurement by the sensor, a theoretical error analysis is presented, which provides an insight into the factors influencing the accuracy of the data. Experimental results show that the random error of depth measurement increases with increasing distance to the sensor, and ranges from a few millimetres up to about 4 cm at the maximum range of the sensor. The accuracy of the data is also found to be influenced by the low resolution of the depth measurements.
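
    The reported figures are consistent with the quadratic error-propagation law typical of triangulation-based depth sensors; the sketch below is our simplification, with the constant calibrated so that the error reaches the paper's roughly 4 cm at the roughly 5 m maximum range.

        # Random depth error sigma_Z grows roughly as k * Z**2 for triangulation sensors.
        k = 0.04 / 5.0**2  # m^-1, chosen so that sigma_Z(5 m) = 4 cm
        for Z in (1.0, 2.0, 3.0, 4.0, 5.0):
            print(f"Z = {Z:.0f} m  ->  sigma_Z ~ {1000 * k * Z**2:.1f} mm")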

  9. Statistical analysis and interpolation of compositional data in materials science.

    Science.gov (United States)

    Pesenson, Misha Z; Suram, Santosh K; Gregoire, John M

    2015-02-09

    Compositional data are ubiquitous in chemistry and materials science: analysis of elements in multicomponent systems, combinatorial problems, etc., lead to data that are non-negative and sum to a constant (for example, atomic concentrations). The constant sum constraint restricts the sampling space to a simplex instead of the usual Euclidean space. Since statistical measures such as mean and standard deviation are defined for the Euclidean space, traditional correlation studies, multivariate analysis, and hypothesis testing may lead to erroneous dependencies and incorrect inferences when applied to compositional data. Furthermore, composition measurements that are used for data analytics may not include all of the elements contained in the material; that is, the measurements may be subcompositions of a higher-dimensional parent composition. Physically meaningful statistical analysis must yield results that are invariant under the number of composition elements, requiring the application of specialized statistical tools. We present specifics and subtleties of compositional data processing through discussion of illustrative examples. We introduce basic concepts, terminology, and methods required for the analysis of compositional data and utilize them for the spatial interpolation of composition in a sputtered thin film. The results demonstrate the importance of this mathematical framework for compositional data analysis (CDA) in the fields of materials science and chemistry.
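
    One of the specialized tools this framework rests on is the centred log-ratio (CLR) transform of Aitchison-style compositional data analysis, which maps compositions off the simplex into a space where ordinary Euclidean statistics behave sensibly; a minimal sketch with invented concentrations:

        import numpy as np

        def clr(x):
            """Centred log-ratio transform; rows are compositions summing to a constant."""
            x = np.asarray(x, dtype=float)
            g = np.exp(np.log(x).mean(axis=1, keepdims=True))  # per-row geometric mean
            return np.log(x / g)

        # Atomic concentrations of a hypothetical three-element system (rows sum to 1).
        comps = np.array([[0.2, 0.3, 0.5],
                          [0.1, 0.4, 0.5]])
        z = clr(comps)
        print(z)
        print(z.sum(axis=1))  # CLR coordinates sum to zero within each sample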

  10. iTemplate: A template-based eye movement data analysis approach.

    Science.gov (United States)

    Xiao, Naiqi G; Lee, Kang

    2018-02-08

    Current eye movement data analysis methods rely on defining areas of interest (AOIs). Because AOIs are created and modified manually, variances in their size, shape, and location are unavoidable. These variances affect not only the consistency of the AOI definitions, but also the validity of the eye movement analyses based on the AOIs. To reduce the variances in AOI creation and modification and to achieve a procedure that processes eye movement data with high precision and efficiency, we propose a template-based eye movement data analysis method. Using a linear transformation algorithm, this method registers the eye movement data from each individual stimulus to a template. Thus, users only need to create one set of AOIs for the template in order to analyze eye movement data, rather than creating a unique set of AOIs for all individual stimuli. This change greatly reduces the error caused by the variance from manually created AOIs and boosts the efficiency of the data analysis. Furthermore, this method can help researchers prepare eye movement data for some advanced analysis approaches, such as iMap. We have developed software (iTemplate) with a graphic user interface to make this analysis method available to researchers.
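
    A minimal sketch of the registration idea (not the iTemplate implementation; the landmark coordinates are invented): a 2-D affine transform fitted by least squares maps gaze samples from an individual stimulus into template space, where a single shared set of AOIs applies.

        import numpy as np

        # Corresponding landmark points on one stimulus (src) and the template (dst).
        src = np.array([[10, 12], [80, 15], [45, 90], [20, 70]], dtype=float)
        dst = np.array([[12, 10], [82, 12], [48, 88], [22, 67]], dtype=float)

        # Fit the affine transform dst ~ [x, y, 1] @ T by least squares.
        A = np.hstack([src, np.ones((len(src), 1))])
        T, *_ = np.linalg.lstsq(A, dst, rcond=None)

        # Map raw fixations into template coordinates; template AOIs now apply.
        fixations = np.array([[30.0, 40.0], [55.0, 60.0]])
        registered = np.hstack([fixations, np.ones((len(fixations), 1))]) @ T
        print(registered)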

  11. Application of pattern mixture models to address missing data in longitudinal data analysis using SPSS.

    Science.gov (United States)

    Son, Heesook; Friedmann, Erika; Thomas, Sue A

    2012-01-01

    Longitudinal studies are used in nursing research to examine changes over time in health indicators. Traditional approaches to longitudinal analysis of means, such as analysis of variance with repeated measures, are limited to analyzing complete cases. This limitation can lead to biased results due to withdrawal or data omission bias or to imputation of missing data, which can lead to bias toward the null if data are not missing completely at random. Pattern mixture models are useful to evaluate the informativeness of missing data and to adjust linear mixed model (LMM) analyses if missing data are informative. The aim of this study was to provide an example of statistical procedures for applying a pattern mixture model to evaluate the informativeness of missing data and conduct analyses of data with informative missingness in longitudinal studies using SPSS. The data set from the Patients' and Families' Psychological Response to Home Automated External Defibrillator Trial was used as an example to examine informativeness of missing data with pattern mixture models and to use a missing data pattern in analysis of longitudinal data. Prevention of withdrawal bias, omitted data bias, and bias toward the null in longitudinal LMMs requires the assessment of the informativeness of the occurrence of missing data. Missing data patterns can be incorporated as fixed effects into LMMs to evaluate the contribution of the presence of informative missingness to and control for the effects of missingness on outcomes. Pattern mixture models are a useful method to address the presence and effect of informative missingness in longitudinal studies.
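
    The same idea can be sketched outside SPSS as well; below, a hypothetical long-format data set is analysed with Python's statsmodels, entering the dropout pattern as a fixed effect in a linear mixed model so that its interaction with time can be tested (all variable names and effect sizes are invented).

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        # Simulate 60 subjects; early dropouts contribute 2 visits, completers 4.
        rng = np.random.default_rng(2)
        rows = []
        for sid in range(60):
            n_visits = rng.choice([2, 4])
            dropout = int(n_visits == 2)
            for t in range(n_visits):
                rows.append({"id": sid, "time": t, "dropout": dropout,
                             "y": 5 + 0.5 * t - 0.4 * t * dropout + rng.normal(0, 1)})
        df = pd.DataFrame(rows)

        # Pattern mixture: the missing-data pattern enters as a fixed effect and
        # its interaction with time tests whether missingness is informative.
        model = smf.mixedlm("y ~ time * dropout", data=df, groups=df["id"])
        print(model.fit(reml=True).summary())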

  12. Analysis of longitudinal data from animals with missing values using SPSS.

    Science.gov (United States)

    Duricki, Denise A; Soleman, Sara; Moon, Lawrence D F

    2016-06-01

    Testing of therapies for disease or injury often involves the analysis of longitudinal data from animals. Modern analytical methods have advantages over conventional methods (particularly when some data are missing), yet they are not used widely by preclinical researchers. Here we provide an easy-to-use protocol for the analysis of longitudinal data from animals, and we present a click-by-click guide for performing suitable analyses using the statistical package IBM SPSS Statistics software (SPSS). We guide readers through the analysis of a real-life data set obtained when testing a therapy for brain injury (stroke) in elderly rats. If a few data points are missing, as in this example data set (for example, because of animal dropout), repeated-measures analysis of covariance may fail to detect a treatment effect. An alternative analysis method, such as the use of linear models (with various covariance structures), and analysis using restricted maximum likelihood estimation (to include all available data) can be used to better detect treatment effects. This protocol takes 2 h to carry out.

  13. Analysis of gravity data using trend surfaces

    Science.gov (United States)

    Asimopolos, Natalia-Silvia; Asimopolos, Laurentiu

    2013-04-01

    In this paper we have developed algorithms and related software programs for calculating trend surfaces of higher order. These trend-analysis methods, like moving-average methods, act as filters for surface geophysical data. In particular, we present a few case studies for gravity data and gravity maps. Analysis with polynomial trend surfaces contributes to the recognition, isolation and measurement of trends that can be represented by surfaces or hyper-surfaces (in several dimensions), thus achieving a separation into regional variations and local variations. This separation is achieved by adjusting the trend function to different values. Trend surfaces obtained by regression analysis satisfy the criterion of least squares. The difference between the trend surface and the observed value at a certain point is the residual value; under the least-squares criterion, the sum of squares of these residuals is minimal. The trend surface is considered the regional or large-scale component, and the residual values are regarded as the local or small-scale component. Removing the regional trend has the effect of highlighting local components represented by the residual values. The principles of surface analysis apply to trend surfaces in any number of dimensions. For hyper-surfaces we can work with polynomial functions of four or more variables (three space variables plus further variables for the parameters of interest), which is of great importance in some applications. In the paper we present the mathematical development of generalized trend surfaces and case studies on gravimetric data. Trend surfaces have the great advantage that the effect of regional anomalies can be expressed as analytic functions, which allows subsequent mathematical processing and interesting generalizations; it is a great advantage to work with polynomial functions compared with the original discrete data. For gravity data we estimate the depth of
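
    A compact sketch of the core computation (our illustration on synthetic data, not the authors' software): fit a polynomial trend surface by least squares and keep the residuals as the local component.

        import numpy as np

        def trend_surface(x, y, z, order=2):
            """Least-squares trend surface z ~ sum c_ij * x**i * y**j, i + j <= order."""
            terms = [(i, j) for i in range(order + 1) for j in range(order + 1 - i)]
            A = np.column_stack([x**i * y**j for i, j in terms])
            coef, *_ = np.linalg.lstsq(A, z, rcond=None)
            trend = A @ coef
            return trend, z - trend  # regional trend and local residuals

        # Synthetic gravity-like data: smooth regional field plus one local anomaly.
        rng = np.random.default_rng(3)
        x, y = rng.uniform(0, 10, 200), rng.uniform(0, 10, 200)
        z = 2 + 0.5*x - 0.3*y + 0.02*x*y + np.exp(-((x - 5)**2 + (y - 5)**2))
        trend, residual = trend_surface(x, y, z, order=2)
        print(residual.max())  # the local anomaly stands out in the residuals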

  14. Language-Agnostic Reproducible Data Analysis Using Literate Programming.

    Science.gov (United States)

    Vassilev, Boris; Louhimo, Riku; Ikonen, Elina; Hautaniemi, Sampsa

    2016-01-01

    A modern biomedical research project can easily contain hundreds of analysis steps and lack of reproducibility of the analyses has been recognized as a severe issue. While thorough documentation enables reproducibility, the number of analysis programs used can be so large that in reality reproducibility cannot be easily achieved. Literate programming is an approach to present computer programs to human readers. The code is rearranged to follow the logic of the program, and to explain that logic in a natural language. The code executed by the computer is extracted from the literate source code. As such, literate programming is an ideal formalism for systematizing analysis steps in biomedical research. We have developed the reproducible computing tool Lir (literate, reproducible computing) that allows a tool-agnostic approach to biomedical data analysis. We demonstrate the utility of Lir by applying it to a case study. Our aim was to investigate the role of endosomal trafficking regulators to the progression of breast cancer. In this analysis, a variety of tools were combined to interpret the available data: a relational database, standard command-line tools, and a statistical computing environment. The analysis revealed that the lipid transport related genes LAPTM4B and NDRG1 are coamplified in breast cancer patients, and identified genes potentially cooperating with LAPTM4B in breast cancer progression. Our case study demonstrates that with Lir, an array of tools can be combined in the same data analysis to improve efficiency, reproducibility, and ease of understanding. Lir is an open-source software available at github.com/borisvassilev/lir.

  15. Validation of Fourier analysis of videokeratographic data.

    Science.gov (United States)

    Sideroudi, Haris; Labiris, Georgios; Ditzel, Fienke; Tsaragli, Efi; Georgatzoglou, Kimonas; Siganos, Haralampos; Kozobolis, Vassilios

    2017-06-15

    The aim was to assess the repeatability of Fourier transform analysis of videokeratographic data using Pentacam in normal (CG), keratoconic (KC) and post-CXL (CXL) corneas. This was a prospective, clinic-based, observational study. One randomly selected eye from each study participant was included in the analysis: 62 normal eyes (CG group) and 33 keratoconus eyes (KC group), while 34 eyes that had already received CXL treatment formed the CXL group. Fourier analysis of keratometric data was obtained using Pentacam, by two different operators within each of two sessions. Precision, repeatability and the Intraclass Correlation Coefficient (ICC) were calculated to evaluate intrasession and intersession repeatability for the following parameters: Spherical Component (SphRmin, SphEcc), Maximum Decentration (Max Dec), Regular Astigmatism, and Irregularity (Irr). Bland-Altman analysis was used to assess interobserver repeatability. All parameters were found to be repeatable, reliable and reproducible in all groups. The best intrasession and intersession repeatability and reliability were detected for the SphRmin, SphEcc and Max Dec parameters for both operators using the ICC (intrasession: ICC > 98%, intersession: ICC > 94.7%) and within-subject standard deviation. The best precision and lowest range of agreement were found for the SphRmin parameter (CG: 0.05, KC: 0.16, and CXL: 0.2) in all groups, while the lowest repeatability, reliability and reproducibility were detected for the Irr parameter. The Pentacam system provides accurate measurements of Fourier transform keratometric data. A single Pentacam scan will be sufficient for most clinical applications.

  16. Information Retrieval Using Hadoop Big Data Analysis

    Science.gov (United States)

    Motwani, Deepak; Madan, Madan Lal

    This paper concerns big data analysis, the process of probing huge amounts of information in an attempt to uncover hidden patterns. Through big data analytics applications, organizations in both the public and private sectors have made a strategic decision to turn big data into competitive advantage. Extracting value from big data gives rise to a process for pulling information from multiple different sources; this process is known as extract, transform and load (ETL). The approach in this paper extracts information from log files and research papers, reducing the effort needed for pattern discovery and document summarization across several sources. The work helps users better understand basic Hadoop concepts and improves the research user experience. In this paper, we propose an approach for analysing log files with Hadoop to find concise information that is useful and saves time. Our proposed approach will be applied to different research papers in a specific domain to obtain summarized content for further improvement and the creation of new content.
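
    One hedged way to implement such log analysis on Hadoop is the Streaming interface, where the mapper and reducer are ordinary Python scripts reading standard input; the log layout assumed below ("date time LEVEL message"), the file names, and the job command are hypothetical, and the path of the streaming jar varies by installation.

        #!/usr/bin/env python3
        # mapper.py -- emit one (log level, 1) pair per log line
        import sys

        for line in sys.stdin:
            parts = line.split()
            if len(parts) >= 3:
                print(f"{parts[2]}\t1")   # assumes "date time LEVEL message"

        #!/usr/bin/env python3
        # reducer.py -- sum counts per key (Hadoop sorts mapper output by key)
        import sys

        current, total = None, 0
        for line in sys.stdin:
            key, val = line.rstrip("\n").split("\t")
            if key != current:
                if current is not None:
                    print(f"{current}\t{total}")
                current, total = key, 0
            total += int(val)
        if current is not None:
            print(f"{current}\t{total}")

    A job would then be submitted along the lines of "hadoop jar hadoop-streaming.jar -input logs/ -output counts/ -mapper mapper.py -reducer reducer.py -files mapper.py,reducer.py".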

  17. Multi-source Geospatial Data Analysis with Google Earth Engine

    Science.gov (United States)

    Erickson, T.

    2014-12-01

    The Google Earth Engine platform is a cloud computing environment for data analysis that combines a public data catalog with a large-scale computational facility optimized for parallel processing of geospatial data. The data catalog is a multi-petabyte archive of georeferenced datasets that include images from Earth observing satellite and airborne sensors (examples: USGS Landsat, NASA MODIS, USDA NAIP), weather and climate datasets, and digital elevation models. Earth Engine supports both a just-in-time computation model that enables real-time preview and debugging during algorithm development for open-ended data exploration, and a batch computation mode for applying algorithms over large spatial and temporal extents. The platform automatically handles many traditionally-onerous data management tasks, such as data format conversion, reprojection, and resampling, which facilitates writing algorithms that combine data from multiple sensors and/or models. Although the primary use of Earth Engine, to date, has been the analysis of large Earth observing satellite datasets, the computational platform is generally applicable to a wide variety of use cases that require large-scale geospatial data analyses. This presentation will focus on how Earth Engine facilitates the analysis of geospatial data streams that originate from multiple separate sources (and often communities) and how it enables collaboration during algorithm development and data exploration. The talk will highlight current projects/analyses that are enabled by this functionality. https://earthengine.google.org
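
    A small example of the multi-source analysis style the platform enables (a sketch assuming an authenticated Earth Engine Python client; the dataset IDs are examples from the public catalog and may change):

        import ee

        ee.Initialize()  # assumes prior authentication with the Earth Engine service

        # Annual mean NDVI from MODIS, computed server-side without any downloads.
        ndvi = (ee.ImageCollection("MODIS/006/MOD13A2")
                .filterDate("2014-01-01", "2014-12-31")
                .select("NDVI")
                .mean())

        # Combine with a digital elevation model from a different sensor/provider;
        # reprojection and resampling are handled by the platform.
        elevation = ee.Image("USGS/SRTMGL1_003")
        high_altitude_ndvi = ndvi.updateMask(elevation.gt(2000))

        print(high_altitude_ndvi.bandNames().getInfo())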

  18. Exploratory Climate Data Visualization and Analysis Using DV3D and UVCDAT

    Science.gov (United States)

    Maxwell, Thomas

    2012-01-01

    Earth system scientists are being inundated by an explosion of data generated by ever-increasing resolution in both global models and remote sensors. Advanced tools for accessing, analyzing, and visualizing very large and complex climate data are required to maintain rapid progress in Earth system research. To meet this need, NASA, in collaboration with the Ultra-scale Visualization Climate Data Analysis Tools (UVCDAT) consortium, is developing exploratory climate data analysis and visualization tools which provide data analysis capabilities for the Earth System Grid (ESG). This paper describes DV3D, a UVCDAT package that enables exploratory analysis of climate simulation and observation datasets. DV3D provides user-friendly interfaces for visualization and analysis of climate data at a level appropriate for scientists. It features workflow interfaces, interactive 4D data exploration, hyperwall and stereo visualization, automated provenance generation, and parallel task execution. DV3D's integration with CDAT's climate data management system (CDMS) and other climate data analysis tools provides a wide range of high performance climate data analysis operations. DV3D expands the scientists' toolbox by incorporating a suite of rich new exploratory visualization and analysis methods for addressing the complexity of climate datasets.

  19. Rasch analysis on OSCE data: An illustrative example.

    Science.gov (United States)

    Tor, E; Steketee, C

    2011-01-01

    The Objective Structured Clinical Examination (OSCE) is a widely used tool for the assessment of clinical competence in health professional education. The goal of the OSCE is to make reproducible decisions on pass/fail status as well as on students' levels of clinical competence according to their demonstrated abilities based on the scores. This paper explores the use of the polytomous Rasch model in evaluating the psychometric properties of OSCE scores through a case study. The authors analysed an OSCE data set (comprising 11 stations) for 80 fourth-year medical students based on the polytomous Rasch model in an effort to answer two research questions: (1) Do the clinical tasks assessed in the 11 OSCE stations map onto a common underlying construct, namely clinical competence? (2) What other insights can Rasch analysis offer in terms of scaling, item analysis and instrument validation over and above the conventional analysis based on classical test theory? The OSCE data set demonstrated a sufficient degree of fit to the Rasch model (χ² = 17.060, df = 22, p = 0.76), indicating that the 11 OSCE station scores have sufficient psychometric properties to form a measure for a common underlying construct, i.e. clinical competence. Individual OSCE station scores with good fit to the Rasch model (p > 0.1 for all χ² statistics) further corroborated the unidimensionality of the OSCE scale for clinical competence. A Person Separation Index (PSI) of 0.704 indicates a sufficient level of reliability for the OSCE scores. Other useful findings from the Rasch analysis that provide insights over and above the analysis based on classical test theory are also exemplified using the data set. The polytomous Rasch model provides a useful and supplementary approach to the calibration and analysis of OSCE examination data.

  20. Fundamentals of quantitative PET data analysis

    NARCIS (Netherlands)

    Willemsen, ATM; van den Hoff, J

    2002-01-01

    Drug analysis and development with PET should fully exhaust the ability of this tomographic technique to quantify regional tracer concentrations in vivo. Data evaluation based on visual inspection or assessment of regional image contrast is not sufficient for this purpose since much of the

  1. PyMICE: A Python library for analysis of IntelliCage data.

    Science.gov (United States)

    Dzik, Jakub M; Puścian, Alicja; Mijakowska, Zofia; Radwanska, Kasia; Łęski, Szymon

    2018-04-01

    IntelliCage is an automated system for recording the behavior of a group of mice housed together. It produces rich, detailed behavioral data calling for new methods and software for their analysis. Here we present PyMICE, a free and open-source library for analysis of IntelliCage data in the Python programming language. We describe the design and demonstrate the use of the library through a series of examples. PyMICE provides easy and intuitive access to IntelliCage data, and thus makes it possible to use numerous other Python scientific libraries to form a complete data analysis workflow.
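
    As a taste of the intended workflow, a minimal session might look like the sketch below; the archive name is a placeholder and the API names follow the PyMICE documentation, which remains the authoritative reference:

        # Minimal PyMICE sketch: load an IntelliCage archive and list visits.
        # 'experiment_data.zip' is a hypothetical file name.
        import pymice as pm

        ml = pm.Loader('experiment_data.zip')      # parse the IntelliCage data archive
        visits = ml.getVisits(order='Start')       # corner visits in chronological order
        for visit in visits[:10]:
            print(visit.Animal, visit.Corner, visit.Start, visit.Duration)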

  2. A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma.

    Science.gov (United States)

    Stripelis, Dimitris; Ambite, José Luis; Chiang, Yao-Yi; Eckel, Sandrah P; Habre, Rima

    2017-04-01

    According to the Centers for Disease Control, in the United States there are 6.8 million children living with asthma. Despite the importance of the disease, the available prognostic tools are not sufficient for biomedical researchers to thoroughly investigate the potential risks of the disease at scale. To overcome these challenges we present a big data integration and analysis infrastructure developed by our Data and Software Coordination and Integration Center (DSCIC) of the NIBIB-funded Pediatric Research using Integrated Sensor Monitoring Systems (PRISMS) program. Our goal is to help biomedical researchers to efficiently predict and prevent asthma attacks. The PRISMS-DSCIC is responsible for collecting, integrating, storing, and analyzing real-time environmental, physiological and behavioral data obtained from heterogeneous sensor and traditional data sources. Our architecture is based on the Apache Kafka, Spark and Hadoop frameworks and PostgreSQL DBMS. A main contribution of this work is extending the Spark framework with a mediation layer, based on logical schema mappings and query rewriting, to facilitate data analysis over a consistent harmonized schema. The system provides both batch and stream analytic capabilities over the massive data generated by wearable and fixed sensors.
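
    The streaming side of such an architecture can be sketched with Spark Structured Streaming reading from Kafka (this requires the spark-sql-kafka package on the classpath); the broker address, topic name, and the per-minute count analytic below are illustrative placeholders, not details from the paper:

        # Sketch: consuming sensor records from Kafka with Spark Structured Streaming.
        from pyspark.sql import SparkSession
        from pyspark.sql.functions import col, window

        spark = SparkSession.builder.appName("sensor-stream").getOrCreate()

        raw = (spark.readStream.format("kafka")
               .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
               .option("subscribe", "sensor-readings")             # placeholder topic
               .load())

        # Kafka delivers raw bytes; a real pipeline would parse JSON into a schema here.
        readings = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

        # Example stream analytic: record counts per one-minute window.
        counts = readings.groupBy(window(col("timestamp"), "1 minute")).count()

        query = counts.writeStream.outputMode("complete").format("console").start()
        query.awaitTermination()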

  3. UPDG: Utilities package for data analysis of Pooled DNA GWAS

    Directory of Open Access Journals (Sweden)

    Ho Daniel WH

    2012-01-01

    Full Text Available Abstract Background Despite being a well-established strategy for cost reduction in disease gene mapping, the pooled DNA association study is much less popular than the individual DNA approach. This situation is especially true for the pooled DNA genomewide association study (GWAS), for which very few computer resources have been developed for data analysis. This motivates the development of UPDG (Utilities package for data analysis of Pooled DNA GWAS). Results UPDG represents a generalized framework for data analysis of pooled DNA GWAS with the integration of Unix/Linux shell operations, Perl programs and R scripts. With the input of raw intensity data from GWAS, UPDG performs the following tasks in a stepwise manner: raw data manipulation, correction for allelic preferential amplification, normalization, nested analysis of variance for genetic association testing, and summarization of analysis results. Detailed instructions, procedures and commands are provided in the comprehensive user manual describing the whole process from preliminary preparation of software installation to final outcome acquisition. An example dataset (input files and sample output files) is also included in the package so that users can easily familiarize themselves with the data file formats, working procedures and expected output. Therefore, UPDG is especially useful for users with some computer knowledge, but without a sophisticated programming background. Conclusions UPDG provides a free, simple and platform-independent one-stop service to scientists working on pooled DNA GWAS data analysis, but with less advanced programming knowledge. It is our vision and mission to reduce the hindrance to performing data analysis of pooled DNA GWAS through our contribution of UPDG. More importantly, we hope to promote the popularity of pooled DNA GWAS, which is a very useful research strategy.
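
    The nested analysis-of-variance step can be illustrated independently of UPDG; the sketch below uses Python with statsmodels rather than UPDG's own R scripts, and the file and column names are hypothetical:

        # Sketch: nested ANOVA for pooled-DNA probe intensities, with pools
        # nested within case/control groups. Column names are hypothetical.
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        df = pd.read_csv("normalized_intensities.csv")   # columns: intensity, group, pool
        model = smf.ols("intensity ~ C(group) / C(pool)", data=df).fit()
        print(sm.stats.anova_lm(model, typ=1))           # the group term tests association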

  4. The Statistical Analysis of Failure Time Data

    CERN Document Server

    Kalbfleisch, John D

    2011-01-01

    Contains additional discussion and examples on left truncation as well as material on more general censoring and truncation patterns. Introduces the martingale and counting process formulations in a new chapter. Develops multivariate failure time data in a separate chapter and extends the material on Markov and semi-Markov formulations. Presents new examples and applications of data analysis.

  5. Data and statistical methods for analysis of trends and patterns

    International Nuclear Information System (INIS)

    Atwood, C.L.; Gentillon, C.D.; Wilson, G.E.

    1992-11-01

    This report summarizes topics considered at a working meeting on data and statistical methods for analysis of trends and patterns in US commercial nuclear power plants. This meeting was sponsored by the Office of Analysis and Evaluation of Operational Data (AEOD) of the Nuclear Regulatory Commission (NRC). Three data sets are briefly described: Nuclear Plant Reliability Data System (NPRDS), Licensee Event Report (LER) data, and Performance Indicator data. Two types of study are emphasized: screening studies, to see if any trends or patterns appear to be present; and detailed studies, which are more concerned with checking the analysis assumptions, modeling any patterns that are present, and searching for causes. A prescription is given for a screening study, and ideas are suggested for a detailed study, when the data take any of three forms: counts of events per time, counts of events per demand, and non-event data.
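
    For the "counts of events per time" case, a minimal screening test for trend is a Poisson regression of yearly counts on calendar time; the sketch below is a generic illustration with hypothetical counts, not the report's prescribed procedure:

        # Sketch: screening yearly event counts for a monotonic trend.
        import numpy as np
        import statsmodels.api as sm

        years = np.arange(1984, 1992)
        counts = np.array([14, 11, 12, 9, 10, 7, 8, 6])   # hypothetical event counts

        X = sm.add_constant(years - years.min())
        fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
        print(fit.summary())   # a significantly negative slope suggests a decreasing trend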

  6. PHIDIAS- Pathogen Host Interaction Data Integration and Analysis

    Indian Academy of Sciences (India)

    PHIDIAS- Pathogen Host Interaction Data Integration and Analysis- allows searching of integrated genome sequences, conserved domains and gene expressions data related to pathogen host interactions in high priority agents for public health and security ...

  7. Intermittency analysis of correlated data

    International Nuclear Information System (INIS)

    Wosiek, B.

    1992-01-01

    We describe a method for analysing the dependence of the factorial moments on the bin size in which the correlations between moments computed for different bin sizes are taken into account. For large-multiplicity nucleus-nucleus data, inclusion of the correlations does not change the value of the slope parameter, but gives significantly reduced errors compared to fits with no correlations. (author)
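
    For context, the scaled factorial moments analysed in such studies are defined (in standard notation, not quoted from the record) as

        $$F_q(\delta) = \frac{\langle n(n-1)\cdots(n-q+1)\rangle}{\langle n \rangle^{q}},$$

    where n is the multiplicity in a bin of size δ; intermittency corresponds to power-law growth $F_q(\delta) \propto \delta^{-\varphi_q}$ as δ → 0, and the slope parameter φ_q is what the correlated fit described above estimates.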

  8. Automated software analysis of nuclear core discharge data

    International Nuclear Information System (INIS)

    Larson, T.W.; Halbig, J.K.; Howell, J.A.; Eccleston, G.W.; Klosterbuer, S.F.

    1993-03-01

    Monitoring the fueling process of an on-load nuclear reactor is a full-time job for nuclear safeguarding agencies. Nuclear core discharge monitors (CDMs) can provide continuous, unattended recording of the reactor's fueling activity for later, qualitative review by a safeguards inspector. A quantitative analysis of this collected data could prove to be a great asset to inspectors because more information can be extracted from the data and the analysis time can be reduced considerably. This paper presents a prototype for an automated software analysis system capable of identifying when fuel bundle pushes occurred and monitoring the power level of the reactor. Neural network models were developed for calculating the region on the reactor face from which the fuel was discharged and for predicting the burnup. These models were created and tested using actual data collected from a CDM system at an on-load reactor facility. Collectively, these automated quantitative analysis programs could help safeguarding agencies to gain a better perspective on the complete picture of the fueling activity of an on-load nuclear reactor. This type of system can provide a cost-effective solution for automated monitoring of on-load reactors, significantly reducing time and effort.

  9. A multilevel shape fit analysis of neutron transmission data

    International Nuclear Information System (INIS)

    Naguib, K.; Sallam, O.H.; Adib, M.

    1989-01-01

    A multilevel shape fit analysis of neutron transmission data is presented. A multilevel computer code, SHAPE, is used to analyse clean transmission data obtained from time-of-flight (TOF) measurements. The shape analysis deduces the parameters of the observed resonances in the energy region considered in the measurements. The shape code is based upon a least-squares fit of a multilevel Breit-Wigner formula and includes both instrumental resolution and Doppler broadening. Applying the SHAPE code to test data from transmission measurements of 151Eu, 153Eu and natural Eu in the energy range 0.025-1 eV gave good results for this technique of analysis. (author)
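
    As orientation, the single-level Breit-Wigner resonance shape that the multilevel formula generalizes, and its effect on transmission, can be written (standard textbook form, with interference terms and the resolution/Doppler convolution omitted):

        $$\sigma(E) = \sigma_0\,\frac{\Gamma^2/4}{(E - E_0)^2 + \Gamma^2/4}, \qquad T(E) = e^{-N\sigma(E)},$$

    where E_0 is the resonance energy, Γ the total width, and N the sample areal density; the SHAPE code fits the multilevel analogue of T(E) to the measured transmission.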

  10. Development of data analysis tool for combat system integration

    Directory of Open Access Journals (Sweden)

    Seung-Chun Shin

    2013-03-01

    Full Text Available System integration is an important element in the construction of naval combat ships. In particular, because impeccable combat system integration together with the sensors and weapons can ensure the combat capability and survivability of the ship, the integrated performance of the combat system should be verified and validated to confirm that it fulfills the requirements of the end user. In order to conduct systematic verification and validation, a data analysis tool is requisite. This paper suggests the Data Extraction, Recording and Analysis Tool (DERAT) for the data analysis of the integrated performance of the combat system, including the functional definition, architecture and effectiveness of the DERAT, by presenting the test results.

  11. Data analysis treatment in the Juragua Nuclear Power Plant preoperational PSA

    International Nuclear Information System (INIS)

    Valhuerdi Debesa, C.

    1996-01-01

    Data analysis is an important task within Probabilistic Safety Assessment (PSA), which usually determines the level of detail of the analysis, being the way to feed the PSA with the operational experience of the Nuclear Power Plant analysed. In this paper the role of the data analysis task as part of the PSA process and the different kinds of data to be estimated are explained. A description is presented of the organization of the data analysis in the Juragua NPP Preoperational PSA, the information sources and the criteria used for the estimation of the different kinds of data. The generic data base adopted for equipment failures and the state of the generic data issue for VVER reactors and its prospects are also dealt with. The paper concludes with suggestions for the further development of the Juragua NPP generic data base.

  12. USAGE: a web-based approach towards the analysis of SAGE data. Serial Analysis of Gene Expression

    NARCIS (Netherlands)

    van Kampen, A. H.; van Schaik, B. D.; Pauws, E.; Michiels, E. M.; Ruijter, J. M.; Caron, H. N.; Versteeg, R.; Heisterkamp, S. H.; Leunissen, J. A.; Baas, F.; van der Mee, M.

    2000-01-01

    MOTIVATION: SAGE enables the determination of genome-wide mRNA expression profiles. A comprehensive analysis of SAGE data requires software, which integrates (statistical) data analysis methods with a database system. Furthermore, to facilitate data sharing between users, the application should

  13. Advances in research methods for information systems research data mining, data envelopment analysis, value focused thinking

    CERN Document Server

    Osei-Bryson, Kweku-Muata

    2013-01-01

    Advances in social science research methodologies and data analytic methods are changing the way research in information systems is conducted. New developments in statistical software technologies for data mining (DM) such as regression splines or decision tree induction can be used to assist researchers in systematic post-positivist theory testing and development. Established management science techniques like data envelopment analysis (DEA), and value focused thinking (VFT) can be used in combination with traditional statistical analysis and data mining techniques to more effectively explore

  14. Anaphe - OO Libraries and Tools for Data Analysis

    CERN Document Server

    Couet, O; Molnar, Z; Moscicki, J T; Pfeiffer, A; Sang, M

    2001-01-01

    The Anaphe project is an ongoing effort to provide an Object Oriented software environment for data analysis in HENP experiments. A range of commercial and public domain libraries is used to cover basic functionalities; on top of these libraries a set of HENP-specific C++ class libraries for histogram management, fitting, plotting and ntuple-like data analysis has been developed. In order to comply with the user requirements for a command-line driven tool, we have chosen to use a scripting language (Python) as the front-end for a data analysis tool. The loose coupling provided by the consequent use of (AIDA compliant) Abstract Interfaces for each component in combination with the use of shared libraries for their implementation provides an easy integration of existing libraries into modern scripting languages thus allowing for rapid application development. This integration is simplified even further using a specialised toolkit (SWIG) to create "shadow classes" for the Python language, which map the definitio...

  15. DMET-analyzer: automatic analysis of Affymetrix DMET data.

    Science.gov (United States)

    Guzzi, Pietro Hiram; Agapito, Giuseppe; Di Martino, Maria Teresa; Arbitrio, Mariamena; Tassone, Pierfrancesco; Tagliaferri, Pierosandro; Cannataro, Mario

    2012-10-05

    Clinical Bioinformatics is currently growing and is based on the integration of clinical and omics data, aiming at the development of personalized medicine. Thus the introduction of novel technologies able to investigate the relationship between clinical states and biological machinery may help the development of this field. For instance, the Affymetrix DMET platform (drug metabolism enzymes and transporters) is able to study the relationship between variation in patient genomes and drug metabolism, detecting SNPs (Single Nucleotide Polymorphisms) on genes related to drug metabolism. This may allow, for instance, finding genetic variants in patients who present different drug responses, in pharmacogenomics and clinical studies. Despite this, there is currently a lack of open-source algorithms and tools for the analysis of DMET data. Existing software tools for DMET data generally allow only the preprocessing of binary data (e.g. the DMET-Console provided by Affymetrix) and simple data analysis operations, but do not allow testing the association of the presence of SNPs with the response to drugs. We developed DMET-Analyzer, a tool for the automatic association analysis between variation in patient genomes and the clinical conditions of patients, i.e. the different responses to drugs. The proposed system allows: (i) automation of the workflow of analysis of DMET-SNP data, avoiding the use of multiple tools; (ii) automatic annotation of DMET-SNP data and search in existing databases of SNPs (e.g. dbSNP); (iii) association of SNPs with pathways through a search in PharmGKB, a major knowledge base for pharmacogenomic studies. DMET-Analyzer has a simple graphical user interface that allows users (doctors/biologists) to upload and analyse DMET files produced by the Affymetrix DMET-Console in an interactive way. The effectiveness and ease of use of DMET-Analyzer is demonstrated through different case studies regarding the analysis of

  16. Spaceborne Differential SAR Interferometry: Data Analysis Tools for Deformation Measurement

    Directory of Open Access Journals (Sweden)

    Michele Crosetto

    2011-02-01

    Full Text Available This paper is focused on spaceborne Differential Interferometric SAR (DInSAR) for land deformation measurement and monitoring. In the last two decades several DInSAR data analysis procedures have been proposed. The objective of this paper is to describe the DInSAR data processing and analysis tools developed at the Institute of Geomatics in almost ten years of research activities. Four main DInSAR analysis procedures are described, which range from the standard DInSAR analysis based on a single interferogram to more advanced Persistent Scatterer Interferometry (PSI) approaches. These different procedures guarantee sufficient flexibility in DInSAR data processing. In order to provide a technical insight into these analysis procedures, a whole section discusses their main data processing and analysis steps, especially those needed in PSI analyses. A specific section is devoted to the core of our PSI analysis tools: the so-called 2+1D phase unwrapping procedure, which couples a 2D phase unwrapping, performed interferogram-wise, with a kind of 1D phase unwrapping along time, performed pixel-wise. In the last part of the paper, some examples of DInSAR results are discussed, which were derived by standard DInSAR or PSI analyses. Most of these results were derived from X-band SAR data coming from the TerraSAR-X and COSMO-SkyMed sensors.
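
    The "1D phase unwrapping along time" half of that 2+1D step reduces, in the simplest noise-free setting, to ordinary temporal unwrapping per pixel; a minimal numpy sketch with placeholder data (real PSI processing is considerably more elaborate):

        # Sketch: pixel-wise temporal unwrapping of a stack of spatially
        # unwrapped interferogram phases, shaped (time, rows, cols).
        import numpy as np

        rng = np.random.default_rng(0)
        phase_stack = rng.uniform(-np.pi, np.pi, size=(30, 128, 128))  # placeholder data

        unwrapped = np.unwrap(phase_stack, axis=0)   # remove 2*pi jumps along time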

  17. Integrating Data Transformation in Principal Components Analysis

    KAUST Repository

    Maadooliat, Mehdi

    2015-01-02

    Principal component analysis (PCA) is a popular dimension reduction method to reduce the complexity and obtain the informative aspects of high-dimensional datasets. When the data distribution is skewed, data transformation is commonly used prior to applying PCA. Such transformation is usually obtained from previous studies, prior knowledge, or trial-and-error. In this work, we develop a model-based method that integrates data transformation in PCA and finds an appropriate data transformation using the maximum profile likelihood. Extensions of the method to handle functional data and missing values are also developed. Several numerical algorithms are provided for efficient computation. The proposed method is illustrated using simulated and real-world data examples.
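
    The conventional two-step baseline that this integrated method improves on is easy to state in code: transform each skewed variable (here with Box-Cox) and then run PCA. The sketch below shows that baseline on synthetic data; it is not the authors' profile-likelihood method:

        # Sketch: per-variable Box-Cox transformation followed by ordinary PCA.
        import numpy as np
        from scipy import stats
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 5))   # skewed synthetic data

        # Box-Cox requires strictly positive data; transform each column separately.
        Xt = np.column_stack([stats.boxcox(X[:, j])[0] for j in range(X.shape[1])])

        pca = PCA(n_components=2).fit(Xt)
        print(pca.explained_variance_ratio_)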

  18. GBTIDL: Reduction and Analysis of GBT Spectral Line Data

    Science.gov (United States)

    Marganian, P.; Garwood, R. W.; Braatz, J. A.; Radziwill, N. M.; Maddalena, R. J.

    2013-03-01

    GBTIDL is an interactive package for reduction and analysis of spectral line data taken with the Robert C. Byrd Green Bank Telescope (GBT). The package, written entirely in IDL, consists of straightforward yet flexible calibration, averaging, and analysis procedures (the "GUIDE layer") modeled after the UniPOPS and CLASS data reduction philosophies, a customized plotter with many built-in visualization features, and Data I/O and toolbox functionality that can be used for more advanced tasks. GBTIDL makes use of data structures which can also be used to store intermediate results. The package consumes and produces data in GBT SDFITS format. GBTIDL can be run online and have access to the most recent data coming off the telescope, or can be run offline on preprocessed SDFITS files.

  19. Direct analysis of quantal radiation response data

    International Nuclear Information System (INIS)

    Thames, H.D. Jr.; Rozell, M.E.; Tucker, S.L.; Ang, K.K.; Travis, E.L.; Fisher, D.R.

    1986-01-01

    A direct analysis is proposed for quantal (all-or-nothing) responses to fractionated radiation and endpoint-dilution assays of cell survival. As opposed to two-step methods such as the reciprocal-dose technique, in which ED50 values are first estimated for different fractionation schemes and then fit (as reciprocals) against dose per fraction, all raw data are included in a single maximum-likelihood treatment. The method accommodates variations such as short-interval fractionation regimens designed to determine tissue repair kinetics, tissue response to continuous exposures, and data obtained using endpoint-dilution assays of cell survival after fractionated doses. Monte-Carlo techniques were used to compare the direct and reciprocal-dose methods for analysis of small-scale and large-scale studies of response to fractionated doses. Both methods tended toward biased estimates in the analysis of small-scale (3 fraction numbers) studies. The α/β ratios showed less scatter when estimated by the direct method. The 95% confidence intervals determined by the direct method were more appropriate than those determined by reciprocal-dose analysis, for which 18% (small-scale study) or 8% (large-scale study) of the confidence intervals did not include the 'true' value of α/β. (author)
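
    The reciprocal-dose technique that the direct method replaces rests on the linear-quadratic isoeffect relation (standard formulation, stated here for orientation): for n fractions of dose d (total dose D = nd) producing a fixed effect E,

        $$E = n(\alpha d + \beta d^2) = D(\alpha + \beta d) \quad\Longrightarrow\quad \frac{1}{D} = \frac{\alpha}{E} + \frac{\beta}{E}\,d,$$

    so 1/ED50 plotted against dose per fraction is linear, with intercept-to-slope ratio α/β. The direct method instead estimates α/β by maximizing a likelihood over all quantal responses, without first reducing the data to ED50 values.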

  20. Time Series Analysis of Insar Data: Methods and Trends

    Science.gov (United States)

    Osmanoglu, Batuhan; Sunar, Filiz; Wdowinski, Shimon; Cano-Cabral, Enrique

    2015-01-01

    Time series analysis of InSAR data has emerged as an important tool for monitoring and measuring the displacement of the Earth's surface. Changes in the Earth's surface can result from a wide range of phenomena such as earthquakes, volcanoes, landslides, variations in ground water levels, and changes in wetland water levels. Time series analysis is applied to interferometric phase measurements, which wrap around when the observed motion is larger than one-half of the radar wavelength. Thus, the spatio-temporal "unwrapping" of phase observations is necessary to obtain physically meaningful results. Several different algorithms have been developed for time series analysis of InSAR data to solve for this ambiguity. These algorithms may employ different models for time series analysis, but they all generate a first-order deformation rate, which allows their results to be compared. However, there is no single algorithm that can provide optimal results in all cases. Since time series analyses of InSAR data are used in a variety of applications with different characteristics, each algorithm possesses inherently unique strengths and weaknesses. In this review article, following a brief overview of InSAR technology, we discuss several algorithms developed for time series analysis of InSAR data, using an example set of results for measuring subsidence rates in Mexico City.
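
    The wrapping referred to here is the usual modular reduction of phase, and line-of-sight displacement follows from the unwrapped phase (standard relations; sign conventions vary between processors):

        $$\phi_{\mathrm{wrapped}} = \operatorname{mod}(\phi + \pi,\, 2\pi) - \pi \in [-\pi, \pi), \qquad \Delta r = -\frac{\lambda}{4\pi}\,\Delta\phi,$$

    where λ is the radar wavelength; motion exceeding λ/2 between acquisitions is what forces the spatio-temporal unwrapping step.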

  1. The Review of Visual Analysis Methods of Multi-modal Spatio-temporal Big Data

    Directory of Open Access Journals (Sweden)

    ZHU Qing

    2017-10-01

    Full Text Available The visual analysis of spatio-temporal big data is not only the state-of-the-art research direction of both big data analysis and data visualization, but also a core module of the pan-spatial information system. This paper reviews existing visual analysis methods at three levels: descriptive visual analysis, explanatory visual analysis and exploratory visual analysis, focusing on spatio-temporal big data's characteristics of multiple sources, multiple granularities, multiple modalities and complex associations. The technical difficulties and development tendencies of multi-modal feature selection, innovative human-computer interaction analysis and exploratory visual reasoning in the visual analysis of spatio-temporal big data are discussed. Research shows that the study of descriptive visual analysis for data visualization is relatively mature. Explanatory visual analysis has become the focus of big data analysis, mainly based on interactive data mining in a visual environment to diagnose the implicit causes of problems. The exploratory visual analysis method still needs a major breakthrough.

  2. EPICS V4 expands support to physics application, data acquisition, and data analysis

    International Nuclear Information System (INIS)

    Dalesio, L.; Carcassi, G.; Kraimer, M.R.; Malitsky, N.; Shen, G.; Davidsaver, M.; Lange, R.; Sekoranja, M.; Rowland, J.; White, G.; Korhonen, T.

    2012-01-01

    EPICS version 4 extends the functionality of version 3 by providing the ability to define, transport, and introspect composite data types. Version 3 provided a set of process variables and a data protocol that adequately defined scalar data along with an atomic set of attributes. While remaining backward compatible, Version 4 is able to easily expand this set with a data protocol capable of exchanging complex data types and parameterized data requests. Additionally, a group of engineers defined reference types for some applications in this environment. The goal of this work is to define a narrow interface with the minimal set of data types needed to support a distributed architecture for physics applications, data acquisition, and data analysis. (authors)

  3. Complex surveys analysis of categorical data

    CERN Document Server

    Mukhopadhyay, Parimal

    2016-01-01

    The primary objective of this book is to study some of the research topics in the area of analysis of complex surveys which have not been covered in any book yet. It discusses the analysis of categorical data using three models: a full model, a log-linear model and a logistic regression model. It is a valuable resource for survey statisticians and practitioners in the field of sociology, biology, economics, psychology and other areas who have to use these procedures in their day-to-day work. It is also useful for courses on sampling and complex surveys at the upper-undergraduate and graduate levels. The importance of sample surveys today cannot be overstated. From voters’ behaviour to fields such as industry, agriculture, economics, sociology, psychology, investigators generally resort to survey sampling to obtain an assessment of the behaviour of the population they are interested in. Many large-scale sample surveys collect data using complex survey designs like multistage stratified cluster designs. The o...

  4. Applying Authentic Data Analysis in Learning Earth Atmosphere

    Science.gov (United States)

    Johan, H.; Suhandi, A.; Samsudin, A.; Wulan, A. R.

    2017-09-01

    The aim of this research was to develop Earth science learning material, especially on the Earth's atmosphere, supported by science research with authentic data analysis to enhance reasoning. Various Earth and space science phenomena require reasoning. This research used an experimental design with a one-group pre-test/post-test. 23 pre-service physics teachers participated in this research. An essay test was conducted to obtain data about reasoning ability and was analyzed quantitatively. An observation sheet was used to capture phenomena during the learning process. The results showed that students' reasoning ability improved from unidentified and no reasoning to evidence-based reasoning and inductive/deductive rule-based reasoning. Authentic data were examined using the Grid Analysis Display System (GrADS). Visualization from GrADS facilitated students in correlating the concepts and brought the real condition of nature into classroom activity. It also helped students to reason about phenomena related to Earth and space science concepts. It can be concluded that applying authentic data analysis in the learning process can help to enhance students' reasoning. This study is expected to help lecturers bring the results of geoscience research into the learning process and facilitate students' understanding of concepts.

  5. Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)

    Energy Technology Data Exchange (ETDEWEB)

    Williams, Dean N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Silva, Claudio [New York Univ. (NYU), NY (United States). Computer Science and Engineering Dept.

    2013-09-30

    For the past three years, a large analysis and visualization effort—funded by the Department of Energy’s Office of Biological and Environmental Research (BER), the National Aeronautics and Space Administration (NASA), and the National Oceanic and Atmospheric Administration (NOAA)—has brought together a wide variety of industry-standard scientific computing libraries and applications to create Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) to serve the global climate simulation and observational research communities. To support interactive analysis and visualization, all components connect through a provenance application programming interface to capture meaningful history and workflow. Components can be loosely coupled into the framework for fast integration or tightly coupled for greater system functionality and communication with other components. The overarching goal of UV-CDAT is to provide a new paradigm for access to and analysis of massive, distributed scientific data collections by leveraging distributed data architectures located throughout the world. The UV-CDAT framework addresses challenges in analysis and visualization and incorporates new opportunities, including parallelism for better efficiency, higher speed, and more accurate scientific inferences. Today, it provides more than 600 users access to more analysis and visualization products than any other single source.

  6. Analysis of logging data from nuclear borehole tools

    International Nuclear Information System (INIS)

    Hovgaard, J.; Oelgaard, P.L.

    1989-12-01

    The processing procedure for logging data from a borehole of the Stenlille project of Dansk Naturgas A/S has been analysed. The tools considered in the analysis were an integral, natural-gamma tool, a neutron porosity tool, a gamma density tool and a caliper tool. It is believed that in most cases the processing procedure used by the logging company in the interpretation of the raw data is fully understood. An exception is the epithermal part of the neutron porosity tool where all data needed for an interpretation were not available. The analysis has shown that some parts of the interpretation procedure may not be consistent with the physical principle of the tools. (author)

  7. Applied data analysis and modeling for energy engineers and scientists

    CERN Document Server

    Reddy, T Agami

    2011-01-01

    ""Applied Data Analysis and Modeling for Energy Engineers and Scientists"" discusses mathematical models, data analysis, and decision analysis in modeling. The approach taken in this volume focuses on the modeling and analysis of thermal systems in an engineering environment, while also covering a number of other critical areas. Other material covered includes the tools that researchers and engineering professionals will need in order to explore different analysis methods, use critical assessment skills and reach sound engineering conclusions. The book also covers process and system design and

  8. Analysis of DCA experimental data

    International Nuclear Information System (INIS)

    Min, B. J.; Kim, S. Y.; Ryu, S. J.; Seok, H. C.

    2000-01-01

    The lattice characteristics of the DCA are calculated with the WIMS-ATR code to validate the WIMS-AECL code for the lattice analysis of the CANDU core, using experimental data from the DCA at JNC. Analytical studies of some critical experiments had been performed to analyze the effects of fuel composition. Different items of reactor physics such as the local power peaking factor (LPF), effective multiplication factor (Keff) and coolant void reactivity were calculated for two coolant void fractions (0% and 100%). LPFs calculated by the WIMS-ATR code are in close agreement with the experimental results. LPFs calculated by the WIMS-AECL code with the WINFRITH and ENDF/B-V libraries have similar values for both libraries, but the differences between experimental data and results calculated by the WIMS-AECL code are larger than those of the WIMS-ATR code. The maximum difference between the values calculated by WIMS-ATR and the experimental values of the LPFs is within 1.3%. The coupled code system WIMS-ATR and CITATION used in this analysis predicts Keff within 1% ΔK and coolant void reactivity within 4% ΔK/K in all cases. The coolant void reactivity of uranium fuel is found to be positive. To validate the WIMS-AECL code, the core characteristics of the DCA shall be calculated by the WIMS-AECL and CITATION codes in the future.

  9. Full Life Cycle of Data Analysis with Climate Model Diagnostic Analyzer (CMDA)

    Science.gov (United States)

    Lee, S.; Zhai, C.; Pan, L.; Tang, B.; Zhang, J.; Bao, Q.; Malarout, N.

    2017-12-01

    We have developed a system that supports the full life cycle of a data analysis process, from data discovery, to data customization, to analysis, to reanalysis, to publication, and to reproduction. The system, called Climate Model Diagnostic Analyzer (CMDA), is designed to demonstrate that the full life cycle of data analysis can be supported within one integrated system for climate model diagnostic evaluation with global observational and reanalysis datasets. CMDA has four subsystems that are highly integrated to support the analysis life cycle. The Data System manages datasets used by CMDA analysis tools, the Analysis System manages CMDA analysis tools, which are all web services, the Provenance System manages the metadata of CMDA datasets and the provenance of CMDA analysis history, and the Recommendation System extracts knowledge from CMDA usage history and recommends datasets/analysis tools to users. These four subsystems are not only highly integrated but also easily expandable. New datasets can be easily added to the Data System and scanned to be visible to the other subsystems. New analysis tools can be easily registered to be available in the Analysis System and Provenance System. With CMDA, a user can start a data analysis process by discovering datasets of relevance to their research topic using the Recommendation System. Next, the user can customize the discovered datasets for their scientific use (e.g. anomaly calculation, regridding, etc.) with tools in the Analysis System. Next, the user can do their analysis with the tools (e.g. conditional sampling, time averaging, spatial averaging) in the Analysis System. Next, the user can reanalyze the datasets based on the previously stored analysis provenance in the Provenance System. Further, they can publish their analysis process and results to the Provenance System to share with other users. Finally, any user can reproduce the published analysis process and results. By supporting the full life cycle of climate data analysis

  10. Ulysses Data Analysis: Magnetic Topology of Heliospheric Structures

    Science.gov (United States)

    Crooker, Nancy

    2001-01-01

    In this final technical report on research funded by a NASA grant, a project overview is given by way of summaries of nine published papers. Research has included: 1) Using suprathermal electron data to study heliospheric magnetic structures; 2) Analysis of magnetic clouds, coronal mass ejections (CMEs), and the heliospheric current sheet (HCS); 3) Analysis of the corotating interaction regions (CIRs) which develop from interactions between solar wind streams of different velocities; 4) Use of Ulysses data in the interpretation of heliospheric events and phenomena.

  11. Safety analysis code input automation using the Nuclear Plant Data Bank

    International Nuclear Information System (INIS)

    Kopp, H.; Leung, J.; Tajbakhsh, A.; Viles, F.

    1985-01-01

    The Nuclear Plant Data Bank (NPDB) is a computer-based system that organizes a nuclear power plant's technical data, providing mechanisms for data storage, retrieval, and computer-aided engineering analysis. It has the specific objective of describing thermohydraulic systems in order to support rapid information retrieval and display, and thermohydraulic analysis modeling. The NPDB system fully automates the storage and analysis based on these data. The system combines the benefits of a structured data base system and computer-aided modeling with links to large-scale codes for engineering analysis. Emphasis on a friendly and very graphically oriented user interface facilitates both initial use and longer-term efficiency. Specific features are: organization and storage of thermohydraulic data items, ease in locating specific data items, graphical and tabular display capabilities, interactive model construction, organization and display of model input parameters, input deck construction for TRAC and RELAP analysis programs, and traceability of plant data, user model assumptions, and codes used in the input deck construction process. The major accomplishments of this past year were the development of a RELAP model generation capability and the development of a CRAY version of the code.

  12. The Volatility of Data Space: Topology Oriented Sensitivity Analysis

    Science.gov (United States)

    Du, Jing; Ligmann-Zielinska, Arika

    2015-01-01

    Despite the differences among specific methods, existing Sensitivity Analysis (SA) technologies are all value-based, that is, the uncertainties in the model input and output are quantified as changes of values. This paradigm provides only limited insight into the nature of models and the modeled systems. In addition to the value of data, potentially richer information about the model lies in the topological difference between the pre-model data space and the post-model data space. This paper introduces an innovative SA method called Topology Oriented Sensitivity Analysis, which defines sensitivity as the volatility of data space. It extends SA to a deeper level that lies in the topology of data. PMID:26368929

  13. LHCb Distributed Data Analysis on the Computing Grid

    CERN Document Server

    Paterson, S; Parkes, C

    2006-01-01

    LHCb is one of the four Large Hadron Collider (LHC) experiments based at CERN, the European Organisation for Nuclear Research. The LHC experiments will start taking an unprecedented amount of data when they come online in 2007. Since no single institute has the compute resources to handle this data, resources must be pooled to form the Grid. Where the Internet has made it possible to share information stored on computers across the world, Grid computing aims to provide access to computing power and storage capacity on geographically distributed systems. LHCb software applications must work seamlessly on the Grid allowing users to efficiently access distributed compute resources. It is essential to the success of the LHCb experiment that physicists can access data from the detector, stored in many heterogeneous systems, to perform distributed data analysis. This thesis describes the work performed to enable distributed data analysis for the LHCb experiment on the LHC Computing Grid.

  14. CIAO: CHANDRA/X-RAY DATA ANALYSIS FOR EVERYONE

    Science.gov (United States)

    McDowell, Jonathan; CIAO Team

    2018-01-01

    Eighteen years after the launch of Chandra, the archive is full of scientifically rich data and new observations continue. Improvements in recent years to the data analysis package CIAO (Chandra Interactive Analysis of Observations) and its extensive accompanying documentation make it easier for astronomers without a specialist background in high energy astrophysics to take advantage of this resource.The CXC supports hundreds of CIAO users around the world at all levels of training from high school and undergraduate students to the most experienced X-ray astronomers. In general, we strive to provide a software system which is easy for beginners, yet powerful for advanced users.Chandra data cover a range of instrument configurations and types of target (pointlike, extended and moving), requiring a flexible data analysis system. In addition to CIAO tools using the familiar FTOOLS/IRAF-style parameter interface, CIAO includes applications such as the Sherpa fitting engine which provide access to the data via Python scripting.In this poster we point prospective (and existing!) users to the high level Python scripts now provided to reprocess Chandra or other X-ray mission data, determine source fluxes and upper limits, and estimate backgrounds; and to the latest documentation including the CIAO Gallery, a new entry point featuring the system's different capabilities.This work has been supported by NASA under contract NAS 8-03060 to the Smithsonian Astrophysical Observatory for operation of the Chandra X-ray Center.
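
    A flavor of the scripted route is the Sherpa session sketched below; the file name is a placeholder, and the absorbed power-law model is just one common choice (XSPEC models must be available in the CIAO installation):

        # Sketch: fitting an absorbed power law to a Chandra spectrum in Sherpa.
        from sherpa.astro import ui

        ui.load_pha("acis_src.pi")     # placeholder spectrum; responses load alongside
        ui.subtract()                  # subtract the associated background
        ui.notice(0.5, 7.0)            # restrict the fit to 0.5-7 keV
        ui.set_source(ui.xsphabs.abs1 * ui.powlaw1d.pl)
        ui.fit()                       # minimize the fit statistic
        ui.plot_fit()                  # inspect data and model together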

  15. Automated NMR relaxation dispersion data analysis using NESSY

    Directory of Open Access Journals (Sweden)

    Gooley Paul R

    2011-10-01

    Full Text Available Abstract Background Proteins are dynamic molecules with motions ranging from picoseconds to longer than seconds. Many protein functions, however, appear to occur on the micro- to millisecond timescale, and therefore there has been intense research into the importance of these motions in catalysis and molecular interactions. Nuclear Magnetic Resonance (NMR) relaxation dispersion experiments are used to measure motion of discrete nuclei within the micro- to millisecond timescale. Information about conformational/chemical exchange, populations of exchanging states and chemical shift differences is extracted from these experiments. To ensure these parameters are correctly extracted, accurate and careful analysis of these experiments is necessary. Results The software introduced in this article is designed for the automatic analysis of relaxation dispersion data and the extraction of the parameters mentioned above. It is written in Python for multi-platform use and high performance. Experimental data can be fitted to different models using the Levenberg-Marquardt minimization algorithm, and different statistical tests can be used to select the best model. To demonstrate the functionality of this program, synthetic data as well as NMR data were analyzed. Analysis of these data, including the generation of plots and color-coded structures, can be performed with minimal user intervention using standard procedures that are included in the program. Conclusions NESSY is easy-to-use open source software to analyze NMR relaxation data. Its robustness and standard procedures are demonstrated in this article.
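
    To make the fitting step concrete, the sketch below fits the fast-exchange (Luz-Meiboom) dispersion model to synthetic data with scipy's Levenberg-Marquardt routine; it illustrates the kind of fit NESSY automates and is not NESSY's own code:

        # Sketch: Levenberg-Marquardt fit of the Luz-Meiboom model
        #   R2eff(v) = R20 + (phi/kex) * (1 - (4*v/kex) * tanh(kex/(4*v)))
        # to synthetic CPMG relaxation dispersion data.
        import numpy as np
        from scipy.optimize import curve_fit

        def luz_meiboom(v_cpmg, r20, phi, kex):
            return r20 + (phi / kex) * (1.0 - (4.0 * v_cpmg / kex)
                                        * np.tanh(kex / (4.0 * v_cpmg)))

        v = np.linspace(50.0, 1000.0, 15)               # CPMG frequencies (Hz)
        rng = np.random.default_rng(1)
        r2eff = luz_meiboom(v, 10.0, 5000.0, 1500.0) + rng.normal(0.0, 0.1, v.size)

        popt, _ = curve_fit(luz_meiboom, v, r2eff, p0=[8.0, 3000.0, 1000.0])
        print(dict(zip(["R20", "phi", "kex"], popt)))   # recovered parameters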

  16. Integrated Data Collection Analysis (IDCA) Program - RDX Standard Data Set 2

    Energy Technology Data Exchange (ETDEWEB)

    Sandstrom, Mary M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Brown, Geoffrey W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Preston, Daniel N. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Pollard, Colin J. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Warner, Kirstin F. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Sorensen, Daniel N. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Remmers, Daniel L. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Phillips, Jason J. [Air Force Research Lab. (AFRL), Tyndall Air Force Base, FL (United States); Shelley, Timothy J. [Applied Research Associates, Tyndall Air Force Base, FL (United States); Reyes, Jose A. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Hsu, Peter C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Reynolds, John G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2013-02-20

    The Integrated Data Collection Analysis (IDCA) program is conducting a proficiency study for Small-Scale Safety and Thermal (SSST) testing of homemade explosives (HMEs). Described here are the results for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of the RDX Type II Class 5 standard, tested for the second time in the Proficiency Test. Compared with the first set (Set 1), this RDX testing (Set 2) was found to have about the same impact sensitivity, more BAM friction sensitivity, less ABL friction sensitivity, similar ESD sensitivity, and the same DSC sensitivity.

  17. Doing bayesian data analysis a tutorial with R and BUGS

    CERN Document Server

    Kruschke, John K

    2011-01-01

    There is an explosion of interest in Bayesian statistics, primarily because recently created computational methods have finally made Bayesian analysis accessible to a wide audience. Doing Bayesian Data Analysis, A Tutorial Introduction with R and BUGS provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples. The book begins with the basics, including essential concepts of probability and random sampling, and gradually progresses to advanced hierarchical modeling methods for realistic data. The text delivers comprehensive coverage of all

  18. A QCD analysis of ZEUS diffractive data

    Energy Technology Data Exchange (ETDEWEB)

    Chekanov, S.; Derrick, M.; Magill, S. [Argonne National Laboratory, Argonne, IL (US)] (and others)

    2009-11-15

    ZEUS inclusive diffractive cross-section measurements have been used in a DGLAP next-to-leading-order QCD analysis to extract the diffractive parton distribution functions. Data on diffractive dijet production in deep inelastic scattering have also been included to constrain the gluon density. Predictions based on the extracted parton densities are compared to diffractive charm and dijet photoproduction data. (orig.)

  19. A QCD analysis of ZEUS diffractive data

    International Nuclear Information System (INIS)

    Chekanov, S.; Derrick, M.; Magill, S.

    2009-11-01

    ZEUS inclusive diffractive cross-section measurements have been used in a DGLAP next-to-leading-order QCD analysis to extract the diffractive parton distribution functions. Data on diffractive dijet production in deep inelastic scattering have also been included to constrain the gluon density. Predictions based on the extracted parton densities are compared to diffractive charm and dijet photoproduction data. (orig.)

  20. AMIDST: Analysis of MassIve Data STreams

    DEFF Research Database (Denmark)

    Masegosa, Andres; Martinez, Ana Maria; Borchani, Hanen

    2015-01-01

    The Analysis of MassIve Data STreams (AMIDST) Java toolbox provides a collection of scalable and parallel algorithms for inference and learning of hybrid Bayesian networks from data streams. The toolbox, available at http://amidst.github.io/toolbox/ under the Apache Software License version 2.0, also efficiently leverages existing functionalities and algorithms by interfacing to software tools such as HUGIN and MOA.