WorldWideScience

Sample records for handle large datasets

  1. MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB.

    Science.gov (United States)

    Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A

    2013-01-01

    Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED

  2. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  3. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    We present an approach for analyzing market shares and product price elasticities based on large datasets containing aggregate sales data for many products, several markets and relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  4. Distributed and parallel approach for handle and perform huge datasets

    Science.gov (United States)

    Konopko, Joanna

    2015-12-01

    Big Data refers to the dynamic, large and disparate volumes of data coming from many different sources (tools, machines, sensors, mobile devices) that are uncorrelated with each other. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data. A proper architecture for systems that process huge datasets is needed. In this paper, the comparison of distributed and parallel system architectures is presented using the example of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). This paper also analyzes the problem of extracting and handling valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed that could be used to solve the analyzed problem of storing and processing Big Data.
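
    As a rough illustration of the MapReduce side of this comparison, the sketch below (Python, standard library only; the record layout and grouping key are hypothetical) splits records across worker processes in a map phase and aggregates partial counts in a reduce phase. It is a toy analogue of the paradigm, not Hadoop itself.

        # Minimal MapReduce-style aggregation sketch (illustrative only, not Hadoop).
        from multiprocessing import Pool
        from collections import Counter
        from functools import reduce

        def map_phase(chunk):
            # Emit counts keyed by a hypothetical sensor id per record.
            return Counter(record["sensor_id"] for record in chunk)

        def reduce_phase(partial_a, partial_b):
            # Merge partial counts from two mappers.
            partial_a.update(partial_b)
            return partial_a

        if __name__ == "__main__":
            records = [{"sensor_id": i % 5, "value": i} for i in range(100_000)]
            chunks = [records[i::4] for i in range(4)]          # naive partitioning
            with Pool(4) as pool:
                partials = pool.map(map_phase, chunks)          # map step in parallel
            totals = reduce(reduce_phase, partials, Counter())  # reduce step
            print(totals.most_common(3))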

  5. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which is, to our knowledge, the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
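
    A minimal sketch of the rigidity-density idea described above, assuming a Gaussian kernel and a hypothetical array of atom coordinates; the resolution parameter eta plays the role of the scale selector, and nothing here is taken from the authors' implementation.

        # Rigidity-density sketch: larger eta -> coarser resolution (illustrative only).
        import numpy as np

        def rigidity_density(atoms, grid, eta=2.0):
            # atoms: (N, 3) coordinates; grid: (M, 3) evaluation points.
            d2 = ((grid[:, None, :] - atoms[None, :, :]) ** 2).sum(-1)  # (M, N) squared distances
            return np.exp(-d2 / eta**2).sum(axis=1)   # sum of Gaussian kernels per grid point

        atoms = np.random.rand(500, 3) * 20.0                  # hypothetical point cloud
        xs = np.linspace(0, 20, 16)
        grid = np.array(np.meshgrid(xs, xs, xs)).reshape(3, -1).T
        coarse = rigidity_density(atoms, grid, eta=4.0)        # low-resolution view
        fine = rigidity_density(atoms, grid, eta=1.0)          # high-resolution view
        print(coarse.shape, float(fine.max()))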

  6. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n · log(n)); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms delays. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture, currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes / 1 TiB or 1.3 × 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
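
    The hierarchical level-of-detail idea can be illustrated with a block-wise min/max pyramid: each level stores per-block extrema of the level below, so a screen-width slice at any zoom level can be drawn from a roughly constant number of blocks. A small sketch under those assumptions, not FTSPlot's actual data format:

        # Build a min/max pyramid over a long time series (illustrative, not FTSPlot's format).
        import numpy as np

        def build_pyramid(samples, block=64, levels=3):
            pyramid, data = [], np.asarray(samples, dtype=float)
            for _ in range(levels):
                if len(data) < block:
                    break
                n = (len(data) // block) * block
                blocks = data[:n].reshape(-1, block)
                level = np.stack([blocks.min(axis=1), blocks.max(axis=1)], axis=1)
                pyramid.append(level)        # (num_blocks, 2) min/max pairs for this zoom level
                data = level.mean(axis=1)    # coarse proxy signal feeding the next level
            return pyramid

        signal = np.sin(np.linspace(0, 200 * np.pi, 1_000_000)) + 0.1 * np.random.randn(1_000_000)
        for depth, level in enumerate(build_pyramid(signal)):
            print(f"level {depth}: {len(level)} min/max blocks")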

  7. The Amateurs' Love Affair with Large Datasets

    Science.gov (United States)

    Price, Aaron; Jacoby, S. H.; Henden, A.

    2006-12-01

    Amateur astronomers are professionals in other areas. They bring expertise from such varied and technical careers as computer science, mathematics, engineering, and marketing. These skills, coupled with an enthusiasm for astronomy, can be used to help manage the large data sets coming online in the next decade. We will show specific examples where teams of amateurs have been involved in mining large, online data sets and have authored and published their own papers in peer-reviewed astronomical journals. Using the proposed LSST database as an example, we will outline a framework for involving amateurs in data analysis and education with large astronomical surveys.

  8. Large datasets: Segmentation, feature extraction, and compression

    Energy Technology Data Exchange (ETDEWEB)

    Downing, D.J.; Fedorov, V.; Lawkins, W.F.; Morris, M.D.; Ostrouchov, G.

    1996-07-01

    Large data sets with more than several million multivariate observations (tens of megabytes or gigabytes of stored information) are difficult or impossible to analyze with traditional software. The amount of output which must be scanned quickly dilutes the ability of the investigator to confidently identify all the meaningful patterns and trends which may be present. The purpose of this project is to develop both a theoretical foundation and a collection of tools for automated feature extraction that can be easily customized to specific applications. Cluster analysis techniques are applied as a final step in the feature extraction process, which helps make data surveying simple and effective.

  9. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  10. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    OpenAIRE

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    International audience; A common approach for analysis of anatomical variability relies on the estimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  11. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes is achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  12. A Bayesian spatio-temporal geostatistical model with an auxiliary lattice for large datasets

    KAUST Repository

    Xu, Ganggang

    2015-01-01

    When spatio-temporal datasets are large, the computational burden can lead to failures in the implementation of traditional geostatistical tools. In this paper, we propose a computationally efficient Bayesian hierarchical spatio-temporal model in which the spatial dependence is approximated by a Gaussian Markov random field (GMRF) while the temporal correlation is described using a vector autoregressive model. By introducing an auxiliary lattice on the spatial region of interest, the proposed method is not only able to handle irregularly spaced observations in the spatial domain, but it is also able to bypass the missing data problem in a spatio-temporal process. Because the computational complexity of the proposed Markov chain Monte Carlo algorithm is of the order O(n) with n the total number of observations in space and time, our method can be used to handle very large spatio-temporal datasets with reasonable CPU times. The performance of the proposed model is illustrated using simulation studies and a dataset of precipitation data from the coterminous United States.
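
    Schematically (the notation here is ours, not the authors'), the construction can be written as a lattice state-space model: observations are mapped to an auxiliary lattice process that evolves as a vector autoregression whose innovations carry a sparse GMRF precision,

        $y_t = H\,\eta_t + \varepsilon_t$, with $\varepsilon_t \sim N(0, \tau^2 I)$,
        $\eta_t = A\,\eta_{t-1} + w_t$, with $w_t \sim N(0, Q^{-1})$,

    where $\eta_t$ lives on the auxiliary lattice, $Q$ is the sparse GMRF precision matrix, $A$ encodes the vector-autoregressive temporal dependence, and $H$ interpolates lattice values to the irregularly spaced observation locations.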

  13. Multiresolution comparison of precipitation datasets for large-scale models

    Science.gov (United States)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models which are related to weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products along with ground observations provide another avenue for investigating how the precipitation uncertainty would affect the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithms (ANUSPLIN) and Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, results provide an assessment of possible applications for various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing performance in terms of spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  14. Orthology detection combining clustering and synteny for very large datasets.

    Science.gov (United States)

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.
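
    The synteny component rests on counting conserved gene adjacencies between two genomes. A toy sketch of such an adjacency measure follows (hypothetical gene orders; this is not the FFAdj-MCS heuristic itself, which additionally handles multiple chromosomes and duplications):

        # Count conserved adjacencies between two gene orders (toy version of a synteny score).
        def adjacencies(order):
            # Unordered neighbouring pairs along a linear chromosome.
            return {frozenset(pair) for pair in zip(order, order[1:])}

        genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
        genome_b = ["g2", "g1", "g3", "g4", "g6", "g5"]   # two local rearrangements

        shared = adjacencies(genome_a) & adjacencies(genome_b)
        score = len(shared) / max(len(genome_a) - 1, 1)   # fraction of conserved adjacencies
        print(sorted(map(sorted, shared)), round(score, 2))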

  15. Orthology detection combining clustering and synteny for very large datasets.

    Directory of Open Access Journals (Sweden)

    Marcus Lechner

    Full Text Available The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  16. Large-scale matrix-handling subroutines 'ATLAS'

    International Nuclear Information System (INIS)

    Tsunematsu, Toshihide; Takeda, Tatsuoki; Fujita, Keiichi; Matsuura, Toshihiko; Tahara, Nobuo

    1978-03-01

    Subroutine package ''ATLAS'' has been developed for handling large-scale matrices. The package is composed of four kinds of subroutines, i.e., basic arithmetic routines, routines for solving linear simultaneous equations and for solving general eigenvalue problems and utility routines. The subroutines are useful in large scale plasma-fluid simulations. (auth.)

  17. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

  18. [Parallel virtual reality visualization of extreme large medical datasets].

    Science.gov (United States)

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extreme large medical datasets are discussed in connection with Intranet and common-configuration computers of hospitals. In this paper are introduced several kernel techniques, including the hardware structure, software framework, load balance and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using common PC cluster. In virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through the control panel built on virtual reality modeling language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can play the role of a good assistant in making clinical diagnoses.

  19. Final Progress Report for 'An Abstract Job Handling Grid Service for Dataset Analysis'

    International Nuclear Information System (INIS)

    David A Alexander

    2005-01-01

    For Phase I of the Job Handling project, Tech-X has built a Grid service for processing analysis requests, as well as a Graphical User Interface (GUI) client that uses the service. The service is designed to generically support High-Energy Physics (HEP) experimental analysis tasks. It has an extensible, flexible, open architecture and language. The service uses the Solenoidal Tracker At RHIC (STAR) experiment as a working example. STAR is an experiment at the Relativistic Heavy Ion Collider (RHIC) at the Brookhaven National Laboratory (BNL). STAR and other experiments at BNL generate multiple Petabytes of HEP data. The raw data is captured as millions of input files stored in a distributed data catalog. Potentially using thousands of files as input, analysis requests are submitted to a processing environment containing thousands of nodes. The Grid service provides a standard interface to the processing farm. It enables researchers to run large-scale, massively parallel analysis tasks, regardless of the computational resources available in their location

  20. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying

    2014-11-07

    For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.
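
    In symbols (our paraphrase of the construction, not the authors' exact notation), the Gaussian score equation for a covariance parameter $\theta_j$ is approximated by replacing $\Sigma^{-1}$ with a sparse matrix $A$ built from a sparse inverse Cholesky factor, and the resulting quadratic form is recentred at its expectation so the equation remains unbiased:

        $g_j(\theta) = Z^{\top} A \, \frac{\partial \Sigma}{\partial \theta_j} \, A Z \; - \; \operatorname{tr}\!\Big( A \, \frac{\partial \Sigma}{\partial \theta_j} \, A \, \Sigma(\theta) \Big) = 0,$

    which satisfies $E_\theta[g_j(\theta)] = 0$ for any choice of sparse $A$, since $E[Z^{\top} M Z] = \operatorname{tr}(M\Sigma)$ for $Z \sim N(0, \Sigma)$.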

  1. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying; Stein, Michael L.

    2014-01-01

    For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.

  2. Privacy-preserving record linkage on large real world datasets.

    Science.gov (United States)

    Randall, Sean M; Ferrante, Anna M; Boyd, James H; Bauer, Jacqueline K; Semmens, James B

    2014-08-01

    Record linkage typically involves the use of dedicated linkage units who are supplied with personally identifying information to determine individuals from within and across datasets. The personally identifying information supplied to linkage units is separated from clinical information prior to release by data custodians. While this substantially reduces the risk of disclosure of sensitive information, some residual risks still exist and remain a concern for some custodians. In this paper we trial a method of record linkage which reduces privacy risk still further on large real world administrative data. The method uses encrypted personal identifying information (bloom filters) in a probability-based linkage framework. The privacy preserving linkage method was tested on ten years of New South Wales (NSW) and Western Australian (WA) hospital admissions data, comprising in total over 26 million records. No difference in linkage quality was found when the results were compared to traditional probabilistic methods using full unencrypted personal identifiers. This presents as a possible means of reducing privacy risks related to record linkage in population level research studies. It is hoped that through adaptations of this method or similar privacy preserving methods, risks related to information disclosure can be reduced so that the benefits of linked research taking place can be fully realised. Copyright © 2013 Elsevier Inc. All rights reserved.
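
    A minimal sketch of the Bloom-filter encoding behind this style of privacy-preserving linkage (field choice, filter size, hash count and key are illustrative; real deployments use agreed keyed-hash parameters): names are split into bigrams, each bigram sets a few bits, and encoded records are compared with a Dice coefficient.

        # Bloom-filter encoding of name bigrams plus Dice similarity (illustrative parameters).
        import hashlib

        def bigrams(text):
            text = f"_{text.lower().strip()}_"
            return {text[i:i + 2] for i in range(len(text) - 1)}

        def bloom_encode(text, size=256, num_hashes=4, secret="demo-key"):
            bits = set()
            for gram in bigrams(text):
                for k in range(num_hashes):
                    digest = hashlib.sha256(f"{secret}|{k}|{gram}".encode()).hexdigest()
                    bits.add(int(digest, 16) % size)   # set one bit position per hash
            return bits

        def dice(a, b):
            return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

        enc_a = bloom_encode("Katherine Smith")
        enc_b = bloom_encode("Catherine Smyth")   # likely the same person, variant spelling
        enc_c = bloom_encode("Robert Jones")
        print(round(dice(enc_a, enc_b), 2), round(dice(enc_a, enc_c), 2))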

  3. Handling limited datasets with neural networks in medical applications: A small-data approach.

    Science.gov (United States)

    Shaikhina, Torgyn; Khovanova, Natalia A

    2017-01-01

    Single-centre studies in the medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for application of artificial neural networks (NNs) for regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the method of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to the state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied for the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85 MPa. When evaluated on independent test samples, the NN achieved accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows for an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
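
    A rough sketch of the multiple-runs idea on a small synthetic regression problem, using scikit-learn's MLPRegressor as a stand-in for the networks used in the study (the data, network size and number of runs are arbitrary): several networks are trained from different random initialisations, and the run-to-run spread is inspected alongside the pooled prediction.

        # Multiple-runs regression NN sketch for a small dataset (illustrative stand-in).
        import numpy as np
        from sklearn.neural_network import MLPRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.uniform(0, 1, size=(56, 3))                 # small sample, small-data regime
        y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.3 * rng.normal(size=56)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

        predictions = []
        for seed in range(20):                              # multiple runs, different initialisations
            net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=seed)
            net.fit(X_tr, y_tr)
            predictions.append(net.predict(X_te))

        predictions = np.array(predictions)
        pooled = predictions.mean(axis=0)                   # aggregate over runs
        spread = predictions.std(axis=0).mean()             # run-to-run variability
        rmse = float(np.sqrt(np.mean((pooled - y_te) ** 2)))
        print(f"pooled RMSE = {rmse:.3f}, mean run-to-run std = {spread:.3f}")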

  4. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  5. Large-component handling equipment and its use

    International Nuclear Information System (INIS)

    Krieg, S.A.; Swannack, D.L.

    1983-01-01

    The Fast Flux Test Facility (FFTF) reactor systems have special requirements for component replacements during maintenance servicing. Replacement operations must address handling of equipment within shielded metal containers while maintaining an inert atmosphere to prevent reaction of sodium with air. Plant identification of a failed component results in selecting and assembling the maintenance cask and equipment transport system for transfer from the storage facility to the Reactor Containment Building (RCB). This includes a proper diameter and length cask, inert atmosphere control consoles, component lift fixture and support structure for interface with the facility area surrounding the component. This equipment is staged in modular groups in the Reactor Service Building for transfer through the equipment airlock to the containment interior. The failed component is generally prepared for replacement by installation of the special lifting fixture attachment. Assembly of the cask support structure is performed over the component position on the containment building operating floor. The cask and shroud from the reactor interface are inerted after all manual service connections and handling attachments are completed. The component is lifted from the reactor and into the cask interior through a floor valve which is then closed to isolate the component reactor port. The cask with sodium wetted component is transferred to a service/repair location, either within containment or outside, to the Maintenance Facility cleaning and repair area. The complete equipment and handling operations for replacement of a large reactor component are described

  6. Scalable and portable visualization of large atomistic datasets

    Science.gov (United States)

    Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2004-10-01

    A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms, which have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms. Program summary. Title of program: Atomsviewer. Catalogue identifier: ADUM. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUM. Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland. Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card. Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5. Programming languages used: C++, C and OpenGL. Memory required to execute with typical data: 1 gigabyte of RAM. High speed storage required: 60 gigabytes. No. of lines in the distributed program including test data, etc.: 550 241. No. of bytes in the distributed program including test data, etc.: 6 258 245. Number of bits in a word: Arbitrary. Number of processors used: 1. Has the code been vectorized or parallelized: No. Distribution format: tar gzip file. Nature of physical problem: Scientific visualization of atomic systems. Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data

  7. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    Full Text Available This paper addresses the development of an augmented reality (AR)-based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensors' datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore, a model to manage such data and enhance the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce the data inefficiency. This paper presents a model for computing the information gain for each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network. Necessary architectural system supports and the interface requirements for such visualizations are also presented.

  8. The Path from Large Earth Science Datasets to Information

    Science.gov (United States)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is one of the major Science Mission Directorate (SMD) data centers for archiving and distribution of Earth Science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model derived datasets (generated by GSFC's Global Modeling and Assimilation Office), the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed in the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth Science data and information as well as addresses the technical and scientific issues, governance and user support problems faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally it demonstrates some of the most used data and product visualization and analyses tools developed and maintained by the GES DISC.

  9. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    Science.gov (United States)

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, an ensemble of machine learning models (the Super Learner algorithm), and SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.

  10. Comprehensive comparison of large-scale tissue expression datasets

    DEFF Research Database (Denmark)

    Santos Delgado, Alberto; Tsafou, Kalliopi; Stolte, Christian

    2015-01-01

    a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between......For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present......://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface....

  11. Topic modeling for cluster analysis of large biological and medical datasets.

    Science.gov (United States)

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
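
    A minimal sketch of the "highest probable topic assignment" variant on a toy text corpus, using scikit-learn's LDA implementation (the corpus, topic count and parameters are arbitrary, and real applications would operate on features derived from PFGE or expression data rather than sentences):

        # Topic-model-based clustering: assign each sample to its most probable topic (toy example).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        docs = [
            "salmonella outbreak pulsed field gel electrophoresis pattern",
            "pulsed field gel pattern cluster salmonella isolate",
            "lung cancer gene expression tumor profile",
            "tumor gene expression lung adenocarcinoma profile",
            "breast cancer receptor expression subtype profile",
            "breast tumor receptor subtype gene expression",
        ]

        counts = CountVectorizer().fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=3, random_state=0)
        doc_topic = lda.fit_transform(counts)      # per-document topic probabilities
        clusters = doc_topic.argmax(axis=1)        # highest probable topic assignment
        print(list(clusters))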

  12. Extending SME to Handle Large-Scale Cognitive Modeling.

    Science.gov (United States)

    Forbus, Kenneth D; Ferguson, Ronald W; Lovett, Andrew; Gentner, Dedre

    2017-07-01

    Analogy and similarity are central phenomena in human cognition, involved in processes ranging from visual perception to conceptual change. To capture this centrality requires that a model of comparison must be able to integrate with other processes and handle the size and complexity of the representations required by the tasks being modeled. This paper describes extensions to Structure-Mapping Engine (SME) since its inception in 1986 that have increased its scope of operation. We first review the basic SME algorithm, describe psychological evidence for SME as a process model, and summarize its role in simulating similarity-based retrieval and generalization. Then we describe five techniques now incorporated into the SME that have enabled it to tackle large-scale modeling tasks: (a) Greedy merging rapidly constructs one or more best interpretations of a match in polynomial time: O(n^2 log(n)); (b) Incremental operation enables mappings to be extended as new information is retrieved or derived about the base or target, to model situations where information in a task is updated over time; (c) Ubiquitous predicates model the varying degrees to which items may suggest alignment; (d) Structural evaluation of analogical inferences models aspects of plausibility judgments; (e) Match filters enable large-scale task models to communicate constraints to SME to influence the mapping process. We illustrate via examples from published studies how these enable it to capture a broader range of psychological phenomena than before. Copyright © 2016 Cognitive Science Society, Inc.

  13. Orthology detection combining clustering and synteny for very large datasets

    OpenAIRE

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the ...

  14. Sparse kernel orthonormalized PLS for feature extraction in large datasets

    DEFF Research Database (Denmark)

    Arenas-García, Jerónimo; Petersen, Kaare Brandt; Hansen, Lars Kai

    2006-01-01

    In this paper we are presenting a novel multivariate analysis method for large scale problems. Our scheme is based on a novel kernel orthonormalized partial least squares (PLS) variant for feature extraction, imposing sparsity constrains in the solution to improve scalability. The algorithm...... is tested on a benchmark of UCI data sets, and on the analysis of integrated short-time music features for genre prediction. The upshot is that the method has strong expressive power even with rather few features, is clearly outperforming the ordinary kernel PLS, and therefore is an appealing method...

  15. Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

    KAUST Repository

    Litvinenko, Alexander

    2017-11-01

    The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M \times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating the unknown parameters such as the covariance length, variance and smoothness parameter of a Matérn covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n / p)$ and storage $\mathcal{O}(k n \log n)$, where the rank $k$ is a small integer (typically $k < 25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example in which the true parameter values are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.
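
    For orientation, the quantity HLIBCov accelerates is the joint Gaussian log-likelihood. The dense, small-n sketch below (Python, with an exponential covariance, i.e. Matérn with smoothness 1/2, and invented parameter values) shows the Cholesky-based evaluation that the hierarchical-matrix approximation replaces when n is large.

        # Dense Gaussian log-likelihood via Cholesky (small n); H-matrices replace this at large n.
        import numpy as np
        from scipy.spatial.distance import cdist
        from scipy.linalg import cho_factor, cho_solve

        def neg_log_likelihood(params, locations, z):
            variance, length = params
            cov = variance * np.exp(-cdist(locations, locations) / length)  # Matern, nu = 1/2
            cov[np.diag_indices_from(cov)] += 1e-8                          # numerical jitter
            factor = cho_factor(cov, lower=True)
            logdet = 2.0 * np.log(np.diag(factor[0])).sum()                 # log |Sigma|
            quad = z @ cho_solve(factor, z)                                 # z' Sigma^{-1} z
            return 0.5 * (logdet + quad + len(z) * np.log(2 * np.pi))

        rng = np.random.default_rng(1)
        locs = rng.uniform(0, 1, size=(300, 2))       # toy spatial locations
        z = rng.standard_normal(300)                  # placeholder observations
        print(neg_log_likelihood((1.0, 0.2), locs, z))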

  16. Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

    KAUST Repository

    Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E.

    2017-01-01

    The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M \times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating the unknown parameters such as the covariance length, variance and smoothness parameter of a Matérn covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n / p)$ and storage $\mathcal{O}(k n \log n)$, where the rank $k$ is a small integer (typically $k < 25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example in which the true parameter values are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.

  17. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    Science.gov (United States)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  18. Ultrafast superpixel segmentation of large 3D medical datasets

    Science.gov (United States)

    Leblond, Antoine; Kauffmann, Claude

    2016-03-01

    Even with recent hardware improvements, superpixel segmentation of large 3D medical images at interactive speed (Gauss-Seidel-like acceleration. The work unit partitioning scheme will however vary on odd- and even-numbered iterations to reduce convergence barriers. Synchronization will be ensured by an 8-step 3D variant of the traditional Red Black Ordering scheme. An attack model and early termination will also be described and implemented as additional acceleration techniques. Using our hybrid framework and typical operating parameters, we were able to compute the superpixels of a high-resolution 512x512x512 aortic angioCT scan in 283 ms using an AMD R9 290X GPU. We achieved a 22.3X speed-up factor compared to the published reference GPU implementation.

  19. Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets

    Science.gov (United States)

    Juric, Mario

    2011-01-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to >10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping ``cells'' by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce ``kernels'' that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we swept through 1.1 Grows of Pan-STARRS+SDSS data (220 GB) in less than 15 minutes on a dual-CPU machine. In a cluster environment, we achieved bandwidths of 17 Gbits/sec (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.

  20. Unified Access Architecture for Large-Scale Scientific Datasets

    Science.gov (United States)

    Karna, Risav

    2014-05-01

    Data-intensive sciences have to deploy diverse large scale database technologies for data analytics as scientists have now been dealing with much larger volumes than ever before. While array databases have bridged many gaps between the needs of data-intensive research fields and DBMS technologies (Zhang 2011), invocation of other big data tools accompanying these databases is still manual and separate from the database management interface. We identify this as an architectural challenge that will increasingly complicate the user's work flow owing to the growing number of useful but isolated and niche database tools. Such use of data analysis tools in effect leaves the burden on the user's end to synchronize the results from other data manipulation analysis tools with the database management system. To this end, we propose a unified access interface for using big data tools within a large scale scientific array database, using the database queries themselves to embed foreign routines belonging to the big data tools. Such an invocation of foreign data manipulation routines inside a query into a database can be made possible through a user-defined function (UDF). UDFs that allow such levels of freedom as to call modules from another language and interface back and forth between the query body and the side-loaded functions would be needed for this purpose. For the purpose of this research we attempt coupling of four widely used tools, Hadoop (hadoop1), Matlab (matlab1), R (r1) and ScaLAPACK (scalapack1), with the UDF feature of rasdaman (Baumann 98), an array-based data manager, to investigate this concept. The native array data model used by an array-based data manager provides compact data storage and high performance operations on ordered data such as spatial data, temporal data, and matrix-based data for linear algebra operations (scidbusr1). Performance issues arising due to the coupling of tools with different paradigms, niche functionalities, separate processes and output
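
    The UDF pattern can be illustrated with a lightweight analogue: Python's sqlite3 module lets a foreign routine be registered and then invoked from inside a query, which is the same embed-a-routine-in-the-query idea pursued above with rasdaman's user-defined functions (SQLite, the table and the toy routine here are stand-ins, not the system described in the abstract).

        # Embedding a foreign routine in a query via a user-defined function (SQLite analogue).
        import math
        import sqlite3

        def log_scale(value):
            # A hypothetical analysis routine that the query calls per row.
            return math.log10(value) if value is not None and value > 0 else None

        conn = sqlite3.connect(":memory:")
        conn.create_function("log_scale", 1, log_scale)   # register the UDF under a SQL name
        conn.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
        conn.executemany("INSERT INTO measurements VALUES (?, ?)",
                         [("a", 10.0), ("a", 1000.0), ("b", 0.1)])

        # The foreign routine runs inside the query itself, keeping results in sync with the DB.
        for row in conn.execute("SELECT sensor, log_scale(value) FROM measurements"):
            print(row)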

  1. Full-Scale Approximations of Spatio-Temporal Covariance Models for Large Datasets

    KAUST Repository

    Zhang, Bohai; Sang, Huiyan; Huang, Jianhua Z.

    2014-01-01

    of dataset and application of such models is not feasible for large datasets. This article extends the full-scale approximation (FSA) approach by Sang and Huang (2012) to the spatio-temporal context to reduce computational complexity. A reversible jump Markov

  2. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias; Bibi, Adel Aamer; Giancola, Silvio; Al-Subaihi, Salman; Ghanem, Bernard

    2018-01-01

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.
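
    Benchmarks of this kind typically score trackers by the intersection-over-union between predicted and annotated boxes. A small sketch of that metric follows (the box format and success threshold are illustrative, not TrackingNet's exact evaluation protocol):

        # Intersection-over-union for axis-aligned boxes given as (x, y, width, height).
        def iou(box_a, box_b):
            ax, ay, aw, ah = box_a
            bx, by, bw, bh = box_b
            ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
            iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
            inter = ix * iy
            union = aw * ah + bw * bh - inter
            return inter / union if union > 0 else 0.0

        predicted = [(10, 10, 50, 40), (12, 14, 48, 38), (80, 80, 30, 30)]
        annotated = [(12, 11, 50, 40), (12, 12, 50, 40), (20, 20, 30, 30)]
        scores = [iou(p, a) for p, a in zip(predicted, annotated)]
        success_rate = sum(s > 0.5 for s in scores) / len(scores)   # fraction above an IoU threshold
        print([round(s, 2) for s in scores], success_rate)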

  3. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias

    2018-03-28

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

  4. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    DEFF Research Database (Denmark)

    Jensen, Tue Vissing; Pinson, Pierre

    2017-01-01

    , we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven...... to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation....

  5. Valuation of large variable annuity portfolios: Monte Carlo simulation and synthetic datasets

    Directory of Open Access Journals (Sweden)

    Gan Guojun

    2017-12-01

    Metamodeling techniques have recently been proposed to address the computational issues related to the valuation of large portfolios of variable annuity contracts. However, it is extremely difficult, if not impossible, for researchers to obtain real datasets from insurance companies in order to test their metamodeling techniques on such real datasets and publish the results in academic journals. To facilitate the development and dissemination of research related to the efficient valuation of large variable annuity portfolios, this paper creates a large synthetic portfolio of variable annuity contracts based on the properties of real portfolios of variable annuities and implements a simple Monte Carlo simulation engine for valuing the synthetic portfolio. In addition, this paper presents fair market values and Greeks for the synthetic portfolio of variable annuity contracts that are important quantities for managing the financial risks associated with variable annuities. The resulting datasets can be used by researchers to test and compare the performance of various metamodeling techniques.
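
    As an illustration of the kind of simple Monte Carlo engine described (not the authors' implementation), the sketch below values a guaranteed minimum maturity benefit for a toy portfolio under geometric Brownian motion; all contract and market parameters are made up.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def gmmb_value(fund0, guarantee, sigma, r, T, n_paths=100_000):
        """Monte Carlo value of a guaranteed minimum maturity benefit (GMMB):
        the insurer pays max(guarantee - fund_T, 0) at maturity T."""
        z = rng.standard_normal(n_paths)
        fund_T = fund0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
        payoff = np.maximum(guarantee - fund_T, 0.0)
        return np.exp(-r * T) * payoff.mean()

    # A toy "portfolio" of three synthetic contracts with hypothetical parameters.
    portfolio = [
        dict(fund0=100.0, guarantee=100.0, T=10.0),
        dict(fund0=250.0, guarantee=200.0, T=15.0),
        dict(fund0=80.0, guarantee=120.0, T=5.0),
    ]
    total = sum(gmmb_value(**c, sigma=0.2, r=0.02) for c in portfolio)
    print(f"Portfolio GMMB liability: {total:.2f}")
    ```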

  6. Spatially-explicit estimation of geographical representation in large-scale species distribution datasets.

    Science.gov (United States)

    Kalwij, Jesse M; Robertson, Mark P; Ronk, Argo; Zobel, Martin; Pärtel, Meelis

    2014-01-01

    Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially-explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely-used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the area of occupancy frequency distribution, indicating that atlases were sufficiently overlapping for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and the Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants showed remarkably higher richness estimations. Geographical representation of atlas data can be much more heterogeneous than often assumed. Level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges. Species distribution
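
    The cell-wise agreement statistic used above reduces, per species, to a Jaccard index over two presence/absence grids; a minimal sketch on synthetic grids (not the AFE or North European atlas data) follows.

    ```python
    import numpy as np

    def jaccard(a: np.ndarray, b: np.ndarray) -> float:
        """Jaccard similarity of two boolean presence/absence grids."""
        a, b = a.astype(bool), b.astype(bool)
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else np.nan

    # Two synthetic 50-km occurrence grids for one species.
    rng = np.random.default_rng(1)
    atlas_a = rng.random((40, 60)) > 0.7
    atlas_b = rng.random((40, 60)) > 0.7
    print(f"Jaccard agreement: {jaccard(atlas_a, atlas_b):.2f}")
    ```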

  7. Preconditioned dynamic mode decomposition and mode selection algorithms for large datasets using incremental proper orthogonal decomposition

    Science.gov (United States)

    Ohmichi, Yuya

    2017-07-01

    In this letter, we propose a simple and efficient framework of dynamic mode decomposition (DMD) and mode selection for large datasets. The proposed framework explicitly introduces a preconditioning step using an incremental proper orthogonal decomposition (POD) to DMD and mode selection algorithms. By performing the preconditioning step, the DMD and mode selection can be performed with low memory consumption and therefore can be applied to large datasets. Additionally, we propose a simple mode selection algorithm based on a greedy method. The proposed framework is applied to the analysis of three-dimensional flow around a circular cylinder.
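
    A compact sketch of the projection idea (using an ordinary batch SVD in place of the incremental POD, and a synthetic snapshot matrix rather than cylinder-flow data): DMD is carried out in the truncated POD subspace, which is what keeps memory consumption low for large datasets.

    ```python
    import numpy as np

    def pod_preconditioned_dmd(X, rank):
        """DMD computed in a truncated POD (SVD) subspace of the snapshot matrix X
        (columns are snapshots). Returns DMD eigenvalues and modes."""
        X1, X2 = X[:, :-1], X[:, 1:]
        U, s, Vt = np.linalg.svd(X1, full_matrices=False)   # batch POD for brevity
        Ur, Sr_inv, Vr = U[:, :rank], np.diag(1.0 / s[:rank]), Vt[:rank].T
        A_tilde = Ur.T @ X2 @ Vr @ Sr_inv                    # reduced linear operator
        eigvals, W = np.linalg.eig(A_tilde)
        modes = X2 @ Vr @ Sr_inv @ W                         # exact DMD modes
        return eigvals, modes

    # Synthetic snapshots: two damped oscillating spatial patterns plus noise.
    t = np.linspace(0, 4 * np.pi, 200)
    x = np.linspace(0, 1, 500)[:, None]
    X = (np.sin(2 * np.pi * x) * np.cos(3 * t) * np.exp(-0.05 * t)
         + np.cos(5 * np.pi * x) * np.sin(7 * t)) + 0.01 * np.random.randn(500, 200)
    eigvals, modes = pod_preconditioned_dmd(X, rank=10)
    print(np.abs(eigvals))
    ```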

  8. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    Science.gov (United States)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national scale flood risk analyses, using high resolution Facebook Connectivity Lab population data and data from a hyper resolution flood hazard model. In recent years the field of large scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that for robust flood risk analysis to be undertaken both hazard and exposure data should sufficiently resolve local scale features. Global flood frameworks are enabling flood hazard data to be produced at 90 m resolution, resulting in a mismatch with available population datasets, which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5 m resolution, representing a resolution increase over previous countrywide datasets of multiple orders of magnitude. Flood risk analyses undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.
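
    The integration of hazard and exposure described here amounts, once the two grids are co-registered, to summing population over cells whose modelled depth exceeds a threshold; a hedged sketch with synthetic grids (resampling between the 90 m hazard and 5 m population resolutions is deliberately skipped) follows.

    ```python
    import numpy as np

    def exposed_population(depth, population, threshold=0.1):
        """Sum population in cells where modelled water depth exceeds a threshold (m).
        Assumes both grids are co-registered and share the same resolution."""
        return population[depth > threshold].sum()

    rng = np.random.default_rng(2)
    depth = np.maximum(rng.normal(0.0, 0.5, (1000, 1000)), 0.0)  # synthetic hazard grid (m)
    population = rng.poisson(0.3, (1000, 1000)).astype(float)    # synthetic population grid
    print(f"People exposed above 0.1 m: {exposed_population(depth, population):,.0f}")
    ```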

  9. A method for generating large datasets of organ geometries for radiotherapy treatment planning studies

    International Nuclear Information System (INIS)

    Hu, Nan; Cerviño, Laura; Segars, Paul; Lewis, John; Shan, Jinlu; Jiang, Steve; Zheng, Xiaolin; Wang, Ge

    2014-01-01

    With the rapidly increasing application of adaptive radiotherapy, large datasets of organ geometries based on the patient’s anatomy are desired to support clinical application or research work, such as image segmentation, re-planning, and organ deformation analysis. Sometimes only limited datasets are available in clinical practice. In this study, we propose a new method to generate large datasets of organ geometries to be utilized in adaptive radiotherapy. Given a training dataset of organ shapes derived from daily cone-beam CT, we align them into a common coordinate frame and select one of the training surfaces as reference surface. A statistical shape model of organs was constructed, based on the establishment of point correspondence between surfaces and non-uniform rational B-spline (NURBS) representation. A principal component analysis is performed on the sampled surface points to capture the major variation modes of each organ. A set of principal components and their respective coefficients, which represent organ surface deformation, were obtained, and a statistical analysis of the coefficients was performed. New sets of statistically equivalent coefficients can be constructed and assigned to the principal components, resulting in a larger geometry dataset for the patient’s organs. These generated organ geometries are realistic and statistically representative
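
    In outline, the generation step is PCA on aligned, flattened surface coordinates followed by drawing statistically equivalent coefficients; a toy sketch with random matrices standing in for the NURBS-parameterized organ surfaces follows.

    ```python
    import numpy as np

    def fit_shape_model(shapes):
        """PCA-based statistical shape model. `shapes` has one flattened surface
        (x1, y1, z1, x2, ...) per row, already aligned to a common frame."""
        mean = shapes.mean(axis=0)
        U, s, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
        coeffs = U * s                       # per-training-shape PC coefficients
        return mean, Vt, coeffs

    def sample_new_shapes(mean, Vt, coeffs, n_new, rng):
        """Draw statistically equivalent coefficients (Gaussian with the training
        standard deviations) and reconstruct new, synthetic organ geometries."""
        std = coeffs.std(axis=0)
        new_coeffs = rng.normal(0.0, std, size=(n_new, std.size))
        return mean + new_coeffs @ Vt

    rng = np.random.default_rng(3)
    training = rng.normal(size=(20, 3 * 500))   # 20 training shapes, 500 surface points each
    mean, Vt, coeffs = fit_shape_model(training)
    new_shapes = sample_new_shapes(mean, Vt, coeffs, n_new=100, rng=rng)
    print(new_shapes.shape)                      # (100, 1500)
    ```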

  10. Extraction of drainage networks from large terrain datasets using high throughput computing

    Science.gov (United States)

    Gong, Jianya; Xie, Jibo

    2009-02-01

    Advanced digital photogrammetry and remote sensing technology produces large terrain datasets (LTD). How to process and use these LTD has become a big challenge for GIS users. Extracting drainage networks, which are basic for hydrological applications, from LTD is one of the typical applications of digital terrain analysis (DTA) in geographical information applications. Existing serial drainage algorithms cannot deal with large data volumes in a timely fashion, and few GIS platforms can process LTD beyond the GB size. High throughput computing (HTC), a distributed parallel computing mode, is proposed to improve the efficiency of drainage networks extraction from LTD. Drainage network extraction using HTC involves two key issues: (1) how to decompose the large DEM datasets into independent computing units and (2) how to merge the separate outputs into a final result. A new decomposition method is presented in which the large datasets are partitioned into independent computing units using natural watershed boundaries instead of using regular 1-dimensional (strip-wise) and 2-dimensional (block-wise) decomposition. Because the distribution of drainage networks is strongly related to watershed boundaries, the new decomposition method is more effective and natural. The method to extract natural watershed boundaries was improved by using multi-scale DEMs instead of single-scale DEMs. A HTC environment is employed to test the proposed methods with real datasets.
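
    The decomposition-and-merge pattern can be sketched as below (local processes stand in for an HTC pool, and the decomposition and extraction functions are placeholders): each natural watershed becomes an independent computing unit, and the per-unit drainage networks are merged afterwards.

    ```python
    from multiprocessing import Pool

    def extract_drainage(unit):
        """Placeholder for per-watershed drainage extraction; `unit` holds a DEM tile
        clipped to one natural watershed plus its identifier."""
        dem_tile, watershed_id = unit
        # ... flow direction, flow accumulation and channel thresholding would go here ...
        return {"watershed": watershed_id, "network": []}

    def extract_over_ltd(units, n_workers=8):
        """Process independent computing units (natural watersheds) in parallel and
        merge their drainage networks into one result."""
        with Pool(n_workers) as pool:
            partial_results = pool.map(extract_drainage, units)
        return [edge for r in partial_results for edge in r["network"]]

    # units = decompose_by_watershed(dem, multiscale=True)   # hypothetical decomposition step
    # network = extract_over_ltd(units)
    ```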

  11. The role of metadata in managing large environmental science datasets. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Melton, R.B.; DeVaney, D.M. [eds.] [Pacific Northwest Lab., Richland, WA (United States)]; French, J.C. [Univ. of Virginia (United States)]

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  12. Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments.

    Science.gov (United States)

    Keuleers, Emmanuel; Balota, David A

    2015-01-01

    This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, as is the case across the papers, we highlight some methodological issues that are brought forth via the analyses of such datasets.

  13. REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations

    Science.gov (United States)

    Moulik, P.; Lekic, V.; Romanowicz, B. A.

    2017-12-01

    A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history

  14. Immersive Interaction, Manipulation and Analysis of Large 3D Datasets for Planetary and Earth Sciences

    Science.gov (United States)

    Pariser, O.; Calef, F.; Manning, E. M.; Ardulov, V.

    2017-12-01

    We will present implementation and study of several use-cases of utilizing Virtual Reality (VR) for immersive display, interaction and analysis of large and complex 3D datasets. These datasets have been acquired by the instruments across several Earth, Planetary and Solar Space Robotics Missions. First, we will describe the architecture of the common application framework that was developed to input data, interface with VR display devices and program input controllers in various computing environments. Tethered and portable VR technologies will be contrasted and advantages of each highlighted. We'll proceed to presenting experimental immersive analytics visual constructs that enable augmentation of 3D datasets with 2D ones such as images and statistical and abstract data. We will conclude by presenting comparative analysis with traditional visualization applications and share the feedback provided by our users: scientists and engineers.

  15. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    Science.gov (United States)

    Jensen, Tue V.; Pinson, Pierre

    2017-11-01

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  16. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system.

    Science.gov (United States)

    Jensen, Tue V; Pinson, Pierre

    2017-11-28

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  17. Full-Scale Approximations of Spatio-Temporal Covariance Models for Large Datasets

    KAUST Repository

    Zhang, Bohai

    2014-01-01

    Various continuously-indexed spatio-temporal process models have been constructed to characterize spatio-temporal dependence structures, but the computational complexity for model fitting and predictions grows in a cubic order with the size of dataset and application of such models is not feasible for large datasets. This article extends the full-scale approximation (FSA) approach by Sang and Huang (2012) to the spatio-temporal context to reduce computational complexity. A reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is proposed to select knots automatically from a discrete set of spatio-temporal points. Our approach is applicable to nonseparable and nonstationary spatio-temporal covariance models. We illustrate the effectiveness of our method through simulation experiments and application to an ozone measurement dataset.
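
    The structure of the approximation can be written down in a few lines: a reduced-rank (predictive-process) component built on a set of knots plus a tapered, and hence sparse, correction of the residual covariance. The sketch below uses a one-dimensional exponential covariance and a spherical taper with arbitrary settings, purely to illustrate the FSA construction rather than the paper's spatio-temporal models or RJMCMC knot selection.

    ```python
    import numpy as np

    def exp_cov(x, y, range_=0.3):
        d = np.abs(x[:, None] - y[None, :])
        return np.exp(-d / range_)

    def spherical_taper(x, y, gamma=0.2):
        d = np.abs(x[:, None] - y[None, :])
        return (1 - 1.5 * d / gamma + 0.5 * (d / gamma) ** 3) * (d < gamma)

    def fsa_covariance(x, knots):
        """Full-scale approximation: reduced-rank predictive-process part on the knots
        plus a tapered (compactly supported) correction of the residual covariance."""
        C = exp_cov(x, x)
        C_sk = exp_cov(x, knots)
        C_kk = exp_cov(knots, knots)
        C_lowrank = C_sk @ np.linalg.solve(C_kk, C_sk.T)
        return C_lowrank + spherical_taper(x, x) * (C - C_lowrank)

    x = np.linspace(0, 1, 500)
    knots = np.linspace(0, 1, 30)
    C_fsa = fsa_covariance(x, knots)
    print(np.abs(C_fsa - exp_cov(x, x)).max())   # approximation error vs exact covariance
    ```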

  18. Palmprint and Palmvein Recognition Based on DCNN and A New Large-Scale Contactless Palmvein Dataset

    Directory of Open Access Journals (Sweden)

    Lin Zhang

    2018-03-01

    Among the members of biometric identifiers, the palmprint and the palmvein have received significant attention due to their stability, uniqueness, and non-intrusiveness. In this paper, we investigate the problem of palmprint/palmvein recognition and propose a Deep Convolutional Neural Network (DCNN) based scheme, namely PalmRCNN (short for palmprint/palmvein recognition using CNNs). The effectiveness and efficiency of PalmRCNN have been verified through extensive experiments conducted on benchmark datasets. In addition, though substantial effort has been devoted to palmvein recognition, it is still quite difficult for researchers to know the potential discriminating capability of the contactless palmvein. One of the root reasons is that a large-scale and publicly available dataset comprising high-quality, contactless palmvein images is still lacking. To this end, a user-friendly acquisition device for collecting high-quality contactless palmvein images is first designed and developed in this work. Then, a large-scale palmvein image dataset is established, comprising 12,000 images acquired from 600 different palms in two separate collection sessions. The collected dataset is now publicly available.

  19. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Torres, E., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [CEADEN, Havana 11300, Cuba and INFN, Sezione di Torino, Torino 10125 (Italy); Fiorina, E.; Pennazio, F.; Peroni, C. [Department of Physics, University of Torino, Torino 10125, Italy and INFN, Sezione di Torino, Torino 10125 (Italy); Saletta, M.; Cerello, P., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [INFN, Sezione di Torino, Torino 10125 (Italy); Camarlinghi, N.; Fantacci, M. E. [Department of Physics, University of Pisa, Pisa 56127, Italy and INFN, Sezione di Pisa, Pisa 56127 (Italy)

    2015-04-15

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of lacking generalization, which could be possible given the large difference between the size of the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in the literature. Results: The lungCAM and M5L performance is consistent across the databases, with a sensitivity of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacities (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGOs detection could further improve it, as well as an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  20. A large-scale dataset of solar event reports from automated feature recognition modules

    Science.gov (United States)

    Schuh, Michael A.; Angryk, Rafal A.; Martens, Petrus C.

    2016-05-01

    The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validating of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through these important prerequisite analyses presented here, the results of KDD from Solar Big Data will be overall more reliable and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.

  1. A Hybrid Neuro-Fuzzy Model For Integrating Large Earth-Science Datasets

    Science.gov (United States)

    Porwal, A.; Carranza, J.; Hale, M.

    2004-12-01

    A GIS-based hybrid neuro-fuzzy approach to integration of large earth-science datasets for mineral prospectivity mapping is described. It implements a Takagi-Sugeno type fuzzy inference system in the framework of a four-layered feed-forward adaptive neural network. Each unique combination of the datasets is considered a feature vector whose components are derived by knowledge-based ordinal encoding of the constituent datasets. A subset of feature vectors with a known output target vector (i.e., unique conditions known to be associated with either a mineralized or a barren location) is used for the training of an adaptive neuro-fuzzy inference system. Training involves iterative adjustment of parameters of the adaptive neuro-fuzzy inference system using a hybrid learning procedure for mapping each training vector to its output target vector with minimum sum of squared error. The trained adaptive neuro-fuzzy inference system is used to process all feature vectors. The output for each feature vector is a value that indicates the extent to which a feature vector belongs to the mineralized class or the barren class. These values are used to generate a prospectivity map. The procedure is demonstrated by an application to regional-scale base metal prospectivity mapping in a study area located in the Aravalli metallogenic province (western India). A comparison of the hybrid neuro-fuzzy approach with pure knowledge-driven fuzzy and pure data-driven neural network approaches indicates that the former offers a superior method for integrating large earth-science datasets for predictive spatial mathematical modelling.
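
    For readers unfamiliar with the machinery, the sketch below shows bare-bones first-order Takagi-Sugeno inference (Gaussian memberships, product firing strengths, weighted linear consequents) on two hypothetical encoded evidence layers; it is only an illustration of the inference step, not the trained four-layer adaptive neuro-fuzzy system used in the study.

    ```python
    import numpy as np

    def gauss_mf(x, c, s):
        """Gaussian membership function with centre c and width s."""
        return np.exp(-0.5 * ((x - c) / s) ** 2)

    def takagi_sugeno(x, rules):
        """First-order Takagi-Sugeno inference: Gaussian memberships per input, product
        firing strengths, and a firing-strength-weighted average of linear consequents."""
        w, y = [], []
        for rule in rules:
            strengths = [gauss_mf(xi, c, s) for xi, (c, s) in zip(x, rule["mf"])]
            w.append(np.prod(strengths))
            y.append(rule["p"] @ np.append(x, 1.0))   # linear consequent p . [x, 1]
        w = np.array(w)
        return float(np.dot(w, y) / w.sum())

    # Two hypothetical rules over two ordinally encoded evidence layers.
    rules = [
        {"mf": [(0.2, 0.3), (0.5, 0.2)], "p": np.array([0.8, 0.1, 0.0])},
        {"mf": [(0.8, 0.3), (0.9, 0.2)], "p": np.array([0.2, 0.9, 0.1])},
    ]
    print(takagi_sugeno(np.array([0.7, 0.6]), rules))
    ```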

  2. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

    Science.gov (United States)

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.

  3. Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking

    KAUST Repository

    Huang, Huang

    2017-07-16

    This thesis focuses on two topics, computational methods for large spatial datasets and functional data ranking. Both are tackling the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden in fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when the exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements with the hierarchical low-rank approximation and apply the proposed fast kriging to fill gaps for satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Compared to magnitude outliers, it is more challenging to detect shape outliers as they are often masked among samples. We develop a new notion of functional data depth by taking the integration of a univariate depth function. Having a form of the integrated depth, it shares many desirable features. Furthermore, the novel formation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show the proposed outlier detection procedure outperforms competitors in various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames. Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as
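
    A simplified version of the integrated-depth construction (a generic univariate rank depth averaged over the evaluation grid, not the exact formulation of the thesis) can be sketched as follows.

    ```python
    import numpy as np

    def integrated_depth(curves):
        """Integrated functional depth: at each grid point compute a univariate rank
        depth min(F_n(x), 1 - F_n(x) + 1/n) surrogate for every curve value, then
        average over the grid. Rows of `curves` are functions on a common grid."""
        n, m = curves.shape
        depths = np.empty_like(curves, dtype=float)
        for j in range(m):
            col = curves[:, j]
            frac_le = (col[None, :] <= col[:, None]).sum(axis=1) / n   # share of values <= x_i
            frac_ge = (col[None, :] >= col[:, None]).sum(axis=1) / n   # share of values >= x_i
            depths[:, j] = np.minimum(frac_le, frac_ge)
        return depths.mean(axis=1)

    rng = np.random.default_rng(7)
    t = np.linspace(0, 1, 100)
    sample = np.sin(2 * np.pi * t) + 0.2 * rng.normal(size=(50, t.size))
    sample[0] += 2.0                                    # inject a magnitude outlier
    print(np.argsort(integrated_depth(sample))[:3])     # least deep (most outlying) curves first
    ```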

  4. Genetic architecture of vitamin B12 and folate levels uncovered applying deeply sequenced large datasets

    DEFF Research Database (Denmark)

    Grarup, Niels; Sulem, Patrick; Sandholt, Camilla H

    2013-01-01

    of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined...... in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations....

  5. A Multi-Resolution Spatial Model for Large Datasets Based on the Skew-t Distribution

    KAUST Repository

    Tagle, Felipe

    2017-12-06

    Large, non-Gaussian spatial datasets pose a considerable modeling challenge as the dependence structure implied by the model needs to be captured at different scales, while retaining feasible inference. Skew-normal and skew-t distributions have only recently begun to appear in the spatial statistics literature, without much consideration, however, for the ability to capture dependence at multiple resolutions, and simultaneously achieve feasible inference for increasingly large data sets. This article presents the first multi-resolution spatial model inspired by the skew-t distribution, where a large-scale effect follows a multivariate normal distribution and the fine-scale effects follow multivariate skew-normal distributions. The resulting marginal distribution for each region is skew-t, thereby allowing for greater flexibility in capturing skewness and heavy tails characterizing many environmental datasets. Likelihood-based inference is performed using a Monte Carlo EM algorithm. The model is applied as a stochastic generator of daily wind speeds over Saudi Arabia.

  6. Computing and data handling requirements for SSC [Superconducting Super Collider] and LHC [Large Hadron Collider] experiments

    International Nuclear Information System (INIS)

    Lankford, A.J.

    1990-05-01

    A number of issues for computing and data handling in the online environment at future high-luminosity, high-energy colliders, such as the Superconducting Super Collider (SSC) and Large Hadron Collider (LHC), are outlined. Requirements for trigger processing, data acquisition, and online processing are discussed. Some aspects of possible solutions are sketched. 6 refs., 3 figs

  7. Extended data analysis strategies for high resolution imaging MS : new methods to deal with extremely large image hyperspectral datasets

    NARCIS (Netherlands)

    Klerk, L.A.; Broersen, A.; Fletcher, I.W.; Liere, van R.; Heeren, R.M.A.

    2007-01-01

    The large size of the hyperspectral datasets that are produced with modern mass spectrometric imaging techniques makes it difficult to analyze the results. Unsupervised statistical techniques are needed to extract relevant information from these datasets and reduce the data into a surveyable

  8. Spectral methods in machine learning and new strategies for very large datasets

    Science.gov (United States)

    Belabbas, Mohamed-Ali; Wolfe, Patrick J.

    2009-01-01

    Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decomposition is too costly, because its complexity scales as the cube of either the number of training examples or their dimensionality. Motivated by such applications, we present here 2 new algorithms for the approximation of positive-semidefinite kernels, together with error bounds that improve on results in the literature. We approach this problem by seeking to determine, in an efficient manner, the most informative subset of our data relative to the kernel approximation task at hand. This leads to two new strategies based on the Nyström method that are directly applicable to massive datasets. The first of these—based on sampling—leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach—based on sorting—provides for the selection of a partition in a deterministic way. We detail their numerical implementation and provide simulation results for a variety of representative problems in statistical data analysis, each of which demonstrates the improved performance of our approach relative to existing methods. PMID:19129490
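
    The common core of both strategies is the Nyström construction itself: approximate the kernel matrix from a subset of its columns as K ≈ C W⁺ Cᵀ. The sketch below uses plain uniform sampling of landmarks on a Gaussian kernel; the paper's contribution lies in how that subset is chosen (by sampling from an induced distribution or by sorting), which is not reproduced here.

    ```python
    import numpy as np

    def rbf_kernel(X, Y, gamma=0.5):
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def nystrom(X, m, gamma=0.5, seed=0):
        """Nystrom approximation K ~= C @ pinv(W) @ C.T from m uniformly sampled landmarks."""
        idx = np.random.default_rng(seed).choice(len(X), size=m, replace=False)
        C = rbf_kernel(X, X[idx], gamma)   # n x m block of kernel columns
        W = C[idx]                         # m x m landmark submatrix
        return C @ np.linalg.pinv(W) @ C.T

    X = np.random.default_rng(1).normal(size=(1000, 10))
    K = rbf_kernel(X, X)                   # exact kernel, computed only to check the error
    K_hat = nystrom(X, m=100)
    print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))
    ```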

  9. VisIVO: A Library and Integrated Tools for Large Astrophysical Dataset Exploration

    Science.gov (United States)

    Becciani, U.; Costa, A.; Ersotelos, N.; Krokos, M.; Massimino, P.; Petta, C.; Vitello, F.

    2012-09-01

    VisIVO provides an integrated suite of tools and services that can be used in many scientific fields. VisIVO development starts in the Virtual Observatory framework. VisIVO allows users to visualize meaningfully highly-complex, large-scale datasets and create movies of these visualizations based on distributed infrastructures. VisIVO supports high-performance, multi-dimensional visualization of large-scale astrophysical datasets. Users can rapidly obtain meaningful visualizations while preserving full and intuitive control of the relevant parameters. VisIVO consists of VisIVO Desktop - a stand-alone application for interactive visualization on standard PCs, VisIVO Server - a platform for high performance visualization, VisIVO Web - a custom designed web portal, VisIVOSmartphone - an application to exploit the VisIVO Server functionality and the latest VisIVO features: VisIVO Library allows a job running on a computational system (grid, HPC, etc.) to produce movies directly with the code internal data arrays without the need to produce intermediate files. This is particularly important when running on large computational facilities, where the user wants to have a look at the results during the data production phase. For example, in grid computing facilities, images can be produced directly in the grid catalogue while the user code is running in a system that cannot be directly accessed by the user (a worker node). The deployment of VisIVO on the DG and gLite is carried out with the support of EDGI and EGI-Inspire projects. Depending on the structure and size of datasets under consideration, the data exploration process could take several hours of CPU for creating customized views and the production of movies could potentially last several days. For this reason an MPI parallel version of VisIVO could play a fundamental role in increasing performance, e.g. it could be automatically deployed on nodes that are MPI aware. A central concept in our development is thus to

  10. An Investigation of Large Tilt-Rotor Hover and Low Speed Handling Qualities

    Science.gov (United States)

    Malpica, Carlos A.; Decker, William A.; Theodore, Colin R.; Lindsey, James E.; Lawrence, Ben; Blanken, Chris L.

    2011-01-01

    A piloted simulation experiment conducted on the NASA-Ames Vertical Motion Simulator evaluated the hover and low speed handling qualities of a large tilt-rotor concept, with particular emphasis on longitudinal and lateral position control. Ten experimental test pilots evaluated different combinations of Attitude Command-Attitude Hold (ACAH) and Translational Rate Command (TRC) response types, nacelle conversion actuator authority limits and inceptor choices. Pilots performed evaluations in revised versions of the ADS-33 Hover, Lateral Reposition and Depart/Abort MTEs and moderate turbulence conditions. Level 2 handling qualities ratings were primarily recorded using ACAH response type in all three of the evaluation maneuvers. The baseline TRC conferred Level 1 handling qualities in the Hover MTE, but there was a tendency to enter into a PIO associated with nacelle actuator rate limiting when employing large, aggressive control inputs. Interestingly, increasing rate limits also led to a reduction in the handling qualities ratings. This led to the identification of a nacelle rate to rotor longitudinal flapping coupling effect that induced undesired, pitching motions proportional to the allowable amount of nacelle rate. A modification that counteracted this effect significantly improved the handling qualities. Evaluation of the different response type variants showed that inclusion of TRC response could provide Level 1 handling qualities in the Lateral Reposition maneuver by reducing coupled pitch and heave off axis responses that otherwise manifest with ACAH. Finally, evaluations in the Depart/Abort maneuver showed that uncertainty about commanded nacelle position and ensuing aircraft response, when manually controlling the nacelle, demanded high levels of attention from the pilot. Additional requirements to maintain pitch attitude within 5 deg compounded the necessary workload.

  11. Statistical Analysis of Large Simulated Yield Datasets for Studying Climate Effects

    Science.gov (United States)

    Makowski, David; Asseng, Senthold; Ewert, Frank; Bassu, Simona; Durand, Jean-Louis; Martre, Pierre; Adam, Myriam; Aggarwal, Pramod K.; Angulo, Carlos; Baron, Christian; et al.

    2015-01-01

    Many studies have been carried out during the last decade to study the effect of climate change on crop yields and other key crop characteristics. In these studies, one or several crop models were used to simulate crop growth and development for different climate scenarios that correspond to different projections of atmospheric CO2 concentration, temperature, and rainfall changes (Semenov et al., 1996; Tubiello and Ewert, 2002; White et al., 2011). The Agricultural Model Intercomparison and Improvement Project (AgMIP; Rosenzweig et al., 2013) builds on these studies with the goal of using an ensemble of multiple crop models in order to assess effects of climate change scenarios for several crops in contrasting environments. These studies generate large datasets, including thousands of simulated crop yield data. They include series of yield values obtained by combining several crop models with different climate scenarios that are defined by several climatic variables (temperature, CO2, rainfall, etc.). Such datasets potentially provide useful information on the possible effects of different climate change scenarios on crop yields. However, it is sometimes difficult to analyze these datasets and to summarize them in a useful way due to their structural complexity; simulated yield data can differ among contrasting climate scenarios, sites, and crop models. Another issue is that it is not straightforward to extrapolate the results obtained for the scenarios to alternative climate change scenarios not initially included in the simulation protocols. Additional dynamic crop model simulations for new climate change scenarios are an option but this approach is costly, especially when a large number of crop models are used to generate the simulated data, as in AgMIP. Statistical models have been used to analyze responses of measured yield data to climate variables in past studies (Lobell et al., 2011), but the use of a statistical model to analyze yields simulated by complex

  12. Image-based Exploration of Iso-surfaces for Large Multi-Variable Datasets using Parameter Space.

    KAUST Repository

    Binyahib, Roba S.

    2013-05-13

    With an increase in processing power, more complex simulations have resulted in larger data size, with higher resolution and more variables. Many techniques have been developed to help the user to visualize and analyze data from such simulations. However, dealing with a large amount of multivariate data is challenging, time-consuming and often requires high-end clusters. Consequently, novel visualization techniques are needed to explore such data. Many users would like to visually explore their data and change certain visual aspects without the need to use special clusters or having to load a large amount of data. This is the idea behind explorable images (EI). Explorable images are a novel approach that provides limited interactive visualization without the need to re-render from the original data [40]. In this work, the concept of EI has been used to create a workflow that deals with explorable iso-surfaces for scalar fields in a multivariate, time-varying dataset. As a pre-processing step, a set of iso-values for each scalar field is inferred and extracted from a user-assisted sampling technique in time-parameter space. These iso-values are then used to generate iso-surfaces that are then pre-rendered (from a fixed viewpoint) along with additional buffers (i.e. normals, depth, values of other fields, etc.) to provide a compressed representation of iso-surfaces in the dataset. We present a tool that at run-time allows the user to interactively browse and calculate a combination of iso-surfaces superimposed on each other. The result is the same as calculating multiple iso-surfaces from the original data but without the memory and processing overhead. Our tool also allows the user to change the (scalar) values superimposed on each of the surfaces, modify their color map, and interactively re-light the surfaces. We demonstrate the effectiveness of our approach over a multi-terabyte combustion dataset. We also illustrate the efficiency and accuracy of our

  13. Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer

    Directory of Open Access Journals (Sweden)

    Peterlongo Pierre

    2012-03-01

    Background: The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing whole genomes/transcriptomes (de novo assemblers) are typically employed to process such data. However, these methods require large memory resources and computation time. Many basic biological questions could be answered targeting specific information in the reads, thus avoiding complete assembly. Results: We present Mapsembler, an iterative micro and targeted assembler which processes large datasets of reads on commodity hardware. Mapsembler checks for the presence of given regions of interest that can be constructed from reads and builds a short assembly around it, either as a plain sequence or as a graph, showing contextual structure. We introduce new algorithms to retrieve approximate occurrences of a sequence from reads and construct an extension graph. Among other results presented in this paper, Mapsembler enabled the retrieval of previously described human breast cancer candidate fusion genes, and the detection of new ones not previously known. Conclusions: Mapsembler is the first software that enables de novo discovery around a region of interest of repeats, SNPs, exon skipping, gene fusion, as well as other structural events, directly from raw sequencing reads. As indexing is localized, the memory footprint of Mapsembler is negligible. Mapsembler is released under the CeCILL license and can be freely downloaded from http://alcovna.genouest.org/mapsembler/.

  14. Knowledge discovery in large model datasets in the marine environment: the THREDDS Data Server example

    Directory of Open Access Journals (Sweden)

    A. Bergamasco

    2012-06-01

    In order to monitor, describe and understand the marine environment, many research institutions are involved in the acquisition and distribution of ocean data, both from observations and models. Scientists from these institutions are spending too much time looking for, accessing, and reformatting data: they need better tools and procedures to make the science they do more efficient. The U.S. Integrated Ocean Observing System (US-IOOS) is working on making large amounts of distributed data usable in an easy and efficient way. It is essentially a network of scientists, technicians and technologies designed to acquire, collect and disseminate observational and modelled data resulting from coastal and oceanic marine regions investigations to researchers, stakeholders and policy makers. In order to be successful, this effort requires standard data protocols, web services and standards-based tools. Starting from the US-IOOS approach, which is being adopted throughout much of the oceanographic and meteorological sectors, we describe here the CNR-ISMAR Venice experience in the direction of setting up a national Italian IOOS framework using the THREDDS (THematic Real-time Environmental Distributed Data Services) Data Server (TDS), a middleware designed to fill the gap between data providers and data users. The TDS provides services that allow data users to find the data sets pertaining to their scientific needs, to access, to visualize and to use them in an easy way, without downloading files to the local workspace. In order to achieve this, it is necessary that the data providers make their data available in a standard form that the TDS understands, and with sufficient metadata to allow the data to be read and searched in a standard way. The core idea is then to utilize a Common Data Model (CDM), a unified conceptual model that describes different datatypes within each dataset. More specifically, Unidata (www.unidata.ucar.edu) has developed CDM
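
    From the data user's side, the access pattern that the TDS enables looks roughly like the sketch below: the dataset is opened lazily over OPeNDAP and only the requested subset travels over the network. The catalog URL and the variable and coordinate names are placeholders, not an actual CNR-ISMAR endpoint.

    ```python
    import xarray as xr

    # Hypothetical OPeNDAP endpoint published by a THREDDS Data Server.
    URL = "https://example.org/thredds/dodsC/ocean/adriatic_model.nc"

    # Opening is lazy: only metadata is read at this point, no file is downloaded.
    ds = xr.open_dataset(URL)

    # Subsetting pulls just the needed slice over the network, not the whole dataset.
    sst_slice = ds["sea_surface_temperature"].sel(
        time="2012-06-01", lat=slice(44.0, 46.0), lon=slice(12.0, 14.0)
    )
    print(sst_slice.mean().values)
    ```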

  15. FUn: a framework for interactive visualizations of large, high-dimensional datasets on the web.

    Science.gov (United States)

    Probst, Daniel; Reymond, Jean-Louis

    2018-04-15

    During the past decade, big data have become a major tool in scientific endeavors. Although statistical methods and algorithms are well-suited for analyzing and summarizing enormous amounts of data, the results do not allow for a visual inspection of the entire data. Current scientific software, including R packages and Python libraries such as ggplot2, matplotlib and plot.ly, does not support interactive visualizations of datasets exceeding 100 000 data points on the web. Other solutions enable the web-based visualization of big data only through data reduction or statistical representations. However, recent hardware developments, especially advancements in graphical processing units, allow for the rendering of millions of data points on a wide range of consumer hardware such as laptops, tablets and mobile phones. Similar to the challenges and opportunities brought to virtually every scientific field by big data, the visualization of and interaction with copious amounts of data are both demanding and hold great promise. Here we present FUn, a framework consisting of a client (Faerun) and server (Underdark) module, facilitating the creation of web-based, interactive 3D visualizations of large datasets, enabling record level visual inspection. We also introduce a reference implementation providing access to SureChEMBL, a database containing patent information on more than 17 million chemical compounds. The source code and the most recent builds of Faerun and Underdark, Lore.js and the data preprocessing toolchain used in the reference implementation, are available on the project website (http://doc.gdb.tools/fun/). daniel.probst@dcb.unibe.ch or jean-louis.reymond@dcb.unibe.ch.

  16. Prediction of Canopy Heights over a Large Region Using Heterogeneous Lidar Datasets: Efficacy and Challenges

    Directory of Open Access Journals (Sweden)

    Ranjith Gopalakrishnan

    2015-08-01

    Generating accurate and unbiased wall-to-wall canopy height maps from airborne lidar data for large regions is useful to forest scientists and natural resource managers. However, mapping large areas often involves using lidar data from different projects, with varying acquisition parameters. In this work, we address the important question of whether one can accurately model canopy heights over large areas of the Southeastern US using a very heterogeneous dataset of small-footprint, discrete-return airborne lidar data (with 76 separate lidar projects). A unique aspect of this effort is the use of nationally uniform and extensive field data (~1800 forested plots) from the Forest Inventory and Analysis (FIA) program of the US Forest Service. Preliminary results are quite promising: over all lidar projects, we observe a good correlation between the 85th percentile of lidar heights and field-measured height (r = 0.85). We construct a linear regression model to predict subplot-level dominant tree heights from distributional lidar metrics (R2 = 0.74, RMSE = 3.0 m, n = 1755). We also identify and quantify the importance of several factors (like heterogeneity of vegetation, point density, the predominance of hardwoods or softwoods, the average height of the forest stand, slope of the plot, and average scan angle of lidar acquisition) that influence the efficacy of predicting canopy heights from lidar data. For example, a subset of plots (coefficient of variation of vegetation heights < 0.2) significantly reduces the RMSE of our model from 3.0 m to 2.4 m (~20% reduction). We conclude that when all these elements are factored into consideration, combining data from disparate lidar projects does not preclude robust estimation of canopy heights.
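
    The modelling step reported above is essentially a one-metric regression; the sketch below reproduces its shape with synthetic plot data standing in for the FIA field measurements and lidar point clouds.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)

    # Synthetic stand-ins: per-plot arrays of lidar return heights and a field-measured
    # dominant height for each plot (the real study uses ~1800 FIA subplots).
    n_plots = 200
    field_height = rng.uniform(5, 35, n_plots)
    lidar_returns = [rng.normal(h, 2.5, size=500).clip(0) for h in field_height]

    # Distributional lidar metric: 85th percentile of return heights per plot.
    p85 = np.array([np.percentile(r, 85) for r in lidar_returns])

    # Ordinary least-squares fit of field height on the lidar metric.
    slope, intercept = np.polyfit(p85, field_height, deg=1)
    pred = slope * p85 + intercept
    rmse = np.sqrt(np.mean((pred - field_height) ** 2))
    r = np.corrcoef(p85, field_height)[0, 1]
    print(f"r = {r:.2f}, RMSE = {rmse:.2f} m")
    ```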

  17. Measurement and genetics of human subcortical and hippocampal asymmetries in large datasets.

    Science.gov (United States)

    Guadalupe, Tulio; Zwiers, Marcel P; Teumer, Alexander; Wittfeld, Katharina; Vasquez, Alejandro Arias; Hoogman, Martine; Hagoort, Peter; Fernandez, Guillen; Buitelaar, Jan; Hegenscheid, Katrin; Völzke, Henry; Franke, Barbara; Fisher, Simon E; Grabe, Hans J; Francks, Clyde

    2014-07-01

    Functional and anatomical asymmetries are prevalent features of the human brain, linked to gender, handedness, and cognition. However, little is known about the neurodevelopmental processes involved. In zebrafish, asymmetries arise in the diencephalon before extending within the central nervous system. We aimed to identify genes involved in the development of subtle, left-right volumetric asymmetries of human subcortical structures using large datasets. We first tested the feasibility of measuring left-right volume differences in such large-scale samples, as assessed by two automated methods of subcortical segmentation (FSL|FIRST and FreeSurfer), using data from 235 subjects who had undergone MRI twice. We tested the agreement between the first and second scan, and the agreement between the segmentation methods, for measures of bilateral volumes of six subcortical structures and the hippocampus, and their volumetric asymmetries. We also tested whether there were biases introduced by left-right differences in the regional atlases used by the methods, by analyzing left-right flipped images. While many bilateral volumes were measured well (scan-rescan r = 0.6-0.8), most asymmetries, with the exception of the caudate nucleus, showed lower repeatabilities. We meta-analyzed genome-wide association scan results for caudate nucleus asymmetry in a combined sample of 3,028 adult subjects but did not detect associations at genome-wide significance (P left-right patterning of the viscera. Our results provide important information for researchers who are currently aiming to carry out large-scale genome-wide studies of subcortical and hippocampal volumes, and their asymmetries.

  18. Managing Large Multidimensional Array Hydrologic Datasets : A Case Study Comparing NetCDF and SciDB

    NARCIS (Netherlands)

    Liu, H.; van Oosterom, P.J.M.; Hu, C.; Wang, Wen

    2016-01-01

    Management of large hydrologic datasets including storage, structuring, indexing and query is one of the crucial challenges in the era of big data. This research originates from a specific data query problem: time series extraction at specific locations takes a long time when a large

  19. Safe Patient Handling and Mobility: Development and Implementation of a Large-Scale Education Program.

    Science.gov (United States)

    Lee, Corinne; Knight, Suzanne W; Smith, Sharon L; Nagle, Dorothy J; DeVries, Lori

    This article addresses the development, implementation, and evaluation of an education program for safe patient handling and mobility at a large academic medical center. The ultimate goal of the program was to increase safety during patient mobility/transfer and reduce nursing staff injury from lifting/pulling. This comprehensive program was designed on the basis of the principles of prework, application, and support at the point of care. A combination of online learning, demonstration, skill evaluation, and coaching at the point of care was used to achieve the goal. Specific roles and responsibilities were developed to facilitate implementation. It took 17 master trainers, 88 certified trainers, 176 unit-based trainers, and 98 coaches to put 3706 nurses and nursing assistants through the program. Evaluations indicated both an increase in knowledge about safe patient handling and an increased ability to safely mobilize patients. The challenge now is sustainability of safe patient-handling practices and the growth and development of trainers and coaches.

  20. CoVennTree: A new method for the comparative analysis of large datasets

    Directory of Open Access Journals (Sweden)

    Steffen C. Lott

    2015-02-01

    The visualization of massive datasets, such as those resulting from comparative metatranscriptome analyses or the analysis of microbial population structures using ribosomal RNA sequences, is a challenging task. We developed a new method called CoVennTree (Comparative weighted Venn Tree) that simultaneously compares up to three multifarious datasets by aggregating and propagating information from the bottom to the top level and produces a graphical output in Cytoscape. With the introduction of weighted Venn structures, the contents and relationships of various datasets can be correlated and simultaneously aggregated without losing information. We demonstrate the suitability of this approach using a dataset of 16S rDNA sequences obtained from microbial populations at three different depths of the Gulf of Aqaba in the Red Sea. CoVennTree has been integrated into the Galaxy ToolShed and can be directly downloaded and integrated into the user instance.
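
    At its lowest level, a weighted Venn structure for three datasets is the seven-way exclusive partition with counts attached; a minimal sketch on made-up sequence identifiers (not the Gulf of Aqaba data, and without the tree aggregation or Cytoscape output) follows.

    ```python
    def venn3_weights(a: set, b: set, c: set) -> dict:
        """Sizes of the seven exclusive regions of a three-set Venn diagram."""
        return {
            "A only": len(a - b - c),
            "B only": len(b - a - c),
            "C only": len(c - a - b),
            "A&B": len((a & b) - c),
            "A&C": len((a & c) - b),
            "B&C": len((b & c) - a),
            "A&B&C": len(a & b & c),
        }

    # Toy 16S sequence identifiers from three sampling depths.
    depth_1 = {f"otu{i}" for i in range(0, 60)}
    depth_2 = {f"otu{i}" for i in range(40, 110)}
    depth_3 = {f"otu{i}" for i in range(90, 150)}
    print(venn3_weights(depth_1, depth_2, depth_3))
    ```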

  1. Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking

    KAUST Repository

    Huang, Huang

    2017-01-01

    that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements

  2. TIMPs of parasitic helminths - a large-scale analysis of high-throughput sequence datasets.

    Science.gov (United States)

    Cantacessi, Cinzia; Hofmann, Andreas; Pickering, Darren; Navarro, Severine; Mitreva, Makedonka; Loukas, Alex

    2013-05-30

    Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal hosts. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of A. suum, N. americanus and Schistosoma haematobium which infect the mammalian hosts. A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate hosts, with a view towards the development of novel approaches for the control of neglected helminthiases.

  3. Using large hydrological datasets to create a robust, physically based, spatially distributed model for Great Britain

    Science.gov (United States)

    Lewis, Elizabeth; Kilsby, Chris; Fowler, Hayley

    2014-05-01

    The impact of climate change on hydrological systems requires further quantification in order to inform water management. This study intends to conduct such analysis using hydrological models. Such models are of varying forms, of which conceptual, lumped parameter models and physically-based models are two important types. The majority of hydrological studies use conceptual models calibrated against measured river flow time series in order to represent catchment behaviour. This method often shows impressive results for specific problems in gauged catchments. However, the results may not be robust under non-stationary conditions such as climate change, as physical processes and relationships amenable to change are not accounted for explicitly. Moreover, conceptual models are less readily applicable to ungauged catchments, in which hydrological predictions are also required. As such, the physically based, spatially distributed model SHETRAN is used in this study to develop a robust and reliable framework for modelling historic and future behaviour of gauged and ungauged catchments across the whole of Great Britain. In order to achieve this, a large array of data completely covering Great Britain for the period 1960-2006 has been collated and efficiently stored ready for model input. The data processed include a DEM, rainfall, PE and maps of geology, soil and land cover. A desire to make the modelling system easy for others to work with led to the development of a user-friendly graphical interface. This allows non-experts to set up and run a catchment model in a few seconds, a process that can normally take weeks or months. The quality and reliability of the extensive dataset for modelling hydrological processes has also been evaluated. One aspect of this has been an assessment of error and uncertainty in rainfall input data, as well as the effects of temporal resolution in precipitation inputs on model calibration. SHETRAN has been updated to accept gridded rainfall

  4. A Bayesian spatio-temporal geostatistical model with an auxiliary lattice for large datasets

    KAUST Repository

    Xu, Ganggang; Liang, Faming; Genton, Marc G.

    2015-01-01

    method is not only able to handle irregularly spaced observations in the spatial domain, but it is also able to bypass the missing data problem in a spatio-temporal process. Because the computational complexity of the proposed Markov chain Monte Carlo

  5. Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset | Office of Cancer Genomics

    Science.gov (United States)

    Identifying genetic alterations that prime a cancer cell to respond to a particular therapeutic agent can facilitate the development of precision cancer medicines. Cancer cell-line (CCL) profiling of small-molecule sensitivity has emerged as an unbiased method to assess the relationships between genetic or cellular features of CCLs and small-molecule response. Here, we developed annotated cluster multidimensional enrichment analysis to explore the associations between groups of small molecules and groups of CCLs in a new, quantitative sensitivity dataset.

  6. MiSTIC, an integrated platform for the analysis of heterogeneity in large tumour transcriptome datasets.

    Science.gov (United States)

    Lemieux, Sebastien; Sargeant, Tobias; Laperrière, David; Ismail, Houssam; Boucher, Geneviève; Rozendaal, Marieke; Lavallée, Vincent-Philippe; Ashton-Beaucage, Dariel; Wilhelm, Brian; Hébert, Josée; Hilton, Douglas J; Mader, Sylvie; Sauvageau, Guy

    2017-07-27

    Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
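
    The record above centres on summarizing a gene-gene correlation structure as a minimum spanning tree. As an illustration only (not the MiSTIC code; the gene and sample counts are made up), the following sketch builds such a tree from a toy expression matrix with SciPy:

```python
# Minimal sketch (not the MiSTIC implementation): derive a minimum spanning
# tree from a gene-gene correlation matrix, the core structure MiSTIC visualizes.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 200))           # 40 genes x 200 samples (toy data)

corr = np.corrcoef(expr)                    # gene-gene Pearson correlation
dist = 1.0 - np.abs(corr)                   # turn correlation into a distance
np.fill_diagonal(dist, 0.0)

mst = minimum_spanning_tree(dist)           # sparse matrix of MST edges
edges = np.transpose(mst.nonzero())
print(f"MST with {len(edges)} edges over {expr.shape[0]} genes")
for i, j in edges[:5]:
    print(f"gene_{i} -- gene_{j}  (1-|r| = {mst[i, j]:.3f})")
```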

  7. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

    DEFF Research Database (Denmark)

    Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte

    2006-01-01

    Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods...... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset...
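
    To make the co-occurrence idea concrete, here is a minimal sketch of consensus clustering by repeated K-means runs; it is illustrative only, stands in for the paper's Variational Bayes mixtures, and uses toy data in place of microarray measurements:

```python
# Minimal sketch of consensus clustering: run k-means repeatedly and record how
# often each pair of profiles lands in the same cluster (a co-occurrence matrix).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.5, size=(30, 10)) for m in (-2, 0, 2)])  # toy profiles

n, runs, k = X.shape[0], 50, 3
cooc = np.zeros((n, n))
for seed in range(runs):
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    cooc += (labels[:, None] == labels[None, :])
cooc /= runs                                   # fraction of runs in the same cluster

# Pairs with co-occurrence near 1.0 form stable consensus clusters.
print("mean within-block co-occurrence:", cooc[:30, :30].mean().round(2))
print("mean between-block co-occurrence:", cooc[:30, 30:60].mean().round(2))
```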

  8. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    Science.gov (United States)

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging, and researchers therefore use different stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the query set grows, so that only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs which correspond to known transcription factor binding sites.
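
    The core primitive behind such an exhaustive search is scoring a fixed-length degenerate IUPAC motif against every sequence window. A plain-Python illustration follows (not the GPU implementation; the sequences and example motif are invented):

```python
# Sketch of the basic primitive behind exhaustive degenerate-motif search:
# counting windows that match a fixed-length IUPAC motif across sequences.
# (Illustrative only -- Argo_CUDA evaluates motifs exhaustively on the GPU.)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_motif(motif, sequences):
    """Count windows matching a degenerate IUPAC motif across all sequences."""
    allowed = [set(IUPAC[c]) for c in motif]
    hits = 0
    for seq in sequences:
        for i in range(len(seq) - len(motif) + 1):
            window = seq[i:i + len(motif)]
            if all(base in ok for base, ok in zip(window, allowed)):
                hits += 1
    return hits

seqs = ["ACGTGACGTTTGACA", "TTGACATTGACGTAA"]
print(count_motif("TTGACR", seqs))   # matches windows such as TTGACA and TTGACG
```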

  9. Learning visual balance from large-scale datasets of aesthetically highly rated images

    Science.gov (United States)

    Jahanian, Ali; Vishwanathan, S. V. N.; Allebach, Jan P.

    2015-03-01

    The concept of visual balance is innate for humans and influences how we perceive visual aesthetics and cognize harmony. Although visual balance is a vital principle of design and is taught in design schools, it is barely quantified. On the other hand, with the emergence of automatic/semi-automatic visual design for self-publishing, learning visual balance and modeling it computationally may improve the aesthetics of such designs. In this paper, we present how the quest to understand visual balance inspired us to revisit one of the well-known theories in visual arts, the so-called theory of "visual rightness" elucidated by Arnheim. We define Arnheim's hypothesis as a design mining problem with the goal of learning visual balance from the work of professionals. We collected a dataset of 120K images that are aesthetically highly rated, from a professional photography website. We then computed factors that contribute to visual balance based on the notion of visual saliency. We fitted a mixture of Gaussians to the saliency maps of the images and obtained the hotspots of the images. Our inferred Gaussians align with Arnheim's hotspots and confirm his theory. Moreover, the results support the viability of the center of mass, symmetry, as well as the Rule of Thirds in our dataset.
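
    A rough sketch of the hotspot-fitting step is given below, assuming scikit-learn's GaussianMixture and a synthetic saliency map in place of a real saliency model:

```python
# Sketch of locating image "hotspots" by fitting a mixture of Gaussians to a
# saliency map (synthetic here); the fitted means play the role of hotspots.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
h, w = 120, 180
yy, xx = np.mgrid[0:h, 0:w]
# Synthetic saliency map with two bright regions (stand-in for a saliency model).
saliency = (np.exp(-(((xx - 60) ** 2 + (yy - 40) ** 2) / 300.0))
            + np.exp(-(((xx - 130) ** 2 + (yy - 80) ** 2) / 500.0)))

# Treat saliency as a density: sample pixel coordinates proportionally to it.
p = (saliency / saliency.sum()).ravel()
idx = rng.choice(h * w, size=5000, p=p)
points = np.column_stack([idx % w, idx // w])      # (x, y) samples

gmm = GaussianMixture(n_components=2, random_state=0).fit(points)
for mean, weight in zip(gmm.means_, gmm.weights_):
    print(f"hotspot at (x={mean[0]:.0f}, y={mean[1]:.0f}), weight={weight:.2f}")
```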

  10. Open and scalable analytics of large Earth observation datasets: From scenes to multidimensional arrays using SciDB and GDAL

    Science.gov (United States)

    Appel, Marius; Lahn, Florian; Buytaert, Wouter; Pebesma, Edzer

    2018-04-01

    Earth observation (EO) datasets are commonly provided as collection of scenes, where individual scenes represent a temporal snapshot and cover a particular region on the Earth's surface. Using these data in complex spatiotemporal modeling becomes difficult as soon as data volumes exceed a certain capacity or analyses include many scenes, which may spatially overlap and may have been recorded at different dates. In order to facilitate analytics on large EO datasets, we combine and extend the geospatial data abstraction library (GDAL) and the array-based data management and analytics system SciDB. We present an approach to automatically convert collections of scenes to multidimensional arrays and use SciDB to scale computationally intensive analytics. We evaluate the approach in three study cases on national scale land use change monitoring with Landsat imagery, global empirical orthogonal function analysis of daily precipitation, and combining historical climate model projections with satellite-based observations. Results indicate that the approach can be used to represent various EO datasets and that analyses in SciDB scale well with available computational resources. To simplify analyses of higher-dimensional datasets as from climate model output, however, a generalization of the GDAL data model might be needed. All parts of this work have been implemented as open-source software and we discuss how this may facilitate open and reproducible EO analyses.
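
    As a minimal illustration of the scene-to-array step (file names hypothetical; scenes assumed co-registered, single-band and of equal size; the GDAL Python bindings assumed installed), one can stack GDAL rasters into a NumPy cube before loading it into an array database:

```python
# Minimal sketch (hypothetical file names): read a set of co-registered
# single-band scenes with GDAL and stack them into a (time, y, x) array --
# the kind of multidimensional array the paper then manages in SciDB.
import numpy as np
from osgeo import gdal

scene_paths = ["scene_2016_01.tif", "scene_2016_02.tif", "scene_2016_03.tif"]

slices = []
for path in scene_paths:
    ds = gdal.Open(path)                       # GDAL dataset for one scene
    slices.append(ds.GetRasterBand(1).ReadAsArray().astype(np.float32))
    ds = None                                  # close the dataset

cube = np.stack(slices, axis=0)                # shape: (time, rows, cols)
print(cube.shape)
print("per-pixel mean over time has shape:", cube.mean(axis=0).shape)
```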

  11. HARVESTING, INTEGRATING AND DISTRIBUTING LARGE OPEN GEOSPATIAL DATASETS USING FREE AND OPEN-SOURCE SOFTWARE

    Directory of Open Access Journals (Sweden)

    R. Oliveira

    2016-06-01

    Full Text Available Federal, State and Local government agencies in the USA are investing heavily in the dissemination of Open Data sets produced by each of them. The main driver behind this thrust is to increase agencies’ transparency and accountability, as well as to improve citizens’ awareness. However, not all Open Data sets are easy to access and integrate with other Open Data sets available even from the same agency. The City and County of Denver Open Data Portal distributes several types of geospatial datasets, one of them being the city parcels information containing 224,256 records. Although this data layer contains many pieces of information, it is incomplete for some custom purposes. Open-source software was used to first collect data from diverse City of Denver Open Data sets, then upload them to a repository in the Cloud where they were processed using a PostgreSQL installation on the Cloud and Python scripts. Our method was able to extract non-spatial information from a ‘not-ready-to-download’ source that could then be combined with the initial data set to enhance its potential use.

  12. Television food advertising to children in Slovenia: analyses using a large 12-month advertising dataset.

    Science.gov (United States)

    Korošec, Živa; Pravst, Igor

    2016-12-01

    The marketing of energy-dense foods is recognised as a probable causal factor in children's overweight and obesity. To stimulate policymakers to start using nutrient profiling to restrict food marketing, a harmonised model was recently proposed by the WHO. Our objective is to evaluate the television advertising of foods in Slovenia using the above-mentioned model. An analysis is performed using a representative dataset of 93,902 food-related advertisements broadcast in Slovenia in year 2013. The advertisements are linked to specific foods, which are then subject to categorisation according to the WHO and UK nutrient profile model. Advertising of chocolate and confectionery represented 37 % of food-related advertising in all viewing times, and 77 % in children's (4-9 years) viewing hours. During these hours, 96 % of the food advertisements did not pass the criteria for permitted advertising according to the WHO profile model. Evidence from Slovenia shows that, in the absence of efficient regulatory marketing restrictions, television advertising of food to children is almost exclusively linked to energy-dense foods. Minor modifications of the proposed WHO nutrient profile model are suggested.

  13. A Scalable Permutation Approach Reveals Replication and Preservation Patterns of Network Modules in Large Datasets.

    Science.gov (United States)

    Ritchie, Scott C; Watts, Stephen; Fearnley, Liam G; Holt, Kathryn E; Abraham, Gad; Inouye, Michael

    2016-07-01

    Network modules-topologically distinct groups of edges and nodes-that are preserved across datasets can reveal common features of organisms, tissues, cell types, and molecules. Many statistics to identify such modules have been developed, but testing their significance requires heuristics. Here, we demonstrate that current methods for assessing module preservation are systematically biased and produce skewed p values. We introduce NetRep, a rapid and computationally efficient method that uses a permutation approach to score module preservation without assuming data are normally distributed. NetRep produces unbiased p values and can distinguish between true and false positives during multiple hypothesis testing. We use NetRep to quantify preservation of gene coexpression modules across murine brain, liver, adipose, and muscle tissues. Complex patterns of multi-tissue preservation were revealed, including a liver-derived housekeeping module that displayed adipose- and muscle-specific association with body weight. Finally, we demonstrate the broader applicability of NetRep by quantifying preservation of bacterial networks in gut microbiota between men and women. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
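
    The permutation principle can be sketched in a few lines of Python; this generic example is not NetRep itself, and the preservation statistic and toy data are chosen only for illustration:

```python
# Generic permutation scheme (not NetRep itself): score how well a module's
# correlation structure is preserved in a second dataset, and obtain an
# empirical p-value by drawing random node sets from the test network.
import numpy as np

rng = np.random.default_rng(3)

def preservation_stat(corr_a, corr_b, nodes):
    """Correlation between the module's edge weights in two networks."""
    sub_a = corr_a[np.ix_(nodes, nodes)]
    sub_b = corr_b[np.ix_(nodes, nodes)]
    iu = np.triu_indices(len(nodes), k=1)
    return np.corrcoef(sub_a[iu], sub_b[iu])[0, 1]

# Toy data: two datasets sharing a correlated 10-gene module among 60 genes.
base = rng.normal(size=(100, 60))
base[:, :10] += rng.normal(size=(100, 1))        # shared module signal
data_a = base + rng.normal(size=base.shape)
data_b = base + rng.normal(size=base.shape)
corr_a, corr_b = np.corrcoef(data_a.T), np.corrcoef(data_b.T)

module = np.arange(10)
observed = preservation_stat(corr_a, corr_b, module)
null = np.array([preservation_stat(corr_a, corr_b,
                                   rng.choice(60, size=10, replace=False))
                 for _ in range(2000)])
p_value = (1 + np.sum(null >= observed)) / (len(null) + 1)
print(f"observed = {observed:.3f}, permutation p = {p_value:.4f}")
```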

  14. A Multi-Resolution Spatial Model for Large Datasets Based on the Skew-t Distribution

    KAUST Repository

    Tagle, Felipe; Castruccio, Stefano; Genton, Marc G.

    2017-01-01

    recently begun to appear in the spatial statistics literature, without much consideration, however, for the ability to capture dependence at multiple resolutions, and simultaneously achieve feasible inference for increasingly large data sets. This article

  15. Information contained within the large scale gas injection test (Lasgit) dataset exposed using a bespoke data analysis tool-kit

    International Nuclear Information System (INIS)

    Bennett, D.P.; Thomas, H.R.; Cuss, R.J.; Harrington, J.F.; Vardon, P.J.

    2012-01-01

    Document available in extended abstract form only. The Large Scale Gas Injection Test (Lasgit) is a field scale experiment run by the British Geological Survey (BGS) and is located approximately 420 m underground at SKB's Aespoe Hard Rock Laboratory (HRL) in Sweden. It has been designed to study the impact on safety of gas build up within a KBS-3V concept high level radioactive waste repository. Lasgit has been in almost continuous operation for approximately seven years and is still underway. An analysis of the dataset arising from the Lasgit experiment, with particular attention to the smaller scale features and phenomena recorded, has been undertaken in parallel to the macro scale analysis performed by the BGS. Lasgit is a highly instrumented, frequently sampled and long-lived experiment leading to a substantial dataset containing in excess of 14.7 million data points. The data are anticipated to contain a wealth of information regarding overall processes as well as smaller scale or 'second order' features. Due to the size of the dataset coupled with the detailed analysis of the dataset required and the reduction in subjectivity associated with measurement compared to observation, computational analysis is essential. Moreover, due to the length of operation and complexity of experimental activity, the Lasgit dataset is not typically suited to 'out of the box' time series analysis algorithms. In particular, the features that are not suited to standard algorithms include non-uniformities due to (deliberate) changes in sample rate at various points in the experimental history and missing data due to hardware malfunction/failure causing interruption of logging cycles. To address these features a computational tool-kit capable of performing an Exploratory Data Analysis (EDA) on long-term, large-scale datasets with non-uniformities has been developed. Particular tool-kit abilities include: the parameterization of signal variation in the dataset
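
    A small pandas sketch illustrates the kind of non-uniformity such a tool-kit must cope with (the column name and sampling scheme are invented, not Lasgit's actual channels):

```python
# Sketch of handling an irregularly sampled sensor channel (hypothetical column
# name): load timestamps, expose logging gaps, and resample to a uniform grid.
import numpy as np
import pandas as pd

# Toy record: the sample interval changes part-way through and an outage occurs.
times = (list(pd.date_range("2010-01-01", periods=50, freq="10min"))
         + list(pd.date_range("2010-01-02", periods=50, freq="30min")))
pressure = np.random.default_rng(4).normal(5.0, 0.1, size=len(times))
df = pd.DataFrame({"pressure_MPa": pressure}, index=pd.DatetimeIndex(times))

gaps = df.index.to_series().diff()
print("largest logging gap:", gaps.max())        # exposes the outage

uniform = df.resample("30min").mean().interpolate(limit=4)  # uniform time grid
print(uniform.head())
```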

  16. Classification of large acoustic datasets using machine learning and crowdsourcing: Application to whale calls

    NARCIS (Netherlands)

    Shamir, L.; Carol Yerby, C.; Simpson, R.; Benda-Beckmann, A.M. von; Tyack, P.; Samarra, F.; Miller, P.; Wallin, J.

    2014-01-01

    Vocal communication is a primary communication method of killer and pilot whales, and is used for transmitting a broad range of messages and information for short and long distance. The large variation in call types of these species makes it challenging to categorize them. In this study, sounds

  17. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

    2017-09-01

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.

  18. Functional Neuroimaging Distinguishes Posttraumatic Stress Disorder from Traumatic Brain Injury in Focused and Large Community Datasets

    OpenAIRE

    Amen, Daniel G.; Raji, Cyrus A.; Willeumier, Kristen; Taylor, Derek; Tarzwell, Robert; Newberg, Andrew; Henderson, Theodore A.

    2015-01-01

    Background Traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD) are highly heterogeneous and often present with overlapping symptomology, providing challenges in reliable classification and treatment. Single photon emission computed tomography (SPECT) may be advantageous in the diagnostic separation of these disorders when comorbid or clinically indistinct. Methods Subjects were selected from a multisite database, where rest and on-task SPECT scans were obtained on a large gr...

  19. The Management Challenge: Handling Exams Involving Large Quantities of Students, on and off Campus--A Design Concept

    Science.gov (United States)

    Larsson, Ken

    2014-01-01

    This paper looks at the process of managing large numbers of exams efficiently and securely with the use of dedicated IT support. The system integrates regulations on different levels, from national to local (even down to departments), and ensures that the rules are employed in all stages of handling the exams. The system has a proven record of…

  20. MilxXplore: a web-based system to explore large imaging datasets.

    Science.gov (United States)

    Bourgeat, P; Dore, V; Villemagne, V L; Rowe, C C; Salvado, O; Fripp, J

    2013-01-01

    As large-scale medical imaging studies are becoming more common, there is an increasing reliance on automated software to extract quantitative information from these images. As the size of the cohorts keeps increasing with large studies, there is also a need for tools that allow results from automated image processing and analysis to be presented in a way that enables fast and efficient quality checking, tagging and reporting on cases in which automatic processing failed or was problematic. MilxXplore is an open source visualization platform, which provides an interface to navigate and explore imaging data in a web browser, giving the end user the opportunity to perform quality control and reporting in a user-friendly, collaborative and efficient way. Compared to existing software solutions that often provide an overview of the results at the subject's level, MilxXplore pools the results of individual subjects and time points together, allowing easy and efficient navigation and browsing through the different acquisitions of a subject over time, and comparing the results against the rest of the population. MilxXplore is fast, flexible and allows remote quality checks of processed imaging data, facilitating data sharing and collaboration across multiple locations, and can be easily integrated into a cloud computing pipeline. With the growing trend of open data and open science, such a tool will become increasingly important to share and publish results of imaging analysis.

  1. A highly efficient multi-core algorithm for clustering extremely large datasets

    Directory of Open Access Journals (Sweden)

    Kraus Johann M

    2010-04-01

    Full Text Available Abstract Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms are highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.
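
    For illustration, a shared-memory parallel k-means can be sketched in Python with multiprocessing; this is not the authors' Java/transactional-memory implementation, just the same idea of splitting the assignment step across cores:

```python
# Minimal multi-core sketch (not the authors' Java/transactional-memory code):
# parallelize the k-means assignment step over chunks of rows with multiprocessing.
import numpy as np
from multiprocessing import Pool

def assign_chunk(args):
    chunk, centroids = args
    d = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def parallel_kmeans(X, k, iters=20, workers=4, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    chunks = np.array_split(X, workers)
    with Pool(workers) as pool:
        for _ in range(iters):
            labels = np.concatenate(
                pool.map(assign_chunk, [(c, centroids) for c in chunks]))
            new_centroids = []
            for j in range(k):
                members = X[labels == j]
                new_centroids.append(members.mean(axis=0) if len(members) else centroids[j])
            centroids = np.array(new_centroids)
    return labels, centroids

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(m, 1.0, size=(20_000, 20)) for m in (-5, 0, 5)])
    labels, _ = parallel_kmeans(X, k=3)
    print(np.bincount(labels))
```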

  2. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    International Nuclear Information System (INIS)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu; Yang, Jian-Hua

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.
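
    The elementary operation behind calling such relationships is intersecting binding-site intervals with lncRNA coordinates. A toy Python sketch (coordinates invented) is shown below:

```python
# Sketch of the core operation behind linking CLIP-Seq peaks to lncRNAs:
# intersect RBP binding-site intervals with lncRNA gene coordinates (toy data).
from collections import defaultdict

# (chrom, start, end, name) -- illustrative coordinates only
lncrnas = [("chr1", 1000, 5000, "lncRNA-A"), ("chr1", 8000, 9500, "lncRNA-B")]
clip_peaks = [("chr1", 1200, 1250, "RBP1"), ("chr1", 4900, 4960, "RBP2"),
              ("chr1", 9100, 9140, "RBP1"), ("chr2", 500, 560, "RBP1")]

interactions = defaultdict(set)
for chrom, g_start, g_end, gene in lncrnas:
    for p_chrom, p_start, p_end, rbp in clip_peaks:
        if p_chrom == chrom and p_start < g_end and p_end > g_start:
            interactions[gene].add(rbp)

for gene, rbps in interactions.items():
    print(gene, "bound by", sorted(rbps))
# Real analyses use indexed interval structures (e.g. interval trees) to scale
# to millions of peaks across many datasets.
```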

  3. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu, E-mail: lssqlh@mail.sysu.edu.cn; Yang, Jian-Hua, E-mail: lssqlh@mail.sysu.edu.cn [RNA Information Center, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou (China)

    2015-01-14

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.

  4. EEGVIS: A MATLAB toolbox for browsing, exploring, and viewing large datasets

    Directory of Open Access Journals (Sweden)

    Kay A Robbins

    2012-05-01

    Full Text Available Recent advances in data monitoring and sensor technology have accelerated the acquisition of very large data sets. Streaming data sets from instrumentation such as multi-channel EEG recording usually must undergo substantial pre-processing and artifact removal. Even when using automated procedures, most scientists engage in laborious manual examination and processing to assure high quality data and to identify interesting or problematic data segments. Researchers also do not have a convenient method of visually assessing the effects of applying any stage in a processing pipeline. EEGVIS is a MATLAB toolbox that allows users to quickly explore multi-channel EEG and other large array-based data sets using multi-scale drill-down techniques. Customizable summary views reveal potentially interesting sections of data, which users can explore further by clicking to examine using detailed viewing components. The viewer and a companion browser are built on our MoBBED framework, which has a library of modular viewing components that can be mixed and matched to best reveal structure. Users can easily create new viewers for their specific data without any programming during the exploration process. These viewers automatically support pan, zoom, resizing of individual components, and cursor exploration. The toolbox can be used directly in MATLAB at any stage in a processing pipeline, as a plug-in for EEGLAB, or as a standalone precompiled application without MATLAB running. EEGVIS and its supporting packages are freely available under the GNU general public license at http://visual.cs.utsa.edu/eegvis.

  5. EEGVIS: A MATLAB Toolbox for Browsing, Exploring, and Viewing Large Datasets.

    Science.gov (United States)

    Robbins, Kay A

    2012-01-01

    Recent advances in data monitoring and sensor technology have accelerated the acquisition of very large data sets. Streaming data sets from instrumentation such as multi-channel EEG recording usually must undergo substantial pre-processing and artifact removal. Even when using automated procedures, most scientists engage in laborious manual examination and processing to assure high quality data and to identify interesting or problematic data segments. Researchers also do not have a convenient method of visually assessing the effects of applying any stage in a processing pipeline. EEGVIS is a MATLAB toolbox that allows users to quickly explore multi-channel EEG and other large array-based data sets using multi-scale drill-down techniques. Customizable summary views reveal potentially interesting sections of data, which users can explore further by clicking to examine using detailed viewing components. The viewer and a companion browser are built on our MoBBED framework, which has a library of modular viewing components that can be mixed and matched to best reveal structure. Users can easily create new viewers for their specific data without any programming during the exploration process. These viewers automatically support pan, zoom, resizing of individual components, and cursor exploration. The toolbox can be used directly in MATLAB at any stage in a processing pipeline, as a plug-in for EEGLAB, or as a standalone precompiled application without MATLAB running. EEGVIS and its supporting packages are freely available under the GNU general public license at http://visual.cs.utsa.edu/eegvis.

  6. Calculating p-values and their significances with the Energy Test for large datasets

    Science.gov (United States)

    Barter, W.; Burr, C.; Parkes, C.

    2018-04-01

    The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.
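
    As an illustration of the statistic itself (normalization conventions vary; this follows one common choice with a Gaussian distance weighting), a small NumPy sketch of the T-value is:

```python
# Sketch of the energy-test T-value for two samples, using a Gaussian distance
# weighting; normalization conventions vary, so this follows one common choice.
import numpy as np

def psi(a, b, sigma):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def t_value(x, y, sigma=0.5):
    n, m = len(x), len(y)
    t_xx = (psi(x, x, sigma).sum() - n) / (2.0 * n * (n - 1))   # exclude self-pairs
    t_yy = (psi(y, y, sigma).sum() - m) / (2.0 * m * (m - 1))
    t_xy = psi(x, y, sigma).sum() / (n * m)
    return t_xx + t_yy - t_xy

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=(500, 2))
y = rng.normal(0.2, 1.0, size=(500, 2))          # slightly shifted population
print("T =", t_value(x, y))
# A p-value follows by comparing T against its distribution under permutations
# of the combined sample -- the regime the paper's scaling method accelerates.
```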

  7. GEMINI: a computationally-efficient search engine for large gene expression datasets.

    Science.gov (United States)

    DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick

    2016-02-24

    Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number or meta-data tag. However, in this search paradigm the form of the query - a text-based string - is mismatched with the form of the target - a genomic profile. To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an O(log n) expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10^5 samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.
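
    A compact vantage-point tree can be sketched in Python to show the idea; this is not the GEMINI code, and the toy "profiles" are random vectors:

```python
# Minimal vantage-point tree sketch (not the GEMINI implementation): index
# profiles by Euclidean distance and answer nearest-neighbour queries without
# scanning the whole database.
import numpy as np

class VPNode:
    __slots__ = ("point", "radius", "inside", "outside")
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def build(points):
    if len(points) == 0:
        return None
    vp, rest = points[0], points[1:]
    if len(rest) == 0:
        return VPNode(vp, 0.0, None, None)
    d = np.linalg.norm(rest - vp, axis=1)
    radius = np.median(d)
    return VPNode(vp, radius, build(rest[d <= radius]), build(rest[d > radius]))

def nearest(node, query, best=(None, np.inf)):
    if node is None:
        return best
    d = np.linalg.norm(node.point - query)
    if d < best[1]:
        best = (node.point, d)
    near, far = ((node.inside, node.outside) if d <= node.radius
                 else (node.outside, node.inside))
    best = nearest(near, query, best)
    if abs(d - node.radius) < best[1]:          # only cross the boundary if needed
        best = nearest(far, query, best)
    return best

rng = np.random.default_rng(6)
profiles = rng.normal(size=(2000, 50))          # toy expression profiles
tree = build(profiles)
query = profiles[123] + rng.normal(scale=0.01, size=50)
point, dist = nearest(tree, query)
print("nearest profile distance:", round(float(dist), 4))
```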

  8. BigWig and BigBed: enabling browsing of large distributed datasets.

    Science.gov (United States)

    Kent, W J; Zweig, A S; Barber, G; Hinrichs, A S; Karolchik, D

    2010-09-01

    BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols, Linux and UNIX operating system files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/. Source code for the creation and visualization software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. The UCSC Genome Browser is available at http://genome.ucsc.edu.
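
    Assuming the pyBigWig package and a hypothetical file name, partial access to a BigWig file looks roughly like this:

```python
# Sketch of remote/partial access to a BigWig file using the pyBigWig package
# (file name hypothetical): only the indexed blocks needed for a query are read.
import pyBigWig

bw = pyBigWig.open("signal.bw")                 # local path or an http(s) URL
print(bw.chroms())                              # chromosome sizes in the file

# Summary statistics over a region, served from pre-computed zoom levels.
print(bw.stats("chr1", 0, 1_000_000, type="mean", nBins=10))

# Base-resolution values for a small window only.
values = bw.values("chr1", 10_000, 10_100)
print(len(values), "values fetched")
bw.close()
```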

  9. DnaSAM: Software to perform neutrality testing for large datasets with complex null models.

    Science.gov (United States)

    Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B

    2010-05-01

    Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments along with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model that are stored in easy to manipulate text file. © 2009 Blackwell Publishing Ltd.

  10. Functional Neuroimaging Distinguishes Posttraumatic Stress Disorder from Traumatic Brain Injury in Focused and Large Community Datasets.

    Science.gov (United States)

    Amen, Daniel G; Raji, Cyrus A; Willeumier, Kristen; Taylor, Derek; Tarzwell, Robert; Newberg, Andrew; Henderson, Theodore A

    2015-01-01

    Traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD) are highly heterogeneous and often present with overlapping symptomology, providing challenges in reliable classification and treatment. Single photon emission computed tomography (SPECT) may be advantageous in the diagnostic separation of these disorders when comorbid or clinically indistinct. Subjects were selected from a multisite database, where rest and on-task SPECT scans were obtained on a large group of neuropsychiatric patients. Two groups were analyzed: Group 1 with TBI (n=104), PTSD (n=104) or both (n=73) closely matched for demographics and comorbidity, compared to each other and healthy controls (N=116); Group 2 with TBI (n=7,505), PTSD (n=1,077) or both (n=1,017) compared to n=11,147 without either. ROIs and visual readings (VRs) were analyzed using a binary logistic regression model with predicted probabilities inputted into a Receiver Operating Characteristic analysis to identify sensitivity, specificity, and accuracy. One-way ANOVA identified the most diagnostically significant regions of increased perfusion in PTSD compared to TBI. Analysis included a 10-fold cross validation of the protocol in the larger community sample (Group 2). For Group 1, baseline and on-task ROIs and VRs showed a high level of accuracy in differentiating PTSD, TBI and PTSD+TBI conditions. This carefully matched group separated with 100% sensitivity, specificity and accuracy for the ROI analysis and at 89% or above for VRs. Group 2 had lower sensitivity, specificity and accuracy, but still in a clinically relevant range. Compared to subjects with TBI, PTSD showed increases in the limbic regions, cingulum, basal ganglia, insula, thalamus, prefrontal cortex and temporal lobes. This study demonstrates the ability to separate PTSD and TBI from healthy controls, from each other, and detect their co-occurrence, even in highly comorbid samples, using SPECT. This modality may offer a clinical option for aiding
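
    The analysis pattern described (binary logistic regression on ROI features, predicted probabilities fed into an ROC analysis) can be sketched generically with scikit-learn; the data below are synthetic stand-ins, not the study's SPECT measurements:

```python
# Generic sketch of the analysis pattern described above: binary logistic
# regression on ROI perfusion features, with cross-validated predicted
# probabilities fed into an ROC analysis (synthetic data only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n, n_roi = 400, 12
X = rng.normal(size=(n, n_roi))                    # ROI perfusion features (toy)
y = rng.integers(0, 2, size=n)                     # 0 = TBI, 1 = PTSD (toy labels)
X[y == 1, :4] += 0.8                               # PTSD-like increases in a few ROIs

model = LogisticRegression(max_iter=1000)
probs = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]

auc = roc_auc_score(y, probs)
fpr, tpr, thresholds = roc_curve(y, probs)
j = np.argmax(tpr - fpr)                           # Youden's J as an operating point
print(f"10-fold AUC = {auc:.2f}; sensitivity = {tpr[j]:.2f}, specificity = {1 - fpr[j]:.2f}")
```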

  11. Functional Neuroimaging Distinguishes Posttraumatic Stress Disorder from Traumatic Brain Injury in Focused and Large Community Datasets.

    Directory of Open Access Journals (Sweden)

    Daniel G Amen

    Full Text Available Traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD) are highly heterogeneous and often present with overlapping symptomology, providing challenges in reliable classification and treatment. Single photon emission computed tomography (SPECT) may be advantageous in the diagnostic separation of these disorders when comorbid or clinically indistinct. Subjects were selected from a multisite database, where rest and on-task SPECT scans were obtained on a large group of neuropsychiatric patients. Two groups were analyzed: Group 1 with TBI (n=104), PTSD (n=104) or both (n=73) closely matched for demographics and comorbidity, compared to each other and healthy controls (N=116); Group 2 with TBI (n=7,505), PTSD (n=1,077) or both (n=1,017) compared to n=11,147 without either. ROIs and visual readings (VRs) were analyzed using a binary logistic regression model with predicted probabilities inputted into a Receiver Operating Characteristic analysis to identify sensitivity, specificity, and accuracy. One-way ANOVA identified the most diagnostically significant regions of increased perfusion in PTSD compared to TBI. Analysis included a 10-fold cross validation of the protocol in the larger community sample (Group 2). For Group 1, baseline and on-task ROIs and VRs showed a high level of accuracy in differentiating PTSD, TBI and PTSD+TBI conditions. This carefully matched group separated with 100% sensitivity, specificity and accuracy for the ROI analysis and at 89% or above for VRs. Group 2 had lower sensitivity, specificity and accuracy, but still in a clinically relevant range. Compared to subjects with TBI, PTSD showed increases in the limbic regions, cingulum, basal ganglia, insula, thalamus, prefrontal cortex and temporal lobes. This study demonstrates the ability to separate PTSD and TBI from healthy controls, from each other, and detect their co-occurrence, even in highly comorbid samples, using SPECT. This modality may offer a clinical option for

  12. COINS: An innovative informatics and neuroimaging tool suite built for large heterogeneous datasets

    Directory of Open Access Journals (Sweden)

    Adam eScott

    2011-12-01

    Full Text Available The availability of well-characterized neuroimaging data with large numbers of subjects, especially for clinical populations, is critical to advancing our understanding of the healthy and diseased brain. Such data enables questions to be answered in a much more generalizable manner and also has the potential to yield solutions derived from novel methods that were conceived after the original studies' implementation. Though there is currently growing interest in data sharing, the neuroimaging community has been struggling for years with how to best encourage sharing data across brain imaging studies. With the advent of studies that are much more consistent across sites (e.g., resting fMRI, diffusion tensor imaging, and structural imaging), the potential of pooling data across studies continues to gain momentum. At the Mind Research Network (MRN), we have developed the COllaborative Informatics and Neuroimaging Suite (COINS; http://coins.mrn.org) to provide researchers with an information system based on an open-source model that includes web-based tools to manage studies, subjects, imaging, clinical data and other assessments. The system currently hosts data from 9 institutions, over 300 studies, over 14,000 subjects, and over 19,000 MRI, MEG, and EEG scan sessions in addition to more than 180,000 clinical assessments. In this paper we provide a description of COINS with comparison to a valuable and popular system known as XNAT. Although there are many similarities between COINS and other electronic data management systems, the differences that may concern researchers in the context of multi-site, multi-organizational data-sharing environments with intuitive ease of use and PHI security are emphasized as important attributes.

  13. A high-throughput system for high-quality tomographic reconstruction of large datasets at Diamond Light Source.

    Science.gov (United States)

    Atwood, Robert C; Bodey, Andrew J; Price, Stephen W T; Basham, Mark; Drakopoulos, Michael

    2015-06-13

    Tomographic datasets collected at synchrotrons are becoming very large and complex, and, therefore, need to be managed efficiently. Raw images may have high pixel counts, and each pixel can be multidimensional and associated with additional data such as those derived from spectroscopy. In time-resolved studies, hundreds of tomographic datasets can be collected in sequence, yielding terabytes of data. Users of tomographic beamlines are drawn from various scientific disciplines, and many are keen to use tomographic reconstruction software that does not require a deep understanding of reconstruction principles. We have developed Savu, a reconstruction pipeline that enables users to rapidly reconstruct data to consistently create high-quality results. Savu is designed to work in an 'orthogonal' fashion, meaning that data can be converted between projection and sinogram space throughout the processing workflow as required. The Savu pipeline is modular and allows processing strategies to be optimized for users' purposes. In addition to the reconstruction algorithms themselves, it can include modules for identification of experimental problems, artefact correction, general image processing and data quality assessment. Savu is open source, open licensed and 'facility-independent': it can run on standard cluster infrastructure at any institution.
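
    The "orthogonal" projection/sinogram view can be illustrated with plain NumPy slicing of a toy projection stack (dimensions invented):

```python
# Sketch of the "orthogonal" view of tomographic data: one 3-D block of
# projections can be sliced as radiographs (projection space) or as sinograms.
import numpy as np

n_angles, n_rows, n_cols = 900, 128, 160
projections = np.random.default_rng(8).random((n_angles, n_rows, n_cols))

radiograph = projections[0]          # one projection image: (rows, cols)
sinogram = projections[:, 64, :]     # one detector row across all angles: (angles, cols)

print("projection-space slice:", radiograph.shape)
print("sinogram-space slice:  ", sinogram.shape)
# A pipeline like Savu passes data between these two layouts so that each
# processing module (flat-field correction, ring removal, reconstruction, ...)
# can operate in whichever space it needs.
```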

  14. Exact fast computation of band depth for large functional datasets: How quickly can one million curves be ranked?

    KAUST Repository

    Sun, Ying

    2012-10-01

    © 2012 John Wiley & Sons, Ltd. Band depth is an important nonparametric measure that generalizes order statistics and makes univariate methods based on order statistics possible for functional data. However, the computational burden of band depth limits its applicability when large functional or image datasets are considered. This paper proposes an exact fast method to speed up the band depth computation when bands are defined by two curves. Remarkable computational gains are demonstrated through simulation studies comparing our proposal with the original computation and one existing approximate method. For example, we report an experiment where our method can rank one million curves, evaluated at fifty time points each, in 12.4 seconds with Matlab.

  15. Geostatistics for Large Datasets

    KAUST Repository

    Sun, Ying

    2011-10-31


  16. Geostatistics for Large Datasets

    KAUST Repository

    Sun, Ying; Li, Bo; Genton, Marc G.

    2011-01-01


  17. CImbinator: a web-based tool for drug synergy analysis in small- and large-scale datasets.

    Science.gov (United States)

    Flobak, Åsmund; Vazquez, Miguel; Lægreid, Astrid; Valencia, Alfonso

    2017-08-01

    Drug synergies are sought in order to identify drug combinations that are particularly beneficial. User-friendly software solutions that can assist analysis of large-scale datasets are required. CImbinator is a web-service that can aid in batch-wise and in-depth analyses of data from small-scale and large-scale drug combination screens. CImbinator can quantify drug combination effects using both the commonly employed median-effect equation and more advanced mathematical models describing dose-response relationships. CImbinator is written in Ruby and R. It uses the R package drc for advanced drug response modeling. CImbinator is available at http://cimbinator.bioinfo.cnio.es , the source-code is open and available at https://github.com/Rbbt-Workflows/combination_index . A Docker image is also available at https://hub.docker.com/r/mikisvaz/rbbt-ci_mbinator/ . asmund.flobak@ntnu.no or miguel.vazquez@cnio.es. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
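
    The median-effect and combination-index calculations that such tools automate can be sketched as follows (parameter values are illustrative, not fitted to any real screen):

```python
# Sketch of the Chou-Talalay median-effect / combination-index calculation that
# tools like CImbinator automate (all parameters below are illustrative).
def dose_for_effect(fa, dm, m):
    """Median-effect equation: dose producing affected fraction fa."""
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)

def combination_index(fa, d1, d2, dm1, m1, dm2, m2):
    """CI < 1 suggests synergy, CI = 1 additivity, CI > 1 antagonism."""
    return d1 / dose_for_effect(fa, dm1, m1) + d2 / dose_for_effect(fa, dm2, m2)

# Hypothetical single-drug fits (Dm = median-effect dose, m = slope) and a
# combination dose (d1, d2) observed to give 60% effect.
ci = combination_index(fa=0.60, d1=1.0, d2=2.0, dm1=2.5, m1=1.1, dm2=6.0, m2=0.9)
print(f"combination index at 60% effect: {ci:.2f}")
```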

  18. Large-scale groundwater modeling using global datasets: a test case for the Rhine-Meuse basin

    Directory of Open Access Journals (Sweden)

    E. H. Sutanudjaja

    2011-09-01

    Full Text Available The current generation of large-scale hydrological models does not include a groundwater flow component. Large-scale groundwater models, involving aquifers and basins of multiple countries, are still rare mainly due to a lack of hydro-geological data which are usually only available in developed countries. In this study, we propose a novel approach to construct large-scale groundwater models by using global datasets that are readily available. As the test-bed, we use the combined Rhine-Meuse basin that contains groundwater head data used to verify the model output. We start by building a distributed land surface model (30 arc-second resolution) to estimate groundwater recharge and river discharge. Subsequently, a MODFLOW transient groundwater model is built and forced by the recharge and surface water levels calculated by the land surface model. Results are promising despite the fact that we still use an offline procedure to couple the land surface and MODFLOW groundwater models (i.e. the simulations of both models are separately performed). The simulated river discharges compare well to the observations. Moreover, based on our sensitivity analysis, in which we run several groundwater model scenarios with various hydro-geological parameter settings, we observe that the model can reasonably well reproduce the observed groundwater head time series. However, we note that there are still some limitations in the current approach, specifically because the offline-coupling technique simplifies the dynamic feedbacks between surface water levels and groundwater heads, and between soil moisture states and groundwater heads. Also the current sensitivity analysis ignores the uncertainty of the land surface model output. Despite these limitations, we argue that the results of the current model show a promise for large-scale groundwater modeling practices, including for data-poor environments and at the global scale.

  19. Handling Large and Complex Data in a Photovoltaic Research Institution Using a Custom Laboratory Information Management System

    Energy Technology Data Exchange (ETDEWEB)

    White, Robert R.; Munch, Kristin

    2014-01-01

    Twenty-five years ago the desktop computer started becoming ubiquitous in the scientific lab. Researchers were delighted with its ability to both control instrumentation and acquire data on a single system, but they were not completely satisfied. There were often gaps in knowledge that they thought might be gained if they just had more data and they could get the data faster. Computer technology has evolved in keeping with Moore’s Law meeting those desires; however those improvements have of late become both a boon and bane for researchers. Computers are now capable of producing high speed data streams containing terabytes of information; capabilities that evolved faster than envisioned last century. Software to handle large scientific data sets has not kept up. How much information might be lost through accidental mismanagement or how many discoveries are missed through data overload are now vital questions. An important new task in most scientific disciplines involves developing methods to address those issues and to create the software that can handle large data sets with an eye towards scalability. This software must create archived, indexed, and searchable data from heterogeneous instrumentation for the implementation of a strong data-driven materials development strategy. At the National Center for Photovoltaics in the National Renewable Energy Laboratory, we began development a few years ago on a Laboratory Information Management System (LIMS) designed to handle lab-wide scientific data acquisition, management, processing and mining needs for physics and materials science data, and with a specific focus towards future scalability for new equipment or research focuses. We will present the decisions, processes, and problems we went through while building our LIMS system for materials research, its current operational state and our steps for future development.

  20. An efficient method to handle the 'large p, small n' problem for ...

    Indian Academy of Sciences (India)

    So-called 'large p small n' or 'short-fat data' problem can occur if the number of ... (SNPs) in GWAS based on the Haseman–Elston regression. (H–E) (DeFries 2010). ..... For instance, if a population is admixed or genetic heterogeneity, the ...

  1. Solving the challenges of data preprocessing, uploading, archiving, retrieval, analysis and visualization for large heterogeneous paleo- and rock magnetic datasets

    Science.gov (United States)

    Minnett, R.; Koppers, A. A.; Tauxe, L.; Constable, C.; Jarboe, N. A.

    2011-12-01

    The Magnetics Information Consortium (MagIC) provides an archive for the wealth of rock- and paleomagnetic data and interpretations from studies on natural and synthetic samples. As with many fields, most peer-reviewed paleo- and rock magnetic publications only include high-level results. However, access to the raw data from which these results were derived is critical for compilation studies and when updating results based on new interpretation and analysis methods. MagIC provides a detailed metadata model with places for everything from raw measurements to their interpretations. Prior to MagIC, these raw data were extremely cumbersome to collect because they mostly existed in a lab's proprietary format on investigators' personal computers or undigitized in field notebooks. MagIC has developed a suite of offline and online tools to enable the paleomagnetic, rock magnetic, and affiliated scientific communities to easily contribute both their previously published data and data supporting an article undergoing peer-review, to retrieve well-annotated published interpretations and raw data, and to analyze and visualize large collections of published data online. Here we present the technology we chose (including VBA in Excel spreadsheets, Python libraries, FastCGI JSON webservices, Oracle procedures, and jQuery user interfaces) and how we implemented it in order to serve the scientific community as seamlessly as possible. These tools are now in use in labs worldwide, have helped archive many valuable legacy studies and datasets, and routinely enable new contributions to the MagIC Database (http://earthref.org/MAGIC/).

  2. Insights into SCP/TAPS proteins of liver flukes based on large-scale bioinformatic analyses of sequence datasets.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    Full Text Available BACKGROUND: SCP/TAPS proteins of parasitic helminths have been proposed to play key roles in fundamental biological processes linked to the invasion of and establishment in their mammalian host animals, such as the transition from free-living to parasitic stages and the modulation of host immune responses. Despite the evidence that SCP/TAPS proteins of parasitic nematodes are involved in host-parasite interactions, there is a paucity of information on this protein family for parasitic trematodes of socio-economic importance. METHODOLOGY/PRINCIPAL FINDINGS: We conducted the first large-scale study of SCP/TAPS proteins of a range of parasitic trematodes of both human and veterinary importance (including the liver flukes Clonorchis sinensis, Opisthorchis viverrini, Fasciola hepatica and F. gigantica as well as the blood flukes Schistosoma mansoni, S. japonicum and S. haematobium. We mined all current transcriptomic and/or genomic sequence datasets from public databases, predicted secondary structures of full-length protein sequences, undertook systematic phylogenetic analyses and investigated the differential transcription of SCP/TAPS genes in O. viverrini and F. hepatica, with an emphasis on those that are up-regulated in the developmental stages infecting the mammalian host. CONCLUSIONS: This work, which sheds new light on SCP/TAPS proteins, guides future structural and functional explorations of key SCP/TAPS molecules associated with diseases caused by flatworms. Future fundamental investigations of these molecules in parasites and the integration of structural and functional data could lead to new approaches for the control of parasitic diseases.

  3. A remote handling rate-position controller for telemanipulating in a large workspace

    International Nuclear Information System (INIS)

    Barrio, Jorge; Ferre, Manuel; Suárez-Ruiz, Francisco; Aracil, Rafael

    2014-01-01

    This paper presents a new haptic rate-position controller, which allows manipulating a slave robot in a large workspace using a small haptic device. This control algorithm is very effective when the master device is much smaller than the slave device. Haptic information is displayed to the user to signal when a change in the operation mode occurs. This controller allows performing tasks in a large remote workspace by using a haptic device with a reduced workspace, such as the Phantom. Experimental results have been obtained using a slave robot from Kraft Telerobotics and a commercial haptic interface as the master device. A curvature path-following task was simulated using the proposed controller and compared with a force-position control algorithm. The results show that the proposed method achieves higher accuracy while taking a similar amount of time to perform the task.
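    The following simplified 1-D sketch illustrates the general idea of a rate-position scheme (not the authors' exact control law): inside an inner region the master commands slave position directly, and beyond a threshold it commands slave velocity, so a small master workspace can cover a large slave workspace. All numerical values are arbitrary.

```python
# Simplified 1-D rate-position mapping: position mode inside an inner region,
# rate (velocity) mode beyond it. Gains and thresholds are illustrative only.
R_POS = 0.06      # inner radius of the master workspace used for position mode [m]
K_SCALE = 4.0     # position scaling, master -> slave
K_RATE = 2.0      # velocity gain in rate mode [1/s]
DT = 0.01         # control period [s]

def rate_position_step(x_master, offset):
    """Map a master position to a slave position; returns (slave_pos, offset, rate_mode)."""
    if abs(x_master) <= R_POS:                            # position mode
        return offset + K_SCALE * x_master, offset, False
    # rate mode: the workspace offset drifts with a velocity proportional to the overshoot
    sign = 1.0 if x_master > 0 else -1.0
    offset += K_RATE * (x_master - sign * R_POS) * DT
    return offset + K_SCALE * sign * R_POS, offset, True

offset = 0.0
for x_m in [0.02, 0.05, 0.09, 0.09, 0.09, 0.03]:          # a made-up master trajectory [m]
    slave, offset, rate_mode = rate_position_step(x_m, offset)
    print(f"master={x_m:+.2f} m  slave={slave:+.3f} m  rate mode={rate_mode}")
```

    In a real implementation the mode switch would also trigger the haptic cue mentioned in the abstract, for instance a small force pulse rendered on the master device.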

  4. A remote handling rate-position controller for telemanipulating in a large workspace

    Energy Technology Data Exchange (ETDEWEB)

    Barrio, Jorge, E-mail: jordi.barrio@upm.es; Ferre, Manuel, E-mail: m.ferre@upm.es; Suárez-Ruiz, Francisco, E-mail: fa.suarez@upm.es; Aracil, Rafael, E-mail: rafael.aracil@upm.es

    2014-01-15

    This paper presents a new haptic rate-position controller, which allows manipulating a slave robot in a large workspace using a small haptic device. This control algorithm is very effective when the master device is much smaller than the slave device. Haptic information is displayed to the user to signal when a change in the operation mode occurs. This controller allows performing tasks in a large remote workspace by using a haptic device with a reduced workspace, such as the Phantom. Experimental results have been obtained using a slave robot from Kraft Telerobotics and a commercial haptic interface as the master device. A curvature path-following task was simulated using the proposed controller and compared with a force-position control algorithm. The results show that the proposed method achieves higher accuracy while taking a similar amount of time to perform the task.

  5. Handling of corn stover bales for combustion in small and large furnaces

    Energy Technology Data Exchange (ETDEWEB)

    Morissette, R.; Savoie, P.; Villeneuve, J. [Agriculture and Agri-Food Canada, Quebec City, PQ (Canada)

    2010-07-01

    This paper reported on a study in which dry corn stover was baled and burned in 2 furnaces in the province of Quebec. Small and large rectangular bale formats were considered for direct combustion. The first combustion unit was a small 500,000 BTU/h dual-chamber log wood furnace located at a hay-growing farm in Neuville, Quebec. The heat was initially transferred to a hot water pipe system and then to a hot air exchanger to dry hay bales. The small stover bales were placed directly into the combustion furnace. The low density of the bales compared to log wood required filling up to 8 times more frequently. Stover bales produced an average of 6.4 per cent ash on a dry matter (DM) basis and required an automated system for ash removal. Combustion gas contained particulate matter levels greater than 1417 mg/m³, which is more than the local acceptable maximum of 600 mg/m³ for combustion furnaces. The second combustion unit was a high-capacity 12.5 million BTU/h single-chamber furnace located in Saint-Philippe-de-Néri, Quebec. It was used to generate steam for a feed pellet mill. Large corn stover bales were broken up and fed on a conveyor and through a screw auger to the furnace. The stover was light compared to the wood chips used in this furnace. For mechanical reasons, the stover could not be fed continuously to the furnace.

  6. Fuel handling alternatives to prepare for large scale fuel channel replacement

    International Nuclear Information System (INIS)

    Martire, S.; Sandu, I.

    2007-01-01

    It is desirable to reduce the duration of defuelling the reactor in preparation for retube, as the cost of replacement power is $750K/day. Three fast defuelling concepts are presented. With the Through Flow Defuelling method, the fuel string is hydraulically pushed into the downstream Fuelling Machine (FM) by flow passing through the fuel channel. The Long Stroke C Ram method replaces the FM C Ram with a longer one capable of pushing all fuel bundles into the receiving FM. The Defuelling Hardware concept uses an enhanced design of ram extensions that interconnect mechanically to extend the ram stroke and push fuel bundles into the receiving FM. This paper will present descriptions of each defuelling concept to prepare for Large Scale Fuel Channel Replacement. Advantages and disadvantages of each concept will be discussed and a recommendation will be made for future implementation. (author)

  7. Efficient Geometry and Data Handling for Large-Scale Monte Carlo - Thermal-Hydraulics Coupling

    Science.gov (United States)

    Hoogenboom, J. Eduard

    2014-06-01

    Detailed coupling of thermal-hydraulics calculations to Monte Carlo reactor criticality calculations requires each axial layer of each fuel pin to be defined separately in the input to the Monte Carlo code in order to assign to each volume the temperature according to the result of the TH calculation, and if the volume contains coolant, also the density of the coolant. This leads to huge input files for even small systems. In this paper a methodology for dynamic assignment of temperatures with respect to cross-section data is demonstrated to overcome this problem. The method is implemented in MCNP5. The method is verified for an infinite lattice with 3x3 BWR-type fuel pins with fuel, cladding and moderator/coolant explicitly modeled. For each pin 60 axial zones are considered with different temperatures and coolant densities. The results of the axial power distribution per fuel pin are compared to a standard MCNP5 run in which all 9x60 cells for fuel, cladding and coolant are explicitly defined and their respective temperatures determined from the TH calculation. Full agreement is obtained. For large-scale application, the method is demonstrated for an infinite lattice with 17x17 PWR-type fuel assemblies with 25 rods replaced by guide tubes. Again, all geometrical detail is retained. The method was used in a procedure for coupled Monte Carlo and thermal-hydraulics iterations. Using an optimised iteration technique, convergence was obtained in 11 iteration steps.
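    One common way to avoid one material definition per temperature, sketched below, is to interpolate linearly in the square root of temperature between the two bracketing library temperatures (equivalently, to mix the two library evaluations with those weights). This is a generic illustration, not necessarily the mechanism implemented in the paper or in MCNP5.

```python
# Sketch of sqrt(T) interpolation between cross-section libraries processed at
# discrete temperatures; a common approximation, shown here for illustration.
import bisect, math

LIB_TEMPS = [300.0, 600.0, 900.0, 1200.0]     # library temperatures [K]

def sqrt_T_weights(T):
    """Return the bracketing library temperatures and their mixing weights for T."""
    i = min(max(bisect.bisect_right(LIB_TEMPS, T), 1), len(LIB_TEMPS) - 1)
    T_lo, T_hi = LIB_TEMPS[i - 1], LIB_TEMPS[i]
    w_hi = (math.sqrt(T) - math.sqrt(T_lo)) / (math.sqrt(T_hi) - math.sqrt(T_lo))
    return (T_lo, 1.0 - w_hi), (T_hi, w_hi)

# Example: a fuel cell computed by the TH code to be at 745 K
for T_lib, w in sqrt_T_weights(745.0):
    print(f"use library at {T_lib:.0f} K with weight {w:.3f}")
```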

  8. Assessment of radiation damage behaviour in a large collection of empirically optimized datasets highlights the importance of unmeasured complicating effects

    International Nuclear Information System (INIS)

    Krojer, Tobias; Delft, Frank von

    2011-01-01

    A retrospective analysis of radiation damage behaviour in a statistically significant number of real-life datasets is presented, in order to gauge the importance of the complications not yet measured or rigorously evaluated in current experiments, and the challenges that remain before radiation damage can be considered a problem solved in practice. The radiation damage behaviour in 43 datasets of 34 different proteins collected over a year was examined, in order to gauge the reliability of decay metrics in practical situations, and to assess how these datasets, optimized only empirically for decay, would have benefited from the precise and automatic prediction of decay now possible with the programs RADDOSE [Murray, Garman & Ravelli (2004). J. Appl. Cryst. 37, 513–522] and BEST [Bourenkov & Popov (2010). Acta Cryst. D66, 409–419]. The results indicate that in routine practice the diffraction experiment is not yet characterized well enough to support such precise predictions, as these depend fundamentally on three interrelated variables which cannot yet be determined robustly and practically: the flux density distribution of the beam; the exact crystal volume; the sensitivity of the crystal to dose. The former two are not satisfactorily approximated from typical beamline information such as nominal beam size and transmission, or two-dimensional images of the beam and crystal; the discrepancies are particularly marked when using microfocus beams (<20 µm). Empirically monitoring decay with the dataset scaling B factor (Bourenkov & Popov, 2010) appears more robust but is complicated by anisotropic and/or low-resolution diffraction. These observations serve to delineate the challenges, scientific and logistic, that remain to be addressed if tools for managing radiation damage in practical data collection are to be conveniently robust enough to be useful in real time.
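    A minimal sketch of the empirical decay monitoring mentioned above: fit a straight line to the per-batch relative scaling B factor against accumulated dose and use the slope as a crystal-specific sensitivity estimate. The numbers below are made up for illustration.

```python
# Sketch of an empirical decay metric: slope of the relative scaling B factor
# versus accumulated dose, plus a simple stopping rule. Illustrative data only.
import numpy as np

dose_MGy = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])   # accumulated dose per batch
rel_B    = np.array([0.1, 0.9, 2.2, 3.1, 4.3, 5.0])   # relative scale B factor [A^2]

slope, intercept = np.polyfit(dose_MGy, rel_B, 1)
print(f"B-factor increase: {slope:.2f} A^2/MGy")

# Simple stopping rule: flag the dataset once the fitted B exceeds a chosen limit.
B_LIMIT = 10.0
print("dose at which B reaches the limit: %.1f MGy" % ((B_LIMIT - intercept) / slope))
```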

  9. Evaluation of the Oh, Dubois and IEM Backscatter Models Using a Large Dataset of SAR Data and Experimental Soil Measurements

    Directory of Open Access Journals (Sweden)

    Mohammad Choker

    2017-01-01

    Full Text Available The aim of this paper is to evaluate the most used radar backscattering models (Integral Equation Model “IEM”, Oh, Dubois, and Advanced Integral Equation Model “AIEM”) using a wide dataset of SAR (Synthetic Aperture Radar) data and experimental soil measurements. These forward models reproduce the radar backscattering coefficients (σ0) from soil surface characteristics (dielectric constant, roughness) and SAR sensor parameters (radar wavelength, incidence angle, polarization). The analysis dataset is composed of AIRSAR, SIR-C, JERS-1, PALSAR-1, ESAR, ERS, RADARSAT, ASAR and TerraSAR-X data and in situ measurements (soil moisture and surface roughness). Results show that the Oh model version developed in 1992 gives the best fit of the backscattering coefficients in HH and VV polarizations, with RMSE values of 2.6 dB and 2.4 dB, respectively. Simulations performed with the Dubois model show a poor correlation between real data and model simulations in HH polarization (RMSE = 4.0 dB) and better correlation with real data in VV polarization (RMSE = 2.9 dB). The IEM and the AIEM simulate the backscattering coefficient with high RMSE when using a Gaussian correlation function. However, better simulations are performed with IEM and AIEM by using an exponential correlation function (slightly better fitting with AIEM than IEM). Good agreement was found between the radar data and the simulations using the calibrated version of the IEM modified by Baghdadi (IEM_B), with bias less than 1.0 dB and RMSE less than 2.0 dB. These results confirm that, up to date, the IEM modified by Baghdadi (IEM_B) is the most adequate to estimate soil moisture and roughness from SAR data.
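    The evaluation itself reduces to comparing simulated and observed backscattering coefficients per polarization; a sketch of the bias/RMSE computation, with placeholder values in dB, is shown below.

```python
# Sketch of the bias/RMSE evaluation of a forward backscatter model against SAR
# observations (values in dB; the arrays below are placeholders, not study data).
import numpy as np

def evaluate(sigma0_obs_db, sigma0_sim_db):
    diff = sigma0_sim_db - sigma0_obs_db
    return diff.mean(), np.sqrt(np.mean(diff ** 2))     # bias, RMSE in dB

obs_hh = np.array([-12.1, -9.8, -14.3, -11.0])
sim_hh = np.array([-10.5, -9.1, -12.8, -11.9])
bias, rmse = evaluate(obs_hh, sim_hh)
print(f"HH: bias = {bias:+.1f} dB, RMSE = {rmse:.1f} dB")
```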

  10. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  11. Large-scale groundwater modeling using global datasets: A test case for the Rhine-Meuse basin

    NARCIS (Netherlands)

    Sutanudjaja, E.H.; Beek, L.P.H. van; Jong, S.M. de; Geer, F.C. van; Bierkens, M.F.P.

    2011-01-01

    Large-scale groundwater models involving aquifers and basins of multiple countries are still rare due to a lack of hydrogeological data which are usually only available in developed countries. In this study, we propose a novel approach to construct large-scale groundwater models by using global

  12. Large-scale groundwater modeling using global datasets: a test case for the Rhine-Meuse basin

    NARCIS (Netherlands)

    Sutanudjaja, E.H.; Beek, L.P.H. van; Jong, S.M. de; Geer, F.C. van; Bierkens, M.F.P.

    2011-01-01

    The current generation of large-scale hydrological models does not include a groundwater flow component. Large-scale groundwater models, involving aquifers and basins of multiple countries, are still rare mainly due to a lack of hydro-geological data which are usually only available in

  13. Large-scale groundwater modeling using global datasets: A test case for the Rhine-Meuse basin

    NARCIS (Netherlands)

    Sutanudjaja, E.H.; Beek, L.P.H. van; Jong, S.M. de; Geer, F.C. van; Bierkens, M.F.P.

    2011-01-01

    The current generation of large-scale hydrological models does not include a groundwater flow component. Large-scale groundwater models, involving aquifers and basins of multiple countries, are still rare mainly due to a lack of hydro-geological data which are usually only available in developed

  14. Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset

    Directory of Open Access Journals (Sweden)

    Eisenhaber Frank

    2010-02-01

    Full Text Available Abstract Background Algorithms designed to predict protein disorder play an important role in structural and functional genomics, as disordered regions have been reported to participate in important cellular processes. Consequently, several methods with different underlying principles for disorder prediction have been independently developed by various groups. For assessing their usability in automated workflows, we are interested in identifying parameter settings and threshold selections, under which the performance of these predictors becomes directly comparable. Results First, we derived a new benchmark set that accounts for different flavours of disorder complemented with a similar amount of order annotation derived for the same protein set. We show that, using the recommended default parameters, the programs tested are producing a wide range of predictions at different levels of specificity and sensitivity. We identify settings, in which the different predictors have the same false positive rate. We assess conditions when sets of predictors can be run together to derive consensus or complementary predictions. This is useful in the framework of proteome-wide applications where high specificity is required such as in our in-house sequence analysis pipeline and the ANNIE webserver. Conclusions This work identifies parameter settings and thresholds for a selection of disorder predictors to produce comparable results at a desired level of specificity over a newly derived benchmark dataset that accounts equally for ordered and disordered regions of different lengths.
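    As a sketch of how per-predictor thresholds can be matched to a common false positive rate on a labelled benchmark, the snippet below computes ROC curves for two synthetic predictors and picks, for each, the first operating point at the target FPR. The scores and labels are synthetic stand-ins for real benchmark data.

```python
# Sketch: choose per-predictor thresholds that operate at (approximately) the same
# false positive rate on an order/disorder benchmark. Synthetic data for illustration.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 500)                       # 1 = disordered residue
scores = {"predA": labels + rng.normal(0, 0.8, 500),   # two fake predictors with
          "predB": labels + rng.normal(0, 1.5, 500)}   # different score scales

TARGET_FPR = 0.05
for name, s in scores.items():
    fpr, tpr, thr = roc_curve(labels, s)
    i = np.searchsorted(fpr, TARGET_FPR)               # first point reaching the target FPR
    print(f"{name}: threshold={thr[i]:.2f}  FPR={fpr[i]:.3f}  TPR={tpr[i]:.3f}")
```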

  15. An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

    KAUST Repository

    Rodriguez, Jesse M.; Batzoglou, Serafim; Bercovici, Sivan

    2013-01-01

    , accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required. We present PARENTE, a novel method for detecting

  16. Modification of input datasets for the Ensemble Streamflow Prediction based on large scale climatic indices and weather generator

    Czech Academy of Sciences Publication Activity Database

    Šípek, Václav; Daňhelka, J.

    2015-01-01

    Vol. 528, September (2015), pp. 720-733 ISSN 0022-1694 Institutional support: RVO:67985874 Keywords: seasonal forecasting * ESP * large-scale climate * weather generator Subject RIV: DA - Hydrology ; Limnology Impact factor: 3.043, year: 2015

  17. Facing the Challenges of Accessing, Managing, and Integrating Large Observational Datasets in Ecology: Enabling and Enriching the Use of NEON's Observational Data

    Science.gov (United States)

    Thibault, K. M.

    2013-12-01

    As the construction of NEON and its transition to operations progresses, more and more data will become available to the scientific community, both from NEON directly and from the concomitant growth of existing data repositories. Many of these datasets include ecological observations of a diversity of taxa in both aquatic and terrestrial environments. Although observational data have been collected and used throughout the history of organismal biology, the field has not yet fully developed a culture of data management, documentation, standardization, sharing and discoverability to facilitate the integration and synthesis of datasets. Moreover, the tools required to accomplish these goals, namely database design, implementation, and management, and automation and parallelization of analytical tasks through computational techniques, have not historically been included in biology curricula, at either the undergraduate or graduate levels. To ensure the success of data-generating projects like NEON in advancing organismal ecology and to increase transparency and reproducibility of scientific analyses, an acceleration of the cultural shift to open science practices, the development and adoption of data standards, such as the DarwinCore standard for taxonomic data, and increased training in computational approaches for biologists need to be realized. Here I highlight several initiatives that are intended to increase access to and discoverability of publicly available datasets and equip biologists and other scientists with the skills that are needed to manage, integrate, and analyze data from multiple large-scale projects. The EcoData Retriever (ecodataretriever.org) is a tool that downloads publicly available datasets, re-formats the data into an efficient relational database structure, and then automatically imports the data tables onto a user's local drive into the database tool of the user's choice. The automation of these tasks results in nearly instantaneous execution

  18. Impacts of a lengthening open water season on Alaskan coastal communities: deriving locally relevant indices from large-scale datasets and community observations

    Science.gov (United States)

    Rolph, Rebecca J.; Mahoney, Andrew R.; Walsh, John; Loring, Philip A.

    2018-05-01

    Using thresholds of physical climate variables developed from community observations, together with two large-scale datasets, we have produced local indices directly relevant to the impacts of a reduced sea ice cover on Alaska coastal communities. The indices include the number of false freeze-ups defined by transient exceedances of ice concentration prior to a corresponding exceedance that persists, false break-ups, timing of freeze-up and break-up, length of the open water duration, number of days when the winds preclude hunting via boat (wind speed threshold exceedances), the number of wind events conducive to geomorphological work or damage to infrastructure from ocean waves, and the number of these wind events with on- and along-shore components promoting water setup along the coastline. We demonstrate how community observations can inform use of large-scale datasets to derive these locally relevant indices. The two primary large-scale datasets are the Historical Sea Ice Atlas for Alaska and the atmospheric output from a regional climate model used to downscale the ERA-Interim atmospheric reanalysis. We illustrate the variability and trends of these indices by application to the rural Alaska communities of Kotzebue, Shishmaref, and Utqiaġvik (previously Barrow), although the same procedure and metrics can be applied to other coastal communities. Over the 1979-2014 time period, there has been a marked increase in the number of combined false freeze-ups and false break-ups as well as the number of days too windy for hunting via boat for all three communities, especially Utqiaġvik. At Utqiaġvik, there has been an approximate tripling of the number of wind events conducive to coastline erosion from 1979 to 2014. We have also found a delay in freeze-up and earlier break-up, leading to a lengthened open water period for all of the communities examined.
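    Two of the indices described above can be derived from daily series by simple threshold logic; the sketch below counts days too windy for boat-based hunting and "false freeze-ups" (transient exceedances of an ice-concentration threshold before the persistent one). The thresholds and data are illustrative, not those used in the study.

```python
# Sketch of two locally relevant indices from daily time series: windy-day counts
# and false freeze-ups. Thresholds and data are illustrative placeholders.
import numpy as np

WIND_LIMIT = 10.0       # m/s, hypothetical boat-hunting wind threshold
ICE_THRESH = 0.15       # ice concentration threshold for "freeze-up"
PERSIST_DAYS = 14       # an exceedance must last this long to count as the freeze-up

rng = np.random.default_rng(2)
wind = rng.gamma(2.0, 4.0, 365)                                       # fake daily wind [m/s]
ice = np.clip(np.linspace(-0.3, 1.0, 365) + rng.normal(0, 0.08, 365), 0, 1)

windy_days = int((wind > WIND_LIMIT).sum())

def false_freeze_ups(conc):
    """Count exceedance runs that end before the first persistent exceedance."""
    above, runs, start = conc > ICE_THRESH, [], None
    for day, flag in enumerate(above):
        if flag and start is None:
            start = day
        elif not flag and start is not None:
            runs.append(day - start); start = None
    if start is not None:
        runs.append(len(above) - start)
    count = 0
    for length in runs:
        if length >= PERSIST_DAYS:
            break
        count += 1
    return count

print(f"days too windy for hunting: {windy_days}")
print(f"false freeze-ups: {false_freeze_ups(ice)}")
```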

  19. Impacts of a lengthening open water season on Alaskan coastal communities: deriving locally relevant indices from large-scale datasets and community observations

    Directory of Open Access Journals (Sweden)

    R. J. Rolph

    2018-05-01

    Full Text Available Using thresholds of physical climate variables developed from community observations, together with two large-scale datasets, we have produced local indices directly relevant to the impacts of a reduced sea ice cover on Alaska coastal communities. The indices include the number of false freeze-ups defined by transient exceedances of ice concentration prior to a corresponding exceedance that persists, false break-ups, timing of freeze-up and break-up, length of the open water duration, number of days when the winds preclude hunting via boat (wind speed threshold exceedances), the number of wind events conducive to geomorphological work or damage to infrastructure from ocean waves, and the number of these wind events with on- and along-shore components promoting water setup along the coastline. We demonstrate how community observations can inform use of large-scale datasets to derive these locally relevant indices. The two primary large-scale datasets are the Historical Sea Ice Atlas for Alaska and the atmospheric output from a regional climate model used to downscale the ERA-Interim atmospheric reanalysis. We illustrate the variability and trends of these indices by application to the rural Alaska communities of Kotzebue, Shishmaref, and Utqiaġvik (previously Barrow), although the same procedure and metrics can be applied to other coastal communities. Over the 1979–2014 time period, there has been a marked increase in the number of combined false freeze-ups and false break-ups as well as the number of days too windy for hunting via boat for all three communities, especially Utqiaġvik. At Utqiaġvik, there has been an approximate tripling of the number of wind events conducive to coastline erosion from 1979 to 2014. We have also found a delay in freeze-up and earlier break-up, leading to a lengthened open water period for all of the communities examined.

  20. Modification of input datasets for the Ensemble Streamflow Prediction based on large scale climatic indices and weather generator

    Czech Academy of Sciences Publication Activity Database

    Šípek, Václav; Daňhelka, J.

    2015-01-01

    Vol. 528, September (2015), pp. 720-733 ISSN 0022-1694 Institutional support: RVO:67985874 Keywords: seasonal forecasting * ESP * large-scale climate * weather generator Subject RIV: DA - Hydrology ; Limnology Impact factor: 3.043, year: 2015

  1. A frequency-domain implementation of a sliding-window traffic sign detector for large scale panoramic datasets

    NARCIS (Netherlands)

    Creusen, I.M.; Hazelhoff, L.; With, de P.H.N.

    2013-01-01

    In large-scale automatic traffic sign surveying systems, the primary computational effort is concentrated at the traffic sign detection stage. This paper focuses on reducing the computational load of particularly the sliding window object detection algorithm which is employed for traffic sign
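    The core idea of a frequency-domain sliding-window detector is that evaluating a linear template at every window position is a correlation, which can be computed for the whole image at once via the FFT. The sketch below is a generic illustration of that equivalence, not the authors' detector.

```python
# Sketch of FFT-based sliding-window scoring: correlating a learned linear template
# with the whole image in one pass. Random data stand in for image and template.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(3)
image = rng.normal(size=(480, 640))
template = rng.normal(size=(32, 32))                    # stand-in for a learned filter

# correlation = convolution with the flipped template; 'valid' gives one score per window
scores = fftconvolve(image, template[::-1, ::-1], mode="valid")
y, x = np.unravel_index(np.argmax(scores), scores.shape)
print(f"best-scoring window at (row={y}, col={x}), score={scores[y, x]:.1f}")
```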

  2. Retrospective analysis of cohort database: Phenotypic variability in a large dataset of patients confirmed to have homozygous familial hypercholesterolemia

    NARCIS (Netherlands)

    Raal, Frederick J.; Sjouke, Barbara; Hovingh, G. Kees; Isaac, Barton F.

    2016-01-01

    These data describe the phenotypic variability in a large cohort of patients confirmed to have homozygous familial hypercholesterolemia. Herein, we describe the observed relationship of treated low-density lipoprotein cholesterol with age. We also overlay the low-density lipoprotein receptor gene

  3. Modelling aggregation on the large scale and regularity on the small scale in spatial point pattern datasets

    DEFF Research Database (Denmark)

    Lavancier, Frédéric; Møller, Jesper

    We consider a dependent thinning of a regular point process with the aim of obtaining aggregation on the large scale and regularity on the small scale in the resulting target point process of retained points. Various parametric models for the underlying processes are suggested and the properties...

  4. Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets.

    Science.gov (United States)

    Brown, Adrian P; Borgs, Christian; Randall, Sean M; Schnell, Rainer

    2017-06-08

    Integrating medical data using databases from different sources by record linkage is a powerful technique increasingly used in medical research. Under many jurisdictions, unique personal identifiers needed for linking the records are unavailable. Since sensitive attributes, such as names, have to be used instead, privacy regulations usually demand encrypting these identifiers. The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters. Due to superior resilience against cryptographic attacks, composite Bloom filters (cryptographic long-term keys, CLKs) are considered best practice for privacy in PPRL. Real-world performance of these techniques using large-scale data is unknown up to now. Using a large subset of Australian hospital admission data, we tested the performance of an innovative PPRL technique (CLKs using multibit trees) against a gold-standard derived from clear-text probabilistic record linkage. Linkage time and linkage quality (recall, precision and F-measure) were evaluated. Clear text probabilistic linkage resulted in marginally higher precision and recall than CLKs. PPRL required more computing time but 5 million records could still be de-duplicated within one day. However, the PPRL approach required fine tuning of parameters. We argue that the increased privacy of PPRL comes at the price of small losses in precision and recall and a large increase in computational burden and setup time. These costs seem to be acceptable in most applied settings, but they have to be considered in the decision to apply PPRL. Further research on the optimal automatic choice of parameters is needed.
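    A minimal sketch of the Bloom-filter encoding underlying cryptographic long-term keys may clarify the approach: character bigrams of the identifiers are hashed with several keyed hash functions into one bit array per record, and candidate pairs are compared via the Dice coefficient of their bit arrays. The parameters below are illustrative, not those used in the evaluation.

```python
# Sketch of a CLK-style encoding: keyed hashes of name bigrams set bits in a Bloom
# filter; similarity is the Dice coefficient of two filters. Illustrative parameters.
import hashlib, hmac

BITS, K, SECRET = 1024, 20, b"shared-linkage-secret"

def bigrams(value):
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def clk(*fields):
    bloom = [0] * BITS
    for field in fields:
        for gram in bigrams(field):
            for k in range(K):
                digest = hmac.new(SECRET, f"{k}|{gram}".encode(), hashlib.sha256).digest()
                bloom[int.from_bytes(digest[:4], "big") % BITS] = 1
    return bloom

def dice(a, b):
    inter = sum(x & y for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b))

print(f"Dice(similar names)   = {dice(clk('john', 'smith'), clk('jon', 'smith')):.2f}")
print(f"Dice(different names) = {dice(clk('john', 'smith'), clk('maria', 'garcia')):.2f}")
```

    In practice a similarity threshold on the Dice coefficient decides whether two encoded records are linked, which is one of the parameters the study reports as needing fine tuning.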

  5. An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

    KAUST Repository

    Rodriguez, Jesse M.

    2013-01-01

    Studies that map disease genes rely on accurate annotations that indicate whether individuals in the studied cohorts are related to each other or not. For example, in genome-wide association studies, the cohort members are assumed to be unrelated to one another. Investigators can correct for individuals in a cohort with previously-unknown shared familial descent by detecting genomic segments that are shared between them, which are considered to be identical by descent (IBD). Alternatively, elevated frequencies of IBD segments near a particular locus among affected individuals can be indicative of a disease-associated gene. As genotyping studies grow to use increasingly large sample sizes and meta-analyses begin to include many data sets, accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required. We present PARENTE, a novel method for detecting related pairs of individuals and shared haplotypic segments within these pairs. PARENTE is a computationally-efficient method based on an embedded likelihood ratio test. As demonstrated by the results of our simulations, our method exhibits better accuracy than the current state of the art, and can be used for the analysis of large genotyped cohorts. PARENTE's higher accuracy becomes even more significant in more challenging scenarios, such as detecting shorter IBD segments or when an extremely low false-positive rate is required. PARENTE is publicly and freely available at http://parente.stanford.edu/. © 2013 Springer-Verlag.

  6. Retrospective analysis of cohort database: Phenotypic variability in a large dataset of patients confirmed to have homozygous familial hypercholesterolemia

    Directory of Open Access Journals (Sweden)

    Frederick J. Raal

    2016-06-01

    Full Text Available These data describe the phenotypic variability in a large cohort of patients confirmed to have homozygous familial hypercholesterolemia. Herein, we describe the observed relationship of treated low-density lipoprotein cholesterol with age. We also overlay the low-density lipoprotein receptor gene (LDLR) functional status with these phenotypic data. A full description of these data is available in our recent study published in Atherosclerosis, “Phenotype Diversity Among Patients With Homozygous Familial Hypercholesterolemia: A Cohort Study” (Raal et al., 2016 [1]).

  7. The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

    Science.gov (United States)

    Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

    2012-03-15

    Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. In particular, database interfaces with transcriptome analysis modules that go beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.

  8. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  9. Towards large-scale mapping of urban three-dimensional structure using Landsat imagery and global elevation datasets

    Science.gov (United States)

    Wang, P.; Huang, C.

    2017-12-01

    The three-dimensional (3D) structure of buildings and infrastructures is fundamental to understanding and modelling of the impacts and challenges of urbanization in terms of energy use, carbon emissions, and earthquake vulnerabilities. However, spatially detailed maps of urban 3D structure have been scarce, particularly in fast-changing developing countries. We present here a novel methodology to map the volume of buildings and infrastructures at 30 meter resolution using a synergy of Landsat imagery and openly available global digital surface models (DSMs), including the Shuttle Radar Topography Mission (SRTM), ASTER Global Digital Elevation Map (GDEM), ALOS World 3D - 30m (AW3D30), and the recently released global DSM from the TanDEM-X mission. Our method builds on the concept of object-based height profile to extract height metrics from the DSMs and use a machine learning algorithm to predict height and volume from the height metrics. We have tested this algorithm across the whole of England and assessed our results using Lidar measurements in 25 English cities. Our initial assessments achieved an RMSE of 1.4 m (R2 = 0.72) for building height and an RMSE of 1208.7 m3 (R2 = 0.69) for building volume, demonstrating the potential of large-scale applications and fully automated mapping of urban structure.
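    The regression step can be sketched as follows: object-level height metrics derived from a DSM are used to predict reference (Lidar) building height with a machine-learning regressor. The features, data and model below are synthetic placeholders; the study's actual metrics and algorithm may differ.

```python
# Sketch of regressing building height from DSM-derived height metrics.
# Synthetic features and targets; a random forest stands in for the learned model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 2000
true_height = rng.uniform(3, 30, n)                     # "Lidar" reference height [m]
# fake DSM-derived metrics: noisy fractions of the true height per 30 m cell
X = np.column_stack([true_height * f + rng.normal(0, 2, n) for f in (0.5, 0.8, 1.0)])

X_tr, X_te, y_tr, y_te = train_test_split(X, true_height, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
rmse = np.sqrt(np.mean((pred - y_te) ** 2))
print(f"height RMSE on held-out cells: {rmse:.2f} m")
```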

  10. Lunar Mapping and Modeling On-the-Go: A mobile framework for viewing and interacting with large geospatial datasets

    Science.gov (United States)

    Chang, G.; Kim, R.; Bui, B.; Sadaqathullah, S.; Law, E.; Malhotra, S.

    2012-12-01

    bookmark those layers for quick access in subsequent sessions. A search tool is also provided to allow users to quickly find points of interests on the moon and to view the auxiliary data associated with that feature. More advanced features include the ability to interact with the data. Using the services provided by the portal, users will be able to log in and access the same scientific analysis tools provided on the web site including measuring between two points, generating subsets, and running other analysis tools, all by using a customized touch interface that is immediately familiar to users of these smart mobile devices. Users can also access their own storage on the portal and view or send the data to other users. Finally, there are features that will utilize functionality that can only be enabled by mobile devices. This includes the use of the gyroscopes and motion sensors to provide a haptic interface to visualize lunar data in 3D, on the device as well as potentially on a large screen. The mobile framework that we have developed for LMMP provides a glimpse of what is possible in visualizing and manipulating large geospatial data on small portable devices. While the framework is currently tuned to our portal, we hope that we can generalize the tool to use data sources from any type of GIS service.

  11. Automatic reduction of large X-ray fluorescence data-sets applied to XAS and mapping experiments

    International Nuclear Information System (INIS)

    Martin Montoya, Ligia Andrea

    2017-02-01

    In this thesis two automatic methods for the reduction of large fluorescence data sets are presented. The first method is proposed in the framework of BioXAS experiments. The challenge of this experiment is to deal with samples in ultra-dilute concentrations where the signal-to-background ratio is low. The experiment is performed in fluorescence-mode X-ray absorption spectroscopy with a 100-pixel high-purity Ge detector. The first step consists of reducing 100 fluorescence spectra into one. In this step, outliers are identified by means of the shot noise. Furthermore, a fitting routine whose model includes Gaussian functions for the fluorescence lines and exponentially modified Gaussian (EMG) functions for the scattering lines (with long tails at lower energies) is proposed to extract the line of interest from the fluorescence spectrum. Additionally, the fitting model has an EMG function for each scattering line (elastic and inelastic) at incident energies where they start to be discerned. At these energies, the data reduction is done per detector column to include the angular dependence of scattering. In the second part of this thesis, an automatic method for texts separation on palimpsests is presented. Scanning X-ray fluorescence is performed on the parchment, where a spectrum per scanned point is collected. Within this method, each spectrum is treated as a vector forming a basis which is to be transformed so that the basis vectors are the spectra of each ink. Principal Component Analysis is employed as an initial guess of the sought basis. This basis is further transformed by means of an optimization routine that maximizes the contrast and minimizes the non-negative entries in the spectra. The method is tested on original and self-made palimpsests.
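    A sketch of the fitting model for the first method: a Gaussian for the fluorescence line of interest plus an exponentially modified Gaussian (EMG) for a scattering line with a long low-energy tail, fitted with SciPy on synthetic data. Depending on the convention used, the EMG tail may need to be mirrored, as done here; all numerical values are illustrative.

```python
# Sketch: fit Gaussian (fluorescence line) + mirrored EMG (scatter line with a
# low-energy tail) to a synthetic spectrum. Parameters are illustrative only.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import exponnorm

def gauss(x, a, mu, sig):
    return a * np.exp(-0.5 * ((x - mu) / sig) ** 2)

def emg_low_tail(x, a, mu, sig, K):
    # exponnorm's exponential tail points towards +x; mirror it for a low-energy tail
    return a * exponnorm.pdf(-x, K, loc=-mu, scale=sig)

def model(x, a1, mu1, s1, a2, mu2, s2, K):
    return gauss(x, a1, mu1, s1) + emg_low_tail(x, a2, mu2, s2, K)

x = np.linspace(5.0, 12.0, 700)                          # energy axis [keV]
rng = np.random.default_rng(5)
y = model(x, 120, 6.40, 0.08, 80, 10.5, 0.12, 3.0) + rng.normal(0, 1.0, x.size)

p0 = [100, 6.3, 0.1, 60, 10.4, 0.15, 2.0]
popt, _ = curve_fit(model, x, y, p0=p0, bounds=(0, np.inf))
print("fitted fluorescence line position: %.3f keV" % popt[1])
```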

  12. Extracting Prior Distributions from a Large Dataset of In-Situ Measurements to Support SWOT-based Estimation of River Discharge

    Science.gov (United States)

    Hagemann, M.; Gleason, C. J.

    2017-12-01

    The upcoming (2021) Surface Water and Ocean Topography (SWOT) NASA satellite mission aims, in part, to estimate discharge on major rivers worldwide using reach-scale measurements of stream width, slope, and height. Current formalizations of channel and floodplain hydraulics are insufficient to fully constrain this problem mathematically, resulting in an infinitely large solution set for any set of satellite observations. Recent work has reformulated this problem in a Bayesian statistical setting, in which the likelihood distributions derive directly from hydraulic flow-law equations. When coupled with prior distributions on unknown flow-law parameters, this formulation probabilistically constrains the parameter space, and results in a computationally tractable description of discharge. Using a curated dataset of over 200,000 in-situ acoustic Doppler current profiler (ADCP) discharge measurements from over 10,000 USGS gaging stations throughout the United States, we developed empirical prior distributions for flow-law parameters that are not observable by SWOT, but that are required in order to estimate discharge. This analysis quantified prior uncertainties on quantities including cross-sectional area, at-a-station hydraulic geometry width exponent, and discharge variability, which depend on SWOT-observable variables including reach-scale statistics of width and height. When compared against discharge estimation approaches that do not use this prior information, the Bayesian approach using ADCP-derived priors demonstrated consistently improved performance across a range of performance metrics. This Bayesian approach formally transfers information from in-situ gaging stations to remote-sensed estimation of discharge, in which the desired quantities are not directly observable. Further investigation using large in-situ datasets is therefore a promising way forward in improving satellite-based estimates of river discharge.
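    The prior-construction step can be sketched as fitting a parametric distribution to station-level values of a quantity that SWOT cannot observe directly, such as cross-sectional area, and reusing the fitted parameters in the Bayesian estimator. The lognormal choice and the data below are assumptions for illustration only.

```python
# Sketch: fit a lognormal prior to (synthetic) station-level cross-sectional areas,
# then evaluate the prior density inside a Bayesian discharge estimator.
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(6)
area_m2 = rng.lognormal(mean=5.0, sigma=1.2, size=5000)    # fake per-station areas [m^2]

shape, loc, scale = lognorm.fit(area_m2, floc=0)           # fit with zero location
print(f"lognormal prior: sigma={shape:.2f}, median={scale:.0f} m^2")
print("prior density at 100 m^2:", lognorm.pdf(100.0, shape, loc, scale))
```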

  13. Contribution of Road Grade to the Energy Use of Modern Automobiles Across Large Datasets of Real-World Drive Cycles: Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Wood, E.; Burton, E.; Duran, A.; Gonder, J.

    2014-01-01

    Understanding the real-world power demand of modern automobiles is of critical importance to engineers using modeling and simulation to inform the intelligent design of increasingly efficient powertrains. Increased use of global positioning system (GPS) devices has made large scale data collection of vehicle speed (and associated power demand) a reality. While the availability of real-world GPS data has improved the industry's understanding of in-use vehicle power demand, relatively little attention has been paid to the incremental power requirements imposed by road grade. This analysis quantifies the incremental efficiency impacts of real-world road grade by appending high fidelity elevation profiles to GPS speed traces and performing a large simulation study. Employing a large real-world dataset from the National Renewable Energy Laboratory's Transportation Secure Data Center, vehicle powertrain simulations are performed with and without road grade under five vehicle models. Aggregate results of this study suggest that road grade could be responsible for 1% to 3% of fuel use in light-duty automobiles.
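    A back-of-the-envelope calculation shows why grade matters: the extra power needed to climb a grade is m·g·v·sin(θ), which at highway speed is comparable to aerodynamic and rolling losses. The vehicle parameters below are generic mid-size-car values, not those of the five simulated vehicle models.

```python
# Sketch: compare the grade-climbing power demand with aerodynamic and rolling
# losses for a generic mid-size car at highway speed. Illustrative parameters.
import math

m, g, v = 1600.0, 9.81, 27.0        # mass [kg], gravity [m/s^2], speed [m/s] (~60 mph)
cd_a, rho, crr = 0.68, 1.2, 0.009   # drag area Cd*A [m^2], air density, rolling coeff.
grade = 0.03                        # 3 % road grade

p_aero = 0.5 * rho * cd_a * v**3 / 1000.0                  # kW
p_roll = crr * m * g * v / 1000.0                          # kW
p_grade = m * g * v * math.sin(math.atan(grade)) / 1000.0  # kW
print(f"aero {p_aero:.1f} kW, rolling {p_roll:.1f} kW, 3% grade {p_grade:.1f} kW")
```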

  14. Building and calibrating a large-extent and high resolution coupled groundwater-land surface model using globally available data-sets

    Science.gov (United States)

    Sutanudjaja, E. H.; Van Beek, L. P.; de Jong, S. M.; van Geer, F.; Bierkens, M. F.

    2012-12-01

    The current generation of large-scale hydrological models generally lacks a groundwater model component simulating lateral groundwater flow. Large-scale groundwater models are rare due to a lack of hydro-geological data required for their parameterization and a lack of groundwater head data required for their calibration. In this study, we propose an approach to develop a large-extent fully-coupled land surface-groundwater model by using globally available datasets and calibrate it using a combination of discharge observations and remotely-sensed soil moisture data. The underlying objective is to devise a collection of methods that enables one to build and parameterize large-scale groundwater models in data-poor regions. The model used, PCR-GLOBWB-MOD, has a spatial resolution of 1 km x 1 km and operates on a daily basis. It consists of a single-layer MODFLOW groundwater model that is dynamically coupled to the PCR-GLOBWB land surface model. This fully-coupled model accommodates two-way interactions between surface water levels and groundwater head dynamics, as well as between upper soil moisture states and groundwater levels, including a capillary rise mechanism to sustain upper soil storage and thus to fulfill high evaporation demands (during dry conditions). As a test bed, we used the Rhine-Meuse basin, where more than 4000 groundwater head time series have been collected for validation purposes. The model was parameterized using globally available data-sets on surface elevation, drainage direction, land-cover, soil and lithology. Next, the model was calibrated using a brute force approach and massive parallel computing, i.e. by running the coupled groundwater-land surface model for more than 3000 different parameter sets. Here, we varied minimal soil moisture storage and saturated conductivities of the soil layers as well as aquifer transmissivities. Using different regularization strategies and calibration criteria we compared three calibration scenarios

  15. The development of the Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS): a large-scale data sharing initiative.

    Science.gov (United States)

    Lutomski, Jennifer E; Baars, Maria A E; Schalk, Bianca W M; Boter, Han; Buurman, Bianca M; den Elzen, Wendy P J; Jansen, Aaltje P D; Kempen, Gertrudis I J M; Steunenberg, Bas; Steyerberg, Ewout W; Olde Rikkert, Marcel G M; Melis, René J F

    2013-01-01

    In 2008, the Ministry of Health, Welfare and Sport commissioned the National Care for the Elderly Programme. While numerous research projects in older persons' health care were to be conducted under this national agenda, the Programme further advocated the development of The Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS) which would be integrated into all funded research protocols. In this context, we describe TOPICS data sharing initiative (www.topics-mds.eu). A working group drafted TOPICS-MDS prototype, which was subsequently approved by a multidisciplinary panel. Using instruments validated for older populations, information was collected on demographics, morbidity, quality of life, functional limitations, mental health, social functioning and health service utilisation. For informal caregivers, information was collected on demographics, hours of informal care and quality of life (including subjective care-related burden). Between 2010 and 2013, a total of 41 research projects contributed data to TOPICS-MDS, resulting in preliminary data available for 32,310 older persons and 3,940 informal caregivers. The majority of studies sampled were from primary care settings and inclusion criteria differed across studies. TOPICS-MDS is a public data repository which contains essential data to better understand health challenges experienced by older persons and informal caregivers. Such findings are relevant for countries where increasing health-related expenditure has necessitated the evaluation of contemporary health care delivery. Although open sharing of data can be difficult to achieve in practice, proactively addressing issues of data protection, conflicting data analysis requests and funding limitations during TOPICS-MDS developmental phase has fostered a data sharing culture. To date, TOPICS-MDS has been successfully incorporated into 41 research projects, thus supporting the feasibility of constructing a large (>30,000 observations

  16. Assessment of the effects and limitations of the 1998 to 2008 Abbreviated Injury Scale map using a large population-based dataset

    Directory of Open Access Journals (Sweden)

    Franklyn Melanie

    2011-01-01

    Full Text Available Abstract Background Trauma systems should consistently monitor a given trauma population over a period of time. The Abbreviated Injury Scale (AIS) and derived scores such as the Injury Severity Score (ISS) are commonly used to quantify injury severities in trauma registries. To reflect contemporary trauma management and treatment, the most recent version of the AIS (AIS08) contains many codes which differ in severity from their equivalents in the earlier 1998 version (AIS98). Consequently, the adoption of AIS08 may impede comparisons between data coded using different AIS versions. It may also affect the number of patients classified as major trauma. Methods The entire AIS98-coded injury dataset of a large population-based trauma registry was retrieved and mapped to AIS08 using the currently available AIS98-AIS08 dictionary map. The percentage of codes which had increased or decreased in severity, or could not be mapped, was examined in conjunction with the effect of these changes to the calculated ISS. The potential for free text information accompanying AIS coding to improve the quality of AIS mapping was explored. Results A total of 128280 AIS98-coded injuries were evaluated in 32134 patients, 15471 of whom were classified as major trauma. Although only 4.5% of dictionary codes decreased in severity from AIS98 to AIS08, this represented almost 13% of injuries in the registry. In 4.9% of patients, no injuries could be mapped. ISS was potentially unreliable in one-third of patients, as they had at least one AIS98 code which could not be mapped. Using AIS08, the number of patients classified as major trauma decreased by between 17.3% and 30.3%. Evaluation of free text descriptions for some injuries demonstrated the potential to improve mapping between AIS versions. Conclusions Converting AIS98-coded data to AIS08 results in a significant decrease in the number of patients classified as major trauma. Many AIS98 codes are missing from the
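    Recomputing severity after re-mapping hinges on the standard ISS definition: the sum of squares of the highest AIS severity in each of the three most severely injured of six body regions, with any AIS severity of 6 setting the ISS to 75. A short sketch of that calculation is given below.

```python
# Sketch of the standard Injury Severity Score (ISS) calculation from AIS-coded injuries.
def iss(injuries):
    """injuries: iterable of (body_region, ais_severity) pairs, regions coded 1-6."""
    worst = {}
    for region, severity in injuries:
        worst[region] = max(worst.get(region, 0), severity)
    if 6 in worst.values():          # any unsurvivable injury sets ISS to 75
        return 75
    top3 = sorted(worst.values(), reverse=True)[:3]
    return sum(s * s for s in top3)

# Example: head AIS 4, chest AIS 3, lower extremity AIS 2  ->  ISS = 16 + 9 + 4 = 29
print(iss([(1, 4), (4, 3), (5, 2)]))
```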

  17. 'Dip-sticks' calibration handles self-attenuation and coincidence effects in large-volume gamma-ray spectrometry

    CERN Document Server

    Wolterbeek, H T

    2000-01-01

    Routine gamma-spectrometric analyses of samples with low-level activities (e.g. food, water, environmental and industrial samples) are often performed on large samples placed close to the detector. In these geometries, detection sensitivity is improved but large errors are introduced due to self-attenuation and coincidence summing. Current approaches to these problems comprise computational methods and spiked standard materials. However, the former are often regarded as too complex for practical routine use, while the latter never fully match real samples. In the present study, we introduce a dip-sticks calibration as a fast and easy practical solution to this quantification problem in a routine analytical setting. In the proposed set-up, calibrations are performed within the sample itself, thus making it a broadly accessible matching-reference approach, which is in principle usable for all sample matrices.

  18. Insights into social disparities in smoking prevalence using Mosaic, a novel measure of socioeconomic status: an analysis using a large primary care dataset

    Directory of Open Access Journals (Sweden)

    Szatkowski Lisa

    2010-12-01

    Full Text Available Abstract Background There are well-established socio-economic differences in the prevalence of smoking in the UK, but conventional socio-economic measures may not capture the range and degree of these associations. We have used a commercial geodemographic profiling system, Mosaic, to explore associations with smoking prevalence in a large primary care dataset and to establish whether this tool provides new insights into socio-economic determinants of smoking. Methods We analysed anonymised data on over 2 million patients from The Health Improvement Network (THIN) database, linked via patients' postcodes to Mosaic classifications (11 groups and 61 types) and quintiles of Townsend Index of Multiple Deprivation. Patients' current smoking status was identified using Read Codes, and logistic regression was used to explore the associations between the available measures of socioeconomic status and smoking prevalence. Results As anticipated, smoking prevalence increased with increasing deprivation according to the Townsend Index (age and sex adjusted OR for highest vs lowest quintile 2.96, 95% CI 2.92-2.99). There were more marked differences in prevalence across Mosaic groups (OR for group G vs group A 4.41, 95% CI 4.33-4.49). Across the 61 Mosaic types, smoking prevalence varied from 8.6% to 42.7%. Mosaic types with high smoking prevalence were characterised by relative deprivation, but also more specifically by single-parent households living in public rented accommodation in areas with little community support, having no access to a car, few qualifications and high TV viewing behaviour. Conclusion Conventional socio-economic measures may underplay social disparities in smoking prevalence. Newer classification systems, such as Mosaic, encompass a wider range of demographic, lifestyle and behaviour data, and are valuable in identifying characteristics of groups of heavy smokers which might be used to tailor cessation interventions.

  19. Geostatistical and multivariate modelling for large scale quantitative mapping of seafloor sediments using sparse datasets, a case study from the Cleaverbank area (the Netherlands)

    NARCIS (Netherlands)

    Alevizos, Evangelos; Siemes, K.; Janmaat, J.; Snellen, M.; Simons, D.G.; Greinert, J

    2016-01-01

    Quantitative mapping of seafloor sediment properties (eg. grain size) requires the input of comprehensive Multi-Beam Echo Sounder (MBES) datasets along with adequate ground truth for establishing a functional relation between them. MBES surveys in extensive shallow shelf areas can be a rather

  20. Passive technologies for future large-scale photonic integrated circuits on silicon: polarization handling, light non-reciprocity and loss reduction

    Directory of Open Access Journals (Sweden)

    Daoxin Dai

    2012-03-01

    Full Text Available Silicon-based large-scale photonic integrated circuits are becoming important, due to the need for higher complexity and lower cost for optical transmitters, receivers and optical buffers. In this paper, passive technologies for large-scale photonic integrated circuits are described, including polarization handling, light non-reciprocity and loss reduction. The design rule for polarization beam splitters based on asymmetrical directional couplers is summarized and several novel designs for ultra-short polarization beam splitters are reviewed. A novel concept for realizing a polarization splitter–rotator is presented with a very simple fabrication process. Realization of silicon-based light non-reciprocity devices (e.g., optical isolators), which is very important for transmitters to avoid sensitivity to reflections, is also demonstrated with the help of magneto-optical material by the bonding technology. Low-loss waveguides are another important technology for large-scale photonic integrated circuits. Ultra-low loss optical waveguides are achieved by designing a Si3N4 core with a very high aspect ratio. The loss is reduced further to <0.1 dB m−1 with an improved fabrication process incorporating a high-quality thermal oxide upper cladding by means of wafer bonding. With the developed ultra-low loss Si3N4 optical waveguides, some devices are also demonstrated, including ultra-high-Q ring resonators, low-loss arrayed-waveguide grating (de)multiplexers, and high-extinction-ratio polarizers.

  1. The NOAA Dataset Identifier Project

    Science.gov (United States)

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  2. Use of a Recursive-Rule eXtraction algorithm with J48graft to achieve highly accurate and concise rule extraction from a large breast cancer dataset

    Directory of Open Access Journals (Sweden)

    Yoichi Hayashi

    Full Text Available To assist physicians in the diagnosis of breast cancer and thereby improve survival, a highly accurate computer-aided diagnostic system is necessary. Although various machine learning and data mining approaches have been devised to increase diagnostic accuracy, most current methods are inadequate. The recently developed Recursive-Rule eXtraction (Re-RX) algorithm provides a hierarchical, recursive consideration of discrete variables prior to analysis of continuous data, and can generate classification rules that have been trained on the basis of both discrete and continuous attributes. The objective of this study was to extract highly accurate, concise, and interpretable classification rules for diagnosis using the Re-RX algorithm with J48graft, a class for generating a grafted C4.5 decision tree. We used the Wisconsin Breast Cancer Dataset (WBCD). Nine research groups provided 10 kinds of highly accurate concrete classification rules for the WBCD. We compared the accuracy and characteristics of the rule set for the WBCD generated using the Re-RX algorithm with J48graft with five rule sets obtained using 10-fold cross validation (CV). We trained the Re-RX algorithm with J48graft on the WBCD and report the average classification accuracies of 10 runs of 10-fold CV for the training and test datasets, the number of extracted rules, and the average number of antecedents. Compared with other rule extraction algorithms, the Re-RX algorithm with J48graft resulted in a lower average number of rules for diagnosing breast cancer, which is a substantial advantage. It also provided the lowest average number of antecedents per rule. These features are expected to greatly aid physicians in making accurate and concise diagnoses for patients with breast cancer. Keywords: Breast cancer diagnosis, Rule extraction, Re-RX algorithm, J48graft, C4.5
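
    Re-RX with J48graft is not available as an off-the-shelf Python component, so the sketch below substitutes a small CART decision tree to illustrate the 10-fold cross-validation and rule-extraction workflow described above; the bundled scikit-learn data are the Wisconsin Diagnostic Breast Cancer measurements, a close relative of (but not identical to) the WBCD used in the study.

```python
# Hedged stand-in for the Re-RX + J48graft pipeline: scikit-learn has neither
# Re-RX nor J48graft, so a shallow CART tree illustrates 10-fold CV plus
# rule extraction on a related breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

tree = DecisionTreeClassifier(max_depth=3, random_state=0)   # small tree -> few, short rules
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(tree, X, y, cv=cv)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

tree.fit(X, y)                                               # refit on all data
print(export_text(tree, feature_names=list(data.feature_names)))   # IF-THEN style rules
```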

  3. Remote handling at LAMPF

    International Nuclear Information System (INIS)

    Grisham, D.L.; Lambert, J.E.

    1983-01-01

    Experimental area A at the Clinton P. Anderson Meson Physics Facility (LAMPF) encompasses a large area. Presently there are four experimental target cells along the main proton beam line that have become highly radioactive, thus dictating that all maintenance be performed remotely. The Monitor remote handling system was developed to perform in situ maintenance at any location within area A. Due to the complexity of experimental systems and confined space, conventional remote handling methods based upon hot cell and/or hot bay concepts are not workable. Contrary to conventional remote handling, which requires special tooling for each specifically planned operation, the Monitor concept is aimed at providing a totally flexible system capable of remotely performing general mechanical and electrical maintenance operations using standard tools. The Monitor system is described.

  4. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  5. The New world of 'Big Data' Analytics and High Performance Data: A Paradigm shift in the way we interact with very large Earth Observation datasets (Invited)

    Science.gov (United States)

    Purss, M. B.; Lewis, A.; Ip, A.; Evans, B.

    2013-12-01

    The next decade promises an exponential increase in volumes of open data from Earth observing satellites. The ESA Sentinels, the Japan Meteorological Agency's Himawari 8/9 geostationary satellites, various NASA missions, and of course the many EO satellites planned from China, will produce petabyte scale datasets of national and global significance. It is vital that we develop new ways of managing, accessing and using this 'big-data' from satellites, to produce value added information within realistic timeframes. A paradigm shift is required away from traditional 'scene based' (and labour intensive) approaches with data storage and delivery for processing at local sites, to emerging High Performance Data (HPD) models where the data are organised and co-located with High Performance Computational (HPC) infrastructures in a way that enables users to bring themselves, their algorithms and the HPC processing power to the data. Automated workflows, that allow the entire archive of data to be rapidly reprocessed from raw data to fully calibrated products, are a crucial requirement for the effective stewardship of these datasets. New concepts such as arranging and viewing data as 'data objects' which underpin the delivery of 'information as a service' are also integral to realising the transition into HPD analytics. As Australia's national remote sensing and geoscience agency, Geoscience Australia faces a pressing need to solve the problems of 'big-data', in particular around the 25-year archive of calibrated Landsat data. The challenge is to ensure standardised information can be extracted from the entire archive and applied to nationally significant problems in hazards, water management, land management, resource development and the environment. Ultimately, these uses justify government investment in these unique systems. A key challenge was how best to organise the archive of calibrated Landsat data (estimated to grow to almost 1 PB by the end of 2014) in a way

  6. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving, and ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event- modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.
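
    The archive's own examples are in R; purely as an illustration of working with the converted ASCII tables, a Python sketch might look like the following, where the file name and column names are invented and are not the actual PDS labels.

```python
# Illustrative only: file name and column names are hypothetical, not the
# actual labels of the converted Viking seismometer tables in the PDS archive.
import pandas as pd
import matplotlib.pyplot as plt

records = pd.read_csv("viking_event_mode.tab", sep=r"\s+", comment="#",
                      names=["sol", "local_time", "amplitude", "wind_speed"])

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(records["local_time"], records["amplitude"], lw=0.5)
ax1.set_ylabel("seismic amplitude (DN)")
ax2.plot(records["local_time"], records["wind_speed"], lw=0.5, color="tab:orange")
ax2.set_ylabel("wind speed (m/s)")
ax2.set_xlabel("local time (hours)")
plt.show()

# Quick look at the wind / ground-motion correlation noted in the abstract.
print(records["amplitude"].corr(records["wind_speed"]))
```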

  7. SIZE SELECTION IN DIVING TUFTED DUCKS AYTHYA-FULIGULA EXPLAINED BY DIFFERENTIAL HANDLING OF SMALL AND LARGE MUSSELS DREISSENA-POLYMORPHA

    NARCIS (Netherlands)

    DELEEUW, JJ; VANEERDEN, MR

    1992-01-01

    We studied prey size selection of Tufted Ducks feeding on fresh-water mussels under semi-natural conditions. In experiments with non-diving birds, we found that Tufted Ducks use two techniques to handle mussels. Mussels less than 16 mm in length are strained from a waterflow generated in the bill

  8. Test sample handling apparatus

    International Nuclear Information System (INIS)

    1981-01-01

    A test sample handling apparatus using automatic scintillation counting for gamma detection, for use in such fields as radioimmunoassay, is described. The apparatus automatically and continuously counts large numbers of samples rapidly and efficiently by the simultaneous counting of two samples. By means of sequential ordering of non-sequential counting data, it is possible to obtain precisely ordered data while utilizing sample carrier holders having a minimum length. (U.K.)

  9. A coastal seawater temperature dataset for biogeographical studies: large biases between in situ and remotely-sensed data sets around the Coast of South Africa.

    Directory of Open Access Journals (Sweden)

    Albertus J Smit

    Full Text Available Gridded SST products developed particularly for offshore regions are increasingly being applied close to the coast for biogeographical applications. The purpose of this paper is to demonstrate the dangers of doing so through a comparison of reprocessed MODIS Terra and Pathfinder v5.2 SSTs, both at 4 km resolution, with instrumental in situ temperatures taken within 400 m from the coast. We report large biases of up to +6°C in places between satellite-derived and in situ climatological temperatures for 87 sites spanning the entire ca. 2 700 km of the South African coastline. Although biases are predominantly warm (i.e. the satellite SSTs being higher), smaller or even cold biases also appear in places, especially along the southern and western coasts of the country. We also demonstrate the presence of gradients in temperature biases along shore-normal transects - generally SSTs extracted close to the shore demonstrate a smaller bias with respect to the in situ temperatures. Contributing towards the magnitude of the biases are factors such as SST data source, proximity to the shore, the presence/absence of upwelling cells or coastal embayments. Despite the generally large biases, from a biogeographical perspective, species distribution retains a correlative relationship with underlying spatial patterns in SST, but in order to arrive at a causal understanding of the determinants of biogeographical patterns we suggest that in shallow, inshore marine habitats, temperature is best measured directly.

  10. A Coastal Seawater Temperature Dataset for Biogeographical Studies: Large Biases between In Situ and Remotely-Sensed Data Sets around the Coast of South Africa

    Science.gov (United States)

    Smit, Albertus J.; Roberts, Michael; Anderson, Robert J.; Dufois, Francois; Dudley, Sheldon F. J.; Bornman, Thomas G.; Olbers, Jennifer; Bolton, John J.

    2013-01-01

    Gridded SST products developed particularly for offshore regions are increasingly being applied close to the coast for biogeographical applications. The purpose of this paper is to demonstrate the dangers of doing so through a comparison of reprocessed MODIS Terra and Pathfinder v5.2 SSTs, both at 4 km resolution, with instrumental in situ temperatures taken within 400 m from the coast. We report large biases of up to +6°C in places between satellite-derived and in situ climatological temperatures for 87 sites spanning the entire ca. 2 700 km of the South African coastline. Although biases are predominantly warm (i.e. the satellite SSTs being higher), smaller or even cold biases also appear in places, especially along the southern and western coasts of the country. We also demonstrate the presence of gradients in temperature biases along shore-normal transects — generally SSTs extracted close to the shore demonstrate a smaller bias with respect to the in situ temperatures. Contributing towards the magnitude of the biases are factors such as SST data source, proximity to the shore, the presence/absence of upwelling cells or coastal embayments. Despite the generally large biases, from a biogeographical perspective, species distribution retains a correlative relationship with underlying spatial patterns in SST, but in order to arrive at a causal understanding of the determinants of biogeographical patterns we suggest that in shallow, inshore marine habitats, temperature is best measured directly. PMID:24312609
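
    The bias computation at the heart of the two records above reduces to per-site differences between satellite and in situ climatological temperatures; a minimal pandas sketch follows, assuming a hypothetical table of site climatologies rather than the study's actual data files.

```python
# Minimal sketch of the satellite-vs-in-situ bias computation described above.
# The input table and its column names are assumptions, not the study's files.
import pandas as pd

clim = pd.read_csv("site_climatologies.csv")   # columns: site, insitu_temp, satellite_sst

clim["bias"] = clim["satellite_sst"] - clim["insitu_temp"]   # positive = warm bias
by_site = clim.groupby("site")["bias"].mean().sort_values(ascending=False)

print(by_site.head(10))                                      # largest warm biases
print("overall mean bias:", round(clim["bias"].mean(), 2), "deg C")
```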

  11. How Retailers Handle Complaint Management

    DEFF Research Database (Denmark)

    Hansen, Torben; Wilke, Ricky; Zaichkowsky, Judy

    2009-01-01

    This article fills a gap in the literature by providing insight about the handling of complaint management (CM) across a large cross section of retailers in the grocery, furniture, electronic and auto sectors. Determinants of retailers’ CM handling are investigated and insight is gained as to the links between CM and redress of consumers’ complaints. The results suggest that retailers who attach large negative consequences to consumer dissatisfaction are more likely than other retailers to develop a positive strategic view on customer complaining, but at the same time an increase in perceived...

  12. Comparing vector-based and Bayesian memory models using large-scale datasets: User-generated hashtag and tag prediction on Twitter and Stack Overflow.

    Science.gov (United States)

    Stanley, Clayton; Byrne, Michael D

    2016-12-01

    The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, 2 cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An ACT-R based Bayesian model and a random permutation vector-based model were tested on the large data sets. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R's attentional weight term was linked to an entropy-weighting natural language processing method used to attenuate high-frequency words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. The results of the large-scale exploration show how the architecture of the 2 memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
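
    The headline finding, that past tag use strongly predicts future tag choice, can be illustrated with a small frequency-and-recency baseline built on the standard ACT-R base-level activation equation; this is a toy sketch, not the authors' full Bayesian or random permutation models.

```python
# Toy baseline in the spirit of the finding that past tag use predicts future
# tag choice. The activation formula is the standard ACT-R base-level equation.
import math
from collections import defaultdict

def base_level_activation(use_times, now, decay=0.5):
    """ACT-R base-level activation: ln(sum over past uses of lag**-decay)."""
    lags = [now - t for t in use_times if now > t]
    return math.log(sum(lag ** -decay for lag in lags)) if lags else float("-inf")

# Hypothetical per-user history: tag -> list of use times (arbitrary units).
history = defaultdict(list)
for t, tag in [(1, "python"), (2, "pandas"), (3, "python"), (5, "numpy"), (6, "python")]:
    history[tag].append(t)

now = 7
ranked = sorted(history, key=lambda tag: base_level_activation(history[tag], now),
                reverse=True)
print(ranked)   # most active (frequent and recent) tags first: ['python', 'numpy', 'pandas']
```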

  13. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    Science.gov (United States)

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results hold across a large variety of input data files. We have also developed our own load balancing and cache optimization techniques for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
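
    The speedup and efficiency metrics discussed above, together with a simple linear model of per-core communication overhead, can be sketched as follows; the coefficients and the serial runtime are placeholders, not values measured in the study.

```python
# Sketch of speedup/efficiency bookkeeping with a simple linear model of
# communication overhead. alpha and beta are placeholder coefficients.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    return speedup(t_serial, t_parallel) / cores

def predicted_parallel_time(t_serial, cores, alpha=0.5, beta=0.05):
    """Compute time shrinks as 1/p; communication overhead grows with p."""
    return t_serial / cores + alpha + beta * cores

t_serial = 120.0                       # assumed sequential alignment time (seconds)
for p in (1, 2, 4, 8, 16, 32):
    t_p = predicted_parallel_time(t_serial, p)
    print(f"{p:2d} cores: S = {speedup(t_serial, t_p):6.2f}, "
          f"E = {efficiency(t_serial, t_p, p):.2f}")
```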

  14. Sirenomelia: an epidemiologic study in a large dataset from the International Clearinghouse of Birth Defects Surveillance and Research, and literature review.

    Science.gov (United States)

    Orioli, Iêda M; Amar, Emmanuelle; Arteaga-Vazquez, Jazmin; Bakker, Marian K; Bianca, Sebastiano; Botto, Lorenzo D; Clementi, Maurizio; Correa, Adolfo; Csaky-Szunyogh, Melinda; Leoncini, Emanuele; Li, Zhu; López-Camelo, Jorge S; Lowry, R Brian; Marengo, Lisa; Martínez-Frías, María-Luisa; Mastroiacovo, Pierpaolo; Morgan, Margery; Pierini, Anna; Ritvanen, Annukka; Scarano, Gioacchino; Szabova, Elena; Castilla, Eduardo E

    2011-11-15

    Sirenomelia is a very rare limb anomaly in which the normally paired lower limbs are replaced by a single midline limb. This study describes the prevalence, associated malformations, and maternal characteristics among cases with sirenomelia. Data originated from 19 birth defect surveillance system members of the International Clearinghouse for Birth Defects Surveillance and Research, and were reported according to a single pre-established protocol. Cases were clinically evaluated locally and reviewed centrally. A total of 249 cases with sirenomelia were identified among 25,290,172 births, for a prevalence of 0.98 per 100,000, with higher prevalence in the Mexican registry. The increase in sirenomelia prevalence among mothers younger than 20 years was statistically significant. The proportion of twinning was 9%, higher than the 1% expected. Sex was ambiguous in 47% of cases, and no different from expectation in the rest. The proportions of cases born alive, premature, and weighing less than 2,500 g were 47%, 71.2%, and 88.2%, respectively. Half of the cases with sirenomelia also presented with genital, large bowel, and urinary defects. About 10-15% of the cases had lower spinal column defects, single or anomalous umbilical artery, upper limb, cardiac, and central nervous system defects. There was a greater than expected association of sirenomelia with other very rare defects such as bladder exstrophy, cyclopia/holoprosencephaly, and acardia-acephalus. The application of the new biological network analysis approach, including molecular results, to these associated very rare diseases is suggested for future studies. Copyright © 2011 Wiley Periodicals, Inc.

  15. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    Science.gov (United States)

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results hold across a large variety of input data files. We have also developed our own load balancing and cache optimization techniques for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.

  16. Estruturação topológica de grandes bases de dados de bacias hidrográficas Topologically-structured large datasets for watersheds

    Directory of Open Access Journals (Sweden)

    Carlos Antonio Alvares Soares Ribeiro

    2008-08-01

    Full Text Available The present work aimed to evaluate a method for delineating the contributing watershed upstream of a point selected on the drainage network and for obtaining its morphometric characteristics from topologically structured databases. To this end, the Hidrodata 2.0 application, developed for ArcINFO workstation, was used and its results were compared with those of the conventional process. The results showed that the processing time required for delineating watersheds and extracting their morphometric characteristics from a topologically structured database remained low and constant. It was concluded that the method presented can be applied to any watershed, regardless of its size, even on computers of modest configuration. The present work aims to present and to evaluate a method for topologically structuring large databases, implemented as a set of AML routines for ArcINFO workstation named Hidrodata. The results proved that the processing time for delineating drainage areas and extracting their morphometric characteristics was kept low and constant. The use of a topologically structured database resulted in a lower demand of processing capacity. It was concluded that the presented approach can be applied for any watershed, independently of its size, even with the use of less-sophisticated computers.
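
    The benefit of a topologically structured network, namely delineating the contributing area upstream of a selected point without scanning the whole dataset, can be illustrated with a small upstream traversal; the reach identifiers and areas below are invented and do not reflect the Hidrodata 2.0 data model.

```python
# Illustrative upstream traversal on a topologically structured stream network.
# 'upstream' maps each reach to the reaches that drain directly into it; the
# identifiers and per-reach areas are invented for the example.
from collections import deque

upstream = {
    "outlet": ["r1", "r2"],
    "r1": ["r3"],
    "r2": [],
    "r3": [],
}
reach_area_km2 = {"outlet": 1.2, "r1": 3.4, "r2": 2.1, "r3": 5.0}

def contributing_reaches(pour_point):
    """Collect all reaches upstream of a selected pour point (breadth-first)."""
    seen, queue = set(), deque([pour_point])
    while queue:
        reach = queue.popleft()
        if reach in seen:
            continue
        seen.add(reach)
        queue.extend(upstream.get(reach, []))
    return seen

basin = contributing_reaches("outlet")
print(sorted(basin), sum(reach_area_km2[r] for r in basin))   # reaches and drainage area
```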

  17. Animated analysis of geoscientific datasets: An interactive graphical application

    Science.gov (United States)

    Morse, Peter; Reading, Anya; Lueg, Christopher

    2017-12-01

    Geoscientists are required to analyze and draw conclusions from increasingly large volumes of data. There is a need to recognise and characterise features and changing patterns of Earth observables within such large datasets. It is also necessary to identify significant subsets of the data for more detailed analysis. We present an innovative, interactive software tool and workflow to visualise, characterise, sample and tag large geoscientific datasets from both local and cloud-based repositories. It uses an animated interface and human-computer interaction to utilise the capacity of human expert observers to identify features via enhanced visual analytics. 'Tagger' enables users to analyze datasets that are too large in volume to be drawn legibly on a reasonable number of single static plots. Users interact with the moving graphical display, tagging data ranges of interest for subsequent attention. The tool provides a rapid pre-pass process using fast GPU-based OpenGL graphics and data-handling and is coded in the Quartz Composer visual programming language (VPL) on Mac OS X. It makes use of interoperable data formats, and cloud-based (or local) data storage and compute. In a case study, Tagger was used to characterise a decade (2000-2009) of data recorded by the Cape Sorell Waverider Buoy, located approximately 10 km off the west coast of Tasmania, Australia. These data serve as a proxy for the understanding of Southern Ocean storminess, which has both local and global implications. This example shows use of the tool to identify and characterise 4 different types of storm and non-storm events during this time. Events characterised in this way are compared with conventional analysis, noting advantages and limitations of data analysis using animation and human interaction. Tagger provides a new ability to make use of humans as feature detectors in computer-based analysis of large-volume geosciences and other data.

  18. Sophisticated fuel handling system evolved

    International Nuclear Information System (INIS)

    Ross, D.A.

    1988-01-01

    The control systems at Sellafield fuel handling plant are described. The requirements called for built-in diagnostic features as well as the ability to handle a large sequencing application. Speed was also important; responses better than 50ms were required. The control systems are used to automate operations within each of the three main process caves - two Magnox fuel decanners and an advanced gas-cooled reactor fuel dismantler. The fuel route within the fuel handling plant is illustrated and described. ASPIC (Automated Sequence Package for Industrial Control) which was developed as a controller for the plant processes is described. (U.K.)

  19. Ergonomic material-handling device

    Science.gov (United States)

    Barsnick, Lance E.; Zalk, David M.; Perry, Catherine M.; Biggs, Terry; Tageson, Robert E.

    2004-08-24

    A hand-held ergonomic material-handling device capable of moving heavy objects, such as large waste containers and other large objects requiring mechanical assistance. The ergonomic material-handling device can be used with neutral postures of the back, shoulders, wrists and knees, thereby reducing potential injury to the user. The device involves two key features: 1) gives the user the ability to adjust the height of the handles of the device to ergonomically fit the needs of the user's back, wrists and shoulders; and 2) has a rounded handlebar shape, as well as the size and configuration of the handles which keep the user's wrists in a neutral posture during manipulation of the device.

  20. Aaron Journal article datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty , and S. Raman....

  1. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  2. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  3. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  4. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  5. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  6. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacies (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  7. Preliminary handling studies in large size fast piles; Etudes preliminaires de manutention dans les reacteurs a neutrons rapides de grande taille

    Energy Technology Data Exchange (ETDEWEB)

    Leduc, J; Marmonier, P [Association Euratom-CEA Cadarache (France). Centre d' Etudes Nucleaires

    1964-07-01

    This report examines the various fuel handling systems which presently seem feasible for a fast power reactor. It tries to point out the advantages and/or the disadvantages and the fabrication problems of each solution involved, and makes a tentative attempt to evaluate the time required for a fuel loading and/or unloading operation. The influence of the maximum allowable irradiation, the number of shut-downs and the power distribution shape within the core on the storage capacity needed, the load factor expected and the average irradiation obtained has also been investigated. (authors) [French] On a examine dans ce rapport les differents systemes de manutention, qui semblent actuellement realisables pour un reacteur a neutrons rapides de puissance, en essayant de faire ressortir les avantages, les inconvenients et les difficultes de realisation de chaque systeme, et de chiffer les temps de manutention auxquels ils conduisent. On a aussi regarde l'influence des variations du taux d'irradiation maximal,de la cadence des arrets ou de la forme du flux dans le coeur du reacteur, sur la capacite du stockage, le taux de disponibilite et le taux d'irradiation moyen. (auteurs)

  9. Mridangam stroke dataset

    OpenAIRE

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist under semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16 bit wav files. The dataset can be used for training models for each Mridangam stroke. A detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. Akshay Anantapadman...

  10. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN...

  11. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  12. Updates to FuncLab, a Matlab based GUI for handling receiver functions

    Science.gov (United States)

    Porritt, Robert W.; Miller, Meghan S.

    2018-02-01

    Receiver functions are a versatile tool commonly used in seismic imaging. Depending on how they are processed, they can be used to image discontinuity structure within the crust or mantle or they can be inverted for seismic velocity either directly or jointly with complementary datasets. However, modern studies generally require large datasets which can be challenging to handle; therefore, FuncLab was originally written as an interactive Matlab GUI to assist in handling these large datasets. This software uses a project database to allow interactive trace editing, data visualization, H-κ stacking for crustal thickness and Vp/Vs ratio, and common conversion point stacking while minimizing computational costs. Since its initial release, significant advances have been made in the implementation of web services and changes in the underlying Matlab platform have necessitated a significant revision to the software. Here, we present revisions to the software, including new features such as data downloading via irisFetch.m, receiver function calculations via processRFmatlab, on-the-fly cross-section tools, interface picking, and more. In the descriptions of the tools, we present its application to a test dataset in Michigan, Wisconsin, and neighboring areas following the passage of USArray Transportable Array. The software is made available online at https://robporritt.wordpress.com/software.
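
    FuncLab itself is MATLAB; purely to illustrate the H-κ stacking it provides, the numpy sketch below evaluates the standard flat-layer moveout times for the Ps, PpPs and PpSs+PsPs phases and stacks receiver-function amplitudes over a grid of crustal thickness H and Vp/Vs ratio κ. The input container, a list of (time axis, amplitudes, ray parameter) tuples, is made up for the example and is not FuncLab's project database.

```python
# Illustrative H-kappa stacking using the standard flat-layer moveout equations.
# The receiver-function container and the synthetic example are assumptions.
import numpy as np

def hk_stack(receiver_functions, vp=6.5, weights=(0.7, 0.2, 0.1),
             H_grid=np.linspace(20, 60, 81), k_grid=np.linspace(1.6, 2.0, 41)):
    stack = np.zeros((H_grid.size, k_grid.size))
    for t, rf, p in receiver_functions:            # (time axis, amplitudes, ray parameter)
        for i, H in enumerate(H_grid):
            for j, k in enumerate(k_grid):
                vs = vp / k
                eta_s = np.sqrt(1.0 / vs**2 - p**2)
                eta_p = np.sqrt(1.0 / vp**2 - p**2)
                t_ps = H * (eta_s - eta_p)          # Ps conversion
                t_ppps = H * (eta_s + eta_p)        # PpPs multiple
                t_ppss = 2.0 * H * eta_s            # PpSs + PsPs multiple
                amp = np.interp([t_ps, t_ppps, t_ppss], t, rf)
                stack[i, j] += weights[0]*amp[0] + weights[1]*amp[1] - weights[2]*amp[2]
    i, j = np.unravel_index(stack.argmax(), stack.shape)
    return H_grid[i], k_grid[j], stack              # best-fitting thickness and Vp/Vs

# Synthetic receiver function with arrivals consistent with H = 37 km, k = 1.78, p = 0.06.
t = np.linspace(0, 30, 3001)
rf = (np.exp(-0.5 * ((t - 4.64) / 0.2) ** 2)
      + 0.5 * np.exp(-0.5 * ((t - 15.13) / 0.2) ** 2)
      - 0.5 * np.exp(-0.5 * ((t - 19.77) / 0.2) ** 2))
best_H, best_k, _ = hk_stack([(t, rf, 0.06)])
print(best_H, best_k)   # should peak near 37 km and 1.78
```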

  13. An Improved TA-SVM Method Without Matrix Inversion and Its Fast Implementation for Nonstationary Datasets.

    Science.gov (United States)

    Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong

    2015-09-01

    Recently, a time-adaptive support vector machine (TA-SVM) was proposed for handling nonstationary datasets. While attractive performance has been reported and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers requires the computation of a matrix inversion, and thus suffers from a high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, improved time-adaptive core vector machine (ITA-CVM) for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotic linear time complexity for large nonstationary datasets and also inherits the advantage of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.

  14. MFTF exception handling system

    International Nuclear Information System (INIS)

    Nowell, D.M.; Bridgeman, G.D.

    1979-01-01

    In the design of large experimental control systems, a major concern is ensuring that operators are quickly alerted to emergency or other exceptional conditions and that they are provided with sufficient information to respond adequately. This paper describes how the MFTF exception handling system satisfies these requirements. Conceptually, exceptions are divided into two classes: those which affect command status by producing an abort or suspend condition, and those which fall into a softer notification category requiring only a report or an operator acknowledgement. Additionally, an operator may choose to accept an exception condition as operational, or to turn off monitoring for sensors determined to be malfunctioning. Control panels and displays used in operator response to exceptions are described.
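
    The two-class scheme described above (command-affecting abort/suspend exceptions versus softer report/acknowledge notifications, plus the operator overrides) can be sketched as a small dispatch function; the names below are illustrative and are not the actual MFTF control-system code.

```python
# Toy sketch of the two exception classes and the operator overrides described
# above. Sensor names and return strings are illustrative only.
from enum import Enum, auto

class Severity(Enum):
    ABORT = auto()          # terminates the current command
    SUSPEND = auto()        # pauses the current command
    ACKNOWLEDGE = auto()    # operator must acknowledge
    REPORT = auto()         # logged only

def handle_exception(sensor, severity,
                     accepted_as_operational=frozenset(), masked_sensors=frozenset()):
    if sensor in masked_sensors:                    # monitoring turned off for a bad sensor
        return "ignored"
    if severity in (Severity.ABORT, Severity.SUSPEND):
        if sensor in accepted_as_operational:       # operator accepted the condition
            return "logged (accepted as operational)"
        return f"command {severity.name.lower()}ed"
    return "operator notified" if severity is Severity.ACKNOWLEDGE else "logged"

print(handle_exception("magnet_current", Severity.ABORT))
print(handle_exception("magnet_current", Severity.ABORT,
                       accepted_as_operational={"magnet_current"}))
```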

  15. Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    International Nuclear Information System (INIS)

    Sim, Alexander; Balman, Mehmet; Williams, Dean; Shoshani, Arie; Natarajan, Vijaya

    2010-01-01

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity for moving large datasets to explore and manage. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing the massive dataset transfers efficiently with the pre-configured transfer properties in the environment where the network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environment as well as to control the data transfers for the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.
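
    The dynamic-adjustment idea, raising or lowering transfer concurrency as measured end-to-end throughput changes, can be sketched as follows; the thresholds, stream limits and throughput samples are placeholders rather than the Bulk Data Mover's actual interface or tuning.

```python
# Sketch of throughput-driven concurrency adjustment in the spirit of the
# dynamic transfer tuning described above. Thresholds and samples are assumed.
def adjust_concurrency(current_streams, previous_mbps, current_mbps,
                       min_streams=1, max_streams=32, gain_threshold=0.05):
    """Add streams while throughput keeps improving; back off when it degrades."""
    if previous_mbps <= 0:
        return current_streams
    relative_change = (current_mbps - previous_mbps) / previous_mbps
    if relative_change > gain_threshold:
        return min(current_streams + 1, max_streams)     # still gaining: probe upward
    if relative_change < -gain_threshold:
        return max(current_streams - 1, min_streams)     # degrading: back off
    return current_streams                               # stable: hold

streams, prev = 4, 400.0
for measured in (450.0, 520.0, 510.0, 430.0):            # assumed MB/s samples
    streams = adjust_concurrency(streams, prev, measured)
    prev = measured
    print(streams)
```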

  16. A Novel Strategy for Very-Large-Scale Cash-Crop Mapping in the Context of Weather-Related Risk Assessment, Combining Global Satellite Multispectral Datasets, Environmental Constraints, and In Situ Acquisition of Geospatial Data.

    Science.gov (United States)

    Dell'Acqua, Fabio; Iannelli, Gianni Cristian; Torres, Marco A; Martina, Mario L V

    2018-02-14

    Cash crops are agricultural crops intended to be sold for profit as opposed to subsistence crops, meant to support the producer, or to support livestock. Since cash crops are intended for future sale, they translate into large financial value when considered on a wide geographical scale, so their production directly involves financial risk. At a national level, extreme weather events including destructive rain or hail, as well as drought, can have a significant impact on the overall economic balance. It is thus important to map such crops in order to set up insurance and mitigation strategies. Using locally generated data-such as municipality-level records of crop seeding-for mapping purposes implies facing a series of issues like data availability, quality, homogeneity, etc. We thus opted for a different approach relying on global datasets. Global datasets ensure homogeneity and availability of data, although sometimes at the expense of precision and accuracy. A typical global approach makes use of spaceborne remote sensing, for which different land cover classification strategies are available in literature at different levels of cost and accuracy. We selected the optimal strategy in the perspective of a global processing chain. Thanks to a specifically developed strategy for fusing unsupervised classification results with environmental constraints and other geospatial inputs including ground-based data, we managed to obtain good classification results despite the constraints placed. The overall production process was composed using "good-enough" algorithms at each step, ensuring that the precision, accuracy, and data-hunger of each algorithm was commensurate to the precision, accuracy, and amount of data available. This paper describes the tailored strategy developed on the occasion as a cooperation among different groups with diverse backgrounds, a strategy which is believed to be profitably reusable in other, similar contexts. The paper presents the problem

  17. A Novel Strategy for Very-Large-Scale Cash-Crop Mapping in the Context of Weather-Related Risk Assessment, Combining Global Satellite Multispectral Datasets, Environmental Constraints, and In Situ Acquisition of Geospatial Data

    Directory of Open Access Journals (Sweden)

    Fabio Dell’Acqua

    2018-02-01

    Full Text Available Cash crops are agricultural crops intended to be sold for profit as opposed to subsistence crops, meant to support the producer, or to support livestock. Since cash crops are intended for future sale, they translate into large financial value when considered on a wide geographical scale, so their production directly involves financial risk. At a national level, extreme weather events including destructive rain or hail, as well as drought, can have a significant impact on the overall economic balance. It is thus important to map such crops in order to set up insurance and mitigation strategies. Using locally generated data—such as municipality-level records of crop seeding—for mapping purposes implies facing a series of issues like data availability, quality, homogeneity, etc. We thus opted for a different approach relying on global datasets. Global datasets ensure homogeneity and availability of data, although sometimes at the expense of precision and accuracy. A typical global approach makes use of spaceborne remote sensing, for which different land cover classification strategies are available in literature at different levels of cost and accuracy. We selected the optimal strategy in the perspective of a global processing chain. Thanks to a specifically developed strategy for fusing unsupervised classification results with environmental constraints and other geospatial inputs including ground-based data, we managed to obtain good classification results despite the constraints placed. The overall production process was composed using “good-enough" algorithms at each step, ensuring that the precision, accuracy, and data-hunger of each algorithm was commensurate to the precision, accuracy, and amount of data available. This paper describes the tailored strategy developed on the occasion as a cooperation among different groups with diverse backgrounds, a strategy which is believed to be profitably reusable in other, similar contexts. The
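
    The fusion step described in the two records above, keeping only those unsupervised classes that are compatible with environmental constraints, can be illustrated with a toy raster example; the arrays, the elevation threshold and the 50% overlap rule are all invented for the sketch and are not the authors' actual processing chain.

```python
# Toy illustration of fusing unsupervised classification with an environmental
# constraint layer: cluster multispectral pixels, then keep clusters that fall
# mostly inside the suitable area. All data and thresholds are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
h, w, bands = 100, 100, 4
image = rng.random((h, w, bands))                 # stand-in for a multispectral scene
elevation = rng.uniform(0, 2000, (h, w))          # stand-in constraint layer

labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(
    image.reshape(-1, bands)).reshape(h, w)

suitable = elevation < 1200                       # assumed crop-suitability constraint
crop_mask = np.zeros((h, w), dtype=bool)
for cluster in range(6):
    in_cluster = labels == cluster
    if suitable[in_cluster].mean() > 0.5:         # cluster mostly inside suitable area
        crop_mask |= in_cluster & suitable

print("mapped crop fraction:", round(crop_mask.mean(), 3))
```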

  18. Robust computational analysis of rRNA hypervariable tag datasets.

    Directory of Open Access Journals (Sweden)

    Maksim Sipos

    Full Text Available Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10^5–10^6) 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.
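
    The complete-linkage OTU clustering favoured above can be sketched with scipy, assuming a precomputed pairwise distance matrix for the reads; random numbers stand in for real 16S rRNA distances, and the 0.03 cutoff corresponds to the conventional 97% similarity OTU definition.

```python
# Sketch of complete-linkage OTU clustering at a 3% distance threshold.
# The random distance matrix stands in for real pairwise 16S rRNA distances.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
n_reads = 200
d = rng.uniform(0.0, 0.2, (n_reads, n_reads))
d = (d + d.T) / 2.0                                   # symmetrise
np.fill_diagonal(d, 0.0)

condensed = squareform(d, checks=False)
tree = linkage(condensed, method="complete")          # complete-linkage clustering
otus = fcluster(tree, t=0.03, criterion="distance")   # 97% similarity OTUs

print("number of OTUs:", np.unique(otus).size)
```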

  19. The CMS dataset bookkeeping service

    Science.gov (United States)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  20. The CMS dataset bookkeeping service

    Energy Technology Data Exchange (ETDEWEB)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  1. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  2. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  3. Nuclear fuel handling apparatus

    International Nuclear Information System (INIS)

    Andrea, C.; Dupen, C.F.G.; Noyes, R.C.

    1977-01-01

    A fuel handling machine for a liquid metal cooled nuclear reactor in which a retractable handling tube and gripper are lowered into the reactor to withdraw a spent fuel assembly into the handling tube. The handling tube containing the fuel assembly immersed in liquid sodium is then withdrawn completely from the reactor into the outer barrel of the handling machine. The machine is then used to transport the spent fuel assembly directly to a remotely located decay tank. The fuel handling machine includes a decay heat removal system which continuously removes heat from the interior of the handling tube and which is capable of operating at its full cooling capacity at all times. The handling tube is supported in the machine from an articulated joint which enables it to readily align itself with the correct position in the core. An emergency sodium supply is carried directly by the machine to provide make up in the event of a loss of sodium from the handling tube during transport to the decay tank. 5 claims, 32 drawing figures

  4. On the visualization of water-related big data: extracting insights from drought proxies' datasets

    Science.gov (United States)

    Diaz, Vitali; Corzo, Gerald; van Lanen, Henny A. J.; Solomatine, Dimitri

    2017-04-01

    Big data is a growing area of science where hydroinformatics can benefit largely. There have been a number of important developments in the area of data science aimed at analysis of large datasets. Such datasets related to water include measurements, simulations, reanalysis, scenario analyses and proxies. By convention, information contained in these databases is referred to a specific time and a space (i.e., longitude/latitude). This work is motivated by the need to extract insights from large water-related datasets, i.e., transforming large amounts of data into useful information that helps to better understand water-related phenomena, particularly drought. In this context, data visualization, part of data science, involves techniques to create and to communicate data by encoding it as visual graphical objects. They may help to better understand data and detect trends. Based on existing methods of data analysis and visualization, this work aims to develop tools for visualizing water-related large datasets. These tools were developed, taking advantage of existing data visualization libraries, as a group of graphs which includes both polar area diagrams (PADs) and radar charts (RDs). In both graphs, time steps are represented by the polar angles and the percentages of area in drought by the radii. For illustration, three large datasets of drought proxies are chosen to identify trends, prone areas and spatio-temporal variability of drought in a set of case studies. The datasets are (1) SPI-TS2p1 (1901-2002, 11.7 GB), (2) SPI-PRECL0p5 (1948-2016, 7.91 GB) and (3) SPEI-baseV2.3 (1901-2013, 15.3 GB). All of them are on a monthly basis and with a spatial resolution of 0.5 degrees. The first two were retrieved from the repository of the International Research Institute for Climate and Society (IRI). They are included in the Analyses Standardized Precipitation Index (SPI) project (iridl.ldeo.columbia.edu/SOURCES/.IRI/.Analyses/.SPI/). The third dataset was
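
    The polar area diagrams (PADs) described above can be drawn with matplotlib's polar bar plots, with one bar per time step and the percentage of area in drought as the radius; the monthly values below are invented for illustration rather than taken from the SPI/SPEI datasets.

```python
# Minimal polar area diagram (PAD): one bar per month, bar length equal to the
# percentage of area in drought. Monthly values are invented for illustration.
import numpy as np
import matplotlib.pyplot as plt

months = ["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"]
pct_area_in_drought = np.array([5, 8, 12, 20, 35, 42, 38, 30, 22, 15, 10, 6])

angles = np.linspace(0.0, 2 * np.pi, len(months), endpoint=False)
ax = plt.subplot(projection="polar")
ax.bar(angles, pct_area_in_drought, width=2 * np.pi / len(months), alpha=0.6)
ax.set_xticks(angles)
ax.set_xticklabels(months)
ax.set_title("% area in drought (illustrative)")
plt.show()
```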

  5. How to Handle Abuse

    Science.gov (United States)

    ... How Do You Know Something Is Abuse? ... babysitter, teacher, coach, or a bigger kid. Child abuse can happen anywhere — at ... building. Tell Right Away: a kid who is being seriously hurt ...

  6. Grain Handling and Storage.

    Science.gov (United States)

    Harris, Troy G.; Minor, John

    This text for a secondary- or postecondary-level course in grain handling and storage contains ten chapters. Chapter titles are (1) Introduction to Grain Handling and Storage, (2) Elevator Safety, (3) Grain Grading and Seed Identification, (4) Moisture Control, (5) Insect and Rodent Control, (6) Grain Inventory Control, (7) Elevator Maintenance,…

  7. Handling large variations in mechanics: Some applications

    Indian Academy of Sciences (India)

    (Abstract available only as fragments: the text describes a simple random-walk model in which most jumps occur to the left and right, the division of a test into regions between successive events, and how such analysis can support engineering design decisions; it cites Colombo I S, Forde M C, Main I G and Halliday J 2003a, AE monitoring of concrete bridge beams.)

  8. Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets

    Science.gov (United States)

    Drouart, G.; Falkendal, T.

    2018-04-01

    We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources, drawn from a heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper limits during the fitting process in a continuous way, allowing models to become gradually less probable as upper limits are approached. The aim is to propose a simple-to-use yet highly versatile fitting tool for handling increasing source complexity when combining multi-wavelength datasets, with fully customisable filter/model databases. Complete user control is one advantage, which avoids the traditional problems related to the "black box" effect, where parameter or model tuning is impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting makes Mr-Moose a particularly attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.
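
    One common way to implement this kind of continuous upper-limit treatment (not necessarily Mr-Moose's exact scheme) is to use a smooth survival-function term in the log-likelihood, so that a model is fully acceptable well below the limit and becomes progressively less probable as it approaches and exceeds it. The sketch below is a generic illustration of that idea; the function names and the example limit and noise values are assumptions.

```python
import numpy as np
from scipy.special import erf

def loglike_detection(model_flux, obs_flux, sigma):
    """Standard Gaussian log-likelihood term for a detected flux point."""
    return -0.5 * ((model_flux - obs_flux) / sigma) ** 2

def loglike_upper_limit(model_flux, limit_flux, sigma):
    """Smooth upper-limit term: close to zero well below the limit, and
    increasingly negative as the model flux approaches and exceeds it."""
    # Probability that a measurement with noise sigma would fall below the limit.
    frac_below = 0.5 * (1.0 + erf((limit_flux - model_flux) / (np.sqrt(2.0) * sigma)))
    return float(np.log(np.clip(frac_below, 1e-300, 1.0)))

# Example: a flux upper limit of 1.0 mJy with an assumed noise of 0.33 mJy.
for flux in (0.2, 0.8, 1.0, 1.5):
    print(flux, round(loglike_upper_limit(flux, limit_flux=1.0, sigma=0.33), 3))
```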

  9. National Elevation Dataset

    Science.gov (United States)

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide National elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as horizontal datum, and all the data are recast in a geographic projection. Older DEM's produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEM's do not match well, and to fill sliver areas of missing data between DEM's. These processing steps ensure that NED has no void areas and artificial discontinuities have been minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM was produced by older methods, "striping" may still occur.

  10. Handling Pyrophoric Reagents

    Energy Technology Data Exchange (ETDEWEB)

    Alnajjar, Mikhail S.; Haynie, Todd O.

    2009-08-14

    Pyrophoric reagents are extremely hazardous. Special handling techniques are required to prevent contact with air and the resulting fire. This document provides several methods for working with pyrophoric reagents outside of an inert atmosphere.

  11. Remote handling equipment

    International Nuclear Information System (INIS)

    Clement, G.

    1984-01-01

    After defining intervention, the problems encountered when working in an adverse environment are briefly analyzed with a view to developing various remote handling equipment. Some examples of existing equipment are given [fr]

  12. Ergonomics and patient handling.

    Science.gov (United States)

    McCoskey, Kelsey L

    2007-11-01

    This study aimed to describe patient-handling demands in inpatient units during a 24-hour period at a military health care facility. A 1-day total population survey described the diverse nature and impact of patient-handling tasks relative to a variety of nursing care units, patient characteristics, and transfer equipment. Productivity baselines were established based on patient dependency, physical exertion, type of transfer, and time spent performing the transfer. Descriptions of the physiological effect of transfers on staff based on patient, transfer, and staff characteristics were developed. Nursing staff response to surveys demonstrated how patient-handling demands are impacted by the staff's physical exertion and level of patient dependency. The findings of this study describe the types of transfers occurring in these inpatient units and the physical exertion and time requirements for these transfers. This description may guide selection of the most appropriate and cost-effective patient-handling equipment required for specific units and patients.

  13. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  14. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle mixed data clustering problems have been developed. Affinity propagation (AP) is an exemplar-based clustering method that has demonstrated good performance on a wide variety of datasets; however, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm to cluster mixed datasets. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
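
    The paper's own similarity measure is not reproduced here. The sketch below only illustrates the general pattern it relies on: building a precomputed mixed-type similarity matrix (a numeric distance term plus a simple category-mismatch term, both assumptions of this example) and passing it to an affinity propagation implementation such as scikit-learn's with a precomputed affinity.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy mixed dataset: two numeric columns and one categorical column.
numeric = np.array([[1.0, 2.0], [1.1, 1.9], [8.0, 9.0], [7.9, 9.2]])
categorical = np.array(["a", "a", "b", "b"])

# Normalize numeric features so both parts contribute on a comparable scale.
num = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)

n = len(num)
similarity = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        d_num = np.linalg.norm(num[i] - num[j])           # Euclidean part
        d_cat = float(categorical[i] != categorical[j])   # category-mismatch part
        similarity[i, j] = -(d_num + d_cat)               # AP expects similarities

ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(similarity)
print(labels)  # e.g. two clusters, such as [0 0 1 1]
```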

  15. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) projects have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK dataset consists of a rich collection of full text publications in the domain of

  16. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers with the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011, including Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, the multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and
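
    The CUBE algorithm itself models multiple depth hypotheses per grid node together with their uncertainties and is not reproduced here. The sketch below only illustrates, on assumed synthetic soundings, what gridding a point cloud at the two comparison resolutions (0.5 m and 1.0 m) amounts to in its simplest mean-binning form.

```python
import numpy as np

def grid_soundings(x, y, z, resolution):
    """Bin soundings into a regular grid and return the mean depth per cell.
    (A stand-in for proper CUBE gridding, which also tracks uncertainty.)"""
    ix = ((x - x.min()) // resolution).astype(int)
    iy = ((y - y.min()) // resolution).astype(int)
    sums = np.zeros((iy.max() + 1, ix.max() + 1))
    counts = np.zeros_like(sums)
    np.add.at(sums, (iy, ix), z)
    np.add.at(counts, (iy, ix), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)

# Synthetic soundings gridded at the two comparison resolutions.
rng = np.random.default_rng(1)
x, y = rng.uniform(0, 10, 5000), rng.uniform(0, 10, 5000)
z = -20 + 0.1 * x + rng.normal(scale=0.05, size=5000)
print(grid_soundings(x, y, z, 0.5).shape, grid_soundings(x, y, z, 1.0).shape)
```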

  17. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.

  18. Remote handling machines

    International Nuclear Information System (INIS)

    Sato, Shinri

    1985-01-01

    In nuclear power facilities, radioactive wastes are managed using dedicated technology together with automatic techniques. Maintaining or assisting such systems under radiation fields is important. To cope with this situation, the MF-2 system, the MF-3 system and a manipulator system are described as remote handling machines. The MF-2 system consists of an MF-2 carrier truck, a control unit and a command trailer. It is capable of handling heavy objects and is driven electrically rather than hydraulically. The MF-3 system consists of a four-crawler truck and a manipulator; the four independent crawlers make the truck versatile in its posture. The manipulator system is bilateral in operation, so that delicate handling is possible. (Mori, K.)

  19. Practices of Handling

    DEFF Research Database (Denmark)

    Ræbild, Ulla

    to touch, pick up, carry, or feel with the hands. Figuratively it is to manage, deal with, direct, train, or control. Additionally, as a noun, a handle is something by which we grasp or open up something. Lastly, handle also has a Nordic root, here meaning to trade, bargain or deal. Together all four...... meanings seem to merge in the fashion design process, thus opening up for an embodied engagement with matter that entails direction giving, organizational management and negotiation. By seeing processes of handling as a key fashion methodological practice, it is possible to divert the discourse away from...... introduces four ways whereby fashion designers apply their own bodies as tools for design; a) re-activating past garment-design experiences, b) testing present garment-design experiences c) probing for new garment-design experiences and d) design of future garment experiences by body proxy. The paper...

  20. TRANSPORT/HANDLING REQUESTS

    CERN Multimedia

    Groupe ST/HM

    2002-01-01

    A new EDH document entitled 'Transport/Handling Request' will be in operation as of Monday, 11th February 2002, when the corresponding icon will be accessible from the EDH desktop, together with the application instructions. This EDH form will replace the paper-format transport/handling request form for all activities involving the transport of equipment and materials. However, the paper form will still be used for all vehicle-hire requests. The introduction of the EDH transport/handling request form is accompanied by the establishment of the following time limits for the various services concerned: 24 hours for the removal of office items, 48 hours for the transport of heavy items (of up to 6 metric tons and of standard road width), 5 working days for a crane operation, extra-heavy transport operation or complete removal, 5 working days for all transport operations relating to LHC installation. ST/HM Group, Logistics Section Tel: 72672 - 72202

  1. LACIE data-handling techniques

    Science.gov (United States)

    Waits, G. H. (Principal Investigator)

    1979-01-01

    Techniques implemented to facilitate processing of LANDSAT multispectral data between 1975 and 1978 are described. The data that were handled during the large area crop inventory experiment and the storage mechanisms used for the various types of data are defined. The overall data flow, from the placing of the LANDSAT orders through the actual analysis of the data set, is discussed. An overview is provided of the status and tracking system that was developed and of the data base maintenance and operational task. The archiving of the LACIE data is explained.

  2. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back at the client, and construct professional-quality visualizations of the results.
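
    The scatter/gather pattern the abstract describes (distribute analysis code and dataset chunks to workers, then collect and merge partial results at the client) can be sketched in a few lines. The example below uses a local process pool purely as a stand-in for the Grid worker nodes and Web Services; the function names and the toy merge step are assumptions, not part of the framework itself.

```python
from concurrent.futures import ProcessPoolExecutor

def analysis_script(chunk):
    """Stand-in for user-composed analysis code run on one dataset chunk."""
    return {"n": len(chunk), "sum": sum(chunk)}

def run_distributed(dataset_chunks, max_workers=4):
    """Distribute chunks to workers, then collect and merge the partial results."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(analysis_script, dataset_chunks))
    return {
        "n": sum(p["n"] for p in partials),
        "sum": sum(p["sum"] for p in partials),
    }

if __name__ == "__main__":
    chunks = [list(range(i, i + 1000)) for i in range(0, 10000, 1000)]
    print(run_distributed(chunks))
```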

  3. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific dataset outputs, such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  4. Safe handling of tritium

    International Nuclear Information System (INIS)

    1991-01-01

    The main objective of this publication is to provide practical guidance and recommendations on operational radiation protection aspects related to the safe handling of tritium in laboratories, industrial-scale nuclear facilities such as heavy-water reactors, tritium removal plants and fission fuel reprocessing plants, and facilities for manufacturing commercial tritium-containing devices and radiochemicals. The requirements of nuclear fusion reactors are not addressed specifically, since there is as yet no tritium handling experience with them. However, much of the material covered is expected to be relevant to them as well. Annex III briefly addresses problems in the comparatively small-scale use of tritium at universities, medical research centres and similar establishments. However, the main subject of this publication is the handling of larger quantities of tritium. Operational aspects include designing for tritium safety, safe handling practice, the selection of tritium-compatible materials and equipment, exposure assessment, monitoring, contamination control and the design and use of personal protective equipment. This publication does not address the technologies involved in tritium control and cleanup of effluents, tritium removal, or immobilization and disposal of tritium wastes, nor does it address the environmental behaviour of tritium. Refs, figs and tabs

  5. Grain Grading and Handling.

    Science.gov (United States)

    Rendleman, Matt; Legacy, James

    This publication provides an introduction to grain grading and handling for adult students in vocational and technical education programs. Organized in five chapters, the booklet provides a brief overview of the jobs performed at a grain elevator and of the techniques used to grade grain. The first chapter introduces the grain industry and…

  6. Mars Sample Handling Functionality

    Science.gov (United States)

    Meyer, M. A.; Mattingly, R. L.

    2018-04-01

    The final leg of a Mars Sample Return campaign would be an entity that we have referred to as Mars Returned Sample Handling (MRSH). This talk will address our current view of the functional requirements on MRSH, focused on the Sample Receiving Facility (SRF).

  7. Handling wood shavings

    Energy Technology Data Exchange (ETDEWEB)

    1974-09-18

    Details of bulk handling equipment suitable for collection and compressing wood waste from commercial joinery works are discussed. The Redler Bin Discharger ensures free flow of chips from storage silo discharge prior to compression into briquettes for use as fuel or processing into chipboard.

  8. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  9. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  10. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  11. Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data

    Directory of Open Access Journals (Sweden)

    Fei Hu

    2018-04-01

    Full Text Available Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address these challenges, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers were also developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark/SpatialHadoop were enhanced from Spark/Hadoop to handle vector data. However, there are few studies that systematically compare and evaluate the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) no data container except ClimateSpark has good support for the HDF data format used in this paper, so time- and resource-consuming data preprocessing is required to load the data; (2) SciDB, Rasdaman, and MongoDB handle small/moderate volumes of data queries well, whereas Spark and ClimateSpark can handle large volumes of data with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, while the others lack these functions for users; and (4) SciDB, Spark, and Hive have better support for user-defined functions (UDFs) to extend system capability.

  12. Remote handling for an ISIS target change

    International Nuclear Information System (INIS)

    Broome, T.A.; Holding, M.

    1989-01-01

    During 1987 two ISIS targets were changed. This document describes the main features of the remote handling aspects of the work. All the work has to be carried out using remote handling techniques. The radiation level measured on the surface of the reflector when the second target had been removed was about 800 mGy/h, demonstrating that hands-on operation on any part of the target-reflector-moderator assembly is not practical. The target changes were the first large-scale operations in the Target Station Remote Handling Cell, and a great deal was learned about both equipment and working practices. Some general principles emerged which are applicable to other active handling tasks on facilities like ISIS, and these are discussed below. 8 figs

  13. Handling and Transport Problems

    Energy Technology Data Exchange (ETDEWEB)

    Pomarola, J. [Head of Technical Section, Atomic Energy Commission, Saclay (France); Savouyaud, J. [Head of Electro-Mechanical Sub-Division, Atomic Energy Commission, Saclay (France)

    1960-07-01

    Arrangements for special or dangerous transport operations by road arising out of the activities of the Atomic Energy Commission are made by the Works and Installations Division which acts in concert with the Monitoring and Protection Division (MPD) whenever radioactive substances or appliances are involved. In view of the risk of irradiation and contamination entailed in handling and transporting radioactive substances, including waste, a specialized transport and storage team has been formed as a complement to the emergency and decontamination teams.

  14. Solid waste handling

    International Nuclear Information System (INIS)

    Parazin, R.J.

    1995-01-01

    This study presents estimates of the solid radioactive waste quantities that will be generated in the Separations, Low-Level Waste Vitrification and High-Level Waste Vitrification facilities, collectively called the Tank Waste Remediation System Treatment Complex, over the life of these facilities. This study then considers previous estimates from other 200 Area generators and compares alternative methods of handling (segregation, packaging, assaying, shipping, etc.)

  15. Handling of radioactive waste

    International Nuclear Information System (INIS)

    Sanhueza Mir, Azucena

    1998-01-01

    Based on the characteristics and quantities of the different types of radioactive waste produced in the country, this paper presents the achievements in infrastructure and the approaches used to solve problems related to radioactive waste handling and management. The objectives of maintaining facilities and capabilities for controlling, processing and storing radioactive waste in conditioned form are attained within a broad legal framework, defined so as to contribute to the safety of people and the environment (au)

  16. Renal phosphate handling: Physiology

    Directory of Open Access Journals (Sweden)

    Narayan Prasad

    2013-01-01

    Full Text Available Phosphate is a common anion that plays an important role in energy generation. Renal phosphate handling is regulated by three organs, the parathyroid, kidney and bone, through feedback loops. These counter-regulatory loops also regulate intestinal absorption and thus maintain the serum phosphorus concentration in the physiologic range. Parathyroid hormone, vitamin D, fibroblast growth factor 23 (FGF23) and the klotho co-receptor are the key regulators of phosphorus balance in the body.

  17. Uranium hexafluoride handling

    International Nuclear Information System (INIS)

    1991-01-01

    The United States Department of Energy, Oak Ridge Field Office, and Martin Marietta Energy Systems, Inc., are co-sponsoring this Second International Conference on Uranium Hexafluoride Handling. The conference is offered as a forum for the exchange of information and concepts regarding the technical and regulatory issues and the safety aspects which relate to the handling of uranium hexafluoride. Through the papers presented here, we attempt not only to share technological advances and lessons learned, but also to demonstrate that we are concerned about the health and safety of our workers and the public, and are good stewards of the environment in which we all work and live. These proceedings are a compilation of the work of many experts in that phase of world-wide industry which comprises the nuclear fuel cycle. Their experience spans the entire range over which uranium hexafluoride is involved in the fuel cycle, from the production of UF6 from the naturally-occurring oxide to its re-conversion to oxide for reactor fuels. The papers furnish insights into the chemical, physical, and nuclear properties of uranium hexafluoride as they influence its transport, storage, and the design and operation of plant-scale facilities for production, processing, and conversion to oxide. The papers demonstrate, in an industry often cited for its excellent safety record, continuing efforts to further improve safety in all areas of handling uranium hexafluoride

  18. Uranium hexafluoride handling. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    1991-12-31

    The United States Department of Energy, Oak Ridge Field Office, and Martin Marietta Energy Systems, Inc., are co-sponsoring this Second International Conference on Uranium Hexafluoride Handling. The conference is offered as a forum for the exchange of information and concepts regarding the technical and regulatory issues and the safety aspects which relate to the handling of uranium hexafluoride. Through the papers presented here, we attempt not only to share technological advances and lessons learned, but also to demonstrate that we are concerned about the health and safety of our workers and the public, and are good stewards of the environment in which we all work and live. These proceedings are a compilation of the work of many experts in that phase of world-wide industry which comprises the nuclear fuel cycle. Their experience spans the entire range over which uranium hexafluoride is involved in the fuel cycle, from the production of UF6 from the naturally-occurring oxide to its re-conversion to oxide for reactor fuels. The papers furnish insights into the chemical, physical, and nuclear properties of uranium hexafluoride as they influence its transport, storage, and the design and operation of plant-scale facilities for production, processing, and conversion to oxide. The papers demonstrate, in an industry often cited for its excellent safety record, continuing efforts to further improve safety in all areas of handling uranium hexafluoride. Selected papers were processed separately for inclusion in the Energy Science and Technology Database.

  19. Torus sector handling system

    International Nuclear Information System (INIS)

    Grisham, D.L.

    1981-01-01

    A remote handling system is proposed for moving a torus sector of the accelerator from under the cryostat to a point where it can be handled by a crane and for the reverse process for a new sector. Equipment recommendations are presented, as well as possible alignment schemes. Some general comments about future remote-handling methods and the present capabilities of existing systems will also be included. The specific task to be addressed is the removal and replacement of a 425 to 450 ton torus sector. This requires a horizontal movement of approx. 10 m from a normal operating position to a point where its further transport can be accomplished by more conventional means (crane or floor transporter). The same horizontal movement is required for reinstallation, but a positional tolerance of 2 cm must be met to allow reasonable fit-up for the vacuum seal from the radial frames to the torus sector. Since the sectors are not only heavy but rather tall and narrow, the transport system must provide a safe, stable, and repeatable method of sector movement. This limited study indicates that the LAMPF-based method of transporting torus sectors offers a proven method of moving heavy items. In addition, the present state of the art in remote equipment is adequate for FED maintenance

  20. Handling of Solid Residues

    International Nuclear Information System (INIS)

    Medina Bermudez, Clara Ines

    1999-01-01

    The topic of solid residues is of particular interest and concern to the authorities, institutions and the community, who see in them a real threat to human health and the environment, related to the aesthetic deterioration of urban centres and the natural landscape, the proliferation of disease vectors and the effects on biodiversity. Within the wide spectrum of topics connected with environmental protection, the inadequate handling of solid and hazardous residues occupies an important place in the definition of environmentally sustainable policies and practices. Industrial development and population growth have caused a continuous increase in the production of solid residues; likewise, their composition is becoming more heterogeneous every day. The basis for good handling is appropriate intervention at the different stages of integrated residue management, which include separation at the source, collection, handling, reuse, treatment, final disposal and the institutional organization of the management system. Hazardous residues raise even greater concern. These range from pathogenic residues generated in health care and hospital establishments to combustible, flammable, explosive, radioactive, volatile, corrosive, reactive or toxic residues associated with numerous industrial processes that are common in developing countries

  1. Image-based Exploration of Large-Scale Pathline Fields

    KAUST Repository

    Nagoor, Omniah H.

    2014-05-27

    While real-time applications are nowadays routinely used in visualizing large numerical simulations and volumes, handling these large-scale datasets requires high-end graphics clusters or supercomputers to process and visualize them. However, not all users have access to powerful clusters. Therefore, it is challenging to come up with a visualization approach that provides insight into large-scale datasets on a single computer. Explorable images (EI) is one of the methods that allows users to handle large data on a single workstation. Although it is a view-dependent method, it combines both exploration and modification of visual aspects without re-accessing the original huge data. In this thesis, we propose a novel image-based method that applies the concept of EI in visualizing large flow-field pathline data. The goal of our work is to provide an optimized image-based method, which scales well with the dataset size. Our approach is based on constructing a per-pixel linked list data structure in which each pixel contains a list of pathline segments. With this view-dependent method it is possible to filter, color-code and explore large-scale flow data in real-time. In addition, optimization techniques such as early-ray termination and deferred shading are applied, which further improve the performance and scalability of our approach.
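
    A minimal CPU-side sketch of the per-pixel linked-list idea is given below: each screen pixel heads a singly linked list of projected pathline-segment fragments that can later be filtered or colour-coded without touching the original dataset. The class and field names are assumptions for illustration; the thesis builds the equivalent structure on the GPU.

```python
from dataclasses import dataclass

@dataclass
class SegmentNode:
    segment_id: int      # which pathline segment was projected onto this pixel
    depth: float         # view-space depth, useful for sorting/compositing
    attribute: float     # e.g. velocity magnitude, used for filtering/color-coding
    next: "SegmentNode | None" = None

class PerPixelLists:
    """Each pixel stores the head of a singly linked list of segment fragments."""
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.heads = [None] * (width * height)

    def insert(self, x, y, segment_id, depth, attribute):
        idx = y * self.width + x
        # Prepend; ordering (e.g. a depth sort) is resolved at display time.
        self.heads[idx] = SegmentNode(segment_id, depth, attribute, self.heads[idx])

    def filtered(self, x, y, predicate):
        """Walk one pixel's list, keeping fragments that pass the filter."""
        node, out = self.heads[y * self.width + x], []
        while node is not None:
            if predicate(node):
                out.append(node)
            node = node.next
        return out

# Usage: keep only fast segments at pixel (10, 20).
buf = PerPixelLists(64, 64)
buf.insert(10, 20, segment_id=1, depth=0.4, attribute=2.5)
buf.insert(10, 20, segment_id=7, depth=0.1, attribute=0.3)
print([n.segment_id for n in buf.filtered(10, 20, lambda n: n.attribute > 1.0)])
```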

  2. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings...... with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  3. CUDA based Level Set Method for 3D Reconstruction of Fishes from Large Acoustic Data

    DEFF Research Database (Denmark)

    Sharma, Ojaswa; Anton, François

    2009-01-01

    Acoustic images present views of underwater dynamics, even in high depths. With multi-beam echo sounders (SONARs), it is possible to capture series of 2D high resolution acoustic images. 3D reconstruction of the water column and subsequent estimation of fish abundance and fish species identificat...... of suppressing threshold and show its convergence as the evolution proceeds. We also present a GPU based streaming computation of the method using NVIDIA's CUDA framework to handle large volume data-sets. Our implementation is optimised for memory usage to handle large volumes....

  4. The Harvard organic photovoltaic dataset.

    Science.gov (United States)

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  5. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  6. Preference Handling for Artificial Intelligence

    OpenAIRE

    Goldsmith, Judy; University of Kentucky; Junker, Ulrich; ILOG

    2009-01-01

    This article explains the benefits of preferences for AI systems and draws a picture of current AI research on preference handling. It thus provides an introduction to the topics covered by this special issue on preference handling.

  7. Crud handling circuit

    International Nuclear Information System (INIS)

    Smith, J.C.; Manuel, R.J.; McAllister, J.E.

    1981-01-01

    A process for handling the problems of crud formation during the solvent extraction of wet-process phosphoric acid, e.g. for uranium and rare earth removal, is described. It involves clarification of the crud-solvent mixture, settling, water washing the residue and treatment of the crud with a caustic wash to remove and regenerate the solvent. Applicable to synergistic mixtures of dialkylphosphoric acids and trialkylphosphine oxides dissolved in inert diluents and more preferably to the reductive stripping technique. (U.K.)

  8. Handling of potassium

    International Nuclear Information System (INIS)

    Schwarz, N.; Komurka, M.

    1983-03-01

    As a result of fast breeder development, extensive experience with sodium technology is available worldwide. Following the extension of the research programme to topping cycles with potassium as the working medium, test facilities using potassium have been designed and operated at the Institute of Reactor Safety. The different chemical properties of sodium and potassium give rise to new safety concepts and operating procedures. The handling problems of potassium are described in the light of its theoretical properties and our own experience. Selected literature on the main safety and operating problems completes this report. (Author) [de]

  9. Extreme coal handling

    Energy Technology Data Exchange (ETDEWEB)

    Bradbury, S; Homleid, D. [Air Control Science Inc. (United States)

    2004-04-01

    Within the journal's 'Focus on O & M' section is a short article describing modifications to coal handling systems at Eielson Air Force Base near Fairbanks, Alaska, which is supplied with power and heat from a subbituminous coal-fired central plant. Measures to reduce dust include the addition of an enclosed recirculation chamber at each transfer point and new chute designs to reduce coal velocity, turbulence, and induced air. The modifications were developed by Air Control Science (ACS). 7 figs., 1 tab.

  10. Handling Data Skew in MapReduce Cluster by Using Partition Tuning

    Directory of Open Access Journals (Sweden)

    Yufei Gao

    2017-01-01

    Full Text Available The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that the PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.
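
    The exact PTSH partition-tuning rules are not reproduced here. The sketch below only illustrates the general two-stage idea the abstract describes, on assumed toy data: key-value pairs are first scattered into many virtual partitions, whose sizes are then used to pack them onto reducers so that skewed keys do not pile up on a single reducer.

```python
import zlib

def two_stage_partition(pairs, num_virtual, num_reducers):
    """Two-stage partitioning sketch: hash pairs into many virtual partitions,
    then greedily pack the virtual partitions onto reducers to balance load."""
    # Stage 1: scatter key-value pairs into virtual partitions by a stable key hash.
    virtual = [[] for _ in range(num_virtual)]
    for key, value in pairs:
        vp = zlib.crc32(str(key).encode("utf-8")) % num_virtual
        virtual[vp].append((key, value))

    # Stage 2: assign virtual partitions to reducers, largest first (greedy packing).
    loads = [0] * num_reducers
    assignment = [[] for _ in range(num_reducers)]
    for part in sorted(virtual, key=len, reverse=True):
        target = loads.index(min(loads))
        assignment[target].extend(part)
        loads[target] += len(part)
    return assignment, loads

# Skewed toy workload: key "k0" is much hotter than key "k39".
pairs = [(f"k{i}", j) for i in range(40) for j in range(100 - i)]
_, loads = two_stage_partition(pairs, num_virtual=64, num_reducers=4)
print(loads)  # reducer loads end up close to an even split despite the skew
```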

  11. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continues to evolve as well. There are on the order of 120 site contacts, and 60 proposals have been approved to use the data. These proposals involve around 120 researchers. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and to supporting the collaboration. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new web site is based on a scientific data server which enables browsing of the data on-line, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  12. A Perspective on Remote Handling Operations and Human Machine Interface for Remote Handling in Fusion

    International Nuclear Information System (INIS)

    Haist, B.; Hamilton, D.; Sanders, St.

    2006-01-01

    A large-scale fusion device presents many challenges to the remote handling operations team. This paper is based on unique operational experience at JET and gives a perspective on remote handling task development, logistics and resource management, as well as command, control and human-machine interface systems. Remote operations require an accurate perception of a dynamic environment, ideally providing the operators with the same unrestricted knowledge of the task scene as would be available if they were actually at the remote work location. Traditional camera based systems suffer from a limited number of viewpoints and also degrade quickly when exposed to high radiation. Virtual Reality and Augmented Reality software offer great assistance. The remote handling system required to maintain a tokamak requires a large number of different and complex pieces of equipment coordinating to perform a large array of tasks. The demands on the operator's skill in performing the tasks can escalate to a point where the efficiency and safety of operations are compromised. An operations guidance system designed to facilitate the planning, development, validation and execution of remote handling procedures is essential. Automatic planning of motion trajectories of remote handling equipment and the remote transfer of heavy loads will be routine and need to be reliable. This paper discusses the solutions developed at JET in these areas and also the trends in management and presentation of operational data as well as command, control and HMI technology development offering the potential to greatly assist remote handling in future fusion machines. (author)

  13. Detecting Significant Stress Drop Variations in Large Micro-Earthquake Datasets: A Comparison Between a Convergent Step-Over in the San Andreas Fault and the Ventura Thrust Fault System, Southern California

    Science.gov (United States)

    Goebel, T. H. W.; Hauksson, E.; Plesch, A.; Shaw, J. H.

    2017-06-01

    A key parameter in engineering seismology and earthquake physics is seismic stress drop, which describes the relative amount of high-frequency energy radiation at the source. To identify regions with potentially significant stress drop variations, we perform a comparative analysis of source parameters in the greater San Gorgonio Pass (SGP) and Ventura basin (VB) in southern California. The identification of physical stress drop variations is complicated by large data scatter as a result of attenuation, limited recording bandwidth and imprecise modeling assumptions. In light of the inherently high uncertainties in single stress drop measurements, we follow the strategy of stacking large numbers of source spectra thereby enhancing the resolution of our method. We analyze more than 6000 high-quality waveforms between 2000 and 2014, and compute seismic moments, corner frequencies and stress drops. Significant variations in stress drop estimates exist within the SGP area. Moreover, the SGP also exhibits systematically higher stress drops than VB and shows more scatter. We demonstrate that the higher scatter in SGP is not a generic artifact of our method but an expression of differences in underlying source processes. Our results suggest that higher differential stresses, which can be deduced from larger focal depth and more thrust faulting, may only be of secondary importance for stress drop variations. Instead, the general degree of stress field heterogeneity and strain localization may influence stress drops more strongly, so that more localized faulting and homogeneous stress fields favor lower stress drops. In addition, higher loading rates, for example, across the VB potentially result in stress drop reduction whereas slow loading rates on local fault segments within the SGP region result in anomalously high stress drop estimates. Our results show that crustal and fault properties systematically influence earthquake stress drops of small and large events and should
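
    The paper's spectral-fitting and stacking procedure is not reproduced here, but the final step it relies on, converting a seismic moment and corner frequency into a stress drop, is commonly written with Brune/Eshelby-type relations. The function below is a generic sketch of that conversion; the shear-wave velocity and the constant k are assumed values, not parameters taken from the study.

```python
def brune_stress_drop(m0_newton_m, fc_hz, beta_m_s=3500.0, k=0.37):
    """Brune-type stress drop (Pa) from seismic moment M0 and corner frequency fc.

    Uses the relations r = k * beta / fc (source radius) and
    delta_sigma = 7 * M0 / (16 * r**3). The constant k (~0.32-0.37) and the
    shear-wave velocity beta are model assumptions.
    """
    source_radius = k * beta_m_s / fc_hz
    return 7.0 * m0_newton_m / (16.0 * source_radius ** 3)

# Example: an M ~3 event (M0 ≈ 4e13 N·m) with a 5 Hz corner frequency.
print(f"{brune_stress_drop(4e13, fc_hz=5.0) / 1e6:.2f} MPa")  # roughly 1 MPa
```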

  14. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Full Text Available Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but missing-data handling can also occur during model building. This article considers missing data within the context of principal component analysis (PCA), which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms, and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
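
    A simple member of this family of methods is the alternating SVD-imputation scheme sketched below: fill the missing entries, fit a rank-k PCA model, re-fill the missing entries from the model, and iterate. This is a generic illustration under stated assumptions, not necessarily the exact algorithm that performed best in the article.

```python
import numpy as np

def pca_with_missing(X, n_components, n_iter=200, tol=1e-8):
    """Alternating SVD imputation: fill the missing entries, fit a rank-k model,
    re-fill the missing entries from the model, and repeat until convergence."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        mean = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mean, full_matrices=False)
        approx = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        change = np.linalg.norm(X_filled[missing] - approx[missing])
        X_filled[missing] = approx[missing]
        if change < tol:
            break

    scores = (X_filled - X_filled.mean(axis=0)) @ Vt[:n_components].T
    return scores, Vt[:n_components], X_filled

# Toy example: a rank-2 matrix with two entries removed.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 5))
A[3, 1] = A[10, 4] = np.nan
scores, loadings, completed = pca_with_missing(A, n_components=2)
print(completed[3, 1], completed[10, 4])  # imputed consistently with the rank-2 structure
```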

  15. Remote handling in ZEPHYR

    International Nuclear Information System (INIS)

    Andelfinger, C.; Lackner, E.; Ulrich, M.; Weber, G.; Schilling, H.B.

    1982-04-01

    A conceptual design of the ZEPHYR building is described. The listed radiation data show that remote handling devices will be necessary in most areas of the building. For difficult repair and maintenance work it is intended to transfer complete units from the experimental hall to a hot cell which provides better working conditions. The necessary crane systems and other transport means are summarized, as well as suitable commercially available manipulators and observation devices. The concept of automatic devices for cutting, welding and other operations inside the vacuum vessel, and the associated position control system, is sketched. Guidelines for the design of passive components are set up in order to facilitate remote operation. (orig.)

  16. Handling hunger strikers.

    Science.gov (United States)

    1992-04-01

    Hunger strikes are being used increasingly and not only by those with a political point to make. Whereas in the past, hunger strikes in the United Kingdom seemed mainly to be started by terrorist prisoners for political purposes, the most recent was begun by a Tamil convicted of murder, to protest his innocence. In the later stages of his strike, before calling it off, he was looked after at the Hammersmith Hospital. So it is not only prison doctors who need to know how to handle a hunger strike. The following guidelines, adopted by the 43rd World Medical Assembly in Malta in November 1991, are therefore a timely reminder of the doctor's duties during a hunger strike.

  17. Plutonium safe handling

    International Nuclear Information System (INIS)

    Tvehlov, Yu.

    2000-01-01

    This abstract, prepared on the basis of the new IAEA guidance on the safe handling and storage of plutonium (publication No. 9 in the Safety Reports Series), presents internationally acknowledged criteria for evaluating radiation hazards and summarizes the experience accumulated in the nuclear states in the safe management of large quantities of plutonium. Data are presented on weapons-grade and civil plutonium, the degree of hazard it poses and the measures for ensuring its safety, including data on the radiation consequences of a criticality accident involving 10^18 fissions. Recommendations are given that make it possible to eliminate the danger of criticality, ignition and explosion, to maintain the tightness of the facility so as to exclude radioactive contamination and internal irradiation, to provide for plutonium security and physical protection, and to reduce irradiation [ru]

  18. Handling missing values in the MDS-UPDRS.

    Science.gov (United States)

    Goetz, Christopher G; Luo, Sheng; Wang, Lu; Tilley, Barbara C; LaPelle, Nancy R; Stebbins, Glenn T

    2015-10-01

    This study was undertaken to define the number of missing values permissible to render valid total scores for each Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) part. To handle missing values, imputation strategies serve as guidelines to reject an incomplete rating or create a surrogate score. We tested a rigorous, scale-specific, data-based approach to handling missing values for the MDS-UPDRS. From two large MDS-UPDRS datasets, we sequentially deleted item scores, either consistently (same items) or randomly (different items) across all subjects. Lin's Concordance Correlation Coefficient (CCC) compared scores calculated without missing values with prorated scores based on sequentially increasing missing values. The maximal number of missing values retaining a CCC greater than 0.95 determined the threshold for rendering a valid prorated score. A second confirmatory sample was selected from the MDS-UPDRS international translation program. To provide valid part scores applicable across all Hoehn and Yahr (H&Y) stages when the same items are consistently missing, one missing item from Part I, one from Part II, three from Part III, but none from Part IV can be allowed. To provide valid part scores applicable across all H&Y stages when random item entries are missing, one missing item from Part I, two from Part II, seven from Part III, but none from Part IV can be allowed. All cutoff values were confirmed in the validation sample. These analyses are useful for constructing valid surrogate part scores for MDS-UPDRS when missing items fall within the identified threshold and give scientific justification for rejecting partially completed ratings that fall below the threshold. © 2015 International Parkinson and Movement Disorder Society.
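
    As a simple illustration of how such thresholds might be applied in practice, the function below prorates a part score only when the number of missing items stays within an allowed maximum and rejects the rating otherwise. It is a generic sketch, not the official MDS-UPDRS scoring procedure; the 13-item Part II example and the max_missing value are assumptions based on the thresholds quoted above.

```python
def prorated_part_score(item_scores, n_items, max_missing):
    """Prorate a part score when a limited number of items are missing.

    item_scores: dict of item -> score for the items that were completed.
    n_items: total number of items in this MDS-UPDRS part.
    max_missing: maximum number of missing items for which a prorated score
                 is still considered valid.
    Returns the surrogate total, or None if too many items are missing.
    """
    n_missing = n_items - len(item_scores)
    if n_missing > max_missing:
        return None  # reject the partially completed rating
    completed_sum = sum(item_scores.values())
    # Scale the observed sum up to the full number of items.
    return completed_sum * n_items / (n_items - n_missing)

# Example: a 13-item part with 11 items completed and up to 2 allowed missing.
scores = {f"2.{i}": 1 for i in range(1, 12)}
print(prorated_part_score(scores, n_items=13, max_missing=2))  # 13.0
```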

  19. Handle with care

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1965-03-15

    Full text: A film dealing with transport of radioactive materials by everyday means - rail, road, sea and air transport - has been made for IAEA. It illustrates in broad terms some of the simple precautions which should be followed by persons dealing with such materials during shipment. Throughout, the picture stresses the transport regulations drawn up and recommended by the Agency, and in particular the need to carry out carefully the instructions based on these regulations in order to ensure that there is no hazard to the public nor to those who handle radioactive materials in transit and storage. In straightforward language, the film addresses the porter of a goods wagon, an airline cargo clerk, a dockside crane operator, a truck driver and others who load and ship freight. It shows the various types of package used to contain different categories of radioactive substances according to the intensity of the radiation emitted. It also illustrates their robustness by a series of tests involving drops, fires, impact, crushing, etc. Clear instructions are conveyed on what to do in the event of an unlikely accident with any type of package. The film is entitled, 'The Safe Transport of Radioactive Materials', and is No. 3 in the series entitled, 'Handle with Care'. It was made for IAEA through the United Kingdom Atomic Energy Authority by the Film Producers' Guild in the United Kingdom. It is in 16 mm colour, optical sound, with a running time of 20 minutes. It is available for order at $50 either direct from IAEA or through any of its Member Governments. Prints can be supplied in English, French, Russian or Spanish. Copies are also available for adaptation for commentaries in other languages. (author)

  20. Development of tritium-handling technique

    International Nuclear Information System (INIS)

    Ohmura, Hiroshi; Hosaka, Akio; Okamoto, Takahumi

    1988-01-01

    An overview of development activities for tritium-handling techniques at IHI is presented. Tritium handling is one of the key technologies for establishing a fusion power plant. Recently at JAERI, the conceptual design of the FER (Fusion Experimental Reactor) has been carried out, and the FER system requires a processing system for a large amount of tritium. IHI concentrates on the investigation of fuel gas purification, isotope separation and storage systems under contract with Toshiba Corporation. Design results for the systems and their components are reviewed. IHI has been developing fundamental handling techniques, namely the ZrNi bed for hydrogen isotope storage and isotope separation by laser. A ZrNi bed with a tritium storage capacity of 1000 Ci has been constructed, and recovery of the hydrogen isotopes down to 10^-4 Torr (0.013 Pa) was confirmed. In laser isotope separation, the optimum laser wavelength has been determined. (author)

  1. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330.

  2. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, allowing some of the data to be recovered for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  3. StarDB: a large-scale DBMS for strings

    KAUST Repository

    Sahli, Majed

    2015-08-01

    Strings and applications using them are proliferating in science and business. Currently, strings are stored in file systems and processed using ad-hoc procedural code. Existing techniques are not flexible and cannot efficiently handle complex queries or large datasets. In this paper, we demonstrate StarDB, a distributed database system for analytics on strings. StarDB hides data and system complexities and allows users to focus on analytics. It uses a comprehensive set of parallel string operations and provides a declarative query language to solve complex queries. StarDB automatically tunes itself and runs with over 90% efficiency on supercomputers, public clouds, clusters, and workstations. We test StarDB using real datasets that are 2 orders of magnitude larger than the datasets reported by previous works.

  4. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  5. Something From Nothing (There): Collecting Global IPv6 Datasets from DNS

    NARCIS (Netherlands)

    Fiebig, T.; Borgolte, Kevin; Hao, Shuang; Kruegel, Christopher; Vigna, Giovanny; Spring, Neil; Riley, George F.

    2017-01-01

    Current large-scale IPv6 studies mostly rely on non-public datasets, as most public datasets are domain specific. For instance, traceroute-based datasets are biased toward network equipment. In this paper, we present a new methodology to collect IPv6 address datasets that does not require access to

  6. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself through a training process, because complete expert knowledge for determining classifier parameters is impracticable. This paper proposes a hybrid methodology based on the machine learning paradigm. It integrates the successful exploration mechanism known as the self-regulated learning capability of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN) that has proved to be an excellent classifier with a large number of hidden-layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden-layer neurons and further improving the network's generalization performance. The proposed method is evaluated on five benchmark datasets from the UCI Machine Learning Repository for medical dataset classification. Simulation results show that the proposed approach achieves good generalization performance compared to the results of other classifiers.
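
    The record above describes tuning an extreme learning machine (ELM) with particle swarm optimization (PSO). Purely as a hedged sketch (not the authors' implementation; every constant below is an illustrative assumption), the following Python pairs a minimal ELM, i.e. a random hidden layer plus ridge-regression output weights, with a bare-bones global-best PSO that searches the hidden-layer size and regularization strength.

```python
# Minimal sketch, not the paper's code: an ELM-style classifier whose
# hidden-layer size and regularization strength are tuned by a tiny
# global-best particle swarm. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y_onehot, n_hidden, reg):
    """Random hidden layer + ridge-regression output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y_onehot)
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

def validation_accuracy(params, Xtr, ytr, Xva, yva, n_classes):
    n_hidden = int(np.clip(params[0], 5, 200))
    reg = 10.0 ** np.clip(params[1], -4.0, 2.0)
    model = elm_fit(Xtr, np.eye(n_classes)[ytr], n_hidden, reg)
    return np.mean(elm_predict(Xva, model) == yva)

def pso(objective, bounds, n_particles=10, n_iter=20):
    """Maximize `objective` over a box; returns the best position found."""
    lo, hi = np.array(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)]
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, len(bounds)))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmax(pbest_val)]
    return gbest
```

    With integer-coded class labels, a call such as pso(lambda p: validation_accuracy(p, X_tr, y_tr, X_va, y_va, n_classes), bounds=[(5, 200), (-4, 2)]) returns the hidden-layer size and log-regularization that maximize validation accuracy on a given split; the variable names here are hypothetical placeholders for any UCI dataset split.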

  7. Unvented Drum Handling Plan

    International Nuclear Information System (INIS)

    MCDONALD, K.M.

    2000-01-01

    This drum-handling plan proposes a method to deal with unvented transuranic drums encountered during retrieval of drums. Finding unvented drums during retrieval activities was expected, as identified in the Transuranic (TRU) Phase I Retrieval Plan (HNF-4781). However, significant numbers of unvented drums were not expected until excavation of buried drums began. This plan represents accelerated planning for management of unvented drums. A plan is proposed that manages unvented drums differently based on three categories. The first category of drums is any that visually appear to be pressurized. These will be vented immediately, using either the Hanford Fire Department Hazardous Materials (Haz. Mat.) team, if such are encountered before the facilities' capabilities are established, or using internal capabilities, once established. To date, no drums have been retrieved that showed signs of pressurization. The second category consists of drums that contain a minimal amount of Pu isotopes. This minimal amount is typically less than 1 gram of Pu, but may be waste-stream dependent. Drums in this category are assayed to determine if they are low-level waste (LLW). LLW drums are typically disposed of without venting. Any unvented drums that assay as TRU will be staged for a future venting campaign, using appropriate safety precautions in their handling. The third category of drums is those for which records show larger amounts of Pu isotopes (typically greater than or equal to 1 gram of Pu). These are assumed to be TRU and are not assayed at this point, but are staged for a future venting campaign. Any of these drums that do not have a visible venting device will be staged awaiting venting, and will be managed under appropriate controls, including covering the drums to protect from direct solar exposure, minimizing of container movement, and placement of a barrier to restrict vehicle access. There are a number of equipment options available to perform the venting. The

  8. New transport and handling contract

    CERN Multimedia

    SC Department

    2008-01-01

    A new transport and handling contract entered into force on 1.10.2008. As with the previous contract, the user interface is the internal transport/handling request form on EDH: https://edh.cern.ch/Document/TransportRequest/ To ensure that you receive the best possible service, we invite you to complete the various fields as accurately as possible and to include a mobile telephone number on which we can reach you. You can follow the progress of your request (schedule, completion) in the EDH request routing information. We remind you that the following deadlines apply: 48 hours for the transport of heavy goods (up to 8 tonnes) or simple handling operations; 5 working days for crane operations, transport of extra-heavy goods, complex handling operations and combined transport and handling operations in the tunnel. For all enquiries, the number to contact remains unchanged: 72202. Heavy Handling Section TS-HE-HH 72672 - 160319

  9. How to handle spatial heterogeneity in hydrological models.

    Science.gov (United States)

    Loritz, Ralf; Neuper, Malte; Gupta, Hoshin; Zehe, Erwin

    2017-04-01

    The amount of data we observe in our environmental systems is larger than ever. This leads to a new kind of problem, where hydrological modelers can have access to large datasets with various quantitative and qualitative observations but are uncertain about their information content with respect to the hydrological functioning of a landscape. For example, digital elevation models obviously contain plenty of information about the topography of a landscape; the question of relevance for hydrology, however, is how much of this information is important for the hydrological functioning of that landscape. This kind of question is not limited to topography, and we can ask similar questions when handling distributed rainfall data or geophysical images. In this study we would like to show how one can separate dominant patterns in the landscape from idiosyncratic system details. We use a 2D numerical hillslope model in combination with an extensive research dataset to test a variety of model setups that are built upon different landscape characteristics and driven by different rainfall measurements. With the help of information-theory-based measures we can identify and learn how much heterogeneity is really necessary for successful hydrological simulations and how much of it we can neglect.
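
    As a purely illustrative companion to the abstract above (neither the authors' information-theoretic measures nor their data are reproduced here), the snippet below estimates mutual information from a two-dimensional histogram, one simple way of asking how much a landscape attribute tells us about simulated discharge; the synthetic "slope" and "aspect" variables are invented for the example.

```python
# Illustrative only: histogram-based mutual information between a landscape
# attribute and simulated discharge, as a rough measure of how much of the
# attribute's heterogeneity actually matters for the simulation.
import numpy as np

def mutual_information(x, y, bins=20):
    """Estimate I(X;Y) in nats from a 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Synthetic example: slope is informative for discharge, aspect is not.
rng = np.random.default_rng(1)
slope = rng.gamma(2.0, 2.0, 5000)
aspect = rng.uniform(0.0, 360.0, 5000)
discharge = 0.8 * slope + rng.normal(0.0, 1.0, 5000)
print(mutual_information(slope, discharge), mutual_information(aspect, discharge))
```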

  10. Remote handling and accelerators

    International Nuclear Information System (INIS)

    Wilson, M.T.

    1983-01-01

    The high-current levels of contemporary and proposed accelerator facilities induce radiation levels into components, requiring consideration be given to maintenance techniques that reduce personnel exposure. Typical components involved include beamstops, targets, collimators, windows, and instrumentation that intercepts the direct beam. Also included are beam extraction, injection, splitting, and kicking regions, as well as purposeful spill areas where beam tails are trimmed and neutral particles are deposited. Scattered beam and secondary particles activate components all along a beamline such as vacuum pipes, magnets, and shielding. Maintenance techniques vary from hands-on to TV-viewed operation using state-of-the-art servomanipulators. Bottom- or side-entry casks are used with thimble-type target and diagnostic assemblies. Long-handled tools are operated from behind shadow shields. Swinging shield doors, unstacking block, and horizontally rolling shield roofs are all used to provide access. Common to all techniques is the need to make operations simple and to provide a means of seeing and reaching the area

  11. TFTR tritium handling concepts

    International Nuclear Information System (INIS)

    Garber, H.J.

    1976-01-01

    The Tokamak Fusion Test Reactor, to be located on the Princeton Forrestal Campus, is expected to operate with 1 to 2.5 MA tritium--deuterium plasmas, with the pulses involving injection of 50 to 150 Ci (5 to 16 mg) of tritium. Attainment of fusion conditions is based on generation of an approximately 1 keV tritium plasma by ohmic heating and conversion to a moderately hot tritium--deuterium ion plasma by injection of a ''preheating'' deuterium neutral beam (40 to 80 keV), followed by injection of a ''reacting'' beam of high energy neutral deuterium (120 to 150 keV). Additionally, compressions accompany the beam injections. Environmental, safety and cost considerations led to the decision to limit the amount of tritium gas on-site to that required for an experiment, maintaining all other tritium in ''solidified'' form. The form of the tritium supply is as uranium tritide, while the spent tritium and other hydrogen isotopes are getter-trapped by zirconium--aluminum alloy. The issues treated include: (1) design concepts for the tritium generator and its purification, dispensing, replenishment, containment, and containment--cleanup systems; (2) features of the spent plasma trapping system, particularly the regenerable absorption cartridges, their integration into the vacuum system, and the handling of non-getterables; (3) tritium permeation through the equipment and the anticipated releases to the environment; (4) overview of the tritium related ventilation systems; and (5) design bases for the facility's tritium clean-up systems

  12. Safe Handling of Radioisotopes

    International Nuclear Information System (INIS)

    1958-01-01

    Under its Statute the International Atomic Energy Agency is empowered to provide for the application of standards of safety for protection against radiation to its own operations and to operations making use of assistance provided by it or with which it is otherwise directly associated. To this end authorities receiving such assistance are required to observe relevant health and safety measures prescribed by the Agency. As a first step, it has been considered an urgent task to provide users of radioisotopes with a manual of practice for the safe handling of these substances. Such a manual is presented here and represents the first of a series of manuals and codes to be issued by the Agency. It has been prepared after careful consideration of existing national and international codes of radiation safety, by a group of international experts and in consultation with other international bodies. At the same time it is recommended that the manual be taken into account as a basic reference document by Member States of the Agency in the preparation of national health and safety documents covering the use of radioisotopes.

  13. Radioactive wastes handling facility

    International Nuclear Information System (INIS)

    Hirose, Emiko; Inaguma, Masahiko; Ozaki, Shigeru; Matsumoto, Kaname.

    1997-01-01

    The facility comprises an area where a conveyor is installed for separating miscellaneous radioactive solid wastes such as metals, an area for operators arranged perpendicular to the conveying direction, an area for receiving the radioactive wastes and placing them on the conveyor, and an area for collecting the radioactive wastes transferred by the conveyor. Since an operator can carry out the handling while wearing working clothes attached to a partition wall, over his ordinary clothes, working conditions and the efficiency of the separation work are improved. When the conveyor area and the operator area are depressurized, crud on the surface of the wastes is not released to the outside and the working clothes are prevented from being contaminated. Since the wastes are transferred by the conveyor, the operator's range of movement is reduced; poisonous materials fall and are moved along a chute to an area for collecting the materials to be separated. Accordingly, the materials to be removed can be accumulated easily. (N.H.)

  14. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. In most existing integrative analyses, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111

  15. Trends in Modern Exception Handling

    Directory of Open Access Journals (Sweden)

    Marcin Kuta

    2003-01-01

    Full Text Available Exception handling is nowadays a necessary component of error-proof information systems. The paper presents an overview of exception handling techniques and models, the problems connected with them, and potential solutions. The implementation of propagation mechanisms and exception handling, and their effect on semantics and overall program efficiency, are also taken into account. The mechanisms presented have been adopted in modern programming languages. In the areas of design, formal methods and formal verification of program properties, exception handling mechanisms are only weakly represented, which leaves a field open for future research.
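
    A small Python illustration (not taken from the paper) of two of the mechanisms the overview discusses: propagation of an exception up the call stack, and explicit chaining so that the low-level cause remains available for diagnosis.

```python
# Exception propagation and chaining: the low-level KeyError/ValueError is
# translated into a domain-specific error, with the original cause preserved.
class ConfigError(Exception):
    """Domain-specific exception raised at a higher abstraction level."""

def read_setting(settings: dict, key: str) -> int:
    try:
        return int(settings[key])
    except (KeyError, ValueError) as exc:
        raise ConfigError(f"invalid or missing setting {key!r}") from exc

def start_service(settings: dict) -> None:
    port = read_setting(settings, "port")   # any ConfigError propagates upward
    print(f"listening on port {port}")

try:
    start_service({"port": "not-a-number"})
except ConfigError as err:
    print("startup failed:", err, "| caused by:", repr(err.__cause__))
```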

  16. Passive Containment DataSet

    Science.gov (United States)

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system.This dataset is associated with the following publication:Grayman, W., R. Murray , and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  17. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide range of phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large, open-access database made available in the PhysioNet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smartphone-based digital stethoscope in an Indian hospital, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior-art approaches when applied to the same dataset.
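
    Purely as a sketch of the kind of feature set described (the exact features, sampling rate and band limits used in the paper are not reproduced here), the snippet below computes a few time-domain, statistical and frequency-domain descriptors for a PCG segment; any standard classifier can then be trained on the resulting vectors.

```python
# Illustrative only: a handful of time-domain, statistical and
# frequency-domain descriptors of a PCG segment, in the spirit of (but not
# identical to) the feature set analysed in the paper.
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

def pcg_features(segment, fs=2000):
    raw_std = float(segment.std())
    seg = (segment - segment.mean()) / (raw_std + 1e-12)      # amplitude-normalize
    f, pxx = welch(seg, fs=fs, nperseg=min(1024, len(seg)))
    pxx_n = pxx / (pxx.sum() + 1e-12)
    centroid = float(np.sum(f * pxx_n))                       # spectral centroid
    band_power = float(pxx_n[(f >= 25) & (f <= 400)].sum())   # heart-sound band
    return np.array([
        raw_std,                                              # overall amplitude
        float(skew(seg)), float(kurtosis(seg)),               # shape statistics
        float(np.mean(np.abs(np.diff(seg)))),                 # signal roughness
        centroid, band_power,
    ])

# X = np.vstack([pcg_features(s) for s in segments]) would then feed, e.g.,
# sklearn.svm.SVC(class_weight="balanced").fit(X, labels)
```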

  18. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves computational processes for finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing a known structure to apply it to a new dataset and predict its class. Various classification algorithms are used to classify data sets; they are based on different methods such as probability, decision trees, neural networks, nearest neighbours, boolean and fuzzy logic, and kernel-based approaches. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets have been selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error and ROC area. A comparative analysis has been carried out using the performance evaluation measures of accuracy, precision and F-measure. We also specify features and limitations of the classification algorithms for datasets of diverse nature.
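
    For readers unfamiliar with the measures listed above, the short scikit-learn sketch below computes several of them for a single classifier on a single dataset; the dataset and classifier are placeholders rather than those used in the paper.

```python
# Computing common classification evaluation measures with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

print("accuracy ", accuracy_score(y_te, pred))
print("precision", precision_score(y_te, pred))
print("F-measure", f1_score(y_te, pred))
print("kappa    ", cohen_kappa_score(y_te, pred))
print("ROC area ", roc_auc_score(y_te, proba))
```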

  19. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins, Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    Full text: The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation. The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: Monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); Annual temperature (commences 1910); Daily temperature (commences 1910, with limited station coverage pre-1957); Twice-daily dewpoint/relative humidity (commences 1957); Monthly pan evaporation (commences 1970); Cloud amount (commences 1957) (Jovanovic et al. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  20. Toward computational cumulative biology by combining models of biological datasets.

    Science.gov (United States)

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven, to enable new findings by going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the relationships found were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance between cells at different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most-linked datasets were not highly cited and turned out to have wrong publication entries in the database.
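
    As a deliberately crude analogue of the decomposition idea (the paper's probabilistic combination model is not reproduced here), the sketch below represents each earlier dataset by a single expression "archetype" vector and decomposes a new profile into non-negative contributions from them; the resulting weights rank the earlier datasets by relatedness. All data are synthetic.

```python
# Crude analogue of model-based dataset retrieval: decompose a new profile
# into non-negative contributions from representations of earlier datasets.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_genes, n_datasets = 500, 8
archetypes = rng.lognormal(size=(n_genes, n_datasets))  # one column per earlier dataset
true_w = np.array([0.7, 0.3] + [0.0] * (n_datasets - 2))
new_profile = archetypes @ true_w + rng.normal(0.0, 0.05, n_genes)

weights, _ = nnls(archetypes, new_profile)
ranking = np.argsort(weights)[::-1]
print("most related earlier datasets:", ranking[:3])
print("their weights:", np.round(weights[ranking[:3]], 3))
```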

  1. Safety measuring for sodium handling

    Energy Technology Data Exchange (ETDEWEB)

    Jeong, Ji Young; Jeong, K C; Kim, T J; Kim, B H; Choi, J H

    2001-09-01

    This is the report on safety measures for sodium handling. Its contents are prerequisites for the development of sodium technology, and the workers who participate in sodium handling and experiments therefore have to know them thoroughly. The relevant parts of the applicable laws are presented as an appendix.

  2. Waste Handling Building Conceptual Study

    International Nuclear Information System (INIS)

    G.W. Rowe

    2000-01-01

    The objective of the ''Waste Handling Building Conceptual Study'' is to develop proposed design requirements for the repository Waste Handling System in sufficient detail to allow the surface facility design to proceed to the License Application effort if the proposed requirements are approved by DOE. Proposed requirements were developed to further refine waste handling facility performance characteristics and design constraints with an emphasis on supporting modular construction, minimizing fuel inventory, and optimizing facility maintainability and dry handling operations. To meet this objective, this study attempts to provide an alternative design to the Site Recommendation design that is flexible, simple, reliable, and can be constructed in phases. The design concept will be input to the ''Modular Design/Construction and Operation Options Report'', which will address the overall program objectives and direction, including options and issues associated with transportation, the subsurface facility, and Total System Life Cycle Cost. This study (herein) is limited to the Waste Handling System and associated fuel staging system

  3. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  4. Large-scale Machine Learning in High-dimensional Datasets

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen

    Over the last few decades computers have gotten to play an essential role in our daily life, and data is now being collected in various domains at a faster pace than ever before. This dissertation presents research advances in four machine learning fields that all relate to the challenges imposed...... are better at modeling local heterogeneities. In the field of machine learning for neuroimaging, we introduce learning protocols for real-time functional Magnetic Resonance Imaging (fMRI) that allow for dynamic intervention in the human decision process. Specifically, the model exploits the structure of f...

  5. NCBI Mass Sequence Downloader–Large dataset downloading made easy

    Directory of Open Access Journals (Sweden)

    F. Pina-Martins

    2016-01-01

    Source code is licensed under the GPLv3, and is supported on Linux, Windows and Mac OSX. Available at https://github.com/ElsevierSoftwareX/SOFTX-D-15-00072.git, https://github.com/StuntsPT/NCBI_Mass_Downloader

  6. Interactive Visualization of Large High-Dimensional Datasets

    Science.gov (United States)

    Ding, Wei; Chen, Ping

    Nowadays many companies and public organizations use powerful database systems for collecting and managing information. Huge amounts of data records are often accumulated within a short period of time. Valuable information is embedded in these data, which could help discover interesting knowledge and significantly assist in the decision-making process. However, human beings are not capable of understanding so many data records, which often have many attributes. The need for automated knowledge extraction is widely recognized, and has led to a rapidly developing market of data analysis and knowledge discovery tools.

  7. Likelihood Approximation With Hierarchical Matrices For Large Spatial Datasets

    KAUST Repository

    Litvinenko, Alexander

    2017-09-03

    We use available measurements to estimate the unknown parameters (variance, smoothness parameter, and covariance length) of a covariance function by maximizing the joint Gaussian log-likelihood function. To overcome cubic complexity in the linear algebra, we approximate the discretized covariance function in the hierarchical (H-) matrix format. The H-matrix format has a log-linear computational cost and storage O(kn log n), where the rank k is a small integer and n is the number of locations. The H-matrix technique allows us to work with general covariance matrices in an efficient way, since H-matrices can approximate inhomogeneous covariance functions, with a fairly general mesh that is not necessarily axes-parallel, and neither the covariance matrix itself nor its inverse have to be sparse. We demonstrate our method with Monte Carlo simulations and an application to soil moisture data. The C, C++ codes and data are freely available.
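
    For reference, the dense version of the objective being maximized can be written in a few lines; the sketch below uses an exponential covariance as a stand-in for the Matérn family and a plain Cholesky factorization, i.e. exactly the O(n^3) step that the H-matrix approximation is designed to replace. It is not the authors' code.

```python
# Dense (non-hierarchical) Gaussian log-likelihood for a spatial dataset.
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist

def neg_log_likelihood(params, locs, z):
    variance, length = np.exp(params)          # enforce positive parameters
    C = variance * np.exp(-cdist(locs, locs) / length)
    C[np.diag_indices_from(C)] += 1e-8         # small nugget for stability
    factor = cho_factor(C, lower=True)
    logdet = 2.0 * np.sum(np.log(np.diag(factor[0])))
    quad = z @ cho_solve(factor, z)
    n = len(z)
    return 0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Maximum likelihood estimation would then call, e.g.,
# scipy.optimize.minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(locs, z),
#                         method="Nelder-Mead")
```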

  8. Distributed Large Dataset Deployment with Improved Load Balancing and Performance

    OpenAIRE

    Siddharth Bhandari

    2016-01-01

    Cloud computing is a paradigm for enabling ubiquitous, convenient, on-demand network access. The cloud is a method of computing in which massively scalable IT-enabled capabilities are delivered 'as a service' using Internet technologies to multiple external clients. Virtualization is the creation of a virtual form of something such as a computing device or server, an operating system, or network and storage devices. The different names for cloud data management are DaaS, Data as a ...

  9. Likelihood Approximation With Hierarchical Matrices For Large Spatial Datasets

    KAUST Repository

    Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E.

    2017-01-01

    algebra, we approximate the discretized covariance function in the hierarchical (H-) matrix format. The H-matrix format has a log-linear computational cost and storage O(kn log n), where the rank k is a small integer and n is the number of locations. The H

  10. The Importance of Normalization on Large and Heterogeneous Microarray Datasets

    Science.gov (United States)

    DNA microarray technology is a powerful functional genomics tool increasingly used for investigating global gene expression in environmental studies. Microarrays can also be used in identifying biological networks, as they give insight on the complex gene-to-gene interactions, ne...

  11. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  12. Development of a SPARK Training Dataset

    International Nuclear Information System (INIS)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-01-01

    In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer

  13. Satellite-Based Precipitation Datasets

    Science.gov (United States)

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.

  14. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial area. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
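
    A hypothetical, much-simplified illustration of such heuristics is sketched below; the field names, weights and version bonus are invented for the example and are not the Common Metadata Repository's actual relevancy algorithm.

```python
# Hypothetical relevancy heuristics: measurement term match, temporal overlap
# with the query, and a small bonus for newer dataset versions.
from datetime import date

def temporal_overlap(q_start, q_end, d_start, d_end):
    query_days = max((q_end - q_start).days, 1)
    overlap_days = (min(q_end, d_end) - max(q_start, d_start)).days
    return max(overlap_days, 0) / query_days

def relevance(query, dataset):
    term_hits = len(set(query["terms"]) & set(dataset["measurements"]))
    score = 3.0 * term_hits
    score += 2.0 * temporal_overlap(query["start"], query["end"],
                                    dataset["start"], dataset["end"])
    score += 0.1 * dataset.get("version", 1)   # prefer later, better versions
    return score

query = {"terms": {"sea surface temperature"},
         "start": date(2010, 1, 1), "end": date(2015, 1, 1)}
candidates = [
    {"id": "A", "measurements": {"sea surface temperature"}, "version": 7,
     "start": date(2002, 1, 1), "end": date(2020, 1, 1)},
    {"id": "B", "measurements": {"chlorophyll"}, "version": 2,
     "start": date(2011, 1, 1), "end": date(2012, 1, 1)},
]
ranked = sorted(candidates, key=lambda d: relevance(query, d), reverse=True)
print([d["id"] for d in ranked])
```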

  15. Production management of window handles

    Directory of Open Access Journals (Sweden)

    Manuela Ingaldi

    2014-12-01

    Full Text Available In the chapter, a company involved in the production of aluminum window and door handles is presented. The main customers of the company are primarily companies which produce PVC joinery and the wholesalers supplying these companies. One chosen product of the research company, a single-arm pin-lift window handle, is described and its production process is presented from the technological point of view. The chapter also includes a SWOT analysis conducted in the research company and the value stream of the single-arm pin-lift window handle.

  16. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20% error. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher
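
    The encoding that the paper builds on can be sketched briefly: each field value is mapped to a Bloom filter of its character bigrams, and filters are compared with a Dice coefficient. In the toy Python below the filter length, the number of hash functions and the use of unkeyed SHA-1 are illustrative choices, not the study's configuration.

```python
# Toy Bloom-filter encoding of string fields plus Dice-coefficient comparison.
import hashlib

FILTER_BITS = 1000
NUM_HASHES = 15

def bigrams(value):
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def bloom_encode(value):
    bits = set()
    for gram in bigrams(value):
        for k in range(NUM_HASHES):
            digest = hashlib.sha1(f"{k}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % FILTER_BITS)
    return bits

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

# Similar names keep a high similarity after encoding; an EM step, as in the
# paper, would then estimate match probabilities from such scores.
print(dice(bloom_encode("katherine"), bloom_encode("catherine")))
print(dice(bloom_encode("katherine"), bloom_encode("robert")))
```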

  17. Safe handling of radiation sources

    International Nuclear Information System (INIS)

    Abd Nasir Ibrahim; Azali Muhammad; Ab Razak Hamzah; Abd Aziz Mohamed; Mohammad Pauzi Ismail

    2004-01-01

    This chapter discusses subjects related to the safe handling of radiation sources: types of radiation sources; methods of use (transport within premises, transport outside premises); and the disposal of gamma sources.

  18. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). A key consideration for the system is high-speed access due to the large data volume, persistence of the large data volumes and a precise process time scheduling capability.

  19. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  20. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Full Text Available Everyday refinement and development of technology has led to an increase in competition between tech companies and in attempts to crack and break down systems. This makes data mining a strategically and security-wise important area for many business organizations, including the banking sector. It allows the analysis of important information in the data warehouse and assists banks in looking for obscure patterns in a group and discovering unknown relationships in the data. Banking systems need to process ample amounts of data on a daily basis related to customer information, credit card details, limits and collateral details, transaction details, risk profiles, anti-money-laundering information and trade finance data. Thousands of decisions based on these data are taken in a bank daily. This paper analyzes a banking dataset in the Weka environment for the detection of interesting patterns, with applications in customer acquisition, customer retention, management and marketing, and the management of risk and fraud detection.

  1. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  2. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander; Mularoni, Loris; Cope, Leslie M.; Medvedeva, Yulia; Mironov, Andrey A.; Makeev, Vsevolod J.; Wheelan, Sarah J.

    2012-01-01

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.
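
    GenometriCorr itself is an R package; the Python fragment below is only a rough analogue of one of the underlying ideas, comparing the observed overlap between two interval sets with overlaps obtained after random circular shifts of one set (interval wrap-around at the chromosome ends is ignored for brevity).

```python
# Rough analogue of a spatial-association test between two interval sets:
# permutation by random circular shift versus the observed total overlap.
import numpy as np

rng = np.random.default_rng(0)

def total_overlap(a, b):
    """Sum of pairwise intersection lengths between interval arrays of shape (n, 2)."""
    total = 0
    for start, end in a:
        lo = np.maximum(start, b[:, 0])
        hi = np.minimum(end, b[:, 1])
        total += np.clip(hi - lo, 0, None).sum()
    return total

def shift_pvalue(a, b, genome_length, n_perm=200):
    observed = total_overlap(a, b)
    lengths = b[:, 1] - b[:, 0]
    hits = 0
    for _ in range(n_perm):
        starts = (b[:, 0] + rng.integers(0, genome_length)) % genome_length
        if total_overlap(a, np.column_stack([starts, starts + lengths])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```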

  3. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  4. The technique on handling radiation

    International Nuclear Information System (INIS)

    1997-11-01

    This book describes the measurement and handling of radiation. The first part deals with the measurement of radiation; its contents cover the characteristics of radiation measurement techniques, radiation detectors, measurement of energy spectra, measurement of radioactivity, measurement of radiation levels, and counting statistics. The second part explains the handling of radiation, covering the treatment of sealed radioisotopes, the treatment of unsealed sources, and radiation shielding.

  5. Civilsamfundets ABC: H for Handling

    DEFF Research Database (Denmark)

    Lund, Anker Brink; Meyer, Gitte

    2015-01-01

    What is civil society? Anker Brink Lund and Gitte Meyer from the CBS Center for Civil Society Studies go through civil society letter by letter. We have now reached H for Handling (action).

  6. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large-scale hydrological models, not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large-scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large-scale datasets is challenging, for it requires collection, harmonization and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan-European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large-scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains radiation, calculated using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as
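
    For orientation, one widely used formulation of the Penman-Monteith equation is the FAO-56 reference evapotranspiration; the function below implements that textbook form, with no claim that EFAS-Meteo uses exactly this parameterisation.

```python
# FAO-56 Penman-Monteith reference evapotranspiration (mm/day).
def penman_monteith_fao56(rn, g, t_mean, u2, es, ea, delta, gamma):
    """rn, g: net radiation and soil heat flux [MJ m-2 day-1];
    t_mean: mean air temperature [deg C]; u2: wind speed at 2 m [m s-1];
    es, ea: saturation and actual vapour pressure [kPa];
    delta: slope of the saturation vapour pressure curve [kPa per deg C];
    gamma: psychrometric constant [kPa per deg C]."""
    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea))
    return numerator / (delta + gamma * (1.0 + 0.34 * u2))
```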

  7. Development of liquid handling techniques in microgravity

    Science.gov (United States)

    Antar, Basil N.

    1995-01-01

    A large number of experiments dealing with protein crystal growth and also with growth of crystals from solution require complicated fluid handling procedures, including filling of empty containers with liquids, mixing of solutions, and stirring of liquids. Such procedures are accomplished in a straightforward manner when performed under terrestrial conditions in the laboratory. However, in the low-gravity environment of space, such as on board the Space Shuttle or an Earth-orbiting space station, these procedures have sometimes produced entirely undesirable results. Under terrestrial conditions, liquids usually separate completely from the gas due to the buoyancy effects of Earth's gravity. Consequently, any gas pockets that are entrained into the liquid during a fluid handling procedure will eventually migrate towards the top of the vessel, where they can be removed. In a low-gravity environment, any entrained gas bubble will remain within the liquid bulk indefinitely at a location that is not known a priori, resulting in a mixture of liquid and vapor.

  8. Bionic design methodology for wear reduction of bulk solids handling equipment

    NARCIS (Netherlands)

    Chen, G.; Schott, D.L.; Lodewijks, G.

    2016-01-01

    Large-scale handling of particulate solids can cause severe wear on bulk solids handling equipment surfaces. Wear reduces equipment life span and increases maintenance cost. Examples of traditional methods to reduce wear of bulk solids handling equipment include optimizing transport operations

  9. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Science.gov (United States)

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

    This data paper describes a dataset of specimens of the threatened Catalonian vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that harbours a large diversity of autochthonous plants, including 199 taxa with IUCN threat categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from the 17th century to the present day. For each specimen, the species name, locality, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which demonstrates the role of botanical collections as an essential source of occurrence data.

  10. A review of continent scale hydrological datasets available for Africa

    OpenAIRE

    Bonsor, H.C.

    2010-01-01

    As rainfall becomes less reliable with predicted climate change the ability to assess the spatial and seasonal variations in groundwater availability on a large-scale (catchment and continent) is becoming increasingly important (Bates, et al. 2007; MacDonald et al. 2009). The scarcity of observed hydrological data, or difficulty in obtaining such data, within Africa means remotely sensed (RS) datasets must often be used to drive large-scale hydrological models. The different ap...

  11. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    Science.gov (United States)

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types, but due to the explosion of uncharacterized protein sequences in databases, such methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well, with a good accuracy of 96.35% in less time. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in classes with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
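
    As a generic illustration of the kind of comparison reported (the data below are synthetic and the settings are placeholders, not those of the study), the sketch contrasts a single decision tree with a class-weighted random forest on an imbalanced problem.

```python
# Single decision tree vs. class-weighted random forest on imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=30,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = [("decision tree", DecisionTreeClassifier(random_state=0)),
          ("random forest", RandomForestClassifier(n_estimators=200,
                                                   class_weight="balanced",
                                                   random_state=0))]
for name, clf in models:
    clf.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, clf.predict(X_te), digits=3))
```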

  12. Blanket handling concepts for future fusion power plants

    International Nuclear Information System (INIS)

    Bogusch, E.; Gottfried, R.; Maisonnier, D.

    2003-01-01

    In the frame of the power plant conceptual studies (PPCS) launched by the European Commission, two main blanket handling concepts have been investigated with respect to engineering feasibility and their impact on plant availability and cost: the large module handling concept (LMHC) and the large sector handling concept (LSHC). The LMHC has been considered the reference handling concept, while the LSHC has been considered an attractive alternative due to its potential for shorter replacement times and hence increased plant availability. Although no fundamental feasibility issue has been identified, a number of engineering issues have been highlighted for the LSHC that would require considerable effort to resolve. Since its availability of about 77%, based on a replacement time of about 4.2 months for all the internals, is slightly lower than that of the LMHC, the LMHC remains the reference blanket replacement concept for a conceptual reactor.

  13. New fuzzy support vector machine for the class imbalance problem in medical datasets classification.

    Science.gov (United States)

    Gu, Xiaoqing; Ni, Tongguang; Wang, Hongyuan

    2014-01-01

    In medical datasets classification, support vector machine (SVM) is considered to be one of the most successful methods. However, most real-world medical datasets contain some outliers/noise, and the data often have class imbalance problems. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM obtained by extending manifold regularization and assigning two misclassification costs for the two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and Pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show the superior or comparable effectiveness of FSVM-CIP.
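
    FSVM-CIP itself (fuzzy memberships plus manifold regularization) is not available off the shelf; the closest readily available ingredient is assigning different misclassification costs to the two classes of an SVM. The sketch below illustrates only that cost-sensitive step with scikit-learn on the breast cancer data and should not be read as an implementation of FSVM-CIP.

        # Minimal sketch of the cost-sensitive ingredient of FSVM-CIP: assigning
        # different misclassification costs to the two classes of an imbalanced
        # medical dataset. Fuzzy memberships and manifold regularization are
        # not reproduced here.
        from sklearn.datasets import load_breast_cancer
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # class_weight="balanced" sets the per-class cost inversely proportional
        # to class frequency; an explicit dict such as {0: 5.0, 1: 1.0} also works.
        model = make_pipeline(StandardScaler(),
                              SVC(kernel="rbf", class_weight="balanced"))
        model.fit(X_tr, y_tr)
        print(classification_report(y_te, model.predict(X_te)))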

  14. New Fuzzy Support Vector Machine for the Class Imbalance Problem in Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Xiaoqing Gu

    2014-01-01

    Full Text Available In medical datasets classification, support vector machine (SVM) is considered to be one of the most successful methods. However, most real-world medical datasets contain some outliers/noise, and the data often have class imbalance problems. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM obtained by extending manifold regularization and assigning two misclassification costs for the two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and Pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show the superior or comparable effectiveness of FSVM-CIP.

  15. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set of equilibrium datasets is path-connected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non-collinear.

  16. Asthma, guides for diagnostic and handling

    International Nuclear Information System (INIS)

    Salgado, Carlos E; Caballero A, Andres S; Garcia G, Elizabeth

    1999-01-01

    The paper defines asthma and covers topics such as diagnosis, management of asthma, and special situations such as asthma and pregnancy, perioperative management of the asthmatic patient, and occupational asthma

  17. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  18. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  19. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  20. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  1. SRV-automatic handling device

    International Nuclear Information System (INIS)

    Yamada, Koji

    1987-01-01

    An automatic handling device for the steam relief valves (SRVs) is developed in order to achieve a decrease in worker exposure, an increase in the availability factor, improvements in reliability and operational safety, and labor saving. A survey is made during a periodical inspection to examine the actual SRV handling operation. The SRV automatic handling device consists of four components: a conveyor, an armed conveyor, a lifting machine, and a control/monitoring system. The conveyor is designed so that the existing I-rail installed in the containment vessel can be used without any modification; it is employed for conveying an SRV along the rail. The armed conveyor, designed for a box rail, is used for an SRV installed away from the rail. Using the lifting machine, an SRV installed away from the I-rail is brought to a spot just below the rail so that it can be transferred by the conveyor. The control/monitoring system consists of a control computer, an operation panel, a TV monitor and an annunciator. The SRV handling device is operated by remote control from a control room. Trial equipment is constructed and performance and function testing is carried out using actual SRVs. As a result, it is shown that the SRV handling device requires only two operators to serve satisfactorily. The time required for removal and replacement of one SRV is about 10 minutes. (Nogami, K.)

  2. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    Full Text Available A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however, they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to (i) cover data from a broad range of application domains, (ii) be flexible and extensible to combine many different complex data structures, (iii) include metadata and semantic definitions, (iv) include inferred information, (v) identify the original data source for integrated entities and (vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.

  3. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  4. Process mining in oncology using the MIMIC-III dataset

    Science.gov (United States)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology, with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option, and this paper describes the potential of MIMIC-III for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining, and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
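
    MIMIC-III access requires credentialed registration, so the sketch below uses a tiny hand-made event table rather than the real tables; the column names (case_id, activity, timestamp) are illustrative. It shows only the first step of process discovery, counting directly-follows transitions per patient, and is not the full L* lifecycle used in the paper.

        # Sketch of the first step of process discovery on an event table:
        # counting "directly-follows" transitions between activities per case.
        # The real MIMIC-III tables would need to be mapped onto this shape first.
        from collections import Counter
        import pandas as pd

        events = pd.DataFrame({
            "case_id":   [1, 1, 1, 2, 2, 2],
            "activity":  ["admission", "chemotherapy", "discharge",
                          "admission", "radiotherapy", "discharge"],
            "timestamp": pd.to_datetime(["2012-01-01", "2012-01-03", "2012-01-10",
                                         "2012-02-01", "2012-02-05", "2012-02-12"]),
        })

        dfg = Counter()
        for _, trace in events.sort_values("timestamp").groupby("case_id"):
            acts = trace["activity"].tolist()
            dfg.update(zip(acts, acts[1:]))  # consecutive activity pairs in one case

        for (src, dst), count in dfg.items():
            print(f"{src} -> {dst}: {count}")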

  5. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global, delayed-mode, validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, as are TESAC profiles from GTSPP. In the case of CORA, data coming from the EuroGOOS Regional Operational Observing Systems (ROOS) operated by European institutes not managed by National Data Centres, and other profile datasets provided by scientific sources, can also be found (sea mammal profiles from MEOP, XBT datasets from cruises, ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until the year before the current year. The product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Although the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016 a new study started that aims to compare both validation procedures and move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  6. Handling of waste in ports

    International Nuclear Information System (INIS)

    Olson, P.H.

    1994-01-01

    The regulations governing the handling of port-generated waste are often national and/or local legislation, whereas the handling of ship-generated waste is governed by the MARPOL Convention in most parts of the world. The handling of waste consists of two main phases: collection and treatment. Waste has to be collected in every port and on board every ship, whereas only some wastes are treated, and only to a certain degree, in ports and on board ships. This paper considers the different kinds of waste generated both in ports and on board ships, where and how they are generated, and how they could be collected and treated. The two sources are treated together to show how some ship-generated waste may be treated in port installations primarily constructed for the treatment of port-generated waste, making integrated use of the available treatment facilities. (author)

  7. SIMADL: Simulated Activities of Daily Living Dataset

    Directory of Open Access Journals (Sweden)

    Talal Alshammari

    2018-04-01

    Full Text Available With the realisation of the Internet of Things (IoT) paradigm, the analysis of Activities of Daily Living (ADLs) in a smart home environment is becoming an active research domain. The existence of representative datasets is a key requirement to advance research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets, for classification and anomaly detection. The datasets are generated using OpenSHS (Open Smart Home Simulator), a simulation software package for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days' worth of activities. Forty-two files of classified ADLs make up the classification dataset, and the other forty-two files, into which anomalous patterns were simulated and injected, make up the anomaly detection dataset.
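
    As a hedged illustration of how such a simulated dataset might be consumed, the sketch below loads one OpenSHS-style CSV and trains a classifier to predict the activity label from the binary sensor columns. The file name and column names are assumptions for illustration, not the documented OpenSHS schema.

        # Sketch of supervised ADL classification on a simulated smart home file.
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split

        # Hypothetical OpenSHS output file; the label column is assumed to be "Activity".
        df = pd.read_csv("d1_1m_0tm.csv")
        X = df.drop(columns=["Activity", "timestamp"], errors="ignore")
        y = df["Activity"]

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))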

  8. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Science.gov (United States)

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  9. Synthetic and Empirical Capsicum Annuum Image Dataset

    NARCIS (Netherlands)

    Barth, R.

    2016-01-01

    This dataset consists of per-pixel annotated synthetic (10,500) and empirical (50) images of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models used to generate the synthetic images are included. The aim of the datasets is to

  10. Knowledge Mining from Clinical Datasets Using Rough Sets and Backpropagation Neural Network

    Directory of Open Access Journals (Sweden)

    Kindie Biredagn Nahato

    2015-01-01

    Full Text Available The availability of clinical datasets and knowledge mining methodologies encourages researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from a minimal set of attributes extracted from the clinical dataset. In this work the rough set indiscernibility relation method with a backpropagation neural network (RS-BPNN) is used. The work has two stages. The first stage is the handling of missing values to obtain a smooth dataset and the selection of appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage is classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with the hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
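
    The sketch below mirrors the two-stage idea (attribute reduction followed by a backpropagation network) with off-the-shelf scikit-learn components: a univariate feature selector stands in for the rough-set reduct and MLPClassifier for the backpropagation network. It is an approximation for illustration, not the RS-BPNN method itself.

        # Two-stage approximation: impute missing values and reduce attributes,
        # then classify with a backpropagation (MLP) network.
        from sklearn.datasets import load_breast_cancer
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.impute import SimpleImputer
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        X, y = load_breast_cancer(return_X_y=True)

        model = make_pipeline(
            SimpleImputer(strategy="most_frequent"),   # stage 1a: handle missing values
            SelectKBest(f_classif, k=10),              # stage 1b: keep a reduced attribute set
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
        )
        print(cross_val_score(model, X, y, cv=5).mean())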

  11. An assessment of differences in gridded precipitation datasets in complex terrain

    Science.gov (United States)

    Henn, Brian; Newman, Andrew J.; Livneh, Ben; Daly, Christopher; Lundquist, Jessica D.

    2018-01-01

    Hydrologic modeling and other geophysical applications are sensitive to precipitation forcing data quality, and there are known challenges in spatially distributing gauge-based precipitation over complex terrain. We conduct a comparison of six high-resolution, daily and monthly gridded precipitation datasets over the Western United States. We compare the long-term average spatial patterns, and interannual variability of water-year total precipitation, as well as multi-year trends in precipitation across the datasets. We find that the greatest absolute differences among datasets occur in high-elevation areas and in the maritime mountain ranges of the Western United States, while the greatest percent differences among datasets relative to annual total precipitation occur in arid and rain-shadowed areas. Differences between datasets in some high-elevation areas exceed 200 mm yr⁻¹ on average, and relative differences range from 5 to 60% across the Western United States. In areas of high topographic relief, true uncertainties and biases are likely higher than the differences among the datasets; we present evidence of this based on streamflow observations. Precipitation trends in the datasets differ in magnitude and sign at smaller scales, and are sensitive to how temporal inhomogeneities in the underlying precipitation gauge data are handled.
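
    A comparison of this kind can be sketched with xarray by computing annual totals per grid cell for two products and differencing them. The file names and the variable name "prcp" below are placeholders, and real products would generally need regridding to a common grid before differencing.

        # Sketch of comparing annual precipitation totals from two gridded products.
        import xarray as xr

        ds_a = xr.open_dataset("product_a_daily_precip.nc")   # placeholder file names
        ds_b = xr.open_dataset("product_b_daily_precip.nc")

        # Annual totals per grid cell (calendar years for simplicity; a true
        # water year would shift the grouping by three months).
        tot_a = ds_a["prcp"].groupby("time.year").sum("time")
        tot_b = ds_b["prcp"].groupby("time.year").sum("time")

        # Long-term mean absolute and relative differences between the products.
        diff = (tot_a - tot_b).mean("year")
        rel_diff = diff / tot_b.mean("year") * 100.0
        print(diff.max().item(), "mm/yr maximum absolute difference")
        print(rel_diff.max().item(), "% maximum relative difference")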

  12. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    As more and more advertisements are broadcast on radio, it is necessary to establish an audio advertising dataset that can be used to analyse and classify advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav format and annotated with a txt file containing its file name, sampling frequency, channel number, broadcasting time and class. The soundness of the class structure of the advertisements in this dataset is demonstrated by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.
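
    As a hedged sketch of the clustering check described above, the snippet below extracts a few simple spectral features from each *.wav file, projects them with PCA and clusters them with k-means. The feature set is deliberately minimal and the directory name is a placeholder.

        # Sketch: feature extraction, PCA projection and k-means clustering of ads.
        import glob
        import numpy as np
        from scipy.io import wavfile
        from sklearn.cluster import KMeans
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        def simple_features(path):
            rate, data = wavfile.read(path)
            if data.ndim > 1:                      # mix stereo down to mono
                data = data.mean(axis=1)
            spectrum = np.abs(np.fft.rfft(data))
            freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)
            centroid = (freqs * spectrum).sum() / spectrum.sum()
            return [data.std(), centroid, len(data) / rate]   # energy, brightness, duration

        files = sorted(glob.glob("ads/*.wav"))     # placeholder directory
        X = StandardScaler().fit_transform([simple_features(f) for f in files])
        X2 = PCA(n_components=2).fit_transform(X)
        labels = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(X2)
        print(dict(zip(files, labels)))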

  13. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  14. Processing large remote sensing image data sets on Beowulf clusters

    Science.gov (United States)

    Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail

    2003-01-01

    High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.
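
    The simplest of the applications listed, smoothing a time series at every pixel of a raster stack, can be sketched on a single multi-core node by dividing the grid rows among worker processes; the chunking and filter width below are arbitrary illustrative choices rather than the Beowulf implementation.

        # Sketch: per-pixel time-series smoothing with rows split among workers.
        import numpy as np
        from multiprocessing import Pool

        def smooth_rows(block):
            # Moving average of length 5 along the time axis for one block of rows.
            kernel = np.ones(5) / 5.0
            return np.apply_along_axis(lambda t: np.convolve(t, kernel, mode="same"),
                                       axis=0, arr=block)

        if __name__ == "__main__":
            data = np.random.rand(120, 400, 400)          # (time, rows, cols)
            blocks = np.array_split(data, 8, axis=1)      # split rows among workers
            with Pool(processes=8) as pool:
                smoothed = np.concatenate(pool.map(smooth_rows, blocks), axis=1)
            print(smoothed.shape)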

  15. Software for handling MFME1

    International Nuclear Information System (INIS)

    Van der Merwe, W.G.

    1984-01-01

    The report deals with SEMFIP, a computer code for determining magnetic field measurements. The program is written in FORTRAN and ASSEMBLER. The preparations for establishing SEMFIP, the actual measurements, data handling and the problems that were experienced are discussed. Details on the computer code are supplied in an appendix

  16. Welding method by remote handling

    International Nuclear Information System (INIS)

    Hashinokuchi, Minoru.

    1994-01-01

    Water is charged into a pit (or a water reservoir) and an article to be welded is placed on a support in the pit by remote handling. A steel plate is disposed so as to cover the article to be welded by remote handling. The welding device is positioned to the portion to be welded and fixed in a state where the article to be welded is shielded from radiation by water and the steel plate. Water in the pit is drained till the portion to be welded is exposed to the atmosphere. Then, welding is conducted. After completion of the welding, water is charged again to the pit and the welding device and fixing jigs are decomposed in a state where the article to be welded is shielded again from radiation by water and the steel plate. Subsequently, the steel plate is removed by remote handling. Then, the article to be welded is returned from the pit to a temporary placing pool by remote handling. This can reduce operator's exposure. Further, since the amount of the shielding materials can be minimized, the amount of radioactive wastes can be decreased. (I.N.)

  17. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However this approach does not allow an assessment of the force multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and re-sampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.
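
    A simplified version of such a power comparison can be run with scipy: draw many small samples from two populations whose means differ, and count how often the Mann-Whitney U test and a permutation (re-sampling) test reject at the 5% level. The sample size, shift and trial counts below are arbitrary illustrative choices, and the Wald-Wolfowitz runs test is omitted.

        # Monte Carlo power comparison for small samples: Mann-Whitney U test
        # versus a permutation test on the difference in means.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n, shift, trials, alpha = 8, 1.0, 1000, 0.05
        n_perm = 500
        reject_mw = reject_perm = 0

        for _ in range(trials):
            a = rng.normal(0.0, 1.0, n)
            b = rng.normal(shift, 1.0, n)

            if stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
                reject_mw += 1

            observed = abs(a.mean() - b.mean())
            pooled = np.concatenate([a, b])
            extreme = 0
            for _ in range(n_perm):
                rng.shuffle(pooled)
                extreme += abs(pooled[:n].mean() - pooled[n:].mean()) >= observed
            if extreme / n_perm < alpha:
                reject_perm += 1

        print("Mann-Whitney power:", reject_mw / trials)
        print("Permutation test power:", reject_perm / trials)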

  18. Data-Driven Decision Support for Radiologists: Re-using the National Lung Screening Trial Dataset for Pulmonary Nodule Management

    OpenAIRE

    Morrison, James J.; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L.

    2014-01-01

    Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was conve...

  19. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction Since July, the activities have been focused on very diverse subjects: operations activities for the 2012 data-taking, Monte Carlo production and data re-processing plans for 2013 conferences (winter and summer), preparation for the Upgrades TDRs and readiness after LS1. The regular operations activities have included: changing to the 53X release at the Tier-0, regular calibrations updates, and data certification to guarantee certified data for analysis with the shortest delay from data taking. The samples, simulated at 8 TeV, have been re-reconstructed using 53X. A lot of effort has been put in their prioritisation to ensure that the samples needed for HCP and future conferences are produced on time. Given the large amount of data that have been collected in 2012 and the available computing resources, a careful planning is needed. The PPD and Physics groups worked on a master schedule for the Monte Carlo production, new conditions validation and data reprocessing. The ...

  20. Radioactivity, shielding, radiation damage, and remote handling

    International Nuclear Information System (INIS)

    Wilson, M.T.

    1975-01-01

    Proton beams of a few hundred million electron volts of energy are capable of inducing hundreds of curies of activity per microampere of beam intensity into the materials they intercept. This adds a new dimension to the parameters that must be considered when designing and operating a high-intensity accelerator facility. Large investments must be made in shielding. The shielding itself may become activated and require special considerations as to its composition, location, and method of handling. Equipment must be designed to withstand large radiation dosages. Items such as vacuum seals, water tubing, and electrical insulation must be fabricated from radiation-resistant materials. Methods of maintaining and replacing equipment are required that limit the radiation dosages to workers. The high-intensity facilities of LAMPF, SIN, and TRIUMF and the high-energy facility of FERMILAB have each evolved a philosophy of radiation handling that matches their particular machine and physical plant layouts. Special tooling, commercial manipulator systems, remote viewing, and other techniques of the hot cell and fission reactor realms are finding application within accelerator facilities. (U.S.)

  1. The Kinetics Human Action Video Dataset

    OpenAIRE

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  2. Experience in handling concentrated tritium

    International Nuclear Information System (INIS)

    Holtslander, W.J.

    1985-12-01

    The notes describe the experience in handling concentrated tritium in the hydrogen form accumulated in the Chalk River Nuclear Laboratories Tritium Laboratory. The techniques of box operation, pumping systems, hydriding and dehydriding operations, and analysis of tritium are discussed. Information on the Chalk River Tritium Extraction Plant is included as a collection of reprints of papers presented at the Dayton Meeting on Tritium Technology, 1985 April 30 - May 2

  3. International handling of fissionable material

    International Nuclear Information System (INIS)

    1975-01-01

    The opinion of the ministry for foreign affairs on international handling of fissionable materials is given. As an introduction a survey is given of the possibilities to produce nuclear weapons from materials used in or produced by power reactors. Principles for international control of fissionable materials are given. International agreements against proliferation of nuclear weapons are surveyed and methods to improve them are proposed. (K.K.)

  4. Confinement facilities for handling plutonium

    International Nuclear Information System (INIS)

    Maraman, W.J.; McNeese, W.D.; Stafford, R.G.

    1975-01-01

    Plutonium handling on a multigram scale began in 1944. Early criteria, equipment, and techniques for confining contamination have been superseded by more stringent criteria and vastly improved equipment and techniques for in-process contamination control, effluent air cleaning and treatment of liquid wastes. This paper describes the evolution of equipment and practices to minimize exposure of workers and escape of contamination into work areas and into the environment. Early and current contamination controls are compared. (author)

  5. Remote handling equipment for SNS

    International Nuclear Information System (INIS)

    Poulten, B.H.

    1983-01-01

    This report gives information on the areas of the SNS facility which become highly radioactive, preventing hands-on maintenance. Levels of activity are sufficiently high in the Target Station Area of the SNS, especially under fault conditions, to warrant the use of reactor technology in the design of the water, drainage and ventilation systems. These problems, together with the type of remote handling equipment required in the SNS, are discussed.

  6. Remote handling in reprocessing plants

    International Nuclear Information System (INIS)

    Streiff, G.

    1984-01-01

    Remote control will be the rule for maintenance in the hot cells of future spent fuel reprocessing plants because of the radioactivity level. New handling equipment will be developed and intervention principles defined. Existing equipment, recommendations for its use and new manipulators are described in the PMDS documentation, which also serves as an aid in the choice and use of intervention means and as a guide for the user [fr

  7. Equipment for the handling of thorium materials

    International Nuclear Information System (INIS)

    Heisler, S.W. Jr.; Mihalovich, G.S.

    1988-01-01

    The Feed Materials Production Center (FMPC) is the United States Department of Energy's storage facility for thorium. FMPC thorium handling and overpacking projects ensure the continued safe handling and storage of the thorium inventory until final disposition of the materials is determined and implemented. The handling and overpacking of the thorium materials requires the design of a system that utilizes remote handling and overpacking equipment not currently utilized at the FMPC in the handling of uranium materials. The use of remote equipment significantly reduces radiation exposure to personnel during the handling and overpacking efforts. The designed system combines existing technologies from the nuclear industry, the materials processing and handling industry and the mining industry. It consists of a modified fork lift truck for the transport of thorium containers, automated equipment for material identification and inventory control, and remote handling and overpacking equipment for repackaging of the thorium materials

  8. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  9. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  10. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  11. Harvard Aging Brain Study : Dataset and accessibility

    NARCIS (Netherlands)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  12. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  13. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  14. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  15. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  16. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  17. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  18. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  19. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  20. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  1. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  2. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  3. Enteral Feeding Set Handling Techniques.

    Science.gov (United States)

    Lyman, Beth; Williams, Maria; Sollazzo, Janet; Hayden, Ashley; Hensley, Pam; Dai, Hongying; Roberts, Cristine

    2017-04-01

    Enteral nutrition therapy is common practice in pediatric clinical settings. Often patients will receive a pump-assisted bolus feeding over 30 minutes several times per day using the same enteral feeding set (EFS). This study aims to determine the safest and most efficacious way to handle the EFS between feedings. Three EFS handling techniques were compared through simulation for bacterial growth, nursing time, and supply costs: (1) rinsing the EFS with sterile water after each feeding, (2) refrigerating the EFS between feedings, and (3) using a ready-to-hang (RTH) product maintained at room temperature. Cultures were obtained at baseline, hour 12, and hour 21 of the 24-hour cycle. A time-in-motion analysis was conducted and reported in average number of seconds to complete each procedure. Supply costs were inventoried for 1 month, comparing the actual usage to our estimated usage. Of 1,080 cultures obtained, the overall bacterial growth rate was 8.7%. The rinse and refrigeration techniques displayed similar bacterial growth (11.4% vs 10.3%, P = .63). The RTH technique displayed the least bacterial growth of any method (4.4%, P = .002). The time analysis in minutes showed the rinse method was the most time-consuming (44.8 ± 2.7) versus refrigeration (35.8 ± 2.6) and RTH (31.08 ± 0.6). On these results the RTH product appears to be the most efficacious approach, and refrigerating the EFS between uses is the next most efficacious method for handling the EFS between bolus feeds.

  4. Comparison of recent SnIa datasets

    International Nuclear Information System (INIS)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w_0 + w_1(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)), and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa in these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3 compared to the highest-FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets changes, however, when we consider consistency with an expansion history corresponding to evolving dark energy with (w_0, w_1) = (−1.4, 2), crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared, and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample
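
    For reference, the CPL parametrization quoted above enters the expansion history through the standard dark-energy density scaling ρ_DE(a)/ρ_DE,0 = a^(−3(1 + w_0 + w_1)) exp(−3 w_1 (1 − a)). The short sketch below evaluates the corresponding dimensionless Hubble rate for a flat universe; the matter fraction is an arbitrary illustrative choice, not a value taken from the paper.

        # Dimensionless Hubble rate for flat matter + CPL dark energy, with
        # w(a) = w0 + w1*(1 - a). The matter fraction Om is illustrative.
        import numpy as np

        def E(a, Om=0.3, w0=-1.0, w1=0.0):
            """H(a)/H0 for a flat universe with matter and CPL dark energy."""
            rho_de = a ** (-3.0 * (1.0 + w0 + w1)) * np.exp(-3.0 * w1 * (1.0 - a))
            return np.sqrt(Om * a ** -3 + (1.0 - Om) * rho_de)

        # Compare LambdaCDM with the phantom-crossing example quoted above.
        for a in (1.0, 0.5, 0.25):
            print(a, E(a), E(a, w0=-1.4, w1=2.0))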

  5. Handling Qualities of Large Rotorcraft in Hover and Low Speed

    Science.gov (United States)

    2015-03-01

    [Only fragmentary full-text excerpts are available for this record; they mention shaft axes that depend on nacelle angle, main inceptor force characteristics (gradients, breakouts, damping, friction) provided by a hydraulic McFadden variable-force system, and a table of inceptor force-displacement characteristics by cockpit control and rotorcraft configuration.]

  6. Critical experiments for large scale enriched uranium solution handling

    International Nuclear Information System (INIS)

    Tanner, J.E.; Forehand, H.M.

    1985-01-01

    The authors have performed 17 critical experiments with a concentrated aqueous uranyl nitrate solution contained in an annular cylindrical tank, with annular cylindrical absorbers of stainless steel and/or polyethylene inside. Values of k_eff calculated by KENO IV, employing 16-group Hansen-Roach cross sections, average 0.977. There is a variation of the calculational bias among the separate experiments, but it is too small to allow assigning it to specific components of the equipment. They are now performing critical experiments with a more concentrated uranyl nitrate solution in pairs of very squat cylindrical tanks with disc-shaped absorbers and reflectors of carbon steel, stainless steel, Nitronic-50, and plain and borated polyethylene. These experiments are in support of upgrading fuel reprocessing at the Idaho Chemical Processing Plant

  7. Safeguarding future large-scale plutonium bulk handling facilities

    International Nuclear Information System (INIS)

    1979-01-01

    The paper reviews the current status, advantages, limitations and probable future developments of material accountancy and of containment and surveillance. The major limitations on the use of material accountancy in applying safeguards to future plants arise from the uncertainty with which flows and inventories can be measured (0.5 to 1.0%), and the necessity to carry out periodical physical inventories to determine whether material has been diverted. The use of plant instrumentation to determine in-process inventories has commenced and so has the development of statistical methods for the evaluations of the data derived from a series of consecutive material balance periods. The limitations of accountancy can be overcome by increased use of containment and surveillance measures which have the advantage that they are independent of the operator's actions. In using these measures it will be necessary to identify the credible diversion paths, build in sufficient redundancy to reduce false alarm rates, develop automatic data recording and alarming

  8. Resolution testing and limitations of geodetic and tsunami datasets for finite fault inversions along subduction zones

    Science.gov (United States)

    Williamson, A.; Newman, A. V.

    2017-12-01

    Finite fault inversions utilizing multiple datasets have become commonplace for large earthquakes pending data availability. The mixture of geodetic datasets such as Global Navigational Satellite Systems (GNSS) and InSAR, seismic waveforms, and when applicable, tsunami waveforms from Deep-Ocean Assessment and Reporting of Tsunami (DART) gauges, provide slightly different observations that when incorporated together lead to a more robust model of fault slip distribution. The merging of different datasets is of particular importance along subduction zones where direct observations of seafloor deformation over the rupture area are extremely limited. Instead, instrumentation measures related ground motion from tens to hundreds of kilometers away. The distance from the event and dataset type can lead to a variable degree of resolution, affecting the ability to accurately model the spatial distribution of slip. This study analyzes the spatial resolution attained individually from geodetic and tsunami datasets as well as in a combined dataset. We constrain the importance of distance between estimated parameters and observed data and how that varies between land-based and open ocean datasets. Analysis focuses on accurately scaled subduction zone synthetic models as well as analysis of the relationship between slip and data in recent large subduction zone earthquakes. This study shows that seafloor deformation sensitive datasets, like open-ocean tsunami waveforms or seafloor geodetic instrumentation, can provide unique offshore resolution for understanding most large and particularly tsunamigenic megathrust earthquake activity. In most environments, we simply lack the capability to resolve static displacements using land-based geodetic observations.

  9. Recent fuel handling experience in Canada

    International Nuclear Information System (INIS)

    Welch, A.C.

    1991-01-01

    For many years, good operation of the fuel handling system at Ontario Hydro's nuclear stations has been taken for granted with the unavailability of the station arising from fuel handling system-related problems usually contributing less than one percent of the total unavailability of the stations. While the situation at the newer Hydro stations continues generally to be good (with the specific exception of some units at Pickering B) some specific and some general problems have caused significant loss of availability at the older plants (Pickering A and Bruce A). Generally the experience at the 600 MWe units in Canada has also continued to be good with Point Lepreau leading the world in availability. As a result of working to correct identified deficiencies, there were some changes for the better as some items of equipment that were a chronic source of trouble were replaced with improved components. In addition, the fuel handling system has been used three times as a delivery system for large-scale non destructive examination of the pressure tubes, twice at Bruce and once at Pickering and performing these inspections this way has saved many days of reactor downtime. Under COG there are several programs to develop improved versions of some of the main assemblies of the fuelling machine head. This paper will generally cover the events relating to Pickering in more detail but will describe the problems with the Bruce Fuelling Machine Bridges since the 600 MW 1P stations have a bridge drive arrangement that is somewhat similar to Bruce

  10. CLARA-A1: a cloud, albedo, and radiation dataset from 28 yr of global AVHRR data

    Directory of Open Access Journals (Sweden)

    K.-G. Karlsson

    2013-05-01

    Full Text Available A new satellite-derived climate dataset – denoted CLARA-A1 ("The CM SAF cLoud, Albedo and RAdiation dataset from AVHRR data") – is described. The dataset covers the 28 yr period from 1982 until 2009 and consists of cloud, surface albedo, and radiation budget products derived from the AVHRR (Advanced Very High Resolution Radiometer) sensor carried by polar-orbiting operational meteorological satellites. Its content, anticipated accuracies, limitations, and potential applications are described. The dataset is produced by the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF) project. The dataset has its strengths in its long duration, its foundation upon a homogenized AVHRR radiance data record, and in some unique features, e.g. the availability of 28 yr of summer surface albedo and cloudiness parameters over the polar regions. Quality characteristics are also well investigated, and particularly useful results can be found over the tropics, mid to high latitudes and over nearly all oceanic areas. Being the first CM SAF dataset of its kind, an intensive evaluation of the quality of the datasets was performed, and major findings with regard to merits and shortcomings of the datasets are reported. However, the CM SAF's long-term commitment to perform two additional reprocessing events within the time frame 2013–2018 will allow proper handling of limitations as well as upgrading of the dataset with new features (e.g. uncertainty estimates and extension of the temporal coverage).

  11. Large-scale machine learning and evaluation platform for real-time traffic surveillance

    Science.gov (United States)

    Eichel, Justin A.; Mishra, Akshaya; Miller, Nicholas; Jankovic, Nicholas; Thomas, Mohan A.; Abbott, Tyler; Swanson, Douglas; Keller, Joel

    2016-09-01

    In traffic engineering, vehicle detectors are trained on limited datasets, resulting in poor accuracy when deployed in real-world surveillance applications. Annotating large-scale high-quality datasets is challenging. Typically, these datasets have limited diversity; they do not reflect the real-world operating environment. There is a need for a large-scale, cloud-based positive and negative mining process and a large-scale learning and evaluation system for the application of automatic traffic measurements and classification. The proposed positive and negative mining process addresses the quality of crowd sourced ground truth data through machine learning review and human feedback mechanisms. The proposed learning and evaluation system uses a distributed cloud computing framework to handle data-scaling issues associated with large numbers of samples and a high-dimensional feature space. The system is trained using AdaBoost on 1,000,000 Haar-like features extracted from 70,000 annotated video frames. The trained real-time vehicle detector achieves an accuracy of at least 95% for 1/2 and about 78% for 19/20 of the time when tested on ~7,500,000 video frames. At the end of 2016, the dataset is expected to have over 1 billion annotated video frames.
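
    At toy scale, the core learning step (Haar-like features computed on integral images and fed to boosted decision stumps) can be sketched with scikit-image and scikit-learn as below. The synthetic 12x12 patches and restricted feature family are stand-ins for the annotated video frames and the roughly 1,000,000-feature space of the actual system, not a reproduction of it.

        # Sketch: Haar-like features + AdaBoost on synthetic vehicle-like patches.
        import numpy as np
        from skimage.feature import haar_like_feature
        from skimage.transform import integral_image
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)

        def extract(patch):
            # Haar-like features computed on the integral image of one patch;
            # a restricted feature family keeps this toy example fast.
            ii = integral_image(patch)
            return haar_like_feature(ii, 0, 0, patch.shape[1], patch.shape[0],
                                     feature_type=["type-2-x", "type-2-y"])

        # Synthetic stand-ins: "positive" patches are brighter in the lower half.
        features, labels = [], []
        for i in range(400):
            patch = rng.random((12, 12))
            if i % 2 == 0:
                patch[6:, :] += 0.5
            features.append(extract(patch))
            labels.append(i % 2)

        X_tr, X_te, y_tr, y_te = train_test_split(np.array(features),
                                                  np.array(labels), random_state=0)
        # AdaBoost with its default depth-1 decision-tree (stump) base learner.
        clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))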

  12. Robotic liquid handling and automation in epigenetics.

    Science.gov (United States)

    Gaisford, Wendy

    2012-10-01

    Automated liquid-handling robots and high-throughput screening (HTS) are widely used in the pharmaceutical industry for the screening of large compound libraries, small molecules for activity against disease-relevant target pathways, or proteins. HTS robots capable of low-volume dispensing reduce assay setup times and provide highly accurate and reproducible dispensing, minimizing variation between sample replicates and eliminating the potential for manual error. Low-volume automated nanoliter dispensers ensure accuracy of pipetting within volume ranges that are difficult to achieve manually. In addition, they have the ability to potentially expand the range of screening conditions from often limited amounts of valuable sample, as well as reduce the usage of expensive reagents. The ability to accurately dispense lower volumes provides the potential to achieve a greater amount of information than could be otherwise achieved using manual dispensing technology. With the emergence of the field of epigenetics, an increasing number of drug discovery companies are beginning to screen compound libraries against a range of epigenetic targets. This review discusses the potential for the use of low-volume liquid handling robots, for molecular biological applications such as quantitative PCR and epigenetics.

  13. 7 CFR 926.9 - Handle.

    Science.gov (United States)

    2010-01-01

    ... the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (Marketing Agreements and Orders; Fruits, Vegetables, Nuts), DEPARTMENT OF AGRICULTURE DATA COLLECTION, REPORTING AND RECORDKEEPING REQUIREMENTS APPLICABLE TO CRANBERRIES NOT SUBJECT TO THE CRANBERRY MARKETING ORDER § 926.9 Handle. Handle...

  14. HMSRP Hawaiian Monk Seal Handling Data

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data set contains records for all handling and measurement of Hawaiian monk seals since 1981. Live seals are handled and measured during a variety of events...

  15. Regulations on handling dangerous objects in Japan (with particular reference to sodium)

    International Nuclear Information System (INIS)

    Nagai, M.

    1971-01-01

    Sodium is designated as a kind of dangerous object, so that special care has to be taken in handling or storing large amounts of sodium. Formal regulations on sodium handling in Japan are prescribed in Fire Service Law, which is supplemented by Rules on Handling Dangerous Objects. Since these regulations are not intended to be applied to large sodium circuits, some defects and inappropriate expressions might be found in them. An attempt is made here to pick up these problems and important points from Japanese regulations on handling dangerous objects with particular reference to sodium

  16. ERROR HANDLING IN INTEGRATION WORKFLOWS

    Directory of Open Access Journals (Sweden)

    Alexey M. Nazarenko

    2017-01-01

    Full Text Available Simulation experiments performed while solving multidisciplinary engineering and scientific problems require joint usage of multiple software tools. Further, when following a preset plan of experiments or searching for optimum solutions, the same sequence of calculations is run multiple times with various simulation parameters, input data, or conditions, while the overall workflow does not change. Automation of simulations like these requires implementing a workflow in which tool execution and data exchange are usually controlled by a special type of software, an integration environment or platform. The result is an integration workflow (a platform-dependent implementation of some computing workflow) which, in the context of automation, is a composition of weakly coupled (in terms of communication intensity) typical subtasks. These compositions can then be decomposed back into a few workflow patterns (types of subtask interaction). The patterns, in their turn, can be interpreted as higher level subtasks. This paper considers the execution control and data exchange rules that should be imposed by the integration environment when an error is encountered by some integrated software tool. An error is defined as any abnormal behavior of a tool that invalidates its result data, thus disrupting the data flow within the integration workflow. The main requirement for the error handling mechanism implemented by the integration environment is to prevent abnormal termination of the entire workflow in case of missing intermediate results data. Error handling rules are formulated on the basic pattern level and on the level of a composite task that can combine several basic patterns as next level subtasks. The cases where workflow behavior may differ, depending on the user's purposes, when an error takes place, and the possible error handling options that can be specified by the user are also noted in the work.
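
    A minimal, purely illustrative Python sketch (not the integration platform discussed in the paper) of the central rule above: a failed tool invalidates its own output, and subtasks depending on the missing data are skipped rather than the whole workflow being aborted. Task names and the tiny API are invented.

        from dataclasses import dataclass, field

        @dataclass
        class Task:
            name: str
            func: callable
            inputs: list = field(default_factory=list)

        def run_workflow(tasks):
            """Run tasks in order; a failing tool marks its data invalid and
            downstream tasks are skipped instead of aborting everything."""
            results, failed = {}, set()
            for t in tasks:
                if any(i in failed for i in t.inputs):
                    failed.add(t.name)
                    print(f"skip {t.name}: missing input data")
                    continue
                try:
                    results[t.name] = t.func(*[results[i] for i in t.inputs])
                except Exception as err:      # abnormal tool behaviour
                    failed.add(t.name)
                    print(f"error in {t.name}: {err}")
            return results

        wf = [Task("simulate", lambda: 42),
              Task("postprocess", lambda x: x / 0, ["simulate"]),  # failing tool
              Task("report", lambda x: f"value={x}", ["postprocess"]),
              Task("archive", lambda x: x, ["simulate"])]          # still runs
        print(run_workflow(wf))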

  17. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: 'How capable are such stability interpretations in drawing conclusions?' There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale, and (iii) that the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional scale stability variations will be quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined if the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar
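
    Approach (i) above — deriving a sample size from a chosen test, significance level, power, and the mean and standard deviation of the full dataset — can be illustrated with statsmodels. The summary statistics below are invented for illustration and are not taken from the snow stability dataset.

        from statsmodels.stats.power import TTestIndPower

        # Hypothetical group means and pooled standard deviation (e.g., two aspects).
        mean_a, mean_b, sd = 4.2, 3.6, 1.1
        effect_size = abs(mean_a - mean_b) / sd          # Cohen's d

        n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                                  alpha=0.05, power=0.8,
                                                  alternative='two-sided')
        print(f"required sample size per group: {n_per_group:.0f}")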

  18. New public dataset for spotting patterns in medieval document images

    Science.gov (United States)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and should encourage researchers in the document image analysis community to design new systems and submit improved results.

  19. The handling of radiation accidents

    International Nuclear Information System (INIS)

    Macdonald, H.F.; Orchard, H.C.; Walker, C.W.

    1977-04-01

    Some of the more interesting and important contributions to a recent International Symposium on the Handling of Radiation Accidents are discussed and personal comments on many of the papers presented are included. The principal conclusion of the Symposium was that although the nuclear industry has an excellent safety record, there is no room for complacency. Continuing attention to emergency planning and exercising are essential in order to maintain this position. A full list of the papers presented at the Symposium is included as an Appendix. (author)

  20. A New Dataset Size Reduction Approach for PCA-Based Classification in OCR Application

    Directory of Open Access Journals (Sweden)

    Mohammad Amin Shayegan

    2014-01-01

    Full Text Available A major problem of pattern recognition systems arises from the large volume of training datasets, which include duplicate and similar training samples. In order to overcome this problem, some dataset size reduction and also dimensionality reduction techniques have been introduced. The algorithms presently used for dataset size reduction usually remove samples near the centers of classes or support vector samples between different classes. However, the samples near a class center include valuable information about the class characteristics, and the support vectors are important for evaluating system efficiency. This paper reports on the use of the Modified Frequency Diagram technique for dataset size reduction. In this newly proposed technique, a training dataset is rearranged and then sieved. The sieved training dataset, along with automatic feature extraction/selection using Principal Component Analysis, is used in an OCR application. The experimental results obtained when using the proposed system on one of the biggest handwritten Farsi/Arabic numeral standard OCR datasets, Hoda, show about 97% accuracy in the recognition rate. The recognition speed increased by 2.28 times, while the accuracy decreased by only 0.7%, when a sieved version of the dataset, which is only half the size of the initial training dataset, was used.
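
    The sketch below is a loose, hedged analogue of the pipeline described above using scikit-learn: PCA features feed a simple classifier, and a naively halved training set stands in for the Modified Frequency Diagram sieving (which is not reproduced here). The bundled digits dataset substitutes for the Hoda dataset, which is not available in this sketch.

        import numpy as np
        from sklearn.datasets import load_digits
        from sklearn.decomposition import PCA
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline

        X, y = load_digits(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        def accuracy(X_train, y_train):
            model = make_pipeline(PCA(n_components=30), KNeighborsClassifier())
            return model.fit(X_train, y_train).score(X_te, y_te)

        # Naive size reduction: keep a random half of the training set
        # (the paper's sieving is more selective than this).
        rng = np.random.default_rng(0)
        half = rng.choice(len(X_tr), size=len(X_tr) // 2, replace=False)
        print("full training set accuracy:", accuracy(X_tr, y_tr))
        print("half training set accuracy:", accuracy(X_tr[half], y_tr[half]))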

  1. 7 CFR 58.443 - Whey handling.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 3 2010-01-01 2010-01-01 false Whey handling. 58.443 Section 58.443 Agriculture... Procedures § 58.443 Whey handling. (a) Adequate sanitary facilities shall be provided for the handling of whey. If outside, necessary precautions shall be taken to minimize flies, insects and development of...

  2. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanism of playing chess. For professional chess players (e.g., chess grand masters and masters or GM/Ms), what are the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, are largely veiled. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms), and 29 age matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  3. Supporting flexible processes with adaptive workflow and case handling

    NARCIS (Netherlands)

    Günther, C.W.; Reichert, M.; Aalst, van der W.M.P.

    2008-01-01

    Workflow management technology has profoundly transformed the way complex tasks are being handled in modern, large-scale organizations. However, it is mostly those systems' inherent lack of flexibility that hinders their broad acceptance, and that is perceived as annoyance by users. In this context,

  4. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM)

  5. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.
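
    As a minimal example of one of the sampling techniques the chapter surveys, the sketch below balances classes by random oversampling with NumPy; more elaborate schemes such as SMOTE are not shown, and the toy data are invented.

        import numpy as np

        def random_oversample(X, y, seed=0):
            """Duplicate minority-class samples until all classes are equally represented."""
            classes, counts = np.unique(y, return_counts=True)
            n_max = counts.max()
            rng = np.random.default_rng(seed)
            idx = [rng.choice(np.flatnonzero(y == c), size=n_max, replace=True)
                   for c in classes]
            idx = np.concatenate(idx)
            return X[idx], y[idx]

        X = np.arange(20).reshape(10, 2)
        y = np.array([0] * 8 + [1] * 2)       # 4:1 imbalance
        Xb, yb = random_oversample(X, y)
        print(np.bincount(yb))                # -> [8 8]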

  6. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for higher hierarchical classification of the organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, was reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends
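
    The AT/GC content computation mentioned above is straightforward; a small self-contained sketch (with an invented example sequence) is:

        def at_gc_content(seq):
            """Return the AT and GC fractions of a DNA sequence."""
            seq = seq.upper()
            at = sum(seq.count(b) for b in "AT")
            gc = sum(seq.count(b) for b in "GC")
            total = at + gc
            return at / total, gc / total

        at, gc = at_gc_content("ATGCGCGTATTAGCGGCCAT")
        print(f"AT fraction: {at:.2f}  GC fraction: {gc:.2f}")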

  7. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    Science.gov (United States)

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
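
    A minimal sketch of the quartz-free recalculation described above, assuming concentrations are reported in wt.% and that dividing by the non-quartz fraction is the intended normalization; the sample values are invented.

        import pandas as pd

        # Hypothetical soil samples: element concentrations and quartz content, all in wt.%.
        soil = pd.DataFrame({"quartz": [60.0, 25.0, 80.0],
                             "Ca":     [1.2,  3.0,  0.5],
                             "Al":     [5.5,  8.1,  2.4]})

        # Recast each element onto a quartz-free basis: divide by the non-quartz fraction.
        non_quartz = 1.0 - soil["quartz"] / 100.0
        quartz_free = soil[["Ca", "Al"]].div(non_quartz, axis=0)
        print(quartz_free)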

  8. A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset.

    Science.gov (United States)

    Kamal, Sarwar; Ripon, Shamim Hasnat; Dey, Nilanjan; Ashour, Amira S; Santhi, V

    2016-07-01

    In the age of information superhighway, big data play a significant role in information processing, extractions, retrieving and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes imperfect for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that enfolds the large storage and processing capability of distributed or parallel processing platforms is essential. In this analysis, a contemporary distributed clustering methodology for imbalance data reduction using k-nearest neighbor (K-NN) classification approach has been introduced. The pivotal objective of this work is to illustrate real training data sets with reduced amount of elements or instances. These reduced amounts of data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches. To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalance data sets from large-scale data sets without loss of its accuracy. The obtained results depict that MapReduce based K-NN classifier provided accurate results for big data of DNA. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
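
    As a rough, in-process analogue of the MapReduce workflow described above (the paper uses a Hadoop-style framework, not Python multiprocessing), the sketch below partitions a dataset, condenses each partition with a simple 1-NN rule in the map step, and merges the reduced parts in the reduce step. The data and labels are synthetic stand-ins for encoded DNA features.

        import numpy as np
        from multiprocessing import Pool

        def condense_chunk(args):
            """Map step: keep only samples that a 1-NN rule on the kept set misclassifies."""
            X, y = args
            keep = [0]
            for i in range(1, len(X)):
                d = np.linalg.norm(X[keep] - X[i], axis=1)
                if y[keep[int(d.argmin())]] != y[i]:
                    keep.append(i)
            return X[keep], y[keep]

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            X = rng.random((10000, 4))                      # stand-in feature vectors
            y = (X[:, 0] > 0.7).astype(int)                 # imbalanced labels
            chunks = [(X[i::4], y[i::4]) for i in range(4)]  # "map" partitions
            with Pool(4) as pool:
                parts = pool.map(condense_chunk, chunks)
            Xr = np.concatenate([p[0] for p in parts])       # "reduce": merge condensed sets
            yr = np.concatenate([p[1] for p in parts])
            print(len(X), "->", len(Xr), "samples")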

  9. Inverse modelling estimates of N2O surface emissions and stratospheric losses using a global dataset

    Science.gov (United States)

    Thompson, R. L.; Bousquet, P.; Chevallier, F.; Dlugokencky, E. J.; Vermeulen, A. T.; Aalto, T.; Haszpra, L.; Meinhardt, F.; O'Doherty, S.; Moncrieff, J. B.; Popa, M.; Steinbacher, M.; Jordan, A.; Schuck, T. J.; Brenninkmeijer, C. A.; Wofsy, S. C.; Kort, E. A.

    2010-12-01

    Nitrous oxide (N2O) levels have been steadily increasing in the atmosphere over the past few decades at a rate of approximately 0.3% per year. This trend is of major concern as N2O is both a long-lived Greenhouse Gas (GHG) and an Ozone Depleting Substance (ODS), as it is a precursor of NO and NO2, which catalytically destroy ozone in the stratosphere. Recently, N2O emissions have been recognised as the most important ODS emissions and are now of greater importance than emissions of CFCs. The growth in atmospheric N2O is predominantly due to the enhancement of surface emissions by human activities, most notably the intensification and proliferation of agriculture since the mid-19th century, which has been accompanied by the increased input of reactive nitrogen to soils and has resulted in significant perturbations to the natural N-cycle and emissions of N2O. There exist two approaches for estimating N2O emissions, the so-called 'bottom-up' and 'top-down' approaches. Top-down approaches, based on the inversion of atmospheric measurements, require an estimate of the loss of N2O via photolysis and oxidation in the stratosphere. Uncertainties in the loss magnitude contribute uncertainties of 15 to 20% to the global annual surface emissions, complicating direct comparisons between bottom-up and top-down estimates. In this study, we present a novel inversion framework for the simultaneous optimization of N2O surface emissions and the magnitude of the loss, which avoids errors in the emissions due to incorrect assumptions about the lifetime of N2O. We use a Bayesian inversion with a variational formulation (based on 4D-Var) in order to handle very large datasets. N2O fluxes are retrieved at 4-weekly resolution over a global domain with a spatial resolution of 3.75° x 2.5° longitude by latitude. The efficacy of the simultaneous optimization of emissions and losses is tested using a global synthetic dataset, which mimics the available atmospheric data. Lastly, using real
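
    A variational (4D-Var type) inversion of this kind minimizes a Bayesian cost function. The exact formulation used by the authors is not reproduced in the abstract, so take the expression below as the standard textbook form, with the state vector x understood to contain both the gridded surface fluxes and the stratospheric loss scaling:

        J(\mathbf{x}) = \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm T}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
                      + \tfrac{1}{2}\sum_{i}\bigl(H_i(\mathbf{x})-\mathbf{y}_i\bigr)^{\mathrm T}\mathbf{R}_i^{-1}\bigl(H_i(\mathbf{x})-\mathbf{y}_i\bigr)

    where x_b is the prior (bottom-up) estimate, B its error covariance, y_i the atmospheric observations in time window i, H_i the transport-model observation operator, and R_i the observation error covariance; the minimum is found iteratively using the adjoint of H, which is what makes very large datasets tractable.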

  10. Harvard Aging Brain Study: Dataset and accessibility.

    Science.gov (United States)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. EasyPCC: Benchmark Datasets and Tools for High-Throughput Measurement of the Plant Canopy Coverage Ratio under Field Conditions

    Directory of Open Access Journals (Sweden)

    Wei Guo

    2017-04-01

    Full Text Available Understanding interactions of genotype, environment, and management under field conditions is vital for selecting new cultivars and farming systems. Image analysis is considered a robust technique in high-throughput phenotyping with non-destructive sampling. However, analysis of digital field-derived images remains challenging because of the variety of light intensities, growth environments, and developmental stages. The plant canopy coverage (PCC) ratio is an important index of crop growth and development. Here, we present a tool, EasyPCC, for effective and accurate evaluation of the ground coverage ratio from a large number of images under variable field conditions. The core algorithm of EasyPCC is based on a pixel-based segmentation method using a decision-tree-based segmentation model (DTSM). EasyPCC was developed under the MATLAB® and R languages; thus, it could be implemented in high-performance computing to handle large numbers of images following just a single model training process. This study used an experimental set of images from a paddy field to demonstrate EasyPCC, and to show the accuracy improvement possible by adjusting key points (e.g., outlier deletion and model retraining). The accuracy (R2 = 0.99) of the calculated coverage ratio was validated against a corresponding benchmark dataset. The EasyPCC source code is released under GPL license with benchmark datasets of several different crop types for algorithm development and for evaluating ground coverage ratios.
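
    A hedged sketch of the pixel-based, decision-tree segmentation idea behind EasyPCC (which itself is implemented in MATLAB and R): train a small decision tree on labelled RGB pixels, classify every pixel of an image, and report the plant-pixel fraction as the coverage ratio. The training pixels and test image below are random stand-ins, not field data.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        # Hypothetical training pixels: RGB values labelled plant (1) or background (0).
        rng = np.random.default_rng(0)
        plant = np.column_stack([rng.integers(0, 120, 200),     # low red
                                 rng.integers(120, 255, 200),   # high green
                                 rng.integers(0, 120, 200)])    # low blue
        background = rng.integers(0, 255, (200, 3))
        X = np.vstack([plant, background])
        y = np.array([1] * 200 + [0] * 200)

        model = DecisionTreeClassifier(max_depth=5).fit(X, y)

        def canopy_coverage(image):
            """Fraction of pixels classified as plant in an H x W x 3 RGB image."""
            labels = model.predict(image.reshape(-1, 3))
            return labels.mean()

        image = rng.integers(0, 255, (100, 100, 3))
        print(f"estimated coverage ratio: {canopy_coverage(image):.2f}")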

  12. Simulation of the MRS receiving and handling facility

    International Nuclear Information System (INIS)

    Triplett, M.B.; Imhoff, C.H.; Hostick, C.J.

    1984-02-01

    Monitored retrievable storage (MRS) will be required to handle a large volume of spent fuel or high-level waste (HLW) in case of delays in repository deployment. The quantities of materials to be received and repackaged for storage far exceed the requirements of existing waste management facilities. A computer simulation model of the MRS receiving and handling (R and H) facility has been constructed and used to evaluate design alternatives. Studies have identified processes or activities which may constrain throughput performance. In addition, the model has helped to assess design tradeoffs such as those to be made among improved process times, redundant service lines, and improved component availability. 1 reference, 5 figures
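
    The report describes a purpose-built simulation model; as a rough analogue, the sketch below uses the SimPy discrete-event library (an assumption of convenience, not the authors' tool) to show how a single repackaging cell can become the throughput-constraining step for arriving casks. The service times and arrival rate are invented.

        import simpy

        RECEIVE_TIME, REPACKAGE_TIME = 2.0, 6.0       # hypothetical hours per cask

        def cask(env, name, receiving, repackaging, done):
            with receiving.request() as req:           # receiving bay is a shared resource
                yield req
                yield env.timeout(RECEIVE_TIME)
            with repackaging.request() as req:         # repackaging cell may be the bottleneck
                yield req
                yield env.timeout(REPACKAGE_TIME)
            done.append(name)

        def arrivals(env, receiving, repackaging, done):
            for i in range(100):
                env.process(cask(env, i, receiving, repackaging, done))
                yield env.timeout(1.0)                 # one cask arrives per hour

        env = simpy.Environment()
        receiving = simpy.Resource(env, capacity=1)
        repackaging = simpy.Resource(env, capacity=1)
        done = []
        env.process(arrivals(env, receiving, repackaging, done))
        env.run(until=24 * 30)                         # simulate 30 days
        print(f"casks fully processed in 30 days: {len(done)}")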

  13. Advanced remote handling developments for high radiation applications

    International Nuclear Information System (INIS)

    Herndon, J.N.; Kring, C.T.; Feldman, M.J.; Kuban, D.P.; Martin, H.L.; Rowe, J.C.; Hamel, W.R.

    1985-01-01

    The Remote Control Engineering Task of the Consolidated Fuel Reprocessing Program at Oak Ridge National Laboratory has been developing advanced techniques for remote maintenance of future US fuel reprocessing plants. These efforts are based on the application of teleoperated, force-reflecting servomanipulators for dexterous remote handling with television viewing for large-volume hazardous applications. These developments fully address the nonrepetitive nature of remote maintenance in the unstructured environments encountered in fuel reprocessing. This paper covers the primary emphasis in the present program; the design, fabrication, and installation of a prototype remote handling system for reprocessing applications, the Advanced Integrated Maintenance System

  14. Safety of Cargo Aircraft Handling Procedure

    Directory of Open Access Journals (Sweden)

    Daniel Hlavatý

    2017-07-01

    Full Text Available The aim of this paper is to present ways to improve the safety management system during cargo aircraft handling. The first chapter is dedicated to general information about air cargo transportation, including the history and types of cargo aircraft handling as well as the means of handling. The second part is focused on a detailed description of cargo aircraft handling, including a description of the activities that are performed before and after handling. The following part covers a theoretical interpretation of safety, safety indicators and legislative provisions related to the safety of cargo aircraft handling. The fourth part analyzes the fault trees of events which might occur during handling. The factors found by this analysis are compared with FedEx safety reports. Based on the comparison, a proposal is made on how to improve safety management in this transportation company.
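
    Fault-tree analysis of this kind reduces to combining basic-event probabilities through AND/OR gates. The sketch below is a generic illustration with invented event names and probabilities; it is not taken from the paper or from the FedEx safety reports it cites.

        from math import prod

        def and_gate(probs):
            """All independent basic events must occur."""
            return prod(probs)

        def or_gate(probs):
            """At least one independent basic event occurs."""
            return 1 - prod(1 - p for p in probs)

        # Hypothetical basic-event probabilities per handling operation.
        loader_error   = or_gate([1e-3, 5e-4])        # wrong loading OR bad securing
        detection_miss = and_gate([0.1, 0.2])         # crew AND supervisor both miss it
        top_event      = and_gate([loader_error, detection_miss])
        print(f"P(damage during handling) = {top_event:.2e}")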

  15. Transfer Area Mechanical Handling Calculation

    International Nuclear Information System (INIS)

    Dianda, B.

    2004-01-01

    This calculation is intended to support the License Application (LA) submittal of December 2004, in accordance with the directive given by DOE correspondence received on the 27th of January 2004 entitled: "Authorization for Bechtel SAIC Company L.L.C. to Include a Bare Fuel Handling Facility and Increased Aging Capacity in the License Application, Contract Number DE-AC28-01RW12101" (Arthur, W.J., III 2004). This correspondence was appended by further correspondence received on the 19th of February 2004 entitled: "Technical Direction to Bechtel SAIC Company L.L.C. for Surface Facility Improvements, Contract Number DE-AC28-01RW12101; TDL No. 04-024" (BSC 2004a). These documents give the authorization for a Fuel Handling Facility to be included in the baseline. The purpose of this calculation is to establish preliminary bounding equipment envelopes and weights for the Fuel Handling Facility (FHF) transfer areas equipment. This calculation provides preliminary information only to support development of facility layouts and preliminary load calculations. The limitations of this preliminary calculation lie within the assumptions of section 5, as this calculation is part of an evolutionary design process. It is intended that this calculation be superseded as the design advances to reflect information necessary to support License Application. The design choices outlined within this calculation represent a demonstration of feasibility and may or may not be included in the completed design. This calculation provides preliminary weight, dimensional envelope, and equipment position in building for the purposes of defining interface variables. This calculation identifies and sizes major equipment and assemblies that dictate overall equipment dimensions and facility interfaces. Sizing of components is based on the selection of commercially available products, where applicable. This is not a specific recommendation for the future use of these components or their

  16. Single-molecule mechanics of protein-labelled DNA handles

    Directory of Open Access Journals (Sweden)

    Vivek S. Jadhav

    2016-01-01

    Full Text Available DNA handles are often used as spacers and linkers in single-molecule experiments to isolate and tether RNAs, proteins, enzymes and ribozymes, amongst other biomolecules, between surface-modified beads for nanomechanical investigations. Custom DNA handles with varying lengths and chemical end-modifications are readily and reliably synthesized en masse, enabling force spectroscopic measurements with well-defined and long-lasting mechanical characteristics under physiological conditions over a large range of applied forces. Although these chemically tagged DNA handles are widely used, their further individual modification with protein receptors is less common and would allow for additional flexibility in grabbing biomolecules for mechanical measurements. In-depth information on reliable protocols for the synthesis of these DNA–protein hybrids and on their mechanical characteristics under varying physiological conditions are lacking in literature. Here, optical tweezers are used to investigate different protein-labelled DNA handles in a microfluidic environment under different physiological conditions. Digoxigenin (DIG)-dsDNA-biotin handles of varying sizes (1000, 3034 and 4056 bp) were conjugated with streptavidin or neutravidin proteins. The DIG-modified ends of these hybrids were bound to surface-modified polystyrene (anti-DIG) beads. Using different physiological buffers, optical force measurements showed consistent mechanical characteristics with long dissociation times. These protein-modified DNA hybrids were also interconnected in situ with other tethered biotinylated DNA molecules. Electron-multiplying CCD (EMCCD) imaging control experiments revealed that quantum dot–streptavidin conjugates at the end of DNA handles remain freely accessible. The experiments presented here demonstrate that handles produced with our protein–DNA labelling procedure are excellent candidates for grasping single molecules exposing tags suitable for molecular

  17. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  18. CANISTER HANDLING FACILITY DESCRIPTION DOCUMENT

    Energy Technology Data Exchange (ETDEWEB)

    J.F. Beesley

    2005-04-21

    The purpose of this facility description document (FDD) is to establish requirements and associated bases that drive the design of the Canister Handling Facility (CHF), which will allow the design effort to proceed to license application. This FDD will be revised at strategic points as the design matures. This FDD identifies the requirements and describes the facility design, as it currently exists, with emphasis on attributes of the design provided to meet the requirements. This FDD is an engineering tool for design control; accordingly, the primary audience and users are design engineers. This FDD is part of an iterative design process. It leads the design process with regard to the flowdown of upper tier requirements onto the facility. Knowledge of these requirements is essential in performing the design process. The FDD follows the design with regard to the description of the facility. The description provided in this FDD reflects the current results of the design process.

  19. Bulk handling benefits from ICT

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2007-11-15

    The efficiency and accuracy of bulk handling are being improved by the range of management information systems and services available today. As part of the program to extend Richards Bay Coal Terminal, Siemens is installing a manufacturing execution system which coordinates and monitors all movements of raw materials. The article also reports recent developments by AXSMarine, SunGuard Energy, Fuelworx and Railworx in providing integrated tools for tracking, managing and optimising solid/liquid fuels and rail car maintenance activities. QMASTOR Ltd. has secured a contract with Anglo Coal Australia to provide its Pit to Port.net® and iFuse® software systems across all their Australian sites, to include pit-to-product stockpile management. 2 figs.

  20. Handling and transport problems (1960)

    International Nuclear Information System (INIS)

    Pomarola, J.; Savouyaud, J.

    1960-01-01

    I. The handling and transport of radioactive wastes involve the danger of irradiation and contamination. It is indispensable: - to lay down a special set of rules governing the removal and transport of wastes within centres or from one centre to another; - to give charge of this transportation to a group containing teams of specialists. The organisation, equipment and output of these teams are being examined. II. Certain materials are particularly dangerous to transport, and for these special vehicles and fixed installations are necessary. This is the case especially for the evacuation of very active liquids. A transport vehicle is described, consisting of a tractor-trailer and a container holding 500 litres of liquid whose activity can reach 1000 Ci/l; the decanting operation, the route to be followed by the vehicle, and the precautions taken are also described. (author) [fr

  1. CANISTER HANDLING FACILITY DESCRIPTION DOCUMENT

    International Nuclear Information System (INIS)

    Beesley. J.F.

    2005-01-01

    The purpose of this facility description document (FDD) is to establish requirements and associated bases that drive the design of the Canister Handling Facility (CHF), which will allow the design effort to proceed to license application. This FDD will be revised at strategic points as the design matures. This FDD identifies the requirements and describes the facility design, as it currently exists, with emphasis on attributes of the design provided to meet the requirements. This FDD is an engineering tool for design control; accordingly, the primary audience and users are design engineers. This FDD is part of an iterative design process. It leads the design process with regard to the flowdown of upper tier requirements onto the facility. Knowledge of these requirements is essential in performing the design process. The FDD follows the design with regard to the description of the facility. The description provided in this FDD reflects the current results of the design process

  2. Fuel Handling Facility Description Document

    International Nuclear Information System (INIS)

    M.A. LaFountain

    2005-01-01

    The purpose of the facility description document (FDD) is to establish the requirements and their bases that drive the design of the Fuel Handling Facility (FHF) to allow the design effort to proceed to license application. This FDD is a living document that will be revised at strategic points as the design matures. It identifies the requirements and describes the facility design as it currently exists, with emphasis on design attributes provided to meet the requirements. This FDD was developed as an engineering tool for design control. Accordingly, the primary audience and users are design engineers. It leads the design process with regard to the flow down of upper tier requirements onto the facility. Knowledge of these requirements is essential to performing the design process. It trails the design with regard to the description of the facility. This description is a reflection of the results of the design process to date

  3. Data Handling and Parameter Estimation

    DEFF Research Database (Denmark)

    Sin, Gürkan; Gernaey, Krist

    2016-01-01

    Modelling is one of the key tools at the disposal of modern wastewater treatment professionals, researchers and engineers. It enables them to study and understand complex phenomena underlying the physical, chemical and biological performance of wastewater treatment plants at different temporal […] engineers, and professionals. However, it is also expected that they will be useful both for graduate teaching as well as a stepping stone for academic researchers who wish to expand their theoretical interest in the subject. For the models selected to interpret the experimental data, this chapter uses available models from the literature that are mostly based on the Activated Sludge Model (ASM) framework and their appropriate extensions (Henze et al., 2000). The chapter presents an overview of the most commonly used methods in the estimation of parameters from experimental batch data, namely: (i) data handling and validation, (ii) […]
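
    As one concrete, hedged example of estimating parameters from batch data, the sketch below fits a Monod-type rate expression by least squares with SciPy's curve_fit; the model choice and the measurements are invented for illustration and are not the chapter's worked example.

        import numpy as np
        from scipy.optimize import curve_fit

        def monod(S, mu_max, K_S):
            """Monod-type rate expression commonly used in ASM-style models."""
            return mu_max * S / (K_S + S)

        # Hypothetical batch measurements: substrate concentration vs. observed rate.
        S_obs = np.array([0.5, 1, 2, 5, 10, 20, 40])
        r_obs = np.array([0.9, 1.5, 2.2, 3.1, 3.6, 3.9, 4.0])
        r_obs = r_obs + np.random.default_rng(1).normal(0, 0.05, len(r_obs))

        params, cov = curve_fit(monod, S_obs, r_obs, p0=[4.0, 2.0])
        stderr = np.sqrt(np.diag(cov))          # rough parameter uncertainty
        print("mu_max, K_S:", params, "+/-", stderr)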

  4. Superphenix 1 primary handling system fabrication and testing

    International Nuclear Information System (INIS)

    Branchu, J.; Ebbinghaus, K.; Gigarel, C.

    1985-01-01

    Primary handling covers the operations performed for spent fuel removal, new fuel insertion, and the in-sodium storage outside the new or spent fuel vessel. This equipment typifies many of the difficulties encountered with the project as a whole: fabrication coordination when several countries are involved and design and construction of very large, relatively complex components. Detailed design studies were mainly influenced by thermal and seismic requirements, as applicable to sodium-immersed structures. Where possible, well-tried mechanical solutions were used, but widely differing techniques were involved, ranging from the high precision fabrication of structures and mechanisms comprising numerous component parts, implying complex machining operations. No particular problems were encountered during the sodium testing of the primary handling equipment. Trends for the 1500-MW (electric) breeder include investigation of the advisability of fuel storage in the core lattice and the possibility of handling system simplification.

  5. Data handling with SAM and art at the NOνA experiment

    International Nuclear Information System (INIS)

    Aurisano, A; Backhouse, C; Davies, G S; Illingworth, R; Mengel, M; Norman, A; Mayer, N; Rocco, D; Zirnstein, J

    2015-01-01

    During operations, NOvA produces between 5,000 and 7,000 raw files per day with peaks in excess of 12,000. These files must be processed in several stages to produce fully calibrated and reconstructed analysis files. In addition, many simulated neutrino interactions must be produced and processed through the same stages as data. To accommodate the large volume of data and Monte Carlo, production must be possible both on the Fermilab grid and on off-site farms, such as the ones accessible through the Open Science Grid. To handle the challenge of cataloging these files and to facilitate their off-line processing, we have adopted the SAM system developed at Fermilab. SAM indexes files according to metadata, keeps track of each file's physical locations, provides dataset management facilities, and facilitates data transfer to off-site grids. To integrate SAM with Fermilab's art software framework and the NOvA production workflow, we have developed methods to embed metadata into our configuration files, art files, and standalone ROOT files. A module in the art framework propagates the embedded information from configuration files into art files, and from input art files to output art files, allowing us to maintain a complete processing history within our files. Embedding metadata in configuration files also allows configuration files indexed in SAM to be used as inputs to Monte Carlo production jobs. Further, SAM keeps track of the input files used to create each output file. Parentage information enables the construction of self-draining datasets which have become the primary production paradigm used at NOvA. In this paper we will present an overview of SAM at NOvA and how it has transformed the file production framework used by the experiment. (paper)
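
    The SAM client API is not reproduced here; the plain-Python sketch below only illustrates the idea of parentage metadata and a "self-draining" dataset, i.e. a query for input-tier files that no declared output file lists as a parent yet. The file names, tiers, and catalogue structure are hypothetical.

        # Each catalogued file carries metadata, including the names of its parents.
        catalog = [
            {"name": "raw_001.root",  "tier": "raw",  "parents": []},
            {"name": "raw_002.root",  "tier": "raw",  "parents": []},
            {"name": "reco_001.root", "tier": "reco", "parents": ["raw_001.root"]},
        ]

        def self_draining(catalog, input_tier, output_tier):
            """Return input-tier files not yet consumed by any output-tier file.
            As new output files are declared, the dataset 'drains' automatically."""
            consumed = {p for f in catalog if f["tier"] == output_tier
                          for p in f["parents"]}
            return [f["name"] for f in catalog
                    if f["tier"] == input_tier and f["name"] not in consumed]

        print(self_draining(catalog, "raw", "reco"))   # -> ['raw_002.root']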

  6. Kernel-based discriminant feature extraction using a representative dataset

    Science.gov (United States)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified on both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.
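
    The FSCL-based representative-set construction is not spelled out in the abstract, so the sketch below substitutes k-means centroids (scikit-learn) for the FSCL/data-editing points and builds a kernel feature map against that small set only, followed by a linear discriminant. It illustrates the computational idea of a representative dataset, not the authors' exact algorithm; the data are synthetic.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_classification
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.metrics.pairwise import rbf_kernel
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=3000, n_features=20,
                                   n_informative=10, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Representative set: k-means centroids stand in for the FSCL/data-editing points.
        centroids = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X_tr).cluster_centers_

        def kernel_features(X):
            # Kernel map computed against the small representative set only.
            return rbf_kernel(X, centroids, gamma=0.05)

        clf = LinearDiscriminantAnalysis().fit(kernel_features(X_tr), y_tr)
        print("test accuracy:", clf.score(kernel_features(X_te), y_te))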

  7. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    Science.gov (United States)

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509

  8. Cask system design guidance for robotic handling

    International Nuclear Information System (INIS)

    Griesmeyer, J.M.; Drotning, W.D.; Morimoto, A.K.; Bennett, P.C.

    1990-10-01

    Remote automated cask handling has the potential to reduce both the occupational exposure and the time required to process a nuclear waste transport cask at a handling facility. The ongoing Advanced Handling Technologies Project (AHTP) at Sandia National Laboratories is described. AHTP was initiated to explore the use of advanced robotic systems to perform cask handling operations at handling facilities for radioactive waste, and to provide guidance to cask designers regarding the impact of robotic handling on cask design. The proof-of-concept robotic systems developed in AHTP are intended to extrapolate from currently available commercial systems to the systems that will be available by the time that a repository would be open for operation. The project investigates those cask handling operations that would be performed at a nuclear waste repository facility during cask receiving and handling. The ongoing AHTP indicates that design guidance, rather than design specification, is appropriate, since the requirements for robotic handling do not place severe restrictions on cask design but rather focus on attention to detail and design for limited dexterity. The cask system design features that facilitate robotic handling operations are discussed, and results obtained from AHTP design and operation experience are summarized. The application of these design considerations is illustrated by discussion of the robot systems and their operation on cask feature mock-ups used in the AHTP project. 11 refs., 11 figs

  9. GLEAM version 3: Global Land Evaporation Datasets and Model

    Science.gov (United States)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have arisen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to model evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to make these data available to the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes of the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  10. Omicseq: a web-based search engine for exploring omics datasets

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462

  11. Omicseq: a web-based search engine for exploring omics datasets.

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Recent progress on developments of tritium safe handling techniques in Japan

    International Nuclear Information System (INIS)

    Watanabe, Kuniaki; Matsuyama, Masao

    1993-01-01

    Vast amounts of tritium will be used for thermonuclear fusion reactors. Without establishing safe handling techniques for large amounts of tritium, undoubtedly the fusion reactors will not be accepted. Japanese activity on tritium related research has considerably developed in the last 10 years. This review paper gives a brief summary of safe handling techniques developed by Japanese research groups. (author)

  13. A Look at Technologies Vis-a-vis Information Handling Techniques.

    Science.gov (United States)

    Swanson, Rowena W.

    The paper examines several ideas for information handling implemented with new technologies that suggest directions for future development. These are grouped under the topic headings: Handling Large Data Banks, Providing Personalized Information Packages, Providing Information Specialist Services, and Expanding Man-Machine Interaction. Guides in…

  14. Hot Laboratories and Remote Handling

    International Nuclear Information System (INIS)

    Bart, G.; Blanc, J.Y.; Duwe, R.

    2003-01-01

    The European Working Group on 'Hot Laboratories and Remote Handling' is firmly established as the major contact forum for the nuclear R and D facilities at the European scale. The yearly plenary meetings intend to: - Exchange experience on analytical methods, their implementation in hot cells, the methodologies used and their application in nuclear research; - Share experience on common infrastructure exploitation matters such as remote handling techniques, safety features, QA-certification, waste handling; - Promote normalization and co-operation, e.g., by looking at mutual complementarities; - Prospect present and future demands from the nuclear industry and to draw strategic conclusions regarding further needs. The 41st plenary meeting was held in CEA Saclay from September 22 to 24, 2003 in the premises and with the technical support of the INSTN (National Institute for Nuclear Science and Technology). The Nuclear Energy Division of CEA sponsored it. The Saclay meeting was divided in three topical oral sessions covering: - Post irradiation examination: new analysis methods and methodologies, small specimen technology, programmes and results; - Hot laboratory infrastructure: decommissioning, refurbishment, waste, safety, nuclear transports; - Prospective research on materials for future applications: innovative fuels (Generation IV, HTR, transmutation, ADS), spallation source materials, and candidate materials for fusion reactor. A poster session was opened to transport companies and laboratory suppliers. The meeting addressed in three sessions the following items: Session 1 - Post Irradiation Examinations. Out of 12 papers (including 1 poster) 7 dealt with surface and solid state micro analysis, another one with an equally complex wet chemical instrumental analytical technique, while the other four papers (including the poster) presented new concepts for digital x-ray image analysis; Session 2 - Hot laboratory infrastructure (including waste theme) which was

  15. Installation and method for handling fuel assemblies of fast nuclear reactors

    International Nuclear Information System (INIS)

    Aubert, Michel; Renaux, Charley.

    1982-01-01

    This invention concerns an installation and a method for handling the assemblies that make it possible to use a large revolving plug smaller in diameter than in presently known solutions. This large revolving plug, coaxial to the core, carries a handling arm enabling a fraction of the assemblies to be reached and deposited in a handling well. Through a small offset revolving plug, the remainder of the assemblies can be reached and deposited in a pick-up well accessible to the arm of the large revolving plug

  16. Development of commercial robots for radwaste handling

    International Nuclear Information System (INIS)

    Colborn, K.A.

    1988-01-01

    The cost and dose burden associated with low level radwaste handling activities are a matter of increasing concern to the commercial nuclear power industry. This concern is evidenced by the fact that many utilities have begun to re-evaluate waste generation, handling, and disposal activities at their plants in an effort to improve their overall radwaste handling operations. This paper reports on the project Robots for Radwaste Handling, undertaken to identify the potential of robots to improve radwaste handling operations. The project has focussed on the potential of remote or automated technology to improve well-defined, recognizable radwaste operations. In particular, it focussed on repetitive, low-skill-level radwaste handling and decontamination tasks which involve significant radiation exposure.

  17. Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology.

    Science.gov (United States)

    Hervé, Maxime R; Nicolè, Florence; Lê Cao, Kim-Anh

    2018-03-01

    Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analyses of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of a biological system under study. The present article provides a comprehensive resource to analyze such complex datasets using multivariate methods. It starts from the necessary pre-treatment of data including data transformations and distance calculations, to the application of both gold standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis along with detailed results interpretations for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools with reproducible R codes and chemical-ecological datasets to practice and teach multivariate methods.
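
    As a minimal illustration of the pre-treatment steps mentioned above (the article itself supplies reproducible R code), the Python sketch below log-transforms and autoscales a hypothetical metabolite table before running a PCA; the peak areas are random stand-ins, not chemical-ecological data.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        # Hypothetical metabolite table: rows = samples, columns = compounds (peak areas).
        rng = np.random.default_rng(0)
        peaks = rng.lognormal(mean=2.0, sigma=1.0, size=(30, 200))

        # Typical pre-treatment: log transform to tame skew, then autoscale each compound.
        X = StandardScaler().fit_transform(np.log1p(peaks))

        scores = PCA(n_components=2).fit_transform(X)
        print("sample scores on the first two components:\n", scores[:3])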

  18. Programs for visualization, handling and quantification of PIXE maps at the AGLAE facility

    International Nuclear Information System (INIS)

    Pichon, L.; Calligaro, T.; Lemasson, Q.; Moignard, B.; Pacheco, C.

    2015-01-01

    The external beam setup of the AGLAE facility has been developed to combine PIXE with PIGE, EBS and, recently, IBIL for the analysis of cultural heritage artefacts. The upgraded external beam end-station integrates five large-solid-angle X-ray detectors, either to reduce the risk of damage to sensitive artworks by decreasing the beam intensity or to routinely acquire elemental maps at various scales. While many programs are available to process PIXE maps acquired with nuclear microprobes, software to process the major and trace element PIXE maps point by point using GUPIX has not been available. The present paper describes three programs developed for the AGLAE facility to process numerous maps obtained with multiple detectors: AGLAEMAP handles maps and pixel groups within maps, TRAUPIXE quantitatively processes the PIXE spectra of all pixels, and DATAIMAGING displays the resulting quantitative elemental maps. The benefits of this software suite are demonstrated by processing a dataset acquired on a pellet of geostandard reference material and on a terre mêlée pottery shard sample created by the famous ceramist Bernard Palissy (1510–1589), highlighting the chemical elements present in this polychrome ceramic.

  19. Programs for visualization, handling and quantification of PIXE maps at the AGLAE facility

    Energy Technology Data Exchange (ETDEWEB)

    Pichon, L., E-mail: laurent.pichon@culture.fr [Centre de recherche et de restauration des musées de France, C2RMF, Palais du Louvre – Porte des Lions, 14 Quai François Mitterrand, 75001 Paris (France); Fédération de recherche NewAGLAE, FR3506 CNRS, Ministère de la Culture et de la Communication, Chimie ParisTech, Palais du Louvre, 75001 Paris (France); Calligaro, T. [Centre de recherche et de restauration des musées de France, C2RMF, Palais du Louvre – Porte des Lions, 14 Quai François Mitterrand, 75001 Paris (France); Fédération de recherche NewAGLAE, FR3506 CNRS, Ministère de la Culture et de la Communication, Chimie ParisTech, Palais du Louvre, 75001 Paris (France); PSL Research University, Chimie ParisTech-CNRS, Institut de Recherche Chimie Paris, UMR8247, 75005 Paris (France); Lemasson, Q.; Moignard, B.; Pacheco, C. [Centre de recherche et de restauration des musées de France, C2RMF, Palais du Louvre – Porte des Lions, 14 Quai François Mitterrand, 75001 Paris (France); Fédération de recherche NewAGLAE, FR3506 CNRS, Ministère de la Culture et de la Communication, Chimie ParisTech, Palais du Louvre, 75001 Paris (France)

    2015-11-15

    The external beam setup of the AGLAE facility has been developed to combine PIXE with PIGE, EBS and, recently, IBIL for the analysis of cultural heritage artefacts. The upgraded external beam end-station integrates five large-solid-angle X-ray detectors, either to reduce the risk of damage to sensitive artworks by decreasing the beam intensity or to routinely acquire elemental maps at various scales. While many programs are available to process PIXE maps acquired with nuclear microprobes, software to process the major and trace element PIXE maps point by point using GUPIX has not been available. The present paper describes three programs developed for the AGLAE facility to process numerous maps obtained with multiple detectors: AGLAEMAP handles maps and pixel groups within maps, TRAUPIXE quantitatively processes the PIXE spectra of all pixels, and DATAIMAGING displays the resulting quantitative elemental maps. The benefits of this software suite are demonstrated by processing a dataset acquired on a pellet of geostandard reference material and on a terre mêlée pottery shard sample created by the famous ceramist Bernard Palissy (1510–1589), highlighting the chemical elements present in this polychrome ceramic.

  20. Sequence trajectory generation for garment handling systems

    OpenAIRE

    Liu, Honghai; Lin, Hua

    2008-01-01

    This paper presents a novel generic approach to the planning strategy of garment handling systems. An assumption is proposed to separate the components of such systems into a component for intelligent gripper techniques and a component for handling planning strategies. Researchers can concentrate on one of the two components first, then merge the two problems together. An algorithm is addressed to generate the trajectory position and a clothes handling sequence of clothes partitions, which ar...

  1. Enclosure for handling high activity materials

    International Nuclear Information System (INIS)

    Jimeno de Osso, F.

    1977-01-01

    One of the most important problems that are met at the laboratories producing and handling radioisotopes is that of designing, building and operating enclosures suitable for the safe handling of active substances. With this purpose in mind, an enclosure has been designed and built for handling moderately high activities under a shielding made of 150 mm thick lead. In this report a description is given of those aspects that may be of interest to people working in this field. (Author)

  2. Enclosure for handling high activity materials abstract

    International Nuclear Information System (INIS)

    Jimeno de Osso, F.; Dominguez Rodriguez, G.; Cruz Castillo, F. de la; Rodriguez Esteban, A.

    1977-01-01

    One of the most important problems that are met at the laboratories producing and handling radioisotopes is that of designing, building and operating enclosures suitable for the safe handling of active substances. With that purpose in mind, an enclosure has been designed and built for handling moderately high activities under a shielding made of 150 mm thick lead. A description is given of those aspects that may be of interest to people working in this field. (author) [es

  3. Scheduling of outbound luggage handling at airports

    DEFF Research Database (Denmark)

    Barth, Torben C.; Pisinger, David

    2012-01-01

    This article considers the outbound luggage handling problem at airports. The problem is to assign handling facilities to outbound flights and decide about the handling start time. This dynamic, near real-time assignment problem is part of the daily airport operations. Quality, efficiency......). Another solution method is a decomposition approach. The problem is divided into different subproblems and solved in iterative steps. The different solution approaches are tested on real world data from Frankfurt Airport....

  4. ATA diagnostic data handling system: an overview

    International Nuclear Information System (INIS)

    Chambers, F.W.; Kallman, J.; McDonald, J.; Slominski, M.

    1984-01-01

    The functions to be performed by the ATA diagnostic data handling system are discussed. The capabilities of the present data acquisition system (System 0) are presented. The goals for the next generation acquisition system (System 1), currently under design, are discussed. Facilities on the Octopus system for data handling are reviewed. Finally, we discuss what has been learned about diagnostics and computer based data handling during the past year

  5. Enclosure for handling high activity materials

    Energy Technology Data Exchange (ETDEWEB)

    Jimeno de Osso, F

    1977-07-01

    One of the most important problems that are met at the laboratories producing and handling radioisotopes is that of designing, building and operating enclosures suitable for the safe handling of active substances. With this purpose in mind, an enclosure has been designed and built for handling moderately high activities under a shielding made of 150 mm thick lead. In this report a description is given of those aspects that may be of interest to people working in this field. (Author)

  6. Testing AGN unification via inference from large catalogs

    Science.gov (United States)

    Nikutta, Robert; Ivezic, Zeljko; Elitzur, Moshe; Nenkova, Maia

    2018-01-01

    Source orientation and clumpiness of the central dust are the main factors in AGN classification. Type-1 QSOs are easy to observe and large samples are available (e.g. in SDSS), but obscured type-2 AGN are dimmer and redder as our line of sight is more obscured, making it difficult to obtain a complete sample. WISE has found up to a million QSOs. With only 4 bands and a relatively small aperture, the analysis of individual sources is challenging, but the large sample allows inference of bulk properties at a very significant level. CLUMPY (www.clumpy.org) is arguably the most popular database of AGN torus SEDs. We model the ensemble properties of the entire WISE AGN content using regularized linear regression, with orientation-dependent CLUMPY color-color-magnitude (CCM) tracks as basis functions. We can reproduce the observed number counts per CCM bin with percent-level accuracy, and simultaneously infer the probability distributions of all torus parameters, redshifts, additional SED components, and identify type-1/2 AGN populations through their IR properties alone. We increase the statistical power of our AGN unification tests even further by adding other datasets as axes in the regression problem. To this end, we make use of the NOAO Data Lab (datalab.noao.edu), which hosts several high-level large datasets and provides very powerful tools for handling large data, e.g. cross-matched catalogs, fast remote queries, etc.
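    The core of the method above is a regularized linear fit of observed counts per CCM bin to a set of model basis tracks. The sketch below illustrates that idea only in outline, on synthetic arrays rather than the WISE/CLUMPY data, using ridge-regularized non-negative least squares via matrix augmentation.

```python
# Minimal sketch (synthetic data, not the authors' pipeline): fit observed
# counts per color-color-magnitude (CCM) bin as a regularized, non-negative
# linear combination of model basis tracks.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n_bins, n_basis = 500, 40
B = rng.random((n_bins, n_basis))                 # basis: model counts per CCM bin (synthetic)
w_true = rng.exponential(scale=1.0, size=n_basis)
counts = B @ w_true + rng.normal(0, 0.5, n_bins)  # "observed" number counts (synthetic)

# Ridge (Tikhonov) regularization folded into non-negative least squares
# by augmenting the design matrix with sqrt(alpha) * identity rows.
alpha = 1.0
A_aug = np.vstack([B, np.sqrt(alpha) * np.eye(n_basis)])
b_aug = np.concatenate([counts, np.zeros(n_basis)])
weights, _ = nnls(A_aug, b_aug)                   # inferred weight of each parameter combination

rms = np.sqrt(np.mean((B @ weights - counts) ** 2))
print(f"reconstruction rms per bin: {rms:.3f}")
```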

  7. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve the high image resolution needed for finer anatomical details and the signal-to-noise ratio needed for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation, we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...
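    The "factor of eight" in this record corresponds to doubling the resolution along each of the three axes. As a rough illustration of that upsampling step (not the authors' implementation), one DWI volume can be interpolated with a cubic spline before tensor fitting:

```python
# Minimal sketch (synthetic volume, not the study's data or code): upsample a
# diffusion-weighted volume by a factor of 2 per axis (8x more voxels) with
# cubic b-spline interpolation before fibre reconstruction.
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(2)
dwi_volume = rng.random((64, 64, 40))           # one DWI volume (synthetic)

upsampled = zoom(dwi_volume, zoom=2, order=3)   # cubic spline interpolation
print(dwi_volume.shape, "->", upsampled.shape)  # (64, 64, 40) -> (128, 128, 80)
```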

  8. Application Examples for Handle System Usage

    Science.gov (United States)

    Toussaint, F.; Weigel, T.; Thiemann, H.; Höck, H.; Stockhause, M.; Lautenschlager, M.

    2012-12-01

    Besides the well-known DOI (Digital Object Identifier), a special form of Handle that resolves to scientific publications, there are various other applications in use, and others perhaps not yet. We present some examples of the existing ones and some ideas for the future. The national German project C3-Grid provides a framework to implement a first solution for provenance tracing and to explore unforeseen implications. Though project-specific, the high-level architecture is generic and represents well a common notion of data derivation. Users select one or many input datasets and a workflow software module (an agent in this context) to execute on the data. The output data is deposited in a repository to be delivered to the user. All data is accompanied by an XML metadata document. All input and output data, metadata and the workflow module receive Handles and are linked together to establish a directed acyclic graph of derived data objects and involved agents. Data that has been modified by a workflow module is linked to its predecessor data and the workflow module involved. Version control systems such as svn or git provide Internet access to software repositories using URLs. To refer to a specific state of the source code of, for instance, a C3 workflow module, it is sufficient to reference the URL of the svn revision or git hash. In consequence, individual revisions and the repository as a whole receive PIDs. Moreover, the revision-specific PIDs are linked to their respective predecessors and become part of the provenance graph. Another example of PID usage in a current major project is given by EUDAT (European Data Infrastructure), which will link scientific data of several research communities together. In many fields it is necessary to provide data objects at multiple locations for a variety of applications. To ensure consistency, not only the master of a data object but also its copies shall be provided with a PID. To verify transaction safety and to
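    The linking scheme described above can be pictured as PID records whose link fields form a directed acyclic graph. The following is a hypothetical, minimal sketch of that structure (the Handle prefix, field names and traversal are illustrative only, not the C3-Grid or EUDAT implementation):

```python
# Hypothetical sketch of a PID-based provenance graph: each record links an
# output object back to its input data and the workflow module (agent) used.
provenance = {
    "hdl:21.T000/input-001":  {"type": "dataset",  "derived_from": [], "agent": None},
    "hdl:21.T000/wf-regrid":  {"type": "workflow", "derived_from": [], "agent": None},
    "hdl:21.T000/output-042": {"type": "dataset",
                               "derived_from": ["hdl:21.T000/input-001"],
                               "agent": "hdl:21.T000/wf-regrid"},
}

def ancestors(pid, records):
    """Walk the 'derived_from' links back to the original input objects."""
    seen, stack = set(), [pid]
    while stack:
        current = stack.pop()
        for parent in records[current]["derived_from"]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(ancestors("hdl:21.T000/output-042", provenance))
```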

  9. Dataset of Phenology of Mediterranean high-mountain meadows flora (Sierra Nevada, Spain)

    OpenAIRE

    Antonio Jesús Pérez-Luque; Cristina Patricia Sánchez-Rojas; Regino Zamora; Ramón Pérez-Pérez; Francisco Javier Bonet

    2015-01-01

    Abstract Sierra Nevada mountain range (southern Spain) hosts a high number of endemic plant species, being one of the most important biodiversity hotspots in the Mediterranean basin. The high-mountain meadow ecosystems (borreguiles) harbour a large number of endemic and threatened plant species. In this data paper, we describe a dataset of the flora inhabiting this threatened ecosystem in this Mediterranean mountain. The dataset includes occurrence data for flora collected in those ecosystems...

  10. Data assimilation and model evaluation experiment datasets

    Science.gov (United States)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  11. A hybrid organic-inorganic perovskite dataset

    Science.gov (United States)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  12. Musculoskeletal injuries resulting from patient handling tasks among hospital workers.

    Science.gov (United States)

    Pompeii, Lisa A; Lipscomb, Hester J; Schoenfisch, Ashley L; Dement, John M

    2009-07-01

    The purpose of this study was to evaluate musculoskeletal injuries and disorders resulting from patient handling prior to the implementation of a "minimal manual lift" policy at a large tertiary care medical center. We sought to define the circumstances surrounding patient handling injuries and to identify potential preventive measures. Human resources data were used to define the cohort and their time at work. Workers' compensation records (1997-2003) were utilized to identify work-related musculoskeletal claims, while the workers' descriptions of injury were used to identify those that resulted from patient handling. Adjusted rate ratios were generated using Poisson regression. One-third (n = 876) of all musculoskeletal injuries resulted from patient handling activities. Most (83%) of the injury burden was incurred by inpatient nurses, nurses' aides and radiology technicians, while injury rates were highest for nurses' aides (8.8/100 full-time equivalents, FTEs) and smaller workgroups including emergency medical technicians (10.3/100 FTEs), patient transporters (4.3/100 FTEs), operating room technicians (3.1/100 FTEs), and morgue technicians (2.2/100 FTEs). Forty percent of injuries due to lifting/transferring patients might have been prevented through the use of mechanical lift equipment, while 32% of injuries resulting from repositioning/turning patients, pulling patients up in bed, or catching falling patients might not have been prevented by the use of lift equipment. The use of mechanical lift equipment could significantly reduce the risk of some patient handling injuries, but additional interventions need to be considered that address other patient handling tasks. Smaller high-risk workgroups should not be neglected in prevention efforts.
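    The adjusted rate ratios mentioned above come from Poisson regression of injury counts with person-time (here FTEs) as an exposure offset, a standard epidemiological technique. A minimal sketch on synthetic counts (not the study's records) looks like this:

```python
# Minimal sketch (synthetic data, not the study's claims records): injury rate
# ratios from Poisson regression with a log(FTE) offset.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "injuries": [88, 12, 30, 9],
    "fte":      [1000.0, 280.0, 950.0, 410.0],
    "aide":     [1, 1, 0, 0],   # 1 = nurses' aide, 0 = reference group (illustrative)
    "night":    [0, 1, 0, 1],   # example covariate (illustrative)
})

X = sm.add_constant(df[["aide", "night"]])
model = sm.GLM(df["injuries"], X, family=sm.families.Poisson(),
               offset=np.log(df["fte"]))
result = model.fit()
print(np.exp(result.params))    # exponentiated coefficients = adjusted rate ratios
```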

  13. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There are also a further three papers that the authors know to be under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in terms of the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques and the blending methods used to combine satellite and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  14. Hot Laboratories and Remote Handling

    International Nuclear Information System (INIS)

    2007-01-01

    The opening talk of the workshop 'Hot Laboratories and Remote Handling' was given by Marin Ciocanescu with the communication 'Overview of R and D Program in Romanian Institute for Nuclear Research'. The work of the meeting was structured into three sessions addressing the following items: Session 1. Hot cell facilities: Infrastructure, Refurbishment, Decommissioning; Session 2. Waste, transport, safety and remote handling issues; Session 3. Post-Irradiation examination techniques. In the framework of Session 1, the communication 'Overview of hot cell facilities in South Africa' by Wouter Klopper, Willie van Greunen et al. was presented. In the framework of the second session, the following four communications were given: 'The irradiated elements cell at PHENIX' by Laurent Breton et al., 'Development of remote equipment for DUPIC fuel fabrication at KAERI' by Jung Won Lee et al., 'Aspects of working with manipulators and small samples in an αβγ-box' by Robert Zubler et al., and 'The GIOCONDA experience of the Joint Research Centre Ispra: analysis of the experimental assemblies finalized to their safe recovery and dismantling' by Roberto Covini. Finally, in the framework of the third session, the following five communications were presented: 'PIE of a CANDU fuel element irradiated for a load following test in the INR TRIGA reactor' by Marcel Parvan et al., 'Adaptation of the pole figure measurement to the irradiated items from zirconium alloys' by Yury Goncharenko et al., 'Fuel rod profilometry with a laser scan micrometer' by Daniel Kuster et al., 'Raman spectroscopy, a new facility at LECI laboratory to investigate neutron damage in irradiated materials' by Lionel Gosmain et al., and 'Analysis of complex nuclear materials with the PSI shielded analytical instruments' by Didier Gavillet. In addition, eleven more presentations were given as posters. Their titles were: 'Presentation of CETAMA activities (CEA analytic group)' by Alain Hanssens et al. 'Analysis of

  15. Fuel handling problems at KANUPP

    Energy Technology Data Exchange (ETDEWEB)

    Ahmed, I; Mazhar Hasan, S; Mugtadir, A [Karachi Nuclear Power Plant (KANUPP), Karachi (Pakistan)

    1991-04-01

    KANUPP experienced two abnormal fuel and fuel-handling-related problems during the year 1990. One of these arose due to the development of end-plate-to-end-plate coupling between the two bundles at the leading end of the fuel string in channel HO2-S. The incident occurred when attempts were being made to fuel this channel. Due to the pulling of sticking bundles into the acceptor fuelling machine (north) magazine, which was not designed to accommodate two bundles, a magazine rotary stop occurred. The forward motion of the charge tube was simultaneously discovered to be restricted. The incident led to the stalling of the fuelling machine, locked onto channel HO2, necessitating a reactor shutdown. Removal of the fuelling machine was accomplished some time later, after draining of the channel. The second incident, which made the fuelling of channel KO5-N temporarily inexecutable, occurred during attempts to remove its north end shield plug when this channel came up for fuelling. The incident resulted from the breaking of the lugs of the shield plug, making its withdrawal impossible. The Plant, however, kept operating with the fuelling of channel KO5 suspended, until it could no longer sustain a further increase in fuel burnup at the maximum rating position. Resolving both these problems necessitated draining of the respective channels, leaving the resident fuel uncovered for the duration of the associated operation. Due to the substantial difference in the oxidation temperatures of UO{sub 2} and Zircaloy and its influence on the cooling requirement, it was necessary either to determine explicitly that the respective channels did not contain defective fuel bundles or to wait long enough to allow the decay heat to reduce to manageable proportions. This had a significant bearing on the Plant down time necessary for the rectification of the problems. This paper describes the two incidents in detail and dwells upon the measures adopted to resolve the related problems. (author)

  16. Fuel handling problems at KANUPP

    International Nuclear Information System (INIS)

    Ahmed, I.; Mazhar Hasan, S.; Mugtadir, A.

    1991-01-01

    KANUPP experienced two abnormal fuel and fuel-handling-related problems during the year 1990. One of these arose due to the development of end-plate-to-end-plate coupling between the two bundles at the leading end of the fuel string in channel HO2-S. The incident occurred when attempts were being made to fuel this channel. Due to the pulling of sticking bundles into the acceptor fuelling machine (north) magazine, which was not designed to accommodate two bundles, a magazine rotary stop occurred. The forward motion of the charge tube was simultaneously discovered to be restricted. The incident led to the stalling of the fuelling machine, locked onto channel HO2, necessitating a reactor shutdown. Removal of the fuelling machine was accomplished some time later, after draining of the channel. The second incident, which made the fuelling of channel KO5-N temporarily inexecutable, occurred during attempts to remove its north end shield plug when this channel came up for fuelling. The incident resulted from the breaking of the lugs of the shield plug, making its withdrawal impossible. The Plant, however, kept operating with the fuelling of channel KO5 suspended, until it could no longer sustain a further increase in fuel burnup at the maximum rating position. Resolving both these problems necessitated draining of the respective channels, leaving the resident fuel uncovered for the duration of the associated operation. Due to the substantial difference in the oxidation temperatures of UO 2 and Zircaloy and its influence on the cooling requirement, it was necessary either to determine explicitly that the respective channels did not contain defective fuel bundles or to wait long enough to allow the decay heat to reduce to manageable proportions. This had a significant bearing on the Plant down time necessary for the rectification of the problems. This paper describes the two incidents in detail and dwells upon the measures adopted to resolve the related problems. (author)

  17. Ergonomics and comfort in lawn mower handle positioning: An evaluation of handle geometry.

    Science.gov (United States)

    Lowndes, Bethany R; Heald, Elizabeth A; Hallbeck, M Susan

    2015-11-01

    Hand operation accompanied by any combination of large forces, awkward positions and repetition may lead to upper limb injury or illness and may be exacerbated by vibration. Commercial lawn mowers expose operators to these factors during actuation of hand controls and therefore may be a health concern. A nontraditional lawn mower control system may decrease upper limb illnesses and injuries through more neutral hand and body positioning. This study compared maximum grip strength in twelve different orientations (3 grip spans and 4 positions) and evaluated self-described comfortable handle positions. The results showed force differences between nontraditional (X) and both vertical (V) and pistol (P) positions (p < 0.0001) and among the different grip spans (p < 0.0001). Based on these results, recommended designs should incorporate a tilt between 45 and 70°, handle rotations between 48 and 78°, and reduced force requirements or decreased grip spans to improve user health and comfort. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  18. Conceptual design of divertor cassette handling by remote handling system for JT-60SA

    International Nuclear Information System (INIS)

    Hayashi, Takao; Sakurai, Shinji; Masaki, Kei; Tamai, Hiroshi; Yoshida, Kiyoshi; Matsukawa, Makoto

    2007-01-01

    The JT-60SA aims to contribute to and supplement ITER toward a DEMO reactor based on the tokamak concept. One of the features of JT-60SA is its high-power, long-pulse heating, causing a large annual neutron fluence. Because the expected dose rate at the vacuum vessel (VV) may exceed 1 mSv/hr after 10 years of operation and three months of cooling, human access inside the VV is prohibited. Therefore a remote handling (RH) system is necessary for the maintenance and repair of in-vessel components. This paper describes the RH system of JT-60SA, especially the expansion of the RH rail and the exchange of the divertor modules. The RH rail is divided into nine sections with three-point mounting, and the nine sections can cover 225 degrees in the toroidal direction. A divertor module, which is 10 degrees wide in the toroidal direction and itself weighs 500 kg due to the limitations of port width and handling weight, can be exchanged by the heavy weight manipulator (HWM). The HWM brings the divertor module to the front of the other RH port, which is used for supporting the rail and/or carrying equipment in and out. Then another RH device receives the module and brings it out on a pallet installed from outside the VV. (author)

  19. Conceptual design of divertor cassette handling by remote handling system of JT-60SA

    International Nuclear Information System (INIS)

    Hayashi, Takao; Sakurai, Shinji; Masaki, Kei; Tamai, Hiroshi; Yoshida, Kiyoshi; Matsukawa, Makoto

    2008-01-01

    The JT-60SA aims to contribute to and supplement ITER toward a demonstration fusion reactor based on the tokamak concept. One of the features of JT-60SA is its high-power, long-pulse heating, causing a large annual neutron fluence. Because the expected dose rate at the vacuum vessel (VV) may exceed 1 mSv/hr after 10 years of operation and three months of cooling, human access inside the VV is restricted. Therefore a remote handling (RH) system is necessary for the maintenance and repair of in-vessel components. This paper describes the RH system of JT-60SA, especially the expansion of the RH rail and the exchange of the divertor cassettes. The RH rail is divided into nine sections with three-point mounting, and the nine sections can cover 225 degrees in the toroidal direction. A divertor cassette, which is 10 degrees wide in the toroidal direction and itself weighs 500 kg due to the limitations of port width and handling weight, can be exchanged by the heavy weight manipulator (HWM). The HWM brings the divertor cassette to the front of the other RH port, which is used for supporting the rail and/or carrying equipment in and out. Then another RH device receives the cassette and brings it out on a pallet installed from outside the VV. (author)

  20. X-ray computed tomography datasets for forensic analysis of vertebrate fossils

    Science.gov (United States)

    Rowe, Timothy B.; Luo, Zhe-Xi; Ketcham, Richard A.; Maisano, Jessica A.; Colbert, Matthew W.

    2016-01-01

    We describe X-ray computed tomography (CT) datasets from three specimens recovered from Early Cretaceous lakebeds of China that illustrate the forensic interpretation of CT imagery for paleontology. Fossil vertebrates from thinly bedded sediments often shatter upon discovery and are commonly repaired as amalgamated mosaics grouted to a solid backing slab of rock or plaster. Such methods are prone to inadvertent error and willful forgery, and once required potentially destructive methods to identify mistakes in reconstruction. CT is an efficient, nondestructive alternative that can disclose many clues about how a specimen was handled and repaired. These annotated datasets illustrate the power of CT in documenting specimen integrity and are intended as a reference in applying CT more broadly to evaluating the authenticity of comparable fossils. PMID:27272251

  1. 9 CFR 3.118 - Handling.

    Science.gov (United States)

    2010-01-01

    9 CFR 3.118 - Handling. Title 9, Animals and Animal Products; Animal and Plant Health Inspection Service, Department of Agriculture; Animal Welfare Standards: Specifications for the Humane Handling, Care, Treatment, and Transportation of Marine...

  2. How to Handle Impasses in Bargaining.

    Science.gov (United States)

    Durrant, Robert E.

    Guidelines in an outline format are presented to school board members and administrators on how to handle impasses in bargaining. The following two rules are given: there sometimes may be strikes, but there always will be settlements; and on the way to settlements, there always will be impasses. Suggestions for handling impasses are listed under…

  3. Handling uncertainty through adaptiveness in planning approaches

    NARCIS (Netherlands)

    Zandvoort, M.; Vlist, van der M.J.; Brink, van den A.

    2018-01-01

    Planners and water managers seek to be adaptive to handle uncertainty through the use of planning approaches. In this paper, we study what type of adaptiveness is proposed and how this may be operationalized in planning approaches to adequately handle different uncertainties. We took a

  4. Survey of postharvest handling, preservation and processing ...

    African Journals Online (AJOL)

    Survey of postharvest handling, preservation and processing practices along the camel milk chain in Isiolo district, Kenya. ... Despite the important contribution of camel milk to food security for pastoralists in Kenya, little is known about the postharvest handling, preservation and processing practices. In this study, existing ...

  5. PND fuel handling decontamination: facilities and techniques

    International Nuclear Information System (INIS)

    Pan, R.Y.

    1996-01-01

    The use of various decontamination techniques and equipment has become a critical part of Fuel Handling maintenance work at Ontario Hydro's Pickering Nuclear Division. This paper presents an overview of the set up and techniques used for decontamination in the PND Fuel Handling Maintenance Facility and the effectiveness of each. (author). 1 tab., 9 figs

  6. Handling Kids in Crisis with Care

    Science.gov (United States)

    Bushinski, Cari

    2018-01-01

    The Handle with Care program helps schools help students who experience trauma. While at the scene of an event like a domestic violence call, drug raid, or car accident, law enforcement personnel determine the names and school of any children present. They notify that child's school to "handle ___ with care" the next day, and the school…

  7. PND fuel handling decontamination: facilities and techniques

    Energy Technology Data Exchange (ETDEWEB)

    Pan, R Y [Ontario Hydro, Toronto, ON (Canada)

    1997-12-31

    The use of various decontamination techniques and equipment has become a critical part of Fuel Handling maintenance work at Ontario Hydro's Pickering Nuclear Division. This paper presents an overview of the set up and techniques used for decontamination in the PND Fuel Handling Maintenance Facility and the effectiveness of each. (author). 1 tab., 9 figs.

  8. Handling knowledge on osteoporosis - a qualitative study

    DEFF Research Database (Denmark)

    Nielsen, Dorthe; Huniche, Lotte; Brixen, Kim

    2013-01-01

    Scand J Caring Sci, 2012. Handling knowledge on osteoporosis - a qualitative study. The aim of this qualitative study was to increase understanding of the importance of osteoporosis information and knowledge for patients' ways of handling osteoporosis in their everyday lives. Interviews were...

  9. DDOS ATTACK DETECTION SIMULATION AND HANDLING MECHANISM

    Directory of Open Access Journals (Sweden)

    Ahmad Sanmorino

    2013-11-01

    Full Text Available In this study we discuss how to handle a DDoS attack coming from an attacker by using a detection method and a handling mechanism. Detection is performed by comparing the number of packets with the number of flows, whereas the handling mechanism works by limiting or dropping the packets detected as part of a DDoS attack. The study begins with a simulation on a real network, which aims to obtain real traffic data. The dump traffic data obtained from the simulation are then used for the detection method in our prototype system called DASHM (DDoS Attack Simulation and Handling Mechanism). From the results of the experiments that have been conducted, the proposed method successfully detects DDoS attacks and handles the incoming packets sent by the attacker.
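    The detection idea above, comparing packet counts with flow counts and dropping traffic from offending sources, can be sketched in a few lines. The following is an illustrative Python sketch only (thresholds and flow definition are assumptions, not the DASHM implementation):

```python
# Illustrative sketch (not DASHM): flag a time window as a possible DDoS attack
# when the packets-per-flow ratio exceeds a threshold, then mark sources whose
# flows are abnormally heavy so their packets can be limited or dropped.
from collections import defaultdict

def detect_ddos(packets, ratio_threshold=50.0):
    """packets: iterable of (src_ip, dst_ip, size) tuples for one time window."""
    flows = defaultdict(int)                      # flow = (src, dst) pair
    for src, dst, _size in packets:
        flows[(src, dst)] += 1
    n_packets = sum(flows.values())
    ratio = n_packets / max(len(flows), 1)        # packets per flow
    attackers = set()
    if ratio > ratio_threshold:
        attackers = {src for (src, _dst), count in flows.items()
                     if count > ratio_threshold}
    return ratio, attackers

window = [("10.0.0.7", "192.0.2.1", 60)] * 5000 + [("10.0.0.8", "192.0.2.1", 60)] * 20
ratio, attackers = detect_ddos(window)
print(f"packets/flow = {ratio:.1f}, drop sources: {attackers}")
```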

  10. MRI of meniscal bucket-handle tears

    Energy Technology Data Exchange (ETDEWEB)

    Magee, T.H.; Hinson, G.W. [Menorah Medical Center, Overland Park, KS (United States). Dept. of Radiology

    1998-09-01

    A meniscal bucket-handle tear is a tear with an attached fragment displaced from the meniscus of the knee joint. Low sensitivity of MRI for detection of bucket-handle tears (64% as compared with arthroscopy) has been reported previously. We report increased sensitivity for detecting bucket-handle tears with the use of coronal short tau inversion recovery (STIR) images. Results. By using four criteria for diagnosis of meniscal bucket-handle tears, our overall sensitivity compared with arthroscopy was 93% (28 of 30 meniscal bucket-handle tears seen at arthroscopy were detected by MRI). The meniscal fragment was well visualized in all 28 cases on coronal STIR images. The double posterior cruciate ligament sign was seen in 8 of 30 cases, the flipped meniscus was seen in 10 of 30 cases and a fragment in the intercondylar notch was seen in 18 of 30 cases. (orig.)

  11. Reservoir water level forecasting using group method of data handling

    Science.gov (United States)

    Zaji, Amir Hossein; Bonakdari, Hossein; Gharabaghi, Bahram

    2018-06-01

    Accurately forecasted reservoir water levels are among the most vital data for efficient reservoir structure design and management. In this study, the group method of data handling is combined with the minimum description length method to develop a very practical and functional model for predicting reservoir water levels. The models' performance is evaluated using two groups of input combinations based on recent days and recent weeks. Four different input combinations are considered in total. The data collected from Chahnimeh#1 Reservoir in eastern Iran are used for model training and validation. To assess the models' applicability in practical situations, the models are made to predict a non-observed dataset for the nearby Chahnimeh#4 Reservoir. According to the results, input combinations (L, L-1) and (L, L-1, L-12) for recent days, with root-mean-squared errors (RMSE) of 0.3478 and 0.3767, respectively, outperform input combinations (L, L-7) and (L, L-7, L-14) for recent weeks, with RMSE of 0.3866 and 0.4378, respectively. Accordingly, (L, L-1) is selected as the best input combination for making 7-day ahead predictions of reservoir water levels.
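    To make the input-combination notation concrete, the sketch below builds the (L, L-1) lagged features for a 7-day-ahead prediction and reports the RMSE on a held-out split. A plain least-squares fit on synthetic levels stands in for the GMDH network used in the paper; it is an illustration of the data layout and evaluation metric, not of the GMDH method itself.

```python
# Minimal sketch (synthetic series, linear stand-in for GMDH): (L, L-1) input
# combination for 7-day-ahead water level prediction, evaluated by RMSE.
import numpy as np

rng = np.random.default_rng(3)
level = np.cumsum(rng.normal(0, 0.1, 600)) + 10.0          # synthetic daily levels

lead = 7                                                   # predict 7 days ahead
X = np.column_stack([level[1:-lead], level[:-lead - 1]])   # features: L, L-1
y = level[lead + 1:]                                       # target: level 7 days later

split = int(0.8 * len(y))
A_train = np.c_[X[:split], np.ones(split)]                 # add intercept column
coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)
pred = np.c_[X[split:], np.ones(len(y) - split)] @ coef
rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))
print(f"test RMSE: {rmse:.4f}")
```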

  12. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter & Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  13. Being an honest broker of hydrology: Uncovering, communicating and addressing model error in a climate change streamflow dataset

    Science.gov (United States)

    Chegwidden, O.; Nijssen, B.; Pytlak, E.

    2017-12-01

    Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. We will describe how encountering earlier techniques' pitfalls allowed us

  14. Tritium handling facilities at the Los Alamos Scientific Laboratory

    International Nuclear Information System (INIS)

    Anderson, J.L.; Damiano, F.A.; Nasise, J.E.

    1975-01-01

    A new tritium facility, recently activated at the Los Alamos Scientific Laboratory, is described. The facility contains a large drybox, associated gas processing system, a facility for handling tritium gas at pressures to approximately 100 MPa, and an effluent treatment system which removes tritium from all effluents prior to their release to the atmosphere. The system and its various components are discussed in detail with special emphasis given to those aspects which significantly reduce personnel exposures and atmospheric releases. (auth)

  15. Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

    Directory of Open Access Journals (Sweden)

    Hoefsloot Huub CJ

    2009-05-01

    Full Text Available Abstract Background Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. Main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Results Two in-house generated clinical SELDI-TOF MS datasets are used in this study as an example of high throughput mass-spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between peaks detected by either Ciphergen ProteinChip Software or Cromwell is large. This is especially the case for the more stringent peak detection settings. Moreover, similarity of the estimated intensities between matched peaks is high. We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. No pre-processing method significantly outperforms the other for all peak detection settings evaluated. Conclusion We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset. However, the settings for pre
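    The double cross-validation protocol referred to above nests the hyper-parameter tuning inside an inner loop and estimates accuracy on outer splits that the tuning never sees, with repeated random sampling for the outer splits. The sketch below shows that structure on synthetic data (a stand-in for the SELDI-TOF peak intensities, not the study's datasets or classifiers):

```python
# Minimal sketch (synthetic data): double (nested) cross-validation with
# repeated random outer splits, so the reported accuracy is not biased by the
# hyper-parameter tuning done in the inner loop.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=300, n_informative=15,
                           random_state=0)                  # stand-in for peak intensities

inner_cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
outer_cv = ShuffleSplit(n_splits=20, test_size=0.2, random_state=2)  # repeated random sampling

tuned_clf = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)
scores = cross_val_score(tuned_clf, X, y, cv=outer_cv)       # outer loop: unbiased estimate
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```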

  16. Handling Procedures of Vegetable Crops

    Science.gov (United States)

    Perchonok, Michele; French, Stephen J.

    2004-01-01

    The National Aeronautics and Space Administration (NASA) is working towards future long duration manned space flights beyond low earth orbit. The duration of these missions may be as long as 2.5 years and will likely include a stay on a lunar or planetary surface. The primary goal of the Advanced Food System in these long duration exploratory missions is to provide the crew with a palatable, nutritious, and safe food system while minimizing volume, mass, and waste. Vegetable crops can provide the crew with added nutrition and variety. These crops do not require any cooking or food processing prior to consumption. The vegetable crops, unlike prepackaged foods, will provide bright colors, textures (crispy), and fresh aromas. Ten vegetable crops have been identified for possible use in long duration missions. They are lettuce, spinach, carrot, tomato, green onion, radish, bell pepper, strawberries, fresh herbs, and cabbage. Whether these crops are grown on a transit vehicle (e.g., International Space Station) or on the lunar or planetary surface, it will be necessary to determine how to safely handle the vegetables while maintaining acceptability. Since hydrogen peroxide degrades into water and oxygen and is generally recognized as safe (GRAS), hydrogen peroxide has been recommended as the sanitizer. The objective of this research is to determine the required effective concentration of hydrogen peroxide. In addition, it will be determined whether the use of hydrogen peroxide, although a viable sanitizer, adversely affects the quality of the vegetables. Vegetables will be dipped in 1% hydrogen peroxide, 3% hydrogen peroxide, or 5% hydrogen peroxide. Treated produce and controls will be stored in plastic bags at 5 °C for up to 14 days. Sensory, color, texture, and total plate count will be measured. Evaluation of the effect on several vegetables, including lettuce, radish, tomato and strawberries, has been completed. Although each vegetable reacts to hydrogen peroxide differently, the

  17. The handling of radiation accidents

    International Nuclear Information System (INIS)

    1977-01-01

    The symposium was attended by 204 participants from 39 countries and 5 international organizations. Forty-two papers were presented in 8 sessions. The purpose of the meeting was to foster an exchange of experiences gained in establishing and exercising plans for mitigating the effects of radiation accidents and in the handling of actual accident situations. Only a small number of accidents were reported at the symposium, and this reflects the very high standards of safety that have been achieved by the nuclear industry. No accidents of radiological significance were reported to have occurred at commercial nuclear power plants. Of the accidents reported, industrial radiography continues to be the area in which most radiation accidents occur. The experience gained in the reported accident situations served to confirm the crucial importance of the prompt availability of medical and radiological services, particularly in the case of uptake of radioactive material, and emphasized the importance of detailed investigation into the causes of the accident in order to improve preventative measures. One of the principal themes of the symposium involved emergency procedures related to nuclear power plant accidents, and several papers defining the scope, progression and consequences of design base accidents for both thermal and fast reactor systems were presented. These were complemented by papers defining the resultant protection requirements that should be satisfied in the establishment of plans designed to mitigate the effects of the postulated accident situations. Several papers were presented describing existing emergency organizational arrangements relating both to specific nuclear power plants and to comprehensive national schemes, and a particularly informative session was devoted to the topic of training of personnel in the practical conduct of emergency arrangements. The general feeling of the participants was one of studied confidence in the competence and

  18. Handling load with less stress

    NARCIS (Netherlands)

    Bansal, N.; Gamarnik, D.

    2006-01-01

    We study how the average performance of a system degrades as the load nears its peak capacity. We restrict our attention to the performance measures of average sojourn time and the large deviation rates of buffer overflow probabilities. We first show that for certain queueing systems, the average

  19. MPI Debugging with Handle Introspection

    DEFF Research Database (Denmark)

    Brock-Nannestad, Laust; DelSignore, John; Squyres, Jeffrey M.

    The Message Passing Interface, MPI, is the standard programming model for high performance computing clusters. However, debugging applications on large scale clusters is difficult. The widely used Message Queue Dumping interface enables inspection of message queue state but there is no general in...

  20. Safe Handling of Radioisotopes. Health Physics Addendum

    International Nuclear Information System (INIS)

    Appleton, G.J.; Krishnamoorthy, P.N.

    1960-01-01

    The International Atomic Energy Agency published in 1958 a Manual entitled ''Safe Handling of Radioisotopes'' (Safety Series No. 1 - STI/PUB/1), based on the work of an international panel convened by the Agency. As recommended by that panel and approved by the Agency's Board of Governors, this Addendum has now been prepared, primarily as a supplement to the Manual. It contains technical information necessary for the implementation of the controls given in the Manual. In addition, it is intended to serve as a brief introduction to the technical problems encountered in radiological protection work and to the methods of resolving them. As in the case of the Manual itself, the information given in this Addendum is particularly relevant to the problems encountered by the small user of radioisotopes. Although the basic principles set forth in it apply to all work with radiation sources, the Addendum is not intended to serve as a radiological protection manual for use in reactor installations or large-scale nuclear industry, where more specialized techniques and information are required.

  1. Safe Handling of Radioisotopes. Health Physics Addendum

    Energy Technology Data Exchange (ETDEWEB)

    Appleton, G J; Krishnamoorthy, P N

    1960-07-15

    The International Atomic Energy Agency published in 1958 a Manual entitled ''Safe Handling of Radioisotopes'' (Safety Series No. 1 - STI/PUB/1), based on the work of an international panel convened by the Agency. As recommended by that panel and approved by the Agency's Board of Governors, this Addendum has now been prepared, primarily as a supplement to the Manual. It contains technical information necessary for the implementation of the controls given in the Manual. In addition, it is intended to serve as a brief introduction to the technical problems encountered in radiological protection work and to the methods of resolving them. As in the case of the Manual itself, the information given in this Addendum is particularly relevant to the problems encountered by the small user of radioisotopes. Although the basic principles set forth in it apply to all work with radiation sources, the Addendum is not intended to serve as a radiological protection manual for use in reactor installations or large-scale nuclear industry, where more specialized techniques and information are required.

  2. Distributed computing for FTU data handling

    Energy Technology Data Exchange (ETDEWEB)

    Bertocchi, A. E-mail: bertocchi@frascati.enea.it; Bracco, G.; Buceti, G.; Centioli, C.; Giovannozzi, E.; Iannone, F.; Panella, M.; Vitale, V

    2002-06-01

    The growth of data warehouses in tokamak experiments is leading fusion laboratories to provide new IT solutions for data handling. In the last three years, the Frascati Tokamak Upgrade (FTU) experimental database was migrated from an IBM mainframe to a Unix distributed computing environment. The migration efforts have taken into account the following items: (1) a new data storage solution based on a storage area network over Fibre Channel; (2) the Andrew File System (AFS) for wide area network file sharing; (3) a 'one measure/one file' philosophy replacing 'one shot/one file' to provide faster read/write data access; (4) more powerful services, such as AFS, CORBA and MDSplus, to allow users to access the FTU database from different clients, regardless of their OS; (5) wide availability of data analysis tools, from the locally developed utility SHOW to the multi-platform Matlab, Interactive Data Language and jScope (all these tools are now also able to access the Joint European Torus data, in the framework of the remote data access activity); (6) a batch-computing cluster of Alpha/Compaq Tru64 CPUs based on CODINE/GRD to optimize the utilization of software and hardware resources.

  3. Safe Handling of Radioisotopes. Medical Addendum

    International Nuclear Information System (INIS)

    Hercik, F.; Jammet, H.

    1960-01-01

    The International Atomic Energy Agency published in 1958 a Manual entitled ''Safe Handling of Radioisotopes'' (Safety Series No. 1 - STI/PUB/1), based on the work of an international panel convened by the Agency. As recommended by that panel and approved by the Agency's Board of Governors, this Addendum has now been prepared, primarily as a supplement to the Manual. It contains information necessary to medical officers concerned with the implementation of the controls given in the Manual. In addition, it is intended to serve as a brief introduction to the medical problems encountered in radiological protection work and to the methods of resolving them. As in the case of the Manual itself, the information given in this Addendum is particularly relevant to the problems encountered by the small user of radioisotopes. Although the basic principles set forth in it apply to all work with radiation sources, the Addendum is not intended to serve as a radiological protection manual for use in reactor installations or large-scale nuclear industry, where more specialized techniques and information are required.

  4. Nuclear hydrogen production and its safe handling

    International Nuclear Information System (INIS)

    Chung, Hongsuk; Paek, Seungwoo; Kim, Kwang-Rag; Ahn, Do-Hee; Lee, Minsoo; Chang, Jong Hwa

    2003-01-01

    An overview of the hydrogen-related research presently undertaken at the Korea Atomic Energy Research Institute is presented. It encompasses nuclear hydrogen production, hydrogen storage, and the safe handling of hydrogen. High-temperature gas-cooled reactors can play a significant role in large-scale hydrogen production if used as the provider of high-temperature heat for fossil fuel conversion or thermochemical cycles. A variety of potential hydrogen production methods for high-temperature gas-cooled reactors were analyzed, including steam reforming of natural gas and thermochemical cycles. The produced hydrogen should be stored safely. Titanium metal was tested primarily because its hydride has very low dissociation pressures at normal storage temperatures and a high capacity for hydrogen; it is also easy to prepare and non-reactive with air under the expected storage conditions. There could be a number of potential sources of hydrogen evolution risk in a nuclear hydrogen production facility. In order to reduce the risk of deflagration and detonation, it is necessary to develop hydrogen control methods capable of dealing with the hydrogen release rate. A series of experiments was conducted to assess the catalytic recombination characteristics of hydrogen in an air stream using palladium catalysts. (author)

  5. Controlling fugitive dust emissions in material handling operations

    Energy Technology Data Exchange (ETDEWEB)

    Tooker, G E

    1992-05-01

    The primary mechanism of fugitive dust generation in bulk material handling transfer operations is the dispersion of dust in turbulent air induced to flow with falling or projected material streams. This paper returns to basic theories of particle dynamics and fluid mechanics to quantify the dust-generating mechanism by rational analysis. Calculations involving fluid mechanics are made easier by the availability of the personal computer and the many math-manipulation programs. Rational analysis is much more cost effective when estimating collection air volumes to control fugitive emissions, especially in enclosed material handling transfers moving large volumes of dusty material. Example calculations, using a typical enclosed conveyor-to-conveyor transfer operation, are presented to illustrate and highlight the key parameters that determine the magnitude of the induced air flow that must be controlled. The methods presented in this paper for estimating collection air volumes apply only to enclosed material handling transfers exhausted to a dust collector. Since the design of the material handling transfer chute must also assist in the control of dust emissions, a discussion of good transfer chute design practice is presented. 4 refs., 2 figs., 2 tabs.
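
    The central quantity in such a rational analysis is the volume of air induced by the falling material stream. The sketch below illustrates the arithmetic with an empirical induced-air correlation of the kind found in industrial-ventilation guidance; the formula, its constant and the example inputs are assumptions for illustration and are not taken from this paper.

        # Illustrative estimate of the air induced by a falling material stream in
        # an enclosed transfer.  The correlation and its constant follow a form
        # commonly quoted in industrial-ventilation guidance, NOT this paper;
        # verify the formula and units against a design reference before use.

        def induced_air_cfm(open_area_ft2, material_rate_tph, fall_height_ft,
                            particle_diam_ft):
            """Assumed form: Q = 10 * A * (R * S**2 / D) ** (1/3), Q in cfm."""
            return 10.0 * open_area_ft2 * (material_rate_tph * fall_height_ft ** 2
                                           / particle_diam_ft) ** (1.0 / 3.0)

        # Hypothetical transfer: 4 ft2 of enclosure openings, 300 t/h falling 6 ft,
        # mean particle size of roughly 1/8 inch (about 0.0104 ft).
        q_induced = induced_air_cfm(4.0, 300.0, 6.0, 0.0104)
        exhaust = 1.2 * q_induced   # margin so the enclosure stays under suction
        print(f"induced air ~ {q_induced:.0f} cfm, suggested exhaust ~ {exhaust:.0f} cfm")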

  6. Quality Controlling CMIP datasets at GFDL

    Science.gov (United States)

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

    As GFDL switches from model development to production for the Coupled Model Intercomparison Project (CMIP), its efforts shift to testing and, more importantly, to establishing guidelines and protocols for quality control and semi-automated data publishing. Every CMIP cycle introduces key challenges, and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before publication and before their findings make their way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets, including Jupyter notebooks, PrePARE, and a LAMP (Linux, Apache, MySQL, PHP/Python/Perl) technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively, provide additional metadata and analysis services, and perform built-in controlled-vocabulary validations in the workflow. We also discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.
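
    The controlled-vocabulary validations mentioned above boil down to checking required metadata attributes against allowed value sets. A generic sketch follows (not GFDL's or PrePARE's actual rules; the attribute names and allowed values are invented):

        # Generic sketch of a controlled-vocabulary check on CMIP-style metadata.
        # The required attributes and allowed values are illustrative assumptions,
        # not GFDL's or PrePARE's actual rules.
        REQUIRED = {
            "mip_era": {"CMIP6"},
            "frequency": {"mon", "day", "6hr", "3hr"},
            "grid_label": {"gn", "gr", "gr1"},
        }

        def validate_metadata(attrs):
            """Return a list of human-readable problems; an empty list means the file passes."""
            problems = []
            for key, allowed in REQUIRED.items():
                if key not in attrs:
                    problems.append(f"missing attribute: {key}")
                elif attrs[key] not in allowed:
                    problems.append(f"{key}={attrs[key]!r} not in {sorted(allowed)}")
            return problems

        print(validate_metadata({"mip_era": "CMIP6", "frequency": "monthly"}))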

  7. Integrated remotely sensed datasets for disaster management

    Science.gov (United States)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route-corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards such as North America's NTSC and the European SECAM and PAL systems, recorded in various video formats. This technology has recently been employed as a front-line remote sensing technology for post-disaster damage assessment. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at the National University of Ireland, Maynooth (NUIM), is described. New improvements are proposed, including low-cost encoders, easy-to-use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and after a disaster.

  8. Fuel handling machine and auxiliary systems for a fuel handling cell

    International Nuclear Information System (INIS)

    Suikki, M.

    2013-10-01

    This working report is an update of, as well as a supplement to, an earlier fuel handling machine design (Kukkola and Roennqvist 2006). The focus in the earlier design proposal was primarily on the selection of a mechanical structure and operating principle for the fuel handling machine. This report introduces not only a fuel handling machine design but also the auxiliary fuel handling cell equipment and its operation. An objective of the design work was to verify the operating principles of, and space allocations for, the fuel handling cell equipment. The fuel handling machine is a remote-controlled apparatus capable of handling intensely radiating fuel assemblies in the fuel handling cell of an encapsulation plant. The fuel handling cell is an airtight space, radiation-shielded with massive concrete walls. The fuel handling machine is based on a bridge crane capable of traveling in the handling cell along wall tracks. The carriage of the bridge crane carries a carousel-type turntable on which both a fixed and a telescopic mast are mounted. The fixed mast has a gripper movable on linear guides for the transfer of fuel assemblies. The telescopic mast has a manipulator arm capable of maneuvering equipment present in the fuel handling cell, as well as conducting necessary maintenance and cleaning operations or rectifying possible fault conditions. The auxiliary fuel handling cell systems consist of several subsystems, including a service manipulator, a tool carrier for manipulators, a material hatch, assisting winches, a vacuum cleaner, and a hose reel. With the exception of the vacuum cleaner, the devices included in the fuel handling cell's auxiliary systems are used only when the actual encapsulation process is not ongoing. In a worst-case scenario, malfunctions of the mechanisms or actuators responsible for the motions of the fuel handling machine preclude bringing the fuel handling cell and related systems to a condition appropriate for

  9. Fuel handling machine and auxiliary systems for a fuel handling cell

    Energy Technology Data Exchange (ETDEWEB)

    Suikki, M. [Optimik Oy, Turku (Finland)]

    2013-10-15

    This working report is an update of, as well as a supplement to, an earlier fuel handling machine design (Kukkola and Roennqvist 2006). The focus in the earlier design proposal was primarily on the selection of a mechanical structure and operating principle for the fuel handling machine. This report introduces not only a fuel handling machine design but also the auxiliary fuel handling cell equipment and its operation. An objective of the design work was to verify the operating principles of, and space allocations for, the fuel handling cell equipment. The fuel handling machine is a remote-controlled apparatus capable of handling intensely radiating fuel assemblies in the fuel handling cell of an encapsulation plant. The fuel handling cell is an airtight space, radiation-shielded with massive concrete walls. The fuel handling machine is based on a bridge crane capable of traveling in the handling cell along wall tracks. The carriage of the bridge crane carries a carousel-type turntable on which both a fixed and a telescopic mast are mounted. The fixed mast has a gripper movable on linear guides for the transfer of fuel assemblies. The telescopic mast has a manipulator arm capable of maneuvering equipment present in the fuel handling cell, as well as conducting necessary maintenance and cleaning operations or rectifying possible fault conditions. The auxiliary fuel handling cell systems consist of several subsystems, including a service manipulator, a tool carrier for manipulators, a material hatch, assisting winches, a vacuum cleaner, and a hose reel. With the exception of the vacuum cleaner, the devices included in the fuel handling cell's auxiliary systems are used only when the actual encapsulation process is not ongoing. In a worst-case scenario, malfunctions of the mechanisms or actuators responsible for the motions of the fuel handling machine preclude bringing the fuel handling cell and related systems to a condition appropriate for

  10. Development of software for handling ship's pharmacy.

    Science.gov (United States)

    Nittari, Giulio; Peretti, Alessandro; Sibilio, Fabio; Ioannidis, Nicholas; Amenta, Francesco

    2016-01-01

    Ships are required to carry a given amount of medicinal products and medications depending on the flag and the type of vessel. These medicines are stored in the so-called ship's "medicine chest" or, more properly, a ship pharmacy. Owing to the progress of medical sciences and to the increase in the mean age of seafarers employed on board ships, the number of pharmaceutical products and medical devices required by regulations to be carried on board ships is increasing. This may make handling of the ship's medicine chest a problem, primarily on large ships sailing intercontinental routes, because of the difficulty of identifying the correspondence between medicines obtained abroad and those available on the national market. To minimise these problems a tool named Pharmacy Ship (acronym: PARSI) has been developed. The PARSI application is based on a database containing information about the medicines and medical devices required by the regulations of different countries. In its first application the system was standardised to comply with the Italian regulations issued on 1 October 2015, which entered into force on 18 January 2016. Thanks to PARSI it was possible to standardise the inventory procedures, facilitate the work of maritime health authorities and make it easier for the crew, who are not professionals in the field, to handle the 'medicine chest' correctly by automating the procedures for medicines management. As far as we know there are no other similar tools available at the moment. The application of the software, as well as the automation of different activities currently carried out manually, will help manage (qualitatively and quantitatively) the ship's pharmacy. The system developed in this study has proved to be an effective tool which serves to guarantee the compliance of the ship pharmacy with the regulations of the flag state in terms of medicinal products and medications. Sharing the system with the Telemedical Maritime Assistance Service may result in
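
    The core of such a tool is the comparison between the stock actually on board and the quantities the flag-state regulation requires. A minimal sketch of that check follows (the codes, item names and quantities are illustrative assumptions, not PARSI's schema or the Italian requirement table):

        # Minimal compliance check: compare on-board stock against a flag-state
        # requirement table.  Codes and quantities are illustrative only.
        required = {            # item code -> minimum quantity required on board
            "N02BE01": 20,      # e.g. an analgesic, 20 tablets
            "J01CA04": 10,      # e.g. an antibiotic, 10 capsules
        }
        on_board = {"N02BE01": 25, "J01CA04": 4}

        def compliance_report(required, stock):
            """Per item, the quantity still missing (0 means compliant)."""
            return {code: max(qty - stock.get(code, 0), 0)
                    for code, qty in required.items()}

        print(compliance_report(required, on_board))   # {'N02BE01': 0, 'J01CA04': 6}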

  11. Classification of Large-Scale Remote Sensing Images for Automatic Identification of Health Hazards: Smoke Detection Using an Autologistic Regression Classifier.

    Science.gov (United States)

    Wolters, Mark A; Dean, C B

    2017-01-01

    Remote sensing images from Earth-orbiting satellites are a potentially rich data source for monitoring and cataloguing atmospheric health hazards that cover large geographic regions. A method is proposed for classifying such images into hazard and nonhazard regions using the autologistic regression model, which may be viewed as a spatial extension of logistic regression. The method includes a novel and simple approach to parameter estimation that makes it well suited to handling the large and high-dimensional datasets arising from satellite-borne instruments. The methodology is demonstrated on both simulated images and a real application to the identification of forest fire smoke.
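
    The essential idea, logistic regression augmented with an autocovariate that counts the hazard-labelled neighbours of each pixel, can be sketched as follows. This is a toy, pseudo-likelihood-style illustration on a synthetic grid, not the authors' estimation procedure:

        # Toy sketch of an autologistic classifier: ordinary logistic regression plus
        # an "autocovariate" counting the hazard-labelled 4-neighbours of each pixel.
        # This mimics a pseudo-likelihood fit on synthetic data, not the authors' code.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 32                                     # synthetic n x n image
        reflectance = rng.normal(size=(n, n))      # one spectral covariate
        labels = (reflectance + rng.normal(scale=0.5, size=(n, n)) > 0).astype(int)

        def neighbour_sum(lab):
            s = np.zeros_like(lab, dtype=float)
            s[1:, :] += lab[:-1, :]; s[:-1, :] += lab[1:, :]   # up / down
            s[:, 1:] += lab[:, :-1]; s[:, :-1] += lab[:, 1:]   # left / right
            return s

        X = np.column_stack([reflectance.ravel(), neighbour_sum(labels).ravel()])
        y = labels.ravel()

        model = LogisticRegression().fit(X, y)
        print(dict(zip(["reflectance", "autocovariate"], model.coef_[0].round(3))))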

  12. Handling of multiassembly sealed baskets between reactor storage and a remote handling facility

    International Nuclear Information System (INIS)

    Massey, J.V.; Kessler, J.H.; McSherry, A.J.

    1989-06-01

    The storage of multiple fuel assemblies in sealed (welded) dry storage baskets is gaining increasing use to augment at-reactor fuel storage capacity. Since this increasing use will place a significant number of such baskets on reactor sites, some initial downstream planning is needed for their future handling. This study examined scenarios for retrieving multi-assembly sealed baskets (MSBs) from onsite storage and transferring and shipping the fuel (and/or the baskets) to a federally operated remote handling facility (RHF). Numerous options for at-reactor and away-from-reactor handling were investigated. Materials handling flowsheets were developed along with conceptual designs for the equipment and tools required to handle and open the MSBs. The handling options were evaluated and compared to a reference-case fuel handling sequence (i.e., fuel assemblies are taken from the fuel pool, shipped to a receiving and handling facility and placed into interim storage). The main parameters analyzed are throughput, radiation dose burden and cost. In addition to evaluating the handling of MSBs, this work also evaluated the handling of consolidated fuel canisters (CFCs). In summary, the handling of MSBs and CFCs in the store, ship and bury fuel cycle was found to be feasible and, under some conditions, to offer significant benefits in terms of throughput, cost and safety. 14 refs., 20 figs., 24 tabs

  13. Safeguards information handling and treatment

    International Nuclear Information System (INIS)

    Carchon, R.; Liu, J.; Ruan, D.

    2001-01-01

    Many states are currently discussing the new additional protocol (INFCIRC/540). This expanded framework is expected to provide additional confirmation that there are no undeclared activities or facilities in a state. The information collected by the IAEA comes mainly from three different sources: information provided by the state, information collected by the IAEA, and information from open sources. This information can be uncertain, incomplete, imprecise, not fully reliable, contradictory, etc. Hence, there is a need for a mathematical framework that provides a basis for the handling and treatment of multidimensional information of varying quality. We use a linguistic assessment based on fuzzy set theory as a flexible and realistic approach. The concept of a linguistic variable serves the purpose of providing a means of approximate characterization of information that may be imprecise, too complex or ill-defined, for which the traditional quantitative approach does not give an adequate answer. In applying this linguistic assessment approach, a problem arises in how to aggregate linguistic information. Two different approaches can be followed: (1) an approximation approach using the associated membership functions; (2) a symbolic approach that computes directly on the labels, where membership functions and linguistic approximation are unnecessary, making computation simple and quick. To manipulate the linguistic information in this context, we work with aggregation operators for combining non-weighted and weighted linguistic values by direct computation on labels, such as the Min-type and Max-type weighted aggregation operators as well as the median aggregation operator. A case study on the application of these aggregation operators to the fusion of safeguards-relevant information is given. The IAEA Physical Model of the nuclear fuel cycle can be taken as a systematic and comprehensive indicator system. It identifies and describes indicators of
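
    The symbolic approach can be illustrated by aggregating directly on an ordered label set with Min-, Max- and median-type operators. The sketch below shows unweighted variants only; the label scale and the example inputs are invented for illustration:

        # Sketch of symbolic aggregation on an ordered linguistic scale.
        # Label set and example inputs are invented for illustration.
        LABELS = ["none", "low", "medium", "high", "certain"]   # ordered scale
        IDX = {lab: i for i, lab in enumerate(LABELS)}

        def agg_min(labels):     # conservative (Min-type) aggregation
            return LABELS[min(IDX[l] for l in labels)]

        def agg_max(labels):     # optimistic (Max-type) aggregation
            return LABELS[max(IDX[l] for l in labels)]

        def agg_median(labels):  # median aggregation on label indices
            idx = sorted(IDX[l] for l in labels)
            return LABELS[idx[len(idx) // 2]]

        assessments = ["low", "high", "medium", "medium"]   # one label per indicator
        print(agg_min(assessments), agg_max(assessments), agg_median(assessments))
        # -> low high medium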

  14. Ergonomics: safe patient handling and mobility.

    Science.gov (United States)

    Hallmark, Beth; Mechan, Patricia; Shores, Lynne

    2015-03-01

    This article reviews and investigates the issues surrounding ergonomics, with a specific focus on safe patient handling and mobility. The health care worker of today faces many challenges, one of which is related to the safety of patients. Safe patient handling and mobility is on the forefront of the movement to improve patient safety. This article reviews the risks associated with patient handling and mobility, and informs the reader of current evidence-based practice relevant to this area of care. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. How the NWC handles software as product

    Energy Technology Data Exchange (ETDEWEB)

    Vinson, D.

    1997-11-01

    This tutorial provides a hands-on view of how the Nuclear Weapons Complex project should be handling (or planning to handle) software as a product in response to Engineering Procedure 401099. The SQAS has published the document SQAS96-002, Guidelines for NWC Processes for Handling Software Product, that will be the basis for the tutorial. The primary scope of the tutorial is on software products that result from weapons and weapons-related projects, although the information presented is applicable to many software projects. Processes that involve the exchange, review, or evaluation of software product between or among NWC sites, DOE, and external customers will be described.

  16. Handling of bulk solids theory and practice

    CERN Document Server

    Shamlou, P A

    1990-01-01

    Handling of Bulk Solids provides a comprehensive discussion of the field of solids flow and handling in the process industries. Presentation of the subject follows classical lines of separate discussions for each topic, so each chapter is self-contained and can be read on its own. Topics discussed include bulk solids flow and handling properties; pressure profiles in bulk solids storage vessels; the design of storage silos for reliable discharge of bulk materials; gravity flow of particulate materials from storage vessels; pneumatic transportation of bulk solids; and the hazards of solid-mater

  17. Remote handling in the Plutonium Immobilization Project: Puck handling

    International Nuclear Information System (INIS)

    Brault, J.R.

    2000-01-01

    Since the break-up of the Soviet Union at the end of the Cold War, the US and Russia have been negotiating ways to reduce their nuclear stockpiles. Economics is one of the reasons behind this, but another important reason is safeguarding these materials from unstable organizations and countries. With the downsizing of the nuclear stockpiles, large quantities of plutonium are being declared excess and must be safely disposed of. The Savannah River Site (SRS) has been selected as the site where the immobilization facility will be located. Conceptual design and process development commenced in 1998. SRS will immobilize excess plutonium in a ceramic waste form and encapsulate it in vitrified high-level waste in the Defense Waste Processing Facility (DWPF) canister. These canisters will then be interred in the national repository at Yucca Mountain, Nevada. The facility is divided into three distinct operating areas: Plutonium Conversion, First Stage Immobilization, and Second Stage Immobilization. This paper will discuss the first two operations

  18. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
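
    The reported figures are relative percentage differences; the short example below shows the arithmetic behind such comparisons using made-up wall-clock times and costs (the real figures are those quoted in the abstract, not the placeholders here):

        # Worked example of the percentage comparisons reported above, using made-up
        # wall-clock times and costs (the real figures are those in the abstract).
        def percent_more(value, baseline):
            """How much larger `value` is than `baseline`, in percent."""
            return 100.0 * (value - baseline) / baseline

        emr_hours, gce_hours = 15.3, 10.0    # placeholder wall-clock times
        emr_cost,  gce_cost  = 35.7, 10.0    # placeholder run costs

        print(f"EMR slower by {percent_more(emr_hours, gce_hours):.1f}%")    # 53.0%
        print(f"EMR costlier by {percent_more(emr_cost, gce_cost):.1f}%")    # 257.0%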

  19. Overview of the CERES Edition-4 Multilayer Cloud Property Datasets

    Science.gov (United States)

    Chang, F. L.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, R. A.; Brown, R. R.

    2014-12-01

    Knowledge of the cloud vertical distribution is important for understanding the role of clouds in Earth's radiation budget and climate change. Since high-level cirrus clouds with low emission temperatures and small optical depths can provide a positive feedback to the climate system, and low-level stratus clouds with high emission temperatures and large optical depths can provide a negative feedback effect, the retrieval of multilayer cloud properties using satellite observations, like Terra and Aqua MODIS, is critically important for a variety of cloud and climate applications. For the objective of the Clouds and the Earth's Radiant Energy System (CERES), new algorithms have been developed using Terra and Aqua MODIS data to allow separate retrievals of cirrus and stratus cloud properties when the two dominant cloud types are simultaneously present in a multilayer system. In this paper, we will present an overview of the new CERES Edition-4 multilayer cloud property datasets derived from Terra as well as Aqua. Assessment of the new CERES multilayer cloud datasets will include high-level cirrus and low-level stratus cloud heights, pressures, and temperatures as well as their optical depths, emissivities, and microphysical properties.

  20. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  1. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets were used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  2. Remote handling maintenance of ITER

    International Nuclear Information System (INIS)

    Haange, R.

    1999-01-01

    The remote maintenance strategy and the associated component design of the International Thermonuclear Experimental Reactor (ITER) have reached a high degree of completeness, especially with respect to those components that are expected to require frequent or occasional remote maintenance. Large-scale test stands, to demonstrate the principle feasibility of the remote maintenance procedures and to develop the required equipment and tools, were operational at the end of the Engineering Design Activities (EDA) phase. The initial results are highly encouraging: major remote equipment deployment and component replacement operations have been successfully demonstrated. (author)

  3. Genetics and Forest Seed Handling

    DEFF Research Database (Denmark)

    Schmidt, Lars Holger

    2016-01-01

    High genetic quality seed is obtained from seed sources that match the planting site, have a good outcrossing rate, and are superior in some desirable characters. Non-degraded natural forests and plantations may be used as untested seed sources, which can sometimes be managed to promote outbreeding and increase seed production. Planted seed orchards aim at capturing large genetic variation and are planted in a design that facilitates genetic evaluation and promotes outbred seed production. Good seed production relies upon success of the whole range of reproductive events from flower differentiation...

  4. A scalable method for identifying frequent subtrees in sets of large phylogenetic trees.

    Science.gov (United States)

    Ramu, Avinash; Kahveci, Tamer; Burleigh, J Gordon

    2012-10-03

    We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. Although this heuristic does not guarantee to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
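
    A heavily simplified stand-in for the seed-finding phase is to mine clades (the leaf set under a single internal node) that occur in at least a given fraction of the input trees. The sketch below does only that; it is not the paper's full MFAST heuristic, which additionally grows, combines and prunes agreement subtrees:

        # Heavily simplified stand-in for the seed-finding phase: mine clades (the
        # leaf set under one internal node) occurring in >= min_frac of the trees.
        # NOT the paper's full heuristic, which also grows and prunes agreement subtrees.
        from collections import Counter

        def clades(tree):
            """Return (leaf set, list of clades) for a tree given as nested tuples/strings."""
            if isinstance(tree, str):
                return frozenset([tree]), []
            leaves, found = frozenset(), []
            for child in tree:
                c_leaves, c_found = clades(child)
                leaves |= c_leaves
                found += c_found
            return leaves, found + [leaves]

        def frequent_clades(trees, min_frac=0.5):
            counts = Counter()
            for t in trees:
                counts.update(set(clades(t)[1]))
            return [c for c, k in counts.items() if k / len(trees) >= min_frac]

        trees = [((("A", "B"), "C"), ("D", "E")),
                 ((("A", "B"), "D"), ("C", "E")),
                 (("A", ("B", "C")), ("D", "E"))]
        for clade in frequent_clades(trees, 0.66):
            print(sorted(clade))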

  5. Predicting dataset popularity for the CMS experiment

    CERN Document Server

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provides the foundation of a data-driven approach for the CMS computing infrastructure.
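
    A first-order version of such a popularity analysis simply aggregates access counts per dataset over a time window and ranks the results. The sketch below does that on toy records; the record layout is an invented placeholder, not the CMS meta-data schema:

        # Toy popularity ranking: aggregate accesses per dataset over a window.
        # The record layout is an invented placeholder, not the CMS meta-data schema.
        from collections import Counter
        from datetime import date

        accesses = [                                   # (dataset name, access date)
            ("/Primary/RunA/AOD", date(2016, 3, 1)),
            ("/Primary/RunA/AOD", date(2016, 3, 2)),
            ("/Other/RunB/MINIAOD", date(2016, 3, 2)),
            ("/Primary/RunA/AOD", date(2016, 2, 1)),   # outside the window below
        ]

        def popularity(records, start, end):
            counts = Counter(name for name, day in records if start <= day <= end)
            return counts.most_common()                # most accessed first

        print(popularity(accesses, date(2016, 3, 1), date(2016, 3, 31)))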

  6. Predicting dataset popularity for the CMS experiment

    International Nuclear Information System (INIS)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provides the foundation of a data-driven approach for the CMS computing infrastructure. (paper)

  7. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    (c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass balance reporting contains information for about 125 glaciers with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc. related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast evolving datasets from new technologies.

  8. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  9. 2006 Fynmeet sea clutter measurement trial: Datasets

    CSIR Research Space (South Africa)

    Herselman, PLR

    2007-09-06

    Full Text Available. [Figure content not recoverable as text: plots of RCS [dBm2] versus time [s] and absolute range [m] (by range gate number) at f1 = 9.000 GHz for datasets CAD14-001 and CAD14-002.]

  10. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM over the entire island including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  11. Management of transport and handling contracts

    CERN Document Server

    Rühl, I

    2004-01-01

    This paper shall outline the content, application and management strategies for the various contracts related to transport and handling activities. In total, the two sections Logistics and Handling Maintenance are in charge of 27 (!) contracts, ranging from small supply contracts to big industrial support contracts. The activities, as well as the contracts, can generally be divided into four main topics: "Vehicle Fleet Management"; "Supply, Installation and Commissioning of Lifting and Hoisting Equipment"; "Equipment Maintenance"; and "Industrial Support for Transport and Handling". Each activity and contract requires a different approach and permanent adaptation to CERN's often-changing requirements. In particular, the management of, and the difficulties experienced with, the contracts E072 "Maintenance of lifting and hoisting equipment", F420 "Supply of seven overhead traveling cranes for LHC" and S090/S103 "Industrial support for transport and handling" will be explained in detail.

  12. Travelling cranes for heavy reactor component handling

    International Nuclear Information System (INIS)

    Champeil, M.

    1977-01-01

    Structure and operating machinery of two travelling cranes (600 t and 450 t) used in the Framatome factory for handling heavy reactor components are described. When coupled, these cranes can lift loads up to 1000 t [fr

  13. Aerobot Sampling and Handling System, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Honeybee Robotics proposes to derive and document the functional and technical requirements for Aerobot surface sampling and sample handling across a range of...

  14. Data handling systems and methods of wiring

    International Nuclear Information System (INIS)

    Grant, J.

    1981-01-01

    An improved data handling system, for monitoring and control of nuclear reactor operations, is described in which time delays associated with scanning are reduced and noise and fault signals in the system are resolved. (U.K.)

  15. Harvesting and handling agricultural residues for energy

    Energy Technology Data Exchange (ETDEWEB)

    Jenkins, B.M.; Summer, H.R.

    1986-05-01

    Significant progress in understanding the needs for design of agricultural residue collection and handling systems has been made but additional research is required. Recommendations are made for research to (a) integrate residue collection and handling systems into general agricultural practices through the development of multi-use equipment and total harvest systems; (b) improve methods for routine evaluation of agricultural residue resources, possibly through remote sensing and image processing; (c) analyze biomass properties to obtain detailed data relevant to engineering design and analysis; (d) evaluate long-term environmental, social, and agronomic impacts of residue collection; (e) develop improved equipment with higher capacities to reduce residue collection and handling costs, with emphasis on optimal design of complete systems including collection, transportation, processing, storage, and utilization; and (f) produce standard forms of biomass fuels or products to enhance material handling and expand biomass markets through improved reliability and automatic control of biomass conversion and other utilization systems. 118 references.

  16. Handling of disused radioactive materials in Ecuador

    International Nuclear Information System (INIS)

    Benitez, Manuel

    1999-10-01

    This paper describes the handling of disused radioactive sources. It also shows graphic information of medical and industrial equipment containing radioactive sources. This information was prepared as part of a training course on radioactive wastes. (The author)

  17. Foster parenting, human imprinting and conventional handling ...

    African Journals Online (AJOL)

    Foster parenting, human imprinting and conventional handling affects survival and early .... bird may subsequently direct its sexual attention to those humans on whom it was imprinted (Bubier et al., ..... The mind through chicks' eyes: memory,.

  18. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.

  19. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S. regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data

  20. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...