WorldWideScience

Sample records for handle large datasets

  1. MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB.

    Science.gov (United States)

    Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A

    2013-01-01

    Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED

  2. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  3. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    We present an approach for analyzing market shares and product price elasticities based on large datasets containing aggregate sales data for many products, several markets and relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  4. Distributed and parallel approach for handle and perform huge datasets

    Science.gov (United States)

    Konopko, Joanna

    2015-12-01

    Big Data refers to the dynamic, large and disparate volumes of data coming from many different sources (tools, machines, sensors, mobile devices) that are uncorrelated with each other. It requires new, innovative and scalable technology to collect, host and analytically process the vast amount of data. A proper architecture for systems that process huge datasets is needed. In this paper, the comparison of distributed and parallel system architectures is presented using the example of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). This paper also analyzes the problem of extracting and handling valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed that could be used to solve the analyzed problem of storing and processing Big Data.
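
    As a rough illustration of the MapReduce side of this comparison, the sketch below (Python, standard library only; the record layout and grouping key are hypothetical) splits records across worker processes in a map phase and aggregates partial counts in a reduce phase. It is a toy analogue of the paradigm, not Hadoop itself.

        # Minimal MapReduce-style aggregation sketch (illustrative only, not Hadoop).
        from multiprocessing import Pool
        from collections import Counter
        from functools import reduce

        def map_phase(chunk):
            # Emit counts keyed by a hypothetical sensor id per record.
            return Counter(record["sensor_id"] for record in chunk)

        def reduce_phase(partial_a, partial_b):
            # Merge partial counts from two mappers.
            partial_a.update(partial_b)
            return partial_a

        if __name__ == "__main__":
            records = [{"sensor_id": i % 5, "value": i} for i in range(100_000)]
            chunks = [records[i::4] for i in range(4)]          # naive partitioning
            with Pool(4) as pool:
                partials = pool.map(map_phase, chunks)          # map step in parallel
            totals = reduce(reduce_phase, partials, Counter())  # reduce step
            print(totals.most_common(3))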

  5. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which is, to our knowledge, the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
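
    A minimal sketch of the rigidity-density idea described above, assuming a Gaussian kernel and a hypothetical array of atom coordinates; the resolution parameter eta plays the role of the scale selector, and nothing here is taken from the authors' implementation.

        # Rigidity-density sketch: larger eta -> coarser resolution (illustrative only).
        import numpy as np

        def rigidity_density(atoms, grid, eta=2.0):
            # atoms: (N, 3) coordinates; grid: (M, 3) evaluation points.
            d2 = ((grid[:, None, :] - atoms[None, :, :]) ** 2).sum(-1)  # (M, N) squared distances
            return np.exp(-d2 / eta**2).sum(axis=1)   # sum of Gaussian kernels per grid point

        atoms = np.random.rand(500, 3) * 20.0                  # hypothetical point cloud
        xs = np.linspace(0, 20, 16)
        grid = np.array(np.meshgrid(xs, xs, xs)).reshape(3, -1).T
        coarse = rigidity_density(atoms, grid, eta=4.0)        # low-resolution view
        fine = rigidity_density(atoms, grid, eta=1.0)          # high-resolution view
        print(coarse.shape, float(fine.max()))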

  6. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n · log(n)); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms delays. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture, currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes / 1 TiB or 1.3 × 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
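
    The hierarchical level-of-detail idea can be illustrated with a block-wise min/max pyramid: each level stores per-block extrema of the level below, so a screen-width slice at any zoom level can be drawn from a roughly constant number of blocks. A small sketch under those assumptions, not FTSPlot's actual data format:

        # Build a min/max pyramid over a long time series (illustrative, not FTSPlot's format).
        import numpy as np

        def build_pyramid(samples, block=64, levels=3):
            pyramid, data = [], np.asarray(samples, dtype=float)
            for _ in range(levels):
                if len(data) < block:
                    break
                n = (len(data) // block) * block
                blocks = data[:n].reshape(-1, block)
                level = np.stack([blocks.min(axis=1), blocks.max(axis=1)], axis=1)
                pyramid.append(level)        # (num_blocks, 2) min/max pairs for this zoom level
                data = level.mean(axis=1)    # coarse proxy signal feeding the next level
            return pyramid

        signal = np.sin(np.linspace(0, 200 * np.pi, 1_000_000)) + 0.1 * np.random.randn(1_000_000)
        for depth, level in enumerate(build_pyramid(signal)):
            print(f"level {depth}: {len(level)} min/max blocks")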

  7. The Amateurs' Love Affair with Large Datasets

    Science.gov (United States)

    Price, Aaron; Jacoby, S. H.; Henden, A.

    2006-12-01

    Amateur astronomers are professionals in other areas. They bring expertise from such varied and technical careers as computer science, mathematics, engineering, and marketing. These skills, coupled with an enthusiasm for astronomy, can be used to help manage the large data sets coming online in the next decade. We will show specific examples where teams of amateurs have been involved in mining large, online data sets and have authored and published their own papers in peer-reviewed astronomical journals. Using the proposed LSST database as an example, we will outline a framework for involving amateurs in data analysis and education with large astronomical surveys.

  8. Large datasets: Segmentation, feature extraction, and compression

    Energy Technology Data Exchange (ETDEWEB)

    Downing, D.J.; Fedorov, V.; Lawkins, W.F.; Morris, M.D.; Ostrouchov, G.

    1996-07-01

    Large data sets with more than several million multivariate observations (tens of megabytes or gigabytes of stored information) are difficult or impossible to analyze with traditional software. The amount of output which must be scanned quickly dilutes the ability of the investigator to confidently identify all the meaningful patterns and trends which may be present. The purpose of this project is to develop both a theoretical foundation and a collection of tools for automated feature extraction that can be easily customized to specific applications. Cluster analysis techniques are applied as a final step in the feature extraction process, which helps make data surveying simple and effective.

  9. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  10. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    OpenAIRE

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    International audience; A common approach for analysis of anatomical variability relies on the estimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  11. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes is achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  12. A Bayesian spatio-temporal geostatistical model with an auxiliary lattice for large datasets

    KAUST Repository

    Xu, Ganggang

    2015-01-01

    When spatio-temporal datasets are large, the computational burden can lead to failures in the implementation of traditional geostatistical tools. In this paper, we propose a computationally efficient Bayesian hierarchical spatio-temporal model in which the spatial dependence is approximated by a Gaussian Markov random field (GMRF) while the temporal correlation is described using a vector autoregressive model. By introducing an auxiliary lattice on the spatial region of interest, the proposed method is not only able to handle irregularly spaced observations in the spatial domain, but it is also able to bypass the missing data problem in a spatio-temporal process. Because the computational complexity of the proposed Markov chain Monte Carlo algorithm is of the order O(n) with n the total number of observations in space and time, our method can be used to handle very large spatio-temporal datasets with reasonable CPU times. The performance of the proposed model is illustrated using simulation studies and a dataset of precipitation data from the coterminous United States.
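
    Schematically (the notation here is ours, not the authors'), the construction can be written as a lattice state-space model: observations are mapped to an auxiliary lattice process that evolves as a vector autoregression whose innovations carry a sparse GMRF precision,

        $y_t = H\,\eta_t + \varepsilon_t$, with $\varepsilon_t \sim N(0, \tau^2 I)$,
        $\eta_t = A\,\eta_{t-1} + w_t$, with $w_t \sim N(0, Q^{-1})$,

    where $\eta_t$ lives on the auxiliary lattice, $Q$ is the sparse GMRF precision matrix, $A$ encodes the vector-autoregressive temporal dependence, and $H$ interpolates lattice values to the irregularly spaced observation locations.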

  13. Multiresolution comparison of precipitation datasets for large-scale models

    Science.gov (United States)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models which are related to weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products along with ground observations provide another avenue for investigating how the precipitation uncertainty would affect the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithms (ANUSPLIN) and Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, results provide an assessment of possible applications for various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing performance in terms of spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  14. Orthology detection combining clustering and synteny for very large datasets.

    Science.gov (United States)

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.
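
    The synteny component rests on counting conserved gene adjacencies between two genomes. A toy sketch of such an adjacency measure follows (hypothetical gene orders; this is not the FFAdj-MCS heuristic itself, which additionally handles multiple chromosomes and duplications):

        # Count conserved adjacencies between two gene orders (toy version of a synteny score).
        def adjacencies(order):
            # Unordered neighbouring pairs along a linear chromosome.
            return {frozenset(pair) for pair in zip(order, order[1:])}

        genome_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
        genome_b = ["g2", "g1", "g3", "g4", "g6", "g5"]   # two local rearrangements

        shared = adjacencies(genome_a) & adjacencies(genome_b)
        score = len(shared) / max(len(genome_a) - 1, 1)   # fraction of conserved adjacencies
        print(sorted(map(sorted, shared)), round(score, 2))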

  15. Orthology detection combining clustering and synteny for very large datasets.

    Directory of Open Access Journals (Sweden)

    Marcus Lechner

    Full Text Available The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  16. Large-scale matrix-handling subroutines 'ATLAS'

    International Nuclear Information System (INIS)

    Tsunematsu, Toshihide; Takeda, Tatsuoki; Fujita, Keiichi; Matsuura, Toshihiko; Tahara, Nobuo

    1978-03-01

    Subroutine package ''ATLAS'' has been developed for handling large-scale matrices. The package is composed of four kinds of subroutines, i.e., basic arithmetic routines, routines for solving linear simultaneous equations and for solving general eigenvalue problems and utility routines. The subroutines are useful in large scale plasma-fluid simulations. (auth.)

  17. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

  18. [Parallel virtual reality visualization of extreme large medical datasets].

    Science.gov (United States)

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extreme large medical datasets are discussed in connection with Intranet and common-configuration computers of hospitals. In this paper are introduced several kernel techniques, including the hardware structure, software framework, load balance and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using common PC cluster. In virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through the control panel built on virtual reality modeling language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can play the role of a good assistant in making clinical diagnoses.

  19. Final Progress Report for 'An Abstract Job Handling Grid Service for Dataset Analysis'

    International Nuclear Information System (INIS)

    David A Alexander

    2005-01-01

    For Phase I of the Job Handling project, Tech-X has built a Grid service for processing analysis requests, as well as a Graphical User Interface (GUI) client that uses the service. The service is designed to generically support High-Energy Physics (HEP) experimental analysis tasks. It has an extensible, flexible, open architecture and language. The service uses the Solenoidal Tracker At RHIC (STAR) experiment as a working example. STAR is an experiment at the Relativistic Heavy Ion Collider (RHIC) at the Brookhaven National Laboratory (BNL). STAR and other experiments at BNL generate multiple Petabytes of HEP data. The raw data is captured as millions of input files stored in a distributed data catalog. Potentially using thousands of files as input, analysis requests are submitted to a processing environment containing thousands of nodes. The Grid service provides a standard interface to the processing farm. It enables researchers to run large-scale, massively parallel analysis tasks, regardless of the computational resources available in their location

  20. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying

    2014-11-07

    For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.
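
    In symbols (our paraphrase of the construction, not the authors' exact notation), the Gaussian score equation for a covariance parameter $\theta_j$ is approximated by replacing $\Sigma^{-1}$ with a sparse matrix $A$ built from a sparse inverse Cholesky factor, and the resulting quadratic form is recentred at its expectation so the equation remains unbiased:

        $g_j(\theta) = Z^{\top} A \, \frac{\partial \Sigma}{\partial \theta_j} \, A Z \; - \; \operatorname{tr}\!\Big( A \, \frac{\partial \Sigma}{\partial \theta_j} \, A \, \Sigma(\theta) \Big) = 0,$

    which satisfies $E_\theta[g_j(\theta)] = 0$ for any choice of sparse $A$, since $E[Z^{\top} M Z] = \operatorname{tr}(M\Sigma)$ for $Z \sim N(0, \Sigma)$.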

  1. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying; Stein, Michael L.

    2014-01-01

    For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.

  2. Privacy-preserving record linkage on large real world datasets.

    Science.gov (United States)

    Randall, Sean M; Ferrante, Anna M; Boyd, James H; Bauer, Jacqueline K; Semmens, James B

    2014-08-01

    Record linkage typically involves the use of dedicated linkage units who are supplied with personally identifying information to determine individuals from within and across datasets. The personally identifying information supplied to linkage units is separated from clinical information prior to release by data custodians. While this substantially reduces the risk of disclosure of sensitive information, some residual risks still exist and remain a concern for some custodians. In this paper we trial a method of record linkage which reduces privacy risk still further on large real world administrative data. The method uses encrypted personal identifying information (bloom filters) in a probability-based linkage framework. The privacy preserving linkage method was tested on ten years of New South Wales (NSW) and Western Australian (WA) hospital admissions data, comprising in total over 26 million records. No difference in linkage quality was found when the results were compared to traditional probabilistic methods using full unencrypted personal identifiers. This presents as a possible means of reducing privacy risks related to record linkage in population level research studies. It is hoped that through adaptations of this method or similar privacy preserving methods, risks related to information disclosure can be reduced so that the benefits of linked research taking place can be fully realised. Copyright © 2013 Elsevier Inc. All rights reserved.
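
    A minimal sketch of the Bloom-filter encoding behind this style of privacy-preserving linkage (field choice, filter size, hash count and key are illustrative; real deployments use agreed keyed-hash parameters): names are split into bigrams, each bigram sets a few bits, and encoded records are compared with a Dice coefficient.

        # Bloom-filter encoding of name bigrams plus Dice similarity (illustrative parameters).
        import hashlib

        def bigrams(text):
            text = f"_{text.lower().strip()}_"
            return {text[i:i + 2] for i in range(len(text) - 1)}

        def bloom_encode(text, size=256, num_hashes=4, secret="demo-key"):
            bits = set()
            for gram in bigrams(text):
                for k in range(num_hashes):
                    digest = hashlib.sha256(f"{secret}|{k}|{gram}".encode()).hexdigest()
                    bits.add(int(digest, 16) % size)   # set one bit position per hash
            return bits

        def dice(a, b):
            return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

        enc_a = bloom_encode("Katherine Smith")
        enc_b = bloom_encode("Catherine Smyth")   # likely the same person, variant spelling
        enc_c = bloom_encode("Robert Jones")
        print(round(dice(enc_a, enc_b), 2), round(dice(enc_a, enc_c), 2))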

  3. Handling limited datasets with neural networks in medical applications: A small-data approach.

    Science.gov (United States)

    Shaikhina, Torgyn; Khovanova, Natalia A

    2017-01-01

    Single-centre studies in the medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for application of artificial neural networks (NNs) for regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the method of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to the state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied for the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85 MPa. When evaluated on independent test samples, the NN achieved accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows for an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
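
    A rough sketch of the multiple-runs idea on a small synthetic regression problem, using scikit-learn's MLPRegressor as a stand-in for the networks used in the study (the data, network size and number of runs are arbitrary): several networks are trained from different random initialisations, and the run-to-run spread is inspected alongside the pooled prediction.

        # Multiple-runs regression NN sketch for a small dataset (illustrative stand-in).
        import numpy as np
        from sklearn.neural_network import MLPRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.uniform(0, 1, size=(56, 3))                 # small sample, small-data regime
        y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.3 * rng.normal(size=56)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

        predictions = []
        for seed in range(20):                              # multiple runs, different initialisations
            net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=seed)
            net.fit(X_tr, y_tr)
            predictions.append(net.predict(X_te))

        predictions = np.array(predictions)
        pooled = predictions.mean(axis=0)                   # aggregate over runs
        spread = predictions.std(axis=0).mean()             # run-to-run variability
        rmse = float(np.sqrt(np.mean((pooled - y_te) ** 2)))
        print(f"pooled RMSE = {rmse:.3f}, mean run-to-run std = {spread:.3f}")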

  4. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  5. Large-component handling equipment and its use

    International Nuclear Information System (INIS)

    Krieg, S.A.; Swannack, D.L.

    1983-01-01

    The Fast Flux Test Facility (FFTF) reactor systems have special requirements for component replacements during maintenance servicing. Replacement operations must address handling of equipment within shielded metal containers while maintaining an inert atmosphere to prevent reaction of sodium with air. Plant identification of a failed component results in selecting and assembling the maintenance cask and equipment transport system for transfer from the storage facility to the Reactor Containment Building (RCB). This includes a proper diameter and length cask, inert atmosphere control consoles, component lift fixture and support structure for interface with the facility area surrounding the component. This equipment is staged in modular groups in the Reactor Service Building for transfer through the equipment airlock to the containment interior. The failed component is generally prepared for replacement by installation of the special lifting fixture attachment. Assembly of the cask support structure is performed over the component position on the containment building operating floor. The cask and shroud from the reactor interface are inerted after all manual service connections and handling attachments are completed. The component is lifted from the reactor and into the cask interior through a floor valve which is then closed to isolate the component reactor port. The cask with sodium wetted component is transferred to a service/repair location, either within containment or outside, to the Maintenance Facility cleaning and repair area. The complete equipment and handling operations for replacement of a large reactor component are described

  6. Scalable and portable visualization of large atomistic datasets

    Science.gov (United States)

    Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2004-10-01

    A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms, which have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms. Program summary. Title of program: Atomsviewer. Catalogue identifier: ADUM. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUM. Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland. Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card. Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5. Programming languages used: C++, C and OpenGL. Memory required to execute with typical data: 1 gigabyte of RAM. High speed storage required: 60 gigabytes. No. of lines in the distributed program including test data, etc.: 550 241. No. of bytes in the distributed program including test data, etc.: 6 258 245. Number of bits in a word: Arbitrary. Number of processors used: 1. Has the code been vectorized or parallelized: No. Distribution format: tar gzip file. Nature of physical problem: Scientific visualization of atomic systems. Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data

  7. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    Full Text Available This paper addresses the development of an augmented reality (AR)-based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensors' datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore, a model to manage such data and enhance the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce the data inefficiency. This paper presents a model for computing the information gain for each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network. Necessary architectural system supports and the interface requirements for such visualizations are also presented.

  8. The Path from Large Earth Science Datasets to Information

    Science.gov (United States)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is one of the major Science Mission Directorate (SMD) data centers for archiving and distribution of Earth Science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model derived datasets (generated by GSFC's Global Modeling and Assimilation Office), the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed in the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth Science data and information as well as addresses the technical and scientific issues, governance and user support problems faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally it demonstrates some of the most used data and product visualization and analyses tools developed and maintained by the GES DISC.

  9. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    Science.gov (United States)

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, an ensemble of machine learning models (the Super Learner algorithm), and SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.

  10. Comprehensive comparison of large-scale tissue expression datasets

    DEFF Research Database (Denmark)

    Santos Delgado, Alberto; Tsafou, Kalliopi; Stolte, Christian

    2015-01-01

    a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between......For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present......://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface....

  11. Topic modeling for cluster analysis of large biological and medical datasets.

    Science.gov (United States)

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracies and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has been mainly used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or overcoming clustering difficulties in large biological and medical datasets. In this study, three topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, are proposed and tested on the cluster analysis of three large datasets: Salmonella pulsed-field gel electrophoresis (PFGE) dataset, lung cancer dataset, and breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the efficacy/effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic model-derived clustering methods, highest probable topic assignment, feature selection and feature extraction, yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
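
    A minimal sketch of the "highest probable topic assignment" variant on a toy text corpus, using scikit-learn's LDA implementation (the corpus, topic count and parameters are arbitrary, and real applications would operate on features derived from PFGE or expression data rather than sentences):

        # Topic-model-based clustering: assign each sample to its most probable topic (toy example).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        docs = [
            "salmonella outbreak pulsed field gel electrophoresis pattern",
            "pulsed field gel pattern cluster salmonella isolate",
            "lung cancer gene expression tumor profile",
            "tumor gene expression lung adenocarcinoma profile",
            "breast cancer receptor expression subtype profile",
            "breast tumor receptor subtype gene expression",
        ]

        counts = CountVectorizer().fit_transform(docs)
        lda = LatentDirichletAllocation(n_components=3, random_state=0)
        doc_topic = lda.fit_transform(counts)      # per-document topic probabilities
        clusters = doc_topic.argmax(axis=1)        # highest probable topic assignment
        print(list(clusters))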

  12. Extending SME to Handle Large-Scale Cognitive Modeling.

    Science.gov (United States)

    Forbus, Kenneth D; Ferguson, Ronald W; Lovett, Andrew; Gentner, Dedre

    2017-07-01

    Analogy and similarity are central phenomena in human cognition, involved in processes ranging from visual perception to conceptual change. To capture this centrality requires that a model of comparison must be able to integrate with other processes and handle the size and complexity of the representations required by the tasks being modeled. This paper describes extensions to Structure-Mapping Engine (SME) since its inception in 1986 that have increased its scope of operation. We first review the basic SME algorithm, describe psychological evidence for SME as a process model, and summarize its role in simulating similarity-based retrieval and generalization. Then we describe five techniques now incorporated into the SME that have enabled it to tackle large-scale modeling tasks: (a) Greedy merging rapidly constructs one or more best interpretations of a match in polynomial time: O(n^2 log(n)); (b) Incremental operation enables mappings to be extended as new information is retrieved or derived about the base or target, to model situations where information in a task is updated over time; (c) Ubiquitous predicates model the varying degrees to which items may suggest alignment; (d) Structural evaluation of analogical inferences models aspects of plausibility judgments; (e) Match filters enable large-scale task models to communicate constraints to SME to influence the mapping process. We illustrate via examples from published studies how these enable it to capture a broader range of psychological phenomena than before. Copyright © 2016 Cognitive Science Society, Inc.

  13. Orthology detection combining clustering and synteny for very large datasets

    OpenAIRE

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the ...

  14. Sparse kernel orthonormalized PLS for feature extraction in large datasets

    DEFF Research Database (Denmark)

    Arenas-García, Jerónimo; Petersen, Kaare Brandt; Hansen, Lars Kai

    2006-01-01

    In this paper we are presenting a novel multivariate analysis method for large scale problems. Our scheme is based on a novel kernel orthonormalized partial least squares (PLS) variant for feature extraction, imposing sparsity constrains in the solution to improve scalability. The algorithm...... is tested on a benchmark of UCI data sets, and on the analysis of integrated short-time music features for genre prediction. The upshot is that the method has strong expressive power even with rather few features, is clearly outperforming the ordinary kernel PLS, and therefore is an appealing method...

  15. Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

    KAUST Repository

    Litvinenko, Alexander

    2017-11-01

    The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M \times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating the unknown parameters such as the covariance length, variance and smoothness parameter of a Matérn covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n / p)$ and storage $\mathcal{O}(k n \log n)$, where the rank $k$ is a small integer (typically $k < 25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example in which the true parameter values are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.
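
    For orientation, the quantity HLIBCov accelerates is the joint Gaussian log-likelihood. The dense, small-n sketch below (Python, with an exponential covariance, i.e. Matérn with smoothness 1/2, and invented parameter values) shows the Cholesky-based evaluation that the hierarchical-matrix approximation replaces when n is large.

        # Dense Gaussian log-likelihood via Cholesky (small n); H-matrices replace this at large n.
        import numpy as np
        from scipy.spatial.distance import cdist
        from scipy.linalg import cho_factor, cho_solve

        def neg_log_likelihood(params, locations, z):
            variance, length = params
            cov = variance * np.exp(-cdist(locations, locations) / length)  # Matern, nu = 1/2
            cov[np.diag_indices_from(cov)] += 1e-8                          # numerical jitter
            factor = cho_factor(cov, lower=True)
            logdet = 2.0 * np.log(np.diag(factor[0])).sum()                 # log |Sigma|
            quad = z @ cho_solve(factor, z)                                 # z' Sigma^{-1} z
            return 0.5 * (logdet + quad + len(z) * np.log(2 * np.pi))

        rng = np.random.default_rng(1)
        locs = rng.uniform(0, 1, size=(300, 2))       # toy spatial locations
        z = rng.standard_normal(300)                  # placeholder observations
        print(neg_log_likelihood((1.0, 0.2), locs, z))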

  16. Likelihood Approximation With Parallel Hierarchical Matrices For Large Spatial Datasets

    KAUST Repository

    Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E.

    2017-01-01

    The main goal of this article is to introduce the parallel hierarchical matrix library HLIBpro to the statistical community. We describe the HLIBCov package, which is an extension of the HLIBpro library for approximating large covariance matrices and maximizing likelihood functions. We show that an approximate Cholesky factorization of a dense matrix of size $2M \times 2M$ can be computed on a modern multi-core desktop in a few minutes. Further, HLIBCov is used for estimating the unknown parameters such as the covariance length, variance and smoothness parameter of a Matérn covariance function by maximizing the joint Gaussian log-likelihood function. The computational bottleneck here is expensive linear algebra arithmetic due to large and dense covariance matrices. Therefore covariance matrices are approximated in the hierarchical ($\mathcal{H}$-) matrix format with computational cost $\mathcal{O}(k^2 n \log^2 n / p)$ and storage $\mathcal{O}(k n \log n)$, where the rank $k$ is a small integer (typically $k < 25$), $p$ is the number of cores and $n$ is the number of locations on a fairly general mesh. We demonstrate a synthetic example in which the true parameter values are known. For reproducibility we provide the C++ code, the documentation, and the synthetic data.

  17. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    Science.gov (United States)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  18. Ultrafast superpixel segmentation of large 3D medical datasets

    Science.gov (United States)

    Leblond, Antoine; Kauffmann, Claude

    2016-03-01

    Even with recent hardware improvements, superpixel segmentation of large 3D medical images at interactive speed (Gauss-Seidel-like acceleration. The work unit partitioning scheme will however vary on odd- and even-numbered iterations to reduce convergence barriers. Synchronization will be ensured by an 8-step 3D variant of the traditional Red Black Ordering scheme. An attack model and early termination will also be described and implemented as additional acceleration techniques. Using our hybrid framework and typical operating parameters, we were able to compute the superpixels of a high-resolution 512x512x512 aortic angioCT scan in 283 ms using an AMD R9 290X GPU. We achieved a 22.3X speed-up factor compared to the published reference GPU implementation.

  19. Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets

    Science.gov (United States)

    Juric, Mario

    2011-01-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to >10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping ``cells'' by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce ``kernels'' that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we swept through 1.1 Grows of Pan-STARRS+SDSS data (220 GB) in less than 15 minutes on a dual-CPU machine. In a cluster environment, we achieved bandwidths of 17 Gbits/sec (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.

  20. Unified Access Architecture for Large-Scale Scientific Datasets

    Science.gov (United States)

    Karna, Risav

    2014-05-01

    Data-intensive sciences have to deploy diverse large scale database technologies for data analytics as scientists have now been dealing with much larger volumes than ever before. While array databases have bridged many gaps between the needs of data-intensive research fields and DBMS technologies (Zhang 2011), invocation of other big data tools accompanying these databases is still manual and separate from the database management interface. We identify this as an architectural challenge that will increasingly complicate the user's work flow owing to the growing number of useful but isolated and niche database tools. Such use of data analysis tools in effect leaves the burden on the user's end to synchronize the results from other data manipulation analysis tools with the database management system. To this end, we propose a unified access interface for using big data tools within a large scale scientific array database, using the database queries themselves to embed foreign routines belonging to the big data tools. Such an invocation of foreign data manipulation routines inside a query into a database can be made possible through a user-defined function (UDF). UDFs that allow such levels of freedom as to call modules from another language and interface back and forth between the query body and the side-loaded functions would be needed for this purpose. For the purpose of this research we attempt coupling of four widely used tools, Hadoop (hadoop1), Matlab (matlab1), R (r1) and ScaLAPACK (scalapack1), with the UDF feature of rasdaman (Baumann 98), an array-based data manager, to investigate this concept. The native array data model used by an array-based data manager provides compact data storage and high performance operations on ordered data such as spatial data, temporal data, and matrix-based data for linear algebra operations (scidbusr1). Performance issues arising due to the coupling of tools with different paradigms, niche functionalities, separate processes and output
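
    The UDF pattern can be illustrated with a lightweight analogue: Python's sqlite3 module lets a foreign routine be registered and then invoked from inside a query, which is the same embed-a-routine-in-the-query idea pursued above with rasdaman's user-defined functions (SQLite, the table and the toy routine here are stand-ins, not the system described in the abstract).

        # Embedding a foreign routine in a query via a user-defined function (SQLite analogue).
        import math
        import sqlite3

        def log_scale(value):
            # A hypothetical analysis routine that the query calls per row.
            return math.log10(value) if value is not None and value > 0 else None

        conn = sqlite3.connect(":memory:")
        conn.create_function("log_scale", 1, log_scale)   # register the UDF under a SQL name
        conn.execute("CREATE TABLE measurements (sensor TEXT, value REAL)")
        conn.executemany("INSERT INTO measurements VALUES (?, ?)",
                         [("a", 10.0), ("a", 1000.0), ("b", 0.1)])

        # The foreign routine runs inside the query itself, keeping results in sync with the DB.
        for row in conn.execute("SELECT sensor, log_scale(value) FROM measurements"):
            print(row)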

  1. Full-Scale Approximations of Spatio-Temporal Covariance Models for Large Datasets

    KAUST Repository

    Zhang, Bohai; Sang, Huiyan; Huang, Jianhua Z.

    2014-01-01

    of dataset and application of such models is not feasible for large datasets. This article extends the full-scale approximation (FSA) approach by Sang and Huang (2012) to the spatio-temporal context to reduce computational complexity. A reversible jump Markov

  2. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias; Bibi, Adel Aamer; Giancola, Silvio; Al-Subaihi, Salman; Ghanem, Bernard

    2018-01-01

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.
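
    Benchmarks of this kind typically score trackers by the intersection-over-union between predicted and annotated boxes. A small sketch of that metric follows (the box format and success threshold are illustrative, not TrackingNet's exact evaluation protocol):

        # Intersection-over-union for axis-aligned boxes given as (x, y, width, height).
        def iou(box_a, box_b):
            ax, ay, aw, ah = box_a
            bx, by, bw, bh = box_b
            ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
            iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
            inter = ix * iy
            union = aw * ah + bw * bh - inter
            return inter / union if union > 0 else 0.0

        predicted = [(10, 10, 50, 40), (12, 14, 48, 38), (80, 80, 30, 30)]
        annotated = [(12, 11, 50, 40), (12, 12, 50, 40), (20, 20, 30, 30)]
        scores = [iou(p, a) for p, a in zip(predicted, annotated)]
        success_rate = sum(s > 0.5 for s in scores) / len(scores)   # fraction above an IoU threshold
        print([round(s, 2) for s in scores], success_rate)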

  3. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias

    2018-03-28

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

  4. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    DEFF Research Database (Denmark)

    Jensen, Tue Vissing; Pinson, Pierre

    2017-01-01

    , we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven...... to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation....

  5. Valuation of large variable annuity portfolios: Monte Carlo simulation and synthetic datasets

    Directory of Open Access Journals (Sweden)

    Gan Guojun

    2017-12-01

    Metamodeling techniques have recently been proposed to address the computational issues related to the valuation of large portfolios of variable annuity contracts. However, it is extremely difficult, if not impossible, for researchers to obtain real datasets from insurance companies in order to test their metamodeling techniques on such real datasets and publish the results in academic journals. To facilitate the development and dissemination of research related to the efficient valuation of large variable annuity portfolios, this paper creates a large synthetic portfolio of variable annuity contracts based on the properties of real portfolios of variable annuities and implements a simple Monte Carlo simulation engine for valuing the synthetic portfolio. In addition, this paper presents fair market values and Greeks for the synthetic portfolio of variable annuity contracts that are important quantities for managing the financial risks associated with variable annuities. The resulting datasets can be used by researchers to test and compare the performance of various metamodeling techniques.
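
    As an illustration of the kind of simple Monte Carlo engine described (not the authors' implementation), the sketch below values a guaranteed minimum maturity benefit for a toy portfolio under geometric Brownian motion; all contract and market parameters are made up.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def gmmb_value(fund0, guarantee, sigma, r, T, n_paths=100_000):
        """Monte Carlo value of a guaranteed minimum maturity benefit (GMMB):
        the insurer pays max(guarantee - fund_T, 0) at maturity T."""
        z = rng.standard_normal(n_paths)
        fund_T = fund0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
        payoff = np.maximum(guarantee - fund_T, 0.0)
        return np.exp(-r * T) * payoff.mean()

    # A toy "portfolio" of three synthetic contracts with hypothetical parameters.
    portfolio = [
        dict(fund0=100.0, guarantee=100.0, T=10.0),
        dict(fund0=250.0, guarantee=200.0, T=15.0),
        dict(fund0=80.0, guarantee=120.0, T=5.0),
    ]
    total = sum(gmmb_value(**c, sigma=0.2, r=0.02) for c in portfolio)
    print(f"Portfolio GMMB liability: {total:.2f}")
    ```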

  6. Spatially-explicit estimation of geographical representation in large-scale species distribution datasets.

    Science.gov (United States)

    Kalwij, Jesse M; Robertson, Mark P; Ronk, Argo; Zobel, Martin; Pärtel, Meelis

    2014-01-01

    Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially-explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely-used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the area of occupancy frequency distribution, indicating that atlases were sufficiently overlapping for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and the Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants showed remarkably higher richness estimations. Geographical representation of atlas data can be much more heterogeneous than often assumed. Level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges. Species distribution
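
    The cell-wise agreement statistic used above reduces, per species, to a Jaccard index over two presence/absence grids; a minimal sketch on synthetic grids (not the AFE or North European atlas data) follows.

    ```python
    import numpy as np

    def jaccard(a: np.ndarray, b: np.ndarray) -> float:
        """Jaccard similarity of two boolean presence/absence grids."""
        a, b = a.astype(bool), b.astype(bool)
        union = np.logical_or(a, b).sum()
        return np.logical_and(a, b).sum() / union if union else np.nan

    # Two synthetic 50-km occurrence grids for one species.
    rng = np.random.default_rng(1)
    atlas_a = rng.random((40, 60)) > 0.7
    atlas_b = rng.random((40, 60)) > 0.7
    print(f"Jaccard agreement: {jaccard(atlas_a, atlas_b):.2f}")
    ```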

  7. Preconditioned dynamic mode decomposition and mode selection algorithms for large datasets using incremental proper orthogonal decomposition

    Science.gov (United States)

    Ohmichi, Yuya

    2017-07-01

    In this letter, we propose a simple and efficient framework of dynamic mode decomposition (DMD) and mode selection for large datasets. The proposed framework explicitly introduces a preconditioning step using an incremental proper orthogonal decomposition (POD) to DMD and mode selection algorithms. By performing the preconditioning step, the DMD and mode selection can be performed with low memory consumption and therefore can be applied to large datasets. Additionally, we propose a simple mode selection algorithm based on a greedy method. The proposed framework is applied to the analysis of three-dimensional flow around a circular cylinder.
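
    A compact sketch of the projection idea (using an ordinary batch SVD in place of the incremental POD, and a synthetic snapshot matrix rather than cylinder-flow data): DMD is carried out in the truncated POD subspace, which is what keeps memory consumption low for large datasets.

    ```python
    import numpy as np

    def pod_preconditioned_dmd(X, rank):
        """DMD computed in a truncated POD (SVD) subspace of the snapshot matrix X
        (columns are snapshots). Returns DMD eigenvalues and modes."""
        X1, X2 = X[:, :-1], X[:, 1:]
        U, s, Vt = np.linalg.svd(X1, full_matrices=False)   # batch POD for brevity
        Ur, Sr_inv, Vr = U[:, :rank], np.diag(1.0 / s[:rank]), Vt[:rank].T
        A_tilde = Ur.T @ X2 @ Vr @ Sr_inv                    # reduced linear operator
        eigvals, W = np.linalg.eig(A_tilde)
        modes = X2 @ Vr @ Sr_inv @ W                         # exact DMD modes
        return eigvals, modes

    # Synthetic snapshots: two damped oscillating spatial patterns plus noise.
    t = np.linspace(0, 4 * np.pi, 200)
    x = np.linspace(0, 1, 500)[:, None]
    X = (np.sin(2 * np.pi * x) * np.cos(3 * t) * np.exp(-0.05 * t)
         + np.cos(5 * np.pi * x) * np.sin(7 * t)) + 0.01 * np.random.randn(500, 200)
    eigvals, modes = pod_preconditioned_dmd(X, rank=10)
    print(np.abs(eigvals))
    ```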

  8. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    Science.gov (United States)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national scale flood risk analyses, using high resolution Facebook Connectivity Lab population data and data from a hyper resolution flood hazard model. In recent years the field of large scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that for robust flood risk analysis to be undertaken both hazard and exposure data should sufficiently resolve local scale features. Global flood frameworks are enabling flood hazard data to be produced at 90 m resolution, resulting in a mismatch with available population datasets, which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5 m resolution, representing a resolution increase over previous countrywide datasets of multiple orders of magnitude. Flood risk analyses undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.
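
    The integration of hazard and exposure described here amounts, once the two grids are co-registered, to summing population over cells whose modelled depth exceeds a threshold; a hedged sketch with synthetic grids (resampling between the 90 m hazard and 5 m population resolutions is deliberately skipped) follows.

    ```python
    import numpy as np

    def exposed_population(depth, population, threshold=0.1):
        """Sum population in cells where modelled water depth exceeds a threshold (m).
        Assumes both grids are co-registered and share the same resolution."""
        return population[depth > threshold].sum()

    rng = np.random.default_rng(2)
    depth = np.maximum(rng.normal(0.0, 0.5, (1000, 1000)), 0.0)  # synthetic hazard grid (m)
    population = rng.poisson(0.3, (1000, 1000)).astype(float)    # synthetic population grid
    print(f"People exposed above 0.1 m: {exposed_population(depth, population):,.0f}")
    ```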

  9. A method for generating large datasets of organ geometries for radiotherapy treatment planning studies

    International Nuclear Information System (INIS)

    Hu, Nan; Cerviño, Laura; Segars, Paul; Lewis, John; Shan, Jinlu; Jiang, Steve; Zheng, Xiaolin; Wang, Ge

    2014-01-01

    With the rapidly increasing application of adaptive radiotherapy, large datasets of organ geometries based on the patient’s anatomy are desired to support clinical application or research work, such as image segmentation, re-planning, and organ deformation analysis. Sometimes only limited datasets are available in clinical practice. In this study, we propose a new method to generate large datasets of organ geometries to be utilized in adaptive radiotherapy. Given a training dataset of organ shapes derived from daily cone-beam CT, we align them into a common coordinate frame and select one of the training surfaces as reference surface. A statistical shape model of organs was constructed, based on the establishment of point correspondence between surfaces and non-uniform rational B-spline (NURBS) representation. A principal component analysis is performed on the sampled surface points to capture the major variation modes of each organ. A set of principal components and their respective coefficients, which represent organ surface deformation, were obtained, and a statistical analysis of the coefficients was performed. New sets of statistically equivalent coefficients can be constructed and assigned to the principal components, resulting in a larger geometry dataset for the patient’s organs. These generated organ geometries are realistic and statistically representative
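
    In outline, the generation step is PCA on aligned, flattened surface coordinates followed by drawing statistically equivalent coefficients; a toy sketch with random matrices standing in for the NURBS-parameterized organ surfaces follows.

    ```python
    import numpy as np

    def fit_shape_model(shapes):
        """PCA-based statistical shape model. `shapes` has one flattened surface
        (x1, y1, z1, x2, ...) per row, already aligned to a common frame."""
        mean = shapes.mean(axis=0)
        U, s, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
        coeffs = U * s                       # per-training-shape PC coefficients
        return mean, Vt, coeffs

    def sample_new_shapes(mean, Vt, coeffs, n_new, rng):
        """Draw statistically equivalent coefficients (Gaussian with the training
        standard deviations) and reconstruct new, synthetic organ geometries."""
        std = coeffs.std(axis=0)
        new_coeffs = rng.normal(0.0, std, size=(n_new, std.size))
        return mean + new_coeffs @ Vt

    rng = np.random.default_rng(3)
    training = rng.normal(size=(20, 3 * 500))   # 20 training shapes, 500 surface points each
    mean, Vt, coeffs = fit_shape_model(training)
    new_shapes = sample_new_shapes(mean, Vt, coeffs, n_new=100, rng=rng)
    print(new_shapes.shape)                      # (100, 1500)
    ```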

  10. Extraction of drainage networks from large terrain datasets using high throughput computing

    Science.gov (United States)

    Gong, Jianya; Xie, Jibo

    2009-02-01

    Advanced digital photogrammetry and remote sensing technology produces large terrain datasets (LTD). How to process and use these LTD has become a big challenge for GIS users. Extracting drainage networks, which are basic for hydrological applications, from LTD is one of the typical applications of digital terrain analysis (DTA) in geographical information applications. Existing serial drainage algorithms cannot deal with large data volumes in a timely fashion, and few GIS platforms can process LTD beyond the GB size. High throughput computing (HTC), a distributed parallel computing mode, is proposed to improve the efficiency of drainage networks extraction from LTD. Drainage network extraction using HTC involves two key issues: (1) how to decompose the large DEM datasets into independent computing units and (2) how to merge the separate outputs into a final result. A new decomposition method is presented in which the large datasets are partitioned into independent computing units using natural watershed boundaries instead of using regular 1-dimensional (strip-wise) and 2-dimensional (block-wise) decomposition. Because the distribution of drainage networks is strongly related to watershed boundaries, the new decomposition method is more effective and natural. The method to extract natural watershed boundaries was improved by using multi-scale DEMs instead of single-scale DEMs. A HTC environment is employed to test the proposed methods with real datasets.
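
    The decomposition-and-merge pattern can be sketched as below (local processes stand in for an HTC pool, and the decomposition and extraction functions are placeholders): each natural watershed becomes an independent computing unit, and the per-unit drainage networks are merged afterwards.

    ```python
    from multiprocessing import Pool

    def extract_drainage(unit):
        """Placeholder for per-watershed drainage extraction; `unit` holds a DEM tile
        clipped to one natural watershed plus its identifier."""
        dem_tile, watershed_id = unit
        # ... flow direction, flow accumulation and channel thresholding would go here ...
        return {"watershed": watershed_id, "network": []}

    def extract_over_ltd(units, n_workers=8):
        """Process independent computing units (natural watersheds) in parallel and
        merge their drainage networks into one result."""
        with Pool(n_workers) as pool:
            partial_results = pool.map(extract_drainage, units)
        return [edge for r in partial_results for edge in r["network"]]

    # units = decompose_by_watershed(dem, multiscale=True)   # hypothetical decomposition step
    # network = extract_over_ltd(units)
    ```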

  11. The role of metadata in managing large environmental science datasets. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Melton, R.B.; DeVaney, D.M. [eds.] [Pacific Northwest Lab., Richland, WA (United States)]; French, J.C. [Univ. of Virginia (United States)]

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  12. Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments.

    Science.gov (United States)

    Keuleers, Emmanuel; Balota, David A

    2015-01-01

    This paper introduces and summarizes the special issue on megastudies, crowdsourcing, and large datasets in psycholinguistics. We provide a brief historical overview and show how the papers in this issue have extended the field by compiling new databases and making important theoretical contributions. In addition, we discuss several studies that use text corpora to build distributional semantic models to tackle various interesting problems in psycholinguistics. Finally, as is the case across the papers, we highlight some methodological issues that are brought forth via the analyses of such datasets.

  13. REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations

    Science.gov (United States)

    Moulik, P.; Lekic, V.; Romanowicz, B. A.

    2017-12-01

    A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history

  14. Immersive Interaction, Manipulation and Analysis of Large 3D Datasets for Planetary and Earth Sciences

    Science.gov (United States)

    Pariser, O.; Calef, F.; Manning, E. M.; Ardulov, V.

    2017-12-01

    We will present implementation and study of several use-cases of utilizing Virtual Reality (VR) for immersive display, interaction and analysis of large and complex 3D datasets. These datasets have been acquired by the instruments across several Earth, Planetary and Solar Space Robotics Missions. First, we will describe the architecture of the common application framework that was developed to input data, interface with VR display devices and program input controllers in various computing environments. Tethered and portable VR technologies will be contrasted and advantages of each highlighted. We'll proceed to presenting experimental immersive analytics visual constructs that enable augmentation of 3D datasets with 2D ones such as images and statistical and abstract data. We will conclude by presenting comparative analysis with traditional visualization applications and share the feedback provided by our users: scientists and engineers.

  15. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    Science.gov (United States)

    Jensen, Tue V.; Pinson, Pierre

    2017-11-01

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  16. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system.

    Science.gov (United States)

    Jensen, Tue V; Pinson, Pierre

    2017-11-28

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  17. Full-Scale Approximations of Spatio-Temporal Covariance Models for Large Datasets

    KAUST Repository

    Zhang, Bohai

    2014-01-01

    Various continuously-indexed spatio-temporal process models have been constructed to characterize spatio-temporal dependence structures, but the computational complexity for model fitting and predictions grows in a cubic order with the size of dataset and application of such models is not feasible for large datasets. This article extends the full-scale approximation (FSA) approach by Sang and Huang (2012) to the spatio-temporal context to reduce computational complexity. A reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is proposed to select knots automatically from a discrete set of spatio-temporal points. Our approach is applicable to nonseparable and nonstationary spatio-temporal covariance models. We illustrate the effectiveness of our method through simulation experiments and application to an ozone measurement dataset.
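
    The structure of the approximation can be written down in a few lines: a reduced-rank (predictive-process) component built on a set of knots plus a tapered, and hence sparse, correction of the residual covariance. The sketch below uses a one-dimensional exponential covariance and a spherical taper with arbitrary settings, purely to illustrate the FSA construction rather than the paper's spatio-temporal models or RJMCMC knot selection.

    ```python
    import numpy as np

    def exp_cov(x, y, range_=0.3):
        d = np.abs(x[:, None] - y[None, :])
        return np.exp(-d / range_)

    def spherical_taper(x, y, gamma=0.2):
        d = np.abs(x[:, None] - y[None, :])
        return (1 - 1.5 * d / gamma + 0.5 * (d / gamma) ** 3) * (d < gamma)

    def fsa_covariance(x, knots):
        """Full-scale approximation: reduced-rank predictive-process part on the knots
        plus a tapered (compactly supported) correction of the residual covariance."""
        C = exp_cov(x, x)
        C_sk = exp_cov(x, knots)
        C_kk = exp_cov(knots, knots)
        C_lowrank = C_sk @ np.linalg.solve(C_kk, C_sk.T)
        return C_lowrank + spherical_taper(x, x) * (C - C_lowrank)

    x = np.linspace(0, 1, 500)
    knots = np.linspace(0, 1, 30)
    C_fsa = fsa_covariance(x, knots)
    print(np.abs(C_fsa - exp_cov(x, x)).max())   # approximation error vs exact covariance
    ```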

  18. Palmprint and Palmvein Recognition Based on DCNN and A New Large-Scale Contactless Palmvein Dataset

    Directory of Open Access Journals (Sweden)

    Lin Zhang

    2018-03-01

    Among the members of biometric identifiers, the palmprint and the palmvein have received significant attention due to their stability, uniqueness, and non-intrusiveness. In this paper, we investigate the problem of palmprint/palmvein recognition and propose a Deep Convolutional Neural Network (DCNN) based scheme, namely PalmRCNN (short for palmprint/palmvein recognition using CNNs). The effectiveness and efficiency of PalmRCNN have been verified through extensive experiments conducted on benchmark datasets. In addition, though substantial effort has been devoted to palmvein recognition, it is still quite difficult for researchers to know the potential discriminating capability of the contactless palmvein. One of the root reasons is that a large-scale and publicly available dataset comprising high-quality, contactless palmvein images is still lacking. To this end, a user-friendly acquisition device for collecting high-quality contactless palmvein images is first designed and developed in this work. Then, a large-scale palmvein image dataset is established, comprising 12,000 images acquired from 600 different palms in two separate collection sessions. The collected dataset is now publicly available.

  19. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Torres, E., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [CEADEN, Havana 11300, Cuba and INFN, Sezione di Torino, Torino 10125 (Italy); Fiorina, E.; Pennazio, F.; Peroni, C. [Department of Physics, University of Torino, Torino 10125, Italy and INFN, Sezione di Torino, Torino 10125 (Italy); Saletta, M.; Cerello, P., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [INFN, Sezione di Torino, Torino 10125 (Italy); Camarlinghi, N.; Fantacci, M. E. [Department of Physics, University of Pisa, Pisa 56127, Italy and INFN, Sezione di Pisa, Pisa 56127 (Italy)

    2015-04-15

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of lacking generalization, which could be possible given the large difference between the size of the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in the literature. Results: The lungCAM and M5L performance is consistent across the databases, with a sensitivity of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacities (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGOs detection could further improve it, as well as an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  20. A large-scale dataset of solar event reports from automated feature recognition modules

    Science.gov (United States)

    Schuh, Michael A.; Angryk, Rafal A.; Martens, Petrus C.

    2016-05-01

    The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validating of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through these important prerequisite analyses presented here, the results of KDD from Solar Big Data will be overall more reliable and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.

  1. A Hybrid Neuro-Fuzzy Model For Integrating Large Earth-Science Datasets

    Science.gov (United States)

    Porwal, A.; Carranza, J.; Hale, M.

    2004-12-01

    A GIS-based hybrid neuro-fuzzy approach to integration of large earth-science datasets for mineral prospectivity mapping is described. It implements a Takagi-Sugeno type fuzzy inference system in the framework of a four-layered feed-forward adaptive neural network. Each unique combination of the datasets is considered a feature vector whose components are derived by knowledge-based ordinal encoding of the constituent datasets. A subset of feature vectors with a known output target vector (i.e., unique conditions known to be associated with either a mineralized or a barren location) is used for the training of an adaptive neuro-fuzzy inference system. Training involves iterative adjustment of parameters of the adaptive neuro-fuzzy inference system using a hybrid learning procedure for mapping each training vector to its output target vector with minimum sum of squared error. The trained adaptive neuro-fuzzy inference system is used to process all feature vectors. The output for each feature vector is a value that indicates the extent to which a feature vector belongs to the mineralized class or the barren class. These values are used to generate a prospectivity map. The procedure is demonstrated by an application to regional-scale base metal prospectivity mapping in a study area located in the Aravalli metallogenic province (western India). A comparison of the hybrid neuro-fuzzy approach with pure knowledge-driven fuzzy and pure data-driven neural network approaches indicates that the former offers a superior method for integrating large earth-science datasets for predictive spatial mathematical modelling.
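
    For readers unfamiliar with the machinery, the sketch below shows bare-bones first-order Takagi-Sugeno inference (Gaussian memberships, product firing strengths, weighted linear consequents) on two hypothetical encoded evidence layers; it is only an illustration of the inference step, not the trained four-layer adaptive neuro-fuzzy system used in the study.

    ```python
    import numpy as np

    def gauss_mf(x, c, s):
        """Gaussian membership function with centre c and width s."""
        return np.exp(-0.5 * ((x - c) / s) ** 2)

    def takagi_sugeno(x, rules):
        """First-order Takagi-Sugeno inference: Gaussian memberships per input, product
        firing strengths, and a firing-strength-weighted average of linear consequents."""
        w, y = [], []
        for rule in rules:
            strengths = [gauss_mf(xi, c, s) for xi, (c, s) in zip(x, rule["mf"])]
            w.append(np.prod(strengths))
            y.append(rule["p"] @ np.append(x, 1.0))   # linear consequent p . [x, 1]
        w = np.array(w)
        return float(np.dot(w, y) / w.sum())

    # Two hypothetical rules over two ordinally encoded evidence layers.
    rules = [
        {"mf": [(0.2, 0.3), (0.5, 0.2)], "p": np.array([0.8, 0.1, 0.0])},
        {"mf": [(0.8, 0.3), (0.9, 0.2)], "p": np.array([0.2, 0.9, 0.1])},
    ]
    print(takagi_sugeno(np.array([0.7, 0.6]), rules))
    ```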

  2. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

    Science.gov (United States)

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze it. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware that provides a single entry point and a single sign on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.

  3. Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking

    KAUST Repository

    Huang, Huang

    2017-07-16

    This thesis focuses on two topics, computational methods for large spatial datasets and functional data ranking. Both are tackling the challenges of big and high-dimensional data. The first topic is motivated by the prohibitive computational burden in fitting Gaussian process models to large and irregularly spaced spatial datasets. Various approximation methods have been introduced to reduce the computational cost, but many rely on unrealistic assumptions about the process and retaining statistical efficiency remains an issue. We propose a new scheme to approximate the maximum likelihood estimator and the kriging predictor when the exact computation is infeasible. The proposed method provides different types of hierarchical low-rank approximations that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements with the hierarchical low-rank approximation and apply the proposed fast kriging to fill gaps for satellite images. The second topic is motivated by rank-based outlier detection methods for functional data. Compared to magnitude outliers, it is more challenging to detect shape outliers as they are often masked among samples. We develop a new notion of functional data depth by taking the integration of a univariate depth function. Having a form of the integrated depth, it shares many desirable features. Furthermore, the novel formation leads to a useful decomposition for detecting both shape and magnitude outliers. Our simulation studies show the proposed outlier detection procedure outperforms competitors in various outlier models. We also illustrate our methodology using real datasets of curves, images, and video frames. Finally, we introduce the functional data ranking technique to spatio-temporal statistics for visualizing and assessing covariance properties, such as
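
    A simplified version of the integrated-depth construction (a generic univariate rank depth averaged over the evaluation grid, not the exact formulation of the thesis) can be sketched as follows.

    ```python
    import numpy as np

    def integrated_depth(curves):
        """Integrated functional depth: at each grid point compute a univariate rank
        depth min(F_n(x), 1 - F_n(x) + 1/n) surrogate for every curve value, then
        average over the grid. Rows of `curves` are functions on a common grid."""
        n, m = curves.shape
        depths = np.empty_like(curves, dtype=float)
        for j in range(m):
            col = curves[:, j]
            frac_le = (col[None, :] <= col[:, None]).sum(axis=1) / n   # share of values <= x_i
            frac_ge = (col[None, :] >= col[:, None]).sum(axis=1) / n   # share of values >= x_i
            depths[:, j] = np.minimum(frac_le, frac_ge)
        return depths.mean(axis=1)

    rng = np.random.default_rng(7)
    t = np.linspace(0, 1, 100)
    sample = np.sin(2 * np.pi * t) + 0.2 * rng.normal(size=(50, t.size))
    sample[0] += 2.0                                    # inject a magnitude outlier
    print(np.argsort(integrated_depth(sample))[:3])     # least deep (most outlying) curves first
    ```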

  4. Genetic architecture of vitamin B12 and folate levels uncovered applying deeply sequenced large datasets

    DEFF Research Database (Denmark)

    Grarup, Niels; Sulem, Patrick; Sandholt, Camilla H

    2013-01-01

    of the underlying biology of human traits and diseases. Here, we used a large Icelandic whole genome sequence dataset combined with Danish exome sequence data to gain insight into the genetic architecture of serum levels of vitamin B12 (B12) and folate. Up to 22.9 million sequence variants were analyzed in combined...... in serum B12 or folate levels do not modify the risk of developing these conditions. Yet, the study demonstrates the value of combining whole genome and exome sequencing approaches to ascertain the genetic and molecular architectures underlying quantitative trait associations....

  5. A Multi-Resolution Spatial Model for Large Datasets Based on the Skew-t Distribution

    KAUST Repository

    Tagle, Felipe

    2017-12-06

    Large, non-Gaussian spatial datasets pose a considerable modeling challenge as the dependence structure implied by the model needs to be captured at different scales, while retaining feasible inference. Skew-normal and skew-t distributions have only recently begun to appear in the spatial statistics literature, without much consideration, however, for the ability to capture dependence at multiple resolutions, and simultaneously achieve feasible inference for increasingly large data sets. This article presents the first multi-resolution spatial model inspired by the skew-t distribution, where a large-scale effect follows a multivariate normal distribution and the fine-scale effects follow multivariate skew-normal distributions. The resulting marginal distribution for each region is skew-t, thereby allowing for greater flexibility in capturing skewness and heavy tails characterizing many environmental datasets. Likelihood-based inference is performed using a Monte Carlo EM algorithm. The model is applied as a stochastic generator of daily wind speeds over Saudi Arabia.

  6. Computing and data handling requirements for SSC [Superconducting Super Collider] and LHC [Large Hadron Collider] experiments

    International Nuclear Information System (INIS)

    Lankford, A.J.

    1990-05-01

    A number of issues for computing and data handling in the online environment at future high-luminosity, high-energy colliders, such as the Superconducting Super Collider (SSC) and Large Hadron Collider (LHC), are outlined. Requirements for trigger processing, data acquisition, and online processing are discussed. Some aspects of possible solutions are sketched. 6 refs., 3 figs

  7. Extended data analysis strategies for high resolution imaging MS : new methods to deal with extremely large image hyperspectral datasets

    NARCIS (Netherlands)

    Klerk, L.A.; Broersen, A.; Fletcher, I.W.; Liere, van R.; Heeren, R.M.A.

    2007-01-01

    The large size of the hyperspectral datasets that are produced with modern mass spectrometric imaging techniques makes it difficult to analyze the results. Unsupervised statistical techniques are needed to extract relevant information from these datasets and reduce the data into a surveyable

  8. Spectral methods in machine learning and new strategies for very large datasets

    Science.gov (United States)

    Belabbas, Mohamed-Ali; Wolfe, Patrick J.

    2009-01-01

    Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decomposition is too costly, because its complexity scales as the cube of either the number of training examples or their dimensionality. Motivated by such applications, we present here 2 new algorithms for the approximation of positive-semidefinite kernels, together with error bounds that improve on results in the literature. We approach this problem by seeking to determine, in an efficient manner, the most informative subset of our data relative to the kernel approximation task at hand. This leads to two new strategies based on the Nyström method that are directly applicable to massive datasets. The first of these—based on sampling—leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach—based on sorting—provides for the selection of a partition in a deterministic way. We detail their numerical implementation and provide simulation results for a variety of representative problems in statistical data analysis, each of which demonstrates the improved performance of our approach relative to existing methods. PMID:19129490
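
    The common core of both strategies is the Nyström construction itself: approximate the kernel matrix from a subset of its columns as K ≈ C W⁺ Cᵀ. The sketch below uses plain uniform sampling of landmarks on a Gaussian kernel; the paper's contribution lies in how that subset is chosen (by sampling from an induced distribution or by sorting), which is not reproduced here.

    ```python
    import numpy as np

    def rbf_kernel(X, Y, gamma=0.5):
        sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def nystrom(X, m, gamma=0.5, seed=0):
        """Nystrom approximation K ~= C @ pinv(W) @ C.T from m uniformly sampled landmarks."""
        idx = np.random.default_rng(seed).choice(len(X), size=m, replace=False)
        C = rbf_kernel(X, X[idx], gamma)   # n x m block of kernel columns
        W = C[idx]                         # m x m landmark submatrix
        return C @ np.linalg.pinv(W) @ C.T

    X = np.random.default_rng(1).normal(size=(1000, 10))
    K = rbf_kernel(X, X)                   # exact kernel, computed only to check the error
    K_hat = nystrom(X, m=100)
    print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))
    ```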

  9. VisIVO: A Library and Integrated Tools for Large Astrophysical Dataset Exploration

    Science.gov (United States)

    Becciani, U.; Costa, A.; Ersotelos, N.; Krokos, M.; Massimino, P.; Petta, C.; Vitello, F.

    2012-09-01

    VisIVO provides an integrated suite of tools and services that can be used in many scientific fields. VisIVO development starts in the Virtual Observatory framework. VisIVO allows users to visualize meaningfully highly-complex, large-scale datasets and create movies of these visualizations based on distributed infrastructures. VisIVO supports high-performance, multi-dimensional visualization of large-scale astrophysical datasets. Users can rapidly obtain meaningful visualizations while preserving full and intuitive control of the relevant parameters. VisIVO consists of VisIVO Desktop - a stand-alone application for interactive visualization on standard PCs, VisIVO Server - a platform for high performance visualization, VisIVO Web - a custom designed web portal, VisIVOSmartphone - an application to exploit the VisIVO Server functionality and the latest VisIVO features: VisIVO Library allows a job running on a computational system (grid, HPC, etc.) to produce movies directly with the code internal data arrays without the need to produce intermediate files. This is particularly important when running on large computational facilities, where the user wants to have a look at the results during the data production phase. For example, in grid computing facilities, images can be produced directly in the grid catalogue while the user code is running in a system that cannot be directly accessed by the user (a worker node). The deployment of VisIVO on the DG and gLite is carried out with the support of EDGI and EGI-Inspire projects. Depending on the structure and size of datasets under consideration, the data exploration process could take several hours of CPU for creating customized views and the production of movies could potentially last several days. For this reason an MPI parallel version of VisIVO could play a fundamental role in increasing performance, e.g. it could be automatically deployed on nodes that are MPI aware. A central concept in our development is thus to

  10. An Investigation of Large Tilt-Rotor Hover and Low Speed Handling Qualities

    Science.gov (United States)

    Malpica, Carlos A.; Decker, William A.; Theodore, Colin R.; Lindsey, James E.; Lawrence, Ben; Blanken, Chris L.

    2011-01-01

    A piloted simulation experiment conducted on the NASA-Ames Vertical Motion Simulator evaluated the hover and low speed handling qualities of a large tilt-rotor concept, with particular emphasis on longitudinal and lateral position control. Ten experimental test pilots evaluated different combinations of Attitude Command-Attitude Hold (ACAH) and Translational Rate Command (TRC) response types, nacelle conversion actuator authority limits and inceptor choices. Pilots performed evaluations in revised versions of the ADS-33 Hover, Lateral Reposition and Depart/Abort MTEs and moderate turbulence conditions. Level 2 handling qualities ratings were primarily recorded using ACAH response type in all three of the evaluation maneuvers. The baseline TRC conferred Level 1 handling qualities in the Hover MTE, but there was a tendency to enter into a PIO associated with nacelle actuator rate limiting when employing large, aggressive control inputs. Interestingly, increasing rate limits also led to a reduction in the handling qualities ratings. This led to the identification of a nacelle rate to rotor longitudinal flapping coupling effect that induced undesired, pitching motions proportional to the allowable amount of nacelle rate. A modification that counteracted this effect significantly improved the handling qualities. Evaluation of the different response type variants showed that inclusion of TRC response could provide Level 1 handling qualities in the Lateral Reposition maneuver by reducing coupled pitch and heave off axis responses that otherwise manifest with ACAH. Finally, evaluations in the Depart/Abort maneuver showed that uncertainty about commanded nacelle position and ensuing aircraft response, when manually controlling the nacelle, demanded high levels of attention from the pilot. Additional requirements to maintain pitch attitude within 5 deg compounded the necessary workload.

  11. Statistical Analysis of Large Simulated Yield Datasets for Studying Climate Effects

    Science.gov (United States)

    Makowski, David; Asseng, Senthold; Ewert, Frank; Bassu, Simona; Durand, Jean-Louis; Martre, Pierre; Adam, Myriam; Aggarwal, Pramod K.; Angulo, Carlos; Baron, Christian; et al.

    2015-01-01

    Many studies have been carried out during the last decade to study the effect of climate change on crop yields and other key crop characteristics. In these studies, one or several crop models were used to simulate crop growth and development for different climate scenarios that correspond to different projections of atmospheric CO2 concentration, temperature, and rainfall changes (Semenov et al., 1996; Tubiello and Ewert, 2002; White et al., 2011). The Agricultural Model Intercomparison and Improvement Project (AgMIP; Rosenzweig et al., 2013) builds on these studies with the goal of using an ensemble of multiple crop models in order to assess effects of climate change scenarios for several crops in contrasting environments. These studies generate large datasets, including thousands of simulated crop yield data. They include series of yield values obtained by combining several crop models with different climate scenarios that are defined by several climatic variables (temperature, CO2, rainfall, etc.). Such datasets potentially provide useful information on the possible effects of different climate change scenarios on crop yields. However, it is sometimes difficult to analyze these datasets and to summarize them in a useful way due to their structural complexity; simulated yield data can differ among contrasting climate scenarios, sites, and crop models. Another issue is that it is not straightforward to extrapolate the results obtained for the scenarios to alternative climate change scenarios not initially included in the simulation protocols. Additional dynamic crop model simulations for new climate change scenarios are an option but this approach is costly, especially when a large number of crop models are used to generate the simulated data, as in AgMIP. Statistical models have been used to analyze responses of measured yield data to climate variables in past studies (Lobell et al., 2011), but the use of a statistical model to analyze yields simulated by complex

  12. Image-based Exploration of Iso-surfaces for Large Multi-Variable Datasets using Parameter Space.

    KAUST Repository

    Binyahib, Roba S.

    2013-05-13

    With an increase in processing power, more complex simulations have resulted in larger data size, with higher resolution and more variables. Many techniques have been developed to help the user to visualize and analyze data from such simulations. However, dealing with a large amount of multivariate data is challenging, time-consuming and often requires high-end clusters. Consequently, novel visualization techniques are needed to explore such data. Many users would like to visually explore their data and change certain visual aspects without the need to use special clusters or having to load a large amount of data. This is the idea behind explorable images (EI). Explorable images are a novel approach that provides limited interactive visualization without the need to re-render from the original data [40]. In this work, the concept of EI has been used to create a workflow that deals with explorable iso-surfaces for scalar fields in a multivariate, time-varying dataset. As a pre-processing step, a set of iso-values for each scalar field is inferred and extracted from a user-assisted sampling technique in time-parameter space. These iso-values are then used to generate iso-surfaces that are then pre-rendered (from a fixed viewpoint) along with additional buffers (i.e. normals, depth, values of other fields, etc.) to provide a compressed representation of iso-surfaces in the dataset. We present a tool that at run-time allows the user to interactively browse and calculate a combination of iso-surfaces superimposed on each other. The result is the same as calculating multiple iso-surfaces from the original data but without the memory and processing overhead. Our tool also allows the user to change the (scalar) values superimposed on each of the surfaces, modify their color map, and interactively re-light the surfaces. We demonstrate the effectiveness of our approach over a multi-terabyte combustion dataset. We also illustrate the efficiency and accuracy of our

  13. Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer

    Directory of Open Access Journals (Sweden)

    Peterlongo Pierre

    2012-03-01

    Background: The analysis of next-generation sequencing data from large genomes is a timely research topic. Sequencers are producing billions of short sequence fragments from newly sequenced organisms. Computational methods for reconstructing whole genomes/transcriptomes (de novo assemblers) are typically employed to process such data. However, these methods require large memory resources and computation time. Many basic biological questions could be answered targeting specific information in the reads, thus avoiding complete assembly. Results: We present Mapsembler, an iterative micro and targeted assembler which processes large datasets of reads on commodity hardware. Mapsembler checks for the presence of given regions of interest that can be constructed from reads and builds a short assembly around it, either as a plain sequence or as a graph, showing contextual structure. We introduce new algorithms to retrieve approximate occurrences of a sequence from reads and construct an extension graph. Among other results presented in this paper, Mapsembler enabled the retrieval of previously described human breast cancer candidate fusion genes, and the detection of new ones not previously known. Conclusions: Mapsembler is the first software that enables de novo discovery around a region of interest of repeats, SNPs, exon skipping, gene fusion, as well as other structural events, directly from raw sequencing reads. As indexing is localized, the memory footprint of Mapsembler is negligible. Mapsembler is released under the CeCILL license and can be freely downloaded from http://alcovna.genouest.org/mapsembler/.

  14. Knowledge discovery in large model datasets in the marine environment: the THREDDS Data Server example

    Directory of Open Access Journals (Sweden)

    A. Bergamasco

    2012-06-01

    In order to monitor, describe and understand the marine environment, many research institutions are involved in the acquisition and distribution of ocean data, both from observations and models. Scientists from these institutions are spending too much time looking for, accessing, and reformatting data: they need better tools and procedures to make the science they do more efficient. The U.S. Integrated Ocean Observing System (US-IOOS) is working on making large amounts of distributed data usable in an easy and efficient way. It is essentially a network of scientists, technicians and technologies designed to acquire, collect and disseminate observational and modelled data resulting from coastal and oceanic marine regions investigations to researchers, stakeholders and policy makers. In order to be successful, this effort requires standard data protocols, web services and standards-based tools. Starting from the US-IOOS approach, which is being adopted throughout much of the oceanographic and meteorological sectors, we describe here the CNR-ISMAR Venice experience in the direction of setting up a national Italian IOOS framework using the THREDDS (THematic Real-time Environmental Distributed Data Services) Data Server (TDS), a middleware designed to fill the gap between data providers and data users. The TDS provides services that allow data users to find the data sets pertaining to their scientific needs, to access, to visualize and to use them in an easy way, without downloading files to the local workspace. In order to achieve this, it is necessary that the data providers make their data available in a standard form that the TDS understands, and with sufficient metadata to allow the data to be read and searched in a standard way. The core idea is then to utilize a Common Data Model (CDM), a unified conceptual model that describes different datatypes within each dataset. More specifically, Unidata (www.unidata.ucar.edu) has developed CDM
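
    From the data user's side, the access pattern that the TDS enables looks roughly like the sketch below: the dataset is opened lazily over OPeNDAP and only the requested subset travels over the network. The catalog URL and the variable and coordinate names are placeholders, not an actual CNR-ISMAR endpoint.

    ```python
    import xarray as xr

    # Hypothetical OPeNDAP endpoint published by a THREDDS Data Server.
    URL = "https://example.org/thredds/dodsC/ocean/adriatic_model.nc"

    # Opening is lazy: only metadata is read at this point, no file is downloaded.
    ds = xr.open_dataset(URL)

    # Subsetting pulls just the needed slice over the network, not the whole dataset.
    sst_slice = ds["sea_surface_temperature"].sel(
        time="2012-06-01", lat=slice(44.0, 46.0), lon=slice(12.0, 14.0)
    )
    print(sst_slice.mean().values)
    ```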

  15. FUn: a framework for interactive visualizations of large, high-dimensional datasets on the web.

    Science.gov (United States)

    Probst, Daniel; Reymond, Jean-Louis

    2018-04-15

    During the past decade, big data have become a major tool in scientific endeavors. Although statistical methods and algorithms are well-suited for analyzing and summarizing enormous amounts of data, the results do not allow for a visual inspection of the entire data. Current scientific software, including R packages and Python libraries such as ggplot2, matplotlib and plot.ly, does not support interactive visualizations of datasets exceeding 100 000 data points on the web. Other solutions enable the web-based visualization of big data only through data reduction or statistical representations. However, recent hardware developments, especially advancements in graphical processing units, allow for the rendering of millions of data points on a wide range of consumer hardware such as laptops, tablets and mobile phones. Similar to the challenges and opportunities brought to virtually every scientific field by big data, the visualization of and interaction with copious amounts of data are both demanding and hold great promise. Here we present FUn, a framework consisting of a client (Faerun) and server (Underdark) module, facilitating the creation of web-based, interactive 3D visualizations of large datasets, enabling record level visual inspection. We also introduce a reference implementation providing access to SureChEMBL, a database containing patent information on more than 17 million chemical compounds. The source code and the most recent builds of Faerun and Underdark, Lore.js and the data preprocessing toolchain used in the reference implementation, are available on the project website (http://doc.gdb.tools/fun/). daniel.probst@dcb.unibe.ch or jean-louis.reymond@dcb.unibe.ch.

  16. Prediction of Canopy Heights over a Large Region Using Heterogeneous Lidar Datasets: Efficacy and Challenges

    Directory of Open Access Journals (Sweden)

    Ranjith Gopalakrishnan

    2015-08-01

    Generating accurate and unbiased wall-to-wall canopy height maps from airborne lidar data for large regions is useful to forest scientists and natural resource managers. However, mapping large areas often involves using lidar data from different projects, with varying acquisition parameters. In this work, we address the important question of whether one can accurately model canopy heights over large areas of the Southeastern US using a very heterogeneous dataset of small-footprint, discrete-return airborne lidar data (with 76 separate lidar projects). A unique aspect of this effort is the use of nationally uniform and extensive field data (~1800 forested plots) from the Forest Inventory and Analysis (FIA) program of the US Forest Service. Preliminary results are quite promising: over all lidar projects, we observe a good correlation between the 85th percentile of lidar heights and field-measured height (r = 0.85). We construct a linear regression model to predict subplot-level dominant tree heights from distributional lidar metrics (R2 = 0.74, RMSE = 3.0 m, n = 1755). We also identify and quantify the importance of several factors (like heterogeneity of vegetation, point density, the predominance of hardwoods or softwoods, the average height of the forest stand, slope of the plot, and average scan angle of lidar acquisition) that influence the efficacy of predicting canopy heights from lidar data. For example, a subset of plots (coefficient of variation of vegetation heights < 0.2) significantly reduces the RMSE of our model from 3.0 m to 2.4 m (~20% reduction). We conclude that when all these elements are factored into consideration, combining data from disparate lidar projects does not preclude robust estimation of canopy heights.
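
    The modelling step reported above is essentially a one-metric regression; the sketch below reproduces its shape with synthetic plot data standing in for the FIA field measurements and lidar point clouds.

    ```python
    import numpy as np

    rng = np.random.default_rng(6)

    # Synthetic stand-ins: per-plot arrays of lidar return heights and a field-measured
    # dominant height for each plot (the real study uses ~1800 FIA subplots).
    n_plots = 200
    field_height = rng.uniform(5, 35, n_plots)
    lidar_returns = [rng.normal(h, 2.5, size=500).clip(0) for h in field_height]

    # Distributional lidar metric: 85th percentile of return heights per plot.
    p85 = np.array([np.percentile(r, 85) for r in lidar_returns])

    # Ordinary least-squares fit of field height on the lidar metric.
    slope, intercept = np.polyfit(p85, field_height, deg=1)
    pred = slope * p85 + intercept
    rmse = np.sqrt(np.mean((pred - field_height) ** 2))
    r = np.corrcoef(p85, field_height)[0, 1]
    print(f"r = {r:.2f}, RMSE = {rmse:.2f} m")
    ```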

  17. Measurement and genetics of human subcortical and hippocampal asymmetries in large datasets.

    Science.gov (United States)

    Guadalupe, Tulio; Zwiers, Marcel P; Teumer, Alexander; Wittfeld, Katharina; Vasquez, Alejandro Arias; Hoogman, Martine; Hagoort, Peter; Fernandez, Guillen; Buitelaar, Jan; Hegenscheid, Katrin; Völzke, Henry; Franke, Barbara; Fisher, Simon E; Grabe, Hans J; Francks, Clyde

    2014-07-01

    Functional and anatomical asymmetries are prevalent features of the human brain, linked to gender, handedness, and cognition. However, little is known about the neurodevelopmental processes involved. In zebrafish, asymmetries arise in the diencephalon before extending within the central nervous system. We aimed to identify genes involved in the development of subtle, left-right volumetric asymmetries of human subcortical structures using large datasets. We first tested the feasibility of measuring left-right volume differences in such large-scale samples, as assessed by two automated methods of subcortical segmentation (FSL|FIRST and FreeSurfer), using data from 235 subjects who had undergone MRI twice. We tested the agreement between the first and second scan, and the agreement between the segmentation methods, for measures of bilateral volumes of six subcortical structures and the hippocampus, and their volumetric asymmetries. We also tested whether there were biases introduced by left-right differences in the regional atlases used by the methods, by analyzing left-right flipped images. While many bilateral volumes were measured well (scan-rescan r = 0.6-0.8), most asymmetries, with the exception of the caudate nucleus, showed lower repeatabilities. We meta-analyzed genome-wide association scan results for caudate nucleus asymmetry in a combined sample of 3,028 adult subjects but did not detect associations at genome-wide significance (P left-right patterning of the viscera. Our results provide important information for researchers who are currently aiming to carry out large-scale genome-wide studies of subcortical and hippocampal volumes, and their asymmetries.

  18. Managing Large Multidimensional Array Hydrologic Datasets : A Case Study Comparing NetCDF and SciDB

    NARCIS (Netherlands)

    Liu, H.; van Oosterom, P.J.M.; Hu, C.; Wang, Wen

    2016-01-01

    Management of large hydrologic datasets including storage, structuring, indexing and query is one of the crucial challenges in the era of big data. This research originates from a specific data query problem: time series extraction at specific locations takes a long time when a large

  19. Safe Patient Handling and Mobility: Development and Implementation of a Large-Scale Education Program.

    Science.gov (United States)

    Lee, Corinne; Knight, Suzanne W; Smith, Sharon L; Nagle, Dorothy J; DeVries, Lori

    This article addresses the development, implementation, and evaluation of an education program for safe patient handling and mobility at a large academic medical center. The ultimate goal of the program was to increase safety during patient mobility/transfer and reduce nursing staff injury from lifting/pulling. This comprehensive program was designed on the basis of the principles of prework, application, and support at the point of care. A combination of online learning, demonstration, skill evaluation, and coaching at the point of care was used to achieve the goal. Specific roles and responsibilities were developed to facilitate implementation. It took 17 master trainers, 88 certified trainers, 176 unit-based trainers, and 98 coaches to put 3706 nurses and nursing assistants through the program. Evaluations indicated both an increase in knowledge about safe patient handling and an increased ability to safely mobilize patients. The challenge now is sustainability of safe patient-handling practices and the growth and development of trainers and coaches.

  20. CoVennTree: A new method for the comparative analysis of large datasets

    Directory of Open Access Journals (Sweden)

    Steffen C. Lott

    2015-02-01

    The visualization of massive datasets, such as those resulting from comparative metatranscriptome analyses or the analysis of microbial population structures using ribosomal RNA sequences, is a challenging task. We developed a new method called CoVennTree (Comparative weighted Venn Tree) that simultaneously compares up to three multifarious datasets by aggregating and propagating information from the bottom to the top level and produces a graphical output in Cytoscape. With the introduction of weighted Venn structures, the contents and relationships of various datasets can be correlated and simultaneously aggregated without losing information. We demonstrate the suitability of this approach using a dataset of 16S rDNA sequences obtained from microbial populations at three different depths of the Gulf of Aqaba in the Red Sea. CoVennTree has been integrated into the Galaxy ToolShed and can be directly downloaded and integrated into the user instance.
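
    At its lowest level, a weighted Venn structure for three datasets is the seven-way exclusive partition with counts attached; a minimal sketch on made-up sequence identifiers (not the Gulf of Aqaba data, and without the tree aggregation or Cytoscape output) follows.

    ```python
    def venn3_weights(a: set, b: set, c: set) -> dict:
        """Sizes of the seven exclusive regions of a three-set Venn diagram."""
        return {
            "A only": len(a - b - c),
            "B only": len(b - a - c),
            "C only": len(c - a - b),
            "A&B": len((a & b) - c),
            "A&C": len((a & c) - b),
            "B&C": len((b & c) - a),
            "A&B&C": len(a & b & c),
        }

    # Toy 16S sequence identifiers from three sampling depths.
    depth_1 = {f"otu{i}" for i in range(0, 60)}
    depth_2 = {f"otu{i}" for i in range(40, 110)}
    depth_3 = {f"otu{i}" for i in range(90, 150)}
    print(venn3_weights(depth_1, depth_2, depth_3))
    ```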

  1. Computational Methods for Large Spatio-temporal Datasets and Functional Data Ranking

    KAUST Repository

    Huang, Huang

    2017-01-01

    that are both computationally and statistically efficient. We explore the improvement of the approximation theoretically and investigate the performance by simulations. For real applications, we analyze a soil moisture dataset with 2 million measurements

  2. TIMPs of parasitic helminths - a large-scale analysis of high-throughput sequence datasets.

    Science.gov (United States)

    Cantacessi, Cinzia; Hofmann, Andreas; Pickering, Darren; Navarro, Severine; Mitreva, Makedonka; Loukas, Alex

    2013-05-30

    Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal hosts. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of A. suum, N. americanus and Schistosoma haematobium which infect the mammalian hosts. A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate hosts, with a view towards the development of novel approaches for the control of neglected helminthiases.

  3. Using large hydrological datasets to create a robust, physically based, spatially distributed model for Great Britain

    Science.gov (United States)

    Lewis, Elizabeth; Kilsby, Chris; Fowler, Hayley

    2014-05-01

    The impact of climate change on hydrological systems requires further quantification in order to inform water management. This study intends to conduct such analysis using hydrological models. Such models are of varying forms, of which conceptual, lumped parameter models and physically-based models are two important types. The majority of hydrological studies use conceptual models calibrated against measured river flow time series in order to represent catchment behaviour. This method often shows impressive results for specific problems in gauged catchments. However, the results may not be robust under non-stationary conditions such as climate change, as physical processes and relationships amenable to change are not accounted for explicitly. Moreover, conceptual models are less readily applicable to ungauged catchments, in which hydrological predictions are also required. As such, the physically based, spatially distributed model SHETRAN is used in this study to develop a robust and reliable framework for modelling historic and future behaviour of gauged and ungauged catchments across the whole of Great Britain. In order to achieve this, a large array of data completely covering Great Britain for the period 1960-2006 has been collated and efficiently stored ready for model input. The data processed include a DEM, rainfall, PE and maps of geology, soil and land cover. A desire to make the modelling system easy for others to work with led to the development of a user-friendly graphical interface. This allows non-experts to set up and run a catchment model in a few seconds, a process that can normally take weeks or months. The quality and reliability of the extensive dataset for modelling hydrological processes has also been evaluated. One aspect of this has been an assessment of error and uncertainty in rainfall input data, as well as the effects of temporal resolution in precipitation inputs on model calibration. SHETRAN has been updated to accept gridded rainfall

  4. A Bayesian spatio-temporal geostatistical model with an auxiliary lattice for large datasets

    KAUST Repository

    Xu, Ganggang; Liang, Faming; Genton, Marc G.

    2015-01-01

    method is not only able to handle irregularly spaced observations in the spatial domain, but it is also able to bypass the missing data problem in a spatio-temporal process. Because the computational complexity of the proposed Markov chain Monte Carlo

  5. Harnessing Connectivity in a Large-Scale Small-Molecule Sensitivity Dataset | Office of Cancer Genomics

    Science.gov (United States)

    Identifying genetic alterations that prime a cancer cell to respond to a particular therapeutic agent can facilitate the development of precision cancer medicines. Cancer cell-line (CCL) profiling of small-molecule sensitivity has emerged as an unbiased method to assess the relationships between genetic or cellular features of CCLs and small-molecule response. Here, we developed annotated cluster multidimensional enrichment analysis to explore the associations between groups of small molecules and groups of CCLs in a new, quantitative sensitivity dataset.

  6. MiSTIC, an integrated platform for the analysis of heterogeneity in large tumour transcriptome datasets.

    Science.gov (United States)

    Lemieux, Sebastien; Sargeant, Tobias; Laperrière, David; Ismail, Houssam; Boucher, Geneviève; Rozendaal, Marieke; Lavallée, Vincent-Philippe; Ashton-Beaucage, Dariel; Wilhelm, Brian; Hébert, Josée; Hilton, Douglas J; Mader, Sylvie; Sauvageau, Guy

    2017-07-27

    Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
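
    The record above centres on summarizing a gene-gene correlation structure as a minimum spanning tree. As an illustration only (not the MiSTIC code; the gene and sample counts are made up), the following sketch builds such a tree from a toy expression matrix with SciPy:

```python
# Minimal sketch (not the MiSTIC implementation): derive a minimum spanning
# tree from a gene-gene correlation matrix, the core structure MiSTIC visualizes.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 200))           # 40 genes x 200 samples (toy data)

corr = np.corrcoef(expr)                    # gene-gene Pearson correlation
dist = 1.0 - np.abs(corr)                   # turn correlation into a distance
np.fill_diagonal(dist, 0.0)

mst = minimum_spanning_tree(dist)           # sparse matrix of MST edges
edges = np.transpose(mst.nonzero())
print(f"MST with {len(edges)} edges over {expr.shape[0]} genes")
for i, j in edges[:5]:
    print(f"gene_{i} -- gene_{j}  (1-|r| = {mst[i, j]:.3f})")
```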

  7. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

    DEFF Research Database (Denmark)

    Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte

    2006-01-01

    Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods...... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset...
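
    To make the co-occurrence idea concrete, here is a minimal sketch of consensus clustering by repeated K-means runs; it is illustrative only, stands in for the paper's Variational Bayes mixtures, and uses toy data in place of microarray measurements:

```python
# Minimal sketch of consensus clustering: run k-means repeatedly and record how
# often each pair of profiles lands in the same cluster (a co-occurrence matrix).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.5, size=(30, 10)) for m in (-2, 0, 2)])  # toy profiles

n, runs, k = X.shape[0], 50, 3
cooc = np.zeros((n, n))
for seed in range(runs):
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    cooc += (labels[:, None] == labels[None, :])
cooc /= runs                                   # fraction of runs in the same cluster

# Pairs with co-occurrence near 1.0 form stable consensus clusters.
print("mean within-block co-occurrence:", cooc[:30, :30].mean().round(2))
print("mean between-block co-occurrence:", cooc[:30, 30:60].mean().round(2))
```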

  8. Argo_CUDA: Exhaustive GPU based approach for motif discovery in large DNA datasets.

    Science.gov (United States)

    Vishnevsky, Oleg V; Bocharnikov, Andrey V; Kolchanov, Nikolay A

    2018-02-01

    The development of chromatin immunoprecipitation sequencing (ChIP-seq) technology has revolutionized the genetic analysis of the basic mechanisms underlying transcription regulation and led to the accumulation of information about a huge number of DNA sequences. Many web services are currently available for de novo motif discovery in datasets containing information about DNA/protein binding. The enormous diversity of motifs makes finding them challenging, and researchers therefore use different stochastic approaches. Unfortunately, the efficiency of motif discovery programs declines dramatically as the query set grows, so that only a fraction of the top "peak" ChIP-Seq segments can be analyzed, or the area of analysis must be narrowed. Thus, motif discovery in massive datasets remains a challenging issue. The Argo_CUDA (Compute Unified Device Architecture) web service is designed to process massive DNA data. It is a program for the detection of degenerate oligonucleotide motifs of fixed length written in the 15-letter IUPAC code. Argo_CUDA is a fully exhaustive approach based on high-performance GPU technologies. Compared with existing motif discovery web services, Argo_CUDA shows good prediction quality on simulated sets. The analysis of ChIP-Seq sequences revealed motifs which correspond to known transcription factor binding sites.
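
    The core primitive behind such an exhaustive search is scoring a fixed-length degenerate IUPAC motif against every sequence window. A plain-Python illustration follows (not the GPU implementation; the sequences and example motif are invented):

```python
# Sketch of the basic primitive behind exhaustive degenerate-motif search:
# counting windows that match a fixed-length IUPAC motif across sequences.
# (Illustrative only -- Argo_CUDA evaluates motifs exhaustively on the GPU.)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_motif(motif, sequences):
    """Count windows matching a degenerate IUPAC motif across all sequences."""
    allowed = [set(IUPAC[c]) for c in motif]
    hits = 0
    for seq in sequences:
        for i in range(len(seq) - len(motif) + 1):
            window = seq[i:i + len(motif)]
            if all(base in ok for base, ok in zip(window, allowed)):
                hits += 1
    return hits

seqs = ["ACGTGACGTTTGACA", "TTGACATTGACGTAA"]
print(count_motif("TTGACR", seqs))   # matches windows such as TTGACA and TTGACG
```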

  9. Learning visual balance from large-scale datasets of aesthetically highly rated images

    Science.gov (United States)

    Jahanian, Ali; Vishwanathan, S. V. N.; Allebach, Jan P.

    2015-03-01

    The concept of visual balance is innate for humans and influences how we perceive visual aesthetics and cognize harmony. Although visual balance is a vital principle of design and is taught in design schools, it is barely quantified. On the other hand, with the emergence of automatic/semi-automatic visual design for self-publishing, learning visual balance and modeling it computationally may improve the aesthetics of such designs. In this paper, we present how the quest to understand visual balance inspired us to revisit one of the well-known theories in visual arts, the so-called theory of "visual rightness" elucidated by Arnheim. We define Arnheim's hypothesis as a design mining problem with the goal of learning visual balance from the work of professionals. We collected a dataset of 120K images that are aesthetically highly rated, from a professional photography website. We then computed factors that contribute to visual balance based on the notion of visual saliency. We fitted a mixture of Gaussians to the saliency maps of the images and obtained the hotspots of the images. Our inferred Gaussians align with Arnheim's hotspots and confirm his theory. Moreover, the results support the viability of the center of mass, symmetry, as well as the Rule of Thirds in our dataset.
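
    A rough sketch of the hotspot-fitting step is given below, assuming scikit-learn's GaussianMixture and a synthetic saliency map in place of a real saliency model:

```python
# Sketch of locating image "hotspots" by fitting a mixture of Gaussians to a
# saliency map (synthetic here); the fitted means play the role of hotspots.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
h, w = 120, 180
yy, xx = np.mgrid[0:h, 0:w]
# Synthetic saliency map with two bright regions (stand-in for a saliency model).
saliency = (np.exp(-(((xx - 60) ** 2 + (yy - 40) ** 2) / 300.0))
            + np.exp(-(((xx - 130) ** 2 + (yy - 80) ** 2) / 500.0)))

# Treat saliency as a density: sample pixel coordinates proportionally to it.
p = (saliency / saliency.sum()).ravel()
idx = rng.choice(h * w, size=5000, p=p)
points = np.column_stack([idx % w, idx // w])      # (x, y) samples

gmm = GaussianMixture(n_components=2, random_state=0).fit(points)
for mean, weight in zip(gmm.means_, gmm.weights_):
    print(f"hotspot at (x={mean[0]:.0f}, y={mean[1]:.0f}), weight={weight:.2f}")
```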

  10. Open and scalable analytics of large Earth observation datasets: From scenes to multidimensional arrays using SciDB and GDAL

    Science.gov (United States)

    Appel, Marius; Lahn, Florian; Buytaert, Wouter; Pebesma, Edzer

    2018-04-01

    Earth observation (EO) datasets are commonly provided as collection of scenes, where individual scenes represent a temporal snapshot and cover a particular region on the Earth's surface. Using these data in complex spatiotemporal modeling becomes difficult as soon as data volumes exceed a certain capacity or analyses include many scenes, which may spatially overlap and may have been recorded at different dates. In order to facilitate analytics on large EO datasets, we combine and extend the geospatial data abstraction library (GDAL) and the array-based data management and analytics system SciDB. We present an approach to automatically convert collections of scenes to multidimensional arrays and use SciDB to scale computationally intensive analytics. We evaluate the approach in three study cases on national scale land use change monitoring with Landsat imagery, global empirical orthogonal function analysis of daily precipitation, and combining historical climate model projections with satellite-based observations. Results indicate that the approach can be used to represent various EO datasets and that analyses in SciDB scale well with available computational resources. To simplify analyses of higher-dimensional datasets as from climate model output, however, a generalization of the GDAL data model might be needed. All parts of this work have been implemented as open-source software and we discuss how this may facilitate open and reproducible EO analyses.
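
    As a minimal illustration of the scene-to-array step (file names hypothetical; scenes assumed co-registered, single-band and of equal size; the GDAL Python bindings assumed installed), one can stack GDAL rasters into a NumPy cube before loading it into an array database:

```python
# Minimal sketch (hypothetical file names): read a set of co-registered
# single-band scenes with GDAL and stack them into a (time, y, x) array --
# the kind of multidimensional array the paper then manages in SciDB.
import numpy as np
from osgeo import gdal

scene_paths = ["scene_2016_01.tif", "scene_2016_02.tif", "scene_2016_03.tif"]

slices = []
for path in scene_paths:
    ds = gdal.Open(path)                       # GDAL dataset for one scene
    slices.append(ds.GetRasterBand(1).ReadAsArray().astype(np.float32))
    ds = None                                  # close the dataset

cube = np.stack(slices, axis=0)                # shape: (time, rows, cols)
print(cube.shape)
print("per-pixel mean over time has shape:", cube.mean(axis=0).shape)
```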

  11. HARVESTING, INTEGRATING AND DISTRIBUTING LARGE OPEN GEOSPATIAL DATASETS USING FREE AND OPEN-SOURCE SOFTWARE

    Directory of Open Access Journals (Sweden)

    R. Oliveira

    2016-06-01

    Full Text Available Federal, State and Local government agencies in the USA are investing heavily in the dissemination of Open Data sets produced by each of them. The main driver behind this thrust is to increase agencies’ transparency and accountability, as well as to improve citizens’ awareness. However, not all Open Data sets are easy to access and integrate with other Open Data sets available even from the same agency. The City and County of Denver Open Data Portal distributes several types of geospatial datasets, one of them being the city parcels information containing 224,256 records. Although this data layer contains many pieces of information, it is incomplete for some custom purposes. Open-source software was used to first collect data from diverse City of Denver Open Data sets, then upload them to a repository in the Cloud where they were processed using a PostgreSQL installation on the Cloud and Python scripts. Our method was able to extract non-spatial information from a ‘not-ready-to-download’ source that could then be combined with the initial data set to enhance its potential use.

  12. Television food advertising to children in Slovenia: analyses using a large 12-month advertising dataset.

    Science.gov (United States)

    Korošec, Živa; Pravst, Igor

    2016-12-01

    The marketing of energy-dense foods is recognised as a probable causal factor in children's overweight and obesity. To stimulate policymakers to start using nutrient profiling to restrict food marketing, a harmonised model was recently proposed by the WHO. Our objective is to evaluate the television advertising of foods in Slovenia using the above-mentioned model. An analysis is performed using a representative dataset of 93,902 food-related advertisements broadcast in Slovenia in year 2013. The advertisements are linked to specific foods, which are then subject to categorisation according to the WHO and UK nutrient profile model. Advertising of chocolate and confectionery represented 37 % of food-related advertising in all viewing times, and 77 % in children's (4-9 years) viewing hours. During these hours, 96 % of the food advertisements did not pass the criteria for permitted advertising according to the WHO profile model. Evidence from Slovenia shows that, in the absence of efficient regulatory marketing restrictions, television advertising of food to children is almost exclusively linked to energy-dense foods. Minor modifications of the proposed WHO nutrient profile model are suggested.

  13. A Scalable Permutation Approach Reveals Replication and Preservation Patterns of Network Modules in Large Datasets.

    Science.gov (United States)

    Ritchie, Scott C; Watts, Stephen; Fearnley, Liam G; Holt, Kathryn E; Abraham, Gad; Inouye, Michael

    2016-07-01

    Network modules-topologically distinct groups of edges and nodes-that are preserved across datasets can reveal common features of organisms, tissues, cell types, and molecules. Many statistics to identify such modules have been developed, but testing their significance requires heuristics. Here, we demonstrate that current methods for assessing module preservation are systematically biased and produce skewed p values. We introduce NetRep, a rapid and computationally efficient method that uses a permutation approach to score module preservation without assuming data are normally distributed. NetRep produces unbiased p values and can distinguish between true and false positives during multiple hypothesis testing. We use NetRep to quantify preservation of gene coexpression modules across murine brain, liver, adipose, and muscle tissues. Complex patterns of multi-tissue preservation were revealed, including a liver-derived housekeeping module that displayed adipose- and muscle-specific association with body weight. Finally, we demonstrate the broader applicability of NetRep by quantifying preservation of bacterial networks in gut microbiota between men and women. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
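
    The permutation principle can be sketched in a few lines of Python; this generic example is not NetRep itself, and the preservation statistic and toy data are chosen only for illustration:

```python
# Generic permutation scheme (not NetRep itself): score how well a module's
# correlation structure is preserved in a second dataset, and obtain an
# empirical p-value by drawing random node sets from the test network.
import numpy as np

rng = np.random.default_rng(3)

def preservation_stat(corr_a, corr_b, nodes):
    """Correlation between the module's edge weights in two networks."""
    sub_a = corr_a[np.ix_(nodes, nodes)]
    sub_b = corr_b[np.ix_(nodes, nodes)]
    iu = np.triu_indices(len(nodes), k=1)
    return np.corrcoef(sub_a[iu], sub_b[iu])[0, 1]

# Toy data: two datasets sharing a correlated 10-gene module among 60 genes.
base = rng.normal(size=(100, 60))
base[:, :10] += rng.normal(size=(100, 1))        # shared module signal
data_a = base + rng.normal(size=base.shape)
data_b = base + rng.normal(size=base.shape)
corr_a, corr_b = np.corrcoef(data_a.T), np.corrcoef(data_b.T)

module = np.arange(10)
observed = preservation_stat(corr_a, corr_b, module)
null = np.array([preservation_stat(corr_a, corr_b,
                                   rng.choice(60, size=10, replace=False))
                 for _ in range(2000)])
p_value = (1 + np.sum(null >= observed)) / (len(null) + 1)
print(f"observed = {observed:.3f}, permutation p = {p_value:.4f}")
```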

  14. A Multi-Resolution Spatial Model for Large Datasets Based on the Skew-t Distribution

    KAUST Repository

    Tagle, Felipe; Castruccio, Stefano; Genton, Marc G.

    2017-01-01

    recently begun to appear in the spatial statistics literature, without much consideration, however, for the ability to capture dependence at multiple resolutions, and simultaneously achieve feasible inference for increasingly large data sets. This article

  15. Information contained within the large scale gas injection test (Lasgit) dataset exposed using a bespoke data analysis tool-kit

    International Nuclear Information System (INIS)

    Bennett, D.P.; Thomas, H.R.; Cuss, R.J.; Harrington, J.F.; Vardon, P.J.

    2012-01-01

    Document available in extended abstract form only. The Large Scale Gas Injection Test (Lasgit) is a field scale experiment run by the British Geological Survey (BGS) and is located approximately 420 m underground at SKB's Aespoe Hard Rock Laboratory (HRL) in Sweden. It has been designed to study the impact on safety of gas build up within a KBS-3V concept high level radioactive waste repository. Lasgit has been in almost continuous operation for approximately seven years and is still underway. An analysis of the dataset arising from the Lasgit experiment, with particular attention to the smaller scale features and phenomena recorded, has been undertaken in parallel to the macro scale analysis performed by the BGS. Lasgit is a highly instrumented, frequently sampled and long-lived experiment leading to a substantial dataset containing in excess of 14.7 million data points. The data are anticipated to contain a wealth of information regarding overall processes as well as smaller scale or 'second order' features. Due to the size of the dataset coupled with the detailed analysis of the dataset required and the reduction in subjectivity associated with measurement compared to observation, computational analysis is essential. Moreover, due to the length of operation and complexity of experimental activity, the Lasgit dataset is not typically suited to 'out of the box' time series analysis algorithms. In particular, the features that are not suited to standard algorithms include non-uniformities due to (deliberate) changes in sample rate at various points in the experimental history and missing data due to hardware malfunction/failure causing interruption of logging cycles. To address these features a computational tool-kit capable of performing an Exploratory Data Analysis (EDA) on long-term, large-scale datasets with non-uniformities has been developed. Particular tool-kit abilities include: the parameterization of signal variation in the dataset
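
    A small pandas sketch illustrates the kind of non-uniformity such a tool-kit must cope with (the column name and sampling scheme are invented, not Lasgit's actual channels):

```python
# Sketch of handling an irregularly sampled sensor channel (hypothetical column
# name): load timestamps, expose logging gaps, and resample to a uniform grid.
import numpy as np
import pandas as pd

# Toy record: the sample interval changes part-way through and an outage occurs.
times = (list(pd.date_range("2010-01-01", periods=50, freq="10min"))
         + list(pd.date_range("2010-01-02", periods=50, freq="30min")))
pressure = np.random.default_rng(4).normal(5.0, 0.1, size=len(times))
df = pd.DataFrame({"pressure_MPa": pressure}, index=pd.DatetimeIndex(times))

gaps = df.index.to_series().diff()
print("largest logging gap:", gaps.max())        # exposes the outage

uniform = df.resample("30min").mean().interpolate(limit=4)  # uniform time grid
print(uniform.head())
```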

  16. Classification of large acoustic datasets using machine learning and crowdsourcing: Application to whale calls

    NARCIS (Netherlands)

    Shamir, L.; Carol Yerby, C.; Simpson, R.; Benda-Beckmann, A.M. von; Tyack, P.; Samarra, F.; Miller, P.; Wallin, J.

    2014-01-01

    Vocal communication is a primary communication method of killer and pilot whales, and is used for transmitting a broad range of messages and information for short and long distance. The large variation in call types of these species makes it challenging to categorize them. In this study, sounds

  17. Parallel Multivariate Spatio-Temporal Clustering of Large Ecological Datasets on Hybrid Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sreepathi, Sarat [ORNL; Kumar, Jitendra [ORNL; Mills, Richard T. [Argonne National Laboratory; Hoffman, Forrest M. [ORNL; Sripathi, Vamsi [Intel Corporation; Hargrove, William Walter [United States Department of Agriculture (USDA), United States Forest Service (USFS)

    2017-09-01

    A proliferation of data from vast networks of remote sensing platforms (satellites, unmanned aircraft systems (UAS), airborne etc.), observational facilities (meteorological, eddy covariance etc.), state-of-the-art sensors, and simulation models offers unprecedented opportunities for scientific discovery. Unsupervised classification is a widely applied data mining approach to derive insights from such data. However, classification of very large data sets is a complex computational problem that requires efficient numerical algorithms and implementations on high performance computing (HPC) platforms. Additionally, increasing power, space, cooling and efficiency requirements have led to the deployment of hybrid supercomputing platforms with complex architectures and memory hierarchies like the Titan system at Oak Ridge National Laboratory. The advent of such accelerated computing architectures offers new challenges and opportunities for big data analytics in general and specifically, large scale cluster analysis in our case. Although there is an existing body of work on parallel cluster analysis, those approaches do not fully meet the needs imposed by the nature and size of our large data sets. Moreover, they had scaling limitations and were mostly limited to traditional distributed memory computing platforms. We present a parallel Multivariate Spatio-Temporal Clustering (MSTC) technique based on k-means cluster analysis that can target hybrid supercomputers like Titan. We developed a hybrid MPI, CUDA and OpenACC implementation that can utilize both CPU and GPU resources on computational nodes. We describe performance results on Titan that demonstrate the scalability and efficacy of our approach in processing large ecological data sets.

  18. Functional Neuroimaging Distinguishes Posttraumatic Stress Disorder from Traumatic Brain Injury in Focused and Large Community Datasets

    OpenAIRE

    Amen, Daniel G.; Raji, Cyrus A.; Willeumier, Kristen; Taylor, Derek; Tarzwell, Robert; Newberg, Andrew; Henderson, Theodore A.

    2015-01-01

    Background Traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD) are highly heterogeneous and often present with overlapping symptomology, providing challenges in reliable classification and treatment. Single photon emission computed tomography (SPECT) may be advantageous in the diagnostic separation of these disorders when comorbid or clinically indistinct. Methods Subjects were selected from a multisite database, where rest and on-task SPECT scans were obtained on a large gr...

  19. The Management Challenge: Handling Exams Involving Large Quantities of Students, on and off Campus--A Design Concept

    Science.gov (United States)

    Larsson, Ken

    2014-01-01

    This paper looks at the process of managing large numbers of exams efficiently and securely with the use of dedicated IT support. The system integrates regulations on different levels, from national to local (even down to departments), and ensures that the rules are employed in all stages of handling the exams. The system has a proven record of…

  20. MilxXplore: a web-based system to explore large imaging datasets.

    Science.gov (United States)

    Bourgeat, P; Dore, V; Villemagne, V L; Rowe, C C; Salvado, O; Fripp, J

    2013-01-01

    As large-scale medical imaging studies are becoming more common, there is an increasing reliance on automated software to extract quantitative information from these images. As the size of the cohorts keeps increasing with large studies, there is also a need for tools that allow results from automated image processing and analysis to be presented in a way that enables fast and efficient quality checking, tagging and reporting on cases in which automatic processing failed or was problematic. MilxXplore is an open source visualization platform, which provides an interface to navigate and explore imaging data in a web browser, giving the end user the opportunity to perform quality control and reporting in a user-friendly, collaborative and efficient way. Compared to existing software solutions that often provide an overview of the results at the subject's level, MilxXplore pools the results of individual subjects and time points together, allowing easy and efficient navigation and browsing through the different acquisitions of a subject over time, and comparing the results against the rest of the population. MilxXplore is fast, flexible and allows remote quality checks of processed imaging data, facilitating data sharing and collaboration across multiple locations, and can be easily integrated into a cloud computing pipeline. With the growing trend of open data and open science, such a tool will become increasingly important to share and publish results of imaging analysis.

  1. A highly efficient multi-core algorithm for clustering extremely large datasets

    Directory of Open Access Journals (Sweden)

    Kraus Johann M

    2010-04-01

    Full Text Available Abstract Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorical SNP data. Our new shared memory parallel algorithms are highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer.
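
    For illustration, a shared-memory parallel k-means can be sketched in Python with multiprocessing; this is not the authors' Java/transactional-memory implementation, just the same idea of splitting the assignment step across cores:

```python
# Minimal multi-core sketch (not the authors' Java/transactional-memory code):
# parallelize the k-means assignment step over chunks of rows with multiprocessing.
import numpy as np
from multiprocessing import Pool

def assign_chunk(args):
    chunk, centroids = args
    d = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def parallel_kmeans(X, k, iters=20, workers=4, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    chunks = np.array_split(X, workers)
    with Pool(workers) as pool:
        for _ in range(iters):
            labels = np.concatenate(
                pool.map(assign_chunk, [(c, centroids) for c in chunks]))
            new_centroids = []
            for j in range(k):
                members = X[labels == j]
                new_centroids.append(members.mean(axis=0) if len(members) else centroids[j])
            centroids = np.array(new_centroids)
    return labels, centroids

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(m, 1.0, size=(20_000, 20)) for m in (-5, 0, 5)])
    labels, _ = parallel_kmeans(X, k=3)
    print(np.bincount(labels))
```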

  2. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    International Nuclear Information System (INIS)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu; Yang, Jian-Hua

    2015-01-01

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.
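
    The elementary operation behind calling such relationships is intersecting binding-site intervals with lncRNA coordinates. A toy Python sketch (coordinates invented) is shown below:

```python
# Sketch of the core operation behind linking CLIP-Seq peaks to lncRNAs:
# intersect RBP binding-site intervals with lncRNA gene coordinates (toy data).
from collections import defaultdict

# (chrom, start, end, name) -- illustrative coordinates only
lncrnas = [("chr1", 1000, 5000, "lncRNA-A"), ("chr1", 8000, 9500, "lncRNA-B")]
clip_peaks = [("chr1", 1200, 1250, "RBP1"), ("chr1", 4900, 4960, "RBP2"),
              ("chr1", 9100, 9140, "RBP1"), ("chr2", 500, 560, "RBP1")]

interactions = defaultdict(set)
for chrom, g_start, g_end, gene in lncrnas:
    for p_chrom, p_start, p_end, rbp in clip_peaks:
        if p_chrom == chrom and p_start < g_end and p_end > g_start:
            interactions[gene].add(rbp)

for gene, rbps in interactions.items():
    print(gene, "bound by", sorted(rbps))
# Real analyses use indexed interval structures (e.g. interval trees) to scale
# to millions of peaks across many datasets.
```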

  3. Discovery of Protein–lncRNA Interactions by Integrating Large-Scale CLIP-Seq and RNA-Seq Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Li, Jun-Hao; Liu, Shun; Zheng, Ling-Ling; Wu, Jie; Sun, Wen-Ju; Wang, Ze-Lin; Zhou, Hui; Qu, Liang-Hu, E-mail: lssqlh@mail.sysu.edu.cn; Yang, Jian-Hua, E-mail: lssqlh@mail.sysu.edu.cn [RNA Information Center, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Sun Yat-sen University, Guangzhou (China)

    2015-01-14

    Long non-coding RNAs (lncRNAs) are emerging as important regulatory molecules in developmental, physiological, and pathological processes. However, the precise mechanisms and functions of most lncRNAs remain largely unknown. Recent advances in high-throughput sequencing of immunoprecipitated RNAs after cross-linking (CLIP-Seq) provide powerful ways to identify biologically relevant protein–lncRNA interactions. In this study, by analyzing millions of RNA-binding protein (RBP) binding sites from 117 CLIP-Seq datasets generated by 50 independent studies, we identified 22,735 RBP–lncRNA regulatory relationships. We found that one single lncRNA will generally be bound and regulated by one or multiple RBPs, the combination of which may coordinately regulate gene expression. We also revealed the expression correlation of these interaction networks by mining expression profiles of over 6000 normal and tumor samples from 14 cancer types. Our combined analysis of CLIP-Seq data and genome-wide association studies data discovered hundreds of disease-related single nucleotide polymorphisms residing in the RBP binding sites of lncRNAs. Finally, we developed interactive web implementations to provide visualization, analysis, and downloading of the aforementioned large-scale datasets. Our study represented an important step in identification and analysis of RBP–lncRNA interactions and showed that these interactions may play crucial roles in cancer and genetic diseases.

  4. EEGVIS: A MATLAB toolbox for browsing, exploring, and viewing large datasets

    Directory of Open Access Journals (Sweden)

    Kay A Robbins

    2012-05-01

    Full Text Available Recent advances in data monitoring and sensor technology have accelerated the acquisition of very large data sets. Streaming data sets from instrumentation such as multi-channel EEG recording usually must undergo substantial pre-processing and artifact removal. Even when using automated procedures, most scientists engage in laborious manual examination and processing to assure high quality data and to identify interesting or problematic data segments. Researchers also do not have a convenient method of visually assessing the effects of applying any stage in a processing pipeline. EEGVIS is a MATLAB toolbox that allows users to quickly explore multi-channel EEG and other large array-based data sets using multi-scale drill-down techniques. Customizable summary views reveal potentially interesting sections of data, which users can explore further by clicking to examine using detailed viewing components. The viewer and a companion browser are built on our MoBBED framework, which has a library of modular viewing components that can be mixed and matched to best reveal structure. Users can easily create new viewers for their specific data without any programming during the exploration process. These viewers automatically support pan, zoom, resizing of individual components, and cursor exploration. The toolbox can be used directly in MATLAB at any stage in a processing pipeline, as a plug-in for EEGLAB, or as a standalone precompiled application without MATLAB running. EEGVIS and its supporting packages are freely available under the GNU general public license at http://visual.cs.utsa.edu/eegvis.

  5. EEGVIS: A MATLAB Toolbox for Browsing, Exploring, and Viewing Large Datasets.

    Science.gov (United States)

    Robbins, Kay A

    2012-01-01

    Recent advances in data monitoring and sensor technology have accelerated the acquisition of very large data sets. Streaming data sets from instrumentation such as multi-channel EEG recording usually must undergo substantial pre-processing and artifact removal. Even when using automated procedures, most scientists engage in laborious manual examination and processing to assure high quality data and to identify interesting or problematic data segments. Researchers also do not have a convenient method of visually assessing the effects of applying any stage in a processing pipeline. EEGVIS is a MATLAB toolbox that allows users to quickly explore multi-channel EEG and other large array-based data sets using multi-scale drill-down techniques. Customizable summary views reveal potentially interesting sections of data, which users can explore further by clicking to examine using detailed viewing components. The viewer and a companion browser are built on our MoBBED framework, which has a library of modular viewing components that can be mixed and matched to best reveal structure. Users can easily create new viewers for their specific data without any programming during the exploration process. These viewers automatically support pan, zoom, resizing of individual components, and cursor exploration. The toolbox can be used directly in MATLAB at any stage in a processing pipeline, as a plug-in for EEGLAB, or as a standalone precompiled application without MATLAB running. EEGVIS and its supporting packages are freely available under the GNU general public license at http://visual.cs.utsa.edu/eegvis.

  6. Calculating p-values and their significances with the Energy Test for large datasets

    Science.gov (United States)

    Barter, W.; Burr, C.; Parkes, C.

    2018-04-01

    The energy test method is a multi-dimensional test of whether two samples are consistent with arising from the same underlying population, through the calculation of a single test statistic (called the T-value). The method has recently been used in particle physics to search for samples that differ due to CP violation. The generalised extreme value function has previously been used to describe the distribution of T-values under the null hypothesis that the two samples are drawn from the same underlying population. We show that, in a simple test case, the distribution is not sufficiently well described by the generalised extreme value function. We present a new method, where the distribution of T-values under the null hypothesis when comparing two large samples can be found by scaling the distribution found when comparing small samples drawn from the same population. This method can then be used to quickly calculate the p-values associated with the results of the test.
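
    As an illustration of the statistic itself (normalization conventions vary; this follows one common choice with a Gaussian distance weighting), a small NumPy sketch of the T-value is:

```python
# Sketch of the energy-test T-value for two samples, using a Gaussian distance
# weighting; normalization conventions vary, so this follows one common choice.
import numpy as np

def psi(a, b, sigma):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def t_value(x, y, sigma=0.5):
    n, m = len(x), len(y)
    t_xx = (psi(x, x, sigma).sum() - n) / (2.0 * n * (n - 1))   # exclude self-pairs
    t_yy = (psi(y, y, sigma).sum() - m) / (2.0 * m * (m - 1))
    t_xy = psi(x, y, sigma).sum() / (n * m)
    return t_xx + t_yy - t_xy

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=(500, 2))
y = rng.normal(0.2, 1.0, size=(500, 2))          # slightly shifted population
print("T =", t_value(x, y))
# A p-value follows by comparing T against its distribution under permutations
# of the combined sample -- the regime the paper's scaling method accelerates.
```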

  7. GEMINI: a computationally-efficient search engine for large gene expression datasets.

    Science.gov (United States)

    DeFreitas, Timothy; Saddiki, Hachem; Flaherty, Patrick

    2016-02-24

    Low-cost DNA sequencing allows organizations to accumulate massive amounts of genomic data and use that data to answer a diverse range of research questions. Presently, users must search for relevant genomic data using a keyword, accession number or meta-data tag. However, in this search paradigm the form of the query - a text-based string - is mismatched with the form of the target - a genomic profile. To improve access to massive genomic data resources, we have developed a fast search engine, GEMINI, that uses a genomic profile as a query to search for similar genomic profiles. GEMINI implements a nearest-neighbor search algorithm using a vantage-point tree to store a database of n profiles and in certain circumstances achieves an O(log n) expected query time in the limit. We tested GEMINI on breast and ovarian cancer gene expression data from The Cancer Genome Atlas project and show that it achieves a query time that scales as the logarithm of the number of records in practice on genomic data. In a database with 10^5 samples, GEMINI identifies the nearest neighbor in 0.05 sec compared to a brute force search time of 0.6 sec. GEMINI is a fast search engine that uses a query genomic profile to search for similar profiles in a very large genomic database. It enables users to identify similar profiles independent of sample label, data origin or other meta-data information.
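
    A compact vantage-point tree can be sketched in Python to show the idea; this is not the GEMINI code, and the toy "profiles" are random vectors:

```python
# Minimal vantage-point tree sketch (not the GEMINI implementation): index
# profiles by Euclidean distance and answer nearest-neighbour queries without
# scanning the whole database.
import numpy as np

class VPNode:
    __slots__ = ("point", "radius", "inside", "outside")
    def __init__(self, point, radius, inside, outside):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside

def build(points):
    if len(points) == 0:
        return None
    vp, rest = points[0], points[1:]
    if len(rest) == 0:
        return VPNode(vp, 0.0, None, None)
    d = np.linalg.norm(rest - vp, axis=1)
    radius = np.median(d)
    return VPNode(vp, radius, build(rest[d <= radius]), build(rest[d > radius]))

def nearest(node, query, best=(None, np.inf)):
    if node is None:
        return best
    d = np.linalg.norm(node.point - query)
    if d < best[1]:
        best = (node.point, d)
    near, far = ((node.inside, node.outside) if d <= node.radius
                 else (node.outside, node.inside))
    best = nearest(near, query, best)
    if abs(d - node.radius) < best[1]:          # only cross the boundary if needed
        best = nearest(far, query, best)
    return best

rng = np.random.default_rng(6)
profiles = rng.normal(size=(2000, 50))          # toy expression profiles
tree = build(profiles)
query = profiles[123] + rng.normal(scale=0.01, size=50)
point, dist = nearest(tree, query)
print("nearest profile distance:", round(float(dist), 4))
```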

  8. BigWig and BigBed: enabling browsing of large distributed datasets.

    Science.gov (United States)

    Kent, W J; Zweig, A S; Barber, G; Hinrichs, A S; Karolchik, D

    2010-09-01

    BigWig and BigBed files are compressed binary indexed files containing data at several resolutions that allow the high-performance display of next-generation sequencing experiment results in the UCSC Genome Browser. The visualization is implemented using a multi-layered software approach that takes advantage of specific capabilities of web-based protocols, Linux and UNIX operating system files, R trees and various indexing and compression tricks. As a result, only the data needed to support the current browser view is transmitted rather than the entire file, enabling fast remote access to large distributed data sets. Binaries for the BigWig and BigBed creation and parsing utilities may be downloaded at http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/. Source code for the creation and visualization software is freely available for non-commercial use at http://hgdownload.cse.ucsc.edu/admin/jksrc.zip, implemented in C and supported on Linux. The UCSC Genome Browser is available at http://genome.ucsc.edu.
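
    Assuming the pyBigWig package and a hypothetical file name, partial access to a BigWig file looks roughly like this:

```python
# Sketch of remote/partial access to a BigWig file using the pyBigWig package
# (file name hypothetical): only the indexed blocks needed for a query are read.
import pyBigWig

bw = pyBigWig.open("signal.bw")                 # local path or an http(s) URL
print(bw.chroms())                              # chromosome sizes in the file

# Summary statistics over a region, served from pre-computed zoom levels.
print(bw.stats("chr1", 0, 1_000_000, type="mean", nBins=10))

# Base-resolution values for a small window only.
values = bw.values("chr1", 10_000, 10_100)
print(len(values), "values fetched")
bw.close()
```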

  9. DnaSAM: Software to perform neutrality testing for large datasets with complex null models.

    Science.gov (United States)

    Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B

    2010-05-01

    Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments along with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model that are stored in easy to manipulate text file. © 2009 Blackwell Publishing Ltd.

  10. Functional Neuroimaging Distinguishes Posttraumatic Stress Disorder from Traumatic Brain Injury in Focused and Large Community Datasets.

    Science.gov (United States)

    Amen, Daniel G; Raji, Cyrus A; Willeumier, Kristen; Taylor, Derek; Tarzwell, Robert; Newberg, Andrew; Henderson, Theodore A

    2015-01-01

    Traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD) are highly heterogeneous and often present with overlapping symptomology, providing challenges in reliable classification and treatment. Single photon emission computed tomography (SPECT) may be advantageous in the diagnostic separation of these disorders when comorbid or clinically indistinct. Subjects were selected from a multisite database, where rest and on-task SPECT scans were obtained on a large group of neuropsychiatric patients. Two groups were analyzed: Group 1 with TBI (n=104), PTSD (n=104) or both (n=73) closely matched for demographics and comorbidity, compared to each other and healthy controls (N=116); Group 2 with TBI (n=7,505), PTSD (n=1,077) or both (n=1,017) compared to n=11,147 without either. ROIs and visual readings (VRs) were analyzed using a binary logistic regression model with predicted probabilities inputted into a Receiver Operating Characteristic analysis to identify sensitivity, specificity, and accuracy. One-way ANOVA identified the most diagnostically significant regions of increased perfusion in PTSD compared to TBI. Analysis included a 10-fold cross validation of the protocol in the larger community sample (Group 2). For Group 1, baseline and on-task ROIs and VRs showed a high level of accuracy in differentiating PTSD, TBI and PTSD+TBI conditions. This carefully matched group separated with 100% sensitivity, specificity and accuracy for the ROI analysis and at 89% or above for VRs. Group 2 had lower sensitivity, specificity and accuracy, but still in a clinically relevant range. Compared to subjects with TBI, PTSD showed increases in the limbic regions, cingulum, basal ganglia, insula, thalamus, prefrontal cortex and temporal lobes. This study demonstrates the ability to separate PTSD and TBI from healthy controls, from each other, and detect their co-occurrence, even in highly comorbid samples, using SPECT. This modality may offer a clinical option for aiding
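
    The analysis pattern described (binary logistic regression on ROI features, predicted probabilities fed into an ROC analysis) can be sketched generically with scikit-learn; the data below are synthetic stand-ins, not the study's SPECT measurements:

```python
# Generic sketch of the analysis pattern described above: binary logistic
# regression on ROI perfusion features, with cross-validated predicted
# probabilities fed into an ROC analysis (synthetic data only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n, n_roi = 400, 12
X = rng.normal(size=(n, n_roi))                    # ROI perfusion features (toy)
y = rng.integers(0, 2, size=n)                     # 0 = TBI, 1 = PTSD (toy labels)
X[y == 1, :4] += 0.8                               # PTSD-like increases in a few ROIs

model = LogisticRegression(max_iter=1000)
probs = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]

auc = roc_auc_score(y, probs)
fpr, tpr, thresholds = roc_curve(y, probs)
j = np.argmax(tpr - fpr)                           # Youden's J as an operating point
print(f"10-fold AUC = {auc:.2f}; sensitivity = {tpr[j]:.2f}, specificity = {1 - fpr[j]:.2f}")
```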

  11. Functional Neuroimaging Distinguishes Posttraumatic Stress Disorder from Traumatic Brain Injury in Focused and Large Community Datasets.

    Directory of Open Access Journals (Sweden)

    Daniel G Amen

    Full Text Available Traumatic brain injury (TBI) and posttraumatic stress disorder (PTSD) are highly heterogeneous and often present with overlapping symptomology, providing challenges in reliable classification and treatment. Single photon emission computed tomography (SPECT) may be advantageous in the diagnostic separation of these disorders when comorbid or clinically indistinct. Subjects were selected from a multisite database, where rest and on-task SPECT scans were obtained on a large group of neuropsychiatric patients. Two groups were analyzed: Group 1 with TBI (n=104), PTSD (n=104) or both (n=73) closely matched for demographics and comorbidity, compared to each other and healthy controls (N=116); Group 2 with TBI (n=7,505), PTSD (n=1,077) or both (n=1,017) compared to n=11,147 without either. ROIs and visual readings (VRs) were analyzed using a binary logistic regression model with predicted probabilities inputted into a Receiver Operating Characteristic analysis to identify sensitivity, specificity, and accuracy. One-way ANOVA identified the most diagnostically significant regions of increased perfusion in PTSD compared to TBI. Analysis included a 10-fold cross validation of the protocol in the larger community sample (Group 2). For Group 1, baseline and on-task ROIs and VRs showed a high level of accuracy in differentiating PTSD, TBI and PTSD+TBI conditions. This carefully matched group separated with 100% sensitivity, specificity and accuracy for the ROI analysis and at 89% or above for VRs. Group 2 had lower sensitivity, specificity and accuracy, but still in a clinically relevant range. Compared to subjects with TBI, PTSD showed increases in the limbic regions, cingulum, basal ganglia, insula, thalamus, prefrontal cortex and temporal lobes. This study demonstrates the ability to separate PTSD and TBI from healthy controls, from each other, and detect their co-occurrence, even in highly comorbid samples, using SPECT. This modality may offer a clinical option for

  12. COINS: An innovative informatics and neuroimaging tool suite built for large heterogeneous datasets

    Directory of Open Access Journals (Sweden)

    Adam eScott

    2011-12-01

    Full Text Available The availability of well-characterized neuroimaging data with large numbers of subjects, especially for clinical populations, is critical to advancing our understanding of the healthy and diseased brain. Such data enables questions to be answered in a much more generalizable manner and also has the potential to yield solutions derived from novel methods that were conceived after the original studies' implementation. Though there is currently growing interest in data sharing, the neuroimaging community has been struggling for years with how to best encourage sharing data across brain imaging studies. With the advent of studies that are much more consistent across sites (e.g., resting fMRI, diffusion tensor imaging, and structural imaging), the potential of pooling data across studies continues to gain momentum. At the Mind Research Network (MRN), we have developed the COllaborative Informatics and Neuroimaging Suite (COINS; http://coins.mrn.org) to provide researchers with an information system based on an open-source model that includes web-based tools to manage studies, subjects, imaging, clinical data and other assessments. The system currently hosts data from 9 institutions, over 300 studies, over 14,000 subjects, and over 19,000 MRI, MEG, and EEG scan sessions in addition to more than 180,000 clinical assessments. In this paper we provide a description of COINS with comparison to a valuable and popular system known as XNAT. Although there are many similarities between COINS and other electronic data management systems, the differences that may concern researchers in the context of multi-site, multi-organizational data-sharing environments with intuitive ease of use and PHI security are emphasized as important attributes.

  13. A high-throughput system for high-quality tomographic reconstruction of large datasets at Diamond Light Source.

    Science.gov (United States)

    Atwood, Robert C; Bodey, Andrew J; Price, Stephen W T; Basham, Mark; Drakopoulos, Michael

    2015-06-13

    Tomographic datasets collected at synchrotrons are becoming very large and complex, and, therefore, need to be managed efficiently. Raw images may have high pixel counts, and each pixel can be multidimensional and associated with additional data such as those derived from spectroscopy. In time-resolved studies, hundreds of tomographic datasets can be collected in sequence, yielding terabytes of data. Users of tomographic beamlines are drawn from various scientific disciplines, and many are keen to use tomographic reconstruction software that does not require a deep understanding of reconstruction principles. We have developed Savu, a reconstruction pipeline that enables users to rapidly reconstruct data to consistently create high-quality results. Savu is designed to work in an 'orthogonal' fashion, meaning that data can be converted between projection and sinogram space throughout the processing workflow as required. The Savu pipeline is modular and allows processing strategies to be optimized for users' purposes. In addition to the reconstruction algorithms themselves, it can include modules for identification of experimental problems, artefact correction, general image processing and data quality assessment. Savu is open source, open licensed and 'facility-independent': it can run on standard cluster infrastructure at any institution.
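
    The "orthogonal" projection/sinogram view can be illustrated with plain NumPy slicing of a toy projection stack (dimensions invented):

```python
# Sketch of the "orthogonal" view of tomographic data: one 3-D block of
# projections can be sliced as radiographs (projection space) or as sinograms.
import numpy as np

n_angles, n_rows, n_cols = 900, 128, 160
projections = np.random.default_rng(8).random((n_angles, n_rows, n_cols))

radiograph = projections[0]          # one projection image: (rows, cols)
sinogram = projections[:, 64, :]     # one detector row across all angles: (angles, cols)

print("projection-space slice:", radiograph.shape)
print("sinogram-space slice:  ", sinogram.shape)
# A pipeline like Savu passes data between these two layouts so that each
# processing module (flat-field correction, ring removal, reconstruction, ...)
# can operate in whichever space it needs.
```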

  14. Exact fast computation of band depth for large functional datasets: How quickly can one million curves be ranked?

    KAUST Repository

    Sun, Ying

    2012-10-01

    © 2012 John Wiley & Sons, Ltd. Band depth is an important nonparametric measure that generalizes order statistics and makes univariate methods based on order statistics possible for functional data. However, the computational burden of band depth limits its applicability when large functional or image datasets are considered. This paper proposes an exact fast method to speed up the band depth computation when bands are defined by two curves. Remarkable computational gains are demonstrated through simulation studies comparing our proposal with the original computation and one existing approximate method. For example, we report an experiment where our method can rank one million curves, evaluated at fifty time points each, in 12.4 seconds with Matlab.

  15. Geostatistics for Large Datasets

    KAUST Repository

    Sun, Ying

    2011-10-31


  16. Geostatistics for Large Datasets

    KAUST Repository

    Sun, Ying; Li, Bo; Genton, Marc G.

    2011-01-01


  17. CImbinator: a web-based tool for drug synergy analysis in small- and large-scale datasets.

    Science.gov (United States)

    Flobak, Åsmund; Vazquez, Miguel; Lægreid, Astrid; Valencia, Alfonso

    2017-08-01

    Drug synergies are sought in order to identify drug combinations that are particularly beneficial. User-friendly software solutions that can assist analysis of large-scale datasets are required. CImbinator is a web-service that can aid in batch-wise and in-depth analyses of data from small-scale and large-scale drug combination screens. CImbinator can quantify drug combination effects using both the commonly employed median-effect equation and more advanced mathematical models describing dose-response relationships. CImbinator is written in Ruby and R. It uses the R package drc for advanced drug response modeling. CImbinator is available at http://cimbinator.bioinfo.cnio.es , the source-code is open and available at https://github.com/Rbbt-Workflows/combination_index . A Docker image is also available at https://hub.docker.com/r/mikisvaz/rbbt-ci_mbinator/ . asmund.flobak@ntnu.no or miguel.vazquez@cnio.es. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
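
    The median-effect and combination-index calculations that such tools automate can be sketched as follows (parameter values are illustrative, not fitted to any real screen):

```python
# Sketch of the Chou-Talalay median-effect / combination-index calculation that
# tools like CImbinator automate (all parameters below are illustrative).
def dose_for_effect(fa, dm, m):
    """Median-effect equation: dose producing affected fraction fa."""
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)

def combination_index(fa, d1, d2, dm1, m1, dm2, m2):
    """CI < 1 suggests synergy, CI = 1 additivity, CI > 1 antagonism."""
    return d1 / dose_for_effect(fa, dm1, m1) + d2 / dose_for_effect(fa, dm2, m2)

# Hypothetical single-drug fits (Dm = median-effect dose, m = slope) and a
# combination dose (d1, d2) observed to give 60% effect.
ci = combination_index(fa=0.60, d1=1.0, d2=2.0, dm1=2.5, m1=1.1, dm2=6.0, m2=0.9)
print(f"combination index at 60% effect: {ci:.2f}")
```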

  18. Large-scale groundwater modeling using global datasets: a test case for the Rhine-Meuse basin

    Directory of Open Access Journals (Sweden)

    E. H. Sutanudjaja

    2011-09-01

    Full Text Available The current generation of large-scale hydrological models does not include a groundwater flow component. Large-scale groundwater models, involving aquifers and basins of multiple countries, are still rare mainly due to a lack of hydro-geological data which are usually only available in developed countries. In this study, we propose a novel approach to construct large-scale groundwater models by using global datasets that are readily available. As the test-bed, we use the combined Rhine-Meuse basin that contains groundwater head data used to verify the model output. We start by building a distributed land surface model (30 arc-second resolution) to estimate groundwater recharge and river discharge. Subsequently, a MODFLOW transient groundwater model is built and forced by the recharge and surface water levels calculated by the land surface model. Results are promising despite the fact that we still use an offline procedure to couple the land surface and MODFLOW groundwater models (i.e. the simulations of both models are separately performed). The simulated river discharges compare well to the observations. Moreover, based on our sensitivity analysis, in which we run several groundwater model scenarios with various hydro-geological parameter settings, we observe that the model can reasonably well reproduce the observed groundwater head time series. However, we note that there are still some limitations in the current approach, specifically because the offline-coupling technique simplifies the dynamic feedbacks between surface water levels and groundwater heads, and between soil moisture states and groundwater heads. Also the current sensitivity analysis ignores the uncertainty of the land surface model output. Despite these limitations, we argue that the results of the current model show a promise for large-scale groundwater modeling practices, including for data-poor environments and at the global scale.

  19. Handling Large and Complex Data in a Photovoltaic Research Institution Using a Custom Laboratory Information Management System

    Energy Technology Data Exchange (ETDEWEB)

    White, Robert R.; Munch, Kristin

    2014-01-01

    Twenty-five years ago the desktop computer started becoming ubiquitous in the scientific lab. Researchers were delighted with its ability to both control instrumentation and acquire data on a single system, but they were not completely satisfied. There were often gaps in knowledge that they thought might be gained if they just had more data and they could get the data faster. Computer technology has evolved in keeping with Moore’s Law meeting those desires; however those improvements have of late become both a boon and bane for researchers. Computers are now capable of producing high speed data streams containing terabytes of information; capabilities that evolved faster than envisioned last century. Software to handle large scientific data sets has not kept up. How much information might be lost through accidental mismanagement or how many discoveries are missed through data overload are now vital questions. An important new task in most scientific disciplines involves developing methods to address those issues and to create the software that can handle large data sets with an eye towards scalability. This software must create archived, indexed, and searchable data from heterogeneous instrumentation for the implementation of a strong data-driven materials development strategy. At the National Center for Photovoltaics in the National Renewable Energy Laboratory, we began development a few years ago on a Laboratory Information Management System (LIMS) designed to handle lab-wide scientific data acquisition, management, processing and mining needs for physics and materials science data, and with a specific focus towards future scalability for new equipment or research focuses. We will present the decisions, processes, and problems we went through while building our LIMS system for materials research, its current operational state and our steps for future development.

  20. An efficient method to handle the 'large p, small n' problem for ...

    Indian Academy of Sciences (India)

    So-called 'large p small n' or 'short-fat data' problem can occur if the number of ... (SNPs) in GWAS based on the Haseman–Elston regression. (H–E) (DeFries 2010). ..... For instance, if a population is admixed or genetic heterogeneity, the ...

  1. Solving the challenges of data preprocessing, uploading, archiving, retrieval, analysis and visualization for large heterogeneous paleo- and rock magnetic datasets

    Science.gov (United States)

    Minnett, R.; Koppers, A. A.; Tauxe, L.; Constable, C.; Jarboe, N. A.

    2011-12-01

    The Magnetics Information Consortium (MagIC) provides an archive for the wealth of rock- and paleomagnetic data and interpretations from studies on natural and synthetic samples. As with many fields, most peer-reviewed paleo- and rock magnetic publications only include high-level results. However, access to the raw data from which these results were derived is critical for compilation studies and when updating results based on new interpretation and analysis methods. MagIC provides a detailed metadata model with places for everything from raw measurements to their interpretations. Prior to MagIC, these raw data were extremely cumbersome to collect because they mostly existed in a lab's proprietary format on investigators' personal computers or undigitized in field notebooks. MagIC has developed a suite of offline and online tools to enable the paleomagnetic, rock magnetic, and affiliated scientific communities to easily contribute both their previously published data and data supporting an article undergoing peer-review, to retrieve well-annotated published interpretations and raw data, and to analyze and visualize large collections of published data online. Here we present the technology we chose (including VBA in Excel spreadsheets, Python libraries, FastCGI JSON webservices, Oracle procedures, and jQuery user interfaces) and how we implemented it in order to serve the scientific community as seamlessly as possible. These tools are now in use in labs worldwide, have helped archive many valuable legacy studies and datasets, and routinely enable new contributions to the MagIC Database (http://earthref.org/MAGIC/).

  2. Insights into SCP/TAPS proteins of liver flukes based on large-scale bioinformatic analyses of sequence datasets.

    Directory of Open Access Journals (Sweden)

    Cinzia Cantacessi

    Full Text Available BACKGROUND: SCP/TAPS proteins of parasitic helminths have been proposed to play key roles in fundamental biological processes linked to the invasion of and establishment in their mammalian host animals, such as the transition from free-living to parasitic stages and the modulation of host immune responses. Despite the evidence that SCP/TAPS proteins of parasitic nematodes are involved in host-parasite interactions, there is a paucity of information on this protein family for parasitic trematodes of socio-economic importance. METHODOLOGY/PRINCIPAL FINDINGS: We conducted the first large-scale study of SCP/TAPS proteins of a range of parasitic trematodes of both human and veterinary importance (including the liver flukes Clonorchis sinensis, Opisthorchis viverrini, Fasciola hepatica and F. gigantica as well as the blood flukes Schistosoma mansoni, S. japonicum and S. haematobium. We mined all current transcriptomic and/or genomic sequence datasets from public databases, predicted secondary structures of full-length protein sequences, undertook systematic phylogenetic analyses and investigated the differential transcription of SCP/TAPS genes in O. viverrini and F. hepatica, with an emphasis on those that are up-regulated in the developmental stages infecting the mammalian host. CONCLUSIONS: This work, which sheds new light on SCP/TAPS proteins, guides future structural and functional explorations of key SCP/TAPS molecules associated with diseases caused by flatworms. Future fundamental investigations of these molecules in parasites and the integration of structural and functional data could lead to new approaches for the control of parasitic diseases.

  3. A remote handling rate-position controller for telemanipulating in a large workspace

    International Nuclear Information System (INIS)

    Barrio, Jorge; Ferre, Manuel; Suárez-Ruiz, Francisco; Aracil, Rafael

    2014-01-01

    This paper presents a new haptic rate-position controller, which allows manipulating a slave robot in a large workspace using a small haptic device. This control algorithm is very effective when the master device is much smaller than the slave device. Haptic information is displayed to the user to signal when a change in the operation mode occurs. This controller allows performing tasks in a large remote workspace by using a haptic device with a reduced workspace, such as the Phantom. Experimental results have been obtained using a slave robot from Kraft Telerobotics and a commercial haptic interface as the master device. A curvature path-following task was simulated using the proposed controller and compared with a force-position control algorithm. The results show that the proposed method achieves higher accuracy while taking a similar amount of time to perform the task.
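    The following simplified 1-D sketch illustrates the general idea of a rate-position scheme (not the authors' exact control law): inside an inner region the master commands slave position directly, and beyond a threshold it commands slave velocity, so a small master workspace can cover a large slave workspace. All numerical values are arbitrary.

```python
# Simplified 1-D rate-position mapping: position mode inside an inner region,
# rate (velocity) mode beyond it. Gains and thresholds are illustrative only.
R_POS = 0.06      # inner radius of the master workspace used for position mode [m]
K_SCALE = 4.0     # position scaling, master -> slave
K_RATE = 2.0      # velocity gain in rate mode [1/s]
DT = 0.01         # control period [s]

def rate_position_step(x_master, offset):
    """Map a master position to a slave position; returns (slave_pos, offset, rate_mode)."""
    if abs(x_master) <= R_POS:                            # position mode
        return offset + K_SCALE * x_master, offset, False
    # rate mode: the workspace offset drifts with a velocity proportional to the overshoot
    sign = 1.0 if x_master > 0 else -1.0
    offset += K_RATE * (x_master - sign * R_POS) * DT
    return offset + K_SCALE * sign * R_POS, offset, True

offset = 0.0
for x_m in [0.02, 0.05, 0.09, 0.09, 0.09, 0.03]:          # a made-up master trajectory [m]
    slave, offset, rate_mode = rate_position_step(x_m, offset)
    print(f"master={x_m:+.2f} m  slave={slave:+.3f} m  rate mode={rate_mode}")
```

    In a real implementation the mode switch would also trigger the haptic cue mentioned in the abstract, for instance a small force pulse rendered on the master device.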

  4. A remote handling rate-position controller for telemanipulating in a large workspace

    Energy Technology Data Exchange (ETDEWEB)

    Barrio, Jorge, E-mail: jordi.barrio@upm.es; Ferre, Manuel, E-mail: m.ferre@upm.es; Suárez-Ruiz, Francisco, E-mail: fa.suarez@upm.es; Aracil, Rafael, E-mail: rafael.aracil@upm.es

    2014-01-15

    This paper presents a new haptic rate-position controller, which allows manipulating a slave robot in a large workspace using a small haptic device. This control algorithm is very effective when the master device is much smaller than the slave device. Haptic information is displayed to the user to signal when a change in the operation mode occurs. This controller allows performing tasks in a large remote workspace by using a haptic device with a reduced workspace, such as the Phantom. Experimental results have been obtained using a slave robot from Kraft Telerobotics and a commercial haptic interface as the master device. A curvature path-following task was simulated using the proposed controller and compared with a force-position control algorithm. The results show that the proposed method achieves higher accuracy while taking a similar amount of time to perform the task.

  5. Handling of corn stover bales for combustion in small and large furnaces

    Energy Technology Data Exchange (ETDEWEB)

    Morissette, R.; Savoie, P.; Villeneuve, J. [Agriculture and Agri-Food Canada, Quebec City, PQ (Canada)

    2010-07-01

    This paper reported on a study in which dry corn stover was baled and burned in 2 furnaces in the province of Quebec. Small and large rectangular bale formats were considered for direct combustion. The first combustion unit was a small 500,000 BTU/h dual-chamber log wood furnace located at a hay-growing farm in Neuville, Quebec. The heat was initially transferred to a hot water pipe system and then to a hot air exchanger to dry hay bales. The small stover bales were placed directly into the combustion furnace. The low density of the bales compared to log wood required filling up to 8 times more frequently. Stover bales produced an average of 6.4 per cent ash on a dry matter (DM) basis and required an automated system for ash removal. Combustion gas contained particulate matter levels greater than 1417 mg/m³, which is more than the local acceptable maximum of 600 mg/m³ for combustion furnaces. The second combustion unit was a high-capacity 12.5 million BTU/h single-chamber furnace located in Saint-Philippe-de-Néri, Quebec. It was used to generate steam for a feed pellet mill. Large corn stover bales were broken up and fed on a conveyor and through a screw auger to the furnace. The stover was light compared to the wood chips used in this furnace. For mechanical reasons, the stover could not be fed continuously to the furnace.

  6. Fuel handling alternatives to prepare for large scale fuel channel replacement

    International Nuclear Information System (INIS)

    Martire, S.; Sandu, I.

    2007-01-01

    It is desirable to reduce the duration of defuelling the reactor in preparation for retube, as the cost of replacement power is $750K/day. Three fast defuelling concepts are presented. With the Through Flow Defuelling method, the fuel string is hydraulically pushed into the downstream Fuelling Machine (FM) by flow passing through the fuel channel. The Long Stroke C Ram method replaces the FM C Ram with a longer one capable of pushing all fuel bundles into the receiving FM. The Defuelling Hardware concept uses an enhanced design of ram extensions that interconnect mechanically to extend the ram stroke and push fuel bundles into the receiving FM. This paper will present descriptions of each defuelling concept to prepare for Large Scale Fuel Channel Replacement. Advantages and disadvantages of each concept will be discussed and a recommendation will be made for future implementation. (author)

  7. Efficient Geometry and Data Handling for Large-Scale Monte Carlo - Thermal-Hydraulics Coupling

    Science.gov (United States)

    Hoogenboom, J. Eduard

    2014-06-01

    Detailed coupling of thermal-hydraulics calculations to Monte Carlo reactor criticality calculations requires each axial layer of each fuel pin to be defined separately in the input to the Monte Carlo code in order to assign to each volume the temperature according to the result of the TH calculation, and if the volume contains coolant, also the density of the coolant. This leads to huge input files for even small systems. In this paper a methodology for dynamic assignment of temperatures with respect to cross-section data is demonstrated to overcome this problem. The method is implemented in MCNP5. The method is verified for an infinite lattice with 3x3 BWR-type fuel pins with fuel, cladding and moderator/coolant explicitly modeled. For each pin 60 axial zones are considered with different temperatures and coolant densities. The results of the axial power distribution per fuel pin are compared to a standard MCNP5 run in which all 9x60 cells for fuel, cladding and coolant are explicitly defined and their respective temperatures determined from the TH calculation. Full agreement is obtained. For large-scale application, the method is demonstrated for an infinite lattice with 17x17 PWR-type fuel assemblies with 25 rods replaced by guide tubes. Again, all geometrical detail is retained. The method was used in a procedure for coupled Monte Carlo and thermal-hydraulics iterations. Using an optimised iteration technique, convergence was obtained in 11 iteration steps.
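    One common way to avoid one material definition per temperature, sketched below, is to interpolate linearly in the square root of temperature between the two bracketing library temperatures (equivalently, to mix the two library evaluations with those weights). This is a generic illustration, not necessarily the mechanism implemented in the paper or in MCNP5.

```python
# Sketch of sqrt(T) interpolation between cross-section libraries processed at
# discrete temperatures; a common approximation, shown here for illustration.
import bisect, math

LIB_TEMPS = [300.0, 600.0, 900.0, 1200.0]     # library temperatures [K]

def sqrt_T_weights(T):
    """Return the bracketing library temperatures and their mixing weights for T."""
    i = min(max(bisect.bisect_right(LIB_TEMPS, T), 1), len(LIB_TEMPS) - 1)
    T_lo, T_hi = LIB_TEMPS[i - 1], LIB_TEMPS[i]
    w_hi = (math.sqrt(T) - math.sqrt(T_lo)) / (math.sqrt(T_hi) - math.sqrt(T_lo))
    return (T_lo, 1.0 - w_hi), (T_hi, w_hi)

# Example: a fuel cell computed by the TH code to be at 745 K
for T_lib, w in sqrt_T_weights(745.0):
    print(f"use library at {T_lib:.0f} K with weight {w:.3f}")
```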

  8. Assessment of radiation damage behaviour in a large collection of empirically optimized datasets highlights the importance of unmeasured complicating effects

    International Nuclear Information System (INIS)

    Krojer, Tobias; Delft, Frank von

    2011-01-01

    A retrospective analysis of radiation damage behaviour in a statistically significant number of real-life datasets is presented, in order to gauge the importance of the complications not yet measured or rigorously evaluated in current experiments, and the challenges that remain before radiation damage can be considered a problem solved in practice. The radiation damage behaviour in 43 datasets of 34 different proteins collected over a year was examined, in order to gauge the reliability of decay metrics in practical situations, and to assess how these datasets, optimized only empirically for decay, would have benefited from the precise and automatic prediction of decay now possible with the programs RADDOSE [Murray, Garman & Ravelli (2004). J. Appl. Cryst. 37, 513–522] and BEST [Bourenkov & Popov (2010). Acta Cryst. D66, 409–419]. The results indicate that in routine practice the diffraction experiment is not yet characterized well enough to support such precise predictions, as these depend fundamentally on three interrelated variables which cannot yet be determined robustly and practically: the flux density distribution of the beam; the exact crystal volume; the sensitivity of the crystal to dose. The former two are not satisfactorily approximated from typical beamline information such as nominal beam size and transmission, or two-dimensional images of the beam and crystal; the discrepancies are particularly marked when using microfocus beams (<20 µm). Empirically monitoring decay with the dataset scaling B factor (Bourenkov & Popov, 2010) appears more robust but is complicated by anisotropic and/or low-resolution diffraction. These observations serve to delineate the challenges, scientific and logistic, that remain to be addressed if tools for managing radiation damage in practical data collection are to be conveniently robust enough to be useful in real time.
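    A minimal sketch of the empirical decay monitoring mentioned above: fit a straight line to the per-batch relative scaling B factor against accumulated dose and use the slope as a crystal-specific sensitivity estimate. The numbers below are made up for illustration.

```python
# Sketch of an empirical decay metric: slope of the relative scaling B factor
# versus accumulated dose, plus a simple stopping rule. Illustrative data only.
import numpy as np

dose_MGy = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])   # accumulated dose per batch
rel_B    = np.array([0.1, 0.9, 2.2, 3.1, 4.3, 5.0])   # relative scale B factor [A^2]

slope, intercept = np.polyfit(dose_MGy, rel_B, 1)
print(f"B-factor increase: {slope:.2f} A^2/MGy")

# Simple stopping rule: flag the dataset once the fitted B exceeds a chosen limit.
B_LIMIT = 10.0
print("dose at which B reaches the limit: %.1f MGy" % ((B_LIMIT - intercept) / slope))
```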

  9. Evaluation of the Oh, Dubois and IEM Backscatter Models Using a Large Dataset of SAR Data and Experimental Soil Measurements

    Directory of Open Access Journals (Sweden)

    Mohammad Choker

    2017-01-01

    Full Text Available The aim of this paper is to evaluate the most used radar backscattering models (Integral Equation Model “IEM”, Oh, Dubois, and Advanced Integral Equation Model “AIEM”) using a wide dataset of SAR (Synthetic Aperture Radar) data and experimental soil measurements. These forward models reproduce the radar backscattering coefficients (σ0) from soil surface characteristics (dielectric constant, roughness) and SAR sensor parameters (radar wavelength, incidence angle, polarization). The analysis dataset is composed of AIRSAR, SIR-C, JERS-1, PALSAR-1, ESAR, ERS, RADARSAT, ASAR and TerraSAR-X data and in situ measurements (soil moisture and surface roughness). Results show that the Oh model version developed in 1992 gives the best fit of the backscattering coefficients in HH and VV polarizations, with RMSE values of 2.6 dB and 2.4 dB, respectively. Simulations performed with the Dubois model show a poor correlation between real data and model simulations in HH polarization (RMSE = 4.0 dB) and better correlation with real data in VV polarization (RMSE = 2.9 dB). The IEM and the AIEM simulate the backscattering coefficient with high RMSE when using a Gaussian correlation function. However, better simulations are performed with IEM and AIEM by using an exponential correlation function (slightly better fitting with AIEM than IEM). Good agreement was found between the radar data and the simulations using the calibrated version of the IEM modified by Baghdadi (IEM_B), with bias less than 1.0 dB and RMSE less than 2.0 dB. These results confirm that, up to date, the IEM modified by Baghdadi (IEM_B) is the most adequate to estimate soil moisture and roughness from SAR data.
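    The evaluation itself reduces to comparing simulated and observed backscattering coefficients per polarization; a sketch of the bias/RMSE computation, with placeholder values in dB, is shown below.

```python
# Sketch of the bias/RMSE evaluation of a forward backscatter model against SAR
# observations (values in dB; the arrays below are placeholders, not study data).
import numpy as np

def evaluate(sigma0_obs_db, sigma0_sim_db):
    diff = sigma0_sim_db - sigma0_obs_db
    return diff.mean(), np.sqrt(np.mean(diff ** 2))     # bias, RMSE in dB

obs_hh = np.array([-12.1, -9.8, -14.3, -11.0])
sim_hh = np.array([-10.5, -9.1, -12.8, -11.9])
bias, rmse = evaluate(obs_hh, sim_hh)
print(f"HH: bias = {bias:+.1f} dB, RMSE = {rmse:.1f} dB")
```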

  10. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  11. Large-scale groundwater modeling using global datasets: A test case for the Rhine-Meuse basin

    NARCIS (Netherlands)

    Sutanudjaja, E.H.; Beek, L.P.H. van; Jong, S.M. de; Geer, F.C. van; Bierkens, M.F.P.

    2011-01-01

    Large-scale groundwater models involving aquifers and basins of multiple countries are still rare due to a lack of hydrogeological data which are usually only available in developed countries. In this study, we propose a novel approach to construct large-scale groundwater models by using global

  12. Large-scale groundwater modeling using global datasets: a test case for the Rhine-Meuse basin

    NARCIS (Netherlands)

    Sutanudjaja, E.H.; Beek, L.P.H. van; Jong, S.M. de; Geer, F.C. van; Bierkens, M.F.P.

    2011-01-01

    The current generation of large-scale hydrological models does not include a groundwater flow component. Large-scale groundwater models, involving aquifers and basins of multiple countries, are still rare mainly due to a lack of hydro-geological data which are usually only available in

  13. Large-scale groundwater modeling using global datasets: A test case for the Rhine-Meuse basin

    NARCIS (Netherlands)

    Sutanudjaja, E.H.; Beek, L.P.H. van; Jong, S.M. de; Geer, F.C. van; Bierkens, M.F.P.

    2011-01-01

    The current generation of large-scale hydrological models does not include a groundwater flow component. Large-scale groundwater models, involving aquifers and basins of multiple countries, are still rare mainly due to a lack of hydro-geological data which are usually only available in developed

  14. Parameterization of disorder predictors for large-scale applications requiring high specificity by using an extended benchmark dataset

    Directory of Open Access Journals (Sweden)

    Eisenhaber Frank

    2010-02-01

    Full Text Available Abstract Background Algorithms designed to predict protein disorder play an important role in structural and functional genomics, as disordered regions have been reported to participate in important cellular processes. Consequently, several methods with different underlying principles for disorder prediction have been independently developed by various groups. For assessing their usability in automated workflows, we are interested in identifying parameter settings and threshold selections, under which the performance of these predictors becomes directly comparable. Results First, we derived a new benchmark set that accounts for different flavours of disorder complemented with a similar amount of order annotation derived for the same protein set. We show that, using the recommended default parameters, the programs tested are producing a wide range of predictions at different levels of specificity and sensitivity. We identify settings, in which the different predictors have the same false positive rate. We assess conditions when sets of predictors can be run together to derive consensus or complementary predictions. This is useful in the framework of proteome-wide applications where high specificity is required such as in our in-house sequence analysis pipeline and the ANNIE webserver. Conclusions This work identifies parameter settings and thresholds for a selection of disorder predictors to produce comparable results at a desired level of specificity over a newly derived benchmark dataset that accounts equally for ordered and disordered regions of different lengths.
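    As a sketch of how per-predictor thresholds can be matched to a common false positive rate on a labelled benchmark, the snippet below computes ROC curves for two synthetic predictors and picks, for each, the first operating point at the target FPR. The scores and labels are synthetic stand-ins for real benchmark data.

```python
# Sketch: choose per-predictor thresholds that operate at (approximately) the same
# false positive rate on an order/disorder benchmark. Synthetic data for illustration.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 500)                       # 1 = disordered residue
scores = {"predA": labels + rng.normal(0, 0.8, 500),   # two fake predictors with
          "predB": labels + rng.normal(0, 1.5, 500)}   # different score scales

TARGET_FPR = 0.05
for name, s in scores.items():
    fpr, tpr, thr = roc_curve(labels, s)
    i = np.searchsorted(fpr, TARGET_FPR)               # first point reaching the target FPR
    print(f"{name}: threshold={thr[i]:.2f}  FPR={fpr[i]:.3f}  TPR={tpr[i]:.3f}")
```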

  15. An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

    KAUST Repository

    Rodriguez, Jesse M.; Batzoglou, Serafim; Bercovici, Sivan

    2013-01-01

    , accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required. We present PARENTE, a novel method for detecting

  16. Modification of input datasets for the Ensemble Streamflow Prediction based on large scale climatic indices and weather generator

    Czech Academy of Sciences Publication Activity Database

    Šípek, Václav; Daňhelka, J.

    2015-01-01

    Vol. 528, September (2015), pp. 720-733 ISSN 0022-1694 Institutional support: RVO:67985874 Keywords: seasonal forecasting * ESP * large-scale climate * weather generator Subject RIV: DA - Hydrology ; Limnology Impact factor: 3.043, year: 2015

  17. Facing the Challenges of Accessing, Managing, and Integrating Large Observational Datasets in Ecology: Enabling and Enriching the Use of NEON's Observational Data

    Science.gov (United States)

    Thibault, K. M.

    2013-12-01

    As the construction of NEON and its transition to operations progresses, more and more data will become available to the scientific community, both from NEON directly and from the concomitant growth of existing data repositories. Many of these datasets include ecological observations of a diversity of taxa in both aquatic and terrestrial environments. Although observational data have been collected and used throughout the history of organismal biology, the field has not yet fully developed a culture of data management, documentation, standardization, sharing and discoverability to facilitate the integration and synthesis of datasets. Moreover, the tools required to accomplish these goals, namely database design, implementation, and management, and automation and parallelization of analytical tasks through computational techniques, have not historically been included in biology curricula, at either the undergraduate or graduate levels. To ensure the success of data-generating projects like NEON in advancing organismal ecology and to increase transparency and reproducibility of scientific analyses, an acceleration of the cultural shift to open science practices, the development and adoption of data standards, such as the DarwinCore standard for taxonomic data, and increased training in computational approaches for biologists need to be realized. Here I highlight several initiatives that are intended to increase access to and discoverability of publicly available datasets and equip biologists and other scientists with the skills that are needed to manage, integrate, and analyze data from multiple large-scale projects. The EcoData Retriever (ecodataretriever.org) is a tool that downloads publicly available datasets, re-formats the data into an efficient relational database structure, and then automatically imports the data tables onto a user's local drive into the database tool of the user's choice. The automation of these tasks results in nearly instantaneous execution

  18. Impacts of a lengthening open water season on Alaskan coastal communities: deriving locally relevant indices from large-scale datasets and community observations

    Science.gov (United States)

    Rolph, Rebecca J.; Mahoney, Andrew R.; Walsh, John; Loring, Philip A.

    2018-05-01

    Using thresholds of physical climate variables developed from community observations, together with two large-scale datasets, we have produced local indices directly relevant to the impacts of a reduced sea ice cover on Alaska coastal communities. The indices include the number of false freeze-ups defined by transient exceedances of ice concentration prior to a corresponding exceedance that persists, false break-ups, timing of freeze-up and break-up, length of the open water duration, number of days when the winds preclude hunting via boat (wind speed threshold exceedances), the number of wind events conducive to geomorphological work or damage to infrastructure from ocean waves, and the number of these wind events with on- and along-shore components promoting water setup along the coastline. We demonstrate how community observations can inform use of large-scale datasets to derive these locally relevant indices. The two primary large-scale datasets are the Historical Sea Ice Atlas for Alaska and the atmospheric output from a regional climate model used to downscale the ERA-Interim atmospheric reanalysis. We illustrate the variability and trends of these indices by application to the rural Alaska communities of Kotzebue, Shishmaref, and Utqiaġvik (previously Barrow), although the same procedure and metrics can be applied to other coastal communities. Over the 1979-2014 time period, there has been a marked increase in the number of combined false freeze-ups and false break-ups as well as the number of days too windy for hunting via boat for all three communities, especially Utqiaġvik. At Utqiaġvik, there has been an approximate tripling of the number of wind events conducive to coastline erosion from 1979 to 2014. We have also found a delay in freeze-up and earlier break-up, leading to a lengthened open water period for all of the communities examined.
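    Two of the indices described above can be derived from daily series by simple threshold logic; the sketch below counts days too windy for boat-based hunting and "false freeze-ups" (transient exceedances of an ice-concentration threshold before the persistent one). The thresholds and data are illustrative, not those used in the study.

```python
# Sketch of two locally relevant indices from daily time series: windy-day counts
# and false freeze-ups. Thresholds and data are illustrative placeholders.
import numpy as np

WIND_LIMIT = 10.0       # m/s, hypothetical boat-hunting wind threshold
ICE_THRESH = 0.15       # ice concentration threshold for "freeze-up"
PERSIST_DAYS = 14       # an exceedance must last this long to count as the freeze-up

rng = np.random.default_rng(2)
wind = rng.gamma(2.0, 4.0, 365)                                       # fake daily wind [m/s]
ice = np.clip(np.linspace(-0.3, 1.0, 365) + rng.normal(0, 0.08, 365), 0, 1)

windy_days = int((wind > WIND_LIMIT).sum())

def false_freeze_ups(conc):
    """Count exceedance runs that end before the first persistent exceedance."""
    above, runs, start = conc > ICE_THRESH, [], None
    for day, flag in enumerate(above):
        if flag and start is None:
            start = day
        elif not flag and start is not None:
            runs.append(day - start); start = None
    if start is not None:
        runs.append(len(above) - start)
    count = 0
    for length in runs:
        if length >= PERSIST_DAYS:
            break
        count += 1
    return count

print(f"days too windy for hunting: {windy_days}")
print(f"false freeze-ups: {false_freeze_ups(ice)}")
```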

  19. Impacts of a lengthening open water season on Alaskan coastal communities: deriving locally relevant indices from large-scale datasets and community observations

    Directory of Open Access Journals (Sweden)

    R. J. Rolph

    2018-05-01

    Full Text Available Using thresholds of physical climate variables developed from community observations, together with two large-scale datasets, we have produced local indices directly relevant to the impacts of a reduced sea ice cover on Alaska coastal communities. The indices include the number of false freeze-ups defined by transient exceedances of ice concentration prior to a corresponding exceedance that persists, false break-ups, timing of freeze-up and break-up, length of the open water duration, number of days when the winds preclude hunting via boat (wind speed threshold exceedances), the number of wind events conducive to geomorphological work or damage to infrastructure from ocean waves, and the number of these wind events with on- and along-shore components promoting water setup along the coastline. We demonstrate how community observations can inform use of large-scale datasets to derive these locally relevant indices. The two primary large-scale datasets are the Historical Sea Ice Atlas for Alaska and the atmospheric output from a regional climate model used to downscale the ERA-Interim atmospheric reanalysis. We illustrate the variability and trends of these indices by application to the rural Alaska communities of Kotzebue, Shishmaref, and Utqiaġvik (previously Barrow), although the same procedure and metrics can be applied to other coastal communities. Over the 1979–2014 time period, there has been a marked increase in the number of combined false freeze-ups and false break-ups as well as the number of days too windy for hunting via boat for all three communities, especially Utqiaġvik. At Utqiaġvik, there has been an approximate tripling of the number of wind events conducive to coastline erosion from 1979 to 2014. We have also found a delay in freeze-up and earlier break-up, leading to a lengthened open water period for all of the communities examined.

  20. Modification of input datasets for the Ensemble Streamflow Prediction based on large scale climatic indices and weather generator

    Czech Academy of Sciences Publication Activity Database

    Šípek, Václav; Daňhelka, J.

    2015-01-01

    Vol. 528, September (2015), pp. 720-733 ISSN 0022-1694 Institutional support: RVO:67985874 Keywords: seasonal forecasting * ESP * large-scale climate * weather generator Subject RIV: DA - Hydrology ; Limnology Impact factor: 3.043, year: 2015

  1. A frequency-domain implementation of a sliding-window traffic sign detector for large scale panoramic datasets

    NARCIS (Netherlands)

    Creusen, I.M.; Hazelhoff, L.; With, de P.H.N.

    2013-01-01

    In large-scale automatic traffic sign surveying systems, the primary computational effort is concentrated at the traffic sign detection stage. This paper focuses on reducing the computational load of particularly the sliding window object detection algorithm which is employed for traffic sign
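    The core idea of a frequency-domain sliding-window detector is that evaluating a linear template at every window position is a correlation, which can be computed for the whole image at once via the FFT. The sketch below is a generic illustration of that equivalence, not the authors' detector.

```python
# Sketch of FFT-based sliding-window scoring: correlating a learned linear template
# with the whole image in one pass. Random data stand in for image and template.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(3)
image = rng.normal(size=(480, 640))
template = rng.normal(size=(32, 32))                    # stand-in for a learned filter

# correlation = convolution with the flipped template; 'valid' gives one score per window
scores = fftconvolve(image, template[::-1, ::-1], mode="valid")
y, x = np.unravel_index(np.argmax(scores), scores.shape)
print(f"best-scoring window at (row={y}, col={x}), score={scores[y, x]:.1f}")
```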

  2. Retrospective analysis of cohort database: Phenotypic variability in a large dataset of patients confirmed to have homozygous familial hypercholesterolemia

    NARCIS (Netherlands)

    Raal, Frederick J.; Sjouke, Barbara; Hovingh, G. Kees; Isaac, Barton F.

    2016-01-01

    These data describe the phenotypic variability in a large cohort of patients confirmed to have homozygous familial hypercholesterolemia. Herein, we describe the observed relationship of treated low-density lipoprotein cholesterol with age. We also overlay the low-density lipoprotein receptor gene

  3. Modelling aggregation on the large scale and regularity on the small scale in spatial point pattern datasets

    DEFF Research Database (Denmark)

    Lavancier, Frédéric; Møller, Jesper

    We consider a dependent thinning of a regular point process with the aim of obtaining aggregation on the large scale and regularity on the small scale in the resulting target point process of retained points. Various parametric models for the underlying processes are suggested and the properties...

  4. Evaluating privacy-preserving record linkage using cryptographic long-term keys and multibit trees on large medical datasets.

    Science.gov (United States)

    Brown, Adrian P; Borgs, Christian; Randall, Sean M; Schnell, Rainer

    2017-06-08

    Integrating medical data using databases from different sources by record linkage is a powerful technique increasingly used in medical research. Under many jurisdictions, unique personal identifiers needed for linking the records are unavailable. Since sensitive attributes, such as names, have to be used instead, privacy regulations usually demand encrypting these identifiers. The corresponding set of techniques for privacy-preserving record linkage (PPRL) has received widespread attention. One recent method is based on Bloom filters. Due to superior resilience against cryptographic attacks, composite Bloom filters (cryptographic long-term keys, CLKs) are considered best practice for privacy in PPRL. Real-world performance of these techniques using large-scale data is unknown up to now. Using a large subset of Australian hospital admission data, we tested the performance of an innovative PPRL technique (CLKs using multibit trees) against a gold-standard derived from clear-text probabilistic record linkage. Linkage time and linkage quality (recall, precision and F-measure) were evaluated. Clear text probabilistic linkage resulted in marginally higher precision and recall than CLKs. PPRL required more computing time but 5 million records could still be de-duplicated within one day. However, the PPRL approach required fine tuning of parameters. We argue that the increased privacy of PPRL comes at the price of small losses in precision and recall and a large increase in computational burden and setup time. These costs seem to be acceptable in most applied settings, but they have to be considered in the decision to apply PPRL. Further research on the optimal automatic choice of parameters is needed.
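    A minimal sketch of the Bloom-filter encoding underlying cryptographic long-term keys may clarify the approach: character bigrams of the identifiers are hashed with several keyed hash functions into one bit array per record, and candidate pairs are compared via the Dice coefficient of their bit arrays. The parameters below are illustrative, not those used in the evaluation.

```python
# Sketch of a CLK-style encoding: keyed hashes of name bigrams set bits in a Bloom
# filter; similarity is the Dice coefficient of two filters. Illustrative parameters.
import hashlib, hmac

BITS, K, SECRET = 1024, 20, b"shared-linkage-secret"

def bigrams(value):
    padded = f" {value.lower()} "
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def clk(*fields):
    bloom = [0] * BITS
    for field in fields:
        for gram in bigrams(field):
            for k in range(K):
                digest = hmac.new(SECRET, f"{k}|{gram}".encode(), hashlib.sha256).digest()
                bloom[int.from_bytes(digest[:4], "big") % BITS] = 1
    return bloom

def dice(a, b):
    inter = sum(x & y for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b))

print(f"Dice(similar names)   = {dice(clk('john', 'smith'), clk('jon', 'smith')):.2f}")
print(f"Dice(different names) = {dice(clk('john', 'smith'), clk('maria', 'garcia')):.2f}")
```

    In practice a similarity threshold on the Dice coefficient decides whether two encoded records are linked, which is one of the parameters the study reports as needing fine tuning.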

  5. An Accurate Method for Inferring Relatedness in Large Datasets of Unphased Genotypes via an Embedded Likelihood-Ratio Test

    KAUST Repository

    Rodriguez, Jesse M.

    2013-01-01

    Studies that map disease genes rely on accurate annotations that indicate whether individuals in the studied cohorts are related to each other or not. For example, in genome-wide association studies, the cohort members are assumed to be unrelated to one another. Investigators can correct for individuals in a cohort with previously-unknown shared familial descent by detecting genomic segments that are shared between them, which are considered to be identical by descent (IBD). Alternatively, elevated frequencies of IBD segments near a particular locus among affected individuals can be indicative of a disease-associated gene. As genotyping studies grow to use increasingly large sample sizes and meta-analyses begin to include many data sets, accurate and efficient detection of hidden relatedness becomes a challenge. To enable disease-mapping studies of increasingly large cohorts, a fast and accurate method to detect IBD segments is required. We present PARENTE, a novel method for detecting related pairs of individuals and shared haplotypic segments within these pairs. PARENTE is a computationally-efficient method based on an embedded likelihood ratio test. As demonstrated by the results of our simulations, our method exhibits better accuracy than the current state of the art, and can be used for the analysis of large genotyped cohorts. PARENTE's higher accuracy becomes even more significant in more challenging scenarios, such as detecting shorter IBD segments or when an extremely low false-positive rate is required. PARENTE is publicly and freely available at http://parente.stanford.edu/. © 2013 Springer-Verlag.

  6. Retrospective analysis of cohort database: Phenotypic variability in a large dataset of patients confirmed to have homozygous familial hypercholesterolemia

    Directory of Open Access Journals (Sweden)

    Frederick J. Raal

    2016-06-01

    Full Text Available These data describe the phenotypic variability in a large cohort of patients confirmed to have homozygous familial hypercholesterolemia. Herein, we describe the observed relationship of treated low-density lipoprotein cholesterol with age. We also overlay the low-density lipoprotein receptor gene (LDLR) functional status with these phenotypic data. A full description of these data is available in our recent study published in Atherosclerosis, “Phenotype Diversity Among Patients With Homozygous Familial Hypercholesterolemia: A Cohort Study” (Raal et al., 2016 [1]).

  7. The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

    Science.gov (United States)

    Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

    2012-03-15

    Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. In particular, database interfaces with transcriptome analysis modules that go beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.

  8. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  9. Towards large-scale mapping of urban three-dimensional structure using Landsat imagery and global elevation datasets

    Science.gov (United States)

    Wang, P.; Huang, C.

    2017-12-01

    The three-dimensional (3D) structure of buildings and infrastructures is fundamental to understanding and modelling of the impacts and challenges of urbanization in terms of energy use, carbon emissions, and earthquake vulnerabilities. However, spatially detailed maps of urban 3D structure have been scarce, particularly in fast-changing developing countries. We present here a novel methodology to map the volume of buildings and infrastructures at 30 meter resolution using a synergy of Landsat imagery and openly available global digital surface models (DSMs), including the Shuttle Radar Topography Mission (SRTM), ASTER Global Digital Elevation Map (GDEM), ALOS World 3D - 30m (AW3D30), and the recently released global DSM from the TanDEM-X mission. Our method builds on the concept of object-based height profile to extract height metrics from the DSMs and use a machine learning algorithm to predict height and volume from the height metrics. We have tested this algorithm across the whole of England and assessed our results using Lidar measurements in 25 English cities. Our initial assessments achieved an RMSE of 1.4 m (R2 = 0.72) for building height and an RMSE of 1208.7 m3 (R2 = 0.69) for building volume, demonstrating the potential of large-scale applications and fully automated mapping of urban structure.
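    The regression step can be sketched as follows: object-level height metrics derived from a DSM are used to predict reference (Lidar) building height with a machine-learning regressor. The features, data and model below are synthetic placeholders; the study's actual metrics and algorithm may differ.

```python
# Sketch of regressing building height from DSM-derived height metrics.
# Synthetic features and targets; a random forest stands in for the learned model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 2000
true_height = rng.uniform(3, 30, n)                     # "Lidar" reference height [m]
# fake DSM-derived metrics: noisy fractions of the true height per 30 m cell
X = np.column_stack([true_height * f + rng.normal(0, 2, n) for f in (0.5, 0.8, 1.0)])

X_tr, X_te, y_tr, y_te = train_test_split(X, true_height, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
rmse = np.sqrt(np.mean((pred - y_te) ** 2))
print(f"height RMSE on held-out cells: {rmse:.2f} m")
```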

  10. Lunar Mapping and Modeling On-the-Go: A mobile framework for viewing and interacting with large geospatial datasets

    Science.gov (United States)

    Chang, G.; Kim, R.; Bui, B.; Sadaqathullah, S.; Law, E.; Malhotra, S.

    2012-12-01

    bookmark those layers for quick access in subsequent sessions. A search tool is also provided to allow users to quickly find points of interests on the moon and to view the auxiliary data associated with that feature. More advanced features include the ability to interact with the data. Using the services provided by the portal, users will be able to log in and access the same scientific analysis tools provided on the web site including measuring between two points, generating subsets, and running other analysis tools, all by using a customized touch interface that is immediately familiar to users of these smart mobile devices. Users can also access their own storage on the portal and view or send the data to other users. Finally, there are features that will utilize functionality that can only be enabled by mobile devices. This includes the use of the gyroscopes and motion sensors to provide a haptic interface to visualize lunar data in 3D, on the device as well as potentially on a large screen. The mobile framework that we have developed for LMMP provides a glimpse of what is possible in visualizing and manipulating large geospatial data on small portable devices. While the framework is currently tuned to our portal, we hope that we can generalize the tool to use data sources from any type of GIS service.

  11. Automatic reduction of large X-ray fluorescence data-sets applied to XAS and mapping experiments

    International Nuclear Information System (INIS)

    Martin Montoya, Ligia Andrea

    2017-02-01

    In this thesis two automatic methods for the reduction of large fluorescence data sets are presented. The first method is proposed in the framework of BioXAS experiments. The challenge of this experiment is to deal with samples in ultra-dilute concentrations where the signal-to-background ratio is low. The experiment is performed in fluorescence-mode X-ray absorption spectroscopy with a 100-pixel high-purity Ge detector. The first step consists of reducing 100 fluorescence spectra into one. In this step, outliers are identified by means of the shot noise. Furthermore, a fitting routine whose model includes Gaussian functions for the fluorescence lines and exponentially modified Gaussian (EMG) functions for the scattering lines (with long tails at lower energies) is proposed to extract the line of interest from the fluorescence spectrum. Additionally, the fitting model has an EMG function for each scattering line (elastic and inelastic) at incident energies where they start to be discerned. At these energies, the data reduction is done per detector column to include the angular dependence of scattering. In the second part of this thesis, an automatic method for texts separation on palimpsests is presented. Scanning X-ray fluorescence is performed on the parchment, where a spectrum per scanned point is collected. Within this method, each spectrum is treated as a vector forming a basis which is to be transformed so that the basis vectors are the spectra of each ink. Principal Component Analysis is employed as an initial guess of the sought basis. This basis is further transformed by means of an optimization routine that maximizes the contrast and minimizes the non-negative entries in the spectra. The method is tested on original and self-made palimpsests.
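    A sketch of the fitting model for the first method: a Gaussian for the fluorescence line of interest plus an exponentially modified Gaussian (EMG) for a scattering line with a long low-energy tail, fitted with SciPy on synthetic data. Depending on the convention used, the EMG tail may need to be mirrored, as done here; all numerical values are illustrative.

```python
# Sketch: fit Gaussian (fluorescence line) + mirrored EMG (scatter line with a
# low-energy tail) to a synthetic spectrum. Parameters are illustrative only.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import exponnorm

def gauss(x, a, mu, sig):
    return a * np.exp(-0.5 * ((x - mu) / sig) ** 2)

def emg_low_tail(x, a, mu, sig, K):
    # exponnorm's exponential tail points towards +x; mirror it for a low-energy tail
    return a * exponnorm.pdf(-x, K, loc=-mu, scale=sig)

def model(x, a1, mu1, s1, a2, mu2, s2, K):
    return gauss(x, a1, mu1, s1) + emg_low_tail(x, a2, mu2, s2, K)

x = np.linspace(5.0, 12.0, 700)                          # energy axis [keV]
rng = np.random.default_rng(5)
y = model(x, 120, 6.40, 0.08, 80, 10.5, 0.12, 3.0) + rng.normal(0, 1.0, x.size)

p0 = [100, 6.3, 0.1, 60, 10.4, 0.15, 2.0]
popt, _ = curve_fit(model, x, y, p0=p0, bounds=(0, np.inf))
print("fitted fluorescence line position: %.3f keV" % popt[1])
```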

  12. Extracting Prior Distributions from a Large Dataset of In-Situ Measurements to Support SWOT-based Estimation of River Discharge

    Science.gov (United States)

    Hagemann, M.; Gleason, C. J.

    2017-12-01

    The upcoming (2021) Surface Water and Ocean Topography (SWOT) NASA satellite mission aims, in part, to estimate discharge on major rivers worldwide using reach-scale measurements of stream width, slope, and height. Current formalizations of channel and floodplain hydraulics are insufficient to fully constrain this problem mathematically, resulting in an infinitely large solution set for any set of satellite observations. Recent work has reformulated this problem in a Bayesian statistical setting, in which the likelihood distributions derive directly from hydraulic flow-law equations. When coupled with prior distributions on unknown flow-law parameters, this formulation probabilistically constrains the parameter space, and results in a computationally tractable description of discharge. Using a curated dataset of over 200,000 in-situ acoustic Doppler current profiler (ADCP) discharge measurements from over 10,000 USGS gaging stations throughout the United States, we developed empirical prior distributions for flow-law parameters that are not observable by SWOT, but that are required in order to estimate discharge. This analysis quantified prior uncertainties on quantities including cross-sectional area, at-a-station hydraulic geometry width exponent, and discharge variability, which depend on SWOT-observable variables including reach-scale statistics of width and height. When compared against discharge estimation approaches that do not use this prior information, the Bayesian approach using ADCP-derived priors demonstrated consistently improved performance across a range of performance metrics. This Bayesian approach formally transfers information from in-situ gaging stations to remote-sensed estimation of discharge, in which the desired quantities are not directly observable. Further investigation using large in-situ datasets is therefore a promising way forward in improving satellite-based estimates of river discharge.
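    The prior-construction step can be sketched as fitting a parametric distribution to station-level values of a quantity that SWOT cannot observe directly, such as cross-sectional area, and reusing the fitted parameters in the Bayesian estimator. The lognormal choice and the data below are assumptions for illustration only.

```python
# Sketch: fit a lognormal prior to (synthetic) station-level cross-sectional areas,
# then evaluate the prior density inside a Bayesian discharge estimator.
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(6)
area_m2 = rng.lognormal(mean=5.0, sigma=1.2, size=5000)    # fake per-station areas [m^2]

shape, loc, scale = lognorm.fit(area_m2, floc=0)           # fit with zero location
print(f"lognormal prior: sigma={shape:.2f}, median={scale:.0f} m^2")
print("prior density at 100 m^2:", lognorm.pdf(100.0, shape, loc, scale))
```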

  13. Contribution of Road Grade to the Energy Use of Modern Automobiles Across Large Datasets of Real-World Drive Cycles: Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Wood, E.; Burton, E.; Duran, A.; Gonder, J.

    2014-01-01

    Understanding the real-world power demand of modern automobiles is of critical importance to engineers using modeling and simulation to inform the intelligent design of increasingly efficient powertrains. Increased use of global positioning system (GPS) devices has made large scale data collection of vehicle speed (and associated power demand) a reality. While the availability of real-world GPS data has improved the industry's understanding of in-use vehicle power demand, relatively little attention has been paid to the incremental power requirements imposed by road grade. This analysis quantifies the incremental efficiency impacts of real-world road grade by appending high fidelity elevation profiles to GPS speed traces and performing a large simulation study. Employing a large real-world dataset from the National Renewable Energy Laboratory's Transportation Secure Data Center, vehicle powertrain simulations are performed with and without road grade under five vehicle models. Aggregate results of this study suggest that road grade could be responsible for 1% to 3% of fuel use in light-duty automobiles.
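    A back-of-the-envelope calculation shows why grade matters: the extra power needed to climb a grade is m·g·v·sin(θ), which at highway speed is comparable to aerodynamic and rolling losses. The vehicle parameters below are generic mid-size-car values, not those of the five simulated vehicle models.

```python
# Sketch: compare the grade-climbing power demand with aerodynamic and rolling
# losses for a generic mid-size car at highway speed. Illustrative parameters.
import math

m, g, v = 1600.0, 9.81, 27.0        # mass [kg], gravity [m/s^2], speed [m/s] (~60 mph)
cd_a, rho, crr = 0.68, 1.2, 0.009   # drag area Cd*A [m^2], air density, rolling coeff.
grade = 0.03                        # 3 % road grade

p_aero = 0.5 * rho * cd_a * v**3 / 1000.0                  # kW
p_roll = crr * m * g * v / 1000.0                          # kW
p_grade = m * g * v * math.sin(math.atan(grade)) / 1000.0  # kW
print(f"aero {p_aero:.1f} kW, rolling {p_roll:.1f} kW, 3% grade {p_grade:.1f} kW")
```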

  14. Building and calibrating a large-extent and high resolution coupled groundwater-land surface model using globally available data-sets

    Science.gov (United States)

    Sutanudjaja, E. H.; Van Beek, L. P.; de Jong, S. M.; van Geer, F.; Bierkens, M. F.

    2012-12-01

    The current generation of large-scale hydrological models generally lacks a groundwater model component simulating lateral groundwater flow. Large-scale groundwater models are rare due to a lack of hydro-geological data required for their parameterization and a lack of groundwater head data required for their calibration. In this study, we propose an approach to develop a large-extent fully-coupled land surface-groundwater model by using globally available datasets and calibrate it using a combination of discharge observations and remotely-sensed soil moisture data. The underlying objective is to devise a collection of methods that enables one to build and parameterize large-scale groundwater models in data-poor regions. The model used, PCR-GLOBWB-MOD, has a spatial resolution of 1 km x 1 km and operates on a daily basis. It consists of a single-layer MODFLOW groundwater model that is dynamically coupled to the PCR-GLOBWB land surface model. This fully-coupled model accommodates two-way interactions between surface water levels and groundwater head dynamics, as well as between upper soil moisture states and groundwater levels, including a capillary rise mechanism to sustain upper soil storage and thus to fulfill high evaporation demands (during dry conditions). As a test bed, we used the Rhine-Meuse basin, where more than 4000 groundwater head time series have been collected for validation purposes. The model was parameterized using globally available data-sets on surface elevation, drainage direction, land-cover, soil and lithology. Next, the model was calibrated using a brute force approach and massive parallel computing, i.e. by running the coupled groundwater-land surface model for more than 3000 different parameter sets. Here, we varied minimal soil moisture storage and saturated conductivities of the soil layers as well as aquifer transmissivities. Using different regularization strategies and calibration criteria we compared three calibration scenarios

  15. The development of the Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS): a large-scale data sharing initiative.

    Science.gov (United States)

    Lutomski, Jennifer E; Baars, Maria A E; Schalk, Bianca W M; Boter, Han; Buurman, Bianca M; den Elzen, Wendy P J; Jansen, Aaltje P D; Kempen, Gertrudis I J M; Steunenberg, Bas; Steyerberg, Ewout W; Olde Rikkert, Marcel G M; Melis, René J F

    2013-01-01

    In 2008, the Ministry of Health, Welfare and Sport commissioned the National Care for the Elderly Programme. While numerous research projects in older persons' health care were to be conducted under this national agenda, the Programme further advocated the development of The Older Persons and Informal Caregivers Survey Minimum DataSet (TOPICS-MDS) which would be integrated into all funded research protocols. In this context, we describe TOPICS data sharing initiative (www.topics-mds.eu). A working group drafted TOPICS-MDS prototype, which was subsequently approved by a multidisciplinary panel. Using instruments validated for older populations, information was collected on demographics, morbidity, quality of life, functional limitations, mental health, social functioning and health service utilisation. For informal caregivers, information was collected on demographics, hours of informal care and quality of life (including subjective care-related burden). Between 2010 and 2013, a total of 41 research projects contributed data to TOPICS-MDS, resulting in preliminary data available for 32,310 older persons and 3,940 informal caregivers. The majority of studies sampled were from primary care settings and inclusion criteria differed across studies. TOPICS-MDS is a public data repository which contains essential data to better understand health challenges experienced by older persons and informal caregivers. Such findings are relevant for countries where increasing health-related expenditure has necessitated the evaluation of contemporary health care delivery. Although open sharing of data can be difficult to achieve in practice, proactively addressing issues of data protection, conflicting data analysis requests and funding limitations during TOPICS-MDS developmental phase has fostered a data sharing culture. To date, TOPICS-MDS has been successfully incorporated into 41 research projects, thus supporting the feasibility of constructing a large (>30,000 observations

  16. Assessment of the effects and limitations of the 1998 to 2008 Abbreviated Injury Scale map using a large population-based dataset

    Directory of Open Access Journals (Sweden)

    Franklyn Melanie

    2011-01-01

    Full Text Available Abstract Background Trauma systems should consistently monitor a given trauma population over a period of time. The Abbreviated Injury Scale (AIS) and derived scores such as the Injury Severity Score (ISS) are commonly used to quantify injury severities in trauma registries. To reflect contemporary trauma management and treatment, the most recent version of the AIS (AIS08) contains many codes which differ in severity from their equivalents in the earlier 1998 version (AIS98). Consequently, the adoption of AIS08 may impede comparisons between data coded using different AIS versions. It may also affect the number of patients classified as major trauma. Methods The entire AIS98-coded injury dataset of a large population-based trauma registry was retrieved and mapped to AIS08 using the currently available AIS98-AIS08 dictionary map. The percentage of codes which had increased or decreased in severity, or could not be mapped, was examined in conjunction with the effect of these changes to the calculated ISS. The potential for free text information accompanying AIS coding to improve the quality of AIS mapping was explored. Results A total of 128280 AIS98-coded injuries were evaluated in 32134 patients, 15471 of whom were classified as major trauma. Although only 4.5% of dictionary codes decreased in severity from AIS98 to AIS08, this represented almost 13% of injuries in the registry. In 4.9% of patients, no injuries could be mapped. ISS was potentially unreliable in one-third of patients, as they had at least one AIS98 code which could not be mapped. Using AIS08, the number of patients classified as major trauma decreased by between 17.3% and 30.3%. Evaluation of free text descriptions for some injuries demonstrated the potential to improve mapping between AIS versions. Conclusions Converting AIS98-coded data to AIS08 results in a significant decrease in the number of patients classified as major trauma. Many AIS98 codes are missing from the
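    Recomputing severity after re-mapping hinges on the standard ISS definition: the sum of squares of the highest AIS severity in each of the three most severely injured of six body regions, with any AIS severity of 6 setting the ISS to 75. A short sketch of that calculation is given below.

```python
# Sketch of the standard Injury Severity Score (ISS) calculation from AIS-coded injuries.
def iss(injuries):
    """injuries: iterable of (body_region, ais_severity) pairs, regions coded 1-6."""
    worst = {}
    for region, severity in injuries:
        worst[region] = max(worst.get(region, 0), severity)
    if 6 in worst.values():          # any unsurvivable injury sets ISS to 75
        return 75
    top3 = sorted(worst.values(), reverse=True)[:3]
    return sum(s * s for s in top3)

# Example: head AIS 4, chest AIS 3, lower extremity AIS 2  ->  ISS = 16 + 9 + 4 = 29
print(iss([(1, 4), (4, 3), (5, 2)]))
```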

  17. 'Dip-sticks' calibration handles self-attenuation and coincidence effects in large-volume gamma-ray spectrometry

    CERN Document Server

    Wolterbeek, H T

    2000-01-01

    Routine gamma-spectrometric analyses of samples with low-level activities (e.g. food, water, environmental and industrial samples) are often performed on large samples placed close to the detector. In these geometries, detection sensitivity is improved but large errors are introduced due to self-attenuation and coincidence summing. Current approaches to these problems comprise computational methods and spiked standard materials. However, the former are often regarded as too complex for practical routine use, while the latter never fully match real samples. In the present study, we introduce a dip-sticks calibration as a fast and easy practical solution to this quantification problem in a routine analytical setting. In the proposed set-up, calibrations are performed within the sample itself, thus making it a broadly accessible matching-reference approach, which is in principle usable for all sample matrices.

  18. Insights into social disparities in smoking prevalence using Mosaic, a novel measure of socioeconomic status: an analysis using a large primary care dataset

    Directory of Open Access Journals (Sweden)

    Szatkowski Lisa

    2010-12-01

    Full Text Available Abstract Background There are well-established socio-economic differences in the prevalence of smoking in the UK, but conventional socio-economic measures may not capture the range and degree of these associations. We have used a commercial geodemographic profiling system, Mosaic, to explore associations with smoking prevalence in a large primary care dataset and to establish whether this tool provides new insights into socio-economic determinants of smoking. Methods We analysed anonymised data on over 2 million patients from The Health Improvement Network (THIN) database, linked via patients' postcodes to Mosaic classifications (11 groups and 61 types) and quintiles of Townsend Index of Multiple Deprivation. Patients' current smoking status was identified using Read Codes, and logistic regression was used to explore the associations between the available measures of socioeconomic status and smoking prevalence. Results As anticipated, smoking prevalence increased with increasing deprivation according to the Townsend Index (age and sex adjusted OR for highest vs lowest quintile 2.96, 95% CI 2.92-2.99). There were more marked differences in prevalence across Mosaic groups (OR for group G vs group A 4.41, 95% CI 4.33-4.49). Across the 61 Mosaic types, smoking prevalence varied from 8.6% to 42.7%. Mosaic types with high smoking prevalence were characterised by relative deprivation, but also more specifically by single-parent households living in public rented accommodation in areas with little community support, having no access to a car, few qualifications and high TV viewing behaviour. Conclusion Conventional socio-economic measures may underplay social disparities in smoking prevalence. Newer classification systems, such as Mosaic, encompass a wider range of demographic, lifestyle and behaviour data, and are valuable in identifying characteristics of groups of heavy smokers which might be used to tailor cessation interventions.

  19. Geostatistical and multivariate modelling for large scale quantitative mapping of seafloor sediments using sparse datasets, a case study from the Cleaverbank area (the Netherlands)

    NARCIS (Netherlands)

    Alevizos, Evangelos; Siemes, K.; Janmaat, J.; Snellen, M.; Simons, D.G.; Greinert, J

    2016-01-01

    Quantitative mapping of seafloor sediment properties (eg. grain size) requires the input of comprehensive Multi-Beam Echo Sounder (MBES) datasets along with adequate ground truth for establishing a functional relation between them. MBES surveys in extensive shallow shelf areas can be a rather

  20. Passive technologies for future large-scale photonic integrated circuits on silicon: polarization handling, light non-reciprocity and loss reduction

    Directory of Open Access Journals (Sweden)

    Daoxin Dai

    2012-03-01

    Full Text Available Silicon-based large-scale photonic integrated circuits are becoming important, due to the need for higher complexity and lower cost for optical transmitters, receivers and optical buffers. In this paper, passive technologies for large-scale photonic integrated circuits are described, including polarization handling, light non-reciprocity and loss reduction. The design rule for polarization beam splitters based on asymmetrical directional couplers is summarized and several novel designs for ultra-short polarization beam splitters are reviewed. A novel concept for realizing a polarization splitter–rotator is presented with a very simple fabrication process. Realization of silicon-based light non-reciprocity devices (e.g., optical isolators), which is very important for transmitters to avoid sensitivity to reflections, is also demonstrated with the help of magneto-optical material by the bonding technology. Low-loss waveguides are another important technology for large-scale photonic integrated circuits. Ultra-low loss optical waveguides are achieved by designing a Si3N4 core with a very high aspect ratio. The loss is reduced further to <0.1 dB m−1 with an improved fabrication process incorporating a high-quality thermal oxide upper cladding by means of wafer bonding. With the developed ultra-low loss Si3N4 optical waveguides, some devices are also demonstrated, including ultra-high-Q ring resonators, low-loss arrayed-waveguide grating (de)multiplexers, and high-extinction-ratio polarizers.

  1. The NOAA Dataset Identifier Project

    Science.gov (United States)

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  2. Use of a Recursive-Rule eXtraction algorithm with J48graft to achieve highly accurate and concise rule extraction from a large breast cancer dataset

    Directory of Open Access Journals (Sweden)

    Yoichi Hayashi

    Full Text Available To assist physicians in the diagnosis of breast cancer and thereby improve survival, a highly accurate computer-aided diagnostic system is necessary. Although various machine learning and data mining approaches have been devised to increase diagnostic accuracy, most current methods are inadequate. The recently developed Recursive-Rule eXtraction (Re-RX) algorithm provides a hierarchical, recursive consideration of discrete variables prior to analysis of continuous data, and can generate classification rules that have been trained on the basis of both discrete and continuous attributes. The objective of this study was to extract highly accurate, concise, and interpretable classification rules for diagnosis using the Re-RX algorithm with J48graft, a class for generating a grafted C4.5 decision tree. We used the Wisconsin Breast Cancer Dataset (WBCD). Nine research groups provided 10 kinds of highly accurate concrete classification rules for the WBCD. We compared the accuracy and characteristics of the rule set for the WBCD generated using the Re-RX algorithm with J48graft with five rule sets obtained using 10-fold cross validation (CV). We trained the Re-RX algorithm with J48graft on the WBCD and report the average classification accuracies of 10 runs of 10-fold CV for the training and test datasets, the number of extracted rules, and the average number of antecedents. Compared with other rule extraction algorithms, the Re-RX algorithm with J48graft resulted in a lower average number of rules for diagnosing breast cancer, which is a substantial advantage. It also provided the lowest average number of antecedents per rule. These features are expected to greatly aid physicians in making accurate and concise diagnoses for patients with breast cancer. Keywords: Breast cancer diagnosis, Rule extraction, Re-RX algorithm, J48graft, C4.5
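
    Re-RX with J48graft is not available as an off-the-shelf Python component, so the sketch below substitutes a small CART decision tree to illustrate the 10-fold cross-validation and rule-extraction workflow described above; the bundled scikit-learn data are the Wisconsin Diagnostic Breast Cancer measurements, a close relative of (but not identical to) the WBCD used in the study.

```python
# Hedged stand-in for the Re-RX + J48graft pipeline: scikit-learn has neither
# Re-RX nor J48graft, so a shallow CART tree illustrates 10-fold CV plus
# rule extraction on a related breast cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

tree = DecisionTreeClassifier(max_depth=3, random_state=0)   # small tree -> few, short rules
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(tree, X, y, cv=cv)
print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

tree.fit(X, y)                                               # refit on all data
print(export_text(tree, feature_names=list(data.feature_names)))   # IF-THEN style rules
```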

  3. Remote handling at LAMPF

    International Nuclear Information System (INIS)

    Grisham, D.L.; Lambert, J.E.

    1983-01-01

    Experimental area A at the Clinton P. Anderson Meson Physics Facility (LAMPF) encompasses a large area. Presently there are four experimental target cells along the main proton beam line that have become highly radioactive, thus dictating that all maintenance be performed remotely. The Monitor remote handling system was developed to perform in situ maintenance at any location within area A. Due to the complexity of experimental systems and confined space, conventional remote handling methods based upon hot cell and/or hot bay concepts are not workable. Contrary to conventional remote handling, which requires special tooling for each specifically planned operation, the Monitor concept is aimed at providing a totally flexible system capable of remotely performing general mechanical and electrical maintenance operations using standard tools. The Monitor system is described.

  4. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  5. The New world of 'Big Data' Analytics and High Performance Data: A Paradigm shift in the way we interact with very large Earth Observation datasets (Invited)

    Science.gov (United States)

    Purss, M. B.; Lewis, A.; Ip, A.; Evans, B.

    2013-12-01

    The next decade promises an exponential increase in volumes of open data from Earth observing satellites. The ESA Sentinels, the Japan Meteorological Agency's Himawari 8/9 geostationary satellites, various NASA missions, and of course the many EO satellites planned from China, will produce petabyte scale datasets of national and global significance. It is vital that we develop new ways of managing, accessing and using this 'big-data' from satellites, to produce value added information within realistic timeframes. A paradigm shift is required away from traditional 'scene based' (and labour intensive) approaches with data storage and delivery for processing at local sites, to emerging High Performance Data (HPD) models where the data are organised and co-located with High Performance Computational (HPC) infrastructures in a way that enables users to bring themselves, their algorithms and the HPC processing power to the data. Automated workflows, that allow the entire archive of data to be rapidly reprocessed from raw data to fully calibrated products, are a crucial requirement for the effective stewardship of these datasets. New concepts such as arranging and viewing data as 'data objects' which underpin the delivery of 'information as a service' are also integral to realising the transition into HPD analytics. As Australia's national remote sensing and geoscience agency, Geoscience Australia faces a pressing need to solve the problems of 'big-data', in particular around the 25-year archive of calibrated Landsat data. The challenge is to ensure standardised information can be extracted from the entire archive and applied to nationally significant problems in hazards, water management, land management, resource development and the environment. Ultimately, these uses justify government investment in these unique systems. A key challenge was how best to organise the archive of calibrated Landsat data (estimated to grow to almost 1 PB by the end of 2014) in a way

  6. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving, and ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event- modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.
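
    The archive's own examples are in R; purely as an illustration of working with the converted ASCII tables, a Python sketch might look like the following, where the file name and column names are invented and are not the actual PDS labels.

```python
# Illustrative only: file name and column names are hypothetical, not the
# actual labels of the converted Viking seismometer tables in the PDS archive.
import pandas as pd
import matplotlib.pyplot as plt

records = pd.read_csv("viking_event_mode.tab", sep=r"\s+", comment="#",
                      names=["sol", "local_time", "amplitude", "wind_speed"])

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(records["local_time"], records["amplitude"], lw=0.5)
ax1.set_ylabel("seismic amplitude (DN)")
ax2.plot(records["local_time"], records["wind_speed"], lw=0.5, color="tab:orange")
ax2.set_ylabel("wind speed (m/s)")
ax2.set_xlabel("local time (hours)")
plt.show()

# Quick look at the wind / ground-motion correlation noted in the abstract.
print(records["amplitude"].corr(records["wind_speed"]))
```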

  7. SIZE SELECTION IN DIVING TUFTED DUCKS AYTHYA-FULIGULA EXPLAINED BY DIFFERENTIAL HANDLING OF SMALL AND LARGE MUSSELS DREISSENA-POLYMORPHA

    NARCIS (Netherlands)

    DELEEUW, JJ; VANEERDEN, MR

    1992-01-01

    We studied prey size selection of Tufted Ducks feeding on fresh-water mussels under semi-natural conditions. In experiments with non-diving birds, we found that Tufted Ducks use two techniques to handle mussels. Mussels less than 16 mm in length are strained from a waterflow generated in the bill

  8. Test sample handling apparatus

    International Nuclear Information System (INIS)

    1981-01-01

    A test sample handling apparatus using automatic scintillation counting for gamma detection, for use in such fields as radioimmunoassay, is described. The apparatus automatically and continuously counts large numbers of samples rapidly and efficiently by the simultaneous counting of two samples. By means of sequential ordering of non-sequential counting data, it is possible to obtain precisely ordered data while utilizing sample carrier holders having a minimum length. (U.K.)

  9. A coastal seawater temperature dataset for biogeographical studies: large biases between in situ and remotely-sensed data sets around the Coast of South Africa.

    Directory of Open Access Journals (Sweden)

    Albertus J Smit

    Full Text Available Gridded SST products developed particularly for offshore regions are increasingly being applied close to the coast for biogeographical applications. The purpose of this paper is to demonstrate the dangers of doing so through a comparison of reprocessed MODIS Terra and Pathfinder v5.2 SSTs, both at 4 km resolution, with instrumental in situ temperatures taken within 400 m from the coast. We report large biases of up to +6°C in places between satellite-derived and in situ climatological temperatures for 87 sites spanning the entire ca. 2 700 km of the South African coastline. Although biases are predominantly warm (i.e. the satellite SSTs being higher), smaller or even cold biases also appear in places, especially along the southern and western coasts of the country. We also demonstrate the presence of gradients in temperature biases along shore-normal transects - generally SSTs extracted close to the shore demonstrate a smaller bias with respect to the in situ temperatures. Contributing towards the magnitude of the biases are factors such as SST data source, proximity to the shore, the presence/absence of upwelling cells or coastal embayments. Despite the generally large biases, from a biogeographical perspective, species distribution retains a correlative relationship with underlying spatial patterns in SST, but in order to arrive at a causal understanding of the determinants of biogeographical patterns we suggest that in shallow, inshore marine habitats, temperature is best measured directly.

  10. A Coastal Seawater Temperature Dataset for Biogeographical Studies: Large Biases between In Situ and Remotely-Sensed Data Sets around the Coast of South Africa

    Science.gov (United States)

    Smit, Albertus J.; Roberts, Michael; Anderson, Robert J.; Dufois, Francois; Dudley, Sheldon F. J.; Bornman, Thomas G.; Olbers, Jennifer; Bolton, John J.

    2013-01-01

    Gridded SST products developed particularly for offshore regions are increasingly being applied close to the coast for biogeographical applications. The purpose of this paper is to demonstrate the dangers of doing so through a comparison of reprocessed MODIS Terra and Pathfinder v5.2 SSTs, both at 4 km resolution, with instrumental in situ temperatures taken within 400 m from the coast. We report large biases of up to +6°C in places between satellite-derived and in situ climatological temperatures for 87 sites spanning the entire ca. 2 700 km of the South African coastline. Although biases are predominantly warm (i.e. the satellite SSTs being higher), smaller or even cold biases also appear in places, especially along the southern and western coasts of the country. We also demonstrate the presence of gradients in temperature biases along shore-normal transects — generally SSTs extracted close to the shore demonstrate a smaller bias with respect to the in situ temperatures. Contributing towards the magnitude of the biases are factors such as SST data source, proximity to the shore, the presence/absence of upwelling cells or coastal embayments. Despite the generally large biases, from a biogeographical perspective, species distribution retains a correlative relationship with underlying spatial patterns in SST, but in order to arrive at a causal understanding of the determinants of biogeographical patterns we suggest that in shallow, inshore marine habitats, temperature is best measured directly. PMID:24312609
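
    The bias computation at the heart of the two records above reduces to per-site differences between satellite and in situ climatological temperatures; a minimal pandas sketch follows, assuming a hypothetical table of site climatologies rather than the study's actual data files.

```python
# Minimal sketch of the satellite-vs-in-situ bias computation described above.
# The input table and its column names are assumptions, not the study's files.
import pandas as pd

clim = pd.read_csv("site_climatologies.csv")   # columns: site, insitu_temp, satellite_sst

clim["bias"] = clim["satellite_sst"] - clim["insitu_temp"]   # positive = warm bias
by_site = clim.groupby("site")["bias"].mean().sort_values(ascending=False)

print(by_site.head(10))                                      # largest warm biases
print("overall mean bias:", round(clim["bias"].mean(), 2), "deg C")
```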

  11. How Retailers Handle Complaint Management

    DEFF Research Database (Denmark)

    Hansen, Torben; Wilke, Ricky; Zaichkowsky, Judy

    2009-01-01

    This article fills a gap in the literature by providing insight about the handling of complaint management (CM) across a large cross section of retailers in the grocery, furniture, electronic and auto sectors. Determinants of retailers’ CM handling are investigated and insight is gained as to the links between CM and redress of consumers’ complaints. The results suggest that retailers who attach large negative consequences to consumer dissatisfaction are more likely than other retailers to develop a positive strategic view on customer complaining, but at the same time an increase in perceived...

  12. Comparing vector-based and Bayesian memory models using large-scale datasets: User-generated hashtag and tag prediction on Twitter and Stack Overflow.

    Science.gov (United States)

    Stanley, Clayton; Byrne, Michael D

    2016-12-01

    The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, 2 cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An ACT-R based Bayesian model and a random permutation vector-based model were tested on the large data sets. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R's attentional weight term was linked to an entropy-weighting natural language processing method used to attenuate high-frequency words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. The results of the large-scale exploration show how the architecture of the 2 memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
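
    The headline finding, that past tag use strongly predicts future tag choice, can be illustrated with a small frequency-and-recency baseline built on the standard ACT-R base-level activation equation; this is a toy sketch, not the authors' full Bayesian or random permutation models.

```python
# Toy baseline in the spirit of the finding that past tag use predicts future
# tag choice. The activation formula is the standard ACT-R base-level equation.
import math
from collections import defaultdict

def base_level_activation(use_times, now, decay=0.5):
    """ACT-R base-level activation: ln(sum over past uses of lag**-decay)."""
    lags = [now - t for t in use_times if now > t]
    return math.log(sum(lag ** -decay for lag in lags)) if lags else float("-inf")

# Hypothetical per-user history: tag -> list of use times (arbitrary units).
history = defaultdict(list)
for t, tag in [(1, "python"), (2, "pandas"), (3, "python"), (5, "numpy"), (6, "python")]:
    history[tag].append(t)

now = 7
ranked = sorted(history, key=lambda tag: base_level_activation(history[tag], now),
                reverse=True)
print(ranked)   # most active (frequent and recent) tags first: ['python', 'numpy', 'pandas']
```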

  13. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    Science.gov (United States)

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results hold across a large variety of input data files. We have also developed our own load balancing and cache optimization techniques for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868
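
    The speedup and efficiency metrics discussed above, together with a simple linear model of per-core communication overhead, can be sketched as follows; the coefficients and the serial runtime are placeholders, not values measured in the study.

```python
# Sketch of speedup/efficiency bookkeeping with a simple linear model of
# communication overhead. alpha and beta are placeholder coefficients.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, cores):
    return speedup(t_serial, t_parallel) / cores

def predicted_parallel_time(t_serial, cores, alpha=0.5, beta=0.05):
    """Compute time shrinks as 1/p; communication overhead grows with p."""
    return t_serial / cores + alpha + beta * cores

t_serial = 120.0                       # assumed sequential alignment time (seconds)
for p in (1, 2, 4, 8, 16, 32):
    t_p = predicted_parallel_time(t_serial, p)
    print(f"{p:2d} cores: S = {speedup(t_serial, t_p):6.2f}, "
          f"E = {efficiency(t_serial, t_p, p):.2f}")
```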

  14. Sirenomelia: an epidemiologic study in a large dataset from the International Clearinghouse of Birth Defects Surveillance and Research, and literature review.

    Science.gov (United States)

    Orioli, Iêda M; Amar, Emmanuelle; Arteaga-Vazquez, Jazmin; Bakker, Marian K; Bianca, Sebastiano; Botto, Lorenzo D; Clementi, Maurizio; Correa, Adolfo; Csaky-Szunyogh, Melinda; Leoncini, Emanuele; Li, Zhu; López-Camelo, Jorge S; Lowry, R Brian; Marengo, Lisa; Martínez-Frías, María-Luisa; Mastroiacovo, Pierpaolo; Morgan, Margery; Pierini, Anna; Ritvanen, Annukka; Scarano, Gioacchino; Szabova, Elena; Castilla, Eduardo E

    2011-11-15

    Sirenomelia is a very rare limb anomaly in which the normally paired lower limbs are replaced by a single midline limb. This study describes the prevalence, associated malformations, and maternal characteristics among cases with sirenomelia. Data originated from 19 birth defect surveillance system members of the International Clearinghouse for Birth Defects Surveillance and Research, and were reported according to a single pre-established protocol. Cases were clinically evaluated locally and reviewed centrally. A total of 249 cases with sirenomelia were identified among 25,290,172 births, for a prevalence of 0.98 per 100,000, with higher prevalence in the Mexican registry. The increase in sirenomelia prevalence among mothers younger than 20 years was statistically significant. The proportion of twinning was 9%, higher than the 1% expected. Sex was ambiguous in 47% of cases, and no different from expectation in the rest. The proportions of cases born alive, premature, and weighing less than 2,500 g were 47%, 71.2%, and 88.2%, respectively. Half of the cases with sirenomelia also presented with genital, large bowel, and urinary defects. About 10-15% of the cases had lower spinal column defects, single or anomalous umbilical artery, upper limb, cardiac, and central nervous system defects. There was a greater than expected association of sirenomelia with other very rare defects such as bladder exstrophy, cyclopia/holoprosencephaly, and acardia-acephalus. The application of the new biological network analysis approach, including molecular results, to these associated very rare diseases is suggested for future studies. Copyright © 2011 Wiley Periodicals, Inc.

  15. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    Science.gov (United States)

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also present a communication model to approximate the overhead from communication in OpenMP loops. Our results hold across a large variety of input data files. We have also developed our own load balancing and cache optimization techniques for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.

  16. Estruturação topológica de grandes bases de dados de bacias hidrográficas Topologically-structured large datasets for watersheds

    Directory of Open Access Journals (Sweden)

    Carlos Antonio Alvares Soares Ribeiro

    2008-08-01

    Full Text Available The present work aimed to evaluate a method for delineating the contributing watershed upstream of a point selected on the drainage network and for obtaining its morphometric characteristics from topologically structured databases. To this end, the Hidrodata 2.0 application, developed for ArcINFO workstation, was used and its results were compared with those of the conventional process. The results showed that the processing time required for delineating watersheds and extracting their morphometric characteristics from a topologically structured database remained low and constant. It was concluded that the method presented can be applied to any watershed, regardless of its size, even on computers of modest configuration. The present work aims to present and to evaluate a method for topologically structuring large databases, implemented as a set of AML routines for ArcINFO workstation named Hidrodata. The results proved that the processing time for delineating drainage areas and extracting their morphometric characteristics was kept low and constant. The use of a topologically structured database resulted in a lower demand of processing capacity. It was concluded that the presented approach can be applied for any watershed, independently of its size, even with the use of less-sophisticated computers.
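
    The benefit of a topologically structured network, namely delineating the contributing area upstream of a selected point without scanning the whole dataset, can be illustrated with a small upstream traversal; the reach identifiers and areas below are invented and do not reflect the Hidrodata 2.0 data model.

```python
# Illustrative upstream traversal on a topologically structured stream network.
# 'upstream' maps each reach to the reaches that drain directly into it; the
# identifiers and per-reach areas are invented for the example.
from collections import deque

upstream = {
    "outlet": ["r1", "r2"],
    "r1": ["r3"],
    "r2": [],
    "r3": [],
}
reach_area_km2 = {"outlet": 1.2, "r1": 3.4, "r2": 2.1, "r3": 5.0}

def contributing_reaches(pour_point):
    """Collect all reaches upstream of a selected pour point (breadth-first)."""
    seen, queue = set(), deque([pour_point])
    while queue:
        reach = queue.popleft()
        if reach in seen:
            continue
        seen.add(reach)
        queue.extend(upstream.get(reach, []))
    return seen

basin = contributing_reaches("outlet")
print(sorted(basin), sum(reach_area_km2[r] for r in basin))   # reaches and drainage area
```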

  17. Animated analysis of geoscientific datasets: An interactive graphical application

    Science.gov (United States)

    Morse, Peter; Reading, Anya; Lueg, Christopher

    2017-12-01

    Geoscientists are required to analyze and draw conclusions from increasingly large volumes of data. There is a need to recognise and characterise features and changing patterns of Earth observables within such large datasets. It is also necessary to identify significant subsets of the data for more detailed analysis. We present an innovative, interactive software tool and workflow to visualise, characterise, sample and tag large geoscientific datasets from both local and cloud-based repositories. It uses an animated interface and human-computer interaction to utilise the capacity of human expert observers to identify features via enhanced visual analytics. 'Tagger' enables users to analyze datasets that are too large in volume to be drawn legibly on a reasonable number of single static plots. Users interact with the moving graphical display, tagging data ranges of interest for subsequent attention. The tool provides a rapid pre-pass process using fast GPU-based OpenGL graphics and data-handling and is coded in the Quartz Composer visual programming language (VPL) on Mac OS X. It makes use of interoperable data formats, and cloud-based (or local) data storage and compute. In a case study, Tagger was used to characterise a decade (2000-2009) of data recorded by the Cape Sorell Waverider Buoy, located approximately 10 km off the west coast of Tasmania, Australia. These data serve as a proxy for the understanding of Southern Ocean storminess, which has both local and global implications. This example shows use of the tool to identify and characterise 4 different types of storm and non-storm events during this time. Events characterised in this way are compared with conventional analysis, noting advantages and limitations of data analysis using animation and human interaction. Tagger provides a new ability to make use of humans as feature detectors in computer-based analysis of large-volume geosciences and other data.

  18. Sophisticated fuel handling system evolved

    International Nuclear Information System (INIS)

    Ross, D.A.

    1988-01-01

    The control systems at Sellafield fuel handling plant are described. The requirements called for built-in diagnostic features as well as the ability to handle a large sequencing application. Speed was also important; responses better than 50ms were required. The control systems are used to automate operations within each of the three main process caves - two Magnox fuel decanners and an advanced gas-cooled reactor fuel dismantler. The fuel route within the fuel handling plant is illustrated and described. ASPIC (Automated Sequence Package for Industrial Control) which was developed as a controller for the plant processes is described. (U.K.)

  19. Ergonomic material-handling device

    Science.gov (United States)

    Barsnick, Lance E.; Zalk, David M.; Perry, Catherine M.; Biggs, Terry; Tageson, Robert E.

    2004-08-24

    A hand-held ergonomic material-handling device capable of moving heavy objects, such as large waste containers and other large objects requiring mechanical assistance. The ergonomic material-handling device can be used with neutral postures of the back, shoulders, wrists and knees, thereby reducing potential injury to the user. The device involves two key features: 1) gives the user the ability to adjust the height of the handles of the device to ergonomically fit the needs of the user's back, wrists and shoulders; and 2) has a rounded handlebar shape, as well as the size and configuration of the handles which keep the user's wrists in a neutral posture during manipulation of the device.

  20. Aaron Journal article datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty , and S. Raman....

  1. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  2. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  3. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  4. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  5. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  6. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacies (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  7. Preliminary handling studies in large size fast piles; Etudes preliminaires de manutention dans les reacteurs a neutrons rapides de grande taille

    Energy Technology Data Exchange (ETDEWEB)

    Leduc, J; Marmonier, P [Association Euratom-CEA Cadarache (France). Centre d' Etudes Nucleaires

    1964-07-01

    This report examines the various fuel handling systems which presently seem feasible for a fast power reactor. It tries to point out the advantages and/or the disadvantages and the fabrication problems of each solution involved, and makes a tentative attempt to evaluate the time required for a fuel loading and/or unloading operation. The influence of the maximum allowable irradiation, the number of shut-downs and the power distribution shape within the core on the storage capacity needed, the load factor expected and the average irradiation obtained has also been investigated. (authors) [French] On a examine dans ce rapport les differents systemes de manutention, qui semblent actuellement realisables pour un reacteur a neutrons rapides de puissance, en essayant de faire ressortir les avantages, les inconvenients et les difficultes de realisation de chaque systeme, et de chiffer les temps de manutention auxquels ils conduisent. On a aussi regarde l'influence des variations du taux d'irradiation maximal,de la cadence des arrets ou de la forme du flux dans le coeur du reacteur, sur la capacite du stockage, le taux de disponibilite et le taux d'irradiation moyen. (auteurs)

  9. Mridangam stroke dataset

    OpenAIRE

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist under semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16 bit wav files. The dataset can be used for training models for each Mridangam stroke. A detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. Akshay Anantapadman...

  10. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN...

  11. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  12. Updates to FuncLab, a Matlab based GUI for handling receiver functions

    Science.gov (United States)

    Porritt, Robert W.; Miller, Meghan S.

    2018-02-01

    Receiver functions are a versatile tool commonly used in seismic imaging. Depending on how they are processed, they can be used to image discontinuity structure within the crust or mantle or they can be inverted for seismic velocity either directly or jointly with complementary datasets. However, modern studies generally require large datasets which can be challenging to handle; therefore, FuncLab was originally written as an interactive Matlab GUI to assist in handling these large datasets. This software uses a project database to allow interactive trace editing, data visualization, H-κ stacking for crustal thickness and Vp/Vs ratio, and common conversion point stacking while minimizing computational costs. Since its initial release, significant advances have been made in the implementation of web services and changes in the underlying Matlab platform have necessitated a significant revision to the software. Here, we present revisions to the software, including new features such as data downloading via irisFetch.m, receiver function calculations via processRFmatlab, on-the-fly cross-section tools, interface picking, and more. In the descriptions of the tools, we present its application to a test dataset in Michigan, Wisconsin, and neighboring areas following the passage of USArray Transportable Array. The software is made available online at https://robporritt.wordpress.com/software.
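
    FuncLab itself is MATLAB; purely to illustrate the H-κ stacking it provides, the numpy sketch below evaluates the standard flat-layer moveout times for the Ps, PpPs and PpSs+PsPs phases and stacks receiver-function amplitudes over a grid of crustal thickness H and Vp/Vs ratio κ. The input container, a list of (time axis, amplitudes, ray parameter) tuples, is made up for the example and is not FuncLab's project database.

```python
# Illustrative H-kappa stacking using the standard flat-layer moveout equations.
# The receiver-function container and the synthetic example are assumptions.
import numpy as np

def hk_stack(receiver_functions, vp=6.5, weights=(0.7, 0.2, 0.1),
             H_grid=np.linspace(20, 60, 81), k_grid=np.linspace(1.6, 2.0, 41)):
    stack = np.zeros((H_grid.size, k_grid.size))
    for t, rf, p in receiver_functions:            # (time axis, amplitudes, ray parameter)
        for i, H in enumerate(H_grid):
            for j, k in enumerate(k_grid):
                vs = vp / k
                eta_s = np.sqrt(1.0 / vs**2 - p**2)
                eta_p = np.sqrt(1.0 / vp**2 - p**2)
                t_ps = H * (eta_s - eta_p)          # Ps conversion
                t_ppps = H * (eta_s + eta_p)        # PpPs multiple
                t_ppss = 2.0 * H * eta_s            # PpSs + PsPs multiple
                amp = np.interp([t_ps, t_ppps, t_ppss], t, rf)
                stack[i, j] += weights[0]*amp[0] + weights[1]*amp[1] - weights[2]*amp[2]
    i, j = np.unravel_index(stack.argmax(), stack.shape)
    return H_grid[i], k_grid[j], stack              # best-fitting thickness and Vp/Vs

# Synthetic receiver function with arrivals consistent with H = 37 km, k = 1.78, p = 0.06.
t = np.linspace(0, 30, 3001)
rf = (np.exp(-0.5 * ((t - 4.64) / 0.2) ** 2)
      + 0.5 * np.exp(-0.5 * ((t - 15.13) / 0.2) ** 2)
      - 0.5 * np.exp(-0.5 * ((t - 19.77) / 0.2) ** 2))
best_H, best_k, _ = hk_stack([(t, rf, 0.06)])
print(best_H, best_k)   # should peak near 37 km and 1.78
```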

  13. An Improved TA-SVM Method Without Matrix Inversion and Its Fast Implementation for Nonstationary Datasets.

    Science.gov (United States)

    Shi, Yingzhong; Chung, Fu-Lai; Wang, Shitong

    2015-09-01

    Recently, a time-adaptive support vector machine (TA-SVM) was proposed for handling nonstationary datasets. While attractive performance has been reported and the new classifier is distinctive in simultaneously solving several SVM subclassifiers locally and globally by using an elegant SVM formulation in an alternative kernel space, the coupling of subclassifiers requires the computation of a matrix inversion, and thus suffers from a high computational burden in large nonstationary dataset applications. To overcome this shortcoming, an improved TA-SVM (ITA-SVM) is proposed using a common vector shared by all the SVM subclassifiers involved. ITA-SVM not only keeps an SVM formulation, but also avoids the computation of matrix inversion. Thus, we can realize its fast version, that is, improved time-adaptive core vector machine (ITA-CVM) for large nonstationary datasets by using the CVM technique. ITA-CVM has the merit of asymptotic linear time complexity for large nonstationary datasets and also inherits the advantage of TA-SVM. The effectiveness of the proposed classifiers ITA-SVM and ITA-CVM is also experimentally confirmed.

  14. MFTF exception handling system

    International Nuclear Information System (INIS)

    Nowell, D.M.; Bridgeman, G.D.

    1979-01-01

    In the design of large experimental control systems, a major concern is ensuring that operators are quickly alerted to emergency or other exceptional conditions and that they are provided with sufficient information to respond adequately. This paper describes how the MFTF exception handling system satisfies these requirements. Conceptually, exceptions are divided into two classes: those which affect command status by producing an abort or suspend condition, and those which fall into a softer notification category requiring only a report or an operator acknowledgement. Additionally, an operator may choose to accept an exception condition as operational, or to turn off monitoring for sensors determined to be malfunctioning. Control panels and displays used in operator response to exceptions are described.
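
    The two-class scheme described above (command-affecting abort/suspend exceptions versus softer report/acknowledge notifications, plus the operator overrides) can be sketched as a small dispatch function; the names below are illustrative and are not the actual MFTF control-system code.

```python
# Toy sketch of the two exception classes and the operator overrides described
# above. Sensor names and return strings are illustrative only.
from enum import Enum, auto

class Severity(Enum):
    ABORT = auto()          # terminates the current command
    SUSPEND = auto()        # pauses the current command
    ACKNOWLEDGE = auto()    # operator must acknowledge
    REPORT = auto()         # logged only

def handle_exception(sensor, severity,
                     accepted_as_operational=frozenset(), masked_sensors=frozenset()):
    if sensor in masked_sensors:                    # monitoring turned off for a bad sensor
        return "ignored"
    if severity in (Severity.ABORT, Severity.SUSPEND):
        if sensor in accepted_as_operational:       # operator accepted the condition
            return "logged (accepted as operational)"
        return f"command {severity.name.lower()}ed"
    return "operator notified" if severity is Severity.ACKNOWLEDGE else "logged"

print(handle_exception("magnet_current", Severity.ABORT))
print(handle_exception("magnet_current", Severity.ABORT,
                       accepted_as_operational={"magnet_current"}))
```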

  15. Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    International Nuclear Information System (INIS)

    Sim, Alexander; Balman, Mehmet; Williams, Dean; Shoshani, Arie; Natarajan, Vijaya

    2010-01-01

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity for moving large datasets to explore and manage. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing the massive dataset transfers efficiently with the pre-configured transfer properties in the environment where the network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environment as well as to control the data transfers for the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.
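
    The dynamic-adjustment idea, raising or lowering transfer concurrency as measured end-to-end throughput changes, can be sketched as follows; the thresholds, stream limits and throughput samples are placeholders rather than the Bulk Data Mover's actual interface or tuning.

```python
# Sketch of throughput-driven concurrency adjustment in the spirit of the
# dynamic transfer tuning described above. Thresholds and samples are assumed.
def adjust_concurrency(current_streams, previous_mbps, current_mbps,
                       min_streams=1, max_streams=32, gain_threshold=0.05):
    """Add streams while throughput keeps improving; back off when it degrades."""
    if previous_mbps <= 0:
        return current_streams
    relative_change = (current_mbps - previous_mbps) / previous_mbps
    if relative_change > gain_threshold:
        return min(current_streams + 1, max_streams)     # still gaining: probe upward
    if relative_change < -gain_threshold:
        return max(current_streams - 1, min_streams)     # degrading: back off
    return current_streams                               # stable: hold

streams, prev = 4, 400.0
for measured in (450.0, 520.0, 510.0, 430.0):            # assumed MB/s samples
    streams = adjust_concurrency(streams, prev, measured)
    prev = measured
    print(streams)
```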

  16. A Novel Strategy for Very-Large-Scale Cash-Crop Mapping in the Context of Weather-Related Risk Assessment, Combining Global Satellite Multispectral Datasets, Environmental Constraints, and In Situ Acquisition of Geospatial Data.

    Science.gov (United States)

    Dell'Acqua, Fabio; Iannelli, Gianni Cristian; Torres, Marco A; Martina, Mario L V

    2018-02-14

    Cash crops are agricultural crops intended to be sold for profit as opposed to subsistence crops, meant to support the producer, or to support livestock. Since cash crops are intended for future sale, they translate into large financial value when considered on a wide geographical scale, so their production directly involves financial risk. At a national level, extreme weather events including destructive rain or hail, as well as drought, can have a significant impact on the overall economic balance. It is thus important to map such crops in order to set up insurance and mitigation strategies. Using locally generated data-such as municipality-level records of crop seeding-for mapping purposes implies facing a series of issues like data availability, quality, homogeneity, etc. We thus opted for a different approach relying on global datasets. Global datasets ensure homogeneity and availability of data, although sometimes at the expense of precision and accuracy. A typical global approach makes use of spaceborne remote sensing, for which different land cover classification strategies are available in literature at different levels of cost and accuracy. We selected the optimal strategy in the perspective of a global processing chain. Thanks to a specifically developed strategy for fusing unsupervised classification results with environmental constraints and other geospatial inputs including ground-based data, we managed to obtain good classification results despite the constraints placed. The overall production process was composed using "good-enough" algorithms at each step, ensuring that the precision, accuracy, and data-hunger of each algorithm was commensurate to the precision, accuracy, and amount of data available. This paper describes the tailored strategy developed on the occasion as a cooperation among different groups with diverse backgrounds, a strategy which is believed to be profitably reusable in other, similar contexts. The paper presents the problem

  17. A Novel Strategy for Very-Large-Scale Cash-Crop Mapping in the Context of Weather-Related Risk Assessment, Combining Global Satellite Multispectral Datasets, Environmental Constraints, and In Situ Acquisition of Geospatial Data

    Directory of Open Access Journals (Sweden)

    Fabio Dell’Acqua

    2018-02-01

    Full Text Available Cash crops are agricultural crops intended to be sold for profit as opposed to subsistence crops, meant to support the producer, or to support livestock. Since cash crops are intended for future sale, they translate into large financial value when considered on a wide geographical scale, so their production directly involves financial risk. At a national level, extreme weather events including destructive rain or hail, as well as drought, can have a significant impact on the overall economic balance. It is thus important to map such crops in order to set up insurance and mitigation strategies. Using locally generated data—such as municipality-level records of crop seeding—for mapping purposes implies facing a series of issues like data availability, quality, homogeneity, etc. We thus opted for a different approach relying on global datasets. Global datasets ensure homogeneity and availability of data, although sometimes at the expense of precision and accuracy. A typical global approach makes use of spaceborne remote sensing, for which different land cover classification strategies are available in literature at different levels of cost and accuracy. We selected the optimal strategy in the perspective of a global processing chain. Thanks to a specifically developed strategy for fusing unsupervised classification results with environmental constraints and other geospatial inputs including ground-based data, we managed to obtain good classification results despite the constraints placed. The overall production process was composed using “good-enough" algorithms at each step, ensuring that the precision, accuracy, and data-hunger of each algorithm was commensurate to the precision, accuracy, and amount of data available. This paper describes the tailored strategy developed on the occasion as a cooperation among different groups with diverse backgrounds, a strategy which is believed to be profitably reusable in other, similar contexts. The
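
    The fusion step described in the two records above, keeping only those unsupervised classes that are compatible with environmental constraints, can be illustrated with a toy raster example; the arrays, the elevation threshold and the 50% overlap rule are all invented for the sketch and are not the authors' actual processing chain.

```python
# Toy illustration of fusing unsupervised classification with an environmental
# constraint layer: cluster multispectral pixels, then keep clusters that fall
# mostly inside the suitable area. All data and thresholds are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
h, w, bands = 100, 100, 4
image = rng.random((h, w, bands))                 # stand-in for a multispectral scene
elevation = rng.uniform(0, 2000, (h, w))          # stand-in constraint layer

labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(
    image.reshape(-1, bands)).reshape(h, w)

suitable = elevation < 1200                       # assumed crop-suitability constraint
crop_mask = np.zeros((h, w), dtype=bool)
for cluster in range(6):
    in_cluster = labels == cluster
    if suitable[in_cluster].mean() > 0.5:         # cluster mostly inside suitable area
        crop_mask |= in_cluster & suitable

print("mapped crop fraction:", round(crop_mask.mean(), 3))
```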

  18. Robust computational analysis of rRNA hypervariable tag datasets.

    Directory of Open Access Journals (Sweden)

    Maksim Sipos

    Full Text Available Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10^5–10^6) 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.
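
    The complete-linkage OTU clustering favoured above can be sketched with scipy, assuming a precomputed pairwise distance matrix for the reads; random numbers stand in for real 16S rRNA distances, and the 0.03 cutoff corresponds to the conventional 97% similarity OTU definition.

```python
# Sketch of complete-linkage OTU clustering at a 3% distance threshold.
# The random distance matrix stands in for real pairwise 16S rRNA distances.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
n_reads = 200
d = rng.uniform(0.0, 0.2, (n_reads, n_reads))
d = (d + d.T) / 2.0                                   # symmetrise
np.fill_diagonal(d, 0.0)

condensed = squareform(d, checks=False)
tree = linkage(condensed, method="complete")          # complete-linkage clustering
otus = fcluster(tree, t=0.03, criterion="distance")   # 97% similarity OTUs

print("number of OTUs:", np.unique(otus).size)
```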

  19. The CMS dataset bookkeeping service

    Science.gov (United States)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  20. The CMS dataset bookkeeping service

    Energy Technology Data Exchange (ETDEWEB)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  1. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  2. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  3. Nuclear fuel handling apparatus

    International Nuclear Information System (INIS)

    Andrea, C.; Dupen, C.F.G.; Noyes, R.C.

    1977-01-01

    A fuel handling machine for a liquid metal cooled nuclear reactor in which a retractable handling tube and gripper are lowered into the reactor to withdraw a spent fuel assembly into the handling tube. The handling tube containing the fuel assembly immersed in liquid sodium is then withdrawn completely from the reactor into the outer barrel of the handling machine. The machine is then used to transport the spent fuel assembly directly to a remotely located decay tank. The fuel handling machine includes a decay heat removal system which continuously removes heat from the interior of the handling tube and which is capable of operating at its full cooling capacity at all times. The handling tube is supported in the machine from an articulated joint which enables it to readily align itself with the correct position in the core. An emergency sodium supply is carried directly by the machine to provide make up in the event of a loss of sodium from the handling tube during transport to the decay tank. 5 claims, 32 drawing figures

  4. On the visualization of water-related big data: extracting insights from drought proxies' datasets

    Science.gov (United States)

    Diaz, Vitali; Corzo, Gerald; van Lanen, Henny A. J.; Solomatine, Dimitri

    2017-04-01

    Big data is a growing area of science where hydroinformatics can benefit largely. There have been a number of important developments in the area of data science aimed at analysis of large datasets. Such datasets related to water include measurements, simulations, reanalysis, scenario analyses and proxies. By convention, information contained in these databases is referred to a specific time and a space (i.e., longitude/latitude). This work is motivated by the need to extract insights from large water-related datasets, i.e., transforming large amounts of data into useful information that helps to better understand water-related phenomena, particularly drought. In this context, data visualization, part of data science, involves techniques to create and to communicate data by encoding it as visual graphical objects. They may help to better understand data and detect trends. Based on existing methods of data analysis and visualization, this work aims to develop tools for visualizing water-related large datasets. These tools were developed, taking advantage of existing data visualization libraries, as a group of graphs which includes both polar area diagrams (PADs) and radar charts (RDs). In both graphs, time steps are represented by the polar angles and the percentages of area in drought by the radii. For illustration, three large datasets of drought proxies are chosen to identify trends, prone areas and spatio-temporal variability of drought in a set of case studies. The datasets are (1) SPI-TS2p1 (1901-2002, 11.7 GB), (2) SPI-PRECL0p5 (1948-2016, 7.91 GB) and (3) SPEI-baseV2.3 (1901-2013, 15.3 GB). All of them are on a monthly basis and with a spatial resolution of 0.5 degrees. The first two were retrieved from the repository of the International Research Institute for Climate and Society (IRI). They are included in the Analyses Standardized Precipitation Index (SPI) project (iridl.ldeo.columbia.edu/SOURCES/.IRI/.Analyses/.SPI/). The third dataset was
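
    The polar area diagrams (PADs) described above can be drawn with matplotlib's polar bar plots, with one bar per time step and the percentage of area in drought as the radius; the monthly values below are invented for illustration rather than taken from the SPI/SPEI datasets.

```python
# Minimal polar area diagram (PAD): one bar per month, bar length equal to the
# percentage of area in drought. Monthly values are invented for illustration.
import numpy as np
import matplotlib.pyplot as plt

months = ["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"]
pct_area_in_drought = np.array([5, 8, 12, 20, 35, 42, 38, 30, 22, 15, 10, 6])

angles = np.linspace(0.0, 2 * np.pi, len(months), endpoint=False)
ax = plt.subplot(projection="polar")
ax.bar(angles, pct_area_in_drought, width=2 * np.pi / len(months), alpha=0.6)
ax.set_xticks(angles)
ax.set_xticklabels(months)
ax.set_title("% area in drought (illustrative)")
plt.show()
```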

  5. How to Handle Abuse

    Science.gov (United States)

    ... How Do You Know Something Is Abuse? ... babysitter, teacher, coach, or a bigger kid. Child abuse can happen anywhere — at ... building. Tell Right Away: a kid who is being seriously hurt ...

  6. Grain Handling and Storage.

    Science.gov (United States)

    Harris, Troy G.; Minor, John

    This text for a secondary- or postecondary-level course in grain handling and storage contains ten chapters. Chapter titles are (1) Introduction to Grain Handling and Storage, (2) Elevator Safety, (3) Grain Grading and Seed Identification, (4) Moisture Control, (5) Insect and Rodent Control, (6) Grain Inventory Control, (7) Elevator Maintenance,…

  7. Handling large variations in mechanics: Some applications

    Indian Academy of Sciences (India)

    (Abstract available only as fragments: the text describes a simple random-walk model in which most jumps occur to the left and right, the division of a test into regions between successive events, and how such analysis can support engineering design decisions; it cites Colombo I S, Forde M C, Main I G and Halliday J 2003a, AE monitoring of concrete bridge beams.)

  8. Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets

    Science.gov (United States)

    Drouart, G.; Falkendal, T.

    2018-04-01

    We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources, drawn from a heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper limits during the fitting process in a continuous way, allowing models to become gradually less probable as upper limits are approached. The aim is to propose a simple-to-use yet highly versatile fitting tool for handling increasing source complexity when combining multi-wavelength datasets, with fully customisable filter/model databases. Complete user control is one advantage, which avoids the traditional problems related to the "black box" effect, where parameter or model tuning is impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting makes Mr-Moose a particularly attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.
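
    One common way to implement this kind of continuous upper-limit treatment (not necessarily Mr-Moose's exact scheme) is to use a smooth survival-function term in the log-likelihood, so that a model is fully acceptable well below the limit and becomes progressively less probable as it approaches and exceeds it. The sketch below is a generic illustration of that idea; the function names and the example limit and noise values are assumptions.

```python
import numpy as np
from scipy.special import erf

def loglike_detection(model_flux, obs_flux, sigma):
    """Standard Gaussian log-likelihood term for a detected flux point."""
    return -0.5 * ((model_flux - obs_flux) / sigma) ** 2

def loglike_upper_limit(model_flux, limit_flux, sigma):
    """Smooth upper-limit term: close to zero well below the limit, and
    increasingly negative as the model flux approaches and exceeds it."""
    # Probability that a measurement with noise sigma would fall below the limit.
    frac_below = 0.5 * (1.0 + erf((limit_flux - model_flux) / (np.sqrt(2.0) * sigma)))
    return float(np.log(np.clip(frac_below, 1e-300, 1.0)))

# Example: a flux upper limit of 1.0 mJy with an assumed noise of 0.33 mJy.
for flux in (0.2, 0.8, 1.0, 1.5):
    print(flux, round(loglike_upper_limit(flux, limit_flux=1.0, sigma=0.33), 3))
```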

  9. National Elevation Dataset

    Science.gov (United States)

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide National elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as horizontal datum, and all the data are recast in a geographic projection. Older DEM's produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEM's do not match well, and to fill sliver areas of missing data between DEM's. These processing steps ensure that NED has no void areas and artificial discontinuities have been minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM was produced by older methods, "striping" may still occur.

  10. Handling Pyrophoric Reagents

    Energy Technology Data Exchange (ETDEWEB)

    Alnajjar, Mikhail S.; Haynie, Todd O.

    2009-08-14

    Pyrophoric reagents are extremely hazardous. Special handling techniques are required to prevent contact with air and the resulting fire. This document provides several methods for working with pyrophoric reagents outside of an inert atmosphere.

  11. Remote handling equipment

    International Nuclear Information System (INIS)

    Clement, G.

    1984-01-01

    After defining intervention, the problems encountered when working in an adverse environment are briefly analyzed with a view to developing various remote handling equipment. Some examples of existing equipment are given [fr]

  12. Ergonomics and patient handling.

    Science.gov (United States)

    McCoskey, Kelsey L

    2007-11-01

    This study aimed to describe patient-handling demands in inpatient units during a 24-hour period at a military health care facility. A 1-day total population survey described the diverse nature and impact of patient-handling tasks relative to a variety of nursing care units, patient characteristics, and transfer equipment. Productivity baselines were established based on patient dependency, physical exertion, type of transfer, and time spent performing the transfer. Descriptions of the physiological effect of transfers on staff based on patient, transfer, and staff characteristics were developed. Nursing staff response to surveys demonstrated how patient-handling demands are impacted by the staff's physical exertion and level of patient dependency. The findings of this study describe the types of transfers occurring in these inpatient units and the physical exertion and time requirements for these transfers. This description may guide selection of the most appropriate and cost-effective patient-handling equipment required for specific units and patients.

  13. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  14. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle mixed data clustering problems have been developed. Affinity propagation (AP) is an exemplar-based clustering method that has demonstrated good performance on a wide variety of datasets; however, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm to cluster mixed datasets. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
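
    The paper's own similarity measure is not reproduced here. The sketch below only illustrates the general pattern it relies on: building a precomputed mixed-type similarity matrix (a numeric distance term plus a simple category-mismatch term, both assumptions of this example) and passing it to an affinity propagation implementation such as scikit-learn's with a precomputed affinity.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy mixed dataset: two numeric columns and one categorical column.
numeric = np.array([[1.0, 2.0], [1.1, 1.9], [8.0, 9.0], [7.9, 9.2]])
categorical = np.array(["a", "a", "b", "b"])

# Normalize numeric features so both parts contribute on a comparable scale.
num = (numeric - numeric.mean(axis=0)) / numeric.std(axis=0)

n = len(num)
similarity = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        d_num = np.linalg.norm(num[i] - num[j])           # Euclidean part
        d_cat = float(categorical[i] != categorical[j])   # category-mismatch part
        similarity[i, j] = -(d_num + d_cat)               # AP expects similarities

ap = AffinityPropagation(affinity="precomputed", random_state=0)
labels = ap.fit_predict(similarity)
print(labels)  # e.g. two clusters, such as [0 0 1 1]
```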

  15. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) projects have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK dataset consists of a rich collection of full text publications in the domain of

  16. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers with the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011, including Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, the multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and
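
    The CUBE algorithm itself models multiple depth hypotheses per grid node together with their uncertainties and is not reproduced here. The sketch below only illustrates, on assumed synthetic soundings, what gridding a point cloud at the two comparison resolutions (0.5 m and 1.0 m) amounts to in its simplest mean-binning form.

```python
import numpy as np

def grid_soundings(x, y, z, resolution):
    """Bin soundings into a regular grid and return the mean depth per cell.
    (A stand-in for proper CUBE gridding, which also tracks uncertainty.)"""
    ix = ((x - x.min()) // resolution).astype(int)
    iy = ((y - y.min()) // resolution).astype(int)
    sums = np.zeros((iy.max() + 1, ix.max() + 1))
    counts = np.zeros_like(sums)
    np.add.at(sums, (iy, ix), z)
    np.add.at(counts, (iy, ix), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)

# Synthetic soundings gridded at the two comparison resolutions.
rng = np.random.default_rng(1)
x, y = rng.uniform(0, 10, 5000), rng.uniform(0, 10, 5000)
z = -20 + 0.1 * x + rng.normal(scale=0.05, size=5000)
print(grid_soundings(x, y, z, 0.5).shape, grid_soundings(x, y, z, 1.0).shape)
```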

  17. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.

  18. Remote handling machines

    International Nuclear Information System (INIS)

    Sato, Shinri

    1985-01-01

    In nuclear power facilities, radioactive wastes are managed using dedicated technology together with automatic techniques. Maintaining or assisting such systems under radiation fields is important. To cope with this situation, the MF-2 system, the MF-3 system and a manipulator system are described as remote handling machines. The MF-2 system consists of an MF-2 carrier truck, a control unit and a command trailer. It is capable of handling heavy objects and is driven electrically rather than hydraulically. The MF-3 system consists of a four-crawler truck and a manipulator; the four independent crawlers make the truck versatile in its posture. The manipulator system is bilateral in operation, so that delicate handling is possible. (Mori, K.)

  19. Practices of Handling

    DEFF Research Database (Denmark)

    Ræbild, Ulla

    to touch, pick up, carry, or feel with the hands. Figuratively it is to manage, deal with, direct, train, or control. Additionally, as a noun, a handle is something by which we grasp or open up something. Lastly, handle also has a Nordic root, here meaning to trade, bargain or deal. Together all four...... meanings seem to merge in the fashion design process, thus opening up for an embodied engagement with matter that entails direction giving, organizational management and negotiation. By seeing processes of handling as a key fashion methodological practice, it is possible to divert the discourse away from...... introduces four ways whereby fashion designers apply their own bodies as tools for design; a) re-activating past garment-design experiences, b) testing present garment-design experiences c) probing for new garment-design experiences and d) design of future garment experiences by body proxy. The paper...

  20. TRANSPORT/HANDLING REQUESTS

    CERN Multimedia

    Groupe ST/HM

    2002-01-01

    A new EDH document entitled 'Transport/Handling Request' will be in operation as of Monday, 11th February 2002, when the corresponding icon will be accessible from the EDH desktop, together with the application instructions. This EDH form will replace the paper-format transport/handling request form for all activities involving the transport of equipment and materials. However, the paper form will still be used for all vehicle-hire requests. The introduction of the EDH transport/handling request form is accompanied by the establishment of the following time limits for the various services concerned: 24 hours for the removal of office items, 48 hours for the transport of heavy items (of up to 6 metric tons and of standard road width), 5 working days for a crane operation, extra-heavy transport operation or complete removal, 5 working days for all transport operations relating to LHC installation. ST/HM Group, Logistics Section Tel: 72672 - 72202

  1. LACIE data-handling techniques

    Science.gov (United States)

    Waits, G. H. (Principal Investigator)

    1979-01-01

    Techniques implemented to facilitate processing of LANDSAT multispectral data between 1975 and 1978 are described. The data that were handled during the large area crop inventory experiment and the storage mechanisms used for the various types of data are defined. The overall data flow, from the placing of the LANDSAT orders through the actual analysis of the data set, is discussed. An overview is provided of the status and tracking system that was developed and of the data base maintenance and operational task. The archiving of the LACIE data is explained.

  2. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back at the client, and construct professional-quality visualizations of the results.
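
    The scatter/gather pattern the abstract describes (distribute analysis code and dataset chunks to workers, then collect and merge partial results at the client) can be sketched in a few lines. The example below uses a local process pool purely as a stand-in for the Grid worker nodes and Web Services; the function names and the toy merge step are assumptions, not part of the framework itself.

```python
from concurrent.futures import ProcessPoolExecutor

def analysis_script(chunk):
    """Stand-in for user-composed analysis code run on one dataset chunk."""
    return {"n": len(chunk), "sum": sum(chunk)}

def run_distributed(dataset_chunks, max_workers=4):
    """Distribute chunks to workers, then collect and merge the partial results."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(analysis_script, dataset_chunks))
    return {
        "n": sum(p["n"] for p in partials),
        "sum": sum(p["sum"] for p in partials),
    }

if __name__ == "__main__":
    chunks = [list(range(i, i + 1000)) for i in range(0, 10000, 1000)]
    print(run_distributed(chunks))
```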

  3. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific dataset outputs, such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  4. Safe handling of tritium

    International Nuclear Information System (INIS)

    1991-01-01

    The main objective of this publication is to provide practical guidance and recommendations on operational radiation protection aspects related to the safe handling of tritium in laboratories, industrial-scale nuclear facilities such as heavy-water reactors, tritium removal plants and fission fuel reprocessing plants, and facilities for manufacturing commercial tritium-containing devices and radiochemicals. The requirements of nuclear fusion reactors are not addressed specifically, since there is as yet no tritium handling experience with them. However, much of the material covered is expected to be relevant to them as well. Annex III briefly addresses problems in the comparatively small-scale use of tritium at universities, medical research centres and similar establishments. However, the main subject of this publication is the handling of larger quantities of tritium. Operational aspects include designing for tritium safety, safe handling practice, the selection of tritium-compatible materials and equipment, exposure assessment, monitoring, contamination control and the design and use of personal protective equipment. This publication does not address the technologies involved in tritium control and cleanup of effluents, tritium removal, or immobilization and disposal of tritium wastes, nor does it address the environmental behaviour of tritium. Refs, figs and tabs

  5. Grain Grading and Handling.

    Science.gov (United States)

    Rendleman, Matt; Legacy, James

    This publication provides an introduction to grain grading and handling for adult students in vocational and technical education programs. Organized in five chapters, the booklet provides a brief overview of the jobs performed at a grain elevator and of the techniques used to grade grain. The first chapter introduces the grain industry and…

  6. Mars Sample Handling Functionality

    Science.gov (United States)

    Meyer, M. A.; Mattingly, R. L.

    2018-04-01

    The final leg of a Mars Sample Return campaign would be an entity that we have referred to as Mars Returned Sample Handling (MRSH). This talk will address our current view of the functional requirements on MRSH, focused on the Sample Receiving Facility (SRF).

  7. Handling wood shavings

    Energy Technology Data Exchange (ETDEWEB)

    1974-09-18

    Details of bulk handling equipment suitable for collection and compressing wood waste from commercial joinery works are discussed. The Redler Bin Discharger ensures free flow of chips from storage silo discharge prior to compression into briquettes for use as fuel or processing into chipboard.

  8. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  9. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  10. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  11. Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data

    Directory of Open Access Journals (Sweden)

    Fei Hu

    2018-04-01

    Full Text Available Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address these challenges, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers were also developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark/SpatialHadoop were enhanced from Spark/Hadoop to handle vector data. However, there are few studies that systematically compare and evaluate the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) no data container except ClimateSpark has good support for the HDF data format used in this paper, so time- and resource-consuming data preprocessing is required to load the data; (2) SciDB, Rasdaman, and MongoDB handle small/moderate volumes of data queries well, whereas Spark and ClimateSpark can handle large volumes of data with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, while the others lack these functions for users; and (4) SciDB, Spark, and Hive have better support for user-defined functions (UDFs) to extend system capability.

  12. Remote handling for an ISIS target change

    International Nuclear Information System (INIS)

    Broome, T.A.; Holding, M.

    1989-01-01

    During 1987 two ISIS targets were changed. This document describes the main features of the remote handling aspects of the work. All the work has to be carried out using remote handling techniques. The radiation level measured on the surface of the reflector when the second target had been removed was about 800 mGy/h, demonstrating that hands-on operation on any part of the target-reflector-moderator assembly is not practical. The target changes were the first large-scale operations in the Target Station Remote Handling Cell, and a great deal was learned about both equipment and working practices. Some general principles emerged which are applicable to other active handling tasks on facilities like ISIS, and these are discussed below. 8 figs

  13. Handling and Transport Problems

    Energy Technology Data Exchange (ETDEWEB)

    Pomarola, J. [Head of Technical Section, Atomic Energy Commission, Saclay (France); Savouyaud, J. [Head of Electro-Mechanical Sub-Division, Atomic Energy Commission, Saclay (France)

    1960-07-01

    Arrangements for special or dangerous transport operations by road arising out of the activities of the Atomic Energy Commission are made by the Works and Installations Division which acts in concert with the Monitoring and Protection Division (MPD) whenever radioactive substances or appliances are involved. In view of the risk of irradiation and contamination entailed in handling and transporting radioactive substances, including waste, a specialized transport and storage team has been formed as a complement to the emergency and decontamination teams.

  14. Solid waste handling

    International Nuclear Information System (INIS)

    Parazin, R.J.

    1995-01-01

    This study presents estimates of the solid radioactive waste quantities that will be generated in the Separations, Low-Level Waste Vitrification and High-Level Waste Vitrification facilities, collectively called the Tank Waste Remediation System Treatment Complex, over the life of these facilities. This study then considers previous estimates from other 200 Area generators and compares alternative methods of handling (segregation, packaging, assaying, shipping, etc.)

  15. Handling of radioactive waste

    International Nuclear Information System (INIS)

    Sanhueza Mir, Azucena

    1998-01-01

    Based on the characteristics and quantities of the different types of radioactive waste produced in the country, this paper presents the achievements in infrastructure and the approaches used to solve problems related to radioactive waste handling and management. The objectives of maintaining facilities and capabilities for controlling, processing and storing radioactive waste in conditioned form are attained within a broad legal framework, defined so as to contribute to the safety of people and the environment (au)

  16. Renal phosphate handling: Physiology

    Directory of Open Access Journals (Sweden)

    Narayan Prasad

    2013-01-01

    Full Text Available Phosphate is a common anion that plays an important role in energy generation. Renal phosphate handling is regulated by three organs, the parathyroid, kidney and bone, through feedback loops. These counter-regulatory loops also regulate intestinal absorption and thus maintain the serum phosphorus concentration in the physiologic range. Parathyroid hormone, vitamin D, fibroblast growth factor 23 (FGF23) and the klotho co-receptor are the key regulators of phosphorus balance in the body.

  17. Uranium hexafluoride handling

    International Nuclear Information System (INIS)

    1991-01-01

    The United States Department of Energy, Oak Ridge Field Office, and Martin Marietta Energy Systems, Inc., are co-sponsoring this Second International Conference on Uranium Hexafluoride Handling. The conference is offered as a forum for the exchange of information and concepts regarding the technical and regulatory issues and the safety aspects which relate to the handling of uranium hexafluoride. Through the papers presented here, we attempt not only to share technological advances and lessons learned, but also to demonstrate that we are concerned about the health and safety of our workers and the public, and are good stewards of the environment in which we all work and live. These proceedings are a compilation of the work of many experts in that phase of world-wide industry which comprises the nuclear fuel cycle. Their experience spans the entire range over which uranium hexafluoride is involved in the fuel cycle, from the production of UF6 from the naturally-occurring oxide to its re-conversion to oxide for reactor fuels. The papers furnish insights into the chemical, physical, and nuclear properties of uranium hexafluoride as they influence its transport, storage, and the design and operation of plant-scale facilities for production, processing, and conversion to oxide. The papers demonstrate, in an industry often cited for its excellent safety record, continuing efforts to further improve safety in all areas of handling uranium hexafluoride

  18. Uranium hexafluoride handling. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    1991-12-31

    The United States Department of Energy, Oak Ridge Field Office, and Martin Marietta Energy Systems, Inc., are co-sponsoring this Second International Conference on Uranium Hexafluoride Handling. The conference is offered as a forum for the exchange of information and concepts regarding the technical and regulatory issues and the safety aspects which relate to the handling of uranium hexafluoride. Through the papers presented here, we attempt not only to share technological advances and lessons learned, but also to demonstrate that we are concerned about the health and safety of our workers and the public, and are good stewards of the environment in which we all work and live. These proceedings are a compilation of the work of many experts in that phase of world-wide industry which comprises the nuclear fuel cycle. Their experience spans the entire range over which uranium hexafluoride is involved in the fuel cycle, from the production of UF6 from the naturally-occurring oxide to its re-conversion to oxide for reactor fuels. The papers furnish insights into the chemical, physical, and nuclear properties of uranium hexafluoride as they influence its transport, storage, and the design and operation of plant-scale facilities for production, processing, and conversion to oxide. The papers demonstrate, in an industry often cited for its excellent safety record, continuing efforts to further improve safety in all areas of handling uranium hexafluoride. Selected papers were processed separately for inclusion in the Energy Science and Technology Database.

  19. Torus sector handling system

    International Nuclear Information System (INIS)

    Grisham, D.L.

    1981-01-01

    A remote handling system is proposed for moving a torus sector of the accelerator from under the cryostat to a point where it can be handled by a crane and for the reverse process for a new sector. Equipment recommendations are presented, as well as possible alignment schemes. Some general comments about future remote-handling methods and the present capabilities of existing systems will also be included. The specific task to be addressed is the removal and replacement of a 425 to 450 ton torus sector. This requires a horizontal movement of approx. 10 m from a normal operating position to a point where its further transport can be accomplished by more conventional means (crane or floor transporter). The same horizontal movement is required for reinstallation, but a positional tolerance of 2 cm must be met to allow reasonable fit-up for the vacuum seal from the radial frames to the torus sector. Since the sectors are not only heavy but rather tall and narrow, the transport system must provide a safe, stable, and repeatable method of sector movement. This limited study indicates that the LAMPF-based method of transporting torus sectors offers a proven method of moving heavy items. In addition, the present state of the art in remote equipment is adequate for FED maintenance

  20. Handling of Solid Residues

    International Nuclear Information System (INIS)

    Medina Bermudez, Clara Ines

    1999-01-01

    The topic of solid residues is of particular interest and concern to the authorities, institutions and the community, who see in them a real threat to human health and the environment, related to the aesthetic deterioration of urban centres and the natural landscape, the proliferation of disease vectors and the effects on biodiversity. Within the wide spectrum of topics connected with environmental protection, the inadequate handling of solid and hazardous residues occupies an important place in the definition of environmentally sustainable policies and practices. Industrial development and population growth have caused a continuous increase in the production of solid residues; likewise, their composition is becoming more heterogeneous every day. The basis for good handling is appropriate intervention at the different stages of integrated residue management, which include separation at the source, collection, handling, reuse, treatment, final disposal and the institutional organization of the management system. Hazardous residues raise even greater concern. These range from pathogenic residues generated in health care and hospital establishments to combustible, flammable, explosive, radioactive, volatile, corrosive, reactive or toxic residues associated with numerous industrial processes that are common in developing countries

  1. Image-based Exploration of Large-Scale Pathline Fields

    KAUST Repository

    Nagoor, Omniah H.

    2014-05-27

    While real-time applications are nowadays routinely used in visualizing large numerical simulations and volumes, handling these large-scale datasets requires high-end graphics clusters or supercomputers to process and visualize them. However, not all users have access to powerful clusters. Therefore, it is challenging to come up with a visualization approach that provides insight into large-scale datasets on a single computer. Explorable images (EI) is one of the methods that allows users to handle large data on a single workstation. Although it is a view-dependent method, it combines both exploration and modification of visual aspects without re-accessing the original huge data. In this thesis, we propose a novel image-based method that applies the concept of EI in visualizing large flow-field pathline data. The goal of our work is to provide an optimized image-based method, which scales well with the dataset size. Our approach is based on constructing a per-pixel linked list data structure in which each pixel contains a list of pathline segments. With this view-dependent method it is possible to filter, color-code and explore large-scale flow data in real-time. In addition, optimization techniques such as early-ray termination and deferred shading are applied, which further improve the performance and scalability of our approach.
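
    A minimal CPU-side sketch of the per-pixel linked-list idea is given below: each screen pixel heads a singly linked list of projected pathline-segment fragments that can later be filtered or colour-coded without touching the original dataset. The class and field names are assumptions for illustration; the thesis builds the equivalent structure on the GPU.

```python
from dataclasses import dataclass

@dataclass
class SegmentNode:
    segment_id: int      # which pathline segment was projected onto this pixel
    depth: float         # view-space depth, useful for sorting/compositing
    attribute: float     # e.g. velocity magnitude, used for filtering/color-coding
    next: "SegmentNode | None" = None

class PerPixelLists:
    """Each pixel stores the head of a singly linked list of segment fragments."""
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.heads = [None] * (width * height)

    def insert(self, x, y, segment_id, depth, attribute):
        idx = y * self.width + x
        # Prepend; ordering (e.g. a depth sort) is resolved at display time.
        self.heads[idx] = SegmentNode(segment_id, depth, attribute, self.heads[idx])

    def filtered(self, x, y, predicate):
        """Walk one pixel's list, keeping fragments that pass the filter."""
        node, out = self.heads[y * self.width + x], []
        while node is not None:
            if predicate(node):
                out.append(node)
            node = node.next
        return out

# Usage: keep only fast segments at pixel (10, 20).
buf = PerPixelLists(64, 64)
buf.insert(10, 20, segment_id=1, depth=0.4, attribute=2.5)
buf.insert(10, 20, segment_id=7, depth=0.1, attribute=0.3)
print([n.segment_id for n in buf.filtered(10, 20, lambda n: n.attribute > 1.0)])
```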

  2. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings...... with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  3. CUDA based Level Set Method for 3D Reconstruction of Fishes from Large Acoustic Data

    DEFF Research Database (Denmark)

    Sharma, Ojaswa; Anton, François

    2009-01-01

    Acoustic images present views of underwater dynamics, even in high depths. With multi-beam echo sounders (SONARs), it is possible to capture series of 2D high resolution acoustic images. 3D reconstruction of the water column and subsequent estimation of fish abundance and fish species identificat...... of suppressing threshold and show its convergence as the evolution proceeds. We also present a GPU based streaming computation of the method using NVIDIA's CUDA framework to handle large volume data-sets. Our implementation is optimised for memory usage to handle large volumes....

  4. The Harvard organic photovoltaic dataset.

    Science.gov (United States)

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  5. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  6. Preference Handling for Artificial Intelligence

    OpenAIRE

    Goldsmith, Judy; University of Kentucky; Junker, Ulrich; ILOG

    2009-01-01

    This article explains the benefits of preferences for AI systems and draws a picture of current AI research on preference handling. It thus provides an introduction to the topics covered by this special issue on preference handling.

  7. Crud handling circuit

    International Nuclear Information System (INIS)

    Smith, J.C.; Manuel, R.J.; McAllister, J.E.

    1981-01-01

    A process for handling the problems of crud formation during the solvent extraction of wet-process phosphoric acid, e.g. for uranium and rare earth removal, is described. It involves clarification of the crud-solvent mixture, settling, water washing the residue and treatment of the crud with a caustic wash to remove and regenerate the solvent. Applicable to synergistic mixtures of dialkylphosphoric acids and trialkylphosphine oxides dissolved in inert diluents and more preferably to the reductive stripping technique. (U.K.)

  8. Handling of potassium

    International Nuclear Information System (INIS)

    Schwarz, N.; Komurka, M.

    1983-03-01

    As a result of fast breeder development, extensive experience with sodium technology is available worldwide. Following the extension of the research programme to topping cycles with potassium as the working medium, test facilities using potassium have been designed and operated at the Institute of Reactor Safety. The different chemical properties of sodium and potassium give rise to new safety concepts and operating procedures. The handling problems of potassium are described in the light of its theoretical properties and our own experience. Selected literature on the main safety and operating problems completes this report. (Author) [de]

  9. Extreme coal handling

    Energy Technology Data Exchange (ETDEWEB)

    Bradbury, S; Homleid, D. [Air Control Science Inc. (United States)

    2004-04-01

    Within the journal's 'Focus on O & M' section is a short article describing modifications to coal handling systems at Eielson Air Force Base near Fairbanks, Alaska, which is supplied with power and heat from a subbituminous coal-fired central plant. Measures to reduce dust include the addition of an enclosed recirculation chamber at each transfer point and new chute designs to reduce coal velocity, turbulence, and induced air. The modifications were developed by Air Control Science (ACS). 7 figs., 1 tab.

  10. Handling Data Skew in MapReduce Cluster by Using Partition Tuning

    Directory of Open Access Journals (Sweden)

    Yufei Gao

    2017-01-01

    Full Text Available The healthcare industry has generated large amounts of data, and analyzing these has emerged as an important problem in recent years. The MapReduce programming model has been successfully used for big data analytics. However, data skew invariably occurs in big data analytics and seriously affects efficiency. To overcome the data skew problem in MapReduce, we have in the past proposed a data processing algorithm called Partition Tuning-based Skew Handling (PTSH). In comparison with the one-stage partitioning strategy used in the traditional MapReduce model, PTSH uses a two-stage strategy and the partition tuning method to disperse key-value pairs in virtual partitions and recombines each partition in case of data skew. The robustness and efficiency of the proposed algorithm were tested on a wide variety of simulated datasets and real healthcare datasets. The results showed that the PTSH algorithm can handle data skew in MapReduce efficiently and improve the performance of MapReduce jobs in comparison with the native Hadoop, Closer, and locality-aware and fairness-aware key partitioning (LEEN). We also found that the time needed for rule extraction can be reduced significantly by adopting the PTSH algorithm, since it is more suitable for association rule mining (ARM) on healthcare data.
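
    The exact PTSH partition-tuning rules are not reproduced here. The sketch below only illustrates the general two-stage idea the abstract describes, on assumed toy data: key-value pairs are first scattered into many virtual partitions, whose sizes are then used to pack them onto reducers so that skewed keys do not pile up on a single reducer.

```python
import zlib

def two_stage_partition(pairs, num_virtual, num_reducers):
    """Two-stage partitioning sketch: hash pairs into many virtual partitions,
    then greedily pack the virtual partitions onto reducers to balance load."""
    # Stage 1: scatter key-value pairs into virtual partitions by a stable key hash.
    virtual = [[] for _ in range(num_virtual)]
    for key, value in pairs:
        vp = zlib.crc32(str(key).encode("utf-8")) % num_virtual
        virtual[vp].append((key, value))

    # Stage 2: assign virtual partitions to reducers, largest first (greedy packing).
    loads = [0] * num_reducers
    assignment = [[] for _ in range(num_reducers)]
    for part in sorted(virtual, key=len, reverse=True):
        target = loads.index(min(loads))
        assignment[target].extend(part)
        loads[target] += len(part)
    return assignment, loads

# Skewed toy workload: key "k0" is much hotter than key "k39".
pairs = [(f"k{i}", j) for i in range(40) for j in range(100 - i)]
_, loads = two_stage_partition(pairs, num_virtual=64, num_reducers=4)
print(loads)  # reducer loads end up close to an even split despite the skew
```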

  11. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continues to evolve as well. There are on the order of 120 site contacts, and 60 proposals have been approved to use the data. These proposals involve around 120 researchers. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and to supporting the collaboration. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new web site is based on a scientific data server which enables browsing of the data on-line, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  12. A Perspective on Remote Handling Operations and Human Machine Interface for Remote Handling in Fusion

    International Nuclear Information System (INIS)

    Haist, B.; Hamilton, D.; Sanders, St.

    2006-01-01

    A large-scale fusion device presents many challenges to the remote handling operations team. This paper is based on unique operational experience at JET and gives a perspective on remote handling task development, logistics and resource management, as well as command, control and human-machine interface systems. Remote operations require an accurate perception of a dynamic environment, ideally providing the operators with the same unrestricted knowledge of the task scene as would be available if they were actually at the remote work location. Traditional camera based systems suffer from a limited number of viewpoints and also degrade quickly when exposed to high radiation. Virtual Reality and Augmented Reality software offer great assistance. The remote handling system required to maintain a tokamak requires a large number of different and complex pieces of equipment coordinating to perform a large array of tasks. The demands on the operator's skill in performing the tasks can escalate to a point where the efficiency and safety of operations are compromised. An operations guidance system designed to facilitate the planning, development, validation and execution of remote handling procedures is essential. Automatic planning of motion trajectories of remote handling equipment and the remote transfer of heavy loads will be routine and need to be reliable. This paper discusses the solutions developed at JET in these areas and also the trends in management and presentation of operational data as well as command, control and HMI technology development offering the potential to greatly assist remote handling in future fusion machines. (author)

  13. Detecting Significant Stress Drop Variations in Large Micro-Earthquake Datasets: A Comparison Between a Convergent Step-Over in the San Andreas Fault and the Ventura Thrust Fault System, Southern California

    Science.gov (United States)

    Goebel, T. H. W.; Hauksson, E.; Plesch, A.; Shaw, J. H.

    2017-06-01

    A key parameter in engineering seismology and earthquake physics is seismic stress drop, which describes the relative amount of high-frequency energy radiation at the source. To identify regions with potentially significant stress drop variations, we perform a comparative analysis of source parameters in the greater San Gorgonio Pass (SGP) and Ventura basin (VB) in southern California. The identification of physical stress drop variations is complicated by large data scatter as a result of attenuation, limited recording bandwidth and imprecise modeling assumptions. In light of the inherently high uncertainties in single stress drop measurements, we follow the strategy of stacking large numbers of source spectra thereby enhancing the resolution of our method. We analyze more than 6000 high-quality waveforms between 2000 and 2014, and compute seismic moments, corner frequencies and stress drops. Significant variations in stress drop estimates exist within the SGP area. Moreover, the SGP also exhibits systematically higher stress drops than VB and shows more scatter. We demonstrate that the higher scatter in SGP is not a generic artifact of our method but an expression of differences in underlying source processes. Our results suggest that higher differential stresses, which can be deduced from larger focal depth and more thrust faulting, may only be of secondary importance for stress drop variations. Instead, the general degree of stress field heterogeneity and strain localization may influence stress drops more strongly, so that more localized faulting and homogeneous stress fields favor lower stress drops. In addition, higher loading rates, for example, across the VB potentially result in stress drop reduction whereas slow loading rates on local fault segments within the SGP region result in anomalously high stress drop estimates. Our results show that crustal and fault properties systematically influence earthquake stress drops of small and large events and should
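
    The paper's spectral-fitting and stacking procedure is not reproduced here, but the final step it relies on, converting a seismic moment and corner frequency into a stress drop, is commonly written with Brune/Eshelby-type relations. The function below is a generic sketch of that conversion; the shear-wave velocity and the constant k are assumed values, not parameters taken from the study.

```python
def brune_stress_drop(m0_newton_m, fc_hz, beta_m_s=3500.0, k=0.37):
    """Brune-type stress drop (Pa) from seismic moment M0 and corner frequency fc.

    Uses the relations r = k * beta / fc (source radius) and
    delta_sigma = 7 * M0 / (16 * r**3). The constant k (~0.32-0.37) and the
    shear-wave velocity beta are model assumptions.
    """
    source_radius = k * beta_m_s / fc_hz
    return 7.0 * m0_newton_m / (16.0 * source_radius ** 3)

# Example: an M ~3 event (M0 ≈ 4e13 N·m) with a 5 Hz corner frequency.
print(f"{brune_stress_drop(4e13, fc_hz=5.0) / 1e6:.2f} MPa")  # roughly 1 MPa
```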

  14. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Full Text Available Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but missing-data handling can also occur during model building. This article considers missing data within the context of principal component analysis (PCA), which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms, and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
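
    A simple member of this family of methods is the alternating SVD-imputation scheme sketched below: fill the missing entries, fit a rank-k PCA model, re-fill the missing entries from the model, and iterate. This is a generic illustration under stated assumptions, not necessarily the exact algorithm that performed best in the article.

```python
import numpy as np

def pca_with_missing(X, n_components, n_iter=200, tol=1e-8):
    """Alternating SVD imputation: fill the missing entries, fit a rank-k model,
    re-fill the missing entries from the model, and repeat until convergence."""
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    X_filled = np.where(missing, np.nanmean(X, axis=0), X)

    for _ in range(n_iter):
        mean = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mean, full_matrices=False)
        approx = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        change = np.linalg.norm(X_filled[missing] - approx[missing])
        X_filled[missing] = approx[missing]
        if change < tol:
            break

    scores = (X_filled - X_filled.mean(axis=0)) @ Vt[:n_components].T
    return scores, Vt[:n_components], X_filled

# Toy example: a rank-2 matrix with two entries removed.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 5))
A[3, 1] = A[10, 4] = np.nan
scores, loadings, completed = pca_with_missing(A, n_components=2)
print(completed[3, 1], completed[10, 4])  # imputed consistently with the rank-2 structure
```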

  15. Remote handling in ZEPHYR

    International Nuclear Information System (INIS)

    Andelfinger, C.; Lackner, E.; Ulrich, M.; Weber, G.; Schilling, H.B.

    1982-04-01

    A conceptual design of the ZEPHYR building is described. The listed radiation data show that remote handling devices will be necessary in most areas of the building. For difficult repair and maintenance work it is intended to transfer complete units from the experimental hall to a hot cell which provides better working conditions. The necessary crane systems and other transport means are summarized, as well as suitable commercially available manipulators and observation devices. The concept of automatic devices for cutting, welding and other operations inside the vacuum vessel, and the associated position control system, is sketched. Guidelines for the design of passive components are set up in order to facilitate remote operation. (orig.)

  16. Handling hunger strikers.

    Science.gov (United States)

    1992-04-01

    Hunger strikes are being used increasingly and not only by those with a political point to make. Whereas in the past, hunger strikes in the United Kingdom seemed mainly to be started by terrorist prisoners for political purposes, the most recent was begun by a Tamil convicted of murder, to protest his innocence. In the later stages of his strike, before calling it off, he was looked after at the Hammersmith Hospital. So it is not only prison doctors who need to know how to handle a hunger strike. The following guidelines, adopted by the 43rd World Medical Assembly in Malta in November 1991, are therefore a timely reminder of the doctor's duties during a hunger strike.

  17. Plutonium safe handling

    International Nuclear Information System (INIS)

    Tvehlov, Yu.

    2000-01-01

    This abstract, prepared on the basis of the new IAEA guidance on the safe handling and storage of plutonium (publication No. 9 in the Safety Reports Series), presents internationally acknowledged criteria for evaluating radiation hazards and summarizes the experience accumulated in the nuclear states in the safe management of large quantities of plutonium. Data are presented on weapons-grade and civil plutonium, the degree of hazard it poses and the measures for ensuring its safety, including data on the radiation consequences of a criticality accident involving 10^18 fissions. Recommendations are given that make it possible to eliminate the danger of criticality, ignition and explosion, to maintain the tightness of the facility so as to exclude radioactive contamination and internal irradiation, to provide for plutonium security and physical protection, and to reduce irradiation [ru]

  18. Handling missing values in the MDS-UPDRS.

    Science.gov (United States)

    Goetz, Christopher G; Luo, Sheng; Wang, Lu; Tilley, Barbara C; LaPelle, Nancy R; Stebbins, Glenn T

    2015-10-01

    This study was undertaken to define the number of missing values permissible to render valid total scores for each Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS) part. To handle missing values, imputation strategies serve as guidelines to reject an incomplete rating or create a surrogate score. We tested a rigorous, scale-specific, data-based approach to handling missing values for the MDS-UPDRS. From two large MDS-UPDRS datasets, we sequentially deleted item scores, either consistently (same items) or randomly (different items) across all subjects. Lin's Concordance Correlation Coefficient (CCC) compared scores calculated without missing values with prorated scores based on sequentially increasing missing values. The maximal number of missing values retaining a CCC greater than 0.95 determined the threshold for rendering a valid prorated score. A second confirmatory sample was selected from the MDS-UPDRS international translation program. To provide valid part scores applicable across all Hoehn and Yahr (H&Y) stages when the same items are consistently missing, one missing item from Part I, one from Part II, three from Part III, but none from Part IV can be allowed. To provide valid part scores applicable across all H&Y stages when random item entries are missing, one missing item from Part I, two from Part II, seven from Part III, but none from Part IV can be allowed. All cutoff values were confirmed in the validation sample. These analyses are useful for constructing valid surrogate part scores for MDS-UPDRS when missing items fall within the identified threshold and give scientific justification for rejecting partially completed ratings that fall below the threshold. © 2015 International Parkinson and Movement Disorder Society.
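
    As a simple illustration of how such thresholds might be applied in practice, the function below prorates a part score only when the number of missing items stays within an allowed maximum and rejects the rating otherwise. It is a generic sketch, not the official MDS-UPDRS scoring procedure; the 13-item Part II example and the max_missing value are assumptions based on the thresholds quoted above.

```python
def prorated_part_score(item_scores, n_items, max_missing):
    """Prorate a part score when a limited number of items are missing.

    item_scores: dict of item -> score for the items that were completed.
    n_items: total number of items in this MDS-UPDRS part.
    max_missing: maximum number of missing items for which a prorated score
                 is still considered valid.
    Returns the surrogate total, or None if too many items are missing.
    """
    n_missing = n_items - len(item_scores)
    if n_missing > max_missing:
        return None  # reject the partially completed rating
    completed_sum = sum(item_scores.values())
    # Scale the observed sum up to the full number of items.
    return completed_sum * n_items / (n_items - n_missing)

# Example: a 13-item part with 11 items completed and up to 2 allowed missing.
scores = {f"2.{i}": 1 for i in range(1, 12)}
print(prorated_part_score(scores, n_items=13, max_missing=2))  # 13.0
```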

  19. Handle with care

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1965-03-15

    Full text: A film dealing with transport of radioactive materials by everyday means - rail, road, sea and air transport - has been made for IAEA. It illustrates in broad terms some of the simple precautions which should be followed by persons dealing with such materials during shipment. Throughout, the picture stresses the transport regulations drawn up and recommended by the Agency, and in particular the need to carry out carefully the instructions based on these regulations in order to ensure that there is no hazard to the public nor to those who handle radioactive materials in transit and storage. In straightforward language, the film addresses the porter of a goods wagon, an airline cargo clerk, a dockside crane operator, a truck driver and others who load and ship freight. It shows the various types of package used to contain different categories of radioactive substances according to the intensity of the radiation emitted. It also illustrates their robustness by a series of tests involving drops, fires, impact, crushing, etc. Clear instructions are conveyed on what to do in the event of an unlikely accident with any type of package. The film is entitled, 'The Safe Transport of Radioactive Materials', and is No. 3 in the series entitled, 'Handle with Care'. It was made for IAEA through the United Kingdom Atomic Energy Authority by the Film Producers' Guild in the United Kingdom. It is in 16 mm colour, optical sound, with a running time of 20 minutes. It is available for order at $50 either direct from IAEA or through any of its Member Governments. Prints can be supplied in English, French, Russian or Spanish. Copies are also available for adaptation for commentaries in other languages. (author)

  20. Development of tritium-handling technique

    International Nuclear Information System (INIS)

    Ohmura, Hiroshi; Hosaka, Akio; Okamoto, Takahumi

    1988-01-01

    An overview of development activities for tritium-handling techniques at IHI is presented. Tritium handling is one of the key technologies for establishing a fusion power plant. Recently at JAERI, the conceptual design of the FER (Fusion Experimental Reactor) has been carried out, and the FER system requires a processing system for a large amount of tritium. IHI concentrates on the investigation of fuel gas purification, isotope separation and storage systems under contract with Toshiba Corporation. Design results for the systems and their components are reviewed. IHI has been developing fundamental handling techniques, namely the ZrNi bed for hydrogen isotope storage and isotope separation by laser. A ZrNi bed with a tritium storage capacity of 1000 Ci has been constructed, and recovery of the hydrogen isotopes down to 10^-4 Torr (0.013 Pa) was confirmed. In laser isotope separation, the optimum laser wavelength has been determined. (author)

  1. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330.

  2. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, allowing some of the data to be recovered for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  3. StarDB: a large-scale DBMS for strings

    KAUST Repository

    Sahli, Majed

    2015-08-01

    Strings and applications using them are proliferating in science and business. Currently, strings are stored in file systems and processed using ad-hoc procedural code. Existing techniques are not flexible and cannot efficiently handle complex queries or large datasets. In this paper, we demonstrate StarDB, a distributed database system for analytics on strings. StarDB hides data and system complexities and allows users to focus on analytics. It uses a comprehensive set of parallel string operations and provides a declarative query language to solve complex queries. StarDB automatically tunes itself and runs with over 90% efficiency on supercomputers, public clouds, clusters, and workstations. We test StarDB using real datasets that are 2 orders of magnitude larger than the datasets reported by previous works.

  4. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  5. Something From Nothing (There): Collecting Global IPv6 Datasets from DNS

    NARCIS (Netherlands)

    Fiebig, T.; Borgolte, Kevin; Hao, Shuang; Kruegel, Christopher; Vigna, Giovanny; Spring, Neil; Riley, George F.

    2017-01-01

    Current large-scale IPv6 studies mostly rely on non-public datasets, as most public datasets are domain specific. For instance, traceroute-based datasets are biased toward network equipment. In this paper, we present a new methodology to collect IPv6 address datasets that does not require access to

  6. Medical Dataset Classification: A Machine Learning Paradigm Integrating Particle Swarm Optimization with Extreme Learning Machine Classifier

    Directory of Open Access Journals (Sweden)

    C. V. Subbulakshmi

    2015-01-01

    Full Text Available Medical data classification is a prime data mining problem that has been discussed for a decade and has attracted several researchers around the world. Most classifiers are designed to learn from the data itself through a training process, because complete expert knowledge for determining classifier parameters is impracticable. This paper proposes a hybrid methodology based on the machine learning paradigm. It integrates the successful exploration mechanism known as the self-regulated learning capability of the particle swarm optimization (PSO) algorithm with the extreme learning machine (ELM) classifier. As a recent off-line learning method, ELM is a single-hidden-layer feedforward neural network (FFNN) that has proved to be an excellent classifier with a large number of hidden-layer neurons. In this research, PSO is used to determine the optimum set of parameters for the ELM, thus reducing the number of hidden-layer neurons and further improving the network's generalization performance. The proposed method is evaluated on five benchmark datasets from the UCI Machine Learning Repository for medical dataset classification. Simulation results show that the proposed approach achieves good generalization performance compared to the results of other classifiers.
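
    The record above describes tuning an extreme learning machine (ELM) with particle swarm optimization (PSO). Purely as a hedged sketch (not the authors' implementation; every constant below is an illustrative assumption), the following Python pairs a minimal ELM, i.e. a random hidden layer plus ridge-regression output weights, with a bare-bones global-best PSO that searches the hidden-layer size and regularization strength.

```python
# Minimal sketch, not the paper's code: an ELM-style classifier whose
# hidden-layer size and regularization strength are tuned by a tiny
# global-best particle swarm. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def elm_fit(X, y_onehot, n_hidden, reg):
    """Random hidden layer + ridge-regression output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ y_onehot)
    return W, b, beta

def elm_predict(X, model):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

def validation_accuracy(params, Xtr, ytr, Xva, yva, n_classes):
    n_hidden = int(np.clip(params[0], 5, 200))
    reg = 10.0 ** np.clip(params[1], -4.0, 2.0)
    model = elm_fit(Xtr, np.eye(n_classes)[ytr], n_hidden, reg)
    return np.mean(elm_predict(Xva, model) == yva)

def pso(objective, bounds, n_particles=10, n_iter=20):
    """Maximize `objective` over a box; returns the best position found."""
    lo, hi = np.array(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)]
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, len(bounds)))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmax(pbest_val)]
    return gbest
```

    With integer-coded class labels, a call such as pso(lambda p: validation_accuracy(p, X_tr, y_tr, X_va, y_va, n_classes), bounds=[(5, 200), (-4, 2)]) returns the hidden-layer size and log-regularization that maximize validation accuracy on a given split; the variable names here are hypothetical placeholders for any UCI dataset split.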

  7. Unvented Drum Handling Plan

    International Nuclear Information System (INIS)

    MCDONALD, K.M.

    2000-01-01

    This drum-handling plan proposes a method to deal with unvented transuranic drums encountered during retrieval of drums. Finding unvented drums during retrieval activities was expected, as identified in the Transuranic (TRU) Phase I Retrieval Plan (HNF-4781). However, significant numbers of unvented drums were not expected until excavation of buried drums began. This plan represents accelerated planning for management of unvented drums. A plan is proposed that manages unvented drums differently based on three categories. The first category of drums is any that visually appear to be pressurized. These will be vented immediately, using either the Hanford Fire Department Hazardous Materials (Haz. Mat.) team, if such are encountered before the facilities' capabilities are established, or using internal capabilities, once established. To date, no drums have been retrieved that showed signs of pressurization. The second category consists of drums that contain a minimal amount of Pu isotopes. This minimal amount is typically less than 1 gram of Pu, but may be waste-stream dependent. Drums in this category are assayed to determine if they are low-level waste (LLW). LLW drums are typically disposed of without venting. Any unvented drums that assay as TRU will be staged for a future venting campaign, using appropriate safety precautions in their handling. The third category of drums is those for which records show larger amounts of Pu isotopes (typically greater than or equal to 1 gram of Pu). These are assumed to be TRU and are not assayed at this point, but are staged for a future venting campaign. Any of these drums that do not have a visible venting device will be staged awaiting venting, and will be managed under appropriate controls, including covering the drums to protect from direct solar exposure, minimizing of container movement, and placement of a barrier to restrict vehicle access. There are a number of equipment options available to perform the venting. The

  8. New transport and handling contract

    CERN Multimedia

    SC Department

    2008-01-01

    A new transport and handling contract entered into force on 1.10.2008. As with the previous contract, the user interface is the internal transport/handling request form on EDH: https://edh.cern.ch/Document/TransportRequest/ To ensure that you receive the best possible service, we invite you to complete the various fields as accurately as possible and to include a mobile telephone number on which we can reach you. You can follow the progress of your request (schedule, completion) in the EDH request routing information. We remind you that the following deadlines apply: 48 hours for the transport of heavy goods (up to 8 tonnes) or simple handling operations; 5 working days for crane operations, transport of extra-heavy goods, complex handling operations and combined transport and handling operations in the tunnel. For all enquiries, the number to contact remains unchanged: 72202. Heavy Handling Section TS-HE-HH 72672 - 160319

  9. How to handle spatial heterogeneity in hydrological models.

    Science.gov (United States)

    Loritz, Ralf; Neuper, Malte; Gupta, Hoshin; Zehe, Erwin

    2017-04-01

    The amount of data we observe in our environmental systems is larger than ever. This leads to a new kind of problem, where hydrological modelers can have access to large datasets with various quantitative and qualitative observations but are uncertain about their information content with respect to the hydrological functioning of a landscape. For example, digital elevation models obviously contain plenty of information about the topography of a landscape; the question of relevance for hydrology, however, is how much of this information is important for the hydrological functioning of that landscape. This kind of question is not limited to topography, and we can ask similar questions when handling distributed rainfall data or geophysical images. In this study we would like to show how one can separate dominant patterns in the landscape from idiosyncratic system details. We use a 2D numerical hillslope model in combination with an extensive research dataset to test a variety of model setups that are built upon different landscape characteristics and driven by different rainfall measurements. With the help of information-theory-based measures we can identify and learn how much heterogeneity is really necessary for successful hydrological simulations and how much of it we can neglect.
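
    As a purely illustrative companion to the abstract above (neither the authors' information-theoretic measures nor their data are reproduced here), the snippet below estimates mutual information from a two-dimensional histogram, one simple way of asking how much a landscape attribute tells us about simulated discharge; the synthetic "slope" and "aspect" variables are invented for the example.

```python
# Illustrative only: histogram-based mutual information between a landscape
# attribute and simulated discharge, as a rough measure of how much of the
# attribute's heterogeneity actually matters for the simulation.
import numpy as np

def mutual_information(x, y, bins=20):
    """Estimate I(X;Y) in nats from a 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Synthetic example: slope is informative for discharge, aspect is not.
rng = np.random.default_rng(1)
slope = rng.gamma(2.0, 2.0, 5000)
aspect = rng.uniform(0.0, 360.0, 5000)
discharge = 0.8 * slope + rng.normal(0.0, 1.0, 5000)
print(mutual_information(slope, discharge), mutual_information(aspect, discharge))
```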

  10. Remote handling and accelerators

    International Nuclear Information System (INIS)

    Wilson, M.T.

    1983-01-01

    The high-current levels of contemporary and proposed accelerator facilities induce radiation levels into components, requiring consideration be given to maintenance techniques that reduce personnel exposure. Typical components involved include beamstops, targets, collimators, windows, and instrumentation that intercepts the direct beam. Also included are beam extraction, injection, splitting, and kicking regions, as well as purposeful spill areas where beam tails are trimmed and neutral particles are deposited. Scattered beam and secondary particles activate components all along a beamline such as vacuum pipes, magnets, and shielding. Maintenance techniques vary from hands-on to TV-viewed operation using state-of-the-art servomanipulators. Bottom- or side-entry casks are used with thimble-type target and diagnostic assemblies. Long-handled tools are operated from behind shadow shields. Swinging shield doors, unstacking block, and horizontally rolling shield roofs are all used to provide access. Common to all techniques is the need to make operations simple and to provide a means of seeing and reaching the area

  11. TFTR tritium handling concepts

    International Nuclear Information System (INIS)

    Garber, H.J.

    1976-01-01

    The Tokamak Fusion Test Reactor, to be located on the Princeton Forrestal Campus, is expected to operate with 1 to 2.5 MA tritium--deuterium plasmas, with the pulses involving injection of 50 to 150 Ci (5 to 16 mg) of tritium. Attainment of fusion conditions is based on generation of an approximately 1 keV tritium plasma by ohmic heating and conversion to a moderately hot tritium--deuterium ion plasma by injection of a ''preheating'' deuterium neutral beam (40 to 80 keV), followed by injection of a ''reacting'' beam of high energy neutral deuterium (120 to 150 keV). Additionally, compressions accompany the beam injections. Environmental, safety and cost considerations led to the decision to limit the amount of tritium gas on-site to that required for an experiment, maintaining all other tritium in ''solidified'' form. The form of the tritium supply is as uranium tritide, while the spent tritium and other hydrogen isotopes are getter-trapped by zirconium--aluminum alloy. The issues treated include: (1) design concepts for the tritium generator and its purification, dispensing, replenishment, containment, and containment--cleanup systems; (2) features of the spent plasma trapping system, particularly the regenerable absorption cartridges, their integration into the vacuum system, and the handling of non-getterables; (3) tritium permeation through the equipment and the anticipated releases to the environment; (4) overview of the tritium related ventilation systems; and (5) design bases for the facility's tritium clean-up systems

  12. Safe Handling of Radioisotopes

    International Nuclear Information System (INIS)

    1958-01-01

    Under its Statute the International Atomic Energy Agency is empowered to provide for the application of standards of safety for protection against radiation to its own operations and to operations making use of assistance provided by it or with which it is otherwise directly associated. To this end authorities receiving such assistance are required to observe relevant health and safety measures prescribed by the Agency. As a first step, it has been considered an urgent task to provide users of radioisotopes with a manual of practice for the safe handling of these substances. Such a manual is presented here and represents the first of a series of manuals and codes to be issued by the Agency. It has been prepared after careful consideration of existing national and international codes of radiation safety, by a group of international experts and in consultation with other international bodies. At the same time it is recommended that the manual be taken into account as a basic reference document by Member States of the Agency in the preparation of national health and safety documents covering the use of radioisotopes.

  13. Radioactive wastes handling facility

    International Nuclear Information System (INIS)

    Hirose, Emiko; Inaguma, Masahiko; Ozaki, Shigeru; Matsumoto, Kaname.

    1997-01-01

    The facility comprises an area where a conveyor is installed for separating miscellaneous radioactive solid wastes such as metals, an area for operators arranged perpendicular to the conveying direction, an area for receiving the radioactive wastes and placing them on the conveyor, and an area for collecting the radioactive wastes transferred by the conveyor. Since an operator can carry out the handling while wearing working clothes attached to a partition wall, over his ordinary clothes, working conditions and the efficiency of the separation work are improved. When the conveyor area and the operator area are depressurized, crud on the surface of the wastes is not released to the outside and the working clothes are prevented from being contaminated. Since the wastes are transferred by the conveyor, the operator's range of movement is reduced; poisonous materials fall and are moved along a chute to an area for collecting the materials to be separated. Accordingly, the materials to be removed can be accumulated easily. (N.H.)

  14. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. In most existing integrative analyses, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111

  15. Trends in Modern Exception Handling

    Directory of Open Access Journals (Sweden)

    Marcin Kuta

    2003-01-01

    Full Text Available Exception handling is nowadays a necessary component of error-proof information systems. The paper presents an overview of exception handling techniques and models, the problems connected with them, and potential solutions. The implementation of propagation mechanisms and exception handling, and their effect on semantics and overall program efficiency, are also taken into account. The mechanisms presented have been adopted in modern programming languages. In the areas of design, formal methods and formal verification of program properties, exception handling mechanisms are only weakly represented, which leaves a field open for future research.
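
    A small Python illustration (not taken from the paper) of two of the mechanisms the overview discusses: propagation of an exception up the call stack, and explicit chaining so that the low-level cause remains available for diagnosis.

```python
# Exception propagation and chaining: the low-level KeyError/ValueError is
# translated into a domain-specific error, with the original cause preserved.
class ConfigError(Exception):
    """Domain-specific exception raised at a higher abstraction level."""

def read_setting(settings: dict, key: str) -> int:
    try:
        return int(settings[key])
    except (KeyError, ValueError) as exc:
        raise ConfigError(f"invalid or missing setting {key!r}") from exc

def start_service(settings: dict) -> None:
    port = read_setting(settings, "port")   # any ConfigError propagates upward
    print(f"listening on port {port}")

try:
    start_service({"port": "not-a-number"})
except ConfigError as err:
    print("startup failed:", err, "| caused by:", repr(err.__cause__))
```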

  16. Passive Containment DataSet

    Science.gov (United States)

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system.This dataset is associated with the following publication:Grayman, W., R. Murray , and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  17. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide range of phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large, open-access database made available in the PhysioNet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smartphone-based digital stethoscope in an Indian hospital, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior-art approaches when applied to the same dataset.
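
    Purely as a sketch of the kind of feature set described (the exact features, sampling rate and band limits used in the paper are not reproduced here), the snippet below computes a few time-domain, statistical and frequency-domain descriptors for a PCG segment; any standard classifier can then be trained on the resulting vectors.

```python
# Illustrative only: a handful of time-domain, statistical and
# frequency-domain descriptors of a PCG segment, in the spirit of (but not
# identical to) the feature set analysed in the paper.
import numpy as np
from scipy.signal import welch
from scipy.stats import kurtosis, skew

def pcg_features(segment, fs=2000):
    raw_std = float(segment.std())
    seg = (segment - segment.mean()) / (raw_std + 1e-12)      # amplitude-normalize
    f, pxx = welch(seg, fs=fs, nperseg=min(1024, len(seg)))
    pxx_n = pxx / (pxx.sum() + 1e-12)
    centroid = float(np.sum(f * pxx_n))                       # spectral centroid
    band_power = float(pxx_n[(f >= 25) & (f <= 400)].sum())   # heart-sound band
    return np.array([
        raw_std,                                              # overall amplitude
        float(skew(seg)), float(kurtosis(seg)),               # shape statistics
        float(np.mean(np.abs(np.diff(seg)))),                 # signal roughness
        centroid, band_power,
    ])

# X = np.vstack([pcg_features(s) for s in segments]) would then feed, e.g.,
# sklearn.svm.SVC(class_weight="balanced").fit(X, labels)
```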

  18. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves computational processes for finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing a known structure to apply it to a new dataset and predict its class. Various classification algorithms are used to classify data sets; they are based on different methods such as probability, decision trees, neural networks, nearest neighbours, boolean and fuzzy logic, and kernel-based approaches. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets have been selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error and ROC area. A comparative analysis has been carried out using the performance evaluation measures of accuracy, precision and F-measure. We also specify features and limitations of the classification algorithms for datasets of diverse nature.
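
    For readers unfamiliar with the measures listed above, the short scikit-learn sketch below computes several of them for a single classifier on a single dataset; the dataset and classifier are placeholders rather than those used in the paper.

```python
# Computing common classification evaluation measures with scikit-learn.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             precision_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

print("accuracy ", accuracy_score(y_te, pred))
print("precision", precision_score(y_te, pred))
print("F-measure", f1_score(y_te, pred))
print("kappa    ", cohen_kappa_score(y_te, pred))
print("ROC area ", roc_auc_score(y_te, proba))
```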

  19. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins, Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    Full text: The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation. The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: Monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); Annual temperature (commences 1910); Daily temperature (commences 1910, with limited station coverage pre-1957); Twice-daily dewpoint/relative humidity (commences 1957); Monthly pan evaporation (commences 1970); Cloud amount (commences 1957) (Jovanovic et al. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  20. Toward computational cumulative biology by combining models of biological datasets.

    Science.gov (United States)

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven, to enable new findings by going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the relationships found were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance between cells at different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most-linked datasets were not highly cited and turned out to have wrong publication entries in the database.
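
    As a deliberately crude analogue of the decomposition idea (the paper's probabilistic combination model is not reproduced here), the sketch below represents each earlier dataset by a single expression "archetype" vector and decomposes a new profile into non-negative contributions from them; the resulting weights rank the earlier datasets by relatedness. All data are synthetic.

```python
# Crude analogue of model-based dataset retrieval: decompose a new profile
# into non-negative contributions from representations of earlier datasets.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_genes, n_datasets = 500, 8
archetypes = rng.lognormal(size=(n_genes, n_datasets))  # one column per earlier dataset
true_w = np.array([0.7, 0.3] + [0.0] * (n_datasets - 2))
new_profile = archetypes @ true_w + rng.normal(0.0, 0.05, n_genes)

weights, _ = nnls(archetypes, new_profile)
ranking = np.argsort(weights)[::-1]
print("most related earlier datasets:", ranking[:3])
print("their weights:", np.round(weights[ranking[:3]], 3))
```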

  1. Safety measuring for sodium handling

    Energy Technology Data Exchange (ETDEWEB)

    Jeong, Ji Young; Jeong, K C; Kim, T J; Kim, B H; Choi, J H

    2001-09-01

    This is the report on safety measures for sodium handling. Its contents are prerequisites for the development of sodium technology, and the workers who participate in sodium handling and experiments therefore have to know them thoroughly. The relevant parts of the applicable laws are presented as an appendix.

  2. Waste Handling Building Conceptual Study

    International Nuclear Information System (INIS)

    G.W. Rowe

    2000-01-01

    The objective of the ''Waste Handling Building Conceptual Study'' is to develop proposed design requirements for the repository Waste Handling System in sufficient detail to allow the surface facility design to proceed to the License Application effort if the proposed requirements are approved by DOE. Proposed requirements were developed to further refine waste handling facility performance characteristics and design constraints with an emphasis on supporting modular construction, minimizing fuel inventory, and optimizing facility maintainability and dry handling operations. To meet this objective, this study attempts to provide an alternative design to the Site Recommendation design that is flexible, simple, reliable, and can be constructed in phases. The design concept will be input to the ''Modular Design/Construction and Operation Options Report'', which will address the overall program objectives and direction, including options and issues associated with transportation, the subsurface facility, and Total System Life Cycle Cost. This study (herein) is limited to the Waste Handling System and associated fuel staging system

  3. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  4. Large-scale Machine Learning in High-dimensional Datasets

    DEFF Research Database (Denmark)

    Hansen, Toke Jansen

    Over the last few decades computers have gotten to play an essential role in our daily life, and data is now being collected in various domains at a faster pace than ever before. This dissertation presents research advances in four machine learning fields that all relate to the challenges imposed...... are better at modeling local heterogeneities. In the field of machine learning for neuroimaging, we introduce learning protocols for real-time functional Magnetic Resonance Imaging (fMRI) that allow for dynamic intervention in the human decision process. Specifically, the model exploits the structure of f...

  5. NCBI Mass Sequence Downloader–Large dataset downloading made easy

    Directory of Open Access Journals (Sweden)

    F. Pina-Martins

    2016-01-01

    Source code is licensed under the GPLv3, and is supported on Linux, Windows and Mac OSX. Available at https://github.com/ElsevierSoftwareX/SOFTX-D-15-00072.git, https://github.com/StuntsPT/NCBI_Mass_Downloader

  6. Interactive Visualization of Large High-Dimensional Datasets

    Science.gov (United States)

    Ding, Wei; Chen, Ping

    Nowadays many companies and public organizations use powerful database systems for collecting and managing information. Huge amounts of data records are often accumulated within a short period of time. Valuable information is embedded in these data, which could help discover interesting knowledge and significantly assist in the decision-making process. However, human beings are not capable of understanding so many data records, which often have many attributes. The need for automated knowledge extraction is widely recognized, and has led to a rapidly developing market of data analysis and knowledge discovery tools.

  7. Likelihood Approximation With Hierarchical Matrices For Large Spatial Datasets

    KAUST Repository

    Litvinenko, Alexander

    2017-09-03

    We use available measurements to estimate the unknown parameters (variance, smoothness parameter, and covariance length) of a covariance function by maximizing the joint Gaussian log-likelihood function. To overcome cubic complexity in the linear algebra, we approximate the discretized covariance function in the hierarchical (H-) matrix format. The H-matrix format has a log-linear computational cost and storage O(kn log n), where the rank k is a small integer and n is the number of locations. The H-matrix technique allows us to work with general covariance matrices in an efficient way, since H-matrices can approximate inhomogeneous covariance functions, with a fairly general mesh that is not necessarily axes-parallel, and neither the covariance matrix itself nor its inverse have to be sparse. We demonstrate our method with Monte Carlo simulations and an application to soil moisture data. The C, C++ codes and data are freely available.
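
    For reference, the dense version of the objective being maximized can be written in a few lines; the sketch below uses an exponential covariance as a stand-in for the Matérn family and a plain Cholesky factorization, i.e. exactly the O(n^3) step that the H-matrix approximation is designed to replace. It is not the authors' code.

```python
# Dense (non-hierarchical) Gaussian log-likelihood for a spatial dataset.
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.spatial.distance import cdist

def neg_log_likelihood(params, locs, z):
    variance, length = np.exp(params)          # enforce positive parameters
    C = variance * np.exp(-cdist(locs, locs) / length)
    C[np.diag_indices_from(C)] += 1e-8         # small nugget for stability
    factor = cho_factor(C, lower=True)
    logdet = 2.0 * np.sum(np.log(np.diag(factor[0])))
    quad = z @ cho_solve(factor, z)
    n = len(z)
    return 0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Maximum likelihood estimation would then call, e.g.,
# scipy.optimize.minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(locs, z),
#                         method="Nelder-Mead")
```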

  8. Distributed Large Dataset Deployment with Improved Load Balancing and Performance

    OpenAIRE

    Siddharth Bhandari

    2016-01-01

    Cloud computing is a paradigm for enabling ubiquitous, convenient, on-demand network access. The cloud is a method of computing in which massively scalable IT-enabled capabilities are delivered 'as a service' using Internet technologies to multiple external clients. Virtualization is the creation of a virtual form of something such as a computing device or server, an operating system, or network and storage devices. The different names for cloud data management are DaaS, Data as a ...

  9. Likelihood Approximation With Hierarchical Matrices For Large Spatial Datasets

    KAUST Repository

    Litvinenko, Alexander; Sun, Ying; Genton, Marc G.; Keyes, David E.

    2017-01-01

    algebra, we approximate the discretized covariance function in the hierarchical (H-) matrix format. The H-matrix format has a log-linear computational cost and storage O(kn log n), where the rank k is a small integer and n is the number of locations. The H

  10. The Importance of Normalization on Large and Heterogeneous Microarray Datasets

    Science.gov (United States)

    DNA microarray technology is a powerful functional genomics tool increasingly used for investigating global gene expression in environmental studies. Microarrays can also be used in identifying biological networks, as they give insight on the complex gene-to-gene interactions, ne...

  11. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  12. Development of a SPARK Training Dataset

    International Nuclear Information System (INIS)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-01-01

    In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer

  13. Satellite-Based Precipitation Datasets

    Science.gov (United States)

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.

  14. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial area. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
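
    A hypothetical, much-simplified illustration of such heuristics is sketched below; the field names, weights and version bonus are invented for the example and are not the Common Metadata Repository's actual relevancy algorithm.

```python
# Hypothetical relevancy heuristics: measurement term match, temporal overlap
# with the query, and a small bonus for newer dataset versions.
from datetime import date

def temporal_overlap(q_start, q_end, d_start, d_end):
    query_days = max((q_end - q_start).days, 1)
    overlap_days = (min(q_end, d_end) - max(q_start, d_start)).days
    return max(overlap_days, 0) / query_days

def relevance(query, dataset):
    term_hits = len(set(query["terms"]) & set(dataset["measurements"]))
    score = 3.0 * term_hits
    score += 2.0 * temporal_overlap(query["start"], query["end"],
                                    dataset["start"], dataset["end"])
    score += 0.1 * dataset.get("version", 1)   # prefer later, better versions
    return score

query = {"terms": {"sea surface temperature"},
         "start": date(2010, 1, 1), "end": date(2015, 1, 1)}
candidates = [
    {"id": "A", "measurements": {"sea surface temperature"}, "version": 7,
     "start": date(2002, 1, 1), "end": date(2020, 1, 1)},
    {"id": "B", "measurements": {"chlorophyll"}, "version": 2,
     "start": date(2011, 1, 1), "end": date(2012, 1, 1)},
]
ranked = sorted(candidates, key=lambda d: relevance(query, d), reverse=True)
print([d["id"] for d in ranked])
```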

  15. Production management of window handles

    Directory of Open Access Journals (Sweden)

    Manuela Ingaldi

    2014-12-01

    Full Text Available In the chapter, a company involved in the production of aluminum window and door handles is presented. The main customers of the company are primarily companies which produce PVC joinery and the wholesalers supplying these companies. One chosen product of the research company, a single-arm pin-lift window handle, is described and its production process is presented from the technological point of view. The chapter also includes a SWOT analysis conducted in the research company and the value stream of the single-arm pin-lift window handle.

  16. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20% error. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher
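
    The encoding that the paper builds on can be sketched briefly: each field value is mapped to a Bloom filter of its character bigrams, and filters are compared with a Dice coefficient. In the toy Python below the filter length, the number of hash functions and the use of unkeyed SHA-1 are illustrative choices, not the study's configuration.

```python
# Toy Bloom-filter encoding of string fields plus Dice-coefficient comparison.
import hashlib

FILTER_BITS = 1000
NUM_HASHES = 15

def bigrams(value):
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def bloom_encode(value):
    bits = set()
    for gram in bigrams(value):
        for k in range(NUM_HASHES):
            digest = hashlib.sha1(f"{k}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % FILTER_BITS)
    return bits

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

# Similar names keep a high similarity after encoding; an EM step, as in the
# paper, would then estimate match probabilities from such scores.
print(dice(bloom_encode("katherine"), bloom_encode("catherine")))
print(dice(bloom_encode("katherine"), bloom_encode("robert")))
```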

  17. Safe handling of radiation sources

    International Nuclear Information System (INIS)

    Abd Nasir Ibrahim; Azali Muhammad; Ab Razak Hamzah; Abd Aziz Mohamed; Mohammad Pauzi Ismail

    2004-01-01

    This chapter discusses subjects related to the safe handling of radiation sources: types of radiation sources; methods of use (transport within premises, transport outside premises); and the disposal of gamma sources.

  18. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). A key consideration for the system is high-speed access due to the large data volume, persistence of the large data volumes and a precise process time scheduling capability.

  19. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  20. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Full Text Available Everyday refinement and development of technology has led to an increase in competition between tech companies and in attempts to crack and break down systems. This makes data mining a strategically and security-wise important area for many business organizations, including the banking sector. It allows the analysis of important information in the data warehouse and assists banks in looking for obscure patterns in a group and discovering unknown relationships in the data. Banking systems need to process ample amounts of data on a daily basis related to customer information, credit card details, limits and collateral details, transaction details, risk profiles, anti-money-laundering information and trade finance data. Thousands of decisions based on these data are taken in a bank daily. This paper analyzes a banking dataset in the Weka environment for the detection of interesting patterns, with applications in customer acquisition, customer retention, management and marketing, and the management of risk and fraud detection.

  1. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  2. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander; Mularoni, Loris; Cope, Leslie M.; Medvedeva, Yulia; Mironov, Andrey A.; Makeev, Vsevolod J.; Wheelan, Sarah J.

    2012-01-01

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.
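
    GenometriCorr itself is an R package; the Python fragment below is only a rough analogue of one of the underlying ideas, comparing the observed overlap between two interval sets with overlaps obtained after random circular shifts of one set (interval wrap-around at the chromosome ends is ignored for brevity).

```python
# Rough analogue of a spatial-association test between two interval sets:
# permutation by random circular shift versus the observed total overlap.
import numpy as np

rng = np.random.default_rng(0)

def total_overlap(a, b):
    """Sum of pairwise intersection lengths between interval arrays of shape (n, 2)."""
    total = 0
    for start, end in a:
        lo = np.maximum(start, b[:, 0])
        hi = np.minimum(end, b[:, 1])
        total += np.clip(hi - lo, 0, None).sum()
    return total

def shift_pvalue(a, b, genome_length, n_perm=200):
    observed = total_overlap(a, b)
    lengths = b[:, 1] - b[:, 0]
    hits = 0
    for _ in range(n_perm):
        starts = (b[:, 0] + rng.integers(0, genome_length)) % genome_length
        if total_overlap(a, np.column_stack([starts, starts + lengths])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```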

  3. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  4. The technique on handling radiation

    International Nuclear Information System (INIS)

    1997-11-01

    This book describes the measurement and handling of radiation. The first part deals with the measurement of radiation; its contents cover the characteristics of radiation measurement techniques, radiation detectors, measurement of energy spectra, measurement of radioactivity, measurement of radiation levels, and counting statistics. The second part explains the handling of radiation, covering the treatment of sealed radioisotopes, the treatment of unsealed sources, and radiation shielding.

  5. Civilsamfundets ABC: H for Handling

    DEFF Research Database (Denmark)

    Lund, Anker Brink; Meyer, Gitte

    2015-01-01

    What is civil society? Anker Brink Lund and Gitte Meyer from the CBS Center for Civil Society Studies go through civil society letter by letter. We have now reached H for Handling (action).

  6. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large-scale hydrological models, not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large-scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large-scale datasets is challenging, for it requires collection, harmonization and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan-European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large-scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains radiation, calculated using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as
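
    For orientation, one widely used formulation of the Penman-Monteith equation is the FAO-56 reference evapotranspiration; the function below implements that textbook form, with no claim that EFAS-Meteo uses exactly this parameterisation.

```python
# FAO-56 Penman-Monteith reference evapotranspiration (mm/day).
def penman_monteith_fao56(rn, g, t_mean, u2, es, ea, delta, gamma):
    """rn, g: net radiation and soil heat flux [MJ m-2 day-1];
    t_mean: mean air temperature [deg C]; u2: wind speed at 2 m [m s-1];
    es, ea: saturation and actual vapour pressure [kPa];
    delta: slope of the saturation vapour pressure curve [kPa per deg C];
    gamma: psychrometric constant [kPa per deg C]."""
    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea))
    return numerator / (delta + gamma * (1.0 + 0.34 * u2))
```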

  7. Development of liquid handling techniques in microgravity

    Science.gov (United States)

    Antar, Basil N.

    1995-01-01

    A large number of experiments dealing with protein crystal growth and also with growth of crystals from solution require complicated fluid handling procedures, including filling of empty containers with liquids, mixing of solutions, and stirring of liquids. Such procedures are accomplished in a straightforward manner when performed under terrestrial conditions in the laboratory. However, in the low-gravity environment of space, such as on board the Space Shuttle or an Earth-orbiting space station, these procedures have sometimes produced entirely undesirable results. Under terrestrial conditions, liquids usually separate completely from the gas due to the buoyancy effects of Earth's gravity. Consequently, any gas pockets that are entrained into the liquid during a fluid handling procedure will eventually migrate towards the top of the vessel, where they can be removed. In a low-gravity environment, any entrained gas bubble will remain within the liquid bulk indefinitely at a location that is not known a priori, resulting in a mixture of liquid and vapor.

  8. Bionic design methodology for wear reduction of bulk solids handling equipment

    NARCIS (Netherlands)

    Chen, G.; Schott, D.L.; Lodewijks, G.

    2016-01-01

    Large-scale handling of particulate solids can cause severe wear on bulk solids handling equipment surfaces. Wear reduces equipment life span and increases maintenance cost. Examples of traditional methods to reduce wear of bulk solids handling equipment include optimizing transport operations

  9. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Science.gov (United States)

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

    This data paper describes a dataset of specimens of the threatened Catalonian vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that harbours a large diversity of autochthonous plants, including 199 taxa with IUCN threat categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from the 17th century to the present day. For each specimen, the species name, locality, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which demonstrates the role of botanical collections as an essential source of occurrence data.

  10. A review of continent scale hydrological datasets available for Africa

    OpenAIRE

    Bonsor, H.C.

    2010-01-01

    As rainfall becomes less reliable with predicted climate change the ability to assess the spatial and seasonal variations in groundwater availability on a large-scale (catchment and continent) is becoming increasingly important (Bates, et al. 2007; MacDonald et al. 2009). The scarcity of observed hydrological data, or difficulty in obtaining such data, within Africa means remotely sensed (RS) datasets must often be used to drive large-scale hydrological models. The different ap...

  11. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    Science.gov (United States)

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types, but due to the explosion of uncharacterized protein sequences in databases, such methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are used, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well, with a good accuracy of 96.35% in less time. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in classes with very few samples, while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
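
    As a generic illustration of the kind of comparison reported (the data below are synthetic and the settings are placeholders, not those of the study), the sketch contrasts a single decision tree with a class-weighted random forest on an imbalanced problem.

```python
# Single decision tree vs. class-weighted random forest on imbalanced data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=30,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = [("decision tree", DecisionTreeClassifier(random_state=0)),
          ("random forest", RandomForestClassifier(n_estimators=200,
                                                   class_weight="balanced",
                                                   random_state=0))]
for name, clf in models:
    clf.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_te, clf.predict(X_te), digits=3))
```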

  12. Blanket handling concepts for future fusion power plants

    International Nuclear Information System (INIS)

    Bogusch, E.; Gottfried, R.; Maisonnier, D.

    2003-01-01

    In the frame of the power plant conceptual studies (PPCS) launched by the European Commission, two main blanket handling concepts have been investigated with respect to engineering feasibility and their impact on plant availability and cost: the large module handling concept (LMHC) and the large sector handling concept (LSHC). The LMHC has been considered the reference handling concept, while the LSHC has been considered an attractive alternative due to its potential for shorter replacement times and hence increased plant availability. Although no fundamental feasibility issue has been identified, a number of engineering issues have been highlighted for the LSHC that would require considerable effort to resolve. Since its availability of about 77%, based on a replacement time of about 4.2 months for all the internals, is slightly lower than that of the LMHC, the LMHC remains the reference blanket replacement concept for a conceptual reactor.

  13. New fuzzy support vector machine for the class imbalance problem in medical datasets classification.

    Science.gov (United States)

    Gu, Xiaoqing; Ni, Tongguang; Wang, Hongyuan

    2014-01-01

    In medical datasets classification, support vector machine (SVM) is considered to be one of the most successful methods. However, most real-world medical datasets contain some outliers/noise, and the data often have class imbalance problems. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM obtained by extending manifold regularization and assigning two misclassification costs for the two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and Pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show the superior or comparable effectiveness of FSVM-CIP.
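
    FSVM-CIP itself (fuzzy memberships plus manifold regularization) is not available off the shelf; the closest readily available ingredient is assigning different misclassification costs to the two classes of an SVM. The sketch below illustrates only that cost-sensitive step with scikit-learn on the breast cancer data and should not be read as an implementation of FSVM-CIP.

        # Minimal sketch of the cost-sensitive ingredient of FSVM-CIP: assigning
        # different misclassification costs to the two classes of an imbalanced
        # medical dataset. Fuzzy memberships and manifold regularization are
        # not reproduced here.
        from sklearn.datasets import load_breast_cancer
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        X, y = load_breast_cancer(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # class_weight="balanced" sets the per-class cost inversely proportional
        # to class frequency; an explicit dict such as {0: 5.0, 1: 1.0} also works.
        model = make_pipeline(StandardScaler(),
                              SVC(kernel="rbf", class_weight="balanced"))
        model.fit(X_tr, y_tr)
        print(classification_report(y_te, model.predict(X_te)))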

  14. New Fuzzy Support Vector Machine for the Class Imbalance Problem in Medical Datasets Classification

    Directory of Open Access Journals (Sweden)

    Xiaoqing Gu

    2014-01-01

    Full Text Available In medical datasets classification, support vector machine (SVM) is considered to be one of the most successful methods. However, most real-world medical datasets contain some outliers/noise, and the data often have class imbalance problems. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM obtained by extending manifold regularization and assigning two misclassification costs for the two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and Pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show the superior or comparable effectiveness of FSVM-CIP.

  15. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set of equilibrium datasets is path-connected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non-collinear.

  16. Asthma, guides for diagnostic and handling

    International Nuclear Information System (INIS)

    Salgado, Carlos E; Caballero A, Andres S; Garcia G, Elizabeth

    1999-01-01

    The paper defines asthma and covers topics such as diagnosis, management of asthma, and special situations such as asthma and pregnancy, perioperative management of the asthmatic patient, and occupational asthma

  17. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  18. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  19. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  20. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  1. SRV-automatic handling device

    International Nuclear Information System (INIS)

    Yamada, Koji

    1987-01-01

    An automatic handling device for the steam relief valves (SRVs) is developed in order to achieve a decrease in worker exposure, an increase in the availability factor, improvements in reliability and operational safety, and labor saving. A survey is made during a periodical inspection to examine the actual SRV handling operation. The SRV automatic handling device consists of four components: a conveyor, an armed conveyor, a lifting machine, and a control/monitoring system. The conveyor is designed so that the existing I-rail installed in the containment vessel can be used without any modification; it is employed for conveying an SRV along the rail. The armed conveyor, designed for a box rail, is used for an SRV installed away from the rail. Using the lifting machine, an SRV installed away from the I-rail is brought to a spot just below the rail so that it can be transferred by the conveyor. The control/monitoring system consists of a control computer, an operation panel, a TV monitor and an annunciator. The SRV handling device is operated by remote control from a control room. Trial equipment is constructed and performance and function testing is carried out using actual SRVs. As a result, it is shown that the SRV handling device requires only two operators to serve satisfactorily. The time required for removal and replacement of one SRV is about 10 minutes. (Nogami, K.)

  2. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    Full Text Available A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however, they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to (i) cover data from a broad range of application domains, (ii) be flexible and extensible to combine many different complex data structures, (iii) include metadata and semantic definitions, (iv) include inferred information, (v) identify the original data source for integrated entities and (vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.

  3. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  4. Process mining in oncology using the MIMIC-III dataset

    Science.gov (United States)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology, with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option, and this paper describes the potential of MIMIC-III for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining, and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
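
    MIMIC-III access requires credentialed registration, so the sketch below uses a tiny hand-made event table rather than the real tables; the column names (case_id, activity, timestamp) are illustrative. It shows only the first step of process discovery, counting directly-follows transitions per patient, and is not the full L* lifecycle used in the paper.

        # Sketch of the first step of process discovery on an event table:
        # counting "directly-follows" transitions between activities per case.
        # The real MIMIC-III tables would need to be mapped onto this shape first.
        from collections import Counter
        import pandas as pd

        events = pd.DataFrame({
            "case_id":   [1, 1, 1, 2, 2, 2],
            "activity":  ["admission", "chemotherapy", "discharge",
                          "admission", "radiotherapy", "discharge"],
            "timestamp": pd.to_datetime(["2012-01-01", "2012-01-03", "2012-01-10",
                                         "2012-02-01", "2012-02-05", "2012-02-12"]),
        })

        dfg = Counter()
        for _, trace in events.sort_values("timestamp").groupby("case_id"):
            acts = trace["activity"].tolist()
            dfg.update(zip(acts, acts[1:]))  # consecutive activity pairs in one case

        for (src, dst), count in dfg.items():
            print(f"{src} -> {dst}: {count}")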

  5. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global, delayed-mode, validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, as are TESAC profiles from GTSPP. In the case of CORA, data coming from the EuroGOOS Regional Operational Observing Systems (ROOS) operated by European institutes not managed by National Data Centres, and other profile datasets provided by scientific sources, can also be found (sea mammal profiles from MEOP, XBT datasets from cruises, ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until the year before the current year. The product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Although the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016 a new study started that aims to compare both validation procedures and move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  6. Handling of waste in ports

    International Nuclear Information System (INIS)

    Olson, P.H.

    1994-01-01

    The regulations governing the handling of port-generated waste are often national and/or local legislation, whereas the handling of ship-generated waste is governed by the MARPOL Convention in most parts of the world. The handling of waste consists of two main phases: collection and treatment. Waste has to be collected in every port and on board every ship, whereas only some wastes are treated, and only to a certain degree, in ports and on board ships. This paper considers the different kinds of waste generated both in ports and on board ships, where and how they are generated, and how they could be collected and treated. The two sources are treated together to show how some ship-generated waste may be treated in port installations primarily constructed for the treatment of port-generated waste, making integrated use of the available treatment facilities. (author)

  7. SIMADL: Simulated Activities of Daily Living Dataset

    Directory of Open Access Journals (Sweden)

    Talal Alshammari

    2018-04-01

    Full Text Available With the realisation of the Internet of Things (IoT) paradigm, the analysis of Activities of Daily Living (ADLs) in a smart home environment is becoming an active research domain. The existence of representative datasets is a key requirement to advance research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets, for classification and anomaly detection. The datasets are generated using OpenSHS (Open Smart Home Simulator), a simulation software package for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days' worth of activities. Forty-two files of classified ADLs make up the classification dataset, and the other forty-two files, into which anomalous patterns were simulated and injected, make up the anomaly detection dataset.
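
    As a hedged illustration of how such a simulated dataset might be consumed, the sketch below loads one OpenSHS-style CSV and trains a classifier to predict the activity label from the binary sensor columns. The file name and column names are assumptions for illustration, not the documented OpenSHS schema.

        # Sketch of supervised ADL classification on a simulated smart home file.
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import accuracy_score
        from sklearn.model_selection import train_test_split

        # Hypothetical OpenSHS output file; the label column is assumed to be "Activity".
        df = pd.read_csv("d1_1m_0tm.csv")
        X = df.drop(columns=["Activity", "timestamp"], errors="ignore")
        y = df["Activity"]

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))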

  8. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Science.gov (United States)

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  9. Synthetic and Empirical Capsicum Annuum Image Dataset

    NARCIS (Netherlands)

    Barth, R.

    2016-01-01

    This dataset consists of per-pixel annotated synthetic (10,500) and empirical (50) images of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models used to generate the synthetic images are included. The aim of the datasets is to

  10. Knowledge Mining from Clinical Datasets Using Rough Sets and Backpropagation Neural Network

    Directory of Open Access Journals (Sweden)

    Kindie Biredagn Nahato

    2015-01-01

    Full Text Available The availability of clinical datasets and knowledge mining methodologies encourages researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from a minimal set of attributes extracted from the clinical dataset. In this work the rough set indiscernibility relation method with a backpropagation neural network (RS-BPNN) is used. The work has two stages. The first stage is the handling of missing values to obtain a smooth dataset and the selection of appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage is classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with the hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
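
    The sketch below mirrors the two-stage idea (attribute reduction followed by a backpropagation network) with off-the-shelf scikit-learn components: a univariate feature selector stands in for the rough-set reduct and MLPClassifier for the backpropagation network. It is an approximation for illustration, not the RS-BPNN method itself.

        # Two-stage approximation: impute missing values and reduce attributes,
        # then classify with a backpropagation (MLP) network.
        from sklearn.datasets import load_breast_cancer
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.impute import SimpleImputer
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        X, y = load_breast_cancer(return_X_y=True)

        model = make_pipeline(
            SimpleImputer(strategy="most_frequent"),   # stage 1a: handle missing values
            SelectKBest(f_classif, k=10),              # stage 1b: keep a reduced attribute set
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
        )
        print(cross_val_score(model, X, y, cv=5).mean())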

  11. An assessment of differences in gridded precipitation datasets in complex terrain

    Science.gov (United States)

    Henn, Brian; Newman, Andrew J.; Livneh, Ben; Daly, Christopher; Lundquist, Jessica D.

    2018-01-01

    Hydrologic modeling and other geophysical applications are sensitive to precipitation forcing data quality, and there are known challenges in spatially distributing gauge-based precipitation over complex terrain. We conduct a comparison of six high-resolution, daily and monthly gridded precipitation datasets over the Western United States. We compare the long-term average spatial patterns, and interannual variability of water-year total precipitation, as well as multi-year trends in precipitation across the datasets. We find that the greatest absolute differences among datasets occur in high-elevation areas and in the maritime mountain ranges of the Western United States, while the greatest percent differences among datasets relative to annual total precipitation occur in arid and rain-shadowed areas. Differences between datasets in some high-elevation areas exceed 200 mm yr⁻¹ on average, and relative differences range from 5 to 60% across the Western United States. In areas of high topographic relief, true uncertainties and biases are likely higher than the differences among the datasets; we present evidence of this based on streamflow observations. Precipitation trends in the datasets differ in magnitude and sign at smaller scales, and are sensitive to how temporal inhomogeneities in the underlying precipitation gauge data are handled.
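
    A comparison of this kind can be sketched with xarray by computing annual totals per grid cell for two products and differencing them. The file names and the variable name "prcp" below are placeholders, and real products would generally need regridding to a common grid before differencing.

        # Sketch of comparing annual precipitation totals from two gridded products.
        import xarray as xr

        ds_a = xr.open_dataset("product_a_daily_precip.nc")   # placeholder file names
        ds_b = xr.open_dataset("product_b_daily_precip.nc")

        # Annual totals per grid cell (calendar years for simplicity; a true
        # water year would shift the grouping by three months).
        tot_a = ds_a["prcp"].groupby("time.year").sum("time")
        tot_b = ds_b["prcp"].groupby("time.year").sum("time")

        # Long-term mean absolute and relative differences between the products.
        diff = (tot_a - tot_b).mean("year")
        rel_diff = diff / tot_b.mean("year") * 100.0
        print(diff.max().item(), "mm/yr maximum absolute difference")
        print(rel_diff.max().item(), "% maximum relative difference")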

  12. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    As more and more advertisements are broadcast on radio, it is necessary to establish an audio advertising dataset that can be used to analyse and classify advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav format and annotated with a txt file containing its file name, sampling frequency, channel number, broadcasting time and class. The soundness of the class structure of the advertisements in this dataset is demonstrated by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.
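
    As a hedged sketch of the clustering check described above, the snippet below extracts a few simple spectral features from each *.wav file, projects them with PCA and clusters them with k-means. The feature set is deliberately minimal and the directory name is a placeholder.

        # Sketch: feature extraction, PCA projection and k-means clustering of ads.
        import glob
        import numpy as np
        from scipy.io import wavfile
        from sklearn.cluster import KMeans
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        def simple_features(path):
            rate, data = wavfile.read(path)
            if data.ndim > 1:                      # mix stereo down to mono
                data = data.mean(axis=1)
            spectrum = np.abs(np.fft.rfft(data))
            freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)
            centroid = (freqs * spectrum).sum() / spectrum.sum()
            return [data.std(), centroid, len(data) / rate]   # energy, brightness, duration

        files = sorted(glob.glob("ads/*.wav"))     # placeholder directory
        X = StandardScaler().fit_transform([simple_features(f) for f in files])
        X2 = PCA(n_components=2).fit_transform(X)
        labels = KMeans(n_clusters=4, random_state=0, n_init=10).fit_predict(X2)
        print(dict(zip(files, labels)))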

  13. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  14. Processing large remote sensing image data sets on Beowulf clusters

    Science.gov (United States)

    Steinwand, Daniel R.; Maddox, Brian; Beckmann, Tim; Schmidt, Gail

    2003-01-01

    High-performance computing is often concerned with the speed at which floating-point calculations can be performed. The architectures of many parallel computers and/or their network topologies are based on these investigations. Often, benchmarks resulting from these investigations are compiled with little regard to how a large dataset would move about in these systems. This part of the Beowulf study addresses that concern by looking at specific applications software and system-level modifications. Applications include an implementation of a smoothing filter for time-series data, a parallel implementation of the decision tree algorithm used in the Landcover Characterization project, a parallel Kriging algorithm used to fit point data collected in the field on invasive species to a regular grid, and modifications to the Beowulf project's resampling algorithm to handle larger, higher resolution datasets at a national scale. Systems-level investigations include a feasibility study on Flat Neighborhood Networks and modifications of that concept with Parallel File Systems.
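
    The simplest of the applications listed, smoothing a time series at every pixel of a raster stack, can be sketched on a single multi-core node by dividing the grid rows among worker processes; the chunking and filter width below are arbitrary illustrative choices rather than the Beowulf implementation.

        # Sketch: per-pixel time-series smoothing with rows split among workers.
        import numpy as np
        from multiprocessing import Pool

        def smooth_rows(block):
            # Moving average of length 5 along the time axis for one block of rows.
            kernel = np.ones(5) / 5.0
            return np.apply_along_axis(lambda t: np.convolve(t, kernel, mode="same"),
                                       axis=0, arr=block)

        if __name__ == "__main__":
            data = np.random.rand(120, 400, 400)          # (time, rows, cols)
            blocks = np.array_split(data, 8, axis=1)      # split rows among workers
            with Pool(processes=8) as pool:
                smoothed = np.concatenate(pool.map(smooth_rows, blocks), axis=1)
            print(smoothed.shape)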

  15. Software for handling MFME1

    International Nuclear Information System (INIS)

    Van der Merwe, W.G.

    1984-01-01

    The report deals with SEMFIP, a computer code for determining magnetic field measurements. The program is written in FORTRAN and ASSEMBLER. The preparations for establishing SEMFIP, the actual measurements, data handling and the problems that were experienced are discussed. Details on the computer code are supplied in an appendix

  16. Welding method by remote handling

    International Nuclear Information System (INIS)

    Hashinokuchi, Minoru.

    1994-01-01

    Water is charged into a pit (or a water reservoir) and an article to be welded is placed on a support in the pit by remote handling. A steel plate is disposed so as to cover the article to be welded by remote handling. The welding device is positioned to the portion to be welded and fixed in a state where the article to be welded is shielded from radiation by water and the steel plate. Water in the pit is drained till the portion to be welded is exposed to the atmosphere. Then, welding is conducted. After completion of the welding, water is charged again to the pit and the welding device and fixing jigs are decomposed in a state where the article to be welded is shielded again from radiation by water and the steel plate. Subsequently, the steel plate is removed by remote handling. Then, the article to be welded is returned from the pit to a temporary placing pool by remote handling. This can reduce operator's exposure. Further, since the amount of the shielding materials can be minimized, the amount of radioactive wastes can be decreased. (I.N.)

  17. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However this approach does not allow an assessment of the force multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and re-sampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.
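
    A simplified version of such a power comparison can be run with scipy: draw many small samples from two populations whose means differ, and count how often the Mann-Whitney U test and a permutation (re-sampling) test reject at the 5% level. The sample size, shift and trial counts below are arbitrary illustrative choices, and the Wald-Wolfowitz runs test is omitted.

        # Monte Carlo power comparison for small samples: Mann-Whitney U test
        # versus a permutation test on the difference in means.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n, shift, trials, alpha = 8, 1.0, 1000, 0.05
        n_perm = 500
        reject_mw = reject_perm = 0

        for _ in range(trials):
            a = rng.normal(0.0, 1.0, n)
            b = rng.normal(shift, 1.0, n)

            if stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
                reject_mw += 1

            observed = abs(a.mean() - b.mean())
            pooled = np.concatenate([a, b])
            extreme = 0
            for _ in range(n_perm):
                rng.shuffle(pooled)
                extreme += abs(pooled[:n].mean() - pooled[n:].mean()) >= observed
            if extreme / n_perm < alpha:
                reject_perm += 1

        print("Mann-Whitney power:", reject_mw / trials)
        print("Permutation test power:", reject_perm / trials)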

  18. Data-Driven Decision Support for Radiologists: Re-using the National Lung Screening Trial Dataset for Pulmonary Nodule Management

    OpenAIRE

    Morrison, James J.; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L.

    2014-01-01

    Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was conve...

  19. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction Since July, the activities have been focused on very diverse subjects: operations activities for the 2012 data-taking, Monte Carlo production and data re-processing plans for 2013 conferences (winter and summer), preparation for the Upgrades TDRs and readiness after LS1. The regular operations activities have included: changing to the 53X release at the Tier-0, regular calibrations updates, and data certification to guarantee certified data for analysis with the shortest delay from data taking. The samples, simulated at 8 TeV, have been re-reconstructed using 53X. A lot of effort has been put in their prioritisation to ensure that the samples needed for HCP and future conferences are produced on time. Given the large amount of data that have been collected in 2012 and the available computing resources, a careful planning is needed. The PPD and Physics groups worked on a master schedule for the Monte Carlo production, new conditions validation and data reprocessing. The ...

  20. Radioactivity, shielding, radiation damage, and remote handling

    International Nuclear Information System (INIS)

    Wilson, M.T.

    1975-01-01

    Proton beams of a few hundred million electron volts of energy are capable of inducing hundreds of curies of activity per microampere of beam intensity into the materials they intercept. This adds a new dimension to the parameters that must be considered when designing and operating a high-intensity accelerator facility. Large investments must be made in shielding. The shielding itself may become activated and require special considerations as to its composition, location, and method of handling. Equipment must be designed to withstand large radiation dosages. Items such as vacuum seals, water tubing, and electrical insulation must be fabricated from radiation-resistant materials. Methods of maintaining and replacing equipment are required that limit the radiation dosages to workers. The high-intensity facilities of LAMPF, SIN, and TRIUMF and the high-energy facility of FERMILAB have each evolved a philosophy of radiation handling that matches their particular machine and physical plant layouts. Special tooling, commercial manipulator systems, remote viewing, and other techniques of the hot cell and fission reactor realms are finding application within accelerator facilities. (U.S.)

  1. The Kinetics Human Action Video Dataset

    OpenAIRE

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  2. Experience in handling concentrated tritium

    International Nuclear Information System (INIS)

    Holtslander, W.J.

    1985-12-01

    The notes describe the experience in handling concentrated tritium in the hydrogen form accumulated in the Chalk River Nuclear Laboratories Tritium Laboratory. The techniques of box operation, pumping systems, hydriding and dehydriding operations, and analysis of tritium are discussed. Information on the Chalk River Tritium Extraction Plant is included as a collection of reprints of papers presented at the Dayton Meeting on Tritium Technology, 1985 April 30 - May 2

  3. International handling of fissionable material

    International Nuclear Information System (INIS)

    1975-01-01

    The opinion of the ministry for foreign affairs on international handling of fissionable materials is given. As an introduction a survey is given of the possibilities to produce nuclear weapons from materials used in or produced by power reactors. Principles for international control of fissionable materials are given. International agreements against proliferation of nuclear weapons are surveyed and methods to improve them are proposed. (K.K.)

  4. Confinement facilities for handling plutonium

    International Nuclear Information System (INIS)

    Maraman, W.J.; McNeese, W.D.; Stafford, R.G.

    1975-01-01

    Plutonium handling on a multigram scale began in 1944. Early criteria, equipment, and techniques for confining contamination have been superseded by more stringent criteria and vastly improved equipment and techniques for in-process contamination control, effluent air cleaning and treatment of liquid wastes. This paper describes the evolution of equipment and practices to minimize exposure of workers and escape of contamination into work areas and into the environment. Early and current contamination controls are compared. (author)

  5. Remote handling equipment for SNS

    International Nuclear Information System (INIS)

    Poulten, B.H.

    1983-01-01

    This report gives information on the areas of the SNS facility which become highly radioactive, preventing hands-on maintenance. Levels of activity are sufficiently high in the Target Station Area of the SNS, especially under fault conditions, to warrant the use of reactor technology in the design of the water, drainage and ventilation systems. These problems, together with the type of remote handling equipment required in the SNS, are discussed.

  6. Remote handling in reprocessing plants

    International Nuclear Information System (INIS)

    Streiff, G.

    1984-01-01

    Remote control will be the rule for maintenance in the hot cells of future spent fuel reprocessing plants because of the radioactivity level. New handling equipment will be developed and intervention principles defined. Existing equipment, recommendations for its use and new manipulators are described in the PMDS documentation, which also serves as an aid in the choice and use of intervention means and as a guide for the user [fr

  7. Equipment for the handling of thorium materials

    International Nuclear Information System (INIS)

    Heisler, S.W. Jr.; Mihalovich, G.S.

    1988-01-01

    The Feed Materials Production Center (FMPC) is the United States Department of Energy's storage facility for thorium. FMPC thorium handling and overpacking projects ensure the continued safe handling and storage of the thorium inventory until final disposition of the materials is determined and implemented. The handling and overpacking of the thorium materials requires the design of a system that utilizes remote handling and overpacking equipment not currently utilized at the FMPC in the handling of uranium materials. The use of remote equipment significantly reduces radiation exposure to personnel during the handling and overpacking efforts. The designed system combines existing technologies from the nuclear industry, the materials processing and handling industry and the mining industry. It consists of a modified fork lift truck for the transport of thorium containers, automated equipment for material identification and inventory control, and remote handling and overpacking equipment for repackaging of the thorium materials

  8. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  9. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  10. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  11. Harvard Aging Brain Study : Dataset and accessibility

    NARCIS (Netherlands)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  12. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  13. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  14. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  15. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  16. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  17. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  18. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  19. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  20. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  1. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  2. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  3. Enteral Feeding Set Handling Techniques.

    Science.gov (United States)

    Lyman, Beth; Williams, Maria; Sollazzo, Janet; Hayden, Ashley; Hensley, Pam; Dai, Hongying; Roberts, Cristine

    2017-04-01

    Enteral nutrition therapy is common practice in pediatric clinical settings. Often patients will receive a pump-assisted bolus feeding over 30 minutes several times per day using the same enteral feeding set (EFS). This study aims to determine the safest and most efficacious way to handle the EFS between feedings. Three EFS handling techniques were compared through simulation for bacterial growth, nursing time, and supply costs: (1) rinsing the EFS with sterile water after each feeding, (2) refrigerating the EFS between feedings, and (3) using a ready-to-hang (RTH) product maintained at room temperature. Cultures were obtained at baseline, hour 12, and hour 21 of the 24-hour cycle. A time-in-motion analysis was conducted and reported in average number of seconds to complete each procedure. Supply costs were inventoried for 1 month, comparing the actual usage to our estimated usage. Of 1,080 cultures obtained, the overall bacterial growth rate was 8.7%. The rinse and refrigeration techniques displayed similar bacterial growth (11.4% vs 10.3%, P = .63). The RTH technique displayed the least bacterial growth of any method (4.4%, P = .002). The time analysis in minutes showed the rinse method was the most time-consuming (44.8 ± 2.7) versus refrigeration (35.8 ± 2.6) and RTH (31.08 ± 0.6). On these results the RTH product appears to be the most efficacious approach, and refrigerating the EFS between uses is the next most efficacious method for handling the EFS between bolus feeds.

  4. Comparison of recent SnIa datasets

    International Nuclear Information System (INIS)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w_0 + w_1(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)), and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa in these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3 compared to the highest-FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets changes, however, when we consider consistency with an expansion history corresponding to evolving dark energy with (w_0, w_1) = (−1.4, 2), crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared, and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample
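
    For reference, the CPL parametrization quoted above enters the expansion history through the standard dark-energy density scaling ρ_DE(a)/ρ_DE,0 = a^(−3(1 + w_0 + w_1)) exp(−3 w_1 (1 − a)). The short sketch below evaluates the corresponding dimensionless Hubble rate for a flat universe; the matter fraction is an arbitrary illustrative choice, not a value taken from the paper.

        # Dimensionless Hubble rate for flat matter + CPL dark energy, with
        # w(a) = w0 + w1*(1 - a). The matter fraction Om is illustrative.
        import numpy as np

        def E(a, Om=0.3, w0=-1.0, w1=0.0):
            """H(a)/H0 for a flat universe with matter and CPL dark energy."""
            rho_de = a ** (-3.0 * (1.0 + w0 + w1)) * np.exp(-3.0 * w1 * (1.0 - a))
            return np.sqrt(Om * a ** -3 + (1.0 - Om) * rho_de)

        # Compare LambdaCDM with the phantom-crossing example quoted above.
        for a in (1.0, 0.5, 0.25):
            print(a, E(a), E(a, w0=-1.4, w1=2.0))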

  5. Handling Qualities of Large Rotorcraft in Hover and Low Speed

    Science.gov (United States)

    2015-03-01

    [Only fragmentary full-text excerpts are available for this record; they mention shaft axes that depend on nacelle angle, main inceptor force characteristics (gradients, breakouts, damping, friction) provided by a hydraulic McFadden variable-force system, and a table of inceptor force-displacement characteristics by cockpit control and rotorcraft configuration.]

  6. Critical experiments for large scale enriched uranium solution handling

    International Nuclear Information System (INIS)

    Tanner, J.E.; Forehand, H.M.

    1985-01-01

    The authors have performed 17 critical experiments with a concentrated aqueous uranyl nitrate solution contained in an annular cylindrical tank, with annular cylindrical absorbers of stainless steel and/or polyethylene inside. Values of k_eff calculated by KENO IV, employing 16-group Hansen-Roach cross sections, average 0.977. There is a variation of the calculational bias among the separate experiments, but it is too small to allow assigning it to specific components of the equipment. They are now performing critical experiments with a more concentrated uranyl nitrate solution in pairs of very squat cylindrical tanks with disc-shaped absorbers and reflectors of carbon steel, stainless steel, Nitronic-50, and plain and borated polyethylene. These experiments are in support of upgrading fuel reprocessing at the Idaho Chemical Processing Plant

  7. Safeguarding future large-scale plutonium bulk handling facilities

    International Nuclear Information System (INIS)

    1979-01-01

    The paper reviews the current status, advantages, limitations and probable future developments of material accountancy and of containment and surveillance. The major limitations on the use of material accountancy in applying safeguards to future plants arise from the uncertainty with which flows and inventories can be measured (0.5 to 1.0%), and the necessity to carry out periodical physical inventories to determine whether material has been diverted. The use of plant instrumentation to determine in-process inventories has commenced and so has the development of statistical methods for the evaluations of the data derived from a series of consecutive material balance periods. The limitations of accountancy can be overcome by increased use of containment and surveillance measures which have the advantage that they are independent of the operator's actions. In using these measures it will be necessary to identify the credible diversion paths, build in sufficient redundancy to reduce false alarm rates, develop automatic data recording and alarming

  8. Resolution testing and limitations of geodetic and tsunami datasets for finite fault inversions along subduction zones

    Science.gov (United States)

    Williamson, A.; Newman, A. V.

    2017-12-01

    Finite fault inversions utilizing multiple datasets have become commonplace for large earthquakes pending data availability. The mixture of geodetic datasets such as Global Navigational Satellite Systems (GNSS) and InSAR, seismic waveforms, and when applicable, tsunami waveforms from Deep-Ocean Assessment and Reporting of Tsunami (DART) gauges, provide slightly different observations that when incorporated together lead to a more robust model of fault slip distribution. The merging of different datasets is of particular importance along subduction zones where direct observations of seafloor deformation over the rupture area are extremely limited. Instead, instrumentation measures related ground motion from tens to hundreds of kilometers away. The distance from the event and dataset type can lead to a variable degree of resolution, affecting the ability to accurately model the spatial distribution of slip. This study analyzes the spatial resolution attained individually from geodetic and tsunami datasets as well as in a combined dataset. We constrain the importance of distance between estimated parameters and observed data and how that varies between land-based and open ocean datasets. Analysis focuses on accurately scaled subduction zone synthetic models as well as analysis of the relationship between slip and data in recent large subduction zone earthquakes. This study shows that seafloor deformation sensitive datasets, like open-ocean tsunami waveforms or seafloor geodetic instrumentation, can provide unique offshore resolution for understanding most large and particularly tsunamigenic megathrust earthquake activity. In most environments, we simply lack the capability to resolve static displacements using land-based geodetic observations.

  9. Recent fuel handling experience in Canada

    International Nuclear Information System (INIS)

    Welch, A.C.

    1991-01-01

    For many years, good operation of the fuel handling system at Ontario Hydro's nuclear stations has been taken for granted with the unavailability of the station arising from fuel handling system-related problems usually contributing less than one percent of the total unavailability of the stations. While the situation at the newer Hydro stations continues generally to be good (with the specific exception of some units at Pickering B) some specific and some general problems have caused significant loss of availability at the older plants (Pickering A and Bruce A). Generally the experience at the 600 MWe units in Canada has also continued to be good with Point Lepreau leading the world in availability. As a result of working to correct identified deficiencies, there were some changes for the better as some items of equipment that were a chronic source of trouble were replaced with improved components. In addition, the fuel handling system has been used three times as a delivery system for large-scale non destructive examination of the pressure tubes, twice at Bruce and once at Pickering and performing these inspections this way has saved many days of reactor downtime. Under COG there are several programs to develop improved versions of some of the main assemblies of the fuelling machine head. This paper will generally cover the events relating to Pickering in more detail but will describe the problems with the Bruce Fuelling Machine Bridges since the 600 MW 1P stations have a bridge drive arrangement that is somewhat similar to Bruce

  10. CLARA-A1: a cloud, albedo, and radiation dataset from 28 yr of global AVHRR data

    Directory of Open Access Journals (Sweden)

    K.-G. Karlsson

    2013-05-01

    Full Text Available A new satellite-derived climate dataset – denoted CLARA-A1 ("The CM SAF cLoud, Albedo and RAdiation dataset from AVHRR data") – is described. The dataset covers the 28 yr period from 1982 until 2009 and consists of cloud, surface albedo, and radiation budget products derived from the AVHRR (Advanced Very High Resolution Radiometer) sensor carried by polar-orbiting operational meteorological satellites. Its content, anticipated accuracies, limitations, and potential applications are described. The dataset is produced by the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF) project. The dataset has its strengths in its long duration, its foundation upon a homogenized AVHRR radiance data record, and in some unique features, e.g. the availability of 28 yr of summer surface albedo and cloudiness parameters over the polar regions. Quality characteristics are also well investigated, and particularly useful results can be found over the tropics, mid to high latitudes and over nearly all oceanic areas. Being the first CM SAF dataset of its kind, an intensive evaluation of the quality of the datasets was performed, and major findings with regard to merits and shortcomings of the datasets are reported. However, the CM SAF's long-term commitment to perform two additional reprocessing events within the time frame 2013–2018 will allow proper handling of limitations as well as upgrading of the dataset with new features (e.g. uncertainty estimates and extension of the temporal coverage).

  11. Large-scale machine learning and evaluation platform for real-time traffic surveillance

    Science.gov (United States)

    Eichel, Justin A.; Mishra, Akshaya; Miller, Nicholas; Jankovic, Nicholas; Thomas, Mohan A.; Abbott, Tyler; Swanson, Douglas; Keller, Joel

    2016-09-01

    In traffic engineering, vehicle detectors are trained on limited datasets, resulting in poor accuracy when deployed in real-world surveillance applications. Annotating large-scale high-quality datasets is challenging. Typically, these datasets have limited diversity; they do not reflect the real-world operating environment. There is a need for a large-scale, cloud-based positive and negative mining process and a large-scale learning and evaluation system for the application of automatic traffic measurements and classification. The proposed positive and negative mining process addresses the quality of crowd sourced ground truth data through machine learning review and human feedback mechanisms. The proposed learning and evaluation system uses a distributed cloud computing framework to handle data-scaling issues associated with large numbers of samples and a high-dimensional feature space. The system is trained using AdaBoost on 1,000,000 Haar-like features extracted from 70,000 annotated video frames. The trained real-time vehicle detector achieves an accuracy of at least 95% for 1/2 and about 78% for 19/20 of the time when tested on ~7,500,000 video frames. At the end of 2016, the dataset is expected to have over 1 billion annotated video frames.
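
    At toy scale, the core learning step (Haar-like features computed on integral images and fed to boosted decision stumps) can be sketched with scikit-image and scikit-learn as below. The synthetic 12x12 patches and restricted feature family are stand-ins for the annotated video frames and the roughly 1,000,000-feature space of the actual system, not a reproduction of it.

        # Sketch: Haar-like features + AdaBoost on synthetic vehicle-like patches.
        import numpy as np
        from skimage.feature import haar_like_feature
        from skimage.transform import integral_image
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)

        def extract(patch):
            # Haar-like features computed on the integral image of one patch;
            # a restricted feature family keeps this toy example fast.
            ii = integral_image(patch)
            return haar_like_feature(ii, 0, 0, patch.shape[1], patch.shape[0],
                                     feature_type=["type-2-x", "type-2-y"])

        # Synthetic stand-ins: "positive" patches are brighter in the lower half.
        features, labels = [], []
        for i in range(400):
            patch = rng.random((12, 12))
            if i % 2 == 0:
                patch[6:, :] += 0.5
            features.append(extract(patch))
            labels.append(i % 2)

        X_tr, X_te, y_tr, y_te = train_test_split(np.array(features),
                                                  np.array(labels), random_state=0)
        # AdaBoost with its default depth-1 decision-tree (stump) base learner.
        clf = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))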

  12. Robotic liquid handling and automation in epigenetics.

    Science.gov (United States)

    Gaisford, Wendy

    2012-10-01

    Automated liquid-handling robots and high-throughput screening (HTS) are widely used in the pharmaceutical industry for the screening of large compound libraries, small molecules for activity against disease-relevant target pathways, or proteins. HTS robots capable of low-volume dispensing reduce assay setup times and provide highly accurate and reproducible dispensing, minimizing variation between sample replicates and eliminating the potential for manual error. Low-volume automated nanoliter dispensers ensure accuracy of pipetting within volume ranges that are difficult to achieve manually. In addition, they have the ability to potentially expand the range of screening conditions from often limited amounts of valuable sample, as well as reduce the usage of expensive reagents. The ability to accurately dispense lower volumes provides the potential to achieve a greater amount of information than could be otherwise achieved using manual dispensing technology. With the emergence of the field of epigenetics, an increasing number of drug discovery companies are beginning to screen compound libraries against a range of epigenetic targets. This review discusses the potential for the use of low-volume liquid handling robots, for molecular biological applications such as quantitative PCR and epigenetics.

  13. 7 CFR 926.9 - Handle.

    Science.gov (United States)

    2010-01-01

    ... the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (Marketing Agreements and Orders; Fruits, Vegetables, Nuts), DEPARTMENT OF AGRICULTURE DATA COLLECTION, REPORTING AND RECORDKEEPING REQUIREMENTS APPLICABLE TO CRANBERRIES NOT SUBJECT TO THE CRANBERRY MARKETING ORDER § 926.9 Handle. Handle...

  14. HMSRP Hawaiian Monk Seal Handling Data

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data set contains records for all handling and measurement of Hawaiian monk seals since 1981. Live seals are handled and measured during a variety of events...

  15. Regulations on handling dangerous objects in Japan (with particular reference to sodium)

    International Nuclear Information System (INIS)

    Nagai, M.

    1971-01-01

    Sodium is designated as a kind of dangerous object, so that special care has to be taken in handling or storing large amounts of sodium. Formal regulations on sodium handling in Japan are prescribed in Fire Service Law, which is supplemented by Rules on Handling Dangerous Objects. Since these regulations are not intended to be applied to large sodium circuits, some defects and inappropriate expressions might be found in them. An attempt is made here to pick up these problems and important points from Japanese regulations on handling dangerous objects with particular reference to sodium

  16. ERROR HANDLING IN INTEGRATION WORKFLOWS

    Directory of Open Access Journals (Sweden)

    Alexey M. Nazarenko

    2017-01-01

    Full Text Available Simulation experiments performed while solving multidisciplinary engineering and scientific problems require joint usage of multiple software tools. Further, when following a preset plan of experiments or searching for optimum solutions, the same sequence of calculations is run multiple times with various simulation parameters, input data, or conditions, while the overall workflow does not change. Automation of simulations like these requires implementing a workflow in which tool execution and data exchange are usually controlled by a special type of software, an integration environment or platform. The result is an integration workflow (a platform-dependent implementation of some computing workflow) which, in the context of automation, is a composition of weakly coupled (in terms of communication intensity) typical subtasks. These compositions can then be decomposed back into a few workflow patterns (types of subtask interaction). The patterns, in their turn, can be interpreted as higher level subtasks. This paper considers the execution control and data exchange rules that should be imposed by the integration environment when an error is encountered by some integrated software tool. An error is defined as any abnormal behavior of a tool that invalidates its result data, thus disrupting the data flow within the integration workflow. The main requirement for the error handling mechanism implemented by the integration environment is to prevent abnormal termination of the entire workflow in case of missing intermediate results data. Error handling rules are formulated on the basic pattern level and on the level of a composite task that can combine several basic patterns as next level subtasks. The cases where workflow behavior may differ, depending on the user's purposes, when an error takes place, and the possible error handling options that can be specified by the user are also noted in the work.
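
    A minimal, purely illustrative Python sketch (not the integration platform discussed in the paper) of the central rule above: a failed tool invalidates its own output, and subtasks depending on the missing data are skipped rather than the whole workflow being aborted. Task names and the tiny API are invented.

        from dataclasses import dataclass, field

        @dataclass
        class Task:
            name: str
            func: callable
            inputs: list = field(default_factory=list)

        def run_workflow(tasks):
            """Run tasks in order; a failing tool marks its data invalid and
            downstream tasks are skipped instead of aborting everything."""
            results, failed = {}, set()
            for t in tasks:
                if any(i in failed for i in t.inputs):
                    failed.add(t.name)
                    print(f"skip {t.name}: missing input data")
                    continue
                try:
                    results[t.name] = t.func(*[results[i] for i in t.inputs])
                except Exception as err:      # abnormal tool behaviour
                    failed.add(t.name)
                    print(f"error in {t.name}: {err}")
            return results

        wf = [Task("simulate", lambda: 42),
              Task("postprocess", lambda x: x / 0, ["simulate"]),  # failing tool
              Task("report", lambda x: f"value={x}", ["postprocess"]),
              Task("archive", lambda x: x, ["simulate"])]          # still runs
        print(run_workflow(wf))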

  17. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: 'How capable are such stability interpretations in drawing conclusions?' There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale, and (iii) that the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional scale stability variations will be quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined if the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar
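
    Approach (i) above — deriving a sample size from a chosen test, significance level, power, and the mean and standard deviation of the full dataset — can be illustrated with statsmodels. The summary statistics below are invented for illustration and are not taken from the snow stability dataset.

        from statsmodels.stats.power import TTestIndPower

        # Hypothetical group means and pooled standard deviation (e.g., two aspects).
        mean_a, mean_b, sd = 4.2, 3.6, 1.1
        effect_size = abs(mean_a - mean_b) / sd          # Cohen's d

        n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                                  alpha=0.05, power=0.8,
                                                  alternative='two-sided')
        print(f"required sample size per group: {n_per_group:.0f}")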

  18. New public dataset for spotting patterns in medieval document images

    Science.gov (United States)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and should encourage researchers in the document image analysis community to design new systems and submit improved results.

  19. The handling of radiation accidents

    International Nuclear Information System (INIS)

    Macdonald, H.F.; Orchard, H.C.; Walker, C.W.

    1977-04-01

    Some of the more interesting and important contributions to a recent International Symposium on the Handling of Radiation Accidents are discussed and personal comments on many of the papers presented are included. The principal conclusion of the Symposium was that although the nuclear industry has an excellent safety record, there is no room for complacency. Continuing attention to emergency planning and exercising are essential in order to maintain this position. A full list of the papers presented at the Symposium is included as an Appendix. (author)

  20. A New Dataset Size Reduction Approach for PCA-Based Classification in OCR Application

    Directory of Open Access Journals (Sweden)

    Mohammad Amin Shayegan

    2014-01-01

    Full Text Available A major problem of pattern recognition systems arises from the large volume of training datasets, which include duplicate and similar training samples. In order to overcome this problem, some dataset size reduction and also dimensionality reduction techniques have been introduced. The algorithms presently used for dataset size reduction usually remove samples near the centers of classes or support vector samples between different classes. However, the samples near a class center include valuable information about the class characteristics, and the support vectors are important for evaluating system efficiency. This paper reports on the use of the Modified Frequency Diagram technique for dataset size reduction. In this newly proposed technique, a training dataset is rearranged and then sieved. The sieved training dataset, along with automatic feature extraction/selection using Principal Component Analysis, is used in an OCR application. The experimental results obtained when using the proposed system on one of the biggest handwritten Farsi/Arabic numeral standard OCR datasets, Hoda, show about 97% accuracy in the recognition rate. The recognition speed increased by 2.28 times, while the accuracy decreased by only 0.7%, when a sieved version of the dataset, which is only half the size of the initial training dataset, was used.
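
    The sketch below is a loose, hedged analogue of the pipeline described above using scikit-learn: PCA features feed a simple classifier, and a naively halved training set stands in for the Modified Frequency Diagram sieving (which is not reproduced here). The bundled digits dataset substitutes for the Hoda dataset, which is not available in this sketch.

        import numpy as np
        from sklearn.datasets import load_digits
        from sklearn.decomposition import PCA
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline

        X, y = load_digits(return_X_y=True)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        def accuracy(X_train, y_train):
            model = make_pipeline(PCA(n_components=30), KNeighborsClassifier())
            return model.fit(X_train, y_train).score(X_te, y_te)

        # Naive size reduction: keep a random half of the training set
        # (the paper's sieving is more selective than this).
        rng = np.random.default_rng(0)
        half = rng.choice(len(X_tr), size=len(X_tr) // 2, replace=False)
        print("full training set accuracy:", accuracy(X_tr, y_tr))
        print("half training set accuracy:", accuracy(X_tr[half], y_tr[half]))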

  1. 7 CFR 58.443 - Whey handling.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 3 2010-01-01 2010-01-01 false Whey handling. 58.443 Section 58.443 Agriculture... Procedures § 58.443 Whey handling. (a) Adequate sanitary facilities shall be provided for the handling of whey. If outside, necessary precautions shall be taken to minimize flies, insects and development of...

  2. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanism of playing chess. For professional chess players (e.g., chess grand masters and masters or GM/Ms), what are the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, are largely veiled. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms), and 29 age matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  3. Supporting flexible processes with adaptive workflow and case handling

    NARCIS (Netherlands)

    Günther, C.W.; Reichert, M.; Aalst, van der W.M.P.

    2008-01-01

    Workflow management technology has profoundly transformed the way complex tasks are being handled in modern, large-scale organizations. However, it is mostly those systems' inherent lack of flexibility that hinders their broad acceptance, and that is perceived as annoyance by users. In this context,

  4. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM)

  5. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.
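
    As a minimal example of one of the sampling techniques the chapter surveys, the sketch below balances classes by random oversampling with NumPy; more elaborate schemes such as SMOTE are not shown, and the toy data are invented.

        import numpy as np

        def random_oversample(X, y, seed=0):
            """Duplicate minority-class samples until all classes are equally represented."""
            classes, counts = np.unique(y, return_counts=True)
            n_max = counts.max()
            rng = np.random.default_rng(seed)
            idx = [rng.choice(np.flatnonzero(y == c), size=n_max, replace=True)
                   for c in classes]
            idx = np.concatenate(idx)
            return X[idx], y[idx]

        X = np.arange(20).reshape(10, 2)
        y = np.array([0] * 8 + [1] * 2)       # 4:1 imbalance
        Xb, yb = random_oversample(X, y)
        print(np.bincount(yb))                # -> [8 8]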

  6. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for higher hierarchical classification of the organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, was reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends
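
    The AT/GC content computation mentioned above is straightforward; a small self-contained sketch (with an invented example sequence) is:

        def at_gc_content(seq):
            """Return the AT and GC fractions of a DNA sequence."""
            seq = seq.upper()
            at = sum(seq.count(b) for b in "AT")
            gc = sum(seq.count(b) for b in "GC")
            total = at + gc
            return at / total, gc / total

        at, gc = at_gc_content("ATGCGCGTATTAGCGGCCAT")
        print(f"AT fraction: {at:.2f}  GC fraction: {gc:.2f}")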

  7. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    Science.gov (United States)

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
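
    A minimal sketch of the quartz-free recalculation described above, assuming concentrations are reported in wt.% and that dividing by the non-quartz fraction is the intended normalization; the sample values are invented.

        import pandas as pd

        # Hypothetical soil samples: element concentrations and quartz content, all in wt.%.
        soil = pd.DataFrame({"quartz": [60.0, 25.0, 80.0],
                             "Ca":     [1.2,  3.0,  0.5],
                             "Al":     [5.5,  8.1,  2.4]})

        # Recast each element onto a quartz-free basis: divide by the non-quartz fraction.
        non_quartz = 1.0 - soil["quartz"] / 100.0
        quartz_free = soil[["Ca", "Al"]].div(non_quartz, axis=0)
        print(quartz_free)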

  8. A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset.

    Science.gov (United States)

    Kamal, Sarwar; Ripon, Shamim Hasnat; Dey, Nilanjan; Ashour, Amira S; Santhi, V

    2016-07-01

    In the age of information superhighway, big data play a significant role in information processing, extractions, retrieving and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes imperfect for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that enfolds the large storage and processing capability of distributed or parallel processing platforms is essential. In this analysis, a contemporary distributed clustering methodology for imbalance data reduction using k-nearest neighbor (K-NN) classification approach has been introduced. The pivotal objective of this work is to illustrate real training data sets with reduced amount of elements or instances. These reduced amounts of data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches. To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalance data sets from large-scale data sets without loss of its accuracy. The obtained results depict that MapReduce based K-NN classifier provided accurate results for big data of DNA. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
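
    As a rough, in-process analogue of the MapReduce workflow described above (the paper uses a Hadoop-style framework, not Python multiprocessing), the sketch below partitions a dataset, condenses each partition with a simple 1-NN rule in the map step, and merges the reduced parts in the reduce step. The data and labels are synthetic stand-ins for encoded DNA features.

        import numpy as np
        from multiprocessing import Pool

        def condense_chunk(args):
            """Map step: keep only samples that a 1-NN rule on the kept set misclassifies."""
            X, y = args
            keep = [0]
            for i in range(1, len(X)):
                d = np.linalg.norm(X[keep] - X[i], axis=1)
                if y[keep[int(d.argmin())]] != y[i]:
                    keep.append(i)
            return X[keep], y[keep]

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            X = rng.random((10000, 4))                      # stand-in feature vectors
            y = (X[:, 0] > 0.7).astype(int)                 # imbalanced labels
            chunks = [(X[i::4], y[i::4]) for i in range(4)]  # "map" partitions
            with Pool(4) as pool:
                parts = pool.map(condense_chunk, chunks)
            Xr = np.concatenate([p[0] for p in parts])       # "reduce": merge condensed sets
            yr = np.concatenate([p[1] for p in parts])
            print(len(X), "->", len(Xr), "samples")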

  9. Inverse modelling estimates of N2O surface emissions and stratospheric losses using a global dataset

    Science.gov (United States)

    Thompson, R. L.; Bousquet, P.; Chevallier, F.; Dlugokencky, E. J.; Vermeulen, A. T.; Aalto, T.; Haszpra, L.; Meinhardt, F.; O'Doherty, S.; Moncrieff, J. B.; Popa, M.; Steinbacher, M.; Jordan, A.; Schuck, T. J.; Brenninkmeijer, C. A.; Wofsy, S. C.; Kort, E. A.

    2010-12-01

    Nitrous oxide (N2O) levels have been steadily increasing in the atmosphere over the past few decades at a rate of approximately 0.3% per year. This trend is of major concern as N2O is both a long-lived Greenhouse Gas (GHG) and an Ozone Depleting Substance (ODS), as it is a precursor of NO and NO2, which catalytically destroy ozone in the stratosphere. Recently, N2O emissions have been recognised as the most important ODS emissions and are now of greater importance than emissions of CFCs. The growth in atmospheric N2O is predominantly due to the enhancement of surface emissions by human activities, most notably the intensification and proliferation of agriculture since the mid-19th century, which has been accompanied by the increased input of reactive nitrogen to soils and has resulted in significant perturbations to the natural N-cycle and emissions of N2O. There exist two approaches for estimating N2O emissions, the so-called 'bottom-up' and 'top-down' approaches. Top-down approaches, based on the inversion of atmospheric measurements, require an estimate of the loss of N2O via photolysis and oxidation in the stratosphere. Uncertainties in the loss magnitude contribute uncertainties of 15 to 20% to the global annual surface emissions, complicating direct comparisons between bottom-up and top-down estimates. In this study, we present a novel inversion framework for the simultaneous optimization of N2O surface emissions and the magnitude of the loss, which avoids errors in the emissions due to incorrect assumptions about the lifetime of N2O. We use a Bayesian inversion with a variational formulation (based on 4D-Var) in order to handle very large datasets. N2O fluxes are retrieved at 4-weekly resolution over a global domain with a spatial resolution of 3.75° x 2.5° longitude by latitude. The efficacy of the simultaneous optimization of emissions and losses is tested using a global synthetic dataset, which mimics the available atmospheric data. Lastly, using real
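
    A variational (4D-Var type) inversion of this kind minimizes a Bayesian cost function. The exact formulation used by the authors is not reproduced in the abstract, so take the expression below as the standard textbook form, with the state vector x understood to contain both the gridded surface fluxes and the stratospheric loss scaling:

        J(\mathbf{x}) = \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm T}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
                      + \tfrac{1}{2}\sum_{i}\bigl(H_i(\mathbf{x})-\mathbf{y}_i\bigr)^{\mathrm T}\mathbf{R}_i^{-1}\bigl(H_i(\mathbf{x})-\mathbf{y}_i\bigr)

    where x_b is the prior (bottom-up) estimate, B its error covariance, y_i the atmospheric observations in time window i, H_i the transport-model observation operator, and R_i the observation error covariance; the minimum is found iteratively using the adjoint of H, which is what makes very large datasets tractable.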

  10. Harvard Aging Brain Study: Dataset and accessibility.

    Science.gov (United States)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  11. EasyPCC: Benchmark Datasets and Tools for High-Throughput Measurement of the Plant Canopy Coverage Ratio under Field Conditions

    Directory of Open Access Journals (Sweden)

    Wei Guo

    2017-04-01

    Full Text Available Understanding interactions of genotype, environment, and management under field conditions is vital for selecting new cultivars and farming systems. Image analysis is considered a robust technique in high-throughput phenotyping with non-destructive sampling. However, analysis of digital field-derived images remains challenging because of the variety of light intensities, growth environments, and developmental stages. The plant canopy coverage (PCC) ratio is an important index of crop growth and development. Here, we present a tool, EasyPCC, for effective and accurate evaluation of the ground coverage ratio from a large number of images under variable field conditions. The core algorithm of EasyPCC is based on a pixel-based segmentation method using a decision-tree-based segmentation model (DTSM). EasyPCC was developed under the MATLAB® and R languages; thus, it could be implemented in high-performance computing to handle large numbers of images following just a single model training process. This study used an experimental set of images from a paddy field to demonstrate EasyPCC, and to show the accuracy improvement possible by adjusting key points (e.g., outlier deletion and model retraining). The accuracy (R2 = 0.99) of the calculated coverage ratio was validated against a corresponding benchmark dataset. The EasyPCC source code is released under GPL license with benchmark datasets of several different crop types for algorithm development and for evaluating ground coverage ratios.
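
    A hedged sketch of the pixel-based, decision-tree segmentation idea behind EasyPCC (which itself is implemented in MATLAB and R): train a small decision tree on labelled RGB pixels, classify every pixel of an image, and report the plant-pixel fraction as the coverage ratio. The training pixels and test image below are random stand-ins, not field data.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        # Hypothetical training pixels: RGB values labelled plant (1) or background (0).
        rng = np.random.default_rng(0)
        plant = np.column_stack([rng.integers(0, 120, 200),     # low red
                                 rng.integers(120, 255, 200),   # high green
                                 rng.integers(0, 120, 200)])    # low blue
        background = rng.integers(0, 255, (200, 3))
        X = np.vstack([plant, background])
        y = np.array([1] * 200 + [0] * 200)

        model = DecisionTreeClassifier(max_depth=5).fit(X, y)

        def canopy_coverage(image):
            """Fraction of pixels classified as plant in an H x W x 3 RGB image."""
            labels = model.predict(image.reshape(-1, 3))
            return labels.mean()

        image = rng.integers(0, 255, (100, 100, 3))
        print(f"estimated coverage ratio: {canopy_coverage(image):.2f}")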

  12. Simulation of the MRS receiving and handling facility

    International Nuclear Information System (INIS)

    Triplett, M.B.; Imhoff, C.H.; Hostick, C.J.

    1984-02-01

    Monitored retrievable storage (MRS) will be required to handle a large volume of spent fuel or high-level waste (HLW) in case of delays in repository deployment. The quantities of materials to be received and repackaged for storage far exceed the requirements of existing waste management facilities. A computer simulation model of the MRS receiving and handling (R and H) facility has been constructed and used to evaluate design alternatives. Studies have identified processes or activities which may constrain throughput performance. In addition, the model has helped to assess design tradeoffs such as those to be made among improved process times, redundant service lines, and improved component availability. 1 reference, 5 figures
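
    The report describes a purpose-built simulation model; as a rough analogue, the sketch below uses the SimPy discrete-event library (an assumption of convenience, not the authors' tool) to show how a single repackaging cell can become the throughput-constraining step for arriving casks. The service times and arrival rate are invented.

        import simpy

        RECEIVE_TIME, REPACKAGE_TIME = 2.0, 6.0       # hypothetical hours per cask

        def cask(env, name, receiving, repackaging, done):
            with receiving.request() as req:           # receiving bay is a shared resource
                yield req
                yield env.timeout(RECEIVE_TIME)
            with repackaging.request() as req:         # repackaging cell may be the bottleneck
                yield req
                yield env.timeout(REPACKAGE_TIME)
            done.append(name)

        def arrivals(env, receiving, repackaging, done):
            for i in range(100):
                env.process(cask(env, i, receiving, repackaging, done))
                yield env.timeout(1.0)                 # one cask arrives per hour

        env = simpy.Environment()
        receiving = simpy.Resource(env, capacity=1)
        repackaging = simpy.Resource(env, capacity=1)
        done = []
        env.process(arrivals(env, receiving, repackaging, done))
        env.run(until=24 * 30)                         # simulate 30 days
        print(f"casks fully processed in 30 days: {len(done)}")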

  13. Advanced remote handling developments for high radiation applications

    International Nuclear Information System (INIS)

    Herndon, J.N.; Kring, C.T.; Feldman, M.J.; Kuban, D.P.; Martin, H.L.; Rowe, J.C.; Hamel, W.R.

    1985-01-01

    The Remote Control Engineering Task of the Consolidated Fuel Reprocessing Program at Oak Ridge National Laboratory has been developing advanced techniques for remote maintenance of future US fuel reprocessing plants. These efforts are based on the application of teleoperated, force-reflecting servomanipulators for dexterous remote handling with television viewing for large-volume hazardous applications. These developments fully address the nonrepetitive nature of remote maintenance in the unstructured environments encountered in fuel reprocessing. This paper covers the primary emphasis in the present program; the design, fabrication, and installation of a prototype remote handling system for reprocessing applications, the Advanced Integrated Maintenance System

  14. Safety of Cargo Aircraft Handling Procedure

    Directory of Open Access Journals (Sweden)

    Daniel Hlavatý

    2017-07-01

    Full Text Available The aim of this paper is to present ways to improve the safety management system during cargo aircraft handling. The first chapter is dedicated to general information about air cargo transportation, including the history and types of cargo aircraft handling as well as the means of handling. The second part is focused on a detailed description of cargo aircraft handling, including a description of the activities that are performed before and after handling. The following part covers a theoretical interpretation of safety, safety indicators and legislative provisions related to the safety of cargo aircraft handling. The fourth part analyzes the fault trees of events which might occur during handling. The factors found by this analysis are compared with FedEx safety reports. Based on the comparison, a proposal is made on how to improve safety management in this transportation company.
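
    Fault-tree analysis of this kind reduces to combining basic-event probabilities through AND/OR gates. The sketch below is a generic illustration with invented event names and probabilities; it is not taken from the paper or from the FedEx safety reports it cites.

        from math import prod

        def and_gate(probs):
            """All independent basic events must occur."""
            return prod(probs)

        def or_gate(probs):
            """At least one independent basic event occurs."""
            return 1 - prod(1 - p for p in probs)

        # Hypothetical basic-event probabilities per handling operation.
        loader_error   = or_gate([1e-3, 5e-4])        # wrong loading OR bad securing
        detection_miss = and_gate([0.1, 0.2])         # crew AND supervisor both miss it
        top_event      = and_gate([loader_error, detection_miss])
        print(f"P(damage during handling) = {top_event:.2e}")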

  15. Transfer Area Mechanical Handling Calculation

    International Nuclear Information System (INIS)

    Dianda, B.

    2004-01-01

    This calculation is intended to support the License Application (LA) submittal of December 2004, in accordance with the directive given by DOE correspondence received on the 27th of January 2004 entitled: "Authorization for Bechtel SAIC Company L.L.C. to Include a Bare Fuel Handling Facility and Increased Aging Capacity in the License Application, Contract Number DE-AC28-01RW12101" (Arthur, W.J., III 2004). This correspondence was appended by further correspondence received on the 19th of February 2004 entitled: "Technical Direction to Bechtel SAIC Company L.L.C. for Surface Facility Improvements, Contract Number DE-AC28-01RW12101; TDL No. 04-024" (BSC 2004a). These documents give the authorization for a Fuel Handling Facility to be included in the baseline. The purpose of this calculation is to establish preliminary bounding equipment envelopes and weights for the Fuel Handling Facility (FHF) transfer areas equipment. This calculation provides preliminary information only to support development of facility layouts and preliminary load calculations. The limitations of this preliminary calculation lie within the assumptions of section 5, as this calculation is part of an evolutionary design process. It is intended that this calculation be superseded as the design advances to reflect information necessary to support License Application. The design choices outlined within this calculation represent a demonstration of feasibility and may or may not be included in the completed design. This calculation provides preliminary weight, dimensional envelope, and equipment position in building for the purposes of defining interface variables. This calculation identifies and sizes major equipment and assemblies that dictate overall equipment dimensions and facility interfaces. Sizing of components is based on the selection of commercially available products, where applicable. This is not a specific recommendation for the future use of these components or their

  16. Single-molecule mechanics of protein-labelled DNA handles

    Directory of Open Access Journals (Sweden)

    Vivek S. Jadhav

    2016-01-01

    Full Text Available DNA handles are often used as spacers and linkers in single-molecule experiments to isolate and tether RNAs, proteins, enzymes and ribozymes, amongst other biomolecules, between surface-modified beads for nanomechanical investigations. Custom DNA handles with varying lengths and chemical end-modifications are readily and reliably synthesized en masse, enabling force spectroscopic measurements with well-defined and long-lasting mechanical characteristics under physiological conditions over a large range of applied forces. Although these chemically tagged DNA handles are widely used, their further individual modification with protein receptors is less common and would allow for additional flexibility in grabbing biomolecules for mechanical measurements. In-depth information on reliable protocols for the synthesis of these DNA–protein hybrids and on their mechanical characteristics under varying physiological conditions are lacking in literature. Here, optical tweezers are used to investigate different protein-labelled DNA handles in a microfluidic environment under different physiological conditions. Digoxigenin (DIG)-dsDNA-biotin handles of varying sizes (1000, 3034 and 4056 bp) were conjugated with streptavidin or neutravidin proteins. The DIG-modified ends of these hybrids were bound to surface-modified polystyrene (anti-DIG) beads. Using different physiological buffers, optical force measurements showed consistent mechanical characteristics with long dissociation times. These protein-modified DNA hybrids were also interconnected in situ with other tethered biotinylated DNA molecules. Electron-multiplying CCD (EMCCD) imaging control experiments revealed that quantum dot–streptavidin conjugates at the end of DNA handles remain freely accessible. The experiments presented here demonstrate that handles produced with our protein–DNA labelling procedure are excellent candidates for grasping single molecules exposing tags suitable for molecular

  17. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  18. CANISTER HANDLING FACILITY DESCRIPTION DOCUMENT

    Energy Technology Data Exchange (ETDEWEB)

    J.F. Beesley

    2005-04-21

    The purpose of this facility description document (FDD) is to establish requirements and associated bases that drive the design of the Canister Handling Facility (CHF), which will allow the design effort to proceed to license application. This FDD will be revised at strategic points as the design matures. This FDD identifies the requirements and describes the facility design, as it currently exists, with emphasis on attributes of the design provided to meet the requirements. This FDD is an engineering tool for design control; accordingly, the primary audience and users are design engineers. This FDD is part of an iterative design process. It leads the design process with regard to the flowdown of upper tier requirements onto the facility. Knowledge of these requirements is essential in performing the design process. The FDD follows the design with regard to the description of the facility. The description provided in this FDD reflects the current results of the design process.

  19. Bulk handling benefits from ICT

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2007-11-15

    The efficiency and accuracy of bulk handling are being improved by the range of management information systems and services available today. As part of the program to extend Richards Bay Coal Terminal, Siemens is installing a manufacturing execution system which coordinates and monitors all movements of raw materials. The article also reports recent developments by AXSMarine, SunGuard Energy, Fuelworx and Railworx in providing integrated tools for tracking, managing and optimising solid/liquid fuels and rail car maintenance activities. QMASTOR Ltd. has secured a contract with Anglo Coal Australia to provide its Pit to Port.net® and iFuse® software systems across all their Australian sites, to include pit-to-product stockpile management. 2 figs.

  20. Handling and transport problems (1960)

    International Nuclear Information System (INIS)

    Pomarola, J.; Savouyaud, J.

    1960-01-01

    I. The handling and transport of radioactive wastes involve the danger of irradiation and contamination. It is indispensable: - to lay down a special set of rules governing the removal and transport of wastes within centres or from one centre to another; - to give charge of this transportation to a group containing teams of specialists. The organisation, equipment and output of these teams are being examined. II. Certain materials are particularly dangerous to transport, and for these special vehicles and fixed installations are necessary. This is the case especially for the evacuation of very active liquids. A transport vehicle is described, consisting of a tractor-trailer and a container holding 500 litres of liquid whose activity can reach 1000 Ci/l; the decanting operation, the route to be followed by the vehicle, and the precautions taken are also described. (author) [fr

  1. CANISTER HANDLING FACILITY DESCRIPTION DOCUMENT

    International Nuclear Information System (INIS)

    Beesley. J.F.

    2005-01-01

    The purpose of this facility description document (FDD) is to establish requirements and associated bases that drive the design of the Canister Handling Facility (CHF), which will allow the design effort to proceed to license application. This FDD will be revised at strategic points as the design matures. This FDD identifies the requirements and describes the facility design, as it currently exists, with emphasis on attributes of the design provided to meet the requirements. This FDD is an engineering tool for design control; accordingly, the primary audience and users are design engineers. This FDD is part of an iterative design process. It leads the design process with regard to the flowdown of upper tier requirements onto the facility. Knowledge of these requirements is essential in performing the design process. The FDD follows the design with regard to the description of the facility. The description provided in this FDD reflects the current results of the design process

  2. Fuel Handling Facility Description Document

    International Nuclear Information System (INIS)

    M.A. LaFountain

    2005-01-01

    The purpose of the facility description document (FDD) is to establish the requirements and their bases that drive the design of the Fuel Handling Facility (FHF) to allow the design effort to proceed to license application. This FDD is a living document that will be revised at strategic points as the design matures. It identifies the requirements and describes the facility design as it currently exists, with emphasis on design attributes provided to meet the requirements. This FDD was developed as an engineering tool for design control. Accordingly, the primary audience and users are design engineers. It leads the design process with regard to the flow down of upper tier requirements onto the facility. Knowledge of these requirements is essential to performing the design process. It trails the design with regard to the description of the facility. This description is a reflection of the results of the design process to date

  3. Data Handling and Parameter Estimation

    DEFF Research Database (Denmark)

    Sin, Gürkan; Gernaey, Krist

    2016-01-01

    Modelling is one of the key tools at the disposal of modern wastewater treatment professionals, researchers and engineers. It enables them to study and understand complex phenomena underlying the physical, chemical and biological performance of wastewater treatment plants at different temporal […] engineers, and professionals. However, it is also expected that they will be useful both for graduate teaching as well as a stepping stone for academic researchers who wish to expand their theoretical interest in the subject. For the models selected to interpret the experimental data, this chapter uses available models from the literature that are mostly based on the Activated Sludge Model (ASM) framework and their appropriate extensions (Henze et al., 2000). The chapter presents an overview of the most commonly used methods in the estimation of parameters from experimental batch data, namely: (i) data handling and validation, (ii) […]
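
    As one concrete, hedged example of estimating parameters from batch data, the sketch below fits a Monod-type rate expression by least squares with SciPy's curve_fit; the model choice and the measurements are invented for illustration and are not the chapter's worked example.

        import numpy as np
        from scipy.optimize import curve_fit

        def monod(S, mu_max, K_S):
            """Monod-type rate expression commonly used in ASM-style models."""
            return mu_max * S / (K_S + S)

        # Hypothetical batch measurements: substrate concentration vs. observed rate.
        S_obs = np.array([0.5, 1, 2, 5, 10, 20, 40])
        r_obs = np.array([0.9, 1.5, 2.2, 3.1, 3.6, 3.9, 4.0])
        r_obs = r_obs + np.random.default_rng(1).normal(0, 0.05, len(r_obs))

        params, cov = curve_fit(monod, S_obs, r_obs, p0=[4.0, 2.0])
        stderr = np.sqrt(np.diag(cov))          # rough parameter uncertainty
        print("mu_max, K_S:", params, "+/-", stderr)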

  4. Superphenix 1 primary handling system fabrication and testing

    International Nuclear Information System (INIS)

    Branchu, J.; Ebbinghaus, K.; Gigarel, C.

    1985-01-01

    Primary handling covers the operations performed for spent fuel removal, new fuel insertion, and the in-sodium storage outside the new or spent fuel vessel. This equipment typifies many of the difficulties encountered with the project as a whole: fabrication coordination when several countries are involved and design and construction of very large, relatively complex components. Detailed design studies were mainly influenced by thermal and seismic requirements, as applicable to sodium-immersed structures. Where possible, well-tried mechanical solutions were used, but widely differing techniques were involved, ranging from the high precision fabrication of structures and mechanisms comprising numerous component parts, implying complex machining operations. No particular problems were encountered during the sodium testing of the primary handling equipment. Trends for the 1500-MW (electric) breeder include investigation of the advisability of fuel storage in the core lattice and the possibility of handling system simplification.

  5. Data handling with SAM and art at the NOνA experiment

    International Nuclear Information System (INIS)

    Aurisano, A; Backhouse, C; Davies, G S; Illingworth, R; Mengel, M; Norman, A; Mayer, N; Rocco, D; Zirnstein, J

    2015-01-01

    During operations, NOvA produces between 5,000 and 7,000 raw files per day with peaks in excess of 12,000. These files must be processed in several stages to produce fully calibrated and reconstructed analysis files. In addition, many simulated neutrino interactions must be produced and processed through the same stages as data. To accommodate the large volume of data and Monte Carlo, production must be possible both on the Fermilab grid and on off-site farms, such as the ones accessible through the Open Science Grid. To handle the challenge of cataloging these files and to facilitate their off-line processing, we have adopted the SAM system developed at Fermilab. SAM indexes files according to metadata, keeps track of each file's physical locations, provides dataset management facilities, and facilitates data transfer to off-site grids. To integrate SAM with Fermilab's art software framework and the NOvA production workflow, we have developed methods to embed metadata into our configuration files, art files, and standalone ROOT files. A module in the art framework propagates the embedded information from configuration files into art files, and from input art files to output art files, allowing us to maintain a complete processing history within our files. Embedding metadata in configuration files also allows configuration files indexed in SAM to be used as inputs to Monte Carlo production jobs. Further, SAM keeps track of the input files used to create each output file. Parentage information enables the construction of self-draining datasets which have become the primary production paradigm used at NOvA. In this paper we will present an overview of SAM at NOvA and how it has transformed the file production framework used by the experiment. (paper)
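
    The SAM client API is not reproduced here; the plain-Python sketch below only illustrates the idea of parentage metadata and a "self-draining" dataset, i.e. a query for input-tier files that no declared output file lists as a parent yet. The file names, tiers, and catalogue structure are hypothetical.

        # Each catalogued file carries metadata, including the names of its parents.
        catalog = [
            {"name": "raw_001.root",  "tier": "raw",  "parents": []},
            {"name": "raw_002.root",  "tier": "raw",  "parents": []},
            {"name": "reco_001.root", "tier": "reco", "parents": ["raw_001.root"]},
        ]

        def self_draining(catalog, input_tier, output_tier):
            """Return input-tier files not yet consumed by any output-tier file.
            As new output files are declared, the dataset 'drains' automatically."""
            consumed = {p for f in catalog if f["tier"] == output_tier
                          for p in f["parents"]}
            return [f["name"] for f in catalog
                    if f["tier"] == input_tier and f["name"] not in consumed]

        print(self_draining(catalog, "raw", "reco"))   # -> ['raw_002.root']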

  6. Kernel-based discriminant feature extraction using a representative dataset

    Science.gov (United States)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified on both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.
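
    The FSCL-based representative-set construction is not spelled out in the abstract, so the sketch below substitutes k-means centroids (scikit-learn) for the FSCL/data-editing points and builds a kernel feature map against that small set only, followed by a linear discriminant. It illustrates the computational idea of a representative dataset, not the authors' exact algorithm; the data are synthetic.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_classification
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.metrics.pairwise import rbf_kernel
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=3000, n_features=20,
                                   n_informative=10, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        # Representative set: k-means centroids stand in for the FSCL/data-editing points.
        centroids = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X_tr).cluster_centers_

        def kernel_features(X):
            # Kernel map computed against the small representative set only.
            return rbf_kernel(X, centroids, gamma=0.05)

        clf = LinearDiscriminantAnalysis().fit(kernel_features(X_tr), y_tr)
        print("test accuracy:", clf.score(kernel_features(X_te), y_te))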

  7. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    Science.gov (United States)

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509

  8. Cask system design guidance for robotic handling

    International Nuclear Information System (INIS)

    Griesmeyer, J.M.; Drotning, W.D.; Morimoto, A.K.; Bennett, P.C.

    1990-10-01

    Remote automated cask handling has the potential to reduce both the occupational exposure and the time required to process a nuclear waste transport cask at a handling facility. The ongoing Advanced Handling Technologies Project (AHTP) at Sandia National Laboratories is described. AHTP was initiated to explore the use of advanced robotic systems to perform cask handling operations at handling facilities for radioactive waste, and to provide guidance to cask designers regarding the impact of robotic handling on cask design. The proof-of-concept robotic systems developed in AHTP are intended to extrapolate from currently available commercial systems to the systems that will be available by the time that a repository would be open for operation. The project investigates those cask handling operations that would be performed at a nuclear waste repository facility during cask receiving and handling. The ongoing AHTP indicates that design guidance, rather than design specification, is appropriate, since the requirements for robotic handling do not place severe restrictions on cask design but rather focus on attention to detail and design for limited dexterity. The cask system design features that facilitate robotic handling operations are discussed, and results obtained from AHTP design and operation experience are summarized. The application of these design considerations is illustrated by discussion of the robot systems and their operation on cask feature mock-ups used in the AHTP project. 11 refs., 11 figs

  9. GLEAM version 3: Global Land Evaporation Datasets and Model

    Science.gov (United States)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have arisen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to model evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to make these data available to the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes of the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  10. Omicseq: a web-based search engine for exploring omics datasets

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462

  11. Omicseq: a web-based search engine for exploring omics datasets.

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Recent progress on developments of tritium safe handling techniques in Japan

    International Nuclear Information System (INIS)

    Watanabe, Kuniaki; Matsuyama, Masao

    1993-01-01

    Vast amounts of tritium will be used for thermonuclear fusion reactors. Without establishing safe handling techniques for large amounts of tritium, undoubtedly the fusion reactors will not be accepted. Japanese activity on tritium related research has considerably developed in the last 10 years. This review paper gives a brief summary of safe handling techniques developed by Japanese research groups. (author)

  13. A Look at Technologies Vis-a-vis Information Handling Techniques.

    Science.gov (United States)

    Swanson, Rowena W.

    The paper examines several ideas for information handling implemented with new technologies that suggest directions for future development. These are grouped under the topic headings: Handling Large Data Banks, Providing Personalized Information Packages, Providing Information Specialist Services, and Expanding Man-Machine Interaction. Guides in…

  14. Hot Laboratories and Remote Handling

    International Nuclear Information System (INIS)

    Bart, G.; Blanc, J.Y.; Duwe, R.

    2003-01-01

    The European Working Group on 'Hot Laboratories and Remote Handling' is firmly established as the major contact forum for the nuclear R and D facilities at the European scale. The yearly plenary meetings intend to: - Exchange experience on analytical methods, their implementation in hot cells, the methodologies used and their application in nuclear research; - Share experience on common infrastructure exploitation matters such as remote handling techniques, safety features, QA-certification, waste handling; - Promote normalization and co-operation, e.g., by looking at mutual complementarities; - Prospect present and future demands from the nuclear industry and to draw strategic conclusions regarding further needs. The 41st plenary meeting was held in CEA Saclay from September 22 to 24, 2003 in the premises and with the technical support of the INSTN (National Institute for Nuclear Science and Technology). The Nuclear Energy Division of CEA sponsored it. The Saclay meeting was divided in three topical oral sessions covering: - Post irradiation examination: new analysis methods and methodologies, small specimen technology, programmes and results; - Hot laboratory infrastructure: decommissioning, refurbishment, waste, safety, nuclear transports; - Prospective research on materials for future applications: innovative fuels (Generation IV, HTR, transmutation, ADS), spallation source materials, and candidate materials for fusion reactor. A poster session was opened to transport companies and laboratory suppliers. The meeting addressed in three sessions the following items: Session 1 - Post Irradiation Examinations. Out of 12 papers (including 1 poster) 7 dealt with surface and solid state micro analysis, another one with an equally complex wet chemical instrumental analytical technique, while the other four papers (including the poster) presented new concepts for digital x-ray image analysis; Session 2 - Hot laboratory infrastructure (including waste theme) which was

  15. Installation and method for handling fuel assemblies of fast nuclear reactors

    International Nuclear Information System (INIS)

    Aubert, Michel; Renaux, Charley.

    1982-01-01

    This invention concerns an installation and a method for handling the assemblies that make it possible to use a large revolving plug smaller in diameter than in presently known solutions. This large revolving plug, coaxial to the core, carries a handling arm enabling a fraction of the assemblies to be reached and deposited in a handling well. Through a small offset revolving plug, the remainder of the assemblies can be reached and deposited in a pick-up well accessible to the arm of the large revolving plug

  16. Development of commercial robots for radwaste handling

    International Nuclear Information System (INIS)

    Colborn, K.A.

    1988-01-01

    The cost and dose burden associated with low level radwaste handling activities are a matter of increasing concern to the commercial nuclear power industry. This concern is evidenced by the fact that many utilities have begun to re-evaluate waste generation, handling, and disposal activities at their plants in an effort to improve their overall radwaste handling operations. This paper reports on the project Robots for Radwaste Handling, undertaken to identify the potential of robots to improve radwaste handling operations. The project has focussed on the potential of remote or automated technology to improve well-defined, recognizable radwaste operations. In particular, it focussed on repetitive, low-skill-level radwaste handling and decontamination tasks which involve significant radiation exposure.

  17. Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology.

    Science.gov (United States)

    Hervé, Maxime R; Nicolè, Florence; Lê Cao, Kim-Anh

    2018-03-01

    Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analyses of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of a biological system under study. The present article provides a comprehensive resource to analyze such complex datasets using multivariate methods. It starts from the necessary pre-treatment of data including data transformations and distance calculations, to the application of both gold standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis along with detailed results interpretations for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools with reproducible R codes and chemical-ecological datasets to practice and teach multivariate methods.
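
    As a minimal illustration of the pre-treatment steps mentioned above (the article itself supplies reproducible R code), the Python sketch below log-transforms and autoscales a hypothetical metabolite table before running a PCA; the peak areas are random stand-ins, not chemical-ecological data.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.preprocessing import StandardScaler

        # Hypothetical metabolite table: rows = samples, columns = compounds (peak areas).
        rng = np.random.default_rng(0)
        peaks = rng.lognormal(mean=2.0, sigma=1.0, size=(30, 200))

        # Typical pre-treatment: log transform to tame skew, then autoscale each compound.
        X = StandardScaler().fit_transform(np.log1p(peaks))

        scores = PCA(n_components=2).fit_transform(X)
        print("sample scores on the first two components:\n", scores[:3])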

  18. Programs for visualization, handling and quantification of PIXE maps at the AGLAE facility

    International Nuclear Information System (INIS)

    Pichon, L.; Calligaro, T.; Lemasson, Q.; Moignard, B.; Pacheco, C.

    2015-01-01

    The external beam setup of the AGLAE facility has been developed to combine PIXE with PIGE, EBS and, recently, IBIL for the analysis of cultural heritage artefacts. The upgraded external beam end-station integrates five large-solid-angle X-ray detectors, either to reduce the risk of damage to sensitive artworks by decreasing the beam intensity or to routinely acquire elemental maps at various scales. While many programs are available to process PIXE maps acquired with nuclear microprobes, software to process the major and trace element PIXE maps point by point using GUPIX has not been available. The present paper describes three programs developed for the AGLAE facility to process numerous maps obtained with multiple detectors: AGLAEMAP handles maps and pixel groups within maps, TRAUPIXE quantitatively processes the PIXE spectra of all pixels, and DATAIMAGING displays the resulting quantitative elemental maps. The benefits of this software suite are demonstrated by processing a dataset acquired on a pellet of geostandard reference material and on a terre mêlée pottery shard sample created by the famous ceramist Bernard Palissy (1510–1589), highlighting the chemical elements present in this polychrome ceramic.

  19. Programs for visualization, handling and quantification of PIXE maps at the AGLAE facility

    Energy Technology Data Exchange (ETDEWEB)

    Pichon, L., E-mail: laurent.pichon@culture.fr [Centre de recherche et de restauration des musées de France, C2RMF, Palais du Louvre – Porte des Lions, 14 Quai François Mitterrand, 75001 Paris (France); Fédération de recherche NewAGLAE, FR3506 CNRS, Ministère de la Culture et de la Communication, Chimie ParisTech, Palais du Louvre, 75001 Paris (France); Calligaro, T. [Centre de recherche et de restauration des musées de France, C2RMF, Palais du Louvre – Porte des Lions, 14 Quai François Mitterrand, 75001 Paris (France); Fédération de recherche NewAGLAE, FR3506 CNRS, Ministère de la Culture et de la Communication, Chimie ParisTech, Palais du Louvre, 75001 Paris (France); PSL Research University, Chimie ParisTech-CNRS, Institut de Recherche Chimie Paris, UMR8247, 75005 Paris (France); Lemasson, Q.; Moignard, B.; Pacheco, C. [Centre de recherche et de restauration des musées de France, C2RMF, Palais du Louvre – Porte des Lions, 14 Quai François Mitterrand, 75001 Paris (France); Fédération de recherche NewAGLAE, FR3506 CNRS, Ministère de la Culture et de la Communication, Chimie ParisTech, Palais du Louvre, 75001 Paris (France)

    2015-11-15

    The external beam setup of the AGLAE facility has been developed to combine PIXE with PIGE, EBS and, recently, IBIL for the analysis of cultural heritage artefacts. The upgraded external beam end-station integrates five large-solid-angle X-ray detectors, either to reduce the risk of damage to sensitive artworks by decreasing the beam intensity or to routinely acquire elemental maps at various scales. While many programs are available to process PIXE maps acquired with nuclear microprobes, software to process the major and trace element PIXE maps point by point using GUPIX has not been available. The present paper describes three programs developed for the AGLAE facility to process numerous maps obtained with multiple detectors: AGLAEMAP handles maps and pixel groups within maps, TRAUPIXE quantitatively processes the PIXE spectra of all pixels, and DATAIMAGING displays the resulting quantitative elemental maps. The benefits of this software suite are demonstrated by processing a dataset acquired on a pellet of geostandard reference material and on a terre mêlée pottery shard sample created by the famous ceramist Bernard Palissy (1510–1589), highlighting the chemical elements present in this polychrome ceramic.

  20. Sequence trajectory generation for garment handling systems

    OpenAIRE

    Liu, Honghai; Lin, Hua

    2008-01-01

    This paper presents a novel generic approach to the planning strategy of garment handling systems. An assumption is proposed to separate the components of such systems into a component for intelligent gripper techniques and a component for handling planning strategies. Researchers can concentrate on one of the two components first, then merge the two problems together. An algorithm is addressed to generate the trajectory position and a clothes handling sequence of clothes partitions, which ar...

  1. Enclosure for handling high activity materials

    International Nuclear Information System (INIS)

    Jimeno de Osso, F.

    1977-01-01

    One of the most important problems that are met at the laboratories producing and handling radioisotopes is that of designing, building and operating enclosures suitable for the safe handling of active substances. With this purpose in mind, an enclosure has been designed and built for handling moderately high activities under a shielding made of 150 mm thick lead. In this report a description is given of those aspects that may be of interest to people working in this field. (Author)

  2. Enclosure for handling high activity materials abstract

    International Nuclear Information System (INIS)

    Jimeno de Osso, F.; Dominguez Rodriguez, G.; Cruz Castillo, F. de la; Rodriguez Esteban, A.

    1977-01-01

    One of the most important problems that are met at the laboratories producing and handling radioisotopes is that of designing, building and operating enclosures suitable for the safe handling of active substances. With that purpose in mind, an enclosure has been designed and built for handling moderately high activities under a shielding made of 150 mm thick lead. A description is given of those aspects that may be of interest to people working in this field. (author) [es

  3. Scheduling of outbound luggage handling at airports

    DEFF Research Database (Denmark)

    Barth, Torben C.; Pisinger, David

    2012-01-01

    This article considers the outbound luggage handling problem at airports. The problem is to assign handling facilities to outbound flights and decide about the handling start time. This dynamic, near real-time assignment problem is part of the daily airport operations. Quality, efficiency......). Another solution method is a decomposition approach. The problem is divided into different subproblems and solved in iterative steps. The different solution approaches are tested on real world data from Frankfurt Airport....

  4. ATA diagnostic data handling system: an overview

    International Nuclear Information System (INIS)

    Chambers, F.W.; Kallman, J.; McDonald, J.; Slominski, M.

    1984-01-01

    The functions to be performed by the ATA diagnostic data handling system are discussed. The capabilities of the present data acquisition system (System 0) are presented. The goals for the next generation acquisition system (System 1), currently under design, are discussed. Facilities on the Octopus system for data handling are reviewed. Finally, we discuss what has been learned about diagnostics and computer based data handling during the past year

  5. Enclosure for handling high activity materials

    Energy Technology Data Exchange (ETDEWEB)

    Jimeno de Osso, F

    1977-07-01

    One of the most important problems that are met at the laboratories producing and handling radioisotopes is that of designing, building and operating enclosures suitable for the safe handling of active substances. With this purpose in mind, an enclosure has been designed and built for handling moderately high activities under a shielding made of 150 mm thick lead. In this report a description is given of those aspects that may be of interest to people working in this field. (Author)

  6. Testing AGN unification via inference from large catalogs

    Science.gov (United States)

    Nikutta, Robert; Ivezic, Zeljko; Elitzur, Moshe; Nenkova, Maia

    2018-01-01

    Source orientation and clumpiness of the central dust are the main factors in AGN classification. Type-1 QSOs are easy to observe and large samples are available (e.g. in SDSS), but obscured type-2 AGN are dimmer and redder as our line of sight is more obscured, making it difficult to obtain a complete sample. WISE has found up to a million QSOs. With only 4 bands and a relatively small aperture, the analysis of individual sources is challenging, but the large sample allows inference of bulk properties at a very significant level. CLUMPY (www.clumpy.org) is arguably the most popular database of AGN torus SEDs. We model the ensemble properties of the entire WISE AGN content using regularized linear regression, with orientation-dependent CLUMPY color-color-magnitude (CCM) tracks as basis functions. We can reproduce the observed number counts per CCM bin with percent-level accuracy, and simultaneously infer the probability distributions of all torus parameters, redshifts, additional SED components, and identify type-1/2 AGN populations through their IR properties alone. We increase the statistical power of our AGN unification tests even further by adding other datasets as axes in the regression problem. To this end, we make use of the NOAO Data Lab (datalab.noao.edu), which hosts several high-level large datasets and provides very powerful tools for handling large data, e.g. cross-matched catalogs, fast remote queries, etc.
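    The core of the method above is a regularized linear fit of observed counts per CCM bin to a set of model basis tracks. The sketch below illustrates that idea only in outline, on synthetic arrays rather than the WISE/CLUMPY data, using ridge-regularized non-negative least squares via matrix augmentation.

```python
# Minimal sketch (synthetic data, not the authors' pipeline): fit observed
# counts per color-color-magnitude (CCM) bin as a regularized, non-negative
# linear combination of model basis tracks.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n_bins, n_basis = 500, 40
B = rng.random((n_bins, n_basis))                 # basis: model counts per CCM bin (synthetic)
w_true = rng.exponential(scale=1.0, size=n_basis)
counts = B @ w_true + rng.normal(0, 0.5, n_bins)  # "observed" number counts (synthetic)

# Ridge (Tikhonov) regularization folded into non-negative least squares
# by augmenting the design matrix with sqrt(alpha) * identity rows.
alpha = 1.0
A_aug = np.vstack([B, np.sqrt(alpha) * np.eye(n_basis)])
b_aug = np.concatenate([counts, np.zeros(n_basis)])
weights, _ = nnls(A_aug, b_aug)                   # inferred weight of each parameter combination

rms = np.sqrt(np.mean((B @ weights - counts) ** 2))
print(f"reconstruction rms per bin: {rms:.3f}")
```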

  7. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve the high image resolution needed for finer anatomical details and the signal-to-noise ratio needed for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation, we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...
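    The "factor of eight" in this record corresponds to doubling the resolution along each of the three axes. As a rough illustration of that upsampling step (not the authors' implementation), one DWI volume can be interpolated with a cubic spline before tensor fitting:

```python
# Minimal sketch (synthetic volume, not the study's data or code): upsample a
# diffusion-weighted volume by a factor of 2 per axis (8x more voxels) with
# cubic b-spline interpolation before fibre reconstruction.
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(2)
dwi_volume = rng.random((64, 64, 40))           # one DWI volume (synthetic)

upsampled = zoom(dwi_volume, zoom=2, order=3)   # cubic spline interpolation
print(dwi_volume.shape, "->", upsampled.shape)  # (64, 64, 40) -> (128, 128, 80)
```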

  8. Application Examples for Handle System Usage

    Science.gov (United States)

    Toussaint, F.; Weigel, T.; Thiemann, H.; Höck, H.; Stockhause, M.; Lautenschlager, M.

    2012-12-01

    Besides the well-known DOI (Digital Object Identifier), a special form of Handle that resolves to scientific publications, there are various other applications in use, and others perhaps not yet. We present some examples of the existing ones and some ideas for the future. The national German project C3-Grid provides a framework to implement a first solution for provenance tracing and to explore unforeseen implications. Though project-specific, the high-level architecture is generic and represents well a common notion of data derivation. Users select one or many input datasets and a workflow software module (an agent in this context) to execute on the data. The output data is deposited in a repository to be delivered to the user. All data is accompanied by an XML metadata document. All input and output data, metadata and the workflow module receive Handles and are linked together to establish a directed acyclic graph of derived data objects and involved agents. Data that has been modified by a workflow module is linked to its predecessor data and the workflow module involved. Version control systems such as svn or git provide Internet access to software repositories using URLs. To refer to a specific state of the source code of, for instance, a C3 workflow module, it is sufficient to reference the URL of the svn revision or git hash. In consequence, individual revisions and the repository as a whole receive PIDs. Moreover, the revision-specific PIDs are linked to their respective predecessors and become part of the provenance graph. Another example of PID usage in a current major project is given by EUDAT (European Data Infrastructure), which will link scientific data of several research communities together. In many fields it is necessary to provide data objects at multiple locations for a variety of applications. To ensure consistency, not only the master of a data object but also its copies shall be provided with a PID. To verify transaction safety and to
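    The linking scheme described above can be pictured as PID records whose link fields form a directed acyclic graph. The following is a hypothetical, minimal sketch of that structure (the Handle prefix, field names and traversal are illustrative only, not the C3-Grid or EUDAT implementation):

```python
# Hypothetical sketch of a PID-based provenance graph: each record links an
# output object back to its input data and the workflow module (agent) used.
provenance = {
    "hdl:21.T000/input-001":  {"type": "dataset",  "derived_from": [], "agent": None},
    "hdl:21.T000/wf-regrid":  {"type": "workflow", "derived_from": [], "agent": None},
    "hdl:21.T000/output-042": {"type": "dataset",
                               "derived_from": ["hdl:21.T000/input-001"],
                               "agent": "hdl:21.T000/wf-regrid"},
}

def ancestors(pid, records):
    """Walk the 'derived_from' links back to the original input objects."""
    seen, stack = set(), [pid]
    while stack:
        current = stack.pop()
        for parent in records[current]["derived_from"]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

print(ancestors("hdl:21.T000/output-042", provenance))
```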

  9. Dataset of Phenology of Mediterranean high-mountain meadows flora (Sierra Nevada, Spain)

    OpenAIRE

    Antonio Jesús Pérez-Luque; Cristina Patricia Sánchez-Rojas; Regino Zamora; Ramón Pérez-Pérez; Francisco Javier Bonet

    2015-01-01

    Abstract Sierra Nevada mountain range (southern Spain) hosts a high number of endemic plant species, being one of the most important biodiversity hotspots in the Mediterranean basin. The high-mountain meadow ecosystems (borreguiles) harbour a large number of endemic and threatened plant species. In this data paper, we describe a dataset of the flora inhabiting this threatened ecosystem in this Mediterranean mountain. The dataset includes occurrence data for flora collected in those ecosystems...

  10. Data assimilation and model evaluation experiment datasets

    Science.gov (United States)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  11. A hybrid organic-inorganic perovskite dataset

    Science.gov (United States)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  12. Musculoskeletal injuries resulting from patient handling tasks among hospital workers.

    Science.gov (United States)

    Pompeii, Lisa A; Lipscomb, Hester J; Schoenfisch, Ashley L; Dement, John M

    2009-07-01

    The purpose of this study was to evaluate musculoskeletal injuries and disorders resulting from patient handling prior to the implementation of a "minimal manual lift" policy at a large tertiary care medical center. We sought to define the circumstances surrounding patient handling injuries and to identify potential preventive measures. Human resources data were used to define the cohort and their time at work. Workers' compensation records (1997-2003) were utilized to identify work-related musculoskeletal claims, while the workers' descriptions of injury were used to identify those that resulted from patient handling. Adjusted rate ratios were generated using Poisson regression. One-third (n = 876) of all musculoskeletal injuries resulted from patient handling activities. Most (83%) of the injury burden was incurred by inpatient nurses, nurses' aides and radiology technicians, while injury rates were highest for nurses' aides (8.8/100 full-time equivalents, FTEs) and smaller workgroups including emergency medical technicians (10.3/100 FTEs), patient transporters (4.3/100 FTEs), operating room technicians (3.1/100 FTEs), and morgue technicians (2.2/100 FTEs). Forty percent of injuries due to lifting/transferring patients might have been prevented through the use of mechanical lift equipment, while 32% of injuries resulting from repositioning/turning patients, pulling patients up in bed, or catching falling patients might not have been prevented by the use of lift equipment. The use of mechanical lift equipment could significantly reduce the risk of some patient handling injuries, but additional interventions need to be considered that address other patient handling tasks. Smaller high-risk workgroups should not be neglected in prevention efforts.
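    The adjusted rate ratios mentioned above come from Poisson regression of injury counts with person-time (here FTEs) as an exposure offset, a standard epidemiological technique. A minimal sketch on synthetic counts (not the study's records) looks like this:

```python
# Minimal sketch (synthetic data, not the study's claims records): injury rate
# ratios from Poisson regression with a log(FTE) offset.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "injuries": [88, 12, 30, 9],
    "fte":      [1000.0, 280.0, 950.0, 410.0],
    "aide":     [1, 1, 0, 0],   # 1 = nurses' aide, 0 = reference group (illustrative)
    "night":    [0, 1, 0, 1],   # example covariate (illustrative)
})

X = sm.add_constant(df[["aide", "night"]])
model = sm.GLM(df["injuries"], X, family=sm.families.Poisson(),
               offset=np.log(df["fte"]))
result = model.fit()
print(np.exp(result.params))    # exponentiated coefficients = adjusted rate ratios
```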

  13. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There are also a further three papers that the authors know to be under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in terms of the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques and the blending methods used to combine satellite and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  14. Hot Laboratories and Remote Handling

    International Nuclear Information System (INIS)

    2007-01-01

    The opening talk of the workshop 'Hot Laboratories and Remote Handling' was given by Marin Ciocanescu with the communication 'Overview of R and D Program in Romanian Institute for Nuclear Research'. The work of the meeting was structured into three sessions addressing the following items: Session 1. Hot cell facilities: Infrastructure, Refurbishment, Decommissioning; Session 2. Waste, transport, safety and remote handling issues; Session 3. Post-Irradiation examination techniques. In the framework of Session 1, the communication 'Overview of hot cell facilities in South Africa' by Wouter Klopper, Willie van Greunen et al. was presented. In the framework of the second session, the following four communications were given: 'The irradiated elements cell at PHENIX' by Laurent Breton et al., 'Development of remote equipment for DUPIC fuel fabrication at KAERI' by Jung Won Lee et al., 'Aspects of working with manipulators and small samples in an αβγ-box' by Robert Zubler et al., and 'The GIOCONDA experience of the Joint Research Centre Ispra: analysis of the experimental assemblies finalized to their safe recovery and dismantling' by Roberto Covini. Finally, in the framework of the third session, the following five communications were presented: 'PIE of a CANDU fuel element irradiated for a load following test in the INR TRIGA reactor' by Marcel Parvan et al., 'Adaptation of the pole figure measurement to the irradiated items from zirconium alloys' by Yury Goncharenko et al., 'Fuel rod profilometry with a laser scan micrometer' by Daniel Kuster et al., 'Raman spectroscopy, a new facility at LECI laboratory to investigate neutron damage in irradiated materials' by Lionel Gosmain et al., and 'Analysis of complex nuclear materials with the PSI shielded analytical instruments' by Didier Gavillet. In addition, eleven more presentations were given as posters. Their titles were: 'Presentation of CETAMA activities (CEA analytic group)' by Alain Hanssens et al. 'Analysis of

  15. Fuel handling problems at KANUPP

    Energy Technology Data Exchange (ETDEWEB)

    Ahmed, I; Mazhar Hasan, S; Mugtadir, A [Karachi Nuclear Power Plant (KANUPP), Karachi (Pakistan)

    1991-04-01

    KANUPP experienced two abnormal fuel and fuel-handling-related problems during the year 1990. One of these arose due to the development of end-plate-to-end-plate coupling between the two bundles at the leading end of the fuel string in channel HO2-S. The incident occurred when attempts were being made to fuel this channel. Due to the pulling of sticking bundles into the acceptor fuelling machine (north) magazine, which was not designed to accommodate two bundles, a magazine rotary stop occurred. The forward motion of the charge tube was simultaneously discovered to be restricted. The incident led to the stalling of the fuelling machine, locked onto channel HO2, necessitating a reactor shutdown. Removal of the fuelling machine was accomplished some time later, after draining of the channel. The second incident, which made the fuelling of channel KO5-N temporarily inexecutable, occurred during attempts to remove its north end shield plug when this channel came up for fuelling. The incident resulted from the breaking of the lugs of the shield plug, making its withdrawal impossible. The Plant, however, kept operating with the fuelling of channel KO5 suspended, until it could no longer sustain a further increase in fuel burnup at the maximum rating position. Resolving both these problems necessitated draining of the respective channels, leaving the resident fuel uncovered for the duration of the associated operation. Due to the substantial difference in the oxidation temperatures of UO{sub 2} and Zircaloy and its influence on the cooling requirement, it was necessary either to determine explicitly that the respective channels did not contain defective fuel bundles or to wait long enough to allow the decay heat to reduce to manageable proportions. This had a significant bearing on the Plant down time necessary for the rectification of the problems. This paper describes the two incidents in detail and dwells upon the measures adopted to resolve the related problems. (author)

  16. Fuel handling problems at KANUPP

    International Nuclear Information System (INIS)

    Ahmed, I.; Mazhar Hasan, S.; Mugtadir, A.

    1991-01-01

    KANUPP experienced two abnormal fuel and fuel-handling-related problems during the year 1990. One of these arose due to the development of end-plate-to-end-plate coupling between the two bundles at the leading end of the fuel string in channel HO2-S. The incident occurred when attempts were being made to fuel this channel. Due to the pulling of sticking bundles into the acceptor fuelling machine (north) magazine, which was not designed to accommodate two bundles, a magazine rotary stop occurred. The forward motion of the charge tube was simultaneously discovered to be restricted. The incident led to the stalling of the fuelling machine, locked onto channel HO2, necessitating a reactor shutdown. Removal of the fuelling machine was accomplished some time later, after draining of the channel. The second incident, which made the fuelling of channel KO5-N temporarily inexecutable, occurred during attempts to remove its north end shield plug when this channel came up for fuelling. The incident resulted from the breaking of the lugs of the shield plug, making its withdrawal impossible. The Plant, however, kept operating with the fuelling of channel KO5 suspended, until it could no longer sustain a further increase in fuel burnup at the maximum rating position. Resolving both these problems necessitated draining of the respective channels, leaving the resident fuel uncovered for the duration of the associated operation. Due to the substantial difference in the oxidation temperatures of UO 2 and Zircaloy and its influence on the cooling requirement, it was necessary either to determine explicitly that the respective channels did not contain defective fuel bundles or to wait long enough to allow the decay heat to reduce to manageable proportions. This had a significant bearing on the Plant down time necessary for the rectification of the problems. This paper describes the two incidents in detail and dwells upon the measures adopted to resolve the related problems. (author)

  17. Ergonomics and comfort in lawn mower handle positioning: An evaluation of handle geometry.

    Science.gov (United States)

    Lowndes, Bethany R; Heald, Elizabeth A; Hallbeck, M Susan

    2015-11-01

    Hand operation accompanied by any combination of large forces, awkward positions and repetition may lead to upper limb injury or illness and may be exacerbated by vibration. Commercial lawn mowers expose operators to these factors during actuation of hand controls and therefore may be a health concern. A nontraditional lawn mower control system may decrease upper limb illnesses and injuries through more neutral hand and body positioning. This study compared maximum grip strength in twelve different orientations (3 grip spans and 4 positions) and evaluated self-described comfortable handle positions. The results showed force differences between nontraditional (X) and both vertical (V) and pistol (P) positions (p < 0.0001) and among the different grip spans (p < 0.0001). Based on these results, recommended designs should incorporate a tilt between 45 and 70°, handle rotations between 48 and 78°, and reduced force requirements or decreased grip spans to improve user health and comfort. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  18. Conceptual design of divertor cassette handling by remote handling system for JT-60SA

    International Nuclear Information System (INIS)

    Hayashi, Takao; Sakurai, Shinji; Masaki, Kei; Tamai, Hiroshi; Yoshida, Kiyoshi; Matsukawa, Makoto

    2007-01-01

    The JT-60SA aims to contribute to and supplement ITER toward a DEMO reactor based on the tokamak concept. One of the features of JT-60SA is its high-power, long-pulse heating, causing a large annual neutron fluence. Because the expected dose rate at the vacuum vessel (VV) may exceed 1 mSv/hr after 10 years of operation and three months of cooling, human access inside the VV is prohibited. Therefore a remote handling (RH) system is necessary for the maintenance and repair of in-vessel components. This paper describes the RH system of JT-60SA, especially the expansion of the RH rail and the exchange of the divertor modules. The RH rail is divided into nine sections with three-point mounting, and the nine sections can cover 225 degrees in the toroidal direction. A divertor module, which is 10 degrees wide in the toroidal direction and itself weighs 500 kg due to the limitations of port width and handling weight, can be exchanged by the heavy weight manipulator (HWM). The HWM brings the divertor module to the front of the other RH port, which is used for supporting the rail and/or carrying equipment in and out. Then another RH device receives the module and brings it out on a pallet installed from outside the VV. (author)

  19. Conceptual design of divertor cassette handling by remote handling system of JT-60SA

    International Nuclear Information System (INIS)

    Hayashi, Takao; Sakurai, Shinji; Masaki, Kei; Tamai, Hiroshi; Yoshida, Kiyoshi; Matsukawa, Makoto

    2008-01-01

    The JT-60SA aims to contribute to and supplement ITER toward a demonstration fusion reactor based on the tokamak concept. One of the features of JT-60SA is its high-power, long-pulse heating, causing a large annual neutron fluence. Because the expected dose rate at the vacuum vessel (VV) may exceed 1 mSv/hr after 10 years of operation and three months of cooling, human access inside the VV is restricted. Therefore a remote handling (RH) system is necessary for the maintenance and repair of in-vessel components. This paper describes the RH system of JT-60SA, especially the expansion of the RH rail and the exchange of the divertor cassettes. The RH rail is divided into nine sections with three-point mounting, and the nine sections can cover 225 degrees in the toroidal direction. A divertor cassette, which is 10 degrees wide in the toroidal direction and itself weighs 500 kg due to the limitations of port width and handling weight, can be exchanged by the heavy weight manipulator (HWM). The HWM brings the divertor cassette to the front of the other RH port, which is used for supporting the rail and/or carrying equipment in and out. Then another RH device receives the cassette and brings it out on a pallet installed from outside the VV. (author)

  20. X-ray computed tomography datasets for forensic analysis of vertebrate fossils

    Science.gov (United States)

    Rowe, Timothy B.; Luo, Zhe-Xi; Ketcham, Richard A.; Maisano, Jessica A.; Colbert, Matthew W.

    2016-01-01

    We describe X-ray computed tomography (CT) datasets from three specimens recovered from Early Cretaceous lakebeds of China that illustrate the forensic interpretation of CT imagery for paleontology. Fossil vertebrates from thinly bedded sediments often shatter upon discovery and are commonly repaired as amalgamated mosaics grouted to a solid backing slab of rock or plaster. Such methods are prone to inadvertent error and willful forgery, and once required potentially destructive methods to identify mistakes in reconstruction. CT is an efficient, nondestructive alternative that can disclose many clues about how a specimen was handled and repaired. These annotated datasets illustrate the power of CT in documenting specimen integrity and are intended as a reference in applying CT more broadly to evaluating the authenticity of comparable fossils. PMID:27272251

  1. 9 CFR 3.118 - Handling.

    Science.gov (United States)

    2010-01-01

    9 CFR 3.118 - Handling. Title 9, Animals and Animal Products; Animal and Plant Health Inspection Service, Department of Agriculture; Animal Welfare Standards: Specifications for the Humane Handling, Care, Treatment, and Transportation of Marine...

  2. How to Handle Impasses in Bargaining.

    Science.gov (United States)

    Durrant, Robert E.

    Guidelines in an outline format are presented to school board members and administrators on how to handle impasses in bargaining. The following two rules are given: there sometimes may be strikes, but there always will be settlements; and on the way to settlements, there always will be impasses. Suggestions for handling impasses are listed under…

  3. Handling uncertainty through adaptiveness in planning approaches

    NARCIS (Netherlands)

    Zandvoort, M.; Vlist, van der M.J.; Brink, van den A.

    2018-01-01

    Planners and water managers seek to be adaptive to handle uncertainty through the use of planning approaches. In this paper, we study what type of adaptiveness is proposed and how this may be operationalized in planning approaches to adequately handle different uncertainties. We took a

  4. Survey of postharvest handling, preservation and processing ...

    African Journals Online (AJOL)

    Survey of postharvest handling, preservation and processing practices along the camel milk chain in Isiolo district, Kenya. ... Despite the important contribution of camel milk to food security for pastoralists in Kenya, little is known about the postharvest handling, preservation and processing practices. In this study, existing ...

  5. PND fuel handling decontamination: facilities and techniques

    International Nuclear Information System (INIS)

    Pan, R.Y.

    1996-01-01

    The use of various decontamination techniques and equipment has become a critical part of Fuel Handling maintenance work at Ontario Hydro's Pickering Nuclear Division. This paper presents an overview of the set up and techniques used for decontamination in the PND Fuel Handling Maintenance Facility and the effectiveness of each. (author). 1 tab., 9 figs

  6. Handling Kids in Crisis with Care

    Science.gov (United States)

    Bushinski, Cari

    2018-01-01

    The Handle with Care program helps schools help students who experience trauma. While at the scene of an event like a domestic violence call, drug raid, or car accident, law enforcement personnel determine the names and school of any children present. They notify that child's school to "handle ___ with care" the next day, and the school…

  7. PND fuel handling decontamination: facilities and techniques

    Energy Technology Data Exchange (ETDEWEB)

    Pan, R Y [Ontario Hydro, Toronto, ON (Canada)

    1997-12-31

    The use of various decontamination techniques and equipment has become a critical part of Fuel Handling maintenance work at Ontario Hydro's Pickering Nuclear Division. This paper presents an overview of the set up and techniques used for decontamination in the PND Fuel Handling Maintenance Facility and the effectiveness of each. (author). 1 tab., 9 figs.

  8. Handling knowledge on osteoporosis - a qualitative study

    DEFF Research Database (Denmark)

    Nielsen, Dorthe; Huniche, Lotte; Brixen, Kim

    2013-01-01

    Scand J Caring Sci, 2012. Handling knowledge on osteoporosis - a qualitative study. The aim of this qualitative study was to increase understanding of the importance of osteoporosis information and knowledge for patients' ways of handling osteoporosis in their everyday lives. Interviews were...

  9. DDOS ATTACK DETECTION SIMULATION AND HANDLING MECHANISM

    Directory of Open Access Journals (Sweden)

    Ahmad Sanmorino

    2013-11-01

    Full Text Available In this study we discuss how to handle a DDoS attack coming from an attacker by using a detection method and a handling mechanism. Detection is performed by comparing the number of packets with the number of flows, whereas the handling mechanism works by limiting or dropping the packets detected as part of a DDoS attack. The study begins with a simulation on a real network, which aims to obtain real traffic data. The dump traffic data obtained from the simulation are then used for the detection method in our prototype system called DASHM (DDoS Attack Simulation and Handling Mechanism). From the results of the experiments that have been conducted, the proposed method successfully detects DDoS attacks and handles the incoming packets sent by the attacker.
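    The detection idea above, comparing packet counts with flow counts and dropping traffic from offending sources, can be sketched in a few lines. The following is an illustrative Python sketch only (thresholds and flow definition are assumptions, not the DASHM implementation):

```python
# Illustrative sketch (not DASHM): flag a time window as a possible DDoS attack
# when the packets-per-flow ratio exceeds a threshold, then mark sources whose
# flows are abnormally heavy so their packets can be limited or dropped.
from collections import defaultdict

def detect_ddos(packets, ratio_threshold=50.0):
    """packets: iterable of (src_ip, dst_ip, size) tuples for one time window."""
    flows = defaultdict(int)                      # flow = (src, dst) pair
    for src, dst, _size in packets:
        flows[(src, dst)] += 1
    n_packets = sum(flows.values())
    ratio = n_packets / max(len(flows), 1)        # packets per flow
    attackers = set()
    if ratio > ratio_threshold:
        attackers = {src for (src, _dst), count in flows.items()
                     if count > ratio_threshold}
    return ratio, attackers

window = [("10.0.0.7", "192.0.2.1", 60)] * 5000 + [("10.0.0.8", "192.0.2.1", 60)] * 20
ratio, attackers = detect_ddos(window)
print(f"packets/flow = {ratio:.1f}, drop sources: {attackers}")
```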

  10. MRI of meniscal bucket-handle tears

    Energy Technology Data Exchange (ETDEWEB)

    Magee, T.H.; Hinson, G.W. [Menorah Medical Center, Overland Park, KS (United States). Dept. of Radiology

    1998-09-01

    A meniscal bucket-handle tear is a tear with an attached fragment displaced from the meniscus of the knee joint. Low sensitivity of MRI for detection of bucket-handle tears (64% as compared with arthroscopy) has been reported previously. We report increased sensitivity for detecting bucket-handle tears with the use of coronal short tau inversion recovery (STIR) images. Results. By using four criteria for diagnosis of meniscal bucket-handle tears, our overall sensitivity compared with arthroscopy was 93% (28 of 30 meniscal bucket-handle tears seen at arthroscopy were detected by MRI). The meniscal fragment was well visualized in all 28 cases on coronal STIR images. The double posterior cruciate ligament sign was seen in 8 of 30 cases, the flipped meniscus was seen in 10 of 30 cases and a fragment in the intercondylar notch was seen in 18 of 30 cases. (orig.)

  11. Reservoir water level forecasting using group method of data handling

    Science.gov (United States)

    Zaji, Amir Hossein; Bonakdari, Hossein; Gharabaghi, Bahram

    2018-06-01

    Accurately forecasted reservoir water levels are among the most vital data for efficient reservoir structure design and management. In this study, the group method of data handling is combined with the minimum description length method to develop a very practical and functional model for predicting reservoir water levels. The models' performance is evaluated using two groups of input combinations based on recent days and recent weeks. Four different input combinations are considered in total. The data collected from Chahnimeh#1 Reservoir in eastern Iran are used for model training and validation. To assess the models' applicability in practical situations, the models are made to predict a non-observed dataset for the nearby Chahnimeh#4 Reservoir. According to the results, input combinations (L, L-1) and (L, L-1, L-12) for recent days, with root-mean-squared errors (RMSE) of 0.3478 and 0.3767, respectively, outperform input combinations (L, L-7) and (L, L-7, L-14) for recent weeks, with RMSE of 0.3866 and 0.4378, respectively. Accordingly, (L, L-1) is selected as the best input combination for making 7-day ahead predictions of reservoir water levels.
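    To make the input-combination notation concrete, the sketch below builds the (L, L-1) lagged features for a 7-day-ahead prediction and reports the RMSE on a held-out split. A plain least-squares fit on synthetic levels stands in for the GMDH network used in the paper; it is an illustration of the data layout and evaluation metric, not of the GMDH method itself.

```python
# Minimal sketch (synthetic series, linear stand-in for GMDH): (L, L-1) input
# combination for 7-day-ahead water level prediction, evaluated by RMSE.
import numpy as np

rng = np.random.default_rng(3)
level = np.cumsum(rng.normal(0, 0.1, 600)) + 10.0          # synthetic daily levels

lead = 7                                                   # predict 7 days ahead
X = np.column_stack([level[1:-lead], level[:-lead - 1]])   # features: L, L-1
y = level[lead + 1:]                                       # target: level 7 days later

split = int(0.8 * len(y))
A_train = np.c_[X[:split], np.ones(split)]                 # add intercept column
coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)
pred = np.c_[X[split:], np.ones(len(y) - split)] @ coef
rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))
print(f"test RMSE: {rmse:.4f}")
```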

  12. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter & Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  13. Being an honest broker of hydrology: Uncovering, communicating and addressing model error in a climate change streamflow dataset

    Science.gov (United States)

    Chegwidden, O.; Nijssen, B.; Pytlak, E.

    2017-12-01

    Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. We will describe how encountering earlier techniques' pitfalls allowed us

  14. Tritium handling facilities at the Los Alamos Scientific Laboratory

    International Nuclear Information System (INIS)

    Anderson, J.L.; Damiano, F.A.; Nasise, J.E.

    1975-01-01

    A new tritium facility, recently activated at the Los Alamos Scientific Laboratory, is described. The facility contains a large drybox, associated gas processing system, a facility for handling tritium gas at pressures to approximately 100 MPa, and an effluent treatment system which removes tritium from all effluents prior to their release to the atmosphere. The system and its various components are discussed in detail with special emphasis given to those aspects which significantly reduce personnel exposures and atmospheric releases. (auth)

  15. Classification-based comparison of pre-processing methods for interpretation of mass spectrometry generated clinical datasets

    Directory of Open Access Journals (Sweden)

    Hoefsloot Huub CJ

    2009-05-01

    Full Text Available Abstract Background Mass spectrometry is increasingly being used to discover proteins or protein profiles associated with disease. Experimental design of mass-spectrometry studies has come under close scrutiny and the importance of strict protocols for sample collection is now understood. However, the question of how best to process the large quantities of data generated is still unanswered. Main challenges for the analysis are the choice of proper pre-processing and classification methods. While these two issues have been investigated in isolation, we propose to use the classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Results Two in-house generated clinical SELDI-TOF MS datasets are used in this study as an example of high throughput mass-spectrometry data. We perform a systematic comparison of two commonly used pre-processing methods as implemented in Ciphergen ProteinChip Software and in the Cromwell package. With respect to reproducibility, Ciphergen and Cromwell pre-processing are largely comparable. We find that the overlap between peaks detected by either Ciphergen ProteinChip Software or Cromwell is large. This is especially the case for the more stringent peak detection settings. Moreover, similarity of the estimated intensities between matched peaks is high. We evaluate the pre-processing methods using five different classification methods. Classification is done in a double cross-validation protocol using repeated random sampling to obtain an unbiased estimate of classification accuracy. No pre-processing method significantly outperforms the other for all peak detection settings evaluated. Conclusion We use classification of patient samples as a clinically relevant benchmark for the evaluation of pre-processing methods. Both pre-processing methods lead to similar classification results on an ovarian cancer and a Gaucher disease dataset. However, the settings for pre
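    The double cross-validation protocol referred to above nests the hyper-parameter tuning inside an inner loop and estimates accuracy on outer splits that the tuning never sees, with repeated random sampling for the outer splits. The sketch below shows that structure on synthetic data (a stand-in for the SELDI-TOF peak intensities, not the study's datasets or classifiers):

```python
# Minimal sketch (synthetic data): double (nested) cross-validation with
# repeated random outer splits, so the reported accuracy is not biased by the
# hyper-parameter tuning done in the inner loop.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=300, n_informative=15,
                           random_state=0)                  # stand-in for peak intensities

inner_cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
outer_cv = ShuffleSplit(n_splits=20, test_size=0.2, random_state=2)  # repeated random sampling

tuned_clf = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=inner_cv)
scores = cross_val_score(tuned_clf, X, y, cv=outer_cv)       # outer loop: unbiased estimate
print(f"accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```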

  16. Handling Procedures of Vegetable Crops

    Science.gov (United States)

    Perchonok, Michele; French, Stephen J.

    2004-01-01

    The National Aeronautics and Space Administration (NASA) is working towards future long duration manned space flights beyond low earth orbit. The duration of these missions may be as long as 2.5 years and will likely include a stay on a lunar or planetary surface. The primary goal of the Advanced Food System in these long duration exploratory missions is to provide the crew with a palatable, nutritious, and safe food system while minimizing volume, mass, and waste. Vegetable crops can provide the crew with added nutrition and variety. These crops do not require any cooking or food processing prior to consumption. The vegetable crops, unlike prepackaged foods, will provide bright colors, textures (crispy), and fresh aromas. Ten vegetable crops have been identified for possible use in long duration missions. They are lettuce, spinach, carrot, tomato, green onion, radish, bell pepper, strawberries, fresh herbs, and cabbage. Whether these crops are grown on a transit vehicle (e.g., International Space Station) or on the lunar or planetary surface, it will be necessary to determine how to safely handle the vegetables while maintaining acceptability. Since hydrogen peroxide degrades into water and oxygen and is generally recognized as safe (GRAS), hydrogen peroxide has been recommended as the sanitizer. The objective of this research is to determine the required effective concentration of hydrogen peroxide. In addition, it will be determined whether the use of hydrogen peroxide, although a viable sanitizer, adversely affects the quality of the vegetables. Vegetables will be dipped in 1% hydrogen peroxide, 3% hydrogen peroxide, or 5% hydrogen peroxide. Treated produce and controls will be stored in plastic bags at 5 °C for up to 14 days. Sensory, color, texture, and total plate count will be measured. Evaluation of the effect on several vegetables, including lettuce, radish, tomato and strawberries, has been completed. Although each vegetable reacts to hydrogen peroxide differently, the

  17. The handling of radiation accidents

    International Nuclear Information System (INIS)

    1977-01-01

    The symposium was attended by 204 participants from 39 countries and 5 international organizations. Forty-two papers were presented in 8 sessions. The purpose of the meeting was to foster an exchange of experiences gained in establishing and exercising plans for mitigating the effects of radiation accidents and in the handling of actual accident situations. Only a small number of accidents were reported at the symposium, and this reflects the very high standards of safety that have been achieved by the nuclear industry. No accidents of radiological significance were reported to have occurred at commercial nuclear power plants. Of the accidents reported, industrial radiography continues to be the area in which most radiation accidents occur. The experience gained in the reported accident situations served to confirm the crucial importance of the prompt availability of medical and radiological services, particularly in the case of uptake of radioactive material, and emphasized the importance of detailed investigation into the causes of the accident in order to improve preventative measures. One of the principal themes of the symposium involved emergency procedures related to nuclear power plant accidents, and several papers defining the scope, progression and consequences of design base accidents for both thermal and fast reactor systems were presented. These were complemented by papers defining the resultant protection requirements that should be satisfied in the establishment of plans designed to mitigate the effects of the postulated accident situations. Several papers were presented describing existing emergency organizational arrangements relating both to specific nuclear power plants and to comprehensive national schemes, and a particularly informative session was devoted to the topic of training of personnel in the practical conduct of emergency arrangements. The general feeling of the participants was one of studied confidence in the competence and

  18. Handling load with less stress

    NARCIS (Netherlands)

    Bansal, N.; Gamarnik, D.

    2006-01-01

    We study how the average performance of a system degrades as the load nears its peak capacity. We restrict our attention to the performance measures of average sojourn time and the large deviation rates of buffer overflow probabilities. We first show that for certain queueing systems, the average

  19. MPI Debugging with Handle Introspection

    DEFF Research Database (Denmark)

    Brock-Nannestad, Laust; DelSignore, John; Squyres, Jeffrey M.

    The Message Passing Interface, MPI, is the standard programming model for high performance computing clusters. However, debugging applications on large scale clusters is difficult. The widely used Message Queue Dumping interface enables inspection of message queue state but there is no general in...

  20. Safe Handling of Radioisotopes. Health Physics Addendum

    International Nuclear Information System (INIS)

    Appleton, G.J.; Krishnamoorthy, P.N.

    1960-01-01

    The International Atomic Energy Agency published in 1958 a Manual entitled ''Safe Handling of Radioisotopes'' (Safety Series No. 1 - STI/PUB/1), based on the work of an international panel convened by the Agency. As recommended by that panel and approved by the Agency's Board of Governors, this Addendum has now been prepared, primarily as a supplement to the Manual. It contains technical information necessary for the implementation of the controls given in the Manual. In addition, it is intended to serve as a brief introduction to the technical problems encountered in radiological protection work and to the methods of resolving them. As in the case of the Manual itself, the information given in this Addendum is particularly relevant to the problems encountered by the small user of radioisotopes. Although the basic principles set forth in it apply to all work with radiation sources, the Addendum is not intended to serve as a radiological protection manual for use in reactor installations or large-scale nuclear industry, where more specialized techniques and information are required.

  1. Safe Handling of Radioisotopes. Health Physics Addendum

    Energy Technology Data Exchange (ETDEWEB)

    Appleton, G J; Krishnamoorthy, P N

    1960-07-15

    The International Atomic Energy Agency published in 1958 a Manual entitled ''Safe Handling of Radioisotopes'' (Safety Series No. 1 - STI/PUB/1), based on the work of an international panel convened by the Agency. As recommended by that panel and approved by the Agency's Board of Governors, this Addendum has now been prepared, primarily as a supplement to the Manual. It contains technical information necessary for the implementation of the controls given in the Manual. In addition, it is intended to serve as a brief introduction to the technical problems encountered in radiological protection work and to the methods of resolving them. As in the case of the Manual itself, the information given in this Addendum is particularly relevant to the problems encountered by the small user of radioisotopes. Although the basic principles set forth in it apply to all work with radiation sources, the Addendum is not intended to serve as a radiological protection manual for use in reactor installations or large-scale nuclear industry, where more specialized techniques and information are required.

  2. Distributed computing for FTU data handling

    Energy Technology Data Exchange (ETDEWEB)

    Bertocchi, A. E-mail: bertocchi@frascati.enea.it; Bracco, G.; Buceti, G.; Centioli, C.; Giovannozzi, E.; Iannone, F.; Panella, M.; Vitale, V

    2002-06-01

    The growth of data warehouses in tokamak experiments is leading fusion laboratories to provide new IT solutions for data handling. In the last three years, the Frascati Tokamak Upgrade (FTU) experimental database was migrated from an IBM mainframe to a Unix distributed computing environment. The migration efforts have taken into account the following items: (1) a new data storage solution based on a storage area network over Fibre Channel; (2) the Andrew File System (AFS) for wide area network file sharing; (3) a 'one measure/one file' philosophy replacing 'one shot/one file' to provide faster read/write data access; (4) more powerful services, such as AFS, CORBA and MDSplus, to allow users to access the FTU database from different clients, regardless of their OS; (5) wide availability of data analysis tools, from the locally developed utility SHOW to the multi-platform Matlab, Interactive Data Language and jScope (all these tools are now also able to access the Joint European Torus data, in the framework of the remote data access activity); (6) a batch-computing cluster of Alpha/Compaq Tru64 CPUs based on CODINE/GRD to optimize the utilization of software and hardware resources.

  3. Safe Handling of Radioisotopes. Medical Addendum

    International Nuclear Information System (INIS)

    Hercik, F.; Jammet, H.

    1960-01-01

    The International Atomic Energy Agency published in 1958 a Manual entitled ''Safe Handling of Radioisotopes'' (Safety Series No. 1 - STI/PUB/1), based on the work of an international panel convened by the Agency. As recommended by that panel and approved by the Agency's Board of Governors, this Addendum has now been prepared, primarily as a supplement to the Manual. It contains information necessary to medical officers concerned with the implementation of the controls given in the Manual. In addition, it is intended to serve as a brief introduction to the medical problems encountered in radiological protection work and to the methods of resolving them. As in the case of the Manual itself, the information given in this Addendum is particularly relevant to the problems encountered by the small user of radioisotopes. Although the basic principles set forth in it apply to all work with radiation sources, the Addendum is not intended to serve as a radiological protection manual for use in reactor installations or large-scale nuclear industry, where more specialized techniques and information are required.

  4. Nuclear hydrogen production and its safe handling

    International Nuclear Information System (INIS)

    Chung, Hongsuk; Paek, Seungwoo; Kim, Kwang-Rag; Ahn, Do-Hee; Lee, Minsoo; Chang, Jong Hwa

    2003-01-01

    An overview of the hydrogen-related research presently undertaken at the Korea Atomic Energy Research Institute is presented. It encompasses nuclear hydrogen production, hydrogen storage, and the safe handling of hydrogen. High-temperature gas-cooled reactors can play a significant role in large-scale hydrogen production if used as the provider of high-temperature heat for fossil fuel conversion or thermochemical cycles. A variety of potential hydrogen production methods for high-temperature gas-cooled reactors were analyzed, including steam reforming of natural gas and thermochemical cycles. The produced hydrogen should be stored safely. Titanium metal was tested primarily because its hydride has very low dissociation pressures at normal storage temperatures and a high capacity for hydrogen; it is also easy to prepare and non-reactive with air under the expected storage conditions. There could be a number of potential sources of hydrogen evolution risk in a nuclear hydrogen production facility. In order to reduce the risk of deflagration and detonation, it is necessary to develop hydrogen control methods capable of dealing with the hydrogen release rate. A series of experiments was conducted to assess the catalytic recombination characteristics of hydrogen in an air stream using palladium catalysts. (author)

  5. Controlling fugitive dust emissions in material handling operations

    Energy Technology Data Exchange (ETDEWEB)

    Tooker, G E

    1992-05-01

    The primary mechanism of fugitive dust generation in bulk material handling transfer operations is the dispersion of dust in turbulent air induced to flow with falling or projected material streams. This paper returns to basic theories of particle dynamics and fluid mechanics to quantify the dust-generating mechanism by rational analysis. Calculations involving fluid mechanics are made easier by the availability of the personal computer and the many math-manipulation programs. Rational analysis is much more cost effective when estimating collection air volumes to control fugitive emissions, especially in enclosed material handling transfers moving large volumes of dusty material. Example calculations, using a typical enclosed conveyor-to-conveyor transfer operation, are presented to illustrate and highlight the key parameters that determine the magnitude of the induced air flow that must be controlled. The methods presented in this paper for estimating collection air volumes apply only to enclosed material handling transfers exhausted to a dust collector. Since the design of the material handling transfer chute must also assist in the control of dust emissions, a discussion of good transfer chute design practice is presented. 4 refs., 2 figs., 2 tabs.
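
    The central quantity in such a rational analysis is the volume of air induced by the falling material stream. The sketch below illustrates the arithmetic with an empirical induced-air correlation of the kind found in industrial-ventilation guidance; the formula, its constant and the example inputs are assumptions for illustration and are not taken from this paper.

        # Illustrative estimate of the air induced by a falling material stream in
        # an enclosed transfer.  The correlation and its constant follow a form
        # commonly quoted in industrial-ventilation guidance, NOT this paper;
        # verify the formula and units against a design reference before use.

        def induced_air_cfm(open_area_ft2, material_rate_tph, fall_height_ft,
                            particle_diam_ft):
            """Assumed form: Q = 10 * A * (R * S**2 / D) ** (1/3), Q in cfm."""
            return 10.0 * open_area_ft2 * (material_rate_tph * fall_height_ft ** 2
                                           / particle_diam_ft) ** (1.0 / 3.0)

        # Hypothetical transfer: 4 ft2 of enclosure openings, 300 t/h falling 6 ft,
        # mean particle size of roughly 1/8 inch (about 0.0104 ft).
        q_induced = induced_air_cfm(4.0, 300.0, 6.0, 0.0104)
        exhaust = 1.2 * q_induced   # margin so the enclosure stays under suction
        print(f"induced air ~ {q_induced:.0f} cfm, suggested exhaust ~ {exhaust:.0f} cfm")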

  6. Quality Controlling CMIP datasets at GFDL

    Science.gov (United States)

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

    As GFDL switches from model development to production for the Coupled Model Intercomparison Project (CMIP), its efforts shift to testing and, more importantly, to establishing guidelines and protocols for quality control and semi-automated data publishing. Every CMIP cycle introduces key challenges, and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before publication and before their findings make their way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets, including Jupyter notebooks, PrePARE, and a LAMP (Linux, Apache, MySQL, PHP/Python/Perl) technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively, provide additional metadata and analysis services, and perform built-in controlled-vocabulary validations in the workflow. We also discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.
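
    The controlled-vocabulary validations mentioned above boil down to checking required metadata attributes against allowed value sets. A generic sketch follows (not GFDL's or PrePARE's actual rules; the attribute names and allowed values are invented):

        # Generic sketch of a controlled-vocabulary check on CMIP-style metadata.
        # The required attributes and allowed values are illustrative assumptions,
        # not GFDL's or PrePARE's actual rules.
        REQUIRED = {
            "mip_era": {"CMIP6"},
            "frequency": {"mon", "day", "6hr", "3hr"},
            "grid_label": {"gn", "gr", "gr1"},
        }

        def validate_metadata(attrs):
            """Return a list of human-readable problems; an empty list means the file passes."""
            problems = []
            for key, allowed in REQUIRED.items():
                if key not in attrs:
                    problems.append(f"missing attribute: {key}")
                elif attrs[key] not in allowed:
                    problems.append(f"{key}={attrs[key]!r} not in {sorted(allowed)}")
            return problems

        print(validate_metadata({"mip_era": "CMIP6", "frequency": "monthly"}))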

  7. Integrated remotely sensed datasets for disaster management

    Science.gov (United States)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route-corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards such as North America's NTSC and the European SECAM and PAL systems, recorded in various video formats. This technology has recently been employed as a front-line remote sensing technology for post-disaster damage assessment. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at the National University of Ireland, Maynooth (NUIM), is described. New improvements are proposed, including low-cost encoders, easy-to-use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and after a disaster.

  8. Fuel handling machine and auxiliary systems for a fuel handling cell

    International Nuclear Information System (INIS)

    Suikki, M.

    2013-10-01

    This working report is an update of, as well as a supplement to, an earlier fuel handling machine design (Kukkola and Roennqvist 2006). The focus in the earlier design proposal was primarily on the selection of a mechanical structure and operating principle for the fuel handling machine. This report introduces not only a fuel handling machine design but also the auxiliary fuel handling cell equipment and its operation. An objective of the design work was to verify the operating principles of, and space allocations for, the fuel handling cell equipment. The fuel handling machine is a remote-controlled apparatus capable of handling intensely radiating fuel assemblies in the fuel handling cell of an encapsulation plant. The fuel handling cell is an airtight space, radiation-shielded with massive concrete walls. The fuel handling machine is based on a bridge crane capable of traveling in the handling cell along wall tracks. The carriage of the bridge crane carries a carousel-type turntable on which both a fixed and a telescopic mast are mounted. The fixed mast has a gripper movable on linear guides for the transfer of fuel assemblies. The telescopic mast has a manipulator arm capable of maneuvering equipment present in the fuel handling cell, as well as conducting necessary maintenance and cleaning operations or rectifying possible fault conditions. The auxiliary fuel handling cell systems consist of several subsystems, including a service manipulator, a tool carrier for manipulators, a material hatch, assisting winches, a vacuum cleaner, and a hose reel. With the exception of the vacuum cleaner, the devices included in the fuel handling cell's auxiliary systems are used only when the actual encapsulation process is not ongoing. In a worst-case scenario, malfunctions of the mechanisms or actuators responsible for the motions of the fuel handling machine preclude bringing the fuel handling cell and related systems to a condition appropriate for

  9. Fuel handling machine and auxiliary systems for a fuel handling cell

    Energy Technology Data Exchange (ETDEWEB)

    Suikki, M. [Optimik Oy, Turku (Finland)]

    2013-10-15

    This working report is an update of, as well as a supplement to, an earlier fuel handling machine design (Kukkola and Roennqvist 2006). The focus in the earlier design proposal was primarily on the selection of a mechanical structure and operating principle for the fuel handling machine. This report introduces not only a fuel handling machine design but also the auxiliary fuel handling cell equipment and its operation. An objective of the design work was to verify the operating principles of, and space allocations for, the fuel handling cell equipment. The fuel handling machine is a remote-controlled apparatus capable of handling intensely radiating fuel assemblies in the fuel handling cell of an encapsulation plant. The fuel handling cell is an airtight space, radiation-shielded with massive concrete walls. The fuel handling machine is based on a bridge crane capable of traveling in the handling cell along wall tracks. The carriage of the bridge crane carries a carousel-type turntable on which both a fixed and a telescopic mast are mounted. The fixed mast has a gripper movable on linear guides for the transfer of fuel assemblies. The telescopic mast has a manipulator arm capable of maneuvering equipment present in the fuel handling cell, as well as conducting necessary maintenance and cleaning operations or rectifying possible fault conditions. The auxiliary fuel handling cell systems consist of several subsystems, including a service manipulator, a tool carrier for manipulators, a material hatch, assisting winches, a vacuum cleaner, and a hose reel. With the exception of the vacuum cleaner, the devices included in the fuel handling cell's auxiliary systems are used only when the actual encapsulation process is not ongoing. In a worst-case scenario, malfunctions of the mechanisms or actuators responsible for the motions of the fuel handling machine preclude bringing the fuel handling cell and related systems to a condition appropriate for

  10. Development of software for handling ship's pharmacy.

    Science.gov (United States)

    Nittari, Giulio; Peretti, Alessandro; Sibilio, Fabio; Ioannidis, Nicholas; Amenta, Francesco

    2016-01-01

    Ships are required to carry a given amount of medicinal products and medications depending on the flag and the type of vessel. These medicines are stored in the so-called ship's "medicine chest" or, more properly, a ship pharmacy. Owing to the progress of medical sciences and to the increase in the mean age of seafarers employed on board ships, the number of pharmaceutical products and medical devices required by regulations to be carried on board ships is increasing. This may make handling of the ship's medicine chest a problem, primarily on large ships sailing intercontinental routes, because of the difficulty of identifying the correspondence between medicines obtained abroad and those available on the national market. To minimise these problems a tool named Pharmacy Ship (acronym: PARSI) has been developed. The PARSI application is based on a database containing information about the medicines and medical devices required by the regulations of different countries. In its first application the system was standardised to comply with the Italian regulations issued on 1 October 2015, which entered into force on 18 January 2016. Thanks to PARSI it was possible to standardise the inventory procedures, facilitate the work of maritime health authorities and make it easier for the crew, who are not professionals in the field, to handle the 'medicine chest' correctly by automating the procedures for medicines management. As far as we know there are no other similar tools available at the moment. The application of the software, as well as the automation of different activities currently carried out manually, will help manage (qualitatively and quantitatively) the ship's pharmacy. The system developed in this study has proved to be an effective tool which serves to guarantee the compliance of the ship pharmacy with the regulations of the flag state in terms of medicinal products and medications. Sharing the system with the Telemedical Maritime Assistance Service may result in
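
    The core of such a tool is the comparison between the stock actually on board and the quantities the flag-state regulation requires. A minimal sketch of that check follows (the codes, item names and quantities are illustrative assumptions, not PARSI's schema or the Italian requirement table):

        # Minimal compliance check: compare on-board stock against a flag-state
        # requirement table.  Codes and quantities are illustrative only.
        required = {            # item code -> minimum quantity required on board
            "N02BE01": 20,      # e.g. an analgesic, 20 tablets
            "J01CA04": 10,      # e.g. an antibiotic, 10 capsules
        }
        on_board = {"N02BE01": 25, "J01CA04": 4}

        def compliance_report(required, stock):
            """Per item, the quantity still missing (0 means compliant)."""
            return {code: max(qty - stock.get(code, 0), 0)
                    for code, qty in required.items()}

        print(compliance_report(required, on_board))   # {'N02BE01': 0, 'J01CA04': 6}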

  11. Classification of Large-Scale Remote Sensing Images for Automatic Identification of Health Hazards: Smoke Detection Using an Autologistic Regression Classifier.

    Science.gov (United States)

    Wolters, Mark A; Dean, C B

    2017-01-01

    Remote sensing images from Earth-orbiting satellites are a potentially rich data source for monitoring and cataloguing atmospheric health hazards that cover large geographic regions. A method is proposed for classifying such images into hazard and nonhazard regions using the autologistic regression model, which may be viewed as a spatial extension of logistic regression. The method includes a novel and simple approach to parameter estimation that makes it well suited to handling the large and high-dimensional datasets arising from satellite-borne instruments. The methodology is demonstrated on both simulated images and a real application to the identification of forest fire smoke.
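
    The essential idea, logistic regression augmented with an autocovariate that counts the hazard-labelled neighbours of each pixel, can be sketched as follows. This is a toy, pseudo-likelihood-style illustration on a synthetic grid, not the authors' estimation procedure:

        # Toy sketch of an autologistic classifier: ordinary logistic regression plus
        # an "autocovariate" counting the hazard-labelled 4-neighbours of each pixel.
        # This mimics a pseudo-likelihood fit on synthetic data, not the authors' code.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 32                                     # synthetic n x n image
        reflectance = rng.normal(size=(n, n))      # one spectral covariate
        labels = (reflectance + rng.normal(scale=0.5, size=(n, n)) > 0).astype(int)

        def neighbour_sum(lab):
            s = np.zeros_like(lab, dtype=float)
            s[1:, :] += lab[:-1, :]; s[:-1, :] += lab[1:, :]   # up / down
            s[:, 1:] += lab[:, :-1]; s[:, :-1] += lab[:, 1:]   # left / right
            return s

        X = np.column_stack([reflectance.ravel(), neighbour_sum(labels).ravel()])
        y = labels.ravel()

        model = LogisticRegression().fit(X, y)
        print(dict(zip(["reflectance", "autocovariate"], model.coef_[0].round(3))))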

  12. Handling of multiassembly sealed baskets between reactor storage and a remote handling facility

    International Nuclear Information System (INIS)

    Massey, J.V.; Kessler, J.H.; McSherry, A.J.

    1989-06-01

    The storage of multiple fuel assemblies in sealed (welded) dry storage baskets is gaining increasing use to augment at-reactor fuel storage capacity. Since this increasing use will place a significant number of such baskets on reactor sites, some initial downstream planning is needed for their future handling. This study examined scenarios for retrieving multi-assembly sealed baskets (MSBs) from onsite storage and transferring and shipping the fuel (and/or the baskets) to a federally operated remote handling facility (RHF). Numerous options for at-reactor and away-from-reactor handling were investigated. Materials handling flowsheets were developed along with conceptual designs for the equipment and tools required to handle and open the MSBs. The handling options were evaluated and compared to a reference-case fuel handling sequence (i.e., fuel assemblies are taken from the fuel pool, shipped to a receiving and handling facility and placed into interim storage). The main parameters analyzed are throughput, radiation dose burden and cost. In addition to evaluating the handling of MSBs, this work also evaluated the handling of consolidated fuel canisters (CFCs). In summary, the handling of MSBs and CFCs in the store, ship and bury fuel cycle was found to be feasible and, under some conditions, to offer significant benefits in terms of throughput, cost and safety. 14 refs., 20 figs., 24 tabs

  13. Safeguards information handling and treatment

    International Nuclear Information System (INIS)

    Carchon, R.; Liu, J.; Ruan, D.

    2001-01-01

    Many states are currently discussing the new additional protocol (INFCIRC/540). This expanded framework is expected to provide additional confirmation that there are no undeclared activities or facilities in a state. The information collected by the IAEA comes mainly from three different sources: information provided by the state, information collected by the IAEA, and information from open sources. This information can be uncertain, incomplete, imprecise, not fully reliable, contradictory, etc. Hence, there is a need for a mathematical framework that provides a basis for the handling and treatment of multidimensional information of varying quality. We use a linguistic assessment based on fuzzy set theory as a flexible and realistic approach. The concept of a linguistic variable serves the purpose of providing a means of approximate characterization of information that may be imprecise, too complex or ill-defined, for which the traditional quantitative approach does not give an adequate answer. In applying this linguistic assessment approach, a problem arises in how to aggregate linguistic information. Two different approaches can be followed: (1) an approximation approach using the associated membership functions; (2) a symbolic approach that computes directly on the labels, where membership functions and linguistic approximation are unnecessary, making computation simple and quick. To manipulate the linguistic information in this context, we work with aggregation operators for combining non-weighted and weighted linguistic values by direct computation on labels, such as the Min-type and Max-type weighted aggregation operators as well as the median aggregation operator. A case study on the application of these aggregation operators to the fusion of safeguards-relevant information is given. The IAEA Physical Model of the nuclear fuel cycle can be taken as a systematic and comprehensive indicator system. It identifies and describes indicators of
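
    The symbolic approach can be illustrated by aggregating directly on an ordered label set with Min-, Max- and median-type operators. The sketch below shows unweighted variants only; the label scale and the example inputs are invented for illustration:

        # Sketch of symbolic aggregation on an ordered linguistic scale.
        # Label set and example inputs are invented for illustration.
        LABELS = ["none", "low", "medium", "high", "certain"]   # ordered scale
        IDX = {lab: i for i, lab in enumerate(LABELS)}

        def agg_min(labels):     # conservative (Min-type) aggregation
            return LABELS[min(IDX[l] for l in labels)]

        def agg_max(labels):     # optimistic (Max-type) aggregation
            return LABELS[max(IDX[l] for l in labels)]

        def agg_median(labels):  # median aggregation on label indices
            idx = sorted(IDX[l] for l in labels)
            return LABELS[idx[len(idx) // 2]]

        assessments = ["low", "high", "medium", "medium"]   # one label per indicator
        print(agg_min(assessments), agg_max(assessments), agg_median(assessments))
        # -> low high medium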

  14. Ergonomics: safe patient handling and mobility.

    Science.gov (United States)

    Hallmark, Beth; Mechan, Patricia; Shores, Lynne

    2015-03-01

    This article reviews and investigates the issues surrounding ergonomics, with a specific focus on safe patient handling and mobility. The health care worker of today faces many challenges, one of which is related to the safety of patients. Safe patient handling and mobility is on the forefront of the movement to improve patient safety. This article reviews the risks associated with patient handling and mobility, and informs the reader of current evidence-based practice relevant to this area of care. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. How the NWC handles software as product

    Energy Technology Data Exchange (ETDEWEB)

    Vinson, D.

    1997-11-01

    This tutorial provides a hands-on view of how the Nuclear Weapons Complex project should be handling (or planning to handle) software as a product in response to Engineering Procedure 401099. The SQAS has published the document SQAS96-002, Guidelines for NWC Processes for Handling Software Product, that will be the basis for the tutorial. The primary scope of the tutorial is on software products that result from weapons and weapons-related projects, although the information presented is applicable to many software projects. Processes that involve the exchange, review, or evaluation of software product between or among NWC sites, DOE, and external customers will be described.

  16. Handling of bulk solids theory and practice

    CERN Document Server

    Shamlou, P A

    1990-01-01

    Handling of Bulk Solids provides a comprehensive discussion of the field of solids flow and handling in the process industries. Presentation of the subject follows classical lines of separate discussions for each topic, so each chapter is self-contained and can be read on its own. Topics discussed include bulk solids flow and handling properties; pressure profiles in bulk solids storage vessels; the design of storage silos for reliable discharge of bulk materials; gravity flow of particulate materials from storage vessels; pneumatic transportation of bulk solids; and the hazards of solid-mater

  17. Remote handling in the Plutonium Immobilization Project: Puck handling

    International Nuclear Information System (INIS)

    Brault, J.R.

    2000-01-01

    Since the break-up of the Soviet Union at the end of the Cold War, the US and Russia have been negotiating ways to reduce their nuclear stockpiles. Economics is one of the reasons behind this, but another important reason is safeguarding these materials from unstable organizations and countries. With the downsizing of the nuclear stockpiles, large quantities of plutonium are being declared excess and must be safely disposed of. The Savannah River Site (SRS) has been selected as the site where the immobilization facility will be located. Conceptual design and process development commenced in 1998. SRS will immobilize excess plutonium in a ceramic waste form and encapsulate it in vitrified high-level waste in the Defense Waste Processing Facility (DWPF) canister. These canisters will then be interred in the national repository at Yucca Mountain, Nevada. The facility is divided into three distinct operating areas: Plutonium Conversion, First Stage Immobilization, and Second Stage Immobilization. This paper will discuss the first two operations

  18. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
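
    The reported figures are relative percentage differences; the short example below shows the arithmetic behind such comparisons using made-up wall-clock times and costs (the real figures are those quoted in the abstract, not the placeholders here):

        # Worked example of the percentage comparisons reported above, using made-up
        # wall-clock times and costs (the real figures are those in the abstract).
        def percent_more(value, baseline):
            """How much larger `value` is than `baseline`, in percent."""
            return 100.0 * (value - baseline) / baseline

        emr_hours, gce_hours = 15.3, 10.0    # placeholder wall-clock times
        emr_cost,  gce_cost  = 35.7, 10.0    # placeholder run costs

        print(f"EMR slower by {percent_more(emr_hours, gce_hours):.1f}%")    # 53.0%
        print(f"EMR costlier by {percent_more(emr_cost, gce_cost):.1f}%")    # 257.0%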

  19. Overview of the CERES Edition-4 Multilayer Cloud Property Datasets

    Science.gov (United States)

    Chang, F. L.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, R. A.; Brown, R. R.

    2014-12-01

    Knowledge of the cloud vertical distribution is important for understanding the role of clouds in Earth's radiation budget and climate change. Since high-level cirrus clouds with low emission temperatures and small optical depths can provide a positive feedback to the climate system, and low-level stratus clouds with high emission temperatures and large optical depths can provide a negative feedback effect, the retrieval of multilayer cloud properties using satellite observations, like Terra and Aqua MODIS, is critically important for a variety of cloud and climate applications. For the objective of the Clouds and the Earth's Radiant Energy System (CERES), new algorithms have been developed using Terra and Aqua MODIS data to allow separate retrievals of cirrus and stratus cloud properties when the two dominant cloud types are simultaneously present in a multilayer system. In this paper, we will present an overview of the new CERES Edition-4 multilayer cloud property datasets derived from Terra as well as Aqua. Assessment of the new CERES multilayer cloud datasets will include high-level cirrus and low-level stratus cloud heights, pressures, and temperatures as well as their optical depths, emissivities, and microphysical properties.

  20. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  1. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets were used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  2. Remote handling maintenance of ITER

    International Nuclear Information System (INIS)

    Haange, R.

    1999-01-01

    The remote maintenance strategy and the associated component design of the International Thermonuclear Experimental Reactor (ITER) have reached a high degree of completeness, especially with respect to those components that are expected to require frequent or occasional remote maintenance. Large-scale test stands, to demonstrate the principle feasibility of the remote maintenance procedures and to develop the required equipment and tools, were operational at the end of the Engineering Design Activities (EDA) phase. The initial results are highly encouraging: major remote equipment deployment and component replacement operations have been successfully demonstrated. (author)

  3. Genetics and Forest Seed Handling

    DEFF Research Database (Denmark)

    Schmidt, Lars Holger

    2016-01-01

    High genetic quality seed is obtained from seed sources that match the planting site, have a good outcrossing rate, and are superior in some desirable characters. Non-degraded natural forests and plantations may be used as untested seed sources, which can sometimes be managed to promote outbreeding and increase seed production. Planted seed orchards aim at capturing large genetic variation and are planted in a design that facilitates genetic evaluation and promotes outbred seed production. Good seed production relies upon success of the whole range of reproductive events from flower differentiation...

  4. A scalable method for identifying frequent subtrees in sets of large phylogenetic trees.

    Science.gov (United States)

    Ramu, Avinash; Kahveci, Tamer; Burleigh, J Gordon

    2012-10-03

    We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. Although this heuristic does not guarantee to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
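
    A heavily simplified stand-in for the seed-finding phase is to mine clades (the leaf set under a single internal node) that occur in at least a given fraction of the input trees. The sketch below does only that; it is not the paper's full MFAST heuristic, which additionally grows, combines and prunes agreement subtrees:

        # Heavily simplified stand-in for the seed-finding phase: mine clades (the
        # leaf set under one internal node) occurring in >= min_frac of the trees.
        # NOT the paper's full heuristic, which also grows and prunes agreement subtrees.
        from collections import Counter

        def clades(tree):
            """Return (leaf set, list of clades) for a tree given as nested tuples/strings."""
            if isinstance(tree, str):
                return frozenset([tree]), []
            leaves, found = frozenset(), []
            for child in tree:
                c_leaves, c_found = clades(child)
                leaves |= c_leaves
                found += c_found
            return leaves, found + [leaves]

        def frequent_clades(trees, min_frac=0.5):
            counts = Counter()
            for t in trees:
                counts.update(set(clades(t)[1]))
            return [c for c, k in counts.items() if k / len(trees) >= min_frac]

        trees = [((("A", "B"), "C"), ("D", "E")),
                 ((("A", "B"), "D"), ("C", "E")),
                 (("A", ("B", "C")), ("D", "E"))]
        for clade in frequent_clades(trees, 0.66):
            print(sorted(clade))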

  5. Predicting dataset popularity for the CMS experiment

    CERN Document Server

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provides the foundation of a data-driven approach for the CMS computing infrastructure.
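
    A first-order version of such a popularity analysis simply aggregates access counts per dataset over a time window and ranks the results. The sketch below does that on toy records; the record layout is an invented placeholder, not the CMS meta-data schema:

        # Toy popularity ranking: aggregate accesses per dataset over a window.
        # The record layout is an invented placeholder, not the CMS meta-data schema.
        from collections import Counter
        from datetime import date

        accesses = [                                   # (dataset name, access date)
            ("/Primary/RunA/AOD", date(2016, 3, 1)),
            ("/Primary/RunA/AOD", date(2016, 3, 2)),
            ("/Other/RunB/MINIAOD", date(2016, 3, 2)),
            ("/Primary/RunA/AOD", date(2016, 2, 1)),   # outside the window below
        ]

        def popularity(records, start, end):
            counts = Counter(name for name, day in records if start <= day <= end)
            return counts.most_common()                # most accessed first

        print(popularity(accesses, date(2016, 3, 1), date(2016, 3, 31)))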

  6. Predicting dataset popularity for the CMS experiment

    International Nuclear Information System (INIS)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provides the foundation of a data-driven approach for the CMS computing infrastructure. (paper)

  7. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    (c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass balance reporting contains information for about 125 glaciers with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc. related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast evolving datasets from new technologies.

  8. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  9. 2006 Fynmeet sea clutter measurement trial: Datasets

    CSIR Research Space (South Africa)

    Herselman, PLR

    2007-09-06

    Full Text Available. [Figure content not recoverable as text: plots of RCS [dBm2] versus time [s] and absolute range [m] (by range gate number) at f1 = 9.000 GHz for datasets CAD14-001 and CAD14-002.]

  10. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM over the entire island including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  11. Management of transport and handling contracts

    CERN Document Server

    Rühl, I

    2004-01-01

    This paper shall outline the content, application and management strategies for the various contracts related to transport and handling activities. In total, the two sections Logistics and Handling Maintenance are in charge of 27 (!) contracts, ranging from small supply contracts to big industrial support contracts. The activities, as well as the contracts, can generally be divided into four main topics: "Vehicle Fleet Management"; "Supply, Installation and Commissioning of Lifting and Hoisting Equipment"; "Equipment Maintenance"; and "Industrial Support for Transport and Handling". Each activity and contract requires a different approach and permanent adaptation to CERN's often-changing requirements. In particular, the management of, and the difficulties experienced with, the contracts E072 "Maintenance of lifting and hoisting equipment", F420 "Supply of seven overhead traveling cranes for LHC" and S090/S103 "Industrial support for transport and handling" will be explained in detail.

  12. Travelling cranes for heavy reactor component handling

    International Nuclear Information System (INIS)

    Champeil, M.

    1977-01-01

    Structure and operating machinery of two travelling cranes (600 t and 450 t) used in the Framatome factory for handling heavy reactor components are described. When coupled, these cranes can lift loads up to 1000 t [fr

  13. Aerobot Sampling and Handling System, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — Honeybee Robotics proposes to derive and document the functional and technical requirements for Aerobot surface sampling and sample handling across a range of...

  14. Data handling systems and methods of wiring

    International Nuclear Information System (INIS)

    Grant, J.

    1981-01-01

    An improved data handling system, for monitoring and control of nuclear reactor operations, is described in which time delays associated with scanning are reduced and noise and fault signals in the system are resolved. (U.K.)

  15. Harvesting and handling agricultural residues for energy

    Energy Technology Data Exchange (ETDEWEB)

    Jenkins, B.M.; Summer, H.R.

    1986-05-01

    Significant progress in understanding the needs for design of agricultural residue collection and handling systems has been made but additional research is required. Recommendations are made for research to (a) integrate residue collection and handling systems into general agricultural practices through the development of multi-use equipment and total harvest systems; (b) improve methods for routine evaluation of agricultural residue resources, possibly through remote sensing and image processing; (c) analyze biomass properties to obtain detailed data relevant to engineering design and analysis; (d) evaluate long-term environmental, social, and agronomic impacts of residue collection; (e) develop improved equipment with higher capacities to reduce residue collection and handling costs, with emphasis on optimal design of complete systems including collection, transportation, processing, storage, and utilization; and (f) produce standard forms of biomass fuels or products to enhance material handling and expand biomass markets through improved reliability and automatic control of biomass conversion and other utilization systems. 118 references.

  16. Handling of disused radioactive materials in Ecuador

    International Nuclear Information System (INIS)

    Benitez, Manuel

    1999-10-01

    This paper describes the handling of disused radioactive sources. It also shows graphic information of medical and industrial equipment containing radioactive sources. This information was prepared as part of a training course on radioactive wastes. (The author)

  17. Foster parenting, human imprinting and conventional handling ...

    African Journals Online (AJOL)

    Foster parenting, human imprinting and conventional handling affects survival and early .... bird may subsequently direct its sexual attention to those humans on whom it was imprinted (Bubier et al., ..... The mind through chicks' eyes: memory,.

  18. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.

  19. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S. regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data

  20. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...