WorldWideScience

Sample records for publication georeferencing datasets

  1. Georeferencing UAS Derivatives Through Point Cloud Registration with Archived Lidar Datasets

    Science.gov (United States)

    Magtalas, M. S. L. Y.; Aves, J. C. L.; Blanco, A. C.

    2016-10-01

    Georeferencing gathered images is a common step before performing spatial analysis and other processes on acquired datasets using unmanned aerial systems (UAS). Methods of applying spatial information to aerial images or their derivatives is through onboard GPS (Global Positioning Systems) geotagging, or through tying of models through GCPs (Ground Control Points) acquired in the field. Currently, UAS (Unmanned Aerial System) derivatives are limited to meter-levels of accuracy when their generation is unaided with points of known position on the ground. The use of ground control points established using survey-grade GPS or GNSS receivers can greatly reduce model errors to centimeter levels. However, this comes with additional costs not only with instrument acquisition and survey operations, but also in actual time spent in the field. This study uses a workflow for cloud-based post-processing of UAS data in combination with already existing LiDAR data. The georeferencing of the UAV point cloud is executed using the Iterative Closest Point algorithm (ICP). It is applied through the open-source CloudCompare software (Girardeau-Montaut, 2006) on a `skeleton point cloud'. This skeleton point cloud consists of manually extracted features consistent on both LiDAR and UAV data. For this cloud, roads and buildings with minimal deviations given their differing dates of acquisition are considered consistent. Transformation parameters are computed for the skeleton cloud which could then be applied to the whole UAS dataset. In addition, a separate cloud consisting of non-vegetation features automatically derived using CANUPO classification algorithm (Brodu and Lague, 2012) was used to generate a separate set of parameters. Ground survey is done to validate the transformed cloud. An RMSE value of around 16 centimeters was found when comparing validation data to the models georeferenced using the CANUPO cloud and the manual skeleton cloud. Cloud-to-cloud distance computations of

  2. Automatic Georeferencing of Astronaut Auroral Photography: Providing a New Dataset for Space Physics

    Science.gov (United States)

    Riechert, Maik; Walsh, Andrew P.; Taylor, Matt

    2014-05-01

    Astronauts aboard the International Space Station (ISS) have taken tens of thousands of photographs showing the aurora in high temporal and spatial resolution. The use of these images in research though is limited as they often miss accurate pointing and scale information. In this work we develop techniques and software libraries to automatically georeference such images, and provide a time and location-searchable database and website of those images. Aurora photographs very often include a visible starfield due to the necessarily long camera exposure times. We extend on the proof-of-concept of Walsh et al. (2012) who used starfield recognition software, Astrometry.net, to reconstruct the pointing and scale information. Previously a manual pre-processing step, the starfield can now in most cases be separated from earth and spacecraft structures successfully using image recognition. Once the pointing and scale of an image are known, latitudes and longitudes can be calculated for each pixel corner for an assumed auroral emission height. As part of this work, an open-source Python library is developed which automates the georeferencing process and aids in visualization tasks. The library facilitates the resampling of the resulting data from an irregular to a regular coordinate grid in a given pixel per degree density, it supports the export of data in CDF and NetCDF formats, and it generates polygons for drawing graphs and stereographic maps. In addition, the THEMIS all-sky imager web archive has been included as a first transparently accessible imaging source which in this case is useful when drawing maps of ISS passes over North America. The database and website are in development and will use the Python library as their base. Through this work, georeferenced auroral ISS photography is made available as a continously extended and easily accessible dataset. This provides potential not only for new studies on the aurora australis, as there are few all-sky imagers in

  3. Georeferencing Animal Specimen Datasets

    NARCIS (Netherlands)

    van Erp, M.G.J.; Hensel, R.; Ceolin, D.; van der Meij, M.

    2014-01-01

    For biodiversity research, the field of study that is concerned with the richness of species of our planet, it is of the utmost importance that the location of an animal specimen find is known with high precision. Due to specimens often having been collected over the course of many years, their

  4. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new data sets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding' an increasing amount of data are being released and utilized by the public. Data sets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS-funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nation's ECS effort through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  5. Georeferenced cartography dataset of the La Fossa crater fumarolic field at Vulcano Island (Aeolian Archipelago, Italy: conversion and comparison of data from local to global positioning methods

    Directory of Open Access Journals (Sweden)

    Carmelo Sammarco

    2011-07-01

    Full Text Available The present study illustrates the procedures applied for the coordinate system conversion of the historical fumarole positions at La Fossa crater, to allow their comparison with newly acquired global positioning system (GPS data. Due to the absence of ground control points in the field and on both the old Gauss Boaga and the new UTM WGS 1984 maps, we had to model the transformation errors between the two systems using differential GPS techniques. Once corrected, the maps show a residual Easting shifting, due to erroneous georeferencing of the original base maps; this is corrected by morphological comparative methods. The good correspondence between the corrected positions of the historical data and the results of the new GPS survey that was carried out in 2009 highlights the good quality of the old surveys, although they were carried out without the use of accurate topographical instruments.

  6. New public dataset for spotting patterns in medieval document images

    Science.gov (United States)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

    With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and that should encourage researchers of the document image analysis community to design new systems and submit improved results.

  7. Analysis of Public Datasets for Wearable Fall Detection Systems.

    Science.gov (United States)

    Casilari, Eduardo; Santoyo-Ramón, José-Antonio; Cano-García, José-Manuel

    2017-06-27

    Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention among the research community during the last years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, the access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the found datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). Concerning this, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need of categorizing the ADLs depending on the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  8. Analysis of Public Datasets for Wearable Fall Detection Systems

    Directory of Open Access Journals (Sweden)

    Eduardo Casilari

    2017-06-01

    Full Text Available Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs have become a major focus of attention among the research community during the last years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs. In this regard, the access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the found datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.. Concerning this, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need of categorizing the ADLs depending on the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  9. GEOREFERENCED IMAGE SYSTEM WITH DRONES

    Directory of Open Access Journals (Sweden)

    Héctor A. Pérez-Sánchez

    2017-07-01

    Full Text Available This paper has as general purpose develop and implementation of a system that allows the generation of flight routes for a drone, the acquisition of geographic location information (GPS during the flight and taking photographs of points of interest for creating georeferenced images, same that will be used to generate KML files (Keyhole Markup Language for the representation of geographical data in three dimensions to be displayed on the Google Earth tool.

  10. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

    Energy Technology Data Exchange (ETDEWEB)

    Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

    2011-01-01

    Diabetic macular edema (DME) is a common vision threatening complication of diabetic retinopathy. In a large scale screening environment DME can be assessed by detecting exudates (a type of bright lesions) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and other two publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, as it generates a diagnosis on an average of 4.4 s (9.3 s, considering the optic nerve localization) per image on an 2.6 GHz platform with an unoptimized Matlab implementation.

  11. Dataset of Building and Environment Publication in 2016, A reference method for measuring emissions of SVOCs in small chambers

    Data.gov (United States)

    U.S. Environmental Protection Agency — The data presented in this data file is a product of a journal publication. The dataset contains DEHP air concentrations in the emission test chamber. This dataset...

  12. A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals

    Directory of Open Access Journals (Sweden)

    Claudiane A. Fukuchi

    2018-04-01

    Full Text Available In a typical clinical gait analysis, the gait patterns of pathological individuals are commonly compared with the typically faster, comfortable pace of healthy subjects. However, due to potential bias related to gait speed, this comparison may not be valid. Publicly available gait datasets have failed to address this issue. Therefore, the goal of this study was to present a publicly available dataset of 42 healthy volunteers (24 young adults and 18 older adults who walked both overground and on a treadmill at a range of gait speeds. Their lower-extremity and pelvis kinematics were measured using a three-dimensional (3D motion-capture system. The external forces during both overground and treadmill walking were collected using force plates and an instrumented treadmill, respectively. The results include both raw and processed kinematic and kinetic data in different file formats: c3d and ASCII files. In addition, a metadata file is provided that contain demographic and anthropometric data and data related to each file in the dataset. All data are available at Figshare (DOI: 10.6084/m9.figshare.5722711. We foresee several applications of this public dataset, including to examine the influences of speed, age, and environment (overground vs. treadmill on gait biomechanics, to meet educational needs, and, with the inclusion of additional participants, to use as a normative dataset.

  13. Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

    2011-01-01

    Diabetic macular edema (DME) is a common vision threatening complication of diabetic retinopathy. In a large scale screening environment DME can be assessed by detecting exudates (a type of bright lesions) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and other two publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at lesion level, and is very fast, generating a diagnosis on an average of 4.4 seconds per image on an 2.6 GHz platform with an unoptimised Matlab implementation.

  14. A dataset for examining trends in publication of new Australian insects

    Directory of Open Access Journals (Sweden)

    Robert Mesibov

    2014-07-01

    Full Text Available Australian Faunal Directory data were used to create a new, publicly available dataset, nai50, which lists 18318 species and subspecies names for Australian insects described in the period 1961–2010, together with associated publishing data. The number of taxonomic publications introducing the new names varied little around a long-term average of 70 per year, with ca 420 new names published per year during the 30-year period 1981–2010. Within this stable pattern there were steady increases in multi-authored and 'Smith in Jones and Smith' names, and a decline in publication of names in entomology journals and books. For taxonomic works published in Australia, a publications peak around 1990 reflected increases in museum, scientific society and government agency publishing, but a subsequent decline is largely explained by a steep drop in the number of papers on insect taxonomy published by Australia's national science agency, CSIRO.

  15. Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets

    Science.gov (United States)

    Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.

    2017-12-01

    This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here are built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: Accepting the data package from the data providers, ensuring the full integrity of the data files. Identifying and addressing data quality issues Assembling standardized, detailed metadata and documentation, including file level details, processing methodology, and characteristics of data files Setting up data access mechanisms Setup of the data in data tools and services for improved data dissemination and user experience Registering the dataset in online search and discovery catalogues Preserving the data location through Digital Object Identifiers (DOI) We will describe the steps taken to automate, and realize efficiencies to the above process. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goal will be described. We will also share metrics driven value of the workflow system and discuss the future steps towards creation of a common software framework.

  16. Dataset of Atmospheric Environment Publication in 2016, Characterization of organophosphorus flame retardants’ sorption on building materials and consumer products

    Data.gov (United States)

    U.S. Environmental Protection Agency — The data presented in this data file is a product of a journal publication. The dataset contains OPFR sorption concentrations on building materials and consumer...

  17. Datasets will not be made accessible to the public due to the fact that they include household level data with PII.

    Data.gov (United States)

    U.S. Environmental Protection Agency — Datasets will not be made accessible to the public due to the fact that they include household level data with PII. This dataset is not publicly accessible because:...

  18. Public Health Applications of Remotely-sensed Environmental Datasets for the Conterminous United States

    Science.gov (United States)

    Al-Hamdan, Mohammad; Crosson, William; Economou, Sigrid; Estes, Marice Jr; Estes, Sue; Hemmings, Sarah; Kent, Shia; Puckett, Mark; Quattrochi, Dale; Wade, Gina

    2013-01-01

    NASA Marshall Space Flight Center is collaborating with the University of Alabama at Birmingham (UAB) School of Public Health and the Centers for Disease Control and Prevention (CDC) National Center for Public Health Informatics to address issues of environmental health and enhance public health decision-making using NASA remotely-sensed data and products. The objectives of this study are to develop high-quality spatial data sets of environmental variables, link these with public health data from a national cohort study, and deliver the linked data sets and associated analyses to local, state and federal end-user groups. Three daily environmental data sets were developed for the conterminous U.S. on different spatial resolutions for the period 2003-2008: (1) spatial surfaces of estimated fine particulate matter (PM2.5) exposures on a 10-km grid using the US Environmental Protection Agency (EPA) ground observations and NASA's MODerate-resolution Imaging Spectroradiometer (MODIS) data; (2) a 1-km grid of Land Surface Temperature (LST) using MODIS data; and (3) a 12-km grid of daily Incoming Solar Radiation (Insolation) and heat-related products using the North American Land Data Assimilation System (NLDAS) forcing data. These environmental data sets were linked with public health data from the UAB REasons for Geographic And Racial Differences in Stroke (REGARDS) national cohort study to determine whether exposures to these environmental risk factors are related to cognitive decline, stroke and other health outcomes. These environmental datasets and the results of the public health linkage analyses will be disseminated to end-users for decision-making through the CDC Wide-ranging Online Data for Epidemiologic Research (WONDER) system and through peer-reviewed publications respectively. The linkage of these data with the CDC WONDER system substantially expands public access to NASA data, making their use by a wide range of decision makers feasible. By successful

  19. User Defined Geo-referenced Information

    DEFF Research Database (Denmark)

    Konstantas, Dimitri; Villalba, Alfredo; di Marzo Serugendo, Giovanna

    2009-01-01

    . In this paper we present two novel mobile and wireless collaborative services and concepts, the Hovering Information, a mobile, geo-referenced content information management system, and the QoS Information service, providing user observed end-to-end infrastructure geo-related QoS information....

  20. Georeferencing in QGIS 2.0

    Directory of Open Access Journals (Sweden)

    Jim Clifford

    2013-12-01

    Full Text Available In this lesson, you will learn how to georeference historical maps so that they may be added to a GIS as a raster layer. Georeferencing is required for anyone who wants to accurately digitize data found on a paper map, and since historians work mostly in the realm of paper, georeferencing is one of our most commonly used tools. The technique uses a series of control points to give a two-dimensional object like a paper map the real world coordinates it needs to align with the three-dimensional features of the earth in GIS software (in Intro to Google Maps and Google Earth we saw an ‘overlay’ which is a Google Earth shortcut version of georeferencing. Georeferencing a historical map requires a knowledge of both the geography and the history of the place you are studying to ensure accuracy. The built and natural landscapes change over time, and it is important to confirm that the location of your control points — whether they be houses, intersections, or even towns — have remained constant. Entering control points in a GIS is easy, but behind the scenes, georeferencing uses complex transformation and compression processes. These are used to correct the distortions and inaccuracies found in many historical maps and stretch the maps so that they fit geographic coordinates. In cartography this is known as rubber-sheeting because it treats the map as if it were made of rubber and the control points as if they were tacks ‘pinning’ the historical document to a three dimensional surface like the globe. To offer some examples of georeferenced historical maps, we prepared some National Topographic Series maps hosted on the University of Toronto Map Library website courtesy of Marcel Fortin, and we overlaid them on a Google web map. Viewers can adjust the transparency with the slider bar on the top right, view the historical map as an overlay on terrain or satellite images, or click ‘Earth’ to switch into Google Earth mode and see 3D

  1. Automatic Detection of Online Recruitment Frauds: Characteristics, Methods, and a Public Dataset

    Directory of Open Access Journals (Sweden)

    Sokratis Vidros

    2017-03-01

    Full Text Available The critical process of hiring has relatively recently been ported to the cloud. Specifically, the automated systems responsible for completing the recruitment of new employees in an online fashion, aim to make the hiring process more immediate, accurate and cost-efficient. However, the online exposure of such traditional business procedures has introduced new points of failure that may lead to privacy loss for applicants and harm the reputation of organizations. So far, the most common case of Online Recruitment Frauds (ORF, is employment scam. Unlike relevant online fraud problems, the tackling of ORF has not yet received the proper attention, remaining largely unexplored until now. Responding to this need, the work at hand defines and describes the characteristics of this severe and timely novel cyber security research topic. At the same time, it contributes and evaluates the first to our knowledge publicly available dataset of 17,880 annotated job ads, retrieved from the use of a real-life system.

  2. Road Bridges and Culverts, Bridge dataset only includes bridges maintained by Johnson County Public Works in the unincorporated areas, Published in Not Provided, Johnson County Government.

    Data.gov (United States)

    NSGIC Local Govt | GIS Inventory — Road Bridges and Culverts dataset current as of unknown. Bridge dataset only includes bridges maintained by Johnson County Public Works in the unincorporated areas.

  3. Proliferative diabetic retinopathy characterization based on fractal features: Evaluation on a publicly available dataset.

    Science.gov (United States)

    Orlando, José Ignacio; van Keer, Karel; Barbosa Breda, João; Manterola, Hugo Luis; Blaschko, Matthew B; Clausse, Alejandro

    2017-12-01

    Diabetic retinopathy (DR) is one of the most widespread causes of preventable blindness in the world. The most dangerous stage of this condition is proliferative DR (PDR), in which the risk of vision loss is high and treatments are less effective. Fractal features of the retinal vasculature have been previously explored as potential biomarkers of DR, yet the current literature is inconclusive with respect to their correlation with PDR. In this study, we experimentally assess their discrimination ability to recognize PDR cases. A statistical analysis of the viability of using three reference fractal characterization schemes - namely box, information, and correlation dimensions - to identify patients with PDR is presented. These descriptors are also evaluated as input features for training ℓ1 and ℓ2 regularized logistic regression classifiers, to estimate their performance. Our results on MESSIDOR, a public dataset of 1200 fundus photographs, indicate that patients with PDR are more likely to exhibit a higher fractal dimension than healthy subjects or patients with mild levels of DR (P≤1.3×10-2). Moreover, a supervised classifier trained with both fractal measurements and red lesion-based features reports an area under the ROC curve of 0.93 for PDR screening and 0.96 for detecting patients with optic disc neovascularizations. The fractal dimension of the vasculature increases with the level of DR. Furthermore, PDR screening using multiscale fractal measurements is more feasible than using their derived fractal dimensions. Code and further resources are provided at https://github.com/ignaciorlando/fundus-fractal-analysis. © 2017 American Association of Physicists in Medicine.

  4. OLS Client and OLS Dialog: Open Source Tools to Annotate Public Omics Datasets.

    Science.gov (United States)

    Perez-Riverol, Yasset; Ternent, Tobias; Koch, Maximilian; Barsnes, Harald; Vrousgou, Olga; Jupp, Simon; Vizcaíno, Juan Antonio

    2017-10-01

    The availability of user-friendly software to annotate biological datasets and experimental details is becoming essential in data management practices, both in local storage systems and in public databases. The Ontology Lookup Service (OLS, http://www.ebi.ac.uk/ols) is a popular centralized service to query, browse and navigate biomedical ontologies and controlled vocabularies. Recently, the OLS framework has been completely redeveloped (version 3.0), including enhancements in the data model, like the added support for Web Ontology Language based ontologies, among many other improvements. However, the new OLS is not backwards compatible and new software tools are needed to enable access to this widely used framework now that the previous version is no longer available. We here present the OLS Client as a free, open-source Java library to retrieve information from the new version of the OLS. It enables rapid tool creation by providing a robust, pluggable programming interface and common data model to programmatically access the OLS. The library has already been integrated and is routinely used by several bioinformatics resources and related data annotation tools. Secondly, we also introduce an updated version of the OLS Dialog (version 2.0), a Java graphical user interface that can be easily plugged into Java desktop applications to access the OLS. The software and related documentation are freely available at https://github.com/PRIDE-Utilities/ols-client and https://github.com/PRIDE-Toolsuite/ols-dialog. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. Approximate direct georeferencing in national coordinates

    Science.gov (United States)

    Legat, Klaus

    Direct georeferencing has gained an increasing importance in photogrammetry and remote sensing. Thereby, the parameters of exterior orientation (EO) of an image sensor are determined by GPS/INS, yielding results in a global geocentric reference frame. Photogrammetric products like digital terrain models or orthoimages, however, are often required in national geodetic datums and mapped by national map projections, i.e., in "national coordinates". As the fundamental mathematics of photogrammetry is based on Cartesian coordinates, the scene restitution is often performed in a Cartesian frame located at some central position of the image block. The subsequent transformation to national coordinates is a standard problem in geodesy and can be done in a rigorous manner-at least if the formulas of the map projection are rigorous. Drawbacks of this procedure include practical deficiencies related to the photogrammetric processing as well as the computational cost of transforming the whole scene. To avoid these problems, the paper pursues an alternative processing strategy where the EO parameters are transformed prior to the restitution. If only this transition was done, however, the scene would be systematically distorted. The reason is that the national coordinates are not Cartesian due to the earth curvature and the unavoidable length distortion of map projections. To settle these distortions, several corrections need to be applied. These are treated in detail for both passive and active imaging. Since all these corrections are approximations only, the resulting technique is termed "approximate direct georeferencing". Still, the residual distortions are usually very low as is demonstrated by simulations, rendering the technique an attractive approach to direct georeferencing.

  6. Publishing descriptions of non-public clinical datasets: proposed guidance for researchers, repositories, editors and funding organisations.

    Science.gov (United States)

    Hrynaszkiewicz, Iain; Khodiyar, Varsha; Hufton, Andrew L; Sansone, Susanna-Assunta

    2016-01-01

    Sharing of experimental clinical research data usually happens between individuals or research groups rather than via public repositories, in part due to the need to protect research participant privacy. This approach to data sharing makes it difficult to connect journal articles with their underlying datasets and is often insufficient for ensuring access to data in the long term. Voluntary data sharing services such as the Yale Open Data Access (YODA) and Clinical Study Data Request (CSDR) projects have increased accessibility to clinical datasets for secondary uses while protecting patient privacy and the legitimacy of secondary analyses but these resources are generally disconnected from journal articles-where researchers typically search for reliable information to inform future research. New scholarly journal and article types dedicated to increasing accessibility of research data have emerged in recent years and, in general, journals are developing stronger links with data repositories. There is a need for increased collaboration between journals, data repositories, researchers, funders, and voluntary data sharing services to increase the visibility and reliability of clinical research. Using the journal Scientific Data as a case study, we propose and show examples of changes to the format and peer-review process for journal articles to more robustly link them to data that are only available on request. We also propose additional features for data repositories to better accommodate non-public clinical datasets, including Data Use Agreements (DUAs).

  7. A New Heuristic Anonymization Technique for Privacy Preserved Datasets Publication on Cloud Computing

    Science.gov (United States)

    Aldeen Yousra, S.; Mazleena, Salleh

    2018-05-01

    Recent advancement in Information and Communication Technologies (ICT) demanded much of cloud services to sharing users’ private data. Data from various organizations are the vital information source for analysis and research. Generally, this sensitive or private data information involves medical, census, voter registration, social network, and customer services. Primary concern of cloud service providers in data publishing is to hide the sensitive information of individuals. One of the cloud services that fulfill the confidentiality concerns is Privacy Preserving Data Mining (PPDM). The PPDM service in Cloud Computing (CC) enables data publishing with minimized distortion and absolute privacy. In this method, datasets are anonymized via generalization to accomplish the privacy requirements. However, the well-known privacy preserving data mining technique called K-anonymity suffers from several limitations. To surmount those shortcomings, I propose a new heuristic anonymization framework for preserving the privacy of sensitive datasets when publishing on cloud. The advantages of K-anonymity, L-diversity and (α, k)-anonymity methods for efficient information utilization and privacy protection are emphasized. Experimental results revealed the superiority and outperformance of the developed technique than K-anonymity, L-diversity, and (α, k)-anonymity measure.

  8. The case for developing publicly-accessible datasets for health services research in the Middle East and North Africa (MENA region

    Directory of Open Access Journals (Sweden)

    El-Jardali Fadi

    2009-10-01

    Full Text Available Abstract Background The existence of publicly-accessible datasets comprised a significant opportunity for health services research to evolve into a science that supports health policy making and evaluation, proper inter- and intra-organizational decisions and optimal clinical interventions. This paper investigated the role of publicly-accessible datasets in the enhancement of health care systems in the developed world and highlighted the importance of their wide existence and use in the Middle East and North Africa (MENA region. Discussion A search was conducted to explore the availability of publicly-accessible datasets in the MENA region. Although datasets were found in most countries in the region, those were limited in terms of their relevance, quality and public-accessibility. With rare exceptions, publicly-accessible datasets - as present in the developed world - were absent. Based on this, we proposed a gradual approach and a set of recommendations to promote the development and use of publicly-accessible datasets in the region. These recommendations target potential actions by governments, researchers, policy makers and international organizations. Summary We argue that the limited number of publicly-accessible datasets in the MENA region represents a lost opportunity for the evidence-based advancement of health systems in the region. The availability and use of publicly-accessible datasets would encourage policy makers in this region to base their decisions on solid representative data and not on estimates or small-scale studies; researchers would be able to exercise their expertise in a meaningful manner to both, policy makers and the public. The population of the MENA countries would exercise the right to benefit from locally- or regionally-based studies, versus imported and in 'best cases' customized ones. Furthermore, on a macro scale, the availability of regionally comparable publicly-accessible datasets would allow for the

  9. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  10. Overview of a public-industry partnership for enhancing corn nitrogen research and datasets

    Science.gov (United States)

    Due to economic and environmental consequences of nitrogen (N) lost from fertilizer applications in corn (Zea mays L.), considerable public and industry attention has been devoted to development of N decision tools. Now a wide variety of tools are available to farmers for managing N inputs. However,...

  11. Projected outcomes of a public-industry partnership for enhancing corn nitrogen research and datasets

    Science.gov (United States)

    Research is needed over a wide geographic range of soil and weather scenarios to evaluate methods and tools for corn N fertilizer applications. The objectives of this research were to conduct standardized corn N rate response field studies to evaluate the performance of multiple public-domain N deci...

  12. External validation of a publicly available computer assisted diagnostic tool for mammographic mass lesions with two high prevalence research datasets.

    Science.gov (United States)

    Benndorf, Matthias; Burnside, Elizabeth S; Herda, Christoph; Langer, Mathias; Kotter, Elmar

    2015-08-01

    Lesions detected at mammography are described with a highly standardized terminology: the breast imaging-reporting and data system (BI-RADS) lexicon. Up to now, no validated semantic computer assisted classification algorithm exists to interactively link combinations of morphological descriptors from the lexicon to a probabilistic risk estimate of malignancy. The authors therefore aim at the external validation of the mammographic mass diagnosis (MMassDx) algorithm. A classification algorithm like MMassDx must perform well in a variety of clinical circumstances and in datasets that were not used to generate the algorithm in order to ultimately become accepted in clinical routine. The MMassDx algorithm uses a naïve Bayes network and calculates post-test probabilities of malignancy based on two distinct sets of variables, (a) BI-RADS descriptors and age ("descriptor model") and (b) BI-RADS descriptors, age, and BI-RADS assessment categories ("inclusive model"). The authors evaluate both the MMassDx (descriptor) and MMassDx (inclusive) models using two large publicly available datasets of mammographic mass lesions: the digital database for screening mammography (DDSM) dataset, which contains two subsets from the same examinations-a medio-lateral oblique (MLO) view and cranio-caudal (CC) view dataset-and the mammographic mass (MM) dataset. The DDSM contains 1220 mass lesions and the MM dataset contains 961 mass lesions. The authors evaluate discriminative performance using area under the receiver-operating-characteristic curve (AUC) and compare this to the BI-RADS assessment categories alone (i.e., the clinical performance) using the DeLong method. The authors also evaluate whether assigned probabilistic risk estimates reflect the lesions' true risk of malignancy using calibration curves. The authors demonstrate that the MMassDx algorithms show good discriminatory performance. AUC for the MMassDx (descriptor) model in the DDSM data is 0.876/0.895 (MLO/CC view) and AUC

  13. Dataset of Atmospheric Environment Publication in 2016, Source emission and model evaluation of formaldehyde from composite and solid wood furniture in a full-scale chamber

    Data.gov (United States)

    U.S. Environmental Protection Agency — The data presented in this data file is a product of a journal publication. The dataset contains formaldehyde air concentrations in the emission test chamber and...

  14. Designing a Secure Storage Repository for Sharing Scientific Datasets using Public Clouds

    Energy Technology Data Exchange (ETDEWEB)

    Kumbhare, Alok [Univ. of Southern California, Los Angeles, CA (United States); Simmhan, Yogesth [Univ. of Southern California, Los Angeles, CA (United States); Prasanna, Viktor [Univ. of Southern California, Los Angeles, CA (United States)

    2011-11-14

    As Cloud platforms gain increasing traction among scientific and business communities for outsourcing storage, computing and content delivery, there is also growing concern about the associated loss of control over private data hosted in the Cloud. In this paper, we present an architecture for a secure data repository service designed on top of a public Cloud infrastructure to support multi-disciplinary scientific communities dealing with personal and human subject data, motivated by the smart power grid domain. Our repository model allows users to securely store and share their data in the Cloud without revealing the plain text to unauthorized users, the Cloud storage provider or the repository itself. The system masks file names, user permissions and access patterns while providing auditing capabilities with provable data updates.

  15. Meaningful interpretation by reanalyzing the publicly available dataset: A case study of Salvia miltiorrhiza

    Directory of Open Access Journals (Sweden)

    Maulik Patel

    2017-10-01

    Full Text Available MicroRNAs are a newly discovered class of non-protein small RNAs with 22-24 nucleotides and evolutionary conserved post-transcriptional regulatory RNAs, which shows an enormous role in several biological and metabolic process. Plant derived miRNA entered into the body fluid and regulated the expression of host mRNA. Salvia miltiorrhiza is an important medicinal plant known for its potent cardiovascular and anticancer activity and hence, it was selected to identify the cross kingdom regulatory mechanism between medicinal plant and human. In this study, total 8 highly stable putative novel miRNA were predicted from the publically available ESTs of Salvia miltiorrhiza, out of which 2 miRNA were found to be regulating 32 target genes in human. Functional annotation, gene ontology and network analysis were carried out based on their significance and find out the association with the prominent disease like cancer and cardiac diseases. The network analysis showed the some important network protein like SOCS2, CD274, STAT3, TRAF3 and CXCL2 with their associated pathways. The predicted miRNA have a significant potential role in cytokine, Apoptosis and EGFR receptor signaling pathways and its biological regulation. It may be further validated using in-vivo experiment for broader picture into their miRNA epigenetic mechanism and action.

  16. Improved Automated Detection of Diabetic Retinopathy on a Publicly Available Dataset Through Integration of Deep Learning.

    Science.gov (United States)

    Abràmoff, Michael David; Lou, Yiyue; Erginay, Ali; Clarida, Warren; Amelon, Ryan; Folk, James C; Niemeijer, Meindert

    2016-10-01

    To compare performance of a deep-learning enhanced algorithm for automated detection of diabetic retinopathy (DR), to the previously published performance of that algorithm, the Iowa Detection Program (IDP)-without deep learning components-on the same publicly available set of fundus images and previously reported consensus reference standard set, by three US Board certified retinal specialists. We used the previously reported consensus reference standard of referable DR (rDR), defined as International Clinical Classification of Diabetic Retinopathy moderate, severe nonproliferative (NPDR), proliferative DR, and/or macular edema (ME). Neither Messidor-2 images, nor the three retinal specialists setting the Messidor-2 reference standard were used for training IDx-DR version X2.1. Sensitivity, specificity, negative predictive value, area under the curve (AUC), and their confidence intervals (CIs) were calculated. Sensitivity was 96.8% (95% CI: 93.3%-98.8%), specificity was 87.0% (95% CI: 84.2%-89.4%), with 6/874 false negatives, resulting in a negative predictive value of 99.0% (95% CI: 97.8%-99.6%). No cases of severe NPDR, PDR, or ME were missed. The AUC was 0.980 (95% CI: 0.968-0.992). Sensitivity was not statistically different from published IDP sensitivity, which had a CI of 94.4% to 99.3%, but specificity was significantly better than the published IDP specificity CI of 55.7% to 63.0%. A deep-learning enhanced algorithm for the automated detection of DR, achieves significantly better performance than a previously reported, otherwise essentially identical, algorithm that does not employ deep learning. Deep learning enhanced algorithms have the potential to improve the efficiency of DR screening, and thereby to prevent visual loss and blindness from this devastating disease.

  17. Tracking vegetation phenology across diverse North American biomes using PhenoCam imagery: A new, publicly-available dataset

    Science.gov (United States)

    Richardson, A. D.

    2015-12-01

    Vegetation phenology controls the seasonality of many ecosystem processes, as well as numerous biosphere-atmosphere feedbacks. Phenology is highly sensitive to climate change and variability, and is thus a key aspect of global change ecology. The goal of the PhenoCam network is to serve as a long-term, continental-scale, phenological observatory. The network uses repeat digital photography—images captured using conventional, visible-wavelength, automated digital cameras—to characterize vegetation phenology in diverse ecosystems across North America and around the world. At present, imagery from over 200 research sites, spanning a wide range of ecoregions, climate zones, and plant functional types, is currently being archived and processed in near-real-time through the PhenoCam project web page (http://phenocam.sr.unh.edu/). Data derived from PhenoCam imagery have been previously used to evaluate satellite phenology products, to constrain and test new phenology models, to understand relationships between canopy phenology and ecosystem processes, and to study the seasonal changes in leaf-level physiology that are associated with changes in leaf color. I will describe a new, publicly-available phenological dataset, derived from over 600 site-years of PhenoCam imagery. For each archived image (ca. 5 million), we extracted RGB (red, green, blue) color channel information, with means and other statistics calculated across a region-of-interest (ROI) delineating a specific vegetation type. From the high-frequency (typically, 30 minute) imagery, we derived time series characterizing vegetation color, including "canopy greenness", processed to 1- and 3-day intervals. For ecosystems with a single annual cycle of vegetation activity, we derived estimates, with uncertainties, for the start, middle, and end of spring and autumn phenological transitions. Given the lack of multi-year, standardized, and geographically distributed phenological data for North America, we

  18. An open, multi-vendor, multi-field-strength brain MR dataset and analysis of publicly available skull stripping methods agreement.

    Science.gov (United States)

    Souza, Roberto; Lucena, Oeslle; Garrafa, Julia; Gobbi, David; Saluzzi, Marina; Appenzeller, Simone; Rittner, Letícia; Frayne, Richard; Lotufo, Roberto

    2018-04-15

    This paper presents an open, multi-vendor, multi-field strength magnetic resonance (MR) T1-weighted volumetric brain imaging dataset, named Calgary-Campinas-359 (CC-359). The dataset is composed of images of older healthy adults (29-80 years) acquired on scanners from three vendors (Siemens, Philips and General Electric) at both 1.5 T and 3 T. CC-359 is comprised of 359 datasets, approximately 60 subjects per vendor and magnetic field strength. The dataset is approximately age and gender balanced, subject to the constraints of the available images. It provides consensus brain extraction masks for all volumes generated using supervised classification. Manual segmentation results for twelve randomly selected subjects performed by an expert are also provided. The CC-359 dataset allows investigation of 1) the influences of both vendor and magnetic field strength on quantitative analysis of brain MR; 2) parameter optimization for automatic segmentation methods; and potentially 3) machine learning classifiers with big data, specifically those based on deep learning methods, as these approaches require a large amount of data. To illustrate the utility of this dataset, we compared to the results of a supervised classifier, the results of eight publicly available skull stripping methods and one publicly available consensus algorithm. A linear mixed effects model analysis indicated that vendor (p-valuefield strength (p-value<0.001) have statistically significant impacts on skull stripping results. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. Scanning and georeferencing historical USGS quadrangles

    Science.gov (United States)

    Fishburn, Kristin A.; Davis, Larry R.; Allord, Gregory J.

    2017-06-23

    The U.S. Geological Survey (USGS) National Geospatial Program is scanning published USGS 1:250,000-scale and larger topographic maps printed between 1884, the inception of the topographic mapping program, and 2006. The goal of this project, which began publishing the Historical Topographic Map Collection in 2011, is to provide access to a digital repository of USGS topographic maps that is available to the public at no cost. For more than 125 years, USGS topographic maps have accurately portrayed the complex geography of the Nation. The USGS is the Nation’s largest producer of traditional topographic maps, and, prior to 2006, USGS topographic maps were created using traditional cartographic methods and printed using a lithographic process. The next generation of topographic maps, US Topo, is being released by the USGS in digital form, and newer technologies make it possible to also deliver historical maps in the same electronic format that is more publicly accessible.

  20. Georeferenced Population Datasets of Mexico (GEO-MEX): Urban Place GIS Coverage of Mexico

    Data.gov (United States)

    National Aeronautics and Space Administration — The Urban Place GIS Coverage of Mexico is a vector based point Geographic Information System (GIS) coverage of 696 urban places in Mexico. Each Urban Place is...

  1. Georeferenced Population Datasets of Mexico (GEO-MEX): Raster Based GIS Coverage of Mexican Population

    Data.gov (United States)

    National Aeronautics and Space Administration — The Raster Based GIS Coverage of Mexican Population is a gridded coverage (1 x 1 km) of Mexican population. The data were converted from vector into raster. The...

  2. Georeferencing natural disaster impact footprints : lessons learned from the EM-DAT experience

    Science.gov (United States)

    Wallemacq, Pascaline; Guha Sapir, Debarati

    2014-05-01

    wider public and policy makers. Some results from the application of georeferencing will be presented during the session such as a study of the population potentially exposed and affected by natural disasters in Europe, a flood vulnerability analysis in Vietnam and the potential merging of watersheds analysis and flood footprints data.

  3. Aaron Journal article datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty , and S. Raman....

  4. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  5. Public Access Points, Location of public beach access along the Oregon Coast. Boat ramp locations were added to the dataset to allow users to view the location of boat ramps along the Columbia River and the Willamete River north of the Oregon City Dam., Published in 2005, 1:100000 (1in=8333ft) scale, Oregon Geospatial Enterprise Office (GEO).

    Data.gov (United States)

    NSGIC State | GIS Inventory — Public Access Points dataset current as of 2005. Location of public beach access along the Oregon Coast. Boat ramp locations were added to the dataset to allow users...

  6. Software Framework for Development of Web-GIS Systems for Analysis of Georeferenced Geophysical Data

    Science.gov (United States)

    Okladnikov, I.; Gordov, E. P.; Titov, A. G.

    2011-12-01

    Georeferenced datasets (meteorological databases, modeling and reanalysis results, remote sensing products, etc.) are currently actively used in numerous applications including modeling, interpretation and forecast of climatic and ecosystem changes for various spatial and temporal scales. Due to inherent heterogeneity of environmental datasets as well as their size which might constitute up to tens terabytes for a single dataset at present studies in the area of climate and environmental change require a special software support. A dedicated software framework for rapid development of providing such support information-computational systems based on Web-GIS technologies has been created. The software framework consists of 3 basic parts: computational kernel developed using ITTVIS Interactive Data Language (IDL), a set of PHP-controllers run within specialized web portal, and JavaScript class library for development of typical components of web mapping application graphical user interface (GUI) based on AJAX technology. Computational kernel comprise of number of modules for datasets access, mathematical and statistical data analysis and visualization of results. Specialized web-portal consists of web-server Apache, complying OGC standards Geoserver software which is used as a base for presenting cartographical information over the Web, and a set of PHP-controllers implementing web-mapping application logic and governing computational kernel. JavaScript library aiming at graphical user interface development is based on GeoExt library combining ExtJS Framework and OpenLayers software. Based on the software framework an information-computational system for complex analysis of large georeferenced data archives was developed. Structured environmental datasets available for processing now include two editions of NCEP/NCAR Reanalysis, JMA/CRIEPI JRA-25 Reanalysis, ECMWF ERA-40 Reanalysis, ECMWF ERA Interim Reanalysis, MRI/JMA APHRODITE's Water Resources Project Reanalysis

  7. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  8. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Directory of Open Access Journals (Sweden)

    Nilotpal Chowdhury

    Full Text Available Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis.The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets.Four microarray series (having 742 patients were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA.Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed.To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and

  9. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Science.gov (United States)

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting

  10. An Initial Seed Selection Algorithm for K-means Clustering of Georeferenced Data to Improve Replicability of Cluster Assignments for Mapping Application

    OpenAIRE

    Khan, Fouad

    2016-01-01

    K-means is one of the most widely used clustering algorithms in various disciplines, especially for large datasets. However the method is known to be highly sensitive to initial seed selection of cluster centers. K-means++ has been proposed to overcome this problem and has been shown to have better accuracy and computational efficiency than k-means. In many clustering problems though -such as when classifying georeferenced data for mapping applications- standardization of clustering methodolo...

  11. Development of a georeferenced data bank of radionuclides in typical food of Latin America - SIGLARA

    International Nuclear Information System (INIS)

    Nascimento, Lucia Maria Evangelista do

    2014-01-01

    The related management information related to the environmental assessment activity aims to provide the world community with better access to meaningful environmental information and help use this information in making decisions in case of contamination due to accident or deliberate actions. In recent years, the geotechnologies acquired are fundamental to research and environmental monitoring, once it possible, efficiently obtaining large amount of data natural resources. The result of this work was the development of a database system to store georeferenced data values of radionuclides in typical foods in Latin America (SIGLARA), defined in three languages (Spanish, Portuguese and English), using free software. The developed system meets the primary need of the RLA 09/72 ARCAL Project, funded by the International Atomic Energy Agency (IAEA), as having eleven participants countries in Latin America. The database of georeferenced created for SIGLARA system was tested in its applicability through the entry and manipulation of real data analyzed, which showed that the system is able to store, retrieve, view reports and maps of the samples of registered food. Interfaces that connect the user with the database show up efficient, making the system easy operability. Their application to environmental management is already showing results, it is hoped that these results will encourage its widespread adoption by other countries, institutions, the scientific community and the general public. (author)

  12. panMetaDocs, eSciDoc, and DOIDB - an infrastructure for the curation and publication of file-based datasets for 'GFZ Data Services'

    Science.gov (United States)

    Ulbricht, Damian; Elger, Kirsten; Bertelmann, Roland; Klump, Jens

    2016-04-01

    With the foundation of DataCite in 2009 and the technical infrastructure installed in the last six years it has become very easy to create citable dataset DOIs. Nowadays, dataset DOIs are increasingly accepted and required by journals in reference lists of manuscripts. In addition, DataCite provides usage statistics [1] of assigned DOIs and offers a public search API to make research data count. By linking related information to the data, they become more useful for future generations of scientists. For this purpose, several identifier systems, as ISBN for books, ISSN for journals, DOI for articles or related data, Orcid for authors, and IGSN for physical samples can be attached to DOIs using the DataCite metadata schema [2]. While these are good preconditions to publish data, free and open solutions that help with the curation of data, the publication of research data, and the assignment of DOIs in one software seem to be rare. At GFZ Potsdam we built a modular software stack that is made of several free and open software solutions and we established 'GFZ Data Services'. 'GFZ Data Services' provides storage, a metadata editor for publication and a facility to moderate minted DOIs. All software solutions are connected through web APIs, which makes it possible to reuse and integrate established software. Core component of 'GFZ Data Services' is an eSciDoc [3] middleware that is used as central storage, and has been designed along the OAIS reference model for digital preservation. Thus, data are stored in self-contained packages that are made of binary file-based data and XML-based metadata. The eSciDoc infrastructure provides access control to data and it is able to handle half-open datasets, which is useful in embargo situations when a subset of the research data are released after an adequate period. The data exchange platform panMetaDocs [4] makes use of eSciDoc's REST API to upload file-based data into eSciDoc and uses a metadata editor [5] to annotate the files

  13. Web-GIS approach for integrated analysis of heterogeneous georeferenced data

    Science.gov (United States)

    Okladnikov, Igor; Gordov, Evgeny; Titov, Alexander; Shulgina, Tamara

    2014-05-01

    Georeferenced datasets are currently actively used for modeling, interpretation and forecasting of climatic and ecosystem changes on different spatial and temporal scales [1]. Due to inherent heterogeneity of environmental datasets as well as their huge size (up to tens terabytes for a single dataset) a special software supporting studies in the climate and environmental change areas is required [2]. Dedicated information-computational system for integrated analysis of heterogeneous georeferenced climatological and meteorological data is presented. It is based on combination of Web and GIS technologies according to Open Geospatial Consortium (OGC) standards, and involves many modern solutions such as object-oriented programming model, modular composition, and JavaScript libraries based on GeoExt library (http://www.geoext.org), ExtJS Framework (http://www.sencha.com/products/extjs) and OpenLayers software (http://openlayers.org). The main advantage of the system lies in it's capability to perform integrated analysis of time series of georeferenced data obtained from different sources (in-situ observations, model results, remote sensing data) and to combine the results in a single map [3, 4] as WMS and WFS layers in a web-GIS application. Also analysis results are available for downloading as binary files from the graphical user interface or can be directly accessed through web mapping (WMS) and web feature (WFS) services for a further processing by the user. Data processing is performed on geographically distributed computational cluster comprising data storage systems and corresponding computational nodes. Several geophysical datasets represented by NCEP/NCAR Reanalysis II, JMA/CRIEPI JRA-25 Reanalysis, ECMWF ERA-40 Reanalysis, ECMWF ERA Interim Reanalysis, MRI/JMA APHRODITE's Water Resources Project Reanalysis, DWD Global Precipitation Climatology Centre's data, GMAO Modern Era-Retrospective analysis for Research and Applications, reanalysis of Monitoring

  14. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  15. Job Patterns for Minorities and Women in Elementary-Secondary Public Schools, 2012 EEO-5 Dataset - State Summary Report

    Data.gov (United States)

    US Equal Employment Opportunity Commission — As part of its mandate under Title VII of the Civil Rights Act of 1964, as amended, the Equal Employment Opportunity Commission requires periodic reports from public...

  16. Job Patterns for Minorities and Women in Elementary-Secondary Public Schools, 2012 EEO-5 Dataset - US Summary Report

    Data.gov (United States)

    US Equal Employment Opportunity Commission — As part of its mandate under Title VII of the Civil Rights Act of 1964, as amended, the Equal Employment Opportunity Commission requires periodic reports from public...

  17. Object Georeferencing in UAV-Based SAR Terrain Images

    Directory of Open Access Journals (Sweden)

    Łabowski Michał

    2016-12-01

    Full Text Available Synthetic aperture radars (SAR allow to obtain high resolution terrain images comparable with the resolution of optical methods. Radar imaging is independent on the weather conditions and the daylight. The process of analysis of the SAR images consists primarily of identifying of interesting objects. The ability to determine their geographical coordinates can increase usability of the solution from a user point of view. The paper presents a georeferencing method of the radar terrain images. The presented images were obtained from the SAR system installed on board an Unmanned Aerial Vehicle (UAV. The system was developed within a project under acronym WATSAR realized by the Military University of Technology and WB Electronics S.A. The source of the navigation data was an INS/GNSS system integrated by the Kalman filter with a feed-backward correction loop. The paper presents the terrain images obtained during flight tests and results of selected objects georeferencing with an assessment of the accuracy of the method.

  18. A library of georeferenced photos from the field

    Science.gov (United States)

    Xiao, Xiangming; Dorovskoy, Pavel; Biradar, Chandrashekhar; Bridge, Eli

    2011-12-01

    A picture is worth a thousand of words, and every day hundreds of scientists, students, and environmentally aware citizens are taking field photos to document their observations of rocks, glaciers, soils, forests, wetlands, croplands, rangelands, livestock, and birds and mammals, as well as important events such as droughts, floods, wildfires, insect emergences, and infectious disease outbreaks. Where are those field photos stored? Can they be shared in a timely fashion to support education, research, and the leisure activities of citizens across the world? What are the financial and intellectual costs if those field photos are lost or not shared? Recently, researchers at the University of Oklahoma developed and released the Global Geo-Referenced Field Photo Library (hereinafter referred to as the Field Photo Library; http://www.eomf.ou.edu/photos/), a Web-based data portal designed for researchers and educators who wish to archive and share field photos from across the world, each tagged with exact positioning data (Figure 1). The data portal has a simple user interface that allows people to upload, query, and download georeferenced field photos in the library.

  19. Testing the Flat World Thesis: Using a Public Dataset to Engage Students in the Global Inequality Debate

    Science.gov (United States)

    Arabandi, Bhavani; Sweet, Stephen; Swords, Alicia

    2014-01-01

    We present a learning module to engage students in the global inequality debate using Google Public Data World Development Indicators. Goals of this article are to articulate the importance and urgency of teaching global issues to American students; situate the central debate in the globalization literature, paying particular attention to global…

  20. A public-industry partnership for enhancing corn nitrogen research and datasets: project description, methodology, and outcomes

    Science.gov (United States)

    Due to economic and environmental consequences of nitrogen (N) lost from fertilizer applications in corn (Zea mays L.), considerable public and industry attention has been devoted to development of N decision tools. Now a wide variety of tools are available to farmers for managing N inputs. However,...

  1. Development of web-GIS system for analysis of georeferenced geophysical data

    Science.gov (United States)

    Okladnikov, I.; Gordov, E. P.; Titov, A. G.; Bogomolov, V. Y.; Genina, E.; Martynova, Y.; Shulgina, T. M.

    2012-12-01

    Georeferenced datasets (meteorological databases, modeling and reanalysis results, remote sensing products, etc.) are currently actively used in numerous applications including modeling, interpretation and forecast of climatic and ecosystem changes for various spatial and temporal scales. Due to inherent heterogeneity of environmental datasets as well as their huge size which might constitute up to tens terabytes for a single dataset at present studies in the area of climate and environmental change require a special software support. A dedicated web-GIS information-computational system for analysis of georeferenced climatological and meteorological data has been created. The information-computational system consists of 4 basic parts: computational kernel developed using GNU Data Language (GDL), a set of PHP-controllers run within specialized web-portal, JavaScript class libraries for development of typical components of web mapping application graphical user interface (GUI) based on AJAX technology, and an archive of geophysical datasets. Computational kernel comprises of a number of dedicated modules for querying and extraction of data, mathematical and statistical data analysis, visualization, and preparing output files in geoTIFF and netCDF format containing processing results. Specialized web-portal consists of a web-server Apache, complying OGC standards Geoserver software which is used as a base for presenting cartographical information over the Web, and a set of PHP-controllers implementing web-mapping application logic and governing computational kernel. JavaScript libraries aiming at graphical user interface development are based on GeoExt library combining ExtJS Framework and OpenLayers software. The archive of geophysical data consists of a number of structured environmental datasets represented by data files in netCDF, HDF, GRIB, ESRI Shapefile formats. For processing by the system are available: two editions of NCEP/NCAR Reanalysis, JMA/CRIEPI JRA-25

  2. Automatic Georeferencing of Aerial Images by Means of Topographic Database Information

    DEFF Research Database (Denmark)

    Høhle, Joachim

    The book includes a preface and four articles which deal with the automatic georeferencing of aerial images. The articles are the written contribution of an seminar, held at Aalborg University in October 2002. The georeferencing or orientation of aerial images is the first step in mapping tasks l...... like generation of orthoimages, updating of topographic map data bases and generation of digial terrain models.......The book includes a preface and four articles which deal with the automatic georeferencing of aerial images. The articles are the written contribution of an seminar, held at Aalborg University in October 2002. The georeferencing or orientation of aerial images is the first step in mapping tasks...

  3. Ordering the mob: Insights into replicon and MOB typing schemes from analysis of a curated dataset of publicly available plasmids.

    Science.gov (United States)

    Orlek, Alex; Phan, Hang; Sheppard, Anna E; Doumith, Michel; Ellington, Matthew; Peto, Tim; Crook, Derrick; Walker, A Sarah; Woodford, Neil; Anjum, Muna F; Stoesser, Nicole

    2017-05-01

    Plasmid typing can provide insights into the epidemiology and transmission of plasmid-mediated antibiotic resistance. The principal plasmid typing schemes are replicon typing and MOB typing, which utilize variation in replication loci and relaxase proteins respectively. Previous studies investigating the proportion of plasmids assigned a type by these schemes ('typeability') have yielded conflicting results; moreover, thousands of plasmid sequences have been added to NCBI in recent years, without consistent annotation to indicate which sequences represent complete plasmids. Here, a curated dataset of complete Enterobacteriaceae plasmids from NCBI was compiled, and used to assess the typeability and concordance of in silico replicon and MOB typing schemes. Concordance was assessed at hierarchical replicon type resolutions, from replicon family-level to plasmid multilocus sequence type (pMLST)-level, where available. We found that 85% and 65% of the curated plasmids could be replicon and MOB typed, respectively. Overall, plasmid size and the number of resistance genes were significant independent predictors of replicon and MOB typing success. We found some degree of non-concordance between replicon families and MOB types, which was only partly resolved when partitioning plasmids into finer-resolution groups (replicon and pMLST types). In some cases, non-concordance was attributed to ambiguous boundaries between MOBP and MOBQ types; in other cases, backbone mosaicism was considered a more plausible explanation. β-lactamase resistance genes tended not to show fidelity to a particular plasmid type, though some previously reported associations were supported. Overall, replicon and MOB typing schemes are likely to continue playing an important role in plasmid analysis, but their performance is constrained by the diverse and dynamic nature of plasmid genomes. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  4. Direct Georeferencing of Uav Data Based on Simple Building Structures

    Science.gov (United States)

    Tampubolon, W.; Reinhardt, W.

    2016-06-01

    Unmanned Aerial Vehicle (UAV) data acquisition is more flexible compared with the more complex traditional airborne data acquisition. This advantage puts UAV platforms in a position as an alternative acquisition method in many applications including Large Scale Topographical Mapping (LSTM). LSTM, i.e. larger or equal than 1:10.000 map scale, is one of a number of prominent priority tasks to be solved in an accelerated way especially in third world developing countries such as Indonesia. As one component of fundamental geospatial data sets, large scale topographical maps are mandatory in order to enable detailed spatial planning. However, the accuracy of the products derived from the UAV data are normally not sufficient for LSTM as it needs robust georeferencing, which requires additional costly efforts such as the incorporation of sophisticated GPS Inertial Navigation System (INS) or Inertial Measurement Unit (IMU) on the platform and/or Ground Control Point (GCP) data on the ground. To reduce the costs and the weight on the UAV alternative solutions have to be found. This paper outlines a direct georeferencing method of UAV data by providing image orientation parameters derived from simple building structures and presents results of an investigation on the achievable results in a LSTM application. In this case, the image orientation determination has been performed through sequential images without any input from INS/IMU equipment. The simple building structures play a significant role in such a way that geometrical characteristics have been considered. Some instances are the orthogonality of the building's wall/rooftop and the local knowledge of the building orientation in the field. In addition, we want to include the Structure from Motion (SfM) approach in order to reduce the number of required GCPs especially for the absolute orientation purpose. The SfM technique applied to the UAV data and simple building structures additionally presents an effective tool

  5. DIRECT GEOREFERENCING OF UAV DATA BASED ON SIMPLE BUILDING STRUCTURES

    Directory of Open Access Journals (Sweden)

    W. Tampubolon

    2016-06-01

    Full Text Available Unmanned Aerial Vehicle (UAV data acquisition is more flexible compared with the more complex traditional airborne data acquisition. This advantage puts UAV platforms in a position as an alternative acquisition method in many applications including Large Scale Topographical Mapping (LSTM. LSTM, i.e. larger or equal than 1:10.000 map scale, is one of a number of prominent priority tasks to be solved in an accelerated way especially in third world developing countries such as Indonesia. As one component of fundamental geospatial data sets, large scale topographical maps are mandatory in order to enable detailed spatial planning. However, the accuracy of the products derived from the UAV data are normally not sufficient for LSTM as it needs robust georeferencing, which requires additional costly efforts such as the incorporation of sophisticated GPS Inertial Navigation System (INS or Inertial Measurement Unit (IMU on the platform and/or Ground Control Point (GCP data on the ground. To reduce the costs and the weight on the UAV alternative solutions have to be found. This paper outlines a direct georeferencing method of UAV data by providing image orientation parameters derived from simple building structures and presents results of an investigation on the achievable results in a LSTM application. In this case, the image orientation determination has been performed through sequential images without any input from INS/IMU equipment. The simple building structures play a significant role in such a way that geometrical characteristics have been considered. Some instances are the orthogonality of the building’s wall/rooftop and the local knowledge of the building orientation in the field. In addition, we want to include the Structure from Motion (SfM approach in order to reduce the number of required GCPs especially for the absolute orientation purpose. The SfM technique applied to the UAV data and simple building structures additionally presents an

  6. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN...

  7. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacies (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  8. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) project have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK datasets consists of a rich collection of full text publications in the domain of

  9. Georeferenced historical forest maps of Bukovina (Northern Romania) - important tool for paleoenvironmental analyses

    Science.gov (United States)

    Popa, Ionel; Crǎciunescu, Vasile; Candrea, Bogdan; Timár, Gábor

    2010-05-01

    The historical region of Bukovina is one of the most forested areas of Romania. The name itself, beech land, suggest the high wood resources located here. The systematic wood exploitation started in Bukovina during the Austrian rule (1775 - 1918). To fully asses the region's wood potential and to make the exploitation and replantation processes more efficient, the Austrian engineers developed a dedicated mapping system. The result was a series of maps, surveyed for each forest district. In the first editions, we can find maps crafted at different scales (e.g. 1:50 000, 1: 20 000, 1: 25 000). Later on (after 1900), the map sheets scale was standardized to 1: 25 000. Each sheet was accompanied by a register with information regarding the forest parcels. The system was kept after 1918, when Bukovina become a part of Romania. For another 20 years, the forest districts were periodically surveyed and the maps updated. The basemap content also changed during time. For most of the maps, the background was compiled from the Austrian Third Military Survey maps. After the Second World War, the Romanian military maps ("planurile directoare de tragere") were also used. The forest surveys were positioned using the Austrian triangulation network with the closest baseline at Rădăuţi. Considered lost after WWII, an important part of this maps were recently recovered by a fortunate and accidental finding. Such informations are highly valuable for the today forest planners. By careful studying this kind of documents, a modern forest manager can better understand the way forests were managed in the past and the implications of that management in today's forest reality. In order to do that, the maps should be first georeferenced into a known coordinate system of the Third Survey and integrated with recent geospatial datasets using a GIS environment. The paper presents the challenges of finding and applying the right informations regarding the datum and projection used by the

  10. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  11. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  12. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  13. Vision-Based Georeferencing of GPR in Urban Areas

    Directory of Open Access Journals (Sweden)

    Riccardo Barzaghi

    2016-01-01

    Full Text Available Ground Penetrating Radar (GPR surveying is widely used to gather accurate knowledge about the geometry and position of underground utilities. The sensor arrays need to be coupled to an accurate positioning system, like a geodetic-grade Global Navigation Satellite System (GNSS device. However, in urban areas this approach is not always feasible because GNSS accuracy can be substantially degraded due to the presence of buildings, trees, tunnels, etc. In this work, a photogrammetric (vision-based method for GPR georeferencing is presented. The method can be summarized in three main steps: tie point extraction from the images acquired during the survey, computation of approximate camera extrinsic parameters and finally a refinement of the parameter estimation using a rigorous implementation of the collinearity equations. A test under operational conditions is described, where accuracy of a few centimeters has been achieved. The results demonstrate that the solution was robust enough for recovering vehicle trajectories even in critical situations, such as poorly textured framed surfaces, short baselines, and low intersection angles.

  14. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: Facility Registry Services (FRS) : Toxic Release Inventory (TRI) , National Pollutant Discharge Elimination System (NPDES) , and Superfund Sites

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset represents the estimated density of georeferenced sites within individual, local NHDPlusV2 catchments and upstream, contributing watersheds based on the...

  15. RIGOROUS GEOREFERENCING OF ALSAT-2A PANCHROMATIC AND MULTISPECTRAL IMAGERY

    Directory of Open Access Journals (Sweden)

    I. Boukerch

    2013-04-01

    Full Text Available The exploitation of the full geometric capabilities of the High-Resolution Satellite Imagery (HRSI, require the development of an appropriate sensor orientation model. Several authors studied this problem; generally we have two categories of geometric models: physical and empirical models. Based on the analysis of the metadata provided with ALSAT-2A, a rigorous pushbroom camera model can be developed. This model has been successfully applied to many very high resolution imagery systems. The relation between the image and ground coordinates by the time dependant collinearity involving many coordinates systems has been tested. The interior orientation parameters must be integrated in the model, the interior parameters can be estimated from the viewing angles corresponding to the pointing directions of any detector, these values are derived from cubic polynomials provided in the metadata. The developed model integrates all the necessary elements with 33 unknown. All the approximate values of the 33 unknowns parameters may be derived from the informations contained in the metadata files provided with the imagery technical specifications or they are simply fixed to zero, so the condition equation is linearized and solved using SVD in a least square sense in order to correct the initial values using a suitable number of well-distributed GCPs. Using Alsat-2A images over the town of Toulouse in the south west of France, three experiments are done. The first is about 2D accuracy analysis using several sets of parameters. The second is about GCPs number and distribution. The third experiment is about georeferencing multispectral image by applying the model calculated from panchromatic image.

  16. Toward Automatic Georeferencing of Archival Aerial Photogrammetric Surveys

    Science.gov (United States)

    Giordano, S.; Le Bris, A.; Mallet, C.

    2018-05-01

    Images from archival aerial photogrammetric surveys are a unique and relatively unexplored means to chronicle 3D land-cover changes over the past 100 years. They provide a relatively dense temporal sampling of the territories with very high spatial resolution. Such time series image analysis is a mandatory baseline for a large variety of long-term environmental monitoring studies. The current bottleneck for accurate comparison between epochs is their fine georeferencing step. No fully automatic method has been proposed yet and existing studies are rather limited in terms of area and number of dates. State-of-the art shows that the major challenge is the identification of ground references: cartographic coordinates and their position in the archival images. This task is manually performed, and extremely time-consuming. This paper proposes to use a photogrammetric approach, and states that the 3D information that can be computed is the key to full automation. Its original idea lies in a 2-step approach: (i) the computation of a coarse absolute image orientation; (ii) the use of the coarse Digital Surface Model (DSM) information for automatic absolute image orientation. It only relies on a recent orthoimage+DSM, used as master reference for all epochs. The coarse orthoimage, compared with such a reference, allows the identification of dense ground references and the coarse DSM provides their position in the archival images. Results on two areas and 5 dates show that this method is compatible with long and dense archival aerial image series. Satisfactory planimetric and altimetric accuracies are reported, with variations depending on the ground sampling distance of the images and the location of the Ground Control Points.

  17. TOWARD AUTOMATIC GEOREFERENCING OF ARCHIVAL AERIAL PHOTOGRAMMETRIC SURVEYS

    Directory of Open Access Journals (Sweden)

    S. Giordano

    2018-05-01

    Full Text Available Images from archival aerial photogrammetric surveys are a unique and relatively unexplored means to chronicle 3D land-cover changes over the past 100 years. They provide a relatively dense temporal sampling of the territories with very high spatial resolution. Such time series image analysis is a mandatory baseline for a large variety of long-term environmental monitoring studies. The current bottleneck for accurate comparison between epochs is their fine georeferencing step. No fully automatic method has been proposed yet and existing studies are rather limited in terms of area and number of dates. State-of-the art shows that the major challenge is the identification of ground references: cartographic coordinates and their position in the archival images. This task is manually performed, and extremely time-consuming. This paper proposes to use a photogrammetric approach, and states that the 3D information that can be computed is the key to full automation. Its original idea lies in a 2-step approach: (i the computation of a coarse absolute image orientation; (ii the use of the coarse Digital Surface Model (DSM information for automatic absolute image orientation. It only relies on a recent orthoimage+DSM, used as master reference for all epochs. The coarse orthoimage, compared with such a reference, allows the identification of dense ground references and the coarse DSM provides their position in the archival images. Results on two areas and 5 dates show that this method is compatible with long and dense archival aerial image series. Satisfactory planimetric and altimetric accuracies are reported, with variations depending on the ground sampling distance of the images and the location of the Ground Control Points.

  18. Near Real-Time Dissemination of Geo-Referenced Imagery by an Enterprise Server

    National Research Council Canada - National Science Library

    Brown, Alison; Gilbert, Chris; Holland, Heather; Lu, Yan

    2006-01-01

    .... The payload is connected through a data link to a ground-based server that can process the georegistered data in near-real-time using our GeoReferenced Information Manager (GRIM) Enterprise Server...

  19. Integration of georeferencing, habitat, sampling, and genetic data for documentation of wild plant genetic resources

    Science.gov (United States)

    Plant genetic resource collections provide novel materials to the breeding and research communities. Availability of detailed documentation of passport, phenotypic, and genetic data increases the value of the genebank accessions. Inclusion of georeferenced sources, habitats, and sampling data in co...

  20. Libraries, The locations and contact information for academic, private and public libraries in Rhode Island. The intention of this dataset was to provide an overview of data. Additional information pertinent to the state is also available from the RI Department of, Published in 2007, 1:4800 (1in=400ft) scale, Rhode Island and Providence Plantations.

    Data.gov (United States)

    NSGIC State | GIS Inventory — Libraries dataset current as of 2007. The locations and contact information for academic, private and public libraries in Rhode Island. The intention of this dataset...

  1. Generation of High-Resolution Geo-referenced Photo-Mosaics From Navigation Data

    Science.gov (United States)

    Delaunoy, O.; Elibol, A.; Garcia, R.; Escartin, J.; Fornari, D.; Humphris, S.

    2006-12-01

    mosaic, with the intention of locally improving image alignment. Tests have been conducted using the data acquired during the cruise LUSTRE'96 (LUcky STRike Exploration, 37°17'N 32°17'W) by WHOI. During this cruise, the ARGO-II tethered vehicle acquired ~21,000 images in a ~1Km2 area of the seafloor to map at high resolution the geology of this hydrothermal field. The obtained geo-referenced photo-mosaic has a resolution of 1.5cm per pixel, with a coverage of ~25% of the Lucky Strike area. Data and software will be made publicly available.

  2. Geopan AT@S: a Brokering Based Gateway to Georeferenced Historical Maps for Risk Analysis

    Science.gov (United States)

    Previtali, M.

    2017-08-01

    Importance of ancient and historical maps is nowadays recognized in many applications (e.g., urban planning, landscape valorisation and preservation, land changes identification, etc.). In the last years a great effort has been done by different institutions, such as Geographical Institutes, Public Administrations, and collaborative communities, for digitizing and publishing online collections of historical maps. In spite of this variety and availability of data, information overload makes difficult their discovery and management: without knowing the specific repository where the data are stored, it is difficult to find the information required. In addition, problems of interconnection between different data sources and their restricted interoperability may arise. This paper describe a new brokering based gateway developed to assure interoperability between data, in particular georeferenced historical maps and geographic data, gathered from different data providers, with various features and referring to different historical periods. The developed approach is exemplified by a new application named GeoPAN Atl@s that is aimed at linking in Northern Italy area land changes with risk analysis (local seismicity amplification and flooding risk) by using multi-temporal data sources and historic maps.

  3. INTEGRATED GEOREFERENCING OF STEREO IMAGE SEQUENCES CAPTURED WITH A STEREOVISION MOBILE MAPPING SYSTEM – APPROACHES AND PRACTICAL RESULTS

    Directory of Open Access Journals (Sweden)

    H. Eugster

    2012-07-01

    Full Text Available Stereovision based mobile mapping systems enable the efficient capturing of directly georeferenced stereo pairs. With today's camera and onboard storage technologies imagery can be captured at high data rates resulting in dense stereo sequences. These georeferenced stereo sequences provide a highly detailed and accurate digital representation of the roadside environment which builds the foundation for a wide range of 3d mapping applications and image-based geo web-services. Georeferenced stereo images are ideally suited for the 3d mapping of street furniture and visible infrastructure objects, pavement inspection, asset management tasks or image based change detection. As in most mobile mapping systems, the georeferencing of the mapping sensors and observations – in our case of the imaging sensors – normally relies on direct georeferencing based on INS/GNSS navigation sensors. However, in urban canyons the achievable direct georeferencing accuracy of the dynamically captured stereo image sequences is often insufficient or at least degraded. Furthermore, many of the mentioned application scenarios require homogeneous georeferencing accuracy within a local reference frame over the entire mapping perimeter. To achieve these demands georeferencing approaches are presented and cost efficient workflows are discussed which allows validating and updating the INS/GNSS based trajectory with independently estimated positions in cases of prolonged GNSS signal outages in order to increase the georeferencing accuracy up to the project requirements.

  4. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments Riparian Buffer for the Conterminous United States: Facility Registry Services (FRS) : Toxic Release Inventory (TRI) , National Pollutant Discharge Elimination System (NPDES) , and Superfund Sites

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset represents the estimated density of georeferenced sites within individual, local NHDPlusV2 catchments and upstream, contributing watersheds riparian...

  5. Georeferenced Point Clouds: A Survey of Features and Point Cloud Management

    Directory of Open Access Journals (Sweden)

    Johannes Otepka

    2013-10-01

    Full Text Available This paper presents a survey of georeferenced point clouds. Concentration is, on the one hand, put on features, which originate in the measurement process themselves, and features derived by processing the point cloud. On the other hand, approaches for the processing of georeferenced point clouds are reviewed. This includes the data structures, but also spatial processing concepts. We suggest a categorization of features into levels that reflect the amount of processing. Point clouds are found across many disciplines, which is reflected in the versatility of the literature suggesting specific features.

  6. Real-time geo-referenced video mosaicking with the MATISSE system

    DEFF Research Database (Denmark)

    Vincent, Anne-Gaelle; Pessel, Nathalie; Borgetto, Manon

    This paper presents the MATISSE system: Mosaicking Advanced Technologies Integrated in a Single Software Environment. This system aims at producing in-line and off-line geo-referenced video mosaics of seabed given a video input and navigation data. It is based upon several techniques of image...

  7. Integrated GNSS attitude determination and positioning for direct geo-referencing

    NARCIS (Netherlands)

    Nadarajah, N.; Paffenholz, J.A.; Teunissen, P.J.G.

    2014-01-01

    Direct geo-referencing is an efficient methodology for the fast acquisition of 3D spatial data. It requires the fusion of spatial data acquisition sensors with navigation sensors, such as Global Navigation Satellite System (GNSS) receivers. In this contribution, we consider an integrated GNSS

  8. Obesity and Fast Food in Urban Markets: A New Approach Using Geo-referenced Micro Data

    NARCIS (Netherlands)

    Chen, S.E.; Florax, R.J.G.M.; Snyder, S.D.

    2013-01-01

    This paper presents a new method of assessing the relationship between features of the built environment and obesity, particularly in urban areas. Our empirical application combines georeferenced data on the location of fast-food restaurants with data about personal health, behavioral, and

  9. Robust and Accurate Image-Based Georeferencing Exploiting Relative Orientation Constraints

    Science.gov (United States)

    Cavegn, S.; Blaser, S.; Nebiker, S.; Haala, N.

    2018-05-01

    Urban environments with extended areas of poor GNSS coverage as well as indoor spaces that often rely on real-time SLAM algorithms for camera pose estimation require sophisticated georeferencing in order to fulfill our high requirements of a few centimeters for absolute 3D point measurement accuracies. Since we focus on image-based mobile mapping, we extended the structure-from-motion pipeline COLMAP with georeferencing capabilities by integrating exterior orientation parameters from direct sensor orientation or SLAM as well as ground control points into bundle adjustment. Furthermore, we exploit constraints for relative orientation parameters among all cameras in bundle adjustment, which leads to a significant robustness and accuracy increase especially by incorporating highly redundant multi-view image sequences. We evaluated our integrated georeferencing approach on two data sets, one captured outdoors by a vehicle-based multi-stereo mobile mapping system and the other captured indoors by a portable panoramic mobile mapping system. We obtained mean RMSE values for check point residuals between image-based georeferencing and tachymetry of 2 cm in an indoor area, and 3 cm in an urban environment where the measurement distances are a multiple compared to indoors. Moreover, in comparison to a solely image-based procedure, our integrated georeferencing approach showed a consistent accuracy increase by a factor of 2-3 at our outdoor test site. Due to pre-calibrated relative orientation parameters, images of all camera heads were oriented correctly in our challenging indoor environment. By performing self-calibration of relative orientation parameters among respective cameras of our vehicle-based mobile mapping system, remaining inaccuracies from suboptimal test field calibration were successfully compensated.

  10. ROBUST AND ACCURATE IMAGE-BASED GEOREFERENCING EXPLOITING RELATIVE ORIENTATION CONSTRAINTS

    Directory of Open Access Journals (Sweden)

    S. Cavegn

    2018-05-01

    Full Text Available Urban environments with extended areas of poor GNSS coverage as well as indoor spaces that often rely on real-time SLAM algorithms for camera pose estimation require sophisticated georeferencing in order to fulfill our high requirements of a few centimeters for absolute 3D point measurement accuracies. Since we focus on image-based mobile mapping, we extended the structure-from-motion pipeline COLMAP with georeferencing capabilities by integrating exterior orientation parameters from direct sensor orientation or SLAM as well as ground control points into bundle adjustment. Furthermore, we exploit constraints for relative orientation parameters among all cameras in bundle adjustment, which leads to a significant robustness and accuracy increase especially by incorporating highly redundant multi-view image sequences. We evaluated our integrated georeferencing approach on two data sets, one captured outdoors by a vehicle-based multi-stereo mobile mapping system and the other captured indoors by a portable panoramic mobile mapping system. We obtained mean RMSE values for check point residuals between image-based georeferencing and tachymetry of 2 cm in an indoor area, and 3 cm in an urban environment where the measurement distances are a multiple compared to indoors. Moreover, in comparison to a solely image-based procedure, our integrated georeferencing approach showed a consistent accuracy increase by a factor of 2–3 at our outdoor test site. Due to pre-calibrated relative orientation parameters, images of all camera heads were oriented correctly in our challenging indoor environment. By performing self-calibration of relative orientation parameters among respective cameras of our vehicle-based mobile mapping system, remaining inaccuracies from suboptimal test field calibration were successfully compensated.

  11. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7......) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330....

  12. Real-Time and Post-Processed Georeferencing for Hyperpspectral Drone Remote Sensing

    Science.gov (United States)

    Oliveira, R. A.; Khoramshahi, E.; Suomalainen, J.; Hakala, T.; Viljanen, N.; Honkavaara, E.

    2018-05-01

    The use of drones and photogrammetric technologies are increasing rapidly in different applications. Currently, drone processing workflow is in most cases based on sequential image acquisition and post-processing, but there are great interests towards real-time solutions. Fast and reliable real-time drone data processing can benefit, for instance, environmental monitoring tasks in precision agriculture and in forest. Recent developments in miniaturized and low-cost inertial measurement systems and GNSS sensors, and Real-time kinematic (RTK) position data are offering new perspectives for the comprehensive remote sensing applications. The combination of these sensors and light-weight and low-cost multi- or hyperspectral frame sensors in drones provides the opportunity of creating near real-time or real-time remote sensing data of target object. We have developed a system with direct georeferencing onboard drone to be used combined with hyperspectral frame cameras in real-time remote sensing applications. The objective of this study is to evaluate the real-time georeferencing comparing with post-processing solutions. Experimental data sets were captured in agricultural and forested test sites using the system. The accuracy of onboard georeferencing data were better than 0.5 m. The results showed that the real-time remote sensing is promising and feasible in both test sites.

  13. 3D Maize Plant Reconstruction Based on Georeferenced Overlapping LiDAR Point Clouds

    Directory of Open Access Journals (Sweden)

    Miguel Garrido

    2015-12-01

    Full Text Available 3D crop reconstruction with a high temporal resolution and by the use of non-destructive measuring technologies can support the automation of plant phenotyping processes. Thereby, the availability of such 3D data can give valuable information about the plant development and the interaction of the plant genotype with the environment. This article presents a new methodology for georeferenced 3D reconstruction of maize plant structure. For this purpose a total station, an IMU, and several 2D LiDARs with different orientations were mounted on an autonomous vehicle. By the multistep methodology presented, based on the application of the ICP algorithm for point cloud fusion, it was possible to perform the georeferenced point clouds overlapping. The overlapping point cloud algorithm showed that the aerial points (corresponding mainly to plant parts were reduced to 1.5%–9% of the total registered data. The remaining were redundant or ground points. Through the inclusion of different LiDAR point of views of the scene, a more realistic representation of the surrounding is obtained by the incorporation of new useful information but also of noise. The use of georeferenced 3D maize plant reconstruction at different growth stages, combined with the total station accuracy could be highly useful when performing precision agriculture at the crop plant level.

  14. How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers

    Science.gov (United States)

    Pepe, Alberto; Goodman, Alyssa; Muench, August; Crosas, Merce; Erdmann, Christopher

    2014-08-01

    We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  15. How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.

    Science.gov (United States)

    Pepe, Alberto; Goodman, Alyssa; Muench, August; Crosas, Merce; Erdmann, Christopher

    2014-01-01

    We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  16. How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.

    Directory of Open Access Journals (Sweden)

    Alberto Pepe

    Full Text Available We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it; unfamiliarity with options that make data-sharing easier (faster and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust, system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  17. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface (ISD) Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  18. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  19. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  20. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  1. panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and Publication of File-Based Datasets for GFZ Data Services

    Directory of Open Access Journals (Sweden)

    Damian Ulbricht

    2016-03-01

    Full Text Available The GFZ German Research Centre for Geosciences is the national laboratory for Geosciences in Germany. As part of the Helmholtz Association, providing and maintaining large-scale scientific infrastructures are an essential part of GFZ activities. This includes the generation of significant volumes and numbers of research data, which subsequently become source materials for data publications. The development and maintenance of data systems is a key component of GFZ Data Services to support state-of-the-art research. A challenge lies not only in the diversity of scientific subjects and communities, but also in different types and manifestations of how data are managed by research groups and individual scientists. The data repository of GFZ Data Services provides a flexible IT infrastructure for data storage and publication, including minting of digital object identifiers (DOI. It was built as a modular system of several independent software components linked together through Application Programming Interfaces (APIs provided by the eSciDoc framework. Principal application software are panMetaDocs for data management and DOIDB for logging and moderating data publications activities. Wherever possible, existing software solutions were integrated or adapted. A summary of our experiences made in operating this service is given. Data are described through comprehensive landing pages and supplementary documents, like journal articles or data reports, thus augmenting the scientific usability of the service.

  2. Automatic UAV Image Geo-Registration by Matching UAV Images to Georeferenced Image Data

    Directory of Open Access Journals (Sweden)

    Xiangyu Zhuo

    2017-04-01

    Full Text Available Recent years have witnessed the fast development of UAVs (unmanned aerial vehicles. As an alternative to traditional image acquisition methods, UAVs bridge the gap between terrestrial and airborne photogrammetry and enable flexible acquisition of high resolution images. However, the georeferencing accuracy of UAVs is still limited by the low-performance on-board GNSS and INS. This paper investigates automatic geo-registration of an individual UAV image or UAV image blocks by matching the UAV image(s with a previously taken georeferenced image, such as an individual aerial or satellite image with a height map attached or an aerial orthophoto with a DSM (digital surface model attached. As the biggest challenge for matching UAV and aerial images is in the large differences in scale and rotation, we propose a novel feature matching method for nadir or slightly tilted images. The method is comprised of a dense feature detection scheme, a one-to-many matching strategy and a global geometric verification scheme. The proposed method is able to find thousands of valid matches in cases where SIFT and ASIFT fail. Those matches can be used to geo-register the whole UAV image block towards the reference image data. When the reference images offer high georeferencing accuracy, the UAV images can also be geolocalized in a global coordinate system. A series of experiments involving different scenarios was conducted to validate the proposed method. The results demonstrate that our approach achieves not only decimeter-level registration accuracy, but also comparable global accuracy as the reference images.

  3. Passive Containment DataSet

    Science.gov (United States)

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system.This dataset is associated with the following publication:Grayman, W., R. Murray , and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  4. Mobile TDR for geo-referenced measurement of soil water content and electrical conductivity

    DEFF Research Database (Denmark)

    Thomsen, Anton; Schelde, Kirsten; Drøscher, Per

    2007-01-01

    The development of site-specific crop management is constrained by the availability of sensors for monitoring important soil and crop related conditions. A mobile time-domain reflectometry (TDR) unit for geo-referenced soil measurements has been developed and used for detailed mapping of soil wat...... analysis of the soil water measurements, recommendations are made with respect to sampling strategies. Depending on the variability of a given area, between 15 and 30 ha can be mapped with respect to soil moisture and electrical conductivity with sufficient detail within 8 h...

  5. Mridangam stroke dataset

    OpenAIRE

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist in a semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16 bit wav files. The dataset can be used for training models for each Mridangam stroke. /n/nA detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. /nAkshay Anantapadman...

  6. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  7. Develop Direct Geo-referencing System Based on Open Source Software and Hardware Platform

    Directory of Open Access Journals (Sweden)

    H. S. Liu

    2015-08-01

    Full Text Available Direct geo-referencing system uses the technology of remote sensing to quickly grasp images, GPS tracks, and camera position. These data allows the construction of large volumes of images with geographic coordinates. So that users can be measured directly on the images. In order to properly calculate positioning, all the sensor signals must be synchronized. Traditional aerial photography use Position and Orientation System (POS to integrate image, coordinates and camera position. However, it is very expensive. And users could not use the result immediately because the position information does not embed into image. To considerations of economy and efficiency, this study aims to develop a direct geo-referencing system based on open source software and hardware platform. After using Arduino microcontroller board to integrate the signals, we then can calculate positioning with open source software OpenCV. In the end, we use open source panorama browser, panini, and integrate all these to open source GIS software, Quantum GIS. A wholesome collection of data – a data processing system could be constructed.

  8. Develop Direct Geo-referencing System Based on Open Source Software and Hardware Platform

    Science.gov (United States)

    Liu, H. S.; Liao, H. M.

    2015-08-01

    Direct geo-referencing system uses the technology of remote sensing to quickly grasp images, GPS tracks, and camera position. These data allows the construction of large volumes of images with geographic coordinates. So that users can be measured directly on the images. In order to properly calculate positioning, all the sensor signals must be synchronized. Traditional aerial photography use Position and Orientation System (POS) to integrate image, coordinates and camera position. However, it is very expensive. And users could not use the result immediately because the position information does not embed into image. To considerations of economy and efficiency, this study aims to develop a direct geo-referencing system based on open source software and hardware platform. After using Arduino microcontroller board to integrate the signals, we then can calculate positioning with open source software OpenCV. In the end, we use open source panorama browser, panini, and integrate all these to open source GIS software, Quantum GIS. A wholesome collection of data - a data processing system could be constructed.

  9. DIRECT GEOREFERENCING : A NEW STANDARD IN PHOTOGRAMMETRY FOR HIGH ACCURACY MAPPING

    Directory of Open Access Journals (Sweden)

    A. Rizaldy

    2012-07-01

    Full Text Available Direct georeferencing is a new method in photogrammetry, especially in the digital camera era. Theoretically, this method does not require ground control points (GCP and the Aerial Triangulation (AT, to process aerial photography into ground coordinates. Compared with the old method, this method has three main advantages: faster data processing, simple workflow and less expensive project, at the same accuracy. Direct georeferencing using two devices, GPS and IMU. GPS recording the camera coordinates (X, Y, Z, and IMU recording the camera orientation (omega, phi, kappa. Both parameters merged into Exterior Orientation (EO parameter. This parameters required for next steps in the photogrammetric projects, such as stereocompilation, DSM generation, orthorectification and mosaic. Accuracy of this method was tested on topographic map project in Medan, Indonesia. Large-format digital camera Ultracam X from Vexcel is used, while the GPS / IMU is IGI AeroControl. 19 Independent Check Point (ICP were used to determine the accuracy. Horizontal accuracy is 0.356 meters and vertical accuracy is 0.483 meters. Data with this accuracy can be used for 1:2.500 map scale project.

  10. Accuracy assessment of minimum control points for UAV photography and georeferencing

    Science.gov (United States)

    Skarlatos, D.; Procopiou, E.; Stavrou, G.; Gregoriou, M.

    2013-08-01

    In recent years, Autonomous Unmanned Aerial Vehicles (AUAV) became popular among researchers across disciplines because they combine many advantages. One major application is monitoring and mapping. Their ability to fly beyond eye sight autonomously, collecting data over large areas whenever, wherever, makes them excellent platform for monitoring hazardous areas or disasters. In both cases rapid mapping is needed while human access isn't always a given. Indeed, current automatic processing of aerial photos using photogrammetry and computer vision algorithms allows for rapid orthophomap production and Digital Surface Model (DSM) generation, as tools for monitoring and damage assessment. In such cases, control point measurement using GPS is either impossible, or time consuming or costly. This work investigates accuracies that can be attained using few or none control points over areas of one square kilometer, in two test sites; a typical block and a corridor survey. On board GPS data logged during AUAV's flight are being used for direct georeferencing, while ground check points are being used for evaluation. In addition various control point layouts are being tested using bundle adjustment for accuracy evaluation. Results indicate that it is possible to use on board single frequency GPS for direct georeferencing in cases of disaster management or areas without easy access, or even over featureless areas. Due to large numbers of tie points in the bundle adjustment, horizontal accuracy can be fulfilled with a rather small number of control points, but vertical accuracy may not.

  11. Standardization of a geo-referenced fishing data set for the Indian Ocean bigeye tuna, Thunnus obesus (1952-2014)

    Science.gov (United States)

    Wibawa, Teja A.; Lehodey, Patrick; Senina, Inna

    2017-02-01

    Geo-referenced catch and fishing effort data of the bigeye tuna fisheries in the Indian Ocean over 1952-2014 were analyzed and standardized to facilitate population dynamics modeling studies. During this 62-year historical period of exploitation, many changes occurred both in the fishing techniques and the monitoring of activity. This study includes a series of processing steps used for standardization of spatial resolution, conversion and standardization of catch and effort units, raising of geo-referenced catch into nominal catch level, screening and correction of outliers, and detection of major catchability changes over long time series of fishing data, i.e., the Japanese longline fleet operating in the tropical Indian Ocean. A total of 30 fisheries were finally determined from longline, purse seine and other-gears data sets, from which 10 longline and 4 purse seine fisheries represented 96 % of the whole historical geo-referenced catch. Nevertheless, one-third of total nominal catch is still not included due to a total lack of geo-referenced information and would need to be processed separately, accordingly to the requirements of the study. The geo-referenced records of catch, fishing effort and associated length frequency samples of all fisheries are available at PANGAEA.864154" target="_blank">doi:10.1594/PANGAEA.864154.

  12. Provenance Challenges for Earth Science Dataset Publication

    Science.gov (United States)

    Tilmes, Curt

    2011-01-01

    Modern science is increasingly dependent on computational analysis of very large data sets. Organizing, referencing, publishing those data has become a complex problem. Published research that depends on such data often fails to cite the data in sufficient detail to allow an independent scientist to reproduce the original experiments and analyses. This paper explores some of the challenges related to data identification, equivalence and reproducibility in the domain of data intensive scientific processing. It will use the example of Earth Science satellite data, but the challenges also apply to other domains.

  13. Design and development of a geo-referenced database to radionuclides in food

    Science.gov (United States)

    Nascimento, L. M. E.; Ferreira, A. C. M.; Gonzalez, S. A.

    2018-03-01

    The primary purpose of the range of activities concerning the info management of the environmental assessment is to provide to scientific community an improved access to environmental data, as well as to support the decision making loop, in case of contamination events due either to accidental or intentional causes. In recent years, geotechnologies became a key reference in environmental research and monitoring, since they deliver an efficient data retrieval and subsequent processing about natural resources. This study aimed at the development of a georeferenced database (SIGLARA – SIstema Georeferenciado Latino Americano de Radionuclídeos em Alimentos), designed to radioactivity in food data storage, available in three languages (Spanish, Portuguese and English), employing free software[l].

  14. Design and development of a geo-referenced database to radionuclides in food

    Energy Technology Data Exchange (ETDEWEB)

    Nascimento, Lucia Maria Evangelista do; Ferreira, Ana Cristina de Melo; Gonzalez, Sergio de Albuquerque, E-mail: anacris@ird.gov.br [Instituto de Radioproteção e Dosimetria (RD/CNEN-RJ), Rio de Janeiro, RJ (Brazil)

    2017-07-01

    The primary purpose of the range of activities concerning the info management of the environmental assessment is to provide to scientific community an improved access to environmental data, as well as to support the decision making loop, in case of contamination events due either to accidental or intentional causes. In recent years, geotechnologies became a key reference in environmental research and monitoring, since they deliver an efficient data retrieval and subsequent processing about natural resources. This study aimed at the development of a georeferenced database (SIGLARA - Sistema Georeferenciado Latino Americano de Radionuclídeos em Alimentos), designed to radioactivity in food data storage, available in three languages (Spanish, Portuguese and English), employing free software. (author)

  15. Design and development of a geo-referenced database to radionuclides in food

    International Nuclear Information System (INIS)

    Nascimento, Lucia Maria Evangelista do; Ferreira, Ana Cristina de Melo; Gonzalez, Sergio de Albuquerque

    2017-01-01

    The primary purpose of the range of activities concerning the info management of the environmental assessment is to provide to scientific community an improved access to environmental data, as well as to support the decision making loop, in case of contamination events due either to accidental or intentional causes. In recent years, geotechnologies became a key reference in environmental research and monitoring, since they deliver an efficient data retrieval and subsequent processing about natural resources. This study aimed at the development of a georeferenced database (SIGLARA - Sistema Georeferenciado Latino Americano de Radionuclídeos em Alimentos), designed to radioactivity in food data storage, available in three languages (Spanish, Portuguese and English), employing free software. (author)

  16. Feature extraction and descriptor calculation methods for automatic georeferencing of Philippines' first microsatellite imagery

    Science.gov (United States)

    Tupas, M. E. A.; Dasallas, J. A.; Jiao, B. J. D.; Magallon, B. J. P.; Sempio, J. N. H.; Ramos, M. K. F.; Aranas, R. K. D.; Tamondong, A. M.

    2017-10-01

    The FAST-SIFT corner detector and descriptor extractor combination was used to automatically georeference DIWATA-1 Spaceborne Multispectral Imager images. Features from the Fast Accelerated Segment Test (FAST) algorithm detects corners or keypoints in an image, and these robustly detected keypoints have well-defined positions. Descriptors were computed using Scale-Invariant Feature Transform (SIFT) extractor. FAST-SIFT method effectively SMI same-subscene images detected by the NIR sensor. The method was also tested in stitching NIR images with varying subscene swept by the camera. The slave images were matched to the master image. The keypoints served as the ground control points. Random sample consensus was used to eliminate fall-out matches and ensure accuracy of the feature points from which the transformation parameters were derived. Keypoints are matched based on their descriptor vector. Nearest-neighbor matching is employed based on a metric distance between the descriptors. The metrics include Euclidean and city block, among others. Rough matching outputs not only the correct matches but also the faulty matches. A previous work in automatic georeferencing incorporates a geometric restriction. In this work, we applied a simplified version of the learning method. RANSAC was used to eliminate fall-out matches and ensure accuracy of the feature points. This method identifies if a point fits the transformation function and returns inlier matches. The transformation matrix was solved by Affine, Projective, and Polynomial models. The accuracy of the automatic georeferencing method were determined by calculating the RMSE of interest points, selected randomly, between the master image and transformed slave image.

  17. Accuracy analysis of indirect georeferencing about TH-1 satellite in Weinan test area

    International Nuclear Information System (INIS)

    Yunlan, Yang; Haiyan, Hu

    2014-01-01

    Optical linear scanning sensors can be divided into single-lens sensors and multi-lens sensors according to the number of lenses. In order to build stereo imaging, for single-lens optical systems such as aerial mapping camera ADS40 and ADS80, there are more than two parallel linear arrays placed on the focal plane. And for a multi-lens optical system there is only one linear CCD arrays placed on the center of every focal plan for each lens which is often carried on spacecraft. The difference of design between these two kinds of optical systems leads to the systematic errors, calibration in orbit and approach of data adjustment are different completely. Recent years the domestic space optical sensor systems are focused on multi-lens linear CCD sensor in China, such as TH-1 and ZY-3 both belong to multi-lens optical systems. The parameters influencing the position accuracy of the satellite system which are unknown or unknown precisely even changed after sensor posted launch can be estimated by self-calibration in orbit. So after self-calibration in orbit the accuracy of mapping satellite will often be improved strongly. Comparing to direct georeferencing, the indirect georeferencing as a research approach is introduced to TH-1 satellite in this paper considering the systematic errors completely. Parameters about geometry position systematic error are introduced to the basic co-linearity equations for multi-lenses linear array CCD sensor, and based on the extended model the method of space multi-lens linear array CCD sensor self-calibration bundle adjustment is presented. The test field is in some area of Weinan, Shaanxi province, and the observation data of GCPs and orbit are collected. The extended rigors model is used in bundle adjustment and the accuracy analysis shown that TH-1 has a satisfied metric performance

  18. The influence of the in situ camera calibration for direct georeferencing of aerial imagery

    Science.gov (United States)

    Mitishita, E.; Barrios, R.; Centeno, J.

    2014-11-01

    The direct determination of exterior orientation parameters (EOPs) of aerial images via GNSS/INS technologies is an essential prerequisite in photogrammetric mapping nowadays. Although direct sensor orientation technologies provide a high degree of automation in the process due to the GNSS/INS technologies, the accuracies of the obtained results depend on the quality of a group of parameters that models accurately the conditions of the system at the moment the job is performed. One sub-group of parameters (lever arm offsets and boresight misalignments) models the position and orientation of the sensors with respect to the IMU body frame due to the impossibility of having all sensors on the same position and orientation in the airborne platform. Another sub-group of parameters models the internal characteristics of the sensor (IOP). A system calibration procedure has been recommended by worldwide studies to obtain accurate parameters (mounting and sensor characteristics) for applications of the direct sensor orientation. Commonly, mounting and sensor characteristics are not stable; they can vary in different flight conditions. The system calibration requires a geometric arrangement of the flight and/or control points to decouple correlated parameters, which are not available in the conventional photogrammetric flight. Considering this difficulty, this study investigates the feasibility of the in situ camera calibration to improve the accuracy of the direct georeferencing of aerial images. The camera calibration uses a minimum image block, extracted from the conventional photogrammetric flight, and control point arrangement. A digital Vexcel UltraCam XP camera connected to POS AV TM system was used to get two photogrammetric image blocks. The blocks have different flight directions and opposite flight line. In situ calibration procedures to compute different sets of IOPs are performed and their results are analyzed and used in photogrammetric experiments. The IOPs

  19. Methods for Georeferencing and Spectral Scaling of Remote Imagery using ArcView, ArcGIS, and ENVI

    Science.gov (United States)

    Remote sensing images can be used to support variable-rate (VR) application of material from aircraft. Geographic coordinates must be assigned to an image (georeferenced) so that the variable-rate system can determine where in the field to apply these inputs and adjust the system when a zone has bee...

  20. Harvard Aging Brain Study: Dataset and accessibility.

    Science.gov (United States)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. Georeferenced measurement of soil EC as a tool to detect susceptible areas to water erosion.

    Science.gov (United States)

    Fabian Sallesses, Leonardo; Aparicio, Virginia Carolina; Costa, Jose Luis

    2017-04-01

    The Southeast region of Buenos Aires Province, Argentina, is one of the main region for the cultivation of potato (Solanum tuberosum L.) in that country. The implementation of complementary irrigation for potato cultivation meant an increase in yield of up to 60%. Therefore, all potato production in the region is under irrigation. In this way, the area under central pivot irrigation has increased to 150% in the last two decades. The water used for irrigation in that region is underground with a high concentration of sodium bicarbonate. The combination of irrigation and rain increases the sodium absorption ratio of soil (SARs), consequently raising the clay dispersion and reducing infiltration. A reduction in infiltration means greater partitioning of precipitation into runoff. The degree of slope of the terrain, added to its length, increases the erosive potential of runoff water. The content of dissolved salts, in combination with the water content, affect the apparent Electrical Conductivity of the soil (EC), which is directly related to the concentration of Na + 2 in the soil solution. In August 2016, severe rill erosion was detected in a productive plot of 300 ha. The predecessor crop was a potato under irrigation campaign. However the history of the lot consists of various winter and summer crops, always made in dry land and no till. Cumulative rainfall from harvest to erosion detection (four months) was 250 mm. A georeferenced EC measurement was performed using the Verys 3100® contact sensor. With the data obtained, a geostatistical analysis was performed using Kriging spatial interpolation. The maps obtained were processed, dividing them into 4 EC ranges. The values and amplitude of the CEa ranges for each lot were determined according to the distribution observed in the generated histograms. It was observed a distribution of elevated EC ranges and consequently of a higher concentration of Na+ 2 coincident with the irrigation areas of the pivots. These

  2. Satellite-Based Precipitation Datasets

    Science.gov (United States)

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.

  3. Old Persian corpus [Dataset

    NARCIS (Netherlands)

    Bavant, M.

    2011-01-01

    XML Old Persian corpus. The corpus is based on publicly available data on the Web. Those data can be traced back to the grammar of Old Persian by Kent (1950). The corpus contains those data and is arranged in a way suitable for corpus searches.

  4. Low aerial imagery - an assessment of georeferencing errors and the potential for use in environmental inventory

    Science.gov (United States)

    Smaczyński, Maciej; Medyńska-Gulij, Beata

    2017-06-01

    Unmanned aerial vehicles are increasingly being used in close range photogrammetry. Real-time observation of the Earth's surface and the photogrammetric images obtained are used as material for surveying and environmental inventory. The following study was conducted on a small area (approximately 1 ha). In such cases, the classical method of topographic mapping is not accurate enough. The geodetic method of topographic surveying, on the other hand, is an overly precise measurement technique for the purpose of inventorying the natural environment components. The author of the following study has proposed using the unmanned aerial vehicle technology and tying in the obtained images to the control point network established with the aid of GNSS technology. Georeferencing the acquired images and using them to create a photogrammetric model of the studied area enabled the researcher to perform calculations, which yielded a total root mean square error below 9 cm. The performed comparison of the real lengths of the vectors connecting the control points and their lengths calculated on the basis of the photogrammetric model made it possible to fully confirm the RMSE calculated and prove the usefulness of the UAV technology in observing terrain components for the purpose of environmental inventory. Such environmental components include, among others, elements of road infrastructure, green areas, but also changes in the location of moving pedestrians and vehicles, as well as other changes in the natural environment that are not registered on classical base maps or topographic maps.

  5. An Emergency Georeferencing Framework for GF-4 Imagery Based on GCP Prediction and Dynamic RPC Refinement

    Directory of Open Access Journals (Sweden)

    Pengfei Li

    2017-10-01

    Full Text Available GaoFen-4 (GF-4 imagery has very potential in terms of emergency response due to its gazing mode. However, only poor geometric accuracy can be obtained using the rational polynomial coefficient (RPC parameters provided, making ground control points (GCPs necessary for emergency response. However, selecting GCPs is traditionally time-consuming, labor-intensive, and not fully reliable. This is mainly due to the facts that (1 manual GCP selection is time-consuming and cumbersome because of too many human interventions, especially for the first few GCPs; (2 typically, GF-4 gives planar array imagery acquired at rather large tilt angles, and the distortion introduces problems in image matching; (3 reference data will not always be available, especially under emergency circumstances. This paper provides a novel emergency georeferencing framework for GF-4 Level 1 imagery. The key feature is GCP prediction based on dynamic RPC refinement, which is able to predict even the first GCP and the prediction will be dynamically refined as the selection goes on. This is done by two techniques: (1 GCP prediction using RPC parameters and (2 dynamic RPC refinement using as few as only one GCP. Besides, online map services are also adopted to automatically provide reference data. Experimental results show that (1 GCP predictions improve using dynamic RPC refinement; (2 GCP selection becomes more efficient with GCP prediction; (3 the integration of online map services constitutes a good example for emergency response.

  6. GEOREFERENCING IN GNSS-CHALLENGED ENVIRONMENT: INTEGRATING UWB AND IMU TECHNOLOGIES

    Directory of Open Access Journals (Sweden)

    C. K. Toth

    2017-05-01

    Full Text Available Acquiring geospatial data in GNSS compromised environments remains a problem in mapping and positioning in general. Urban canyons, heavily vegetated areas, indoor environments represent different levels of GNSS signal availability from weak to no signal reception. Even outdoors, with multiple GNSS systems, with an ever-increasing number of satellites, there are many situations with limited or no access to GNSS signals. Independent navigation sensors, such as IMU can provide high-data rate information but their initial accuracy degrades quickly, as the measurement data drift over time unless positioning fixes are provided from another source. At The Ohio State University’s Satellite Positioning and Inertial Navigation (SPIN Laboratory, as one feasible solution, Ultra- Wideband (UWB radio units are used to aid positioning and navigating in GNSS compromised environments, including indoor and outdoor scenarios. Here we report about experiences obtained with georeferencing a pushcart based sensor system under canopied areas. The positioning system is based on UWB and IMU sensor integration, and provides sensor platform orientation for an electromagnetic inference (EMI sensor. Performance evaluation results are provided for various test scenarios, confirming acceptable results for applications where high accuracy is not required.

  7. A Geo-referenced 3D model of the Juan de Fuca Slab and associated seismicity

    Science.gov (United States)

    Blair, J.L.; McCrory, P.A.; Oppenheimer, D.H.; Waldhauser, F.

    2011-01-01

    We present a Geographic Information System (GIS) of a new 3-dimensional (3D) model of the subducted Juan de Fuca Plate beneath western North America and associated seismicity of the Cascadia subduction system. The geo-referenced 3D model was constructed from weighted control points that integrate depth information from hypocenter locations and regional seismic velocity studies. We used the 3D model to differentiate earthquakes that occur above the Juan de Fuca Plate surface from earthquakes that occur below the plate surface. This GIS project of the Cascadia subduction system supersedes the one previously published by McCrory and others (2006). Our new slab model updates the model with new constraints. The most significant updates to the model include: (1) weighted control points to incorporate spatial uncertainty, (2) an additional gridded slab surface based on the Generic Mapping Tools (GMT) Surface program which constructs surfaces based on splines in tension (see expanded description below), (3) double-differenced hypocenter locations in northern California to better constrain slab location there, and (4) revised slab shape based on new hypocenter profiles that incorporate routine depth uncertainties as well as data from new seismic-reflection and seismic-refraction studies. We also provide a 3D fly-through animation of the model for use as a visualization tool.

  8. Obesity and fast food in urban markets: a new approach using geo-referenced micro data.

    Science.gov (United States)

    Chen, Susan Elizabeth; Florax, Raymond J; Snyder, Samantha D

    2013-07-01

    This paper presents a new method of assessing the relationship between features of the built environment and obesity, particularly in urban areas. Our empirical application combines georeferenced data on the location of fast-food restaurants with data about personal health, behavioral, and neighborhood characteristics. We define a 'local food environment' for every individual utilizing buffers around a person's home address. Individual food landscapes are potentially endogenous because of spatial sorting of the population and food outlets, and the body mass index (BMI) values for individuals living close to each other are likely to be spatially correlated because of observed and unobserved individual and neighborhood effects. The potential biases associated with endogeneity and spatial correlation are handled using spatial econometric estimation techniques. Our application provides quantitative estimates of the effect of proximity to fast-food restaurants on obesity in an urban food market. We also present estimates of a policy simulation that focuses on reducing the density of fast-food restaurants in urban areas. In the simulations, we account for spatial heterogeneity in both the policy instruments and individual neighborhoods and find a small effect for the hypothesized relationships between individual BMI values and the density of fast-food restaurants. Copyright © 2012 John Wiley & Sons, Ltd.

  9. Accurate Reconstruction of the Roman Circus in Milan by Georeferencing Heterogeneous Data Sources with GIS

    Directory of Open Access Journals (Sweden)

    Gabriele Guidi

    2017-09-01

    Full Text Available This paper presents the methodological approach and the actual workflow for creating the 3D digital reconstruction in time of the ancient Roman Circus of Milan, which is presently covered completely by the urban fabric of the modern city. The diachronic reconstruction is based on a proper mix of quantitative data originated by current 3D surveys and historical sources, such as ancient maps, drawings, archaeological reports, restrictions decrees, and old photographs. When possible, such heterogeneous sources have been georeferenced and stored in a GIS system. In this way the sources have been analyzed in depth, allowing the deduction of geometrical information not explicitly revealed by the material available. A reliable reconstruction of the area in different historical periods has been therefore hypothesized. This research has been carried on in the framework of the project Cultural Heritage Through Time—CHT2, funded by the Joint Programming Initiative on Cultural Heritage (JPI-CH, supported by the Italian Ministry for Cultural Heritage (MiBACT, the Italian Ministry for University and Research (MIUR, and the European Commission.

  10. Direct Georeferencing of a Pushbroom, Lightweight Hyperspectral System for Mini-UAV Applications

    Directory of Open Access Journals (Sweden)

    Marion Jaud

    2018-01-01

    Full Text Available Hyperspectral imagery has proven its potential in many research applications, especially in the field of environmental sciences. Currently, hyperspectral imaging is generally performed by satellite or aircraft platforms, but mini-UAV (Unmanned Aerial Vehicle platforms (<20 kg are now emerging. On such platforms, payload restrictions are critical, so sensors must be selected according to stringent specifications. This article presents the integration of a light pushbroom hyperspectral sensor onboard a multirotor UAV, which we have called Hyper-DRELIO (Hyperspectral DRone for Environmental and LIttoral Observations. This article depicts the system design: the UAV platform, the imaging module, the navigation module, and the interfacing between the different elements. Pushbroom sensors offer a better combination of spatial and spectral resolution than full-frame cameras. Nevertheless, data georectification has to be performed line by line, the quality of direct georeferencing being limited by mechanical stability, good timing accuracy, and the resolution and accuracy of the proprioceptive sensors. A georegistration procedure is proposed for geometrical pre-processing of hyperspectral data. The specifications of Hyper-DRELIO surveys are described through two examples of surveys above coastal or inland waters, with different flight altitudes. This system can collect hyperspectral data in VNIR (Visible and Near InfraRed domain above small study sites (up to about 4 ha with both high spatial resolution (<10 cm and high spectral resolution (1.85 nm and with georectification accuracy on the order of 1 to 2 m.

  11. An Imaging Sensor-Aided Vision Navigation Approach that Uses a Geo-Referenced Image Database.

    Science.gov (United States)

    Li, Yan; Hu, Qingwu; Wu, Meng; Gao, Yang

    2016-01-28

    In determining position and attitude, vision navigation via real-time image processing of data collected from imaging sensors is advanced without a high-performance global positioning system (GPS) and an inertial measurement unit (IMU). Vision navigation is widely used in indoor navigation, far space navigation, and multiple sensor-integrated mobile mapping. This paper proposes a novel vision navigation approach aided by imaging sensors and that uses a high-accuracy geo-referenced image database (GRID) for high-precision navigation of multiple sensor platforms in environments with poor GPS. First, the framework of GRID-aided vision navigation is developed with sequence images from land-based mobile mapping systems that integrate multiple sensors. Second, a highly efficient GRID storage management model is established based on the linear index of a road segment for fast image searches and retrieval. Third, a robust image matching algorithm is presented to search and match a real-time image with the GRID. Subsequently, the image matched with the real-time scene is considered to calculate the 3D navigation parameter of multiple sensor platforms. Experimental results show that the proposed approach retrieves images efficiently and has navigation accuracies of 1.2 m in a plane and 1.8 m in height under GPS loss in 5 min and within 1500 m.

  12. The Direct Georeferencing Application and Performance Analysis of Uav Helicopter in Gcp-Free Area

    Science.gov (United States)

    Lo, C. F.; Tsai, M. L.; Chiang, K. W.; Chu, C. H.; Tsai, G. J.; Cheng, C. K.; El-Sheimy, N.; Ayman, H.

    2015-08-01

    There are many disasters happened because the weather changes extremely in these years. To facilitate applications such as environment detection or monitoring becomes very important. Therefore, the development of rapid low cost systems for collecting near real-time spatial information is very critical. Rapid spatial information collection has become an emerging trend for remote sensing and mapping applications. This study develops a Direct Georeferencing (DG) based Unmanned Aerial Vehicle (UAV) helicopter photogrammetric platform where an Inertial Navigation System (INS)/Global Navigation Satellite System (GNSS) integrated Positioning and Orientation System (POS) system is implemented to provide the DG capability of the platform. The performance verification indicates that the proposed platform can capture aerial images successfully. A flight test is performed to verify the positioning accuracy in DG mode without using Ground Control Points (GCP). The preliminary results illustrate that horizontal DG positioning accuracies in the x and y axes are around 5 meter with 100 meter flight height. The positioning accuracy in the z axis is less than 10 meter. Such accuracy is good for near real-time disaster relief. The DG ready function of proposed platform guarantees mapping and positioning capability even in GCP free environments, which is very important for rapid urgent response for disaster relief. Generally speaking, the data processing time for the DG module, including POS solution generalization, interpolation, Exterior Orientation Parameters (EOP) generation, and feature point measurements, is less than 1 hour.

  13. The Development of an UAV Borne Direct Georeferenced Photogrammetric Platform for Ground Control Point Free Applications

    Directory of Open Access Journals (Sweden)

    Chien-Hsun Chu

    2012-07-01

    Full Text Available To facilitate applications such as environment detection or disaster monitoring, the development of rapid low cost systems for collecting near real time spatial information is very critical. Rapid spatial information collection has become an emerging trend for remote sensing and mapping applications. In this study, a fixed-wing Unmanned Aerial Vehicle (UAV-based spatial information acquisition platform that can operate in Ground Control Point (GCP free environments is developed and evaluated. The proposed UAV based photogrammetric platform has a Direct Georeferencing (DG module that includes a low cost Micro Electro Mechanical Systems (MEMS Inertial Navigation System (INS/ Global Positioning System (GPS integrated system. The DG module is able to provide GPS single frequency carrier phase measurements for differential processing to obtain sufficient positioning accuracy. All necessary calibration procedures are implemented. Ultimately, a flight test is performed to verify the positioning accuracy in DG mode without using GCPs. The preliminary results of positioning accuracy in DG mode illustrate that horizontal positioning accuracies in the x and y axes are around 5 m at 300 m flight height above the ground. The positioning accuracy of the z axis is below 10 m. Therefore, the proposed platform is relatively safe and inexpensive for collecting critical spatial information for urgent response such as disaster relief and assessment applications where GCPs are not available.

  14. National Elevation Dataset

    Science.gov (United States)

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide National elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as horizontal datum, and all the data are recast in a geographic projection. Older DEM's produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEM's do not match well, and to fill sliver areas of missing data between DEM's. These processing steps ensure that NED has no void areas and artificial discontinuities have been minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM is produced by older methods, then "striping" may still occur.

  15. Automated Generation of Geo-Referenced Mosaics From Video Data Collected by Deep-Submergence Vehicles: Preliminary Results

    Science.gov (United States)

    Rhzanov, Y.; Beaulieu, S.; Soule, S. A.; Shank, T.; Fornari, D.; Mayer, L. A.

    2005-12-01

    Many advances in understanding geologic, tectonic, biologic, and sedimentologic processes in the deep ocean are facilitated by direct observation of the seafloor. However, making such observations is both difficult and expensive. Optical systems (e.g., video, still camera, or direct observation) will always be constrained by the severe attenuation of light in the deep ocean, limiting the field of view to distances that are typically less than 10 meters. Acoustic systems can 'see' much larger areas, but at the cost of spatial resolution. Ultimately, scientists want to study and observe deep-sea processes in the same way we do land-based phenomena so that the spatial distribution and juxtaposition of processes and features can be resolved. We have begun development of algorithms that will, in near real-time, generate mosaics from video collected by deep-submergence vehicles. Mosaics consist of >>10 video frames and can cover 100's of square-meters. This work builds on a publicly available still and video mosaicking software package developed by Rzhanov and Mayer. Here we present the results of initial tests of data collection methodologies (e.g., transects across the seafloor and panoramas across features of interest), algorithm application, and GIS integration conducted during a recent cruise to the Eastern Galapagos Spreading Center (0 deg N, 86 deg W). We have developed a GIS database for the region that will act as a means to access and display mosaics within a geospatially-referenced framework. We have constructed numerous mosaics using both video and still imagery and assessed the quality of the mosaics (including registration errors) under different lighting conditions and with different navigation procedures. We have begun to develop algorithms for efficient and timely mosaicking of collected video as well as integration with navigation data for georeferencing the mosaics. Initial results indicate that operators must be properly versed in the control of the

  16. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  17. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  18. Assessing the Accuracy of Georeferenced Point Clouds Produced via Multi-View Stereopsis from Unmanned Aerial Vehicle (UAV Imagery

    Directory of Open Access Journals (Sweden)

    Arko Lucieer

    2012-05-01

    Full Text Available Sensor miniaturisation, improved battery technology and the availability of low-cost yet advanced Unmanned Aerial Vehicles (UAV have provided new opportunities for environmental remote sensing. The UAV provides a platform for close-range aerial photography. Detailed imagery captured from micro-UAV can produce dense point clouds using multi-view stereopsis (MVS techniques combining photogrammetry and computer vision. This study applies MVS techniques to imagery acquired from a multi-rotor micro-UAV of a natural coastal site in southeastern Tasmania, Australia. A very dense point cloud ( < 1–3 cm point spacing is produced in an arbitrary coordinate system using full resolution imagery, whereas other studies usually downsample the original imagery. The point cloud is sparse in areas of complex vegetation and where surfaces have a homogeneous texture. Ground control points collected with Differential Global Positioning System (DGPS are identified and used for georeferencing via a Helmert transformation. This study compared georeferenced point clouds to a Total Station survey in order to assess and quantify their geometric accuracy. The results indicate that a georeferenced point cloud accurate to 25–40 mm can be obtained from imagery acquired from 50 m. UAV-based image capture provides the spatial and temporal resolution required to map and monitor natural landscapes. This paper assesses the accuracy of the generated point clouds based on field survey points. Based on our key findings we conclude that sub-decimetre terrain change (in this case coastal erosion can be monitored.

  19. Public Schools

    Data.gov (United States)

    Department of Homeland Security — This Public Schools feature dataset is composed of all Public elementary and secondary education in the United States as defined by the Common Core of Data, National...

  20. Something From Nothing (There): Collecting Global IPv6 Datasets from DNS

    NARCIS (Netherlands)

    Fiebig, T.; Borgolte, Kevin; Hao, Shuang; Kruegel, Christopher; Vigna, Giovanny; Spring, Neil; Riley, George F.

    2017-01-01

    Current large-scale IPv6 studies mostly rely on non-public datasets, asmost public datasets are domain specific. For instance, traceroute-based datasetsare biased toward network equipment. In this paper, we present a new methodologyto collect IPv6 address datasets that does not require access to

  1. Preparedness for the Rio 2016 Olympic Games: hospital treatment capacity in georeferenced areas

    Directory of Open Access Journals (Sweden)

    Carolina Figueiredo Freitas

    2016-01-01

    Full Text Available Abstract: Recently, Brazil has hosted mass events with recognized international relevance. The 2014 FIFA World Cup was held in 12 Brazilian state capitals and health sector preparedness drew on the history of other World Cups and Brazil's own experience with the 2013 FIFA Confederations Cup. The current article aims to analyze the treatment capacity of hospital facilities in georeferenced areas for sports events in the 2016 Olympic Games in the city of Rio de Janeiro, based on a model built drawing on references from the literature. Source of data were Brazilian health databases and the Rio 2016 website. Sports venues for the Olympic Games and surrounding hospitals in a 10km radius were located by geoprocessing and designated a "health area" referring to the probable inflow of persons to be treated in case of hospital referral. Six different factors were used to calculate needs for surge and one was used to calculate needs in case of disasters (20/1,000. Hospital treatment capacity is defined by the coincidence of beds and life support equipment, namely the number of cardiac monitors (electrocardiographs and ventilators in each hospital unit. Maracanã followed by the Olympic Stadium (Engenhão and the Sambódromo would have the highest single demand for hospitalizations (1,572, 1,200 and 600, respectively. Hospital treatment capacity proved capable of accommodating surges, but insufficient in cases of mass casualties. In mass events most treatments involve easy clinical management, it is expected that the current capacity will not have negative consequences for participants.

  2. Geo-referenced modelling of metal concentrations in river basins at the catchment scale

    Science.gov (United States)

    Hüffmeyer, N.; Berlekamp, J.; Klasmeier, J.

    2009-04-01

    1. Introduction The European Water Framework Directive demands the good ecological and chemical state of surface waters [1]. This implies the reduction of unwanted metal concentrations in surface waters. To define reasonable environmental target values and to develop promising mitigation strategies a detailed exposure assessment is required. This includes the identification of emission sources and the evaluation of their effect on local and regional surface water concentrations. Point source emissions via municipal or industrial wastewater that collect metal loads from a wide variety of applications and products are important anthropogenic pathways into receiving waters. Natural background and historical influences from ore-mining activities may be another important factor. Non-point emissions occur via surface runoff and erosion from drained land area. Besides deposition metals can be deposited by fertilizer application or the use of metal products such as wires or metal fences. Surface water concentrations vary according to the emission strength of sources located nearby and upstream of the considered location. A direct link between specific emission sources and pathways on the one hand and observed concentrations can hardly be established by monitoring alone. Geo-referenced models such as GREAT-ER (Geo-referenced Regional Exposure Assessment Tool for European Rivers) deliver spatially resolved concentrations in a whole river basin and allow for evaluating the causal relationship between specific emissions and resulting concentrations. This study summarizes the results of investigations for the metals zinc and copper in three German catchments. 2. The model GREAT-ER The geo-referenced model GREAT-ER has originally been developed to simulate and assess chemical burden of European river systems from multiple emission sources [2]. Emission loads from private households and rainwater runoff are individually estimated based on average consumption figures, runoff rates

  3. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.

  4. Ecohydrological Index, Native Fish, and Climate Trends and Relationships in the Kansas River Basin_dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset is an excel file that contain data for the figures in the manuscript. This dataset is associated with the following publication: Sinnathamby, S., K....

  5. Geoinformation web-system for processing and visualization of large archives of geo-referenced data

    Science.gov (United States)

    Gordov, E. P.; Okladnikov, I. G.; Titov, A. G.; Shulgina, T. M.

    2010-12-01

    Developed working model of information-computational system aimed at scientific research in area of climate change is presented. The system will allow processing and analysis of large archives of geophysical data obtained both from observations and modeling. Accumulated experience of developing information-computational web-systems providing computational processing and visualization of large archives of geo-referenced data was used during the implementation (Gordov et al, 2007; Okladnikov et al, 2008; Titov et al, 2009). Functional capabilities of the system comprise a set of procedures for mathematical and statistical analysis, processing and visualization of data. At present five archives of data are available for processing: 1st and 2nd editions of NCEP/NCAR Reanalysis, ECMWF ERA-40 Reanalysis, JMA/CRIEPI JRA-25 Reanalysis, and NOAA-CIRES XX Century Global Reanalysis Version I. To provide data processing functionality a computational modular kernel and class library providing data access for computational modules were developed. Currently a set of computational modules for climate change indices approved by WMO is available. Also a special module providing visualization of results and writing to Encapsulated Postscript, GeoTIFF and ESRI shape files was developed. As a technological basis for representation of cartographical information in Internet the GeoServer software conforming to OpenGIS standards is used. Integration of GIS-functionality with web-portal software to provide a basis for web-portal’s development as a part of geoinformation web-system is performed. Such geoinformation web-system is a next step in development of applied information-telecommunication systems offering to specialists from various scientific fields unique opportunities of performing reliable analysis of heterogeneous geophysical data using approved computational algorithms. It will allow a wide range of researchers to work with geophysical data without specific programming

  6. Development of a georeferenced data bank of radionuclides in typical food of Latin America - SIGLARA; Desenvolvimento de um banco de dados georeferenciado de radionuclideos em alimentos tipicos na America Latina - SIGLARA

    Energy Technology Data Exchange (ETDEWEB)

    Nascimento, Lucia Maria Evangelista do

    2014-07-01

    The related management information related to the environmental assessment activity aims to provide the world community with better access to meaningful environmental information and help use this information in making decisions in case of contamination due to accident or deliberate actions. In recent years, the geotechnologies acquired are fundamental to research and environmental monitoring, once it possible, efficiently obtaining large amount of data natural resources. The result of this work was the development of a database system to store georeferenced data values of radionuclides in typical foods in Latin America (SIGLARA), defined in three languages (Spanish, Portuguese and English), using free software. The developed system meets the primary need of the RLA 09/72 ARCAL Project, funded by the International Atomic Energy Agency (IAEA), as having eleven participants countries in Latin America. The database of georeferenced created for SIGLARA system was tested in its applicability through the entry and manipulation of real data analyzed, which showed that the system is able to store, retrieve, view reports and maps of the samples of registered food. Interfaces that connect the user with the database show up efficient, making the system easy operability. Their application to environmental management is already showing results, it is hoped that these results will encourage its widespread adoption by other countries, institutions, the scientific community and the general public. (author)

  7. Publicity.

    Science.gov (United States)

    Chisholm, Joan

    Publicity for preschool cooperatives is described. Publicity helps produce financial support for preschool cooperatives. It may take the form of posters, brochures, newsletters, open house, newspaper coverage, and radio and television. Word of mouth and general good will in the community are the best avenues of publicity that a cooperative nursery…

  8. The NOAA Dataset Identifier Project

    Science.gov (United States)

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  9. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has, and continues to produce a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and they evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  10. Development of a SPARK Training Dataset

    International Nuclear Information System (INIS)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-01-01

    In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has, and continues to produce a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and they evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer

  11. Sensor Fusion of a Mobile Device to Control and Acquire Videos or Images of Coffee Branches and for Georeferencing Trees

    Directory of Open Access Journals (Sweden)

    Paula Jimena Ramos Giraldo

    2017-04-01

    Full Text Available Smartphones show potential for controlling and monitoring variables in agriculture. Their processing capacity, instrumentation, connectivity, low cost, and accessibility allow farmers (among other users in rural areas to operate them easily with applications adjusted to their specific needs. In this investigation, the integration of inertial sensors, a GPS, and a camera are presented for the monitoring of a coffee crop. An Android-based application was developed with two operating modes: (i Navigation: for georeferencing trees, which can be as close as 0.5 m from each other; and (ii Acquisition: control of video acquisition, based on the movement of the mobile device over a branch, and measurement of image quality, using clarity indexes to select the most appropriate frames for application in future processes. The integration of inertial sensors in navigation mode, shows a mean relative error of ±0.15 m, and total error ±5.15 m. In acquisition mode, the system correctly identifies the beginning and end of mobile phone movement in 99% of cases, and image quality is determined by means of a sharpness factor which measures blurriness. With the developed system, it will be possible to obtain georeferenced information about coffee trees, such as their production, nutritional state, and presence of plagues or diseases.

  12. Sensor Fusion of a Mobile Device to Control and Acquire Videos or Images of Coffee Branches and for Georeferencing Trees.

    Science.gov (United States)

    Giraldo, Paula Jimena Ramos; Aguirre, Álvaro Guerrero; Muñoz, Carlos Mario; Prieto, Flavio Augusto; Oliveros, Carlos Eugenio

    2017-04-06

    Smartphones show potential for controlling and monitoring variables in agriculture. Their processing capacity, instrumentation, connectivity, low cost, and accessibility allow farmers (among other users in rural areas) to operate them easily with applications adjusted to their specific needs. In this investigation, the integration of inertial sensors, a GPS, and a camera are presented for the monitoring of a coffee crop. An Android-based application was developed with two operating modes: ( i ) Navigation: for georeferencing trees, which can be as close as 0.5 m from each other; and ( ii ) Acquisition: control of video acquisition, based on the movement of the mobile device over a branch, and measurement of image quality, using clarity indexes to select the most appropriate frames for application in future processes. The integration of inertial sensors in navigation mode, shows a mean relative error of ±0.15 m, and total error ±5.15 m. In acquisition mode, the system correctly identifies the beginning and end of mobile phone movement in 99% of cases, and image quality is determined by means of a sharpness factor which measures blurriness. With the developed system, it will be possible to obtain georeferenced information about coffee trees, such as their production, nutritional state, and presence of plagues or diseases.

  13. The Harvard organic photovoltaic dataset.

    Science.gov (United States)

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  14. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  15. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods has resulted in increasing amount of genetic interaction data to be generated every day. Biological networks are used to store genetic interaction data gathered. Increasing amount of data available requires fast large scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  16. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continues to evolve as well. There are on the order of 120 site contacts and 60proposals have been approved to use thedata. These proposals involve around 120 researchers. The size and complexity of the dataset and collaboration has led to a new approach to providing access to the data and collaboration support and the support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new web site is based on a scientific data server which enables browsing of the data on-line, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  17. DactyLoc : A minimally geo-referenced WiFi+GSM-fingerprint-based localization method for positioning in urban spaces

    DEFF Research Database (Denmark)

    Cujia, Kristian; Wirz, Martin; Kjærgaard, Mikkel Baun

    2012-01-01

    Fingerprinting-based localization methods relying on WiFi and GSM information provide sufficient localization accuracy for many mobile phone applications. Most of the existing approaches require a training set consisting of geo-referenced fingerprints to build a reference database. We propose...... a collaborative, semi-supervised WiFi+GSM fingerprinting method where only a small fraction of all fingerprints needs to be geo-referenced. Our approach enables indexing of areas in the absence of GPS reception as often found in urban spaces and indoors without manual labeling of fingerprints. The method takes...

  18. AN ACCURACY ASSESSMENT OF GEOREFERENCED POINT CLOUDS PRODUCED VIA MULTI-VIEW STEREO TECHNIQUES APPLIED TO IMAGERY ACQUIRED VIA UNMANNED AERIAL VEHICLE

    Directory of Open Access Journals (Sweden)

    S. Harwin

    2012-08-01

    Full Text Available Low-cost Unmanned Aerial Vehicles (UAVs are becoming viable environmental remote sensing tools. Sensor and battery technology is expanding the data capture opportunities. The UAV, as a close range remote sensing platform, can capture high resolution photography on-demand. This imagery can be used to produce dense point clouds using multi-view stereopsis techniques (MVS combining computer vision and photogrammetry. This study examines point clouds produced using MVS techniques applied to UAV and terrestrial photography. A multi-rotor micro UAV acquired aerial imagery from a altitude of approximately 30–40 m. The point clouds produced are extremely dense (<1–3 cm point spacing and provide a detailed record of the surface in the study area, a 70 m section of sheltered coastline in southeast Tasmania. Areas with little surface texture were not well captured, similarly, areas with complex geometry such as grass tussocks and woody scrub were not well mapped. The process fails to penetrate vegetation, but extracts very detailed terrain in unvegetated areas. Initially the point clouds are in an arbitrary coordinate system and need to be georeferenced. A Helmert transformation is applied based on matching ground control points (GCPs identified in the point clouds to GCPs surveying with differential GPS. These point clouds can be used, alongside laser scanning and more traditional techniques, to provide very detailed and precise representations of a range of landscapes at key moments. There are many potential applications for the UAV-MVS technique, including coastal erosion and accretion monitoring, mine surveying and other environmental monitoring applications. For the generated point clouds to be used in spatial applications they need to be converted to surface models that reduce dataset size without loosing too much detail. Triangulated meshes are one option, another is Poisson Surface Reconstruction. This latter option makes use of point normal

  19. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Full Text Available Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  20. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving, and ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event- modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.

  1. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team allowing to recover some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  2. Psychogeography in the Age of the Quantified Self — Mental Map modelling with Georeferenced Personal Activity Data

    Science.gov (United States)

    Meier, Sebastian; Glinka, Katrin

    2018-05-01

    Personal and subjective perceptions of urban space have been a focus of various research projects in the area of cartography, geography, and related fields such as urban planning. This paper illustrates how personal georeferenced activity data can be used in algorithmic modelling of certain aspects of mental maps and customised spatial visualisations. The technical implementation of the algorithm is accompanied by a preliminary study which evaluates the performance of the algorithm. As a linking element between personal perception, interpretation, and depiction of space and the field of cartography and geography, we include perspectives from artistic practice and cultural theory. By developing novel visualisation concepts based on personal data, the paper in part mitigates the challenges presented by user modelling that is, amongst others, used in LBS applications.

  3. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  4. Toward computational cumulative biology by combining models of biological datasets.

    Science.gov (United States)

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.

  5. A Research Graph dataset for connecting research data repositories using RD-Switchboard.

    Science.gov (United States)

    Aryani, Amir; Poblet, Marta; Unsworth, Kathryn; Wang, Jingbo; Evans, Ben; Devaraju, Anusuriya; Hausstein, Brigitte; Klas, Claus-Peter; Zapilko, Benjamin; Kaplun, Samuele

    2018-05-29

    This paper describes the open access graph dataset that shows the connections between Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures. The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim to discover and connect the related research datasets based on publication co-authorship or jointly funded grants. The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation. Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

  6. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi el al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There also are a further three papers that the authors know about under review. These papers all use an observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012) show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground, space and reanalysis-based rainfall products become available, all which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to a uncertainty in terms of the reliability and validity of the datasets such as radiance conversion algorithims, the quantity and quality of available station data, interpolation techniques and blending methods used to combine satellite and guage based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  7. The CMS dataset bookkeeping service

    Science.gov (United States)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  8. The CMS dataset bookkeeping service

    Energy Technology Data Exchange (ETDEWEB)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  9. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  10. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  11. Process mining in oncology using the MIMIC-III dataset

    Science.gov (United States)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option and this paper describes the potential to use MIMIC-III, for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.

  12. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  13. PLÉIADES PROJECT: ASSESSMENT OF GEOREFERENCING ACCURACY, IMAGE QUALITY, PANSHARPENING PERFORMENCE AND DSM/DTM QUALITY

    Directory of Open Access Journals (Sweden)

    H. Topan

    2016-06-01

    Full Text Available Pléiades 1A and 1B are twin optical satellites of Optical and Radar Federated Earth Observation (ORFEO program jointly running by France and Italy. They are the first satellites of Europe with sub-meter resolution. Airbus DS (formerly Astrium Geo runs a MyGIC (formerly Pléiades Users Group program to validate Pléiades images worldwide for various application purposes. The authors conduct three projects, one is within this program, the second is supported by BEU Scientific Research Project Program, and the third is supported by TÜBİTAK. Assessment of georeferencing accuracy, image quality, pansharpening performance and Digital Surface Model/Digital Terrain Model (DSM/DTM quality subjects are investigated in these projects. For these purposes, triplet panchromatic (50 cm Ground Sampling Distance (GSD and VNIR (2 m GSD Pléiades 1A images were investigated over Zonguldak test site (Turkey which is urbanised, mountainous and covered by dense forest. The georeferencing accuracy was estimated with a standard deviation in X and Y (SX, SY in the range of 0.45m by bias corrected Rational Polynomial Coefficient (RPC orientation, using ~170 Ground Control Points (GCPs. 3D standard deviation of ±0.44m in X, ±0.51m in Y, and ±1.82m in Z directions have been reached in spite of the very narrow angle of convergence by bias corrected RPC orientation. The image quality was also investigated with respect to effective resolution, Signal to Noise Ratio (SNR and blur coefficient. The effective resolution was estimated with factor slightly below 1.0, meaning that the image quality corresponds to the nominal resolution of 50cm. The blur coefficients were achieved between 0.39-0.46 for triplet panchromatic images, indicating a satisfying image quality. SNR is in the range of other comparable space borne images which may be caused by de-noising of Pléiades images. The pansharpened images were generated by various methods, and are validated by most common

  14. Automatic 3D relief acquisition and georeferencing of road sides by low-cost on-motion SfM

    Science.gov (United States)

    Voumard, Jérémie; Bornemann, Perrick; Malet, Jean-Philippe; Derron, Marc-Henri; Jaboyedoff, Michel

    2017-04-01

    3D terrain relief acquisition is important for a large part of geosciences. Several methods have been developed to digitize terrains, such as total station, LiDAR, GNSS or photogrammetry. To digitize road (or rail tracks) sides on long sections, mobile spatial imaging system or UAV are commonly used. In this project, we compare a still fairly new method -the SfM on-motion technics- with some traditional technics of terrain digitizing (terrestrial laser scanning, traditional SfM, UAS imaging solutions, GNSS surveying systems and total stations). The SfM on-motion technics generates 3D spatial data by photogrammetric processing of images taken from a moving vehicle. Our mobile system consists of six action cameras placed on a vehicle. Four fisheye cameras mounted on a mast on the vehicle roof are placed at 3.2 meters above the ground. Three of them have a GNNS chip providing geotagged images. Two pictures were acquired every second by each camera. 4K resolution fisheye videos were also used to extract 8.3M not geotagged pictures. All these pictures are then processed with the Agisoft PhotoScan Professional software. Results from the SfM on-motion technics are compared with results from classical SfM photogrammetry on a 500 meters long alpine track. They were also compared with mobile laser scanning data on the same road section. First results seem to indicate that slope structures are well observable up to decimetric accuracy. For the georeferencing, the planimetric (XY) accuracy of few meters is much better than the altimetric (Z) accuracy. There is indeed a Z coordinate shift of few tens of meters between GoPro cameras and Garmin camera. This makes necessary to give a greater freedom to altimetric coordinates in the processing software. Benefits of this low-cost SfM on-motion method are: 1) a simple setup to use in the field (easy to switch between vehicle types as car, train, bike, etc.), 2) a low cost and 3) an automatic georeferencing of 3D points clouds. Main

  15. Virtual georeferencing : how an 11-eyed video camera helps the pipeline industry

    Energy Technology Data Exchange (ETDEWEB)

    Ball, C.

    2006-08-15

    Designed by Immersive Media Corporation (IMC) the new Telemmersion System is a lightweight camera system capable of generating synchronized high-resolution video streams that represent a full motion spherical world. With 11 cameras configured in a sphere, the system is portable and can be easily mounted on ground and air-borne vehicles for use in surveillance; integration; commanded control of interactive intelligent databases; scenario modelling; and specialized training. The system was recently used to georeference immersive video of existing and proposed pipeline routes. Footage from the system was used to supplement traditional data collection methods for environmental impact assessment; route selection and engineering; regulatory compliance; and public consultation. Traditionally, the medium used to visualize pipeline routes is a map overlaid with aerial photography. Pipeline routes are typically flown throughout the planning stages in order to give environmentalists, engineers and other personnel the opportunity to visually inspect the terrain, identify issues and make decisions. The technology has significantly reduced the costs during the planning stages of pipeline routes, as the remote footage can be stored on DVD, allowing various stakeholders and contractors to view the terrain and zoom in on particular land features. During one recent 3-day trip, 500 km of proposed pipeline was recorded using the technology. Typically, for various regulatory and environmental requirements, a half a dozen trips would have been required. 2 figs.

  16. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Science.gov (United States)

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

    This data paper describes a specimens' dataset of the Catalonian threatened vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that includes large autochthon plants diversity and 199 taxa with IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from 17 th century to nowadays. For each specimen, the species name, locality indication, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which evidence the paper of the botanical collections as an essential source of occurrence data.

  17. Low aerial imagery – an assessment of georeferencing errors and the potential for use in environmental inventory

    Directory of Open Access Journals (Sweden)

    Smaczyński Maciej

    2017-06-01

    Full Text Available Unmanned aerial vehicles are increasingly being used in close range photogrammetry. Real-time observation of the Earth’s surface and the photogrammetric images obtained are used as material for surveying and environmental inventory. The following study was conducted on a small area (approximately 1 ha. In such cases, the classical method of topographic mapping is not accurate enough. The geodetic method of topographic surveying, on the other hand, is an overly precise measurement technique for the purpose of inventorying the natural environment components. The author of the following study has proposed using the unmanned aerial vehicle technology and tying in the obtained images to the control point network established with the aid of GNSS technology. Georeferencing the acquired images and using them to create a photogrammetric model of the studied area enabled the researcher to perform calculations, which yielded a total root mean square error below 9 cm. The performed comparison of the real lengths of the vectors connecting the control points and their lengths calculated on the basis of the photogrammetric model made it possible to fully confirm the RMSE calculated and prove the usefulness of the UAV technology in observing terrain components for the purpose of environmental inventory. Such environmental components include, among others, elements of road infrastructure, green areas, but also changes in the location of moving pedestrians and vehicles, as well as other changes in the natural environment that are not registered on classical base maps or topographic maps.

  18. Georeferenced and secure mobile health system for large scale data collection in primary care.

    Science.gov (United States)

    Sa, Joao H G; Rebelo, Marina S; Brentani, Alexandra; Grisi, Sandra J F E; Iwaya, Leonardo H; Simplicio, Marcos A; Carvalho, Tereza C M B; Gutierrez, Marco A

    2016-10-01

    Mobile health consists in applying mobile devices and communication capabilities for expanding the coverage and improving the effectiveness of health care programs. The technology is particularly promising for developing countries, in which health authorities can take advantage of the flourishing mobile market to provide adequate health care to underprivileged communities, especially primary care. In Brazil, the Primary Care Information System (SIAB) receives primary health care data from all regions of the country, creating a rich database for health-related action planning. Family Health Teams (FHTs) collect this data in periodic visits to families enrolled in governmental programs, following an acquisition procedure that involves filling in paper forms. This procedure compromises the quality of the data provided to health care authorities and slows down the decision-making process. To develop a mobile system (GeoHealth) that should address and overcome the aforementioned problems and deploy the proposed solution in a wide underprivileged metropolitan area of a major city in Brazil. The proposed solution comprises three main components: (a) an Application Server, with a database containing family health conditions; and two clients, (b) a Web Browser running visualization tools for management tasks, and (c) a data-gathering device (smartphone) to register and to georeference the family health data. A data security framework was designed to ensure the security of data, which was stored locally and transmitted over public networks. The system was successfully deployed at six primary care units in the city of Sao Paulo, where a total of 28,324 families/96,061 inhabitants are regularly followed up by government health policies. The health conditions observed from the population covered were: diabetes in 3.40%, hypertension (age >40) in 23.87% and tuberculosis in 0.06%. This estimated prevalence has enabled FHTs to set clinical appointments proactively, with the aim of

  19. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO, Sequence Read Archive (SRA hosted by the NCBI, or the DNA Data Bank of Japan (ddbj. Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a identify differential isoform expression in mRNA-seq datasets, b identify miRNAs (microRNAs in libraries, and identify mature and star sequences in miRNAS and c to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

  20. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

    Directory of Open Access Journals (Sweden)

    Mithun Biswas

    2017-06-01

    Full Text Available BanglaLekha-Isolated, a Bangla handwritten isolated character dataset is presented in this article. This dataset contains 84 different characters comprising of 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 1,66,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  1. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    Full Text Available A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to i cover data from a broad range of application domains, ii be flexible and extensible to combine many different complex data structures, iii include metadata and semantic definitions, iv include inferred information, v identify the original data source for integrated entities and vi transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML or the generic approaches (RDF, OWL fulfil these requirements in a systematic way.

  2. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  3. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Full Text Available Abstract Everyday refinement and development of technology has led to an increase in the competition between the Tech companies and their going out of way to crack the system andbreak down. Thus providing Data mining a strategically and security-wise important area for many business organizations including banking sector. It allows the analyzes of important information in the data warehouse and assists the banks to look for obscure patterns in a group and discover unknown relationship in the data.Banking systems needs to process ample amount of data on daily basis related to customer information their credit card details limit and collateral details transaction details risk profiles Anti Money Laundering related information trade finance data. Thousands of decisionsbased on the related data are taken in a bank daily. This paper analyzes the banking dataset in the weka environment for the detection of interesting patterns based on its applications ofcustomer acquisition customer retention management and marketing and management of risk fraudulence detections.

  4. Wind and wave dataset for Matara, Sri Lanka

    Science.gov (United States)

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  5. Wind and wave dataset for Matara, Sri Lanka

    Directory of Open Access Journals (Sweden)

    Y. Luo

    2018-01-01

    Full Text Available We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1 is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017 is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447.

  6. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning.To greatly reduce the development time and enhance the functionality a high level language capable of parallel processing has been used (Matlab). A key consideration for the system is high speed access due to the large data volume, persistence of the large data volumes and a precise process time scheduling capability.

  7. Towards an Automatic Framework for Urban Settlement Mapping from Satellite Images: Applications of Geo-referenced Social Media and One Class Classification

    Science.gov (United States)

    Miao, Zelang

    2017-04-01

    Currently, urban dwellers comprise more than half of the world's population and this percentage is still dramatically increasing. The explosive urban growth over the next two decades poses long-term profound impact on people as well as the environment. Accurate and up-to-date delineation of urban settlements plays a fundamental role in defining planning strategies and in supporting sustainable development of urban settlements. In order to provide adequate data about urban extents and land covers, classifying satellite data has become a common practice, usually with accurate enough results. Indeed, a number of supervised learning methods have proven effective in urban area classification, but they usually depend on a large amount of training samples, whose collection is a time and labor expensive task. This issue becomes particularly serious when classifying large areas at the regional/global level. As an alternative to manual ground truth collection, in this work we use geo-referenced social media data. Cities and densely populated areas are an extremely fertile land for the production of individual geo-referenced data (such as GPS and social network data). Training samples derived from geo-referenced social media have several advantages: they are easy to collect, usually they are freely exploitable; and, finally, data from social media are spatially available in many locations, and with no doubt in most urban areas around the world. Despite these advantages, the selection of training samples from social media meets two challenges: 1) there are many duplicated points; 2) method is required to automatically label them as "urban/non-urban". The objective of this research is to validate automatic sample selection from geo-referenced social media and its applicability in one class classification for urban extent mapping from satellite images. The findings in this study shed new light on social media applications in the field of remote sensing.

  8. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    Internationally coordinated monitoring of long-term glacier changes provide key indicator data about global climate change and began in the year 1894 as an internationally coordinated effort to establish standardized observations. Today, world-wide monitoring of glaciers and ice caps is embedded within the Global Climate Observing System (GCOS) in support of the United Nations Framework Convention on Climate Change (UNFCCC) as an important Essential Climate Variable (ECV). The Global Terrestrial Network for Glaciers (GTN-G) was established in 1999 with the task of coordinating measurements and to ensure the continuous development and adaptation of the international strategies to the long-term needs of users in science and policy. The basic monitoring principles must be relevant, feasible, comprehensive and understandable to a wider scientific community as well as to policy makers and the general public. Data access has to be free and unrestricted, the quality of the standardized and calibrated data must be high and a combination of detailed process studies at selected field sites with global coverage by satellite remote sensing is envisaged. Recently a GTN-G Steering Committee was established to guide and advise the operational bodies responsible for the international glacier monitoring, which are the World Glacier Monitoring Service (WGMS), the US National Snow and Ice Data Center (NSIDC), and the Global Land Ice Measurements from Space (GLIMS) initiative. Several online databases containing a wealth of diverse data types having different levels of detail and global coverage provide fast access to continuously updated information on glacier fluctuation and inventory data. For world-wide inventories, data are now available through (a) the World Glacier Inventory containing tabular information of about 130,000 glaciers covering an area of around 240,000 km2, (b) the GLIMS-database containing digital outlines of around 118,000 glaciers with different time stamps and

  9. Determination of Exterior Orientation Parameters Through Direct Geo-Referencing in a Real-Time Aerial Monitoring System

    Science.gov (United States)

    Kim, H.; Lee, J.; Choi, K.; Lee, I.

    2012-07-01

    Rapid responses for emergency situations such as natural disasters or accidents often require geo-spatial information describing the on-going status of the affected area. Such geo-spatial information can be promptly acquired by a manned or unmanned aerial vehicle based multi-sensor system that can monitor the emergent situations in near real-time from the air using several kinds of sensors. Thus, we are in progress of developing such a real-time aerial monitoring system (RAMS) consisting of both aerial and ground segments. The aerial segment acquires the sensory data about the target areas by a low-altitude helicopter system equipped with sensors such as a digital camera and a GPS/IMU system and transmits them to the ground segment through a RF link in real-time. The ground segment, which is a deployable ground station installed on a truck, receives the sensory data and rapidly processes them to generate ortho-images, DEMs, etc. In order to generate geo-spatial information, in this system, exterior orientation parameters (EOP) of the acquired images are obtained through direct geo-referencing because it is difficult to acquire coordinates of ground points in disaster area. The main process, since the data acquisition stage until the measurement of EOP, is discussed as follows. First, at the time of data acquisition, image acquisition time synchronized by GPS time is recorded as part of image file name. Second, the acquired data are then transmitted to the ground segment in real-time. Third, by processing software for ground segment, positions/attitudes of acquired images are calculated through a linear interpolation using the GPS time of the received position/attitude data and images. Finally, the EOPs of images are obtained from position/attitude data by deriving the relationships between a camera coordinate system and a GPS/IMU coordinate system. In this study, we evaluated the accuracy of the EOP decided by direct geo-referencing in our system. To perform this

  10. DETERMINATION OF EXTERIOR ORIENTATION PARAMETERS THROUGH DIRECT GEO-REFERENCING IN A REAL-TIME AERIAL MONITORING SYSTEM

    Directory of Open Access Journals (Sweden)

    H. Kim

    2012-07-01

    Full Text Available Rapid responses for emergency situations such as natural disasters or accidents often require geo-spatial information describing the on-going status of the affected area. Such geo-spatial information can be promptly acquired by a manned or unmanned aerial vehicle based multi-sensor system that can monitor the emergent situations in near real-time from the air using several kinds of sensors. Thus, we are in progress of developing such a real-time aerial monitoring system (RAMS consisting of both aerial and ground segments. The aerial segment acquires the sensory data about the target areas by a low-altitude helicopter system equipped with sensors such as a digital camera and a GPS/IMU system and transmits them to the ground segment through a RF link in real-time. The ground segment, which is a deployable ground station installed on a truck, receives the sensory data and rapidly processes them to generate ortho-images, DEMs, etc. In order to generate geo-spatial information, in this system, exterior orientation parameters (EOP of the acquired images are obtained through direct geo-referencing because it is difficult to acquire coordinates of ground points in disaster area. The main process, since the data acquisition stage until the measurement of EOP, is discussed as follows. First, at the time of data acquisition, image acquisition time synchronized by GPS time is recorded as part of image file name. Second, the acquired data are then transmitted to the ground segment in real-time. Third, by processing software for ground segment, positions/attitudes of acquired images are calculated through a linear interpolation using the GPS time of the received position/attitude data and images. Finally, the EOPs of images are obtained from position/attitude data by deriving the relationships between a camera coordinate system and a GPS/IMU coordinate system. In this study, we evaluated the accuracy of the EOP decided by direct geo-referencing in our system

  11. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set...... of equilibrium datasets is pathconnected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non collinear....

  12. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  13. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  14. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  15. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.......This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  16. Datasets used in ORD-018902: Bisphenol A alternatives can effectively substitute for estradiol

    Data.gov (United States)

    U.S. Environmental Protection Agency — Gene Expression Omnibus numbers only. This dataset is associated with the following publication: Mesnage, R., A. Phedonos, M. Arno, S. Balu, C. Corton, and M....

  17. A Tenebrionid beetle's dataset (Coleoptera, Tenebrionidae) from Peninsula Valdés (Chubut, Argentina).

    Science.gov (United States)

    Cheli, Germán H; Flores, Gustavo E; Román, Nicolás Martínez; Podestá, Darío; Mazzanti, Renato; Miyashiro, Lidia

    2013-12-18

    The Natural Protected Area Peninsula Valdés, located in Northeastern Patagonia, is one of the largest conservation units of arid lands in Argentina. Although this area has been in the UNESCO World Heritage List since 1999, it has been continually exposed to sheep grazing and cattle farming for more than a century which have had a negative impact on the local environment. Our aim is to describe the first dataset of tenebrionid beetle species living in Peninsula Valdés and their relationship to sheep grazing. The dataset contains 118 records on 11 species and 198 adult individuals collected. Beetles were collected using pitfall traps in the two major environmental units of Peninsula Valdés, taking into account grazing intensities over a three year time frame from 2005-2007. The Data quality was enhanced following the best practices suggested in the literature during the digitalization and geo-referencing processes. Moreover, identification of specimens and current accurate spelling of scientific names were reviewed. Finally, post-validation processes using DarwinTest software were applied. Specimens have been deposited at Entomological Collection of the Centro Nacional Patagónico (CENPAT-CONICET). The dataset is part of the database of this collection and has been published on the internet through GBIF Integrated Publishing Toolkit (IPT) (http://data.gbif.org/datasets/resource/14669/). Furthermore, it is the first dataset for tenebrionid beetles of arid Patagonia available in GBIF database, and it is the first one based on a previously designed and standardized sampling to assess the interaction between these beetles and grazing in the area. The main purposes of this dataset are to ensure accessibility to data associated with Tenebrionidae specimens from Peninsula Valdés (Chubut, Argentina), also to contribute to GBIF with primary data about Patagonian tenebrionids and finally, to promote the Entomological Collection of Centro Nacional Patagónico (CENPAT

  18. SIMADL: Simulated Activities of Daily Living Dataset

    Directory of Open Access Journals (Sweden)

    Talal Alshammari

    2018-04-01

    Full Text Available With the realisation of the Internet of Things (IoT paradigm, the analysis of the Activities of Daily Living (ADLs, in a smart home environment, is becoming an active research domain. The existence of representative datasets is a key requirement to advance the research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios, for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets for classification and anomaly detection. The datasets are generated using OpenSHS, (Open Smart Home Simulator, which is a simulation software for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days worth of activities. Forty-two files of classification of ADLs were simulated in the classification dataset and the other forty-two files are for anomaly detection problems in which anomalous patterns were simulated and injected into the anomaly detection dataset.

  19. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Science.gov (United States)

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  20. Synthetic and Empirical Capsicum Annuum Image Dataset

    NARCIS (Netherlands)

    Barth, R.

    2016-01-01

    This dataset consists of per-pixel annotated synthetic (10500) and empirical images (50) of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models to generate the synthetic images are included. The aim of the datasets are to

  1. VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

    Science.gov (United States)

    Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

    Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging and labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe that this VideoWeb Activities dataset is unique and it is one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.

  2. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    Since more and more advertisements swarm into radios, it is necessary to establish an audio advertising dataset which could be used to analyze and classify the advertisement. A method of how to establish a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement's sample is given in *.wav file format, and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The classifying rationality of the advertisements in this dataset is proved by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for correlative audio advertisement experimental studies.

  3. A dataset of forest biomass structure for Eurasia.

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-16

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia have been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below ground); green forest floor (above- and below ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots), consisting of 10,351 unique records of sample plots and 9,613 sample trees from ca 1,200 experiments for the period 1930-2014 where there is overlap between these two datasets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass extension factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  4. Dataset for Calibration and performance of synchronous SIM/scan mode for simultaneous targeted and discovery (non-targeted) analysis of exhaled breath samples from firefighters

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset includes the tables and supplementary information from the journal article. This dataset is associated with the following publication: Wallace, A., J....

  5. The Kinetics Human Action Video Dataset

    OpenAIRE

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  6. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  7. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  8. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  9. Harvard Aging Brain Study : Dataset and accessibility

    NARCIS (Netherlands)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  10. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  11. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  12. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendation for future work in intelligent environment simulation.

  13. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendation for future work in intelligent environment simulation.

  14. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  15. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  16. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  17. BASE MAP DATASET, SANTA CRIZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  18. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  19. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  20. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  1. Comparison of recent SnIa datasets

    International Nuclear Information System (INIS)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevalier-Polarski-Linder (CPL) parametrization w(a) = w 0 +w 1 (1−a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w 0 ,w 1 ) = (−1.4,2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample

  2. Grand Canyon as a universally accessible virtual field trip for intro Geoscience classes using geo-referenced mobile game technology

    Science.gov (United States)

    Bursztyn, N.; Pederson, J. L.; Shelton, B.

    2012-12-01

    There is a well-documented and nationally reported trend of declining interest, poor preparedness, and lack of diversity within U.S. students pursuing geoscience and other STEM disciplines. We suggest that a primary contributing factor to this problem is that introductory geoscience courses simply fail to inspire (i.e. they are boring). Our experience leads us to believe that the hands-on, contextualized learning of field excursions are often the most impactful component of lower division geoscience classes. However, field trips are becoming increasingly more difficult to run due to logistics and liability, high-enrollments, decreasing financial and administrative support, and exclusivity of the physically disabled. Recent research suggests that virtual field trips can be used to simulate this contextualized physical learning through the use of mobile devices - technology that exists in most students' hands already. Our overarching goal is to enhance interest in introductory geoscience courses by providing the kinetic and physical learning experience of field trips through geo-referenced educational mobile games and test the hypothesis that these experiences can be effectively simulated through virtual field trips. We are doing this by developing "serious" games for mobile devices that deliver introductory geology material in a fun and interactive manner. Our new teaching strategy will enhance undergraduate student learning in the geosciences, be accessible to students of diverse backgrounds and physical abilities, and be easily incorporated into higher education programs and curricula at institutions globally. Our prototype involves students virtually navigating downstream along a scaled down Colorado River through Grand Canyon - physically moving around their campus quad, football field or other real location, using their smart phone or a tablet. As students reach the next designated location, a photo or video in Grand Canyon appears along with a geological

  3. GLEAM version 3: Global Land Evaporation Datasets and Model

    Science.gov (United States)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have risen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to model evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to host these data to the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes of the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  4. The Role of Datasets on Scientific Influence within Conflict Research

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed—such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971 where ideas didn’t persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped

  5. The Role of Datasets on Scientific Influence within Conflict Research.

    Directory of Open Access Journals (Sweden)

    Tracy Van Holt

    Full Text Available We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS over a 66-year period (1945-2011. We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA, a specialized social network analysis on this citation network (~1.5 million works, to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed-such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993. The critical path consisted of a number of key features: 1 Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2 Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3 We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography. Publically available conflict datasets developed early on helped

  6. The Role of Datasets on Scientific Influence within Conflict Research.

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed-such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped shape the

  7. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011; including Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software multibeam datasets collected for the Shallow Survey were processed for detail analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and

  8. Dataset on the energy performance of atrium type hotel buildings.

    Science.gov (United States)

    Vujosevic, Milica; Krstic-Furundzic, Aleksandra

    2018-04-01

    The data presented in this article are related to the research article entitled "The Influence of Atrium on Energy Performance of Hotel Building" (Vujosevic and Krstic-Furundzic, 2017) [1], which describes the annual energy performance of atrium type hotel building in Belgrade climate conditions, with the objective to present the impact of the atrium on the hotel building's energy demands for space heating and cooling. This dataset is made publicly available to show energy performance of selected hotel design alternatives, in order to enable extended analyzes of these data for other researchers.

  9. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM

  10. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.

  11. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for higher hierarchical classification of the organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset is chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. The quick response codes were generated. AT/GC content of the DNA sequences analysis was carried out. The QR is helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage code and enzyme code studied under the restriction digestion study, which helpful for performing studies using short DNA sequences was reported. The dataset disclosed here is the new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends

  12. USAID Public-Private Partnerships Database

    Data.gov (United States)

    US Agency for International Development — This dataset brings together information collected since 2001 on PPPs that have been supported by USAID. For the purposes of this dataset a Public-Private...

  13. Public Water Supply Systems (PWS)

    Data.gov (United States)

    Kansas Data Access and Support Center — This dataset includes boundaries for most public water supply systems (PWS) in Kansas (525 municipalities, 289 rural water districts and 13 public wholesale water...

  14. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    textabstractWe present an approach for analyzing market shares and products price elasticities based on large datasets containing aggregate sales data for many products, several markets and for relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  15. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  16. Diffusion-Based Density-Equalizing Maps: an Interdisciplinary Approach to Visualizing Homicide Rates and Other Georeferenced Statistical Data

    Science.gov (United States)

    Mazzitello, Karina I.; Candia, Julián

    2012-12-01

    In every country, public and private agencies allocate extensive funding to collect large-scale statistical data, which in turn are studied and analyzed in order to determine local, regional, national, and international policies regarding all aspects relevant to the welfare of society. One important aspect of that process is the visualization of statistical data with embedded geographical information, which most often relies on archaic methods such as maps colored according to graded scales. In this work, we apply nonstandard visualization techniques based on physical principles. We illustrate the method with recent statistics on homicide rates in Brazil and their correlation to other publicly available data. This physics-based approach provides a novel tool that can be used by interdisciplinary teams investigating statistics and model projections in a variety of fields such as economics and gross domestic product research, public health and epidemiology, sociodemographics, political science, business and marketing, and many others.

  17. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    Science.gov (United States)

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, speech recognition, and is being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, ensemble of machine learning models (Super Learner algorithm), SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. GPS receivers for georeferencing of spatial variability of soil attributes Receptores GPS para georreferenciamento da variabilidade espacial de atributos do solo

    Directory of Open Access Journals (Sweden)

    David L Rosalen

    2011-12-01

    Full Text Available The characterization of the spatial variability of soil attributes is essential to support agricultural practices in a sustainable manner. The use of geostatistics to characterize spatial variability of these attributes, such as soil resistance to penetration (RP and gravimetric soil moisture (GM is now usual practice in precision agriculture. The result of geostatistical analysis is dependent on the sample density and other factors according to the georeferencing methodology used. Thus, this study aimed to compare two methods of georeferencing to characterize the spatial variability of RP and GM as well as the spatial correlation of these variables. Sampling grid of 60 points spaced 20 m was used. For RP measurements, an electronic penetrometer was used and to determine the GM, a Dutch auger (0.0-0.1 m depth was used. The samples were georeferenced using a GPS navigation receiver, Simple Point Positioning (SPP with navigation GPS receiver, and Semi-Kinematic Relative Positioning (SKRP with an L1 geodetic GPS receiver. The results indicated that the georeferencing conducted by PPS did not affect the characterization of spatial variability of RP or GM, neither the spatial structure relationship of these attributes.A caracterização da variabilidade espacial dos atributos do solo é indispensável para subsidiar práticas agrícolas de maneira sustentável. A utilização da geoestatística para caracterizar a variabilidade espacial desses atributos, como a resistência mecânica do solo à penetração (RP e a umidade gravimétrica do solo (UG, é, hoje, prática usual na agricultura de precisão. O resultado da análise geoestatística é dependente da densidade amostral e de outros fatores, como o método de georreferencimento utilizado. Desta forma, o presente trabalho teve como objetivo comparar dois métodos de georreferenciamento para a caracterização da variabilidade espacial da RP e da UG, bem como a correlação espacial dessas vari

  19. The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap.

    Science.gov (United States)

    Mann, Richard P; Mushtaq, Faisal; White, Alan D; Mata-Cervantes, Gabriel; Pike, Tom; Coker, Dalton; Murdoch, Stuart; Hiles, Tim; Smith, Clare; Berridge, David; Hinchliffe, Suzanne; Hall, Geoff; Smye, Stephen; Wilkie, Richard M; Lodge, J Peter A; Mon-Williams, Mark

    2016-01-01

    Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public's perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating health-care delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit "big data."

  20. DIRECT GEOREFERENCING ON SMALL UNMANNED AERIAL PLATFORMS FOR IMPROVED RELIABILITY AND ACCURACY OF MAPPING WITHOUT THE NEED FOR GROUND CONTROL POINTS

    Directory of Open Access Journals (Sweden)

    O. Mian

    2015-08-01

    Full Text Available This paper presents results from a Direct Mapping Solution (DMS comprised of an Applanix APX-15 UAV GNSS-Inertial system integrated with a Sony a7R camera to produce highly accurate ortho-rectified imagery without Ground Control Points on a Microdrones md4-1000 platform. A 55 millimeter Nikkor f/1.8 lens was mounted on the Sony a7R and the camera was then focused and calibrated terrestrially using the Applanix camera calibration facility, and then integrated with the APX-15 UAV GNSS-Inertial system using a custom mount specifically designed for UAV applications. In July 2015, Applanix and Avyon carried out a test flight of this system. The goal of the test flight was to assess the performance of DMS APX-15 UAV direct georeferencing system on the md4-1000. The area mapped during the test was a 250 x 300 meter block in a rural setting in Ontario, Canada. Several ground control points are distributed within the test area. The test included 8 North-South lines and 1 cross strip flown at 80 meters AGL, resulting in a ~1 centimeter Ground Sample Distance (GSD. Map products were generated from the test flight using Direct Georeferencing, and then compared for accuracy against the known positions of ground control points in the test area. The GNSS-Inertial data collected by the APX-15 UAV was post-processed in Single Base mode, using a base station located in the project area via POSPac UAV. The base-station’s position was precisely determined by processing a 12-hour session using the CSRS-PPP Post Processing service. The ground control points were surveyed in using differential GNSS post-processing techniques with respect to the base-station.

  1. Birds of Antioquia: Georeferenced database of specimens from the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA).

    Science.gov (United States)

    Rozo, Andrea Morales; Valencia, Fernando; Acosta, Alexis; Parra, Juan Luis

    2014-01-01

    The department of Antioquia, Colombia, lies in the northwestern corner of South America and provides a biogeographical link among divergent faunas, including Caribbean, Andean, Pacific and Amazonian. Information about the distribution of biodiversity in this area is of relevance for academic, practical and social purposes. This data paper describes the dataset containing all bird specimens deposited in the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA). We curated all the information associated with the bird specimens, including the georeferences and taxonomy, and published the database through the Global Biodiversity Information Facility network. During this process we checked the species identification and existing georeferences and completed the information when possible. The collection holds 663 bird specimens collected between 1940 and 2011. Even though most specimens are from Antioquia (70%), the collection includes material from several other departments and one specimen from the United States. The collection holds specimens from three endemic and endangered species (Coeligena orina, Diglossa gloriossisima, and Hypopirrhus pyrohipogaster), and includes localities poorly represented in other collections. The information contained in the collection has been used for biodiversity modeling, conservation planning and management, and we expect to further facilitate these activities by making it publicly available.

  2. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings...... with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  3. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of data of multimodal and multidimensional scientific datasets output such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  4. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal......Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer...... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...

  5. Quantitative Analysis of the Educational Infrastructure in Colombia Through the Use of a Georeferencing Software and Analytic Hierarchy Process

    Science.gov (United States)

    Cala Estupiñan, Jose Luis; María González Bernal, Lina; Ponz Tienda, Jose Luis; Gutierrez Bucheli, Laura Andrea; Alejandro Arboleda, Carlos

    2017-10-01

    The distribution policies of the national budget have been showing an increasing trend of the investment in education infrastructure. This is the reason that makes it necessary to identify the territories with the greatest number of facilities (such as schools, colleges, universities and libraries) and those lacking this type of infrastructure, in order to know where a possible government intervention is required. This work is not intended to give a judgment on the qualitative state of the national infrastructure. It focuses, in terms of infrastructure, on Colombia’s quantitative status of the educational sector, by identifying the territories with more facilities, such as schools, colleges, universities and public libraries. To do this a quantitative index will be created to identify if the coverage of educational infrastructure at departmental level is enough, by taking into account not only the number of facilities, but also the population and the area of influence each one has. The above study is framed within a project of the University of the Andes called “visible Infrastructure”. The index is obtained through a hierarchical analytical process (AHP) and subsequently a linear equation that reflects the variables investigated. The validation of this index is performed through correlations and regressions of social, economic and cultural indicators determined by official entities. All the information on which the analysis is based is official and public. With the end of the armed conflict, it is necessary to focus the planning of public policies to heal the social gaps that the most vulnerable population needs.

  6. Data assimilation and model evaluation experiment datasets

    Science.gov (United States)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  7. A hybrid organic-inorganic perovskite dataset

    Science.gov (United States)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  8. Publicly available datasets on thallium (Tl) in the environment-a comment on "Presence of thallium in the environment: sources of contaminations, distribution and monitoring methods" by Bozena Karbowska, Environ Monit Assess (2016) 188:640 (DOI 10.1007/s10661-016-5647-y).

    Science.gov (United States)

    de Caritat, Patrice; Reimann, Clemens

    2017-05-01

    This comment highlights a whole series of datasets on thallium concentrations in the environment that were overlooked in the recent review by Karbowska, Environmental Monitoring and Assessment, 188, 640, 2016 in this journal. Geochemical surveys carried out over the last few decades all over the world at various scales and using different sampling media have reported the concentration of thallium (and dozens more elements) in tens of thousands of samples. These datasets provide a 'real-world' foundation upon which source apportionment investigations can be based, monitoring programs devised and modelling studies designed. Furthermore, this comment also draws attention to two global geochemical mapping initiatives that should be of interest to environmental scientists.

  9. A Tenebrionid beetle’s dataset (Coleoptera, Tenebrionidae) from Peninsula Valdés (Chubut, Argentina)

    Science.gov (United States)

    Cheli, Germán H.; Flores, Gustavo E.; Román, Nicolás Martínez; Podestá, Darío; Mazzanti, Renato; Miyashiro, Lidia

    2013-01-01

    Abstract The Natural Protected Area Peninsula Valdés, located in Northeastern Patagonia, is one of the largest conservation units of arid lands in Argentina. Although this area has been in the UNESCO World Heritage List since 1999, it has been continually exposed to sheep grazing and cattle farming for more than a century which have had a negative impact on the local environment. Our aim is to describe the first dataset of tenebrionid beetle species living in Peninsula Valdés and their relationship to sheep grazing. The dataset contains 118 records on 11 species and 198 adult individuals collected. Beetles were collected using pitfall traps in the two major environmental units of Peninsula Valdés, taking into account grazing intensities over a three year time frame from 2005–2007. The Data quality was enhanced following the best practices suggested in the literature during the digitalization and geo-referencing processes. Moreover, identification of specimens and current accurate spelling of scientific names were reviewed. Finally, post-validation processes using DarwinTest software were applied. Specimens have been deposited at Entomological Collection of the Centro Nacional Patagónico (CENPAT-CONICET). The dataset is part of the database of this collection and has been published on the internet through GBIF Integrated Publishing Toolkit (IPT) (http://data.gbif.org/datasets/resource/14669/). Furthermore, it is the first dataset for tenebrionid beetles of arid Patagonia available in GBIF database, and it is the first one based on a previously designed and standardized sampling to assess the interaction between these beetles and grazing in the area. The main purposes of this dataset are to ensure accessibility to data associated with Tenebrionidae specimens from Peninsula Valdés (Chubut, Argentina), also to contribute to GBIF with primary data about Patagonian tenebrionids and finally, to promote the Entomological Collection of Centro Nacional Patag

  10. A Tenebrionid beetle’s dataset (Coleoptera, Tenebrionidae from Peninsula Valdés (Chubut, Argentina

    Directory of Open Access Journals (Sweden)

    German Cheli

    2013-12-01

    Full Text Available The Natural Protected Area Peninsula Valdés, located in Northeastern Patagonia, is one of the largest conservation units of arid lands in Argentina. Although this area has been in the UNESCO World Heritage List since 1999, it has been continually exposed to sheep grazing and cattle farming for more than a century which have had a negative impact on the local environment. Our aim is to describe the first dataset of tenebrionid beetle species living in Peninsula Valdés and their relationship to sheep grazing. The dataset contains 118 records on 11 species and 198 adult individuals collected. Beetles were collected using pitfall traps in the two major environmental units of Peninsula Valdés, taking into account grazing intensities over a three year time frame from 2005–2007. The Data quality was enhanced following the best practices suggested in the literature during the digitalization and geo-referencing processes. Moreover, identification of specimens and current accurate spelling of scientific names were reviewed. Finally, post-validation processes using DarwinTest software were applied. Specimens have been deposited at Entomological Collection of the Centro Nacional Patagónico (CENPAT-CONICET. The dataset is part of the database of this collection and has been published on the internet through GBIF Integrated Publishing Toolkit (IPT (http://data.gbif.org/datasets/resource/14669/. Furthermore, it is the first dataset for tenebrionid beetles of arid Patagonia available in GBIF database, and it is the first one based on a previously designed and standardized sampling to assess the interaction between these beetles and grazing in the area. The main purposes of this dataset are to ensure accessibility to data associated with Tenebrionidae specimens from Peninsula Valdés (Chubut, Argentina, also to contribute to GBIF with primary data about Patagonian tenebrionids and finally, to promote the Entomological Collection of Centro Nacional Patag

  11. Chemical elements in the environment: multi-element geochemical datasets from continental to national scale surveys on four continents

    Science.gov (United States)

    Caritat, Patrice de; Reimann, Clemens; Smith, David; Wang, Xueqiu

    2017-01-01

    During the last 10-20 years, Geological Surveys around the world have undertaken a major effort towards delivering fully harmonized and tightly quality-controlled low-density multi-element soil geochemical maps and datasets of vast regions including up to whole continents. Concentrations of between 45 and 60 elements commonly have been determined in a variety of different regolith types (e.g., sediment, soil). The multi-element datasets are published as complete geochemical atlases and made available to the general public. Several other geochemical datasets covering smaller areas but generally at a higher spatial density are also available. These datasets may, however, not be found by superficial internet-based searches because the elements are not mentioned individually either in the title or in the keyword lists of the original references. This publication attempts to increase the visibility and discoverability of these fundamental background datasets covering large areas up to whole continents.

  12. Sally Ride EarthKAM - Automated Image Geo-Referencing Using Google Earth Web Plug-In

    Science.gov (United States)

    Andres, Paul M.; Lazar, Dennis K.; Thames, Robert Q.

    2013-01-01

    Sally Ride EarthKAM is an educational program funded by NASA that aims to provide the public the ability to picture Earth from the perspective of the International Space Station (ISS). A computer-controlled camera is mounted on the ISS in a nadir-pointing window; however, timing limitations in the system cause inaccurate positional metadata. Manually correcting images within an orbit allows the positional metadata to be improved using mathematical regressions. The manual correction process is time-consuming and thus, unfeasible for a large number of images. The standard Google Earth program allows for the importing of KML (keyhole markup language) files that previously were created. These KML file-based overlays could then be manually manipulated as image overlays, saved, and then uploaded to the project server where they are parsed and the metadata in the database is updated. The new interface eliminates the need to save, download, open, re-save, and upload the KML files. Everything is processed on the Web, and all manipulations go directly into the database. Administrators also have the control to discard any single correction that was made and validate a correction. This program streamlines a process that previously required several critical steps and was probably too complex for the average user to complete successfully. The new process is theoretically simple enough for members of the public to make use of and contribute to the success of the Sally Ride EarthKAM project. Using the Google Earth Web plug-in, EarthKAM images, and associated metadata, this software allows users to interactively manipulate an EarthKAM image overlay, and update and improve the associated metadata. The Web interface uses the Google Earth JavaScript API along with PHP-PostgreSQL to present the user the same interface capabilities without leaving the Web. The simpler graphical user interface will allow the public to participate directly and meaningfully with EarthKAM. The use of

  13. Terrestrial Lidar Datasets of New Orleans, Louisiana, Levee Failures from Hurricane Katrina, August 29, 2005

    Science.gov (United States)

    Collins, Brian D.; Kayen, Robert; Minasian, Diane L.; Reiss, Thomas

    2009-01-01

    Hurricane Katrina made landfall with the northern Gulf Coast on August 29, 2005, as one of the strongest hurricanes on record. The storm damage incurred in Louisiana included a number of levee failures that led to the inundation of approximately 85 percent of the metropolitan New Orleans area. Whereas extreme levels of storm damage were expected from such an event, the catastrophic failure of the New Orleans levees prompted a quick mobilization of engineering experts to assess why and how particular levees failed. As part of this mobilization, civil engineering members of the United States Geological Survey (USGS) performed terrestrial lidar topographic surveys at major levee failures in the New Orleans area. The focus of the terrestrial lidar effort was to obtain precise measurements of the ground surface to map soil displacements at each levee site, the nonuniformity of levee height freeboard, depth of erosion where scour occurred, and distress in structures at incipient failure. In total, we investigated eight sites in the New Orleans region, including both earth and concrete floodwall levee breaks. The datasets extend from the 17th Street Canal in the Orleans East Bank area to the intersection of the Gulf Intracoastal Waterway (GIWW) with the Mississippi River Gulf Outlet (MRGO) in the New Orleans East area. The lidar scan data consists of electronic files containing millions of surveyed points. These points characterize the topography of each levee's postfailure or incipient condition and are available for download through online hyperlinks. The data serve as a permanent archive of the catastrophic damage of Hurricane Katrina on the levee systems of New Orleans. Complete details of the data collection, processing, and georeferencing methodologies are provided in this report to assist in the visualization and analysis of the data by future users.

  14. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  15. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR on Amazon EC2 instances and Google Compute Engine (GCE, using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2 for E.coli and 53.5% (95% CI: 34.4-72.6 for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1 and 173.9% (95% CI: 134.6-213.1 more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  16. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories; stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12, 1427-1439 or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence or focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition. The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7, 484; Scarfe & Hibbard, 2013, Vision Research. Using state-of-the-art open-source ray-tracing software (PBRT as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intension in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  17. Quality Controlling CMIP datasets at GFDL

    Science.gov (United States)

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

    As GFDL makes the switch from model development to production in light of the Climate Model Intercomparison Project (CMIP), GFDL's efforts are shifted to testing and more importantly establishing guidelines and protocols for Quality Controlling and semi-automated data publishing. Every CMIP cycle introduces key challenges and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises of multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before data publishing, before their knowledge makes its way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets including: Jupyter notebooks, PrePARE, LAMP (Linux, Apache, MySQL, PHP/Python/Perl): technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively, provide additional metadata and analysis services along with some in-built controlled-vocabulary validations in the workflow. In addition to this, we also discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.

  18. Integrated remotely sensed datasets for disaster management

    Science.gov (United States)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, routecorridor infrastructures surveys using vehicle mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced, television standards such as North America's NTSC and European SECAM and PAL television systems that are then recorded using various video formats. This technology has recently being employed as a front-line, remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at National University of Ireland, Maynooth, (NUIM) is described. New improvements are proposed and include; low-cost encoders, easy to use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists collect, process and integrate these datasets within minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so, carry out rapid damage assessment during and post-disaster.

  19. Bibliometric Analysis to Scan and Scrape New Datasets: It’s all about that BASS

    Directory of Open Access Journals (Sweden)

    Karen Tingay

    2017-04-01

    Results focus on: • Using bibliometric methods in the context of DPUK cohort publications • Identifying emerging trends in the field of dementia research.  • Identifying and prioritising datasets which might be useful for the SAIL Databank to acquire

  20. A multi-environment dataset for activity of daily living recognition in video streams.

    Science.gov (United States)

    Borreo, Alessandro; Onofri, Leonardo; Soda, Paolo

    2015-08-01

    Public datasets played a key role in the increasing level of interest that vision-based human action recognition has attracted in last years. While the production of such datasets has been influenced by the variability introduced by various actors performing the actions, the different modalities of interactions with the environment introduced by the variation of the scenes around the actors has been scarcely took into account. As a consequence, public datasets do not provide a proper test-bed for recognition algorithms that aim at achieving high accuracy, irrespective of the environment where actions are performed. This is all the more so, when systems are designed to recognize activities of daily living (ADL), which are characterized by a high level of human-environment interaction. For that reason, we present in this manuscript the MEA dataset, a new multi-environment ADL dataset, which permitted us to show how the change of scenario can affect the performances of state-of-the-art approaches for action recognition.

  1. Improving the Accuracy of Direct Geo-referencing of Smartphone-Based Mobile Mapping Systems Using Relative Orientation and Scene Geometric Constraints

    Directory of Open Access Journals (Sweden)

    Naif M. Alsubaie

    2017-09-01

    Full Text Available This paper introduces a new method which facilitate the use of smartphones as a handheld low-cost mobile mapping system (MMS. Smartphones are becoming more sophisticated and smarter and are quickly closing the gap between computers and portable tablet devices. The current generation of smartphones are equipped with low-cost GPS receivers, high-resolution digital cameras, and micro-electro mechanical systems (MEMS-based navigation sensors (e.g., accelerometers, gyroscopes, magnetic compasses, and barometers. These sensors are in fact the essential components for a MMS. However, smartphone navigation sensors suffer from the poor accuracy of global navigation satellite System (GNSS, accumulated drift, and high signal noise. These issues affect the accuracy of the initial Exterior Orientation Parameters (EOPs that are inputted into the bundle adjustment algorithm, which then produces inaccurate 3D mapping solutions. This paper proposes new methodologies for increasing the accuracy of direct geo-referencing of smartphones using relative orientation and smartphone motion sensor measurements as well as integrating geometric scene constraints into free network bundle adjustment. The new methodologies incorporate fusing the relative orientations of the captured images and their corresponding motion sensor measurements to improve the initial EOPs. Then, the geometric features (e.g., horizontal and vertical linear lines visible in each image are extracted and used as constraints in the bundle adjustment procedure which correct the relative position and orientation of the 3D mapping solution.

  2. Incorporation of Spatial Interactions in Location Networks to Identify Critical Geo-Referenced Routes for Assessing Disease Control Measures on a Large-Scale Campus

    Directory of Open Access Journals (Sweden)

    Tzai-Hung Wen

    2015-04-01

    Full Text Available Respiratory diseases mainly spread through interpersonal contact. Class suspension is the most direct strategy to prevent the spread of disease through elementary or secondary schools by blocking the contact network. However, as university students usually attend courses in different buildings, the daily contact patterns on a university campus are complicated, and once disease clusters have occurred, suspending classes is far from an efficient strategy to control disease spread. The purpose of this study is to propose a methodological framework for generating campus location networks from a routine administration database, analyzing the community structure of the network, and identifying the critical links and nodes for blocking respiratory disease transmission. The data comes from the student enrollment records of a major comprehensive university in Taiwan. We combined the social network analysis and spatial interaction model to establish a geo-referenced community structure among the classroom buildings. We also identified the critical links among the communities that were acting as contact bridges and explored the changes in the location network after the sequential removal of the high-risk buildings. Instead of conducting a questionnaire survey, the study established a standard procedure for constructing a location network on a large-scale campus from a routine curriculum database. We also present how a location network structure at a campus could function to target the high-risk buildings as the bridges connecting communities for blocking disease transmission.

  3. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets where used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  4. Creating a Regional MODIS Satellite-Driven Net Primary Production Dataset for European Forests

    Directory of Open Access Journals (Sweden)

    Mathias Neumann

    2016-06-01

    Full Text Available Net primary production (NPP is an important ecological metric for studying forest ecosystems and their carbon sequestration, for assessing the potential supply of food or timber and quantifying the impacts of climate change on ecosystems. The global MODIS NPP dataset using the MOD17 algorithm provides valuable information for monitoring NPP at 1-km resolution. Since coarse-resolution global climate data are used, the global dataset may contain uncertainties for Europe. We used a 1-km daily gridded European climate data set with the MOD17 algorithm to create the regional NPP dataset MODIS EURO. For evaluation of this new dataset, we compare MODIS EURO with terrestrial driven NPP from analyzing and harmonizing forest inventory data (NFI from 196,434 plots in 12 European countries as well as the global MODIS NPP dataset for the years 2000 to 2012. Comparing these three NPP datasets, we found that the global MODIS NPP dataset differs from NFI NPP by 26%, while MODIS EURO only differs by 7%. MODIS EURO also agrees with NFI NPP across scales (from continental, regional to country and gradients (elevation, location, tree age, dominant species, etc.. The agreement is particularly good for elevation, dominant species or tree height. This suggests that using improved climate data allows the MOD17 algorithm to provide realistic NPP estimates for Europe. Local discrepancies between MODIS EURO and NFI NPP can be related to differences in stand density due to forest management and the national carbon estimation methods. With this study, we provide a consistent, temporally continuous and spatially explicit productivity dataset for the years 2000 to 2012 on a 1-km resolution, which can be used to assess climate change impacts on ecosystems or the potential biomass supply of the European forests for an increasing bio-based economy. MODIS EURO data are made freely available at ftp://palantir.boku.ac.at/Public/MODIS_EURO.

  5. Predicting dataset popularity for the CMS experiment

    CERN Document Server

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis we rarely use its data to predict the system behavior itself. A basic information about computing resources, user activities and site utilization can be really useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data which can be used as a model for dynamic data placement and provide the foundation of data-driven approach for the CMS computing infrastructure.

  6. Predicting dataset popularity for the CMS experiment

    International Nuclear Information System (INIS)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis we rarely use its data to predict the system behavior itself. A basic information about computing resources, user activities and site utilization can be really useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data which can be used as a model for dynamic data placement and provide the foundation of data-driven approach for the CMS computing infrastructure. (paper)

  7. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Wernen

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  8. 2006 Fynmeet sea clutter measurement trial: Datasets

    CSIR Research Space (South Africa)

    Herselman, PLR

    2007-09-06

    Full Text Available -011............................................................................................................................................................................................. 25 iii Dataset CAD14-001 0 5 10 15 20 25 30 35 10 20 30 40 50 60 70 80 90 R an ge G at e # Time [s] A bs ol ut e R an ge [m ] RCS [dBm2] vs. time and range for f1 = 9.000 GHz - CAD14-001 2400 2600 2800... 40 10 20 30 40 50 60 70 80 90 R an ge G at e # Time [s] A bs ol ut e R an ge [m ] RCS [dBm2] vs. time and range for f1 = 9.000 GHz - CAD14-002 2400 2600 2800 3000 3200 3400 3600 -30 -25 -20 -15 -10 -5 0 5 10...

  9. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM over the entire island including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  10. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    Integration National Dataset Toolkit Wind Integration National Dataset Toolkit The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies. WIND

  11. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    Solar Integration National Dataset Toolkit Solar Integration National Dataset Toolkit NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S . regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data

  12. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...

  13. QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity

    Directory of Open Access Journals (Sweden)

    Davy Guan

    2018-04-01

    Full Text Available Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, Greenscreen GADD-45a-GFP assay, Syrian Hamster Embryo (SHE assay, and 2 year rat carcinogenicity assay results. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.

  14. Medicare Physician and Other Supplier Interactive Dataset

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Centers for Medicare and Medicaid Services (CMS) has prepared a public data set, the Medicare Provider Utilization and Payment Data - Physician and Other...

  15. Multilayered complex network datasets for three supply chain network archetypes on an urban road grid.

    Science.gov (United States)

    Viljoen, Nadia M; Joubert, Johan W

    2018-02-01

    This article presents the multilayered complex network formulation for three different supply chain network archetypes on an urban road grid and describes how 500 instances were randomly generated for each archetype. Both the supply chain network layer and the urban road network layer are directed unweighted networks. The shortest path set is calculated for each of the 1 500 experimental instances. The datasets are used to empirically explore the impact that the supply chain's dependence on the transport network has on its vulnerability in Viljoen and Joubert (2017) [1]. The datasets are publicly available on Mendeley (Joubert and Viljoen, 2017) [2].

  16. Dataset of anomalies and malicious acts in a cyber-physical subsystem.

    Science.gov (United States)

    Laso, Pedro Merino; Brosset, David; Puentes, John

    2017-10-01

    This article presents a dataset produced to investigate how data and information quality estimations enable to detect aNomalies and malicious acts in cyber-physical systems. Data were acquired making use of a cyber-physical subsystem consisting of liquid containers for fuel or water, along with its automated control and data acquisition infrastructure. Described data consist of temporal series representing five operational scenarios - Normal, aNomalies, breakdown, sabotages, and cyber-attacks - corresponding to 15 different real situations. The dataset is publicly available in the .zip file published with the article, to investigate and compare faulty operation detection and characterization methods for cyber-physical systems.

  17. Publishing datasets with eSciDoc and panMetaDocs

    Science.gov (United States)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

    Currently serveral research institutions worldwide undertake considerable efforts to have their scientific datasets published and to syndicate them to data portals as extensively described objects identified by a persistent identifier. This is done to foster the reuse of data, to make scientific work more transparent, and to create a citable entity that can be referenced unambigously in written publications. GFZ Potsdam established a publishing workflow for file based research datasets. Key software components are an eSciDoc infrastructure [1] and multiple instances of the data curation tool panMetaDocs [2]. The eSciDoc repository holds data objects and their associated metadata in container objects, called eSciDoc items. A key metadata element in this context is the publication status of the referenced data set. PanMetaDocs, which is based on PanMetaWorks [3], is a PHP based web application that allows to describe data with any XML-based metadata schema. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entries to a minimum and make use of contextual information in a project setting. Access rights can be applied to set visibility of datasets to other project members and allow collaboration on and notifying about datasets (RSS) and interaction with the internal messaging system, that was inherited from panMetaWorks. When a dataset is to be published, panMetaDocs allows to change the publication status of the eSciDoc item from status "private" to "submitted" and prepare the dataset for verification by an external reviewer. After quality checks, the item publication status can be changed to "published". This makes the data and metadata available through the internet worldwide. PanMetaDocs is developed as an eSciDoc application. It is an easy to use graphical user interface to eSciDoc items, their data and metadata. It is also an application supporting a DOI publication agent during the process of

  18. Standardization of GIS datasets for emergency preparedness of NPPs

    International Nuclear Information System (INIS)

    Saindane, Shashank S.; Suri, M.M.K.; Otari, Anil; Pradeepkumar, K.S.

    2012-01-01

    Probability of a major nuclear accident which can lead to large scale release of radioactivity into environment is extremely small by the incorporation of safety systems and defence-in-depth philosophy. Nevertheless emergency preparedness for implementation of counter measures to reduce the consequences are required for all major nuclear facilities. Iodine prophylaxis, Sheltering, evacuation etc. are protective measures to be implemented for members of public in the unlikely event of any significant releases from nuclear facilities. Bhabha Atomic Research Centre has developed a GIS supported Nuclear Emergency Preparedness Program. Preparedness for Response to Nuclear emergencies needs geographical details of the affected locations specially Nuclear Power Plant Sites and nearby public domain. Geographical information system data sets which the planners are looking for will have appropriate details in order to take decision and mobilize the resources in time and follow the Standard Operating Procedures. Maps are 2-dimensional representations of our real world and GIS makes it possible to manipulate large amounts of geo-spatially referenced data and convert it into information. This has become an integral part of the nuclear emergency preparedness and response planning. This GIS datasets consisting of layers such as village settlements, roads, hospitals, police stations, shelters etc. is standardized and effectively used during the emergency. The paper focuses on the need of standardization of GIS datasets which in turn can be used as a tool to display and evaluate the impact of standoff distances and selected zones in community planning. It will also highlight the database specifications which will help in fast processing of data and analysis to derive useful and helpful information. GIS has the capability to store, manipulate, analyze and display the large amount of required spatial and tabular data. This study intends to carry out a proper response and preparedness

  19. Provenance of Earth Science Datasets - How Deep Should One Go?

    Science.gov (United States)

    Ramapriyan, H.; Manipon, G. J. M.; Aulenbach, S.; Duggan, B.; Goldstein, J.; Hua, H.; Tan, D.; Tilmes, C.; Wilson, B. D.; Wolfe, R.; Zednik, S.

    2015-12-01

    For credibility of scientific research, transparency and reproducibility are essential. This fundamental tenet has been emphasized for centuries, and has been receiving increased attention in recent years. The Office of Management and Budget (2002) addressed reproducibility and other aspects of quality and utility of information from federal agencies. Specific guidelines from NASA (2002) are derived from the above. According to these guidelines, "NASA requires a higher standard of quality for information that is considered influential. Influential scientific, financial, or statistical information is defined as NASA information that, when disseminated, will have or does have clear and substantial impact on important public policies or important private sector decisions." For information to be compliant, "the information must be transparent and reproducible to the greatest possible extent." We present how the principles of transparency and reproducibility have been applied to NASA data supporting the Third National Climate Assessment (NCA3). The depth of trace needed of provenance of data used to derive conclusions in NCA3 depends on how the data were used (e.g., qualitatively or quantitatively). Given that the information is diligently maintained in the agency archives, it is possible to trace from a figure in the publication through the datasets, specific files, algorithm versions, instruments used for data collection, and satellites, as well as the individuals and organizations involved in each step. Such trace back permits transparency and reproducibility.

  20. Check your biosignals here: a new dataset for off-the-person ECG biometrics.

    Science.gov (United States)

    da Silva, Hugo Plácido; Lourenço, André; Fred, Ana; Raposo, Nuno; Aires-de-Sousa, Marta

    2014-02-01

    The Check Your Biosignals Here initiative (CYBHi) was developed as a way of creating a dataset and consistently repeatable acquisition framework, to further extend research in electrocardiographic (ECG) biometrics. In particular, our work targets the novel trend towards off-the-person data acquisition, which opens a broad new set of challenges and opportunities both for research and industry. While datasets with ECG signals collected using medical grade equipment at the chest can be easily found, for off-the-person ECG data the solution is generally for each team to collect their own corpus at considerable expense of resources. In this paper we describe the context, experimental considerations, methods, and preliminary findings of two public datasets created by our team, one for short-term and another for long-term assessment, with ECG data collected at the hand palms and fingers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  1. Supervised Variational Relevance Learning, An Analytic Geometric Feature Selection with Applications to Omic Datasets.

    Science.gov (United States)

    Boareto, Marcelo; Cesar, Jonatas; Leite, Vitor B P; Caticha, Nestor

    2015-01-01

    We introduce Supervised Variational Relevance Learning (Suvrel), a variational method to determine metric tensors to define distance based similarity in pattern classification, inspired in relevance learning. The variational method is applied to a cost function that penalizes large intraclass distances and favors small interclass distances. We find analytically the metric tensor that minimizes the cost function. Preprocessing the patterns by doing linear transformations using the metric tensor yields a dataset which can be more efficiently classified. We test our methods using publicly available datasets, for some standard classifiers. Among these datasets, two were tested by the MAQC-II project and, even without the use of further preprocessing, our results improve on their performance.

  2. Dataset on information strategies for energy conservation: A field experiment in India.

    Science.gov (United States)

    Chen, Victor L; Delmas, Magali A; Locke, Stephen L; Singh, Amarjeet

    2018-02-01

    The data presented in this article are related to the research article entitled: "Information strategies for energy conservation: a field experiment in India" (Chen et al., 2017) [1]. The availability of high-resolution electricity data offers benefits to both utilities and consumers to understand the dynamics of energy consumption for example, between billing periods or times of peak demand. However, few public datasets with high-temporal resolution have been available to researchers on electricity use, especially at the appliance-level. This article describes data collected in a residential field experiment for 19 apartments at an Indian faculty housing complex during the period from August 1, 2013 to May 12, 2014. The dataset includes detailed information about electricity consumption. It also includes information on apartment characteristics and hourly weather variation to enable further studies of energy performance. These data can be used by researchers as training datasets to evaluate electricity usage consumption.

  3. Genome-wide gene expression dataset used to identify potential therapeutic targets in androgenetic alopecia

    Directory of Open Access Journals (Sweden)

    R. Dey-Rao

    2017-08-01

    Full Text Available The microarray dataset attached to this report is related to the research article with the title: “A genomic approach to susceptibility and pathogenesis leads to identifying potential novel therapeutic targets in androgenetic alopecia” (Dey-Rao and Sinha, 2017 [1]. Male-pattern hair loss that is induced by androgens (testosterone in genetically predisposed individuals is known as androgenetic alopecia (AGA. The raw dataset is being made publicly available to enable critical and/or extended analyses. Our related research paper utilizes the attached raw dataset, for genome-wide gene-expression associated investigations. Combined with several in silico bioinformatics-based analyses we were able to delineate five strategic molecular elements as potential novel targets towards future AGA-therapy.

  4. Statistical segmentation of multidimensional brain datasets

    Science.gov (United States)

    Desco, Manuel; Gispert, Juan D.; Reig, Santiago; Santos, Andres; Pascau, Javier; Malpica, Norberto; Garcia-Barreno, Pedro

    2001-07-01

    This paper presents an automatic segmentation procedure for MRI neuroimages that overcomes part of the problems involved in multidimensional clustering techniques like partial volume effects (PVE), processing speed and difficulty of incorporating a priori knowledge. The method is a three-stage procedure: 1) Exclusion of background and skull voxels using threshold-based region growing techniques with fully automated seed selection. 2) Expectation Maximization algorithms are used to estimate the probability density function (PDF) of the remaining pixels, which are assumed to be mixtures of gaussians. These pixels can then be classified into cerebrospinal fluid (CSF), white matter and grey matter. Using this procedure, our method takes advantage of using the full covariance matrix (instead of the diagonal) for the joint PDF estimation. On the other hand, logistic discrimination techniques are more robust against violation of multi-gaussian assumptions. 3) A priori knowledge is added using Markov Random Field techniques. The algorithm has been tested with a dataset of 30 brain MRI studies (co-registered T1 and T2 MRI). Our method was compared with clustering techniques and with template-based statistical segmentation, using manual segmentation as a gold-standard. Our results were more robust and closer to the gold-standard.

  5. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However this approach does not allow an assessment of the force multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and re-sampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.

  6. Online Education in Public Affairs

    Science.gov (United States)

    Ginn, Martha H.; Hammond, Augustine

    2012-01-01

    This exploratory study provides an overview of the current landscape of online education in the fields of Master of Public Administration and Master of Public Policy (MPA/MPP) utilizing a dataset compiled from content analysis of MPA/MPP programs' websites and survey of 96 National Association of Schools of Public Affairs and Administration…

  7. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.

    Directory of Open Access Journals (Sweden)

    Douglas Teodoro

    Full Text Available The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.

  8. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers

    Science.gov (United States)

    Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

    2018-01-01

    The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms. PMID:29293556

  9. ORBDA: An openEHR benchmark dataset for performance assessment of electronic health record servers.

    Science.gov (United States)

    Teodoro, Douglas; Sundvall, Erik; João Junior, Mario; Ruch, Patrick; Miranda Freire, Sergio

    2018-01-01

    The openEHR specifications are designed to support implementation of flexible and interoperable Electronic Health Record (EHR) systems. Despite the increasing number of solutions based on the openEHR specifications, it is difficult to find publicly available healthcare datasets in the openEHR format that can be used to test, compare and validate different data persistence mechanisms for openEHR. To foster research on openEHR servers, we present the openEHR Benchmark Dataset, ORBDA, a very large healthcare benchmark dataset encoded using the openEHR formalism. To construct ORBDA, we extracted and cleaned a de-identified dataset from the Brazilian National Healthcare System (SUS) containing hospitalisation and high complexity procedures information and formalised it using a set of openEHR archetypes and templates. Then, we implemented a tool to enrich the raw relational data and convert it into the openEHR model using the openEHR Java reference model library. The ORBDA dataset is available in composition, versioned composition and EHR openEHR representations in XML and JSON formats. In total, the dataset contains more than 150 million composition records. We describe the dataset and provide means to access it. Additionally, we demonstrate the usage of ORBDA for evaluating inserting throughput and query latency performances of some NoSQL database management systems. We believe that ORBDA is a valuable asset for assessing storage models for openEHR-based information systems during the software engineering process. It may also be a suitable component in future standardised benchmarking of available openEHR storage platforms.

  10. ionFR: Ionospheric Faraday rotation [Dataset

    NARCIS (Netherlands)

    Sotomayor-Beltran, C.; et al., [Unknown; Hessels, J.W.T.; van Leeuwen, J.; Markoff, S.; Wijers, R.A.M.J.

    2013-01-01

    IonFR calculates the amount of ionospheric Faraday rotation for a specific epoch, geographic location, and line-of-sight. The code uses a number of publicly available, GPS-derived total electron content maps and the most recent release of the International Geomagnetic Reference Field. ionFR can be

  11. Review of ATLAS Open Data 8 TeV datasets, tools and activities

    CERN Document Server

    The ATLAS collaboration

    2018-01-01

    The ATLAS Collaboration has released two 8 TeV datasets and relevant simulated samples to the public for educational use. A number of groups within ATLAS have used these ATLAS Open Data 8 TeV datasets, developing tools and educational material to promote particle physics. The general aim of these activities is to provide simple and user-friendly interactive interfaces to simulate the procedures used by high-energy physics researchers. International Masterclasses introduce particle physics to high school students and have been studying 8 TeV ATLAS Open Data since 2015. Inspired by this success, a new ATLAS Open Data initiative was launched in 2016 for university students. A comprehensive educational platform was thus developed featuring a second 8 TeV dataset and a new set of educational tools. The 8 TeV datasets and associated tools are presented and discussed here, as well as a selection of activities studying the ATLAS Open Data 8 TeV datasets.

  12. The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets

    Directory of Open Access Journals (Sweden)

    Carroll Adam J

    2010-07-01

    Full Text Available Abstract Background Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis. Description Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (eg. metabolite, species, organ/biofluid etc.. Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP their own data to the server for online processing via a novel raw data processing pipeline. Conclusions MetabolomeExpress https://www.metabolome-express.org provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications. Transparent sharing of these data will allow researchers to

  13. A framework for automatic creation of gold-standard rigid 3D-2D registration datasets.

    Science.gov (United States)

    Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Špiclin, Žiga

    2017-02-01

    Advanced image-guided medical procedures incorporate 2D intra-interventional information into pre-interventional 3D image and plan of the procedure through 3D/2D image registration (32R). To enter clinical use, and even for publication purposes, novel and existing 32R methods have to be rigorously validated. The performance of a 32R method can be estimated by comparing it to an accurate reference or gold standard method (usually based on fiducial markers) on the same set of images (gold standard dataset). Objective validation and comparison of methods are possible only if evaluation methodology is standardized, and the gold standard  dataset is made publicly available. Currently, very few such datasets exist and only one contains images of multiple patients acquired during a procedure. To encourage the creation of gold standard 32R datasets, we propose an automatic framework. The framework is based on rigid registration of fiducial markers. The main novelty is spatial grouping of fiducial markers on the carrier device, which enables automatic marker localization and identification across the 3D and 2D images. The proposed framework was demonstrated on clinical angiograms of 20 patients. Rigid 32R computed by the framework was more accurate than that obtained manually, with the respective target registration error below 0.027 mm compared to 0.040 mm. The framework is applicable for gold standard setup on any rigid anatomy, provided that the acquired images contain spatially grouped fiducial markers. The gold standard datasets and software will be made publicly available.

  14. Palmprint and Palmvein Recognition Based on DCNN and A New Large-Scale Contactless Palmvein Dataset

    Directory of Open Access Journals (Sweden)

    Lin Zhang

    2018-03-01

    Full Text Available Among the members of biometric identifiers, the palmprint and the palmvein have received significant attention due to their stability, uniqueness, and non-intrusiveness. In this paper, we investigate the problem of palmprint/palmvein recognition and propose a Deep Convolutional Neural Network (DCNN based scheme, namely P a l m R CNN (short for palmprint/palmvein recognition using CNNs. The effectiveness and efficiency of P a l m R CNN have been verified through extensive experiments conducted on benchmark datasets. In addition, though substantial effort has been devoted to palmvein recognition, it is still quite difficult for the researchers to know the potential discriminating capability of the contactless palmvein. One of the root reasons is that a large-scale and publicly available dataset comprising high-quality, contactless palmvein images is still lacking. To this end, a user-friendly acquisition device for collecting high quality contactless palmvein images is at first designed and developed in this work. Then, a large-scale palmvein image dataset is established, comprising 12,000 images acquired from 600 different palms in two separate collection sessions. The collected dataset now is publicly available.

  15. A large-scale dataset of solar event reports from automated feature recognition modules

    Science.gov (United States)

    Schuh, Michael A.; Angryk, Rafal A.; Martens, Petrus C.

    2016-05-01

    The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validating of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through these important prerequisite analyses presented here, the results of KDD from Solar Big Data will be overall more reliable and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.

  16. RetroTransformDB: A Dataset of Generic Transforms for Retrosynthetic Analysis

    Directory of Open Access Journals (Sweden)

    Svetlana Avramova

    2018-04-01

    Full Text Available Presently, software tools for retrosynthetic analysis are widely used by organic, medicinal, and computational chemists. Rule-based systems extensively use collections of retro-reactions (transforms. While there are many public datasets with reactions in synthetic direction (usually non-generic reactions, there are no publicly-available databases with generic reactions in computer-readable format which can be used for the purposes of retrosynthetic analysis. Here we present RetroTransformDB—a dataset of transforms, compiled and coded in SMIRKS line notation by us. The collection is comprised of more than 100 records, with each one including the reaction name, SMIRKS linear notation, the functional group to be obtained, and the transform type classification. All SMIRKS transforms were tested syntactically, semantically, and from a chemical point of view in different software platforms. The overall dataset design and the retrosynthetic fitness were analyzed and curated by organic chemistry experts. The RetroTransformDB dataset may be used by open-source and commercial software packages, as well as chemoinformatics tools.

  17. The Dataset of Countries at Risk of Electoral Violence

    OpenAIRE

    Birch, Sarah; Muchlinski, David

    2017-01-01

    Electoral violence is increasingly affecting elections around the world, yet researchers have been limited by a paucity of granular data on this phenomenon. This paper introduces and describes a new dataset of electoral violence – the Dataset of Countries at Risk of Electoral Violence (CREV) – that provides measures of 10 different types of electoral violence across 642 elections held around the globe between 1995 and 2013. The paper provides a detailed account of how and why the dataset was ...

  18. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes.(Author)

  19. BIA Indian Lands Dataset (Indian Lands of the United States)

    Data.gov (United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  20. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and to construct professional-quality visualizations of the results.

  1. Socioeconomic Data and Applications Center (SEDAC) Treaty Status Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Socioeconomic Data and Application Center (SEDAC) Treaty Status Dataset contains comprehensive treaty information for multilateral environmental agreements,...

  2. Two datasets of defect reports labeled by a crowd of annotators of unknown reliability

    Directory of Open Access Journals (Sweden)

    Jerónimo Hernández-González

    2018-06-01

    Full Text Available Classifying software defects according to any defined taxonomy is not straightforward. In order to be used for automatizing the classification of software defects, two sets of defect reports were collected from public issue tracking systems from two different real domains. Due to the lack of a domain expert, the collected defects were categorized by a set of annotators of unknown reliability according to their impact from IBM's orthogonal defect classification taxonomy. Both datasets are prepared to solve the defect classification problem by means of techniques of the learning from crowds paradigm (Hernández-González et al. [1].Two versions of both datasets are publicly shared. In the first version, the raw data is given: the text description of defects together with the category assigned by each annotator. In the second version, the text of each defect has been transformed to a descriptive vector using text-mining techniques.

  3. Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center

    Science.gov (United States)

    Moroni, D. F.; Quach, N.; Francis-Curley, W.

    2016-12-01

    facilitate rapid and efficient updating of dataset metadata records by external data producers. Here we present this new service and demonstrate the variety of ways in which a multitude of Earth Science datasets may be submitted in a manner that significantly reduces the time in ensuring that new, vital data reaches the public domain.

  4. Dataset on daytime outdoor thermal comfort for Belo Horizonte, Brazil.

    Science.gov (United States)

    Hirashima, Simone Queiroz da Silveira; Assis, Eleonora Sad de; Nikolopoulou, Marialena

    2016-12-01

    This dataset describe microclimatic parameters of two urban open public spaces in the city of Belo Horizonte, Brazil; physiological equivalent temperature (PET) index values and the related subjective responses of interviewees regarding thermal sensation perception and preference and thermal comfort evaluation. Individuals and behavioral characteristics of respondents were also presented. Data were collected at daytime, in summer and winter, 2013. Statistical treatment of this data was firstly presented in a PhD Thesis ("Percepção sonora e térmica e avaliação de conforto em espaços urbanos abertos do município de Belo Horizonte - MG, Brasil" (Hirashima, 2014) [1]), providing relevant information on thermal conditions in these locations and on thermal comfort assessment. Up to now, this data was also explored in the article "Daytime Thermal Comfort in Urban Spaces: A Field Study in Brazil" (Hirashima et al., in press) [2]. These references are recommended for further interpretation and discussion.

  5. Mining Open Datasets for Transparency in Taxi Transport in Metropolitan Environments

    OpenAIRE

    Noulas, Anastasios; Salnikov, Vsevolod; Lambiotte, Renaud; Mascolo, Cecilia

    2015-01-01

    Uber has recently been introducing novel practices in urban taxi transport. Journey prices can change dynamically in almost real time and also vary geographically from one area to another in a city, a strategy known as surge pricing. In this paper, we explore the power of the new generation of open datasets towards understanding the impact of the new disruption technologies that emerge in the area of public transport. With our primary goal being a more transparent economic landscape for urban...

  6. Dataset - Evaluation of Standardized Sample Collection, Packaging, and Decontamination Procedures to Assess Cross-Contamination Potential during Bacillus anthracis Incident Response Operations

    Data.gov (United States)

    U.S. Environmental Protection Agency — Spore recovery data during sample packaging decontamination tests. This dataset is associated with the following publication: Calfee, W., J. Tufts, K. Meyer, K....

  7. Datasets in Gene Expression Omnibus used in the study ORD-020382: Evaluation of estrogen receptor alpha activation by glyphosate-based herbicide constituents

    Data.gov (United States)

    U.S. Environmental Protection Agency — GEO accession number of the microarray study. This dataset is associated with the following publication: Mesnage, R., A. Phedonos, M. Biserni, M. Arno, S. Balu, C....

  8. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  9. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  10. An Annotated Dataset of 14 Cardiac MR Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  11. A New Outlier Detection Method for Multidimensional Datasets

    KAUST Repository

    Abdel Messih, Mario A.

    2012-07-01

    This study develops a novel hybrid method for outlier detection (HMOD) that combines the idea of distance based and density based methods. The proposed method has two main advantages over most of the other outlier detection methods. The first advantage is that it works well on both dense and sparse datasets. The second advantage is that, unlike most other outlier detection methods that require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only required parameter to set is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD that made it very efficient in applications. Moreover, we proposed a new way of using the outlier detection for redundancy reduction in datasets where the confidence level that evaluates how accurate the less redundant dataset can be used to represent the original dataset can be specified by users. HMOD is evaluated on synthetic datasets (dense and mixed “dense and sparse”) and a bioinformatics problem of redundancy reduction of dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level of reduced dataset representing the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.

  12. Importing, Working With, and Sharing Microstructural Data in the StraboSpot Digital Data System, Including an Example Dataset from the Pilbara Craton, Western Australia.

    Science.gov (United States)

    Roberts, N.; Cunningham, H.; Snell, A.; Newman, J.; Tikoff, B.; Chatzaras, V.; Walker, J. D.; Williams, R. T.

    2017-12-01

    There is currently no repository where a geologist can survey microstructural datasets that have been collected from a specific field area or deformation experiment. New development of the StraboSpot digital data system provides a such a repository as well as visualization and analysis tools. StraboSpot is a graph database that allows field geologists to share primary data and develop new types of scientific questions. The database can be accessed through: 1) a field-based mobile application that runs on iOS and Android mobile devices; and 2) a desktop system. We are expanding StraboSpot to include the handling of a variety of microstructural data types. Presented here is the detailed vocabulary and logic used for the input of microstructural data, and how this system operates with the anticipated workflow of users. Microstructural data include observations and interpretations from photomicrographs, scanning electron microscope images, electron backscatter diffraction, and transmission electron microscopy data. The workflow for importing microstructural data into StraboSpot is organized into the following tabs: Images, Mineralogy & Composition; Sedimentary; Igneous; Metamorphic; Fault Rocks; Grain size & configuration; Crystallographic Preferred Orientation; Reactions; Geochronology; Relationships; and Interpretations. Both the sample and the thin sections are also spots. For the sample spot, the user can specify whether a sample is experimental or natural; natural samples are inherently linked to their field context. For the thin section (sub-sample) spot, the user can select between different options for sample preparation, geometry, and methods. A universal framework for thin section orientation is given, which allows users to overlay different microscope images of the same area and keeps georeferenced orientation. We provide an example dataset of field and microstructural data from the Mt Edgar dome, a granitic complex in the Paleoarchean East Pilbara craton

  13. ATLAS File and Dataset Metadata Collection and Use

    CERN Document Server

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment including software release management, tracking of reconstructed event sizes and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example the total number of events contained in dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  14. A dataset on tail risk of commodities markets.

    Science.gov (United States)

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

    This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets"(Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices including World, Asia, and South East Asia for the period 2004-2015. Then, the dataset is divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient to examine the tail risk of different commodities as measured by Conditional Value at Risk (CVaR) as well as their changes over periods. The datasets can also be used to investigate the association between commodity markets and share markets.

  15. A collection of annotated and harmonized human breast cancer transcriptome datasets, including immunologic classification [version 2; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Jessica Roelands

    2018-02-01

    Full Text Available The increased application of high-throughput approaches in translational research has expanded the number of publicly available data repositories. Gathering additional valuable information contained in the datasets represents a crucial opportunity in the biomedical field. To facilitate and stimulate utilization of these datasets, we have recently developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB. In this note, we describe a curated compendium of 13 public datasets on human breast cancer, representing a total of 2142 transcriptome profiles. We classified the samples according to different immune based classification systems and integrated this information into the datasets. Annotated and harmonized datasets were uploaded to GXB. Study samples were categorized in different groups based on their immunologic tumor response profiles, intrinsic molecular subtypes and multiple clinical parameters. Ranked gene lists were generated based on relevant group comparisons. In this data note, we demonstrate the utility of GXB to evaluate the expression of a gene of interest, find differential gene expression between groups and investigate potential associations between variables with a specific focus on immunologic classification in breast cancer. This interactive resource is publicly available online at: http://breastcancer.gxbsidra.org/dm3/geneBrowser/list.

  16. Creation of the Naturalistic Engagement in Secondary Tasks (NEST) distracted driving dataset.

    Science.gov (United States)

    Owens, Justin M; Angell, Linda; Hankey, Jonathan M; Foley, James; Ebe, Kazutoshi

    2015-09-01

    Distracted driving has become a topic of critical importance to driving safety research over the past several decades. Naturalistic driving data offer a unique opportunity to study how drivers engage with secondary tasks in real-world driving; however, the complexities involved with identifying and coding relevant epochs of naturalistic data have limited its accessibility to the general research community. This project was developed to help address this problem by creating an accessible dataset of driver behavior and situational factors observed during distraction-related safety-critical events and baseline driving epochs, using the Strategic Highway Research Program 2 (SHRP2) naturalistic dataset. The new NEST (Naturalistic Engagement in Secondary Tasks) dataset was created using crashes and near-crashes from the SHRP2 dataset that were identified as including secondary task engagement as a potential contributing factor. Data coding included frame-by-frame video analysis of secondary task and hands-on-wheel activity, as well as summary event information. In addition, information about each secondary task engagement within the trip prior to the crash/near-crash was coded at a higher level. Data were also coded for four baseline epochs and trips per safety-critical event. 1,180 events and baseline epochs were coded, and a dataset was constructed. The project team is currently working to determine the most useful way to allow broad public access to the dataset. We anticipate that the NEST dataset will be extraordinarily useful in allowing qualified researchers access to timely, real-world data concerning how drivers interact with secondary tasks during safety-critical events and baseline driving. The coded dataset developed for this project will allow future researchers to have access to detailed data on driver secondary task engagement in the real world. It will be useful for standalone research, as well as for integration with additional SHRP2 data to enable the

  17. Finding the traces of behavioral and cognitive processes in big data and naturally occurring datasets.

    Science.gov (United States)

    Paxton, Alexandra; Griffiths, Thomas L

    2017-10-01

    Today, people generate and store more data than ever before as they interact with both real and virtual environments. These digital traces of behavior and cognition offer cognitive scientists and psychologists an unprecedented opportunity to test theories outside the laboratory. Despite general excitement about big data and naturally occurring datasets among researchers, three "gaps" stand in the way of their wider adoption in theory-driven research: the imagination gap, the skills gap, and the culture gap. We outline an approach to bridging these three gaps while respecting our responsibilities to the public as participants in and consumers of the resulting research. To that end, we introduce Data on the Mind ( http://www.dataonthemind.org ), a community-focused initiative aimed at meeting the unprecedented challenges and opportunities of theory-driven research with big data and naturally occurring datasets. We argue that big data and naturally occurring datasets are most powerfully used to supplement-not supplant-traditional experimental paradigms in order to understand human behavior and cognition, and we highlight emerging ethical issues related to the collection, sharing, and use of these powerful datasets.

  18. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Science.gov (United States)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by Electroencephalograph (EEG) signal datasets for the prosthetic application. The EEG signal datasets were used as to improvise the way to control the prosthetic hand compared to the Electromyograph (EMG). The EMG has disadvantages to a person, who has not used the muscle for a long time and also to person with degenerative issues due to age factor. Thus, the EEG datasets found to be an alternative for EMG. The datasets used in this work were taken from Brain Computer Interface (BCI) Project. The datasets were already classified for open, close and combined movement operations. It served the purpose as an input to control the prosthetic hand by using an Interface system between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets with an additional LiPo (Lithium Polymer) battery attached to the prosthetic. Some limitations were also identified in terms of the hand movements, weight of the prosthetic, and the suggestions to improve were concluded in this paper. Overall, the objective of this paper were achieved when the prosthetic hand found to be feasible in operation utilizing the EEG datasets.

  19. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    SUMMARY In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. In most of existing integrative analysis, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restricted. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. Simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of heterogeneity model and proposed approach. PMID:23938111

  20. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

    Full Text Available In this study, a prototype service to provide data from Web Feature Service (WFS as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URI are created to all spatial objects in the dataset. The objects are available from those URIs in Resource Description Framework (RDF data format. Next, a Web Ontology Language (OWL ontology is created to describe the dataset information content using the Open Geospatial Consortium’s (OGC GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from existing WFS. Then the Geographic Markup Language (GML format output of the WFS is transformed on-the-fly to the RDF format. Content Negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID. The dataset is divided to the subsets and each subset is given its persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.

  1. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins; Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    Full text: The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation.The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: Monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); Annual temperature (commences 1910); Daily temperature (commences 1910, with limited station coverage pre-1957); Twice-daily dewpoint/relative humidity (commences 1957); Monthly pan evaporation (commences 1970); Cloud amount (commences 1957) (Jovanovic etal. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  2. Tension in the recent Type Ia supernovae datasets

    International Nuclear Information System (INIS)

    Wei, Hao

    2010-01-01

    In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.

  3. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that ELCD electricity datasets have a very good quality in general terms, nevertheless some findings and recommendations in order to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results ensure the quality of the electricity-related datasets to any LCA practitioner, and provide insights related to the limitations and assumptions underlying in the datasets modelling. Giving this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would be also useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  4. The Stream-Catchment (StreamCat) Dataset: A database of watershed metrics for the conterminous USA

    Science.gov (United States)

    We developed an extensive database of landscape metrics for ~2.65 million streams, and their associated catchments, within the conterminous USA: The Stream-Catchment (StreamCat) Dataset. These data are publically available and greatly reduce the specialized geospatial expertise n...

  5. Dataset definition for CMS operations and physics analyses

    Science.gov (United States)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. dijet resonance trigger collecting data at a 1 kHz). In this presentation, we review the evolution of the dataset definition during the LHC run I, and we discuss the plans for the run II.

  6. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  7. Karna Particle Size Dataset for Tables and Figures

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  8. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  9. National Hydrography Dataset (NHD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes...

  10. Watershed Boundary Dataset (WBD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....

  11. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  12. USGS National Hydrography Dataset from The National Map

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — USGS The National Map - National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that encodes information about naturally occurring and...

  13. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide list of Phonocardiogram (PCG) features in time and frequency domain along with morphological and statistical features to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database, made available in Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smart phone based digital stethoscope from an Indian hospital was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches, when applied on the same dataset.

  14. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  15. Dataset definition for CMS operations and physics analyses

    CERN Document Server

    AUTHOR|(CDS)2051291

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concept of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. dijet resonance trigger collecting data at a 1 kHz). In this presentation, we review the evolution of the dataset definition during the first run, and we discuss the plans for the second LHC run.

  16. USGS National Boundary Dataset (NBD) Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS Governmental Unit Boundaries dataset from The National Map (TNM) represents major civil areas for the Nation, including States or Territories, counties (or...

  17. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  18. Global Man-made Impervious Surface (GMIS) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...

  19. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves the computational process to find patterns from large data sets. Classification, one of the main domains of data mining, involves known structure generalizing to apply to a new dataset and predict its class. There are various classification algorithms being used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based etc. In this paper, we apply three diverse classification algorithms on ten datasets. The datasets have been selected based on their size and/or number and nature of attributes. Results have been discussed using some performance evaluation measures like precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for the diverse nature datasets.

  20. Newton SSANTA Dr Water using POU filters dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...

  1. SAR image dataset of military ground targets with multiple poses for ATR

    Science.gov (United States)

    Belloni, Carole; Balleri, Alessio; Aouf, Nabil; Merlet, Thomas; Le Caillec, Jean-Marc

    2017-10-01

    Automatic Target Recognition (ATR) is the task of automatically detecting and classifying targets. Recognition using Synthetic Aperture Radar (SAR) images is interesting because SAR images can be acquired at night and under any weather conditions, whereas optical sensors operating in the visible band do not have this capability. Existing SAR ATR algorithms have mostly been evaluated using the MSTAR dataset.1 The problem with the MSTAR is that some of the proposed ATR methods have shown good classification performance even when targets were hidden,2 suggesting the presence of a bias in the dataset. Evaluations of SAR ATR techniques are currently challenging due to the lack of publicly available data in the SAR domain. In this paper, we present a high resolution SAR dataset consisting of images of a set of ground military target models taken at various aspect angles, The dataset can be used for a fair evaluation and comparison of SAR ATR algorithms. We applied the Inverse Synthetic Aperture Radar (ISAR) technique to echoes from targets rotating on a turntable and illuminated with a stepped frequency waveform. The targets in the database consist of four variants of two 1.7m-long models of T-64 and T-72 tanks. The gun, the turret position and the depression angle are varied to form 26 different sequences of images. The emitted signal spanned the frequency range from 13 GHz to 18 GHz to achieve a bandwidth of 5 GHz sampled with 4001 frequency points. The resolution obtained with respect to the size of the model targets is comparable to typical values obtained using SAR airborne systems. Single polarized images (Horizontal-Horizontal) are generated using the backprojection algorithm.3 A total of 1480 images are produced using a 20° integration angle. The images in the dataset are organized in a suggested training and testing set to facilitate a standard evaluation of SAR ATR algorithms.

  2. Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets

    Directory of Open Access Journals (Sweden)

    Karacali Bilge

    2007-10-01

    Full Text Available Abstract Background Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal samples linked to clinical information with an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against profiles correlated with relapse-free status. Prediction error of profiles identified with supervised univariate feature selection algorithms were compared to profiles selected randomly from a all genes on the microarray platform and b a list of known disease-related genes (a priori selection. We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms. Results Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets on the basis of ER and BCL-6 expression, respectively. Use of relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection, however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across-datasets to samples profiled on the same microarray platform. Conclusion Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine

  3. The effects of spatial population dataset choice on estimates of population at risk of disease

    Directory of Open Access Journals (Sweden)

    Gething Peter W

    2011-02-01

    consistently more accurate than the others in estimating PAR. The sizes of such differences among modeled human populations were related to variations in the methods, input resolution, and date of the census data underlying each dataset. Data quality varied from country to country within the spatial population datasets. Conclusions Detailed, highly spatially resolved human population data are an essential resource for planning health service delivery for disease control, for the spatial modeling of epidemics, and for decision-making processes related to public health. However, our results highlight that for the low-income regions of the world where disease burden is greatest, existing datasets display substantial variations in estimated population distributions, resulting in uncertainty in disease assessments that utilize them. Increased efforts are required to gather contemporary and spatially detailed demographic data to reduce this uncertainty, particularly in Africa, and to develop population distribution modeling methods that match the rigor, sophistication, and ability to handle uncertainty of contemporary disease mapping and spread modeling. In the meantime, studies that utilize a particular spatial population dataset need to acknowledge the uncertainties inherent within them and consider how the methods and data that comprise each will affect conclusions.

  4. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20% error. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher

  5. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    OpenAIRE

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health?the human microbiome. Existing limited number of studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples...

  6. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

    This document describes a general purpose multimedia data-set to be used in cross-media machine learning problems. In more detail we describe the genre taxonomy applied at http://www.garageband.com, from where the data-set was collected, and how the taxonomy have been fused into a more human...... understandable taxonomy. Finally, a description of various features extracted from both the audio and text are presented....

  7. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications in medical data have several technical challenges: complex and heterogeneous datasets, noisy medical datasets, and explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  8. Advancements in Wind Integration Study Data Modeling: The Wind Integration National Dataset (WIND) Toolkit; Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Draxl, C.; Hodge, B. M.; Orwig, K.; Jones, W.; Searight, K.; Getman, D.; Harrold, S.; McCaa, J.; Cline, J.; Clark, C.

    2013-10-01

    Regional wind integration studies in the United States require detailed wind power output data at many locations to perform simulations of how the power system will operate under high-penetration scenarios. The wind data sets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as be time synchronized with available load profiles. The Wind Integration National Dataset (WIND) Toolkit described in this paper fulfills these requirements. A wind resource dataset, wind power production time series, and simulated forecasts from a numerical weather prediction model run on a nationwide 2-km grid at 5-min resolution will be made publicly available for more than 110,000 onshore and offshore wind power production sites.

  9. Multilayered complex network datasets for three supply chain network archetypes on an urban road grid

    Directory of Open Access Journals (Sweden)

    Nadia M. Viljoen

    2018-02-01

    Full Text Available This article presents the multilayered complex network formulation for three different supply chain network archetypes on an urban road grid and describes how 500 instances were randomly generated for each archetype. Both the supply chain network layer and the urban road network layer are directed unweighted networks. The shortest path set is calculated for each of the 1 500 experimental instances. The datasets are used to empirically explore the impact that the supply chain's dependence on the transport network has on its vulnerability in Viljoen and Joubert (2017 [1]. The datasets are publicly available on Mendeley (Joubert and Viljoen, 2017 [2]. Keywords: Multilayered complex networks, Supply chain vulnerability, Urban road networks

  10. Open source platform for collaborative construction of wearable sensor datasets for human motion analysis and an application for gait analysis.

    Science.gov (United States)

    Llamas, César; González, Manuel A; Hernández, Carmen; Vegas, Jesús

    2016-10-01

    Nearly every practical improvement in modeling human motion is well founded in a properly designed collection of data or datasets. These datasets must be made publicly available for the community could validate and accept them. It is reasonable to concede that a collective, guided enterprise could serve to devise solid and substantial datasets, as a result of a collaborative effort, in the same sense as the open software community does. In this way datasets could be complemented, extended and expanded in size with, for example, more individuals, samples and human actions. For this to be possible some commitments must be made by the collaborators, being one of them sharing the same data acquisition platform. In this paper, we offer an affordable open source hardware and software platform based on inertial wearable sensors in a way that several groups could cooperate in the construction of datasets through common software suitable for collaboration. Some experimental results about the throughput of the overall system are reported showing the feasibility of acquiring data from up to 6 sensors with a sampling frequency no less than 118Hz. Also, a proof-of-concept dataset is provided comprising sampled data from 12 subjects suitable for gait analysis. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the Variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way of search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search provides have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.

  12. Use of country of birth as an indicator of refugee background in health datasets

    Science.gov (United States)

    2014-01-01

    Background Routine public health databases contain a wealth of data useful for research among vulnerable or isolated groups, who may be under-represented in traditional medical research. Identifying specific vulnerable populations, such as resettled refugees, can be particularly challenging; often country of birth is the sole indicator of whether an individual has a refugee background. The objective of this article was to review strengths and weaknesses of different methodological approaches to identifying resettled refugees and comparison groups from routine health datasets and to propose the application of additional methodological rigour in future research. Discussion Methodological approaches to selecting refugee and comparison groups from existing routine health datasets vary widely and are often explained in insufficient detail. Linked data systems or datasets from specialized refugee health services can accurately select resettled refugee and asylum seeker groups but have limited availability and can be selective. In contrast, country of birth is commonly collected in routine health datasets but a robust method for selecting humanitarian source countries based solely on this information is required. The authors recommend use of national immigration data to objectively identify countries of birth with high proportions of humanitarian entrants, matched by time period to the study dataset. When available, additional migration indicators may help to better understand migration as a health determinant. Methodologically, if multiple countries of birth are combined, the proportion of the sample represented by each country of birth should be included, with sub-analysis of individual countries of birth potentially providing further insights, if population size allows. United Nations-defined world regions provide an objective framework for combining countries of birth when necessary. A comparison group of economic migrants from the same world region may be appropriate

  13. Enhancing Conservation with High Resolution Productivity Datasets for the Conterminous United States

    Science.gov (United States)

    Robinson, Nathaniel Paul

    across the CONUS domain. The main results of this work are three publicly available datasets: 1) 30 m Landsat NDVI; 2) 250 m MODIS based GPP and NPP; and 3) 30 m Landsat based GPP and NPP. My goal is that these products prove useful for the wider scientific, conservation, and land management communities as we continue to strive for better conservation and management practices.

  14. Publicity and public relations

    Science.gov (United States)

    Fosha, Charles E.

    1990-01-01

    This paper addresses approaches to using publicity and public relations to meet the goals of the NASA Space Grant College. Methods universities and colleges can use to publicize space activities are presented.

  15. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset, metadata with usage metrics and user feedback to objectively extract relevance for improved...

  16. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    Science.gov (United States)

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated based on prior domain knowledge, as well on a number of external inputs, and producing a range of outputs including datasets, studies and peer reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include for example, a private conversation following a lecture or during a social dinner; an opinion expressed concerning some significant event such as an earthquake or for example a satellite failure. In addition, other sources of information of grey literature are important public such as informal papers such as the arxiv deposit, reports and studies. The climate science community is not an exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This is to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals

  17. EEG datasets for motor imagery brain-computer interface.

    Science.gov (United States)

    Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

    2017-07-01

    Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.

  18. Formerly Used Defense Sites (FUDS) Public Properties

    Data.gov (United States)

    Department of Homeland Security — The FUDS Public GIS dataset contains point location information for the 2,709 Formerly Used Defense Sites (FUDS) properties where the U.S. Army Corps of Engineers is...

  19. A curated compendium of monocyte transcriptome datasets of relevance to human monocyte immunobiology research [version 2; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Darawan Rinchai

    2016-04-01

    Full Text Available Systems-scale profiling approaches have become widely used in translational research settings. The resulting accumulation of large-scale datasets in public repositories represents a critical opportunity to promote insight and foster knowledge discovery. However, resources that can serve as an interface between biomedical researchers and such vast and heterogeneous dataset collections are needed in order to fulfill this potential. Recently, we have developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB. This tool can be used to overlay deep molecular phenotyping data with rich contextual information about analytes, samples and studies along with ancillary clinical or immunological profiling data. In this note, we describe a curated compendium of 93 public datasets generated in the context of human monocyte immunological studies, representing a total of 4,516 transcriptome profiles. Datasets were uploaded to an instance of GXB along with study description and sample annotations. Study samples were arranged in different groups. Ranked gene lists were generated based on relevant group comparisons. This resource is publicly available online at http://monocyte.gxbsidra.org/dm3/landing.gsp.

  20. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global delayed time mode validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the ARGO DAC, but profiles are also extracted from the World Ocean Database and TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional operationnal oserving system( ROOS) operated by European institutes no managed by National Data Centres and other datasets of profiles povided by scientific sources can also be found (Sea mammals profiles from MEOP, XBT datasets from cruises ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic). First advantage of this new merge product is to enhance the space and time coverage at global and european scales for the period covering 1950 till a year before the current year. This product is updated once a year and T&S gridded fields are alos generated for the period 1990-year n-1. The enhancement compared to the revious CORA product will be presented Despite the fact that the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. Started in 2016 a new study started that aims to compare both validation procedures to move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation.A reference data set composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made thanks to wide range of instruments (XBTs, CTDs, Argo floats, Instrumented sea mammals,...), covering the global ocean. The reference dataset has been validated simultaneously by both teams.An exhaustive comparison of the

  1. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  2. Interactive visualization and analysis of multimodal datasets for surgical applications.

    Science.gov (United States)

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  3. Hydrological simulation of the Brahmaputra basin using global datasets

    Science.gov (United States)

    Bhattacharya, Biswa; Conway, Crystal; Craven, Joanne; Masih, Ilyas; Mazzolini, Maurizio; Shrestha, Shreedeepy; Ugay, Reyne; van Andel, Schalk Jan

    2017-04-01

    Brahmaputra River flows through China, India and Bangladesh to the Bay of Bengal and is one of the largest rivers of the world with a catchment size of 580K km2. The catchment is largely hilly and/or forested with sparse population and with limited urbanisation and economic activities. The catchment experiences heavy monsoon rainfall leading to very high flood discharges. Large inter-annual variation of discharge leading to flooding, erosion and morphological changes are among the major challenges. The catchment is largely ungauged; moreover, limited availability of hydro-meteorological data limits the possibility of carrying out evidence based research, which could provide trustworthy information for managing and when needed, controlling, the basin processes by the riparian countries for overall basin development. The paper presents initial results of a current research project on Brahmaputra basin. A set of hydrological and hydraulic models (SWAT, HMS, RAS) are developed by employing publicly available datasets of DEM, land use and soil and simulated using satellite based rainfall products, evapotranspiration and temperature estimates. Remotely sensed data are compared with sporadically available ground data. The set of models are able to produce catchment wide hydrological information that potentially can be used in the future in managing the basin's water resources. The model predications should be used with caution due to high level of uncertainty because the semi-calibrated models are developed with uncertain physical representation (e.g. cross-section) and simulated with global meteorological forcing (e.g. TRMM) with limited validation. Major scientific challenges are seen in producing robust information that can be reliably used in managing the basin. The information generated by the models are uncertain and as a result, instead of using them per se, they are used in improving the understanding of the catchment, and by running several scenarios with varying

  4. Dataset on species incidence, species richness and forest characteristics in a Danish protected area

    Directory of Open Access Journals (Sweden)

    Adriano Mazziotta

    2016-12-01

    Full Text Available The data presented in this article are related to the research article entitled “Restoring hydrology and old-growth structures in a former production forest: Modelling the long-term effects on biodiversity” (A. Mazziotta, J. Heilmann-Clausen, H. H.Bruun, Ö. Fritz, E. Aude, A.P. Tøttrup [1]. This article describes how the changes induced by restoration actions in forest hydrology and structure alter the biodiversity value of a Danish forest reserve. The field dataset is made publicly available to enable critical or extended analyses.

  5. Dataset on species incidence, species richness and forest characteristics in a Danish protected area

    DEFF Research Database (Denmark)

    Mazziotta, Adriano; Heilmann-Clausen, Jacob; Bruun, Hans Henrik

    2016-01-01

    The data presented in this article are related to the research article entitled "Restoring hydrology and old-growth structures in a former production forest: Modelling the long-term effects on biodiversity" (A. Mazziotta, J. Heilmann-Clausen, H. H.Bruun, Ö. Fritz, E. Aude, A.P. Tøttrup) [1......]. This article describes how the changes induced by restoration actions in forest hydrology and structure alter the biodiversity value of a Danish forest reserve. The field dataset is made publicly available to enable critical or extended analyses....

  6. Recent Development on the NOAA's Global Surface Temperature Dataset

    Science.gov (United States)

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

    Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). The NOAAGlobalTemp had recently been updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature or ERSST, from version 3b to version 4) which resulted in an increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.

  7. Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

    Science.gov (United States)

    Avraam, Demetris; Wilson, Rebecca C; Burton, Paul

    2017-01-01

    Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSAPC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (co-variance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information.  In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.

  8. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Full Text Available Signaling via B cell receptors (BCR and Toll-like receptors (TLRs result in activation of B cells with distinct physiological outcomes, but transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input, RNA Pol II, H3K4me3, H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  9. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit -satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extends prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.

  10. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains calculated radiation, which is calculated by using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as

  11. Visualization of conserved structures by fusing highly variable datasets.

    Science.gov (United States)

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three-dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (different scale and resolution) from the same person. The next step of development of our technique was aimed at accommodating the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used for reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusion 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2 respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1 and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual

  12. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled - "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset for Exchange Market Pressure values (EMP) for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed in percentage change in exchange rate, measures the change in exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can interpreted as the change in exchange rate associated with $1 billion of intervention. Estimates of conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of ρ 's. Using the standard errors of estimates of ρ 's, we obtain one sigma intervals around mean estimates of EMP values. These values are also reported in the dataset.

  13. BLOND, a building-level office environment dataset of typical electrical appliances.

    Science.gov (United States)

    Kriechbaumer, Thomas; Jacobsen, Hans-Arno

    2018-03-27

    Energy metering has gained popularity as conventional meters are replaced by electronic smart meters that promise energy savings and higher comfort levels for occupants. Achieving these goals requires a deeper understanding of consumption patterns to reduce the energy footprint: load profile forecasting, power disaggregation, appliance identification, startup event detection, etc. Publicly available datasets are used to test, verify, and benchmark possible solutions to these problems. For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates with common appliances and load profiles. We provide voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The dataset contains 53 appliances (16 classes) in a 3-phase power grid. BLOND-50 contains 213 days of measurements sampled at 50kSps (aggregate) and 6.4kSps (individual appliances). BLOND-250 consists of the same setup: 50 days, 250kSps (aggregate), 50kSps (individual appliances). These are the longest continuous measurements at such high sampling rates and fully-labeled ground truth we are aware of.

  14. Se-SAD serial femtosecond crystallography datasets from selenobiotinyl-streptavidin

    Science.gov (United States)

    Yoon, Chun Hong; Demirci, Hasan; Sierra, Raymond G.; Dao, E. Han; Ahmadi, Radman; Aksit, Fulya; Aquila, Andrew L.; Batyuk, Alexander; Ciftci, Halilibrahim; Guillet, Serge; Hayes, Matt J.; Hayes, Brandon; Lane, Thomas J.; Liang, Meng; Lundström, Ulf; Koglin, Jason E.; Mgbam, Paul; Rao, Yashas; Rendahl, Theodore; Rodriguez, Evan; Zhang, Lindsey; Wakatsuki, Soichi; Boutet, Sébastien; Holton, James M.; Hunter, Mark S.

    2017-04-01

    We provide a detailed description of selenobiotinyl-streptavidin (Se-B SA) co-crystal datasets recorded using the Coherent X-ray Imaging (CXI) instrument at the Linac Coherent Light Source (LCLS) for selenium single-wavelength anomalous diffraction (Se-SAD) structure determination. Se-B SA was chosen as the model system for its high affinity between biotin and streptavidin where the sulfur atom in the biotin molecule (C10H16N2O3S) is substituted with selenium. The dataset was collected at three different transmissions (100, 50, and 10%) using a serial sample chamber setup which allows for two sample chambers, a front chamber and a back chamber, to operate simultaneously. Diffraction patterns from Se-B SA were recorded to a resolution of 1.9 Å. The dataset is publicly available through the Coherent X-ray Imaging Data Bank (CXIDB) and also on LCLS compute nodes as a resource for research and algorithm development.

  15. MiSTIC, an integrated platform for the analysis of heterogeneity in large tumour transcriptome datasets.

    Science.gov (United States)

    Lemieux, Sebastien; Sargeant, Tobias; Laperrière, David; Ismail, Houssam; Boucher, Geneviève; Rozendaal, Marieke; Lavallée, Vincent-Philippe; Ashton-Beaucage, Dariel; Wilhelm, Brian; Hébert, Josée; Hilton, Douglas J; Mader, Sylvie; Sauvageau, Guy

    2017-07-27

    Genome-wide transcriptome profiling has enabled non-supervised classification of tumours, revealing different sub-groups characterized by specific gene expression features. However, the biological significance of these subtypes remains for the most part unclear. We describe herein an interactive platform, Minimum Spanning Trees Inferred Clustering (MiSTIC), that integrates the direct visualization and comparison of the gene correlation structure between datasets, the analysis of the molecular causes underlying co-variations in gene expression in cancer samples, and the clinical annotation of tumour sets defined by the combined expression of selected biomarkers. We have used MiSTIC to highlight the roles of specific transcription factors in breast cancer subtype specification, to compare the aspects of tumour heterogeneity targeted by different prognostic signatures, and to highlight biomarker interactions in AML. A version of MiSTIC preloaded with datasets described herein can be accessed through a public web server (http://mistic.iric.ca); in addition, the MiSTIC software package can be obtained (github.com/iric-soft/MiSTIC) for local use with personalized datasets. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  16. BLOND, a building-level office environment dataset of typical electrical appliances

    Science.gov (United States)

    Kriechbaumer, Thomas; Jacobsen, Hans-Arno

    2018-03-01

    Energy metering has gained popularity as conventional meters are replaced by electronic smart meters that promise energy savings and higher comfort levels for occupants. Achieving these goals requires a deeper understanding of consumption patterns to reduce the energy footprint: load profile forecasting, power disaggregation, appliance identification, startup event detection, etc. Publicly available datasets are used to test, verify, and benchmark possible solutions to these problems. For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates with common appliances and load profiles. We provide voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The dataset contains 53 appliances (16 classes) in a 3-phase power grid. BLOND-50 contains 213 days of measurements sampled at 50kSps (aggregate) and 6.4kSps (individual appliances). BLOND-250 consists of the same setup: 50 days, 250kSps (aggregate), 50kSps (individual appliances). These are the longest continuous measurements at such high sampling rates and fully-labeled ground truth we are aware of.

  17. Distributed solar photovoltaic array location and extent dataset for remote sensing object identification

    Science.gov (United States)

    Bradbury, Kyle; Saboo, Raghav; L. Johnson, Timothy; Malof, Jordan M.; Devarajan, Arjun; Zhang, Wuming; M. Collins, Leslie; G. Newell, Richard

    2016-12-01

    Earth-observing remote sensing data, including aerial photography and satellite imagery, offer a snapshot of the world from which we can learn about the state of natural resources and the built environment. The components of energy systems that are visible from above can be automatically assessed with these remote sensing data when processed with machine learning methods. Here, we focus on the information gap in distributed solar photovoltaic (PV) arrays, of which there is limited public data on solar PV deployments at small geographic scales. We created a dataset of solar PV arrays to initiate and develop the process of automatically identifying solar PV locations using remote sensing imagery. This dataset contains the geospatial coordinates and border vertices for over 19,000 solar panels across 601 high-resolution images from four cities in California. Dataset applications include training object detection and other machine learning algorithms that use remote sensing imagery, developing specific algorithms for predictive detection of distributed PV systems, estimating installed PV capacity, and analysis of the socioeconomic correlates of PV deployment.

  18. HydroSHEDS: A global comprehensive hydrographic dataset

    Science.gov (United States)

    Wickel, B. A.; Lehner, B.; Sindorf, N.

    2007-12-01

    The Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales (HydroSHEDS) is an innovative product that, for the first time, provides hydrographic information in a consistent and comprehensive format for regional and global-scale applications. HydroSHEDS offers a suite of geo-referenced data sets, including stream networks, watershed boundaries, drainage directions, and ancillary data layers such as flow accumulations, distances, and river topology information. The goal of developing HydroSHEDS was to generate key data layers to support regional and global watershed analyses, hydrological modeling, and freshwater conservation planning at a quality, resolution and extent that had previously been unachievable. Available resolutions range from 3 arc-second (approx. 90 meters at the equator) to 5 minute (approx. 10 km at the equator) with seamless near-global extent. HydroSHEDS is derived from elevation data of the Shuttle Radar Topography Mission (SRTM) at 3 arc-second resolution. The original SRTM data have been hydrologically conditioned using a sequence of automated procedures. Existing methods of data improvement and newly developed algorithms have been applied, including void filling, filtering, stream burning, and upscaling techniques. Manual corrections were made where necessary. Preliminary quality assessments indicate that the accuracy of HydroSHEDS significantly exceeds that of existing global watershed and river maps. HydroSHEDS was developed by the Conservation Science Program of the World Wildlife Fund (WWF) in partnership with the U.S. Geological Survey (USGS), the International Centre for Tropical Agriculture (CIAT), The Nature Conservancy (TNC), and the Center for Environmental Systems Research (CESR) of the University of Kassel, Germany.

  19. Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time

    Science.gov (United States)

    Ulbricht, Damian; Klump, Jens; Bertelmann, Roland

    2013-04-01

    Inspired by the wish to re-use research data a lot of work is done to bring data systems of the earth sciences together. Discovery metadata is disseminated to data portals to allow building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists that were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications - as DataCite [1] does - requires again a specific metadata schema and every new use context of the measured data may require yet another metadata schema sharing only a subset of information with the meta information already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam we established a solution to store file-based research data and describe these with an arbitrary number of metadata schemas. Core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc [2] "items". The eSciDoc content model allows assigning files to "items" and adding any number of metadata records to these "items". The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available through the internet worldwide. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes by providing temporary web links for external reviewers that do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP based web application that allows to describe data with any XML-based schema. It uses the eSciDoc infrastructures

  20. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  1. Traffic sign classification with dataset augmentation and convolutional neural network

    Science.gov (United States)

    Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

    2018-04-01

    This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, firstly we transfer a color image into grayscale, and then normalize it in the range (-1,1) as the preprocessing step. To increase robustness of classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connection layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.

  2. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    Science.gov (United States)

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes is easy to join, extend, combine datasets and hence work collectively, but

  3. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Directory of Open Access Journals (Sweden)

    Spjuth Ola

    2010-06-01

    Full Text Available Abstract Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes is easy to join

  4. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    Energy Technology Data Exchange (ETDEWEB)

    Caroline Draxl: NREL

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles.As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  5. Georeferenced energy information system integrated of energetic matrix of Sao Paulo state from 2005 to 2035; Sistema de informacoes energeticas georreferenciadas integrado a matriz energetica do estado de Sao Paulo: 2005-2035

    Energy Technology Data Exchange (ETDEWEB)

    Alvares, Joao Malta [IX Consultoria e Representacoes Ltda, Itajuba, MG (Brazil); Universidade Federal de Itajuba (UNIFEI), MG (Brazil)

    2010-07-01

    A georeferenced information system energy or simply SIEG, is designed to integrate into the energy matrix of Sao Paulo from 2005 to 2035. Being an innovative request made by the Department of Sanitation and Energy of the state, this system would have the purpose to collect and aggregate information and data from several themes, relating this content in a geographic location spatialized. The main focus of the system is the analysis of the energy sector as a whole, from generation to final consumption, through all phases such as transmission and distribution. The energy data would also be crossed with various themes of support, contributing to the development of numerous reviews and generating sound conclusions. Issues such as environment, socio-economics, infrastructure, interconnected sectors, geographical conditions and other information could be entered, viewed and linked to the system. The SIEG is also a facilitator for planning and managing the energy sector with forecast models in possible future situations. (author)

  6. GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction

    Directory of Open Access Journals (Sweden)

    Zheng Huiru

    2009-01-01

    Full Text Available Abstract Background Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteomic-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases and non-interacting proteins (negative cases are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non interacting proteins are usually stored within databases. Extraction of these data can be both complex and time consuming. Although, the automatic construction of reference datasets for classification is a useful resource for researchers no public resource currently exists to perform this task. Results GRIP (Gold Reference dataset constructor from Information on Protein complexes is a web-based system that provides researchers with the functionality to create reference datasets for protein-protein interaction prediction in Saccharomyces cerevisiae. Both positive and negative cases for a reference dataset can be extracted, organised and downloaded by the user. GRIP also provides an upload facility whereby users can submit proteins to determine protein complex membership. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae. Conclusion GRIP is developed to retrieve information on protein complex, cellular localisation, and physical and genetic interactions in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time consuming process requiring programming knowledge. GRIP simplifies and speeds up this process by allowing users to automatically construct reference datasets. GRIP is free to access at http://rosalind.infj.ulst.ac.uk/GRIP/.

  7. Quantifying selective reporting and the Proteus phenomenon for multiple datasets with similar bias.

    Directory of Open Access Journals (Sweden)

    Thomas Pfeiffer

    2011-03-01

    Full Text Available Meta-analyses play an important role in synthesizing evidence from diverse studies and datasets that address similar questions. A major obstacle for meta-analyses arises from biases in reporting. In particular, it is speculated that findings which do not achieve formal statistical significance are less likely reported than statistically significant findings. Moreover, the patterns of bias can be complex and may also depend on the timing of the research results and their relationship with previously published work. In this paper, we present an approach that is specifically designed to analyze large-scale datasets on published results. Such datasets are currently emerging in diverse research fields, particularly in molecular medicine. We use our approach to investigate a dataset on Alzheimer's disease (AD that covers 1167 results from case-control studies on 102 genetic markers. We observe that initial studies on a genetic marker tend to be substantially more biased than subsequent replications. The chances for initial, statistically non-significant results to be published are estimated to be about 44% (95% CI, 32% to 63% relative to statistically significant results, while statistically non-significant replications have almost the same chance to be published as statistically significant replications (84%; 95% CI, 66% to 107%. Early replications tend to be biased against initial findings, an observation previously termed Proteus phenomenon: The chances for non-significant studies going in the same direction as the initial result are estimated to be lower than the chances for non-significant studies opposing the initial result (73%; 95% CI, 55% to 96%. Such dynamic patterns in bias are difficult to capture by conventional methods, where typically simple publication bias is assumed to operate. Our approach captures and corrects for complex dynamic patterns of bias, and thereby helps generating conclusions from published results that are more robust

  8. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    International Nuclear Information System (INIS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climate Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed. (paper)

  9. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Science.gov (United States)

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  10. Dataset-driven research for improving recommender systems for learning

    NARCIS (Netherlands)

    Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

    2011-01-01

    Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,

  11. dataTEL - Datasets for Technology Enhanced Learning

    NARCIS (Netherlands)

    Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

    2011-01-01

    Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)

  12. A reanalysis dataset of the South China Sea

    Science.gov (United States)

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803

  13. Comparision of analysis of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Crooks, Lucy; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    As part of the QTLMAS XII workshop, a simulated dataset was distributed and participants were invited to submit analyses of the data based on genome-wide association, fine mapping and genomic selection. We have evaluated the findings from the groups that reported fine mapping and genome-wide asso...

  14. The LAMBADA dataset: Word prediction requiring a broad discourse context

    NARCIS (Netherlands)

    Paperno, D.; Kruszewski, G.; Lazaridou, A.; Pham, Q.N.; Bernardi, R.; Pezzelle, S.; Baroni, M.; Boleda, G.; Fernández, R.; Erk, K.; Smith, N.A.

    2016-01-01

    We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the

  15. NEW WEB-BASED ACCESS TO NUCLEAR STRUCTURE DATASETS.

    Energy Technology Data Exchange (ETDEWEB)

    WINCHELL,D.F.

    2004-09-26

    As part of an effort to migrate the National Nuclear Data Center (NNDC) databases to a relational platform, a new web interface has been developed for the dissemination of the nuclear structure datasets stored in the Evaluated Nuclear Structure Data File and Experimental Unevaluated Nuclear Data List.

  16. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  17. Level-1 muon trigger performance with the full 2017 dataset

    CERN Document Server

    CMS Collaboration

    2018-01-01

    This document describes the performance of the CMS Level-1 Muon Trigger with the full dataset of 2017. Efficiency plots are included for each track finder (TF) individually and for the system as a whole. The efficiency is measured to be greater than 90% for all track finders.

  18. A Dataset for Visual Navigation with Neuromorphic Methods

    Directory of Open Access Journals (Sweden)

    Francisco eBarranco

    2016-02-01

    Full Text Available Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

  19. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    Science.gov (United States)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas particularly in the southwestern United States. Wide coverage of spatial and temporal measurements of precipitation are key for regional water budget analysis and hydrological operations which themselves are valuable tool for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation estimates datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  20. Dataset: Multi Sensor-Orientation Movement Data of Goats

    NARCIS (Netherlands)

    Kamminga, Jacob Wilhelm

    2018-01-01

    This is a labeled dataset. Motion data were collected from six sensor nodes that were fixed with different orientations to a collar around the neck of goats. These six sensor nodes simultaneously, with different orientations, recorded various activities performed by the goat. We recorded the

  1. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  2. UK surveillance: provision of quality assured information from combined datasets.

    Science.gov (United States)

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  3. participatory development of a minimum dataset for the khayelitsha ...

    African Journals Online (AJOL)

    This dataset was integrated with data requirements at ... model for defining health information needs at district level. This participatory process has enabled health workers to appraise their .... of reproductive health, mental health, disability and community ... each chose a facilitator and met in between the forum meetings.

  4. Comparision of analysis of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals...

  5. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of turbulence of jet flows. The present document documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to static temperature ratio 2.7. Further, the report assesses the accuracies of the data, e.g., establish uncertainties for the data. This paper covers the following five tasks: (1) Document acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse of the mean and turbulent velocity fields over the first 20 jet diameters.

  6. A new dataset validation system for the Planetary Science Archive

    Science.gov (United States)

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

    The Planetary Science Archive is the official archive for the Mars Express mission. It has received its first data by the end of 2004. These data are delivered by the PI teams to the PSA team as datasets, which are formatted conform to the Planetary Data System (PDS). The PI teams are responsible for analyzing and calibrating the instrument data as well as the production of reduced and calibrated data. They are also responsible of the scientific validation of these data. ESA is responsible of the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality. To do so, an archive peer-review is used to control the quality of the Mars Express science data archiving process. However a full validation of its content is missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data should be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall allow to track anomalies in and to control the completeness of datasets. It shall ensure that the PSA end-users: (1) can rely on the result of their queries, (2) will get data products that are suitable for scientific analysis, (3) can find all science data acquired during a mission. We defined dataset validation as the verification and assessment process to check the dataset content against pre-defined top-level criteria, which represent the general characteristics of good quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that

  7. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    Science.gov (United States)

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, although the results may contain the same query. They may also return highly specific results that depend more on how well metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suits users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting contents based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce

  8. Using kittens to unlock photo-sharing website datasets for environmental applications

    Science.gov (United States)

    Gascoin, Simon

    2016-04-01

    Mining photo-sharing websites is a promising approach to complement in situ and satellite observations of the environment, however a challenge is to deal with the large degree of noise inherent to online social datasets. Here I explored the value of the Flickr image hosting website database to monitor the snow cover in the Pyrenees. Using the Flickr application programming interface (API) I queried all the public images metadata tagged at least with one of the following words: "snow", "neige", "nieve", "neu" (snow in French, Spanish and Catalan languages). The search was limited to the geo-tagged pictures taken in the Pyrenees area. However, the number of public pictures available in the Flickr database for a given time interval depends on several factors, including the Flickr website popularity and the development of digital photography. Thus, I also searched for all Flickr images tagged with "chat", "gat" or "gato" (cat in French, Spanish and Catalan languages). The tag "cat" was not considered in order to exclude the results from North America where Flickr got popular earlier than in Europe. The number of "cat" images per month was used to fit a model of the number of images uploaded in Flickr with time. This model was used to remove this trend in the numbers of snow-tagged photographs. The resulting time series was compared to a time series of the snow cover area derived from the MODIS satellite over the same region. Both datasets are well correlated; in particular they exhibit the same seasonal evolution, although the inter-annual variabilities are less similar. I will also discuss which other factors may explain the main discrepancies in order to further decrease the noise in the Flickr dataset.

  9. Free internet datasets for streamflow modelling using SWAT in the Johor river basin, Malaysia

    Science.gov (United States)

    Tan, M. L.

    2014-02-01

    Streamflow modelling is a mathematical computational approach that represents terrestrial hydrology cycle digitally and is used for water resources assessment. However, such modelling endeavours require a large amount of data. Generally, governmental departments produce and maintain these data sets which make it difficult to obtain this data due to bureaucratic constraints. In some countries, the availability and quality of geospatial and climate datasets remain a critical issue due to many factors such as lacking of ground station, expertise, technology, financial support and war time. To overcome this problem, this research used public domain datasets from the Internet as "input" to a streamflow model. The intention is simulate daily and monthly streamflow of the Johor River Basin in Malaysia. The model used is the Soil and Water Assessment Tool (SWAT). As input free data including a digital elevation model (DEM), land use information, soil and climate data were used. The model was validated by in-situ streamflow information obtained from Rantau Panjang station for the year 2006. The coefficient of determination and Nash-Sutcliffe efficiency were 0.35/0.02 for daily simulated streamflow and 0.92/0.21 for monthly simulated streamflow, respectively. The results show that free data can provide a better simulation at a monthly scale compared to a daily basis in a tropical region. A sensitivity analysis and calibration procedure should be conducted in order to maximize the "goodness-of-fit" between simulated and observed streamflow. The application of Internet datasets promises an acceptable performance of streamflow modelling. This research demonstrates that public domain data is suitable for streamflow modelling in a tropical river basin within acceptable accuracy.

  10. Free internet datasets for streamflow modelling using SWAT in the Johor river basin, Malaysia

    International Nuclear Information System (INIS)

    Tan, M L

    2014-01-01

    Streamflow modelling is a mathematical computational approach that represents terrestrial hydrology cycle digitally and is used for water resources assessment. However, such modelling endeavours require a large amount of data. Generally, governmental departments produce and maintain these data sets which make it difficult to obtain this data due to bureaucratic constraints. In some countries, the availability and quality of geospatial and climate datasets remain a critical issue due to many factors such as lacking of ground station, expertise, technology, financial support and war time. To overcome this problem, this research used public domain datasets from the Internet as ''input'' to a streamflow model. The intention is simulate daily and monthly streamflow of the Johor River Basin in Malaysia. The model used is the Soil and Water Assessment Tool (SWAT). As input free data including a digital elevation model (DEM), land use information, soil and climate data were used. The model was validated by in-situ streamflow information obtained from Rantau Panjang station for the year 2006. The coefficient of determination and Nash-Sutcliffe efficiency were 0.35/0.02 for daily simulated streamflow and 0.92/0.21 for monthly simulated streamflow, respectively. The results show that free data can provide a better simulation at a monthly scale compared to a daily basis in a tropical region. A sensitivity analysis and calibration procedure should be conducted in order to maximize the ''goodness-of-fit'' between simulated and observed streamflow. The application of Internet datasets promises an acceptable performance of streamflow modelling. This research demonstrates that public domain data is suitable for streamflow modelling in a tropical river basin within acceptable accuracy

  11. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

    Full Text Available Aviation emissions are unique from other transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system while emissions of nitrogen oxides (NOx, sulfur oxides, carbon monoxide (CO, and hydrocarbons (HC impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four dimensional (space and time aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006 and aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are however some distinct differences in the altitude distribution

  12. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: ‘how capable are such stability interpretations in drawing conclusions'. There are at least three possible errors sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale, and (iii) that the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional scale stability variations will be quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined if the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar

  13. Public Use Airport Runways, Geographic WGS84, BTS (2006) [public_use_airport_runway_BTS_2006

    Data.gov (United States)

    Louisiana Geographic Information Center — The Public Use Airport Runways database is a geographic dataset of runways in the United States and US territories containing information on the physical...

  14. Creating a data exchange strategy for radiotherapy research: Towards federated databases and anonymised public datasets

    DEFF Research Database (Denmark)

    Skripcak, Tomas; Belka, Claus; Bosch, Walter

    2014-01-01

    in radiation therapy and oncology. The exchange of study data is one of the fundamental principles behind data aggregation and data mining. The possibilities of reproducing the original study results, performing further analyses on existing research data to generate new hypotheses or developing computational...... to be addressed are data interoperability, utilisation of standards, data quality and privacy concerns, data ownership, rights to publish, data pooling architecture and storage. This paper discusses a framework for conceptual packages of ideas focused on a strategic development for international research data......Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research...

  15. Creating a data exchange strategy for radiotherapy research : Towards federated databases and anonymised public datasets

    NARCIS (Netherlands)

    Skripcak, Tomas; Belka, Claus; Bosch, Walter; Brink, Carsten; Brunner, Thomas; Budach, Volker; Buettner, Daniel; Debus, Juergen; Dekker, Andre; Grau, Cai; Gulliford, Sarah; Hurkmans, Coen; Just, Uwe; Krause, Mechthild; Lambin, Philippe; Langendijk, Johannes A.; Lewensohn, Rolf; Luehr, Armin; Maingon, Philippe; Masucci, Michele; Niyazi, Maximilian; Poortmans, Philip; Simon, Monique; Schmidberger, Heinz; Spezi, Emiliano; Stuschke, Martin; Valentini, Vincenzo; Verheij, Marcel; Whitfield, Gillian; Zackrisson, Bjoern; Zips, Daniel; Baumann, Michael

    2014-01-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate 'translational research

  16. Creating a data exchange strategy for radiotherapy research: Towards federated databases and anonymised public datasets

    NARCIS (Netherlands)

    Skripcak, T.; Belka, C.; Bosch, W.; Brink, C. Van den; Brunner, T.; Budach, V.; Buttner, D.; Debus, J.; Dekker, A.; Grau, C.; Gulliford, S.; Hurkmans, C.; Just, U.; Krause, M.; Lambin, P.; Langendijk, J.A.; Lewensohn, R.; Luhr, A.; Maingon, P.; Masucci, M.; Niyazi, M.; Poortmans, P.M.P.; Simon, M.; Schmidberger, H.; Spezi, E.; Stuschke, M.; Valentini, V.; Verheij, M.; Whitfield, G.; Zackrisson, B.; Zips, D.; Baumann, M.

    2014-01-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research

  17. Creating a data exchange strategy for radiotherapy research: Towards federated databases and anonymised public datasets

    International Nuclear Information System (INIS)

    Skripcak, Tomas; Belka, Claus; Bosch, Walter; Brink, Carsten; Brunner, Thomas; Budach, Volker; Büttner, Daniel; Debus, Jürgen; Dekker, Andre; Grau, Cai; Gulliford, Sarah; Hurkmans, Coen; Just, Uwe

    2014-01-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research in radiation therapy and oncology. The exchange of study data is one of the fundamental principles behind data aggregation and data mining. The possibilities of reproducing the original study results, performing further analyses on existing research data to generate new hypotheses or developing computational models to support medical decisions (e.g. risk/benefit analysis of treatment options) represent just a fraction of the potential benefits of medical data-pooling. Distributed machine learning and knowledge exchange from federated databases can be considered as one beyond other attractive approaches for knowledge generation within “Big Data”. Data interoperability between research institutions should be the major concern behind a wider collaboration. Information captured in electronic patient records (EPRs) and study case report forms (eCRFs), linked together with medical imaging and treatment planning data, are deemed to be fundamental elements for large multi-centre studies in the field of radiation therapy and oncology. To fully utilise the captured medical information, the study data have to be more than just an electronic version of a traditional (un-modifiable) paper CRF. Challenges that have to be addressed are data interoperability, utilisation of standards, data quality and privacy concerns, data ownership, rights to publish, data pooling architecture and storage. This paper discusses a framework for conceptual packages of ideas focused on a strategic development for international research data exchange in the field of radiation therapy and oncology

  18. Creating a data exchange strategy for radiotherapy research: towards federated databases and anonymised public datasets.

    Science.gov (United States)

    Skripcak, Tomas; Belka, Claus; Bosch, Walter; Brink, Carsten; Brunner, Thomas; Budach, Volker; Büttner, Daniel; Debus, Jürgen; Dekker, Andre; Grau, Cai; Gulliford, Sarah; Hurkmans, Coen; Just, Uwe; Krause, Mechthild; Lambin, Philippe; Langendijk, Johannes A; Lewensohn, Rolf; Lühr, Armin; Maingon, Philippe; Masucci, Michele; Niyazi, Maximilian; Poortmans, Philip; Simon, Monique; Schmidberger, Heinz; Spezi, Emiliano; Stuschke, Martin; Valentini, Vincenzo; Verheij, Marcel; Whitfield, Gillian; Zackrisson, Björn; Zips, Daniel; Baumann, Michael

    2014-12-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research in radiation therapy and oncology. The exchange of study data is one of the fundamental principles behind data aggregation and data mining. The possibilities of reproducing the original study results, performing further analyses on existing research data to generate new hypotheses or developing computational models to support medical decisions (e.g. risk/benefit analysis of treatment options) represent just a fraction of the potential benefits of medical data-pooling. Distributed machine learning and knowledge exchange from federated databases can be considered as one beyond other attractive approaches for knowledge generation within "Big Data". Data interoperability between research institutions should be the major concern behind a wider collaboration. Information captured in electronic patient records (EPRs) and study case report forms (eCRFs), linked together with medical imaging and treatment planning data, are deemed to be fundamental elements for large multi-centre studies in the field of radiation therapy and oncology. To fully utilise the captured medical information, the study data have to be more than just an electronic version of a traditional (un-modifiable) paper CRF. Challenges that have to be addressed are data interoperability, utilisation of standards, data quality and privacy concerns, data ownership, rights to publish, data pooling architecture and storage. This paper discusses a framework for conceptual packages of ideas focused on a strategic development for international research data exchange in the field of radiation therapy and oncology. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  19. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanism of playing chess. For professional chess players (e.g., chess grand masters and masters or GM/Ms), what are the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, are largely veiled. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms), and 29 age matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  20. Knowledge discovery with classification rules in a cardiovascular dataset.

    Science.gov (United States)

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rules induction called AREX using evolutionary induction of decision trees and automatic programming is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of a possible new medical knowledge in the field of pediatric cardiology.

  1. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    Full Text Available This paper addressed the development of an augmented reality (AR based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakages sensors datasets. Sensors generates significant amount of multivariate datasets during normal and leak situations which made data exploration and visualisation daunting tasks. Therefore a model to manage such data and enhance computational support needed for effective explorations are developed in this paper. A challenge of this approach is to reduce the data inefficiency. This paper presented a model for computing information gain for each data attributes and determine a lead attribute.The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particularly selected region of interest (ROI on the network. Necessary architectural system supports and the interface requirements for such visualizations are also presented.

  2. An integrated dataset for in silico drug discovery

    Directory of Open Access Journals (Sweden)

    Cockell Simon J

    2010-12-01

    Full Text Available Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition for treating a second disease, than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications of existing compounds.

  3. Datasets in Gene Expression Omnibus used in the study ORD-019001: Compensatory changes in CYP expression in three different toxicology mouse models: CAR-null, Cyp3a-null, and Cyp2b9/10/13-null mice.

    Data.gov (United States)

    U.S. Environmental Protection Agency — Accession numbers of microarray data sets used in the analysis. This dataset is associated with the following publication: Kumar, R., L. Mota, E. Litoff, J. Rooney,...

  4. Application of Density Estimation Methods to Datasets from a Glider

    Science.gov (United States)

    2014-09-30

    humpback and sperm whales as well as different dolphin species. OBJECTIVES The objective of this research is to extend existing methods for cetacean...collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources...estimation from single sensor datasets. Required steps for a cue counting approach, where a cue has been defined as a clicking event (Küsel et al., 2011), to

  5. A review of continent scale hydrological datasets available for Africa

    OpenAIRE

    Bonsor, H.C.

    2010-01-01

    As rainfall becomes less reliable with predicted climate change the ability to assess the spatial and seasonal variations in groundwater availability on a large-scale (catchment and continent) is becoming increasingly important (Bates, et al. 2007; MacDonald et al. 2009). The scarcity of observed hydrological data, or difficulty in obtaining such data, within Africa means remotely sensed (RS) datasets must often be used to drive large-scale hydrological models. The different ap...

  6. Dataset of mitochondrial genome variants in oncocytic tumors

    Directory of Open Access Journals (Sweden)

    Lihua Lyu

    2018-04-01

    Full Text Available This dataset presents the mitochondrial genome variants associated with oncocytic tumors. These data were obtained by Sanger sequencing of the whole mitochondrial genomes of oncocytic tumors and the adjacent normal tissues from 32 patients. The mtDNA variants are identified after compared with the revised Cambridge sequence, excluding those defining haplogroups of our patients. The pathogenic prediction for the novel missense variants found in this study was performed with the Mitimpact 2 program.

  7. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    Science.gov (United States)

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.

  8. Dataset on records of Hericium erinaceus in Slovakia

    OpenAIRE

    Vladimír Kunca; Marek Čiliak

    2017-01-01

    The data presented in this article are related to the research article entitled ?Habitat preferences of Hericium erinaceus in Slovakia? (Kunca and ?iliak, 2016) [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status,...

  9. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    OpenAIRE

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    International audience; A common approach for analysis of anatomical variability relies on the stimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  10. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    Science.gov (United States)

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  11. An Analysis on Better Testing than Training Performances on the Iris Dataset

    NARCIS (Netherlands)

    Schutten, Marten; Wiering, Marco

    2016-01-01

    The Iris dataset is a well known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so the dataset is separated in a set used

  12. Discovering New Global Climate Patterns: Curating a 21-Year High Temporal (Hourly) and Spatial (40km) Resolution Reanalysis Dataset

    Science.gov (United States)

    Hou, C. Y.; Dattore, R.; Peng, G. S.

    2014-12-01

    The National Center for Atmospheric Research's Global Climate Four-Dimensional Data Assimilation (CFDDA) Hourly 40km Reanalysis dataset is a dynamically downscaled dataset with high temporal and spatial resolution. The dataset contains three-dimensional hourly analyses in netCDF format for the global atmospheric state from 1985 to 2005 on a 40km horizontal grid (0.4°grid increment) with 28 vertical levels, providing good representation of local forcing and diurnal variation of processes in the planetary boundary layer. This project aimed to make the dataset publicly available, accessible, and usable in order to provide a unique resource to allow and promote studies of new climate characteristics. When the curation project started, it had been five years since the data files were generated. Also, although the Principal Investigator (PI) had generated a user document at the end of the project in 2009, the document had not been maintained. Furthermore, the PI had moved to a new institution, and the remaining team members were reassigned to other projects. These factors made data curation in the areas of verifying data quality, harvest metadata descriptions, documenting provenance information especially challenging. As a result, the project's curation process found that: Data curator's skill and knowledge helped make decisions, such as file format and structure and workflow documentation, that had significant, positive impact on the ease of the dataset's management and long term preservation. Use of data curation tools, such as the Data Curation Profiles Toolkit's guidelines, revealed important information for promoting the data's usability and enhancing preservation planning. Involving data curators during each stage of the data curation life cycle instead of at the end could improve the curation process' efficiency. Overall, the project showed that proper resources invested in the curation process would give datasets the best chance to fulfill their potential to

  13. Parton Distributions based on a Maximally Consistent Dataset

    Science.gov (United States)

    Rojo, Juan

    2016-04-01

    The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this has to do with the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective, definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be mutually in agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of their implications for LHC phenomenology, finding also good consistency with the global fit result. These results provide a non-trivial validation test of the new NNPDF3.0 fitting methodology, and indicate that possible inconsistencies in the fitted dataset do not affect substantially the global fit PDFs.

  14. Kernel-based discriminant feature extraction using a representative dataset

    Science.gov (United States)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there has been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified by both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.

  15. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    Science.gov (United States)

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identifying potential hits, i.e., compounds capable of interacting with a given target and potentially modulate its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compounds subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds that has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoys selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509

  16. ENHANCED DATA DISCOVERABILITY FOR IN SITU HYPERSPECTRAL DATASETS

    Directory of Open Access Journals (Sweden)

    B. Rasaiah

    2016-06-01

    Full Text Available Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exist. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets, to exploit the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al., (2011-2015 with extended support for specific applications. This paper presents a prototype model for an OGC and ISO compliant platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  17. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to access the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.

  18. Tissue-Based MRI Intensity Standardization: Application to Multicentric Datasets

    Directory of Open Access Journals (Sweden)

    Nicolas Robitaille

    2012-01-01

    Full Text Available Intensity standardization in MRI aims at correcting scanner-dependent intensity variations. Existing simple and robust techniques aim at matching the input image histogram onto a standard, while we think that standardization should aim at matching spatially corresponding tissue intensities. In this study, we present a novel automatic technique, called STI for STandardization of Intensities, which not only shares the simplicity and robustness of histogram-matching techniques, but also incorporates tissue spatial intensity information. STI uses joint intensity histograms to determine intensity correspondence in each tissue between the input and standard images. We compared STI to an existing histogram-matching technique on two multicentric datasets, Pilot E-ADNI and ADNI, by measuring the intensity error with respect to the standard image after performing nonlinear registration. The Pilot E-ADNI dataset consisted in 3 subjects each scanned in 7 different sites. The ADNI dataset consisted in 795 subjects scanned in more than 50 different sites. STI was superior to the histogram-matching technique, showing significantly better intensity matching for the brain white matter with respect to the standard image.

  19. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander; Mularoni, Loris; Cope, Leslie M.; Medvedeva, Yulia; Mironov, Andrey A.; Makeev, Vsevolod J.; Wheelan, Sarah J.

    2012-01-01

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  20. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes are achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  1. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  2. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Full Text Available Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building. This article considers missing data within the context of principal component analysis (PCA, which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.

  3. A cross-country Exchange Market Pressure (EMP dataset

    Directory of Open Access Journals (Sweden)

    Mohit Desai

    2017-06-01

    Full Text Available The data presented in this article are related to the research article titled - “An exchange market pressure measure for cross country analysis” (Patnaik et al. [1]. In this article, we present the dataset for Exchange Market Pressure values (EMP for 139 countries along with their conversion factors, ρ (rho. Exchange Market Pressure, expressed in percentage change in exchange rate, measures the change in exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can interpreted as the change in exchange rate associated with $1 billion of intervention. Estimates of conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values for the point estimates of ρ’s. Using the standard errors of estimates of ρ’s, we obtain one sigma intervals around mean estimates of EMP values. These values are also reported in the dataset.

  4. UniMiB SHAR: A Dataset for Human Activity Recognition Using Acceleration Data from Smartphones

    Directory of Open Access Journals (Sweden)

    Daniela Micucci

    2017-10-01

    Full Text Available Smartphones, smartwatches, fitness trackers, and ad-hoc wearable devices are being increasingly used to monitor human activities. Data acquired by the hosted sensors are usually processed by machine-learning-based algorithms to classify human activities. The success of those algorithms mostly depends on the availability of training (labeled data that, if made publicly available, would allow researchers to make objective comparisons between techniques. Nowadays, there are only a few publicly available data sets, which often contain samples from subjects with too similar characteristics, and very often lack specific information so that is not possible to select subsets of samples according to specific criteria. In this article, we present a new dataset of acceleration samples acquired with an Android smartphone designed for human activity recognition and fall detection. The dataset includes 11,771 samples of both human activities and falls performed by 30 subjects of ages ranging from 18 to 60 years. Samples are divided in 17 fine grained classes grouped in two coarse grained classes: one containing samples of 9 types of activities of daily living (ADL and the other containing samples of 8 types of falls. The dataset has been stored to include all the information useful to select samples according to different criteria, such as the type of ADL performed, the age, the gender, and so on. Finally, the dataset has been benchmarked with four different classifiers and with two different feature vectors. We evaluated four different classification tasks: fall vs. no fall, 9 activities, 8 falls, 17 activities and falls. For each classification task, we performed a 5-fold cross-validation (i.e., including samples from all the subjects in both the training and the test dataset and a leave-one-subject-out cross-validation (i.e., the test data include the samples of a subject only, and the training data, the samples of all the other subjects. Regarding the

  5. Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets

    Science.gov (United States)

    Drouart, G.; Falkendal, T.

    2018-04-01

    We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from an heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper-limits during the fitting process in a continuous way allowing models to be gradually less probable as upper limits are approached. The aim is to propose a simple-to-use, yet highly-versatile fitting tool fro handling increasing source complexity when combining multi-wavelength datasets with fully customisable filter/model databases. The complete control of the user is one advantage, which avoids the traditional problems related to the "black box" effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting, makes Mr-Moose a particularly-attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.

  6. Spatial datasets of radionuclide contamination in the Ukrainian Chernobyl Exclusion Zone

    Science.gov (United States)

    Kashparov, Valery; Levchuk, Sviatoslav; Zhurba, Marina; Protsak, Valentyn; Khomutinin, Yuri; Beresford, Nicholas A.; Chaplow, Jacqueline S.

    2018-02-01

    The dataset Spatial datasets of radionuclide contamination in the Ukrainian Chernobyl Exclusion Zone was developed to enable data collected between May 1986 (immediately after Chernobyl) and 2014 by the Ukrainian Institute of Agricultural Radiology (UIAR) after the Chernobyl accident to be made publicly available. The dataset includes results from comprehensive soil sampling across the Chernobyl Exclusion Zone (CEZ). Analyses include radiocaesium (134Cs and 134Cs) 90Sr, 154Eu and soil property data; plutonium isotope activity concentrations in soil (including distribution in the soil profile); analyses of hot (or fuel) particles from the CEZ (data from Poland and across Europe are also included); and results of monitoring in the Ivankov district, a region adjacent to the exclusion zone. The purpose of this paper is to describe the available data and methodology used to obtain them. The data will be valuable to those conducting studies within the CEZ in a number of ways, for instance (i) for helping to perform robust exposure estimates to wildlife, (ii) for predicting comparative activity concentrations of different key radionuclides, (iii) for providing a baseline against which future surveys in the CEZ can be compared, (iv) as a source of information on the behaviour of fuel particles (FPs), (v) for performing retrospective dose assessments and (vi) for assessing natural background dose rates in the CEZ. The CEZ has been proposed as a radioecological observatory (i.e. a radioactively contaminated site that will provide a focus for long-term, radioecological collaborative international research). Key to the future success of this concept is open access to data for the CEZ. The data presented here are a first step in this process. The data and supporting documentation are freely available from the Environmental Information Data Centre (EIDC) under the terms and conditions of the Open Government Licence: https://doi.org/10.5285/782ec845-2135-4698-8881-b38823e533bf.

  7. Comparison and evaluation of datasets for off-angle iris recognition

    Science.gov (United States)

    Kurtuncu, Osman M.; Cerme, Gamze N.; Karakaya, Mahmut

    2016-05-01

    In this paper, we investigated the publicly available iris recognition datasets and their data capture procedures in order to determine if they are suitable for the stand-off iris recognition research. Majority of the iris recognition datasets include only frontal iris images. Even if a few datasets include off-angle iris images, the frontal and off-angle iris images are not captured at the same time. The comparison of the frontal and off-angle iris images shows not only differences in the gaze angle but also change in pupil dilation and accommodation as well. In order to isolate the effect of the gaze angle from other challenging issues including dilation and accommodation, the frontal and off-angle iris images are supposed to be captured at the same time by using two different cameras. Therefore, we developed an iris image acquisition platform by using two cameras in this work where one camera captures frontal iris image and the other one captures iris images from off-angle. Based on the comparison of Hamming distance between frontal and off-angle iris images captured with the two-camera- setup and one-camera-setup, we observed that Hamming distance in two-camera-setup is less than one-camera-setup ranging from 0.05 to 0.001. These results show that in order to have accurate results in the off-angle iris recognition research, two-camera-setup is necessary in order to distinguish the challenging issues from each other.

  8. A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery.

    Science.gov (United States)

    Ahmidi, Narges; Tao, Lingling; Sefati, Shahin; Gao, Yixin; Lea, Colin; Haro, Benjamin Bejar; Zappella, Luca; Khudanpur, Sanjeev; Vidal, Rene; Hager, Gregory D

    2017-09-01

    State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. In this paper, we address two major problems for surgical data analysis: First, lack of uniform-shared datasets and benchmarks, and second, lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model, sparse hidden Markov model (HMM), Markov semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based ones that aim to classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.

  9. A curated transcriptome dataset collection to investigate the functional programming of human hematopoietic cells in early life.

    Science.gov (United States)

    Rahman, Mahbuba; Boughorbel, Sabri; Presnell, Scott; Quinn, Charlie; Cugno, Chiara; Chaussabel, Damien; Marr, Nico

    2016-01-01

    Compendia of large-scale datasets made available in public repositories provide an opportunity to identify and fill gaps in biomedical knowledge. But first, these data need to be made readily accessible to research investigators for interpretation. Here we make available a collection of transcriptome datasets to investigate the functional programming of human hematopoietic cells in early life. Thirty two datasets were retrieved from the NCBI Gene Expression Omnibus (GEO) and loaded in a custom web application called the Gene Expression Browser (GXB), which was designed for interactive query and visualization of integrated large-scale data. Quality control checks were performed. Multiple sample groupings and gene rank lists were created allowing users to reveal age-related differences in transcriptome profiles, changes in the gene expression of neonatal hematopoietic cells to a variety of immune stimulators and modulators, as well as during cell differentiation. Available demographic, clinical, and cell phenotypic information can be overlaid with the gene expression data and used to sort samples. Web links to customized graphical views can be generated and subsequently inserted in manuscripts to report novel findings. GXB also enables browsing of a single gene across projects, thereby providing new perspectives on age- and developmental stage-specific expression of a given gene across the human hematopoietic system. This dataset collection is available at: http://developmentalimmunology.gxbsidra.org/dm3/geneBrowser/list.

  10. DEVELOPING VERIFICATION SYSTEMS FOR BUILDING INFORMATION MODELS OF HERITAGE BUILDINGS WITH HETEROGENEOUS DATASETS

    Directory of Open Access Journals (Sweden)

    L. Chow

    2017-08-01

    Full Text Available The digitization and abstraction of existing buildings into building information models requires the translation of heterogeneous datasets that may include CAD, technical reports, historic texts, archival drawings, terrestrial laser scanning, and photogrammetry into model elements. In this paper, we discuss a project undertaken by the Carleton Immersive Media Studio (CIMS that explored the synthesis of heterogeneous datasets for the development of a building information model (BIM for one of Canada’s most significant heritage assets – the Centre Block of the Parliament Hill National Historic Site. The scope of the project included the development of an as-found model of the century-old, six-story building in anticipation of specific model uses for an extensive rehabilitation program. The as-found Centre Block model was developed in Revit using primarily point cloud data from terrestrial laser scanning. The data was captured by CIMS in partnership with Heritage Conservation Services (HCS, Public Services and Procurement Canada (PSPC, using a Leica C10 and P40 (exterior and large interior spaces and a Faro Focus (small to mid-sized interior spaces. Secondary sources such as archival drawings, photographs, and technical reports were referenced in cases where point cloud data was not available. As a result of working with heterogeneous data sets, a verification system was introduced in order to communicate to model users/viewers the source of information for each building element within the model.

  11. Assessment of Global Cloud Datasets from Satellites: Project and Database Initiated by the GEWEX Radiation Panel

    Science.gov (United States)

    Stubenrauch, C. J.; Rossow, W. B.; Kinne, S.; Ackerman, S.; Cesana, G.; Chepfer, H.; Getzewich, B.; Di Girolamo, L.; Guignard, A.; Heidinger, A.; hide

    2012-01-01

    Clouds cover about 70% of the Earth's surface and play a dominant role in the energy and water cycle of our planet. Only satellite observations provide a continuous survey of the state of the atmosphere over the whole globe and across the wide range of spatial and temporal scales that comprise weather and climate variability. Satellite cloud data records now exceed more than 25 years in length. However, climatologies compiled from different satellite datasets can exhibit systematic biases. Questions therefore arise as to the accuracy and limitations of the various sensors. The Global Energy and Water cycle Experiment (GEWEX) Cloud Assessment, initiated in 2005 by the GEWEX Radiation Panel, provided the first coordinated intercomparison of publically available, standard global cloud products (gridded, monthly statistics) retrieved from measurements of multi-spectral imagers (some with multiangle view and polarization capabilities), IR sounders and lidar. Cloud properties under study include cloud amount, cloud height (in terms of pressure, temperature or altitude), cloud radiative properties (optical depth or emissivity), cloud thermodynamic phase and bulk microphysical properties (effective particle size and water path). Differences in average cloud properties, especially in the amount of high-level clouds, are mostly explained by the inherent instrument measurement capability for detecting and/or identifying optically thin cirrus, especially when overlying low-level clouds. The study of long-term variations with these datasets requires consideration of many factors. A monthly, gridded database, in common format, facilitates further assessments, climate studies and the evaluation of climate models.

  12. Reference dataset of volcanic ash physicochemical and optical properties for atmospheric measurement retrievals and transport modelling

    Science.gov (United States)

    Vogel, Andreas; Durant, Adam; Sytchkova, Anna; Diplas, Spyros; Bonadonna, Costanza; Scarnato, Barbara; Krüger, Kirstin; Kylling, Arve; Kristiansen, Nina; Stohl, Andreas

    2016-04-01

    Explosive volcanic eruptions emit up to 50 wt.% (total erupted mass) of fine ash particles (estimates of the volcanic source term and the nature of the constituent volcanic ash properties. Consequently, it is important to include a quantitative assessment of measurement uncertainties of ash properties to provide realistic ash forecast uncertainty. Currently, information on volcanic ash physicochemical and optical properties is derived from a small number of somewhat dated publications. In this study, we provide a reference dataset for physical (size distribution and shape), chemical (bulk vs. surface chemistry) and optical properties (complex refractive index in the UV-vis-NIR range) of a representative selection of volcanic ash samples from 10 different volcanic eruptions covering the full variability in silica content (40-75 wt.% SiO2). Through the combination of empirical analytical methods (e.g., image analysis, Energy Dispersive Spectroscopy, X-ray Photoelectron Spectroscopy, Transmission Electron Microscopy and UV/Vis/NIR/FTIR Spectroscopy) and theoretical models (e.g., Bruggeman effective medium approach), it was possible to fully capture the natural variability of ash physicochemical and optical characteristics. The dataset will be applied in atmospheric measurement retrievals and atmospheric transport modelling to determine the sensitivity to uncertainty in ash particle characteristics.

  13. Developing Verification Systems for Building Information Models of Heritage Buildings with Heterogeneous Datasets

    Science.gov (United States)

    Chow, L.; Fai, S.

    2017-08-01

    The digitization and abstraction of existing buildings into building information models requires the translation of heterogeneous datasets that may include CAD, technical reports, historic texts, archival drawings, terrestrial laser scanning, and photogrammetry into model elements. In this paper, we discuss a project undertaken by the Carleton Immersive Media Studio (CIMS) that explored the synthesis of heterogeneous datasets for the development of a building information model (BIM) for one of Canada's most significant heritage assets - the Centre Block of the Parliament Hill National Historic Site. The scope of the project included the development of an as-found model of the century-old, six-story building in anticipation of specific model uses for an extensive rehabilitation program. The as-found Centre Block model was developed in Revit using primarily point cloud data from terrestrial laser scanning. The data was captured by CIMS in partnership with Heritage Conservation Services (HCS), Public Services and Procurement Canada (PSPC), using a Leica C10 and P40 (exterior and large interior spaces) and a Faro Focus (small to mid-sized interior spaces). Secondary sources such as archival drawings, photographs, and technical reports were referenced in cases where point cloud data was not available. As a result of working with heterogeneous data sets, a verification system was introduced in order to communicate to model users/viewers the source of information for each building element within the model.

  14. The global coastline dataset: the observed relation between erosion and sea-level rise

    Science.gov (United States)

    Donchyts, G.; Baart, F.; Luijendijk, A.; Hagenaars, G.

    2017-12-01

    Erosion of sandy coasts is considered one of the key risks of sea-level rise. Because sandy coastlines of the world are often highly populated, erosive coastline trends result in risk to populations and infrastructure. Most of our understanding of the relation between sea-level rise and coastal erosion is based on local or regional observations and generalizations of numerical and physical experiments. Until recently there was no reliable global scale assessment of the location of sandy coasts and their rate of erosion and accretion. Here we present the global coastline dataset that covers erosion indicators on a local scale with global coverage. The dataset uses our global coastline transects grid defined with an alongshore spacing of 250 m and a cross shore length extending 1 km seaward and 1 km landward. This grid matches up with pre-existing local grids where available. We present the latest results on validation of coastal-erosion trends (based on optical satellites) and classification of sandy versus non-sandy coasts. We show the relation between sea-level rise (based both on tide-gauges and multi-mission satellite altimetry) and observed erosion trends over the last decades, taking into account broken-coastline trends (for example due to nourishments).An interactive web application presents the publicly-accessible results using a backend based on Google Earth Engine. It allows both researchers and stakeholders to use objective estimates of coastline trends, particularly when authoritative sources are not available.

  15. Animated analysis of geoscientific datasets: An interactive graphical application

    Science.gov (United States)

    Morse, Peter; Reading, Anya; Lueg, Christopher

    2017-12-01

    Geoscientists are required to analyze and draw conclusions from increasingly large volumes of data. There is a need to recognise and characterise features and changing patterns of Earth observables within such large datasets. It is also necessary to identify significant subsets of the data for more detailed analysis. We present an innovative, interactive software tool and workflow to visualise, characterise, sample and tag large geoscientific datasets from both local and cloud-based repositories. It uses an animated interface and human-computer interaction to utilise the capacity of human expert observers to identify features via enhanced visual analytics. 'Tagger' enables users to analyze datasets that are too large in volume to be drawn legibly on a reasonable number of single static plots. Users interact with the moving graphical display, tagging data ranges of interest for subsequent attention. The tool provides a rapid pre-pass process using fast GPU-based OpenGL graphics and data-handling and is coded in the Quartz Composer visual programing language (VPL) on Mac OSX. It makes use of interoperable data formats, and cloud-based (or local) data storage and compute. In a case study, Tagger was used to characterise a decade (2000-2009) of data recorded by the Cape Sorell Waverider Buoy, located approximately 10 km off the west coast of Tasmania, Australia. These data serve as a proxy for the understanding of Southern Ocean storminess, which has both local and global implications. This example shows use of the tool to identify and characterise 4 different types of storm and non-storm events during this time. Events characterised in this way are compared with conventional analysis, noting advantages and limitations of data analysis using animation and human interaction. Tagger provides a new ability to make use of humans as feature detectors in computer-based analysis of large-volume geosciences and other data.

  16. Designing the colorectal cancer core dataset in Iran

    Directory of Open Access Journals (Sweden)

    Sara Dorri

    2017-01-01

    Full Text Available Background: There is no need to explain the importance of collection, recording and analyzing the information of disease in any health organization. In this regard, systematic design of standard data sets can be helpful to record uniform and consistent information. It can create interoperability between health care systems. The main purpose of this study was design the core dataset to record colorectal cancer information in Iran. Methods: For the design of the colorectal cancer core data set, a combination of literature review and expert consensus were used. In the first phase, the draft of the data set was designed based on colorectal cancer literature review and comparative studies. Then, in the second phase, this data set was evaluated by experts from different discipline such as medical informatics, oncology and surgery. Their comments and opinion were taken. In the third phase refined data set, was evaluated again by experts and eventually data set was proposed. Results: In first phase, based on the literature review, a draft set of 85 data elements was designed. In the second phase this data set was evaluated by experts and supplementary information was offered by professionals in subgroups especially in treatment part. In this phase the number of elements totally were arrived to 93 numbers. In the third phase, evaluation was conducted by experts and finally this dataset was designed in five main parts including: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study the comprehensive core data set of colorectal cancer was designed. This dataset in the field of collecting colorectal cancer information can be useful through facilitating exchange of health information. Designing such data set for similar disease can help providers to collect standard data from patients and can accelerate retrieval from storage systems.

  17. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n x log(N; the visualization itself can be done with a complexity of O(1 and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms ms. The current 64-bit implementation theoretically supports datasets with up to 2(64 bytes, on the x86_64 architecture currently up to 2(48 bytes are supported, and benchmarks have been conducted with 2(40 bytes/1 TiB or 1.3 x 10(11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.

  18. A synthetic dataset for evaluating soft and hard fusion algorithms

    Science.gov (United States)

    Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey

    2011-06-01

    There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.

  19. Identifying frauds and anomalies in Medicare-B dataset.

    Science.gov (United States)

    Jiwon Seo; Mendelevitch, Ofer

    2017-07-01

    Healthcare industry is growing at a rapid rate to reach a market value of $7 trillion dollars world wide. At the same time, fraud in healthcare is becoming a serious problem, amounting to 5% of the total healthcare spending, or $100 billion dollars each year in US. Manually detecting healthcare fraud requires much effort. Recently, machine learning and data mining techniques are applied to automatically detect healthcare frauds. This paper proposes a novel PageRank-based algorithm to detect healthcare frauds and anomalies. We apply the algorithm to Medicare-B dataset, a real-life data with 10 million healthcare insurance claims. The algorithm successfully identifies tens of previously unreported anomalies.

  20. Power analysis dataset for QCA based multiplexer circuits

    Directory of Open Access Journals (Sweden)

    Md. Abdullah-Al-Shafi

    2017-04-01

    Full Text Available Power consumption in irreversible QCA logic circuits is a vital and a major issue; however in the practical cases, this focus is mostly omitted.The complete power depletion dataset of different QCA multiplexers have been worked out in this paper. At −271.15 °C temperature, the depletion is evaluated under three separate tunneling energy levels. All the circuits are designed with QCADesigner, a broadly used simulation engine and QCAPro tool has been applied for estimating the power dissipation.

  1. Equalizing imbalanced imprecise datasets for genetic fuzzy classifiers

    Directory of Open Access Journals (Sweden)

    AnaM. Palacios

    2012-04-01

    Full Text Available Determining whether an imprecise dataset is imbalanced is not immediate. The vagueness in the data causes that the prior probabilities of the classes are not precisely known, and therefore the degree of imbalance can also be uncertain. In this paper we propose suitable extensions of different resampling algorithms that can be applied to interval valued, multi-labelled data. By means of these extended preprocessing algorithms, certain classification systems designed for minimizing the fraction of misclassifications are able to produce knowledge bases that are also adequate under common metrics for imbalanced classification.

  2. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    Science.gov (United States)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases but not even Google can find -or cares to compile- all the data that's relevant for science and particularly geo sciences. If a dataset is not discoverable through a well known search provider it will remain dark data to the scientific world.For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web-scale using the ultimate dataset: The Internet. This stack has 2 principal components, a web-scale crawling infrastructure and a semantic aggregator. The web-crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a python-based workflow that extracts valuable metadata and stores it in the form of triples through the use semantic technologies.While implementing the BCube stack we have run into several challenges such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all

  3. Dataset concerning the analytical approximation of the Ae3 temperature

    Directory of Open Access Journals (Sweden)

    B.L. Ennis

    2017-02-01

    The dataset includes the terms of the function and the values for the polynomial coefficients for major alloying elements in steel. A short description of the approximation method used to derive and validate the coefficients has also been included. For discussion and application of this model, please refer to the full length article entitled “The role of aluminium in chemical and phase segregation in a TRIP-assisted dual phase steel” 10.1016/j.actamat.2016.05.046 (Ennis et al., 2016 [1].

  4. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

     Abstract Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...... analyses to the chicken expression data led to different ranking of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses dependent on the statistical method used. Methods for predicting the possible...

  5. A Validation Dataset for CryoSat Sea Ice Investigators

    DEFF Research Database (Denmark)

    Julia, Gaudelli,; Baker, Steve; Haas, Christian

    Since its launch in April 2010 Cryosat has been collecting valuable sea ice data over the Arctic region. Over the same period ESA’s CryoVEx and NASA IceBridge validation campaigns have been collecting a unique set of coincident airborne measurements in the Arctic. The CryoVal-SI project has...... community. In this talk we will describe the composition of the validation dataset, summarising how it was processed and how to understand the content and format of the data. We will also explain how to access the data and the supporting documentation....

  6. Dataset of statements on policy integration of selected intergovernmental organizations

    Directory of Open Access Journals (Sweden)

    Jale Tosun

    2018-04-01

    Full Text Available This article describes data for 78 intergovernmental organizations (IGOs working on topics related to energy governance, environmental protection, and the economy. The number of IGOs covered also includes organizations active in other sectors. The point of departure for data construction was the Correlates of War dataset, from which we selected this sample of IGOs. We updated and expanded the empirical information on the IGOs selected by manual coding. Most importantly, we collected the primary law texts of the individual IGOs in order to code whether they commit themselves to environmental policy integration (EPI, climate policy integration (CPI and/or energy policy integration (EnPI.

  7. Dataset on records of Hericium erinaceus in Slovakia.

    Science.gov (United States)

    Kunca, Vladimír; Čiliak, Marek

    2017-06-01

    The data presented in this article are related to the research article entitled "Habitat preferences of Hericium erinaceus in Slovakia" (Kunca and Čiliak, 2016) [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence and some result from targeted searches.

  8. Dataset on records of Hericium erinaceus in Slovakia

    Directory of Open Access Journals (Sweden)

    Vladimír Kunca

    2017-06-01

    Full Text Available The data presented in this article are related to the research article entitled “Habitat preferences of Hericium erinaceus in Slovakia” (Kunca and Čiliak, 2016 [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence and some result from targeted searches.

  9. Public participation in GIS via mobile applications

    Science.gov (United States)

    Brovelli, Maria Antonia; Minghini, Marco; Zamboni, Giorgio

    2016-04-01

    Driven by the recent trends in the GIS domain including Volunteered Geographic Information, geo-crowdsourcing and citizen science, and fostered by the constant technological advances, collection and dissemination of geospatial information by ordinary people has become commonplace. However, applications involving user-generated geospatial content show dramatically diversified patterns in terms of incentive, type and level of participation, purpose of the activity, data/metadata provided and data quality. This study contributes to this heterogeneous context by investigating public participation in GIS within the field of mobile-based applications. Results not only show examples of how to technically build GIS applications enabling user collection and interaction with geospatial data, but they also draw conclusions about the methods and needs of public participation. We describe three projects with different scales and purposes in the context of urban monitoring and planning, and tourism valorisation. In each case, an open source architecture is used, allowing users to exploit their mobile devices to collect georeferenced information. This data is then made publicly available on specific Web viewers. Analysis of user involvement in these projects provides insights related to participation patterns which suggests some generalized conclusions.

  10. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

  11. The Path from Large Earth Science Datasets to Information

    Science.gov (United States)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences Data (GES) and Information Services Center (DISC) is one of the major Science Mission Directorate (SMD) for archiving and distribution of Earth Science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model derived datasets (generated by GSFC's Global Modeling and Assimilation Office), the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed in the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth Science data and information as well as addresses the technical and scientific issues, governance and user support problem faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally it demonstrates some of the most used data and product visualization and analyses tools developed and maintained by the GES DISC.

  12. Robust computational analysis of rRNA hypervariable tag datasets.

    Directory of Open Access Journals (Sweden)

    Maksim Sipos

    Full Text Available Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10(5-10(6 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.

  13. BLAST-EXPLORER helps you building datasets for phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2010-01-01

    Full Text Available Abstract Background The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way for constructing datasets suitable for phylogenetic analysis, because this task intimately depends on the scientific question we want to address, Moreover, database mining softwares such as BLAST which are routinely used for searching homologous sequences are not specifically optimized for this task. Results To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools that allows interactive, phylogenetic-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the selection of the BLAST hits is done using BLAST-Explorer, the corresponding sequence can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform. Conclusions BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieving of the BLAST hit sequences based a wide range of criterions. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr

  14. Multiresolution comparison of precipitation datasets for large-scale models

    Science.gov (United States)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models which are related to weather forecast and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products along with ground observations provide another avenue for investigating how the precipitation uncertainty would affect the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North America gridded products including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithms (ANUSPLIN) and Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, results provide an assessment of possible applications for various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing performance of spatial coherence. In addition to the products comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  15. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets.

    Science.gov (United States)

    Li, Lianwei; Ma, Zhanshan Sam

    2016-08-16

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health-the human microbiome. Existing limited number of studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography "Everything is everywhere, but the environment selects" first proposed by Baas-Becking resolves the apparent contradiction. The first part of Baas-Becking doctrine states that microbes are not dispersal-limited and therefore are neutral prone, and the second part reiterates that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tip the human microbiome to niche regime.

  16. Overview of the CERES Edition-4 Multilayer Cloud Property Datasets

    Science.gov (United States)

    Chang, F. L.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, R. A.; Brown, R. R.

    2014-12-01

    Knowledge of the cloud vertical distribution is important for understanding the role of clouds on earth's radiation budget and climate change. Since high-level cirrus clouds with low emission temperatures and small optical depths can provide a positive feedback to a climate system and low-level stratus clouds with high emission temperatures and large optical depths can provide a negative feedback effect, the retrieval of multilayer cloud properties using satellite observations, like Terra and Aqua MODIS, is critically important for a variety of cloud and climate applications. For the objective of the Clouds and the Earth's Radiant Energy System (CERES), new algorithms have been developed using Terra and Aqua MODIS data to allow separate retrievals of cirrus and stratus cloud properties when the two dominant cloud types are simultaneously present in a multilayer system. In this paper, we will present an overview of the new CERES Edition-4 multilayer cloud property datasets derived from Terra as well as Aqua. Assessment of the new CERES multilayer cloud datasets will include high-level cirrus and low-level stratus cloud heights, pressures, and temperatures as well as their optical depths, emissivities, and microphysical properties.

  17. Predicting weather regime transitions in Northern Hemisphere datasets

    Energy Technology Data Exchange (ETDEWEB)

    Kondrashov, D. [University of California, Department of Atmospheric and Oceanic Sciences and Institute of Geophysics and Planetary Physics, Los Angeles, CA (United States); Shen, J. [UCLA, Department of Statistics, Los Angeles, CA (United States); Berk, R. [UCLA, Department of Statistics, Los Angeles, CA (United States); University of Pennsylvania, Department of Criminology, Philadelphia, PA (United States); D' Andrea, F.; Ghil, M. [Ecole Normale Superieure, Departement Terre-Atmosphere-Ocean and Laboratoire de Meteorologie Dynamique (CNRS and IPSL), Paris Cedex 05 (France)

    2007-10-15

    A statistical learning method called random forests is applied to the prediction of transitions between weather regimes of wintertime Northern Hemisphere (NH) atmospheric low-frequency variability. A dataset composed of 55 winters of NH 700-mb geopotential height anomalies is used in the present study. A mixture model finds that the three Gaussian components that were statistically significant in earlier work are robust; they are the Pacific-North American (PNA) regime, its approximate reverse (the reverse PNA, or RNA), and the blocked phase of the North Atlantic Oscillation (BNAO). The most significant and robust transitions in the Markov chain generated by these regimes are PNA {yields} BNAO, PNA {yields} RNA and BNAO {yields} PNA. The break of a regime and subsequent onset of another one is forecast for these three transitions. Taking the relative costs of false positives and false negatives into account, the random-forests method shows useful forecasting skill. The calculations are carried out in the phase space spanned by a few leading empirical orthogonal functions of dataset variability. Plots of estimated response functions to a given predictor confirm the crucial influence of the exit angle on a preferred transition path. This result points to the dynamic origin of the transitions. (orig.)

  18. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    Science.gov (United States)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely-accessible source for high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov ). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  19. ISC-EHB: Reconstruction of a robust earthquake dataset

    Science.gov (United States)

    Weston, J.; Engdahl, E. R.; Harris, J.; Di Giacomo, D.; Storchak, D. A.

    2018-04-01

    The EHB Bulletin of hypocentres and associated travel-time residuals was originally developed with procedures described by Engdahl, Van der Hilst and Buland (1998) and currently ends in 2008. It is a widely used seismological dataset, which is now expanded and reconstructed, partly by exploiting updated procedures at the International Seismological Centre (ISC), to produce the ISC-EHB. The reconstruction begins in the modern period (2000-2013) to which new and more rigorous procedures for event selection, data preparation, processing, and relocation are applied. The selection criteria minimise the location bias produced by unmodelled 3D Earth structure, resulting in events that are relatively well located in any given region. Depths of the selected events are significantly improved by a more comprehensive review of near station and secondary phase travel-time residuals based on ISC data, especially for the depth phases pP, pwP and sP, as well as by a rigorous review of the event depths in subduction zone cross sections. The resulting cross sections and associated maps are shown to provide details of seismicity in subduction zones in much greater detail than previously achievable. The new ISC-EHB dataset will be especially useful for global seismicity studies and high-frequency regional and global tomographic inversions.

  20. Condensing Massive Satellite Datasets For Rapid Interactive Analysis

    Science.gov (United States)

    Grant, G.; Gallaher, D. W.; Lv, Q.; Campbell, G. G.; Fowler, C.; LIU, Q.; Chen, C.; Klucik, R.; McAllister, R. A.

    2015-12-01

    Our goal is to enable users to interactively analyze massive satellite datasets, identifying anomalous data or values that fall outside of thresholds. To achieve this, the project seeks to create a derived database containing only the most relevant information, accelerating the analysis process. The database is designed to be an ancillary tool for the researcher, not an archival database to replace the original data. This approach is aimed at improving performance by reducing the overall size by way of condensing the data. The primary challenges of the project include: - The nature of the research question(s) may not be known ahead of time. - The thresholds for determining anomalies may be uncertain. - Problems associated with processing cloudy, missing, or noisy satellite imagery. - The contents and method of creation of the condensed dataset must be easily explainable to users. The architecture of the database will reorganize spatially-oriented satellite imagery into temporally-oriented columns of data (a.k.a., "data rods") to facilitate time-series analysis. The database itself is an open-source parallel database, designed to make full use of clustered server technologies. A demonstration of the system capabilities will be shown. Applications for this technology include quick-look views of the data, as well as the potential for on-board satellite processing of essential information, with the goal of reducing data latency.

  1. Utilizing the Antarctic Master Directory to find orphan datasets

    Science.gov (United States)

    Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

    2011-12-01

    While most Antarctic data are housed at an established disciplinary-specific data repository, there are data types for which no suitable repository exists. In some cases, these "orphan" data, without an appropriate national archive, are served from local servers by the principal investigators who produced the data. There are many pitfalls with data served privately, including the frequent lack of adequate documentation to ensure the data can be understood by others for re-use and the impermanence of personal web sites. For example, if an investigator leaves an institution and the data moves, the link published is no longer accessible. To ensure continued availability of data, submission to long-term national data repositories is needed. As stated in the National Science Foundation Office of Polar Programs (NSF/OPP) Guidelines and Award Conditions for Scientific Data, investigators are obligated to submit their data for curation and long-term preservation; this includes the registration of a dataset description into the Antarctic Master Directory (AMD), http://gcmd.nasa.gov/Data/portals/amd/. The AMD is a Web-based, searchable directory of thousands of dataset descriptions, known as DIF records, submitted by scientists from over 20 countries. It serves as a node of the International Directory Network/Global Change Master Directory (IDN/GCMD). The US Antarctic Program Data Coordination Center (USAP-DCC), http://www.usap-data.org/, funded through NSF/OPP, was established in 2007 to help streamline the process of data submission and DIF record creation. When data does not quite fit within any existing disciplinary repository, it can be registered within the USAP-DCC as the fallback data repository. Within the scope of the USAP-DCC we undertook the challenge of discovering and "rescuing" orphan datasets currently registered within the AMD. In order to find which DIF records led to data served privately, all records relating to US data within the AMD were parsed. After

  2. Public Values

    DEFF Research Database (Denmark)

    Beck Jørgensen, Torben; Rutgers, Mark R.

    2015-01-01

    administration is approached in terms of processes guided or restricted by public values and as public value creating: public management and public policy-making are both concerned with establishing, following and realizing public values. To study public values a broad perspective is needed. The article suggest......This article provides the introduction to a symposium on contemporary public values research. It is argued that the contribution to this symposium represent a Public Values Perspective, distinct from other specific lines of research that also use public value as a core concept. Public...... a research agenda for this encompasing kind of public values research. Finally the contributions to the symposium are introduced....

  3. Georeferenced database generation with the purpose of hydrologic molding in reservoirs of the hydrographic basin of Jaguaribe river in the state of Ceará, Brazil Geração de base de dados georreferenciada com finalidade de modelagem hidrológica em reservatórios da bacia hidrográfica do rio Jaguaribe, Ceará, Brasil

    Directory of Open Access Journals (Sweden)

    Raimundo A. de O. Leão

    2013-04-01

    Full Text Available The edafoclimatic conditions of the Brazilian semiarid region favor the water loss by surface runoff. The state of Ceará, almost completely covered by semiarid, has developed public policies for the construction of dams in order to attend the varied water demand. Several hydrological models were developed to support decisive processes in the complex management of reservoirs. This study aimed to establish a methodology for obtaining a georeferenced database suitable for use as input data in hydrological modeling in the semiarid of Ceará. It was used images of Landsat satellite and SRTM Mission, and soil maps of the state of Ceará. The Landsat images allowed the determination of the land cover and the SRTM Mission images, the automatic delineation of hydrographic basins. The soil type was obtained through the soil map. The database was obtained for Jaguaribe River hydrographic basin, in the state of Ceará, and is applicable to hydrological modeling based on the Curve Number method for estimating the surface runoff.As condições edafoclimáticas do semiárido brasileiro favorecem a perda de água por escoamento superficial. O Estado do Ceará,quase totalmente abrangido pelo semiárido, desenvolveu políticas públicas voltadas para a construção de açudes,a fim de atender à demanda hídrica diversificada. Vários modelos hidrológicos foram desenvolvidos para subsidiar os processos decisórios no complexo manejo dos reservatórios. Este trabalho teve como objetivo estabelecer uma metodologia para a obtenção de uma base de dados georreferenciada e adequada para uso como dados de entrada na modelagem hidrológica,no semiárido cearense. Foram utilizadas imagens de satélite Landsat e da missão SRTM, e mapa de solos do Estado do Ceará. As imagens Landsat possibilitaram determinar a cobertura do solo,e as imagens da missão SRTM, a delimitação automática das bacias hidrográficas. O tipo de solo foi obtido por meio do mapa de solos. A

  4. SPREAD: a high-resolution daily gridded precipitation dataset for Spain – an extreme events frequency and intensity overview

    Directory of Open Access Journals (Sweden)

    R. Serrano-Notivoli

    2017-09-01

    Full Text Available A high-resolution daily gridded precipitation dataset was built from raw data of 12 858 observatories covering a period from 1950 to 2012 in peninsular Spain and 1971 to 2012 in Balearic and Canary islands. The original data were quality-controlled and gaps were filled on each day and location independently. Using the serially complete dataset, a grid with a 5 × 5 km spatial resolution was constructed by estimating daily precipitation amounts and their corresponding uncertainty at each grid node. Daily precipitation estimations were compared to original observations to assess the quality of the gridded dataset. Four daily precipitation indices were computed to characterise the spatial distribution of daily precipitation and nine extreme precipitation indices were used to describe the frequency and intensity of extreme precipitation events. The Mediterranean coast and the Central Range showed the highest frequency and intensity of extreme events, while the number of wet days and dry and wet spells followed a north-west to south-east gradient in peninsular Spain, from high to low values in the number of wet days and wet spells and reverse in dry spells. The use of the total available data in Spain, the independent estimation of precipitation for each day and the high spatial resolution of the grid allowed for a precise spatial and temporal assessment of daily precipitation that is difficult to achieve when using other methods, pre-selected long-term stations or global gridded datasets. SPREAD dataset is publicly available at https://doi.org/10.20350/digitalCSIC/7393.

  5. The TERRA-PNW Dataset: A New Source for Standardized Plant Trait, Forest Carbon Cycling, and Soil Properties Measurements from the Pacific Northwest US, 2000-2014.

    Science.gov (United States)

    Berner, L. T.; Law, B. E.

    2015-12-01

    Plant traits include physiological, morphological, and biogeochemical characteristics that in combination determine a species sensitivity to environmental conditions. Standardized, co-located, and geo-referenced species- and plot-level measurements are needed to address variation in species sensitivity to climate change impacts and for ecosystem process model development, parameterization and testing. We present a new database of plant trait, forest carbon cycling, and soil property measurements derived from multiple TERRA-PNW projects in the Pacific Northwest US, spanning 2000-2014. The database includes measurements from over 200 forest plots across Oregon and northern California, where the data were explicitly collected for scaling and modeling regional terrestrial carbon processes with models such as Biome-BGC and the Community Land Model. Some of the data are co-located at AmeriFlux sites in the region. The database currently contains leaf trait measurements (specific leaf area, leaf longevity, leaf carbon and nitrogen) from over 1,200 branch samples and 30 species, as well as plot-level biomass and productivity components, and soil carbon and nitrogen. Standardized protocols were used across projects, as summarized in an FAO protocols document. The database continues to expand and will include agricultural crops. The database will be hosted by the Oak Ridge National Laboratory (ORLN) Distributed Active Archive Center (DAAC). We hope that other regional databases will become publicly available to help enable Earth System Modeling to simulate species-level sensitivity to climate at regional to global scales.

  6. Multivendor Spectral-Domain Optical Coherence Tomography Dataset, Observer Annotation Performance Evaluation, and Standardized Evaluation Framework for Intraretinal Cystoid Fluid Segmentation

    Directory of Open Access Journals (Sweden)

    Jing Wu

    2016-01-01

    Full Text Available Development of image analysis and machine learning methods for segmentation of clinically significant pathology in retinal spectral-domain optical coherence tomography (SD-OCT, used in disease detection and prediction, is limited due to the availability of expertly annotated reference data. Retinal segmentation methods use datasets that either are not publicly available, come from only one device, or use different evaluation methodologies making them difficult to compare. Thus we present and evaluate a multiple expert annotated reference dataset for the problem of intraretinal cystoid fluid (IRF segmentation, a key indicator in exudative macular disease. In addition, a standardized framework for segmentation accuracy evaluation, applicable to other pathological structures, is presented. Integral to this work is the dataset used which must be fit for purpose for IRF segmentation algorithm training and testing. We describe here a multivendor dataset comprised of 30 scans. Each OCT scan for system training has been annotated by multiple graders using a proprietary system. Evaluation of the intergrader annotations shows a good correlation, thus making the reproducibly annotated scans suitable for the training and validation of image processing and machine learning based segmentation methods. The dataset will be made publicly available in the form of a segmentation Grand Challenge.

  7. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Science.gov (United States)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the NERIES EC project main objectives is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer to the users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable users` activities such as discovering, aggregating, describing and sharing datasets to obtain a decrease in the replication of similar data queries towards the network, exempting the data centers to guess and create useful pre-packed products. We`ve started to transfer this task more and more towards the users community, where the users` composed data products could be extensively re-used. The main link to the data is represented by a centralized webservice (SeismoLink) acting like a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs, that identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (eg EMSC-QuakeML earthquakes catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map based portlet which allows the dynamic composition of a data product, binding seismic event`s parameters with a set of seismic stations. The requested data is collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on-demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  8. Accuracy assessment of seven global land cover datasets over China

    Science.gov (United States)

    Yang, Yongke; Xiao, Pengfeng; Feng, Xuezhi; Li, Haixing

    2017-03-01

    Land cover (LC) is the vital foundation to Earth science. Up to now, several global LC datasets have arisen with efforts of many scientific communities. To provide guidelines for data usage over China, nine LC maps from seven global LC datasets (IGBP DISCover, UMD, GLC, MCD12Q1, GLCNMO, CCI-LC, and GlobeLand30) were evaluated in this study. First, we compared their similarities and discrepancies in both area and spatial patterns, and analysed their inherent relations to data sources and classification schemes and methods. Next, five sets of validation sample units (VSUs) were collected to calculate their accuracy quantitatively. Further, we built a spatial analysis model and depicted their spatial variation in accuracy based on the five sets of VSUs. The results show that, there are evident discrepancies among these LC maps in both area and spatial patterns. For LC maps produced by different institutes, GLC 2000 and CCI-LC 2000 have the highest overall spatial agreement (53.8%). For LC maps produced by same institutes, overall spatial agreement of CCI-LC 2000 and 2010, and MCD12Q1 2001 and 2010 reach up to 99.8% and 73.2%, respectively; while more efforts are still needed if we hope to use these LC maps as time series data for model inputting, since both CCI-LC and MCD12Q1 fail to represent the rapid changing trend of several key LC classes in the early 21st century, in particular urban and built-up, snow and ice, water bodies, and permanent wetlands. With the highest spatial resolution, the overall accuracy of GlobeLand30 2010 is 82.39%. For the other six LC datasets with coarse resolution, CCI-LC 2010/2000 has the highest overall accuracy, and following are MCD12Q1 2010/2001, GLC 2000, GLCNMO 2008, IGBP DISCover, and UMD in turn. Beside that all maps exhibit high accuracy in homogeneous regions; local accuracies in other regions are quite different, particularly in Farming-Pastoral Zone of North China, mountains in Northeast China, and Southeast Hills. Special

  9. Job Satisfaction and Public Service Motivation

    OpenAIRE

    Kaiser, Lutz C.

    2014-01-01

    Based on a unique case study-dataset, the paper analyses job satisfaction and public service motivation in Germany. A special issue of the investigation is related to the evaluation of performance pay scales that were introduced some years ago to German public employees within the frame of fostering New Public Management. The findings display a general dominance of intrinsic motivators. Additionally, this kind of motivators plays an important role with regard to building up and keeping job sa...

  10. Gridded 5km GHCN-Daily Temperature and Precipitation Dataset, Version 1

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Gridded 5km GHCN-Daily Temperature and Precipitation Dataset (nClimGrid) consists of four climate variables derived from the GHCN-D dataset: maximum temperature,...

  11. Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

    Directory of Open Access Journals (Sweden)

    Mingwei Leng

    2013-01-01

    Full Text Available The accuracy of most of the existing semisupervised clustering algorithms based on small size of labeled dataset is low when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering algorithm in multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it utilizes multithreshold to expand labeled datasets on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has a higher accuracy and a more stable performance in comparison to other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.

  12. Dataset for Probabilistic estimation of residential air exchange rates for population-based exposure modeling

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset provides the city-specific air exchange rate measurements, modeled, literature-based as well as housing characteristics. This dataset is associated with...

  13. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In real world, numeric as well as categorical features are usually used to describe the data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle the mixed data clustering problems have been developed. Affinity propagation (AP algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations on processing mixed datasets. In this paper, we propose a novel similarity measure for mixed type datasets and an adaptive AP clustering algorithm is proposed to cluster the mixed datasets. Several real world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.

  14. Global Human Built-up And Settlement Extent (HBASE) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Human Built-up And Settlement Extent (HBASE) Dataset from Landsat is a global map of HBASE derived from the Global Land Survey (GLS) Landsat dataset for...

  15. An Automatic Matcher and Linker for Transportation Datasets

    Directory of Open Access Journals (Sweden)

    Ali Masri

    2017-01-01

    Full Text Available Multimodality requires the integration of heterogeneous transportation data to construct a broad view of the transportation network. Many new transportation services are emerging while being isolated from previously-existing networks. This leads them to publish their data sources to the web, according to linked data principles, in order to gain visibility. Our interest is to use these data to construct an extended transportation network that links these new services to existing ones. The main problems we tackle in this article fall in the categories of automatic schema matching and data interlinking. We propose an approach that uses web services as mediators to help in automatically detecting geospatial properties and mapping them between two different schemas. On the other hand, we propose a new interlinking approach that enables the user to define rich semantic links between datasets in a flexible and customizable way.

  16. [Parallel virtual reality visualization of extreme large medical datasets].

    Science.gov (United States)

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extreme large medical datasets are discussed in connection with Intranet and common-configuration computers of hospitals. In this paper are introduced several kernel techniques, including the hardware structure, software framework, load balance and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using common PC cluster. In virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through the control panel built on virtual reality modeling language (VRML). Experimental results demonstrate that this method provides promising and real-time results for playing the role in of a good assistant in making clinical diagnosis.

  17. The wildland-urban interface raster dataset of Catalonia.

    Science.gov (United States)

    Alcasena, Fermín J; Evers, Cody R; Vega-Garcia, Cristina

    2018-04-01

    We provide the wildland urban interface (WUI) map of the autonomous community of Catalonia (Northeastern Spain). The map encompasses an area of some 3.21 million ha and is presented as a 150-m resolution raster dataset. Individual housing location, structure density and vegetation cover data were used to spatially assess in detail the interface, intermix and dispersed rural WUI communities with a geographical information system. Most WUI areas concentrate in the coastal belt where suburban sprawl has occurred nearby or within unmanaged forests. This geospatial information data provides an approximation of residential housing potential for loss given a wildfire, and represents a valuable contribution to assist landscape and urban planning in the region.

  18. xarray: N-D labeled Arrays and Datasets in Python

    Directory of Open Access Journals (Sweden)

    Stephan Hoyer

    2017-04-01

    Full Text Available xarray is an open source project and Python package that provides a toolkit and data structures for N-dimensional labeled arrays. Our approach combines an application programing interface (API inspired by pandas with the Common Data Model for self-described scientific data. Key features of the xarray package include label-based indexing and arithmetic, interoperability with the core scientific Python packages (e.g., pandas, NumPy, Matplotlib, out-of-core computation on datasets that don’t fit into memory, a wide range of serialization and input/output (I/O options, and advanced multi-dimensional data manipulation tools such as group-by and resampling. xarray, as a data model and analytics toolkit, has been widely adopted in the geoscience community but is also used more broadly for multi-dimensional data analysis in physics, machine learning and finance.

  19. The wildland-urban interface raster dataset of Catalonia

    Directory of Open Access Journals (Sweden)

    Fermín J. Alcasena

    2018-04-01

    Full Text Available We provide the wildland urban interface (WUI map of the autonomous community of Catalonia (Northeastern Spain. The map encompasses an area of some 3.21 million ha and is presented as a 150-m resolution raster dataset. Individual housing location, structure density and vegetation cover data were used to spatially assess in detail the interface, intermix and dispersed rural WUI communities with a geographical information system. Most WUI areas concentrate in the coastal belt where suburban sprawl has occurred nearby or within unmanaged forests. This geospatial information data provides an approximation of residential housing potential for loss given a wildfire, and represents a valuable contribution to assist landscape and urban planning in the region. Keywords: Wildland-urban interface, Wildfire risk, Urban planning, Human communities, Catalonia

  20. Reconstructing flaw image using dataset of full matrix capture technique

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Tae Hun; Kim, Yong Sik; Lee, Jeong Seok [KHNP Central Research Institute, Daejeon (Korea, Republic of)

    2017-02-15

    A conventional phased array ultrasonic system offers the ability to steer an ultrasonic beam by applying independent time delays of individual elements in the array and produce an ultrasonic image. In contrast, full matrix capture (FMC) is a data acquisition process that collects a complete matrix of A-scans from every possible independent transmit-receive combination in a phased array transducer and makes it possible to reconstruct various images that cannot be produced by conventional phased array with the post processing as well as images equivalent to a conventional phased array image. In this paper, a basic algorithm based on the LLL mode total focusing method (TFM) that can image crack type flaws is described. And this technique was applied to reconstruct flaw images from the FMC dataset obtained from the experiments and ultrasonic simulation.