WorldWideScience

Sample records for publication georeferencing datasets

  1. Georeferencing UAS Derivatives Through Point Cloud Registration with Archived Lidar Datasets

    Science.gov (United States)

    Magtalas, M. S. L. Y.; Aves, J. C. L.; Blanco, A. C.

    2016-10-01

Georeferencing gathered images is a common step before performing spatial analysis and other processes on datasets acquired using unmanned aerial systems (UAS). Spatial information is typically applied to aerial images or their derivatives through onboard GPS (Global Positioning System) geotagging, or by tying models to Ground Control Points (GCPs) acquired in the field. Currently, UAS derivatives are limited to meter-level accuracy when generated without points of known position on the ground. Ground control points established using survey-grade GPS or GNSS receivers can greatly reduce model errors to centimeter level, but this comes at additional cost, not only in instrument acquisition and survey operations but also in actual time spent in the field. This study uses a workflow for cloud-based post-processing of UAS data in combination with already existing LiDAR data. The UAV point cloud is georeferenced using the Iterative Closest Point (ICP) algorithm, applied through the open-source CloudCompare software (Girardeau-Montaut, 2006) on a 'skeleton point cloud' of manually extracted features consistent in both the LiDAR and UAV data. For this cloud, roads and buildings with minimal deviations given their differing dates of acquisition are considered consistent. Transformation parameters computed for the skeleton cloud can then be applied to the whole UAS dataset. In addition, a separate cloud of non-vegetation features automatically derived using the CANUPO classification algorithm (Brodu and Lague, 2012) was used to generate a second set of parameters. A ground survey was done to validate the transformed cloud. An RMSE of around 16 centimeters was found when comparing validation data to the models georeferenced using the CANUPO cloud and the manual skeleton cloud. Cloud-to-cloud distance computations of
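The ICP registration step described above can be sketched in a few lines. This is a minimal illustration on synthetic clouds, not the CloudCompare workflow the study actually used; numpy and scipy are assumed available, and the clouds, rotation, and translation are invented.

```python
# Minimal Iterative Closest Point (ICP) sketch: align a misregistered point
# cloud to a reference cloud by alternating nearest-neighbor matching with a
# least-squares rigid-transform fit (Kabsch algorithm).
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, c_dst - R @ c_src

def icp(source, reference, iters=30):
    tree = cKDTree(reference)
    src = source.copy()
    for _ in range(iters):
        _, idx = tree.query(src)       # nearest reference point per source point
        R, t = best_rigid_transform(src, reference[idx])
        src = src @ R.T + t
    return src

rng = np.random.default_rng(0)
ref = rng.uniform(0, 10, (500, 3))     # stand-in "skeleton" cloud
theta = 0.02                           # small, invented misalignment
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])
src = ref @ R.T + np.array([0.2, -0.15, 0.1])
aligned = icp(src, ref)
rmse = np.sqrt(((aligned - ref) ** 2).sum(axis=1).mean())
print(f"post-ICP RMSE: {rmse:.4f}")
```

In practice, as the abstract notes, the transform is estimated on a small, reliable subset (the skeleton cloud) and then applied to the full dataset.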

  2. Automatic Georeferencing of Astronaut Auroral Photography: Providing a New Dataset for Space Physics

    Science.gov (United States)

    Riechert, Maik; Walsh, Andrew P.; Taylor, Matt

    2014-05-01

Astronauts aboard the International Space Station (ISS) have taken tens of thousands of photographs showing the aurora at high temporal and spatial resolution. The use of these images in research, though, is limited, as they often lack accurate pointing and scale information. In this work we develop techniques and software libraries to automatically georeference such images, and provide a time- and location-searchable database and website of those images. Aurora photographs very often include a visible starfield due to the necessarily long camera exposure times. We build on the proof-of-concept of Walsh et al. (2012), who used the starfield recognition software Astrometry.net to reconstruct the pointing and scale information. Previously a manual pre-processing step, the starfield can now in most cases be separated from Earth and spacecraft structures successfully using image recognition. Once the pointing and scale of an image are known, latitudes and longitudes can be calculated for each pixel corner at an assumed auroral emission height. As part of this work, an open-source Python library is developed which automates the georeferencing process and aids in visualization tasks. The library facilitates resampling of the resulting data from an irregular to a regular coordinate grid at a given pixel-per-degree density, supports export of data in CDF and NetCDF formats, and generates polygons for drawing graphs and stereographic maps. In addition, the THEMIS all-sky imager web archive has been included as a first transparently accessible imaging source, useful here when drawing maps of ISS passes over North America. The database and website are in development and will use the Python library as their base. Through this work, georeferenced auroral ISS photography is made available as a continuously extended and easily accessible dataset. This provides potential not only for new studies on the aurora australis, as there are few all-sky imagers in

  3. Tools for address georeferencing - limitations and opportunities every public health professional should be aware of.

    Directory of Open Access Journals (Sweden)

    Ana Isabel Ribeiro

Full Text Available Various address georeferencing (AG) tools are currently available, but little is known about the quality of each tool. Using data from the EPIPorto cohort we compared the most commonly used AG tools in terms of positional error (PE) and subjects' misclassification according to census tract socioeconomic status (SES), a widely used variable in epidemiologic studies. Participants of the EPIPorto cohort (n = 2427) were georeferenced using Geographical Information Systems (GIS) and Google Earth (GE). One hundred were randomly selected and georeferenced using three additional tools: 1) cadastral maps (gold standard); 2) Global Positioning Systems (GPS); and 3) Google Earth, single and in a batch. Mean PE and the proportion of misclassified individuals were compared. Google Earth showed lower PE than GIS, but 10% of the addresses were imprecisely positioned. Thirty-eight, 27, 16 and 14% of the participants were located in the wrong census tract by GIS, GPS, GE (batch) and GE (single), respectively (p<0.001). Misclassification according to SES was less frequent but still non-negligible: 14.4, 8.1, 4.2 and 2% (p<0.001). The quality of georeferencing differed substantially between AG tools. GE seems to be the best tool, but only if prudently used. Epidemiologic studies using spatial data should start including information on the quality and accuracy of their georeferencing tools and spatial datasets.
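As a rough illustration of how the positional error between a georeferencing tool's output and a gold-standard position can be computed, here is a hedged sketch using the haversine great-circle distance. The coordinates below are invented for illustration, not EPIPorto data.

```python
# Positional error (PE) between two WGS84 coordinates via the haversine formula.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (WGS84)."""
    R = 6371000.0                      # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

gold = (41.1579, -8.6291)   # hypothetical cadastral-map (gold standard) position
tool = (41.1582, -8.6287)   # hypothetical geocoder output for the same address
pe = haversine_m(*gold, *tool)
print(f"positional error: {pe:.1f} m")
```

Averaging such per-address distances over a sample, and cross-tabulating assigned census tracts against the gold standard, yields the mean PE and misclassification rates reported above.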

  4. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nautical miles. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new datasets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data is being released and utilized by the public. Datasets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make all data collected with ECS funding available to the public as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability, and many others have pledged to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  5. Publicly Releasing a Large Simulation Dataset with NDS Labs

    Science.gov (United States)

    Goldbaum, Nathan

    2016-03-01

Ideally, all publicly funded research should be accompanied by the tools, code, and data necessary to fully reproduce the analysis performed in journal articles describing the research. This ideal can be difficult to attain, particularly when dealing with large (>10 TB) simulation datasets. In this lightning talk, we describe the process of publicly releasing a large simulation dataset to accompany the submission of a journal article. The simulation was performed using Enzo, an open source, community-developed N-body/hydrodynamics code, and was analyzed using a wide range of community-developed tools in the scientific Python ecosystem. Although the simulation was performed and analyzed using an ecosystem of sustainably developed tools, we enable sustainable science using our data by making it publicly available. Combining the data release with the NDS Labs infrastructure allows a substantial amount of added value, including web-based access to analysis and visualization using the yt analysis package through an IPython notebook interface. In addition, we are able to accompany the paper submission to the arXiv preprint server with links to the raw simulation data as well as interactive real-time data visualizations that readers can explore on their own or share with colleagues during journal club discussions. It is our hope that the value added by these services will substantially increase the impact and readership of the paper.

  6. New public dataset for spotting patterns in medieval document images

    Science.gov (United States)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools that allow us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and should encourage researchers in the document image analysis community to design new systems and submit improved results.

  7. Analysis of Public Datasets for Wearable Fall Detection Systems

    Directory of Open Access Journals (Sweden)

    Eduardo Casilari

    2017-06-01

Full Text Available Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention in the research community in recent years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls intended for the evaluation of fall detection algorithms in wearable FDSs. The datasets found are analyzed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). The statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need to categorize the ADLs by the intensity of the movements in order to evaluate the capability of a given detection algorithm to discriminate falls from ADLs.
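As a toy illustration of the kind of detector such datasets are used to evaluate, a simple threshold test on the acceleration magnitude might look like this. This is not an algorithm from the paper; the samples and threshold are invented.

```python
# Toy threshold-based fall detector: flag a fall when the acceleration
# magnitude (in g) exceeds a fixed threshold anywhere in the window.
import math

def magnitudes(samples):
    """Euclidean magnitude of each (x, y, z) accelerometer sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]

def detects_fall(samples, threshold_g=2.5):
    return max(magnitudes(samples)) > threshold_g

adl_window  = [(0.0, 0.1, 1.0), (0.1, 0.0, 1.1)]   # walking-like activity
fall_window = [(0.0, 0.1, 1.0), (1.9, 2.1, 2.4)]   # impact spike
```

The abstract's point about sensor range matters here: if a dataset was recorded with a ±2 g sensor, impact peaks are clipped and a 2.5 g threshold can never fire, biasing any benchmark run on those traces.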

  8. DIRECT GEOREFERENCING OF UAVS

    Directory of Open Access Journals (Sweden)

    M. Bláha

    2012-09-01

Full Text Available UAV systems have become an attractive data acquisition platform in emerging applications. As measuring instruments they extend the lineup of possible surveying methods in the field of geomatics. However, most UAVs are equipped with low-cost navigation sensors such as GPS or INS, allowing a positioning accuracy of 3 to 5 m. As a result the acquired position and orientation data feature a low accuracy, which means they cannot be used in applications that require high-precision data at cm level (e.g. direct georeferencing). In this paper we analyze the potential of differential post-processing of GPS data from a UAV in order to improve the positioning accuracy for applications based on direct georeferencing. Subsequently, the obtained results are compared and verified against a track of the octocopter carried out with a total station simultaneously with the GPS data acquisition. The results show that the differential post-processing essentially improved the accuracy of the Falcon position data. The average offset between the datasets (GPS data, track) and the corresponding standard deviation are 0.82 m and 0.45 m, respectively. However, under ideal conditions it is even possible to improve this positioning accuracy to the cm range. Furthermore, there are still several sources of error, such as the offset between the GPS antenna of the Falcon 8 and the prism which is used for the track. Considering this fact, there is further room for improvement in the positioning method discussed here.
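The track comparison described above amounts to computing per-epoch offsets between two synchronized position series and summarizing them as a mean and standard deviation. A minimal sketch, with invented local east/north coordinates rather than the study's actual Falcon 8 data:

```python
# Per-epoch offsets between a GPS track and a total-station track recorded
# simultaneously, summarized as mean offset and standard deviation.
import math
import statistics

gps_track   = [(0.0, 0.0), (1.0, 0.5), (2.0, 1.1)]   # local east/north, metres
total_track = [(0.6, 0.4), (1.7, 0.9), (2.5, 1.6)]   # reference positions

offsets = [math.dist(g, t) for g, t in zip(gps_track, total_track)]
mean_offset = statistics.mean(offsets)
std_offset = statistics.stdev(offsets)
print(f"mean offset: {mean_offset:.2f} m, std: {std_offset:.2f} m")
```

A real comparison would first correct for the fixed lever arm between the GPS antenna and the tracked prism, which the abstract identifies as a remaining error source.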

  9. Effects of georeferencing effort on mapping monkeypox case distributions and transmission risk

    Directory of Open Access Journals (Sweden)

    Lash R

    2012-06-01

Full Text Available Abstract Background Maps of disease occurrences and GIS-based models of disease transmission risk are increasingly common, and both rely on georeferenced disease data. Automated methods for georeferencing disease data have been widely studied for developed countries with rich sources of geographically referenced data. However, the transferability of these methods to countries without comparable geographic reference data, particularly when working with historical disease data, has not been as widely studied. Historically, precise geographic information about where individual cases occurred has been collected and stored verbally, identifying specific locations using place names. Georeferencing historic data is challenging, however, because it is difficult to find appropriate geographic reference data to match the place names to. Here, we assess the degree of care and research invested in converting textual descriptions of disease occurrence locations to numerical grid coordinates (latitude and longitude). Specifically, we develop three datasets from the same original monkeypox disease occurrence data, with varying levels of care and effort: the first based on an automated web service, the second improving on the first by reference to additional maps and digital gazetteers, and the third improving still more based on extensive consultation of legacy surveillance records that provided considerable additional information about each case. To illustrate the implications of these seemingly subtle improvements in data quality, we develop ecological niche models and predictive maps of monkeypox transmission risk based on each of the three occurrence datasets. Results We found macrogeographic variations in ecological niche models depending on the type of georeferencing method used. Less careful georeferencing identified much smaller areas as having potential for monkeypox transmission in the Sahel region, as well as around the rim of the Congo Basin. These

  10. A longitudinal dataset of five years of public activity in the Scratch online community

    Science.gov (United States)

    Hill, Benjamin Mako; Monroy-Hernández, Andrés

    2017-01-01

    Scratch is a programming environment and an online community where young people can create, share, learn, and communicate. In collaboration with the Scratch Team at MIT, we created a longitudinal dataset of public activity in the Scratch online community during its first five years (2007–2012). The dataset comprises 32 tables with information on more than 1 million Scratch users, nearly 2 million Scratch projects, more than 10 million comments, more than 30 million visits to Scratch projects, and more. To help researchers understand this dataset, and to establish the validity of the data, we also include the source code of every version of the software that operated the website, as well as the software used to generate this dataset. We believe this is the largest and most comprehensive downloadable dataset of youth programming artifacts and communication. PMID:28140385

  11. SEED: Public Energy and Environment Dataset for Optimizing HVAC Operation in Subway Stations

    OpenAIRE

    Wang, Yongcai; Feng, Haoran; Qi, Xiao

    2013-01-01

For sustainability and energy saving, the problem of optimizing the control of heating, ventilating, and air-conditioning (HVAC) systems has attracted great attention, but analyzing the signatures of thermal environments and HVAC systems, and evaluating optimization policies, has been inefficient and inconvenient due to the lack of a public dataset. In this paper, we present the Subway station Energy and Environment Dataset (SEED), which was collected from a line of Bei...

  12. Dataset of Building and Environment Publication in 2016, A reference method for measuring emissions of SVOCs in small chambers

    Data.gov (United States)

    U.S. Environmental Protection Agency — The data presented in this data file is a product of a journal publication. The dataset contains DEHP air concentrations in the emission test chamber. This dataset...

  13. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

    Energy Technology Data Exchange (ETDEWEB)

    Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

    2011-01-01

Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We achieve diagnosis performance comparable to retina experts on MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (i.e., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, generating a diagnosis in an average of 4.4 s (9.3 s including optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.

  14. Automatic Diabetic Macular Edema Detection in Fundus Images Using Publicly Available Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

    2011-01-01

Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We achieve diagnosis performance comparable to retina experts on MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing. Our algorithm is robust to segmentation uncertainties, does not need ground truth at lesion level, and is very fast, generating a diagnosis in an average of 4.4 seconds per image on a 2.6 GHz platform with an unoptimised Matlab implementation.

  15. A dataset for examining trends in publication of new Australian insects

    Directory of Open Access Journals (Sweden)

    Robert Mesibov

    2014-07-01

Full Text Available Australian Faunal Directory data were used to create a new, publicly available dataset, nai50, which lists 18318 species and subspecies names for Australian insects described in the period 1961–2010, together with associated publishing data. The number of taxonomic publications introducing the new names varied little around a long-term average of 70 per year, with ca 420 new names published per year during the 30-year period 1981–2010. Within this stable pattern there were steady increases in multi-authored and 'Smith in Jones and Smith' names, and a decline in publication of names in entomology journals and books. For taxonomic works published in Australia, a publication peak around 1990 reflected increases in museum, scientific society and government agency publishing, but a subsequent decline is largely explained by a steep drop in the number of papers on insect taxonomy published by Australia's national science agency, CSIRO.

  16. Toward a complete dataset of drug-drug interaction information from publicly available sources.

    Science.gov (United States)

    Ayvaz, Serkan; Horn, John; Hassanzadeh, Oktie; Zhu, Qian; Stan, Johann; Tatonetti, Nicholas P; Vilar, Santiago; Brochhausen, Mathias; Samwald, Matthias; Rastegar-Mojarad, Majid; Dumontier, Michel; Boyce, Richard D

    2015-06-01

Although potential drug-drug interactions (PDDIs) are a significant source of preventable drug-related harm, there is currently no single complete source of PDDI information. In the current study, all publicly available sources of PDDI information that could be identified using a comprehensive and broad search were combined into a single dataset. The combined dataset merged fourteen different sources, including 5 clinically-oriented information sources, 4 Natural Language Processing (NLP) corpora, and 5 bioinformatics/pharmacovigilance information sources. As a comprehensive PDDI source, the merged dataset might benefit the pharmacovigilance text mining community by making it possible to compare the representativeness of NLP corpora for PDDI text extraction tasks, and by specifying elements that can be useful for future PDDI extraction purposes. An analysis of the overlap between and across the data sources showed that there was little overlap. Even comprehensive PDDI lists such as DrugBank, KEGG, and the NDF-RT had less than 50% overlap with each other. Moreover, all of the comprehensive lists had incomplete coverage of two data sources that focus on PDDIs of interest in most clinical settings. Based on this information, we think that systems that provide access to the comprehensive lists, such as APIs into RxNorm, should be careful to inform users that the lists may be incomplete with respect to PDDIs that drug experts suggest clinicians be aware of. In spite of the low degree of overlap, several dozen cases were identified where PDDI information provided in drug product labeling might be augmented by the merged dataset. Moreover, the combined dataset was also shown to improve the performance of an existing PDDI NLP pipeline and a recently published PDDI pharmacovigilance protocol. Future work will focus on improvement of the methods for mapping between PDDI information sources, identifying methods to improve the use of the merged dataset in PDDI NLP algorithms
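The overlap analysis described can be illustrated with a toy sketch that treats each source as a set of unordered drug pairs and computes their Jaccard overlap. The source names and interaction pairs below are invented, not the study's actual data.

```python
# Pairwise overlap between drug-interaction lists, treating (a, b) and (b, a)
# as the same interaction, and comparing sources with the Jaccard index.
def normalize(pairs):
    """Canonicalize interactions as unordered drug pairs."""
    return {frozenset(p) for p in pairs}

sources = {
    "source_A": normalize([("warfarin", "aspirin"), ("simvastatin", "amiodarone")]),
    "source_B": normalize([("aspirin", "warfarin"), ("digoxin", "verapamil")]),
}

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, i.e. shared interactions over all interactions."""
    return len(a & b) / len(a | b) if a | b else 0.0

overlap = jaccard(sources["source_A"], sources["source_B"])
print(f"Jaccard overlap: {overlap:.2f}")
```

Real-world comparison is harder than this sketch suggests, since drug names must first be mapped to a common terminology (e.g. RxNorm identifiers) before set operations are meaningful, which is exactly the mapping work the abstract flags for future improvement.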

  17. A database of georeferenced nutrient chemistry data for mountain lakes of the Western United States

    Science.gov (United States)

    Williams, Jason; Labou, Stephanie G.

    2017-05-01

    Human activities have increased atmospheric nitrogen and phosphorus deposition rates relative to pre-industrial background. In the Western U.S., anthropogenic nutrient deposition has increased nutrient concentrations and stimulated algal growth in at least some remote mountain lakes. The Georeferenced Lake Nutrient Chemistry (GLNC) Database was constructed to create a spatially-extensive lake chemistry database needed to assess atmospheric nutrient deposition effects on Western U.S. mountain lakes. The database includes nitrogen and phosphorus water chemistry data spanning 1964-2015, with 148,336 chemistry results from 51,048 samples collected across 3,602 lakes in the Western U.S. Data were obtained from public databases, government agencies, scientific literature, and researchers, and were formatted into a consistent table structure. All data are georeferenced to a modified version of the National Hydrography Dataset Plus version 2. The database is transparent and reproducible; R code and input files used to format data are provided in an appendix. The database will likely be useful to those assessing spatial patterns of lake nutrient chemistry associated with atmospheric deposition or other environmental stressors.

  18. Destination Prediction by Identifying and Clustering Prominent Features from Public Trajectory Datasets

    Directory of Open Access Journals (Sweden)

    Li Yang

    2015-07-01

    Full Text Available Destination prediction is an essential task in many location-based services (LBS such as providing targeted advertisements and route recommendations. Most existing solutions were generative methods that model the problem as a series of probabilistic events that are then used to compute the destination probability using Bayes’ rule. In contrast, we propose a discriminative method that chooses the most prominent features found in a public trajectory dataset, clusters the trajectories into groups based on these features, and performs destination prediction queries accordingly. Our method is more concise and simple than existing methods while achieving better runtime efficiency and prediction accuracy as verified by experimental studies.
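As a rough illustration of the generative baseline the abstract contrasts with, a toy Bayes' rule destination predictor over cell-sequence trajectories might look like this. All trajectories, cell labels, and destinations are invented.

```python
# Generative destination prediction: P(dest | observed prefix) is proportional
# to P(prefix | dest) * P(dest), both estimated from a trajectory history.
from collections import Counter

# Toy history of (trajectory as a cell sequence, destination) pairs.
history = [
    (["a", "b", "c"], "work"),
    (["a", "b", "d"], "gym"),
    (["a", "b", "c"], "work"),
]

def predict(observed):
    """Posterior over destinations given the observed trajectory prefix."""
    prior = Counter(dest for _, dest in history)
    scores = {}
    for dest, count in prior.items():
        trips = [t for t, d in history if d == dest]
        likelihood = sum(t[:len(observed)] == observed for t in trips) / len(trips)
        scores[dest] = likelihood * count / len(history)
    total = sum(scores.values())
    return {d: s / total for d, s in scores.items()} if total else scores

probs = predict(["a", "b"])
print(probs)
```

The discriminative approach proposed in the paper sidesteps this per-destination probability modeling by clustering trajectories on prominent features and answering queries against the clusters.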

  19. Aerial Photography and Imagery, Ortho-Corrected, Digital Aerial Photography 1970. Limited georeferencing. See metadata for additional information., Published in 1970, 1:4800 (1in=400ft) scale, Washington County Government.

    Data.gov (United States)

    NSGIC Local Govt | GIS Inventory — Aerial Photography and Imagery, Ortho-Corrected dataset current as of 1970. Digital Aerial Photography 1970. Limited georeferencing. See metadata for additional...

  20. Dataset of Atmospheric Environment Publication in 2016, Characterization of organophosphorus flame retardants’ sorption on building materials and consumer products

    Data.gov (United States)

    U.S. Environmental Protection Agency — The data presented in this data file is a product of a journal publication. The dataset contains OPFR sorption concentrations on building materials and consumer...

  1. Datasets will not be made accessible to the public due to the fact that they include household level data with PII.

    Data.gov (United States)

    U.S. Environmental Protection Agency — Datasets will not be made accessible to the public due to the fact that they include household level data with PII. This dataset is not publicly accessible because:...

  2. Public Health Applications of Remotely-sensed Environmental Datasets for the Conterminous United States

    Science.gov (United States)

    Al-Hamdan, Mohammad; Crosson, William; Economou, Sigrid; Estes, Marice Jr; Estes, Sue; Hemmings, Sarah; Kent, Shia; Puckett, Mark; Quattrochi, Dale; Wade, Gina

    2013-01-01

    NASA Marshall Space Flight Center is collaborating with the University of Alabama at Birmingham (UAB) School of Public Health and the Centers for Disease Control and Prevention (CDC) National Center for Public Health Informatics to address issues of environmental health and enhance public health decision-making using NASA remotely-sensed data and products. The objectives of this study are to develop high-quality spatial data sets of environmental variables, link these with public health data from a national cohort study, and deliver the linked data sets and associated analyses to local, state and federal end-user groups. Three daily environmental data sets were developed for the conterminous U.S. on different spatial resolutions for the period 2003-2008: (1) spatial surfaces of estimated fine particulate matter (PM2.5) exposures on a 10-km grid using the US Environmental Protection Agency (EPA) ground observations and NASA's MODerate-resolution Imaging Spectroradiometer (MODIS) data; (2) a 1-km grid of Land Surface Temperature (LST) using MODIS data; and (3) a 12-km grid of daily Incoming Solar Radiation (Insolation) and heat-related products using the North American Land Data Assimilation System (NLDAS) forcing data. These environmental data sets were linked with public health data from the UAB REasons for Geographic And Racial Differences in Stroke (REGARDS) national cohort study to determine whether exposures to these environmental risk factors are related to cognitive decline, stroke and other health outcomes. These environmental datasets and the results of the public health linkage analyses will be disseminated to end-users for decision-making through the CDC Wide-ranging Online Data for Epidemiologic Research (WONDER) system and through peer-reviewed publications respectively. The linkage of these data with the CDC WONDER system substantially expands public access to NASA data, making their use by a wide range of decision makers feasible. By successful

  3. Methodology and software for georeferencing vineyards

    Directory of Open Access Journals (Sweden)

    Fialho Flávio Bello

    2016-01-01

    Full Text Available An agricultural registry is a collection of information about the production area and yield of agricultural properties in a region or designated area. It makes it possible to measure agricultural production and its spatial distribution, characterize rural structure, facilitate inspection and the development of agricultural policies, optimize the distribution of agricultural credit, estimate crop yield and generate research data. A key component of a quality registry is the accurate measurement of areas and their geographical position, through georeferencing, to allow integration with other spatial information. The Vineyard Registry of Rio Grande do Sul is one of the most complete agricultural registries in Brazil. It has been carried out in all grape-producing properties in the state since 1995, and its georeferencing began in 2005, with the objective of accurately mapping vineyards. Embrapa has developed a methodology to accelerate georeferencing by simplifying the field mapping process. One of the central points of this methodology was the development of a software package called MapaGPS to organize and classify points measured in the field. Recently, this software has been improved with the incorporation of features such as transformation between coordinate systems, conversion between files of different formats, and more control over generated charts. The georeferencing experience of the Vineyard Registry of Rio Grande do Sul may be used throughout Brazil and in other countries. The software is available under a free license, and there are no restrictions on adopting the methodology. This document aims to disclose the details of this methodology and how it may be used to facilitate zoning projects worldwide.

  4. Automatic Detection of Online Recruitment Frauds: Characteristics, Methods, and a Public Dataset

    Directory of Open Access Journals (Sweden)

    Sokratis Vidros

    2017-03-01

    Full Text Available The critical process of hiring has relatively recently been ported to the cloud. Specifically, the automated systems responsible for completing the recruitment of new employees in an online fashion aim to make the hiring process more immediate, accurate and cost-efficient. However, the online exposure of such traditional business procedures has introduced new points of failure that may lead to privacy loss for applicants and harm the reputation of organizations. So far, the most common case of Online Recruitment Fraud (ORF) is the employment scam. Unlike related online fraud problems, the tackling of ORF has not yet received proper attention, remaining largely unexplored until now. Responding to this need, the work at hand defines and describes the characteristics of this severe and timely novel cyber security research topic. At the same time, it contributes and evaluates the first, to our knowledge, publicly available dataset of 17,880 annotated job ads, retrieved from the use of a real-life system.

  5. Georeferencing in QGIS 2.0

    Directory of Open Access Journals (Sweden)

    Jim Clifford

    2013-12-01

    Full Text Available In this lesson, you will learn how to georeference historical maps so that they may be added to a GIS as a raster layer. Georeferencing is required for anyone who wants to accurately digitize data found on a paper map, and since historians work mostly in the realm of paper, georeferencing is one of our most commonly used tools. The technique uses a series of control points to give a two-dimensional object like a paper map the real-world coordinates it needs to align with the three-dimensional features of the earth in GIS software (in Intro to Google Maps and Google Earth we saw an ‘overlay’, which is a Google Earth shortcut version of georeferencing). Georeferencing a historical map requires a knowledge of both the geography and the history of the place you are studying to ensure accuracy. The built and natural landscapes change over time, and it is important to confirm that the locations of your control points — whether they be houses, intersections, or even towns — have remained constant. Entering control points in a GIS is easy, but behind the scenes, georeferencing uses complex transformation and compression processes. These are used to correct the distortions and inaccuracies found in many historical maps and stretch the maps so that they fit geographic coordinates. In cartography this is known as rubber-sheeting because it treats the map as if it were made of rubber and the control points as if they were tacks ‘pinning’ the historical document to a three-dimensional surface like the globe. To offer some examples of georeferenced historical maps, we prepared some National Topographic Series maps hosted on the University of Toronto Map Library website courtesy of Marcel Fortin, and we overlaid them on a Google web map. Viewers can adjust the transparency with the slider bar on the top right, view the historical map as an overlay on terrain or satellite images, or click ‘Earth’ to switch into Google Earth mode and see 3D
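The control-point transformation mentioned above can be made concrete. In the simplest (first-order) case, georeferencing fits an affine transformation to the control points by least squares; higher-order polynomial and thin-plate-spline ("rubber-sheet") transforms generalize the same idea. A minimal sketch in Python with NumPy, using made-up control-point coordinates:

```python
import numpy as np

def fit_affine(pixel_pts, world_pts):
    """Least-squares affine transform mapping pixel (col, row) to world (x, y).

    Returns a 2x3 matrix A such that [x, y]^T = A @ [col, row, 1]^T.
    """
    pixel_pts = np.asarray(pixel_pts, dtype=float)
    world_pts = np.asarray(world_pts, dtype=float)
    # Design matrix: one row per control point, homogeneous pixel coordinates.
    G = np.hstack([pixel_pts, np.ones((len(pixel_pts), 1))])
    # Solve the two coefficient sets (for x and y) in one least-squares call.
    coeffs, *_ = np.linalg.lstsq(G, world_pts, rcond=None)
    return coeffs.T  # shape (2, 3)

def apply_affine(A, pixel_pts):
    pixel_pts = np.asarray(pixel_pts, dtype=float)
    G = np.hstack([pixel_pts, np.ones((len(pixel_pts), 1))])
    return G @ A.T

# Three or more control points: pixel positions and their (invented) world coordinates.
pixels = [(0, 0), (1000, 0), (1000, 800), (0, 800)]
world = [(500000, 4600000), (501000, 4600000), (501000, 4599200), (500000, 4599200)]
A = fit_affine(pixels, world)
centre = apply_affine(A, [(500, 400)])  # world position of the map-sheet centre
```

With more than three control points the system is overdetermined, and the least-squares residuals give a direct measure of how well the historical map fits modern coordinates.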

  6. OLS Client and OLS Dialog: Open Source Tools to Annotate Public Omics Datasets.

    Science.gov (United States)

    Perez-Riverol, Yasset; Ternent, Tobias; Koch, Maximilian; Barsnes, Harald; Vrousgou, Olga; Jupp, Simon; Vizcaíno, Juan Antonio

    2017-10-01

    The availability of user-friendly software to annotate biological datasets and experimental details is becoming essential in data management practices, both in local storage systems and in public databases. The Ontology Lookup Service (OLS, http://www.ebi.ac.uk/ols) is a popular centralized service to query, browse and navigate biomedical ontologies and controlled vocabularies. Recently, the OLS framework has been completely redeveloped (version 3.0), including enhancements in the data model, like the added support for Web Ontology Language based ontologies, among many other improvements. However, the new OLS is not backwards compatible and new software tools are needed to enable access to this widely used framework now that the previous version is no longer available. We here present the OLS Client as a free, open-source Java library to retrieve information from the new version of the OLS. It enables rapid tool creation by providing a robust, pluggable programming interface and common data model to programmatically access the OLS. The library has already been integrated and is routinely used by several bioinformatics resources and related data annotation tools. Secondly, we also introduce an updated version of the OLS Dialog (version 2.0), a Java graphical user interface that can be easily plugged into Java desktop applications to access the OLS. The software and related documentation are freely available at https://github.com/PRIDE-Utilities/ols-client and https://github.com/PRIDE-Toolsuite/ols-dialog. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. Public participation in Full dome digital visualisations of large datasets in a planetarium sky theater : An experiment in progress

    Science.gov (United States)

    Rathnasree, Nandivada

    2015-08-01

    A full dome digital planetarium system with a user-friendly content creation possibility can be used very effectively for communicating points of interest in large astronomical datasets to public and student visitors to a planetarium. Periodic public lectures by astronomers, "Under the Stars", which use full dome visualisations of data sets, foster a regular interest group that becomes associated with the planetarium, ensuring a regular inflow of students (and a smaller number of non-student visitors) willing to contribute to the entries in the full dome datasets. Regardless of whether or not completion is achieved for any of the data sets, the very process of this project is extremely rewarding in terms of generating a quickening of interest, for the casual visitor to a planetarium, in aspects related to the intricacies of datasets. The casual visitor who gets interested may just make one entry in the dataset, following instructions provided in the planetarium public interaction. For students who show sustained interest in this data entry project, it becomes a really fruitful learning process. Combining this purely data entry process with some interactions and discussions with astronomers on the excitements in the areas related to specific data sets allows a more organised enrichment possibility for student participants, nudging them towards exploring related possibilities of some "Hands on Astronomy" analysis-oriented projects. Datasets like gamma-ray bursts, variable stars, TGSS, and so on are being entered within the planetarium production software at the New Delhi planetarium, by public and student visitors to the planetarium, as weekend activities. The Digital Universe data sets pre-existing in the planetarium system allow preliminary discussions for weekend crowds related to astronomical data sets, introduction of ever increasing multi-wavelength data sets and onwards to facilitating public participation in data entry within the planetarium software, for some

  8. User Defined Geo-referenced Information

    DEFF Research Database (Denmark)

    Konstantas, Dimitri; Villalba, Alfredo; di Marzo Serugendo, Giovanna

    2009-01-01

    The evolution of technology today allows us to extend “location based services” to fine-grained services and allows any mobile user to create location-based information and make it available to anyone interested. This evolution opens the way for new services and applications for mobile users....... In this paper we present two novel mobile and wireless collaborative services and concepts: the Hovering Information, a mobile, geo-referenced content information management system, and the QoS Information service, providing user-observed end-to-end infrastructure geo-related QoS information....

  10. The case for developing publicly-accessible datasets for health services research in the Middle East and North Africa (MENA region

    Directory of Open Access Journals (Sweden)

    El-Jardali Fadi

    2009-10-01

    Full Text Available Abstract Background The existence of publicly-accessible datasets represents a significant opportunity for health services research to evolve into a science that supports health policy making and evaluation, proper inter- and intra-organizational decisions and optimal clinical interventions. This paper investigated the role of publicly-accessible datasets in the enhancement of health care systems in the developed world and highlighted the importance of their wide existence and use in the Middle East and North Africa (MENA) region. Discussion A search was conducted to explore the availability of publicly-accessible datasets in the MENA region. Although datasets were found in most countries in the region, they were limited in terms of their relevance, quality and public accessibility. With rare exceptions, publicly-accessible datasets - as present in the developed world - were absent. Based on this, we proposed a gradual approach and a set of recommendations to promote the development and use of publicly-accessible datasets in the region. These recommendations target potential actions by governments, researchers, policy makers and international organizations. Summary We argue that the limited number of publicly-accessible datasets in the MENA region represents a lost opportunity for the evidence-based advancement of health systems in the region. The availability and use of publicly-accessible datasets would encourage policy makers in this region to base their decisions on solid representative data and not on estimates or small-scale studies; researchers would be able to exercise their expertise in a manner meaningful to both policy makers and the public. The population of the MENA countries would exercise the right to benefit from locally- or regionally-based studies, versus imported and, at best, customized ones. Furthermore, on a macro scale, the availability of regionally comparable publicly-accessible datasets would allow for the

  11. ε-inclusion: privacy preserving re-publication of dynamic datasets

    Institute of Scientific and Technical Information of China (English)

    Qiong WEI; Yan-sheng LU; Lei ZOU

    2008-01-01

    This paper presents a novel privacy principle, ε-inclusion, for re-publishing sensitive dynamic datasets. ε-inclusion releases all the quasi-identifier values directly and uses a permutation-based method and substitution to anonymize the microdata. Combined with generalization-based methods, ε-inclusion protects privacy and captures a large amount of correlation in the microdata. We develop an effective algorithm for computing anonymized tables that obey the ε-inclusion privacy requirement. Extensive experiments confirm that our solution allows significantly more effective data analysis than generalization-based methods.
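The abstract does not give enough detail to reproduce the ε-inclusion algorithm itself, but the core permutation idea can be illustrated: quasi-identifier values are released unchanged, while sensitive values are shuffled within groups so the record-level link is broken. A hypothetical sketch (all field names and records are invented, and this is a generic permutation scheme, not the paper's method):

```python
import random

def permute_within_groups(records, group_key, sensitive, seed=0):
    """Shuffle the sensitive attribute within each group of records.

    Quasi-identifiers are published as-is; only the association between
    a record and its sensitive value is broken inside each group.
    """
    rng = random.Random(seed)
    groups = {}
    for i, r in enumerate(records):
        groups.setdefault(r[group_key], []).append(i)
    out = [dict(r) for r in records]
    for idxs in groups.values():
        values = [records[i][sensitive] for i in idxs]
        rng.shuffle(values)
        for i, v in zip(idxs, values):
            out[i][sensitive] = v
    return out

rows = [
    {"zip": "10001", "age": 34, "disease": "flu"},
    {"zip": "10001", "age": 36, "disease": "asthma"},
    {"zip": "10002", "age": 51, "disease": "diabetes"},
    {"zip": "10002", "age": 49, "disease": "flu"},
]
published = permute_within_groups(rows, "zip", "disease")
```

Aggregate statistics per group (here, disease counts per ZIP code) are preserved exactly, which is why permutation-based schemes can support more effective analysis than generalization.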

  12. USGS Small-scale Dataset - Public Land Survey System of the United States 201011 Shapefile

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This data set portrays the Public Land Surveys of the United States, including areas of private survey, Donation Land Claims, and Land Grants and Civil Colonies....

  13. Georeferencing on Synthetic Aperture Radar Imagery

    Science.gov (United States)

    Esmaeilzade, M.; Amini, J.; Zakeri, S.

    2015-12-01

    Due to the side-looking geometry of SAR (Synthetic Aperture Radar) imaging, SAR images contain geometric distortions that corrupt image information, so the images must be geometrically calibrated. Because radar systems are side-looking, geometric distortions such as shadow, foreshortening and layover occur. To compensate for these distortions, information about the sensor position, the imaging geometry and the target's height above the ellipsoid must be available. In this paper, a method for the geometric calibration of SAR images is proposed. The method uses the Range-Doppler equations, the 30 m SRTM DEM (Digital Elevation Model), and precise ephemeris data of the sensor. In the proposed algorithm, the digital elevation model is first transformed into the range and azimuth directions; this removes errors caused by topography, such as foreshortening and layover, in the transferred DEM. Then, the positions of the corners of the original image are found based on the transferred DEM. Next, the original image is registered to the transferred DEM by an 8-parameter projective transformation. The output is a georeferenced image from which these geometric distortions have been removed. The advantage of the method described in this article is that it requires neither ground control points nor the attitude and rotational parameters of the sensor. Since the ground range resolution of the images used is about 30 m, images geocoded with this method have an accuracy of about 20 m (sub-pixel) in planimetry and about 30 m in altimetry.
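As a rough illustration of the Range-Doppler equations the method relies on: a ground point P is related to the sensor position S and velocity V through the slant range R = |P − S| and the Doppler frequency f_D = −2 V·(P − S)/(λR); sign conventions vary between processors, and geocoding inverts these two equations together with an Earth-surface or DEM constraint to locate P. A sketch with illustrative numbers (the wavelength and orbit state are made up):

```python
import numpy as np

WAVELENGTH = 0.055  # C-band radar wavelength in metres (illustrative value)

def range_doppler(sensor_pos, sensor_vel, ground_pt):
    """Evaluate the two Range-Doppler observation equations.

    Returns (slant_range, doppler_frequency) for a candidate ground point.
    Geocoding inverts these equations, plus an Earth/DEM surface constraint,
    to find the ground point matching the measured range and Doppler.
    """
    los = np.asarray(ground_pt, float) - np.asarray(sensor_pos, float)
    slant_range = np.linalg.norm(los)
    doppler = -2.0 / (WAVELENGTH * slant_range) * np.dot(sensor_vel, los)
    return slant_range, doppler

# Illustrative state: sensor 700 km up, moving along-track at 7.5 km/s.
sensor_pos = np.array([0.0, 0.0, 700e3])
sensor_vel = np.array([7500.0, 0.0, 0.0])
ground_pt = np.array([0.0, 50e3, 0.0])  # broadside target

rng, dop = range_doppler(sensor_pos, sensor_vel, ground_pt)
# A broadside target satisfies the zero-Doppler condition V . (P - S) = 0.
```

In a real processor the measured range and Doppler (or zero-Doppler time) are the knowns, and the ground point is solved for iteratively, which is why no ground control points are needed.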

  14. Dataset of Atmospheric Environment Publication in 2016, Source emission and model evaluation of formaldehyde from composite and solid wood furniture in a full-scale chamber

    Data.gov (United States)

    U.S. Environmental Protection Agency — The data presented in this data file is a product of a journal publication. The dataset contains formaldehyde air concentrations in the emission test chamber and...

  15. Clinic-Genomic Association Mining for Colorectal Cancer Using Publicly Available Datasets

    Directory of Open Access Journals (Sweden)

    Fang Liu

    2014-01-01

    Full Text Available In recent years, a growing number of researchers have begun to focus on how to establish associations between clinical and genomic data. However, up to now, there has been a lack of research on mining clinic-genomic associations by comprehensively analysing available gene expression data for a single disease. Colorectal cancer is one of the most common malignant tumours. A number of genetic syndromes have been proven to be associated with colorectal cancer. This paper presents our research on mining clinic-genomic associations for colorectal cancer in a biomedical big data environment. The proposed method is engineered with multiple technologies, including extracting clinical concepts using the Unified Medical Language System (UMLS), extracting genes through literature mining, and mining clinic-genomic associations through statistical analysis. We applied this method to datasets extracted from both the Gene Expression Omnibus (GEO) and the Genetic Association Database (GAD). A total of 23,517 clinic-genomic associations between 139 clinical concepts and 7914 genes were obtained, of which 3474 associations between 31 clinical concepts and 1689 genes were identified as highly reliable. Evaluation and interpretation were performed using UMLS, KEGG, and Gephi, and potential new discoveries were explored. The proposed method is effective in mining valuable knowledge from available biomedical big data and achieves a good performance in bridging clinical data with genomic data for colorectal cancer.

  16. Designing a Secure Storage Repository for Sharing Scientific Datasets using Public Clouds

    Energy Technology Data Exchange (ETDEWEB)

    Kumbhare, Alok; Simmhan, Yogesh; Prasanna, Viktor

    2011-11-14

    As Cloud platforms gain increasing traction among scientific and business communities for outsourcing storage, computing and content delivery, there is also growing concern about the associated loss of control over private data hosted in the Cloud. In this paper, we present an architecture for a secure data repository service designed on top of a public Cloud infrastructure to support multi-disciplinary scientific communities dealing with personal and human subject data, motivated by the smart power grid domain. Our repository model allows users to securely store and share their data in the Cloud without revealing the plain text to unauthorized users, the Cloud storage provider or the repository itself. The system masks file names, user permissions and access patterns while providing auditing capabilities with provable data updates.

  17. Improved Automated Detection of Diabetic Retinopathy on a Publicly Available Dataset Through Integration of Deep Learning.

    Science.gov (United States)

    Abràmoff, Michael David; Lou, Yiyue; Erginay, Ali; Clarida, Warren; Amelon, Ryan; Folk, James C; Niemeijer, Meindert

    2016-10-01

    To compare the performance of a deep-learning enhanced algorithm for automated detection of diabetic retinopathy (DR) with the previously published performance of that algorithm, the Iowa Detection Program (IDP), without deep learning components, on the same publicly available set of fundus images and the previously reported consensus reference standard set by three US board-certified retinal specialists. We used the previously reported consensus reference standard of referable DR (rDR), defined as International Clinical Classification of Diabetic Retinopathy moderate or severe nonproliferative (NPDR), proliferative DR, and/or macular edema (ME). Neither the Messidor-2 images, nor the three retinal specialists setting the Messidor-2 reference standard, were used for training IDx-DR version X2.1. Sensitivity, specificity, negative predictive value, area under the curve (AUC), and their confidence intervals (CIs) were calculated. Sensitivity was 96.8% (95% CI: 93.3%-98.8%), specificity was 87.0% (95% CI: 84.2%-89.4%), with 6/874 false negatives, resulting in a negative predictive value of 99.0% (95% CI: 97.8%-99.6%). No cases of severe NPDR, PDR, or ME were missed. The AUC was 0.980 (95% CI: 0.968-0.992). Sensitivity was not statistically different from the published IDP sensitivity, which had a CI of 94.4% to 99.3%, but specificity was significantly better than the published IDP specificity CI of 55.7% to 63.0%. A deep-learning enhanced algorithm for the automated detection of DR achieves significantly better performance than a previously reported, otherwise essentially identical, algorithm that does not employ deep learning. Deep learning enhanced algorithms have the potential to improve the efficiency of DR screening, and thereby to prevent visual loss and blindness from this devastating disease.
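The screening metrics reported in this abstract follow directly from confusion-matrix counts. A sketch computing sensitivity, specificity, and negative predictive value with Wilson score intervals (the counts below are hypothetical, and the abstract does not state which interval method the authors used):

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

def screening_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and NPV, each with a 95% CI."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "npv": (tn / (tn + fn), wilson_ci(tn, tn + fn)),
    }

# Hypothetical counts, not the Messidor-2 figures from the study.
metrics = screening_metrics(tp=180, fp=113, tn=755, fn=6)
```

The asymmetric intervals around proportions near 1 (such as the NPV here) are exactly why score-based intervals are preferred over the naive normal approximation in screening studies.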

  18. Application of Georeferencing in the management of environmental ...

    African Journals Online (AJOL)

    Application of Georeferencing in the management of environmental pollution in the ... range of important ecosystem goods and services provided by the Niger Delta ... opportunities in the accurate monitoring and assessment of environmental ...

  19. Tracking vegetation phenology across diverse North American biomes using PhenoCam imagery: A new, publicly-available dataset

    Science.gov (United States)

    Richardson, A. D.

    2015-12-01

    Vegetation phenology controls the seasonality of many ecosystem processes, as well as numerous biosphere-atmosphere feedbacks. Phenology is highly sensitive to climate change and variability, and is thus a key aspect of global change ecology. The goal of the PhenoCam network is to serve as a long-term, continental-scale, phenological observatory. The network uses repeat digital photography—images captured using conventional, visible-wavelength, automated digital cameras—to characterize vegetation phenology in diverse ecosystems across North America and around the world. At present, imagery from over 200 research sites, spanning a wide range of ecoregions, climate zones, and plant functional types, is being archived and processed in near-real-time through the PhenoCam project web page (http://phenocam.sr.unh.edu/). Data derived from PhenoCam imagery have previously been used to evaluate satellite phenology products, to constrain and test new phenology models, to understand relationships between canopy phenology and ecosystem processes, and to study the seasonal changes in leaf-level physiology that are associated with changes in leaf color. I will describe a new, publicly-available phenological dataset, derived from over 600 site-years of PhenoCam imagery. For each archived image (ca. 5 million), we extracted RGB (red, green, blue) color channel information, with means and other statistics calculated across a region-of-interest (ROI) delineating a specific vegetation type. From the high-frequency (typically, 30 minute) imagery, we derived time series characterizing vegetation color, including "canopy greenness", processed to 1- and 3-day intervals. For ecosystems with a single annual cycle of vegetation activity, we derived estimates, with uncertainties, for the start, middle, and end of spring and autumn phenological transitions. Given the lack of multi-year, standardized, and geographically distributed phenological data for North America, we
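The "canopy greenness" index derived from RGB channel means over a region of interest is typically the green chromatic coordinate, GCC = G / (R + G + B). A minimal sketch over a toy region of interest (the pixel values are invented):

```python
import numpy as np

def green_chromatic_coordinate(image):
    """Canopy 'greenness' (GCC) for an RGB image region of interest.

    GCC = G_mean / (R_mean + G_mean + B_mean), computed from per-channel
    means across the ROI, as commonly done with PhenoCam-style imagery.
    """
    img = np.asarray(image, dtype=float)
    r, g, b = (img[..., i].mean() for i in range(3))
    return g / (r + g + b)

# A toy 2x2 "image": mid-season canopy pixels dominated by green.
roi = np.array([
    [[60, 120, 40], [70, 130, 50]],
    [[65, 125, 45], [75, 135, 55]],
])
gcc = green_chromatic_coordinate(roi)
```

Because GCC is a ratio of channel means, it is relatively insensitive to overall scene brightness, which makes it usable across the varying illumination of a multi-year camera record.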

  20. Uncertainty in georeferencing current and historic plant locations

    Science.gov (United States)

    McEachern, K.; Niessen, K.

    2009-01-01

    With shrinking habitats, weed invasions, and climate change, repeated surveys are becoming increasingly important for rare plant conservation and ecological restoration. We often need to relocate historical sites or provide locations for newly restored sites. Georeferencing is the technique of giving geographic coordinates to the location of a site. Georeferencing has been done historically using verbal descriptions or field maps that accompany voucher collections. New digital technology gives us more exact techniques for mapping and storing location information. Error still exists, however, and even georeferenced locations can be uncertain, especially if error information is not included with the observation. We review the concept of uncertainty in georeferencing and compare several institutional database systems for cataloging error and uncertainty with georeferenced locations. These concepts are widely discussed among geographers, but ecologists and restorationists need to become more aware of issues related to uncertainty to improve our use of spatial information in field studies. ?? 2009 by the Board of Regents of the University of Wisconsin System.

  1. Scanning and georeferencing historical USGS quadrangles

    Science.gov (United States)

    Fishburn, Kristin A.; Davis, Larry R.; Allord, Gregory J.

    2017-06-23

    The U.S. Geological Survey (USGS) National Geospatial Program is scanning published USGS 1:250,000-scale and larger topographic maps printed between 1884, the inception of the topographic mapping program, and 2006. The goal of this project, which began publishing the Historical Topographic Map Collection in 2011, is to provide access to a digital repository of USGS topographic maps that is available to the public at no cost. For more than 125 years, USGS topographic maps have accurately portrayed the complex geography of the Nation. The USGS is the Nation’s largest producer of traditional topographic maps, and, prior to 2006, USGS topographic maps were created using traditional cartographic methods and printed using a lithographic process. The next generation of topographic maps, US Topo, is being released by the USGS in digital form, and newer technologies make it possible to also deliver historical maps in the same electronic format that is more publicly accessible.

  2. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  3. Georeferencing natural disaster impact footprints : lessons learned from the EM-DAT experience

    Science.gov (United States)

    Wallemacq, Pascaline; Guha Sapir, Debarati

    2014-05-01

    wider public and policy makers. Some results from the application of georeferencing will be presented during the session such as a study of the population potentially exposed and affected by natural disasters in Europe, a flood vulnerability analysis in Vietnam and the potential merging of watersheds analysis and flood footprints data.

  4. Automatic Georeferencing of Astronaut Auroral Photography

    Science.gov (United States)

    Walsh, A. P.; Riechert, M.; Taylor, M. G.

    2014-12-01

    Astronauts on board the International Space Station have taken thousands of high-quality photographs of the aurorae borealis and australis with high temporal and spatial resolution. A barrier to these photographs being used in research is that the cameras do not have a fixed orientation, and the images therefore do not have any pointing information associated with them. Using astrometry.net and other open source libraries we have developed a software toolkit to automatically reconstruct the pointing of the images from the visible starfield and hence project the auroral images in geographic and geomagnetic coordinates. Here we explain the technique and the resulting data products, which will soon be publicly available through the project website.

  5. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Directory of Open Access Journals (Sweden)

    Nilotpal Chowdhury

    Full Text Available Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single-series studies and suffer from methodological problems. We sought to use a meta-analytic approach, combining multiple publicly available datasets while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (comprising 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate, adjusted for expression of cell cycle related genes and for the 4 major molecular subtypes). The centre and microarray batch effects were adjusted for by including them as random-effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. 
This methodology seems to yield new and

  6. HS3D, A Dataset of Homo Sapiens Splice Regions, and its Extraction Procedure from a Major Public Database

    Science.gov (United States)

    Pollastro, Pasquale; Rampone, Salvatore

    The aim of this work is to describe a cleaning procedure for GenBank data, producing material to train and to assess the prediction accuracy of computational approaches for gene characterization. A procedure (GenBank2HS3D) has been defined, producing a dataset (HS3D - Homo Sapiens Splice Sites Dataset) of Homo Sapiens splice regions extracted from GenBank (Rel. 123 at this time). It selects, from the complete GenBank Primate Division, entries of human nuclear DNA according to several assessed criteria; then it extracts exons and introns from these entries (currently 4523 + 3802). Donor and acceptor sites are then extracted as windows of 140 nucleotides around each splice site (3799 + 3799). After discarding windows not including canonical GT-AG junctions (65 + 74), windows with insufficient data (not enough material for a 140-nucleotide window) (686 + 589), windows containing non-ACGT bases (29 + 30), and redundant windows (218 + 226), the remaining windows (2796 + 2880) are reported in the dataset. Finally, windows of false splice sites are selected by searching for canonical GT-AG pairs in non-splicing positions (271 937 + 332 296). The false sites in a range of +/- 60 from a true splice site are marked as proximal. HS3D, release 1.2 at this time, is available at the Web server of the University of Sannio: http://www.sci.unisannio.it/docenti/rampone/.
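The window-extraction step can be sketched as follows: scan the sequence for canonical donor (GT) dinucleotides, cut a fixed 140-nucleotide window around each, and discard windows that lack sufficient flanking material or contain non-ACGT characters, mirroring the cleaning rules above. The placement of the site within the window is illustrative, since the exact offsets are not given here:

```python
def candidate_donor_windows(seq, width=140):
    """Extract fixed-width windows around candidate donor sites (GT dinucleotides).

    Mirrors the HS3D-style cleaning rules: windows without enough flanking
    material, or containing non-ACGT characters, are discarded. The GT is
    placed near the centre of the window (exact offset is illustrative).
    """
    seq = seq.upper()
    half = width // 2
    windows = []
    for i in range(len(seq) - 1):
        if seq[i:i + 2] != "GT":
            continue
        start, end = i - half + 1, i - half + 1 + width
        if start < 0 or end > len(seq):
            continue  # insufficient material for a full-width window
        w = seq[start:end]
        if set(w) <= set("ACGT"):
            windows.append((i, w))
    return windows

# One valid candidate: a lone GT flanked by 100 bases on each side.
windows = candidate_donor_windows("A" * 100 + "GT" + "A" * 100)
```

The same scan, applied to positions that are not annotated splice sites, yields the "false site" windows used as negative examples, with those within +/- 60 of a true site flagged as proximal.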

  7. panMetaDocs, eSciDoc, and DOIDB - an infrastructure for the curation and publication of file-based datasets for 'GFZ Data Services'

    Science.gov (United States)

    Ulbricht, Damian; Elger, Kirsten; Bertelmann, Roland; Klump, Jens

    2016-04-01

    With the foundation of DataCite in 2009 and the technical infrastructure installed in the last six years, it has become very easy to create citable dataset DOIs. Nowadays, dataset DOIs are increasingly accepted and required by journals in the reference lists of manuscripts. In addition, DataCite provides usage statistics [1] of assigned DOIs and offers a public search API to make research data count. By linking related information to the data, the data become more useful for future generations of scientists. For this purpose, several identifier systems, such as ISBN for books, ISSN for journals, DOI for articles or related data, ORCID for authors, and IGSN for physical samples, can be attached to DOIs using the DataCite metadata schema [2]. While these are good preconditions for publishing data, free and open solutions that help with the curation of data, the publication of research data, and the assignment of DOIs in one software package seem to be rare. At GFZ Potsdam we built a modular software stack made of several free and open software solutions, and we established 'GFZ Data Services'. 'GFZ Data Services' provides storage, a metadata editor for publication, and a facility to moderate minted DOIs. All software solutions are connected through web APIs, which makes it possible to reuse and integrate established software. The core component of 'GFZ Data Services' is an eSciDoc [3] middleware that is used as central storage and has been designed along the OAIS reference model for digital preservation. Thus, data are stored in self-contained packages made of binary file-based data and XML-based metadata. The eSciDoc infrastructure provides access control to data and is able to handle half-open datasets, which is useful in embargo situations when a subset of the research data is released after an adequate period. 
The data exchange platform panMetaDocs [4] makes use of eSciDoc's REST API to upload file-based data into eSciDoc and uses a metadata editor [5] to annotate the files
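
The linking of related identifiers via the DataCite metadata schema mentioned above can be illustrated with a short sketch. This builds a simplified, unnamespaced record; the element names follow the DataCite kernel in spirit, but a real record requires the official namespace and schema version, and the DOI, IGSN and relation types shown here are made-up examples.

```python
import xml.etree.ElementTree as ET

# Simplified, unnamespaced sketch of a DataCite-style record that links
# a dataset DOI to related identifiers (an article DOI, an IGSN sample).

def datacite_record(doi, creator, title, related):
    resource = ET.Element("resource")
    ET.SubElement(resource, "identifier", identifierType="DOI").text = doi
    creators = ET.SubElement(resource, "creators")
    c = ET.SubElement(creators, "creator")
    ET.SubElement(c, "creatorName").text = creator
    titles = ET.SubElement(resource, "titles")
    ET.SubElement(titles, "title").text = title
    rel = ET.SubElement(resource, "relatedIdentifiers")
    for rid, rid_type, relation in related:
        r = ET.SubElement(rel, "relatedIdentifier",
                          relatedIdentifierType=rid_type,
                          relationType=relation)
        r.text = rid
    return ET.tostring(resource, encoding="unicode")

xml = datacite_record(
    "10.5880/GFZ.EXAMPLE",  # hypothetical dataset DOI
    "Doe, Jane", "Example file-based dataset",
    [("10.1000/journal.article", "DOI", "IsSupplementTo"),
     ("SSH000SUA", "IGSN", "Cites")])
```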

  8. Job Patterns for Minorities and Women in Elementary-Secondary Public Schools, 2012 EEO-5 Dataset - US Summary Report

    Data.gov (United States)

    US Equal Employment Opportunity Commission — As part of its mandate under Title VII of the Civil Rights Act of 1964, as amended, the Equal Employment Opportunity Commission requires periodic reports from public...

  9. Job Patterns for Minorities and Women in Elementary-Secondary Public Schools, 2012 EEO-5 Dataset - State Summary Report

    Data.gov (United States)

    US Equal Employment Opportunity Commission — As part of its mandate under Title VII of the Civil Rights Act of 1964, as amended, the Equal Employment Opportunity Commission requires periodic reports from public...

  10. CAG - computer-aid-georeferencing, or rapid sharing, restructuring and presentation of environmental data using remote-server georeferencing for the GE clients. Educational and scientific implications.

    Science.gov (United States)

    Hronusov, V. V.

    2006-12-01

    We suggest a method of using external public servers for rearranging, restructuring and rapid sharing of environmental data for the purpose of quick presentations in numerous GE clients. The method enables a new philosophy for the presentation (publication) of the data (mostly static) stored in the public domain (e.g., Blue Marble, Visible Earth, etc.). The new approach works by publishing freely accessible spreadsheets which contain enough information and links to the data. Because most large depositories of environmental monitoring data have a rather simple net address system as well as a simple hierarchy, mostly based on the date and type of the data, it is possible to construct the http-based link to the file which contains the data. Publication of new data on the server is recorded by simply entering a new address into a cell in the spreadsheet. At the moment we use the EditGrid (www.editgrid.com) system as a spreadsheet platform. The generation of KML code is achieved on the basis of XML data and XSLT procedures. Since the EditGrid environment supports "fetch" and similar commands, it is possible to create "smart-adaptive" KML generation on the fly based on the data streams from RSS and XML sources. Previous GIS-based methods could combine high-definition data from various sources, but large-scale comparisons of dynamic processes have usually been out of reach of the technology. The suggested method allows an unlimited number of GE clients to view, review and compare dynamic and static processes from previously un-combinable sources, and on unprecedented scales. The ease of automated or computer-assisted georeferencing has already led to the translation of about 3000 raster public domain images and point and linear data sources into GE language. In addition, the suggested method allows a user to create rapid animations to demonstrate dynamic processes; products in high demand in education, meteorology, volcanology and
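
The spreadsheet-to-KML transformation described above (done there with XML and XSLT) can be sketched in Python. The column names (`name`, `lat`, `lon`, `url`) are assumptions for illustration, not the actual EditGrid layout; note that KML coordinates are ordered longitude,latitude.

```python
# Hypothetical sketch: turning spreadsheet rows (name, position, link to
# the data file) into KML placemarks for a Google Earth client.
from xml.sax.saxutils import escape

def rows_to_kml(rows):
    placemarks = []
    for row in rows:
        placemarks.append(
            "  <Placemark>\n"
            f"    <name>{escape(row['name'])}</name>\n"
            f"    <description>{escape(row['url'])}</description>\n"
            f"    <Point><coordinates>{row['lon']},{row['lat']},0"
            "</coordinates></Point>\n"
            "  </Placemark>")
    body = "\n".join(placemarks)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
            f"<Document>\n{body}\n</Document>\n</kml>")
```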

  11. Web-GIS approach for integrated analysis of heterogeneous georeferenced data

    Science.gov (United States)

    Okladnikov, Igor; Gordov, Evgeny; Titov, Alexander; Shulgina, Tamara

    2014-05-01

    Georeferenced datasets are currently actively used for modeling, interpretation and forecasting of climatic and ecosystem changes on different spatial and temporal scales [1]. Due to the inherent heterogeneity of environmental datasets as well as their huge size (up to tens of terabytes for a single dataset), special software supporting studies in the climate and environmental change areas is required [2]. A dedicated information-computational system for integrated analysis of heterogeneous georeferenced climatological and meteorological data is presented. It is based on a combination of Web and GIS technologies according to Open Geospatial Consortium (OGC) standards, and involves many modern solutions such as an object-oriented programming model, modular composition, and JavaScript libraries based on the GeoExt library (http://www.geoext.org), the ExtJS Framework (http://www.sencha.com/products/extjs) and OpenLayers software (http://openlayers.org). The main advantage of the system lies in its capability to perform integrated analysis of time series of georeferenced data obtained from different sources (in-situ observations, model results, remote sensing data) and to combine the results in a single map [3, 4] as WMS and WFS layers in a web-GIS application. Analysis results are also available for downloading as binary files from the graphical user interface, or can be directly accessed through web mapping (WMS) and web feature (WFS) services for further processing by the user. Data processing is performed on a geographically distributed computational cluster comprising data storage systems and corresponding computational nodes. Several geophysical datasets represented by NCEP/NCAR Reanalysis II, JMA/CRIEPI JRA-25 Reanalysis, ECMWF ERA-40 Reanalysis, ECMWF ERA Interim Reanalysis, MRI/JMA APHRODITE's Water Resources Project Reanalysis, DWD Global Precipitation Climatology Centre's data, GMAO Modern Era-Retrospective analysis for Research and Applications, reanalysis of Monitoring
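
The WMS access mentioned above follows the OGC standard, so a client request is just a parameterized URL. A minimal sketch of building a GetMap request; the host and layer name are made up, while the query parameters follow WMS 1.1.1.

```python
# Illustrative construction of an OGC WMS 1.1.1 GetMap request URL.
from urllib.parse import urlencode

def wms_getmap_url(base, layer, bbox, size=(800, 600), srs="EPSG:4326"):
    """bbox is (minx, miny, maxx, maxy) in the given SRS."""
    params = {
        "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
        "LAYERS": layer, "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": size[0], "HEIGHT": size[1],
        "FORMAT": "image/png", "TRANSPARENT": "TRUE",
    }
    return base + "?" + urlencode(params)

url = wms_getmap_url("http://example.org/wms",      # hypothetical server
                     "tmean_anomaly",               # hypothetical layer
                     (60, 50, 90, 70))
```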

  12. Georeferenced LiDAR 3D Vine Plantation Map Generation

    OpenAIRE

    Meritxell Queraltó; Jordi Llop; Emilio Gil; Jordi Llorens

    2011-01-01

    The use of electronic devices for canopy characterization has recently been widely discussed. Among such devices, LiDAR sensors appear to be the most accurate and precise. Information obtained with LiDAR sensors during reading while driving a tractor along a crop row can be managed and transformed into canopy density maps by evaluating the frequency of LiDAR returns. This paper describes a proposed methodology to obtain a georeferenced canopy map by combining the information obtained with LiD...

  13. Ordering the mob: Insights into replicon and MOB typing schemes from analysis of a curated dataset of publicly available plasmids.

    Science.gov (United States)

    Orlek, Alex; Phan, Hang; Sheppard, Anna E; Doumith, Michel; Ellington, Matthew; Peto, Tim; Crook, Derrick; Walker, A Sarah; Woodford, Neil; Anjum, Muna F; Stoesser, Nicole

    2017-03-09

    Plasmid typing can provide insights into the epidemiology and transmission of plasmid-mediated antibiotic resistance. The principal plasmid typing schemes are replicon typing and MOB typing, which utilize variation in replication loci and relaxase proteins respectively. Previous studies investigating the proportion of plasmids assigned a type by these schemes ('typeability') have yielded conflicting results; moreover, thousands of plasmid sequences have been added to NCBI in recent years, without consistent annotation to indicate which sequences represent complete plasmids. Here, a curated dataset of complete Enterobacteriaceae plasmids from NCBI was compiled, and used to assess the typeability and concordance of in silico replicon and MOB typing schemes. Concordance was assessed at hierarchical replicon type resolutions, from replicon family-level to plasmid multilocus sequence type (pMLST)-level, where available. We found that 85% and 65% of the curated plasmids could be replicon and MOB typed, respectively. Overall, plasmid size and the number of resistance genes were significant independent predictors of replicon and MOB typing success. We found some degree of non-concordance between replicon families and MOB types, which was only partly resolved when partitioning plasmids into finer-resolution groups (replicon and pMLST types). In some cases, non-concordance was attributed to ambiguous boundaries between MOBP and MOBQ types; in other cases, backbone mosaicism was considered a more plausible explanation. β-lactamase resistance genes tended not to show fidelity to a particular plasmid type, though some previously reported associations were supported. Overall, replicon and MOB typing schemes are likely to continue playing an important role in plasmid analysis, but their performance is constrained by the diverse and dynamic nature of plasmid genomes.

  14. Georeferenced LiDAR 3D vine plantation map generation.

    Science.gov (United States)

    Llorens, Jordi; Gil, Emilio; Llop, Jordi; Queraltó, Meritxell

    2011-01-01

    The use of electronic devices for canopy characterization has recently been widely discussed. Among such devices, LiDAR sensors appear to be the most accurate and precise. Information obtained with LiDAR sensors during reading while driving a tractor along a crop row can be managed and transformed into canopy density maps by evaluating the frequency of LiDAR returns. This paper describes a proposed methodology to obtain a georeferenced canopy map by combining the information obtained with LiDAR with that generated using a GPS receiver installed on top of a tractor. Data regarding the velocity of LiDAR measurements and UTM coordinates of each measured point on the canopy were obtained by applying the proposed transformation process. The process allows overlap of the canopy density map generated with the image of the intended measured area using Google Earth(®), providing accurate information about the canopy distribution and/or location of damage along the rows. This methodology was applied and tested on different vine varieties and crop stages in two important vine production areas in Spain. The results indicate that the georeferenced information obtained with LiDAR sensors appears to be an interesting tool with the potential to improve crop management processes.
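
The core of the transformation described above, pairing LiDAR return counts with the GPS/UTM position of the tractor and binning them into a canopy density map, can be sketched simply. The data layout (one return count per scan, already matched to a UTM position) is an assumption for illustration, not the authors' processing chain.

```python
# Minimal sketch: bin per-scan LiDAR return counts, georeferenced with
# UTM coordinates from the GPS receiver, into a density grid that could
# be overlaid on Google Earth.
from collections import defaultdict

def density_map(scans, cell=1.0):
    """scans: iterable of (utm_x, utm_y, n_returns). Returns a dict
    {(col, row): mean returns per scan} on a `cell`-metre grid."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, n in scans:
        key = (int(x // cell), int(y // cell))
        sums[key] += n
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}
```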

  15. Georeferenced LiDAR 3D Vine Plantation Map Generation

    Directory of Open Access Journals (Sweden)

    Meritxell Queraltó

    2011-06-01

    Full Text Available The use of electronic devices for canopy characterization has recently been widely discussed. Among such devices, LiDAR sensors appear to be the most accurate and precise. Information obtained with LiDAR sensors during reading while driving a tractor along a crop row can be managed and transformed into canopy density maps by evaluating the frequency of LiDAR returns. This paper describes a proposed methodology to obtain a georeferenced canopy map by combining the information obtained with LiDAR with that generated using a GPS receiver installed on top of a tractor. Data regarding the velocity of LiDAR measurements and UTM coordinates of each measured point on the canopy were obtained by applying the proposed transformation process. The process allows overlap of the canopy density map generated with the image of the intended measured area using Google Earth®, providing accurate information about the canopy distribution and/or location of damage along the rows. This methodology was applied and tested on different vine varieties and crop stages in two important vine production areas in Spain. The results indicate that the georeferenced information obtained with LiDAR sensors appears to be an interesting tool with the potential to improve crop management processes.

  16. Refugees welcome? A dataset on anti-refugee violence in Germany

    Directory of Open Access Journals (Sweden)

    David Benček

    2016-11-01

    Full Text Available The recent rise of xenophobic attacks against refugees in Germany has sparked both political and scholarly debates on the drivers, dynamics, and consequences of right-wing violence. Thus far, a lack of systematic data collection and data processing has inhibited quantitative analysis to help explain this current social phenomenon. This paper presents a georeferenced event dataset on anti-refugee violence and social unrest in Germany in 2014 and 2015 that is based on information collected by two civil society organizations, the Amadeu Antonio Foundation and PRO ASYL, who publicize their data in an online chronicle. We webscraped this information to create a scientifically usable dataset that includes information on 1 645 events of four different types of right-wing violence and social unrest: xenophobic demonstrations, assault, arson attacks, and miscellaneous attacks against refugee housing (such as swastika graffiti). After discussing how the dataset was constructed, we offer a descriptive analysis of patterns of right-wing violence and unrest in Germany in 2014 and 2015. This article concludes by outlining preliminary ideas on how the dataset can be used in future research of various disciplines in the social sciences.

  17. DIRECT GEOREFERENCING OF UAV DATA BASED ON SIMPLE BUILDING STRUCTURES

    Directory of Open Access Journals (Sweden)

    W. Tampubolon

    2016-06-01

    Full Text Available Unmanned Aerial Vehicle (UAV) data acquisition is more flexible compared with the more complex traditional airborne data acquisition. This advantage puts UAV platforms in a position as an alternative acquisition method in many applications including Large Scale Topographical Mapping (LSTM). LSTM, i.e. larger or equal than 1:10.000 map scale, is one of a number of prominent priority tasks to be solved in an accelerated way especially in third world developing countries such as Indonesia. As one component of fundamental geospatial data sets, large scale topographical maps are mandatory in order to enable detailed spatial planning. However, the accuracy of the products derived from the UAV data are normally not sufficient for LSTM as it needs robust georeferencing, which requires additional costly efforts such as the incorporation of sophisticated GPS Inertial Navigation System (INS) or Inertial Measurement Unit (IMU) on the platform and/or Ground Control Point (GCP) data on the ground. To reduce the costs and the weight on the UAV alternative solutions have to be found. This paper outlines a direct georeferencing method of UAV data by providing image orientation parameters derived from simple building structures and presents results of an investigation on the achievable results in a LSTM application. In this case, the image orientation determination has been performed through sequential images without any input from INS/IMU equipment. The simple building structures play a significant role in such a way that geometrical characteristics have been considered. Some instances are the orthogonality of the building’s wall/rooftop and the local knowledge of the building orientation in the field. In addition, we want to include the Structure from Motion (SfM) approach in order to reduce the number of required GCPs especially for the absolute orientation purpose. The SfM technique applied to the UAV data and simple building structures additionally presents an

  18. Direct Georeferencing of Uav Data Based on Simple Building Structures

    Science.gov (United States)

    Tampubolon, W.; Reinhardt, W.

    2016-06-01

    Unmanned Aerial Vehicle (UAV) data acquisition is more flexible compared with the more complex traditional airborne data acquisition. This advantage puts UAV platforms in a position as an alternative acquisition method in many applications including Large Scale Topographical Mapping (LSTM). LSTM, i.e. larger or equal than 1:10.000 map scale, is one of a number of prominent priority tasks to be solved in an accelerated way especially in third world developing countries such as Indonesia. As one component of fundamental geospatial data sets, large scale topographical maps are mandatory in order to enable detailed spatial planning. However, the accuracy of the products derived from the UAV data are normally not sufficient for LSTM as it needs robust georeferencing, which requires additional costly efforts such as the incorporation of sophisticated GPS Inertial Navigation System (INS) or Inertial Measurement Unit (IMU) on the platform and/or Ground Control Point (GCP) data on the ground. To reduce the costs and the weight on the UAV alternative solutions have to be found. This paper outlines a direct georeferencing method of UAV data by providing image orientation parameters derived from simple building structures and presents results of an investigation on the achievable results in a LSTM application. In this case, the image orientation determination has been performed through sequential images without any input from INS/IMU equipment. The simple building structures play a significant role in such a way that geometrical characteristics have been considered. Some instances are the orthogonality of the building's wall/rooftop and the local knowledge of the building orientation in the field. In addition, we want to include the Structure from Motion (SfM) approach in order to reduce the number of required GCPs especially for the absolute orientation purpose. The SfM technique applied to the UAV data and simple building structures additionally presents an effective tool
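
One ingredient of the approach above, using locally known building orientation instead of INS/IMU input, can be illustrated with a toy 2D example: if the real-world direction of a rooftop edge is known in the field, the relatively oriented model can be rotated so its corresponding edge matches that direction. This is only a hedged illustration of the geometric idea, not the authors' algorithm, which works on full image orientation parameters.

```python
import math

def rotation_to_align(edge_model, edge_world):
    """Angle (radians) that rotates the model edge direction onto the
    known world edge direction; both edges are (dx, dy) vectors."""
    return (math.atan2(edge_world[1], edge_world[0])
            - math.atan2(edge_model[1], edge_model[0]))

def rotate(point, angle):
    """Rotate a 2D point counter-clockwise by `angle` radians."""
    x, y = point
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))
```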

  19. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents...

  20. Integrating High-Resolution Datasets to Target Mitigation Efforts for Improving Air Quality and Public Health in Urban Neighborhoods.

    Science.gov (United States)

    Shandas, Vivek; Voelkel, Jackson; Rao, Meenakshi; George, Linda

    2016-08-05

    Reducing exposure to degraded air quality is essential for building healthy cities. Although air quality and population vary at fine spatial scales, current regulatory and public health frameworks assess human exposures using county- or city-scales. We build on a spatial analysis technique, dasymetric mapping, for allocating urban populations that, together with emerging fine-scale measurements of air pollution, addresses three objectives: (1) evaluate the role of spatial scale in estimating exposure; (2) identify urban communities that are disproportionately burdened by poor air quality; and (3) estimate reduction in mobile sources of pollutants due to local tree-planting efforts using nitrogen dioxide. Our results show a maximum value of 197% difference between cadastrally-informed dasymetric system (CIDS) and standard estimations of population exposure to degraded air quality for small spatial extent analyses, and a lack of substantial difference for large spatial extent analyses. These results provide the foundation for improving policies for managing air quality, and targeting mitigation efforts to address challenges of environmental justice.
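
The dasymetric idea behind the cadastrally-informed system above can be sketched in a few lines: block-level population is reallocated to parcels in proportion to residential floor area, and exposure is then the population-weighted mean of the pollutant value at each parcel. This is a hedged simplification for illustration, not the authors' CIDS implementation.

```python
# Minimal dasymetric allocation sketch (assumed data layout):
# each parcel is (residential_floor_area, no2_concentration).

def allocate(block_pop, parcels):
    """Distribute block population to parcels by floor-area weight."""
    total = sum(area for area, _ in parcels)
    return [block_pop * area / total for area, _ in parcels]

def mean_exposure(block_pop, parcels):
    """Population-weighted mean NO2 over the block's parcels."""
    pops = allocate(block_pop, parcels)
    return sum(p * no2 for p, (_, no2) in zip(pops, parcels)) / block_pop
```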

  1. Rapid Characterization of Shorelines using a Georeferenced Video Mapping System

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Michael G.; Judd, Chaeli; Marcoe, K.

    2012-09-01

    Increased understanding of shoreline conditions is needed, yet current approaches are limited in their ability to characterize remote areas or to document features at a finer resolution. Documentation using video mapping may provide a rapid and repeatable method for assessing the current state of the environment and determining changes to the shoreline over time. In this study, we compare two efforts that used boat-based, georeferenced video mapping in coastal Washington and the Columbia River Estuary to map and characterize coastal stressors and functional data. In both areas, mapping multiple features along the shoreline required approximation of the coastline. However, characterization of vertically oriented features such as shoreline armoring and small features such as pilings and large woody debris was possible. In addition, end users noted that geovideo provides a permanent record that allows a user to examine recorded video anywhere along a transect or at discrete points.

  2. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  3. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  4. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  5. Microarray Analysis Dataset

    Science.gov (United States)

    This file contains a link for Gene Expression Omnibus and the GSE designations for the publicly available gene expression data used in the study and reflected in Figures 6 and 7 for the Das et al., 2016 paper. This dataset is associated with the following publication: Das, K., C. Wood, M. Lin, A.A. Starkov, C. Lau, K.B. Wallace, C. Corton, and B. Abbott. Perfluoroalky acids-induced liver steatosis: Effects on genes controlling lipid homeostasis. TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 378: 32-52, (2017).

  6. Vision-Based Georeferencing of GPR in Urban Areas

    Directory of Open Access Journals (Sweden)

    Riccardo Barzaghi

    2016-01-01

    Full Text Available Ground Penetrating Radar (GPR) surveying is widely used to gather accurate knowledge about the geometry and position of underground utilities. The sensor arrays need to be coupled to an accurate positioning system, like a geodetic-grade Global Navigation Satellite System (GNSS) device. However, in urban areas this approach is not always feasible because GNSS accuracy can be substantially degraded due to the presence of buildings, trees, tunnels, etc. In this work, a photogrammetric (vision-based) method for GPR georeferencing is presented. The method can be summarized in three main steps: tie point extraction from the images acquired during the survey, computation of approximate camera extrinsic parameters and finally a refinement of the parameter estimation using a rigorous implementation of the collinearity equations. A test under operational conditions is described, where accuracy of a few centimeters has been achieved. The results demonstrate that the solution was robust enough for recovering vehicle trajectories even in critical situations, such as poorly textured framed surfaces, short baselines, and low intersection angles.

  7. Small UAV-Acquired, High-resolution, Georeferenced Still Imagery

    Energy Technology Data Exchange (ETDEWEB)

    Ryan Hruska

    2005-09-01

    Currently, small Unmanned Aerial Vehicles (UAVs) are primarily used for capturing and down-linking real-time video. To date, their role as a low-cost airborne platform for capturing high-resolution, georeferenced still imagery has not been fully utilized. On-going work within the Unmanned Vehicle Systems Program at the Idaho National Laboratory (INL) is attempting to exploit this small UAV-acquired, still imagery potential. Initially, a UAV-based still imagery work flow model was developed that includes initial UAV mission planning, sensor selection, UAV/sensor integration, and imagery collection, processing, and analysis. Components to support each stage of the work flow are also being developed. Critical to use of acquired still imagery is the ability to detect changes between images of the same area over time. To enhance the analysts’ change detection ability, a UAV-specific, GIS-based change detection system called SADI or System for Analyzing Differences in Imagery is under development. This paper will discuss the associated challenges and approaches to collecting still imagery with small UAVs. Additionally, specific components of the developed work flow system will be described and graphically illustrated using varied examples of small UAV-acquired still imagery.

  8. Vision-Based Georeferencing of GPR in Urban Areas

    Science.gov (United States)

    Barzaghi, Riccardo; Cazzaniga, Noemi Emanuela; Pagliari, Diana; Pinto, Livio

    2016-01-01

    Ground Penetrating Radar (GPR) surveying is widely used to gather accurate knowledge about the geometry and position of underground utilities. The sensor arrays need to be coupled to an accurate positioning system, like a geodetic-grade Global Navigation Satellite System (GNSS) device. However, in urban areas this approach is not always feasible because GNSS accuracy can be substantially degraded due to the presence of buildings, trees, tunnels, etc. In this work, a photogrammetric (vision-based) method for GPR georeferencing is presented. The method can be summarized in three main steps: tie point extraction from the images acquired during the survey, computation of approximate camera extrinsic parameters and finally a refinement of the parameter estimation using a rigorous implementation of the collinearity equations. A test under operational conditions is described, where accuracy of a few centimeters has been achieved. The results demonstrate that the solution was robust enough for recovering vehicle trajectories even in critical situations, such as poorly textured framed surfaces, short baselines, and low intersection angles. PMID:26805842
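
The "rigorous implementation of the collinearity equations" mentioned in the refinement step refers to the standard photogrammetric model; in the usual notation it reads:

```latex
x = x_0 - c\,\frac{r_{11}(X - X_0) + r_{21}(Y - Y_0) + r_{31}(Z - Z_0)}
                  {r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)},
\qquad
y = y_0 - c\,\frac{r_{12}(X - X_0) + r_{22}(Y - Y_0) + r_{32}(Z - Z_0)}
                  {r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)}
```

where (x, y) are image coordinates of a tie point, (x_0, y_0) the principal point, c the focal length, (X, Y, Z) the object point, (X_0, Y_0, Z_0) the camera projection centre, and r_ij the elements of the rotation matrix built from the camera attitude angles. The refinement adjusts the extrinsic parameters (X_0, Y_0, Z_0 and the attitude) by least squares over all tie points.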

  9. Libraries, The locations and contact information for academic, private and public libraries in Rhode Island. The intention of this dataset was to provide an overview of data. Additional information pertinent to the state is also available from the RI Department of, Published in 2007, 1:4800 (1in=400ft) scale, Rhode Island and Providence Plantations.

    Data.gov (United States)

    NSGIC State | GIS Inventory — Libraries dataset current as of 2007. The locations and contact information for academic, private and public libraries in Rhode Island. The intention of this dataset...

  10. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 Catchments (Version 2.1) for the Conterminous United States: Facility Registry Services (FRS) : Toxic Release Inventory (TRI) , National Pollutant Discharge Elimination System (NPDES) , and Superfund Sites

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset represents the estimated density of georeferenced sites within individual, local NHDPlusV2 catchments and upstream, contributing watersheds based on the...

  11. OBJECT-BASED CHANGE DETECTION USING GEOREFERENCED UAV IMAGES

    Directory of Open Access Journals (Sweden)

    J. Shi

    2012-09-01

    Full Text Available Unmanned aerial vehicles (UAV) have been widely used to capture and down-link real-time videos/images. However, their role as a low-cost airborne platform for capturing high-resolution, geo-referenced still imagery has not been fully utilized. The images obtained from a UAV are advantageous over remote sensing images as they can be obtained at a low cost and potentially at no risk to human life. However, these images are distorted due to the noise generated by the rotary wings, which limits their usefulness. One potential application of such images is to detect changes between images of the same area collected over time. Change detection is of widespread interest due to a large number of applications, including surveillance and civil infrastructure. Although UAVs can provide images with high resolution in a portable and easy way, such images cover only small parts of the entire field of interest and often exhibit high deformation. Until now, there has not been much application of change detection to UAV images, and the traditional pixel-based change detection methods do not give satisfactory results for such images. In this paper, we propose a novel object-based method for change detection using UAV images which can overcome the effect of deformation and can fully utilize the high resolution of UAV images. The developed method can be divided into five main blocks: pre-processing, image matching, image segmentation and feature extraction, change detection and accuracy evaluation. The pre-processing step is further divided into two sub-steps: the first sub-step is to geometrically correct the bi-temporal images based on the geo-reference information (GPS/INS) installed on the UAV system, and the second sub-step is radiometric normalization using a histogram method. The image matching block uses the well-known scale-invariant feature transform (SIFT) algorithm to match the same areas in the images and then
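
The radiometric normalization sub-step can be illustrated with a simpler, commonly used linear stand-in: matching the mean and standard deviation of the subject image to those of the reference image. The paper itself uses a histogram method, so this sketch is an assumption-laden simplification, not their procedure.

```python
import statistics

# Linear radiometric normalization: rescale subject pixel values so
# their mean and standard deviation match the reference image.

def normalize(subject, reference):
    mu_s, sd_s = statistics.mean(subject), statistics.pstdev(subject)
    mu_r, sd_r = statistics.mean(reference), statistics.pstdev(reference)
    gain = sd_r / sd_s if sd_s else 1.0
    return [(v - mu_s) * gain + mu_r for v in subject]
```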

  12. Web based tools for data manipulation, visualisation and validation with interactive georeferenced graphs

    Science.gov (United States)

    Ivankovic, D.; Dadic, V.

    2009-04-01

    Some oceanographic parameters have to be inserted into the database manually; others (for example, data from a CTD probe) are loaded from various files. All these parameters require visualization, validation and manipulation from the research vessel or the scientific institution, as well as public presentation. For these purposes a web based system has been developed, containing dynamic SQL procedures and Java applets. The technological background is an Oracle 10g relational database and Oracle Application Server. Web interfaces are developed using PL/SQL stored database procedures (mod PL/SQL). Additional parts for data visualization include Java applets and JavaScript. The mapping tool is the Google Maps API (JavaScript), with a Java applet as an alternative. A graph is realized as a dynamically generated web page containing a Java applet. Both the mapping tool and the graphs are georeferenced: a click on some part of a graph automatically initiates a zoom or places a marker at the location where the parameter was measured. This feature is very useful for data validation. The code for data manipulation and visualization is partially realized with dynamic SQL, which allows us to separate the data definition from the code for data manipulation. Adding a new parameter to the system then requires only a data definition and description, without programming an interface for that kind of data.

  13. Geopan AT@S: a Brokering Based Gateway to Georeferenced Historical Maps for Risk Analysis

    Science.gov (United States)

    Previtali, M.

    2017-08-01

    The importance of ancient and historical maps is nowadays recognized in many applications (e.g., urban planning, landscape valorisation and preservation, land change identification, etc.). In recent years a great effort has been made by different institutions, such as geographical institutes, public administrations, and collaborative communities, to digitize and publish online collections of historical maps. In spite of this variety and availability of data, information overload makes their discovery and management difficult: without knowing the specific repository where the data are stored, it is difficult to find the information required. In addition, problems of interconnection between different data sources and their restricted interoperability may arise. This paper describes a new brokering-based gateway developed to assure interoperability between data, in particular georeferenced historical maps and geographic data, gathered from different data providers, with various features and referring to different historical periods. The developed approach is exemplified by a new application named GeoPAN Atl@s, aimed at linking land changes with risk analysis (local seismicity amplification and flooding risk) in the Northern Italy area by using multi-temporal data sources and historic maps.

  14. The Flora Mycologica Iberica Project fungi occurrence dataset

    Directory of Open Access Journals (Sweden)

    Francisco Pando

    2016-09-01

    Full Text Available The dataset contains detailed distribution information on several fungal groups. The information has been revised, and in many cases compiled, by expert mycologists working on the monographs for the Flora Mycologica Iberica Project (FMI). Records comprise both collection and observational data, obtained from a variety of sources including field work, herbaria, and the literature. The dataset contains 59,235 records, of which 21,393 are georeferenced. These correspond to 2,445 species, grouped in 18 classes. The geographical scope of the dataset is the Iberian Peninsula (continental Portugal and Spain, Andorra, and the Balearic Islands). The complete dataset is available in Darwin Core Archive format via the Global Biodiversity Information Facility (GBIF).

  15. Computer vision-based orthorectification and georeferencing of aerial image sets

    Science.gov (United States)

    Faraji, Mohammad Reza; Qi, Xiaojun; Jensen, Austin

    2016-07-01

    Generating a georeferenced mosaic map from unmanned aerial vehicle (UAV) imagery is a challenging task. Direct and indirect georeferencing methods may fail to generate an accurate mosaic map due to the erroneous exterior orientation parameters stored in the inertial measurement unit (IMU), erroneous global positioning system (GPS) data, and difficulty in locating ground control points (GCPs) or having a sufficient number of GCPs. This paper presents a practical framework to orthorectify and georeference aerial images using the robust features-based matching method. The proposed georeferencing process is fully automatic and does not require any GCPs. It is also a near real-time process which can be used to determine whether aerial images taken by the UAV cover the entire target area. We also extend this framework to use the inverse georeferencing process to update the IMU/GPS data, which can be further used to calibrate the camera of the UAV, reduce IMU/GPS errors, and thus produce more accurate mosaic maps by employing any georeferencing method. Our experiments demonstrate the effectiveness of the proposed framework in producing mosaic maps comparable to those of the commercial software Agisoft, and the effectiveness of the extended framework in significantly reducing the errors in the IMU/GPS data.

  16. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments Riparian Buffer for the Conterminous United States: Facility Registry Services (FRS) : Toxic Release Inventory (TRI) , National Pollutant Discharge Elimination System (NPDES) , and Superfund Sites

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset represents the estimated density of georeferenced sites within individual, local NHDPlusV2 catchments and upstream, contributing watersheds riparian...

  17. Georeferenced Point Clouds: A Survey of Features and Point Cloud Management

    Directory of Open Access Journals (Sweden)

    Johannes Otepka

    2013-10-01

    Full Text Available This paper presents a survey of georeferenced point clouds. The focus is, on the one hand, on features that originate in the measurement process itself and features derived by processing the point cloud. On the other hand, approaches for the processing of georeferenced point clouds are reviewed. This includes data structures, but also spatial processing concepts. We suggest a categorization of features into levels that reflect the amount of processing. Point clouds are found across many disciplines, which is reflected in the versatility of the literature suggesting specific features.

  18. Overview of the HUPO Plasma Proteome Project: Results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database

    Energy Technology Data Exchange (ETDEWEB)

    Omenn, Gilbert; States, David J.; Adamski, Marcin; Blackwell, Thomas W.; Menon, Rajasree; Hermjakob, Henning; Apweiler, Rolf; Haab, Brian B.; Simpson, Richard; Eddes, James; Kapp, Eugene; Moritz, Rod; Chan, Daniel W.; Rai, Alex J.; Admon, Arie; Aebersold, Ruedi; Eng, Jimmy K.; Hancock, William S.; Hefta, Stanley A.; Meyer, Helmut; Paik, Young-Ki; Yoo, Jong-Shin; Ping, Peipei; Pounds, Joel G.; Adkins, Joshua N.; Qian, Xiaohong; Wang, Rong; Wasinger, Valerie; Wu, Chi Yue; Zhao, Xiaohang; Zeng, Rong; Archakov, Alexander; Tsugita, Akira; Beer, Ilan; Pandey, Akhilesh; Pisano, Michael; Andrews, Philip; Tammen, Harald; Speicher, David W.; Hanash, Samir M.

    2005-08-13

    HUPO initiated the Plasma Proteome Project (PPP) in 2002. Its pilot phase has (1) evaluated advantages and limitations of many depletion, fractionation, and MS technology platforms; (2) compared PPP reference specimens of human serum and EDTA, heparin, and citrate-anticoagulated plasma; and (3) created a publicly-available knowledge base (www.bioinformatics.med.umich.edu/hupo/ppp; www.ebi.ac.uk/pride). Thirty-five participating laboratories in 13 countries submitted datasets. Working groups addressed (a) specimen stability and protein concentrations; (b) protein identifications from 18 MS/MS datasets; (c) independent analyses from raw MS/MS spectra; (d) search engine performance, subproteome analyses, and biological insights; (e) antibody arrays; and (f) direct MS/SELDI analyses. MS/MS datasets had 15,710 different International Protein Index (IPI) protein IDs; our integration algorithm applied to multiple matches of peptide sequences yielded 9504 IPI proteins identified with one or more peptides and 3020 proteins identified with two or more peptides (the Core Dataset). These proteins have been characterized with Gene Ontology, InterPro, Novartis Atlas, OMIM, and immunoassay-based concentration determinations. The database permits examination of many other subsets, such as 1274 proteins identified with three or more peptides. Reverse protein-to-DNA matching identified proteins for 118 previously unidentified ORFs. We recommend use of plasma instead of serum, with EDTA (or citrate) for anticoagulation. To improve resolution, sensitivity and reproducibility of peptide identifications and protein matches, we recommend combinations of depletion, fractionation, and MS/MS technologies, with explicit criteria for evaluation of spectra, use of search algorithms, and integration of homologous protein matches. This Special Issue of PROTEOMICS presents papers integral to the collaborative analysis, plus many reports of supplementary work on various aspects of the PPP workplan.

  19. Photographic dataset: random peppercorns

    CERN Document Server

    Helenius, Teemu

    2016-01-01

    This is a photographic dataset collected for testing image processing algorithms. The idea is to have sets of different but statistically similar images. In this work the images show randomly distributed peppercorns. The dataset is made available at www.fips.fi/photographic_dataset.php.

  20. Creating a Geo-Referenced Bibliography with Google Earth and Geocommons: The Coos Bay Bibliography

    Science.gov (United States)

    Schmitt, Jenni; Butler, Barb

    2012-01-01

    We compiled a geo-referenced bibliography of research including theses, peer-reviewed articles, agency literature, and books having sample collection sites in and around Coos Bay, Oregon. Using Google Earth and GeoCommons we created a map that allows users such as visiting researchers, faculty, students, and local agencies to identify previous…

  1. Integrated GNSS attitude determination and positioning for direct geo-referencing

    NARCIS (Netherlands)

    Nadarajah, N.; Paffenholz, J.A.; Teunissen, P.J.G.

    2014-01-01

    Direct geo-referencing is an efficient methodology for the fast acquisition of 3D spatial data. It requires the fusion of spatial data acquisition sensors with navigation sensors, such as Global Navigation Satellite System (GNSS) receivers. In this contribution, we consider an integrated GNSS naviga

  2. Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets.

    Science.gov (United States)

    Li, Miao-Xin; Yeung, Juilian M Y; Cherny, Stacey S; Sham, Pak C

    2012-05-01

    Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (M_e) for the adjustment of multiple testing, but current methods of calculation for M_e are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate M_e. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the M_e, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ~10^-7 as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds of ~5 × 10^-8 for current or merged commercial genotyping arrays, ~10^-8 for all common SNPs in the 1000 Genomes Project dataset, and ~5 × 10^-8 for the common SNPs only within genes.
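The practical consequence of an effective number of tests M_e is a Bonferroni-style threshold alpha / M_e. The hypothetical helper below only illustrates that final step; estimating M_e from LD is the hard part that GEC actually addresses.

```python
# Hypothetical illustration: family-wise error control with M_e effectively
# independent tests reduces to a Bonferroni division. M_e is taken as given.
def genomewide_threshold(m_e, alpha=0.05):
    """Bonferroni-style p-value threshold for m_e independent tests."""
    return alpha / m_e
```

With roughly one million effectively independent common SNPs, this reproduces the conventional 5 × 10^-8 genome-wide significance criterion.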

  3. Dataset Lifecycle Policy

    Science.gov (United States)

    Armstrong, Edward; Tauer, Eric

    2013-01-01

    The presentation focused on describing a new dataset lifecycle policy that the NASA Physical Oceanography DAAC (PO.DAAC) has implemented for its new and current datasets to foster improved stewardship and consistency across its archive. The overarching goal is to implement this dataset lifecycle policy for all new GHRSST GDS2 datasets and bridge the mission statements from the GHRSST Project Office and PO.DAAC to provide the best quality SST data in a cost-effective, efficient manner, preserving its integrity so that it will be available and usable to a wide audience.

  5. panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and Publication of File-Based Datasets for GFZ Data Services

    Directory of Open Access Journals (Sweden)

    Damian Ulbricht

    2016-03-01

    Full Text Available The GFZ German Research Centre for Geosciences is the national laboratory for geosciences in Germany. As part of the Helmholtz Association, providing and maintaining large-scale scientific infrastructures is an essential part of GFZ activities. This includes the generation of significant volumes and numbers of research data, which subsequently become source materials for data publications. The development and maintenance of data systems is a key component of GFZ Data Services to support state-of-the-art research. A challenge lies not only in the diversity of scientific subjects and communities, but also in the different types and manifestations of how data are managed by research groups and individual scientists. The data repository of GFZ Data Services provides a flexible IT infrastructure for data storage and publication, including the minting of digital object identifiers (DOIs). It was built as a modular system of several independent software components linked together through Application Programming Interfaces (APIs) provided by the eSciDoc framework. The principal application software components are panMetaDocs for data management and DOIDB for logging and moderating data publication activities. Wherever possible, existing software solutions were integrated or adapted. A summary of our experiences in operating this service is given. Data are described through comprehensive landing pages and supplementary documents, like journal articles or data reports, thus augmenting the scientific usability of the service.

  6. Real-time geo-referenced video mosaicking with the MATISSE system

    DEFF Research Database (Denmark)

    Vincent, Anne-Gaelle; Pessel, Nathalie; Borgetto, Manon

    This paper presents the MATISSE system: Mosaicking Advanced Technologies Integrated in a Single Software Environment. This system aims at producing in-line and off-line geo-referenced video mosaics of the seabed given a video input and navigation data. It is based upon several techniques of image and signal processing developed at Ifremer in recent years in the fields of image mosaicking, camera self-calibration or correction, and estimation of navigation data....

  7. How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.

    Science.gov (United States)

    Pepe, Alberto; Goodman, Alyssa; Muench, August; Crosas, Merce; Erdmann, Christopher

    2014-01-01

    We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over-reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  8. How do astronomers share data? Reliability and persistence of datasets linked in AAS publications and a qualitative study of data practices among US astronomers.

    Directory of Open Access Journals (Sweden)

    Alberto Pepe

    Full Text Available We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data-sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over-reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data-sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to-date.

  9. Fixing Dataset Search

    Science.gov (United States)

    Lynnes, Chris

    2014-01-01

    Three current search engines are queried for ozone data at the GES DISC. The results range from sub-optimal to counter-intuitive. We propose a method to fix dataset search by implementing a robust relevancy ranking scheme. The relevancy ranking scheme is based on several heuristics culled from more than 20 years of helping users select datasets.

  10. 3D Maize Plant Reconstruction Based on Georeferenced Overlapping LiDAR Point Clouds

    Directory of Open Access Journals (Sweden)

    Miguel Garrido

    2015-12-01

    Full Text Available 3D crop reconstruction with a high temporal resolution and using non-destructive measuring technologies can support the automation of plant phenotyping processes. The availability of such 3D data can give valuable information about plant development and the interaction of the plant genotype with the environment. This article presents a new methodology for georeferenced 3D reconstruction of maize plant structure. For this purpose a total station, an IMU, and several 2D LiDARs with different orientations were mounted on an autonomous vehicle. With the multistep methodology presented, based on the application of the ICP algorithm for point cloud fusion, it was possible to perform georeferenced point cloud overlapping. The overlapping point cloud algorithm showed that the aerial points (corresponding mainly to plant parts) were reduced to 1.5%–9% of the total registered data; the remainder were redundant or ground points. Through the inclusion of different LiDAR points of view of the scene, a more realistic representation of the surroundings is obtained by the incorporation of new useful information, but also of noise. Georeferenced 3D maize plant reconstruction at different growth stages, combined with the total station accuracy, could be highly useful when performing precision agriculture at the crop plant level.
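The point-to-point ICP at the heart of the fusion step (also used for UAS/LiDAR registration in the first record on this page) alternates nearest-neighbour matching with a closed-form rigid alignment. The sketch below is a generic textbook version assuming SciPy, not the authors' implementation.

```python
# Illustrative point-to-point ICP: align `source` onto `target` by iterating
# (1) nearest-neighbour correspondences, (2) Kabsch/SVD rigid fit, (3) apply.
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=30):
    """Rigidly align `source` (N x 3) onto `target` (M x 3); returns (R, t)."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    for _ in range(iters):
        # 1. Nearest-neighbour correspondences.
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. Closed-form rigid transform (Kabsch / SVD) for these pairs.
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t = mu_t - R @ mu_s
        # 3. Apply the increment and accumulate the total transform.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total
```

As in the article, convergence depends on the clouds already overlapping roughly, which is why total-station/IMU georeferencing provides the initial alignment.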

  11. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  12. 2016 TRI Preliminary Dataset

    Science.gov (United States)

    The TRI preliminary dataset includes the most current TRI data available and reflects toxic chemical releases and pollution prevention activities that occurred at TRI facilities during the 2016 calendar year.

  13. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  14. USEWOD 2016 Research Dataset

    OpenAIRE

    Luczak-Roesch, Markus; Aljaloud, Saud; Berendt, Bettina; Hollink, Laura

    2016-01-01

    The USEWOD 2016 research dataset is a collection of usage data from Web of Data sources, collected in 2015. It covers sources such as DBpedia, the Linked Data Fragments interface to DBpedia, as well as Wikidata page views. This dataset can be requested via http://library.soton.ac.uk/datarequest - please also email a scanned copy of the signed Usage Agreement (to ).

  15. Mobile TDR for geo-referenced measurement of soil water content and electrical conductivity

    DEFF Research Database (Denmark)

    Thomsen, Anton; Schelde, Kirsten; Drøscher, Per;

    2007-01-01

    The development of site-specific crop management is constrained by the availability of sensors for monitoring important soil and crop related conditions. A mobile time-domain reflectometry (TDR) unit for geo-referenced soil measurements has been developed and used for detailed mapping of soil water...... are closely related to the clay and silt fractions of a variable field. The application to early season field mapping of water content, electrical conductivity and clay content is presented. The water and clay content maps are to be used for automated delineation of field management units. Based on a spatial...

  16. Pantir - a Dual Camera Setup for Precise Georeferencing and Mosaicing of Thermal Aerial Images

    Science.gov (United States)

    Weber, I.; Jenal, A.; Kneer, C.; Bongartz, J.

    2015-03-01

    Airborne thermal infrared (TIR) cameras are used for research and monitoring in fields like hydrology and agriculture, but they suffer from low spatial resolution and low-quality lenses. Common ground control points (GCPs), lacking thermal activity and being relatively small in size, cannot be used in TIR images. Precise georeferencing and mosaicing, however, are necessary for data analysis. Adding a high-resolution visible-light (VIS) camera with a high-quality lens very close to the TIR camera, in the same stabilized rig, allows us to do accurate geoprocessing with standard GCPs after fusing both images (VIS+TIR) using standard image registration methods.

  17. Particle filtering methods for georeferencing panoramic image sequence in complex urban scenes

    Science.gov (United States)

    Ji, Shunping; Shi, Yun; Shan, Jie; Shao, Xiaowei; Shi, Zhongchao; Yuan, Xiuxiao; Yang, Peng; Wu, Wenbin; Tang, Huajun; Shibasaki, Ryosuke

    2015-07-01

    Georeferencing image sequences is critical for mobile mapping systems. Traditional methods such as bundle adjustment need adequate and well-distributed ground control points (GCPs) when accurate GPS data are not available in complex urban scenes. For applications over large areas, automatic extraction of GCPs by matching vehicle-borne image sequences with geo-referenced ortho-images is a better choice than intensive GCP collection by field surveying. However, such image-matching-generated GCPs are highly noisy, especially in complex urban street environments, due to shadows, occlusions and moving objects in the ortho-images. This study presents a probabilistic solution that integrates matching and localization under one framework. First, a probabilistic, global localization model is formulated based on Bayes' rule and a Markov chain. Unlike many conventional methods, our model can accommodate non-Gaussian observations. In the next step, a particle filtering method is applied to determine this model under highly noisy GCPs. Owing to the multiple-hypothesis tracking represented by diverse particles, the method can balance the strength of geometric and radiometric constraints, i.e., drifting motion models and noisy GCPs, and guarantee an approximately optimal trajectory. Tests were carried out with thousands of mobile panoramic images and aerial ortho-images. Compared with conventional extended Kalman filtering and a global registration method, the proposed approach can succeed even with more than 80% gross errors in the GCPs and reach an accuracy equivalent to traditional bundle adjustment with dense and precise control.
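As a toy analogue of the particle-filtering idea above, the hypothetical 1-D sketch below propagates particles with a drifty motion model, weights them against noisy GCP-like position fixes, and resamples; it is a didactic reduction, not the paper's formulation.

```python
# Toy 1-D particle filter: predict with a drifty motion model, weight by a
# Gaussian observation likelihood, resample to concentrate on good hypotheses.
import numpy as np

rng = np.random.default_rng(42)

def particle_filter(motions, observations, n=2000, motion_std=0.5, obs_std=1.0):
    """Track a 1-D position from odometry-like motions and noisy fixes."""
    particles = rng.normal(0.0, 5.0, n)                # diffuse prior
    estimates = []
    for u, z in zip(motions, observations):
        # Predict: propagate each hypothesis through the (drifty) motion model.
        particles = particles + u + rng.normal(0.0, motion_std, n)
        # Update: weight hypotheses by the observation likelihood.
        w = np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
        w /= w.sum()
        # Resample: draw a new population proportional to the weights.
        particles = rng.choice(particles, size=n, p=w)
        estimates.append(particles.mean())
    return np.array(estimates)
```

Because the weighting step accepts any likelihood, not just a Gaussian, this family of filters accommodates the non-Gaussian, outlier-laden GCP observations the record describes.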

  18. Accuracy Assessment of Point Clouds Geo-Referencing in Surveying and Documentation of Historical Complexes

    Science.gov (United States)

    Fryskowska, A.

    2017-05-01

    The Terrestrial Laser Scanning (TLS) technique is widely used for the documentation and preservation of historical sites, for example by creating three-dimensional (3-D) digital models or vectorial sketches. In consequence, complex, complete, detailed and accurate documentation of a historical structure is created, which is crucial for modern digital culture. When we acquire TLS data of one particular structure, we usually do so in the local coordinate system of the scanner. However, when measurements are conducted for a complex of several historical buildings or monuments (i.e. castle ruins, buildings along the narrow streets of old towns), the registration of point clouds into a common, global coordinate system is one of the critical steps in TLS data processing. We then have to integrate data with different accuracy levels: the inner accuracy of the local (scanner) coordinate system is usually about three times higher than that of global coordinate system measurements. The paper describes the geometric quality of direct georeferencing in post-processing, considering surveying points. Then, an analysis of factors affecting registration accuracy is proposed. Finally, an improvement of the direct georeferencing technique is presented and examined. Furthermore, registered data and chosen orientation methods are compared to each other.

  19. Develop Direct Geo-referencing System Based on Open Source Software and Hardware Platform

    Science.gov (United States)

    Liu, H. S.; Liao, H. M.

    2015-08-01

    A direct geo-referencing system uses remote sensing technology to quickly capture images, GPS tracks, and camera positions. These data allow the construction of large volumes of images with geographic coordinates, so that users can take measurements directly on the images. In order to properly calculate positioning, all the sensor signals must be synchronized. Traditional aerial photography uses a Position and Orientation System (POS) to integrate images, coordinates and camera positions. However, it is very expensive, and users cannot use the result immediately because the position information is not embedded into the image. For reasons of economy and efficiency, this study aims to develop a direct geo-referencing system based on an open-source software and hardware platform. After using an Arduino microcontroller board to integrate the signals, we can calculate positioning with the open-source software OpenCV. In the end, we use the open-source panorama browser Panini and integrate all of these into the open-source GIS software Quantum GIS. In this way, a complete data processing system can be constructed.

  20. Dual-Antenna Terrestrial Laser Scanner Georeferencing Using Auxiliary Photogrammetric Observations

    Directory of Open Access Journals (Sweden)

    Benjamin Wilkinson

    2015-09-01

    Full Text Available Terrestrial laser scanning typically requires the use of artificial targets for registration and georeferencing the data. This equipment can be burdensome to transport and set up, representing expense in both time and labor. Environmental factors such as terrain can sometimes make target placement dangerous or impossible, or lead to weak network geometry and therefore degraded product accuracy. The use of additional sensors can help reduce the required number of artificial targets and, in some cases, eliminate the need for them altogether. The research presented here extends methods for direct georeferencing of terrestrial laser scanner data using a dual GNSS antenna apparatus with additional photogrammetric observations from a scanner-mounted camera. Novel combinations of observations and processing methods were tested on data collected at two disparate sites in order to find the best method in terms of processing efficiency and product quality. In addition, a general model for the scanner and auxiliary data is given which can be used for least-squares adjustment and uncertainty estimation in similar systems with varied and diverse configurations. We found that the dual-antenna system resulted in cm-level accuracy practical for many applications and superior to conventional one-antenna systems, and that auxiliary photogrammetric observation significantly increased accuracy of the dual-antenna solution.

  1. Direct Georeferencing : a New Standard in Photogrammetry for High Accuracy Mapping

    Science.gov (United States)

    Rizaldy, A.; Firdaus, W.

    2012-07-01

    Direct georeferencing is a new method in photogrammetry, especially in the digital camera era. Theoretically, this method does not require ground control points (GCPs) or aerial triangulation (AT) to process aerial photography into ground coordinates. Compared with the old method, this method has three main advantages at the same accuracy: faster data processing, a simpler workflow and a less expensive project. Direct georeferencing uses two devices, GPS and IMU: the GPS records the camera coordinates (X, Y, Z), and the IMU records the camera orientation (omega, phi, kappa). Both are merged into the exterior orientation (EO) parameters. These parameters are required for the next steps in photogrammetric projects, such as stereocompilation, DSM generation, orthorectification and mosaicking. The accuracy of this method was tested on a topographic map project in Medan, Indonesia. The large-format digital camera UltraCam X from Vexcel was used, while the GPS/IMU was the IGI AeroControl. Nineteen independent check points (ICPs) were used to determine the accuracy. Horizontal accuracy is 0.356 meters and vertical accuracy is 0.483 meters. Data with this accuracy can be used for a 1:2,500 map scale project.
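How EO parameters feed the later photogrammetric steps can be illustrated with a schematic flat-terrain projection of an image point to the ground via the collinearity condition. Conventions for rotation order and axis signs vary between systems; the sketch below uses one common choice and is illustrative only, not the paper's code.

```python
# Schematic direct georeferencing: build R from (omega, phi, kappa), rotate
# the image-space ray, and intersect it with a flat ground plane Z = ground_z.
import numpy as np

def rotation_opk(omega, phi, kappa):
    """R = R_x(omega) @ R_y(phi) @ R_z(kappa), angles in radians."""
    co, so = np.cos(omega), np.sin(omega)
    cp, sp = np.cos(phi), np.sin(phi)
    ck, sk = np.cos(kappa), np.sin(kappa)
    Rx = np.array([[1, 0, 0], [0, co, -so], [0, so, co]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[ck, -sk, 0], [sk, ck, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def image_to_ground(xy_image, focal, camera_xyz, R, ground_z=0.0):
    """Intersect the ray through an image point with the plane Z = ground_z."""
    ray = R @ np.array([xy_image[0], xy_image[1], -focal])
    scale = (ground_z - camera_xyz[2]) / ray[2]
    return camera_xyz[:2] + scale * ray[:2]
```

For a nadir view from 1000 m with a 0.1 m focal length, an image offset of 1 mm maps to 10 m on the ground, which is why small IMU angle errors dominate the horizontal error budget.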

  2. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an accompanying document.

  3. SAMHSA Federated Datasets

    Data.gov (United States)

    Substance Abuse and Mental Health Services Administration, Department of Health and Human Services — This link provides a temporary method of accessing SAMHSA datasets that are found on the interactive portion of the Data.gov catalog. This is a temporary solution...

  4. Standardization of a geo-referenced fishing data set for the Indian Ocean bigeye tuna, Thunnus obesus (1952-2014)

    Science.gov (United States)

    Wibawa, Teja A.; Lehodey, Patrick; Senina, Inna

    2017-02-01

    Geo-referenced catch and fishing effort data of the bigeye tuna fisheries in the Indian Ocean over 1952-2014 were analyzed and standardized to facilitate population dynamics modeling studies. During this 62-year historical period of exploitation, many changes occurred both in the fishing techniques and the monitoring of activity. This study includes a series of processing steps used for standardization of spatial resolution, conversion and standardization of catch and effort units, raising of geo-referenced catch to the nominal catch level, screening and correction of outliers, and detection of major catchability changes over long time series of fishing data, i.e., the Japanese longline fleet operating in the tropical Indian Ocean. A total of 30 fisheries were finally determined from longline, purse seine and other-gears data sets, of which 10 longline and 4 purse seine fisheries represented 96% of the whole historical geo-referenced catch. Nevertheless, one-third of the total nominal catch is still not included due to a total lack of geo-referenced information and would need to be processed separately, according to the requirements of the study. The geo-referenced records of catch, fishing effort and associated length frequency samples of all fisheries are available at doi:10.1594/PANGAEA.864154 (http://dx.doi.org/10.1594/PANGAEA.864154).
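The "raising" step — scaling geo-referenced catch so that its total matches the nominal catch reported for the same fleet/year stratum — can be sketched as a simple proportional adjustment (an illustrative simplification of the processing described above):

```python
def raise_catch(geo_catch, nominal_total):
    """Scale geo-referenced catch records so their sum matches the
    nominal (total reported) catch for the same fleet/year stratum."""
    geo_total = sum(geo_catch)
    if geo_total == 0:
        raise ValueError("no geo-referenced catch to raise")
    factor = nominal_total / geo_total
    return [c * factor for c in geo_catch]

cells = [10.0, 30.0, 60.0]          # geo-referenced catch by grid cell (t)
raised = raise_catch(cells, 150.0)  # nominal catch for the stratum (t)
print(raised)  # [15.0, 45.0, 90.0]
```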

  5. Evaluation of georeferencing methods with respect to their suitability to address unsimilarity between the image to be referenced and the reference image

    Science.gov (United States)

    Brüstle, Stefan; Erdnüß, Bastian

    2016-10-01

    In recent years, operational costs of unmanned aircraft systems (UAS) have decreased massively. New sensors satisfying the weight and size restrictions of even small UAS cover many different spectral ranges and spatial resolutions, so airborne imagery has become more and more available. Such imagery is used to address many different tasks in various fields of application. For many of those tasks, not only the content of the imagery itself is of interest, but also its spatial location, which requires the imagery to be properly georeferenced. Many UAS have an integrated GPS receiver together with some kind of INS device acquiring the sensor orientation to provide the georeference. However, both GPS and INS data can easily become unavailable for a period of time during a flight, e.g. due to sensor malfunction, transmission problems or jamming. Imagery gathered during such times lacks georeference. Moreover, even in datasets not affected by such problems, GPS and INS inaccuracies, together with potentially poor knowledge of the ground elevation, can render the location information insufficiently accurate for a given task. To provide or improve the georeference of an image affected by this, image-to-reference registration can be performed if a suitable reference is available, e.g. a georeferenced orthophoto covering the area of the image to be georeferenced. Registration, and thus georeferencing, is achieved by determining a transformation between the image to be referenced and the reference which maximizes the coincidence of relevant structures present in both. Many methods have been developed to accomplish this task. Regardless of their differences, they usually perform better the more similar the image and the reference are in appearance. This contribution evaluates a selection of such methods, all differing in the type of structure they use for the assessment of coincidence, with respect to their ability to tolerate dissimilarity between the image to be referenced and the reference image.

  6. Wiki-talk Datasets

    OpenAIRE

    Sun, Jun; Kunegis, Jérôme

    2016-01-01

    User interaction networks of Wikipedia in 28 different languages. Nodes (original Wikipedia user IDs) represent users of the Wikipedia, and an edge from user A to user B denotes that user A wrote a message on the talk page of user B at a certain timestamp. More info: http://yfiua.github.io/academic/2016/02/14/wiki-talk-datasets.html
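A minimal loader for such a timestamped edge list might look as follows; the whitespace-separated `src dst timestamp` format is an assumption, and the published files may differ:

```python
from collections import defaultdict

def load_talk_edges(lines):
    """Parse 'src dst timestamp' lines into a directed multigraph
    stored as {src: [(dst, ts), ...]}."""
    graph = defaultdict(list)
    for line in lines:
        src, dst, ts = line.split()
        graph[int(src)].append((int(dst), int(ts)))
    return graph

# User 1 wrote on the talk pages of users 2 and 3; user 2 wrote to user 3
g = load_talk_edges(["1 2 1199145600", "1 3 1199232000", "2 3 1199318400"])
print(len(g[1]))  # 2
```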

  7. Vision-Based Unmanned Aerial Vehicle Navigation Using Geo-Referenced Information

    Science.gov (United States)

    Conte, Gianpaolo; Doherty, Patrick

    2009-12-01

    This paper investigates the possibility of augmenting an Unmanned Aerial Vehicle (UAV) navigation system with a passive video camera in order to cope with long-term GPS outages. The paper proposes a vision-based navigation architecture which combines inertial sensors, visual odometry, and registration of the on-board video to a geo-referenced aerial image. The vision-aided navigation system developed is capable of providing high-rate and drift-free state estimation for UAV autonomous navigation without the GPS system. Due to the use of image-to-map registration for absolute position calculation, drift-free position performance depends on the structural characteristics of the terrain. Experimental evaluation of the approach based on offline flight data is provided. In addition the architecture proposed has been implemented on-board an experimental UAV helicopter platform and tested during vision-based autonomous flights.

  8. Vision-Based Unmanned Aerial Vehicle Navigation Using Geo-Referenced Information

    Directory of Open Access Journals (Sweden)

    Gianpaolo Conte

    2009-01-01

    Full Text Available This paper investigates the possibility of augmenting an Unmanned Aerial Vehicle (UAV) navigation system with a passive video camera in order to cope with long-term GPS outages. The paper proposes a vision-based navigation architecture which combines inertial sensors, visual odometry, and registration of the on-board video to a geo-referenced aerial image. The vision-aided navigation system developed is capable of providing high-rate and drift-free state estimation for UAV autonomous navigation without the GPS system. Due to the use of image-to-map registration for absolute position calculation, drift-free position performance depends on the structural characteristics of the terrain. Experimental evaluation of the approach based on offline flight data is provided. In addition the architecture proposed has been implemented on-board an experimental UAV helicopter platform and tested during vision-based autonomous flights.

  9. The use of open data from social media for the creation of 3D georeferenced modeling

    Science.gov (United States)

    Themistocleous, Kyriacos

    2016-08-01

    There is a great deal of open-source video on the internet posted by users on social media sites. With the release of low-cost unmanned aerial vehicles (UAVs), many hobbyists are uploading videos from different locations, especially in remote areas. Using open-source data available on the internet, this study utilized structure from motion (SfM) as a range imaging technique to estimate three-dimensional landscape features from two-dimensional image sequences extracted from video, and applied image distortion correction and geo-referencing. This type of documentation may be necessary for cultural heritage sites that are inaccessible or difficult to document, where video from UAVs can be obtained. The resulting 3D models can be viewed in Google Earth and used to create orthoimages, drawings and digital terrain models for cultural heritage and archaeological purposes in remote or inaccessible areas.

  10. The influence of the in situ camera calibration for direct georeferencing of aerial imagery

    Science.gov (United States)

    Mitishita, E.; Barrios, R.; Centeno, J.

    2014-11-01

    The direct determination of exterior orientation parameters (EOPs) of aerial images via GNSS/INS technologies is an essential prerequisite in photogrammetric mapping nowadays. Although direct sensor orientation technologies provide a high degree of automation in the process due to the GNSS/INS technologies, the accuracies of the obtained results depend on the quality of a group of parameters that accurately models the conditions of the system at the moment the job is performed. One sub-group of parameters (lever arm offsets and boresight misalignments) models the position and orientation of the sensors with respect to the IMU body frame, due to the impossibility of having all sensors at the same position and orientation on the airborne platform. Another sub-group of parameters models the internal characteristics of the sensor (IOP). A system calibration procedure has been recommended by worldwide studies to obtain accurate parameters (mounting and sensor characteristics) for applications of direct sensor orientation. Commonly, mounting and sensor characteristics are not stable; they can vary under different flight conditions. The system calibration requires a geometric arrangement of the flight and/or control points to decouple correlated parameters, which is not available in a conventional photogrammetric flight. Considering this difficulty, this study investigates the feasibility of in situ camera calibration to improve the accuracy of the direct georeferencing of aerial images. The camera calibration uses a minimum image block, extracted from the conventional photogrammetric flight, and a control point arrangement. A digital Vexcel UltraCam XP camera connected to a POS AV TM system was used to acquire two photogrammetric image blocks. The blocks have different flight directions and opposite flight lines. In situ calibration procedures to compute different sets of IOPs are performed and their results are analyzed and used in photogrammetric experiments. The IOPs

  11. Public Schools

    Data.gov (United States)

    Department of Homeland Security — This Public Schools feature dataset is composed of all Public elementary and secondary education in the United States as defined by the Common Core of Data, National...

  12. Methods for Georeferencing and Spectral Scaling of Remote Imagery using ArcView, ArcGIS, and ENVI

    Science.gov (United States)

    Remote sensing images can be used to support variable-rate (VR) application of material from aircraft. Geographic coordinates must be assigned to an image (georeferenced) so that the variable-rate system can determine where in the field to apply these inputs and adjust the system when a zone has bee...

  13. Accuracy Assessment of Direct Georeferencing for Photogrammetric Applications on Small Unmanned Aerial Platforms

    Science.gov (United States)

    Mian, O.; Lutes, J.; Lipa, G.; Hutton, J. J.; Gavelle, E.; Borghini, S.

    2016-03-01

    Efficient mapping from unmanned aerial platforms cannot rely on aerial triangulation using known ground control points. The cost and time of setting ground control, added to the need for increased overlap between flight lines, severely limits the ability of small VTOL platforms, in particular, to handle mapping-grade missions of all but the very smallest survey areas. Applanix has brought its experience in manned photogrammetry applications to this challenge, setting out the requirements for increasing the efficiency of mapping operations from small UAVs, using survey-grade GNSS-Inertial technology to accomplish direct georeferencing of the platform and/or the imaging payload. The Direct Mapping Solution for Unmanned Aerial Vehicles (DMS-UAV) is a complete and ready-to-integrate OEM solution for Direct Georeferencing (DG) on unmanned aerial platforms. Designed as a solution for systems integrators to create mapping payloads for UAVs of all types and sizes, the DMS produces directly georeferenced products for any imaging payload (visual, LiDAR, infrared, multispectral imaging, even video). Additionally, DMS addresses the airframe's requirements for high-accuracy position and orientation for such tasks as precision RTK landing and Precision Orientation for Air Data Systems (ADS), Guidance and Control. This paper presents results using a DMS comprised of an Applanix APX-15 UAV with a Sony a7R camera to produce highly accurate orthorectified imagery without Ground Control Points on a Microdrones md4-1000 platform conducted by Applanix and Avyon. APX-15 UAV is a single-board, small-form-factor GNSS-Inertial system designed for use on small, lightweight platforms. The Sony a7R is a prosumer digital RGB camera sensor, with a 36MP, 4.9-micron CCD producing images at 7360 columns by 4912 rows. It was configured with a 50mm AF-S Nikkor f/1.8 lens and subsequently with a 35mm Zeiss Sonnar T* FE F2.8 lens. Both the camera/lens combinations and the APX-15 were mounted to a
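Accuracy figures like those quoted in these abstracts are typically computed as horizontal and vertical RMSE over independent check points; a generic sketch (not Applanix's actual evaluation code):

```python
import math

def check_point_rmse(observed, reference):
    """Horizontal and vertical RMSE over independent check points.
    Each point is (E, N, H); observed comes from the georeferenced
    product, reference from ground survey."""
    n = len(observed)
    sq_h = sum((o[0] - r[0]) ** 2 + (o[1] - r[1]) ** 2
               for o, r in zip(observed, reference))
    sq_v = sum((o[2] - r[2]) ** 2 for o, r in zip(observed, reference))
    return math.sqrt(sq_h / n), math.sqrt(sq_v / n)

obs = [(100.03, 200.04, 50.1), (300.0, 400.0, 60.0)]
ref = [(100.0, 200.0, 50.0), (300.0, 400.0, 60.0)]
h_rmse, v_rmse = check_point_rmse(obs, ref)
```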

  14. Pilgrims Face Recognition Dataset -- HUFRD

    OpenAIRE

    Aly, Salah A.

    2012-01-01

    In this work, we define a new pilgrims face recognition dataset, called HUFRD dataset. The new developed dataset presents various pilgrims' images taken from outside the Holy Masjid El-Harram in Makkah during the 2011-2012 Hajj and Umrah seasons. Such dataset will be used to test our developed facial recognition and detection algorithms, as well as assess in the missing and found recognition system \\cite{crowdsensing}.

  15. Accuracy analysis of direct georeferenced UAV images utilising low-cost navigation sensors

    Science.gov (United States)

    Briese, Christian; Wieser, Martin; Verhoeven, Geert; Glira, Philipp; Doneus, Michael; Pfeifer, Norbert

    2014-05-01

    Unmanned aerial vehicles (UAVs), also known as unmanned airborne systems (UAS) or remotely piloted airborne systems (RPAS), are an established platform for close-range airborne photogrammetry. Compared to manned platforms, the acquisition of local remote sensing data by UAVs is a convenient and very flexible option. For photogrammetric applications, UAVs are typically equipped with an autopilot and a lightweight digital camera. The autopilot includes several navigation sensors, which may allow an automated waypoint flight and offer systematic data acquisition of the object or scene of interest. Assuming sufficient overlap between the captured images, the position (3 coordinates: x, y, z) and the orientation (3 angles: roll, pitch, yaw) of the images can be estimated within a bundle block adjustment. Subsequently, coordinates of observed points that appear in at least two images can be determined by measuring their image coordinates, or a dense surface model can be generated from all acquired images by automated image matching. The bundle block adjustment needs approximate values of the position and the orientation of the images. Several methods exist to gather this information; in this contribution we introduce one of them: the direct georeferencing of images using the navigation sensors (mainly GNSS and INS) of a low-cost on-board autopilot. Beside automated flights, the autopilot offers the possibility to record the position and the orientation of the platform during the flight. These values do not correspond directly to those of the images. To compute the position and the orientation of the images, two requirements must be fulfilled. First, the misalignment angles and the positional differences between the camera and the autopilot must be determined (mounting calibration). Second, the synchronization between the camera and the autopilot has to be established. Due to the limited accuracy of the navigation sensors, a small number of ground
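The mounting calibration mentioned above lets one transfer the autopilot's recorded position to the camera. A hedged sketch of the lever-arm part, assuming a ZYX (yaw-pitch-roll) rotation from body to navigation frame (the actual convention depends on the autopilot):

```python
import math

def rot_zyx(roll, pitch, yaw):
    """Body-to-navigation rotation matrix, ZYX (yaw-pitch-roll) convention."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def camera_position(pos, rpy, lever):
    """Camera position = autopilot position + rotated body-frame lever arm."""
    r = rot_zyx(*rpy)
    return tuple(pos[i] + sum(r[i][j] * lever[j] for j in range(3))
                 for i in range(3))

# Level platform: the lever arm adds directly in the navigation frame
cam = camera_position((10.0, 20.0, 30.0), (0.0, 0.0, 0.0), (0.1, 0.0, -0.2))
```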

  16. Quality Visualization of Microarray Datasets Using Circos

    Directory of Open Access Journals (Sweden)

    Martin Koch

    2012-08-01

    Full Text Available Quality control and normalization is considered the most important step in the analysis of microarray data. At present there are various methods available for quality assessments of microarray datasets. However there seems to be no standard visualization routine, which also depicts individual microarray quality. Here we present a convenient method for visualizing the results of standard quality control tests using Circos plots. In these plots various quality measurements are drawn in a circular fashion, thus allowing for visualization of the quality and all outliers of each distinct array within a microarray dataset. The proposed method is intended for use with the Affymetrix Human Genome platform (i.e., GPL 96, GPL570 and GPL571). Circos quality measurement plots are a convenient way for the initial quality estimate of Affymetrix datasets that are stored in publicly available databases.

  17. Georeferenced measurement of soil EC as a tool to detect susceptible areas to water erosion.

    Science.gov (United States)

    Fabian Sallesses, Leonardo; Aparicio, Virginia Carolina; Costa, Jose Luis

    2017-04-01

    The Southeast region of Buenos Aires Province, Argentina, is one of the main regions for the cultivation of potato (Solanum tuberosum L.) in that country. The implementation of complementary irrigation for potato cultivation meant an increase in yield of up to 60%; therefore, all potato production in the region is under irrigation, and the area under center-pivot irrigation has increased by 150% in the last two decades. The water used for irrigation in the region is groundwater with a high concentration of sodium bicarbonate. The combination of irrigation and rain increases the sodium adsorption ratio of the soil (SAR), consequently raising clay dispersion and reducing infiltration. A reduction in infiltration means greater partitioning of precipitation into runoff. The degree of slope of the terrain, added to its length, increases the erosive potential of runoff water. The content of dissolved salts, in combination with the water content, affects the apparent electrical conductivity of the soil (EC), which is directly related to the concentration of Na+ in the soil solution. In August 2016, severe rill erosion was detected in a productive plot of 300 ha. The preceding crop was potato under irrigation; however, the history of the plot consists of various winter and summer crops, always grown in dry land and no-till. Cumulative rainfall from harvest to erosion detection (four months) was 250 mm. A georeferenced EC measurement was performed using the Veris 3100® contact sensor. With the data obtained, a geostatistical analysis was performed using Kriging spatial interpolation. The maps obtained were processed, dividing them into 4 EC ranges; the values and amplitude of the EC ranges for each plot were determined according to the distribution observed in the generated histograms. A distribution of elevated EC ranges, and consequently of higher Na+ concentration, was observed coincident with the irrigation areas of the pivots. These
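Dividing the interpolated EC map into four ranges, as described above, amounts to choosing class breaks from the value distribution. One common choice is a quantile classification, assumed here for illustration since the abstract does not state the break-selection method:

```python
def ec_ranges(values, n_ranges=4):
    """Classify apparent EC measurements into n_ranges classes using
    quantile breaks computed from the value distribution."""
    s = sorted(values)
    breaks = [s[int(len(s) * k / n_ranges) - 1] for k in range(1, n_ranges)]

    def classify(v):
        for i, b in enumerate(breaks):
            if v <= b:
                return i
        return n_ranges - 1

    return breaks, [classify(v) for v in values]

# Toy EC values (mS/m); a real map would classify the kriged grid
breaks, classes = ec_ranges([12, 18, 25, 31, 40, 47, 55, 62])
```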

  18. Old Persian corpus [Dataset

    NARCIS (Netherlands)

    Bavant, M.

    2011-01-01

    XML Old Persian corpus. The corpus is based on publicly available data on the Web. Those data can be traced back to the grammar of Old Persian by Kent (1950). The corpus contains those data and is arranged in a way suitable for corpus searches.

  19. The development of an UAV borne direct georeferenced photogrammetric platform for Ground Control Point free applications.

    Science.gov (United States)

    Chiang, Kai-Wei; Tsai, Meng-Lun; Chu, Chien-Hsun

    2012-01-01

    To facilitate applications such as environment detection or disaster monitoring, the development of rapid low cost systems for collecting near real time spatial information is very critical. Rapid spatial information collection has become an emerging trend for remote sensing and mapping applications. In this study, a fixed-wing Unmanned Aerial Vehicle (UAV)-based spatial information acquisition platform that can operate in Ground Control Point (GCP) free environments is developed and evaluated. The proposed UAV based photogrammetric platform has a Direct Georeferencing (DG) module that includes a low cost Micro Electro Mechanical Systems (MEMS) Inertial Navigation System (INS)/Global Positioning System (GPS) integrated system. The DG module is able to provide GPS single frequency carrier phase measurements for differential processing to obtain sufficient positioning accuracy. All necessary calibration procedures are implemented. Ultimately, a flight test is performed to verify the positioning accuracy in DG mode without using GCPs. The preliminary results of positioning accuracy in DG mode illustrate that horizontal positioning accuracies in the x and y axes are around 5 m at 300 m flight height above the ground. The positioning accuracy of the z axis is below 10 m. Therefore, the proposed platform is relatively safe and inexpensive for collecting critical spatial information for urgent response such as disaster relief and assessment applications where GCPs are not available.

  20. Georeferencing in Gnss-Challenged Environment: Integrating Uwb and Imu Technologies

    Science.gov (United States)

    Toth, C. K.; Koppanyi, Z.; Navratil, V.; Grejner-Brzezinska, D.

    2017-05-01

    Acquiring geospatial data in GNSS-compromised environments remains a problem in mapping and positioning in general. Urban canyons, heavily vegetated areas and indoor environments represent different levels of GNSS signal availability, from weak to no signal reception. Even outdoors, with multiple GNSS systems and an ever-increasing number of satellites, there are many situations with limited or no access to GNSS signals. Independent navigation sensors, such as IMUs, can provide high-data-rate information, but their initial accuracy degrades quickly as the measurement data drift over time, unless positioning fixes are provided from another source. At The Ohio State University's Satellite Positioning and Inertial Navigation (SPIN) Laboratory, as one feasible solution, Ultra-Wideband (UWB) radio units are used to aid positioning and navigation in GNSS-compromised environments, including indoor and outdoor scenarios. Here we report on experience obtained with georeferencing a pushcart-based sensor system under canopied areas. The positioning system is based on UWB and IMU sensor integration, and provides sensor platform orientation for an electromagnetic induction (EMI) sensor. Performance evaluation results are provided for various test scenarios, confirming acceptable results for applications where high accuracy is not required.
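The UWB/IMU integration concept can be caricatured in one dimension: IMU increments give a drifting dead-reckoned track, and occasional absolute UWB fixes pull it back toward truth. This complementary-filter toy (not the SPIN Lab's actual estimator) illustrates the idea:

```python
def fuse(imu_steps, uwb_fixes, alpha=0.8):
    """1-D dead reckoning from IMU displacement increments, corrected
    toward absolute UWB position fixes when available. alpha weights
    the IMU prediction; (1 - alpha) weights the UWB fix."""
    x = 0.0
    track = []
    for step, fix in zip(imu_steps, uwb_fixes):
        x += step                        # IMU prediction (drifts over time)
        if fix is not None:              # UWB absolute fix, when in range
            x = alpha * x + (1 - alpha) * fix
        track.append(x)
    return track

# Three IMU steps, with a UWB fix arriving only at the last epoch
track = fuse([1.0, 1.0, 1.0], [None, None, 3.6], alpha=0.5)
```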

  1. Obesity and fast food in urban markets: a new approach using geo-referenced micro data.

    Science.gov (United States)

    Chen, Susan Elizabeth; Florax, Raymond J; Snyder, Samantha D

    2013-07-01

    This paper presents a new method of assessing the relationship between features of the built environment and obesity, particularly in urban areas. Our empirical application combines georeferenced data on the location of fast-food restaurants with data about personal health, behavioral, and neighborhood characteristics. We define a 'local food environment' for every individual utilizing buffers around a person's home address. Individual food landscapes are potentially endogenous because of spatial sorting of the population and food outlets, and the body mass index (BMI) values for individuals living close to each other are likely to be spatially correlated because of observed and unobserved individual and neighborhood effects. The potential biases associated with endogeneity and spatial correlation are handled using spatial econometric estimation techniques. Our application provides quantitative estimates of the effect of proximity to fast-food restaurants on obesity in an urban food market. We also present estimates of a policy simulation that focuses on reducing the density of fast-food restaurants in urban areas. In the simulations, we account for spatial heterogeneity in both the policy instruments and individual neighborhoods and find a small effect for the hypothesized relationships between individual BMI values and the density of fast-food restaurants.
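The "local food environment" buffer described above reduces to counting outlets within a radius of each home address; a sketch using great-circle distance (the study's actual buffer construction and GIS tooling are not specified here):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in metres."""
    r = 6371000.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def outlets_in_buffer(home, outlets, radius_m):
    """Count fast-food outlets inside a circular buffer around a home.
    home and outlets are (lat, lon) pairs."""
    return sum(1 for o in outlets if haversine_m(*home, *o) <= radius_m)

# One outlet ~111 m away, one ~111 km away; only the first is in a 500 m buffer
n = outlets_in_buffer((0.0, 0.0), [(0.0, 0.001), (0.0, 1.0)], 500.0)
```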

  2. GEO-REFERENCED MAPPING USING AN AIRBORNE 3D TIME-OF-FLIGHT CAMERA

    Directory of Open Access Journals (Sweden)

    T. K. Kohoutek

    2012-09-01

    Full Text Available This paper presents the first experience of close-range bird's-eye-view photogrammetry with range imaging (RIM) sensors for the real-time generation of high-resolution geo-referenced 3D surface models. The aim of this study was to develop a mobile, versatile and less costly outdoor survey methodology for measuring natural surfaces, compared to terrestrial laser scanning (TLS). Two commercial RIM cameras (SR4000 by MESA Imaging AG and a CamCube 2.0 by PMDTechnologies GmbH) were mounted on a lightweight crane and on an unmanned aerial vehicle (UAV). The field experiments revealed various challenges in the real-time deployment of the two state-of-the-art RIM systems, e.g. processing of the large data volume. The acquisition strategy, data processing and first measurements are presented. The precision of the measured distances is less than 1 cm under good conditions. However, the measurement precision degraded under the test conditions due to direct sunlight, strong illumination contrasts and helicopter vibrations.

  3. THE DIRECT GEOREFERENCING APPLICATION AND PERFORMANCE ANALYSIS OF UAV HELICOPTER IN GCP-FREE AREA

    Directory of Open Access Journals (Sweden)

    C. F. Lo

    2015-08-01

    Full Text Available Many disasters have happened in recent years because of extreme weather changes, so facilitating applications such as environment detection or monitoring has become very important. Therefore, the development of rapid low cost systems for collecting near real-time spatial information is very critical. Rapid spatial information collection has become an emerging trend for remote sensing and mapping applications. This study develops a Direct Georeferencing (DG) based Unmanned Aerial Vehicle (UAV) helicopter photogrammetric platform in which an Inertial Navigation System (INS)/Global Navigation Satellite System (GNSS) integrated Positioning and Orientation System (POS) is implemented to provide the DG capability of the platform. The performance verification indicates that the proposed platform can capture aerial images successfully. A flight test was performed to verify the positioning accuracy in DG mode without using Ground Control Points (GCPs). The preliminary results illustrate that horizontal DG positioning accuracies in the x and y axes are around 5 meters at a 100 meter flight height. The positioning accuracy in the z axis is less than 10 meters. Such accuracy is good for near real-time disaster relief. The DG-ready function of the proposed platform guarantees mapping and positioning capability even in GCP-free environments, which is very important for rapid urgent response for disaster relief. Generally speaking, the data processing time for the DG module, including POS solution generation, interpolation, Exterior Orientation Parameter (EOP) generation, and feature point measurement, is less than 1 hour.

  4. Georeferenced Scanning System to Estimate the Leaf Wall Area in Tree Crops

    Directory of Open Access Journals (Sweden)

    Ignacio del-Moral-Martínez

    2015-04-01

    Full Text Available This paper presents the use of a terrestrial light detection and ranging (LiDAR) system to scan the vegetation of tree crops to estimate the so-called pixelated leaf wall area (PLWA). Scanning rows laterally and considering only the half-canopy vegetation to the line of the trunks, PLWA refers to the vertical projected area without gaps detected by LiDAR. As defined, PLWA may be different depending on the side from which the LiDAR is applied. The system is completed by a real-time kinematic global positioning system (RTK-GPS) sensor and an inertial measurement unit (IMU) sensor for positioning. At the end, a total leaf wall area (LWA) is computed and assigned to the X, Y position of each vertical scan. The final value of the area depends on the distance between two consecutive scans (or horizontal resolution), as well as the number of intercepted points within each scan, since PLWA is only computed when the laser beam detects vegetation. To verify system performance, tests were conducted related to the georeferencing task and synchronization problems between GPS time and central processing unit (CPU) time. Despite this, the overall accuracy of the system is generally acceptable. The Leaf Area Index (LAI) can then be estimated using PLWA as an explanatory variable in appropriate linear regression models.

  5. Georeferenced scanning system to estimate the leaf wall area in tree crops.

    Science.gov (United States)

    del-Moral-Martínez, Ignacio; Arnó, Jaume; Escolà, Alexandre; Sanz, Ricardo; Masip-Vilalta, Joan; Company-Messa, Joaquim; Rosell-Polo, Joan R

    2015-04-10

    This paper presents the use of a terrestrial light detection and ranging (LiDAR) system to scan the vegetation of tree crops to estimate the so-called pixelated leaf wall area (PLWA). Scanning rows laterally and considering only the half-canopy vegetation to the line of the trunks, PLWA refers to the vertical projected area without gaps detected by LiDAR. As defined, PLWA may be different depending on the side from which the LiDAR is applied. The system is completed by a real-time kinematic global positioning system (RTK-GPS) sensor and an inertial measurement unit (IMU) sensor for positioning. At the end, a total leaf wall area (LWA) is computed and assigned to the X, Y position of each vertical scan. The final value of the area depends on the distance between two consecutive scans (or horizontal resolution), as well as the number of intercepted points within each scan, since PLWA is only computed when the laser beam detects vegetation. To verify system performance, tests were conducted related to the georeferencing task and synchronization problems between GPS time and central processing unit (CPU) time. Despite this, the overall accuracy of the system is generally acceptable. The Leaf Area Index (LAI) can then be estimated using PLWA as an explanatory variable in appropriate linear regression models.
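Under the definition above, PLWA accumulates one pixel of area per vegetation-intercepted laser return, with pixel size set by the scan spacing and the within-scan point spacing. A simplified sketch of that accumulation (an assumed discretization, not the authors' code):

```python
def plwa(scans, scan_spacing, point_spacing):
    """Pixelated leaf wall area from lateral LiDAR scans.
    scans: one list of booleans per vertical scan, True where the
    laser beam intercepted vegetation. Each intercepted point
    contributes a pixel of scan_spacing * point_spacing (m^2)."""
    pixel = scan_spacing * point_spacing
    return sum(sum(hits) for hits in scans) * pixel

# Two vertical scans 0.1 m apart, points 0.05 m apart, 3 vegetation hits
area = plwa([[True, False], [True, True]], 0.1, 0.05)
```

PLWA computed this way could then serve as the explanatory variable in the LAI regression models mentioned above.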

  6. The Development of an UAV Borne Direct Georeferenced Photogrammetric Platform for Ground Control Point Free Applications

    Directory of Open Access Journals (Sweden)

    Chien-Hsun Chu

    2012-07-01

    Full Text Available To facilitate applications such as environment detection or disaster monitoring, the development of rapid low cost systems for collecting near real time spatial information is very critical. Rapid spatial information collection has become an emerging trend for remote sensing and mapping applications. In this study, a fixed-wing Unmanned Aerial Vehicle (UAV)-based spatial information acquisition platform that can operate in Ground Control Point (GCP) free environments is developed and evaluated. The proposed UAV based photogrammetric platform has a Direct Georeferencing (DG) module that includes a low cost Micro Electro Mechanical Systems (MEMS) Inertial Navigation System (INS)/Global Positioning System (GPS) integrated system. The DG module is able to provide GPS single frequency carrier phase measurements for differential processing to obtain sufficient positioning accuracy. All necessary calibration procedures are implemented. Ultimately, a flight test is performed to verify the positioning accuracy in DG mode without using GCPs. The preliminary results of positioning accuracy in DG mode illustrate that horizontal positioning accuracies in the x and y axes are around 5 m at 300 m flight height above the ground. The positioning accuracy of the z axis is below 10 m. Therefore, the proposed platform is relatively safe and inexpensive for collecting critical spatial information for urgent response such as disaster relief and assessment applications where GCPs are not available.

  7. GEOREFERENCING IN GNSS-CHALLENGED ENVIRONMENT: INTEGRATING UWB AND IMU TECHNOLOGIES

    Directory of Open Access Journals (Sweden)

    C. K. Toth

    2017-05-01

    Acquiring geospatial data in GNSS-compromised environments remains a problem in mapping and positioning in general. Urban canyons, heavily vegetated areas, and indoor environments represent different levels of GNSS signal availability, from weak to no signal reception. Even outdoors, with multiple GNSS systems and an ever-increasing number of satellites, there are many situations with limited or no access to GNSS signals. Independent navigation sensors, such as IMUs, can provide high-rate measurements, but their accuracy degrades quickly as the measurement data drift over time unless position fixes are provided from another source. At The Ohio State University’s Satellite Positioning and Inertial Navigation (SPIN) Laboratory, as one feasible solution, Ultra-Wideband (UWB) radio units are used to aid positioning and navigation in GNSS-compromised environments, including indoor and outdoor scenarios. Here we report on experience with georeferencing a pushcart-based sensor system under canopied areas. The positioning system is based on UWB and IMU sensor integration, and provides sensor platform orientation for an electromagnetic induction (EMI) sensor. Performance evaluation results are provided for various test scenarios, confirming acceptable results for applications where high accuracy is not required.

  8. Low aerial imagery - an assessment of georeferencing errors and the potential for use in environmental inventory

    Science.gov (United States)

    Smaczyński, Maciej; Medyńska-Gulij, Beata

    2017-06-01

    Unmanned aerial vehicles are increasingly being used in close-range photogrammetry. Real-time observation of the Earth's surface and the photogrammetric images obtained are used as material for surveying and environmental inventory. The following study was conducted on a small area (approximately 1 ha). In such cases, the classical method of topographic mapping is not accurate enough, while the geodetic method of topographic surveying is an overly precise measurement technique for the purpose of inventorying natural environment components. The authors of the following study have proposed using unmanned aerial vehicle technology and tying the obtained images to a control point network established with the aid of GNSS technology. Georeferencing the acquired images and using them to create a photogrammetric model of the studied area enabled the researchers to perform calculations, which yielded a total root mean square error below 9 cm. The comparison of the real lengths of the vectors connecting the control points with their lengths calculated from the photogrammetric model fully confirmed the RMSE calculated and proved the usefulness of UAV technology in observing terrain components for the purpose of environmental inventory. Such environmental components include, among others, elements of road infrastructure and green areas, but also changes in the location of moving pedestrians and vehicles, as well as other changes in the natural environment that are not registered on classical base maps or topographic maps.
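
The validation step described above, comparing control-point vector lengths measured in the field with the same lengths taken from the photogrammetric model, reduces to a root mean square error computation. A minimal sketch, with invented lengths:

```python
import math

def rmse(measured, modelled):
    """Root mean square error between paired observations."""
    return math.sqrt(
        sum((m - p) ** 2 for m, p in zip(measured, modelled)) / len(measured)
    )

field_lengths = [25.14, 40.02, 61.90]  # metres, measured between control points
model_lengths = [25.20, 39.95, 61.97]  # same vectors read off the model
print(round(rmse(field_lengths, model_lengths), 3))  # 0.067
```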

  9. An Imaging Sensor-Aided Vision Navigation Approach that Uses a Geo-Referenced Image Database

    Directory of Open Access Journals (Sweden)

    Yan Li

    2016-01-01

    Vision navigation determines position and attitude via real-time processing of data collected from imaging sensors, without requiring a high-performance global positioning system (GPS) or an inertial measurement unit (IMU). Vision navigation is widely used in indoor navigation, far-space navigation, and multiple-sensor-integrated mobile mapping. This paper proposes a novel vision navigation approach that uses imaging sensors together with a high-accuracy geo-referenced image database (GRID) for high-precision navigation of multi-sensor platforms in environments with poor GPS coverage. First, the framework of GRID-aided vision navigation is developed with sequence images from land-based mobile mapping systems that integrate multiple sensors. Second, a highly efficient GRID storage management model is established based on the linear index of a road segment for fast image search and retrieval. Third, a robust image matching algorithm is presented to search the GRID and match a real-time image against it. The image matched to the real-time scene is then used to calculate the 3D navigation parameters of the multi-sensor platform. Experimental results show that the proposed approach retrieves images efficiently and achieves navigation accuracies of 1.2 m in the horizontal plane and 1.8 m in height during a 5 min GPS outage within 1500 m of travel.
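
The "linear index of a road segment" mentioned above can be read as a one-dimensional spatial index: images are keyed by their distance (chainage) along the segment, so candidates near a rough position estimate are found by binary search rather than a full scan. A hypothetical sketch (the class and image names are ours, not the paper's):

```python
import bisect

class RoadSegmentIndex:
    """Images on one road segment, indexed by chainage in metres."""

    def __init__(self, images):
        # images: iterable of (chainage_m, image_id) pairs
        self.images = sorted(images)
        self.keys = [c for c, _ in self.images]

    def candidates(self, chainage, window=30.0):
        """Image ids within +/- window metres of the estimated chainage."""
        lo = bisect.bisect_left(self.keys, chainage - window)
        hi = bisect.bisect_right(self.keys, chainage + window)
        return [img for _, img in self.images[lo:hi]]

idx = RoadSegmentIndex([(0, "img0"), (25, "img1"), (55, "img2"), (90, "img3")])
print(idx.candidates(50.0))  # ['img1', 'img2']
```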

  10. Seamless Positioning and Navigation by Using Geo-Referenced Images and Multi-Sensor Data

    Directory of Open Access Journals (Sweden)

    Tao Li

    2013-07-01

    Ubiquitous positioning is considered a highly demanding application for today’s Location-Based Services (LBS). While satellite-based navigation has achieved great advances in the past few decades, positioning and navigation in indoor scenarios and deep urban areas have remained a challenging topic of substantial research interest. Various strategies have been adopted to fill this gap, among which vision-based methods have attracted growing attention due to the widespread use of cameras on mobile devices. However, current vision-based methods using image processing have yet to reveal their full potential for navigation applications and are insufficient in many aspects. Therefore, in this paper we present a hybrid image-based positioning system intended to provide a seamless position solution in six degrees of freedom (6DoF) for location-based services in both outdoor and indoor environments. It mainly matches visual sensor input against geo-referenced images for image-based positioning, and also takes advantage of multiple onboard sensors, including the built-in GPS receiver and digital compass, to assist the visual methods. Experiments demonstrate that such a system can greatly improve position accuracy in areas where the GPS signal is negatively affected (such as urban canyons), and that it also provides excellent position accuracy in indoor environments.

  11. Seamless positioning and navigation by using geo-referenced images and multi-sensor data.

    Science.gov (United States)

    Li, Xun; Wang, Jinling; Li, Tao

    2013-07-12

    Ubiquitous positioning is considered a highly demanding application for today's Location-Based Services (LBS). While satellite-based navigation has achieved great advances in the past few decades, positioning and navigation in indoor scenarios and deep urban areas have remained a challenging topic of substantial research interest. Various strategies have been adopted to fill this gap, among which vision-based methods have attracted growing attention due to the widespread use of cameras on mobile devices. However, current vision-based methods using image processing have yet to reveal their full potential for navigation applications and are insufficient in many aspects. Therefore, in this paper we present a hybrid image-based positioning system intended to provide a seamless position solution in six degrees of freedom (6DoF) for location-based services in both outdoor and indoor environments. It mainly matches visual sensor input against geo-referenced images for image-based positioning, and also takes advantage of multiple onboard sensors, including the built-in GPS receiver and digital compass, to assist the visual methods. Experiments demonstrate that such a system can greatly improve position accuracy in areas where the GPS signal is negatively affected (such as in urban canyons), and that it also provides excellent position accuracy in indoor environments.

  12. An Imaging Sensor-Aided Vision Navigation Approach that Uses a Geo-Referenced Image Database.

    Science.gov (United States)

    Li, Yan; Hu, Qingwu; Wu, Meng; Gao, Yang

    2016-01-28

    Vision navigation determines position and attitude via real-time processing of data collected from imaging sensors, without requiring a high-performance global positioning system (GPS) or an inertial measurement unit (IMU). Vision navigation is widely used in indoor navigation, far-space navigation, and multiple-sensor-integrated mobile mapping. This paper proposes a novel vision navigation approach that uses imaging sensors together with a high-accuracy geo-referenced image database (GRID) for high-precision navigation of multi-sensor platforms in environments with poor GPS coverage. First, the framework of GRID-aided vision navigation is developed with sequence images from land-based mobile mapping systems that integrate multiple sensors. Second, a highly efficient GRID storage management model is established based on the linear index of a road segment for fast image search and retrieval. Third, a robust image matching algorithm is presented to search the GRID and match a real-time image against it. The image matched to the real-time scene is then used to calculate the 3D navigation parameters of the multi-sensor platform. Experimental results show that the proposed approach retrieves images efficiently and achieves navigation accuracies of 1.2 m in the horizontal plane and 1.8 m in height during a 5 min GPS outage within 1500 m of travel.

  13. The applications of geo-referenced data visualization technologies for GIS

    Science.gov (United States)

    Liu, Jie; Wang, Jiechen; Zhou, Yuji

    2007-06-01

    Geo-referenced data visualization is one of the most important components of geographic information systems. Over the past several years, geospatial data have grown far more in size and complexity than ever before, and researchers have done a great deal of work to visualize these diverse geospatial data by taking advantage of computer graphics, which helps convey information, amplifies cognition, and makes a more powerful, participatory exploration and discovery experience possible. This paper discusses related work on visualization for GIS. The first chapter is an introduction presenting an overview. In the second chapter, we discuss the geo-virtual environment, which is closely related to the virtual reality concept, focusing on the representation of urban models, terrain rendering algorithms, and the problems we currently face. In the third part, we discuss two young but promising fields, scientific visualization and information visualization; the brief history and the research issues of these two disciplines are the main topic. Finally, we offer an outlook on future work in human-computer interaction and hardware acceleration.

  14. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities in the first part of 2013 have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but, most importantly, for the final publications on the 8 TeV Run 1 data. The reprocessing also involves the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  15. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  16. Assessing the Accuracy of Georeferenced Point Clouds Produced via Multi-View Stereopsis from Unmanned Aerial Vehicle (UAV Imagery

    Directory of Open Access Journals (Sweden)

    Arko Lucieer

    2012-05-01

    Sensor miniaturisation, improved battery technology and the availability of low-cost yet advanced Unmanned Aerial Vehicles (UAVs) have provided new opportunities for environmental remote sensing. The UAV provides a platform for close-range aerial photography. Detailed imagery captured from a micro-UAV can produce dense point clouds using multi-view stereopsis (MVS) techniques combining photogrammetry and computer vision. This study applies MVS techniques to imagery acquired from a multi-rotor micro-UAV of a natural coastal site in southeastern Tasmania, Australia. A very dense point cloud (<1–3 cm point spacing) is produced in an arbitrary coordinate system using full-resolution imagery, whereas other studies usually downsample the original imagery. The point cloud is sparse in areas of complex vegetation and where surfaces have a homogeneous texture. Ground control points collected with a Differential Global Positioning System (DGPS) are identified and used for georeferencing via a Helmert transformation. This study compared georeferenced point clouds to a Total Station survey in order to assess and quantify their geometric accuracy. The results indicate that a georeferenced point cloud accurate to 25–40 mm can be obtained from imagery acquired from 50 m. UAV-based image capture provides the spatial and temporal resolution required to map and monitor natural landscapes. This paper assesses the accuracy of the generated point clouds based on field survey points. Based on our key findings we conclude that sub-decimetre terrain change (in this case coastal erosion) can be monitored.
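
The georeferencing step named above, a Helmert (seven-parameter similarity) transformation fitted to matched ground control points, has a closed-form least-squares solution. The sketch below uses the Umeyama formulation; it is a generic implementation, not the software the study used, and the test points are synthetic.

```python
import numpy as np

def helmert(src, dst):
    """Fit scale s, rotation R, translation t with dst ~= s * src @ R.T + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A / len(src))   # cross-covariance SVD
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:   # guard against a reflection
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_src = (A ** 2).sum() / len(src)            # source-frame variance
    s = (S * np.diag(D)).sum() / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t

# synthetic check: rotate 90 degrees about z, scale by 2, shift
rng = np.random.default_rng(42)
src = rng.random((6, 3))                           # model-frame "GCPs"
R_true = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
dst = 2.0 * src @ R_true.T + np.array([300.0, 5200.0, 12.0])
s, R, t = helmert(src, dst)
print(round(s, 6))  # 2.0
```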

  17. Integrated GNSS attitude determination and positioning for direct geo-referencing.

    Science.gov (United States)

    Nadarajah, Nandakumaran; Paffenholz, Jens-André; Teunissen, Peter J G

    2014-07-17

    Direct geo-referencing is an efficient methodology for the fast acquisition of 3D spatial data. It requires the fusion of spatial data acquisition sensors with navigation sensors, such as Global Navigation Satellite System (GNSS) receivers. In this contribution, we consider an integrated GNSS navigation system to provide estimates of the position and attitude (orientation) of a 3D laser scanner. The proposed multi-sensor system (MSS) consists of multiple GNSS antennas rigidly mounted on the frame of a rotating laser scanner and a reference GNSS station with known coordinates. Precise GNSS navigation requires the resolution of the carrier phase ambiguities. The proposed method uses the multivariate constrained integer least-squares (MC-LAMBDA) method for the estimation of rotating frame ambiguities and attitude angles. MC-LAMBDA makes use of the known antenna geometry to strengthen the underlying attitude model and, hence, to enhance the reliability of rotating frame ambiguity resolution and attitude determination. The reliable estimation of rotating frame ambiguities is consequently utilized to enhance the relative positioning of the rotating frame with respect to the reference station. This integrated (array-aided) method improves ambiguity resolution, as well as positioning accuracy between the rotating frame and the reference station. Numerical analyses of GNSS data from a real-data campaign confirm the improved performance of the proposed method over the existing method. In particular, the integrated method yields reliable ambiguity resolution and reduces position standard deviation by a factor of about 0.8, matching the theoretical gain of √(3/4) for two antennas on the rotating frame and a single antenna at the reference station.
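
The quoted factor of about 0.8 can be checked with elementary error propagation under a simplifying assumption of ours: every antenna contributes independent noise of variance σ². A baseline to a single rover antenna then carries 2σ², while averaging two rover antennas leaves σ² + σ²/2 = 3σ²/2, so the standard-deviation ratio is √(3/4):

```python
import math

sigma2 = 1.0                       # per-antenna noise variance (arbitrary units)
single = sigma2 + sigma2           # reference antenna + one rover antenna
averaged = sigma2 + sigma2 / 2.0   # reference antenna + mean of two rover antennas

gain = math.sqrt(averaged / single)
print(round(gain, 3))  # 0.866, i.e. "a factor of about 0.8"
```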

  18. Preparedness for the Rio 2016 Olympic Games: hospital treatment capacity in georeferenced areas

    Directory of Open Access Journals (Sweden)

    Carolina Figueiredo Freitas

    2016-01-01

    Recently, Brazil has hosted mass events of recognized international relevance. The 2014 FIFA World Cup was held in 12 Brazilian state capitals, and health sector preparedness drew on the history of other World Cups and Brazil's own experience with the 2013 FIFA Confederations Cup. The current article aims to analyze the treatment capacity of hospital facilities in georeferenced areas around the sports venues of the 2016 Olympic Games in the city of Rio de Janeiro, based on a model built on references from the literature. Data sources were Brazilian health databases and the Rio 2016 website. Sports venues for the Olympic Games and surrounding hospitals within a 10 km radius were located by geoprocessing and assigned a "health area" referring to the probable inflow of persons to be treated in case of hospital referral. Six different factors were used to calculate surge needs and one was used to calculate needs in case of disasters (20/1,000). Hospital treatment capacity is defined by the joint availability of beds and life-support equipment, namely the number of cardiac monitors (electrocardiographs) and ventilators in each hospital unit. Maracanã, followed by the Olympic Stadium (Engenhão) and the Sambódromo, would have the highest single demand for hospitalizations (1,572, 1,200 and 600, respectively). Hospital treatment capacity proved capable of accommodating surges, but insufficient in cases of mass casualties. Since most treatment at mass events involves straightforward clinical management, the current capacity is not expected to have negative consequences for participants.
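
The disaster factor of 20 referrals per 1,000 attendees reproduces the quoted demands exactly when applied to venue capacities; the capacities below are back-calculated from the abstract's numbers (1,572 implies about 78,600 attendees for Maracanã) and are illustrative only.

```python
DISASTER_RATE = 20 / 1000  # probable hospital referrals per attendee

venues = {
    "Maracanã": 78_600,
    "Olympic Stadium (Engenhão)": 60_000,
    "Sambódromo": 30_000,
}

for name, capacity in venues.items():
    print(name, round(capacity * DISASTER_RATE))
# Maracanã 1572
# Olympic Stadium (Engenhão) 1200
# Sambódromo 600
```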

  19. Integrated GNSS Attitude Determination and Positioning for Direct Geo-Referencing

    Directory of Open Access Journals (Sweden)

    Nandakumaran Nadarajah

    2014-07-01

    Direct geo-referencing is an efficient methodology for the fast acquisition of 3D spatial data. It requires the fusion of spatial data acquisition sensors with navigation sensors, such as Global Navigation Satellite System (GNSS) receivers. In this contribution, we consider an integrated GNSS navigation system to provide estimates of the position and attitude (orientation) of a 3D laser scanner. The proposed multi-sensor system (MSS) consists of multiple GNSS antennas rigidly mounted on the frame of a rotating laser scanner and a reference GNSS station with known coordinates. Precise GNSS navigation requires the resolution of the carrier phase ambiguities. The proposed method uses the multivariate constrained integer least-squares (MC-LAMBDA) method for the estimation of rotating frame ambiguities and attitude angles. MC-LAMBDA makes use of the known antenna geometry to strengthen the underlying attitude model and, hence, to enhance the reliability of rotating frame ambiguity resolution and attitude determination. The reliable estimation of rotating frame ambiguities is consequently utilized to enhance the relative positioning of the rotating frame with respect to the reference station. This integrated (array-aided) method improves ambiguity resolution, as well as positioning accuracy between the rotating frame and the reference station. Numerical analyses of GNSS data from a real-data campaign confirm the improved performance of the proposed method over the existing method. In particular, the integrated method yields reliable ambiguity resolution and reduces position standard deviation by a factor of about 0.8, matching the theoretical gain of √(3/4) for two antennas on the rotating frame and a single antenna at the reference station.

  20. Geo-referenced modelling of metal concentrations in river basins at the catchment scale

    Science.gov (United States)

    Hüffmeyer, N.; Berlekamp, J.; Klasmeier, J.

    2009-04-01

    1. Introduction. The European Water Framework Directive demands a good ecological and chemical state of surface waters [1]. This implies the reduction of unwanted metal concentrations in surface waters. To define reasonable environmental target values and to develop promising mitigation strategies, a detailed exposure assessment is required. This includes the identification of emission sources and the evaluation of their effect on local and regional surface water concentrations. Point-source emissions via municipal or industrial wastewater, which collect metal loads from a wide variety of applications and products, are important anthropogenic pathways into receiving waters. Natural background and historical influences from ore-mining activities may be another important factor. Non-point emissions occur via surface runoff and erosion from drained land areas. Besides deposition, metals can be introduced by fertilizer application or the use of metal products such as wires or metal fences. Surface water concentrations vary according to the emission strength of sources located nearby and upstream of the considered location. A direct link between specific emission sources and pathways on the one hand and observed concentrations on the other can hardly be established by monitoring alone. Geo-referenced models such as GREAT-ER (Geo-referenced Regional Exposure Assessment Tool for European Rivers) deliver spatially resolved concentrations in a whole river basin and allow for evaluating the causal relationship between specific emissions and resulting concentrations. This study summarizes the results of investigations for the metals zinc and copper in three German catchments. 2. The model GREAT-ER. The geo-referenced model GREAT-ER has originally been developed to simulate and assess the chemical burden of European river systems from multiple emission sources [2]. Emission loads from private households and rainwater runoff are individually estimated based on average consumption figures, runoff rates
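
A geo-referenced model of this kind is, at its core, a mass balance routed down the river network: loads from point sources accumulate along consecutive segments and are divided by the segment flow to give a spatially resolved concentration. The toy sketch below illustrates that idea only; it is not the GREAT-ER implementation, and all loads and flows are invented.

```python
# river segments in downstream order: (name, local emission g/day, flow m3/day)
segments = [
    ("headwater",       0.0,    50_000.0),
    ("below WWTP",      400.0,  60_000.0),   # municipal point source enters here
    ("below tributary", 100.0, 120_000.0),   # extra flow dilutes the total load
]

load = 0.0
profile = {}
for name, emission, flow in segments:
    load += emission                      # conservative transport: no decay, no loss
    profile[name] = load / flow * 1000.0  # g/m3 equals mg/L, so x1000 gives ug/L

for name, conc in profile.items():
    print(f"{name}: {conc:.1f} ug/L")
```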

  1. FLUXNET2015 Dataset: Batteries included

    Science.gov (United States)

    Pastorello, G.; Papale, D.; Agarwal, D.; Trotta, C.; Chu, H.; Canfora, E.; Torn, M. S.; Baldocchi, D. D.

    2016-12-01

    The synthesis datasets have become one of the signature products of the FLUXNET global network. They are composed from contributions of individual site teams to regional networks, which are then compiled into uniform data products, now used in a wide variety of research efforts: from plant-scale microbiology to global-scale climate change. The FLUXNET Marconi Dataset in 2000 was the first in the series, followed by the FLUXNET LaThuile Dataset in 2007, with significant additions of data products and coverage, solidifying the adoption of the datasets as a research tool. The FLUXNET2015 Dataset brings another round of substantial improvements, including extended quality control processes and checks, use of downscaled reanalysis data for filling long gaps in micrometeorological variables, multiple methods for USTAR threshold estimation and flux partitioning, and uncertainty estimates, all accompanied by auxiliary flags. This "batteries included" approach provides a lot of information for someone who wants to explore the data (and the processing methods) in detail. This inevitably leads to a large number of data variables. Although dealing with all these variables might seem overwhelming at first, especially to someone looking at eddy covariance data for the first time, there is method to our madness. In this work we describe the data products and variables that are part of the FLUXNET2015 Dataset, and the rationale behind the organization of the dataset, covering the simplified version (labeled SUBSET), the complete version (labeled FULLSET), and the auxiliary products in the dataset.

  2. Georeferencing of museum collections: A review of problems and automated tools, and the methodology developed by the Mountain and Plains Spatio-Temporal Database-Informatics Initiative (Mapstedi)

    OpenAIRE

    Murphy, Paul C.; Guralnick, Robert P.; Glaubitz, Robert; Neufeld, David; Ryan, J. Allen

    2004-01-01

    The vast majority of locality descriptions associated with biological specimens housed in natural history museums lack the geographic coordinates required for computer-based geographic analyses. Assigning such coordinates to existing specimen records is a process called retrospective georeferencing. The georeferencing of biological collections makes those collections more valuable by allowing them to be used in spatially explicit biodiversity analyses. Here we review some of the most common p...
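
Retrospective georeferencing of a locality string such as "5 km N of Boulder" typically means a gazetteer lookup plus an offset along the stated bearing. A minimal sketch of that idea, with an invented one-entry gazetteer and a flat-Earth degree conversion that is only reasonable for short offsets:

```python
import math

GAZETTEER = {"Boulder": (40.015, -105.271)}      # illustrative coordinates
BEARINGS = {"N": 0.0, "E": 90.0, "S": 180.0, "W": 270.0}
KM_PER_DEG_LAT = 111.32

def georeference(place, dist_km, direction):
    """Resolve 'dist_km direction of place' to an approximate (lat, lon)."""
    lat, lon = GAZETTEER[place]
    theta = math.radians(BEARINGS[direction])
    dlat = dist_km * math.cos(theta) / KM_PER_DEG_LAT
    dlon = dist_km * math.sin(theta) / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

lat, lon = georeference("Boulder", 5, "N")
print(round(lat, 3), round(lon, 3))  # 40.06 -105.271
```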

  3. An Accuracy Assessment of Georeferenced Point Clouds Produced via Multi-View Stereo Techniques Applied to Imagery Acquired via Unmanned Aerial Vehicle

    Science.gov (United States)

    Harwin, S.; Lucieer, A.

    2012-08-01

    Low-cost Unmanned Aerial Vehicles (UAVs) are becoming viable environmental remote sensing tools. Sensor and battery technology is expanding the data capture opportunities. The UAV, as a close-range remote sensing platform, can capture high-resolution photography on demand. This imagery can be used to produce dense point clouds using multi-view stereopsis (MVS) techniques combining computer vision and photogrammetry. This study examines point clouds produced using MVS techniques applied to UAV and terrestrial photography. A multi-rotor micro-UAV acquired aerial imagery from an altitude of approximately 30-40 m. The point clouds produced are extremely dense. In the study area, a 70 m section of sheltered coastline in southeast Tasmania, areas with little surface texture were not well captured; similarly, areas with complex geometry such as grass tussocks and woody scrub were not well mapped. The process fails to penetrate vegetation, but extracts very detailed terrain in unvegetated areas. Initially the point clouds are in an arbitrary coordinate system and need to be georeferenced. A Helmert transformation is applied based on matching ground control points (GCPs) identified in the point clouds to GCPs surveyed with differential GPS. These point clouds can be used, alongside laser scanning and more traditional techniques, to provide very detailed and precise representations of a range of landscapes at key moments. There are many potential applications for the UAV-MVS technique, including coastal erosion and accretion monitoring, mine surveying and other environmental monitoring applications. For the generated point clouds to be used in spatial applications they need to be converted to surface models that reduce dataset size without losing too much detail. Triangulated meshes are one option; another is Poisson Surface Reconstruction. The latter option makes use of point normal data and produces a surface representation at greater detail than previously obtainable. This

  4. Development of a georeferenced data bank of radionuclides in typical food of Latin America - SIGLARA; Desenvolvimento de um banco de dados georeferenciado de radionuclideos em alimentos tipicos na America Latina - SIGLARA

    Energy Technology Data Exchange (ETDEWEB)

    Nascimento, Lucia Maria Evangelista do

    2014-07-01

    Management of information related to environmental assessment activities aims to provide the world community with better access to meaningful environmental information and to help use this information in decision-making in cases of contamination due to accidents or deliberate actions. In recent years, geotechnologies have become fundamental to environmental research and monitoring, since they make it possible to obtain large amounts of natural-resource data efficiently. The result of this work was the development of a database system to store georeferenced values of radionuclides in typical foods in Latin America (SIGLARA), defined in three languages (Spanish, Portuguese and English), using free software. The developed system meets a primary need of the RLA 09/72 ARCAL Project, funded by the International Atomic Energy Agency (IAEA), which has eleven participating countries in Latin America. The georeferenced database created for the SIGLARA system was tested for applicability through the entry and manipulation of real analyzed data, which showed that the system is able to store and retrieve data and display reports and maps of the registered food samples. The interfaces connecting the user with the database proved efficient, making the system easy to operate. Its application to environmental management is already showing results; it is hoped that these results will encourage its widespread adoption by other countries, institutions, the scientific community and the general public. (author)

  5. Genomic Datasets for Cancer Research

    Science.gov (United States)

    A variety of datasets from genome-wide association studies of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays, are available to approved investigators through the Extramural National Cancer Institute Data Access Committee.

  6. Atlantic Offshore Seabird Dataset Catalog

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — Several bureaus within the Department of Interior compiled available information from seabird observation datasets from the Atlantic Outer Continental Shelf into a...

  7. QUALITY ASSESSMENT OF COMBINED IMU/GNSS DATA FOR DIRECT GEOREFERENCING IN THE CONTEXT OF UAV-BASED MAPPING

    Directory of Open Access Journals (Sweden)

    C. Stöcker

    2017-08-01

    Within the past years, the development of high-quality Inertial Measurement Units (IMUs) and GNSS technology and of dedicated RTK (Real Time Kinematic) and PPK (Post-Processing Kinematic) solutions for UAVs promises accurate measurements of the exterior orientation (EO) parameters, which allow the images to be georeferenced. Whereas the positive impact of known precise GNSS coordinates of camera positions is already well studied, the influence of the angular observations has not been studied in depth so far. Challenges include the accuracy of GNSS/IMU observations, excessive angular motion and time synchronization problems during the flight. Thus, this study assesses the final geometric accuracy using direct georeferencing with high-quality post-processed IMU/GNSS data and PPK corrections. A comparison of different data processing scenarios, including indirect georeferencing, integrated solutions and direct georeferencing, provides guidance on the workability of UAV mapping approaches that require a high level of positional accuracy. The results show that the use of the post-processed APX-15 GNSS and IMU data was particularly beneficial for enhancing the image orientation quality: horizontal accuracies at the pixel level (2.8 cm) could be achieved. However, it was also shown that the angular EO parameters are still too inaccurate to be assigned a high weight during the image orientation process. Furthermore, detailed investigations of the EO parameters reveal that systematic sensor misalignments and offsets of the image block can be reduced by the introduction of four GCPs. In this regard, the use of PPK corrections reduces the time-consuming field work of measuring large numbers of GCPs and makes large-scale UAV mapping a more feasible solution for practitioners who require high geometric accuracies.

  8. Wild Type and PPAR KO Dataset

    Science.gov (United States)

    Data set 1 consists of the experimental data for the Wild Type and PPAR KO animal study and includes data used to prepare Figures 1-4 and Table 1 of the Das et al, 2016 paper. This dataset is associated with the following publication: Das, K., C. Wood, M. Lin, A.A. Starkov, C. Lau, K.B. Wallace, C. Corton, and B. Abbott. Perfluoroalkyl acids-induced liver steatosis: Effects on genes controlling lipid homeostasis. TOXICOLOGY. Elsevier Science Ltd, New York, NY, USA, 378: 32-52, (2017).

  9. Geoseq: a tool for dissecting deep-sequencing datasets

    OpenAIRE

    Homann Robert; George Ajish; Levovitz Chaya; Shah Hardik; Cancio Anthony; Gurtowski James; Sachidanandam Ravi

    2010-01-01

    Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories, such as the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty of locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments...

  10. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. The NGSI program carries out activities not only across multiple disciplines but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no integrated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the…

  11. Sensor Fusion of a Mobile Device to Control and Acquire Videos or Images of Coffee Branches and for Georeferencing Trees

    Directory of Open Access Journals (Sweden)

    Paula Jimena Ramos Giraldo

    2017-04-01

    Full Text Available Smartphones show potential for controlling and monitoring variables in agriculture. Their processing capacity, instrumentation, connectivity, low cost, and accessibility allow farmers (among other users in rural areas) to operate them easily with applications adjusted to their specific needs. In this investigation, the integration of inertial sensors, a GPS, and a camera is presented for the monitoring of a coffee crop. An Android-based application was developed with two operating modes: (i) Navigation: for georeferencing trees, which can be as close as 0.5 m to each other; and (ii) Acquisition: control of video acquisition based on the movement of the mobile device over a branch, and measurement of image quality, using clarity indexes to select the most appropriate frames for use in future processes. The integration of inertial sensors in navigation mode shows a mean relative error of ±0.15 m and a total error of ±5.15 m. In acquisition mode, the system correctly identifies the beginning and end of mobile phone movement in 99% of cases, and image quality is determined by means of a sharpness factor that measures blurriness. With the developed system, it will be possible to obtain georeferenced information about coffee trees, such as their production, nutritional state, and presence of pests or diseases.
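
    The record does not specify which clarity index the application uses, so the sketch below illustrates frame selection with a common blur proxy: the variance of a 3×3 Laplacian response (blurry frames have little high-frequency energy, hence low variance). The function names and the synthetic 8×8 test images are invented for illustration.

```python
def laplacian_variance(img):
    """img: 2D list of grayscale values; returns the variance of the
    3x3 Laplacian response over the image interior (a blur proxy)."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y-1][x] + img[y+1][x] + img[y][x-1] + img[y][x+1]
                   - 4 * img[y][x])
            responses.append(lap)
    n = len(responses)
    mean = sum(responses) / n
    return sum((r - mean) ** 2 for r in responses) / n

def pick_sharpest(frames):
    """Return the index of the frame with the highest sharpness score."""
    scores = [laplacian_variance(f) for f in frames]
    return max(range(len(frames)), key=scores.__getitem__)

# Synthetic frames: a high-contrast checkerboard scores far higher
# than a flat (fully blurred) frame.
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat = [[128 for _ in range(8)] for _ in range(8)]
```

    Ranking candidate frames by such a score and keeping the top-scoring ones is one plausible way to realize the "most appropriate frames" selection the abstract describes.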

  12. Detecting bimodality in astronomical datasets

    Science.gov (United States)

    Ashman, Keith A.; Bird, Christina M.; Zepf, Stephen E.

    1994-01-01

    We discuss statistical techniques for detecting and quantifying bimodality in astronomical datasets. We concentrate on the KMM algorithm, which estimates the statistical significance of bimodality in such datasets and objectively partitions data into subpopulations. By simulating bimodal distributions with a range of properties we investigate the sensitivity of KMM to datasets with varying characteristics. Our results facilitate the planning of optimal observing strategies for systems where bimodality is suspected. Mixture-modeling algorithms similar to the KMM algorithm have been used in previous studies to partition the stellar population of the Milky Way into subsystems. We illustrate the broad applicability of KMM by analyzing published data on globular cluster metallicity distributions, velocity distributions of galaxies in clusters, and burst durations of gamma-ray sources. FORTRAN code for the KMM algorithm and directions for its use are available from the authors upon request.
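
    The mixture-modeling idea behind KMM can be illustrated with a minimal two-component 1D Gaussian fit by expectation-maximization (EM), comparing its log-likelihood with a single-Gaussian fit. This is a sketch only, not the authors' FORTRAN KMM code (which additionally provides significance estimates and a homoscedastic option); the synthetic sample is invented.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def loglik_single(xs):
    """Log-likelihood of the best-fit single Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return sum(math.log(normal_pdf(x, mu, var)) for x in xs)

def fit_two_gaussians(xs, iters=200):
    """EM for a two-component mixture; returns (w, mu1, mu2, loglik)."""
    n = len(xs)
    m = sum(xs) / n
    var1 = var2 = sum((x - m) ** 2 for x in xs) / n
    mu1, mu2, w = min(xs), max(xs), 0.5
    for _ in range(iters):
        # E-step: responsibility of component 1 for each point
        resp = [w * normal_pdf(x, mu1, var1) /
                (w * normal_pdf(x, mu1, var1) + (1 - w) * normal_pdf(x, mu2, var2))
                for x in xs]
        # M-step: re-estimate weight, means and variances
        n1 = sum(resp)
        w = n1 / n
        mu1 = sum(r * x for r, x in zip(resp, xs)) / n1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / (n - n1)
        var1 = max(1e-9, sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / n1)
        var2 = max(1e-9, sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / (n - n1))
    ll = sum(math.log(w * normal_pdf(x, mu1, var1) +
                      (1 - w) * normal_pdf(x, mu2, var2)) for x in xs)
    return w, mu1, mu2, ll

# A clearly bimodal synthetic sample with two well-separated modes.
random.seed(42)
sample = ([random.gauss(0, 1) for _ in range(150)] +
          [random.gauss(8, 1) for _ in range(150)])
```

    For this sample the two-component log-likelihood exceeds the single-Gaussian one by a wide margin; KMM turns exactly this kind of improvement into a statistical significance estimate for bimodality.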

  13. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  14. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  15. Statewide Datasets for Idaho StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset consists of a workspace (folder) containing four gridded datasets and a personal geodatabase. The gridded datasets are a grid of mean annual...

  16. Statewide datasets for Hawaii StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset consists of a workspace (folder) containing 41 gridded datasets and a personal geodatabase. The gridded datasets consist of 28 precipitation-frequency...

  17. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. A further three papers known to the authors are under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in the reliability and validity of these datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques and the blending methods used to combine satellite- and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded…
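
    The inter-dataset spread the authors describe can be quantified very simply, e.g. as the per-cell mean, standard deviation and range across products at a given space-time coordinate. The tiny 2×2 grids and rainfall values (mm/month) below are invented for illustration and stand in for gridded products like CRU, GPCP or TRMM.

```python
import statistics

# Three hypothetical gridded rainfall products on the same 2x2 grid.
products = {
    "A": [[80.0, 95.0], [60.0, 110.0]],
    "B": [[70.0, 100.0], [55.0, 130.0]],
    "C": [[90.0, 90.0], [65.0, 150.0]],
}

def cell_spread(products, i, j):
    """Summarize agreement among products at grid cell (i, j)."""
    vals = [grid[i][j] for grid in products.values()]
    return {
        "mean": statistics.mean(vals),
        "std": statistics.pstdev(vals),
        "range": max(vals) - min(vals),
    }
```

    Mapping such a spread statistic over the whole domain immediately shows where the choice of reference dataset matters most for model evaluation: here, cell (1, 1) disagrees by 40 mm/month across the three products.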

  18. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7…

  19. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  20. Extension of research data repository system to support direct compute access to biomedical datasets: enhancing Dataverse to support large datasets.

    Science.gov (United States)

    McKinney, Bill; Meyer, Peter A; Crosas, Mercè; Sliz, Piotr

    2017-01-01

    Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension, functionality supporting preservation of file system structure within Dataverse, which is essential both for in-place computation and for supporting non-HTTP data transfers.

  1. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Full Text Available Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  2. Dissecting the space-time structure of tree-ring datasets using the partial triadic analysis.

    Science.gov (United States)

    Rossi, Jean-Pierre; Nardin, Maxime; Godefroid, Martin; Ruiz-Diaz, Manuela; Sergent, Anne-Sophie; Martinez-Meier, Alejandro; Pâques, Luc; Rozenberg, Philippe

    2014-01-01

    Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting worthwhile information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA), a multivariate method dedicated to the analysis of three-way datasets, to apprehend the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller) during the period of 1967-2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967-2007. The resulting three-way data table was subjected to two distinct analyses in order to explore i) the temporal evolution of spatial structures and ii) the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high and low frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to unravel and hierarchize the different sources of variation within tree-ring datasets.

  3. Dissecting the space-time structure of tree-ring datasets using the partial triadic analysis.

    Directory of Open Access Journals (Sweden)

    Jean-Pierre Rossi

    Full Text Available Tree-ring datasets are used in a variety of circumstances, including archeology, climatology, forest ecology, and wood technology. These data are based on microdensity profiles and consist of a set of tree-ring descriptors, such as ring width or early/latewood density, measured for a set of individual trees. Because successive rings correspond to successive years, the resulting dataset is a ring variables × trees × time datacube. Multivariate statistical analyses, such as principal component analysis, have been widely used for extracting worthwhile information from ring datasets, but they typically address two-way matrices, such as ring variables × trees or ring variables × time. Here, we explore the potential of the partial triadic analysis (PTA), a multivariate method dedicated to the analysis of three-way datasets, to apprehend the space-time structure of tree-ring datasets. We analyzed a set of 11 tree-ring descriptors measured in 149 georeferenced individuals of European larch (Larix decidua Miller) during the period of 1967-2007. The processing of densitometry profiles led to a set of ring descriptors for each tree and for each year from 1967-2007. The resulting three-way data table was subjected to two distinct analyses in order to explore (i) the temporal evolution of spatial structures and (ii) the spatial structure of temporal dynamics. We report the presence of a spatial structure common to the different years, highlighting the inter-individual variability of the ring descriptors at the stand scale. We found a temporal trajectory common to the trees that could be separated into a high and low frequency signal, corresponding to inter-annual variations possibly related to defoliation events and a long-term trend possibly related to climate change. We conclude that PTA is a powerful tool to unravel and hierarchize the different sources of variation within tree-ring datasets.

  4. DactyLoc: A minimally geo-referenced WiFi+GSM-fingerprint-based localization method for positioning in urban spaces

    DEFF Research Database (Denmark)

    Cujia, Kristian; Wirz, Martin; Kjærgaard, Mikkel Baun

    2012-01-01

    Fingerprinting-based localization methods relying on WiFi and GSM information provide sufficient localization accuracy for many mobile phone applications. Most of the existing approaches require a training set consisting of geo-referenced fingerprints to build a reference database. We propose a collaborative, semi-supervised WiFi+GSM fingerprinting method where only a small fraction of all fingerprints needs to be geo-referenced. Our approach enables indexing of areas in the absence of GPS reception, as often found in urban spaces and indoors, without manual labeling of fingerprints. The method takes…
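
    The reference-database idea that DactyLoc builds on can be sketched as plain k-nearest-neighbour fingerprinting: estimate a position as the average of the k geo-referenced reference fingerprints closest in RSSI space. This illustrates the generic technique only, not DactyLoc's semi-supervised method; the RSSI vectors and positions below are invented.

```python
import math

# Reference database: (RSSI vector over 3 access points in dBm,
# geo-referenced (x, y) position in metres). Invented sample data.
reference_db = [
    ((-40, -70, -80), (0.0, 0.0)),
    ((-70, -40, -80), (10.0, 0.0)),
    ((-80, -70, -40), (0.0, 10.0)),
    ((-60, -60, -75), (5.0, 0.0)),
]

def locate(rssi, k=2):
    """Average the positions of the k reference fingerprints nearest
    to the observed RSSI vector (Euclidean distance in signal space)."""
    ranked = sorted(reference_db, key=lambda rec: math.dist(rssi, rec[0]))
    nearest = ranked[:k]
    x = sum(pos[0] for _, pos in nearest) / k
    y = sum(pos[1] for _, pos in nearest) / k
    return (x, y)
```

    In a semi-supervised setting like the one described above, most rows of `reference_db` would initially lack positions and be labeled collaboratively rather than surveyed one by one.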

  5. "Grid" versus "Ground": Optimized Design of Low-Distortion Projections Using Existing Projection Types Rigorously Georeferenced to the National Spatial Reference System

    Science.gov (United States)

    Dennis, M. L.; Armstrong, M. L.

    2016-12-01

    …common existing conformal projection types, so, unlike MSPCS, they are compatible with most engineering, surveying, and GIS datasets and software. Because they are rigorously georeferenced, LDPs facilitate appropriate use of the NSRS and directly represent conditions at ground without resorting to best-fit approximate transformations.
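
    The "grid versus ground" gap this record addresses comes from the combined scale factor: a grid distance equals the ground distance multiplied by the projection's point scale factor k and by the height factor R / (R + h), which reduces ground distances to the ellipsoid. A low-distortion projection is designed so this product is close to 1 over the project area. The sketch below uses a spherical-Earth approximation and illustrative numbers, not the NSRS computation.

```python
R = 6_371_000.0  # mean Earth radius in metres (spherical approximation)

def height_factor(h):
    """Reduction factor from ground to ellipsoid at height h (metres)."""
    return R / (R + h)

def combined_factor(k, h):
    """Combined (grid) scale factor: projection scale x height factor."""
    return k * height_factor(h)

def grid_to_ground(grid_dist, k, h):
    """Convert a grid distance back to a horizontal ground distance."""
    return grid_dist / combined_factor(k, h)

# At h = 1500 m with a State Plane-like k = 0.9999, a 1000 m grid
# distance corresponds to roughly 1000.34 m on the ground, i.e. a
# ~335 ppm "grid vs. ground" discrepancy an LDP would remove.
```

    Designing an LDP amounts to choosing the projection type and parameters so that `combined_factor` stays within a few ppm of 1.0 across the project's height range.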

  6. Wi-Fi Crowdsourced Fingerprinting Dataset for Indoor Positioning

    Directory of Open Access Journals (Sweden)

    Elena Simona Lohan

    2017-10-01

    Full Text Available Benchmark open-source Wi-Fi fingerprinting datasets for indoor positioning studies are still hard to find in the current literature and existing public repositories. This is unlike other research fields, such as the image processing field, where benchmark test images such as the Lenna image or the Face Recognition Technology (FERET) database exist, or the machine learning field, where huge datasets are available, for example, at the University of California Irvine (UCI) Machine Learning Repository. It is the purpose of this paper to present a new openly available Wi-Fi fingerprint dataset, comprised of 4648 fingerprints collected with 21 devices in a university building in Tampere, Finland, and to present some benchmark indoor positioning results using these data. The datasets and the benchmarking software are distributed under the open-source MIT license and can be found on the EU Zenodo repository.

  7. A georeferenced Agent-Based Model to analyze the climate change impacts on the Andorra winter tourism

    CERN Document Server

    Pons-Pons, M; Rosas-Casals, M; Sureda, B; Jover, E

    2011-01-01

    This study presents a georeferenced agent-based model to analyze the climate change impacts on the ski industry in Andorra and the effect of snowmaking as a future adaptation strategy. The present study is the first attempt to analyze the ski industry in the Pyrenees region and will contribute to a better understanding of the vulnerability of Andorran ski resorts and the suitability of snowmaking as a potential adaptation strategy to climate change. The resulting model can be used as a planning support tool to help local stakeholders understand the vulnerability and potential impacts of climate change. This model can be used in the decision-making process of designing and developing appropriate sustainable adaptation strategies to future climate variability.

  8. Geo-referencing livestock farms as tool for studying cystic echinococcosis epidemiology in cattle and water buffaloes from southern Italy

    Directory of Open Access Journals (Sweden)

    Giuseppe Cringoli

    2007-11-01

    Full Text Available Cystic echinococcosis (CE), caused by the larval stages of the tapeworm Echinococcus granulosus, is known to be one of the most important parasitic infections in livestock worldwide and one of the most widespread zoonoses known. In the present study, we used a geographical information system (GIS) to study the spatial structure of livestock (cattle, water buffalo and sheep) populations to gain a better understanding of the role of sheep as a reservoir for the transmission of CE to cattle and water buffaloes. To this end, a survey on CE in cattle and water buffaloes from the Campania region of southern Italy was conducted and the geo-referenced results linked to the regional geo-referenced farm data within a GIS. The results showed a noteworthy prevalence of CE in cattle and water buffalo farms (overall prevalence = 18.6%). The elaboration of the data with a GIS approach showed a close proximity of the CE-positive bovine and/or water buffalo farms to the ovine farms present in the study area, thus giving important information on the significance of sheep and free-ranging canids in the transmission cycles of CE in relation to cattle and water buffaloes. The significantly higher prevalence found in cattle farms compared to water buffalo farms (20.0% versus 12.4%) supports the key role of sheep in CE transmission; indeed, within the 5 km radius buffer zones constructed around the cattle farms positive for CE, a higher number of (potentially infected) sheep farms was found compared to those within the buffer zones around the water buffalo farms. Furthermore, the average distances between the sheep and cattle farms falling in the same buffer zones were significantly lower than those between the sheep and water buffalo farms. We emphasize that the use of GIS is a novel approach to furthering our understanding of the epidemiology and control of CE, and we encourage other groups to make use of it.
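
    The 5 km buffer-zone query described above reduces to a great-circle distance test: count the sheep farms within 5 km of each CE-positive farm. The sketch below uses the haversine formula on a spherical Earth; the farm coordinates are invented, not the study's data.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points (degrees)."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def farms_within(center, farms, radius_km=5.0):
    """Farms falling inside the buffer zone around `center`."""
    return [f for f in farms
            if haversine_km(center[0], center[1], f[0], f[1]) <= radius_km]

# Hypothetical CE-positive cattle farm and nearby sheep farms (lat, lon).
cattle_farm = (40.85, 14.27)
sheep_farms = [(40.86, 14.28), (40.88, 14.30), (41.00, 14.50)]
```

    Repeating this query for every positive farm and comparing the counts between cattle and water buffalo buffers reproduces, in miniature, the GIS analysis the abstract reports.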

  9. Matchmaking, datasets and physics analysis

    CERN Document Server

    Donno, Flavia; Eulisse, Giulio; Mazzucato, Mirco; Steenberg, Conrad; CERN. Geneva. IT Department; 10.1109/ICPPW.2005.48

    2005-01-01

    Grid enabled physics analysis requires a workload management system (WMS) that takes care of finding suitable computing resources to execute data intensive jobs. A typical example is the WMS available in the LCG2 (also referred to as EGEE-0) software system, used by several scientific experiments. Like many other current grid systems, LCG2 provides a file level granularity for accessing and analysing data. However, application scientists such as high energy physicists often require a higher abstraction level for accessing data, i.e. they prefer to use datasets rather than files in their physics analysis. We have improved the current WMS (in particular the Matchmaker) to allow physicists to express their analysis job requirements in terms of datasets. This required modifications to the WMS and its interface to potential data catalogues. As a result, we propose a simple data location interface that is based on a Web service approach and allows for interoperability of the WMS with new dataset and file catalogues...

  10. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving, and ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event- modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.

  11. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, allowing some of the data to be recovered for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  12. Securely measuring the overlap between private datasets with cryptosets.

    Science.gov (United States)

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
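
    The cryptoset idea can be illustrated with a toy: each party publishes only a short vector of per-bin counts of salted item hashes, and the overlap between two private sets is estimated from the dot product of their count vectors. This sketches the principle of estimating overlap from shareable summaries only; it is not the paper's exact construction and makes no security claims. All names, bin sizes and datasets below are invented.

```python
import hashlib

BINS = 1024  # length of the published summary vector

def summary(items, salt="shared-public-salt"):
    """Publicly shareable summary: counts of items per hash bin."""
    counts = [0] * BINS
    for it in items:
        digest = hashlib.sha256((salt + it).encode()).digest()
        counts[int.from_bytes(digest[:4], "big") % BINS] += 1
    return counts

def estimate_overlap(ca, cb):
    """Estimate |A ∩ B| from two summaries.

    Shared items always land in the same bin; non-shared pairs collide
    with probability 1/BINS, so that expected contribution is
    subtracted and the remainder rescaled.
    """
    dot = sum(x * y for x, y in zip(ca, cb))
    na, nb = sum(ca), sum(cb)
    return (dot - na * nb / BINS) / (1 - 1 / BINS)

# Two synthetic private datasets: |A| = |B| = 500, true overlap 200.
A = ["item%d" % i for i in range(500)]
B = ["item%d" % i for i in range(300, 800)]
```

    Only the count vectors ever leave each institution; the estimate lands near the true overlap with noise that shrinks as BINS grows, mirroring the stable-accuracy property the abstract describes.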

  13. PLÉIADES PROJECT: ASSESSMENT OF GEOREFERENCING ACCURACY, IMAGE QUALITY, PANSHARPENING PERFORMANCE AND DSM/DTM QUALITY

    Directory of Open Access Journals (Sweden)

    H. Topan

    2016-06-01

    Full Text Available Pléiades 1A and 1B are twin optical satellites of the Optical and Radar Federated Earth Observation (ORFEO) program jointly run by France and Italy. They are the first satellites of Europe with sub-meter resolution. Airbus DS (formerly Astrium Geo) runs a MyGIC (formerly Pléiades Users Group) program to validate Pléiades images worldwide for various application purposes. The authors conduct three projects: one within this program, the second supported by the BEU Scientific Research Project Program, and the third supported by TÜBİTAK. Assessment of georeferencing accuracy, image quality, pansharpening performance and Digital Surface Model/Digital Terrain Model (DSM/DTM) quality is investigated in these projects. For these purposes, triplet panchromatic (50 cm Ground Sampling Distance (GSD)) and VNIR (2 m GSD) Pléiades 1A images were investigated over the Zonguldak test site (Turkey), which is urbanised, mountainous and covered by dense forest. The georeferencing accuracy was estimated with standard deviations in X and Y (SX, SY) in the range of 0.45 m by bias-corrected Rational Polynomial Coefficient (RPC) orientation, using ~170 Ground Control Points (GCPs). 3D standard deviations of ±0.44 m in X, ±0.51 m in Y, and ±1.82 m in Z have been reached in spite of the very narrow angle of convergence by bias-corrected RPC orientation. The image quality was also investigated with respect to effective resolution, Signal-to-Noise Ratio (SNR) and blur coefficient. The effective resolution was estimated with a factor slightly below 1.0, meaning that the image quality corresponds to the nominal resolution of 50 cm. The blur coefficients were between 0.39 and 0.46 for the triplet panchromatic images, indicating satisfying image quality. SNR is in the range of other comparable spaceborne images, which may be caused by de-noising of the Pléiades images. The pansharpened images were generated by various methods, and are validated by the most common…
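
    The SX/SY figures reported above are per-axis standard deviations of residuals at surveyed check points after image orientation. The sketch below shows only that final statistics step (RMS about the mean per axis); the bias-corrected RPC orientation itself is not reproduced, and the residual values are invented, not the study's data.

```python
import math

# (dX, dY) residuals in metres at check points: image-derived
# coordinates minus surveyed coordinates (invented values).
check_points = [
    (0.31, -0.42), (-0.55, 0.18), (0.47, 0.51),
    (-0.29, -0.38), (0.12, 0.44), (-0.40, -0.25),
]

def axis_std(residuals):
    """Standard deviation (RMS about the mean) for one axis."""
    n = len(residuals)
    mean = sum(residuals) / n
    return math.sqrt(sum((r - mean) ** 2 for r in residuals) / n)

sx = axis_std([dx for dx, _ in check_points])
sy = axis_std([dy for _, dy in check_points])
```

    With ~170 GCPs, as in the study, the same computation yields the quoted sub-half-metre SX and SY; the Z component is handled identically for the 3D figures.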

  14. PLÉIADES Project: Assessment of Georeferencing Accuracy, Image Quality, Pansharpening Performance and DSM/DTM Quality

    Science.gov (United States)

    Topan, Hüseyin; Cam, Ali; Özendi, Mustafa; Oruç, Murat; Jacobsen, Karsten; Taşkanat, Talha

    2016-06-01

    Pléiades 1A and 1B are twin optical satellites of the Optical and Radar Federated Earth Observation (ORFEO) program, jointly run by France and Italy. They are the first European satellites with sub-meter resolution. Airbus DS (formerly Astrium Geo) runs the MyGIC (formerly Pléiades Users Group) program to validate Pléiades images worldwide for various application purposes. The authors conduct three projects: one within this program, the second supported by the BEU Scientific Research Project Program, and the third supported by TÜBİTAK. Georeferencing accuracy, image quality, pansharpening performance and Digital Surface Model/Digital Terrain Model (DSM/DTM) quality are assessed in these projects. For these purposes, triplet panchromatic (50 cm Ground Sampling Distance (GSD)) and VNIR (2 m GSD) Pléiades 1A images were investigated over the Zonguldak test site (Turkey), which is urbanised, mountainous and covered by dense forest. The georeferencing accuracy was estimated with a standard deviation in X and Y (SX, SY) in the range of 0.45 m by bias-corrected Rational Polynomial Coefficient (RPC) orientation, using ~170 Ground Control Points (GCPs). 3D standard deviations of ±0.44 m in X, ±0.51 m in Y, and ±1.82 m in Z have been reached despite the very narrow angle of convergence of the bias-corrected RPC orientation. The image quality was also investigated with respect to effective resolution, Signal to Noise Ratio (SNR) and blur coefficient. The effective resolution was estimated with a factor slightly below 1.0, meaning that the image quality corresponds to the nominal resolution of 50 cm. The blur coefficients were between 0.39 and 0.46 for the triplet panchromatic images, indicating satisfying image quality. The SNR is in the range of other comparable spaceborne images, which may be caused by de-noising of the Pléiades images. The pansharpened images were generated by various methods, and are validated by most common statistical
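The per-axis accuracy figures quoted above (SX, SY and the 3D standard deviations) are typically computed from residuals between the oriented model and surveyed ground control points. A minimal sketch of that computation, using invented residual values rather than the study's data:

```python
# Sketch: estimating per-axis standard deviations (SX, SY, SZ) of a
# georeferencing solution from residuals at ground control points.
# The residual values below are made-up illustrative numbers.
import math

def axis_std(residuals):
    """Standard deviation of residuals about their mean (bias removed)."""
    n = len(residuals)
    mean = sum(residuals) / n
    return math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))

# residual = model coordinate - GNSS-surveyed coordinate, per GCP (metres)
dx = [0.31, -0.42, 0.18, -0.05, 0.27]
dy = [-0.12, 0.44, -0.38, 0.21, -0.09]
dz = [1.2, -0.8, 1.6, -1.1, 0.4]

print(f"SX={axis_std(dx):.2f} m, SY={axis_std(dy):.2f} m, SZ={axis_std(dz):.2f} m")
```

Subtracting the mean residual first is what "bias corrected" refers to: a constant offset is removed before the spread is reported.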

  15. Automatic 3D relief acquisition and georeferencing of road sides by low-cost on-motion SfM

    Science.gov (United States)

    Voumard, Jérémie; Bornemann, Perrick; Malet, Jean-Philippe; Derron, Marc-Henri; Jaboyedoff, Michel

    2017-04-01

    3D terrain relief acquisition is important for a large part of the geosciences. Several methods have been developed to digitize terrain, such as total station, LiDAR, GNSS or photogrammetry. To digitize road (or rail track) sides over long sections, mobile spatial imaging systems or UAVs are commonly used. In this project, we compare a still fairly new method, the on-motion SfM technique, with traditional terrain digitizing techniques (terrestrial laser scanning, traditional SfM, UAS imaging solutions, GNSS surveying systems and total stations). The on-motion SfM technique generates 3D spatial data by photogrammetric processing of images taken from a moving vehicle. Our mobile system consists of six action cameras placed on a vehicle. Four fisheye cameras mounted on a mast on the vehicle roof are placed 3.2 meters above the ground. Three of them have a GNSS chip providing geotagged images. Two pictures were acquired every second by each camera. 4K-resolution fisheye videos were also used to extract 8.3 MP non-geotagged pictures. All these pictures are then processed with the Agisoft PhotoScan Professional software. Results from the on-motion SfM technique are compared with results from classical SfM photogrammetry on a 500-meter-long alpine track, as well as with mobile laser scanning data on the same road section. First results indicate that slope structures are well observable up to decimetric accuracy. For the georeferencing, the planimetric (XY) accuracy of a few meters is much better than the altimetric (Z) accuracy; there is a Z-coordinate shift of a few tens of meters between the GoPro cameras and the Garmin camera, which makes it necessary to give greater freedom to the altimetric coordinates in the processing software. The benefits of this low-cost on-motion SfM method are: 1) a simple setup to use in the field (easy to switch between vehicle types such as car, train, bike, etc.), 2) a low cost, and 3) automatic georeferencing of the 3D point clouds.

  16. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Department of Resources — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  17. VT Hydrography Dataset - High Resolution NHD

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) The Vermont Hydrography Dataset (VHD) is compliant with the local resolution (also known as High Resolution) National Hydrography Dataset (NHD)...

  18. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
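The tile-and-coverage idea described above can be sketched in a few lines. Geoseq itself uses suffix arrays over pre-computed libraries for speed; this toy version substitutes a plain k-mer dictionary, and the sequences are invented examples:

```python
# Illustrative sketch of Geoseq's core step: break a reference sequence into
# fixed-length tiles and count how often each tile occurs in a read library.
# (Geoseq uses suffix arrays; this naive dictionary lookup only shows the logic.)
def tile_coverage(reference, reads, tile_len=8):
    # Index every tile_len-mer occurring in the reads.
    kmer_counts = {}
    for read in reads:
        for i in range(len(read) - tile_len + 1):
            kmer = read[i:i + tile_len]
            kmer_counts[kmer] = kmer_counts.get(kmer, 0) + 1
    # Coverage of each reference tile = occurrences of that tile in the reads.
    tiles = [reference[i:i + tile_len]
             for i in range(len(reference) - tile_len + 1)]
    return [kmer_counts.get(t, 0) for t in tiles]

reads = ["ACGTACGTAA", "CGTACGTAAC", "TTTTGGGGCC"]
print(tile_coverage("ACGTACGTAACG", reads, tile_len=8))  # [1, 2, 2, 1, 0]
```

A dip to zero in the coverage profile marks a reference region absent from the library, which is how isoform or annotation differences show up.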

  19. Scanned Hardcopy Maps, Walworth County Historical Airphotos scanned and georeferenced, Published in 2011, Not Applicable scale, Walworth County.

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — This Scanned Hardcopy Maps dataset, published at Not Applicable scale, was produced all or in part from Other information as of 2011. It is described as 'Walworth...

  20. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

    Directory of Open Access Journals (Sweden)

    Mithun Biswas

    2017-06-01

    BanglaLekha-Isolated, a Bangla handwritten isolated-character dataset, is presented in this article. This dataset contains 84 different characters, comprising 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  1. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters.

    Science.gov (United States)

    Biswas, Mithun; Islam, Rafiqul; Shom, Gautam Kumar; Shopon, Md; Mohammed, Nabeel; Momen, Sifat; Abedin, Anowarul

    2017-06-01

    BanglaLekha-Isolated, a Bangla handwritten isolated-character dataset, is presented in this article. This dataset contains 84 different characters, comprising 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  2. Georeferencing the Large-Scale Aerial Photographs of a Great Lakes Coastal Wetland: A Modified Photogrammetric Method

    Science.gov (United States)

    Murphy, Marilyn K.; Kowalski, Kurt P.; Grapentine, Joel L.

    2010-01-01

    The geocontrol template method was developed to georeference multiple, overlapping analog aerial photographs without reliance upon conventionally obtained horizontal ground control. The method was tested as part of a long-term wetland habitat restoration project at a Lake Erie coastal wetland complex in the U.S. Fish and Wildlife Service Ottawa National Wildlife Refuge. As in most coastal wetlands, annually identifiable ground-control features required to georeference photo-interpreted data are difficult to find. The geocontrol template method relies on the following four components: (a) an uncontrolled aerial photo mosaic of the study area, (b) global positioning system (GPS) derived horizontal coordinates of each photo’s principal point, (c) a geocontrol template created by the transfer of fiducial markings and calculated principal points to clear acetate from individual photographs arranged in a mosaic, and (d) the root-mean-square-error testing of the system to ensure an acceptable level of planimetric accuracy. Once created for a study area, the geocontrol template can be registered in geographic information system (GIS) software to facilitate interpretation of multiple images without individual image registration. The geocontrol template enables precise georeferencing of single images within larger blocks of photographs using a repeatable and consistent method.

  3. Drone with thermal infrared camera provides high resolution georeferenced imagery of the Waikite geothermal area, New Zealand

    Science.gov (United States)

    Harvey, M. C.; Rowland, J. V.; Luketina, K. M.

    2016-10-01

    Drones are now routinely used for collecting aerial imagery and creating digital elevation models (DEM). Lightweight thermal sensors provide another payload option for generation of very high-resolution aerial thermal orthophotos. This technology allows for the rapid and safe survey of thermal areas, often present in inaccessible or dangerous terrain. Here we present a 2.2 km2 georeferenced, temperature-calibrated thermal orthophoto of the Waikite geothermal area, New Zealand. The image represents a mosaic of nearly 6000 thermal images captured by drone over a period of about 2 weeks. This is thought by the authors to be the first such image published of a significant geothermal area produced by a drone equipped with a thermal camera. Temperature calibration of the image allowed calculation of heat loss (43 ± 12 MW) from thermal lakes and streams in the survey area (loss from evaporation, conduction and radiation). An RGB (visible spectrum) orthomosaic photo and digital elevation model was also produced for this area, with ground resolution and horizontal position error comparable to commercially produced LiDAR and aerial imagery obtained from crewed aircraft. Our results show that thermal imagery collected by drones has the potential to become a key tool in geothermal science, including geological, geochemical and geophysical surveys, environmental baseline and monitoring studies, geotechnical studies and civil works.

  4. Low aerial imagery – an assessment of georeferencing errors and the potential for use in environmental inventory

    Directory of Open Access Journals (Sweden)

    Smaczyński Maciej

    2017-06-01

    Unmanned aerial vehicles are increasingly being used in close-range photogrammetry. Real-time observation of the Earth's surface and the photogrammetric images obtained are used as material for surveying and environmental inventory. The following study was conducted on a small area (approximately 1 ha). In such cases, the classical method of topographic mapping is not accurate enough, while the geodetic method of topographic surveying is an overly precise measurement technique for the purpose of inventorying natural environment components. The author of the following study has proposed using unmanned aerial vehicle technology and tying the obtained images to a control point network established with the aid of GNSS technology. Georeferencing the acquired images and using them to create a photogrammetric model of the studied area enabled calculations that yielded a total root mean square error below 9 cm. A comparison of the real lengths of the vectors connecting the control points with their lengths calculated from the photogrammetric model made it possible to confirm the calculated RMSE and prove the usefulness of UAV technology in observing terrain components for the purpose of environmental inventory. Such environmental components include, among others, elements of road infrastructure and green areas, but also changes in the location of moving pedestrians and vehicles, as well as other changes in the natural environment that are not registered on classical base maps or topographic maps.
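The validation step described above, comparing control-point distances measured on the ground against the same distances taken from the photogrammetric model, can be sketched as follows. All coordinates are invented examples, not the study's survey data:

```python
# Sketch: RMSE of vector-length differences between GNSS-surveyed control
# points and the corresponding points in a photogrammetric model.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def rmse_of_vector_lengths(gnss_pts, model_pts, pairs):
    errs = [dist(gnss_pts[a], gnss_pts[b]) - dist(model_pts[a], model_pts[b])
            for a, b in pairs]
    return math.sqrt(sum(e * e for e in errs) / len(errs))

# Invented coordinates (metres, local system)
gnss = {"A": (0.00, 0.00), "B": (30.00, 40.00), "C": (60.00, 0.00)}
model = {"A": (0.03, -0.02), "B": (29.95, 40.06), "C": (60.04, 0.05)}
pairs = [("A", "B"), ("B", "C"), ("A", "C")]
print(f"RMSE of vector lengths: {rmse_of_vector_lengths(gnss, model, pairs):.3f} m")
```

Comparing distances rather than raw coordinates makes the check independent of any constant shift between the two coordinate frames.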

  5. Review Studies for the ATLAS Open Data Dataset

    CERN Document Server

    The ATLAS collaboration

    2016-01-01

    This document presents approval plots from selected analyses using the ATLAS Open Data dataset. This dataset, containing 1 fb⁻¹ of 8 TeV data collected by ATLAS along with a selection of Monte Carlo simulated events, is intended to be released to the public for educational use only, alongside tools to enable students to get started quickly and easily. The corrections applied to the Monte Carlo have been simplified for the purposes of the intended use and to reduce processing time, and the approval plots should indicate clearly the reasons for disagreement between Monte Carlo and data. Although some low-statistics analyses can be done and the educational objectives achieved, the dataset is for educational purposes only, and it will be clear that the user cannot use it beyond this use case due to the low statistics.

  6. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    Science.gov (United States)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins; and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.
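To make "differentially abundant" concrete, here is a hedged sketch of the simplest form of such a test: compare a pathway's mean relative abundance between two sample groups and assess significance by permutation. This is not the MetaPath algorithm itself (which additionally exploits pathway structure), and the abundance values are invented:

```python
# Sketch: permutation test for a difference in mean pathway abundance
# between two groups of metagenomic samples. Abundances are invented.
import random

def permutation_p_value(group_a, group_b, n_perm=10000, seed=42):
    rng = random.Random(seed)
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                      # random relabeling of samples
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)             # add-one to avoid p = 0

obese = [0.041, 0.038, 0.045, 0.040]   # pathway relative abundance per sample
lean = [0.021, 0.019, 0.025, 0.022]
print(f"p = {permutation_p_value(obese, lean):.4f}")
```

With such small groups the permutation null is coarse, which is exactly why methods like MetaPath bring in prior pathway knowledge rather than testing each gene set in isolation.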

  7. Structural diversity of biologically interesting datasets: a scaffold analysis approach

    Directory of Open Access Journals (Sweden)

    Khanna Varun

    2011-08-01

    Abstract Background The recent public availability of the human metabolome and natural product datasets has revitalized "metabolite-likeness" and "natural product-likeness" as drug design concepts for designing lead libraries targeting specific pathways. Many reports have analyzed the physicochemical property space of biologically important datasets, but only a few have comprehensively characterized the scaffold diversity in public datasets of biological interest. With large collections of high-quality public data currently available, we carried out a comparative analysis of current-day leads with other biologically relevant datasets. Results In this study, we note a two-fold enrichment of metabolite scaffolds in the drug dataset (42%) as compared to currently used lead libraries (23%). We also note that only a small percentage (5%) of the natural product scaffold space is shared by the lead dataset. We have identified specific scaffolds that are present in metabolites and natural products, with close counterparts in the drugs, but missing in the lead dataset. To determine the distribution of compounds in physicochemical property space we analyzed the molecular polar surface area, the molecular solubility, the number of rings and the number of rotatable bonds, in addition to four well-known Lipinski properties. Here, we note that, with only a few exceptions, most of the drugs follow Lipinski's rule. The average values of the molecular polar surface area and the molecular solubility are highest in metabolites, while the number of rings is the lowest. In addition, we note that natural products contain more rings and rotatable bonds than any other dataset under consideration. Conclusions Currently used lead libraries make little use of the metabolite and natural product scaffold space. We believe that metabolites and natural products are recognized by at least one protein in the biosphere; therefore, sampling the fragment and scaffold

  8. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster with an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). Key considerations for the system are high-speed access due to the large data volume, persistence of the large data volumes, and a precise process-time scheduling capability.

  9. The KUSC Classical Music Dataset for Audio Key Finding

    Directory of Open Access Journals (Sweden)

    Ching-Hua Chuan

    2014-08-01

    In this paper, we present a benchmark dataset based on the KUSC classical music collection and provide baseline key-finding comparison results. Audio key finding is a basic music information retrieval task; it forms an essential component of systems for music segmentation, similarity assessment, and mood detection. Due to copyright restrictions and a labor-intensive annotation process, audio key finding algorithms have to date been evaluated only on small proprietary datasets. To create a common base for systematic comparisons, we have constructed a dataset comprising more than 3,000 excerpts of classical music. The excerpts are made publicly accessible via commonly used acoustic features such as pitch-based spectrograms and chromagrams. We introduce a hybrid annotation scheme that combines the use of title keys with expert validation and correction of only the challenging cases. The expert musicians also provide ratings of key recognition difficulty. Other meta-data include instrumentation. As a demonstration of the use of the dataset, and to provide initial benchmark comparisons for evaluating new algorithms, we conduct a series of experiments reporting the key determination accuracy of four state-of-the-art algorithms. We further show the importance of considering factors such as estimated tuning frequency, key strength or confidence value, and key recognition difficulty in key finding. In the future, we plan to expand the dataset to include meta-data for other music information retrieval tasks.
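Since the dataset exposes chromagrams, a classic baseline key-finding approach can work directly on them: correlate a 12-bin chroma vector against the 24 rotated Krumhansl-Kessler major/minor key profiles and pick the best match. This is a generic template method, not necessarily one of the four algorithms benchmarked in the paper, and the input chroma vector is an invented C-major-like example:

```python
# Sketch: template-based audio key finding from a 12-bin chroma vector
# using the Krumhansl-Kessler key profiles.
import math

MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def estimate_key(chroma):
    best = None
    for tonic in range(12):
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            # Rotate the profile so index `tonic` becomes the tonic pitch class.
            rotated = profile[-tonic:] + profile[:-tonic]
            r = pearson(chroma, rotated)
            if best is None or r > best[0]:
                best = (r, f"{NAMES[tonic]} {mode}")
    return best[1]

# Chroma energy dominated by C, E, G (a C major triad):
chroma = [10, 1, 2, 1, 8, 3, 1, 9, 1, 2, 1, 2]
print(estimate_key(chroma))
```

The correlation value of the winning key doubles as the "key strength or confidence value" mentioned above.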

  10. Spatial Accuracy Assessment and Integration of Global Land Cover Datasets

    Directory of Open Access Journals (Sweden)

    Nandin-Erdene Tsendbazar

    2015-11-01

    Along with the creation of new maps, current efforts for improving global land cover (GLC) maps focus on integrating maps by accounting for their relative merits, e.g., agreement amongst maps or map accuracy. Such integration efforts may benefit from the use of multiple GLC reference datasets. Using available reference datasets, this study assesses the spatial accuracy of recent GLC maps and compares methods for creating an improved land cover (LC) map. Spatial correspondence with the reference dataset was modeled for the Globcover-2009, Land Cover-CCI-2010, MODIS-2010 and Globeland30 maps for Africa. Using different scenarios concerning the input data used, five integration methods for an improved LC map were tested and cross-validated. Comparison of the spatial correspondences showed that the preferences for GLC maps varied spatially. Integration methods using both the GLC maps and reference data at their locations resulted in 4.5%–13% higher correspondence with the reference LC than any of the input GLC maps. An integrated LC map and LC class probability maps were computed using regression kriging, which produced the highest correspondence (76%). Our results demonstrate the added value of using reference datasets and geostatistics for improving GLC maps. This approach is useful as more GLC reference datasets are becoming publicly available and their reuse is being encouraged.
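One of the simpler integration ideas above, weighting each input map by its accuracy, can be sketched per pixel as a weighted vote. The study's best-performing method (regression kriging of class probabilities) is considerably more involved; the map names match those in the abstract, but the weights and class labels below are invented:

```python
# Sketch: per-pixel integration of several land cover maps by an
# accuracy-weighted vote. Weights and labels are illustrative only.
def integrate_pixel(votes, weights):
    """votes: {map_name: lc_class}; weights: {map_name: accuracy in [0, 1]}."""
    score = {}
    for name, lc_class in votes.items():
        score[lc_class] = score.get(lc_class, 0.0) + weights[name]
    return max(score, key=score.get)

weights = {"Globcover": 0.58, "CCI": 0.71, "MODIS": 0.66, "Globeland30": 0.80}
votes = {"Globcover": "shrub", "CCI": "forest",
         "MODIS": "forest", "Globeland30": "cropland"}
print(integrate_pixel(votes, weights))   # forest wins: 0.71 + 0.66 = 1.37
```

Making the weights vary spatially (e.g. per region or per reference location) is what separates the geostatistical methods from this global vote.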

  11. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  12. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Abstract Everyday refinement and development of technology has led to an increase in competition between tech companies and in attempts to crack and break down their systems. This makes data mining a strategically and security-wise important area for many business organizations, including the banking sector. It allows the analysis of important information in the data warehouse and assists banks in looking for obscure patterns within a group and discovering unknown relationships in the data. Banking systems need to process an ample amount of data on a daily basis related to customer information, credit card details, limit and collateral details, transaction details, risk profiles, Anti-Money-Laundering-related information and trade finance data. Thousands of decisions based on these data are taken in a bank daily. This paper analyzes a banking dataset in the Weka environment for the detection of interesting patterns, based on its applications in customer acquisition, customer retention, management and marketing, and the management of risk and fraud detection.

  13. Computing procedure of spatial geo-referencing of satellite image; Un procedimiento simple de geo-referenciacion de imagenes de satelite

    Energy Technology Data Exchange (ETDEWEB)

    Santos, J.; Vazquez, M.; Fernandes, F.; Prado, T.; Castro, R.

    2004-07-01

    In this paper a computing procedure of spatial geo-referencing is described that, by means of terrestrial spatial geometry, makes it possible to obtain the latitude and longitude corresponding to a given pixel of a satellite image, the pixel being defined by a line-pixel pair. The procedure also permits the inverse computation. This procedure is clearer and simpler than those proposed by EUMETSAT and GOES and can be applied to any satellite image. (Author)
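For readers unfamiliar with the bookkeeping involved, the simplest pixel-to-coordinate mapping is the affine geotransform used for north-up rasters (the GDAL convention). The paper's procedure handles the more complex geostationary viewing geometry; this sketch only illustrates the basic line/pixel-to-coordinate conversion and its inverse, with invented values:

```python
# Sketch: pixel <-> geographic coordinate conversion via an affine
# geotransform (GDAL ordering). Assumes a north-up raster, i.e. the
# rotation terms gt[2] and gt[4] are zero.
def pixel_to_geo(gt, col, row):
    """gt = (x_origin, px_width, 0, y_origin, 0, -px_height)."""
    lon = gt[0] + col * gt[1] + row * gt[2]
    lat = gt[3] + col * gt[4] + row * gt[5]
    return lon, lat

def geo_to_pixel(gt, lon, lat):
    col = (lon - gt[0]) / gt[1]
    row = (lat - gt[3]) / gt[5]
    return col, row

gt = (-10.0, 0.05, 0.0, 45.0, 0.0, -0.05)   # 0.05-degree pixels, origin 10W 45N
lon, lat = pixel_to_geo(gt, 100, 200)
print(lon, lat)                    # -5.0 35.0
print(geo_to_pixel(gt, lon, lat))  # recovers the pixel indices
```

For a geostationary sensor the forward mapping additionally passes through the satellite's scan-angle geometry before reaching latitude/longitude, which is the part the paper simplifies.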

  14. Towards an Automatic Framework for Urban Settlement Mapping from Satellite Images: Applications of Geo-referenced Social Media and One Class Classification

    Science.gov (United States)

    Miao, Zelang

    2017-04-01

    Currently, urban dwellers comprise more than half of the world's population, and this percentage is still dramatically increasing. The explosive urban growth over the next two decades poses a long-term, profound impact on people as well as the environment. Accurate and up-to-date delineation of urban settlements plays a fundamental role in defining planning strategies and in supporting sustainable development of urban settlements. In order to provide adequate data about urban extents and land covers, classifying satellite data has become a common practice, usually with sufficiently accurate results. Indeed, a number of supervised learning methods have proven effective in urban area classification, but they usually depend on a large amount of training samples, whose collection is a time- and labor-expensive task. This issue becomes particularly serious when classifying large areas at the regional/global level. As an alternative to manual ground truth collection, in this work we use geo-referenced social media data. Cities and densely populated areas are extremely fertile ground for the production of individual geo-referenced data (such as GPS and social network data). Training samples derived from geo-referenced social media have several advantages: they are easy to collect; they are usually freely exploitable; and, finally, data from social media are spatially available in many locations, and without doubt in most urban areas around the world. Despite these advantages, the selection of training samples from social media meets two challenges: 1) there are many duplicated points; 2) a method is required to automatically label them as "urban/non-urban". The objective of this research is to validate automatic sample selection from geo-referenced social media and its applicability in one-class classification for urban extent mapping from satellite images. The findings in this study shed new light on social media applications in the field of remote sensing.
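The first challenge mentioned above, duplicated points, is commonly handled by snapping coordinates to a grid so that each cell contributes at most one sample. This is a generic deduplication sketch, not necessarily the paper's method; the cell size and points are invented:

```python
# Sketch: collapsing duplicated geo-referenced social media points by
# snapping coordinates to a regular grid (one representative per cell).
import math

def dedup_by_grid(points, cell_deg=0.001):
    """points: iterable of (lon, lat); returns one representative per cell."""
    seen = {}
    for lon, lat in points:
        key = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        seen.setdefault(key, (lon, lat))   # keep the first point in each cell
    return list(seen.values())

pts = [(114.0571, 22.5431), (114.0571, 22.5431),   # exact duplicate
       (114.05712, 22.54308),                       # near-duplicate, same cell
       (114.1602, 22.6011)]                         # distinct location
print(dedup_by_grid(pts))
```

A cell of ~0.001 degrees (roughly 100 m) matching the satellite pixel size would also prevent several samples from falling inside one pixel.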

  15. Direct georeferencing by Position and Orientation System using photo nadir point

    Institute of Scientific and Technical Information of China (English)

    付建红

    2012-01-01

    Based on the geometric relation between the photo nadir point and the exterior orientation angular elements of an aerial photograph, a method of solving the IMU boresight misalignment by using the photo nadir point is proposed in this paper to improve the direct georeferencing accuracy of POS. The method was tested on a set of actual flight photographs; the experiments showed that the accuracy of direct georeferencing using POS can be improved effectively. The proposed method needs neither a specific calibration field nor ground control points, and therefore has practical application value for direct georeferencing when taking large-scale aerial photographs over urban areas with POS.

  16. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    Internationally coordinated monitoring of long-term glacier changes provides key indicator data about global climate change; it began in the year 1894 as an internationally coordinated effort to establish standardized observations. Today, world-wide monitoring of glaciers and ice caps is embedded within the Global Climate Observing System (GCOS) in support of the United Nations Framework Convention on Climate Change (UNFCCC) as an important Essential Climate Variable (ECV). The Global Terrestrial Network for Glaciers (GTN-G) was established in 1999 with the task of coordinating measurements and ensuring the continuous development and adaptation of the international strategies to the long-term needs of users in science and policy. The basic monitoring principles must be relevant, feasible, comprehensive and understandable to a wider scientific community as well as to policy makers and the general public. Data access has to be free and unrestricted, the quality of the standardized and calibrated data must be high, and a combination of detailed process studies at selected field sites with global coverage by satellite remote sensing is envisaged. Recently a GTN-G Steering Committee was established to guide and advise the operational bodies responsible for international glacier monitoring, which are the World Glacier Monitoring Service (WGMS), the US National Snow and Ice Data Center (NSIDC), and the Global Land Ice Measurements from Space (GLIMS) initiative. Several online databases containing a wealth of diverse data types having different levels of detail and global coverage provide fast access to continuously updated information on glacier fluctuation and inventory data. 
    For world-wide inventories, data are now available through (a) the World Glacier Inventory, containing tabular information on about 130,000 glaciers covering an area of around 240,000 km², (b) the GLIMS database, containing digital outlines of around 118,000 glaciers with different time stamps and

  17. Datasets used in ORD-018902: Bisphenol A alternatives can effectively substitute for estradiol

    Data.gov (United States)

    U.S. Environmental Protection Agency — Gene Expression Omnibus numbers only. This dataset is associated with the following publication: Mesnage, R., A. Phedonos, M. Arno, S. Balu, C. Corton, and M....

  18. River network routing on the NHDPlus dataset

    OpenAIRE

    David, Cédric; Maidment, David; Niu, Guo-Yue; Yang, Zong-Liang; Habets, Florence; Eijkhout, Victor

    2011-01-01

    International audience; The mapped rivers and streams of the contiguous United States are available in a geographic information system (GIS) dataset called National Hydrography Dataset Plus (NHDPlus). This hydrographic dataset has about 3 million river and water body reaches along with information on how they are connected into networks. The U.S. Geological Survey (USGS) National Water Information System (NWIS) provides streamflow observations at about 20 thousand gauges located on the NHDP...
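The connectivity described above (reaches plus their downstream links) is naturally modeled as a directed graph. The sketch below is a hypothetical simplification, not the routing scheme used in the paper: it accumulates flow downstream in topological order, and the dict-based network encoding is an assumption rather than the NHDPlus format.

```python
from collections import defaultdict, deque

def route_downstream(downstream_of, lateral_inflow):
    """Accumulate flow down a river network given reach connectivity.

    downstream_of: dict mapping reach id -> downstream reach id (None at outlets)
    lateral_inflow: dict mapping reach id -> local inflow contribution
    Returns the total accumulated flow per reach.
    """
    indegree = defaultdict(int)
    for reach, down in downstream_of.items():
        if down is not None:
            indegree[down] += 1
    flow = dict(lateral_inflow)
    # Headwater reaches (nothing flows into them) seed the traversal.
    queue = deque(r for r in downstream_of if indegree[r] == 0)
    while queue:
        reach = queue.popleft()
        down = downstream_of[reach]
        if down is not None:
            flow[down] += flow[reach]  # pass accumulated flow downstream
            indegree[down] -= 1
            if indegree[down] == 0:
                queue.append(down)
    return flow
```

For a Y-shaped network where reaches 1 and 2 join into reach 3, the outlet receives the sum of all three local inflows.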

  20. A dataset on the inventory of coniferous urban trees in the city of Orléans (France)

    Directory of Open Access Journals (Sweden)

    J.-P. Rossi

    2016-12-01

    Full Text Available The dataset supplied in this article provides the spatial location and the species composition of urban trees belonging to three coniferous genera (Pinus, Cedrus and Pseudotsuga) inventoried in 5 districts of the city of Orléans (France). A total of 9321 trees were georeferenced. The most abundant species was the black pine Pinus nigra, for which a total of 2420 trees were observed. Other common species were the Scots pine P. sylvestris, the Douglas-fir Pseudotsuga menziesii and different species of the genus Cedrus. The data supplied in this article are related to “A citywide survey of the pine processionary moth Thaumetopoea pityocampa spatial distribution in Orléans (France)” by J.-P. Rossi, V. Imbault, T. Lamant, J. Rousselet [3].

  1. A dataset on the inventory of coniferous urban trees in the city of Orléans (France).

    Science.gov (United States)

    Rossi, J-P; Imbault, V; Lamant, T; Rousselet, J

    2016-12-01

    The dataset supplied in this article provides the spatial location and the species composition of urban trees belonging to three coniferous genera (Pinus, Cedrus and Pseudotsuga) inventoried in 5 districts of the city of Orléans (France). A total of 9321 trees were georeferenced. The most abundant species was the black pine Pinus nigra, for which a total of 2420 trees were observed. Other common species were the Scots pine P. sylvestris, the Douglas-fir Pseudotsuga menziesii and different species of the genus Cedrus. The data supplied in this article are related to "A citywide survey of the pine processionary moth Thaumetopoea pityocampa spatial distribution in Orléans (France)" by J.-P. Rossi, V. Imbault, T. Lamant, J. Rousselet [3].

  2. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  3. A global distributed basin morphometric dataset

    Science.gov (United States)

    Shen, Xinyi; Anagnostou, Emmanouil N.; Mei, Yiwen; Hong, Yang

    2017-01-01

    Basin morphometry is vital information for relating storms to hydrologic hazards, such as landslides and floods. In this paper we present the first comprehensive global dataset of distributed basin morphometry at 30 arc seconds resolution. The dataset includes nine prime morphometric variables; in addition, we present formulas for generating twenty-one additional morphometric variables based on combinations of the prime variables. The dataset can aid different applications, including studies of land-atmosphere interaction and modelling of floods and droughts for sustainable water management. The validity of the dataset has been consolidated by successfully reproducing Hack's law.
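Hack's law, which the abstract uses to validate the dataset, relates mainstream length to drainage area as L = C·A^h, with h typically near 0.6 for natural basins. A minimal sketch of checking it by log-log regression (the function name and the synthetic inputs are illustrative, not from the paper):

```python
import math

def fit_hacks_law(areas_km2, lengths_km):
    """Fit Hack's law L = C * A**h by least squares in log-log space.

    Returns (C, h); for natural basins h is typically near 0.6.
    """
    xs = [math.log(a) for a in areas_km2]
    ys = [math.log(l) for l in lengths_km]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope of the log-log regression is the Hack exponent h.
    h = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    C = math.exp(my - h * mx)  # intercept back-transformed to the coefficient
    return C, h
```

On synthetic basins generated exactly from L = 1.4·A^0.6 the fit recovers both parameters.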

  4. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  5. Veterans Affairs Suicide Prevention Synthetic Dataset Metadata

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  6. Dataset for Calibration and performance of synchronous SIM/scan mode for simultaneous targeted and discovery (non-targeted) analysis of exhaled breath samples from firefighters

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset includes the tables and supplementary information from the journal article. This dataset is associated with the following publication: Wallace, A., J....

  7. VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

    Science.gov (United States)

    Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

    Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging and labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe that this VideoWeb Activities dataset is unique and it is one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.

  8. A Geo-Label for Geo-Referenced Information as a Service for Data Users and a Tool for Facilitating Societal Benefits of Earth Observations

    Science.gov (United States)

    Plag, H.-P.

    2012-04-01

    Geo-referenced information is increasingly important for many scientific and societal applications. The availability of reliable and applicable spatial data and information is fundamental for addressing pressing problems such as food, water, and energy security; disaster risk reduction; climate change; environmental quality; pandemics; economic crises and wars; population migration; and, in a general sense, sustainability. Today, more than 70% of societal activities in developed countries depend directly or indirectly on geo-referenced information. The rapid development of analysis tools, such as Geographic Information Systems and web-based tools for viewing, accessing, and analyzing geo-referenced information, and the growing abundance of openly available Earth observations (e.g., through the Global Earth Observation System of Systems, GEOSS) will likely increase the dependency of science and society on geo-referenced information. Increasingly, these tools allow the combination of datasets from various sources. Improvements in interoperability, promoted particularly by GEOSS, will strengthen this trend and lead to more tools for combining data from different sources. What is currently lacking is a service-oriented infrastructure helping to ensure that data quality and applicability are not compromised through modifications and combinations. Most geo-referenced information comes without sufficient information on quality and applicability. The Group on Earth Observations (GEO) has embarked on establishing a so-called GEO Label that would provide easy-to-understand, globally available information on aspects of quality, user rating, relevance, and fit-for-usage of the products and services accessible through GEOSS (with the responsibility for the concept development delegated to Work Plan Task ID-03). In designing a service-oriented architecture that could support a GEO Label, it is important to understand the impact of the goals for the label on the

  9. A dataset of forest biomass structure for Eurasia

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-01

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below ground); green forest floor (above- and below ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots). The dataset consists of 10,351 unique records of sample plots and 9,613 sample trees from ca 1,200 experiments covering the period 1930-2014, with some overlap between the plot and tree records. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height and growing stock volume, when available. Such a dataset can be used for the development of models of biomass structure, biomass extension factors, change detection in biomass structure, investigations into biodiversity, species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  10. SisFall: A Fall and Movement Dataset

    Science.gov (United States)

    Sucerquia, Angela; López, José David; Vargas-Bonilla, Jesús Francisco

    2017-01-01

    Research on fall and movement detection with wearable devices has witnessed promising growth. However, there are few publicly available datasets, all recorded with smartphones, which are insufficient for testing new proposals due to the absence of a target population, the limited range of performed activities, and limited information. Here, we present a dataset of falls and activities of daily living (ADLs) acquired with a self-developed device composed of two types of accelerometer and one gyroscope. It consists of 19 ADLs and 15 fall types performed by 23 young adults, 15 ADL types performed by 14 healthy and independent participants over 62 years old, and data from one 60-year-old participant who performed all ADLs and falls. These activities were selected based on a survey and a literature analysis. We test the dataset with widely used feature extraction and a simple-to-implement threshold-based classification, achieving up to 96% accuracy in fall detection. An individual activity analysis demonstrates that most errors are concentrated in a small number of activities on which new approaches could focus. Finally, validation tests with elderly people significantly reduced the fall detection performance of the tested features. This validates the findings of other authors and encourages the development of new strategies with this new dataset as the benchmark. PMID:28117691
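A threshold-based classifier of the kind tested on SisFall can be sketched as follows. The threshold value and the magnitude-only test are illustrative assumptions, since the paper evaluates several features and thresholds; real pipelines add windowing and post-impact inactivity checks.

```python
import math

def detect_falls(samples, threshold_g=1.8):
    """Flag candidate fall events in a triaxial accelerometer trace.

    samples: list of (ax, ay, az) tuples in g units.
    A sample whose acceleration vector magnitude exceeds threshold_g is
    flagged as a candidate impact.
    """
    return [i for i, (ax, ay, az) in enumerate(samples)
            if math.sqrt(ax * ax + ay * ay + az * az) > threshold_g]
```

A quiet trace sits near 1 g (gravity only); an impact spike well above the threshold is flagged by index.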

  11. Public Water Supply Systems (PWS)

    Data.gov (United States)

    Kansas Data Access and Support Center — This dataset includes boundaries for most public water supply systems (PWS) in Kansas (525 municipalities, 289 rural water districts and 13 public wholesale water...

  12. USAID Public-Private Partnerships Database

    Data.gov (United States)

    US Agency for International Development — This dataset brings together information collected since 2001 on PPPs that have been supported by USAID. For the purposes of this dataset a Public-Private...

  13. Application of XML database technology to biological pathway datasets.

    Science.gov (United States)

    Jiang, Keyuan; Nash, Christopher

    2006-01-01

    The study of biological systems has accumulated a significant amount of biological pathway data, which is evident through the continued growth in both the number of databases and the amount of data available. The development of the BioPAX standard has led to increased availability of biological pathway datasets through the use of a special XML format, but the lack of a standard storage mechanism makes the querying and aggregation of BioPAX-compliant data challenging. To address this shortcoming, we have developed a storage mechanism leveraging existing XML technologies: the XML database and XQuery. The goal of our project is to provide a generic and centralized store with efficient queries for the needs of biomedical research. A SOAP-based Web service and direct HTTP request methods have also been developed to facilitate public consumption of the datasets online.
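The querying approach described (XQuery/XPath over pathway XML) can be illustrated with Python's standard ElementTree. The XML fragment below is a deliberately simplified, BioPAX-flavored stand-in, since real BioPAX is RDF/OWL with namespaces; element and attribute names are illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified pathway document; real BioPAX uses RDF/OWL with namespaces.
doc = """
<pathways>
  <pathway name="glycolysis">
    <reaction id="r1"><substrate>glucose</substrate></reaction>
    <reaction id="r2"><substrate>fructose-6-phosphate</substrate></reaction>
  </pathway>
</pathways>
"""

def substrates_of(xml_text, pathway_name):
    """Return substrate names for one pathway via an XPath-style query,
    analogous to what XQuery provides over an XML database."""
    root = ET.fromstring(xml_text)
    path = f".//pathway[@name='{pathway_name}']/reaction/substrate"
    return [el.text for el in root.findall(path)]
```

ElementTree supports only a subset of XPath, but predicates like `[@name='...']` cover simple aggregation queries of this shape.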

  14. BASE MAP DATASET, LOGAN COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  15. BASE MAP DATASET, KENDALL COUNTY, TEXAS, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  16. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  17. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  18. BASE MAP DATASET, ROGERS COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  19. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  20. BASE MAP DATASET, HARRISON COUNTY, TEXAS, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  1. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  2. BASE MAP DATASET, SEQUOYAH COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  3. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  4. BASE MAP DATASET, CADDO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  5. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  6. Environmental Dataset Gateway (EDG) Search Widget

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  7. BASE MAP DATASET, CHEROKEE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  8. Hajj and Umrah Event Recognition Datasets

    CERN Document Server

    Zawbaa, Hossam

    2012-01-01

    In this note, new Hajj and Umrah Event Recognition datasets (HUER) are presented. The demonstrated datasets are based on videos and images taken during the 2011-2012 Hajj and Umrah seasons. HUER is the first collection of datasets covering the six types of Hajj and Umrah ritual events (rotating in Tawaf around the Kaaba, performing Sa'y between Safa and Marwa, standing on the mount of Arafat, staying overnight in Muzdalifah, staying two or three days in Mina, and throwing Jamarat). The HUER datasets also contain video and image databases for nine types of human actions during Hajj and Umrah (walking, drinking Zamzam water, sleeping, smiling, eating, praying, sitting, shaving hair and performing ablutions, reading the holy Quran and making duaa). The spatial resolutions are 1280 x 720 pixels for images and 640 x 480 pixels for videos; the videos average 20 seconds in length at 30 frames per second.

  9. VT Hydrography Dataset - cartographic extract lines

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) VHDCARTO is a simplified version of the local resolution Vermont Hydrography Dataset (VHD) that has been enriched with stream perenniality, e.g.,...

  10. VT Hydrography Dataset - cartographic extract polygons

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) VHDCARTO is a simplified version of the local resolution Vermont Hydrography Dataset (VHD) that has been enriched with stream perenniality, e.g.,...

  11. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  12. BASE MAP DATASET, GARVIN COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  13. BASE MAP DATASET, OUACHITA COUNTY, ARKANSAS

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  14. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  15. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.
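A model-based simulation of the kind reviewed can be sketched as a scripted activity emitting jittered sensor events. The script format, sensor names and jitter model below are illustrative assumptions, not a specific approach from the review.

```python
import random

def simulate_activity(script, start=0.0, seed=42):
    """Generate a simulated smart-home sensor event stream.

    script: list of (sensor_id, mean_dwell_seconds) steps for one activity,
    e.g. a 'make tea' routine passing kitchen motion and kettle sensors.
    Returns a list of (timestamp, sensor_id) events with jittered dwell times.
    """
    rng = random.Random(seed)  # seeded for reproducible synthetic datasets
    t = start
    events = []
    for sensor, dwell in script:
        events.append((round(t, 2), sensor))
        t += rng.uniform(0.5 * dwell, 1.5 * dwell)  # jitter the dwell time
    return events
```

Running several seeds over a library of activity scripts yields a labeled dataset without any physical sensor deployment, which is the main appeal of the simulated approach.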

  16. BASE MAP DATASET, BRYAN COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  17. BASE MAP DATASET, DELAWARE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  18. BASE MAP DATASET, STEPHENS COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  19. BASE MAP DATASET, WOODWARD COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  20. BASE MAP DATASET, HOWARD COUNTY, ARKANSAS

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  1. Grand Canyon as a universally accessible virtual field trip for intro Geoscience classes using geo-referenced mobile game technology

    Science.gov (United States)

    Bursztyn, N.; Pederson, J. L.; Shelton, B.

    2012-12-01

    There is a well-documented and nationally reported trend of declining interest, poor preparedness, and lack of diversity within U.S. students pursuing geoscience and other STEM disciplines. We suggest that a primary contributing factor to this problem is that introductory geoscience courses simply fail to inspire (i.e. they are boring). Our experience leads us to believe that the hands-on, contextualized learning of field excursions is often the most impactful component of lower division geoscience classes. However, field trips are becoming increasingly difficult to run due to logistics and liability, high enrollments, decreasing financial and administrative support, and the exclusion of students with physical disabilities. Recent research suggests that virtual field trips can simulate this contextualized physical learning through the use of mobile devices - technology that exists in most students' hands already. Our overarching goal is to enhance interest in introductory geoscience courses by providing the kinetic and physical learning experience of field trips through geo-referenced educational mobile games, and to test the hypothesis that these experiences can be effectively simulated through virtual field trips. We are doing this by developing "serious" games for mobile devices that deliver introductory geology material in a fun and interactive manner. Our new teaching strategy will enhance undergraduate student learning in the geosciences, be accessible to students of diverse backgrounds and physical abilities, and be easily incorporated into higher education programs and curricula at institutions globally. Our prototype involves students virtually navigating downstream along a scaled-down Colorado River through Grand Canyon - physically moving around their campus quad, football field or other real location, using their smart phone or a tablet. As students reach the next designated location, a photo or video in Grand Canyon appears along with a geological
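The location-trigger mechanic described (content appears when the student reaches the next waypoint) is commonly implemented as a geofence test using the haversine distance. This sketch is an assumed mechanism, not the prototype's actual code, and the 15 m trigger radius is illustrative.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    R = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def reached_waypoint(pos, waypoint, radius_m=15.0):
    """True when the device is inside the trigger radius of the next stop."""
    return haversine_m(pos[0], pos[1], waypoint[0], waypoint[1]) <= radius_m
```

When the test passes, the app would advance to the next waypoint and display the associated Grand Canyon media.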

  2. Relevancy Ranking of Satellite Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2017-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as web pages. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
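Two of the heuristics mentioned, time-range overlap and version novelty, can be combined with keyword matching into a single score. The weights, field names and scoring form below are illustrative assumptions, not the Common Metadata Repository algorithm.

```python
def temporal_overlap(q_range, d_range):
    """Fraction of the query time range covered by the dataset."""
    q0, q1 = q_range
    d0, d1 = d_range
    overlap = max(0.0, min(q1, d1) - max(q0, d0))
    return overlap / (q1 - q0) if q1 > q0 else 0.0

def score_dataset(query_terms, dataset, q_time):
    """Combine ranking heuristics into one relevance score.

    dataset: dict with 'keywords', 'time_range' and 'version_age' fields
    (hypothetical metadata shape). Weights are illustrative.
    """
    terms = set(query_terms)
    kw = len(terms & set(dataset["keywords"])) / max(len(terms), 1)
    t = temporal_overlap(q_time, dataset["time_range"])
    novelty = 1.0 / dataset["version_age"]  # newer versions rank higher
    return 0.6 * kw + 0.3 * t + 0.1 * novelty
```

Datasets returned for a query would then be sorted by this score in descending order.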

  3. Real-world datasets for portfolio selection and solutions of some stochastic dominance portfolio models

    Directory of Open Access Journals (Sweden)

    Renato Bruni

    2016-09-01

    We provide here several datasets for portfolio selection generated using real-world price values from several major stock markets. The datasets contain weekly return values, adjusted for dividends and for stock splits, which have been cleaned of errors as much as possible. The datasets are available in different formats and can be used as benchmarks for testing the performance of portfolio selection models and for comparing the efficiency of the algorithms used to solve them. We also provide, for these datasets, the portfolios obtained by several selection strategies based on Stochastic Dominance models (see “On Exact and Approximate Stochastic Dominance Strategies for Portfolio Selection”, Bruni et al. [2]). We believe that testing portfolio models on publicly available datasets greatly simplifies the comparison of the different portfolio selection strategies.
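Weekly returns of the kind these datasets contain are derived from a split- and dividend-adjusted price series; using adjusted prices means the simple return already reflects total return. A minimal sketch (the function name is illustrative):

```python
def weekly_returns(adjusted_prices):
    """Convert a weekly adjusted-price series into simple returns.

    Because the prices are already adjusted for dividends and splits,
    (p1 - p0) / p0 captures the total weekly return.
    """
    return [(p1 - p0) / p0
            for p0, p1 in zip(adjusted_prices, adjusted_prices[1:])]
```

A series of n prices yields n - 1 returns, one per week-over-week step.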

  4. Geo-referenced multimedia environmental fate model (G-CIEMS): model formulation and comparison to the generic model and monitoring approaches.

    Science.gov (United States)

    Suzuki, Noriyuki; Murasawa, Kaori; Sakurai, Takeo; Nansai, Keisuke; Matsuhashi, Keisuke; Moriguchi, Yuichi; Tanabe, Kiyoshi; Nakasugi, Osami; Morita, Masatoshi

    2004-11-01

    A spatially resolved and geo-referenced dynamic multimedia environmental fate model, G-CIEMS (Grid-Catchment Integrated Environmental Modeling System), was developed on a geographical information system (GIS). A case study for Japan, based on air grid cells of 5 x 5 km resolution and catchments with an average area of 9.3 km2 (corresponding to about 40,000 air grid cells and 38,000 river segments/catchment polygons), was performed for dioxins, benzene, 1,3-butadiene, and di-(2-ethylhexyl)phthalate. The averaged concentrations from the model and the monitoring output were within a factor of 2-3 for all media. Outputs from G-CIEMS and the generic model were essentially comparable when identical parameters were employed, whereas the G-CIEMS model gave explicit information on the distribution of chemicals in the environment. Exposure-weighted averaged concentrations (EWAC) in air were calculated to estimate the exposure of the population, based on the results of the generic, G-CIEMS, and monitoring approaches. The G-CIEMS approach showed significantly better agreement with the monitoring-derived EWAC than the generic model approach. The implication for the use of a geo-referenced modeling approach in the risk assessment scheme is discussed as a generic-spatial approach, which can provide more accurate exposure estimation with distribution information, using generally available data sources for a wide range of chemicals.
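The EWAC metric used to compare the approaches weights each grid cell's air concentration by the population living there, so cells with more residents contribute more to the estimated population exposure. A minimal sketch (the population-weighted mean is standard; the function interface is an assumption):

```python
def ewac(concentrations, populations):
    """Exposure-weighted averaged concentration over grid cells.

    EWAC = sum(pop_i * conc_i) / sum(pop_i): a population-weighted mean
    of per-cell air concentrations.
    """
    total_pop = sum(populations)
    return sum(c * p for c, p in zip(concentrations, populations)) / total_pop
```

With two cells at 1.0 and 3.0 units and populations of 100 and 300, the EWAC is pulled toward the more populated cell.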

  5. Systematic Error Analysis of Direct Georeferencing for ALOS PRISM Imagery

    Institute of Scientific and Technical Information of China (English)

    Lei, Rong; Fan, Dazhao; Liu, Chubin; Ma, Qiuhe

    2011-01-01

    The calibration of constant systematic errors must be taken into account when using ALOS PRISM imagery for direct georeferencing. Starting from the imaging principle of the ALOS PRISM sensor, the systematic errors that may exist when georeferencing with the rigorous geometric model were analyzed, and three different calibration models were used to analyze and correct them. Experimental results showed that constant systematic errors exist in the imagery ancillary data. Once these errors are eliminated using a small number of ground control points, positioning accuracy improves noticeably: about 3 m in the X direction and better than 2 m in the Y and Z directions.
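Removing a constant systematic error with a few ground control points, as the abstract describes, amounts to estimating a mean offset per axis and subtracting it. This sketch assumes the simplest per-axis constant-bias model (the study compares three calibration models; interfaces here are illustrative):

```python
def estimate_bias(predicted, observed):
    """Estimate a constant georeferencing bias per axis from control points.

    predicted/observed: lists of (x, y, z) ground coordinates, where
    'predicted' comes from direct georeferencing with the rigorous model
    and 'observed' are surveyed control point positions.
    Returns the mean offset to subtract from subsequent predictions.
    """
    n = len(predicted)
    return tuple(sum(p[i] - o[i] for p, o in zip(predicted, observed)) / n
                 for i in range(3))

def apply_bias(point, bias):
    """Correct one directly georeferenced point with the estimated bias."""
    return tuple(c - b for c, b in zip(point, bias))
```

After calibration on a handful of control points, the same offset is applied to every other point located from the imagery.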

  6. Preschool Facilities, Location and contact information for public and private schools from the preschool through the university level in Rhode Island as listed by Rhode Island Department of Education for school year 2008. The intention of this dataset was to provide an overv, Published in 2008, 1:4800 (1in=400ft) scale, State of Rhode Island and Providence Plantations.

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — This Preschool Facilities dataset, published at 1:4800 (1in=400ft) scale, was produced all or in part from Orthoimagery information as of 2008. It is described as...

  7. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow-survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the common dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011: Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. The multibeam datasets collected for the Shallow Survey were processed for detailed analysis using Triton's Perspective processing software. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and
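Gridding at a fixed resolution, as described above at 0.5 and 1.0 m, can be illustrated with plain mean binning of soundings. This is only a toy stand-in: CUBE itself is far more sophisticated, maintaining uncertainty-weighted depth hypotheses at each grid node.

```python
def grid_mean_depth(soundings, cell=1.0):
    """Average (x, y, z) soundings into square cells of side `cell` metres;
    a plain-mean stand-in for the CUBE gridding described above."""
    sums, counts = {}, {}
    for x, y, z in soundings:
        key = (int(x // cell), int(y // cell))
        sums[key] = sums.get(key, 0.0) + z
        counts[key] = counts.get(key, 0) + 1
    return {k: s / counts[k] for k, s in sums.items()}

# Two soundings falling in the same 1 m cell are averaged.
print(grid_mean_depth([(0.2, 0.3, 10.0), (0.8, 0.6, 14.0)]))  # → {(0, 0): 12.0}
```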

  8. Two ultraviolet radiation datasets that cover China

    Science.gov (United States)

    Liu, Hui; Hu, Bo; Wang, Yuesi; Liu, Guangren; Tang, Liqin; Ji, Dongsheng; Bai, Yongfei; Bao, Weikai; Chen, Xin; Chen, Yunming; Ding, Weixin; Han, Xiaozeng; He, Fei; Huang, Hui; Huang, Zhenying; Li, Xinrong; Li, Yan; Liu, Wenzhao; Lin, Luxiang; Ouyang, Zhu; Qin, Boqiang; Shen, Weijun; Shen, Yanjun; Su, Hongxin; Song, Changchun; Sun, Bo; Sun, Song; Wang, Anzhi; Wang, Genxu; Wang, Huimin; Wang, Silong; Wang, Youshao; Wei, Wenxue; Xie, Ping; Xie, Zongqiang; Yan, Xiaoyuan; Zeng, Fanjiang; Zhang, Fawei; Zhang, Yangjian; Zhang, Yiping; Zhao, Chengyi; Zhao, Wenzhi; Zhao, Xueyong; Zhou, Guoyi; Zhu, Bo

    2017-07-01

    Ultraviolet (UV) radiation has significant effects on ecosystems, environments, and human health, as well as atmospheric processes and climate change. Two ultraviolet radiation datasets are described in this paper. One contains hourly observations of UV radiation measured at 40 Chinese Ecosystem Research Network stations from 2005 to 2015. CUV3 broadband radiometers were used to observe the UV radiation, with an accuracy of 5%, which meets the World Meteorological Organization's measurement standards. The extremum method was used to control the quality of the measured datasets. The other dataset contains daily cumulative UV radiation estimates that were calculated using an all-sky estimation model combined with a hybrid model. The reconstructed daily UV radiation data span from 1961 to 2014. The mean absolute bias error and root-mean-square error are smaller than 30% at most stations, and most of the mean bias error values are negative, which indicates underestimation of the UV radiation intensity. These datasets can improve our basic knowledge of the spatial and temporal variations in UV radiation. Additionally, these datasets can be used in studies of potential ozone formation and atmospheric oxidation, as well as simulations of ecological processes.
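The error metrics quoted above can be computed as follows; this is a generic sketch of the standard definitions, not the authors' code:

```python
import math

def mean_bias_error(estimates, observations):
    """Mean of (estimate - observation); negative values mean the model
    underestimates, as reported for the reconstructed UV data above."""
    return sum(e - o for e, o in zip(estimates, observations)) / len(observations)

def rmse(estimates, observations):
    """Root-mean-square error of estimates against observations."""
    return math.sqrt(sum((e - o) ** 2
                         for e, o in zip(estimates, observations)) / len(observations))

# Symmetric errors cancel in the bias but not in the RMSE.
print(mean_bias_error([9.0, 11.0], [10.0, 10.0]))  # → 0.0
print(rmse([9.0, 11.0], [10.0, 10.0]))  # → 1.0
```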

  9. Pbm: A new dataset for blog mining

    CERN Document Server

    Aziz, Mehwish

    2012-01-01

    Text mining is becoming vital as Web 2.0 offers collaborative content creation and sharing, and researchers now have a growing interest in text mining methods for discovering knowledge. Text mining researchers come from a variety of areas, such as natural language processing, computational linguistics, machine learning, and statistics. A typical text mining application involves preprocessing of text, stemming and lemmatization, tagging and annotation, deriving knowledge patterns, and evaluating and interpreting the results. There are numerous approaches for performing text mining tasks, such as clustering, categorization, sentiment analysis, and summarization. There is a growing need to standardize the evaluation of these tasks, and one major component of establishing standardization is to provide standard datasets for them. Although various standard datasets are available for traditional text mining tasks, there are very few, and expensive, datasets for the blog-mining task. Blogs, a new genre in Web 2.0, are a digital...

  10. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.

  11. Genomics dataset of unidentified disclosed isolates.

    Science.gov (United States)

    Rekadwad, Bhagwan N

    2016-09-01

    Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
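The AT/GC content analyzed in this dataset is a direct base count over the sequence; a minimal sketch (the helper name is hypothetical):

```python
def at_gc_content(seq):
    """Return (AT%, GC%) of a DNA sequence; a higher GC fraction implies
    higher thermal stability of the duplex, the property studied above."""
    seq = seq.upper()
    n = len(seq)
    at = sum(seq.count(b) for b in "AT")
    gc = sum(seq.count(b) for b in "GC")
    return 100.0 * at / n, 100.0 * gc / n

print(at_gc_content("ATGC"))  # → (50.0, 50.0)
```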

  12. Spatial Evolution of Openstreetmap Dataset in Turkey

    Science.gov (United States)

    Zia, M.; Seker, D. Z.; Cakir, Z.

    2016-10-01

    A large amount of research work has already been done in recent years regarding many aspects of the OpenStreetMap (OSM) dataset for developed countries and major world cities. On the other hand, limited work is present in the scientific literature for developing or underdeveloped ones, because of poor data coverage. In the presented study it is demonstrated how the Turkey-OSM dataset has spatially evolved over an 8-year time span (2007-2015) throughout the country. It is observed that there is an east-west spatial bias in OSM feature density across the country. Population density and literacy level are found to be the two main governing factors controlling this spatial trend. Future research may involve considering contributors' involvement and assessing dataset health.

  13. Visualising Large Datasets in TOPCAT v4

    CERN Document Server

    Taylor, Mark

    2014-01-01

    TOPCAT is a widely used desktop application for manipulation of astronomical catalogues and other tables, which has long provided fast interactive visualisation features including 1, 2 and 3-d plots, multiple datasets, linked views, color coding, transparency and more. In Version 4 a new plotting library has been written from scratch to deliver new and enhanced visualisation capabilities. This paper describes some of the considerations in the design and implementation, particularly in regard to providing comprehensible interactive visualisation for multi-million point datasets.

  14. ArcHydro global datasets for Hawaii StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset consists of a personal geodatabase containing several vector datasets. These datasets may be used with the ArcHydro Tools, developed by ESRI in...

  15. Precise georeferencing using the rigorous sensor model and rational function model for ZiYuan-3 strip scenes with minimum control

    Science.gov (United States)

    Pan, Hongbo; Tao, Chao; Zou, Zhengrong

    2016-09-01

    The rigorous sensor model (RSM) and the rational function model (RFM) are the most widely used geometric models for georeferencing. Even though geometric calibration and bundle adjustment with the RFM has been carried out for the ZiYuan-3 (ZY-3) earth observation satellite, few studies determined the major error sources affecting the three line cameras (TLCs). In this work, we propose a new set of compensation parameters, the shift and drift of both pitch and roll angle, for the RSM, since the yaw angle error is not as significant as the pitch angle for very narrow field of view images. Corresponding bias compensation methods are also validated for the RFM. Seven continuous strip scenes from the ZY-3 TLCs are used for the experiments, for which the root mean square error (RMSE) in the image space and object space are calculated. The experimental results demonstrate that the proposed method can model the major errors and achieve the same accuracy as the use of redundant parameters. With this model, the RMSEs of the checkpoints are 2.048 m in planimetry and 1.256 m in height. The RMSEs would increase to 2.522 m in planimetry and 2.635 m in height if the drift parameters were ignored. However, subpixel georeferencing accuracy is not as sensitive as the RMSE in the object space, since the RMSE of the height increases to 2.6 m compared to 1.3 m, while the change of the RMSE in the image space is within 0.1 pixels. In addition, the relationships among the TLCs are dynamic during imaging. Compensation for the TLCs as a unit introduces a height error of about 1 m, while maintaining subpixel georeferencing accuracy. Two ground control points (GCPs) placed at the beginning and the end of a strip are preferred to reduce oscillation and point picking errors. Compared with the RSM, the RFM can achieve similar accuracy when the drift compensation model and shift compensation model are applied.
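The shift-and-drift compensation proposed above models each attitude-angle error as a constant offset plus a linear drift over imaging time. A toy sketch under that assumption (the function and parameter names are mine, not the paper's):

```python
def compensate(angle, t, shift, drift):
    """Corrected attitude angle: observed angle plus shift + drift * t,
    the two-parameter model applied to pitch and roll above."""
    return angle + shift + drift * t

# The correction grows linearly along the strip with scan time t.
print(compensate(0.0, 10.0, 0.5, 0.1))  # → 1.5
```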

  16. Diffusion-Based Density-Equalizing Maps: an Interdisciplinary Approach to Visualizing Homicide Rates and Other Georeferenced Statistical Data

    Science.gov (United States)

    Mazzitello, Karina I.; Candia, Julián

    2012-12-01

    In every country, public and private agencies allocate extensive funding to collect large-scale statistical data, which in turn are studied and analyzed in order to determine local, regional, national, and international policies regarding all aspects relevant to the welfare of society. One important aspect of that process is the visualization of statistical data with embedded geographical information, which most often relies on archaic methods such as maps colored according to graded scales. In this work, we apply nonstandard visualization techniques based on physical principles. We illustrate the method with recent statistics on homicide rates in Brazil and their correlation to other publicly available data. This physics-based approach provides a novel tool that can be used by interdisciplinary teams investigating statistics and model projections in a variety of fields such as economics and gross domestic product research, public health and epidemiology, sociodemographics, political science, business and marketing, and many others.
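A 1-D toy of the diffusion principle behind density-equalizing maps: the full Gastner-Newman-style method also advects region boundaries with the diffusion flux, whereas this sketch only shows densities relaxing toward uniformity while total density is conserved.

```python
def diffuse(density, steps=2000, k=0.1):
    """Explicit diffusion steps with reflecting (zero-flux) ends; the total
    is conserved while local density differences are smoothed away."""
    d = list(density)
    for _ in range(steps):
        nxt = d[:]
        for i in range(len(d)):
            left = d[i - 1] if i > 0 else d[i]
            right = d[i + 1] if i < len(d) - 1 else d[i]
            nxt[i] = d[i] + k * (left - 2 * d[i] + right)
        d = nxt
    return d

d = diffuse([8.0, 0.0, 0.0, 0.0])
print(round(sum(d), 6), round(max(d) - min(d), 6))  # → 8.0 0.0
```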

  17. Diffusion-Based Density-Equalizing Maps: an Interdisciplinary Approach to Visualizing Homicide Rates and Other Georeferenced Statistical Data

    CERN Document Server

    Mazzitello, Karina I

    2012-01-01

    In every country, public and private agencies allocate extensive funding to collect large-scale statistical data, which in turn are studied and analyzed in order to determine local, regional, national, and international policies regarding all aspects relevant to the welfare of society. One important aspect of that process is the visualization of statistical data with embedded geographical information, which most often relies on archaic methods such as maps colored according to graded scales. In this work, we apply non-standard visualization techniques based on physical principles. We illustrate the method with recent statistics on homicide rates in Brazil and their correlation to other publicly available data. This physics-based approach provides a novel tool that can be used by interdisciplinary teams investigating statistics and model projections in a variety of fields such as economics and gross domestic product research, public health and epidemiology, socio-demographics, political science, business and m...

  18. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  19. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set...

  20. A Neural Network Classifier of Volume Datasets

    CERN Document Server

    Zukić, Dženan; Kolb, Andreas

    2009-01-01

    Many state-of-the-art visualization techniques must be tailored to the specific type of dataset, its modality (CT, MRI, etc.), the recorded object or anatomical region (head, spine, abdomen, etc.) and other parameters related to the data acquisition process. While parts of the information (imaging modality and acquisition sequence) may be obtained from the meta-data stored with the volume scan, there is important information which is not stored explicitly (anatomical region, tracing compound). Also, meta-data might be incomplete, inappropriate or simply missing. This paper presents a novel and simple method of determining the type of dataset from previously defined categories. 2D histograms based on intensity and gradient magnitude of datasets are used as input to a neural network, which classifies it into one of several categories it was trained with. The proposed method is an important building block for visualization systems to be used autonomously by non-experts. The method has been tested on 80 datasets,...
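The classifier input described above is a joint histogram of voxel intensity and gradient magnitude. A minimal sketch of building such a histogram (bin count and value ranges are assumptions):

```python
def hist2d(intensities, gradients, bins=8, imax=256.0, gmax=256.0):
    """Joint (intensity, gradient-magnitude) histogram; flattened, such a
    fixed-size array can serve as the neural network input described above."""
    h = [[0] * bins for _ in range(bins)]
    for v, g in zip(intensities, gradients):
        i = min(int(v / imax * bins), bins - 1)
        j = min(int(g / gmax * bins), bins - 1)
        h[i][j] += 1
    return h

h = hist2d([0, 255, 128], [0, 255, 10])
print(h[0][0], h[7][7], h[4][0])  # → 1 1 1
```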

  1. GPS receivers for georeferencing of spatial variability of soil attributes Receptores GPS para georreferenciamento da variabilidade espacial de atributos do solo

    Directory of Open Access Journals (Sweden)

    David L Rosalen

    2011-12-01

    Full Text Available The characterization of the spatial variability of soil attributes is essential to support agricultural practices in a sustainable manner. The use of geostatistics to characterize the spatial variability of these attributes, such as soil resistance to penetration (RP) and gravimetric soil moisture (GM), is now usual practice in precision agriculture. The result of geostatistical analysis depends on the sample density and on other factors, such as the georeferencing methodology used. Thus, this study aimed to compare two methods of georeferencing for characterizing the spatial variability of RP and GM, as well as the spatial correlation of these variables. A sampling grid of 60 points spaced 20 m apart was used. An electronic penetrometer was used for the RP measurements, and a Dutch auger (0.0-0.1 m depth) was used to determine GM. The samples were georeferenced using Simple Point Positioning (SPP) with a navigation GPS receiver and Semi-Kinematic Relative Positioning (SKRP) with an L1 geodetic GPS receiver. The results indicated that georeferencing conducted by SPP did not affect the characterization of the spatial variability of RP or GM, nor the spatial structure relationship of these attributes.
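The geostatistical characterization above rests on the experimental semivariogram. A minimal sketch of the classical Matheron estimator (not the authors' software):

```python
import math

def semivariance(values, coords, lag, tol):
    """Half the mean squared difference over point pairs whose separation
    distance lies within lag +/- tol (classical Matheron estimator)."""
    num, pairs = 0.0, 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if abs(math.dist(coords[i], coords[j]) - lag) <= tol:
                num += (values[i] - values[j]) ** 2
                pairs += 1
    return num / (2 * pairs) if pairs else float("nan")

# Two points 20 m apart with values 1.0 and 3.0 give gamma(20) = 2.0.
print(semivariance([1.0, 3.0], [(0, 0), (20, 0)], lag=20, tol=1))  # → 2.0
```

Evaluating this at several lags (here, multiples of the 20 m grid spacing) yields the empirical variogram to which a model is fitted.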

  2. Birds of Antioquia: Georeferenced database of specimens from the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA)

    Science.gov (United States)

    Rozo, Andrea Morales; Valencia, Fernando; Acosta, Alexis; Parra, Juan Luis

    2014-01-01

    Abstract The department of Antioquia, Colombia, lies in the northwestern corner of South America and provides a biogeographical link among divergent faunas, including Caribbean, Andean, Pacific and Amazonian. Information about the distribution of biodiversity in this area is of relevance for academic, practical and social purposes. This data paper describes the dataset containing all bird specimens deposited in the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA). We curated all the information associated with the bird specimens, including the georeferences and taxonomy, and published the database through the Global Biodiversity Information Facility network. During this process we checked the species identification and existing georeferences and completed the information when possible. The collection holds 663 bird specimens collected between 1940 and 2011. Even though most specimens are from Antioquia (70%), the collection includes material from several other departments and one specimen from the United States. The collection holds specimens from three endemic and endangered species (Coeligena orina, Diglossa gloriossisima, and Hypopirrhus pyrohipogaster), and includes localities poorly represented in other collections. The information contained in the collection has been used for biodiversity modeling, conservation planning and management, and we expect to further facilitate these activities by making it publicly available. PMID:24899851

  3. Birds of Antioquia: Georeferenced database of specimens from the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA).

    Science.gov (United States)

    Rozo, Andrea Morales; Valencia, Fernando; Acosta, Alexis; Parra, Juan Luis

    2014-01-01

    The department of Antioquia, Colombia, lies in the northwestern corner of South America and provides a biogeographical link among divergent faunas, including Caribbean, Andean, Pacific and Amazonian. Information about the distribution of biodiversity in this area is of relevance for academic, practical and social purposes. This data paper describes the dataset containing all bird specimens deposited in the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA). We curated all the information associated with the bird specimens, including the georeferences and taxonomy, and published the database through the Global Biodiversity Information Facility network. During this process we checked the species identification and existing georeferences and completed the information when possible. The collection holds 663 bird specimens collected between 1940 and 2011. Even though most specimens are from Antioquia (70%), the collection includes material from several other departments and one specimen from the United States. The collection holds specimens from three endemic and endangered species (Coeligena orina, Diglossa gloriossisima, and Hypopirrhus pyrohipogaster), and includes localities poorly represented in other collections. The information contained in the collection has been used for biodiversity modeling, conservation planning and management, and we expect to further facilitate these activities by making it publicly available.

  4. Birds of Antioquia: Georeferenced database of specimens from the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA)

    Directory of Open Access Journals (Sweden)

    Andrea Morales Rozo

    2014-05-01

    Full Text Available The department of Antioquia, Colombia, lies in the northwestern corner of South America and provides a biogeographical link among divergent faunas, including Caribbean, Andean, Pacific and Amazonian. Information about the distribution of biodiversity in this area is of relevance for academic, practical and social purposes. This data paper describes the dataset containing all bird specimens deposited in the Colección de Ciencias Naturales del Museo Universitario de la Universidad de Antioquia (MUA). We curated all the information associated with the bird specimens, including the georeferences and taxonomy, and published the database through the Global Biodiversity Information Facility network. During this process we checked the species identification and existing georeferences and completed the information when possible. The collection holds 663 bird specimens collected between 1940 and 2011. Even though most specimens are from Antioquia (70%), the collection includes material from several other departments and one specimen from the United States. The collection holds specimens from three endemic and endangered species (Coeligena orina, Diglossa gloriossisima, and Hypopirrhus pyrohipogaster), and includes localities poorly represented in other collections. The information contained in the collection has been used for biodiversity modeling, conservation planning and management, and we expect to further facilitate these activities by making it publicly available.

  5. Interpolation of diffusion weighted imaging datasets.

    Science.gov (United States)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W; Reislev, Nina L; Paulson, Olaf B; Ptito, Maurice; Siebner, Hartwig R

    2014-12-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal to the voxel size showed that conventional higher-order interpolation methods improved the geometrical representation of white-matter tracts with reduced partial-volume-effect (PVE), except at tract boundaries. Simulations and interpolation of ex-vivo monkey brain DWI datasets revealed that conventional interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. For validation, we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical resolution and more anatomical details in complex regions such as tract boundaries and cortical layers, which are normally only visualized at higher image resolutions. Similar results were found with a typical clinical human DWI dataset. However, a possible bias in quantitative values imposed by the interpolation method used should be considered. The results indicate that conventional interpolation methods can be successfully applied to DWI datasets for mining anatomical details that are normally seen only at higher resolutions, which will aid in tractography and microstructural mapping of tissue compartments.
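The interpolation step evaluated above upsamples the volume before fibre reconstruction. The paper uses higher-order methods on 3-D volumes; the idea can be shown with a 1-D linear toy:

```python
def upsample_linear(profile, factor):
    """Linearly interpolate a 1-D profile to factor-times resolution;
    real DWI interpolation applies higher-order kernels in 3-D."""
    out = []
    for i in range(len(profile) - 1):
        for j in range(factor):
            t = j / factor
            out.append(profile[i] * (1 - t) + profile[i + 1] * t)
    out.append(profile[-1])
    return out

print(upsample_linear([0.0, 2.0], 2))  # → [0.0, 1.0, 2.0]
```

As the abstract cautions, interpolation adds no new information and may bias quantitative values; it only improves the geometrical representation of structures near the voxel scale.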

  6. DIRECT GEOREFERENCING ON SMALL UNMANNED AERIAL PLATFORMS FOR IMPROVED RELIABILITY AND ACCURACY OF MAPPING WITHOUT THE NEED FOR GROUND CONTROL POINTS

    Directory of Open Access Journals (Sweden)

    O. Mian

    2015-08-01

    Full Text Available This paper presents results from a Direct Mapping Solution (DMS) comprised of an Applanix APX-15 UAV GNSS-Inertial system integrated with a Sony a7R camera, used to produce highly accurate ortho-rectified imagery without ground control points on a Microdrones md4-1000 platform. A 55 millimeter Nikkor f/1.8 lens was mounted on the Sony a7R; the camera was focused and calibrated terrestrially using the Applanix camera calibration facility, and then integrated with the APX-15 UAV GNSS-Inertial system using a custom mount specifically designed for UAV applications. In July 2015, Applanix and Avyon carried out a test flight of this system. The goal of the test flight was to assess the performance of the DMS APX-15 UAV direct georeferencing system on the md4-1000. The area mapped during the test was a 250 x 300 meter block in a rural setting in Ontario, Canada. Several ground control points were distributed within the test area. The test included 8 north-south lines and 1 cross strip flown at 80 meters AGL, resulting in a ~1 centimeter Ground Sample Distance (GSD). Map products were generated from the test flight using direct georeferencing, and then compared for accuracy against the known positions of ground control points in the test area. The GNSS-Inertial data collected by the APX-15 UAV was post-processed in Single Base mode via POSPac UAV, using a base station located in the project area. The base station's position was precisely determined by processing a 12-hour session using the CSRS-PPP post-processing service. The ground control points were surveyed using differential GNSS post-processing techniques with respect to the base station.

  7. Reference datasets for 2-treatment, 2-sequence, 2-period bioequivalence studies.

    Science.gov (United States)

    Schütz, Helmut; Labes, Detlew; Fuglsang, Anders

    2014-11-01

    It is difficult to validate statistical software used to assess bioequivalence since very few datasets with known results are in the public domain, and the few that are published are of moderate size and balanced. The purpose of this paper is therefore to introduce reference datasets of varying complexity in terms of dataset size and characteristics (balance, range, outlier presence, residual error distribution) for 2-treatment, 2-period, 2-sequence bioequivalence studies and to report their point estimates and 90% confidence intervals which companies can use to validate their installations. The results for these datasets were calculated using the commercial packages EquivTest, Kinetica, SAS and WinNonlin, and the non-commercial package R. The results of three of these packages mostly agree, but imbalance between sequences seems to provoke questionable results with one package, which illustrates well the need for proper software validation.
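A stripped-down sketch of the quantity these packages report: the geometric-mean-ratio point estimate with its confidence interval. A real 2x2x2 analysis fits an ANOVA with sequence and period effects; this paired version, with a caller-supplied t quantile, only illustrates the back-transformed arithmetic and is not a substitute for validated software.

```python
import math
import statistics

def gmr_ci(log_diffs, t_crit):
    """Point estimate and confidence interval for the test/reference geometric
    mean ratio from within-subject log-scale differences; t_crit is the
    caller-supplied t quantile for the chosen level and degrees of freedom."""
    n = len(log_diffs)
    mean = statistics.fmean(log_diffs)
    se = statistics.stdev(log_diffs) / math.sqrt(n)
    return (math.exp(mean),
            math.exp(mean - t_crit * se),
            math.exp(mean + t_crit * se))

# Identical test and reference responses give a ratio of exactly 1.
print(gmr_ci([0.0, 0.0, 0.0], t_crit=2.92))  # → (1.0, 1.0, 1.0)
```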

  8. Method of generating features optimal to a dataset and classifier

    Energy Technology Data Exchange (ETDEWEB)

    Bruillard, Paul J.; Gosink, Luke J.; Jarman, Kenneth D.

    2016-10-18

    A method of generating features optimal to a particular dataset and classifier is disclosed. A dataset of messages is inputted and a classifier is selected. An algebra of features is encoded. Computable features that are capable of describing the dataset from the algebra of features are selected. Irredundant features that are optimal for the classifier and the dataset are selected.

  9. A Tenebrionid beetle’s dataset (Coleoptera, Tenebrionidae) from Peninsula Valdés (Chubut, Argentina)

    Directory of Open Access Journals (Sweden)

    German Cheli

    2013-12-01

    Full Text Available The Natural Protected Area Peninsula Valdés, located in northeastern Patagonia, is one of the largest conservation units of arid lands in Argentina. Although this area has been on the UNESCO World Heritage List since 1999, it has been continually exposed to sheep grazing and cattle farming for more than a century, which have had a negative impact on the local environment. Our aim is to describe the first dataset of tenebrionid beetle species living in Peninsula Valdés and their relationship to sheep grazing. The dataset contains 118 records on 11 species and 198 adult individuals collected. Beetles were collected using pitfall traps in the two major environmental units of Peninsula Valdés, taking into account grazing intensities, over a three-year time frame (2005–2007). Data quality was enhanced by following the best practices suggested in the literature during the digitization and georeferencing processes. Moreover, the identification of specimens and the current accurate spelling of scientific names were reviewed. Finally, post-validation processes using DarwinTest software were applied. Specimens have been deposited at the Entomological Collection of the Centro Nacional Patagónico (CENPAT-CONICET). The dataset is part of the database of this collection and has been published on the internet through the GBIF Integrated Publishing Toolkit (IPT) (http://data.gbif.org/datasets/resource/14669/). Furthermore, it is the first dataset for tenebrionid beetles of arid Patagonia available in the GBIF database, and it is the first one based on a previously designed and standardized sampling to assess the interaction between these beetles and grazing in the area. The main purposes of this dataset are to ensure accessibility to data associated with Tenebrionidae specimens from Peninsula Valdés (Chubut, Argentina), to contribute to GBIF with primary data about Patagonian tenebrionids, and to promote the Entomological Collection of the Centro Nacional Patagónico.

  10. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  11. RTK: efficient rarefaction analysis of large datasets.

    Science.gov (United States)

    Saary, Paul; Forslund, Kristoffer; Bork, Peer; Hildebrand, Falk

    2017-08-15

    The rapidly expanding microbiomics field is generating increasingly larger datasets, characterizing the microbiota in diverse environments. Although classical numerical ecology methods provide a robust statistical framework for their analysis, software currently available is inadequate for large datasets and some computationally intensive tasks, like rarefaction and associated analysis. Here we present a software package for rarefaction analysis of large count matrices, as well as estimation and visualization of diversity, richness and evenness. Our software is designed for ease of use, operating at least 7x faster than existing solutions, despite requiring 10x less memory. C++ and R source code (GPL v.2) as well as binaries are available from https://github.com/hildebra/Rarefaction and from CRAN (https://cran.r-project.org/). bork@embl.de or falk.hildebrand@embl.de. Supplementary data are available at Bioinformatics online.
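
    Rarefaction of a count matrix amounts to repeatedly subsampling each sample's reads without replacement to a fixed depth and averaging the resulting richness. A minimal Python sketch of the idea (not RTK's C++ implementation) for a single sample:

    ```python
    import numpy as np

    def rarefy(counts, depth, n_rep=100, seed=0):
        """Expected species richness at a given depth, estimated by repeatedly
        subsampling reads without replacement and counting unique species."""
        rng = np.random.default_rng(seed)
        counts = np.asarray(counts)
        pool = np.repeat(np.arange(counts.size), counts)  # one entry per read
        rich = [np.unique(rng.choice(pool, size=depth, replace=False)).size
                for _ in range(n_rep)]
        return float(np.mean(rich))

    sample = [50, 30, 15, 4, 1]        # read counts for 5 species
    print(rarefy(sample, depth=20))    # expected richness at depth 20
    ```

    Rarefying to the full read depth recovers the observed richness exactly; shallower depths give the expected richness curve used to compare unevenly sequenced samples.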

  12. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal to the voxel size showed that conventional higher-order interpolation methods improved the geometrical representation of white-matter tracts with reduced partial-volume-effect (PVE), except at tract boundaries. Simulations and interpolation of ex-vivo monkey brain DWI datasets revealed that conventional...
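
    The interpolation step can be illustrated with an off-the-shelf spline resampler (a sketch using SciPy's `zoom`, not the study's actual pipeline): each DWI volume is upsampled to a higher image resolution before fitting the tensor model.

    ```python
    import numpy as np
    from scipy.ndimage import zoom

    def upsample_volume(volume, factor=2, order=3):
        """Resample a 3D volume to higher resolution with a cubic B-spline
        (order=3); order=1 would give trilinear interpolation instead."""
        return zoom(volume, zoom=factor, order=order)

    vol = np.random.default_rng(0).random((16, 16, 8))  # toy DWI volume
    hi = upsample_volume(vol, factor=2)
    print(hi.shape)  # → (32, 32, 16)
    ```

    Higher spline orders reduce blockiness between voxels, which is the "higher-order interpolation" benefit the abstract refers to, at the cost of ringing near sharp boundaries.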

  13. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific datasets, such as those output by chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  14. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM), as a 2D imaging instrument, has been widely used in many scientific disciplines including the biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However, SEM micrographs remain 2D images. To effectively measure and visualize the surface properties, we need to restore the 3D shape model from the 2D SEM images. Having 3D surfaces would provide the anatomic shape of micro-samples, allowing quantitative measurements and informative visualization of the specimens being investigated. 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples.

  15. Scalable Machine Learning for Massive Astronomical Datasets

    Science.gov (United States)

    Ball, Nicholas M.; Gray, A.

    2014-04-01

    We present the ability to perform data mining and machine learning operations on a catalog of half a billion astronomical objects. This is the result of the combination of robust, highly accurate machine learning algorithms with linear scalability that renders the applications of these algorithms to massive astronomical data tractable. We demonstrate the core algorithms: kernel density estimation, K-means clustering, linear regression, nearest neighbors, random forest and gradient-boosted decision tree, singular value decomposition, support vector machine, and two-point correlation function. Each of these is relevant for astronomical applications such as finding novel astrophysical objects, characterizing artifacts in data, object classification (including for rare objects), object distances, finding the important features describing objects, density estimation of distributions, probabilistic quantities, and exploring the unknown structure of new data. The software, Skytree Server, runs on any UNIX-based machine, a virtual machine, or cloud-based and distributed systems including Hadoop. We have integrated it on the cloud computing system of the Canadian Astronomical Data Centre, the Canadian Advanced Network for Astronomical Research (CANFAR), creating the world's first cloud computing data mining system for astronomy. We demonstrate results showing the scaling of each of our major algorithms on large astronomical datasets, including the full 470,992,970 objects of the 2 Micron All-Sky Survey (2MASS) Point Source Catalog. We demonstrate the ability to find outliers in the full 2MASS dataset utilizing multiple methods, e.g., nearest neighbors. This is likely of particular interest to the radio astronomy community given, for example, that survey projects contain groups dedicated to this topic. 2MASS is used as a proof-of-concept dataset due to its convenience and availability. These results are of interest to any astronomical project with large and/or complex
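
    The nearest-neighbour outlier search mentioned above can be sketched in a few lines (a brute-force toy version in NumPy; Skytree's value is doing this scalably, whereas this sketch is O(n²)): score each object by its distance to its k-th nearest neighbour, so isolated objects score high.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, (300, 2))   # toy "catalog" of objects
    X[0] = [8.0, 8.0]                    # one obvious outlier

    # Pairwise distances; mask self-distances on the diagonal
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)

    k = 5
    kth = np.sort(d, axis=1)[:, k - 1]   # distance to 5th nearest neighbour
    print(int(np.argmax(kth)))           # index of the strongest outlier → 0
    ```

    Ranking by k-th neighbour distance rather than first-neighbour distance makes the score robust to small clumps of artifacts.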

  16. Scalable persistent identifier systems for dynamic datasets

    Science.gov (United States)

    Golodoniuc, P.; Cox, S. J. D.; Klump, J. F.

    2016-12-01

    Reliable and persistent identification of objects, whether tangible or not, is essential in information management. Many Internet-based systems have been developed to identify digital data objects, e.g., PURL, LSID, Handle, ARK. These were largely designed for identification of static digital objects. The amount of data made available online has grown exponentially over the last two decades and fine-grained identification of dynamically generated data objects within large datasets using conventional systems (e.g., PURL) has become impractical. We have compared capabilities of various technological solutions to enable resolvability of data objects in dynamic datasets, and developed a dataset-centric approach to resolution of identifiers. This is particularly important in Semantic Linked Data environments where dynamic, frequently changing data is delivered live via web services, so registration of individual data objects to obtain identifiers is impractical. We use identifier patterns and pattern hierarchies for identification of data objects, which allows relationships between identifiers to be expressed, and also provides means for resolving a single identifier into multiple forms (i.e. views or representations of an object). The latter can be implemented through (a) HTTP content negotiation, or (b) use of URI querystring parameters. The pattern and hierarchy approach has been implemented in the Linked Data API supporting the United Nations Spatial Data Infrastructure (UNSDI) initiative and later in the implementation of geoscientific data delivery for the Capricorn Distal Footprints project using International Geo Sample Numbers (IGSN). This enables flexible resolution of multi-view persistent identifiers and provides a scalable solution for large heterogeneous datasets.
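
    The pattern-based resolution described here can be sketched as a small lookup: an identifier pattern captures the object key, and a format parameter (supplied via the querystring, or chosen by HTTP content negotiation) selects which representation to return. All patterns and URLs below are hypothetical illustrations, not the UNSDI or IGSN deployments:

    ```python
    import re

    # Hypothetical pattern registry: one identifier pattern, several views.
    PATTERNS = {
        r"^/id/igsn/(?P<igsn>[A-Z0-9]+)$": {
            "html": "https://example.org/sample/{igsn}.html",
            "json": "https://example.org/api/sample/{igsn}.json",
        },
    }

    def resolve(path, fmt="html"):
        """Resolve an identifier path to the URL of the requested view."""
        for pattern, views in PATTERNS.items():
            match = re.match(pattern, path)
            if match:
                return views[fmt].format(**match.groupdict())
        raise KeyError(f"no pattern matches {path}")

    print(resolve("/id/igsn/AU1234", fmt="json"))
    # → https://example.org/api/sample/AU1234.json
    ```

    Because the pattern, not the individual object, is registered, new data objects matching an existing pattern resolve immediately without any per-object registration step.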

  17. Data Assimilation and Model Evaluation Experiment Datasets.

    Science.gov (United States)

    Lai, Chung-Chieng A.; Qian, Wen; Glenn, Scott M.

    1994-05-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of the experiment are briefly stated. The preparation of the DAMEE datasets consisted of a series of processes: 1) collection of observational data; 2) analysis and interpretation; 3) interpolation using the Optimum Thermal Interpolation System package; 4) quality control and re-analysis; and 5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, an analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggested uses of the DAMEE data include 1) ocean modeling and data assimilation studies, 2) diagnosis and theoretical studies, and 3) comparisons with locally detailed observations.

  18. The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap

    Directory of Open Access Journals (Sweden)

    Richard Mann

    2016-12-01

    Full Text Available Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public’s perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating healthcare delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit ‘big data’.

  19. Sally Ride EarthKAM - Automated Image Geo-Referencing Using Google Earth Web Plug-In

    Science.gov (United States)

    Andres, Paul M.; Lazar, Dennis K.; Thames, Robert Q.

    2013-01-01

    Sally Ride EarthKAM is an educational program funded by NASA that aims to provide the public the ability to picture Earth from the perspective of the International Space Station (ISS). A computer-controlled camera is mounted on the ISS in a nadir-pointing window; however, timing limitations in the system cause inaccurate positional metadata. Manually correcting images within an orbit allows the positional metadata to be improved using mathematical regressions. The manual correction process is time-consuming and thus infeasible for a large number of images. The standard Google Earth program allows for the importing of previously created KML (Keyhole Markup Language) files. These KML file-based overlays could then be manually manipulated as image overlays, saved, and then uploaded to the project server where they are parsed and the metadata in the database is updated. The new interface eliminates the need to save, download, open, re-save, and upload the KML files. Everything is processed on the Web, and all manipulations go directly into the database. Administrators also have the control to discard any single correction that was made and validate a correction. This program streamlines a process that previously required several critical steps and was probably too complex for the average user to complete successfully. The new process is theoretically simple enough for members of the public to make use of and contribute to the success of the Sally Ride EarthKAM project. Using the Google Earth Web plug-in, EarthKAM images, and associated metadata, this software allows users to interactively manipulate an EarthKAM image overlay, and update and improve the associated metadata. The Web interface uses the Google Earth JavaScript API along with PHP-PostgreSQL to present the user the same interface capabilities without leaving the Web. The simpler graphical user interface will allow the public to participate directly and meaningfully with EarthKAM. The use of
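
    The regression-based correction works roughly as follows: a few manually corrected images within an orbit provide (time, position) pairs, and a fitted model predicts corrected positions for the remaining images in that orbit. A toy NumPy sketch with made-up numbers (the real system fits the ISS ground track, not a single straight line):

    ```python
    import numpy as np

    # Manually corrected images: seconds into the orbit vs. corrected longitude
    t = np.array([0.0, 60.0, 120.0, 180.0])       # hypothetical timestamps
    lon = np.array([10.0, 13.8, 17.6, 21.4])      # hypothetical longitudes

    slope, intercept = np.polyfit(t, lon, deg=1)  # ground-track rate and offset
    predict_lon = np.poly1d([slope, intercept])

    # Correct the metadata of an image whose onboard timestamp was off
    print(round(float(predict_lon(90.0)), 2))     # → 15.7
    ```

    With a handful of manual corrections per orbit, every other image in that orbit inherits an improved position automatically, which is what makes the approach scale to large image counts.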

  20. Datasets for radiation network algorithm development and testing

    Energy Technology Data Exchange (ETDEWEB)

    Rao, Nageswara S [ORNL; Sen, Satyabrata [ORNL; Berry, M. L. [New Jersey Institute of Technology; Wu, Qishi [University of Memphis; Grieme, M. [New Jersey Institute of Technology; Brooks, Richard R [ORNL; Cordone, G. [Clemson University

    2016-01-01

    The Domestic Nuclear Detection Office's (DNDO) Intelligence Radiation Sensors Systems (IRSS) program supported the development of networks of commercial-off-the-shelf (COTS) radiation counters for detecting, localizing, and identifying low-level radiation sources. Under this program, a series of indoor and outdoor tests were conducted with multiple source strengths and types, different background profiles, and various types of source and detector movements. Following the tests, network algorithms were replayed in various re-constructed scenarios using sub-networks. These measurements and algorithm traces together provide a rich collection of highly valuable datasets for testing the current and next generation radiation network algorithms, including the ones (to be) developed by broader R&D communities such as distributed detection, information fusion, and sensor networks. From this multi-terabyte IRSS database, we distilled out and packaged the first batch of canonical datasets for public release. They include measurements from ten indoor and two outdoor tests which represent increasingly challenging baseline scenarios for robustly testing radiation network algorithms.

  1. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  2. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  3. Densidade amostral aplicada ao monitoramento georreferenciado de lagartas desfolhadoras na cultura da soja Sample density applied to the georeferenced monitoring of defoliating caterpillars in soybean crop

    Directory of Open Access Journals (Sweden)

    Cinei Teresinha Riffel

    2012-12-01

    Castilhos - RS city, in the 2008/2009 season. The georeferenced monitoring was carried out following three regular grids (50x50 m, 71x71 m and 100x100 m) and also the traditional sampling method. During the entire crop cycle, and for each grid, five evaluations of caterpillar infestation were conducted, two at the vegetative stage and three at the reproductive stage, using a beat cloth. To analyze the spatial and temporal distribution of caterpillars in the area, the data were submitted to descriptive statistical and geostatistical analyses, using semivariograms and kriging for the elaboration of thematic maps. The results indicated that the evaluated sample grids characterized the spatial distribution and modeled the spatial variability of caterpillars in the soybean crop. Georeferenced sampling and monitoring, with the subsequent development of georeferenced thematic maps, constitute a potential alternative to complement IPM strategies.

  4. The Optimization of Trained and Untrained Image Classification Algorithms for Use on Large Spatial Datasets

    Science.gov (United States)

    Kocurek, Michael J.

    2005-01-01

    The HARVIST project seeks to automatically provide an accurate, interactive interface to predict crop yield over the entire United States. In order to accomplish this goal, large images must be quickly and automatically classified by crop type. Current trained and untrained classification algorithms, while accurate, are highly inefficient when operating on large datasets. This project sought to develop new variants of two standard trained and untrained classification algorithms that are optimized to take advantage of the spatial nature of image data. The first algorithm, harvist-cluster, utilizes divide-and-conquer techniques to precluster an image in the hopes of increasing overall clustering speed. The second algorithm, harvistSVM, utilizes support vector machines (SVMs), a type of trained classifier. It seeks to increase classification speed by applying a "meta-SVM" to a quick (but inaccurate) SVM to approximate a slower, yet more accurate, SVM. Speedups were achieved by tuning the algorithm to quickly identify when the quick SVM was incorrect, and then reclassifying low-confidence pixels as necessary. Comparing the classification speeds of both algorithms to known baselines showed a slight speedup for large values of k (the number of clusters) for harvist-cluster, and a significant speedup for harvistSVM. Future work aims to automate the parameter tuning process required for harvistSVM, and further improve classification accuracy and speed. Additionally, this research will move documents created in Canvas into ArcGIS. The launch of the Mars Reconnaissance Orbiter (MRO) will provide a wealth of image data such as global maps of Martian weather and high resolution global images of Mars. The ability to store this new data in a georeferenced format will support future Mars missions by providing data for landing site selection and the search for water on Mars.
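
    The quick-then-accurate cascade behind harvistSVM can be sketched with scikit-learn (illustrative only; the classifier names and confidence threshold below are not HARVIST's): classify everything with a fast linear SVM, then re-classify only the samples whose decision margin is small using a slower RBF SVM.

    ```python
    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC, LinearSVC

    # Toy two-class "pixel" data standing in for crop-type features
    X, y = make_blobs(n_samples=600, centers=2, cluster_std=2.0, random_state=0)

    fast = LinearSVC().fit(X, y)          # quick but potentially inaccurate
    slow = SVC(kernel="rbf").fit(X, y)    # slower, more accurate

    pred = fast.predict(X)
    margin = np.abs(fast.decision_function(X))
    low_conf = margin < 0.5               # hypothetical confidence threshold
    pred[low_conf] = slow.predict(X[low_conf])

    print(f"re-classified {int(low_conf.sum())} of {len(X)} samples")
    ```

    The speedup comes from the slow model seeing only the small low-confidence subset; the threshold trades accuracy against the fraction of samples sent to the expensive classifier.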

  5. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    Science.gov (United States)

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.

  6. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439), or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer-generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even in partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  7. Real-world datasets for portfolio selection and solutions of some stochastic dominance portfolio models.

    Science.gov (United States)

    Bruni, Renato; Cesarone, Francesco; Scozzari, Andrea; Tardella, Fabio

    2016-09-01

    A large number of portfolio selection models have appeared in the literature since the pioneering work of Markowitz. However, even when computational and empirical results are described, they are often hard to replicate and compare due to the unavailability of the datasets used in the experiments. We provide here several datasets for portfolio selection generated using real-world price values from several major stock markets. The datasets contain weekly return values, adjusted for dividends and for stock splits, which are cleaned from errors as much as possible. The datasets are available in different formats, and can be used as benchmarks for testing the performances of portfolio selection models and for comparing the efficiency of the algorithms used to solve them. We also provide, for these datasets, the portfolios obtained by several selection strategies based on Stochastic Dominance models (see "On Exact and Approximate Stochastic Dominance Strategies for Portfolio Selection" (Bruni et al. [2])). We believe that testing portfolio models on publicly available datasets greatly simplifies the comparison of the different portfolio selection strategies.
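
    The dividend/split adjustment the authors describe can be illustrated with a toy example (hypothetical prices; a real adjustment also folds in dividends): multiplying raw closes by the cumulative split factor removes the artificial price jump before weekly returns are computed.

    ```python
    import numpy as np

    close = np.array([100.0, 102.0, 51.5, 52.0, 53.0])  # raw weekly closes
    split = np.array([1.0, 1.0, 2.0, 2.0, 2.0])         # 2-for-1 split in week 3

    adjusted = close * split                  # share-equivalent value series
    returns = adjusted[1:] / adjusted[:-1] - 1.0

    # Unadjusted, week 3 would show a spurious -49.5% "return"
    print(np.round(returns, 4))
    ```

    Cleaning the series this way is exactly what makes such datasets usable as benchmarks: a single unadjusted split would dominate any risk or dominance statistic computed from the returns.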

  8. Incorporation of Spatial Interactions in Location Networks to Identify Critical Geo-Referenced Routes for Assessing Disease Control Measures on a Large-Scale Campus

    Directory of Open Access Journals (Sweden)

    Tzai-Hung Wen

    2015-04-01

    Full Text Available Respiratory diseases mainly spread through interpersonal contact. Class suspension is the most direct strategy to prevent the spread of disease through elementary or secondary schools by blocking the contact network. However, as university students usually attend courses in different buildings, the daily contact patterns on a university campus are complicated, and once disease clusters have occurred, suspending classes is far from an efficient strategy to control disease spread. The purpose of this study is to propose a methodological framework for generating campus location networks from a routine administration database, analyzing the community structure of the network, and identifying the critical links and nodes for blocking respiratory disease transmission. The data come from the student enrollment records of a major comprehensive university in Taiwan. We combined social network analysis and a spatial interaction model to establish a geo-referenced community structure among the classroom buildings. We also identified the critical links among the communities that were acting as contact bridges and explored the changes in the location network after the sequential removal of the high-risk buildings. Instead of conducting a questionnaire survey, the study established a standard procedure for constructing a location network on a large-scale campus from a routine curriculum database. We also present how a campus location network can be used to target high-risk buildings, which act as bridges connecting communities, in order to block disease transmission.
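
    The construction of a location network from enrollment records can be sketched with NetworkX (toy data; the study's actual database and spatial-interaction weighting are richer): buildings are linked when a student attends courses in both, and modularity-based community detection then exposes the clusters and the bridge edges between them.

    ```python
    import itertools
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Hypothetical enrollment records: student -> buildings where their courses meet
    enrollment = {
        "s1": ["A", "B"], "s2": ["A", "B"], "s3": ["B", "C"],
        "s4": ["D", "E"], "s5": ["D", "E"], "s6": ["E", "F"],
        "s7": ["C", "D"],   # the only link between the two clusters
    }

    G = nx.Graph()
    for buildings in enrollment.values():
        for u, v in itertools.combinations(sorted(set(buildings)), 2):
            w = G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1
            G.add_edge(u, v, weight=w)

    communities = greedy_modularity_communities(G, weight="weight")
    print([sorted(c) for c in communities])
    # The C-D edge bridges the two communities: a candidate link for intervention
    ```

    Edges whose endpoints fall in different communities play the "contact bridge" role described above; removing the buildings they connect fragments the network more efficiently than blanket class suspension.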

  9. Fertility and Child Mortality in Urban West Africa: Leveraging geo-referenced data to move beyond the urban/rural dichotomy.

    Science.gov (United States)

    Corker, Jamaica

    2017-04-01

    Demographic research in sub-Saharan Africa (SSA) has long relied on a blunt urban/rural dichotomy that may obscure important inter-urban fertility and mortality differentials. This paper uses Demographic and Health Survey (DHS) geo-referenced data to look beyond the simple urban/rural division by spatially locating survey clusters along an urban continuum and producing estimates of fertility and child mortality by four city size categories in West Africa. Results show a gradient in urban characteristics and demographic outcomes: the largest cities are the most advantaged and smaller cities least advantaged with respect to access to urban amenities, lower fertility and under-5 survival rates. There is a difference in the patterns of fertility and under-five survival across urban categories, with fertility more linearly associated with city size while the only significant distinction for under-5 survival in urban areas is broadly between the larger and smaller cities. Notably, the small urban "satellite cities" that are adjacent to the largest cities have the most favorable outcomes of all categories. Although smaller urban areas have significantly lower fertility and child mortality than rural areas, in some cases this difference is nearly as large between the smallest and largest urban areas. These results are used to argue for the need to give greater consideration to employing an urban continuum in demographic research.

  10. EU-Forest, a high-resolution tree occurrence dataset for Europe

    Science.gov (United States)

    Mauri, Achille; Strona, Giovanni; San-Miguel-Ayanz, Jesús

    2017-01-01

    We present EU-Forest, a dataset that integrates and extends by almost one order of magnitude the publicly available information on European tree species distribution. The core of our dataset (~96% of the occurrence records) came from an unpublished, large database harmonising forest plot surveys from National Forest Inventories on an INSPIRE-compliant 1 km×1 km grid. These new data can potentially benefit several disciplines, including forestry, biodiversity conservation, palaeoecology, plant ecology, the bioeconomy, and pest management.

  11. EU-Forest, a high-resolution tree occurrence dataset for Europe

    Science.gov (United States)

    Mauri, Achille; Strona, Giovanni; San-Miguel-Ayanz, Jesús

    2017-01-01

    We present EU-Forest, a dataset that integrates and extends by almost one order of magnitude the publicly available information on European tree species distribution. The core of our dataset (~96% of the occurrence records) came from an unpublished, large database harmonising forest plot surveys from National Forest Inventories on an INSPIRE-compliant 1 km×1 km grid. These new data can potentially benefit several disciplines, including forestry, biodiversity conservation, palaeoecology, plant ecology, the bioeconomy, and pest management. PMID:28055003

  12. Dataset of anomalies and malicious acts in a cyber-physical subsystem.

    Science.gov (United States)

    Laso, Pedro Merino; Brosset, David; Puentes, John

    2017-10-01

    This article presents a dataset produced to investigate how data and information quality estimations make it possible to detect anomalies and malicious acts in cyber-physical systems. Data were acquired using a cyber-physical subsystem consisting of liquid containers for fuel or water, along with its automated control and data acquisition infrastructure. The data consist of temporal series representing five operational scenarios - normal, anomalies, breakdown, sabotages, and cyber-attacks - corresponding to 15 different real situations. The dataset is publicly available in the .zip file published with the article, to investigate and compare faulty operation detection and characterization methods for cyber-physical systems.
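As a minimal illustration of the kind of faulty-operation detection such a dataset supports (a rolling z-score over a simulated tank-level series; this is an illustrative sketch, not the quality-estimation method used in the article, and the signal below is invented):

```python
import numpy as np

def rolling_zscore_anomalies(series, window=20, threshold=3.0):
    """Flag points whose deviation from the trailing-window mean
    exceeds `threshold` standard deviations."""
    series = np.asarray(series, dtype=float)
    flags = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        w = series[i - window:i]
        mu, sigma = w.mean(), w.std()
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flags[i] = True
    return flags

# Simulated liquid-level readings with an injected fault at t = 150
rng = np.random.default_rng(42)
level = 50 + 0.01 * np.arange(300) + rng.normal(0, 0.2, 300)
level[150] += 5.0  # sudden jump, e.g. a sabotage or sensor fault
print(np.flatnonzero(rolling_zscore_anomalies(level)))
```

A real detector for this dataset would of course be tuned per scenario (breakdowns and slow sabotages need different windows than abrupt cyber-attacks).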

  13. A curated compendium of monocyte transcriptome datasets of relevance to human monocyte immunobiology research.

    Science.gov (United States)

    Rinchai, Darawan; Boughorbel, Sabri; Presnell, Scott; Quinn, Charlie; Chaussabel, Damien

    2016-01-01

    Systems-scale profiling approaches have become widely used in translational research settings. The resulting accumulation of large-scale datasets in public repositories represents a critical opportunity to promote insight and foster knowledge discovery. However, resources that can serve as an interface between biomedical researchers and such vast and heterogeneous dataset collections are needed in order to fulfill this potential. Recently, we have developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). This tool can be used to overlay deep molecular phenotyping data with rich contextual information about analytes, samples and studies along with ancillary clinical or immunological profiling data. In this note, we describe a curated compendium of 93 public datasets generated in the context of human monocyte immunological studies, representing a total of 4,516 transcriptome profiles. Datasets were uploaded to an instance of GXB along with study description and sample annotations. Study samples were arranged in different groups. Ranked gene lists were generated based on relevant group comparisons. This resource is publicly available online at http://monocyte.gxbsidra.org/dm3/landing.gsp.

  14. A compendium of monocyte transcriptome datasets to foster biomedical knowledge discovery.

    Science.gov (United States)

    Rinchai, Darawan; Boughorbel, Sabri; Presnell, Scott; Quinn, Charlie; Chaussabel, Damien

    2016-01-01

    Systems-scale profiling approaches have become widely used in translational research settings. The resulting accumulation of large-scale datasets in public repositories represents a critical opportunity to promote insight and foster knowledge discovery. However, resources that can serve as an interface between biomedical researchers and such vast and heterogeneous dataset collections are needed in order to fulfill this potential. Recently, we have developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). This tool can be used to overlay deep molecular phenotyping data with rich contextual information about analytes, samples and studies along with ancillary clinical or immunological profiling data. In this note, we describe a curated compendium of 93 public datasets generated in the context of human monocyte immunological studies, representing a total of 4,516 transcriptome profiles. Datasets were uploaded to an instance of GXB along with study description and sample annotations. Study samples were arranged in different groups. Ranked gene lists were generated based on relevant group comparisons. This resource is publicly available online at http://monocyte.gxbsidra.org/dm3/landing.gsp.

  15. ArcHydro global datasets for Idaho StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset consists of a personal geodatabase containing several vector datasets. This database contains the information needed to link the HUCs together so a...

  16. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets were used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  17. Statistics of large detrital geochronology datasets

    Science.gov (United States)

    Saylor, J. E.; Sundell, K. E., II

    2014-12-01

    Implementation of quantitative metrics for inter-sample comparison of detrital geochronological data sets has lagged behind the growth in data set size and the ability to identify sub-populations and quantify their relative proportions. Visual comparison, or the application of statistical approaches such as the Kolmogorov-Smirnov (KS) test, which initially appeared to provide a simple way of comparing detrital data sets, may be inadequate to quantify their similarity. We evaluate several proposed metrics by applying them to four large synthetic datasets drawn randomly from a parent dataset, as well as a recently published large empirical dataset consisting of four separate (n = ~1000 each) analyses of the same rock sample. Visual inspection of the cumulative probability density functions (CDF) and relative probability density functions (PDF) confirms an increasingly close correlation between data sets as the number of analyses increases. However, as data set size increases, the KS test yields lower mean p-values, implying greater confidence that the samples were not drawn from the same parent population, and high standard deviations, despite minor decreases in the mean difference between sample CDFs. We attribute this to the increasing sensitivity of the KS test when applied to larger data sets, which in turn limits its use for quantitative inter-sample comparison in detrital geochronology. Proposed alternative metrics, including Similarity, Likeness (the complement of Mismatch), and the coefficient of determination (R2) of a cross-plot of PDF quantiles, point to an increasingly close correlation between data sets with increasing size, although they are most sensitive at different ranges of data set sizes. The Similarity test is most sensitive to variation in data sets with n < 100 and is relatively insensitive to further convergence between larger data sets. The Likeness test reaches 90% of its asymptotic maximum at data set sizes of n = 200. The PDF cross-plot R2 value
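The size sensitivity of the two-sample KS test described above is easy to reproduce with synthetic data: a small difference between parent distributions goes undetected at small n but is confidently rejected at large n. The distributions, shift, and sample sizes below are illustrative assumptions, not those of the paper:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

def mean_ks_pvalue(n, trials=200, shift=0.25):
    """Mean two-sample KS p-value when both samples of size n come from
    slightly different parents: normal(0, 1) vs normal(shift, 1)."""
    ps = [ks_2samp(rng.normal(0.0, 1.0, n), rng.normal(shift, 1.0, n)).pvalue
          for _ in range(trials)]
    return float(np.mean(ps))

p_small = mean_ks_pvalue(100)   # small data sets: the shift is rarely detected
p_large = mean_ks_pvalue(2000)  # large data sets: the same shift is rejected
print(p_small, p_large)
```

The same mechanism penalizes large real detrital data sets, where minor inter-aliquot differences are always present.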

  18. Medicare Physician and Other Supplier Interactive Dataset

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Centers for Medicare and Medicaid Services (CMS) has prepared a public data set, the Medicare Provider Utilization and Payment Data - Physician and Other...

  19. Controlled Vocabulary Standards for Anthropological Datasets

    Directory of Open Access Journals (Sweden)

    Celia Emmelhainz

    2014-07-01

    Full Text Available This article seeks to outline the use of controlled vocabulary standards for qualitative datasets in cultural anthropology, which are increasingly held in researcher-accessible government repositories and online digital libraries. As a humanistic science that can address almost any aspect of life with meaning to humans, cultural anthropology has proven difficult for librarians and archivists to effectively organize. Yet as anthropology moves onto the web, the challenge of organizing and curating information within the field only grows. In considering the subject classification of digital information in anthropology, I ask how we might best use controlled vocabularies for indexing digital anthropological data. After a brief discussion of likely concerns, I outline thesauri which may potentially be used for vocabulary control in metadata fields for language, location, culture, researcher, and subject. The article concludes with recommendations for those existing thesauri most suitable to provide a controlled vocabulary for describing digital objects in the anthropological world.

  20. Visualization of Cosmological Particle-Based Datasets

    CERN Document Server

    Navrátil, Paul Arthur; Bromm, Volker

    2007-01-01

    We describe our visualization process for a particle-based simulation of the formation of the first stars and their impact on cosmic history. The dataset consists of several hundred time-steps of point simulation data, with each time-step containing approximately two million point particles. For each time-step, we interpolate the point data onto a regular grid using a method taken from the radiance estimate of photon mapping. We import the resulting regular grid representation into ParaView, with which we extract isosurfaces across multiple variables. Our images provide insights into the evolution of the early universe, tracing the cosmic transition from an initially homogeneous state to one of increasing complexity. Specifically, our visualizations capture the build-up of regions of ionized gas around the first stars, their evolution, and their complex interactions with the surrounding matter. These observations will guide the upcoming James Webb Space Telescope, the key astronomy mission of the next decade.
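The particle-to-grid interpolation step can be sketched under simplified assumptions: a 1D cloud-in-cell deposit with periodic boundaries, rather than the photon-mapping radiance estimate the authors actually use:

```python
import numpy as np

def cloud_in_cell_1d(positions, masses, ngrid, length=1.0):
    """Deposit point particles onto a regular 1D grid: each particle's
    mass is split linearly between its two nearest cell centres."""
    dx = length / ngrid
    grid = np.zeros(ngrid)
    for x, m in zip(positions, masses):
        s = x / dx - 0.5                 # position in units of cell centres
        i = int(np.floor(s)) % ngrid     # left cell (periodic boundary)
        frac = s - np.floor(s)           # weight assigned to the right cell
        grid[i] += m * (1.0 - frac)
        grid[(i + 1) % ngrid] += m * frac
    return grid

rng = np.random.default_rng(1)
pos = rng.uniform(0.0, 1.0, 10000)
mass = np.ones(10000)
density = cloud_in_cell_1d(pos, mass, ngrid=64)
print(density.sum())  # total mass is conserved by construction
```

The resulting regular grid is what tools like ParaView consume for isosurface extraction.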

  1. Predicting dataset popularity for the CMS experiment

    Science.gov (United States)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-10-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, its data are rarely used to predict the behavior of the system itself. Basic information about computing resources, user activities, and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach to the CMS computing infrastructure.

  2. Predicting dataset popularity for the CMS experiment

    CERN Document Server

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, its data are rarely used to predict the behavior of the system itself. Basic information about computing resources, user activities, and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach to the CMS computing infrastructure.

  3. BDML Datasets - SSBD | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

    Full Text Available Data name: BDML Datasets. DOI: 10.18908/lsdba.nbdc01349-001. Part of the SSBD database in the LSDB Archive.

  4. SAGE Research Methods Datasets: A Data Analysis Educational Tool.

    Science.gov (United States)

    Vardell, Emily

    2016-01-01

    SAGE Research Methods Datasets (SRMD) is an educational tool designed to offer users the opportunity to obtain hands-on experience with data analysis. Users can search for and browse authentic datasets by method, discipline, and data type. Each of the datasets are supplemented with educational material on the research method and clear guidelines for how to approach data analysis.

  5. A new bed elevation dataset for Greenland

    Science.gov (United States)

    Bamber, J. L.; Griggs, J. A.; Hurkmans, R. T. W. L.; Dowdeswell, J. A.; Gogineni, S. P.; Howat, I.; Mouginot, J.; Paden, J.; Palmer, S.; Rignot, E.; Steinhage, D.

    2013-03-01

    We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island including across the glaciated-ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  6. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island, including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.
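A distance-from-observation error model of the kind described (uncertainty growing from a floor near observations toward a ceiling far from any flight line) can be sketched with a nearest-neighbour query; the bounds, length scale, and coordinates below are illustrative, not those of the published DEM:

```python
import numpy as np
from scipy.spatial import cKDTree

def bed_error_estimate(obs_xy, grid_xy, err_min=10.0, err_max=300.0,
                       scale_km=50.0):
    """Toy error model: uncertainty grows from err_min (at an observation)
    toward err_max with distance to the nearest flight-line sample."""
    dist, _ = cKDTree(obs_xy).query(grid_xy)  # distance in km
    return err_min + (err_max - err_min) * (1.0 - np.exp(-dist / scale_km))

obs = np.array([[0.0, 0.0], [10.0, 0.0]])            # flight-line samples (km)
grid = np.array([[0.0, 0.0], [5.0, 0.0], [200.0, 0.0]])
print(bed_error_estimate(obs, grid))
```

A production error grid would also fold in local topographic variability, as the abstract notes.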

  7. Provenance of Earth Science Datasets - How Deep Should One Go?

    Science.gov (United States)

    Ramapriyan, H.; Manipon, G. J. M.; Aulenbach, S.; Duggan, B.; Goldstein, J.; Hua, H.; Tan, D.; Tilmes, C.; Wilson, B. D.; Wolfe, R.; Zednik, S.

    2015-12-01

    For credibility of scientific research, transparency and reproducibility are essential. This fundamental tenet has been emphasized for centuries, and has been receiving increased attention in recent years. The Office of Management and Budget (2002) addressed reproducibility and other aspects of quality and utility of information from federal agencies. Specific guidelines from NASA (2002) are derived from the above. According to these guidelines, "NASA requires a higher standard of quality for information that is considered influential. Influential scientific, financial, or statistical information is defined as NASA information that, when disseminated, will have or does have clear and substantial impact on important public policies or important private sector decisions." For information to be compliant, "the information must be transparent and reproducible to the greatest possible extent." We present how the principles of transparency and reproducibility have been applied to NASA data supporting the Third National Climate Assessment (NCA3). The depth of trace needed of provenance of data used to derive conclusions in NCA3 depends on how the data were used (e.g., qualitatively or quantitatively). Given that the information is diligently maintained in the agency archives, it is possible to trace from a figure in the publication through the datasets, specific files, algorithm versions, instruments used for data collection, and satellites, as well as the individuals and organizations involved in each step. Such trace back permits transparency and reproducibility.

  8. Check your biosignals here: a new dataset for off-the-person ECG biometrics.

    Science.gov (United States)

    da Silva, Hugo Plácido; Lourenço, André; Fred, Ana; Raposo, Nuno; Aires-de-Sousa, Marta

    2014-02-01

    The Check Your Biosignals Here initiative (CYBHi) was developed as a way of creating a dataset and consistently repeatable acquisition framework, to further extend research in electrocardiographic (ECG) biometrics. In particular, our work targets the novel trend towards off-the-person data acquisition, which opens a broad new set of challenges and opportunities both for research and industry. While datasets with ECG signals collected using medical grade equipment at the chest can be easily found, for off-the-person ECG data the solution is generally for each team to collect their own corpus at considerable expense of resources. In this paper we describe the context, experimental considerations, methods, and preliminary findings of two public datasets created by our team, one for short-term and another for long-term assessment, with ECG data collected at the hand palms and fingers.

  9. Utilizing reanalysis and synthesis datasets in wind resource characterization for large-scale wind integration

    Energy Technology Data Exchange (ETDEWEB)

    Henson, William L.W. [ISO New England Inc., Holyoke, MA (United States); McGowan, Jon G.; Manwell, James F. [Massachusetts Univ., Amherst, MA (United States). Wind Energy Center

    2010-07-01

    As wind plants become a more substantial portion of the generation resource, the extent to which, and the manner in which, this new fleet of generation supports meeting the power system load in a given area must be quantified in order to ensure security of supply. This paper describes the manner in which a reanalysis dataset - the National Aeronautics and Space Administration (NASA) Modern Era Retrospective-Analysis for Research and Applications (MERRA) dataset - was utilized in conjunction with the National Renewable Energy Laboratory (NREL) Eastern Wind Integration Dataset in order to perform an estimation of the interannual variability in wind power production as related to the capacity value of the investigated potential wind plants. Also described in the paper is a comparison of the MERRA data with publicly available wind data collected by the University of Massachusetts Wind Energy Center (UMass WEC). (orig.)

  10. An Image-Based Approach for the Co-Registration of Multi-Temporal UAV Image Datasets

    Directory of Open Access Journals (Sweden)

    Irene Aicardi

    2016-09-01

    Full Text Available During the past years, UAVs (Unmanned Aerial Vehicles) became very popular as low-cost image acquisition platforms since they allow for high resolution and repetitive flights in a flexible way. One application is to monitor dynamic scenes. However, the fully automatic co-registration of the acquired multi-temporal data still remains an open issue. Most UAVs are not able to provide accurate direct image georeferencing and the co-registration process is mostly performed with the manual introduction of ground control points (GCPs), which is time-consuming, costly and sometimes not possible at all. A new technique to automate the co-registration of multi-temporal high resolution image blocks without the use of GCPs is investigated in this paper. The image orientation is initially performed on a reference epoch and the registration of the following datasets is achieved including some anchor images from the reference data. The interior and exterior orientation parameters of the anchor images are then fixed in order to constrain the Bundle Block Adjustment of the slave epoch to be aligned with the reference one. The study involved the use of two different datasets acquired over a construction site and a post-earthquake damaged area. Different tests have been performed to assess the registration procedure using both a manual and an automatic approach for the selection of anchor images. The tests have shown that the procedure provides results comparable to the traditional GCP-based strategy and both the manual and automatic selection of the anchor images can provide reliable results.

  11. The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets

    Directory of Open Access Journals (Sweden)

    Carroll Adam J

    2010-07-01

    Full Text Available Abstract Background Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS) makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis. Description Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP) server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (e.g. metabolite, species, organ/biofluid, etc.). Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP) their own data to the server for online processing via a novel raw data processing pipeline. Conclusions MetabolomeExpress https://www.metabolome-express.org provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications. Transparent sharing of these data will allow researchers to

  12. Generation of open biomedical datasets through ontology-driven transformation and integration processes.

    Science.gov (United States)

    Carmen Legaz-García, María Del; Miñarro-Giménez, José Antonio; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás

    2016-06-03

    Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating content readable by machines. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine readable semantic formats. We present an approach for the transformation and integration of heterogeneous biomedical data with the objective of generating open biomedical datasets in Semantic Web formats. The transformation of the data is based on the mappings between the entities of the data schema and the ontological infrastructure that provides the meaning to the content. Our approach permits different types of mappings and includes the possibility of defining complex transformation patterns. Once the mappings are defined, they can be automatically applied to datasets to generate logically consistent content, and the mappings can be reused in further transformation processes. The results of our research are (1) a common transformation and integration process for heterogeneous biomedical data; (2) the application of Linked Open Data principles to generate interoperable, open, biomedical datasets; (3) a software tool, called SWIT, that implements the approach. In this paper we also describe how we have applied SWIT in different biomedical scenarios, along with some lessons learned. We have presented an approach that is able to generate open biomedical repositories in Semantic Web formats. SWIT is able to apply the Linked Open Data principles in the generation of the datasets, thus allowing their content to be linked to external repositories and creating linked open datasets. SWIT datasets may contain data from multiple sources and schemas, thus becoming integrated datasets.

  13. The health care and life sciences community profile for dataset descriptions

    Directory of Open Access Journals (Sweden)

    Michel Dumontier

    2016-08-01

    Full Text Available Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets.

  14. Development of a Geo-Referenced Database for Weed Mapping and Analysis of Agronomic Factors Affecting Herbicide Resistance in Apera spica-venti L. Beauv. (Silky Windgrass)

    Directory of Open Access Journals (Sweden)

    Dario Massa

    2013-01-01

    Full Text Available In this work, we evaluate the role of agronomic factors in the selection for herbicide resistance in Apera spica-venti L. Beauv. (silky windgrass). During a period of three years, populations were collected in more than 250 conventional fields across Europe and tested for resistance in the greenhouse. After recording the field history of locations, a geo-referenced database has been developed to map the distribution of herbicide-resistant A. spica-venti populations in Europe. A Logistic Regression Model was used to assess whether and to what extent agricultural and biological factors (crop rotation, soil tillage, sowing date, soil texture and weed density) affect the probability of resistance selection apart from the selection pressure due to herbicide application. Our results revealed that rotation management and soil tillage are the factors that have the greatest influence on the model. In addition, first order interactions between these two variables were highly significant. Under conventional tillage, a percentage of winter crops in the rotation exceeding 75% resulted in a 1280-times higher risk of resistance selection compared to rotations with less than 50% of winter crops. Under conservation tillage, the adoption of >75% of winter crops increased the risk of resistance 13-times compared to rotations with less than 50% of winter crops. Finally, early sowing and high weed density significantly increased the risk of resistance compared to the reference categories (later sowing and low weed density, respectively). Soil texture had no significant influence. The developed model can find application in management programs aimed at preventing the evolution and spread of herbicide resistance in weed populations.
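Risk multipliers of the kind reported above are odds-ratio-style quantities, and in a logistic regression a category's fitted coefficient is the log odds ratio. With hypothetical counts (not the study's data), the arithmetic looks like this:

```python
import math

# Hypothetical 2x2 table: resistance status vs. share of winter crops
#                      resistant   susceptible
# >75% winter crops        40           60
# <50% winter crops         2          298
odds_high = 40 / 60
odds_low = 2 / 298
odds_ratio = odds_high / odds_low
print(round(odds_ratio, 1))       # ~99.3: risk of resistance selection

# A logistic regression coefficient for this category is log(odds_ratio);
# exponentiating it recovers the same multiplier
beta = math.log(odds_ratio)
print(round(math.exp(beta), 1))
```

The study's much larger multipliers simply reflect more extreme imbalances (and interaction terms) than this toy table.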

  15. Mapping of the environmental contamination of Toxoplasma gondii by georeferencing isolates from chickens in an endemic area in Southeast Rio de Janeiro State, Brazil.

    Science.gov (United States)

    Casartelli-Alves, Luciana; Amendoeira, Maria Regina Reis; Boechat, Viviane Cardoso; Ferreira, Luiz Cláudio; Carreira, João Carlos Araujo; Nicolau, José Leonardo; de Freitas Trindade, Eloiza Paula; de Barros Peixoto, Julia Novaes; Magalhães, Mônica de Avelar Figueiredo Mafra; de Oliveira, Raquel de Vasconcellos Carvalhaes; Schubach, Tânia Maria Pacheco; Menezes, Rodrigo Caldas

    2015-05-18

    The environmental contamination of Toxoplasma gondii in an endemic area in Brazil was mapped by georeferencing isolates from chickens in farms in the Southeast of the state of Rio de Janeiro. Tissue samples obtained from 153 adult chickens were analyzed by the mouse bioassay for T. gondii infection. These animals were reared free-range on 51 farms in the municipalities of Rio Bonito and Maricá. The ArcGIS kernel density estimator based on the frequency of T. gondii-positive chickens was used to map the environmental contamination with this parasite. A questionnaire was applied to obtain data on the presence and management of cats and the type of water consumed. Of the farms studied, 64.7% were found to be located in areas of low to medium presence of T. gondii, 27.5% in areas with a high or very high contamination level and 7.8% in non-contaminated areas. Additionally, 70.6% kept cats, 66.7% were near water sources and 45.0% were in or near dense vegetation. Humans used untreated water for drinking on 41.2% of the farms, while all animals were given untreated water. The intensity of environmental T. gondii contamination was significantly higher on farms situated at a distance >500 m from water sources (P=0.007) and near (≤500 m) dense vegetation (P=0.003). Taken together, the results indicate a high probability of T. gondii infection of humans and animals living on the farms studied. The kernel density estimator obtained based on the frequency of chickens testing positive for T. gondii in the mouse bioassay was useful to map environmental contamination with this parasite.
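Kernel density surfaces like the one used for the contamination map can be produced directly from georeferenced point data; the coordinates below are invented for illustration, and `scipy.stats.gaussian_kde` stands in for the ArcGIS kernel density estimator:

```python
import numpy as np
from scipy.stats import gaussian_kde

# Invented positives (e.g. farms with T. gondii-positive chickens), two clusters
rng = np.random.default_rng(7)
lon = np.concatenate([rng.normal(-42.6, 0.02, 30), rng.normal(-42.8, 0.02, 10)])
lat = np.concatenate([rng.normal(-22.7, 0.02, 30), rng.normal(-22.9, 0.02, 10)])

kde = gaussian_kde(np.vstack([lon, lat]))

# Evaluate the density surface on a regular grid for mapping
gx, gy = np.meshgrid(np.linspace(-43.0, -42.4, 50), np.linspace(-23.0, -22.6, 50))
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)
print(density.shape)
```

Contouring `density` then yields the low/medium/high contamination zones described in the abstract.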

  16. ionFR: Ionospheric Faraday rotation [Dataset]

    NARCIS (Netherlands)

    Sotomayor-Beltran, C.; et al.; Hessels, J.W.T.; van Leeuwen, J.; Markoff, S.; Wijers, R.A.M.J.

    2013-01-01

    IonFR calculates the amount of ionospheric Faraday rotation for a specific epoch, geographic location, and line-of-sight. The code uses a number of publicly available, GPS-derived total electron content maps and the most recent release of the International Geomagnetic Reference Field. ionFR can be

  17. A large-scale dataset of solar event reports from automated feature recognition modules

    Science.gov (United States)

    Schuh, Michael A.; Angryk, Rafal A.; Martens, Petrus C.

    2016-05-01

    The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validating of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through these important prerequisite analyses presented here, the results of KDD from Solar Big Data will be overall more reliable and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.

  18. A large-scale dataset of solar event reports from automated feature recognition modules

    Directory of Open Access Journals (Sweden)

    Schuh Michael A.

    2016-01-01

    The massive repository of images of the Sun captured by the Solar Dynamics Observatory (SDO) mission has ushered in the era of Big Data for Solar Physics. In this work, we investigate the entire public collection of events reported to the Heliophysics Event Knowledgebase (HEK) from automated solar feature recognition modules operated by the SDO Feature Finding Team (FFT). With the SDO mission recently surpassing five years of operations, and over 280,000 event reports for seven types of solar phenomena, we present the broadest and most comprehensive large-scale dataset of the SDO FFT modules to date. We also present numerous statistics on these modules, providing valuable contextual information for better understanding and validating of the individual event reports and the entire dataset as a whole. After extensive data cleaning through exploratory data analysis, we highlight several opportunities for knowledge discovery from data (KDD). Through these important prerequisite analyses presented here, the results of KDD from Solar Big Data will be overall more reliable and better understood. As the SDO mission remains operational over the coming years, these datasets will continue to grow in size and value. Future versions of this dataset will be analyzed in the general framework established in this work and maintained publicly online for easy access by the community.

  19. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, and survivability. However, this approach does not allow an assessment of the force-multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments it is expensive, in both time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests for assessing population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and resampling, using a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.
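
    A Monte Carlo power comparison of the kind described above can be sketched with the standard library alone. This uses the normal-approximation U test with an illustrative sample size and shift; it is not the study's actual simulation design.

```python
import random

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic of sample x versus sample y."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5   # ties count half
    return u

def power_estimate(n, shift, trials=2000, seed=1):
    """Fraction of Monte Carlo trials in which a two-sided
    normal-approximation U test (alpha = 0.05) rejects equality of
    two normal populations whose means differ by `shift`."""
    rng = random.Random(seed)
    mean_u = n * n / 2.0
    sd_u = (n * n * (2 * n + 1) / 12.0) ** 0.5
    rejections = 0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(shift, 1.0) for _ in range(n)]
        z = (mann_whitney_u(x, y) - mean_u) / sd_u
        if abs(z) > 1.96:
            rejections += 1
    return rejections / trials

print(power_estimate(10, 0.0))  # empirical size, near the nominal 0.05
print(power_estimate(10, 1.5))  # power against a 1.5-sigma shift
```

    Repeating this for each candidate test over a grid of shifts yields the power curves such a comparison is based on.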

  20. Reconstructing thawing quintessence with multiple datasets

    CERN Document Server

    Lima, Nelson A; Sahlén, Martin; Parkinson, David

    2015-01-01

    In this work we model the quintessence potential with a Taylor series expansion, up to second order, around the present-day value of the scalar field. The field is evolved in a thawing regime assuming zero initial velocity. We use the latest data from the Planck satellite, baryon acoustic oscillation observations from the Sloan Digital Sky Survey, and supernova luminosity distance information from Union2.1 to constrain our model's parameters, and also include perturbation growth data from WiggleZ. We show explicitly that the growth data do not constrain the dark energy parameters we introduce as well as the other datasets do. We also show that the constraints we obtain for our model parameters, when compared to previous works of nearly a decade ago, have not improved significantly. This is indicative of how little dark energy constraints, overall, have improved in the last decade, even when new growth-of-structure data are added to previously existing types of data.
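
    As a sketch of the setup described above (standard notation, not copied from the paper), the potential is expanded about today's field value and the field obeys the usual Klein-Gordon equation in an expanding background:

```latex
V(\phi) \simeq V_0 + V_1\,(\phi - \phi_0) + \tfrac{1}{2}\,V_2\,(\phi - \phi_0)^2,
\qquad
\ddot{\phi} + 3H\dot{\phi} + \frac{dV}{d\phi} = 0,
```

    with the thawing initial condition $\dot{\phi} = 0$ at early times, so the field is frozen by the Hubble friction term $3H\dot{\phi}$ and only starts rolling once $H$ drops sufficiently.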

  1. Workflow to numerically reproduce laboratory ultrasonic datasets

    Institute of Scientific and Technical Information of China (English)

    A. Biryukov; N. Tisato; G. Grasselli

    2014-01-01

    The risks and uncertainties related to the storage of high-level radioactive waste (HLRW) can be reduced thanks to focused studies and investigations. HLRWs are going to be placed in deep geological repositories, enveloped in an engineered bentonite barrier, whose physical conditions are subject to change throughout the lifespan of the infrastructure. Seismic tomography can be employed to monitor its physical state and integrity. The design of the seismic monitoring system can be optimized by conducting and analyzing numerical simulations of wave propagation in a representative repository geometry. However, the quality of the numerical results relies on their initial calibration. The main aim of this paper is to provide a workflow to calibrate numerical tools employing laboratory ultrasonic datasets. The finite difference code SOFI2D was employed to model ultrasonic waves propagating through a laboratory sample. Specifically, the input velocity model was calibrated to achieve a best match between experimental and numerical ultrasonic traces. Likely due to the imperfections of the contact surfaces, the resultant velocities of P- and S-wave propagation tend to be noticeably lower than those a priori assigned. Then, the calibrated model was employed to estimate the attenuation in a montmorillonite sample. The obtained low quality factors (Q) suggest that pronounced inelastic behavior of the clay has to be taken into account in geophysical modeling and analysis. Consequently, this contribution should be considered as a first step towards the creation of a numerical tool to evaluate wave propagation in nuclear waste repositories.
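
    The calibration loop described above (minimise the misfit between observed and simulated traces over candidate velocities) can be sketched as follows. The forward model here is a toy delayed pulse, not SOFI2D, and all numbers are illustrative.

```python
import numpy as np

def trace_misfit(observed, synthetic):
    """Root-mean-square misfit between two ultrasonic traces."""
    return float(np.sqrt(np.mean((observed - synthetic) ** 2)))

t = np.linspace(0.0, 1.0, 500)  # time axis, arbitrary units

def simulate(velocity, distance=1.0):
    """Toy forward model: a Gaussian pulse arriving at distance/velocity."""
    arrival = distance / velocity
    return np.exp(-((t - arrival) ** 2) / 0.001)

observed = simulate(2.8)  # stand-in for the lab trace; "true" velocity 2.8
candidates = np.linspace(2.0, 4.0, 81)
misfits = [trace_misfit(observed, simulate(v)) for v in candidates]
best = candidates[int(np.argmin(misfits))]
print(best)  # the grid search recovers a velocity near 2.8
```

    In the real workflow the grid search (or a gradient-based equivalent) wraps the full finite-difference simulation, and the recovered velocities come out lower than the a priori ones, as the abstract notes.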

  2. Classification of antimicrobial peptides with imbalanced datasets

    Science.gov (United States)

    Camacho, Francy L.; Torres, Rodrigo; Ramos Pollán, Raúl

    2015-12-01

    In the last years, pattern recognition has been applied to several fields for solving multiple problems in science and technology as for example in protein prediction. This methodology can be useful for prediction of activity of biological molecules, e.g. for determination of antimicrobial activity of synthetic and natural peptides. In this work, we evaluate the performance of different physico-chemical properties of peptides (descriptors groups) in the presence of imbalanced data sets, when facing the task of detecting whether a peptide has antimicrobial activity. We evaluate undersampling and class weighting techniques to deal with the class imbalance with different classification methods and descriptor groups. Our classification model showed an estimated precision of 96% showing that descriptors used to codify the amino acid sequences contain enough information to correlate the peptides sequences with their antimicrobial activity by means of learning machines. Moreover, we show how certain descriptor groups (pseudoaminoacid composition type I) work better with imbalanced datasets while others (dipeptide composition) work better with balanced ones.
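
    The undersampling step mentioned above can be sketched with the standard library; the peptide labels and class sizes are made up for illustration.

```python
import random
from collections import Counter

def undersample(samples, labels, seed=0):
    """Randomly drop majority-class samples until all classes match
    the minority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = counts[min(counts, key=counts.get)]  # minority-class size
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    kept, per_class = [], Counter()
    for i in indices:
        if per_class[labels[i]] < target:
            kept.append(i)
            per_class[labels[i]] += 1
    return [samples[i] for i in kept], [labels[i] for i in kept]

# toy imbalanced set: 8 inactive peptides, 2 active ones
X = [f"peptide_{i}" for i in range(10)]
y = ["inactive"] * 8 + ["active"] * 2
Xb, yb = undersample(X, y)
print(Counter(yb))  # two samples of each class remain
```

    Class weighting, the other technique the paper evaluates, keeps all samples but scales each class's contribution to the training loss instead.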

  3. Publicity and public relations

    Science.gov (United States)

    Fosha, Charles E.

    1990-01-01

    This paper addresses approaches to using publicity and public relations to meet the goals of the NASA Space Grant College. Methods universities and colleges can use to publicize space activities are presented.

  4. Dataset for analysing the relationships among economic growth, fossil fuel and non-fossil fuel consumption.

    Science.gov (United States)

    Asafu-Adjaye, John; Byrne, Dominic; Alvarez, Maximiliano

    2017-02-01

    The data presented in this article are related to the research article entitled 'Economic Growth, Fossil Fuel and Non-Fossil Consumption: A Pooled Mean Group Analysis using Proxies for Capital' (J. Asafu-Adjaye, D. Byrne, M. Alvarez, 2016) [1]. This article describes data modified from three publicly available data sources: the World Bank's World Development Indicators (http://databank.worldbank.org/data/reports.aspx?source=world-development-indicators), the U.S. Energy Information Administration's International Energy Statistics (http://www.eia.gov/cfapps/ipdbproject/IEDIndex3.cfm?tid=44&pid=44&aid=2) and the Barro-Lee Educational Attainment Dataset (http://www.barrolee.com). These data can be used to examine the relationships between economic growth and different forms of energy consumption. The dataset is made publicly available to promote further analyses.

  5. Datasets in Gene Expression Omnibus used in the study ORD-020382: Evaluation of estrogen receptor alpha activation by glyphosate-based herbicide constituents

    Data.gov (United States)

    U.S. Environmental Protection Agency — GEO accession number of the microarray study. This dataset is associated with the following publication: Mesnage, R., A. Phedonos, M. Biserni, M. Arno, S. Balu, C....

  6. Dataset - Evaluation of Standardized Sample Collection, Packaging, and Decontamination Procedures to Assess Cross-Contamination Potential during Bacillus anthracis Incident Response Operations

    Data.gov (United States)

    U.S. Environmental Protection Agency — Spore recovery data during sample packaging decontamination tests. This dataset is associated with the following publication: Calfee, W., J. Tufts, K. Meyer, K....

  7. A reference GNSS tropospheric dataset over Europe.

    Science.gov (United States)

    Pacione, Rosa; Di Tomaso, Simona

    2016-04-01

    The present availability of 18 years of GNSS data belonging to the European Permanent Network (EPN, http://www.epncb.oma.be/) is a valuable database for the development of a climate data record of GNSS tropospheric products over Europe. This dataset has high potential for monitoring trends and variability in atmospheric water vapour, improving the knowledge of climatic trends in atmospheric water vapour, and serving global and regional NWP reanalyses as well as climate model simulations. In the framework of EPN-Repro2, a second reprocessing campaign of the EPN, five Analysis Centres have homogeneously reprocessed the EPN network for the period 1996-2013. Three Analysis Centres provide homogeneously reprocessed solutions for the entire network, produced with three different software packages: Bernese, GAMIT and GIPSY-OASIS. Smaller subnetworks based on Bernese 5.2 are also provided. A huge effort is made to provide solutions that are the basis for deriving new coordinates, velocities and troposphere parameters (Zenith Tropospheric Delays and Horizontal Gradients) for the entire EPN. These individual contributions are combined in order to provide the official EPN reprocessed products. A preliminary tropospheric combined solution for the period 1996-2013 has been carried out. It is based on all the available homogeneously reprocessed solutions and offers the possibility to assess each of them prior to the ongoing final combination. We will present the results of the EPN-Repro2 tropospheric combined products and how the climate community will benefit from them. Acknowledgment: The EPN-Repro2 working group is acknowledged for providing the EPN solutions used in this work. E-GEOS activity is carried out in the framework of ASI contract 2015-050-R.0.
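
    The combination step, reduced to its simplest form, is a weighted mean of the per-centre estimates at each station and epoch. The function and numbers below are an illustrative sketch, not the EPN combination algorithm.

```python
def combine_ztd(solutions, weights=None):
    """Weighted mean of Zenith Tropospheric Delay estimates (mm) from
    several analysis centres for one station and epoch; equal weights
    by default."""
    if weights is None:
        weights = [1.0] * len(solutions)
    return sum(z * w for z, w in zip(solutions, weights)) / sum(weights)

# illustrative ZTDs (mm) from three centres for the same station/epoch
print(combine_ztd([2391.2, 2392.0, 2390.6]))
```

    In practice the weights would reflect each solution's formal errors, and outlier screening precedes the averaging.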

  8. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. A. Griggs

    2012-11-01

    We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2011. Around 344 000 line kilometres of airborne data were used, with the majority having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island, including across the glaciated/ice-free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice shelf thickness was determined where a floating tongue exists, in particular in the north. The across-track spacing between flight lines warranted interpolation at 1 km postings near the ice sheet margin and 2.5 km in the interior. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced, alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±6 m to about ±200 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new data sets, particularly along the ice sheet margin, where ice velocity is highest and changes are most marked. We use the new bed and surface DEMs to calculate the hydraulic potential for subglacial flow and present the large-scale pattern of water routing. We estimate that the volume of ice included in our land/ice mask would raise eustatic sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.
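
    The hydraulic potential calculation mentioned above is commonly done with the Shreve formulation, under the assumption that subglacial water pressure equals the ice overburden. The densities and the toy profile below are illustrative, not values from the dataset.

```python
import numpy as np

RHO_ICE, RHO_WATER, G = 917.0, 1000.0, 9.81  # kg/m^3, kg/m^3, m/s^2

def hydraulic_potential(surface, bed):
    """Shreve hydraulic potential (Pa), assuming water pressure equals
    ice overburden: phi = rho_w*g*bed + rho_i*g*(surface - bed)."""
    thickness = surface - bed
    return RHO_WATER * G * bed + RHO_ICE * G * thickness

# illustrative 1-D profile (m): flat bed under a thinning ice surface
bed = np.array([100.0, 100.0, 100.0])
surface = np.array([1500.0, 1200.0, 900.0])
phi = hydraulic_potential(surface, bed)
print(phi)  # potential decreases toward the margin: water routes that way
```

    Computing the gradient of this field over the bed and surface DEMs gives the large-scale subglacial water routing pattern the abstract refers to.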

  9. Public Health Offices, Public Health Agencies - county, name, address, contact info, email, website, Published in 2007, Iowa Dept. of Public Health.

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — This Public Health Offices dataset was produced all or in part from Published Reports/Deeds information as of 2007. It is described as 'Public Health Agencies -...

  10. Cetacean Density Estimation from Novel Acoustic Datasets by Acoustic Propagation Modeling

    Science.gov (United States)

    2014-09-30

    DISTRIBUTION STATEMENT A: Approved for public release; distribution is unlimited. OBJECTIVES: The objectives of this research are to apply existing methods for cetacean density estimation from passive acoustic recordings made by single sensors to novel datasets and cetacean species, as well as to refine the existing techniques in order to develop a more generalized model that can be

  11. Geocoding large population-level administrative datasets at highly resolved spatial scales

    OpenAIRE

    Edwards, Sharon E.; Strauss, Benjamin; Miranda, Marie Lynn

    2013-01-01

    Using geographic information systems to link administrative databases with demographic, social, and environmental data allows researchers to use spatial approaches to explore relationships between exposures and health. Traditionally, spatial analysis in public health has focused on the county, zip code, or tract level because of limitations to geocoding at highly resolved scales. Using 2005 birth and death data from North Carolina, we examine our ability to geocode population-level datasets a...

  12. A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets

    OpenAIRE

    Karanam, Srikrishna; Gou, Mengran; Wu, Ziyan; Rates-Borras, Angels; Camps, Octavia; Radke, Richard J.

    2016-01-01

    Person re-identification (re-id) is a critical problem in video analytics applications such as security and surveillance. The public release of several datasets and code for vision algorithms has facilitated rapid progress in this area over the last few years. However, directly comparing re-id algorithms reported in the literature has become difficult since a wide variety of features, experimental protocols, and evaluation metrics are employed. In order to address this need, we present an ext...

  13. Taking the Temperature of Pedestrian Movement in Public Spaces

    DEFF Research Database (Denmark)

    Nielsen, Søren Zebitz; Gade, Rikke; Moeslund, Thomas B.

    2014-01-01

    Cities require data on pedestrian movement to evaluate the use of public spaces. We propose a system using thermal cameras and Computer Vision (CV) combined with Geographical Information Systems (GIS) to track and assess pedestrian dynamics and behaviors in urban plazas. Thermal cameras operate ... independent of light and the technique is non-intrusive and preserves privacy. The approach extends the analysis to the GIS domain by capturing georeferenced tracks. We present a pilot study conducted in Copenhagen in 2013. The tracks retrieved by CV are compared to manually annotated ground truth tracks...

  14. Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center

    Science.gov (United States)

    Moroni, D. F.; Quach, N.; Francis-Curley, W.

    2016-12-01

    facilitate rapid and efficient updating of dataset metadata records by external data producers. Here we present this new service and demonstrate the variety of ways in which a multitude of Earth Science datasets may be submitted in a manner that significantly reduces the time in ensuring that new, vital data reaches the public domain.

  15. Application of Huang-Hilbert Transforms to Geophysical Datasets

    Science.gov (United States)

    Duffy, Dean G.

    2003-01-01

    The Huang-Hilbert transform is a promising new method for analyzing nonstationary and nonlinear datasets. In this talk I will apply this technique to several important geophysical datasets. To understand the strengths and weaknesses of this method, multi-year, hourly datasets of the sea level heights and solar radiation will be analyzed. Then we will apply this transform to the analysis of gravity waves observed in a mesoscale observational net.
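
    The Hilbert step of the method (applied after empirical mode decomposition, which is omitted here) can be sketched with an FFT-based analytic signal. The 5 Hz toy signal stands in for a single intrinsic mode function; everything below is an illustrative sketch, not the talk's analysis.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal of an even-length real series:
    zero the negative frequencies, double the positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = h[n // 2] = 1.0      # keep DC and Nyquist as-is
    h[1:n // 2] = 2.0           # double positive frequencies
    return np.fft.ifft(X * h)

t = np.linspace(0.0, 1.0, 400, endpoint=False)
x = np.cos(2 * np.pi * 5 * t)            # toy stand-in for one IMF
z = analytic_signal(x)
amplitude = np.abs(z)                    # instantaneous amplitude
phase = np.unwrap(np.angle(z))
inst_freq = np.diff(phase) / (2 * np.pi) * 400  # instantaneous frequency (Hz)
print(round(float(np.median(inst_freq)), 2))  # 5.0
```

    For a nonstationary record, the instantaneous amplitude and frequency of each decomposed mode vary in time, which is what makes the method attractive for tidal and gravity-wave signals.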

  16. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes. (Author)

  17. Dataset on daytime outdoor thermal comfort for Belo Horizonte, Brazil

    Directory of Open Access Journals (Sweden)

    Simone Queiroz da Silveira Hirashima

    2016-12-01

    This dataset describes the microclimatic parameters of two open public urban spaces in the city of Belo Horizonte, Brazil; the physiological equivalent temperature (PET) index values; and the related subjective responses of interviewees regarding thermal sensation perception and preference and thermal comfort evaluation. Individual and behavioral characteristics of respondents are also presented. Data were collected at daytime, in summer and winter, 2013. A statistical treatment of these data was first presented in a PhD thesis, “Percepção sonora e térmica e avaliação de conforto em espaços urbanos abertos do município de Belo Horizonte – MG, Brasil” (Hirashima, 2014) [1], providing relevant information on thermal conditions in these locations and on thermal comfort assessment. These data were also explored in the article “Daytime Thermal Comfort in Urban Spaces: A Field Study in Brazil” (Hirashima et al., in press) [2]. These references are recommended for further interpretation and discussion.

  18. Pgu-Face: A dataset of partially covered facial images

    Directory of Open Access Journals (Sweden)

    Seyed Reza Salari

    2016-12-01

    In this article we introduce a human face image dataset. Images were taken in close to real-world conditions using several cameras, often mobile phone cameras. The dataset contains 224 subjects imaged under four different conditions (a nearly clean-shaven countenance, a nearly clean-shaven countenance with sunglasses, an unshaven or stubble countenance, and an unshaven or stubble countenance with sunglasses) in up to two recording sessions. The presence of partially covered face images in this dataset can reveal the robustness and efficiency of several facial image processing algorithms. In this work we present the dataset and explain the recording method.

  19. Compression method based on training dataset of SVM

    Institute of Scientific and Technical Information of China (English)

    2008-01-01

    A method to compress the training dataset of a Support Vector Machine (SVM), based on the characteristics of the SVM, is proposed. First, the distances between the training samples and the hyperplane are computed; then the samples that lie far from the hyperplane are discarded in order to compress the training dataset. The time spent training the SVM on the compressed dataset is shortened considerably. Experimental results show that the algorithm is effective.
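
    A minimal numpy sketch of the idea, assuming a separating hyperplane (w, b) is already known; this is a simplification of the paper's method, with made-up data.

```python
import numpy as np

def compress_training_set(X, y, w, b, margin=2.0):
    """Keep only samples within `margin` of the hyperplane w.x + b = 0;
    points far from the boundary rarely become support vectors."""
    dist = np.abs(X @ w + b) / np.linalg.norm(w)
    keep = dist <= margin
    return X[keep], y[keep]

rng = np.random.default_rng(0)
# two linearly separated blobs; the hyperplane direction is assumed known
X = np.vstack([rng.normal(-3, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)
w, b = np.array([1.0, 1.0]), 0.0   # rough separating direction (illustrative)
Xc, yc = compress_training_set(X, y, w, b)
print(len(Xc), "of", len(X), "samples kept")
```

    Since SVM training cost grows quickly with the number of samples, discarding the far-from-boundary points shortens training with little effect on the resulting decision surface.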

  20. Providing Geographic Datasets as Linked Data in Sdi

    Science.gov (United States)

    Hietanen, E.; Lehto, L.; Latvala, P.

    2016-06-01

    In this study, a prototype service providing data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response are first fetched from the existing WFS. The Geography Markup Language (GML) output of the WFS is then transformed on the fly into the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs, and the needed information content of the objects can easily be extracted from the RDF serializations available at those URIs. A solution for linking data objects to the dataset URI is also introduced, using the Vocabulary of Interlinked Datasets (VoID): the dataset is divided into subsets and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
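
    The GML-to-RDF step can be illustrated with a minimal Turtle serializer. The prefixes, property names, and URIs below are hypothetical stand-ins, not the study's actual schema.

```python
def feature_to_turtle(base_uri, feature_id, wkt, props):
    """Serialize one spatial object as GeoSPARQL-flavoured Turtle."""
    subject = f"<{base_uri}/{feature_id}>"
    lines = [
        "@prefix geo: <http://www.opengis.net/ont/geosparql#> .",
        "@prefix ex:  <http://example.org/ont#> .",
        f"{subject} a geo:Feature ;",
        f'    geo:hasGeometry [ geo:asWKT "{wkt}"^^geo:wktLiteral ] ;',
    ]
    lines += [f'    ex:{k} "{v}" ;' for k, v in props.items()]
    lines[-1] = lines[-1].rstrip(" ;") + " ."   # terminate the last triple
    return "\n".join(lines)

ttl = feature_to_turtle(
    "http://example.org/id/road", "42",
    "LINESTRING(24.94 60.17, 24.95 60.17)",
    {"name": "Sample Road"},
)
print(ttl)
```

    In the prototype this transformation happens per request, and HTTP content negotiation (the Accept header) decides whether the same triples are returned as Turtle, RDF/XML, or JSON-LD.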

  1. Synthetic neuronal datasets for benchmarking directed functional connectivity metrics

    National Research Council Canada - National Science Library

    Rodrigues, João; Andrade, Alexandre

    2015-01-01

    Background. Datasets consisting of synthetic neural data generated with quantifiable and controlled parameters are a valuable asset in the process of testing and validating directed functional connectivity metrics...

  2. BIA Indian Lands Dataset (Indian Lands of the United States)

    Data.gov (United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  3. Development of a Global Historic Monthly Mean Precipitation Dataset

    Institute of Scientific and Technical Information of China (English)

    杨溯; 徐文慧; 许艳; 李庆祥

    2016-01-01

    Global historic precipitation dataset is the base for climate and water cycle research. There have been several global historic land surface precipitation datasets developed by international data centers such as the US National Climatic Data Center (NCDC), European Climate Assessment & Dataset project team, Met Office, etc., but so far there are no such datasets developed by any research institute in China. In addition, each dataset has its own focus of study region, and the existing global precipitation datasets only contain sparse observational stations over China, which may result in uncertainties in East Asian precipitation studies. In order to take into account comprehensive historic information, users might need to employ two or more datasets. However, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users to exploit these datasets. For this reason, a complete historic precipitation dataset that takes advantage of various datasets has been developed and produced in the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified, with duplicated observations removed. Consistency test, correlation coefficient test, significance t-test at the 95% confidence level, and significance F-test at the 95% confidence level are conducted first to ensure the data reliability. Only those datasets that satisfy all the above four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset version 1.0. It contains observations at 31 thousand stations with 1.87 × 10⁷ data records, among which 4152 time series of precipitation are longer than 100 yr. This dataset plays a critical role in climate research due to its advantages in large data volume and high density of station network, compared to

  4. Development of a global historic monthly mean precipitation dataset

    Science.gov (United States)

    Yang, Su; Xu, Wenhui; Xu, Yan; Li, Qingxiang

    2016-04-01

    Global historic precipitation dataset is the base for climate and water cycle research. There have been several global historic land surface precipitation datasets developed by international data centers such as the US National Climatic Data Center (NCDC), European Climate Assessment & Dataset project team, Met Office, etc., but so far there are no such datasets developed by any research institute in China. In addition, each dataset has its own focus of study region, and the existing global precipitation datasets only contain sparse observational stations over China, which may result in uncertainties in East Asian precipitation studies. In order to take into account comprehensive historic information, users might need to employ two or more datasets. However, the non-uniform data formats, data units, station IDs, and so on add extra difficulties for users to exploit these datasets. For this reason, a complete historic precipitation dataset that takes advantage of various datasets has been developed and produced in the National Meteorological Information Center of China. Precipitation observations from 12 sources are aggregated, and the data formats, data units, and station IDs are unified. Duplicated stations with the same ID are identified, with duplicated observations removed. Consistency test, correlation coefficient test, significance t-test at the 95% confidence level, and significance F-test at the 95% confidence level are conducted first to ensure the data reliability. Only those datasets that satisfy all the above four criteria are integrated to produce the China Meteorological Administration global precipitation (CGP) historic precipitation dataset version 1.0. It contains observations at 31 thousand stations with 1.87 × 10⁷ data records, among which 4152 time series of precipitation are longer than 100 yr. This dataset plays a critical role in climate research due to its advantages in large data volume and high density of station network, compared to
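
    The duplicate-removal step described above can be sketched as follows; the station IDs and values are made up for illustration.

```python
def deduplicate_stations(records):
    """Merge records sharing a station ID, keeping the first-seen
    observation for each (station, year, month) key."""
    merged = {}
    for rec in records:
        key = (rec["station_id"], rec["year"], rec["month"])
        merged.setdefault(key, rec["precip_mm"])  # ignore later duplicates
    return merged

# illustrative records aggregated from two source datasets, one overlap
records = [
    {"station_id": "50136", "year": 1953, "month": 7, "precip_mm": 121.0},
    {"station_id": "50136", "year": 1953, "month": 7, "precip_mm": 121.0},
    {"station_id": "58362", "year": 1953, "month": 7, "precip_mm": 210.5},
]
merged = deduplicate_stations(records)
print(len(merged))  # 2
```

    In the real pipeline, candidate duplicates whose values disagree would then go through the consistency, correlation, t-test, and F-test screening the abstract lists before being merged.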

  5. Accuracy assessment of gridded precipitation datasets in the Himalayas

    Science.gov (United States)

    Khan, A.

    2015-12-01

    Accurate precipitation data are vital for hydro-climatic modelling and water resources assessments. Based on mass balance calculations and Turc-Budyko analysis, this study investigates the accuracy of twelve widely used gridded precipitation datasets for sub-basins in the Upper Indus Basin (UIB) in the Himalayas-Karakoram-Hindukush (HKH) region. These datasets are: 1) Global Precipitation Climatology Project (GPCP), 2) Climate Prediction Centre (CPC) Merged Analysis of Precipitation (CMAP), 3) NCEP / NCAR, 4) Global Precipitation Climatology Centre (GPCC), 5) Climatic Research Unit (CRU), 6) Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE), 7) Tropical Rainfall Measuring Mission (TRMM), 8) European Reanalysis (ERA) interim data, 9) PRINCETON, 10) European Reanalysis-40 (ERA-40), 11) Willmott and Matsuura, and 12) WATCH Forcing Data based on ERA interim (WFDEI). Precipitation accuracy and consistency were assessed by physical mass balance involving the sum of annual measured flow, estimated actual evapotranspiration (average of 4 datasets), estimated glacier mass balance melt contribution (average of 4 datasets), and groundwater recharge (average of 3 datasets), during 1999-2010. The mass balance assessment was complemented by Turc-Budyko non-dimensional analysis, where annual precipitation, measured flow and potential evapotranspiration (average of 5 datasets) data were used for the same period. Both analyses suggest that all tested precipitation datasets significantly underestimate precipitation in the Karakoram sub-basins. For the Hindukush and Himalayan sub-basins most datasets underestimate precipitation, except ERA-interim and ERA-40.
The analysis indicates that for this large region with complicated terrain features and stark spatial precipitation gradients the reanalysis datasets have better consistency with flow measurements than datasets derived from records of only sparsely distributed climatic
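
    The mass-balance closure used to judge each dataset can be sketched as a simple residual; the sign convention and the basin numbers below are illustrative assumptions, not the study's values.

```python
def precipitation_residual(P, Q, ET, glacier_melt, recharge):
    """Basin mass-balance residual (mm/yr): precipitation plus glacier
    melt input should roughly balance measured flow, actual
    evapotranspiration, and groundwater recharge."""
    return P + glacier_melt - (Q + ET + recharge)

# illustrative basin numbers (mm/yr) for a Karakoram-like sub-basin
residual = precipitation_residual(P=400.0, Q=600.0, ET=200.0,
                                  glacier_melt=150.0, recharge=50.0)
print(residual)  # -300.0: the gridded P cannot close the water balance
```

    A strongly negative residual, as here, is the signature the study interprets as underestimated precipitation in the high-altitude sub-basins.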

  6. Public health, GIS, and the internet.

    Science.gov (United States)

    Croner, Charles M

    2003-01-01

    Internet access and use of georeferenced public health information for GIS application will be an important and exciting development for the nation's Department of Health and Human Services and other health agencies in this new millennium. Technological progress toward public health geospatial data integration, analysis, and visualization of space-time events using the Web portends eventual robust use of GIS by public health and other sectors of the economy. Increasing Web resources from distributed spatial data portals and global geospatial libraries, and a growing suite of Web integration tools, will provide new opportunities to advance disease surveillance, control, and prevention, and ensure public access and community empowerment in public health decision making. Emerging supercomputing, data mining, compression, and transmission technologies will play increasingly critical roles in national emergency, catastrophic planning and response, and risk management. Web-enabled public health GIS will be guided by Federal Geographic Data Committee spatial metadata, OpenGIS Web interoperability, and GML/XML geospatial Web content standards. Public health will become a responsive and integral part of the National Spatial Data Infrastructure.

  7. Formerly Used Defense Sites (FUDS) Public Properties

    Data.gov (United States)

    Department of Homeland Security — The FUDS Public GIS dataset contains point location information for the 2,709 Formerly Used Defense Sites (FUDS) properties where the U.S. Army Corps of Engineers is...

  8. The Lung TIME: annotated lung nodule dataset and nodule detection framework

    Science.gov (United States)

    Dolejsi, Martin; Kybic, Jan; Polovincak, Michal; Tuma, Stanislav

    2009-02-01

    The Lung Test Images from Motol Environment (Lung TIME) is a new publicly available dataset of thoracic CT scans with manually annotated pulmonary nodules, larger than other publicly available datasets. Pulmonary nodules are lesions in the lungs which may indicate lung cancer, and their early detection significantly improves patients' survival rates. Automatic nodule detection systems using CT scans are being developed to reduce physicians' load and to improve detection quality. Besides presenting our own nodule detection system, in this article we mainly address the problem of testing and comparing automatic nodule detection methods. Our publicly available dataset of 157 CT scans with 394 annotated nodules contains nearly every nodule type (pleura-attached, vessel-attached, solitary, regular, irregular) with diameters of 2-10 mm, except ground glass opacities (GGO). Annotation was done consensually by two experienced radiologists. The data are in DICOM format; annotations are provided in an XML format compatible with the Lung Image Database Consortium (LIDC). Our computer-aided diagnosis (CAD) system is based on mathematical morphology and filtration with a subsequent classification step using an Asymmetric AdaBoost classifier. The system was tested using the Lung TIME, LIDC and ANODE09 databases. Performance was evaluated by cross-validation for Lung TIME and LIDC, and using the supplied evaluation procedure for ANODE09. The sensitivity at the chosen working point was 94.27% with 7.57 false positives/slice for the TIME and LIDC datasets combined, 94.03% with 5.46 FPs/slice for the Lung TIME, 89.62% with 12.03 FPs/slice for LIDC, and 78.68% with 4.61 FPs/slice when applied to ANODE09.

  9. Global Drought Assessment using a Multi-Model Dataset

    NARCIS (Netherlands)

    Lanen, van H.A.J.; Huijgevoort, van M.H.J.; Corzo Perez, G.; Wanders, N.; Hazenberg, P.; Loon, van A.F.; Estifanos, S.; Melsen, L.A.

    2011-01-01

    Large-scale models are often applied to study past drought (forced with global reanalysis datasets) and to assess future drought (using downscaled, bias-corrected forcing from climate models). The EU project WATer and global CHange (WATCH) provides a 0.5° global dataset of meteorological

  10. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  11. Primary Datasets for Case Studies of River-Water Quality

    Science.gov (United States)

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding how the quality of river water is…

  12. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  15. Querying Patterns in High-Dimensional Heterogenous Datasets

    Science.gov (United States)

    Singh, Vishwakarma

    2012-01-01

    The recent technological advancements have led to the availability of a plethora of heterogenous datasets, e.g., images tagged with geo-location and descriptive keywords. An object in these datasets is described by a set of high-dimensional feature vectors. For example, a keyword-tagged image is represented by a color-histogram and a…

  16. Estimated Public Supply Water Use of the Southwest Principal Aquifers (SWPA) study in 2005

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — This dataset is a 100-meter cell resolution raster of estimated use of public supply water for the southwestern United States. The dataset was generated from...

  17. Dataset of the proteome of purified outer membrane vesicles from the human pathogen Aggregatibacter actinomycetemcomitans

    Directory of Open Access Journals (Sweden)

    Thomas Kieselbach

    2017-02-01

    The Gram-negative bacterium Aggregatibacter actinomycetemcomitans is an oral and systemic pathogen, which is linked to aggressive forms of periodontitis and can be associated with endocarditis. The outer membrane vesicles (OMVs) of this species contain effector proteins such as cytolethal distending toxin (CDT) and leukotoxin (LtxA), which they can deliver into human host cells. The OMVs can also activate innate immunity through NOD1- and NOD2-active pathogen-associated molecular patterns. This dataset provides a proteome of highly purified OMVs from A. actinomycetemcomitans serotype e strain 173. The experimental data include not only the raw data of the LC-MS/MS analysis of four independent preparations of purified OMVs but also the mass lists of the processed data and the Mascot .dat files from the database searches. In total 501 proteins are identified, of which 151 are detected in at least three of four independent preparations. In addition, this dataset contains the COG definitions and the predicted subcellular locations (PSORTb 3.0) for the entire genome of A. actinomycetemcomitans serotype e strain SC1083, which is used for the evaluation of the LC-MS/MS data. These data are deposited in ProteomeXchange in the public dataset PXD002509. In addition, a scientific interpretation of this dataset by Kieselbach et al. (2015) [2] is available at http://dx.doi.org/10.1371/journal.pone.0138591.

  18. Squish: Near-Optimal Compression for Archival of Relational Datasets

    Science.gov (United States)

    Gao, Yihan; Parameswaran, Aditya

    2017-01-01

    Relational datasets are being generated at an alarmingly rapid rate across organizations and industries. Compressing these datasets could significantly reduce storage and archival costs. Traditional compression algorithms, e.g., gzip, are suboptimal for compressing relational datasets since they ignore the table structure and relationships between attributes. We study compression algorithms that leverage the relational structure to compress datasets to a much greater extent. We develop Squish, a system that uses a combination of Bayesian Networks and Arithmetic Coding to capture multiple kinds of dependencies among attributes and achieve near-entropy compression rate. Squish also supports user-defined attributes: users can instantiate new data types by simply implementing five functions for a new class interface. We prove the asymptotic optimality of our compression algorithm and conduct experiments to show the effectiveness of our system: Squish achieves a reduction of over 50% in storage size relative to systems developed in prior work on a variety of real datasets.
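The core observation behind structure-aware compression can be illustrated without Squish itself: regrouping a table column-by-column before applying a generic compressor such as gzip typically shrinks the output, because values within a column are far more self-similar than whole rows. The sketch below uses invented sample data and is not the Squish algorithm (which combines Bayesian networks with arithmetic coding); it only motivates why exploiting relational structure helps.

```python
import gzip
import random

# Illustration only: a column-major layout groups similar values together,
# which generic compressors like gzip exploit. This is NOT Squish.
random.seed(0)
cities = ["Springfield", "Shelbyville", "Capital City"]
statuses = ["active", "inactive"]
rows = [(random.choice(cities), random.choice(statuses), str(i % 7))
        for i in range(5000)]

# Row-major: the table serialized row by row, as in a CSV file.
row_major = "\n".join(",".join(r) for r in rows).encode()
# Column-major: all values of column 0, then column 1, then column 2.
col_major = "\n".join(",".join(col) for col in zip(*rows)).encode()

print("row-major gzip size:", len(gzip.compress(row_major)))
print("col-major gzip size:", len(gzip.compress(col_major)))
```

On this kind of data the column-major layout usually compresses noticeably smaller; Squish goes further by modeling dependencies between columns rather than just reordering them.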

  19. New model for datasets citation and extraction reproducibility in VAMDC

    CERN Document Server

    Zwölf, Carlo Maria; Dubernet, Marie-Lise

    2016-01-01

    In this paper we present a new paradigm for the identification of datasets extracted from the Virtual Atomic and Molecular Data Centre (VAMDC) e-science infrastructure. Such identification includes information on the origin and version of the datasets, references associated to individual data in the datasets, as well as timestamps linked to the extraction procedure. This paradigm is described through the modifications of the language used to exchange data within the VAMDC and through the services that will implement those modifications. This new paradigm should enforce traceability of datasets, favour reproducibility of datasets extraction, and facilitate the systematic citation of the authors having originally measured and/or calculated the extracted atomic and molecular data.

  20. New model for datasets citation and extraction reproducibility in VAMDC

    Science.gov (United States)

    Zwölf, Carlo Maria; Moreau, Nicolas; Dubernet, Marie-Lise

    2016-09-01

    In this paper we present a new paradigm for the identification of datasets extracted from the Virtual Atomic and Molecular Data Centre (VAMDC) e-science infrastructure. Such identification includes information on the origin and version of the datasets, references associated to individual data in the datasets, as well as timestamps linked to the extraction procedure. This paradigm is described through the modifications of the language used to exchange data within the VAMDC and through the services that will implement those modifications. This new paradigm should enforce traceability of datasets, favor reproducibility of datasets extraction, and facilitate the systematic citation of the authors having originally measured and/or calculated the extracted atomic and molecular data.

  1. Public Education, Public Good.

    Science.gov (United States)

    Tomlinson, John

    1986-01-01

    Criticizes policies which would damage or destroy a public education system. Examines the relationship between government-provided education and democracy. Concludes that privatization of public education would emphasize self-interest and selfishness, further jeopardizing the altruism and civic mindedness necessary for the public good. (JDH)

  2. The Stream-Catchment (StreamCat) Dataset: A database of watershed metrics for the conterminous USA

    Science.gov (United States)

    We developed an extensive database of landscape metrics for ~2.65 million streams, and their associated catchments, within the conterminous USA: The Stream-Catchment (StreamCat) Dataset. These data are publicly available and greatly reduce the specialized geospatial expertise n...

  3. Geocoding large population-level administrative datasets at highly resolved spatial scales.

    Science.gov (United States)

    Edwards, Sharon E; Strauss, Benjamin; Miranda, Marie Lynn

    2014-08-01

    Using geographic information systems to link administrative databases with demographic, social, and environmental data allows researchers to use spatial approaches to explore relationships between exposures and health. Traditionally, spatial analysis in public health has focused on the county, zip code, or tract level because of limitations to geocoding at highly resolved scales. Using 2005 birth and death data from North Carolina, we examine our ability to geocode population-level datasets at three spatial resolutions - zip code, street, and parcel. We achieve high geocoding rates at all three resolutions, with statewide street geocoding rates of 88.0% for births and 93.2% for deaths. We observe differences in geocoding rates across demographics and health outcomes, with lower geocoding rates in disadvantaged populations and the most dramatic differences occurring across the urban-rural spectrum. Our results suggest highly resolved spatial data architectures for population-level datasets are viable through geocoding individual street addresses. We recommend routinely geocoding administrative datasets to the highest spatial resolution feasible, allowing public health researchers to choose the spatial resolution used in analysis based on an understanding of the spatial dimensions of the health outcomes and exposures being investigated. Such research, however, must acknowledge how disparate geocoding success across subpopulations may affect findings.

  4. Live time-lapse dataset of in vitro wound healing experiments.

    Science.gov (United States)

    Zaritsky, Assaf; Natan, Sari; Kaplan, Doron; Ben-Jacob, Eshel; Tsarfaty, Ilan

    2015-01-01

    The wound healing assay is the common method to study collective cell migration in vitro. Computational analyses of live imaging exploit the rich temporal information and significantly improve understanding of complex phenomena that emerge during this mode of collective motility. Publicly available experimental data can allow application of new analyses to promote new discoveries, and assess algorithms' capabilities to distinguish between different experimental conditions. A freely-available dataset of 31 time-lapse in vitro wound healing experiments of two cell lines is presented. It consists of six different experimental conditions with 4-6 replicates each, gathered to study the effects of a growth factor on collective cell migration. The raw data is available at 'The Cell: an Image Library' repository. This Data Note provides detailed description of the data, intermediately processed data, scripts and experimental validations that have not been reported before and are currently available at GigaDB. This is the first publicly available repository of live collective cell migration data that includes independent replicates for each set of conditions. This dataset has the potential for extensive reuse. Some aspects in the data remain unexplored and can be exploited extensively to reveal new insight. The dataset could also be used to assess the performance of available and new quantification methods by demonstrating phenotypic discriminatory capabilities between the different experimental conditions. It may allow faster and more elaborated, reproducible and effective analyses, which will likely lead to new biological and biophysical discoveries.

  5. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Science.gov (United States)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by Electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the way the prosthetic hand is controlled compared to the Electromyograph (EMG). EMG has disadvantages for a person who has not used the muscle for a long time, and also for people with age-related degenerative issues; thus, EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from a Brain Computer Interface (BCI) project and were already classified for open, close and combined movement operations. They served as input to control the prosthetic hand through an interface between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets with an additional LiPo (Lithium Polymer) battery attached to the prosthetic. Some limitations were also identified in terms of hand movements and the weight of the prosthetic, and suggestions for improvement are given. Overall, the objective of this paper was achieved, as the prosthetic hand was found to be feasible in operation using the EEG datasets.

  6. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically: the data for the response are first fetched from the existing WFS, and the Geography Markup Language (GML) output of the WFS is then transformed on-the-fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs, and the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced using the Vocabulary of Interlinked Datasets (VoID): the dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.

  7. Advancements in Wind Integration Study Data Modeling: The Wind Integration National Dataset (WIND) Toolkit; Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Draxl, C.; Hodge, B. M.; Orwig, K.; Jones, W.; Searight, K.; Getman, D.; Harrold, S.; McCaa, J.; Cline, J.; Clark, C.

    2013-10-01

    Regional wind integration studies in the United States require detailed wind power output data at many locations to perform simulations of how the power system will operate under high-penetration scenarios. The wind data sets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as be time synchronized with available load profiles. The Wind Integration National Dataset (WIND) Toolkit described in this paper fulfills these requirements. A wind resource dataset, wind power production time series, and simulated forecasts from a numerical weather prediction model run on a nationwide 2-km grid at 5-min resolution will be made publicly available for more than 110,000 onshore and offshore wind power production sites.

  8. Dataset on force measurements of needle insertions into two ex-vivo human livers.

    Science.gov (United States)

    de Jong, Tonke L; Dankelman, Jenny; van den Dobbelsteen, John J

    2017-04-01

    A needle-tissue interaction experiment has been carried out by inserting the inner needle of a trocar needle into two ex-vivo human livers. The dataset contains the forces that act on the needle during insertion into and retraction from the livers. In addition, a MATLAB code file is included that provides base-level analysis of the data and generates force-position diagrams of the needle insertions. The dataset is available on Mendeley Data (doi:10.17632/94s7xd9mzt.2) and is made publicly available to enable other researchers to use it for their own research purposes. For further interpretation and discussion of the data, one is referred to the associated research article entitled "PVA matches human liver in needle-tissue interaction" by de Jong et al., 2017.

  9. Open source platform for collaborative construction of wearable sensor datasets for human motion analysis and an application for gait analysis.

    Science.gov (United States)

    Llamas, César; González, Manuel A; Hernández, Carmen; Vegas, Jesús

    2016-10-01

    Nearly every practical improvement in modeling human motion is founded on a properly designed collection of data or datasets. These datasets must be made publicly available so that the community can validate and accept them. It is reasonable to concede that a collective, guided enterprise could devise solid and substantial datasets as a result of a collaborative effort, in the same sense as the open software community does. In this way datasets could be complemented, extended and expanded in size with, for example, more individuals, samples and human actions. For this to be possible some commitments must be made by the collaborators, one of which is sharing the same data acquisition platform. In this paper, we offer an affordable open source hardware and software platform based on inertial wearable sensors, so that several groups can cooperate in the construction of datasets through common software suitable for collaboration. Some experimental results about the throughput of the overall system are reported, showing the feasibility of acquiring data from up to 6 sensors with a sampling frequency of no less than 118 Hz. Also, a proof-of-concept dataset is provided comprising sampled data from 12 subjects suitable for gait analysis.

  10. The effects of spatial population dataset choice on estimates of population at risk of disease

    Directory of Open Access Journals (Sweden)

    Gething Peter W

    2011-02-01

    consistently more accurate than the others in estimating PAR. The sizes of such differences among modeled human populations were related to variations in the methods, input resolution, and date of the census data underlying each dataset. Data quality varied from country to country within the spatial population datasets. Conclusions Detailed, highly spatially resolved human population data are an essential resource for planning health service delivery for disease control, for the spatial modeling of epidemics, and for decision-making processes related to public health. However, our results highlight that for the low-income regions of the world where disease burden is greatest, existing datasets display substantial variations in estimated population distributions, resulting in uncertainty in disease assessments that utilize them. Increased efforts are required to gather contemporary and spatially detailed demographic data to reduce this uncertainty, particularly in Africa, and to develop population distribution modeling methods that match the rigor, sophistication, and ability to handle uncertainty of contemporary disease mapping and spread modeling. In the meantime, studies that utilize a particular spatial population dataset need to acknowledge the uncertainties inherent within them and consider how the methods and data that comprise each will affect conclusions.

  11. A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology.

    Science.gov (United States)

    Kumar, Neeraj; Verma, Ruchika; Sharma, Sanuj; Bhargava, Surabhi; Vahadane, Abhishek; Sethi, Amit

    2017-03-06

    Nuclear segmentation in digital microscopic tissue images can enable extraction of high-quality features for nuclear morphometrics and other analyses in computational pathology. Conventional image processing techniques such as Otsu thresholding and watershed segmentation do not work effectively on challenging cases, such as chromatin-sparse and crowded nuclei. In contrast, machine learning-based segmentation can generalize across various nuclear appearances. However, training machine learning algorithms requires datasets of images in which a vast number of nuclei have been annotated. Publicly accessible and annotated datasets, along with widely agreed upon metrics to compare techniques, have catalyzed tremendous innovation and progress on other image classification problems, particularly in object recognition. Inspired by their success, we introduce a large publicly accessible dataset of H&E stained tissue images with more than 21,000 painstakingly annotated nuclear boundaries, whose quality was validated by a medical doctor. Because our dataset is taken from multiple hospitals and includes a diversity of nuclear appearances from several patients, disease states, and organs, techniques trained on it are likely to generalize well and work right out-of-the-box on other H&E stained images. We also propose a new metric to evaluate nuclear segmentation results that penalizes object- and pixel-level errors in a unified manner, unlike previous metrics that penalize only one type of error. We also propose a segmentation technique based on deep learning that lays special emphasis on identifying the nuclear boundaries, including those between touching or overlapping nuclei, and works well on a diverse set of test images.
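A metric that penalizes object- and pixel-level errors in one ratio can be sketched in an aggregated-Jaccard style: intersections and unions of matched objects are pooled, and unmatched predictions inflate the denominator. The sketch below is an illustrative reimplementation under that assumption, using toy pixel sets, not the authors' reference code.

```python
# Hedged sketch of an aggregated-Jaccard-style segmentation metric.
# Masks are dicts mapping object id -> set of (row, col) pixel coordinates.

def aggregated_jaccard(gt_objects, pred_objects):
    used = set()
    inter_total, union_total = 0, 0
    for gt in gt_objects.values():
        # Match each ground-truth nucleus to the prediction with highest IoU.
        best, best_iou, best_id = set(), -1.0, None
        for pid, pred in pred_objects.items():
            union = len(gt | pred)
            iou = len(gt & pred) / union if union else 0.0
            if iou > best_iou:
                best, best_iou, best_id = pred, iou, pid
        inter_total += len(gt & best)
        union_total += len(gt | best)
        used.add(best_id)
    # Unmatched (spurious) predictions count fully against the score.
    for pid, pred in pred_objects.items():
        if pid not in used:
            union_total += len(pred)
    return inter_total / union_total if union_total else 0.0

# Toy example: two ground-truth nuclei, one spurious prediction.
gt = {1: {(0, 0), (0, 1)}, 2: {(5, 5)}}
pred = {1: {(0, 0), (0, 1)}, 2: {(5, 5)}, 3: {(9, 9)}}
print(aggregated_jaccard(gt, pred))  # spurious object 3 lowers the score
```

A purely pixel-level Jaccard would ignore how errors are distributed across objects; pooling per-object intersections and unions makes both a missed nucleus and a sloppy boundary reduce the same score.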

  12. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement in the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are generally of very good quality; nevertheless, some findings and recommendations for improving the quality of the Life Cycle Inventories have been derived. Moreover, these results give any LCA practitioner assurance of the quality of the electricity-related datasets, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  13. GLEAM v3: updated land evaporation and root-zone soil moisture datasets

    Science.gov (United States)

    Martens, Brecht; Miralles, Diego; Lievens, Hans; van der Schalie, Robin; de Jeu, Richard; Fernández-Prieto, Diego; Verhoest, Niko

    2016-04-01

    Evaporation determines the availability of surface water resources and the requirements for irrigation. In addition, through its impacts on the water, carbon and energy budgets, evaporation influences the occurrence of rainfall and the dynamics of air temperature. Therefore, reliable estimates of this flux at regional to global scales are of major importance for water management and meteorological forecasting of extreme events. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to the limited global coverage of in situ measurements. Remote sensing techniques can help to overcome the lack of ground data. However, evaporation is not directly observable from satellite systems. As a result, recent efforts have focussed on combining the observable drivers of evaporation within process-based models. The Global Land Evaporation Amsterdam Model (GLEAM, www.gleam.eu) estimates terrestrial evaporation based on daily satellite observations of meteorological drivers of terrestrial evaporation, vegetation characteristics and soil moisture. Since the publication of the first version of the model in 2011, GLEAM has been widely applied for the study of trends in the water cycle, interactions between land and atmosphere and hydrometeorological extreme events. A third version of the GLEAM global datasets will be available from the beginning of 2016 and will be distributed using www.gleam.eu as gateway. The updated datasets include separate estimates for the different components of the evaporative flux (i.e. transpiration, bare-soil evaporation, interception loss, open-water evaporation and snow sublimation), as well as variables like the evaporative stress, potential evaporation, root-zone soil moisture and surface soil moisture. A new dataset using SMOS-based input data of surface soil moisture and vegetation optical depth will also be

  14. Public Use Airport Runways, Geographic WGS84, BTS (2006) [public_use_airport_runway_BTS_2006

    Data.gov (United States)

    Louisiana Geographic Information Center — The Public Use Airport Runways database is a geographic dataset of runways in the United States and US territories containing information on the physical...

  15. Estuarine Shoreline and Barrier-Island Sandline Change Assessment Dataset

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Barrier Island and Estuarine Wetland Physical Change Assessment Dataset was created to calibrate and test probability models of barrier island sandline and...

  16. Original Vector Datasets for Hawaii StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — These datasets each consist of a folder containing a personal geodatabase of the NHD, and shapefiles used in the HydroDEM process. These files are provided as a...

  17. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  18. BASE MAP DATASET, RIO ARRIBA COUNTY, NEW MEXICO, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control, governmental unit,...

  19. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  20. BASE MAP DATASET,SOLANO COUNTY, CALIFORNIA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  1. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  2. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    This paper presents a new large scale dataset targeting evaluation of local shape descriptors and 3d object recognition algorithms. The dataset consists of point clouds and triangulated meshes from 292 physical scenes taken from 11 different views; a total of approximately 3204 views. Each...... geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  3. National Elevation Dataset (NED) of Rocky Mountain National Park

    Data.gov (United States)

    National Park Service, Department of the Interior — (USGS text) The U.S. Geological Survey has developed a National Elevation Dataset (NED). The NED is a seamless mosaic of best-available elevation data. The...

  4. Native Prairie Adaptive Management (NPAM) Monitoring Tabular Datasets

    Data.gov (United States)

    US Fish and Wildlife Service, Department of the Interior — Four core tabular datasets are collected annually for the NPAM Project. The first is tbl_PlantGoups_Monitoring which includes the belt transect monitoring data for...

  5. Estuarine Shoreline and Barrier-Island Sandline Change Assessment Dataset

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Barrier Island and Estuarine Wetland Physical Change Assessment Dataset was created to calibrate and test probability models of barrier island sandline and...

  6. Food recognition: a new dataset, experiments and results.

    Science.gov (United States)

    Ciocca, Gianluigi; Napoletano, Paolo; Schettini, Raimondo

    2016-12-07

    We propose a new dataset for the evaluation of food recognition algorithms that can be used in dietary monitoring applications. Each image depicts a real canteen tray with dishes and foods arranged in different ways. Each tray contains multiple instances of food classes. The dataset contains 1,027 canteen trays for a total of 3,616 food instances belonging to 73 food classes. The foods on the tray images have been manually segmented using carefully drawn polygonal boundaries. We have benchmarked the dataset by designing an automatic tray analysis pipeline that takes a tray image as input, finds the regions of interest, and predicts for each region the corresponding food class. We have experimented with three different classification strategies, also using several visual descriptors. We achieve about 79% food and tray recognition accuracy using Convolutional-Neural-Network-based features. The dataset, as well as the benchmark framework, are available to the research community.

  7. Karna Particle Size Dataset for Tables and Figures

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  8. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  9. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  10. Comparative visualization for parameter studies of dataset series.

    Science.gov (United States)

    Malik, Muhammad Muddassir; Heinzl, Christoph; Gröller, M. Eduard

    2010-01-01

    This paper proposes comparison and visualization techniques to carry out parameter studies for the special application area of dimensional measurement using 3D X-ray computed tomography (3DCT). A dataset series is generated by scanning a specimen multiple times by varying parameters of an industrial 3DCT device. A high-resolution series is explored using our planar-reformatting-based visualization system. We present a novel multi-image view and an edge explorer for comparing and visualizing gray values and edges of several datasets simultaneously. Visualization results and quantitative data are displayed side by side. Our technique is scalable and generic. It can be effective in various application areas like parameter studies of imaging modalities and dataset artifact detection. For fast data retrieval and convenient usability, we use bricking of the datasets and efficient data structures. We evaluate the applicability of the proposed techniques in collaboration with our company partners.

  11. Public Speech.

    Science.gov (United States)

    Green, Thomas F.

    1994-01-01

    Discusses the importance of public speech in society, noting the power of public speech to create a world and a public. The paper offers a theory of public speech, identifies types of public speech, and types of public speech fallacies. Two ways of speaking of the public and of public life are distinguished. (SM)

  12. Interacting with Large 3D Datasets on a Mobile Device.

    Science.gov (United States)

    Schultz, Chris; Bailey, Mike

    2016-01-01

    A detail-on-demand scheme can alleviate both memory and GPU pressure on mobile devices caused by volume rendering. This approach allows a user to explore an entire dataset at its native resolution while simultaneously constraining the texture size being rendered to a dimension that does not exceed the processing capabilities of a portable device. This scheme produces higher-quality, more focused images rendered at interactive frame rates, while preserving the native resolution of the dataset.

  13. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

    OpenAIRE

    Levin, Barnaby D.A.; Padgett, Elliot; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D.; Robinson, Richard D.

    2016-01-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the cu...

  14. Sampling Within k-Means Algorithm to Cluster Large Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Bejarano, Jeremy [Brigham Young University; Bose, Koushiki [Brown University; Brannan, Tyler [North Carolina State University; Thomas, Anita [Illinois Institute of Technology; Adragni, Kofi [University of Maryland; Neerchal, Nagaraj [University of Maryland; Ostrouchov, George [ORNL

    2011-08-01

    Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study both on more varied test datasets as well as on real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. Also, these datasets should analyze the performance of the algorithm on varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes. We could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimension, as the data becomes more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
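The sampling strategy this abstract describes can be sketched in a few lines: run standard Lloyd iterations on a random sample, then make one full pass to assign every point. This is a generic illustration, not the authors' implementation; the sample fraction, initialization and convergence test are arbitrary choices.

```python
import numpy as np

def sampled_kmeans(X, k, sample_frac=0.1, n_iter=100, seed=0):
    """Cluster a large dataset by running k-means on a random sample,
    then assigning every point to the nearest learned centroid."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=max(k, int(n * sample_frac)), replace=False)
    sample = X[idx]

    # Standard Lloyd iterations, but only on the sample.
    centroids = sample[rng.choice(sample.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(sample[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([sample[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new

    # A single full pass assigns the entire dataset; the expensive
    # iterative part above scales with the sample size, not with n.
    full_d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, full_d.argmin(axis=1)
```

The trade-off studied in the paper shows up directly here: a smaller `sample_frac` shrinks the iterative cost but risks less accurate centroids.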

  15. A survey of results on mobile phone datasets analysis

    CERN Document Server

    Blondel, Vincent D; Krings, Gautier

    2015-01-01

    In this paper, we review some advances made recently in the study of mobile phone datasets. This area of research has emerged a decade ago, with the increasing availability of large-scale anonymized datasets, and has grown into a stand-alone topic. We will survey the contributions made so far on the social networks that can be constructed with such data, the study of personal mobility, geographical partitioning, urban planning, and help towards development as well as security and privacy issues.

  16. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

    This document describes a general purpose multimedia data-set to be used in cross-media machine learning problems. In more detail we describe the genre taxonomy applied at http://www.garageband.com, from where the data-set was collected, and how the taxonomy has been fused into a more human-understandable taxonomy. Finally, a description of various features extracted from both the audio and text is presented.

  17. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications in medical data have several technical challenges: complex and heterogeneous datasets, noisy medical datasets, and explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  18. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    Science.gov (United States)

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated based on prior domain knowledge as well as on a number of external inputs, and producing a range of outputs including datasets, studies and peer-reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment on other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include, for example, a private conversation following a lecture or during a social dinner, or an opinion expressed concerning some significant event such as an earthquake or a satellite failure. In addition, public sources of grey literature are important, such as informal papers (e.g. arXiv deposits), reports and studies. The climate science community is no exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets. This is to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary or C-metadata. We link C-metadata with target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata, including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals.
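The Linked Data approach via the Open Annotation model mentioned in this abstract can be illustrated with a minimal annotation record: a body (the user comment) linked to a target (the climate dataset). All identifiers and URLs below are invented for illustration, and the field names follow the Open Annotation vocabulary, not the CHARMe implementation.

```python
import json

# A minimal C-metadata record in the spirit of the Open Annotation
# model. The comment is the annotation body; the dataset is its target.
# Every identifier here is hypothetical.
annotation = {
    "@context": "http://www.w3.org/ns/oa.jsonld",
    "@type": "oa:Annotation",
    "oa:hasBody": {
        "@type": "cnt:ContentAsText",
        "cnt:chars": "Known dry bias over the Sahel before 1995.",
    },
    "oa:hasTarget": {"@id": "http://example.org/datasets/precip-v2"},
    "oa:annotatedBy": {"@id": "http://example.org/users/jdoe"},
}

print(json.dumps(annotation, indent=2))
```

Because the record is plain JSON-LD, it can be stored, queried and linked to further resources (citations, provenance chains) without changes to the target dataset itself.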

  19. A curated compendium of monocyte transcriptome datasets of relevance to human monocyte immunobiology research [version 2; referees: 2 approved]

    Directory of Open Access Journals (Sweden)

    Darawan Rinchai

    2016-04-01

    Systems-scale profiling approaches have become widely used in translational research settings. The resulting accumulation of large-scale datasets in public repositories represents a critical opportunity to promote insight and foster knowledge discovery. However, resources that can serve as an interface between biomedical researchers and such vast and heterogeneous dataset collections are needed in order to fulfill this potential. Recently, we have developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB). This tool can be used to overlay deep molecular phenotyping data with rich contextual information about analytes, samples and studies along with ancillary clinical or immunological profiling data. In this note, we describe a curated compendium of 93 public datasets generated in the context of human monocyte immunological studies, representing a total of 4,516 transcriptome profiles. Datasets were uploaded to an instance of GXB along with study descriptions and sample annotations. Study samples were arranged in different groups. Ranked gene lists were generated based on relevant group comparisons. This resource is publicly available online at http://monocyte.gxbsidra.org/dm3/landing.gsp.

  20. Dataset on species incidence, species richness and forest characteristics in a Danish protected area

    Directory of Open Access Journals (Sweden)

    Adriano Mazziotta

    2016-12-01

    The data presented in this article are related to the research article entitled “Restoring hydrology and old-growth structures in a former production forest: Modelling the long-term effects on biodiversity” (A. Mazziotta, J. Heilmann-Clausen, H. H. Bruun, Ö. Fritz, E. Aude, A. P. Tøttrup [1]). This article describes how the changes induced by restoration actions in forest hydrology and structure alter the biodiversity value of a Danish forest reserve. The field dataset is made publicly available to enable critical or extended analyses.

  1. A High-Resolution, Wave and Current Resource Assessment of Japan: The Web GIS Dataset

    CERN Document Server

    Webb, Adrean; Fujimoto, Wataru; Horiuchi, Kazutoshi; Kiyomatsu, Keiji; Matsuda, Kazuhiro; Miyazawa, Yasumasa; Varlamov, Sergey; Yoshikawa, Jun

    2016-01-01

    The University of Tokyo and JAMSTEC have conducted state-of-the-art wave and current resource assessments to assist with generator site identification and construction in Japan. These assessments are publicly-available and accessible via a web GIS service designed by WebBrain that utilizes TDS and GeoServer software with Leaflet libraries. The web GIS dataset contains statistical analyses of wave power, ocean and tidal current power, ocean temperature power, and other basic physical variables. The data (2D maps, time charts, depth profiles, etc.) is accessed through interactive browser sessions and downloadable files.

  2. A multilayer network dataset of interaction and influence spreading in a virtual world.

    Science.gov (United States)

    Jankowski, Jarosław; Michalski, Radosław; Bródka, Piotr

    2017-10-10

    Presented data contains the record of five spreading campaigns that occurred in a virtual world platform. Users distributed avatars between each other during the campaigns. The processes varied in time and range and were either incentivized or not incentivized. Campaign data is accompanied by events. The data can be used to build a multilayer network to place the campaigns in a wider context. To the best of the authors' knowledge, this is the first publicly available dataset containing a complete real multilayer social network together with five complete spreading processes in it.
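A multilayer network of the kind this record describes can be represented simply as one adjacency structure per layer over a shared node set. The sketch below is generic; the layer names and event records are hypothetical, not the published schema.

```python
from collections import defaultdict

def build_multilayer(events):
    """Build an undirected multilayer network from interaction records.

    `events` is an iterable of (layer, src, dst) tuples. The result maps
    each layer name to its own adjacency dict, while node identifiers
    are shared across layers (the defining property of a multilayer
    social network).
    """
    net = defaultdict(lambda: defaultdict(set))
    for layer, src, dst in events:
        net[layer][src].add(dst)
        net[layer][dst].add(src)
    return net
```

With such a structure, a spreading process recorded on one layer (e.g. avatar distribution) can be analysed against a user's ties on the other layers.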

  3. Distributed solar photovoltaic array location and extent dataset for remote sensing object identification

    Science.gov (United States)

    Bradbury, Kyle; Saboo, Raghav; L. Johnson, Timothy; Malof, Jordan M.; Devarajan, Arjun; Zhang, Wuming; M. Collins, Leslie; G. Newell, Richard

    2016-12-01

    Earth-observing remote sensing data, including aerial photography and satellite imagery, offer a snapshot of the world from which we can learn about the state of natural resources and the built environment. The components of energy systems that are visible from above can be automatically assessed with these remote sensing data when processed with machine learning methods. Here, we focus on the information gap in distributed solar photovoltaic (PV) arrays, of which there is limited public data on solar PV deployments at small geographic scales. We created a dataset of solar PV arrays to initiate and develop the process of automatically identifying solar PV locations using remote sensing imagery. This dataset contains the geospatial coordinates and border vertices for over 19,000 solar panels across 601 high-resolution images from four cities in California. Dataset applications include training object detection and other machine learning algorithms that use remote sensing imagery, developing specific algorithms for predictive detection of distributed PV systems, estimating installed PV capacity, and analysis of the socioeconomic correlates of PV deployment.

  4. Nine martian years of dust optical depth observations: A reference dataset

    Science.gov (United States)

    Montabone, Luca; Forget, Francois; Kleinboehl, Armin; Kass, David; Wilson, R. John; Millour, Ehouarn; Smith, Michael; Lewis, Stephen; Cantor, Bruce; Lemmon, Mark; Wolff, Michael

    2016-07-01

    We present a multi-annual reference dataset of the horizontal distribution of airborne dust from martian year 24 to 32 using observations of the martian atmosphere from April 1999 to June 2015 made by the Thermal Emission Spectrometer (TES) aboard Mars Global Surveyor, the Thermal Emission Imaging System (THEMIS) aboard Mars Odyssey, and the Mars Climate Sounder (MCS) aboard Mars Reconnaissance Orbiter (MRO). Our methodology to build the dataset works by gridding the available retrievals of column dust optical depth (CDOD) from TES and THEMIS nadir observations, as well as the estimates of this quantity from MCS limb observations. The resulting (irregularly) gridded maps (one per sol) were validated with independent observations of CDOD by PanCam cameras and Mini-TES spectrometers aboard the Mars Exploration Rovers "Spirit" and "Opportunity", by the Surface Stereo Imager aboard the Phoenix lander, and by the Compact Reconnaissance Imaging Spectrometer for Mars aboard MRO. Finally, regular maps of CDOD are produced by spatially interpolating the irregularly gridded maps using a kriging method. These latter maps are used as dust scenarios in the Mars Climate Database (MCD) version 5, and are useful in many modelling applications. The two datasets (daily irregularly gridded maps and regularly kriged maps) for the nine available martian years are publicly available as NetCDF files and can be downloaded from the MCD website at the URL: http://www-mars.lmd.jussieu.fr/mars/dust_climatology/index.html
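Reduced to its simplest form, the gridding step described above amounts to averaging irregular retrievals into regular longitude-latitude cells. This is a toy sketch only: the actual product uses weighted gridding of per-sol retrievals followed by kriging, and the grid resolution chosen here is arbitrary.

```python
import numpy as np

def grid_cdod(lon, lat, tau, nlon=72, nlat=36):
    """Bin irregular column dust optical depth (CDOD) retrievals onto a
    regular lon-lat grid by averaging all retrievals in each cell.
    Cells with no retrievals are left as NaN (to be filled later,
    e.g. by kriging)."""
    gi = np.clip(((lon + 180.0) / 360.0 * nlon).astype(int), 0, nlon - 1)
    gj = np.clip(((lat + 90.0) / 180.0 * nlat).astype(int), 0, nlat - 1)
    total = np.zeros((nlat, nlon))
    count = np.zeros((nlat, nlon))
    np.add.at(total, (gj, gi), tau)   # accumulate retrievals per cell
    np.add.at(count, (gj, gi), 1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(count > 0, total / count, np.nan)
```

The NaN cells make explicit why a second, interpolating pass (the kriging step in the abstract) is needed to produce regular, gap-free maps.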

  5. Se-SAD serial femtosecond crystallography datasets from selenobiotinyl-streptavidin

    Science.gov (United States)

    Yoon, Chun Hong; Demirci, Hasan; Sierra, Raymond G.; Dao, E. Han; Ahmadi, Radman; Aksit, Fulya; Aquila, Andrew L.; Batyuk, Alexander; Ciftci, Halilibrahim; Guillet, Serge; Hayes, Matt J.; Hayes, Brandon; Lane, Thomas J.; Liang, Meng; Lundström, Ulf; Koglin, Jason E.; Mgbam, Paul; Rao, Yashas; Rendahl, Theodore; Rodriguez, Evan; Zhang, Lindsey; Wakatsuki, Soichi; Boutet, Sébastien; Holton, James M.; Hunter, Mark S.

    2017-04-01

    We provide a detailed description of selenobiotinyl-streptavidin (Se-B SA) co-crystal datasets recorded using the Coherent X-ray Imaging (CXI) instrument at the Linac Coherent Light Source (LCLS) for selenium single-wavelength anomalous diffraction (Se-SAD) structure determination. Se-B SA was chosen as the model system for its high affinity between biotin and streptavidin where the sulfur atom in the biotin molecule (C10H16N2O3S) is substituted with selenium. The dataset was collected at three different transmissions (100, 50, and 10%) using a serial sample chamber setup which allows for two sample chambers, a front chamber and a back chamber, to operate simultaneously. Diffraction patterns from Se-B SA were recorded to a resolution of 1.9 Å. The dataset is publicly available through the Coherent X-ray Imaging Data Bank (CXIDB) and also on LCLS compute nodes as a resource for research and algorithm development.

  6. Distributed solar photovoltaic array location and extent dataset for remote sensing object identification.

    Science.gov (United States)

    Bradbury, Kyle; Saboo, Raghav; L Johnson, Timothy; Malof, Jordan M; Devarajan, Arjun; Zhang, Wuming; M Collins, Leslie; G Newell, Richard

    2016-12-06

    Earth-observing remote sensing data, including aerial photography and satellite imagery, offer a snapshot of the world from which we can learn about the state of natural resources and the built environment. The components of energy systems that are visible from above can be automatically assessed with these remote sensing data when processed with machine learning methods. Here, we focus on the information gap in distributed solar photovoltaic (PV) arrays, of which there is limited public data on solar PV deployments at small geographic scales. We created a dataset of solar PV arrays to initiate and develop the process of automatically identifying solar PV locations using remote sensing imagery. This dataset contains the geospatial coordinates and border vertices for over 19,000 solar panels across 601 high-resolution images from four cities in California. Dataset applications include training object detection and other machine learning algorithms that use remote sensing imagery, developing specific algorithms for predictive detection of distributed PV systems, estimating installed PV capacity, and analysis of the socioeconomic correlates of PV deployment.

  7. The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

    Directory of Open Access Journals (Sweden)

    Pepper Stuart D

    2008-09-01

    Background: The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses. Results: A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) were generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, or different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to date (1,107, from six previously published studies). Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics. Conclusion: Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data can be directly integrated from different gene expression datasets, leading to new biological findings with increased statistical power.
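The batch mean-centering step this abstract describes can be sketched as follows. It is an illustrative reconstruction, not the authors' code: the log2 transform (which turns multiplicative bias into an additive offset) and the synthetic matrices are assumptions.

```python
import numpy as np

def batch_mean_center(batches):
    """Remove multiplicative, dataset-specific bias before merging.

    `batches` is a list of (samples x genes) arrays of raw intensities.
    Log-transforming converts a multiplicative per-dataset bias into an
    additive offset, which per-gene mean-centering within each batch
    then removes; the centered batches can be stacked for meta-analysis.
    """
    centered = []
    for X in batches:
        logX = np.log2(X)
        centered.append(logX - logX.mean(axis=0, keepdims=True))
    return np.vstack(centered)
```

If one batch is a uniformly scaled copy of another (pure multiplicative bias), their centered values coincide exactly, which is the property the meta-analysis relies on.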

  8. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global delayed-mode validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the ARGO DAC, but profiles are also extracted from the World Ocean Database, along with TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional Operational Observing Systems (ROOS), operated by European institutes not managed by National Data Centres, and other profile datasets provided by scientific sources can also be found (sea mammal profiles from MEOP, XBT datasets from cruises, ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until a year before the current year. The product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Despite the fact that the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. A new study started in 2016 that aims to compare both validation procedures, to move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  9. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  10. Determining the Real Data Completeness of a Relational Dataset

    Institute of Scientific and Technical Information of China (English)

    Yong-Nan Liu; Jian-Zhong Li; Zhao-Nian Zou

    2016-01-01

    Low quality of data is a serious problem in the new era of big data; it can severely reduce the usability of data, mislead or bias querying, analysis and mining, and lead to huge losses. Incomplete data is common in low-quality data, and it is necessary to determine the data completeness of a dataset to provide hints for follow-up operations on it. Little existing work focuses on the completeness of a dataset, and such work views all missing values as unknown values. In this paper, we study how to determine the real data completeness of a relational dataset. By taking advantage of given functional dependencies, we aim to determine some missing attribute values from other tuples and capture the truly missing attribute cells. We propose a data completeness model, formalize the problem of determining the real data completeness of a relational dataset, and give a lower bound on the time complexity of this problem. Two optimal algorithms to determine the data completeness of a dataset for different cases are proposed. We empirically show the effectiveness and the scalability of our algorithms on both real-world data and synthetic data.
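The core idea of recovering missing values through functional dependencies can be sketched with a toy example. The relation, the FD zip → city, and the single-FD completeness metric below are assumptions for illustration, not the authors' model or algorithms.

```python
def fd_completeness(rows, fd):
    """Fill missing values determined by a functional dependency, then
    report the fraction of non-missing cells.

    `rows` is a list of dicts with None marking missing cells; `fd` is
    a (lhs, rhs) pair meaning lhs -> rhs. A missing rhs value is not
    "really" missing if another tuple with the same lhs value supplies
    it. Note: fills `rows` in place.
    """
    lhs, rhs = fd
    known = {r[lhs]: r[rhs] for r in rows
             if r[lhs] is not None and r[rhs] is not None}
    for r in rows:
        if r[rhs] is None and r[lhs] in known:
            r[rhs] = known[r[lhs]]  # determined by the FD, not missing
    cells = [v for r in rows for v in r.values()]
    return sum(v is not None for v in cells) / len(cells)
```

The real completeness is therefore higher than a naive null count suggests: only cells that no FD can determine remain truly missing.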

  11. GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction.

    Science.gov (United States)

    Browne, Fiona; Wang, Haiying; Zheng, Huiru; Azuaje, Francisco

    2009-01-26

    Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteomic-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases) and non-interacting proteins (negative cases) are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non-interacting proteins is usually stored within databases. Extraction of these data can be both complex and time consuming. Although the automatic construction of reference datasets for classification would be a useful resource for researchers, no public resource currently exists to perform this task. GRIP (Gold Reference dataset constructor from Information on Protein complexes) is a web-based system that provides researchers with the functionality to create reference datasets for protein-protein interaction prediction in Saccharomyces cerevisiae. Both positive and negative cases for a reference dataset can be extracted, organised and downloaded by the user. GRIP also provides an upload facility whereby users can submit proteins to determine protein complex membership. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae. GRIP retrieves information on protein complexes, cellular localisation, and physical and genetic interactions in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time-consuming process requiring programming knowledge. GRIP simplifies and speeds up this process by allowing users to automatically construct reference datasets. GRIP is free to access at http://rosalind.infj.ulst.ac.uk/GRIP/.

  12. GRIP: A web-based system for constructing Gold Standard datasets for protein-protein interaction prediction

    Directory of Open Access Journals (Sweden)

    Zheng Huiru

    2009-01-01

    Full Text Available Abstract Background Information about protein interaction networks is fundamental to understanding protein function and cellular processes. Interaction patterns among proteins can suggest new drug targets and aid in the design of new therapeutic interventions. Efforts have been made to map interactions on a proteomic-wide scale using both experimental and computational techniques. Reference datasets that contain known interacting proteins (positive cases) and non-interacting proteins (negative cases) are essential to support computational prediction and validation of protein-protein interactions. Information on known interacting and non-interacting proteins is usually stored within databases. Extraction of these data can be both complex and time consuming. Although the automatic construction of reference datasets for classification would be a useful resource for researchers, no public resource currently exists to perform this task. Results GRIP (Gold Reference dataset constructor from Information on Protein complexes) is a web-based system that provides researchers with the functionality to create reference datasets for protein-protein interaction prediction in Saccharomyces cerevisiae. Both positive and negative cases for a reference dataset can be extracted, organised and downloaded by the user. GRIP also provides an upload facility whereby users can submit proteins to determine protein complex membership. A search facility is provided where a user can search for protein complex information in Saccharomyces cerevisiae. Conclusion GRIP retrieves information on protein complexes, cellular localisation, and physical and genetic interactions in Saccharomyces cerevisiae. Manual construction of reference datasets can be a time-consuming process requiring programming knowledge. GRIP simplifies and speeds up this process by allowing users to automatically construct reference datasets. GRIP is free to access at http://rosalind.infj.ulst.ac.uk/GRIP/.

  13. Testing the AUDI2000 colour-difference formula for solid colours using some visual datasets with usefulness to automotive industry

    Science.gov (United States)

    Martínez-García, Juan; Melgosa, Manuel; Gómez-Robledo, Luis; Li, Changjun; Huang, Min; Liu, Haoxue; Cui, Guihua; Luo, M. Ronnier; Dauser, Thomas

    2013-11-01

    Colour-difference formulas are tools employed in colour industries for objective pass/fail decisions on manufactured products. These objective decisions are based on instrumental colour measurements, which must reliably predict the subjective colour-difference evaluations performed by panels of observers. In a previous paper we tested the performance of different colour-difference formulas using the datasets employed in the development of the last CIE-recommended colour-difference formula, CIEDE2000, and we found that the AUDI2000 colour-difference formula for solid (homogeneous) colours performed reasonably well, even though the colour pairs in these datasets were not similar to those typically employed in the automotive industry (CIE Publication x038:2013, 465-469). Here we have again tested AUDI2000, together with 11 advanced colour-difference formulas (CIELUV, CIELAB, CMC, BFD, CIE94, CIEDE2000, CAM02-UCS, CAM02-SCD, DIN99d, DIN99b, OSA-GP-Euclidean), on three visual datasets we consider particularly useful to the automotive industry for different reasons: 1) 828 metallic colour pairs used to develop the highly reliable RIT-DuPont dataset (Color Res. Appl. 35, 274-283, 2010); 2) printed samples comprising 893 colour pairs with threshold colour differences (J. Opt. Soc. Am. A 29, 883-891, 2012); 3) 150 colour pairs in a tolerance dataset proposed by AUDI. To measure the relative merits of the tested colour-difference formulas, we employed the STRESS index (J. Opt. Soc. Am. A 24, 1823-1829, 2007), assuming a 95% confidence level. For datasets 1) and 2), AUDI2000 was in the group of the best colour-difference formulas, with no significant differences with respect to the CIE94, CIEDE2000, CAM02-UCS, DIN99b and DIN99d formulas. For dataset 3), AUDI2000 provided the best results, being statistically significantly better than all the other tested colour-difference formulas.
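
    The STRESS index used above compares computed colour differences ΔE against visual differences ΔV; 0 means perfect agreement up to a scale factor. A minimal sketch following the commonly cited formulation (scale factor F1 = ΣΔE² / ΣΔEΔV), with made-up numbers, not data from the cited studies:

```python
import math

def stress(dE, dV):
    """STRESS between computed (dE) and visual (dV) colour differences."""
    F1 = sum(e * e for e in dE) / sum(e * v for e, v in zip(dE, dV))
    num = sum((e - F1 * v) ** 2 for e, v in zip(dE, dV))
    den = sum((F1 * v) ** 2 for v in dV)
    return 100.0 * math.sqrt(num / den)

# Predictions proportional to the visual data agree perfectly (STRESS = 0);
# deviations from proportionality raise the index.
visual = [0.5, 1.0, 1.5, 2.0]
print(stress([2 * v for v in visual], visual))  # -> 0.0
print(stress([1.1, 2.3, 2.8, 4.4], visual))     # small but nonzero
```

Lower STRESS means a formula better predicts the panel's judgements, which is how the formulas above are ranked.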

  14. Assessing Spatial and Attribute Errors of Input Data in Large National Datasets for use in Population Distribution Models

    Energy Technology Data Exchange (ETDEWEB)

    Patterson, Lauren A [ORNL; Urban, Marie L [ORNL; Myers, Aaron T [ORNL; Bhaduri, Budhendra L [ORNL; Bright, Eddie A [ORNL; Coleman, Phil R [ORNL

    2007-01-01

    Geospatial technologies and digital data have developed and been disseminated rapidly in conjunction with increasing computing performance and internet availability. The ability to store and transmit large datasets has encouraged the development of national datasets in geospatial format. National datasets are used by numerous agencies for analysis and modeling purposes because these datasets are standardized and are considered to be of acceptable accuracy. At Oak Ridge National Laboratory, a national population model incorporating multiple ancillary variables was developed, and one of its required inputs is a school database. This paper examines inaccuracies present within two national school datasets: TeleAtlas North America (TANA) and the National Center for Education Statistics (NCES). Schools are an important component of the population model because they serve as locations containing dense clusters of vulnerable populations. It is therefore essential to validate the quality of the school input data, which was made possible by the increasing national coverage of high-resolution imagery. Schools were also chosen because a 'real-world' representation of K-12 schools for the Philadelphia School District was produced, thereby enabling 'ground-truthing' of the national datasets. Analyses found that the national datasets were neither standardized nor complete, containing 76 to 90% of existing schools. Enrollment values were also temporally inaccurate: 89% did not match 2003 data. Spatial rectification was required for 87% of the NCES points, of which 58% of the errors were attributed to the geocoding process. Lastly, it was found that combining the two national datasets produced a more useful and accurate solution. Acknowledgment: Prepared by Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, Tennessee 37831-6285, managed by UT-Battelle, LLC for the U.S. Department of Energy under contract no

  15. Quantifying selective reporting and the Proteus phenomenon for multiple datasets with similar bias.

    Directory of Open Access Journals (Sweden)

    Thomas Pfeiffer

    Full Text Available Meta-analyses play an important role in synthesizing evidence from diverse studies and datasets that address similar questions. A major obstacle for meta-analyses arises from biases in reporting. In particular, it is speculated that findings which do not achieve formal statistical significance are less likely to be reported than statistically significant findings. Moreover, the patterns of bias can be complex and may also depend on the timing of the research results and their relationship with previously published work. In this paper, we present an approach that is specifically designed to analyze large-scale datasets on published results. Such datasets are currently emerging in diverse research fields, particularly in molecular medicine. We use our approach to investigate a dataset on Alzheimer's disease (AD) that covers 1167 results from case-control studies on 102 genetic markers. We observe that initial studies on a genetic marker tend to be substantially more biased than subsequent replications. The chances for initial, statistically non-significant results to be published are estimated to be about 44% (95% CI, 32% to 63%) relative to statistically significant results, while statistically non-significant replications have almost the same chance of being published as statistically significant replications (84%; 95% CI, 66% to 107%). Early replications tend to be biased against initial findings, an observation previously termed the Proteus phenomenon: the chances for non-significant studies going in the same direction as the initial result are estimated to be lower than the chances for non-significant studies opposing the initial result (73%; 95% CI, 55% to 96%). Such dynamic patterns in bias are difficult to capture by conventional methods, where typically simple publication bias is assumed to operate. Our approach captures and corrects for complex dynamic patterns of bias, and thereby helps generate conclusions from published results that are more robust.

  16. Benchmark three-dimensional eye-tracking dataset for visual saliency prediction on stereoscopic three-dimensional video

    Science.gov (United States)

    Banitalebi-Dehkordi, Amin; Nasiopoulos, Eleni; Pourazad, Mahsa T.; Nasiopoulos, Panos

    2016-01-01

    Visual attention models (VAMs) predict the location of image or video regions that are most likely to attract human attention. Although saliency detection is well explored for two-dimensional (2-D) image and video content, only a few attempts have been made to design three-dimensional (3-D) saliency prediction models. Newly proposed 3-D VAMs have to be validated over large-scale video saliency prediction datasets that also contain eye-tracking information. There are several publicly available eye-tracking datasets for 2-D image and video content. In the case of 3-D, however, the research community still needs large-scale video saliency datasets for validating different 3-D VAMs. We introduce a large-scale dataset containing eye-tracking data collected from 24 subjects who participated in a free-viewing test on 61 stereoscopic 3-D videos (and also the 2-D versions of those). We evaluate the performance of the existing saliency detection methods over the proposed dataset. In addition, we created an online benchmark for validating the performance of the existing 2-D and 3-D VAMs and facilitating the addition of new VAMs to the benchmark. Our benchmark currently contains 50 different VAMs.

  17. Modeling the latent dimensions of multivariate signaling datasets

    Science.gov (United States)

    Jensen, Karin J.; Janes, Kevin A.

    2012-08-01

    Cellular signal transduction is coordinated by modifications of many proteins within cells. Protein modifications are not independent, because some are connected through shared signaling cascades and others jointly converge upon common cellular functions. This coupling creates a hidden structure within a signaling network that can point to higher level organizing principles of interest to systems biology. One can identify important covariations within large-scale datasets by using mathematical models that extract latent dimensions—the key structural elements of a measurement set. In this paper, we introduce two principal component-based methods for identifying and interpreting latent dimensions. Principal component analysis provides a starting point for unbiased inspection of the major sources of variation within a dataset. Partial least-squares regression reorients these dimensions toward a specific hypothesis of interest. Both approaches have been used widely in studies of cell signaling, and they should be standard analytical tools once highly multivariate datasets become straightforward to accumulate.
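
    The latent-dimension idea can be sketched numerically: measurements that covary because of shared signaling cascades concentrate most of their variance in a few principal components. A minimal PCA-by-SVD example on synthetic data (the rank-2 structure and dimensions are invented for illustration, not the authors' datasets):

```python
import numpy as np

# Synthetic measurement matrix: 50 samples x 10 protein modifications,
# generated from two hidden "signaling axes" plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.1 * rng.normal(size=(50, 10))

Xc = X - X.mean(axis=0)                       # center each measurement
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)               # variance captured per component
scores = Xc @ Vt[:2].T                        # project onto top 2 latent dims

print(explained[:2].sum())                    # close to 1: 2 dims suffice
```

Partial least-squares regression follows the same projection idea but rotates the components toward covariance with a response of interest rather than maximal variance alone.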

  18. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy

    CERN Document Server

    Levin, Barnaby D A; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruna, Hector D; Robinson, Richard D; Ercius, Peter; Kourkoutis, Lena F; Miao, Jianwei; Muller, David A; Hovden, Robert

    2016-01-01

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of p...

  19. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Full Text Available Signaling via B cell receptors (BCR) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but the transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed, allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq; ChIP-seq of control (Input), RNA Pol II, H3K4me3, and H3K27me3; and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  20. Robust Machine Learning Applied to Terascale Astronomical Datasets

    CERN Document Server

    Ball, Nicholas M; Myers, Adam D

    2008-01-01

    We present recent results from the LCDM (Laboratory for Cosmological Data Mining; http://lcdm.astro.uiuc.edu) collaboration between UIUC Astronomy and NCSA to deploy supercomputing cluster resources and machine learning algorithms for the mining of terascale astronomical datasets. This is a novel application in the field of astronomy, because we are using such resources for data mining, and not just performing simulations. Via a modified implementation of the NCSA cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey, improved distance measures, and a full exploitation of the simple but powerful k-nearest neighbor algorithm. A driving principle of this work is that our methods should be extensible from current terascale datasets to upcoming petascale datasets and beyond. We discuss issues encountered to date, and further issues for the transition to petascale. In particular, disk I/O will become a major limit...
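
    The k-nearest neighbor algorithm mentioned above is simple enough to sketch in a few lines. This toy version (invented 2-D "colour-space" features and labels; the LCDM pipeline runs a parallelized version on terascale survey data) classifies a point by majority vote among its k closest training examples:

```python
import math
from collections import Counter

def knn_predict(train, labels, point, k=3):
    """Majority vote among the k nearest training points."""
    dists = sorted((math.dist(point, t), lab) for t, lab in zip(train, labels))
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

# Toy feature space separating two object classes.
train = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
         (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
labels = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]
print(knn_predict(train, labels, (0.12, 0.18)))  # -> star
print(knn_predict(train, labels, (0.88, 0.82)))  # -> galaxy
```

At petascale the brute-force distance scan above is replaced by spatial index structures and distributed computation, but the voting logic is unchanged.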

  1. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
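
    A merged analysis of this kind typically weights each input estimate inversely to its error variance, so the most trusted source dominates. The sketch below illustrates that general scheme with invented numbers; it is not necessarily GPCP's exact combination algorithm.

```python
def merge(estimates, error_variances):
    """Inverse-error-variance weighted combination of precipitation estimates."""
    weights = [1.0 / v for v in error_variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

# One grid cell, mm/day: microwave, infrared, and gauge estimates with
# illustrative error variances (gauge most trusted here).
merged = merge([3.2, 2.6, 3.0], [1.0, 2.0, 0.5])
print(round(merged, 3))  # -> 3.0
```

Carrying the per-field error estimates in the dataset, as described above, is what makes this kind of reweighting possible downstream.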

  2. ESTATE: Strategy for Exploring Labeled Spatial Datasets Using Association Analysis

    Science.gov (United States)

    Stepinski, Tomasz F.; Salazar, Josue; Ding, Wei; White, Denis

    We propose an association-analysis-based strategy for the exploration of multi-attribute spatial datasets possessing a naturally arising classification. The proposed strategy, ESTATE (Exploring Spatial daTa Association patTErns), inverts such a classification by interpreting the different classes found in the dataset in terms of sets of discriminative patterns of its attributes. It consists of several core steps including discriminative data mining, similarity between transactional patterns, and visualization. An algorithm for calculating a similarity measure between patterns is the major original contribution; it facilitates summarization of the discovered information and makes the entire framework practical for real-life applications. A detailed description of the ESTATE framework is followed by its application to the domain of ecology, using a dataset that fuses information on the geographical distribution of the biodiversity of bird species across the contiguous United States with the distributions of 32 environmental variables across the same area.

  3. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries, along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence intervals (high and low values) for the point estimates of the ρ's. Using the standard errors of the estimates of the ρ's, we obtain one-sigma intervals around the mean estimates of the EMP values. These values are also reported in the dataset.
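
    The construction described above combines the observed exchange-rate change with the change "prevented" by intervention, converted via ρ. A minimal sketch; the sign conventions and numbers here are illustrative assumptions, not the paper's exact specification:

```python
def emp(pct_change_exchange_rate, intervention_bn_usd, rho):
    """Counterfactual exchange-rate change (%) had the central bank not
    intervened: observed change plus rho (% per $1bn) times intervention."""
    return pct_change_exchange_rate + rho * intervention_bn_usd

# Example: the currency depreciated 1.2% despite $2bn of intervention that,
# at rho = 0.5 %/bn, would otherwise have added another 1.0% of depreciation.
print(emp(1.2, 2.0, 0.5))  # -> 2.2
```

The confidence intervals in the dataset come from propagating the standard errors of the estimated ρ's through this same linear expression.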

  4. Modelling and analysis of turbulent datasets using ARMA processes

    CERN Document Server

    Faranda, Davide; Dubrulle, Bérèngere; Daviaud, François; Saint-Michel, Brice; Herbert, Éric; Cortet, Pierre-Philippe

    2014-01-01

    We introduce a novel way to extract information from turbulent datasets by applying an ARMA statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new intermittency parameter Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both the Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that the intermittency parameter is highest in regions where shear layer vortices are present, t...
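
    The simplest member of the ARMA family is an AR(1) process, whose fitted coefficient already encodes the lag-1 correlation structure of a recorded time series. A minimal sketch on synthetic data using plain least squares (not the authors' estimator or their intermittency parameter):

```python
import numpy as np

# Simulate an AR(1) series x_t = phi * x_{t-1} + eps_t ...
rng = np.random.default_rng(42)
phi_true = 0.8
x = np.zeros(5000)
for t in range(1, len(x)):
    x[t] = phi_true * x[t - 1] + rng.normal()

# ... then recover phi as the lag-1 least-squares regression coefficient.
phi_hat = np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])
residuals = x[1:] - phi_hat * x[:-1]
print(round(phi_hat, 2))  # close to 0.8
```

Fitting higher-order ARMA(p, q) models to velocity records and comparing them against the Obukhov model is the step that yields the intermittency parameter Υ described above.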

  5. Georeferenced energy information system integrated of energetic matrix of Sao Paulo state from 2005 to 2035; Sistema de informacoes energeticas georreferenciadas integrado a matriz energetica do estado de Sao Paulo: 2005-2035

    Energy Technology Data Exchange (ETDEWEB)

    Alvares, Joao Malta [IX Consultoria e Representacoes Ltda, Itajuba, MG (Brazil); Universidade Federal de Itajuba (UNIFEI), MG (Brazil)

    2010-07-01

    A georeferenced energy information system, or simply SIEG, is designed to be integrated with the energy matrix of Sao Paulo state from 2005 to 2035. Commissioned by the state's Department of Sanitation and Energy, the system is intended to collect and aggregate information and data on several themes, relating this content to spatialized geographic locations. The main focus of the system is the analysis of the energy sector as a whole, from generation to final consumption, through all phases such as transmission and distribution. The energy data would also be cross-referenced with various supporting themes, contributing to the development of numerous analyses and generating sound conclusions. Issues such as environment, socio-economics, infrastructure, interconnected sectors, geographical conditions and other information could be entered, viewed and linked within the system. SIEG is also a facilitator for planning and managing the energy sector, with models forecasting possible future situations. (author)

  6. Public Values

    DEFF Research Database (Denmark)

    Beck Jørgensen, Torben; Rutgers, Mark R.

    2015-01-01

    This article provides the introduction to a symposium on contemporary public values research. It is argued that the contributions to this symposium represent a Public Values Perspective, distinct from other specific lines of research that also use public value as a core concept. Public administration is approached in terms of processes guided or restricted by public values and as public value creating: public management and public policy-making are both concerned with establishing, following and realizing public values. To study public values a broad perspective is needed. The article suggests a research agenda for this encompassing kind of public values research. Finally, the contributions to the symposium are introduced.

  7. A synthetic Longitudinal Study dataset for England and Wales.

    Science.gov (United States)

    Dennett, Adam; Norman, Paul; Shelton, Nicola; Stuchbury, Rachel

    2016-12-01

    This article describes the new synthetic England and Wales Longitudinal Study 'spine' dataset, designed for teaching and experimentation purposes. In the United Kingdom, there exist three Census-based longitudinal micro-datasets, known collectively as the Longitudinal Studies. The England and Wales Longitudinal Study (LS) is a 1% sample of the population of England and Wales (around 500,000 individuals), linking individual person records from the 1971 to 2011 Censuses. The synthetic dataset presented here contains a similar number of individuals to the original data and accurate longitudinal transitions between 2001 and 2011 for key demographic variables but, unlike the original data, is open access.

  8. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Directory of Open Access Journals (Sweden)

    Spjuth Ola

    2010-06-01

    Full Text Available Abstract Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and the re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies the setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates the addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join

  9. An ensemble approach for feature selection of Cyber Attack Dataset

    CERN Document Server

    Singh, Shailendra

    2009-01-01

    Feature selection is an indispensable preprocessing step when mining huge datasets, and it can significantly improve overall system performance. In this paper we therefore focus on a hybrid approach to feature selection. This method falls into two phases. The filter phase selects the features with the highest information gain and guides the initialization of the search process for the wrapper phase, whose output is the final feature subset. The final feature subsets are passed through a K-nearest neighbor classifier for the classification of attacks. The effectiveness of this algorithm is demonstrated on the DARPA KDDCUP99 cyber attack dataset.
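
    The filter phase's information gain is the reduction in label entropy achieved by splitting on a feature. A minimal sketch for discrete features, on an invented toy set of traffic records (not the KDDCUP99 schema):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature), the filter-phase ranking score."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

# Toy records: protocol perfectly predicts the label, port does not.
labels   = ["attack", "attack", "normal", "normal"]
protocol = ["icmp",   "icmp",   "tcp",    "tcp"]
port     = [80,       443,      80,       443]
print(information_gain(protocol, labels))  # -> 1.0 (perfect split)
print(information_gain(port, labels))      # -> 0.0 (uninformative)
```

Ranking features by this score and keeping the top ones seeds the wrapper phase, which then searches subsets using classifier accuracy directly.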

  10. ArcHydro 8-digit HUC datasets for Hawaii StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — These datasets each consist of a workspace (folder) containing a collection of gridded datasets plus a personal geodatabase containing several vector datasets. These...

  11. ArcHydro 8-digit HUC datasets for Idaho StreamStats

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — These datasets consist of a workspace (folder) containing a collection of gridded datasets plus a personal geodatabase containing several vector datasets. These...

  12. Using kittens to unlock photo-sharing website datasets for environmental applications

    Science.gov (United States)

    Gascoin, Simon

    2016-04-01

    Mining photo-sharing websites is a promising approach to complement in situ and satellite observations of the environment; however, a challenge is to deal with the large degree of noise inherent in online social datasets. Here I explored the value of the Flickr image hosting website database for monitoring the snow cover in the Pyrenees. Using the Flickr application programming interface (API), I queried the metadata of all public images tagged with at least one of the following words: "snow", "neige", "nieve", "neu" (snow in English, French, Spanish and Catalan). The search was limited to geo-tagged pictures taken in the Pyrenees area. However, the number of public pictures available in the Flickr database for a given time interval depends on several factors, including the popularity of the Flickr website and the development of digital photography. Thus, I also searched for all Flickr images tagged with "chat", "gat" or "gato" (cat in French, Catalan and Spanish). The tag "cat" was not considered, in order to exclude results from North America, where Flickr became popular earlier than in Europe. The number of "cat" images per month was used to fit a model of the number of images uploaded to Flickr over time. This model was used to remove the corresponding trend from the numbers of snow-tagged photographs. The resulting time series was compared to a time series of the snow cover area derived from the MODIS satellite over the same region. Both datasets are well correlated; in particular, they exhibit the same seasonal evolution, although the inter-annual variabilities are less similar. I will also discuss which other factors may explain the main discrepancies in order to further decrease the noise in the Flickr dataset.
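
    The detrending idea can be sketched in its simplest form: dividing monthly counts of snow-tagged photos by counts of cat-tagged photos cancels the overall growth of Flickr usage and leaves the seasonal snow signal. All numbers below are invented for illustration (the study fits a model to the cat counts rather than taking a raw ratio):

```python
# Monthly photo counts over two winters while site traffic doubles.
snow_photos = [30, 5, 2, 10, 60, 10, 4, 20]
cat_photos  = [120, 140, 160, 180, 240, 280, 320, 360]

# Normalized snow index: snow counts per cat-tagged photo.
snow_index = [s / c for s, c in zip(snow_photos, cat_photos)]
print(snow_index[0], snow_index[4])  # both 0.25: equal snow signal despite doubled traffic
```

In the raw counts the second winter looks twice as snowy (60 vs 30 photos); after normalization the two winter peaks match, which is the property that makes the series comparable to the MODIS snow cover area.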

  13. Free internet datasets for streamflow modelling using SWAT in the Johor river basin, Malaysia

    Science.gov (United States)

    Tan, M. L.

    2014-02-01

    Streamflow modelling is a mathematical computational approach that digitally represents the terrestrial hydrological cycle and is used for water resources assessment. However, such modelling endeavours require a large amount of data. Generally, governmental departments produce and maintain these datasets, which can make the data difficult to obtain due to bureaucratic constraints. In some countries, the availability and quality of geospatial and climate datasets remain a critical issue due to many factors, such as a lack of ground stations, expertise, technology or financial support, or wartime conditions. To overcome this problem, this research used public domain datasets from the Internet as input to a streamflow model, with the intention of simulating the daily and monthly streamflow of the Johor River Basin in Malaysia. The model used is the Soil and Water Assessment Tool (SWAT). As inputs, freely available data were used, including a digital elevation model (DEM), land use information, and soil and climate data. The model was validated against in-situ streamflow information obtained from the Rantau Panjang station for the year 2006. The coefficient of determination and Nash-Sutcliffe efficiency were 0.35 and 0.02, respectively, for daily simulated streamflow, and 0.92 and 0.21 for monthly simulated streamflow. The results show that, in a tropical region, free data can provide a better simulation at a monthly scale than at a daily scale. A sensitivity analysis and calibration procedure should be conducted in order to maximize the goodness-of-fit between simulated and observed streamflow. The application of Internet datasets promises an acceptable performance of streamflow modelling. This research demonstrates that public domain data are suitable for streamflow modelling in a tropical river basin within acceptable accuracy.
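
    The Nash-Sutcliffe efficiency used for the validation above has a standard definition: 1 is a perfect match, and 0 means the model is no better than predicting the observed mean. A minimal implementation with made-up flow values (not the Rantau Panjang record):

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance-around-observed-mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

obs = [12.0, 18.0, 30.0, 22.0, 15.0]   # observed discharge, e.g. m3/s
sim = [11.0, 20.0, 27.0, 23.0, 14.0]   # simulated discharge
print(round(nse(obs, sim), 3))  # -> 0.918
```

Because the denominator is the observed variance, a monthly aggregation that smooths daily spikes typically raises NSE, consistent with the monthly-versus-daily contrast reported above.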

  14. Public Safety Transmitter Towers, Public safety towers controlled by the FCC and combined with all other types of towers., Published in 2006, Johnson County AIMS.

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — This Public Safety Transmitter Towers dataset, was produced all or in part from Other information as of 2006. It is described as 'Public safety towers controlled by...

  15. Creating a data exchange strategy for radiotherapy research: Towards federated databases and anonymised public datasets

    NARCIS (Netherlands)

    Skripcak, T.; Belka, C.; Bosch, W.; Brink, C. Van den; Brunner, T.; Budach, V.; Büttner, D.; Debus, J.; Dekker, A.; Grau, C.; Gulliford, S.; Hurkmans, C.; Just, U.; Krause, M.; Lambin, P.; Langendijk, J.A.; Lewensohn, R.; Luhr, A.; Maingon, P.; Masucci, M.; Niyazi, M.; Poortmans, P.M.P.; Simon, M.; Schmidberger, H.; Spezi, E.; Stuschke, M.; Valentini, V.; Verheij, M.; Whitfield, G.; Zackrisson, B.; Zips, D.; Baumann, M.

    2014-01-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research

  16. Creating a data exchange strategy for radiotherapy research : Towards federated databases and anonymised public datasets

    NARCIS (Netherlands)

    Skripcak, Tomas; Belka, Claus; Bosch, Walter; Brink, Carsten; Brunner, Thomas; Budach, Volker; Buettner, Daniel; Debus, Juergen; Dekker, Andre; Grau, Cai; Gulliford, Sarah; Hurkmans, Coen; Just, Uwe; Krause, Mechthild; Lambin, Philippe; Langendijk, Johannes A.; Lewensohn, Rolf; Luehr, Armin; Maingon, Philippe; Masucci, Michele; Niyazi, Maximilian; Poortmans, Philip; Simon, Monique; Schmidberger, Heinz; Spezi, Emiliano; Stuschke, Martin; Valentini, Vincenzo; Verheij, Marcel; Whitfield, Gillian; Zackrisson, Bjoern; Zips, Daniel; Baumann, Michael

    2014-01-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research

  19. Creating a data exchange strategy for radiotherapy research: towards federated databases and anonymised public datasets.

    Science.gov (United States)

    Skripcak, Tomas; Belka, Claus; Bosch, Walter; Brink, Carsten; Brunner, Thomas; Budach, Volker; Büttner, Daniel; Debus, Jürgen; Dekker, Andre; Grau, Cai; Gulliford, Sarah; Hurkmans, Coen; Just, Uwe; Krause, Mechthild; Lambin, Philippe; Langendijk, Johannes A; Lewensohn, Rolf; Lühr, Armin; Maingon, Philippe; Masucci, Michele; Niyazi, Maximilian; Poortmans, Philip; Simon, Monique; Schmidberger, Heinz; Spezi, Emiliano; Stuschke, Martin; Valentini, Vincenzo; Verheij, Marcel; Whitfield, Gillian; Zackrisson, Björn; Zips, Daniel; Baumann, Michael

    2014-12-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research in radiation therapy and oncology. The exchange of study data is one of the fundamental principles behind data aggregation and data mining. The possibilities of reproducing the original study results, performing further analyses on existing research data to generate new hypotheses or developing computational models to support medical decisions (e.g. risk/benefit analysis of treatment options) represent just a fraction of the potential benefits of medical data-pooling. Distributed machine learning and knowledge exchange from federated databases can be considered one among other attractive approaches for knowledge generation within "Big Data". Data interoperability between research institutions should be the major concern behind a wider collaboration. Information captured in electronic patient records (EPRs) and study case report forms (eCRFs), linked together with medical imaging and treatment planning data, are deemed to be fundamental elements for large multi-centre studies in the field of radiation therapy and oncology. To fully utilise the captured medical information, the study data have to be more than just an electronic version of a traditional (un-modifiable) paper CRF. Challenges that have to be addressed are data interoperability, utilisation of standards, data quality and privacy concerns, data ownership, rights to publish, data pooling architecture and storage. This paper discusses a framework for conceptual packages of ideas focused on a strategic development for international research data exchange in the field of radiation therapy and oncology.

  20. Creating a data exchange strategy for radiotherapy research: Towards federated databases and anonymised public datasets

    DEFF Research Database (Denmark)

    Skripcak, Tomas; Belka, Claus; Bosch, Walter

    2014-01-01

    Disconnected cancer research data management and lack of information exchange about planned and ongoing research are complicating the utilisation of internationally collected medical information for improving cancer patient care. Rapidly collecting/pooling data can accelerate translational research ... for knowledge generation within "Big Data". Data interoperability between research institutions should be the major concern behind a wider collaboration. Information captured in electronic patient records (EPRs) and study case report forms (eCRFs), linked together with medical imaging and treatment planning ... to be addressed are data interoperability, utilisation of standards, data quality and privacy concerns, data ownership, rights to publish, data pooling architecture and storage. This paper discusses a framework for conceptual packages of ideas focused on a strategic development for international research data ...

  1. Datasets in Gene Expression Omnibus used in the study ORD-019001: Compensatory changes in CYP expression in three different toxicology mouse models: CAR-null, Cyp3a-null, and Cyp2b9/10/13-null mice.

    Data.gov (United States)

    U.S. Environmental Protection Agency — Accession numbers of microarray data sets used in the analysis. This dataset is associated with the following publication: Kumar, R., L. Mota, E. Litoff, J. Rooney,...

  2. Dataset de contenidos musicales de video, basado en emociones

    Directory of Open Access Journals (Sweden)

    Luis Alejandro Solarte Moncayo

    2016-07-01

    Full Text Available Speeding up access to content by reducing the time spent browsing multimedia catalogues is one of the challenges of video-on-demand (VoD) services, a consequence of the growing amount of content on today's networks. This article describes the process of building a dataset of music videos. The dataset was used in the design and implementation of a VoD service that aims to improve access to content through emotion-based music classification. The paper presents the adaptation of an emotion classification model based on the arousal-valence model. It also describes the development of a Java tool for content classification, which was used to build the dataset. Finally, in order to evaluate the dataset, the functional structure of the developed VoD service is presented.

  3. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  4. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  5. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of the turbulence of jet flows. The present report documents the single-point statistics of velocity (mean and variance) of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, e.g., by establishing uncertainties for the data. This paper covers the following five tasks: (1) Document the acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  6. Div400: a social image retrieval result diversification dataset

    DEFF Research Database (Denmark)

    Ionescu, Bogdan; Radu, Anca-Livia; Menendez Blanco, Maria

    2014-01-01

    In this paper we propose a new dataset, Div400, that was designed to support shared evaluation in different areas of social media photo retrieval, e.g., machine analysis (re-ranking, machine learning), human-based computation (crowdsourcing) or hybrid approaches (relevance feedback, machinecrowd ...

  7. Dataset - Droevendaal, Rolde and Colijnsplaat, 1996-2003

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Booij et al., 2001; Slabbekoorn, 2002; Slabbekoorn, 2003; Van Evert et al., 2011; Van Geel, 2003; Van Geel and Wijnholds, 2003; Van Geel et al., 2004). The data are presented as an SQL dump of

  8. Image dataset for testing search and detection models

    NARCIS (Netherlands)

    Toet, A.; Bijl, P.; Valeton, J.M.

    2001-01-01

    The TNO Human Factors Search_2 image dataset consists of: a set of 44 high-resolution digital color images of different complex natural scenes, the ground truth corresponding to each of these scenes, and the results of psychophysical experiments on each of these images. The images in the Search_2 da

  9. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based o...

  10. Automated single particle detection and tracking for large microscopy datasets.

    Science.gov (United States)

    Wilson, Rhodri S; Yang, Lei; Dun, Alison; Smyth, Annya M; Duncan, Rory R; Rickman, Colin; Lu, Weiping

    2016-05-01

    Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolutions. Our ability to process these datasets now plays an essential role in understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components, from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells at very high temporal rates. Our analysis of the dynamics of very large cohorts of 10,000s of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provide a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.
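The "caged in nanodomains" conclusion is the kind of inference typically drawn from mean-squared-displacement (MSD) curves of tracked particles: free diffusion gives an MSD that grows linearly with lag time, while confined motion plateaus. A minimal generic sketch of the MSD computation (not the paper's pipeline):

```python
import numpy as np

def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of a single 2-D trajectory (N x 2 array).

    A plateau in the MSD at large lags indicates confined ("caged")
    motion, whereas free diffusion grows linearly with the lag.
    """
    track = np.asarray(track, float)
    msd = np.empty(max_lag)
    for lag in range(1, max_lag + 1):
        # Displacements between all positions separated by `lag` frames.
        disp = track[lag:] - track[:-lag]
        msd[lag - 1] = np.mean(np.sum(disp ** 2, axis=1))
    return msd
```

Fitting the small-lag slope gives a diffusion coefficient, and the plateau level bounds the cage size.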

  11. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Science.gov (United States)

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  12. A global experimental dataset for assessing grain legume production

    Science.gov (United States)

    Cernay, Charles; Pelzer, Elise; Makowski, David

    2016-09-01

    Grain legume crops are a significant component of the human diet and animal feed and have an important role in the environment, but the global diversity of agricultural legume species is currently underexploited. Experimental assessments of grain legume performances are required, to identify potential species with high yields. Here, we introduce a dataset including results of field experiments published in 173 articles. The selected experiments were carried out over five continents on 39 grain legume species. The dataset includes measurements of grain yield, aerial biomass, crop nitrogen content, residual soil nitrogen content and water use. When available, yields for cereals and oilseeds grown after grain legumes in the crop sequence are also included. The dataset is arranged into a relational database with nine structured tables and 198 standardized attributes. Tillage, fertilization, pest and irrigation management are systematically recorded for each of the 8,581 crop*field site*growing season*treatment combinations. The dataset is freely reusable and easy to update. We anticipate that it will provide valuable information for assessing grain legume production worldwide.

  13. Determining Scale-dependent Patterns in Spatial and Temporal Datasets

    Science.gov (United States)

    Roy, A.; Perfect, E.; Mukerji, T.; Sylvester, L.

    2016-12-01

    Spatial and temporal datasets of interest to Earth scientists often contain plots of one variable against another, e.g., rainfall magnitude vs. time or fracture aperture vs. spacing. Such data, comprised of distributions of events along a transect / timeline along with their magnitudes, can display persistent or antipersistent trends, as well as random behavior, that may contain signatures of underlying physical processes. Lacunarity is a technique that was originally developed for multiscale analysis of data. In a recent study we showed that lacunarity can be used for revealing changes in scale-dependent patterns in fracture spacing data. Here we present a further improvement in our technique, with lacunarity applied to various non-binary datasets comprised of event spacings and magnitudes. We test our technique on a set of four synthetic datasets, three of which are based on an autoregressive model and have magnitudes at every point along the "timeline", thus representing antipersistent, persistent, and random trends. The fourth dataset is made up of five clusters of events, each containing a set of random magnitudes. The concept of lacunarity ratio, LR, is introduced; this is the lacunarity of a given dataset normalized to the lacunarity of its random counterpart. It is demonstrated that LR can successfully delineate scale-dependent changes in terms of antipersistence and persistence in the synthetic datasets. This technique is then applied to three different types of data: a hundred-year rainfall record from Knoxville, TN, USA, a set of varved sediments from Marca Shale, and a set of fracture aperture and spacing data from NE Mexico. While the rainfall data and varved sediments both appear to be persistent at small scales, at larger scales they both become random. On the other hand, the fracture data shows antipersistence at small scale (within cluster) and random behavior at large scales. Such differences in behavior with respect to scale-dependent changes in
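The lacunarity ratio LR introduced above normalizes the gliding-box lacunarity of a series by that of its randomized counterpart. A minimal 1-D sketch under the usual gliding-box definition (an illustration of the concept, not the authors' implementation):

```python
import numpy as np

def lacunarity(series, box_size):
    """Gliding-box lacunarity of a 1-D magnitude series: second moment
    of the box masses divided by the squared first moment."""
    s = np.asarray(series, float)
    # Box mass at every gliding-window position.
    masses = np.convolve(s, np.ones(box_size), mode="valid")
    return np.mean(masses ** 2) / np.mean(masses) ** 2

def lacunarity_ratio(series, box_size, n_shuffles=200, seed=0):
    """LR: lacunarity of the series normalized by the mean lacunarity of
    randomly shuffled counterparts (LR > 1 suggests clustering at this scale)."""
    rng = np.random.default_rng(seed)
    lam = lacunarity(series, box_size)
    rand = np.mean([lacunarity(rng.permutation(series), box_size)
                    for _ in range(n_shuffles)])
    return lam / rand
```

Scanning LR over a range of box sizes traces how clustering (or its absence) changes with scale, which is the scale-dependent signature the abstract describes.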

  14. Comparison and validation of gridded precipitation datasets for Spain

    Science.gov (United States)

    Quintana-Seguí, Pere; Turco, Marco; Míguez-Macho, Gonzalo

    2016-04-01

    In this study, two gridded precipitation datasets are compared and validated in Spain: the recently developed SAFRAN dataset and the Spain02 dataset. These are validated using rain gauges and they are also compared to the low-resolution ERA-Interim reanalysis. The SAFRAN precipitation dataset has been recently produced, using the SAFRAN meteorological analysis, which is extensively used in France (Durand et al. 1993, 1999; Quintana-Seguí et al. 2008; Vidal et al., 2010) and which has recently been applied to Spain (Quintana-Seguí et al., 2015). SAFRAN uses an optimal interpolation (OI) algorithm and uses all available rain gauges from the Spanish State Meteorological Agency (Agencia Estatal de Meteorología, AEMET). The product has a spatial resolution of 5 km and it spans from September 1979 to August 2014. This dataset has been produced mainly to be used in large-scale hydrological applications. Spain02 (Herrera et al. 2012, 2015) is another high-quality precipitation dataset for Spain based on a dense network of quality-controlled stations, and it has different versions at different resolutions. In this study we used the version with a resolution of 0.11°. The product spans from 1971 to 2010. Spain02 is well tested and widely used, mainly, but not exclusively, for RCM model validation and statistical downscaling. ERA-Interim is a well-known global reanalysis with a spatial resolution of ~79 km. It has been included in the comparison because it is a widely used product for continental- and global-scale studies and also in smaller-scale studies in data-poor countries. Thus, its comparison with higher-resolution products of a data-rich country, such as Spain, allows us to quantify the errors made when using such datasets for national-scale studies, in line with some of the objectives of the EU-FP7 eartH2Observe project. The comparison shows that SAFRAN and Spain02 perform similarly, even though their underlying principles are different. Both products are largely

  15. Regional climate change study requires new temperature datasets

    Science.gov (United States)

    Wang, Kaicun; Zhou, Chunlüe

    2017-04-01

    Analyses of global mean air temperature (Ta), i.e., NCDC GHCN, GISS, and CRUTEM4, are the fundamental datasets for climate change study and provide key evidence for global warming. All of the global temperature analyses over land are primarily based on meteorological observations of the daily maximum and minimum temperatures (Tmax and Tmin) and their averages (T2), because in most weather stations the measurements of Tmax and Tmin may be the only choice for a homogeneous century-long analysis of mean temperature. Our studies show that these datasets are suitable for long-term global warming studies. However, they may have substantial biases in quantifying local and regional warming rates, i.e., a root mean square error of more than 25% at 5° grids. From 1973 to 1997, the current datasets tend to significantly underestimate the warming rate over the central U.S. and overestimate the warming rate over the northern high latitudes. Similar results during the 1998-2013 warming hiatus period indicate that the use of T2 enlarges the spatial contrast of temperature trends. This is because T2 over land samples air temperature only twice daily and cannot accurately reflect land-atmosphere and incoming radiation variations over the temperature diurnal cycle. For better regional climate change detection and attribution, we suggest creating new global mean air temperature datasets based on the recently available high spatiotemporal resolution meteorological observations, i.e., the four daily observations per weather station available since the 1960s. These datasets will not only help investigate dynamical processes behind temperature variance but also help better evaluate the reanalyzed and modeled simulations of temperature and enable substantial improvements for other related climate variables in models, especially in regional and seasonal aspects.
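The core point above, that T2 = (Tmax + Tmin)/2 misrepresents the true diurnal-mean temperature when the diurnal cycle is asymmetric, can be illustrated with synthetic data (the diurnal cycle below is invented for illustration, not a station record):

```python
import numpy as np

def t2_mean(tmax, tmin):
    """Conventional T2 estimate: average of the daily maximum and minimum."""
    return 0.5 * (tmax + tmin)

# Synthetic asymmetric diurnal cycle sampled hourly (illustrative only):
hours = np.arange(24)
hourly_t = 15.0 + 8.0 * np.sin(np.pi * (hours - 5) / 14.0) ** 2

true_mean = hourly_t.mean()                   # mean of all 24 observations
t2 = t2_mean(hourly_t.max(), hourly_t.min())  # two-sample max/min average
bias = t2 - true_mean                         # nonzero for asymmetric cycles
```

Because the warm part of this cycle is narrower than the cool part, T2 overestimates the 24-observation mean here by roughly half a degree; with more sub-daily observations the estimated daily mean converges to the true one.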

  16. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

    Full Text Available Aviation emissions are unique among transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, as well as aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are however some distinct differences in the altitude distribution

  17. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2012-07-01

    Full Text Available Aviation emissions are unique among transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, as well as aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, while for CO and HC there are relatively larger differences. There are however some distinct differences in the altitude distribution of emissions in certain regions for the Aero2k dataset.

  18. Predicting MHC class I epitopes in large datasets

    Directory of Open Access Journals (Sweden)

    Lengauer Thomas

    2010-02-01

    Full Text Available Abstract Background Experimental screening of large sets of peptides with respect to their MHC binding capabilities is still very demanding due to the large number of possible peptide sequences and the extensive polymorphism of the MHC proteins. Therefore, there is significant interest in the development of computational methods for predicting the binding capability of peptides to MHC molecules, as a first step towards selecting peptides for actual screening. Results We have examined the performance of four diverse MHC Class I prediction methods on comparatively large HLA-A and HLA-B allele peptide binding datasets extracted from the Immune Epitope Database and Analysis Resource (IEDB). The chosen methods span a representative cross-section of available methodology for MHC binding predictions. Until the development of IEDB, such an analysis was not possible, as the available peptide sequence datasets were small and spread out over many separate efforts. We tested three datasets which differ in the IC50 cutoff criteria used to select the binders and non-binders. The best performance was achieved when predictions were performed on the dataset consisting only of strong binders (IC50 less than 10 nM) and clear non-binders (IC50 greater than 10,000 nM). In addition, robustness of the predictions was only achieved for alleles that were represented with a sufficiently large (greater than 200) and balanced set of binders and non-binders. Conclusions All four methods show good to excellent performance on the comprehensive datasets, with the artificial-neural-network-based method outperforming the other methods. However, all methods show pronounced difficulties in correctly categorizing intermediate binders.
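The dataset selection described in the Results above (strong binders below 10 nM, clear non-binders above 10,000 nM, intermediate binders excluded) amounts to a simple threshold partition. A generic sketch (the function name and structure are illustrative, not from the paper):

```python
def label_by_ic50(ic50_nm, strong_cutoff=10.0, nonbinder_cutoff=10000.0):
    """Partition peptides by measured IC50 (nM): strong binders below
    strong_cutoff, clear non-binders above nonbinder_cutoff; intermediate
    binders (in between) are excluded from the strictest dataset."""
    labels = []
    for ic50 in ic50_nm:
        if ic50 < strong_cutoff:
            labels.append("binder")
        elif ic50 > nonbinder_cutoff:
            labels.append("non-binder")
        else:
            labels.append(None)  # intermediate binder: dropped
    return labels
```

Varying the two cutoffs reproduces the three dataset variants compared in the study; the wide gap between cutoffs is what makes the strictest dataset the easiest to classify.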

  19. Public participation in GIS via mobile applications

    Science.gov (United States)

    Brovelli, Maria Antonia; Minghini, Marco; Zamboni, Giorgio

    2016-04-01

    Driven by the recent trends in the GIS domain including Volunteered Geographic Information, geo-crowdsourcing and citizen science, and fostered by the constant technological advances, collection and dissemination of geospatial information by ordinary people has become commonplace. However, applications involving user-generated geospatial content show dramatically diversified patterns in terms of incentive, type and level of participation, purpose of the activity, data/metadata provided and data quality. This study contributes to this heterogeneous context by investigating public participation in GIS within the field of mobile-based applications. Results not only show examples of how to technically build GIS applications enabling user collection and interaction with geospatial data, but they also draw conclusions about the methods and needs of public participation. We describe three projects with different scales and purposes in the context of urban monitoring and planning, and tourism valorisation. In each case, an open source architecture is used, allowing users to exploit their mobile devices to collect georeferenced information. This data is then made publicly available on specific Web viewers. Analysis of user involvement in these projects provides insights related to participation patterns which suggests some generalized conclusions.

  20. Constraints on 3D fault and fracture distribution in layered volcanic- volcaniclastic sequences from terrestrial LIDAR datasets: Faroe Islands

    Science.gov (United States)

    Raithatha, Bansri; McCaffrey, Kenneth; Walker, Richard; Brown, Richard; Pickering, Giles

    2013-04-01

    Hydrocarbon reservoirs commonly contain an array of fine-scale structures that control fluid flow in the subsurface, such as polyphase fracture networks and small-scale fault zones. These structures are unresolvable using seismic imaging and therefore outcrop-based studies have been used as analogues to characterize fault and fracture networks and assess their impact on fluid flow in the subsurface. To maximize recovery and enhance production, it is essential to understand the geometry, physical properties, and distribution of these structures in 3D. Here we present field data and terrestrial LIDAR-derived 3D, photo-realistic virtual outcrops of fault zones at a range of displacement scales (0.001-4.5 m) within a volcaniclastic sand and basaltic lava unit sequence in the Faroe Islands. Detailed field observations were used to constrain the virtual outcrop dataset, and a workflow has been developed to build discrete fracture network (DFN) models in GOCAD® from these datasets. Model construction involves three main stages: (1) Georeferencing and processing of LIDAR datasets; (2) Structural interpretation to discriminate between faults, fractures, veins, and joint planes using CAD software and RiSCAN Pro; and (3) Building a 3D DFN in GOCAD®. To test the validity of this workflow, we focus here on a 4.5 m displacement strike-slip fault zone that displays a complex polymodal fracture network in the inter-layered basalt-volcaniclastic sequence, which is well-constrained by field study. The DFN models support our initial field-based hypothesis that fault zone geometry varies with increasing displacement through volcaniclastic units. Fracture concentration appears to be greatest in the upper lava unit, decreases into the volcaniclastic sediments, and decreases further into the lower lava unit. This distribution of fractures appears to be related to the width of the fault zone and the amount of fault damage on the outcrop. For instance, the fault zone is thicker in

  1. Discovering New Global Climate Patterns: Curating a 21-Year High Temporal (Hourly) and Spatial (40km) Resolution Reanalysis Dataset

    Science.gov (United States)

    Hou, C. Y.; Dattore, R.; Peng, G. S.

    2014-12-01

    The National Center for Atmospheric Research's Global Climate Four-Dimensional Data Assimilation (CFDDA) Hourly 40km Reanalysis dataset is a dynamically downscaled dataset with high temporal and spatial resolution. The dataset contains three-dimensional hourly analyses in netCDF format for the global atmospheric state from 1985 to 2005 on a 40km horizontal grid (0.4° grid increment) with 28 vertical levels, providing good representation of local forcing and diurnal variation of processes in the planetary boundary layer. This project aimed to make the dataset publicly available, accessible, and usable in order to provide a unique resource to allow and promote studies of new climate characteristics. When the curation project started, it had been five years since the data files were generated. Also, although the Principal Investigator (PI) had generated a user document at the end of the project in 2009, the document had not been maintained. Furthermore, the PI had moved to a new institution, and the remaining team members were reassigned to other projects. These factors made data curation especially challenging in the areas of verifying data quality, harvesting metadata descriptions, and documenting provenance information. As a result, the project's curation process found that: (1) the data curator's skill and knowledge helped in making decisions, such as on file format, structure, and workflow documentation, that had a significant, positive impact on the ease of the dataset's management and long-term preservation; (2) use of data curation tools, such as the Data Curation Profiles Toolkit's guidelines, revealed important information for promoting the data's usability and enhancing preservation planning; and (3) involving data curators during each stage of the data curation life cycle, instead of only at the end, could improve the curation process's efficiency. Overall, the project showed that proper resources invested in the curation process would give datasets the best chance to fulfill their potential to

  2. Public Health Offices, Published in 2006, Washoe County.

    Data.gov (United States)

    NSGIC GIS Inventory (aka Ramona) — This Public Health Offices dataset, was produced all or in part from Published Reports/Deeds information as of 2006. Data by this publisher are often provided in...

  3. Public lighting.

    NARCIS (Netherlands)

    Schreuder, D.A.

    1986-01-01

    The function of public lighting and the relationship between public lighting and accidents are considered briefly as aspects of effective countermeasures. Research needs and recent developments in installation and operation are described. Public lighting is an efficient accident countermeasure, but

  4. An Analysis on Better Testing than Training Performances on the Iris Dataset

    NARCIS (Netherlands)

    Schutten, Marten; Wiering, Marco

    2016-01-01

    The Iris dataset is a well known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so the dataset is separated in a set used fo
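
    As a toy illustration of the classification setup described above (not the SVM study itself), a minimal nearest-neighbour classifier on a hand-made Iris-like sample shows the train/test split at work. All data values and names here are invented, and 1-NN is a stand-in for the SVM.

```python
import random

# Invented miniature of the Iris setup: (sepal length, petal length) pairs
# for three classes, a random train/test split, and a 1-nearest-neighbour
# classifier standing in for the SVM.
DATA = [
    ((5.1, 1.4), "setosa"), ((4.9, 1.5), "setosa"), ((5.0, 1.3), "setosa"),
    ((6.4, 4.5), "versicolor"), ((6.9, 4.9), "versicolor"), ((5.5, 4.0), "versicolor"),
    ((6.3, 6.0), "virginica"), ((7.1, 5.9), "virginica"), ((6.5, 5.8), "virginica"),
]

def nn_predict(train, x):
    # Label of the closest training sample (squared Euclidean distance).
    return min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]

random.seed(0)
random.shuffle(DATA)
train, test = DATA[:6], DATA[6:]
accuracy = sum(nn_predict(train, x) == y for x, y in test) / len(test)
```

    With so few samples the test accuracy can exceed the training accuracy by chance, which is exactly the phenomenon the paper analyzes at larger scale.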

  5. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    This paper addresses the development of an augmented reality (AR)-based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensor datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore, a model to manage such data and enhance the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce data inefficiency. This paper presents a model for computing the information gain for each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particular selected region of interest (ROI) on the network. The necessary architectural system supports and the interface requirements for such visualizations are also presented.

  6. Robust Machine Learning Applied to Terascale Astronomical Datasets

    Science.gov (United States)

    Ball, N. M.; Brunner, R. J.; Myers, A. D.

    2008-08-01

    We present recent results from the Laboratory for Cosmological Data Mining {http://lcdm.astro.uiuc.edu} at the National Center for Supercomputing Applications (NCSA) to provide robust classifications and photometric redshifts for objects in the terascale-class Sloan Digital Sky Survey (SDSS). Through a combination of machine learning in the form of decision trees, k-nearest neighbor, and genetic algorithms, the use of supercomputing resources at NCSA, and the cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million objects in the SDSS, improved photometric redshifts, and a full exploitation of the powerful k-nearest neighbor algorithm. This work is the first to apply the full power of these algorithms to contemporary terascale astronomical datasets, and the improvement over existing results is demonstrable. We discuss issues that we have encountered in dealing with data on the terascale, and possible solutions that can be implemented to deal with upcoming petascale datasets.

  7. An axiomatic approach to intrinsic dimension of a dataset

    CERN Document Server

    Pestov, Vladimir

    2007-01-01

    We perform a deeper analysis of an axiomatic approach to the concept of intrinsic dimension of a dataset proposed by us in the IJCNN'07 paper (arXiv:cs/0703125). The main features of our approach are that a high intrinsic dimension of a dataset reflects the presence of the curse of dimensionality (in a certain mathematically precise sense), and that the dimension of a discrete i.i.d. sample of a low-dimensional manifold is, with high probability, close to that of the manifold. At the same time, the intrinsic dimension of a sample is easily corrupted by moderate high-dimensional noise (of the same amplitude as the size of the manifold) and suffers from prohibitively high computational complexity (computing it is an $NP$-complete problem). We outline a possible way to overcome these difficulties.
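
    The notion of intrinsic dimension can be made concrete with a classical correlation-dimension estimate. This is a simple estimator distinct from the authors' axiomatic approach, shown only to illustrate what "intrinsic dimension of a sample" means; the point set is invented.

```python
import math

# Correlation-dimension style estimate: d ≈ log(C(r2)/C(r1)) / log(r2/r1),
# where C(r) counts point pairs closer than r. For a 1-D set embedded in
# 2-D space, the estimate should be close to 1.
def correlation_dimension(points, r1, r2):
    def count(r):
        n = len(points)
        return sum(
            1
            for i in range(n) for j in range(i + 1, n)
            if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) < r * r
        )
    return math.log(count(r2) / count(r1)) / math.log(r2 / r1)

line = [(i / 100, 2 * i / 100) for i in range(100)]  # points on a line in 2-D
d_est = correlation_dimension(line, 0.1, 0.2)        # close to 1
```

    Adding high-dimensional noise of amplitude comparable to the set itself inflates such estimates, which is the corruption effect the abstract describes.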

  8. Content-level deduplication on mobile internet datasets

    Science.gov (United States)

    Hou, Ziyu; Chen, Xunxun; Wang, Yang

    2017-06-01

    Various systems and applications involve a large volume of duplicate items. Given the high data redundancy in real-world datasets, data deduplication can reduce storage capacity and improve the utilization of network bandwidth. However, the chunks used by existing deduplication systems range in size from 4 KB to over 16 KB, so these systems are not applicable to datasets consisting of short records. In this paper, we propose a new framework called SF-Dedup which is able to implement the deduplication process on a large set of Mobile Internet records whose size can be smaller than 100 B, or even smaller than 10 B. SF-Dedup is a short-fingerprint, in-line deduplication scheme that resolves hash collisions. Results of experimental applications illustrate that SF-Dedup is able to reduce storage capacity and shorten query time on relational databases.
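
    A minimal sketch of the short-fingerprint idea, assuming (as the abstract implies) that collisions between truncated fingerprints are resolved by an exact comparison; all names are illustrative, not SF-Dedup's actual API.

```python
import hashlib

# Assumed mechanics: a truncated ("short") SHA-1 fingerprint buckets the
# records; an exact comparison inside each bucket resolves collisions.
def dedup(records, fp_bytes=4):
    buckets = {}   # short fingerprint -> distinct records seen so far
    unique = []
    for r in records:
        fp = hashlib.sha1(r.encode()).digest()[:fp_bytes]
        bucket = buckets.setdefault(fp, [])
        if r not in bucket:      # exact check catches fingerprint collisions
            bucket.append(r)
            unique.append(r)
    return unique
```

    Calling `dedup(["a", "b", "a", "c", "b"])` keeps only the first occurrence of each record; shortening `fp_bytes` saves index space at the cost of more collision checks.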

  9. Serial femtosecond crystallography datasets from G protein-coupled receptors.

    Science.gov (United States)

    White, Thomas A; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R; Yoon, Chun Hong; Yefanov, Oleksandr M; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim

    2016-08-01

    We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data.

  10. Out-of-core clustering of volumetric datasets

    Institute of Scientific and Technical Information of China (English)

    GRANBERG Carl J.; LI Ling

    2006-01-01

    In this paper we present a novel method for dividing and clustering large volumetric scalar out-of-core datasets. This work is based on the Ordered Cluster Binary Tree (OCBT) structure created using a top-down or divisive clustering method. The OCBT structure allows fast and efficient sub volume queries to be made in combination with level of detail (LOD) queries of the tree. The initial partitioning of the large out-of-core dataset is done by using non-axis aligned planes calculated using Principal Component Analysis (PCA). A hybrid OCBT structure is also proposed where an in-core cluster binary tree is combined with a large out-of-core file.
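
    The PCA-based, non-axis-aligned partitioning step can be sketched as follows. This is a pure-Python stand-in using power iteration on toy 2-D points, not the OCBT implementation itself.

```python
# Pure-Python stand-in: find the first principal axis by power iteration on
# the covariance matrix, then split the points by the plane through the
# centroid normal to that axis.
def principal_axis(points, iters=100):
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):                     # power iteration
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mean, v

def split(points):
    mean, axis = principal_axis(points)
    proj = lambda p: sum((p[i] - mean[i]) * axis[i] for i in range(len(p)))
    return ([p for p in points if proj(p) >= 0],
            [p for p in points if proj(p) < 0])

pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
half_a, half_b = split(pts)   # balanced split along the diagonal
```

    Applying such splits recursively, top-down, yields exactly the kind of divisive binary tree the abstract describes.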

  11. MEME-ChIP: motif analysis of large DNA datasets.

    Science.gov (United States)

    Machanick, Philip; Bailey, Timothy L

    2011-06-15

    Advances in high-throughput sequencing have resulted in rapid growth in large, high-quality datasets including those arising from transcription factor (TF) ChIP-seq experiments. While there are many existing tools for discovering TF binding site motifs in such datasets, most web-based tools cannot directly process such large datasets. The MEME-ChIP web service is designed to analyze ChIP-seq 'peak regions'--short genomic regions surrounding declared ChIP-seq 'peaks'. Given a set of genomic regions, it performs (i) ab initio motif discovery, (ii) motif enrichment analysis, (iii) motif visualization, (iv) binding affinity analysis and (v) motif identification. It runs two complementary motif discovery algorithms on the input data--MEME and DREME--and uses the motifs they discover in subsequent visualization, binding affinity and identification steps. MEME-ChIP also performs motif enrichment analysis using the AME algorithm, which can detect very low levels of enrichment of binding sites for TFs with known DNA-binding motifs. Importantly, unlike with the MEME web service, there is no restriction on the size or number of uploaded sequences, allowing very large ChIP-seq datasets to be analyzed. The analyses performed by MEME-ChIP provide the user with a varied view of the binding and regulatory activity of the ChIP-ed TF, as well as the possible involvement of other DNA-binding TFs. MEME-ChIP is available as part of the MEME Suite at http://meme.nbcr.net.

  12. Simultaneous clustering of multiple gene expression and physical interaction datasets.

    Directory of Open Access Journals (Sweden)

    Manikandan Narayanan

    2010-04-01

    Many genome-wide datasets are routinely generated to study different aspects of biological systems, but integrating them to obtain a coherent view of the underlying biology remains a challenge. We propose simultaneous clustering of multiple networks as a framework to integrate large-scale datasets on the interactions among and activities of cellular components. Specifically, we develop an algorithm JointCluster that finds sets of genes that cluster well in multiple networks of interest, such as coexpression networks summarizing correlations among the expression profiles of genes and physical networks describing protein-protein and protein-DNA interactions among genes or gene-products. Our algorithm provides an efficient solution to a well-defined problem of jointly clustering networks, using techniques that permit certain theoretical guarantees on the quality of the detected clustering relative to the optimal clustering. These guarantees coupled with an effective scaling heuristic and the flexibility to handle multiple heterogeneous networks make our method JointCluster an advance over earlier approaches. Simulation results showed JointCluster to be more robust than alternate methods in recovering clusters implanted in networks with high false positive rates. In systematic evaluation of JointCluster and some earlier approaches for combined analysis of the yeast physical network and two gene expression datasets under glucose and ethanol growth conditions, JointCluster discovers clusters that are more consistently enriched for various reference classes capturing different aspects of yeast biology or yield better coverage of the analysed genes. These robust clusters, which are supported across multiple genomic datasets and diverse reference classes, agree with known biology of yeast under these growth conditions, elucidate the genetic control of coordinated transcription, and enable functional predictions for a number of uncharacterized genes.

  13. Microscopic images dataset for automation of RBCs counting

    Directory of Open Access Journals (Sweden)

    Sherif Abbas

    2015-12-01

    A method for Red Blood Corpuscles (RBCs) counting has been developed using RBCs light microscopic images and a Matlab algorithm. The dataset consists of Red Blood Corpuscles (RBCs) images and their segmented counterparts. A detailed description using a flow chart is given in order to show how to produce the RBCs mask. The RBCs mask was used to count the number of RBCs in the blood smear image.
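
    The final counting step, given a binary RBCs mask, amounts to counting connected components. A minimal flood-fill sketch (not the authors' Matlab code; the mask below is a tiny invented example):

```python
# Count connected foreground regions in a binary mask via iterative flood
# fill with 4-connectivity; each region stands in for one cell.
def count_cells(mask):
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not seen[i][j]:
                count += 1                      # a new connected component
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y][x] and not seen[y][x]:
                        seen[y][x] = True
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return count

mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
```

    Real RBC counting additionally separates touching cells (e.g. by watershed), which this sketch ignores.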

  14. Evaluating summarised radionuclide concentration ratio datasets for wildlife.

    Science.gov (United States)

    Wood, M D; Beresford, N A; Howard, B J; Copplestone, D

    2013-12-01

    Concentration ratios (CR(wo-media)) are used in most radioecological models to predict whole-body radionuclide activity concentrations in wildlife from those in environmental media. This simplistic approach amalgamates the various factors influencing transfer within a single generic value and, as a result, comparisons of model predictions with site-specific measurements can vary by orders of magnitude. To improve model predictions, the development of 'condition-specific' CR(wo-media) values has been proposed (e.g. for a specific habitat). However, the underlying datasets for most CR(wo-media) value databases, such as the wildlife transfer database (WTD) developed within the IAEA EMRAS II programme, include summarised data. This presents challenges for the calculation and subsequent statistical evaluation of condition-specific CR(wo-media) values. A further complication is the common use of arithmetic summary statistics to summarise data in source references, even though CR(wo-media) values generally tend towards a lognormal distribution and should, therefore, be summarised using geometric statistics. In this paper, we propose a statistically-defensible and robust method for reconstructing underlying datasets to calculate condition-specific CR(wo-media) values from summarised data and deriving geometric summary statistics. This method is applied to terrestrial datasets from the WTD. Statistically significant differences in sub-category CR(wo-media) values (e.g. mammals categorised by feeding strategy) were identified, which may justify the use of these CR(wo-media) values for specific assessment contexts. However, biases and limitations within the underlying datasets of the WTD explain some of these differences. Given the uncertainty in the summarised CR(wo-media) values, we suggest that the CR(wo-media) approach to estimating transfer is used with caution above screening-level assessments. Copyright © 2013 The Authors. Published by Elsevier Ltd. All rights
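
    The geometric summary statistics recommended above are computed on the log scale; a short sketch with invented CR values (not data from the WTD):

```python
import math

# Geometric mean and geometric standard deviation computed on the log
# scale, as appropriate for lognormally distributed concentration ratios.
def geometric_stats(values):
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu = sum(logs) / n
    sd = (sum((x - mu) ** 2 for x in logs) / (n - 1)) ** 0.5
    return math.exp(mu), math.exp(sd)

cr = [0.01, 0.1, 1.0]           # an orders-of-magnitude spread, as CR data show
gm, gsd = geometric_stats(cr)   # geometric mean 0.1, geometric SD 10
```

    For these values the arithmetic mean would be about 0.37, nearly four times the geometric mean, which is why arithmetic summaries distort lognormal transfer data.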

  15. Microscopic images dataset for automation of RBCs counting.

    Science.gov (United States)

    Abbas, Sherif

    2015-12-01

    A method for Red Blood Corpuscles (RBCs) counting has been developed using RBCs light microscopic images and a Matlab algorithm. The dataset consists of Red Blood Corpuscles (RBCs) images and their segmented counterparts. A detailed description using a flow chart is given in order to show how to produce the RBCs mask. The RBCs mask was used to count the number of RBCs in the blood smear image.

  16. Analysis of Heart Diseases Dataset using Neural Network Approach

    CERN Document Server

    Rani, K Usha

    2011-01-01

    One of the important techniques of Data mining is Classification. Many real-world problems in various fields such as business, science, industry and medicine can be solved by using the classification approach. Neural Networks have emerged as an important tool for classification. The advantages of Neural Networks help in the efficient classification of given data. In this study a Heart diseases dataset is analyzed using the Neural Network approach. To increase the efficiency of the classification process a parallel approach is also adopted in the training phase.

  17. Circumpolar dataset of sequenced specimens of Promachocrinus kerguelensis (Echinodermata, Crinoidea

    Directory of Open Access Journals (Sweden)

    Lenaïg G. Hemery

    2013-07-01

    This circumpolar dataset of the comatulid (Echinodermata: Crinoidea) Promachocrinus kerguelensis (Carpenter, 1888) from the Southern Ocean documents biodiversity associated with the specimens sequenced in Hemery et al. (2012). The aim of the Hemery et al. (2012) paper was to use phylogeographic and phylogenetic tools to assess the genetic diversity, demographic history and evolutionary relationships of this very common and abundant comatulid, in the context of the glacial history of the Antarctic and Sub-Antarctic shelves (Thatje et al. 2005, 2008). Over one thousand three hundred specimens (1307) used in this study were collected during seventeen cruises from 1996 to 2010, in eight regions of the Southern Ocean: Kerguelen Plateau, Davis Sea, Dumont d’Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula, East Weddell Sea and Scotia Arc including the tip of the Antarctic Peninsula and the Bransfield Strait. We give here the metadata of this dataset, which lists sampling sources (cruise ID, ship name, sampling date, sampling gear), sampling sites (station, geographic coordinates, depth) and genetic data (phylogroup, haplotype, sequence ID) for each of the 1307 specimens. The identification of the specimens was controlled by an expert taxonomist specializing in crinoids (Marc Eléaume, Muséum national d’Histoire naturelle, Paris) and all the COI sequences were matched against those available on the Barcode of Life Data System (BOLD: http://www.boldsystems.org/index.php/IDS_OpenIdEngine). This dataset can be used by studies dealing with, among other interests, Antarctic and/or crinoid diversity (species richness, distribution patterns, biogeography or habitat / ecological niche modeling). This dataset is accessible through the GBIF network at http://ipt.biodiversity.aq/resource.do?r=proke.

  18. How To Break Anonymity of the Netflix Prize Dataset

    OpenAIRE

    Narayanan, Arvind; Shmatikov, Vitaly

    2006-01-01

    We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. ...

  19. OpenCL based machine learning labeling of biomedical datasets

    Science.gov (United States)

    Amoros, Oscar; Escalera, Sergio; Puig, Anna

    2011-03-01

    In this paper, we propose a two-stage labeling method for large biomedical datasets through a parallel approach on a single GPU. Diagnostic methods, structure volume measurements, and visualization systems are of major importance for surgery planning, intra-operative imaging and image-guided surgery. In all cases, providing an automatic and interactive method to label or to tag different structures contained in the input data becomes imperative. Several approaches to label or segment biomedical datasets have been proposed to discriminate different anatomical structures in an output tagged dataset. Among existing methods, supervised learning methods for segmentation have been devised to easily analyze biomedical datasets by a non-expert user. However, they still have some problems concerning practical application, such as slow learning and testing speeds. In addition, recent technological developments have led to widespread availability of multi-core CPUs and GPUs, as well as new software languages, such as NVIDIA's CUDA and OpenCL, making it possible to apply parallel programming paradigms in conventional personal computers. Adaboost classifier is one of the most widely applied methods for labeling in the Machine Learning community. In a first stage, Adaboost trains a binary classifier from a set of pre-labeled samples described by a set of features. This binary classifier is defined as a weighted combination of weak classifiers. Each weak classifier is a simple decision function estimated on a single feature value. Then, at the testing stage, each weak classifier is independently applied on the features of a set of unlabeled samples. In this work, we propose an alternative representation of the Adaboost binary classifier. We use this proposed representation to define a new GPU-based parallelized Adaboost testing stage using OpenCL. We provide numerical experiments based on large available data sets and we compare our results to CPU-based strategies in terms of time and
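
    The Adaboost testing stage described above, where each weak classifier is a decision function on a single feature and the strong classifier is their weighted combination, can be sketched sequentially. The GPU parallelization is omitted, and the weights and thresholds below are invented rather than learned.

```python
# Decision stumps as weak classifiers; the strong prediction is the sign
# of their weighted vote. The alpha weights and thresholds are invented,
# not produced by Adaboost's training stage.
def stump(feature, threshold, polarity):
    return lambda x: polarity * (1 if x[feature] > threshold else -1)

weak = [
    (0.8, stump(0, 0.5, 1)),
    (0.5, stump(1, 0.3, -1)),
    (0.3, stump(0, 0.9, 1)),
]

def predict(x):
    vote = sum(alpha * h(x) for alpha, h in weak)   # weighted combination
    return 1 if vote >= 0 else -1
```

    Because each weak classifier is evaluated independently per sample, the testing stage maps naturally onto one GPU work-item per (sample, stump) pair, which is the parallelization the paper exploits.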

  20. Data Integration Framework Data Management Plan Remote Sensing Dataset

    Science.gov (United States)

    2016-07-01

    lidar, X-band radar, and electro-optical (EO) and infrared (IR) imagery at the FRF. These datasets provide near real-time observations of the littoral... Acronyms from the report: Mobile District, Operations Division, Spatial Data Branch; CHL: Coastal and Hydraulics Laboratory; CHS: Coastal Hazards System; CLARIS: Coastal Lidar ... system; IR: infrared; ISO: International Organization for Standardization; JPG: joint photographic experts group (file format); km: kilometer; LAS: laser

  1. Pantheon: A Dataset for the Study of Global Cultural Production

    CERN Document Server

    Yu, Amy Zhao; Hu, Kevin; Lu, Tiffany; Hidalgo, César A

    2015-01-01

    We present the Pantheon 1.0 dataset: a manually curated dataset of individuals that have transcended linguistic, temporal, and geographic boundaries. The Pantheon 1.0 dataset includes the 11,341 biographies present in more than 25 languages in Wikipedia and is enriched with: (i) manually curated demographic information (place of birth, date of birth, and gender), (ii) a cultural domain classification categorizing each biography at three levels of aggregation (i.e. Arts/Fine Arts/Painting), and (iii) measures of global visibility (fame) including the number of languages in which a biography is present in Wikipedia, the monthly page-views received by a biography (2008-2013), and a global visibility metric we name the Historical Popularity Index (HPI). We validate our measures of global visibility (HPI and Wikipedia language editions) using external measures of accomplishment in several cultural domains: Tennis, Swimming, Car Racing, and Chess. In all of these cases we find that measures of accomplishments and f...

  2. Increasing consistency of disease biomarker prediction across datasets.

    Directory of Open Access Journals (Sweden)

    Maria D Chikina

    Microarray studies with human subjects often have limited sample sizes, which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparation, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis (SVA), but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern.

  3. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building. This article considers missing data within the context of principal component analysis (PCA), which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
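
    One way to picture the alternating idea mentioned above is a rank-1 alternating least-squares completion over the observed entries only. This is a drastic simplification of the SVD-based algorithms the article reviews; the matrix, rank, and iteration count are invented for illustration.

```python
# Rank-1 alternating least squares on the observed entries only; missing
# cells (None) are then filled from the fitted factors u and v.
def rank1_complete(X, iters=200):
    m, n = len(X), len(X[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        for j in range(n):   # update v given u (least squares per column)
            obs = [i for i in range(m) if X[i][j] is not None]
            v[j] = sum(u[i] * X[i][j] for i in obs) / sum(u[i] ** 2 for i in obs)
        for i in range(m):   # update u given v (least squares per row)
            obs = [j for j in range(n) if X[i][j] is not None]
            u[i] = sum(v[j] * X[i][j] for j in obs) / sum(v[j] ** 2 for j in obs)
    return [[X[i][j] if X[i][j] is not None else u[i] * v[j]
             for j in range(n)] for i in range(m)]

X = [[1.0, 2.0, None],
     [2.0, 4.0, 6.0],
     [3.0, 6.0, 9.0]]
completed = rank1_complete(X)   # the missing cell converges to 3.0
```

    Because the observed data here are exactly rank 1, the alternation converges to the unique consistent fill-in; real process data require a chosen rank and convergence checks.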

  4. Synchronization of networks of chaotic oscillators: Structural and dynamical datasets

    Directory of Open Access Journals (Sweden)

    Ricardo Sevilla-Escoboza

    2016-06-01

    We provide the topological structure of a series of N=28 Rössler chaotic oscillators diffusively coupled through one of their variables. The dynamics of the y variable describing the evolution of the individual nodes of the network are given for a wide range of coupling strengths. Datasets capture the transition from the unsynchronized behavior to the synchronized one, as a function of the coupling strength between oscillators. The fact that both the underlying topology of the system and the dynamics of the nodes are given together makes this dataset a suitable candidate to evaluate the interplay between functional and structural networks and serve as a benchmark to quantify the ability of a given algorithm to extract the structural network of connections from the observation of the dynamics of the nodes. At the same time, it is possible to use the dataset to analyze the different dynamical properties (randomness, complexity, reproducibility, etc.) of an ensemble of oscillators as a function of the coupling strength.
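
    A minimal sketch of diffusively coupled Rössler oscillators: a ring of 4 nodes with plain Euler integration, not the published N=28 setup. The a, b, c values are standard textbook parameters and the coupling strength k is arbitrary.

```python
# Four Rössler oscillators on a ring, diffusively coupled through the y
# variable and integrated with a simple Euler step.
def step(state, k, dt=0.001, a=0.2, b=0.2, c=5.7):
    n = len(state)
    nxt = []
    for i, (x, y, z) in enumerate(state):
        yl, yr = state[(i - 1) % n][1], state[(i + 1) % n][1]
        dx = -y - z
        dy = x + a * y + k * (yl + yr - 2 * y)   # diffusive coupling in y
        dz = b + z * (x - c)
        nxt.append((x + dt * dx, y + dt * dy, z + dt * dz))
    return nxt

state = [(0.1 * i, 0.0, 0.0) for i in range(1, 5)]
for _ in range(1000):                            # integrate to t = 1
    state = step(state, k=0.05)
```

    Sweeping k and recording the y trajectories reproduces, in miniature, the unsynchronized-to-synchronized transition the dataset documents.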

  5. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.
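
    The kind of spatial association GenometriCorr quantifies can be illustrated with a naive overlap fraction between two interval sets. This toy sketch omits the package's significance testing entirely, and all coordinates are invented.

```python
# Intervals as (start, end) half-open pairs; the statistic is the fraction
# of query intervals that overlap any reference interval.
def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]

def overlap_fraction(query, reference):
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)

query = [(0, 10), (20, 30), (50, 60)]
reference = [(5, 25)]
frac = overlap_fraction(query, reference)   # 2 of 3 query intervals overlap
```

    The R package goes further by comparing such statistics to null distributions to report the significance and direction of the association.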

  6. Scaling statistical multiple sequence alignment to large datasets

    Directory of Open Access Journals (Sweden)

    Michael Nute

    2016-11-01

    Full Text Available Abstract Background Multiple sequence alignment is an important task in bioinformatics, and alignments of large datasets containing hundreds or thousands of sequences are increasingly of interest. While many alignment methods exist, the most accurate alignments are likely to be based on stochastic models where sequences evolve down a tree with substitutions, insertions, and deletions. While some methods have been developed to estimate alignments under these stochastic models, only the Bayesian method BAli-Phy has been able to run on even moderately large datasets, containing 100 or so sequences. A technique to extend BAli-Phy to enable alignments of thousands of sequences could potentially improve alignment and phylogenetic tree accuracy on large-scale data beyond the best-known methods today. Results We use simulated data with up to 10,000 sequences representing a variety of model conditions, including some that are significantly divergent from the statistical models used in BAli-Phy and elsewhere. We give a method for incorporating BAli-Phy into PASTA and UPP, two strategies for enabling alignment methods to scale to large datasets, and give alignment and tree accuracy results measured against the ground truth from simulations. Comparable results are also given for other methods capable of aligning this many sequences. Conclusions Extensions of BAli-Phy using PASTA and UPP produce significantly more accurate alignments and phylogenetic trees than the current leading methods.

  7. ENHANCED DATA DISCOVERABILITY FOR IN SITU HYPERSPECTRAL DATASETS

    Directory of Open Access Journals (Sweden)

    B. Rasaiah

    2016-06-01

    Full Text Available Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and the exploitation of the benefits of integrating them with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015) with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue is described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  8. Enhanced Data Discoverability for in Situ Hyperspectral Datasets

    Science.gov (United States)

    Rasaiah, B.; Bellman, C.; Hewson, R. D.; Jones, S. D.; Malthus, T. J.

    2016-06-01

    Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and the exploitation of the benefits of integrating them with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015) with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue is described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  9. Nanomaterial datasets to advance tomography in scanning transmission electron microscopy.

    Science.gov (United States)

    Levin, Barnaby D A; Padgett, Elliot; Chen, Chien-Chun; Scott, M C; Xu, Rui; Theis, Wolfgang; Jiang, Yi; Yang, Yongsoo; Ophus, Colin; Zhang, Haitao; Ha, Don-Hyung; Wang, Deli; Yu, Yingchao; Abruña, Hector D; Robinson, Richard D; Ercius, Peter; Kourkoutis, Lena F; Miao, Jianwei; Muller, David A; Hovden, Robert

    2016-06-07

    Electron tomography in materials science has flourished with the demand to characterize nanoscale materials in three dimensions (3D). Access to experimental data is vital for developing and validating reconstruction methods that improve resolution and reduce radiation dose requirements. This work presents five high-quality scanning transmission electron microscope (STEM) tomography datasets in order to address the critical need for open access data in this field. The datasets represent the current limits of experimental technique, are of high quality, and contain materials with structural complexity. Included are tomographic series of a hyperbranched Co2P nanocrystal, platinum nanoparticles on a carbon nanofibre imaged over the complete 180° tilt range, a platinum nanoparticle and a tungsten needle both imaged at atomic resolution by equal slope tomography, and a through-focal tilt series of PtCu nanoparticles. A volumetric reconstruction from every dataset is provided for comparison and development of post-processing and visualization techniques. Researchers interested in creating novel data processing and reconstruction algorithms will now have access to state of the art experimental test data.

  10. Improved Cosmological Constraints from New, Old and Combined Supernova Datasets

    CERN Document Server

    Kowalski, M; Aldering, G; Agostinho, R J; Amadon, A; Amanullah, R; Balland, C; Barbary, K; Blanc, G; Challis, P J; Conley, A; Connolly, N V; Covarrubias, R; Dawson, K S; Deustua, S E; Ellis, R; Fabbro, S; Fadeev, V; Fan, X; Farris, B; Folatelli, G; Frye, B L; Garavini, G; Gates, E L; Germany, L; Goldhaber, G; Goldman, B; Goobar, A; Groom, D E; Haïssinski, J; Hardin, D; Hook, I; Kent, S; Kim, A G; Knop, R A; Lidman, C; Linder, E V; Méndez, J; Meyers, J; Miller, G J; Moniez, M; Mourão, A M; Newberg, H; Nobili, S; Nugent, P E; Pain, R; Perdereau, O; Perlmutter, S; Phillips, M M; Prasad, V; Quimby, R; Regnault, N; Rich, J; Rubenstein, E P; Ruiz-Lapuente, P; Santos, F D; Schaefer, B E; Schommer, R A; Smith, R C; Soderberg, A M; Spadafora, A L; Strolger, L -G; Strovink, M; Suntzeff, N B; Suzuki, N; Thomas, R C; Walton, N A; Wang, L; Wood-Vasey, W M; Yun, J L

    2008-01-01

    We present a new compilation of Type Ia supernovae (SNe Ia), a new dataset of low-redshift nearby-Hubble-flow SNe and new analysis procedures to work with these heterogeneous compilations. This "Union" compilation of 414 SNe Ia, which reduces to 307 SNe after selection cuts, includes the recent large samples of SNe Ia from the Supernova Legacy Survey and ESSENCE Survey, the older datasets, as well as the recently extended dataset of distant supernovae observed with HST. A single, consistent and blind analysis procedure is used for all the various SN Ia subsamples, and a new procedure is implemented that consistently weights the heterogeneous data sets and rejects outliers. We present the latest results from this Union compilation and discuss the cosmological constraints from this new compilation and its combination with other cosmological measurements (CMB and BAO). The constraint we obtain from supernovae on the dark energy density is $\Omega_\Lambda = 0.713^{+0.027}_{-0.029}\,\text{(stat)}^{+0.036}_{-0.039}\,\text{(sys)}$...
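
    A weighting scheme that treats heterogeneous subsamples consistently while rejecting outliers, of the general kind the abstract describes, can be sketched as an iterative sigma-clipped inverse-variance mean. This is a simplified illustration under assumed conventions, not the Union analysis pipeline:

    ```python
    import numpy as np

    def robust_weighted_mean(values, errors, nsigma=3.0, max_iter=10):
        """Inverse-variance weighted mean with iterative nsigma outlier
        rejection. Returns the converged mean and the boolean mask of
        points that survived clipping."""
        v = np.asarray(values, dtype=float)
        e = np.asarray(errors, dtype=float)
        keep = np.ones(v.size, dtype=bool)
        mean = v.mean()
        for _ in range(max_iter):
            w = 1.0 / e[keep] ** 2
            mean = np.sum(w * v[keep]) / np.sum(w)     # weighted mean of kept points
            new_keep = np.abs(v - mean) <= nsigma * e  # re-test every point
            if np.array_equal(new_keep, keep):
                break                                  # converged
            keep = new_keep
        return mean, keep

    # Three consistent measurements plus one outlier, equal uncertainties
    mean, kept = robust_weighted_mean([1.0, 1.1, 0.9, 5.0], [0.5, 0.5, 0.5, 0.5])
    ```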

  11. Strategies for analyzing highly enriched IP-chip datasets

    Directory of Open Access Journals (Sweden)

    Tavaré Simon

    2009-09-01

    Full Text Available Abstract Background Chromatin immunoprecipitation on tiling arrays (ChIP-chip) has been employed to examine features such as protein binding and histone modifications on a genome-wide scale in a variety of cell types. Array data from the latter studies typically have a high proportion of enriched probes whose signals vary considerably (due to heterogeneity in the cell population), and this makes their normalization and downstream analysis difficult. Results Here we present strategies for analyzing such experiments, focusing our discussion on the analysis of Bromodeoxyuridine immunoprecipitation on tiling array (BrdU-IP-chip) datasets. BrdU-IP-chip experiments map large, recently replicated genomic regions and have similar characteristics to histone modification/location data. To prepare such data for downstream analysis we employ a dynamic programming algorithm that identifies a set of putative unenriched probes, which we use for both within-array and between-array normalization. We also introduce a second dynamic programming algorithm that incorporates a priori knowledge to identify and quantify positive signals in these datasets. Conclusion Highly enriched IP-chip datasets are often difficult to analyze with traditional array normalization and analysis strategies. Here we present and test a set of analytical tools for their normalization and quantification that allows for accurate identification and analysis of enriched regions.
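
    The between-array normalization idea — scale each array so its unenriched probes agree — can be illustrated as follows. Note the paper selects unenriched probes with a dynamic programming algorithm; this hypothetical sketch simply takes the lowest-signal quantile of a reference array as the baseline set:

    ```python
    import numpy as np

    def baseline_normalize(arrays, baseline_frac=0.2):
        """Scale each array so its putative-unenriched probes match the
        reference array's. The baseline set here is the lowest-signal
        `baseline_frac` of probes on the first array (a crude stand-in
        for the paper's dynamic-programming selection)."""
        arrays = np.asarray(arrays, dtype=float)
        ref = arrays[0]
        k = max(1, int(baseline_frac * ref.size))
        baseline_idx = np.argsort(ref)[:k]          # putative unenriched probes
        target = np.median(ref[baseline_idx])
        out = np.empty_like(arrays)
        for i, arr in enumerate(arrays):
            # Rescale so this array's baseline median matches the target
            out[i] = arr * (target / np.median(arr[baseline_idx]))
        return out

    # Second array has a 2x global intensity bias; normalization removes it
    a = np.arange(1.0, 101.0)
    normalized = baseline_normalize([a, 2 * a])
    ```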

  12. Igloo-Plot: a tool for visualization of multidimensional datasets.

    Science.gov (United States)

    Kuntal, Bhusan K; Ghosh, Tarini Shankar; Mande, Sharmila S

    2014-01-01

    Advances in science and technology have resulted in an exponential growth of multivariate (or multi-dimensional) datasets generated across various research areas, especially in the biological sciences. Visualization and analysis of such data (with the objective of uncovering the hidden patterns therein) is an important and challenging task. We present a tool, called Igloo-Plot, for efficient visualization of multidimensional datasets. The tool addresses some of the key limitations of contemporary multivariate visualization and analysis tools. The visualization layout not only facilitates easy identification of clusters of data-points having similar feature compositions, but also highlights the 'marker features' specific to each of these clusters. The applicability of the various functionalities implemented herein is demonstrated using several well-studied multi-dimensional datasets. Igloo-Plot is expected to be a valuable resource for researchers working in multivariate data mining studies. Igloo-Plot is available for download from: http://metagenomics.atc.tcs.com/IglooPlot/.

  13. Filtergraph: An Interactive Web Application for Visualization of Astronomy Datasets

    CERN Document Server

    Burger, Dan; Pepper, Joshua; Siverd, Robert J; Paegert, Martin; De Lee, Nathan M

    2013-01-01

    Filtergraph is a web application being developed and maintained by the Vanderbilt Initiative in Data-intensive Astrophysics (VIDA) to flexibly and rapidly visualize a large variety of astronomy datasets of various formats and sizes. The user loads a flat-file dataset into Filtergraph which automatically generates an interactive data portal that can be easily shared with others. From this portal, the user can immediately generate scatter plots of up to 5 dimensions as well as histograms and tables based on the dataset. Key features of the portal include intuitive controls with auto-completed variable names, the ability to filter the data in real time through user-specified criteria, the ability to select data by dragging on the screen, and the ability to perform arithmetic operations on the data in real time. To enable seamless data visualization and exploration, changes are quickly rendered on screen and visualizations can be exported as high quality graphics files. The application is optimized for speed in t...

  14. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes are achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.
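
    The quantitative segmentation evaluation described here is commonly based on overlap scores; the abstract does not name a specific metric, so the Dice coefficient below is an illustrative choice rather than the authors' measure:

    ```python
    import numpy as np

    def dice(seg_a, seg_b):
        """Dice overlap between two binary segmentation masks.
        1.0 means identical masks; 0.0 means no overlap."""
        a = np.asarray(seg_a, dtype=bool)
        b = np.asarray(seg_b, dtype=bool)
        intersection = np.logical_and(a, b).sum()
        size = a.sum() + b.sum()
        return 1.0 if size == 0 else 2.0 * intersection / size

    # Automated result vs. a documented reference segmentation
    auto_seg = [1, 1, 1, 0, 0, 0]
    ref_seg = [1, 1, 0, 0, 0, 0]
    score = dice(auto_seg, ref_seg)
    ```

    In a review pipeline like the one described, cases whose score falls below a threshold would be flagged for visual inspection, keeping the manual burden small.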

  15. Multiresolution persistent homology for excessively large biomolecular datasets

    Science.gov (United States)

    Xia, Kelin; Zhao, Zhixiong; Wei, Guo-Wei

    2015-01-01

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs. PMID:26450288

  16. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to the protein domain classification, which is the first time that persistent homology is used for practical protein domain analysis, to our knowledge. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
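
    A rigidity density of the kind these two records describe — kernels centred on atoms, with a tunable resolution parameter — can be sketched as follows. The Gaussian kernel and symbol names here are illustrative assumptions; see the paper for the actual flexibility-rigidity index:

    ```python
    import numpy as np

    def rigidity_density(atoms, grid, eta):
        """Sum-of-Gaussians density evaluated on a grid of points.
        Larger eta blurs fine detail, focusing a subsequent filtration
        on coarser structural scales."""
        atoms = np.asarray(atoms, dtype=float)   # (n_atoms, dim)
        grid = np.asarray(grid, dtype=float)     # (n_points, dim)
        # Squared distances between every grid point and every atom
        d2 = ((grid[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / eta**2).sum(axis=1)

    # One atom at the origin: density is 1 there, and falls off with distance
    dens_origin = rigidity_density([[0.0, 0.0]], [[0.0, 0.0]], eta=1.0)
    coarse = rigidity_density([[0.0, 0.0]], [[1.0, 0.0]], eta=2.0)  # blurred
    fine = rigidity_density([[0.0, 0.0]], [[1.0, 0.0]], eta=0.5)    # sharp
    ```

    Tuning `eta` is the "resolution knob": the paper's multiresolution analysis runs the persistent-homology filtration on this density at the scale of interest instead of on the raw point cloud.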

  17. Establishing a minimum dataset for prospective registration of systematic reviews: an international consultation.

    Directory of Open Access Journals (Sweden)

    Alison Booth

    Full Text Available BACKGROUND: In response to growing recognition of the value of prospective registration of systematic review protocols, we planned to develop a web-based open access international register. In order for the register to fulfil its aims of reducing unplanned duplication, reducing publication bias, and providing greater transparency, it was important to ensure the appropriate data were collected. We therefore undertook a consultation process with experts in the field to identify a minimum dataset for registration. METHODS AND FINDINGS: A two-round electronic modified Delphi survey design was used. The international panel surveyed included experts from areas relevant to systematic review including commissioners, clinical and academic researchers, methodologists, statisticians, information specialists, journal editors and users of systematic reviews. Direct invitations to participate were sent out to 315 people in the first round and 322 in the second round. Responses to an open invitation to participate were collected separately. There were 194 respondents (143 invited and 51 open) with a 100% completion rate in the first round and 209 respondents (169 invited and 40 open) with a 91% completion rate in the second round. In the second round, 113 (54%) of the participants reported having previously taken part in the first round. Participants were asked to indicate whether a series of potential items should be designated as optional or required registration items, or should not be included in the register. After the second round, 70% or greater agreement was reached on the designation of 30 of 36 items. CONCLUSIONS: The results of the Delphi exercise have established a dataset of 22 required items for the prospective registration of systematic reviews, and 18 optional items. The dataset captures the key attributes of review design as well as the administrative details necessary for registration.
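
    The 70%-agreement rule used to designate items can be expressed directly; the vote tallies below are hypothetical, not the study's data:

    ```python
    def designate(votes, threshold=0.70):
        """For each item, return the option reaching >= threshold agreement,
        or 'unresolved' if no option does."""
        out = {}
        for item, options in votes.items():
            total = sum(options.values())
            best, n = max(options.items(), key=lambda kv: kv[1])
            out[item] = best if total and n / total >= threshold else "unresolved"
        return out

    # Hypothetical second-round tallies for two candidate registration items
    votes = {
        "review title": {"required": 180, "optional": 20},
        "funding source": {"required": 90, "optional": 80, "exclude": 30},
    }
    result = designate(votes)
    ```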

  18. Collaboration as a means toward a better dataset for both stakeholders and scientist

    Science.gov (United States)

    Chegwidden, O.; Rupp, D. E.; Nijssen, B.; Pytlak, E.; Knight, K.

    2016-12-01

    In 2013, the University of Washington (UW) and Oregon State University began a three-year project to evaluate climate change impacts in the Columbia River Basin (CRB) in the North American Pacific Northwest. The project was funded and coordinated by the River Management Joint Operating Committee (RMJOC), consisting of the Bonneville Power Administration (BPA), US Army Corps of Engineers (USACE), and US Bureau of Reclamation (USBR) and included a host of stakeholders in the region. The team worked to foster communication and collaboration throughout the production process, and also discovered effective collaborative strategies along the way. Project status updates occurred through a variety of outlets, ranging from monthly team check-ins to bi-annual workshops for a much larger audience. The workshops were used to solicit ongoing and timely feedback from a variety of stakeholders including RMJOC members, fish habitat advocates, tribal representatives and public utilities. To further facilitate collaboration, the team restructured the original project timeline, opting for delivering a provisional dataset nine months before the scheduled delivery of the final dataset. This allowed for a previously unplanned series of reviews from stakeholders in the region, who contributed their own expertise and interests to the dataset. The restructuring also encouraged the development of a streamlined infrastructure for performing the actual model simulation, resulting in two benefits: (1) reproducibility, an oft-touted goal within the scientific community, and (2) the ability to incorporate improvements from both stakeholders and scientists at a late stage in the project. We will highlight some of the key scientist-stakeholder engagement interactions throughout the project. We will show that active co-production resulted in a product more useful for not only stakeholders in the region, but also the scientific community.

  19. NGS-QC Generator: A Quality Control System for ChIP-Seq and Related Deep Sequencing-Generated Datasets.

    Science.gov (United States)

    Mendoza-Parra, Marco Antonio; Saleem, Mohamed-Ashick M; Blum, Matthias; Cholley, Pierre-Etienne; Gronemeyer, Hinrich

    2016-01-01

    The combination of massive parallel sequencing with a variety of modern DNA/RNA enrichment technologies provides means for interrogating functional protein-genome interactions (ChIP-seq), genome-wide transcriptional activity (RNA-seq; GRO-seq), chromatin accessibility (DNase-seq, FAIRE-seq, MNase-seq), and more recently the three-dimensional organization of chromatin (Hi-C, ChIA-PET). In systems biology-based approaches several of these readouts are generally cumulated with the aim of describing living systems through a reconstitution of the genome-regulatory functions. However, an issue that is often underestimated is that conclusions drawn from such multidimensional analyses of NGS-derived datasets critically depend on the quality of the compared datasets. To address this problem, we have developed the NGS-QC Generator, a quality control system that infers quality descriptors for any kind of ChIP-sequencing and related datasets. In this chapter we provide a detailed protocol for (1) assessing quality descriptors with the NGS-QC Generator; (2) to interpret the generated reports; and (3) to explore the database of QC indicators (www.ngs-qc.org) for >21,000 publicly available datasets.
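
    Subsampling-based quality descriptors of the general flavour computed by NGS-QC can be illustrated crudely: bin the reads, recount after random subsampling, and measure how stable the enrichment profile is. The function, parameters, and metric below are hypothetical stand-ins, not the tool's actual indicators:

    ```python
    import random
    from collections import Counter

    def qc_indicator(read_positions, bin_size=500, subsample=0.5, tol=0.5, seed=0):
        """Fraction of bins whose rescaled read count, after random
        subsampling, stays within `tol` (relative) of the full-depth
        count. Higher values indicate a more robust profile."""
        rng = random.Random(seed)
        full = Counter(p // bin_size for p in read_positions)
        sub = Counter(p // bin_size for p in read_positions
                      if rng.random() < subsample)
        stable = sum(1 for b, c in full.items()
                     if abs(sub.get(b, 0) / subsample - c) <= tol * c)
        return stable / len(full)

    # Two strongly enriched bins: the profile survives 50% subsampling
    reads = [100] * 1000 + [10_000] * 1000
    score = qc_indicator(reads)
    ```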

  20. Integrating Information from Multiple Methods into the Analysis of Perceived Risk of Crime: The Role of Geo-Referenced Field Data and Mobile Methods

    Directory of Open Access Journals (Sweden)

    Jane Fielding

    2013-01-01

    Full Text Available This paper demonstrates the use of mixed methods discovery techniques to explore public perceptions of community safety and risk, using computational techniques that combine and integrate layers of information to reveal connections between community and place. Perceived vulnerability to crime is conceptualised using an etic/emic framework. The etic "outsider" viewpoint imposes its categorisation of vulnerability not only on areas ("crime hot spots" or "deprived neighbourhoods") but also on socially constructed groupings of individuals (the "sick" or the "poor") based on particular qualities considered relevant by the analyst. The range of qualities is often both narrow and shallow. The alternative, emic, "insider" perspective explores vulnerability based on the meanings held by individuals, informed by their lived experience. Using recorded crime data and Census-derived area classifications, we categorise an area in Southern England from an etic viewpoint. Mobile interviews with local residents and police community support officers, and researcher-led environmental audits, provide qualitative emic data. GIS software provides spatial context to analytically link both quantitative and qualitative data. We demonstrate how this approach reveals hidden sources of community resilience and produces findings that explicate low-level social disorder and vandalism as turns in a "dialogue" of resistance against urbanisation and property development.