WorldWideScience

Sample records for multiple datasets including

  1. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Science.gov (United States)

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  2. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    SUMMARY In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. In most of existing integrative analysis, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. Several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restricted. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival. This model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. Simulation study shows that it outperforms the existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of heterogeneity model and proposed approach. PMID:23938111

  3. Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology.

    Science.gov (United States)

    Hervé, Maxime R; Nicolè, Florence; Lê Cao, Kim-Anh

    2018-03-01

    Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analyses of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of a biological system under study. The present article provides a comprehensive resource to analyze such complex datasets using multivariate methods. It starts from the necessary pre-treatment of data including data transformations and distance calculations, to the application of both gold standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis along with detailed results interpretations for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools with reproducible R codes and chemical-ecological datasets to practice and teach multivariate methods.

  4. The SAIL databank: linking multiple health and social care datasets

    Directory of Open Access Journals (Sweden)

    Ford David V

    2009-01-01

    Full Text Available Abstract Background Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Methods Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage was developed for this purpose. Firstly the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR, to assess the efficacy of this process, and the optimum matching technique. Results The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL at the 50% threshold, and error rates were Conclusion With the infrastructure that has been put in place, the reliable matching process that has been developed enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.

  5. The SAIL databank: linking multiple health and social care datasets.

    Science.gov (United States)

    Lyons, Ronan A; Jones, Kerina H; John, Gareth; Brooks, Caroline J; Verplancke, Jean-Philippe; Ford, David V; Brown, Ginevra; Leake, Ken

    2009-01-16

    Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process, and the optimum matching technique. The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were SAIL databank represents a research-ready platform for record-linkage studies.

  6. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning.To greatly reduce the development time and enhance the functionality a high level language capable of parallel processing has been used (Matlab). A key consideration for the system is high speed access due to the large data volume, persistence of the large data volumes and a precise process time scheduling capability.

  7. Compilation and analysis of multiple groundwater-quality datasets for Idaho

    Science.gov (United States)

    Hundt, Stephen A.; Hopkins, Candice B.

    2018-05-09

    Groundwater is an important source of drinking and irrigation water throughout Idaho, and groundwater quality is monitored by various Federal, State, and local agencies. The historical, multi-agency records of groundwater quality include a valuable dataset that has yet to be compiled or analyzed on a statewide level. The purpose of this study is to combine groundwater-quality data from multiple sources into a single database, to summarize this dataset, and to perform bulk analyses to reveal spatial and temporal patterns of water quality throughout Idaho. Data were retrieved from the Water Quality Portal (https://www.waterqualitydata.us/), the Idaho Department of Environmental Quality, and the Idaho Department of Water Resources. Analyses included counting the number of times a sample location had concentrations above Maximum Contaminant Levels (MCL), performing trends tests, and calculating correlations between water-quality analytes. The water-quality database and the analysis results are available through USGS ScienceBase (https://doi.org/10.5066/F72V2FBG).

  8. A collection of annotated and harmonized human breast cancer transcriptome datasets, including immunologic classification [version 2; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Jessica Roelands

    2018-02-01

    Full Text Available The increased application of high-throughput approaches in translational research has expanded the number of publicly available data repositories. Gathering additional valuable information contained in the datasets represents a crucial opportunity in the biomedical field. To facilitate and stimulate utilization of these datasets, we have recently developed an interactive data browsing and visualization web application, the Gene Expression Browser (GXB. In this note, we describe a curated compendium of 13 public datasets on human breast cancer, representing a total of 2142 transcriptome profiles. We classified the samples according to different immune based classification systems and integrated this information into the datasets. Annotated and harmonized datasets were uploaded to GXB. Study samples were categorized in different groups based on their immunologic tumor response profiles, intrinsic molecular subtypes and multiple clinical parameters. Ranked gene lists were generated based on relevant group comparisons. In this data note, we demonstrate the utility of GXB to evaluate the expression of a gene of interest, find differential gene expression between groups and investigate potential associations between variables with a specific focus on immunologic classification in breast cancer. This interactive resource is publicly available online at: http://breastcancer.gxbsidra.org/dm3/geneBrowser/list.

  9. Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context

    Directory of Open Access Journals (Sweden)

    Borowski Krzysztof

    2008-06-01

    Full Text Available The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of di erent experiments for the same genome.

  10. Correction of elevation offsets in multiple co-located lidar datasets

    Science.gov (United States)

    Thompson, David M.; Dalyander, P. Soupy; Long, Joseph W.; Plant, Nathaniel G.

    2017-04-07

    IntroductionTopographic elevation data collected with airborne light detection and ranging (lidar) can be used to analyze short- and long-term changes to beach and dune systems. Analysis of multiple lidar datasets at Dauphin Island, Alabama, revealed systematic, island-wide elevation differences on the order of 10s of centimeters (cm) that were not attributable to real-world change and, therefore, were likely to represent systematic sampling offsets. These offsets vary between the datasets, but appear spatially consistent within a given survey. This report describes a method that was developed to identify and correct offsets between lidar datasets collected over the same site at different times so that true elevation changes over time, associated with sediment accumulation or erosion, can be analyzed.

  11. Road Bridges and Culverts, Bridge dataset only includes bridges maintained by Johnson County Public Works in the unincorporated areas, Published in Not Provided, Johnson County Government.

    Data.gov (United States)

    NSGIC Local Govt | GIS Inventory — Road Bridges and Culverts dataset current as of unknown. Bridge dataset only includes bridges maintained by Johnson County Public Works in the unincorporated areas.

  12. Datasets will not be made accessible to the public due to the fact that they include household level data with PII.

    Data.gov (United States)

    U.S. Environmental Protection Agency — Datasets will not be made accessible to the public due to the fact that they include household level data with PII. This dataset is not publicly accessible because:...

  13. A Bayesian trans-dimensional approach for the fusion of multiple geophysical datasets

    Science.gov (United States)

    JafarGandomi, Arash; Binley, Andrew

    2013-09-01

    We propose a Bayesian fusion approach to integrate multiple geophysical datasets with different coverage and sensitivity. The fusion strategy is based on the capability of various geophysical methods to provide enough resolution to identify either subsurface material parameters or subsurface structure, or both. We focus on electrical resistivity as the target material parameter and electrical resistivity tomography (ERT), electromagnetic induction (EMI), and ground penetrating radar (GPR) as the set of geophysical methods. However, extending the approach to different sets of geophysical parameters and methods is straightforward. Different geophysical datasets are entered into a trans-dimensional Markov chain Monte Carlo (McMC) search-based joint inversion algorithm. The trans-dimensional property of the McMC algorithm allows dynamic parameterisation of the model space, which in turn helps to avoid bias of the post-inversion results towards a particular model. Given that we are attempting to develop an approach that has practical potential, we discretize the subsurface into an array of one-dimensional earth-models. Accordingly, the ERT data that are collected by using two-dimensional acquisition geometry are re-casted to a set of equivalent vertical electric soundings. Different data are inverted either individually or jointly to estimate one-dimensional subsurface models at discrete locations. We use Shannon's information measure to quantify the information obtained from the inversion of different combinations of geophysical datasets. Information from multiple methods is brought together via introducing joint likelihood function and/or constraining the prior information. A Bayesian maximum entropy approach is used for spatial fusion of spatially dispersed estimated one-dimensional models and mapping of the target parameter. We illustrate the approach with a synthetic dataset and then apply it to a field dataset. We show that the proposed fusion strategy is

  14. An integrated pan-tropical biomass map using multiple reference datasets.

    Science.gov (United States)

    Avitabile, Valerio; Herold, Martin; Heuvelink, Gerard B M; Lewis, Simon L; Phillips, Oliver L; Asner, Gregory P; Armston, John; Ashton, Peter S; Banin, Lindsay; Bayol, Nicolas; Berry, Nicholas J; Boeckx, Pascal; de Jong, Bernardus H J; DeVries, Ben; Girardin, Cecile A J; Kearsley, Elizabeth; Lindsell, Jeremy A; Lopez-Gonzalez, Gabriela; Lucas, Richard; Malhi, Yadvinder; Morel, Alexandra; Mitchard, Edward T A; Nagy, Laszlo; Qie, Lan; Quinones, Marcela J; Ryan, Casey M; Ferry, Slik J W; Sunderland, Terry; Laurin, Gaia Vaglio; Gatti, Roberto Cazzolla; Valentini, Riccardo; Verbeeck, Hans; Wijaya, Arief; Willcock, Simon

    2016-04-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging that incorporates and spatializes the biomass patterns indicated by the reference data. The method was applied independently in areas (strata) with homogeneous error patterns of the input (Saatchi and Baccini) maps, which were estimated from the reference data and additional covariates. Based on the fused map, we estimated AGB stock for the tropics (23.4 N-23.4 S) of 375 Pg dry mass, 9-18% lower than the Saatchi and Baccini estimates. The fused map also showed differing spatial patterns of AGB over large areas, with higher AGB density in the dense forest areas in the Congo basin, Eastern Amazon and South-East Asia, and lower values in Central America and in most dry vegetation areas of Africa than either of the input maps. The validation exercise, based on 2118 estimates from the reference dataset not used in the fusion process, showed that the fused map had a RMSE 15-21% lower than that of the input maps and, most importantly, nearly unbiased estimates (mean bias 5 Mg dry mass ha(-1) vs. 21 and 28 Mg ha(-1) for the input maps). The fusion method can be applied at any scale including the policy-relevant national level, where it can provide improved biomass estimates by integrating existing regional biomass maps as input maps and additional, country-specific reference datasets. © 2015 John Wiley & Sons Ltd.

  15. Imputation and quality control steps for combining multiple genome-wide datasets

    Directory of Open Access Journals (Sweden)

    Shefali S Verma

    2014-12-01

    Full Text Available The electronic MEdical Records and GEnomics (eMERGE network brings together DNA biobanks linked to electronic health records (EHRs from multiple institutions. Approximately 52,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes, and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2 were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.

  16. Mapping of government land encroachment in Cameron Highlands using multiple remote sensing datasets

    International Nuclear Information System (INIS)

    Zin, M H M; Ahmad, B

    2014-01-01

    The cold and refreshing highland weather is one of the factors that give impact to socio-economic growth in Cameron Highlands. This unique weather of the highland surrounded by tropical rain forest can only be found in a few places in Malaysia. It makes this place a famous tourism attraction and also provides a very suitable temperature for agriculture activities. Thus it makes agriculture such as tea plantation, vegetable, fruits and flowers one of the biggest economic activities in Cameron Highlands. However unauthorized agriculture activities are rampant. The government land, mostly forest area have been encroached by farmers, in many cases indiscriminately cutting down trees and hill slopes. This study is meant to detect and assess this encroachment using multiple remote sensing datasets. The datasets were used together with cadastral parcel data where survey lines describe property boundary, pieces of land are subdivided into lots of government and private. The general maximum likelihood classification method was used on remote sensing image to classify the land-cover in the study area. Ground truth data from field observation were used to assess the accuracy of the classification. Cadastral parcel data was overlaid on the classification map in order to detect the encroachment area. The result of this study shows that there is a land cover change of 93.535 ha in the government land of the study area between years 2001 to 2010, nevertheless almost no encroachment took place in the studied forest reserve area. The result of this study will be useful for the authority in monitoring and managing the forest

  17. Mapping of government land encroachment in Cameron Highlands using multiple remote sensing datasets

    Science.gov (United States)

    Zin, M. H. M.; Ahmad, B.

    2014-02-01

    The cold and refreshing highland weather is one of the factors that give impact to socio-economic growth in Cameron Highlands. This unique weather of the highland surrounded by tropical rain forest can only be found in a few places in Malaysia. It makes this place a famous tourism attraction and also provides a very suitable temperature for agriculture activities. Thus it makes agriculture such as tea plantation, vegetable, fruits and flowers one of the biggest economic activities in Cameron Highlands. However unauthorized agriculture activities are rampant. The government land, mostly forest area have been encroached by farmers, in many cases indiscriminately cutting down trees and hill slopes. This study is meant to detect and assess this encroachment using multiple remote sensing datasets. The datasets were used together with cadastral parcel data where survey lines describe property boundary, pieces of land are subdivided into lots of government and private. The general maximum likelihood classification method was used on remote sensing image to classify the land-cover in the study area. Ground truth data from field observation were used to assess the accuracy of the classification. Cadastral parcel data was overlaid on the classification map in order to detect the encroachment area. The result of this study shows that there is a land cover change of 93.535 ha in the government land of the study area between years 2001 to 2010, nevertheless almost no encroachment took place in the studied forest reserve area. The result of this study will be useful for the authority in monitoring and managing the forest.

  18. An integrated pan-tropical biomass map using multiple reference datasets

    NARCIS (Netherlands)

    Avitabile, V.; Herold, M.; Heuvelink, G.B.M.; Lewis, S.L.; Phillips, O.L.; Asner, G.P.; Armston, J.; Asthon, P.; Banin, L.F.; Bayol, N.; Berry, N.; Boeckx, P.; Jong, De B.; Devries, B.; Girardin, C.; Kearsley, E.; Lindsell, J.A.; Lopez-gonzalez, G.; Lucas, R.; Malhi, Y.; Morel, A.; Mitchard, E.; Nagy, L.; Qie, L.; Quinones, M.; Ryan, C.M.; Slik, F.; Sunderland, T.; Vaglio Laurin, G.; Valentini, R.; Verbeeck, H.; Wijaya, A.; Willcock, S.

    2016-01-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of

  19. Plasma Cell Neoplasms (Including Multiple Myeloma)—Patient Version

    Science.gov (United States)

    Plasma cell neoplasms occur when abnormal plasma cells form cancerous tumors. When there is only one tumor, the disease is called a plasmacytoma. When there are multiple tumors, it is called multiple myeloma. Start here to find information on plasma cell neoplasms treatment, research, and statistics.

  20. Observed climate variability over Chad using multiple observational and reanalysis datasets

    Science.gov (United States)

    Maharana, Pyarimohan; Abdel-Lathif, Ahmat Younous; Pattnayak, Kanhu Charan

    2018-03-01

    Chad is the largest of Africa's landlocked countries and one of the least studied region of the African continent. The major portion of Chad lies in the Sahel region, which is known for its rapid climate change. In this study, multiple observational datasets are analyzed from 1950 to 2014, in order to examine the trend of precipitation and temperature along with their variability over Chad to understand possible impacts of climate change over this region. Trend analysis of the climatic fields has been carried out using Mann-Kendall test. The precipitation over Chad is mostly contributed during summer by West African Monsoon, with maximum northward limit of 18° N. The Atlantic Ocean as well as the Mediterranean Sea are the major source of moisture for the summer rainfall over Chad. Based on the rainfall time series, the entire study period has been divided in to wet (1950 to 1965), dry (1966 to 1990) and recovery period (1991 to 2014). The rainfall has decreased drastically for almost 3 decades during the dry period resulted into various drought years. The temperature increases at a rate of 0.15 °C/decade during the entire period of analysis. The seasonal rainfall as well as temperature plays a major role in the change of land use/cover. The decrease of monsoon rainfall during the dry period reduces the C4 cover drastically; this reduction of C4 grass cover leads to increase of C3 grass cover. The slow revival of rainfall is still not good enough for the increase of shrub cover but it favors the gradual reduction of bare land over Chad.

  1. Identification of dust storm source areas in West Asia using multiple environmental datasets.

    Science.gov (United States)

    Cao, Hui; Amiraslani, Farshad; Liu, Jian; Zhou, Na

    2015-01-01

    Sand and Dust storms are common phenomena in arid and semi-arid areas. West Asia Region, especially Tigris-Euphrates alluvial plain, has been recognized as one of the most important dust source areas in the world. In this paper, a method is applied to extract SDS (Sand and Dust Storms) sources in West Asia region using thematic maps, climate and geography, HYSPLIT model and satellite images. Out of 50 dust storms happened during 2000-2013 and collected in form of MODIS images, 27 events were incorporated as demonstrations of the simulated trajectories by HYSPLIT model. Besides, a dataset of the newly released Landsat images was used as base-map for the interpretation of SDS source regions. As a result, six main clusters were recognized as dust source areas. Of which, 3 clusters situated in Tigris-Euphrates plain were identified as severe SDS sources (including 70% dust storms in this research). Another cluster in Sistan plain is also a potential source area. This approach also confirmed six main paths causing dust storms. These paths are driven by the climate system including Siberian and Polar anticyclones, monsoon from Indian Subcontinent and depression from north of Africa. The identification of SDS source areas and paths will improve our understandings on the mechanisms and impacts of dust storms on socio-economy and environment of the region. Copyright © 2014 Elsevier B.V. All rights reserved.

  2. Stages of Plasma Cell Neoplasms (Including Multiple Myeloma)

    Science.gov (United States)

    ... cancer treatment is also called biotherapy or immunotherapy. Immunomodulators are a type of biologic therapy. Thalidomide , lenalidomide , and pomalidomide are immunomodulators used to treat multiple myeloma and other plasma ...

  3. Treatment Options for Plasma Cell Neoplasms (Including Multiple Myeloma)

    Science.gov (United States)

    ... cancer treatment is also called biotherapy or immunotherapy. Immunomodulators are a type of biologic therapy. Thalidomide , lenalidomide , and pomalidomide are immunomodulators used to treat multiple myeloma and other plasma ...

  4. An integrated pan-tropical biomass map using multiple reference datasets

    OpenAIRE

    Avitabile, V.; Herold, M.; Heuvelink, G. B. M.; Lewis, S. L.; Phillips, O. L.; Asner, G. P.; Armston, J.; Ashton, P. S.; Banin, L.; Bayol, N.; Berry, N. J.; Boeckx, P.; de Jong, B. H. J.; DeVries, B.; Girardin, C. A. J.

    2016-01-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging...

  5. Search for MH370: New Geologic Insights Gained from Integrating Multiple Geophysical Datasets

    Science.gov (United States)

    McBee, J.; Gharib, J. J.; Ingle, S.

    2017-12-01

    During the search for the missing flight MH370, Fugro acquired the largest extent of high resolution bathymetry in the southern Indian Ocean to date. These recently released multibeam echosounder (MBES) backscatter, bathymetry, water column, and sub-bottom profiler data reveal additional insights into the characteristics of the Indian Ocean seafloor and the geologic and oceanographic processes that shaped it. The mapping is at a sufficient resolution to examine relict spreading texture such as fracture zones, pseudofaults, failed propogating rifts, devals (deviations from axial linearity), etc. In this presentation, we will highlight several prominent regional seafloor features, and illustrate insights gained by integrating MBES backscatter and water column data with bathymetric analyses. Backscatter data to the north of the southern flank of Broken Ridge illustrate the complexity to which sediment has been reworked downslope where intricate patterns of low backscatter intensities are observed. Here, exposed rocks form prominent high backscatter reflectors amid the surrounding low backscatter sediments. The lateral extent of high backscatter intensity reflectors south of the Diamantina reveals the expansiveness of exposed igneous rocks that resulted from seafloor spreading. Volcanic features, including off-axis volcanoes and leaky transforms are also interpreted as high backscatter anomalies in the tectonic speading fabric to the north of the Geelvinck fracture zone, towards the southern extent of the dataset. These and other examples show that by integrating the entire suite of data collected by MBES systems, more detailed interpretations of the geologic processes that shaped the seafloor may be gained than by examination of bathymetry alone.

  6. Analysis of Naïve Bayes Algorithm for Email Spam Filtering across Multiple Datasets

    Science.gov (United States)

    Fitriah Rusland, Nurul; Wahid, Norfaradilla; Kasim, Shahreen; Hafit, Hanayanti

    2017-08-01

    E-mail spam continues to become a problem on the Internet. Spammed e-mail may contain many copies of the same message, commercial advertisement or other irrelevant posts like pornographic content. In previous research, different filtering techniques are used to detect these e-mails such as using Random Forest, Naïve Bayesian, Support Vector Machine (SVM) and Neutral Network. In this research, we test Naïve Bayes algorithm for e-mail spam filtering on two datasets and test its performance, i.e., Spam Data and SPAMBASE datasets [8]. The performance of the datasets is evaluated based on their accuracy, recall, precision and F-measure. Our research use WEKA tool for the evaluation of Naïve Bayes algorithm for e-mail spam filtering on both datasets. The result shows that the type of email and the number of instances of the dataset has an influence towards the performance of Naïve Bayes.

  7. A review of multiple stressor studies that include ionising radiation

    International Nuclear Information System (INIS)

    Vanhoudt, Nathalie; Vandenhove, Hildegarde; Real, Almudena; Bradshaw, Clare; Stark, Karolina

    2012-01-01

    Studies were reviewed that investigated the combined effects of ionising radiation and other stressors on non-human biota. The aim was to determine the state of research in this area of science, and determine if a review of the literature might permit a gross generalization as to whether the combined effects of multi-stressors and radiation are fundamentally additive, synergistic or antagonistic. A multiple stressor database was established for different organism groups. Information was collected on species, stressors applied and effects evaluated. Studies were mostly laboratory based and investigated two-component mixtures. Interactions declared positive occurred in 58% of the studies, while 26% found negative interactions. Interactions were dependent on dose/concentration, on organism's life stage and exposure time and differed among endpoints. Except for one study, none of the studies predicted combined effects following Concentration Addition or Independent Action, and hence, no justified conclusions can be made about synergism or antagonism. - This review on multiple stressor studies involving radiation, highlights that most experimental designs used did not allow to deduce the nature of the interactive effects.

  8. SAR image dataset of military ground targets with multiple poses for ATR

    Science.gov (United States)

    Belloni, Carole; Balleri, Alessio; Aouf, Nabil; Merlet, Thomas; Le Caillec, Jean-Marc

    2017-10-01

    Automatic Target Recognition (ATR) is the task of automatically detecting and classifying targets. Recognition using Synthetic Aperture Radar (SAR) images is interesting because SAR images can be acquired at night and under any weather conditions, whereas optical sensors operating in the visible band do not have this capability. Existing SAR ATR algorithms have mostly been evaluated using the MSTAR dataset.1 The problem with the MSTAR is that some of the proposed ATR methods have shown good classification performance even when targets were hidden,2 suggesting the presence of a bias in the dataset. Evaluations of SAR ATR techniques are currently challenging due to the lack of publicly available data in the SAR domain. In this paper, we present a high resolution SAR dataset consisting of images of a set of ground military target models taken at various aspect angles, The dataset can be used for a fair evaluation and comparison of SAR ATR algorithms. We applied the Inverse Synthetic Aperture Radar (ISAR) technique to echoes from targets rotating on a turntable and illuminated with a stepped frequency waveform. The targets in the database consist of four variants of two 1.7m-long models of T-64 and T-72 tanks. The gun, the turret position and the depression angle are varied to form 26 different sequences of images. The emitted signal spanned the frequency range from 13 GHz to 18 GHz to achieve a bandwidth of 5 GHz sampled with 4001 frequency points. The resolution obtained with respect to the size of the model targets is comparable to typical values obtained using SAR airborne systems. Single polarized images (Horizontal-Horizontal) are generated using the backprojection algorithm.3 A total of 1480 images are produced using a 20° integration angle. The images in the dataset are organized in a suggested training and testing set to facilitate a standard evaluation of SAR ATR algorithms.

  9. Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets

    Directory of Open Access Journals (Sweden)

    Karacali Bilge

    2007-10-01

    Full Text Available Abstract Background Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal samples linked to clinical information with an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype against profiles correlated with relapse-free status. Prediction error of profiles identified with supervised univariate feature selection algorithms were compared to profiles selected randomly from a all genes on the microarray platform and b a list of known disease-related genes (a priori selection. We also determined the relevance of expression profiles on test arrays from independent datasets, measured on either the same or different microarray platforms. Results Highly discriminative expression profiles were produced on both simulated gene expression data and expression data from breast cancer and lymphoma datasets on the basis of ER and BCL-6 expression, respectively. Use of relapse-free status to identify profiles for prognosis prediction resulted in poorly discriminative decision rules. Supervised feature selection resulted in more accurate classifications than random or a priori selection, however, the difference in prediction error decreased as the number of features increased. These results held when decision rules were applied across-datasets to samples profiled on the same microarray platform. Conclusion Our results show that many gene sets predict molecular phenotypes accurately. Given this, expression profiles identified using different training datasets should be expected to show little agreement. In addition, we demonstrate the difficulty in predicting relapse directly from microarray data using supervised machine

  10. Multiple imaging procedures including MRI for the bladder cancer

    International Nuclear Information System (INIS)

    Mikata, Noriharu; Suzuki, Makoto; Takeuchi, Takumi; Kunisawa, Yositaka; Fukutani, Keiko; Kawabe, Kazuki

    1986-01-01

    Endoscopic photography, double contrast cystography, transurethral echography, X-ray CT scan, and MRI (magnetic resonance imaging) were utilized for the staging diagnosis of the four patients with carcinoma of the bladder. In the first case, a 70-year-old man, since all of the five imaging procedures suggested a superficial and pedunculated tumor, his bladder cancer was considered T1. The classification of stage T3 carcinoma was made for the second 86-year-old male. Because all of his imaging examinations showed a tumor infiltrating deep muscle and penetrating the bladder wall. The third case was a 36-year-old male. His clinical stage was diagnosed as T2 or T3a by cystophotography, double contrast cystogram, ultrasonography, and X-ray CT scan. However, MRI showed only thickened bladder wall and the infiltrating tumor could not be distinguished from the hypertrophic wall. The last patient, a 85-year-old female, had a smaller Ta cancer. Her double contrast cystography revealed the small tumor at the lateral bladder wall. But, the tumor could not be detected by transaxial, sagittal and coronal scans. Multiple imaging procedures combining MRI and staging diagnosis of the bladder carcinoma were discussed. (author)

  11. Quantifying selective reporting and the Proteus phenomenon for multiple datasets with similar bias.

    Directory of Open Access Journals (Sweden)

    Thomas Pfeiffer

    2011-03-01

    Full Text Available Meta-analyses play an important role in synthesizing evidence from diverse studies and datasets that address similar questions. A major obstacle for meta-analyses arises from biases in reporting. In particular, it is speculated that findings which do not achieve formal statistical significance are less likely reported than statistically significant findings. Moreover, the patterns of bias can be complex and may also depend on the timing of the research results and their relationship with previously published work. In this paper, we present an approach that is specifically designed to analyze large-scale datasets on published results. Such datasets are currently emerging in diverse research fields, particularly in molecular medicine. We use our approach to investigate a dataset on Alzheimer's disease (AD that covers 1167 results from case-control studies on 102 genetic markers. We observe that initial studies on a genetic marker tend to be substantially more biased than subsequent replications. The chances for initial, statistically non-significant results to be published are estimated to be about 44% (95% CI, 32% to 63% relative to statistically significant results, while statistically non-significant replications have almost the same chance to be published as statistically significant replications (84%; 95% CI, 66% to 107%. Early replications tend to be biased against initial findings, an observation previously termed Proteus phenomenon: The chances for non-significant studies going in the same direction as the initial result are estimated to be lower than the chances for non-significant studies opposing the initial result (73%; 95% CI, 55% to 96%. Such dynamic patterns in bias are difficult to capture by conventional methods, where typically simple publication bias is assumed to operate. Our approach captures and corrects for complex dynamic patterns of bias, and thereby helps generating conclusions from published results that are more robust

  12. Solution of neutron slowing down equation including multiple inelastic scattering

    International Nuclear Information System (INIS)

    El-Wakil, S.A.; Saad, A.E.

    1977-01-01

    The present work is devoted the presentation of an analytical method for the calculation of elastically and inelastically slowed down neutrons in an infinite non absorbing homogeneous medium. On the basis of the Central limit theory (CLT) and the integral transform technique the slowing down equation including inelastic scattering in terms of the Green function of elastic scattering is solved. The Green function is decomposed according to the number of collisions. A formula for the flux at any lethargy O (u) after any number of collisions is derived. An equation for the asymptotic flux is also obtained

  13. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Directory of Open Access Journals (Sweden)

    Nilotpal Chowdhury

    Full Text Available Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis.The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets.Four microarray series (having 742 patients were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA.Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed.To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and

  14. Association of Protein Translation and Extracellular Matrix Gene Sets with Breast Cancer Metastasis: Findings Uncovered on Analysis of Multiple Publicly Available Datasets Using Individual Patient Data Approach.

    Science.gov (United States)

    Chowdhury, Nilotpal; Sapru, Shantanu

    2015-01-01

    Microarray analysis has revolutionized the role of genomic prognostication in breast cancer. However, most studies are single series studies, and suffer from methodological problems. We sought to use a meta-analytic approach in combining multiple publicly available datasets, while correcting for batch effects, to reach a more robust oncogenomic analysis. The aim of the present study was to find gene sets associated with distant metastasis free survival (DMFS) in systemically untreated, node-negative breast cancer patients, from publicly available genomic microarray datasets. Four microarray series (having 742 patients) were selected after a systematic search and combined. Cox regression for each gene was done for the combined dataset (univariate, as well as multivariate - adjusted for expression of Cell cycle related genes) and for the 4 major molecular subtypes. The centre and microarray batch effects were adjusted by including them as random effects variables. The Cox regression coefficients for each analysis were then ranked and subjected to a Gene Set Enrichment Analysis (GSEA). Gene sets representing protein translation were independently negatively associated with metastasis in the Luminal A and Luminal B subtypes, but positively associated with metastasis in Basal tumors. Proteinaceous extracellular matrix (ECM) gene set expression was positively associated with metastasis, after adjustment for expression of cell cycle related genes on the combined dataset. Finally, the positive association of the proliferation-related genes with metastases was confirmed. To the best of our knowledge, the results depicting mixed prognostic significance of protein translation in breast cancer subtypes are being reported for the first time. We attribute this to our study combining multiple series and performing a more robust meta-analytic Cox regression modeling on the combined dataset, thus discovering 'hidden' associations. This methodology seems to yield new and interesting

  15. Investigating water budget dynamics in 18 river basins across the Tibetan Plateau through multiple datasets

    Science.gov (United States)

    Liu, Wenbin; Sun, Fubao; Li, Yanzhong; Zhang, Guoqing; Sang, Yan-Fang; Lim, Wee Ho; Liu, Jiahong; Wang, Hong; Bai, Peng

    2018-01-01

    The dynamics of basin-scale water budgets over the Tibetan Plateau (TP) are not well understood nowadays due to the lack of in situ hydro-climatic observations. In this study, we investigate the seasonal cycles and trends of water budget components (e.g. precipitation P, evapotranspiration ET and runoff Q) in 18 TP river basins during the period 1982-2011 through the use of multi-source datasets (e.g. in situ observations, satellite retrievals, reanalysis outputs and land surface model simulations). A water balance-based two-step procedure, which considers the changes in basin-scale water storage on the annual scale, is also adopted to calculate actual ET. The results indicated that precipitation (mainly snowfall from mid-autumn to next spring), which are mainly concentrated during June-October (varied among different monsoons-impacted basins), was the major contributor to the runoff in TP basins. The P, ET and Q were found to marginally increase in most TP basins during the past 30 years except for the upper Yellow River basin and some sub-basins of Yalong River, which were mainly affected by the weakening east Asian monsoon. Moreover, the aridity index (PET/P) and runoff coefficient (Q/P) decreased slightly in most basins, which were in agreement with the warming and moistening climate in the Tibetan Plateau. The results obtained demonstrated the usefulness of integrating multi-source datasets to hydrological applications in the data-sparse regions. More generally, such an approach might offer helpful insights into understanding the water and energy budgets and sustainability of water resource management practices of data-sparse regions in a changing environment.

  16. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

    Directory of Open Access Journals (Sweden)

    Dongdong eLin

    2014-10-01

    Full Text Available A variety of high throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: 1 treat the biomarker identification in each single study as a task and then combine them by multitask learning; 2 group variables from all studies for identifying significant genes; 3 enforce sparse constraint on groups of variables to overcome the ‘small sample, but large variables’ problem. We introduce two sparse group penalties: sparse group lasso and sparse group ridge in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with conventional meta-analysis method. The results show that our sparse group multitask method outperforms meta-analysis method significantly. In an application to our osteoporosis studies, 7 genes are identified as significant genes by our method and are found to have significant effects in other three independent studies for validation. The most significant gene SOD2 has been identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed

  17. A Dataset for Three-Dimensional Distribution of 39 Elements Including Plant Nutrients and Other Metals and Metalloids in the Soils of a Forested Headwater Catchment.

    Science.gov (United States)

    Wu, B; Wiekenkamp, I; Sun, Y; Fisher, A S; Clough, R; Gottselig, N; Bogena, H; Pütz, T; Brüggemann, N; Vereecken, H; Bol, R

    2017-11-01

    Quantification and evaluation of elemental distribution in forested ecosystems are key requirements to understand element fluxes and their relationship with hydrological and biogeochemical processes in the system. However, datasets supporting such a study on the catchment scale are still limited. Here we provide a dataset comprising spatially highly resolved distributions of 39 elements in soil profiles of a small forested headwater catchment in western Germany () to gain a holistic picture of the state and fluxes of elements in the catchment. The elements include both plant nutrients and other metals and metalloids that were predominately derived from lithospheric or anthropogenic inputs, thereby allowing us to not only capture the nutrient status of the catchment but to also estimate the functional development of the ecosystem. Soil samples were collected at high lateral resolution (≤60 m), and element concentrations were determined vertically for four soil horizons (L/Of, Oh, A, B). From this, a three-dimensional view of the distribution of these elements could be established with high spatial resolution on the catchment scale in a temperate natural forested ecosystem. The dataset can be combined with other datasets and studies of the TERENO (Terrestrial Environmental Observatories) Data Discovery Portal () to reveal elemental fluxes, establish relations between elements and other soil properties, and/or as input for modeling elemental cycling in temperate forested ecosystems. Copyright © by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America, Inc.

  18. Inferring Ice Thickness from a Glacier Dynamics Model and Multiple Surface Datasets.

    Science.gov (United States)

    Guan, Y.; Haran, M.; Pollard, D.

    2017-12-01

    The future behavior of the West Antarctic Ice Sheet (WAIS) may have a major impact on future climate. For instance, ice sheet melt may contribute significantly to global sea level rise. Understanding the current state of WAIS is therefore of great interest. WAIS is drained by fast-flowing glaciers which are major contributors to ice loss. Hence, understanding the stability and dynamics of glaciers is critical for predicting the future of the ice sheet. Glacier dynamics are driven by the interplay between the topography, temperature and basal conditions beneath the ice. A glacier dynamics model describes the interactions between these processes. We develop a hierarchical Bayesian model that integrates multiple ice sheet surface data sets with a glacier dynamics model. Our approach allows us to (1) infer important parameters describing the glacier dynamics, (2) learn about ice sheet thickness, and (3) account for errors in the observations and the model. Because we have relatively dense and accurate ice thickness data from the Thwaites Glacier in West Antarctica, we use these data to validate the proposed approach. The long-term goal of this work is to have a general model that may be used to study multiple glaciers in the Antarctic.

  19. Zoning Districts, The zoning districts dataset includes the towns in Manitowoc County, WI that have adopted the county's zoning ordinance., Published in 2013, 1:2400 (1in=200ft) scale, Manitowoc County Government.

    Data.gov (United States)

    NSGIC Local Govt | GIS Inventory — Zoning Districts dataset current as of 2013. The zoning districts dataset includes the towns in Manitowoc County, WI that have adopted the county's zoning ordinance..

  20. Model for CO2 leakage including multiple geological layers and multiple leaky wells.

    Science.gov (United States)

    Nordbotten, Jan M; Kavetski, Dmitri; Celia, Michael A; Bachu, Stefan

    2009-02-01

    Geological storage of carbon dioxide (CO2) is likely to be an integral component of any realistic plan to reduce anthropogenic greenhouse gas emissions. In conjunction with large-scale deployment of carbon storage as a technology, there is an urgent need for tools which provide reliable and quick assessments of aquifer storage performance. Previously, abandoned wells from over a century of oil and gas exploration and production have been identified as critical potential leakage paths. The practical importance of abandoned wells is emphasized by the correlation of heavy CO2 emitters (typically associated with industrialized areas) to oil and gas producing regions in North America. Herein, we describe a novel framework for predicting the leakage from large numbers of abandoned wells, forming leakage paths connecting multiple subsurface permeable formations. The framework is designed to exploit analytical solutions to various components of the problem and, ultimately, leads to a grid-free approximation to CO2 and brine leakage rates, as well as fluid distributions. We apply our model in a comparison to an established numerical solverforthe underlying governing equations. Thereafter, we demonstrate the capabilities of the model on typical field data taken from the vicinity of Edmonton, Alberta. This data set consists of over 500 wells and 7 permeable formations. Results show the flexibility and utility of the solution methods, and highlight the role that analytical and semianalytical solutions can play in this important problem.

  1. 42 CFR 137.327 - May multiple projects be included in a single construction project agreement?

    Science.gov (United States)

    2010-10-01

    ... 42 Public Health 1 2010-10-01 2010-10-01 false May multiple projects be included in a single construction project agreement? 137.327 Section 137.327 Public Health PUBLIC HEALTH SERVICE, DEPARTMENT OF...-GOVERNANCE Construction Project Assumption Process § 137.327 May multiple projects be included in a single...

  2. Plasma Cell Neoplasms (Including Multiple Myeloma)—Health Professional Version

    Science.gov (United States)

    There are several types of plasma cell neoplasms, including monoclonal gammopathy of undetermined significance (MGUS), isolated plasmacytoma of the bone, extramedullary plasmacytoma, and multiple myeloma. Find evidence-based information on plasma cell neoplasms treatment, research, and statistics.

  3. Importing, Working With, and Sharing Microstructural Data in the StraboSpot Digital Data System, Including an Example Dataset from the Pilbara Craton, Western Australia.

    Science.gov (United States)

    Roberts, N.; Cunningham, H.; Snell, A.; Newman, J.; Tikoff, B.; Chatzaras, V.; Walker, J. D.; Williams, R. T.

    2017-12-01

    There is currently no repository where a geologist can survey microstructural datasets that have been collected from a specific field area or deformation experiment. New development of the StraboSpot digital data system provides a such a repository as well as visualization and analysis tools. StraboSpot is a graph database that allows field geologists to share primary data and develop new types of scientific questions. The database can be accessed through: 1) a field-based mobile application that runs on iOS and Android mobile devices; and 2) a desktop system. We are expanding StraboSpot to include the handling of a variety of microstructural data types. Presented here is the detailed vocabulary and logic used for the input of microstructural data, and how this system operates with the anticipated workflow of users. Microstructural data include observations and interpretations from photomicrographs, scanning electron microscope images, electron backscatter diffraction, and transmission electron microscopy data. The workflow for importing microstructural data into StraboSpot is organized into the following tabs: Images, Mineralogy & Composition; Sedimentary; Igneous; Metamorphic; Fault Rocks; Grain size & configuration; Crystallographic Preferred Orientation; Reactions; Geochronology; Relationships; and Interpretations. Both the sample and the thin sections are also spots. For the sample spot, the user can specify whether a sample is experimental or natural; natural samples are inherently linked to their field context. For the thin section (sub-sample) spot, the user can select between different options for sample preparation, geometry, and methods. A universal framework for thin section orientation is given, which allows users to overlay different microscope images of the same area and keeps georeferenced orientation. We provide an example dataset of field and microstructural data from the Mt Edgar dome, a granitic complex in the Paleoarchean East Pilbara craton

  4. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Yu-Wei [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Simmons, Blake A. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Singer, Steven W. [Joint BioEnergy Inst. (JBEI), Emeryville, CA (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2015-10-29

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  5. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    Science.gov (United States)

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database we have used a Relational Database Management System (RDBMS) on a Structure Query Language (SQL) server which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.

  6. Monitoring the Lavina di Roncovetro (RE, Italy) landslide by integrating traditional monitoring systems and multiple high-resolution topographic datasets

    Science.gov (United States)

    Fornaciai, Alessandro; Favalli, Massimiliano; Gigli, Giovanni; Nannipieri, Luca; Mucchi, Lorenzo; Intieri, Emanuele; Agostini, Andrea; Pizziolo, Marco; Bertolini, Giovanni; Trippi, Federico; Casagli, Nicola; Schina, Rosa; Carnevale, Ennio

    2016-04-01

    Roncovetro Landslide were generated at different times. The 3D models are then georeferenced and the digital elevation models (DEMs) created. By comparing the obtained DEMs, changes in the investigated area were detected and the sediment volumes, as well as the 3D displacement at the most active parts of the landslide quantified. In this work, we test the performance of the SFM techniques applied on active landslide by comparing them with the traditional monitoring systems, highlighting the strengths and weaknesses of both methods. In addition, we show the preliminary results obtained integrating the traditional monitoring systems and the multiple high-resolution topographic datasets, over a period of more than one year, used for investigating the spatial and the temporal evolution of the upper sector of the Roncovetro landslide.

  7. Exact multiple scattering theory of two-nucleus collisions including the Pauli principle

    International Nuclear Information System (INIS)

    Gurvitz, S.A.

    1981-01-01

    Exact equations for two-nucleus scattering are derived in which the effects of the Pauli principle are fully included. Our method exploits a modified equation for the scattering of two identical nucleons, which is obtained at the beginning. Considering proton-nucleus scattering we found that the resulting amplitude has two components, one resembling a multiple scattering series for distinguishable particles, and the other a distorted (A-1) nucleon cluster exchange. For elastic pA scattering the multiple scattering amplitude is found in the form of an optical potential expansion. We show that the Kerman-McManus-Thaler theory of the optical potential could be easily modified to include the effects of antisymmetrization of the projectile with the target nucleons. Nucleus-nucleus scattering is studied first for distinguishable target and beam nucleus. Afterwards the Pauli principle is included, where only the case of deuteron-nucleus scattering is discussed in detail. The resulting amplitude has four components. Two of them correspond to modified multiple scattering expansions and the others are distorted (A-1)- and (A-2)- nucleon cluster exchange. The result for d-A scattering is extended to the general case of nucleus-nucleus scattering. The equations are simple to use and as such constitute an improvement over existing schemes

  8. Global estimates of CO sources with high resolution by adjoint inversion of multiple satellite datasets (MOPITT, AIRS, SCIAMACHY, TES

    Directory of Open Access Journals (Sweden)

    M. Kopacz

    2010-02-01

    Full Text Available We combine CO column measurements from the MOPITT, AIRS, SCIAMACHY, and TES satellite instruments in a full-year (May 2004–April 2005 global inversion of CO sources at 4°×5° spatial resolution and monthly temporal resolution. The inversion uses the GEOS-Chem chemical transport model (CTM and its adjoint applied to MOPITT, AIRS, and SCIAMACHY. Observations from TES, surface sites (NOAA/GMD, and aircraft (MOZAIC are used for evaluation of the a posteriori solution. Using GEOS-Chem as a common intercomparison platform shows global consistency between the different satellite datasets and with the in situ data. Differences can be largely explained by different averaging kernels and a priori information. The global CO emission from combustion as constrained in the inversion is 1350 Tg a−1. This is much higher than current bottom-up emission inventories. A large fraction of the correction results from a seasonal underestimate of CO sources at northern mid-latitudes in winter and suggests a larger-than-expected CO source from vehicle cold starts and residential heating. Implementing this seasonal variation of emissions solves the long-standing problem of models underestimating CO in the northern extratropics in winter-spring. A posteriori emissions also indicate a general underestimation of biomass burning in the GFED2 inventory. However, the tropical biomass burning constraints are not quantitatively consistent across the different datasets.

  9. Classification of Small-Scale Eucalyptus Plantations Based on NDVI Time Series Obtained from Multiple High-Resolution Datasets

    Directory of Open Access Journals (Sweden)

    Hailang Qiao

    2016-02-01

    Full Text Available Eucalyptus, a short-rotation plantation, has been expanding rapidly in southeast China in recent years owing to its short growth cycle and high yield of wood. Effective identification of eucalyptus, therefore, is important for monitoring land use changes and investigating environmental quality. For this article, we used remote sensing images over 15 years (one per year with a 30-m spatial resolution, including Landsat 5 thematic mapper images, Landsat 7-enhanced thematic mapper images, and HJ 1A/1B images. These data were used to construct a 15-year Normalized Difference Vegetation Index (NDVI time series for several cities in Guangdong Province, China. Eucalyptus reference NDVI time series sub-sequences were acquired, including one-year-long and two-year-long growing periods, using invested eucalyptus samples in the study region. In order to compensate for the discontinuity of the NDVI time series that is a consequence of the relatively coarse temporal resolution, we developed an inverted triangle area methodology. Using this methodology, the images were classified on the basis of the matching degree of the NDVI time series and two reference NDVI time series sub-sequences during the growing period of the eucalyptus rotations. Three additional methodologies (Bounding Envelope, City Block, and Standardized Euclidian Distance were also tested and used as a comparison group. Threshold coefficients for the algorithms were adjusted using commission–omission error criteria. The results show that the triangle area methodology out-performed the other methodologies in classifying eucalyptus plantations. Threshold coefficients and an optimal discriminant function were determined using a mosaic photograph that had been taken by an unmanned aerial vehicle platform. Good stability was found as we performed further validation using multiple-year data from the high-resolution Gaofen Satellite 1 (GF-1 observations of larger regions. Eucalyptus planting dates

  10. Induced Systemic Tolerance to Multiple Stresses Including Biotic and Abiotic Factors by Rhizobacteria

    Directory of Open Access Journals (Sweden)

    Sung-Je Yoo

    2017-06-01

    Full Text Available Recently, global warming and drastic climate change are the greatest threat to the world. The climate change can affect plant productivity by reducing plant adaptation to diverse environments including frequent high temperature; worsen drought condition and increased pathogen transmission and infection. Plants have to survive in this condition with a variety of biotic (pathogen/pest attack and abiotic stress (salt, high/low temperature, drought. Plants can interact with beneficial microbes including plant growth-promoting rhizobacteria, which help plant mitigate biotic and abiotic stress. This overview presents that rhizobacteria plays an important role in induced systemic resistance (ISR to biotic stress or induced systemic tolerance (IST to abiotic stress condition; bacterial determinants related to ISR and/or IST. In addition, we describe effects of rhizobacteria on defense/tolerance related signal pathway in plants. We also review recent information including plant resistance or tolerance against multiple stresses (bioticabiotic. We desire that this review contribute to expand understanding and knowledge on the microbial application in a constantly varying agroecosystem, and suggest beneficial microbes as one of alternative environment-friendly application to alleviate multiple stresses.

  11. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  12. Cerebellar abnormalities contribute to disability including cognitive impairment in multiple sclerosis.

    Directory of Open Access Journals (Sweden)

    Katrin Weier

    Full Text Available The cerebellum is known to be involved not only in motor but also cognitive and affective processes. Structural changes in the cerebellum in relation to cognitive dysfunction are an emerging topic in the field of neuro-psychiatric disorders. In Multiple Sclerosis (MS cerebellar motor and cognitive dysfunction occur in parallel, early in the onset of the disease, and the cerebellum is one of the predilection sites of atrophy. This study is aimed at determining the relationship between cerebellar volumes, clinical cerebellar signs, cognitive functioning and fatigue in MS. Cerebellar volumetry was conducted using T1-weighted MPRAGE magnetic resonance imaging of 172 MS patients. All patients underwent a clinical and brief neuropsychological assessment (information processing speed, working memory, including fatigue testing. Patients with and without cerebellar signs differed significantly regarding normalized cerebellar total volume (nTCV, normalized brain volume (nBV and whole brain T2 lesion volume (LV. Patients with cerebellar dysfunction likewise performed worse in cognitive tests. A regression analysis indicated that age and nTCV explained 26.3% of the variance in SDMT (symbol digit modalities test performance. However, only age, T2 LV and nBV remained predictors in the full model (r(2 = 0.36. The full model for the prediction of PASAT (Paced Auditory Serial Addition Test scores (r(2 = 0.23 included age, cerebellar and T2 LV. In the case of fatigue, only age and nBV (r(2 = 0.17 emerged as significant predictors. These data support the view that cerebellar abnormalities contribute to disability, including cognitive impairment in MS. However, this contribution does not seem to be independent of, and may even be dominated by wider spread MS pathology as reflected by nBV and T2 LV.

  13. Historical Datasets Support Genomic Selection Models for the Prediction of Cotton Fiber Quality Phenotypes Across Multiple Environments.

    Science.gov (United States)

    Gapare, Washington; Liu, Shiming; Conaty, Warren; Zhu, Qian-Hao; Gillespie, Vanessa; Llewellyn, Danny; Stiller, Warwick; Wilson, Iain

    2018-03-20

    Genomic selection (GS) has successfully been used in plant breeding to improve selection efficiency and reduce breeding time and cost. However, there has not been a study to evaluate GS prediction models that may be used for predicting cotton breeding lines across multiple environments. In this study, we evaluated the performance of Bayes Ridge Regression, BayesA, BayesB, BayesC and Reproducing Kernel Hilbert Spaces regression models. We then extended the single-site GS model to accommodate genotype × environment interaction (G×E) in order to assess the merits of multi- over single-environment models in a practical breeding and selection context in cotton, a crop for which this has not previously been evaluated. Our study was based on a population of 215 upland cotton ( Gossypium hirsutum ) breeding lines which were evaluated for fiber length and strength at multiple locations in Australia and genotyped with 13,330 single nucleotide polymorphic (SNP) markers. BayesB, which assumes unique variance for each marker and a proportion of markers to have large effects, while most other markers have zero effect, was the preferred model. GS accuracy for fiber length based on a single-site model varied across sites, ranging from 0.27 to 0.77 (mean = 0.38), while that of fiber strength ranged from 0.19 to 0.58 (mean = 0.35) using randomly selected sub-populations as the training population. Prediction accuracies from the M×E model were higher than those for single-site and across-site models, with an average accuracy of 0.71 and 0.59 for fiber length and strength, respectively. The use of the M×E model could therefore identify which breeding lines have effects that are stable across environments and which ones are responsible for G×E and so reduce the amount of phenotypic screening required in cotton breeding programs to identify adaptable genotypes. Copyright © 2018, G3: Genes, Genomes, Genetics.

  14. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  15. Thermoelectric material including a multiple transition metal-doped type I clathrate crystal structure

    Science.gov (United States)

    Yang, Jihui [Lakeshore, CA; Shi, Xun [Troy, MI; Bai, Shengqiang [Shanghai, CN; Zhang, Wenqing [Shanghai, CN; Chen, Lidong [Shanghai, CN; Yang, Jiong [Shanghai, CN

    2012-01-17

    A thermoelectric material includes a multiple transition metal-doped type I clathrate crystal structure having the formula A.sub.8TM.sub.y.sub.1.sup.1TM.sub.y.sub.2.sup.2 . . . TM.sub.y.sub.n.sup.nM.sub.zX.sub.46-y.sub.1.sub.-y.sub.2.sub.- . . . -y.sub.n.sub.-z. In the formula, A is selected from the group consisting of barium, strontium, and europium; X is selected from the group consisting of silicon, germanium, and tin; M is selected from the group consisting of aluminum, gallium, and indium; TM.sup.1, TM.sup.2, and TM.sup.n are independently selected from the group consisting of 3d, 4d, and 5d transition metals; and y.sub.1, y.sub.2, y.sub.n and Z are actual compositions of TM.sup.1, TM.sup.2, TM.sup.n, and M, respectively. The actual compositions are based upon nominal compositions derived from the following equation: z=8q.sub.A-|.DELTA.q.sub.1|y.sub.1-|.DELTA.q.sub.2|y.sub.2- . . . -|.DELTA.q.sub.n|y.sub.n, wherein q.sub.A is a charge state of A, and wherein .DELTA.q.sub.1, .DELTA.q.sub.2, .DELTA.q.sub.n are, respectively, the nominal charge state of the first, second, and n-th TM.

  16. Seismic and thermal structure of the crust and uppermost mantle beneath Antarctica from inversion of multiple seismic datasets

    Science.gov (United States)

    Wiens, D.; Shen, W.; Anandakrishnan, S.; Aster, R. C.; Gerstoft, P.; Bromirski, P. D.; Dalziel, I.; Hansen, S. E.; Heeszel, D.; Huerta, A. D.; Nyblade, A.; Stephen, R. A.; Wilson, T. J.; Winberry, J. P.; Stern, T. A.

    2017-12-01

    Since the last decade of the 20th century, over 200 broadband seismic stations have been deployed across Antarctica (e.g., temporary networks such as TAMSEIS, AGAP/GAMSEIS, POLENET/ANET, TAMNNET and RIS/DRIS by U.S. geoscientists as well as stations deployed by Japan, Britain, China, Norway, and other countries). In this presentation, we discuss our recent efforts to build reference crustal and uppermost mantle shear velocity (Vs) and thermal models for continental Antarctica based on those seismic arrays. By combing the high resolution Rayleigh wave dispersion maps derived from both ambient noise and teleseismic earthquakes, together with P receiver function waveforms, we develop a 3-D Vs model for the crust and uppermost mantle beneath Central and West Antarctica to a depth of 200 km. Additionally, using this 3-D seismic model to constrain the crustal structure, we re-invert for the upper mantle thermal structure using the surface wave data within a thermodynamic framework and construct a 3-D thermal model for the Antarctic lithosphere. The final product, a high resolution thermal model together with associated uncertainty estimates from the Monte Carlo inversion, allows us to derive lithospheric thickness and surface heat flux maps for much of the continent. West Antarctica shows a much thinner lithosphere ( 50-90 km) than East Antarctica ( 130-230 km), with a sharp transition along the Transantarctic Mountains (TAM). A variety of geological features, including a slower/hotter but highly heterogeneous West Antarctica and a much faster/colder East Antarctic craton, are present in the 3-D seismic/thermal models. Notably, slow seismic velocities observed in the uppermost mantle beneath the southern TAM are interpreted as a signature of lithospheric foundering and replacement with hot asthenosphere. The high resolution image of these features from the 3-D models helps further investigation of the dynamic state of Antarctica's lithosphere and underlying asthenosphere

  17. Dataset for Phase I randomized clinical trial for safety and tolerability of GET 73 in single and repeated ascending doses including preliminary pharmacokinetic parameters

    Directory of Open Access Journals (Sweden)

    Carolina L. Haass-Koffler

    2017-12-01

    Full Text Available The data in this article outline the methods used for the administration of GET 73 in the first time-in-human manuscript entitled “Phase I randomized clinical trial for the safety, tolerability and preliminary pharmacokinetics of the mGluR5 negative allosteric modulator GET 73 following single and repeated doses in healthy male volunteers” (Haass-Koffler et al., 2017 [1]. Data sets are provided in two different manners. The first series of tables provided includes procedural information about the experiments conducted. The next series of tables provided includes Pharmacokinetic (PK parameters for GET 73 and its main metabolite MET 2. This set of data is comprised by two experiments: Experiment 1 references a single ascending dose administration of GET 73 and Experiment 2 references a repeated ascending dose administration of GET 73. Keywords: Glutamate receptor subtype 5 (mGlu5, Allosteric modulator, GET 73, Safety, Tolerability

  18. Dataset for Phase I randomized clinical trial for safety and tolerability of GET 73 in single and repeated ascending doses including preliminary pharmacokinetic parameters.

    Science.gov (United States)

    Haass-Koffler, Carolina L; Goodyear, Kimberly; Long, Victoria M; Tran, Harrison H; Loche, Antonella; Cacciaglia, Roberto; Swift, Robert M; Leggio, Lorenzo

    2017-12-01

    The data in this article outline the methods used for the administration of GET 73 in the first time-in-human manuscript entitled "Phase I randomized clinical trial for the safety, tolerability and preliminary pharmacokinetics of the mGluR5 negative allosteric modulator GET 73 following single and repeated doses in healthy male volunteers" (Haass-Koffler et al., 2017) [1]. Data sets are provided in two different manners. The first series of tables provided includes procedural information about the experiments conducted. The next series of tables provided includes Pharmacokinetic (PK) parameters for GET 73 and its main metabolite MET 2. This set of data is comprised by two experiments: Experiment 1 references a single ascending dose administration of GET 73 and Experiment 2 references a repeated ascending dose administration of GET 73.

  19. Plasma Cell Neoplasms (Including Multiple Myeloma) Treatment (PDQ®)—Patient Version

    Science.gov (United States)

    Plasma cell neoplasms occur when abnormal plasma cells or myeloma cells form tumors in the bones or soft tissues of the body. Multiple myeloma, plasmacytoma, lymphoplasmacytic lymphoma, and monoclonal gammopathy of undetermined significance (MGUS) are different types of plasma cell neoplasms. Find out about risk factors, symptoms, diagnostic tests, prognosis, and treatment for these diseases.

  20. Automatic treatment of multiple wound coils in 3D finite element problems including multiply connected regions

    Energy Technology Data Exchange (ETDEWEB)

    Leonard, P.J.; Lai, H.C.; Eastham, J.F.; Al-Akayshee, Q.H. [Univ. of Bath (United Kingdom)

    1996-05-01

    This paper describes an efficient scheme for incorporating multiple wire wound coils into 3D finite element models. The scheme is based on the magnetic scalar representation with an additional basis for each coil. There are no restrictions on the topology of coils with respect to ferromagnetic and conductor regions. Reduced scalar regions and cuts are automatically generated.

  1. A Case Study of Tack Tiles[R] Literacy Instruction for a Student with Multiple Disabilities Including Congenital Blindness

    Science.gov (United States)

    Klenk, Jessicia A.; Pufpaff, Lisa A.

    2011-01-01

    Research on literacy instruction for students with multiple disabilities is limited. Empirical research on braille instruction for students with multiple disabilities that include congenital blindness is virtually nonexistent. This case study offers initial insight into possible methods of early braille literacy instruction for a student with…

  2. Clinical problems of multiple primary cancers including head and neck cancers. From the viewpoint of radiotherapy

    International Nuclear Information System (INIS)

    Nishio, Masamichi; Myojin, Miyako; Nishiyama, Noriaki; Taguchi, Hiroshi; Takagi, Masaru; Tanaka, Katsuhiko

    2003-01-01

    A total of 2144 head and neck cancers were treated by radiotherapy at the National Sapporo Hospital between 1974 and 2001. Of these, 313 (14.6%) were found to have other primary cancers besides head and neck cancer, in which double cancers were 79% and triple or more cancers were 21%. Frequency according to primary site of the first head and neck cancer was oral cavity: 107/603 (17.7%), epipharynx cancer: 7/117 (6.0%), oropharyngeal cancer: 63/257 (24.5%), hypopharyngeal cancer: 65/200 (32.5%), laryngeal cancer: 114/558 (20.4%), and nose/paranasal sinus: 4.9% respectively. Esophageal cancer, head and neck cancer, lung cancer and gastric cancer were very frequent as other primary sites combined with the head and neck. The first onset region was the head and neck in 233 out of 313 cases with multiple primary cancers. The five-year survival rate from the onset of head and neck cancers is 52%, 10-year: 30%, and 5-year cause-specific survival rate 82%, and 10-year: 78%, respectively. The treatment possibilities in multiple primary cancers tend to be limited because the treatment areas are sometimes overlapped. New approaches to the treatment of multiple primary cancers should be considered in the future. (author)

  3. Multiple shooting applied to robust reservoir control optimization including output constraints on coherent risk measures

    DEFF Research Database (Denmark)

    Codas, Andrés; Hanssen, Kristian G.; Foss, Bjarne

    2017-01-01

    The production life of oil reservoirs starts under significant uncertainty regarding the actual economical return of the recovery process due to the lack of oil field data. Consequently, investors and operators make management decisions based on a limited and uncertain description of the reservoir....... In this work, we propose a new formulation for robust optimization of reservoir well controls. It is inspired by the multiple shooting (MS) method which permits a broad range of parallelization opportunities and output constraint handling. This formulation exploits coherent risk measures, a concept...

  4. Multiple Crack Growth Prediction in AA2024-T3 Friction Stir Welded Joints, Including Manufacturing Effects

    DEFF Research Database (Denmark)

    Carlone, Pierpaolo; Citarella, Roberto; Sonne, Mads Rostgaard

    2016-01-01

    A great deal of attention is currently paid by several industries toward the friction stir welding process to realize lightweight structures. Within this aim, the realistic prediction of fatigue behavior of welded assemblies is a key factor. In this work an integrated finite element method - dual...... boundary element method (FEM-DBEM) procedure, coupling the welding process simulation to the subsequent crack growth assessment, is proposed and applied to simulate multiple crack propagation, with allowance for manufacturing effects. The friction stir butt welding process of the precipitation hardened AA...... on a notched specimen. The whole procedure was finally tested comparing simulation outcomes with experimental data. The good agreement obtained highlights the predictive capability of the method. The influence of the residual stress distribution on crack growth and the mutual interaction between propagating...

  5. The dyad palindromic glutathione transferase P enhancer binds multiple factors including AP1.

    Science.gov (United States)

    Diccianni, M B; Imagawa, M; Muramatsu, M

    1992-10-11

    Glutathione Transferase P (GST-P) gene expression is dominantly regulated by an upstream enhancer (GPEI) consisting of a dyad of palindromically oriented imperfect TPA (12-O-tetradecanoyl-phorbol-13-acetate)-responsive elements (TRE). GPEI is active in AP1-lacking F9 cells as well in AP1-containing HeLa cells. Despite GPEI's similarity to a TRE, c-jun co-transfection has only a minimal effect on transactivation. Antisense c-jun and c-fos co-transfection experiments further demonstrate the lack of a role for AP1 in GPEI mediated trans-activation in F9 cells, although endogenously present AP1 can influence GPEI in HeLa cells. Co-transfection of delta fosB with c-jun, which forms an inactive c-Jun/delta FosB heterodimer that binds TRE sequences, inhibits GPEI-mediated transcription in AP1-lacking F9 cells as well as AP1-containing HeLa cells. These data suggest novel factor(s) other than AP1 are influencing GPEI. Binding studies reveal multiple nucleoproteins bind to GPEI. These factors are likely responsible for the high level of GPEI-mediated transcription observed in the absence of AP1 and during hepatocarcinogenesis.

  6. GWAS of clinically defined gout and subtypes identifies multiple susceptibility loci that include urate transporter genes

    NARCIS (Netherlands)

    Nakayama, A.; Nakaoka, H.; Yamamoto, K.; Sakiyama, M.; Shaukat, A.; Toyoda, Y.; Okada, Y.; Kamatani, Y.; Nakamura, T.; Takada, T.; Inoue, K.; Yasujima, T.; Yuasa, H.; Shirahama, Y.; Nakashima, H.; Shimizu, S.; Higashino, T.; Kawamura, Y.; Ogata, H.; Kawaguchi, M.; Ohkawa, Y.; Danjoh, I.; Tokumasu, A.; Ooyama, K.; Ito, T.; Kondo, T.; Wakai, K.; Stiburkova, B.; Pavelka, K.; Stamp, L.K.; Dalbeth, N.; Sakurai, Y.; Suzuki, H; Hosoyamada, M.; Fujimori, S.; Yokoo, T.; Hosoya, T.; Inoue, I.; Takahashi, A.; Kubo, M.; Ooyama, H.; Shimizu, T.; Ichida, K.; Shinomiya, N.; Merriman, T.R.; Matsuo, H.; Andres, M; Joosten, L.A.; Janssen, M.C.H.; Jansen, T.L.; Liote, F.; Radstake, T.R.; Riches, P.L.; So, A.; Tauches, A.K.

    2017-01-01

    OBJECTIVE: A genome-wide association study (GWAS) of gout and its subtypes was performed to identify novel gout loci, including those that are subtype-specific. METHODS: Putative causal association signals from a GWAS of 945 clinically defined gout cases and 1213 controls from Japanese males were

  7. Using Photovoice to Include People with Profound and Multiple Learning Disabilities in Inclusive Research

    Science.gov (United States)

    Cluley, Victoria

    2017-01-01

    Background: It is now expected that projects addressing the lives of people with learning disabilities include people with learning disabilities in the research process. In the past, such research often excluded people with learning disabilities, favouring the opinions of family members, carers and professionals. The inclusion of the voices of…

  8. Prospective evaluation of dermatologic surgery complications including patients on multiple antiplatelet and anticoagulant medications.

    Science.gov (United States)

    Bordeaux, Jeremy S; Martires, Kathryn J; Goldberg, Dori; Pattee, Sean F; Fu, Pingfu; Maloney, Mary E

    2011-09-01

    Few prospective studies have evaluated the safety of dermatologic surgery. We sought to determine rates of bleeding, infection, flap and graft necrosis, and dehiscence in outpatient dermatologic surgery, and to examine their relationship to type of repair, anatomic location of repair, antibiotic use, antiplatelet use, or anticoagulant use. Patients presenting to University of Massachusetts Medical School Dermatology Clinic for surgery during a 15-month period were prospectively entered. Medications, procedures, and complications were recorded. Of the 1911 patients, 38% were on one anticoagulant or antiplatelet medication, and 8.0% were on two or more. Risk of hemorrhage was 0.89%. Complex repair (odds ratio [OR] = 5.80), graft repair (OR = 7.58), flap repair (OR = 11.93), and partial repair (OR = 43.13) were more likely to result in bleeding than intermediate repair. Patients on both clopidogrel and warfarin were 40 times more likely to have bleeding complications than all others (P = .03). Risk of infection was 1.3%, but was greater than 3% on the genitalia, scalp, back, and leg. Partial flap necrosis occurred in 1.7% of flaps, and partial graft necrosis occurred in 8.6% of grafts. Partial graft necrosis occurred in 20% of grafts on the scalp and 10% of grafts on the nose. All complications resolved without sequelae. The study was limited to one academic dermatology practice. The rate of complications in dermatologic surgery is low, even when multiple oral anticoagulant and antiplatelet medications are continued, and prophylactic antibiotics are not used. Closure type and use of warfarin or clopidogrel increase bleeding risk. However, these medications should be continued to avoid adverse thrombotic events. Copyright © 2011 American Academy of Dermatology, Inc. Published by Mosby, Inc. All rights reserved.

  9. Identification of Multiple Novel Viruses, Including a Parvovirus and a Hepevirus, in Feces of Red Foxes

    Science.gov (United States)

    van der Giessen, Joke; Haagmans, Bart L.; Osterhaus, Albert D. M. E.; Smits, Saskia L.

    2013-01-01

    Red foxes (Vulpes vulpes) are the most widespread members of the order of Carnivora. Since they often live in (peri)urban areas, they are a potential reservoir of viruses that transmit from wildlife to humans or domestic animals. Here we evaluated the fecal viral microbiome of 13 red foxes by random PCR in combination with next-generation sequencing. Various novel viruses, including a parvovirus, bocavirus, adeno-associated virus, hepevirus, astroviruses, and picobirnaviruses, were identified. PMID:23616657

  10. GWAS of clinically defined gout and subtypes identifies multiple susceptibility loci that include urate transporter genes

    OpenAIRE

    Nakayama, Akiyoshi; Nakaoka, Hirofumi; Yamamoto, Ken; Sakiyama, Masayuki; Shaukat, Amara; Toyoda, Yu; Okada, Yukinori; Kamatani, Yoichiro; Nakamura, Takahiro; Takada, Tappei; Inoue, Katsuhisa; Yasujima, Tomoya; Yuasa, Hiroaki; Shirahama, Yuko; Nakashima, Hiroshi

    2016-01-01

    Objective A genome-wide association study (GWAS) of gout and its subtypes was performed to identify novel gout loci, including those that are subtype-specific. Methods Putative causal association signals from a GWAS of 945 clinically defined gout cases and 1213 controls from Japanese males were replicated with 1396 cases and 1268 controls using a custom chip of 1961 single nucleotide polymorphisms (SNPs). We also first conducted GWASs of gout subtypes. Replication with Caucasian and New Zeala...

  11. Assessment of average of normals (AON) procedure for outlier-free datasets including qualitative values below limit of detection (LoD): an application within tumor markers such as CA 15-3, CA 125, and CA 19-9.

    Science.gov (United States)

    Usta, Murat; Aral, Hale; Mete Çilingirtürk, Ahmet; Kural, Alev; Topaç, Ibrahim; Semerci, Tuna; Hicri Köseoğlu, Mehmet

    2016-11-01

    Average of normals (AON) is a quality control procedure that is sensitive only to systematic errors that can occur in an analytical process in which patient test results are used. The aim of this study was to develop an alternative model in order to apply the AON quality control procedure to datasets that include qualitative values below limit of detection (LoD). The reported patient test results for tumor markers, such as CA 15-3, CA 125, and CA 19-9, analyzed by two instruments, were retrieved from the information system over a period of 5 months, using the calibrator and control materials with the same lot numbers. The median as a measure of central tendency and the median absolute deviation (MAD) as a measure of dispersion were used for the complementary model of AON quality control procedure. The u bias values, which were determined for the bias component of the measurement uncertainty, were partially linked to the percentages of the daily median values of the test results that fall within the control limits. The results for these tumor markers, in which lower limits of reference intervals are not medically important for clinical diagnosis and management, showed that the AON quality control procedure, using the MAD around the median, can be applied for datasets including qualitative values below LoD.

  12. GWAS of clinically defined gout and subtypes identifies multiple susceptibility loci that include urate transporter genes.

    Science.gov (United States)

    Nakayama, Akiyoshi; Nakaoka, Hirofumi; Yamamoto, Ken; Sakiyama, Masayuki; Shaukat, Amara; Toyoda, Yu; Okada, Yukinori; Kamatani, Yoichiro; Nakamura, Takahiro; Takada, Tappei; Inoue, Katsuhisa; Yasujima, Tomoya; Yuasa, Hiroaki; Shirahama, Yuko; Nakashima, Hiroshi; Shimizu, Seiko; Higashino, Toshihide; Kawamura, Yusuke; Ogata, Hiraku; Kawaguchi, Makoto; Ohkawa, Yasuyuki; Danjoh, Inaho; Tokumasu, Atsumi; Ooyama, Keiko; Ito, Toshimitsu; Kondo, Takaaki; Wakai, Kenji; Stiburkova, Blanka; Pavelka, Karel; Stamp, Lisa K; Dalbeth, Nicola; Sakurai, Yutaka; Suzuki, Hiroshi; Hosoyamada, Makoto; Fujimori, Shin; Yokoo, Takashi; Hosoya, Tatsuo; Inoue, Ituro; Takahashi, Atsushi; Kubo, Michiaki; Ooyama, Hiroshi; Shimizu, Toru; Ichida, Kimiyoshi; Shinomiya, Nariyoshi; Merriman, Tony R; Matsuo, Hirotaka

    2017-05-01

    A genome-wide association study (GWAS) of gout and its subtypes was performed to identify novel gout loci, including those that are subtype-specific. Putative causal association signals from a GWAS of 945 clinically defined gout cases and 1213 controls from Japanese males were replicated with 1396 cases and 1268 controls using a custom chip of 1961 single nucleotide polymorphisms (SNPs). We also first conducted GWASs of gout subtypes. Replication with Caucasian and New Zealand Polynesian samples was done to further validate the loci identified in this study. In addition to the five loci we reported previously, further susceptibility loci were identified at a genome-wide significance level (pgout cases, and NIPAL1 and FAM35A for the renal underexcretion gout subtype. While NIPAL1 encodes a magnesium transporter, functional analysis did not detect urate transport via NIPAL1, suggesting an indirect association with urate handling. Localisation analysis in the human kidney revealed expression of NIPAL1 and FAM35A mainly in the distal tubules, which suggests the involvement of the distal nephron in urate handling in humans. Clinically ascertained male patients with gout and controls of Caucasian and Polynesian ancestries were also genotyped, and FAM35A was associated with gout in all cases. A meta-analysis of the three populations revealed FAM35A to be associated with gout at a genome-wide level of significance (p meta =3.58×10 -8 ). Our findings including novel gout risk loci provide further understanding of the molecular pathogenesis of gout and lead to a novel concept for the therapeutic target of gout/hyperuricaemia. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  13. Dataset Akkerweb

    NARCIS (Netherlands)

    Baltissen, A.H.M.C.; Molendijk, L.P.G.

    2016-01-01

    This is a zipped file with various files that are distributed to be used in conjunction with a number of crop rotation decision support systems for Dutch arable farming that can be found at www.akkerweb.nl. They come in different formats including Microsoft Excel files, Microsoft Powerpoint fies,

  14. Microvascular decompression for trigeminal neuralgia: comments on a series of 250 cases, including 10 patients with multiple sclerosis.

    Science.gov (United States)

    Broggi, G; Ferroli, P; Franzini, A; Servello, D; Dones, I

    2000-01-01

    To examine surgical findings and results of microvascular decompression (MVD) for trigeminal neuralgia (TN), including patients with multiple sclerosis, to bring new insight about the role of microvascular compression in the pathogenesis of the disorder and the role of MVD in its treatment. Between 1990 and 1998, 250 patients affected by trigeminal neuralgia underwent MVD in the Department of Neurosurgery of the "Istituto Nazionale Neurologico C Besta" in Milan. Limiting the review to the period 1991-6, to exclude the "learning period" (the first 50 cases) and patients with less than 1 year follow up, surgical findings and results were critically analysed in 148 consecutive cases, including 10 patients with multiple sclerosis. Vascular compression of the trigeminal nerve was found in all cases. The recurrence rate was 15.3% (follow up 1-7 years, mean 38 months). In five of 10 patients with multiple sclerosis an excellent result was achieved (follow up 12-39 months, mean 24 months). Patients with TN for more than 84 months did significantly worse than those with a shorter history (p<0.05). There was no mortality and most complications occurred in the learning period. Surgical complications were not related to age of the patients. Aetiopathogenesis of trigeminal neuralgia remains a mystery. These findings suggest a common neuromodulatory role of microvascular compression in both patients with or without multiple sclerosis rather than a direct causal role. MVD was found to be a safe and effective procedure to relieve typical TN in patients of all ages. It should be proposed as first choice surgery to all patients affected by TN, even in selected cases with multiple sclerosis, to give them the opportunity of pain relief without sensory deficits.

  15. Multiple time-scale optimization scheduling for islanded microgrids including PV, wind turbine, diesel generator and batteries

    DEFF Research Database (Denmark)

    Xiao, Zhao xia; Nan, Jiakai; Guerrero, Josep M.

    2017-01-01

    A multiple time-scale optimization scheduling including day ahead and short time for an islanded microgrid is presented. In this paper, the microgrid under study includes photovoltaics (PV), wind turbine (WT), diesel generator (DG), batteries, and shiftable loads. The study considers the maximum...... efficiency operation area for the diesel engine and the cost of the battery charge/discharge cycle losses. The day-ahead generation scheduling takes into account the minimum operational cost and the maximum load satisfaction as the objective function. Short-term optimal dispatch is based on minimizing...

  16. Reliability analysis of the auxiliary feedwater system of Angra-1 including common cause failures using the multiple greek letter model

    International Nuclear Information System (INIS)

    Lapa, Celso Marcelo Franklin.

    1996-05-01

    The use of redundancy to increase the reliability of industrial systems make them subject to the occurrence of common cause events. The industrial experience and the results of safety analysis studies have indicated that common cause failures are the main contributors to the unreliability of plants that have redundant systems, specially in nuclear power plants. In this Thesis procedures are developed in order to include the impact of common cause failures in the calculation of the top event occurrence probability of the Auxiliary Feedwater System in a typical two-loop Nuclear Power Plant (PWR). For this purpose the Multiple Greek Letter Model is used. (author). 14 refs., 10 figs., 11 tabs

  17. Comparison and clinical utility evaluation of four multiple allergen simultaneous tests including two newly introduced fully automated analyzers

    Directory of Open Access Journals (Sweden)

    John Hoon Rim

    2016-04-01

    Full Text Available Background: We compared the diagnostic performances of two newly introduced fully automated multiple allergen simultaneous tests (MAST analyzers with two conventional MAST assays. Methods: The serum samples from a total of 53 and 104 patients were tested for food panels and inhalant panels, respectively, in four analyzers including AdvanSure AlloScreen (LG Life Science, Korea, AdvanSure Allostation Smart II (LG Life Science, PROTIA Allergy-Q (ProteomeTech, Korea, and RIDA Allergy Screen (R-Biopharm, Germany. We compared not only the total agreement percentages but also positive propensities among four analyzers. Results: Evaluation of AdvanSure Allostation Smart II as upgraded version of AdvanSure AlloScreen revealed good concordance with total agreement percentages of 93.0% and 92.2% in food and inhalant panel, respectively. Comparisons of AdvanSure Allostation Smart II or PROTIA Allergy-Q with RIDA Allergy Screen also showed good concordance performance with positive propensities of two new analyzers for common allergens (Dermatophagoides farina and Dermatophagoides pteronyssinus. The changes of cut-off level resulted in various total agreement percentage fluctuations among allergens by different analyzers, although current cut-off level of class 2 appeared to be generally suitable. Conclusions: AdvanSure Allostation Smart II and PROTIA Allergy-Q presented favorable agreement performances with RIDA Allergy Screen, although positive propensities were noticed in common allergens. Keywords: Multiple allergen simultaneous test, Automated analyzer

  18. Compound K, a Ginsenoside Metabolite, Inhibits Colon Cancer Growth via Multiple Pathways Including p53-p21 Interactions

    Directory of Open Access Journals (Sweden)

    Eugene B. Chang

    2013-01-01

    Full Text Available Compound K (20-O-beta-D-glucopyranosyl-20(S-protopanaxadiol, CK, an intestinal bacterial metabolite of ginseng protopanaxadiol saponins, has been shown to inhibit cell growth in a variety of cancers. However, the mechanisms are not completely understood, especially in colorectal cancer (CRC. A xenograft tumor model was used first to examine the anti-CRC effect of CK in vivo. Then, multiple in vitro assays were applied to investigate the anticancer effects of CK including antiproliferation, apoptosis and cell cycle distribution. In addition, a qPCR array and western blot analysis were executed to screen and validate the molecules and pathways involved. We observed that CK significantly inhibited the growth of HCT-116 tumors in an athymic nude mouse xenograft model. CK significantly inhibited the proliferation of human CRC cell lines HCT-116, SW-480, and HT-29 in a dose- and time-dependent manner. We also observed that CK induced cell apoptosis and arrested the cell cycle in the G1 phase in HCT-116 cells. The processes were related to the upregulation of p53/p21, FoxO3a-p27/p15 and Smad3, and downregulation of cdc25A, CDK4/6 and cyclin D1/3. The major regulated targets of CK were cyclin dependent inhibitors, including p21, p27, and p15. These results indicate that CK inhibits transcriptional activation of multiple tumor-promoting pathways in CRC, suggesting that CK could be an active compound in the prevention or treatment of CRC.

  19. Applicability of bioanalysis of multiple analytes in drug discovery and development: review of select case studies including assay development considerations.

    Science.gov (United States)

    Srinivas, Nuggehally R

    2006-05-01

    The development of sound bioanalytical method(s) is of paramount importance during the process of drug discovery and development culminating in a marketing approval. Although the bioanalytical procedure(s) originally developed during the discovery stage may not necessarily be fit to support the drug development scenario, they may be suitably modified and validated, as deemed necessary. Several reviews have appeared over the years describing analytical approaches including various techniques, detection systems, automation tools that are available for an effective separation, enhanced selectivity and sensitivity for quantitation of many analytes. The intention of this review is to cover various key areas where analytical method development becomes necessary during different stages of drug discovery research and development process. The key areas covered in this article with relevant case studies include: (a) simultaneous assay for parent compound and metabolites that are purported to display pharmacological activity; (b) bioanalytical procedures for determination of multiple drugs in combating a disease; (c) analytical measurement of chirality aspects in the pharmacokinetics, metabolism and biotransformation investigations; (d) drug monitoring for therapeutic benefits and/or occupational hazard; (e) analysis of drugs from complex and/or less frequently used matrices; (f) analytical determination during in vitro experiments (metabolism and permeability related) and in situ intestinal perfusion experiments; (g) determination of a major metabolite as a surrogate for the parent molecule; (h) analytical approaches for universal determination of CYP450 probe substrates and metabolites; (i) analytical applicability to prodrug evaluations-simultaneous determination of prodrug, parent and metabolites; (j) quantitative determination of parent compound and/or phase II metabolite(s) via direct or indirect approaches; (k) applicability in analysis of multiple compounds in select

  20. A nonlinear model for fluid flow in a multiple-zone composite reservoir including the quadratic gradient term

    International Nuclear Information System (INIS)

    Wang, Xiao-Lu; Fan, Xiang-Yu; Nie, Ren-Shi; Huang, Quan-Hua; He, Yong-Ming

    2013-01-01

    Based on material balance and Darcy's law, the governing equation with the quadratic pressure gradient term was deduced. Then the nonlinear model for fluid flow in a multiple-zone composite reservoir including the quadratic gradient term was established and solved using a Laplace transform. A series of standard log–log type curves of 1-zone (homogeneous), 2-zone and 3-zone reservoirs were plotted and nonlinear flow characteristics were analysed. The type curves governed by the coefficient of the quadratic gradient term (β) gradually deviate from those of a linear model with time elapsing. Qualitative and quantitative analyses were implemented to compare the solutions of the linear and nonlinear models. The results showed that differences of pressure transients between the linear and nonlinear models increase with elapsed time and β. At the end, a successful application of the theoretical model data against the field data shows that the nonlinear model will be a good tool to evaluate formation parameters more accurately. (paper)

  1. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  2. Enhanced peripheral visual processing in congenitally deaf humans is supported by multiple brain regions, including primary auditory cortex

    Directory of Open Access Journals (Sweden)

    Gregory D. Scott

    2014-03-01

    Full Text Available Brain reorganization associated with altered sensory experience clarifies the critical role of neuroplasticity in development. An example is enhanced peripheral visual processing associated with congenital deafness, but the neural systems supporting this have not been fully characterized. A gap in our understanding of deafness-enhanced peripheral vision is the contribution of primary auditory cortex. Previous studies of auditory cortex that use anatomical normalization across participants were limited by inter-subject variability of Heschl’s gyrus. In addition to reorganized auditory cortex (cross-modal plasticity, a second gap in our understanding is the contribution of altered modality-specific cortices (visual intramodal plasticity in this case, as well as supramodal and multisensory cortices, especially when target detection is required across contrasts. Here we address these gaps by comparing fMRI signal change for peripheral versus perifoveal visual stimulation (11-15° vs. 2°-7° in congenitally deaf and hearing participants in a blocked experimental design with two analytical approaches: a Heschl’s gyrus region of interest analysis and a whole brain analysis. Our results using individually-defined primary auditory cortex (Heschl’s gyrus indicate that fMRI signal change for more peripheral stimuli was greater than perifoveal in deaf but not in hearing participants. Whole-brain analyses revealed differences between deaf and hearing participants for peripheral versus perifoveal visual processing in extrastriate visual cortex including primary auditory cortex, MT+/V5, superior-temporal auditory and multisensory and/or supramodal regions, such as posterior parietal cortex, frontal eye fields, anterior cingulate, and supplementary eye fields. Overall, these data demonstrate the contribution of neuroplasticity in multiple systems including primary auditory cortex, supramodal and multisensory regions, to altered visual processing in

  3. Klotho: a humeral mediator in CSF and plasma that influences longevity and susceptibility to multiple complex disorders, including depression.

    Science.gov (United States)

    Pavlatou, M G; Remaley, A T; Gold, P W

    2016-08-30

    Klotho is a hormone secreted into human cerebrospinal fluid (CSF), plasma and urine that promotes longevity and influences the onset of several premature senescent phenotypes in mice and humans, including atherosclerosis, cardiovascular disease, stroke and osteoporosis. Preliminary studies also suggest that Klotho possesses tumor suppressor properties. Klotho's roles in these phenomena were first suggested by studies demonstrating that a defect in the Klotho gene in mice results in a significant decrease in lifespan. The Klotho-deficient mouse dies prematurely at 8-9 weeks of age. At 4-5 weeks of age, a syndrome resembling human ageing emerges consisting of atherosclerosis, osteoporosis, cognitive disturbances and alterations of hippocampal architecture. Several deficits in Klotho-deficient mice are likely to contribute to these phenomena. These include an inability to defend against oxidative stress in the central nervous system and periphery, decreased capacity to generate nitric oxide to sustain normal endothelial reactivity, defective Klotho-related mediation of glycosylation and ion channel regulation, increased insulin/insulin-like growth factor signaling and a disturbed calcium and phosphate homeostasis accompanied by altered vitamin D levels and ectopic calcification. Identifying the mechanisms by which Klotho influences multiple important pathways is an emerging field in human biology that will contribute significantly to understanding basic physiologic processes and targets for the treatment of complex diseases. Because many of the phenomena seen in Klotho-deficient mice occur in depressive illness, major depression and bipolar disorder represent illnesses potentially associated with Klotho dysregulation. Klotho's presence in CSF, blood and urine should facilitate its study in clinical populations.

  4. A re-consideration of the taxonomic status of Nebria lacustris Casey (Coleoptera, Carabidae, Nebriini based on multiple datasets – a single species or a species complex?

    Directory of Open Access Journals (Sweden)

    David Kavanaugh

    2011-11-01

    Full Text Available This study gathered evidence from principal component analysis (PCA of morphometric data and molecular analyses of nucleotide sequence data for four nuclear genes (28S, TpI, CAD1, and Wg and two mitochondrial genes (COI and 16S, using parsimony, maximum likelihood, and Bayesian methods. This evidence was combined with morphological and chorological data to re-evaluate the taxonomic status of Nebria lacustris Casey sensu lato. PCA demonstrated that both body size and one conspicuous aspect of pronotal shape vary simultaneously with elevation, latitude, and longitude and served to distinguish populations from the southern Appalachian highlands, south of the French Broad, from all other populations. Molecular analyses revealed surprisingly low overall genetic diversity within N. lacustris sensu lato, with only 0.39% of 4605 bp varied in the concatenated dataset. Evaluation of patterns observed in morphological and genetic variation and distribution led to the following taxonomic conclusions: (1 Nebria lacustris Casey and Nebria bellorum Kavanaugh should be considered distinct species, which is a NEW STATUS for N. bellorum. (2 No other distinct taxonomic subunits could be distinguished with the evidence at hand, but samples from northeastern Iowa, in part of the region known as the “Driftless Zone”, have unique genetic markers for two genes that hint at descent from a local population surviving at least the last glacial advance. (3 No morphometric or molecular evidence supports taxonomic distinction between lowland populations on the shores of Lake Champlain and upland populations in the adjacent Green Mountains of Vermont, despite evident size and pronotal shape differences between many of their members.

  5. Mortality from Musculoskeletal Disorders Including Rheumatoid Arthritis in Southern Sweden: A Multiple-cause-of-death Analysis, 1998-2014.

    Science.gov (United States)

    Kiadaliri, Aliasghar A; Turkiewicz, Aleksandra; Englund, Martin

    2017-05-01

    To assess mortality related to musculoskeletal (MSK) disorders and rheumatoid arthritis (RA), specifically, among adults (aged ≥ 20 yrs) in southern Sweden using the multiple-cause-of-death approach. All death certificates (DC; n = 201,488) from 1998 to 2014 for adults in the region of Skåne were analyzed when mortality from MSK disorders and RA was listed as the underlying and nonunderlying cause of death (UCD/NUCD). Trends in age-standardized mortality rates (ASMR) were evaluated using joinpoint regression, and associated causes were identified by age- and sex-adjusted observed/expected ratios. MSK (RA) was mentioned on 2.8% (0.8%) of all DC and selected as UCD in 0.6% (0.2%), with higher values among women. Proportion of MSK disorder deaths from all deaths increased from 2.7% in 1998 to 3.1% in 2014, and declined from 0.9% to 0.5% for RA. The mean age at death was higher in DC with mention of MSK/RA than in DC without. The mean ASMR for MSK (RA) was 15.5 (4.3) per 100,000 person-years and declined by 1.1% (3.8%) per year during 1998-2014. When MSK/RA were UCD, pneumonia and heart failure were the main NUCD. When MSK/RA were NUCD, the leading UCD were ischemic heart disease and neoplasms. The greatest observed/expected ratios were seen for infectious diseases (including sepsis) and blood diseases. We observed significant reduction in MSK and RA mortality rates and increase in the mean age at death. Further analyses are required to investigate determinants of these improvements in MSK/RA survival and their potential effect on the Swedish healthcare systems.

  6. Auxiliary variables in multiple imputation in regression with missing X: a warning against including too many in small sample research

    Directory of Open Access Journals (Sweden)

    Hardt Jochen

    2012-12-01

    Full Text Available Abstract Background Multiple imputation is becoming increasingly popular. Theoretical considerations as well as simulation studies have shown that the inclusion of auxiliary variables is generally of benefit. Methods A simulation study of a linear regression with a response Y and two predictors X1 and X2 was performed on data with n = 50, 100 and 200 using complete cases or multiple imputation with 0, 10, 20, 40 and 80 auxiliary variables. Mechanisms of missingness were either 100% MCAR or 50% MAR + 50% MCAR. Auxiliary variables had low (r=.10 vs. moderate correlations (r=.50 with X’s and Y. Results The inclusion of auxiliary variables can improve a multiple imputation model. However, inclusion of too many variables leads to downward bias of regression coefficients and decreases precision. When the correlations are low, inclusion of auxiliary variables is not useful. Conclusion More research on auxiliary variables in multiple imputation should be performed. A preliminary rule of thumb could be that the ratio of variables to cases with complete data should not go below 1 : 3.

  7. iStimulation: Apple iPad Use with Children Who Are Visually Impaired, Including Those with Multiple Disabilities

    Science.gov (United States)

    Campaña, Laura V.; Ouimet, Donald A.

    2015-01-01

    Since its creation in the early 1980s, Light Box, a product developed by the American Printing House for the Blind (APH) that is designed for working on functional vision tasks with children who have visual impairments or multiple disabilities, has been an effective tool to help teach children with visual impairments to locate and track items…

  8. Nontypeable pneumococci can be divided into multiple cps types, including one type expressing the novel gene pspK.

    Science.gov (United States)

    Park, In Ho; Kim, Kyung-Hyo; Andrade, Ana Lucia; Briles, David E; McDaniel, Larry S; Nahm, Moon H

    2012-01-01

    Although virulence of Streptococcus pneumoniae is associated with its capsule, some pathogenic S. pneumoniae isolates lack capsules and are serologically nontypeable (NT). We obtained 64 isolates that were identified as NT "pneumococci" (i.e., bacteria satisfying the conventional definition but without the multilocus sequence typing [MLST]-based definition of S. pneumoniae) by the traditional criteria. All 64 were optochin sensitive and had lytA, and 63 had ply. Twelve isolates had cpsA, suggesting the presence of a conventional but defective capsular polysaccharide synthesis (cps) locus. The 52 cpsA-negative isolates could be divided into three null capsule clades (NCC) based on aliC (aliB-like ORF1), aliD (aliB-like ORF2), and our newly discovered gene, pspK, in their cps loci. pspK encodes a protein with a long alpha-helical region containing an LPxTG motif and a YPT motif known to bind human pIgR. There were nine isolates in NCC1 (pspK(+) but negative for aliC and aliD), 32 isolates in NCC2 (aliC(+) aliD(+) but negative for pspK), and 11 in NCC3 (aliD(+) but negative for aliC and pspK). Among 52 cpsA-negative isolates, 41 were identified as S. pneumoniae by MLST analysis. All NCC1 and most NCC2 isolates were S. pneumoniae, whereas all nine NCC3 and two NCC2 isolates were not S. pneumoniae. Several NCC1 and NCC2 isolates from multiple individuals had identical MLST and cps regions, showing that unencapsulated S. pneumoniae can be infectious among humans. Furthermore, NCC1 and NCC2 S. pneumoniae isolates could colonize mice as well as encapsulated S. pneumoniae, although S. pneumoniae with an artificially disrupted cps locus did not. Moreover, an NCC1 isolate with pspK deletion did not colonize mice, suggesting that pspK is critical for colonization. Thus, PspK may provide pneumococci a means of surviving in the nasopharynx without capsule. IMPORTANCE The presence of a capsule is critical for many pathogenic bacteria, including pneumococci. Reflecting the

  9. Ca analysis: an Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis.

    Science.gov (United States)

    Greensmith, David J

    2014-01-01

    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow. Copyright © 2013 The Author. Published by Elsevier Ireland Ltd.. All rights reserved.

  10. Ca analysis: An Excel based program for the analysis of intracellular calcium transients including multiple, simultaneous regression analysis☆

    Science.gov (United States)

    Greensmith, David J.

    2014-01-01

    Here I present an Excel based program for the analysis of intracellular Ca transients recorded using fluorescent indicators. The program can perform all the necessary steps which convert recorded raw voltage changes into meaningful physiological information. The program performs two fundamental processes. (1) It can prepare the raw signal by several methods. (2) It can then be used to analyze the prepared data to provide information such as absolute intracellular Ca levels. Also, the rates of change of Ca can be measured using multiple, simultaneous regression analysis. I demonstrate that this program performs equally well as commercially available software, but has numerous advantages, namely creating a simplified, self-contained analysis workflow. PMID:24125908

  11. Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs.

    Science.gov (United States)

    Hemming, Karla; Lilford, Richard; Girling, Alan J

    2015-01-30

    Stepped-wedge cluster randomised trials (SW-CRTs) are being used with increasing frequency in health service evaluation. Conventionally, these studies are cross-sectional in design with equally spaced steps, with an equal number of clusters randomised at each step and data collected at each and every step. Here we introduce several variations on this design and consider implications for power. One modification we consider is the incomplete cross-sectional SW-CRT, where the number of clusters varies at each step or where at some steps, for example, implementation or transition periods, data are not collected. We show that the parallel CRT with staggered but balanced randomisation can be considered a special case of the incomplete SW-CRT. As too can the parallel CRT with baseline measures. And we extend these designs to allow for multiple layers of clustering, for example, wards within a hospital. Building on results for complete designs, power and detectable difference are derived using a Wald test and obtaining the variance-covariance matrix of the treatment effect assuming a generalised linear mixed model. These variations are illustrated by several real examples. We recommend that whilst the impact of transition periods on power is likely to be small, where they are a feature of the design they should be incorporated. We also show examples in which the power of a SW-CRT increases as the intra-cluster correlation (ICC) increases and demonstrate that the impact of the ICC is likely to be smaller in a SW-CRT compared with a parallel CRT, especially where there are multiple levels of clustering. Finally, through this unified framework, the efficiency of the SW-CRT and the parallel CRT can be compared. © 2014 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  12. Nonadiabatic dynamics of electron transfer in solution: Explicit and implicit solvent treatments that include multiple relaxation time scales

    International Nuclear Information System (INIS)

    Schwerdtfeger, Christine A.; Soudackov, Alexander V.; Hammes-Schiffer, Sharon

    2014-01-01

    The development of efficient theoretical methods for describing electron transfer (ET) reactions in condensed phases is important for a variety of chemical and biological applications. Previously, dynamical dielectric continuum theory was used to derive Langevin equations for a single collective solvent coordinate describing ET in a polar solvent. In this theory, the parameters are directly related to the physical properties of the system and can be determined from experimental data or explicit molecular dynamics simulations. Herein, we combine these Langevin equations with surface hopping nonadiabatic dynamics methods to calculate the rate constants for thermal ET reactions in polar solvents for a wide range of electronic couplings and reaction free energies. Comparison of explicit and implicit solvent calculations illustrates that the mapping from explicit to implicit solvent models is valid even for solvents exhibiting complex relaxation behavior with multiple relaxation time scales and a short-time inertial response. The rate constants calculated for implicit solvent models with a single solvent relaxation time scale corresponding to water, acetonitrile, and methanol agree well with analytical theories in the Golden rule and solvent-controlled regimes, as well as in the intermediate regime. The implicit solvent models with two relaxation time scales are in qualitative agreement with the analytical theories but quantitatively overestimate the rate constants compared to these theories. Analysis of these simulations elucidates the importance of multiple relaxation time scales and the inertial component of the solvent response, as well as potential shortcomings of the analytical theories based on single time scale solvent relaxation models. This implicit solvent approach will enable the simulation of a wide range of ET reactions via the stochastic dynamics of a single collective solvent coordinate with parameters that are relevant to experimentally accessible

  13. The NOAA Dataset Identifier Project

    Science.gov (United States)

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  14. Improvement in genetic evaluation of female fertility in dairy cattle using multiple-trait models including milk production traits

    DEFF Research Database (Denmark)

    Sun, C; Madsen, P; Lund, M S

    2010-01-01

    This study investigated the improvement in genetic evaluation of fertility traits by using production traits as secondary traits (MILK = 305-d milk yield, FAT = 305-d fat yield, and PROT = 305-d protein yield). Data including 471,742 records from first lactations of Denmark Holstein cows, covering...... the years of inseminations during first lactations from 1995 to 2004, were analyzed. Six fertility traits (i.e., interval in days from calving to first insemination, calving interval, days open, interval in days from first to last insemination, numbers of inseminations per conception, and nonreturn rate...... stability and predictive ability than single-trait models for all the fertility traits, except for nonreturn rate within 56 d after first service. The stability and predictive ability for the model including MILK or PROT were similar to the model including all 3 milk production traits and better than...

  15. Leveraging multiple datasets for deep leaf counting

    OpenAIRE

    Dobrescu, Andrei; Giuffrida, Mario Valerio; Tsaftaris, Sotirios A

    2017-01-01

    The number of leaves a plant has is one of the key traits (phenotypes) describing its development and growth. Here, we propose an automated, deep learning based approach for counting leaves in model rosette plants. While state-of-the-art results on leaf counting with deep learning methods have recently been reported, they obtain the count as a result of leaf segmentation and thus require per-leaf (instance) segmentation to train the models (a rather strong annotation). Instead, our method tre...

  16. Aaron Journal article datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty , and S. Raman....

  17. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface (ISD) Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  18. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  19. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  20. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  1. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  2. Study Modules for Calculus-Based General Physics. [Includes Modules 3-5: Planar Motion; Newton's Laws; and Vector Multiplication].

    Science.gov (United States)

    Fuller, Robert G., Ed.; And Others

    This is part of a series of 42 Calculus Based Physics (CBP) modules totaling about 1,000 pages. The modules include study guides, practice tests, and mastery tests for a full-year individualized course in calculus-based physics based on the Personalized System of Instruction (PSI). The units are not intended to be used without outside materials;…

  3. Interactive visualization and analysis of multimodal datasets for surgical applications.

    Science.gov (United States)

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  4. Multiple Beneficial Lipids Including Lecithin Detected in the Edible Invasive Mollusk Crepidula fornicata from the French Northeastern Atlantic Coast

    Science.gov (United States)

    Dagorn, Flore; Buzin, Florence; Couzinet-Mossion, Aurélie; Decottignies, Priscilla; Viau, Michèle; Rabesaotra, Vony; Barnathan, Gilles; Wielgosz-Collin, Gaëtane

    2014-01-01

    The invasive mollusk Crepidula fornicata, occurring in large amounts in bays along the French Northeastern Atlantic coasts, may have huge environmental effects in highly productive ecosystems where shellfish are exploited. The present study aims at determining the potential economic value of this marine species in terms of exploitable substances with high added value. Lipid content and phospholipid (PL) composition of this mollusk collected on the Bourgneuf Bay were studied through four seasons. Winter specimens contained the highest lipid levels (5.3% dry weight), including 69% of PLs. Phosphatidylcholine (PC) was the major PL class all year, accounting for 63.9% to 88.9% of total PLs. Consequently, the winter specimens were then investigated for PL fatty acids (FAs), and free sterols. Dimethylacetals (DMAs) were present (10.7% of PL FA + DMA mixture) revealing the occurrence of plasmalogens. More than forty FAs were identified, including 20:5n-3 (9.4%) and 22:6n-3 (7.3%) acids. Fourteen free sterols were present, including cholesterol at 31.3% of the sterol mixture and about 40% of phytosterols. These data on lipids of C. fornicata demonstrate their positive attributes for human nutrition and health. The PL mixture, rich in PC and polyunsaturated FAs, offers an interesting alternative source of high value-added marine lecithin. PMID:25532566

  5. Multiple beneficial lipids including lecithin detected in the edible invasive mollusk Crepidula fornicata from the French Northeastern Atlantic coast.

    Science.gov (United States)

    Dagorn, Flore; Buzin, Florence; Couzinet-Mossion, Aurélie; Decottignies, Priscilla; Viau, Michèle; Rabesaotra, Vony; Barnathan, Gilles; Wielgosz-Collin, Gaëtane

    2014-12-22

    The invasive mollusk Crepidula fornicata, occurring in large amounts in bays along the French Northeastern Atlantic coasts, may have huge environmental effects in highly productive ecosystems where shellfish are exploited. The present study aims at determining the potential economic value of this marine species in terms of exploitable substances with high added value. Lipid content and phospholipid (PL) composition of this mollusk collected on the Bourgneuf Bay were studied through four seasons. Winter specimens contained the highest lipid levels (5.3% dry weight), including 69% of PLs. Phosphatidylcholine (PC) was the major PL class all year, accounting for 63.9% to 88.9% of total PLs. Consequently, the winter specimens were then investigated for PL fatty acids (FAs), and free sterols. Dimethylacetals (DMAs) were present (10.7% of PL FA + DMA mixture) revealing the occurrence of plasmalogens. More than forty FAs were identified, including 20:5n-3 (9.4%) and 22:6n-3 (7.3%) acids. Fourteen free sterols were present, including cholesterol at 31.3% of the sterol mixture and about 40% of phytosterols. These data on lipids of C. fornicata demonstrate their positive attributes for human nutrition and health. The PL mixture, rich in PC and polyunsaturated FAs, offers an interesting alternative source of high value-added marine lecithin.

  6. Evaluation and Comparison of Multiple Test Methods, Including Real-time PCR, for Legionella Detection in Clinical Specimens

    Science.gov (United States)

    Peci, Adriana; Winter, Anne-Luise; Gubbay, Jonathan B.

    2016-01-01

    Legionella is a Gram-negative bacterium that can cause Pontiac fever, a mild upper respiratory infection and Legionnaire’s disease, a more severe illness. We aimed to compare the performance of urine antigen, culture, and polymerase chain reaction (PCR) test methods and to determine if sputum is an acceptable alternative to the use of more invasive bronchoalveolar lavage (BAL). Data for this study included specimens tested for Legionella at Public Health Ontario Laboratories from 1st January, 2010 to 30th April, 2014, as part of routine clinical testing. We found sensitivity of urinary antigen test (UAT) compared to culture to be 87%, specificity 94.7%, positive predictive value (PPV) 63.8%, and negative predictive value (NPV) 98.5%. Sensitivity of UAT compared to PCR was 74.7%, specificity 98.3%, PPV 77.7%, and NPV 98.1%. Out of 146 patients who had a Legionella-positive result by PCR, only 66 (45.2%) also had a positive result by culture. Sensitivity for culture was the same using either sputum or BAL (13.6%); sensitivity for PCR was 10.3% for sputum and 12.8% for BAL. Both sputum and BAL yield similar results regardless testing methods (Fisher Exact p-values = 1.0, for each test). In summary, all test methods have inherent weaknesses in identifying Legionella; therefore, more than one testing method should be used. Obtaining a single specimen type from patients with pneumonia limits the ability to diagnose Legionella, particularly when urine is the specimen type submitted. Given ease of collection and similar sensitivity to BAL, clinicians are encouraged to submit sputum in addition to urine when BAL submission is not practical from patients being tested for Legionella. PMID:27630979

  7. Evaluation and comparison of multiple test methods, including real-time PCR, for Legionella detection in clinical specimens.

    Directory of Open Access Journals (Sweden)

    Adriana Peci

    2016-08-01

    Full Text Available Legionella is a gram-negative bacterium that can cause Pontiac fever, a mild upper respiratory infection and Legionnaire’s disease, a more severe illness. We aimed to compare the performance of urine antigen, culture and PCR test methods and to determine if sputum is an alternative to the use of more invasive bronchoalveolar lavage (BAL. Data for this study included specimens tested for Legionella at PHOL from January 1, 2010 to April 30, 2014, as part of routine clinical testing. We found sensitivity of UAT compared to culture to be 87%, specificity 94.7%, positive predictive value (PPV 63.8% and negative predictive value (NPV 98.5%. Sensitivity of UAT compared to PCR was 74.7%, specificity 98.3%, PPV 77.7% and NPV 98.1%. Of 146 patients who had a Legionella positive result by PCR, only 66(45.2% also had a positive result by culture. Sensitivity for culture was the same using either sputum or BAL (13.6%; sensitivity for PCR was 10.3% for sputum and 12.8% for BAL. Both sputum and BAL yield similar results despite testing methods (Fisher Exact p-values=1.0, for each test. In summary, all test methods have inherent weaknesses in identifying Legionella; thereforemore than one testing method should be used. Obtaining a single specimen type from patients with pneumonia limits the ability to diagnose Legionella, particularly when urine is the specimen type submitted. Given ease of collection, and similar sensitivity to BAL, clinicians are encouraged to submit sputum in addition to urine when BAL submission is not practical, from patients being tested for Legionella.

  8. Airborne electromagnetic detection of shallow seafloor topographic features, including resolution of multiple sub-parallel seafloor ridges

    Science.gov (United States)

    Vrbancich, Julian; Boyd, Graham

    2014-05-01

    The HoistEM helicopter time-domain electromagnetic (TEM) system was flown over waters in Backstairs Passage, South Australia, in 2003 to test the bathymetric accuracy and hence the ability to resolve seafloor structure in shallow and deeper waters (extending to ~40 m depth) that contain interesting seafloor topography. The topography that forms a rock peak (South Page) in the form of a mini-seamount that barely rises above the water surface was accurately delineated along its ridge from the start of its base (where the seafloor is relatively flat) in ~30 m water depth to its peak at the water surface, after an empirical correction was applied to the data to account for imperfect system calibration, consistent with earlier studies using the same HoistEM system. A much smaller submerged feature (Threshold Bank) of ~9 m peak height located in waters of 35 to 40 m depth was also accurately delineated. These observations when checked against known water depths in these two regions showed that the airborne TEM system, following empirical data correction, was effectively operating correctly. The third and most important component of the survey was flown over the Yatala Shoals region that includes a series of sub-parallel seafloor ridges (resembling large sandwaves rising up to ~20 m from the seafloor) that branch out and gradually decrease in height as the ridges spread out across the seafloor. These sub-parallel ridges provide an interesting topography because the interpreted water depths obtained from 1D inversion of TEM data highlight the limitations of the EM footprint size in resolving both the separation between the ridges (which vary up to ~300 m) and the height of individual ridges (which vary up to ~20 m), and possibly also the limitations of assuming a 1D model in areas where the topography is quasi-2D/3D.

  9. Biallelic germline and somatic mutations in malignant mesothelioma: multiple mutations in transcription regulators including mSWI/SNF genes.

    Science.gov (United States)

    Yoshikawa, Yoshie; Sato, Ayuko; Tsujimura, Tohru; Otsuki, Taiichiro; Fukuoka, Kazuya; Hasegawa, Seiki; Nakano, Takashi; Hashimoto-Tamaoki, Tomoko

    2015-02-01

    We detected low levels of acetylation for histone H3 tail lysines in malignant mesothelioma (MM) cell lines resistant to histone deacetylase inhibitors. To identify the possible genetic causes related to the low histone acetylation levels, whole-exome sequencing was conducted with MM cell lines established from eight patients. A mono-allelic variant of BRD1 was common to two MM cell lines with very low acetylation levels. We identified 318 homozygous protein-damaging variants/mutations (18-78 variants/mutations per patient); annotation analysis showed enrichment of the molecules associated with mammalian SWI/SNF (mSWI/SNF) chromatin remodeling complexes and co-activators that facilitate initiation of transcription. In seven of the patients, we detected a combination of variants in histone modifiers or transcription factors/co-factors, in addition to variants in mSWI/SNF. Direct sequencing showed that homozygous mutations in SMARCA4, PBRM1 and ARID2 were somatic. In one patient, homozygous germline variants were observed for SMARCC1 and SETD2 in chr3p22.1-3p14.2. These exhibited extended germline homozygosity and were in regions containing somatic mutations, leading to a loss of BAP1 and PBRM1 expression in MM cell line. Most protein-damaging variants were heterozygous in normal tissues. Heterozygous germline variants were often converted into hemizygous variants by mono-allelic deletion, and were rarely homozygous because of acquired uniparental disomy. Our findings imply that MM might develop through the somatic inactivation of mSWI/SNF complex subunits and/or histone modifiers, including BAP1, in subjects that have rare germline variants of these transcription regulators and/or transcription factors/co-factors, and in regions prone to mono-allelic deletion during oncogenesis. © 2014 UICC.

  10. GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare.

    Science.gov (United States)

    Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

    2015-07-02

    A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of data, it is difficult to predict outcomes from it. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and generate a unified dataset by a "data modeler" tool. The proposed tool implements user-centric priority based approach which can easily resolve the problems of unified data modeling and overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset and physical activity data collected using different sensors. To realize the significance of the unified dataset, we adopted a well-known rough set theory based rules creation process to create rules from the unified dataset. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces 94.1% time efforts of the experts and knowledge engineer while creating unified datasets.

  11. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database managing. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacies (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  12. National Elevation Dataset

    Science.gov (United States)

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide National elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as horizontal datum, and all the data are recast in a geographic projection. Older DEM's produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEM's do not match well, and to fill sliver areas of missing data between DEM's. These processing steps ensure that NED has no void areas and artificial discontinuities have been minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM is produced by older methods, then "striping" may still occur.

  13. Mridangam stroke dataset

    OpenAIRE

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist in a semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16 bit wav files. The dataset can be used for training models for each Mridangam stroke. /n/nA detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. /nAkshay Anantapadman...

  14. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN...

  15. Synthetic and Empirical Capsicum Annuum Image Dataset

    NARCIS (Netherlands)

    Barth, R.

    2016-01-01

    This dataset consists of per-pixel annotated synthetic (10500) and empirical images (50) of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models to generate the synthetic images are included. The aim of the datasets are to

  16. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7......) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330....

  17. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  18. Quality Controlling CMIP datasets at GFDL

    Science.gov (United States)

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

    As GFDL makes the switch from model development to production in light of the Climate Model Intercomparison Project (CMIP), GFDL's efforts are shifted to testing and more importantly establishing guidelines and protocols for Quality Controlling and semi-automated data publishing. Every CMIP cycle introduces key challenges and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises of multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before data publishing, before their knowledge makes its way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets including: Jupyter notebooks, PrePARE, LAMP (Linux, Apache, MySQL, PHP/Python/Perl): technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively, provide additional metadata and analysis services along with some in-built controlled-vocabulary validations in the workflow. In addition to this, we also discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.

  19. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving, and ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High- and Event- modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.

  20. More Than Just Accuracy: A Novel Method to Incorporate Multiple Test Attributes in Evaluating Diagnostic Tests Including Point of Care Tests.

    Science.gov (United States)

    Thompson, Matthew; Weigl, Bernhard; Fitzpatrick, Annette; Ide, Nicole

    2016-01-01

    Current frameworks for evaluating diagnostic tests are constrained by a focus on diagnostic accuracy, and assume that all aspects of the testing process and test attributes are discrete and equally important. Determining the balance between the benefits and harms associated with new or existing tests has been overlooked. Yet, this is critically important information for stakeholders involved in developing, testing, and implementing tests. This is particularly important for point of care tests (POCTs) where tradeoffs exist between numerous aspects of the testing process and test attributes. We developed a new model that multiple stakeholders (e.g., clinicians, patients, researchers, test developers, industry, regulators, and health care funders) can use to visualize the multiple attributes of tests, the interactions that occur between these attributes, and their impacts on health outcomes. We use multiple examples to illustrate interactions between test attributes (test availability, test experience, and test results) and outcomes, including several POCTs. The model could be used to prioritize research and development efforts, and inform regulatory submissions for new diagnostics. It could potentially provide a way to incorporate the relative weights that various subgroups or clinical settings might place on different test attributes. Our model provides a novel way that multiple stakeholders can use to visualize test attributes, their interactions, and impacts on individual and population outcomes. We anticipate that this will facilitate more informed decision making around diagnostic tests.

  1. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team allowing to recover some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  2. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has, and continues to produce a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and they evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  3. Development of a SPARK Training Dataset

    International Nuclear Information System (INIS)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-01-01

    In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has, and continues to produce a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed to be a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and they evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer

  4. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of turbulence of jet flows. The present document documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to static temperature ratio 2.7. Further, the report assesses the accuracies of the data, e.g., establish uncertainties for the data. This paper covers the following five tasks: (1) Document acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse of the mean and turbulent velocity fields over the first 20 jet diameters.

  5. Passive Containment DataSet

    Science.gov (United States)

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system.This dataset is associated with the following publication:Grayman, W., R. Murray , and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  6. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM over the entire island including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  7. Efficient Time-Domain Ray-Tracing Technique for the Analysis of Ultra-Wideband Indoor Environments including Lossy Materials and Multiple Effects

    Directory of Open Access Journals (Sweden)

    F. Saez de Adana

    2009-01-01

    Full Text Available This paper presents an efficient application of the Time-Domain Uniform Theory of Diffraction (TD-UTD for the analysis of Ultra-Wideband (UWB mobile communications for indoor environments. The classical TD-UTD formulation is modified to include the contribution of lossy materials and multiple-ray interactions with the environment. The electromagnetic analysis is combined with a ray-tracing acceleration technique to treat realistic and complex environments. The validity of this method is tested with measurements performed inside the Polytechnic building of the University of Alcala and shows good performance of the model for the analysis of UWB propagation.

  8. The Kinetics Human Action Video Dataset

    OpenAIRE

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  9. Prediction of preterm birth in multiple pregnancies: development of a multivariable model including cervical length measurement at 16 to 21 weeks' gestation.

    Science.gov (United States)

    van de Mheen, Lidewij; Schuit, Ewoud; Lim, Arianne C; Porath, Martina M; Papatsonis, Dimitri; Erwich, Jan J; van Eyck, Jim; van Oirschot, Charlotte M; Hummel, Piet; Duvekot, Johannes J; Hasaart, Tom H M; Groenwold, Rolf H H; Moons, Karl G M; de Groot, Christianne J M; Bruinse, Hein W; van Pampus, Maria G; Mol, Ben W J

    2014-04-01

    To develop a multivariable prognostic model for the risk of preterm delivery in women with multiple pregnancy that includes cervical length measurement at 16 to 21 weeks' gestation and other variables. We used data from a previous randomized trial. We assessed the association between maternal and pregnancy characteristics including cervical length measurement at 16 to 21 weeks' gestation and time to delivery using multivariable Cox regression modelling. Performance of the final model was assessed for the outcomes of preterm and very preterm delivery using calibration and discrimination measures. We studied 507 women, of whom 270 (53%) delivered models for preterm and very preterm delivery had a c-index of 0.68 (95% CI 0.63 to 0.72) and 0.68 (95% CI 0.62 to 0.75), respectively, and showed good calibration. In women with a multiple pregnancy, the risk of preterm delivery can be assessed with a multivariable model incorporating cervical length and other predictors.

  10. A global gridded dataset of daily precipitation going back to 1950, ideal for analysing precipitation extremes

    Science.gov (United States)

    Contractor, S.; Donat, M.; Alexander, L. V.

    2017-12-01

    Reliable observations of precipitation are necessary to determine past changes in precipitation and validate models, allowing for reliable future projections. Existing gauge based gridded datasets of daily precipitation and satellite based observations contain artefacts and have a short length of record, making them unsuitable to analyse precipitation extremes. The largest limiting factor for the gauge based datasets is a dense and reliable station network. Currently, there are two major data archives of global in situ daily rainfall data, first is Global Historical Station Network (GHCN-Daily) hosted by National Oceanic and Atmospheric Administration (NOAA) and the other by Global Precipitation Climatology Centre (GPCC) part of the Deutsche Wetterdienst (DWD). We combine the two data archives and use automated quality control techniques to create a reliable long term network of raw station data, which we then interpolate using block kriging to create a global gridded dataset of daily precipitation going back to 1950. We compare our interpolated dataset with existing global gridded data of daily precipitation: NOAA Climate Prediction Centre (CPC) Global V1.0 and GPCC Full Data Daily Version 1.0, as well as various regional datasets. We find that our raw station density is much higher than other datasets. To avoid artefacts due to station network variability, we provide multiple versions of our dataset based on various completeness criteria, as well as provide the standard deviation, kriging error and number of stations for each grid cell and timestep to encourage responsible use of our dataset. Despite our efforts to increase the raw data density, the in situ station network remains sparse in India after the 1960s and in Africa throughout the timespan of the dataset. Our dataset would allow for more reliable global analyses of rainfall including its extremes and pave the way for better global precipitation observations with lower and more transparent uncertainties.

  11. The rank-heat plot is a novel way to present the results from a network meta-analysis including multiple outcomes.

    Science.gov (United States)

    Veroniki, Areti Angeliki; Straus, Sharon E; Fyraridis, Alexandros; Tricco, Andrea C

    2016-08-01

    To present a novel and simple graphical approach to improve the presentation of the treatment ranking in a network meta-analysis (NMA) including multiple outcomes. NMA simultaneously compares many relevant interventions for a clinical condition from a network of trials, and allows ranking of the effectiveness and/or safety of each intervention. There are numerous ways to present the NMA results, which can challenge their interpretation by research users. The rank-heat plot is a novel graph that can be used to quickly recognize which interventions are most likely the best or worst interventions with respect to their effectiveness and/or safety for a single or multiple outcome(s) and may increase interpretability. Using empirical NMAs, we show that the need for a concise and informative presentation of results is imperative, particularly as the number of competing treatments and outcomes in an NMA increases. The rank-heat plot is an efficient way to present the results of ranking statistics, particularly when a large amount of data is available, and it is targeted to users from various backgrounds. Copyright © 2016 Elsevier Inc. All rights reserved.

  12. QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity

    Directory of Open Access Journals (Sweden)

    Davy Guan

    2018-04-01

    Full Text Available Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, Greenscreen GADD-45a-GFP assay, Syrian Hamster Embryo (SHE assay, and 2 year rat carcinogenicity assay results. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.

  13. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendation for future work in intelligent environment simulation.

  14. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendation for future work in intelligent environment simulation.

  15. MiR-7 triggers cell cycle arrest at the G1/S transition by targeting multiple genes including Skp2 and Psme3.

    Directory of Open Access Journals (Sweden)

    Noelia Sanchez

    Full Text Available MiR-7 acts as a tumour suppressor in many cancers and abrogates proliferation of CHO cells in culture. In this study we demonstrate that miR-7 targets key regulators of the G1 to S phase transition, including Skp2 and Psme3, to promote increased levels of p27(KIP and temporary growth arrest of CHO cells in the G1 phase. Simultaneously, the down-regulation of DNA repair-specific proteins via miR-7 including Rad54L, and pro-apoptotic regulators such as p53, combined with the up-regulation of anti-apoptotic factors like p-Akt, promoted cell survival while arrested in G1. Thus miR-7 can co-ordinate the levels of multiple genes and proteins to influence G1 to S phase transition and the apoptotic response in order to maintain cellular homeostasis. This work provides further mechanistic insight into the role of miR-7 as a regulator of cell growth in times of cellular stress.

  16. Comparison of recent SnIa datasets

    International Nuclear Information System (INIS)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevalier-Polarski-Linder (CPL) parametrization w(a) = w 0 +w 1 (1−a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w 0 ,w 1 ) = (−1.4,2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample

  17. FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

    Science.gov (United States)

    Shcherbina, Anna

    2014-08-15

    High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms. Background contaminating organisms are observed in sequenced mixtures of organisms. In silico samples provide exact truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQsim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge

  18. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  19. Activator Gcn4 employs multiple segments of Med15/Gal11, including the KIX domain, to recruit mediator to target genes in vivo.

    Science.gov (United States)

    Jedidi, Iness; Zhang, Fan; Qiu, Hongfang; Stahl, Stephen J; Palmer, Ira; Kaufman, Joshua D; Nadaud, Philippe S; Mukherjee, Sujoy; Wingfield, Paul T; Jaroniec, Christopher P; Hinnebusch, Alan G

    2010-01-22

    Mediator is a multisubunit coactivator required for initiation by RNA polymerase II. The Mediator tail subdomain, containing Med15/Gal11, is a target of the activator Gcn4 in vivo, critical for recruitment of native Mediator or the Mediator tail subdomain present in sin4Delta cells. Although several Gal11 segments were previously shown to bind Gcn4 in vitro, the importance of these interactions for recruitment of Mediator and transcriptional activation by Gcn4 in cells was unknown. We show that interaction of Gcn4 with the Mediator tail in vitro and recruitment of this subcomplex and intact Mediator to the ARG1 promoter in vivo involve additive contributions from three different segments in the N terminus of Gal11. These include the KIX domain, which is a critical target of other activators, and a region that shares a conserved motif (B-box) with mammalian coactivator SRC-1, and we establish that B-box is a critical determinant of Mediator recruitment by Gcn4. We further demonstrate that Gcn4 binds to the Gal11 KIX domain directly and, by NMR chemical shift analysis combined with mutational studies, we identify the likely binding site for Gcn4 on the KIX surface. Gcn4 is distinctive in relying on comparable contributions from multiple segments of Gal11 for efficient recruitment of Mediator in vivo.

  20. Patients with neuromyelitis optica have a more severe disease than patients with relapsingremitting multiple sclerosis, including higher risk of dying of a demyelinating disease

    Directory of Open Access Journals (Sweden)

    Denis Bernardi Bichuetti

    2013-05-01

    Full Text Available Although neuromyelitis optica (NMO is known to be a more severe disease than relapsing-remitting multiple sclerosis (RRMS, few studies comparing both conditions in a single center have been done. Methods: Comparison of our previously published cohort of 41 NMO patients with 177 RRMS patients followed in the same center, from 1994 to 2007. Results: Mean age of onset was 32.6 for NMO and 30.2 for RRMS (p=0.2062 with mean disease duration of 7.4 years for NMO and 10.3 years for RRMS. Patients with NMO had a higher annualized relapse rate (1.0 versus 0.8, p=0.0013 and progression index (0.9 versus 0.6, p≪0.0001, with more patients reaching expanded disability status scale (EDSS 6.0 (39 versus 17%, p=0.0036. The odds ratio for reaching EDSS 6.0 and being deceased due to NMO in comparison to RRMS were, respectively, 3.14 and 12.15. Conclusion: Patients with NMO have a more severe disease than patients with RRMS, including higher risk of dying of a demyelinating disease.

  1. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM

  2. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) project have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK datasets consists of a rich collection of full text publications in the domain of

  3. Benthic Habitat Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Beam trawl (2 m) and otter trawl (36' Yankee) demersal and epibenthic catches from targeted areas of interest including mid-Atlantic shelf and vicinity of Hudson...

  4. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.

  5. PHYSICS PERFORMANCE & DATASET (PPD)

    CERN Multimedia

    L. Silvestris, P. Azzi, C. Cerminara, M. Pierini with contributions from R. Castello, M. De Mattia, S. Di Guida, A. Pfeiffer, F. De Guio, D. Duggan, M. Ojeda-Sandonis, M. Rovere, G. Boudoul, G. Franzoni, P. Srimanobhas, J.-R. Vlimant. Edited by K. Aspola.

    2013-01-01

      Despite the LHC shutdown, the past six months of 2013 have been extremely intense for our coordination area. All the PPD teams are engaged on three major fronts: the exploitation of the 2011 and 2012 data, the preparation of the post-LS1 data taking and the support to the studies for the upgrade of the detector. Alignment and Calibration and Database (AlCaDB) Work in the AlCaDB project followed the planning and moved into a consolidation phase. On the AlCa side, efforts mainly concentrated on providing and validating new calibration and alignment conditions as needed for the re-processing campaigns of 2011 and 2012 data and simulation of multiple upgrade scenarios. Also, work on improvements on the Global Tag Collector tool to manage these are ongoing. On the DB side, the major redesign of the core conditions software, as discussed in various meetings in the last months of 2012, is being finalised according to schedule. We plan to change over to the new DB structure in early 2014, well before t...

  6. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011; including Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software multibeam datasets collected for the Shallow Survey were processed for detail analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and

  7. Geostatistics for Large Datasets

    KAUST Repository

    Sun, Ying

    2011-10-31

    Each chapter should be preceded by an abstract (10–15 lines long) that summarizes the content. The abstract will appear onlineat www.SpringerLink.com and be available with unrestricted access. This allows unregistered users to read the abstract as a teaser for the complete chapter. As a general rule the abstracts will not appear in the printed version of your book unless it is the style of your particular book or that of the series to which your book belongs. Please use the ’starred’ version of the new Springer abstractcommand for typesetting the text of the online abstracts (cf. source file of this chapter template abstract) and include them with the source files of your manuscript. Use the plain abstractcommand if the abstract is also to appear in the printed version of the book.

  8. Geostatistics for Large Datasets

    KAUST Repository

    Sun, Ying; Li, Bo; Genton, Marc G.

    2011-01-01

    Each chapter should be preceded by an abstract (10–15 lines long) that summarizes the content. The abstract will appear onlineat www.SpringerLink.com and be available with unrestricted access. This allows unregistered users to read the abstract as a teaser for the complete chapter. As a general rule the abstracts will not appear in the printed version of your book unless it is the style of your particular book or that of the series to which your book belongs. Please use the ’starred’ version of the new Springer abstractcommand for typesetting the text of the online abstracts (cf. source file of this chapter template abstract) and include them with the source files of your manuscript. Use the plain abstractcommand if the abstract is also to appear in the printed version of the book.

  9. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  10. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  11. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  12. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Full Text Available Abstract Everyday refinement and development of technology has led to an increase in the competition between the Tech companies and their going out of way to crack the system andbreak down. Thus providing Data mining a strategically and security-wise important area for many business organizations including banking sector. It allows the analyzes of important information in the data warehouse and assists the banks to look for obscure patterns in a group and discover unknown relationship in the data.Banking systems needs to process ample amount of data on daily basis related to customer information their credit card details limit and collateral details transaction details risk profiles Anti Money Laundering related information trade finance data. Thousands of decisionsbased on the related data are taken in a bank daily. This paper analyzes the banking dataset in the weka environment for the detection of interesting patterns based on its applications ofcustomer acquisition customer retention management and marketing and management of risk fraudulence detections.

  13. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  14. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes.(Author)

  15. Conservative two-step procedure including uterine artery embolization with embosphere and surgical myomectomy for the treatment of multiple fibroids: Preliminary experience

    International Nuclear Information System (INIS)

    Malartic, Cécile; Morel, Olivier; Fargeaudou, Yann; Le Dref, Olivier; Fazel, Afchine; Barranger, Emmanuel; Soyer, Philippe

    2012-01-01

    Objective: To evaluate the feasibility and safety of combined uterine artery embolization (UAE) using embosphere and surgical myomectomy as an alternative to radical hysterectomy in premenopausal women with multiple fibroids. Materials and methods: Mid-term clinical outcome (mean, 25 months) of 12 premenopausal women (mean age, 38 years) with multiple and large symptomatic fibroids who desired to retain their uterus and who were treated using combined UAE and surgical myomectomy were retrospectively analyzed. In all women, UAE alone was contraindicated because of large (>10 cm) or subserosal or submucosal fibroids and myomectomy alone was contraindicated because of too many (>10) fibroids. Results: UAE and surgical myomectomy were successfully performed in all women. Myomectomy was performed using laparoscopy (n = 6), open laparotomy (n = 3), hysteroscopy (n = 2), or laparoscopy and hysteroscopy (n = 1). Mean serum hemoglobin level drop was 0.97 g/dL and no blood transfusion was needed. No immediate complications were observed and all women reported resumption of normal menses. During a mean follow-up period of 25 months (range, 14–37 months), complete resolution of initial symptoms along with decrease in uterine volume (mean, 48%) was observed in all women. No further hysterectomy was required in any woman. Conclusion: In premenopausal women with multiple fibroids, the two-step procedure is safe and effective alternative to radical hysterectomy, which allows preserving the uterus. Further prospective studies, however, should be done to determine the actual benefit of this combined approach on the incidence of subsequent pregnancies.

  16. The Role of Datasets on Scientific Influence within Conflict Research

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed—such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971 where ideas didn’t persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped

  17. The Role of Datasets on Scientific Influence within Conflict Research.

    Directory of Open Access Journals (Sweden)

    Tracy Van Holt

    Full Text Available We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS over a 66-year period (1945-2011. We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA, a specialized social network analysis on this citation network (~1.5 million works, to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed-such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993. The critical path consisted of a number of key features: 1 Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2 Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3 We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography. Publically available conflict datasets developed early on helped

  18. The Role of Datasets on Scientific Influence within Conflict Research.

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed-such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971 where ideas didn't persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped shape the

  19. Report of Increasing Overdose Deaths that include Acetyl Fentanyl in Multiple Counties of the Southwestern Region of the Commonwealth of Pennsylvania in 2015-2016.

    Science.gov (United States)

    Dwyer, Jessica B; Janssen, Jennifer; Luckasevic, Todd M; Williams, Karl E

    2018-01-01

    Acetyl fentanyl is a Schedule I controlled synthetic opioid that is becoming an increasingly detected "designer drug." Routine drug screening procedures in local forensic toxicology laboratories identified a total of 41 overdose deaths associated with acetyl fentanyl within multiple counties of the southwestern region of the state of Pennsylvania. The range, median, mean, and standard deviation of blood acetyl fentanyl concentrations for these 41 cases were 0.13-2100 ng/mL, 11 ng/mL, 169.3 ng/mL, and 405.3 ng/mL, respectively. Thirty-six individuals (88%) had a confirmed history of substance abuse, and all but one case (96%) were ruled multiple drug toxicities. This report characterizes this localized trend of overdose deaths associated with acetyl fentanyl and provides further evidence supporting an alarmingly concentrated opiate and opioid epidemic of both traditional and novel drugs within this region of the United States. © 2017 American Academy of Forensic Sciences.

  20. The Harvard organic photovoltaic dataset.

    Science.gov (United States)

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  1. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  2. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods has resulted in increasing amount of genetic interaction data to be generated every day. Biological networks are used to store genetic interaction data gathered. Increasing amount of data available requires fast large scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  3. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continues to evolve as well. There are on the order of 120 site contacts and 60proposals have been approved to use thedata. These proposals involve around 120 researchers. The size and complexity of the dataset and collaboration has led to a new approach to providing access to the data and collaboration support and the support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new web site is based on a scientific data server which enables browsing of the data on-line, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  4. Fitting and Phenomenology in Type IA Supernova Cosmology: Generalized Likelihood Analyses for Multiple Evolving Populations and Observations of Near-Infrared Lightcurves Including Host Galaxy Properties

    Science.gov (United States)

    Ponder, Kara A.

    In the late 1990s, Type Ia supernovae (SNeIa) led to the discovery that the Universe is expanding at an accelerating rate due to dark energy. Since then, many different tracers of acceleration have been used to characterize dark energy, but the source of cosmic acceleration has remained a mystery. To better understand dark energy, future surveys such as the ground-based Large Synoptic Survey Telescope and the space-based Wide-Field Infrared Survey Telescope will collect thousands of SNeIa to use as a primary dark energy probe. These large surveys will be systematics limited, which makes it imperative for our insight regarding systematics to dramatically increase over the next decade for SNeIa to continue to contribute to precision cosmology. I approach this problem by improving statistical methods in the likelihood analysis and collecting near infrared (NIR) SNeIa with their host galaxies to improve the nearby data set and search for additional systematics. Using more statistically robust methods to account for systematics within the likelihood function can increase accuracy in cosmological parameters with a minimal precision loss. Though a sample of at least 10,000 SNeIa is necessary to confirm multiple populations of SNeIa, the bias in cosmology is ˜ 2 sigma with only 2,500 SNeIa. This work focused on an example systematic (host galaxy correlations), but it can be generalized for any systematic that can be represented by a distribution of multiple Gaussians. The SweetSpot survey gathered 114 low-redshift, NIR SNeIa that will act as a crucial anchor sample for the future high redshift surveys. NIR observations are not as affected by dust contamination, which may lead to increased understanding of systematics seen in optical wavelengths. We obtained spatially resolved spectra for 32 SweetSpot host galaxies to test for local host galaxy correlations. For the first time, we probe global host galaxy correlations with NIR brightnesses from the current literature

  5. EPR spectroscopy of MRI-related Gd(III) complexes: simultaneous analysis of multiple frequency and temperature spectra, including static and transient crystal field effects.

    Science.gov (United States)

    Rast, S; Borel, A; Helm, L; Belorizky, E; Fries, P H; Merbach, A E

    2001-03-21

    For the first time, a very general theoretical method is proposed to interpret the full electron paramagnetic resonance (EPR) spectra at multiple temperatures and frequencies in the important case of S-state metal ions complexed in liquid solution. This method is illustrated by a careful analysis of the measured spectra of two Gd3+ (S = 7/2) complexes. It is shown that the electronic relaxation mechanisms at the origin of the EPR line shape arise from the combined effects of the modulation of the static crystal field by the random Brownian rotation of the complex and of the transient zero-field splitting. A detailed study of the static crystal field mechanism shows that, contrarily to the usual global models involving only second-order terms, the fourth and sixth order terms can play a non-negligible role. The obtained parameters are well interpreted in the framework of the physics of the various underlying relaxation processes. A better understanding of these mechanisms is highly valuable since they partly control the efficiency of paramagnetic metal ions in contrast agents for medical magnetic resonance imaging (MRI).

  6. Challenges and Experiences of Building Multidisciplinary Datasets across Cultures

    Science.gov (United States)

    Jamiyansharav, K.; Laituri, M.; Fernandez-Gimenez, M.; Fassnacht, S. R.; Venable, N. B. H.; Allegretti, A. M.; Reid, R.; Baival, B.; Jamsranjav, C.; Ulambayar, T.; Linn, S.; Angerer, J.

    2017-12-01

    Efficient data sharing and management are key challenges to multidisciplinary scientific research. These challenges are further complicated by adding a multicultural component. We address the construction of a complex database for social-ecological analysis in Mongolia. Funded by the National Science Foundation (NSF) Dynamics of Coupled Natural and Human (CNH) Systems, the Mongolian Rangelands and Resilience (MOR2) project focuses on the vulnerability of Mongolian pastoral systems to climate change and adaptive capacity. The MOR2 study spans over three years of fieldwork in 36 paired districts (Soum) from 18 provinces (Aimag) of Mongolia that covers steppe, mountain forest steppe, desert steppe and eastern steppe ecological zones. Our project team is composed of hydrologists, social scientists, geographers, and ecologists. The MOR2 database includes multiple ecological, social, meteorological, geospatial and hydrological datasets, as well as archives of original data and survey in multiple formats. Managing this complex database requires significant organizational skills, attention to detail and ability to communicate within collective team members from diverse disciplines and across multiple institutions in the US and Mongolia. We describe the database's rich content, organization, structure and complexity. We discuss lessons learned, best practices and recommendations for complex database management, sharing, and archiving in creating a cross-cultural and multi-disciplinary database.

  7. Data assimilation and model evaluation experiment datasets

    Science.gov (United States)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need of data for the four phases of experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  8. SPATIO-TEMPORAL DATA MODEL FOR INTEGRATING EVOLVING NATION-LEVEL DATASETS

    Directory of Open Access Journals (Sweden)

    A. Sorokine

    2017-10-01

    Full Text Available Ability to easily combine the data from diverse sources in a single analytical workflow is one of the greatest promises of the Big Data technologies. However, such integration is often challenging as datasets originate from different vendors, governments, and research communities that results in multiple incompatibilities including data representations, formats, and semantics. Semantics differences are hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time is often complicated by the difference in how boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking histories of evolving geographic units (countries, provinces, etc. and their representations in disparate data. The model addresses the semantic challenge of preserving identity of geographic entities over time by defining criteria for the entity existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets. Proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. Practical implementation of our model is demonstrated using PostgreSQL object-relational database with the use of temporal, geospatial, and NoSQL database extensions.

  9. Spatio-Temporal Data Model for Integrating Evolving Nation-Level Datasets

    Science.gov (United States)

    Sorokine, A.; Stewart, R. N.

    2017-10-01

    Ability to easily combine the data from diverse sources in a single analytical workflow is one of the greatest promises of the Big Data technologies. However, such integration is often challenging as datasets originate from different vendors, governments, and research communities that results in multiple incompatibilities including data representations, formats, and semantics. Semantics differences are hardest to handle: different communities often use different attribute definitions and associate the records with different sets of evolving geographic entities. Analysis of global socioeconomic variables across multiple datasets over prolonged time is often complicated by the difference in how boundaries and histories of countries or other geographic entities are represented. Here we propose an event-based data model for depicting and tracking histories of evolving geographic units (countries, provinces, etc.) and their representations in disparate data. The model addresses the semantic challenge of preserving identity of geographic entities over time by defining criteria for the entity existence, a set of events that may affect its existence, and rules for mapping between different representations (datasets). Proposed model is used for maintaining an evolving compound database of global socioeconomic and environmental data harvested from multiple sources. Practical implementation of our model is demonstrated using PostgreSQL object-relational database with the use of temporal, geospatial, and NoSQL database extensions.

  10. Novel interstitial deletion of 10q24.3-25.1 associated with multiple congenital anomalies including lobar holoprosencephaly, cleft lip and palate, and hypoplastic kidneys.

    Science.gov (United States)

    Peltekova, Iskra T; Hurteau-Millar, Julie; Armour, Christine M

    2014-12-01

    Chromosome 10q deletions are rare and phenotypically diverse. Such deletions differ in length and occur in numerous regions on the long arm of chromosome 10, accounting for the wide clinical variability. Commonly reported findings include dysmorphic facial features, microcephaly, developmental delay, and genitourinary abnormalities. Here, we report on a female patient with a novel interstitial 5.54 Mb deletion at 10q24.31-q25.1. This patient had findings in common with a previously reported patient with an overlapping deletion, including renal anomalies and an orofacial cleft, but also demonstrated lobar holoprosencephaly and a Dandy-Walker malformation, features which have not been previously reported with 10q deletions. An analysis of the region deleted in our patient showed numerous genes, such as KAZALD1, PAX2, SEMA4G, ACTRA1, INA, and FGF8, whose putative functions may have played a role in the phenotype seen in our patient. © 2014 Wiley Periodicals, Inc.

  11. Epigenetic profiling of cutaneous T-cell lymphoma: promoter hypermethylation of multiple tumor suppressor genes including BCL7a, PTPRG, and p73.

    Science.gov (United States)

    van Doorn, Remco; Zoutman, Willem H; Dijkman, Remco; de Menezes, Renee X; Commandeur, Suzan; Mulder, Aat A; van der Velden, Pieter A; Vermeer, Maarten H; Willemze, Rein; Yan, Pearlly S; Huang, Tim H; Tensen, Cornelis P

    2005-06-10

    To analyze the occurrence of promoter hypermethylation in primary cutaneous T-cell lymphoma (CTCL) on a genome-wide scale, focusing on epigenetic alterations with pathogenetic significance. DNA isolated from biopsy specimens of 28 patients with CTCL, including aggressive CTCL entities (transformed mycosis fungoides and CD30-negative large T-cell lymphoma) and an indolent entity (CD30-positive large T-cell lymphoma), were investigated. For genome-wide DNA methylation screening, differential methylation hybridization using CpG island microarrays was applied, which allows simultaneous detection of the methylation status of 8640 CpG islands. Bisulfite sequence analysis was applied for confirmation and detection of hypermethylation of eight selected tumor suppressor genes. The DNA methylation patterns of CTCLs emerging from differential methylation hybridization analysis included 35 CpG islands hypermethylated in at least four of the 28 studied CTCL samples when compared with benign T-cell samples. Hypermethylation of the putative tumor suppressor genes BCL7a (in 48% of CTCL samples), PTPRG (27%), and thrombospondin 4 (52%) was confirmed and demonstrated to be associated with transcriptional downregulation. BCL7a was hypermethylated at a higher frequency in aggressive (64%) than in indolent (14%) CTCL entities. In addition, the promoters of the selected tumor suppressor genes p73 (48%), p16 (33%), CHFR (19%), p15 (10%), and TMS1 (10%) were hypermethylated in CTCL. Malignant T cells of patients with CTCL display widespread promoter hypermethylation associated with inactivation of several tumor suppressor genes involved in DNA repair, cell cycle, and apoptosis signaling pathways. In view of this, CTCL may be amenable to treatment with demethylating agents.

  12. Electrophysiological heterogeneity of pacemaker cells in the rabbit intercaval region, including the SA node: insights from recording multiple ion currents in each cell.

    Science.gov (United States)

    Monfredi, Oliver; Tsutsui, Kenta; Ziman, Bruce; Stern, Michael D; Lakatta, Edward G; Maltsev, Victor A

    2018-03-01

    Cardiac pacemaker cells, including cells of the sinoatrial node, are heterogeneous in size, morphology, and electrophysiological characteristics. The exact extent to which these cells differ electrophysiologically is unclear yet is critical to understanding their functioning. We examined major ionic currents in individual intercaval pacemaker cells (IPCs) sampled from the paracristal, intercaval region (including the sinoatrial node) that were spontaneously beating after enzymatic isolation from rabbit hearts. The beating rate was measured at baseline and after inhibition of the Ca 2+ pump with cyclopiazonic acid. Thereafter, in each cell, we consecutively measured the density of funny current ( I f ), delayed rectifier K + current ( I K ) (a surrogate of repolarization capacity), and L-type Ca 2+ current ( I Ca,L ) using whole cell patch clamp . The ionic current densities varied to a greater extent than previously appreciated, with some IPCs demonstrating very small or zero I f . The density of none of the currents was correlated with cell size, while I Ca,L and I f densities were related to baseline beating rates. I f density was correlated with I K density but not with that of I Ca,L . Inhibition of Ca 2+ cycling had a greater beating rate slowing effect in IPCs with lower I f densities. Our numerical model simulation indicated that 1) IPCs with small (or zero) I f or small I Ca,L can operate via a major contribution of Ca 2+ clock, 2) I f -Ca 2+ -clock interplay could be important for robust pacemaking function, and 3) coupled I f - I K function could regulate maximum diastolic potential. Thus, we have demonstrated marked electrophysiological heterogeneity of IPCs. This heterogeneity is manifested in basal beating rate and response to interference of Ca 2+ cycling, which is linked to I f . NEW & NOTEWORTHY In the present study, a hitherto unrecognized range of heterogeneity of ion currents in pacemaker cells from the intercaval region is demonstrated

  13. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  14. ATLAS File and Dataset Metadata Collection and Use

    CERN Document Server

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment including software release management, tracking of reconstructed event sizes and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example the total number of events contained in dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  15. A dataset on tail risk of commodities markets.

    Science.gov (United States)

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

    This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets"(Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices including World, Asia, and South East Asia for the period 2004-2015. Then, the dataset is divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient to examine the tail risk of different commodities as measured by Conditional Value at Risk (CVaR) as well as their changes over periods. The datasets can also be used to investigate the association between commodity markets and share markets.

  16. A de novo 1q22q23.1 Interstitial Microdeletion in a Girl with Intellectual Disability and Multiple Congenital Anomalies Including Congenital Heart Defect.

    Science.gov (United States)

    Aleksiūnienė, Beata; Preiksaitiene, Egle; Morkūnienė, Aušra; Ambrozaitytė, Laima; Utkus, Algirdas

    2018-01-01

    Many studies have shown that molecular karyotyping is an effective diagnostic tool in individuals with developmental delay/intellectual disability. We report on a de novo interstitial 1q22q23.1 microdeletion, 1.6 Mb in size, detected in a patient with short stature, microcephaly, hypoplastic corpus callosum, cleft palate, minor facial anomalies, congenital heart defect, camptodactyly of the 4-5th fingers, and intellectual disability. Chromosomal microarray analysis revealed a 1.6-Mb deletion in the 1q22q23.1 region, arr[GRCh37] 1q22q23.1(155630752_157193893)×1. Real-time PCR analysis confirmed its de novo origin. The deleted region encompasses 50 protein-coding genes, including the morbid genes APOA1BP, ARHGEF2, LAMTOR2, LMNA, NTRK1, PRCC, RIT1, SEMA4A, and YY1AP1. Although the unique phenotype observed in our patient can arise from the haploinsufficiency of the dosage-sensitive LMNA gene, the dosage imbalance of other genes implicated in the rearrangement could also contribute to the phenotype. Further studies are required for the delineation of the phenotype associated with this rare chromosomal alteration and elucidation of the critical genes for manifestation of the specific clinical features. © 2018 S. Karger AG, Basel.

  17. USGS National Boundary Dataset (NBD) Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS Governmental Unit Boundaries dataset from The National Map (TNM) represents major civil areas for the Nation, including States or Territories, counties (or...

  18. Multiple factors, including non-motor impairments, influence decision making with regard to exercise participation in Parkinson's disease: a qualitative enquiry.

    Science.gov (United States)

    O'Brien, Christine; Clemson, Lindy; Canning, Colleen G

    2016-01-01

    To explore how the meaning of exercise and other factors interact and influence the exercise behaviour of individuals with Parkinson's disease (PD) enrolled in a 6-month minimally supervised exercise program to prevent falls, regardless of whether they completed the prescribed exercise or not. This qualitative study utilised in-depth semi-structured interviews analysed using grounded theory methodology. Four main themes were constructed from the data: adapting to change and loss, the influence of others, making sense of the exercise experience and hope for a more active future. Participation in the PD-specific physiotherapy program involving group exercise provided an opportunity for participants to reframe their identity of their "active" self. Three new influences on exercise participation were identified and explored: non-motor impairments of apathy and fatigue, the belief in a finite energy quota, and the importance of feedback. A model was developed incorporating the themes and influences to explain decision-making for exercise participation in this group. Complex and interacting issues, including non-motor impairments, need to be considered in order to enhance the development and ongoing implementation of effective exercise programmes for people with PD. Exercise participation can assist individuals to reframe their identity as they are faced with losses associated with Parkinson's disease and ageing. Non-motor impairments of apathy and fatigue may influence exercise participation in people with Parkinson's disease. Particular attention needs to be paid to the provision of feedback in exercise programs for people with Parkinson's disease as it important for their decision-making about continuing exercise.

  19. The CMS dataset bookkeeping service

    Science.gov (United States)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  20. The CMS dataset bookkeeping service

    Energy Technology Data Exchange (ETDEWEB)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  1. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  2. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  3. Fusing Multiple Satellite Datasets Toward Defining and Understanding Organized Convection

    Science.gov (United States)

    Elsaesser, G.; Del Genio, A. D.

    2017-12-01

    How do we differentiate unorganized from organized convection? We might think of organized convection as being long lasting (at least longer than the lifetime of any individual cumulus cell), clustered at larger spatial scales (>100 km), and responsible for substantial rainfall accumulation. Organized convection is sustained on such scales due to the arrangement of moist/dry and buoyant/non-buoyant mesoscale circulations. The nature of these circulations is tied to system diabatic heating profiles; in particular, the 2nd baroclinic (top-heavy), stratiform heating mode is thought to be important for organized convection maintenance/propagation. We investigate the extent to which these characteristics are jointly found in propagating convective systems. Lifecycle information comes from hi-res IR data. Diabatic heating profiles, convective fractions and rainfall are provided by GPM retrievals mapped to convective system tracks. Moisture is provided by AIRS/AMSU and passive microwave retrievals. Instead of compositing heating profile information along a system track, where information is smoothed out, we sort system heating profile structures according to their "top heaviness" and then analyze PDFs of system rainfall, system sizes, durations, convective/stratiform ratios, etc. as a function of diabatic heating structure. Perhaps contrary to expectation, we find only small differences in PDFs of rainfall rates, system sizes, and system duration for different heating profile structures. If organization is defined according to heating structures, then one possible interpretation of these results is that organization is independent of system size, duration, and many times, even lifecycle stage. Is it possible that most systems "hobble" along and exhibit varying degrees of organization, dependent on local environment moisture/buoyancy variations, unlike the archetypical MCS paradigm? This presentation will also discuss the questions posed above within the context of parameterizing organized convection in the GISS GCM. GCMs must make/sustain the right heating profile at the right time, which requires observations-based understanding of such distinctions. Such knowledge is important for simulating and understanding the deep convective contribution to cloud feedback in a changing climate.

  4. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  5. Unsupervised multiple kernel learning for heterogeneous data integration.

    Science.gov (United States)

    Mariette, Jérôme; Villa-Vialaneix, Nathalie

    2018-03-15

    Recent high-throughput sequencing advances have expanded the breadth of available omics datasets and the integrated analysis of multiple datasets obtained on the same samples has allowed to gain important insights in a wide range of applications. However, the integration of various sources of information remains a challenge for systems biology since produced datasets are often of heterogeneous types, with the need of developing generic methods to take their different specificities into account. We propose a multiple kernel framework that allows to integrate multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets, collected during the TARA Oceans expedition, was explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA as well as to provide a new image of the sample structures when a larger number of datasets are included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA in regards with the original data. Second, the multi-omics breast cancer datasets, provided by The Cancer Genome Atlas, is analysed using a kernel Self-Organizing Maps with both single and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method to improve the representation of the studied biological system. Proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package and a tutorial describing the approach can be found on mixOmics web site http://mixomics.org/mixkernel/. jerome.mariette@inra.fr or nathalie.villa-vialaneix@inra.fr. Supplementary data are available at Bioinformatics online.

  6. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Full Text Available Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  7. Satellite-Based Precipitation Datasets

    Science.gov (United States)

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.

  8. Integrated remotely sensed datasets for disaster management

    Science.gov (United States)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, routecorridor infrastructures surveys using vehicle mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced, television standards such as North America's NTSC and European SECAM and PAL television systems that are then recorded using various video formats. This technology has recently being employed as a front-line, remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at National University of Ireland, Maynooth, (NUIM) is described. New improvements are proposed and include; low-cost encoders, easy to use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists collect, process and integrate these datasets within minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so, carry out rapid damage assessment during and post-disaster.

  9. A Research Graph dataset for connecting research data repositories using RD-Switchboard.

    Science.gov (United States)

    Aryani, Amir; Poblet, Marta; Unsworth, Kathryn; Wang, Jingbo; Evans, Ben; Devaraju, Anusuriya; Hausstein, Brigitte; Klas, Claus-Peter; Zapilko, Benjamin; Kaplun, Samuele

    2018-05-29

    This paper describes the open access graph dataset that shows the connections between Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures. The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim to discover and connect the related research datasets based on publication co-authorship or jointly funded grants. The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation. Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

  10. Being an honest broker of hydrology: Uncovering, communicating and addressing model error in a climate change streamflow dataset

    Science.gov (United States)

    Chegwidden, O.; Nijssen, B.; Pytlak, E.

    2017-12-01

    Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. We will describe how encountering earlier techniques' pitfalls allowed us

  11. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  12. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the Variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way of search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search provides have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.

  13. Toward computational cumulative biology by combining models of biological datasets.

    Science.gov (United States)

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations-for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.

  14. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Wernen

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as benchmark) as well as tedious preparatory work to generate sequence parameters required as input data for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that could be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. BFAB is available at http://mips.gsf.de/proj/bfab

  15. EEG datasets for motor imagery brain-computer interface.

    Science.gov (United States)

    Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

    2017-07-01

    Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.

  16. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set...... of equilibrium datasets is pathconnected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non collinear....

  17. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  18. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  19. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  20. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.......This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  1. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Science.gov (United States)

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

    This data paper describes a specimens' dataset of the Catalonian threatened vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that includes large autochthon plants diversity and 199 taxa with IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from 17 th century to nowadays. For each specimen, the species name, locality indication, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which evidence the paper of the botanical collections as an essential source of occurrence data.

  2. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

    Directory of Open Access Journals (Sweden)

    Mithun Biswas

    2017-06-01

    Full Text Available BanglaLekha-Isolated, a Bangla handwritten isolated character dataset is presented in this article. This dataset contains 84 different characters comprising of 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 1,66,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  3. VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

    Science.gov (United States)

    Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

    Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging and labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe that this VideoWeb Activities dataset is unique and it is one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.

  4. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    (c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass balance reporting contains information for about 125 glaciers with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc. related to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast evolving datasets from new technologies.

  5. Recent Development on the NOAA's Global Surface Temperature Dataset

    Science.gov (United States)

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

    Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). The NOAAGlobalTemp had recently been updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature or ERSST, from version 3b to version 4) which resulted in an increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.

  6. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    Full Text Available A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to i cover data from a broad range of application domains, ii be flexible and extensible to combine many different complex data structures, iii include metadata and semantic definitions, iv include inferred information, v identify the original data source for integrated entities and vi transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML or the generic approaches (RDF, OWL fulfil these requirements in a systematic way.

  7. Wehmas et al. 94-04 Toxicol Sci: Datasets for manuscript

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset includes overview text document (accepted version of manuscript) and tables, figures, and supplementary materials. Supplementary tables provide summary data...

  8. Wind and wave dataset for Matara, Sri Lanka

    Science.gov (United States)

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  9. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  10. Wind and wave dataset for Matara, Sri Lanka

    Directory of Open Access Journals (Sweden)

    Y. Luo

    2018-01-01

    Full Text Available We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive and as much information as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1 is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017 is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447.

  11. SIMADL: Simulated Activities of Daily Living Dataset

    Directory of Open Access Journals (Sweden)

    Talal Alshammari

    2018-04-01

    Full Text Available With the realisation of the Internet of Things (IoT paradigm, the analysis of the Activities of Daily Living (ADLs, in a smart home environment, is becoming an active research domain. The existence of representative datasets is a key requirement to advance the research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios, for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets for classification and anomaly detection. The datasets are generated using OpenSHS, (Open Smart Home Simulator, which is a simulation software for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days worth of activities. Forty-two files of classification of ADLs were simulated in the classification dataset and the other forty-two files are for anomaly detection problems in which anomalous patterns were simulated and injected into the anomaly detection dataset.

  12. Total ozone trends from 1979 to 2016 derived from five merged observational datasets - the emergence into ozone recovery

    Science.gov (United States)

    Weber, Mark; Coldewey-Egbers, Melanie; Fioletov, Vitali E.; Frith, Stacey M.; Wild, Jeannette D.; Burrows, John P.; Long, Craig S.; Loyola, Diego

    2018-02-01

    We report on updated trends using different merged datasets from satellite and ground-based observations for the period from 1979 to 2016. Trends were determined by applying a multiple linear regression (MLR) to annual mean zonal mean data. Merged datasets used here include NASA MOD v8.6 and National Oceanic and Atmospheric Administration (NOAA) merge v8.6, both based on data from the series of Solar Backscatter UltraViolet (SBUV) and SBUV-2 satellite instruments (1978-present) as well as the Global Ozone Monitoring Experiment (GOME)-type Total Ozone (GTO) and GOME-SCIAMACHY-GOME-2 (GSG) merged datasets (1995-present), mainly comprising satellite data from GOME, the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), and GOME-2A. The fifth dataset consists of the monthly mean zonal mean data from ground-based measurements collected at World Ozone and UV Data Center (WOUDC). The addition of four more years of data since the last World Meteorological Organization (WMO) ozone assessment (2013-2016) shows that for most datasets and regions the trends since the stratospheric halogen reached its maximum (˜ 1996 globally and ˜ 2000 in polar regions) are mostly not significantly different from zero. However, for some latitudes, in particular the Southern Hemisphere extratropics and Northern Hemisphere subtropics, several datasets show small positive trends of slightly below +1 % decade-1 that are barely statistically significant at the 2σ uncertainty level. In the tropics, only two datasets show significant trends of +0.5 to +0.8 % decade-1, while the others show near-zero trends. Positive trends since 2000 have been observed over Antarctica in September, but near-zero trends are found in October as well as in March over the Arctic. Uncertainties due to possible drifts between the datasets, from the merging procedure used to combine satellite datasets and related to the low sampling of ground-based data, are not accounted for in the trend

  13. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    Since more and more advertisements swarm into radios, it is necessary to establish an audio advertising dataset which could be used to analyze and classify the advertisement. A method of how to establish a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement's sample is given in *.wav file format, and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The classifying rationality of the advertisements in this dataset is proved by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for correlative audio advertisement experimental studies.

  14. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    Science.gov (United States)

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes is easy to join, extend, combine datasets and hence work collectively, but

  15. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Directory of Open Access Journals (Sweden)

    Spjuth Ola

    2010-06-01

    Full Text Available Abstract Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constrain collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusions regarding descriptors by defining them crisply. This makes is easy to join

  16. Visualization of conserved structures by fusing highly variable datasets.

    Science.gov (United States)

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three-dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (different scale and resolution) from the same person. The next step of development of our technique was aimed at accommodating the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used for reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusion 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2 respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1 and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual

  17. Use of country of birth as an indicator of refugee background in health datasets

    Science.gov (United States)

    2014-01-01

    Background Routine public health databases contain a wealth of data useful for research among vulnerable or isolated groups, who may be under-represented in traditional medical research. Identifying specific vulnerable populations, such as resettled refugees, can be particularly challenging; often country of birth is the sole indicator of whether an individual has a refugee background. The objective of this article was to review strengths and weaknesses of different methodological approaches to identifying resettled refugees and comparison groups from routine health datasets and to propose the application of additional methodological rigour in future research. Discussion Methodological approaches to selecting refugee and comparison groups from existing routine health datasets vary widely and are often explained in insufficient detail. Linked data systems or datasets from specialized refugee health services can accurately select resettled refugee and asylum seeker groups but have limited availability and can be selective. In contrast, country of birth is commonly collected in routine health datasets but a robust method for selecting humanitarian source countries based solely on this information is required. The authors recommend use of national immigration data to objectively identify countries of birth with high proportions of humanitarian entrants, matched by time period to the study dataset. When available, additional migration indicators may help to better understand migration as a health determinant. Methodologically, if multiple countries of birth are combined, the proportion of the sample represented by each country of birth should be included, with sub-analysis of individual countries of birth potentially providing further insights, if population size allows. United Nations-defined world regions provide an objective framework for combining countries of birth when necessary. A comparison group of economic migrants from the same world region may be appropriate

  18. Process mining in oncology using the MIMIC-III dataset

    Science.gov (United States)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option and this paper describes the potential to use MIMIC-III, for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.

  19. Level-1 muon trigger performance with the full 2017 dataset

    CERN Document Server

    CMS Collaboration

    2018-01-01

    This document describes the performance of the CMS Level-1 Muon Trigger with the full dataset of 2017. Efficiency plots are included for each track finder (TF) individually and for the system as a whole. The efficiency is measured to be greater than 90% for all track finders.

  20. Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

    Science.gov (United States)

    Avraam, Demetris; Wilson, Rebecca C; Burton, Paul

    2017-01-01

    Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSAPC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (co-variance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information.  In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.

  1. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Full Text Available Signaling via B cell receptors (BCR and Toll-like receptors (TLRs result in activation of B cells with distinct physiological outcomes, but transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input, RNA Pol II, H3K4me3, H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  2. Privacy preserving data anonymization of spontaneous ADE reporting system dataset.

    Science.gov (United States)

    Lin, Wen-Yang; Yang, Duen-Chuan; Wang, Jie-Teng

    2016-07-18

    To facilitate long-term safety surveillance of marketing drugs, many spontaneously reporting systems (SRSs) of ADR events have been established world-wide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification of individuals, it procures the issue of privacy preserving data publishing (PPDP), that is, how to sanitize (anonymize) raw data before publishing. Although much work has been done on PPDP, very few studies have focused on protecting privacy of SRS data and none of the anonymization methods is favorable for SRS datasets, due to which contain some characteristics such as rare events, multiple individual records, and multi-valued sensitive attributes. We propose a new privacy model called MS(k, θ (*) )-bounding for protecting published spontaneous ADE reporting data from privacy attacks. Our model has the flexibility of varying privacy thresholds, i.e., θ (*) , for different sensitive values and takes the characteristics of SRS data into consideration. We also propose an anonymization algorithm for sanitizing the raw data to meet the requirements specified through the proposed model. Our algorithm adopts a greedy-based clustering strategy to group the records into clusters, conforming to an innovative anonymization metric aiming to minimize the privacy risk as well as maintain the data utility for ADR detection. Empirical study was conducted using FAERS dataset from 2004Q1 to 2011Q4. We compared our model with four prevailing methods, including k-anonymity, (X, Y)-anonymity, Multi-sensitive l-diversity, and (α, k)-anonymity, evaluated via two measures, Danger Ratio (DR) and Information Loss (IL), and considered three different scenarios of threshold setting for θ (*) , including uniform setting, level-wise setting and frequency-based setting. We also conducted experiments to inspect the impact of anonymized data on the strengths of discovered ADR signals. With all three

  3. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  4. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  5. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  6. Harvard Aging Brain Study : Dataset and accessibility

    NARCIS (Netherlands)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  7. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  8. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  9. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  10. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  11. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  12. BASE MAP DATASET, SANTA CRIZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  13. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  14. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  15. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  16. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    International Nuclear Information System (INIS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climate Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed. (paper)

  17. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction Since July, the activities have been focused on very diverse subjects: operations activities for the 2012 data-taking, Monte Carlo production and data re-processing plans for 2013 conferences (winter and summer), preparation for the Upgrades TDRs and readiness after LS1. The regular operations activities have included: changing to the 53X release at the Tier-0, regular calibrations updates, and data certification to guarantee certified data for analysis with the shortest delay from data taking. The samples, simulated at 8 TeV, have been re-reconstructed using 53X. A lot of effort has been put in their prioritisation to ensure that the samples needed for HCP and future conferences are produced on time. Given the large amount of data that have been collected in 2012 and the available computing resources, a careful planning is needed. The PPD and Physics groups worked on a master schedule for the Monte Carlo production, new conditions validation and data reprocessing. The ...

  18. A reanalysis dataset of the South China Sea

    Science.gov (United States)

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803

  19. A de novo 1.58 Mb deletion, including MAP2K6 and mapping 1.28 Mb upstream to SOX9, identified in a patient with Pierre Robin sequence and osteopenia with multiple fractures.

    Science.gov (United States)

    Smyk, Marta; Roeder, Elizabeth; Cheung, Sau Wai; Szafranski, Przemyslaw; Stankiewicz, Paweł

    2015-08-01

    Defects of long-range regulatory elements of dosage-sensitive genes represent an under-recognized mechanism underlying genetic diseases. Haploinsufficiency of SOX9, the gene essential for development of testes and differentiation of chondrocytes, results in campomelic dysplasia, a skeletal malformation syndrome often associated with sex reversal. Chromosomal rearrangements with breakpoints mapping up to 1.6 Mb up- and downstream to SOX9, and disrupting its distant cis-regulatory elements, have been described in patients with milder forms of campomelic dysplasia, Pierre Robin sequence, and sex reversal. We present an ∼1.58 Mb deletion mapping ∼1.28 Mb upstream to SOX9 that encompasses its putative long-range cis-regulatory element(s) and MAP2K6 in a patient with Pierre Robin sequence and osteopenia with multiple fractures. Low bone mass panel testing using massively parallel sequencing of 23 nuclear genes, including COL1A1 and COL1A2 was negative. Based on the previous mouse model of Map2k6, suggesting that Sox9 is likely a downstream target of the p38 MAPK pathway, and our previous chromosome conformation capture-on-chip (4C) data showing potential interactions between SOX9 promoter and MAP2K6, we hypothesize that deletion of MAP2K6 might have affected SOX9 expression and contributed to our patient's phenotype. © 2015 Wiley Periodicals, Inc.

  20. Problemas multiplicativos envolvendo combinatória: estratégias de resolução empregadas por alunos do Ensino Fundamental público Multiplicative problems including combinatorics: solving strategies adopted by Public Elementary School students

    Directory of Open Access Journals (Sweden)

    Leny R. M. Teixeira

    2011-01-01

    was better in problems with two variables and factors with low values. There was no alteration in the performance among the 6th and 9th graders. In general, the difficulties found were related to: 1 intuitive models students have when dealing with multiplication (especially the one including repeated addition; 2 the semantic structure of the problem; 3 numerical preferences regarding the quantity of numerical digits, ways of representing the problem and interpretation of verbal problem statements. Because multiplication is a very complex operation involving abstract cognitive processes in its solution, we believe that the teacher needs to know them to facilitate students' learning.

  1. A Dataset for Visual Navigation with Neuromorphic Methods

    Directory of Open Access Journals (Sweden)

    Francisco eBarranco

    2016-02-01

    Full Text Available Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

  2. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    Science.gov (United States)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas particularly in the southwestern United States. Wide coverage of spatial and temporal measurements of precipitation are key for regional water budget analysis and hydrological operations which themselves are valuable tool for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation estimates datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.

  3. Resolution testing and limitations of geodetic and tsunami datasets for finite fault inversions along subduction zones

    Science.gov (United States)

    Williamson, A.; Newman, A. V.

    2017-12-01

    Finite fault inversions utilizing multiple datasets have become commonplace for large earthquakes pending data availability. The mixture of geodetic datasets such as Global Navigational Satellite Systems (GNSS) and InSAR, seismic waveforms, and when applicable, tsunami waveforms from Deep-Ocean Assessment and Reporting of Tsunami (DART) gauges, provide slightly different observations that when incorporated together lead to a more robust model of fault slip distribution. The merging of different datasets is of particular importance along subduction zones where direct observations of seafloor deformation over the rupture area are extremely limited. Instead, instrumentation measures related ground motion from tens to hundreds of kilometers away. The distance from the event and dataset type can lead to a variable degree of resolution, affecting the ability to accurately model the spatial distribution of slip. This study analyzes the spatial resolution attained individually from geodetic and tsunami datasets as well as in a combined dataset. We constrain the importance of distance between estimated parameters and observed data and how that varies between land-based and open ocean datasets. Analysis focuses on accurately scaled subduction zone synthetic models as well as analysis of the relationship between slip and data in recent large subduction zone earthquakes. This study shows that seafloor deformation sensitive datasets, like open-ocean tsunami waveforms or seafloor geodetic instrumentation, can provide unique offshore resolution for understanding most large and particularly tsunamigenic megathrust earthquake activity. In most environments, we simply lack the capability to resolve static displacements using land-based geodetic observations.

  4. Data-Driven Decision Support for Radiologists: Re-using the National Lung Screening Trial Dataset for Pulmonary Nodule Management

    OpenAIRE

    Morrison, James J.; Hostetter, Jason; Wang, Kenneth; Siegel, Eliot L.

    2014-01-01

    Real-time mining of large research trial datasets enables development of case-based clinical decision support tools. Several applicable research datasets exist including the National Lung Screening Trial (NLST), a dataset unparalleled in size and scope for studying population-based lung cancer screening. Using these data, a clinical decision support tool was developed which matches patient demographics and lung nodule characteristics to a cohort of similar patients. The NLST dataset was conve...

  5. The new Planetary Science Archive: A tool for exploration and discovery of scientific datasets from ESA's planetary missions

    Science.gov (United States)

    Heather, David

    2016-07-01

    Introduction: The Planetary Science Archive (PSA) is the European Space Agency's (ESA) repository of science data from all planetary science and exploration missions. The PSA provides access to scientific datasets through various interfaces (e.g. FTP browser, Map based, Advanced search, and Machine interface): http://archives.esac.esa.int/psa All datasets are scientifically peer-reviewed by independent scientists, and are compliant with the Planetary Data System (PDS) standards. Updating the PSA: The PSA is currently implementing a number of significant changes, both to its web-based interface to the scientific community, and to its database structure. The new PSA will be up-to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's upcoming ExoMars and BepiColombo missions. The newly designed PSA homepage will provide direct access to scientific datasets via a text search for targets or missions. This will significantly reduce the complexity for users to find their data and will promote one-click access to the datasets. Additionally, the homepage will provide direct access to advanced views and searches of the datasets. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. A login mechanism will provide additional functionalities to the users to aid / ease their searches (e.g. saving queries, managing default views). Queries to the PSA database will be possible either via the homepage (for simple searches of missions or targets), or through a filter menu for more tailored queries. The filter menu will offer multiple options to search for a particular dataset or product, and will manage queries for both in-situ and remote sensing instruments. Parameters such as start-time, phase angle, and heliocentric distance will be emphasized. A further

  6. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.

  7. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for higher hierarchical classification of the organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset is chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. The quick response codes were generated. AT/GC content of the DNA sequences analysis was carried out. The QR is helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage code and enzyme code studied under the restriction digestion study, which helpful for performing studies using short DNA sequences was reported. The dataset disclosed here is the new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends

  8. A New Dataset Size Reduction Approach for PCA-Based Classification in OCR Application

    Directory of Open Access Journals (Sweden)

    Mohammad Amin Shayegan

    2014-01-01

    Full Text Available A major problem of pattern recognition systems is due to the large volume of training datasets including duplicate and similar training samples. In order to overcome this problem, some dataset size reduction and also dimensionality reduction techniques have been introduced. The algorithms presently used for dataset size reduction usually remove samples near to the centers of classes or support vector samples between different classes. However, the samples near to a class center include valuable information about the class characteristics and the support vector is important for evaluating system efficiency. This paper reports on the use of Modified Frequency Diagram technique for dataset size reduction. In this new proposed technique, a training dataset is rearranged and then sieved. The sieved training dataset along with automatic feature extraction/selection operation using Principal Component Analysis is used in an OCR application. The experimental results obtained when using the proposed system on one of the biggest handwritten Farsi/Arabic numeral standard OCR datasets, Hoda, show about 97% accuracy in the recognition rate. The recognition speed increased by 2.28 times, while the accuracy decreased only by 0.7%, when a sieved version of the dataset, which is only as half as the size of the initial training dataset, was used.

  9. Harvard Aging Brain Study: Dataset and accessibility.

    Science.gov (United States)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Long-term dataset on aquatic responses to concurrent climate change and recovery from acidification

    Science.gov (United States)

    Leach, Taylor H.; Winslow, Luke A.; Acker, Frank W.; Bloomfield, Jay A.; Boylen, Charles W.; Bukaveckas, Paul A.; Charles, Donald F.; Daniels, Robert A.; Driscoll, Charles T.; Eichler, Lawrence W.; Farrell, Jeremy L.; Funk, Clara S.; Goodrich, Christine A.; Michelena, Toby M.; Nierzwicki-Bauer, Sandra A.; Roy, Karen M.; Shaw, William H.; Sutherland, James W.; Swinton, Mark W.; Winkler, David A.; Rose, Kevin C.

    2018-04-01

    Concurrent regional and global environmental changes are affecting freshwater ecosystems. Decadal-scale data on lake ecosystems that can describe processes affected by these changes are important as multiple stressors often interact to alter the trajectory of key ecological phenomena in complex ways. Due to the practical challenges associated with long-term data collections, the majority of existing long-term data sets focus on only a small number of lakes or few response variables. Here we present physical, chemical, and biological data from 28 lakes in the Adirondack Mountains of northern New York State. These data span the period from 1994-2012 and harmonize multiple open and as-yet unpublished data sources. The dataset creation is reproducible and transparent; R code and all original files used to create the dataset are provided in an appendix. This dataset will be useful for examining ecological change in lakes undergoing multiple stressors.

  11. A new dataset validation system for the Planetary Science Archive

    Science.gov (United States)

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

    The Planetary Science Archive is the official archive for the Mars Express mission. It has received its first data by the end of 2004. These data are delivered by the PI teams to the PSA team as datasets, which are formatted conform to the Planetary Data System (PDS). The PI teams are responsible for analyzing and calibrating the instrument data as well as the production of reduced and calibrated data. They are also responsible of the scientific validation of these data. ESA is responsible of the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality. To do so, an archive peer-review is used to control the quality of the Mars Express science data archiving process. However a full validation of its content is missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data should be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall allow to track anomalies in and to control the completeness of datasets. It shall ensure that the PSA end-users: (1) can rely on the result of their queries, (2) will get data products that are suitable for scientific analysis, (3) can find all science data acquired during a mission. We defined dataset validation as the verification and assessment process to check the dataset content against pre-defined top-level criteria, which represent the general characteristics of good quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that

  12. Reconstructing missing information on precipitation datasets: impact of tails on adopted statistical distributions.

    Science.gov (United States)

    Pedretti, Daniele; Beckie, Roger Daniel

    2014-05-01

    Missing data in hydrological time-series databases are ubiquitous in practical applications, yet it is of fundamental importance to make educated decisions in problems involving exhaustive time-series knowledge. This includes precipitation datasets, since recording or human failures can produce gaps in these time series. For some applications, directly involving the ratio between precipitation and some other quantity, lack of complete information can result in poor understanding of basic physical and chemical dynamics involving precipitated water. For instance, the ratio between precipitation (recharge) and outflow rates at a discharge point of an aquifer (e.g. rivers, pumping wells, lysimeters) can be used to obtain aquifer parameters and thus to constrain model-based predictions. We tested a suite of methodologies to reconstruct missing information in rainfall datasets. The goal was to obtain a suitable and versatile method to reduce the errors given by the lack of data in specific time windows. Our analyses included both a classical chronologically-pairing approach between rainfall stations and a probability-based approached, which accounted for the probability of exceedence of rain depths measured at two or multiple stations. Our analyses proved that it is not clear a priori which method delivers the best methodology. Rather, this selection should be based considering the specific statistical properties of the rainfall dataset. In this presentation, our emphasis is to discuss the effects of a few typical parametric distributions used to model the behavior of rainfall. Specifically, we analyzed the role of distributional "tails", which have an important control on the occurrence of extreme rainfall events. The latter strongly affect several hydrological applications, including recharge-discharge relationships. The heavy-tailed distributions we considered were parametric Log-Normal, Generalized Pareto, Generalized Extreme and Gamma distributions. The methods were

  13. Application of Density Estimation Methods to Datasets from a Glider

    Science.gov (United States)

    2014-09-30

    humpback and sperm whales as well as different dolphin species. OBJECTIVES The objective of this research is to extend existing methods for cetacean...collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources...estimation from single sensor datasets. Required steps for a cue counting approach, where a cue has been defined as a clicking event (Küsel et al., 2011), to

  14. Dataset on records of Hericium erinaceus in Slovakia

    OpenAIRE

    Vladimír Kunca; Marek Čiliak

    2017-01-01

    The data presented in this article are related to the research article entitled ?Habitat preferences of Hericium erinaceus in Slovakia? (Kunca and ?iliak, 2016) [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status,...

  15. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    textabstractWe present an approach for analyzing market shares and products price elasticities based on large datasets containing aggregate sales data for many products, several markets and for relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  16. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  17. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new data sets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding' an increasing amount of data are being released and utilized by the public. Data sets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS-funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nation's ECS effort through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  18. Multiple plots in R

    DEFF Research Database (Denmark)

    Edwards, Stefan McKinnon

    2012-01-01

    In this chapter I will investigate how to combine multiple plots into a single. The scenario is a dataset of a series of measurements, on three samples in three situations. There are many ways we can display this, e.g. 3d graphs or faceting. 3d graphs are not good for displaying static data so we...

  19. Vector Nonlinear Time-Series Analysis of Gamma-Ray Burst Datasets on Heterogeneous Clusters

    Directory of Open Access Journals (Sweden)

    Ioana Banicescu

    2005-01-01

    Full Text Available The simultaneous analysis of a number of related datasets using a single statistical model is an important problem in statistical computing. A parameterized statistical model is to be fitted on multiple datasets and tested for goodness of fit within a fixed analytical framework. Definitive conclusions are hopefully achieved by analyzing the datasets together. This paper proposes a strategy for the efficient execution of this type of analysis on heterogeneous clusters. Based on partitioning processors into groups for efficient communications and a dynamic loop scheduling approach for load balancing, the strategy addresses the variability of the computational loads of the datasets, as well as the unpredictable irregularities of the cluster environment. Results from preliminary tests of using this strategy to fit gamma-ray burst time profiles with vector functional coefficient autoregressive models on 64 processors of a general purpose Linux cluster demonstrate the effectiveness of the strategy.

  20. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings...... with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  1. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of data of multimodal and multidimensional scientific datasets output such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  2. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal......Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer...... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...

  3. A hybrid organic-inorganic perovskite dataset

    Science.gov (United States)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  4. Dataset concerning the analytical approximation of the Ae3 temperature

    Directory of Open Access Journals (Sweden)

    B.L. Ennis

    2017-02-01

    The dataset includes the terms of the function and the values for the polynomial coefficients for major alloying elements in steel. A short description of the approximation method used to derive and validate the coefficients has also been included. For discussion and application of this model, please refer to the full length article entitled “The role of aluminium in chemical and phase segregation in a TRIP-assisted dual phase steel” 10.1016/j.actamat.2016.05.046 (Ennis et al., 2016 [1].

  5. Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines

    NARCIS (Netherlands)

    Ellrott, Kyle; Bailey, Matthew H.; Saksena, Gordon; Covington, Kyle R.; Kandoth, Cyriac; Stewart, Chip; Hess, Julian; Ma, Singer; Chiotti, Kami E.; McLellan, Michael; Sofia, Heidi J.; Hutter, Carolyn M.; Getz, Gad; Wheeler, David A.; Ding, Li; Caesar-Johnson, Samantha J.; Demchok, John A.; Felau, Ina; Kasapi, Melpomeni; Ferguson, Martin L.; Hutter, Carolyn M.; Sofia, Heidi J.; Tarnuzzer, Roy; Wang, Zhining; Yang, Liming; Zenklusen, Jean C.; Zhang, Jiashan (Julia); Chudamani, Sudha; Liu, Jia; Lolla, Laxmi; Naresh, Rashi; Pihl, Todd; Sun, Qiang; Wan, Yunhu; Wu, Ye; Cho, Juok; DeFreitas, Timothy; Frazer, Scott; Gehlenborg, Nils; Getz, Gad; Heiman, David I.; Kim, Jaegil; Lawrence, Michael S.; Lin, Pei; Meier, Sam; Noble, Michael S.; Saksena, Gordon; Voet, Doug; Zhang, Hailei; Bernard, Brady; Chambwe, Nyasha; Dhankani, Varsha; Knijnenburg, Theo; Kramer, Roger; Leinonen, Kalle; Liu, Yuexin; Miller, Michael; Reynolds, Sheila; Shmulevich, Ilya; Thorsson, Vesteinn; Zhang, Wei; Akbani, Rehan; Broom, Bradley M.; Hegde, Apurva M.; Ju, Zhenlin; Kanchi, Rupa S.; Korkut, Anil; Li, Jun; Liang, Han; Ling, Shiyun; Liu, Wenbin; Lu, Yiling; Mills, Gordon B.; Ng, Kwok Shing; Rao, Arvind; Ryan, Michael; Wang, Jing; Weinstein, John N.; Zhang, Jiexin; Abeshouse, Adam; Armenia, Joshua; Chakravarty, Debyani; Chatila, Walid K.; de Bruijn, Ino; Gao, Jianjiong; Gross, Benjamin E.; Heins, Zachary J.; Kundra, Ritika; La, Konnor; Ladanyi, Marc; Luna, Augustin; Nissan, Moriah G.; Ochoa, Angelica; Phillips, Sarah M.; Reznik, Ed; Sanchez-Vega, Francisco; Sander, Chris; Schultz, Nikolaus; Sheridan, Robert; Sumer, S. Onur; Sun, Yichao; Taylor, Barry S.; Wang, Jioajiao; Zhang, Hongxin; Anur, Pavana; Peto, Myron; Spellman, Paul; Benz, Christopher; Stuart, Joshua M.; Wong, Christopher K.; Yau, Christina; Hayes, D. Neil; Wilkerson, Matthew D.; Ally, Adrian; Balasundaram, Miruna; Bowlby, Reanne; Brooks, Denise; Carlsen, Rebecca; Chuah, Eric; Dhalla, Noreen; Holt, Robert; Jones, Steven J.M.; Kasaian, Katayoon; Lee, Darlene; Ma, Yussanne; Marra, Marco A.; Mayo, Michael; Moore, Richard A.; Mungall, Andrew J.; Mungall, Karen; Robertson, A. Gordon; Sadeghi, Sara; Schein, Jacqueline E.; Sipahimalani, Payal; Tam, Angela; Thiessen, Nina; Tse, Kane; Wong, Tina; Berger, Ashton C.; Beroukhim, Rameen; Cherniack, Andrew D.; Cibulskis, Carrie; Gabriel, Stacey B.; Gao, Galen F.; Ha, Gavin; Meyerson, Matthew; Schumacher, Steven E.; Shih, Juliann; Kucherlapati, Melanie H.; Kucherlapati, Raju S.; Baylin, Stephen; Cope, Leslie; Danilova, Ludmila; Bootwalla, Moiz S.; Lai, Phillip H.; Maglinte, Dennis T.; Van Den Berg, David J.; Weisenberger, Daniel J.; Auman, J. Todd; Balu, Saianand; Bodenheimer, Tom; Fan, Cheng; Hoadley, Katherine A.; Hoyle, Alan P.; Jefferys, Stuart R.; Jones, Corbin D.; Meng, Shaowu; Mieczkowski, Piotr A.; Mose, Lisle E.; Perou, Amy H.; Perou, Charles M.; Roach, Jeffrey; Shi, Yan; Simons, Janae V.; Skelly, Tara; Soloway, Matthew G.; Tan, Donghui; Veluvolu, Umadevi; Fan, Huihui; Hinoue, Toshinori; Laird, Peter W.; Shen, Hui; Zhou, Wanding; Bellair, Michelle; Chang, Kyle; Covington, Kyle; Creighton, Chad J.; Dinh, Huyen; Doddapaneni, Harsha Vardhan; Donehower, Lawrence A.; Drummond, Jennifer; Gibbs, Richard A.; Glenn, Robert; Hale, Walker; Han, Yi; Hu, Jianhong; Korchina, Viktoriya; Lee, Sandra; Lewis, Lora; Li, Wei; Liu, Xiuping; Morgan, Margaret; Morton, Donna; Muzny, Donna; Santibanez, Jireh; Sheth, Margi; Shinbrot, Eve; Wang, Linghua; Wang, Min; Wheeler, David A.; Xi, Liu; Zhao, Fengmei; Hess, Julian; Appelbaum, Elizabeth L.; Bailey, Matthew; Cordes, Matthew G.; Ding, Li; Fronick, Catrina C.; Fulton, Lucinda A.; Fulton, Robert S.; Kandoth, Cyriac; Mardis, Elaine R.; McLellan, Michael D.; Miller, Christopher A.; Schmidt, Heather K.; Wilson, Richard K.; Crain, Daniel; Curley, Erin; Gardner, Johanna; Lau, Kevin; Mallery, David; Morris, Scott; Paulauskis, Joseph; Penny, Robert; Shelton, Candace; Shelton, Troy; Sherman, Mark; Thompson, Eric; Yena, Peggy; Bowen, Jay; Gastier-Foster, Julie M.; Gerken, Mark; Leraas, Kristen M.; Lichtenberg, Tara M.; Ramirez, Nilsa C.; Wise, Lisa; Zmuda, Erik; Corcoran, Niall; Costello, Tony; Hovens, Christopher; Carvalho, Andre L.; de Carvalho, Ana C.; Fregnani, José H.; Longatto-Filho, Adhemar; Reis, Rui M.; Scapulatempo-Neto, Cristovam; Silveira, Henrique C.S.; Vidal, Daniel O.; Burnette, Andrew; Eschbacher, Jennifer; Hermes, Beth; Noss, Ardene; Singh, Rosy; Anderson, Matthew L.; Castro, Patricia D.; Ittmann, Michael; Huntsman, David; Kohl, Bernard; Le, Xuan; Thorp, Richard; Andry, Chris; Duffy, Elizabeth R.; Lyadov, Vladimir; Paklina, Oxana; Setdikova, Galiya; Shabunin, Alexey; Tavobilov, Mikhail; McPherson, Christopher; Warnick, Ronald; Berkowitz, Ross; Cramer, Daniel; Feltmate, Colleen; Horowitz, Neil; Kibel, Adam; Muto, Michael; Raut, Chandrajit P.; Malykh, Andrei; Barnholtz-Sloan, Jill S.; Barrett, Wendi; Devine, Karen; Fulop, Jordonna; Ostrom, Quinn T.; Shimmel, Kristen; Wolinsky, Yingli; Sloan, Andrew E.; De Rose, Agostino; Giuliante, Felice; Goodman, Marc; Karlan, Beth Y.; Hagedorn, Curt H.; Eckman, John; Harr, Jodi; Myers, Jerome; Tucker, Kelinda; Zach, Leigh Anne; Deyarmin, Brenda; Hu, Hai; Kvecher, Leonid; Larson, Caroline; Mural, Richard J.; Somiari, Stella; Vicha, Ales; Zelinka, Tomas; Bennett, Joseph; Iacocca, Mary; Rabeno, Brenda; Swanson, Patricia; Latour, Mathieu; Lacombe, Louis; Têtu, Bernard; Bergeron, Alain; McGraw, Mary; Staugaitis, Susan M.; Chabot, John; Hibshoosh, Hanina; Sepulveda, Antonia; Su, Tao; Wang, Timothy; Potapova, Olga; Voronina, Olga; Desjardins, Laurence; Mariani, Odette; Roman-Roman, Sergio; Sastre, Xavier; Stern, Marc Henri; Cheng, Feixiong; Signoretti, Sabina; Berchuck, Andrew; Bigner, Darell; Lipp, Eric; Marks, Jeffrey; McCall, Shannon; McLendon, Roger; Secord, Angeles; Sharp, Alexis; Behera, Madhusmita; Brat, Daniel J.; Chen, Amy; Delman, Keith; Force, Seth; Khuri, Fadlo; Magliocca, Kelly; Maithel, Shishir; Olson, Jeffrey J.; Owonikoko, Taofeek; Pickens, Alan; Ramalingam, Suresh; Shin, Dong M.; Sica, Gabriel; Van Meir, Erwin G.; Zhang, Hongzheng; Eijckenboom, Wil; Gillis, Ad; Korpershoek, Esther; Looijenga, Leendert; Oosterhuis, Wolter; Stoop, Hans; van Kessel, Kim E.; Zwarthoff, Ellen C.; Calatozzolo, Chiara; Cuppini, Lucia; Cuzzubbo, Stefania; DiMeco, Francesco; Finocchiaro, Gaetano; Mattei, Luca; Perin, Alessandro; Pollo, Bianca; Chen, Chu; Houck, John; Lohavanichbutr, Pawadee; Hartmann, Arndt; Stoehr, Christine; Stoehr, Robert; Taubert, Helge; Wach, Sven; Wullich, Bernd; Kycler, Witold; Murawa, Dawid; Wiznerowicz, Maciej; Chung, Ki; Edenfield, W. Jeffrey; Martin, Julie; Baudin, Eric; Bubley, Glenn; Bueno, Raphael; De Rienzo, Assunta; Richards, William G.; Kalkanis, Steven; Mikkelsen, Tom; Noushmehr, Houtan; Scarpace, Lisa; Girard, Nicolas; Aymerich, Marta; Campo, Elias; Giné, Eva; Guillermo, Armando López; Van Bang, Nguyen; Hanh, Phan Thi; Phu, Bui Duc; Tang, Yufang; Colman, Howard; Evason, Kimberley; Dottino, Peter R.; Martignetti, John A.; Gabra, Hani; Juhl, Hartmut; Akeredolu, Teniola; Stepa, Serghei; Hoon, Dave; Ahn, Keunsoo; Kang, Koo Jeong; Beuschlein, Felix; Breggia, Anne; Birrer, Michael; Bell, Debra; Borad, Mitesh; Bryce, Alan H.; Castle, Erik; Chandan, Vishal; Cheville, John; Copland, John A.; Farnell, Michael; Flotte, Thomas; Giama, Nasra; Ho, Thai; Kendrick, Michael; Kocher, Jean Pierre; Kopp, Karla; Moser, Catherine; Nagorney, David; O'Brien, Daniel; O'Neill, Brian Patrick; Patel, Tushar; Petersen, Gloria; Que, Florencia; Rivera, Michael; Roberts, Lewis; Smallridge, Robert; Smyrk, Thomas; Stanton, Melissa; Thompson, R. Houston; Torbenson, Michael; Yang, Ju Dong; Zhang, Lizhi; Brimo, Fadi; Ajani, Jaffer A.; Angulo Gonzalez, Ana Maria; Behrens, Carmen; Bondaruk, Jolanta; Broaddus, Russell; Czerniak, Bogdan; Esmaeli, Bita; Fujimoto, Junya; Gershenwald, Jeffrey; Guo, Charles; Lazar, Alexander J.; Logothetis, Christopher; Meric-Bernstam, Funda; Moran, Cesar; Ramondetta, Lois; Rice, David; Sood, Anil; Tamboli, Pheroze; Thompson, Timothy; Troncoso, Patricia; Tsao, Anne; Wistuba, Ignacio; Carter, Candace; Haydu, Lauren; Hersey, Peter; Jakrot, Valerie; Kakavand, Hojabr; Kefford, Richard; Lee, Kenneth; Long, Georgina; Mann, Graham; Quinn, Michael; Saw, Robyn; Scolyer, Richard; Shannon, Kerwin; Spillane, Andrew; Stretch, Jonathan; Synott, Maria; Thompson, John; Wilmott, James; Al-Ahmadie, Hikmat; Chan, Timothy A.; Ghossein, Ronald; Gopalan, Anuradha; Levine, Douglas A.; Reuter, Victor; Singer, Samuel; Singh, Bhuvanesh; Tien, Nguyen Viet; Broudy, Thomas; Mirsaidi, Cyrus; Nair, Praveen; Drwiega, Paul; Miller, Judy; Smith, Jennifer; Zaren, Howard; Park, Joong Won; Hung, Nguyen Phi; Kebebew, Electron; Linehan, W. Marston; Metwalli, Adam R.; Pacak, Karel; Pinto, Peter A.; Schiffman, Mark; Schmidt, Laura S.; Vocke, Cathy D.; Wentzensen, Nicolas; Worrell, Robert; Yang, Hannah; Moncrieff, Marc; Goparaju, Chandra; Melamed, Jonathan; Pass, Harvey; Botnariuc, Natalia; Caraman, Irina; Cernat, Mircea; Chemencedji, Inga; Clipca, Adrian; Doruc, Serghei; Gorincioi, Ghenadie; Mura, Sergiu; Pirtac, Maria; Stancul, Irina; Tcaciuc, Diana; Albert, Monique; Alexopoulou, Iakovina; Arnaout, Angel; Bartlett, John; Engel, Jay; Gilbert, Sebastien; Parfitt, Jeremy; Sekhon, Harman; Thomas, George; Rassl, Doris M.; Rintoul, Robert C.; Bifulco, Carlo; Tamakawa, Raina; Urba, Walter; Hayward, Nicholas; Timmers, Henri; Antenucci, Anna; Facciolo, Francesco; Grazi, Gianluca; Marino, Mirella; Merola, Roberta; de Krijger, Ronald; Gimenez-Roqueplo, Anne Paule; Piché, Alain; Chevalier, Simone; McKercher, Ginette; Birsoy, Kivanc; Barnett, Gene; Brewer, Cathy; Farver, Carol; Naska, Theresa; Pennell, Nathan A.; Raymond, Daniel; Schilero, Cathy; Smolenski, Kathy; Williams, Felicia; Morrison, Carl; Borgia, Jeffrey A.; Liptay, Michael J.; Pool, Mark; Seder, Christopher W.; Junker, Kerstin; Omberg, Larsson; Dinkin, Mikhail; Manikhas, George; Alvaro, Domenico; Bragazzi, Maria Consiglia; Cardinale, Vincenzo; Carpino, Guido; Gaudio, Eugenio; Chesla, David; Cottingham, Sandra; Dubina, Michael; Moiseenko, Fedor; Dhanasekaran, Renumathy; Becker, Karl Friedrich; Janssen, Klaus Peter; Slotta-Huspenina, Julia; Abdel-Rahman, Mohamed H.; Aziz, Dina; Bell, Sue; Cebulla, Colleen M.; Davis, Amy; Duell, Rebecca; Elder, J. Bradley; Hilty, Joe; Kumar, Bahavna; Lang, James; Lehman, Norman L.; Mandt, Randy; Nguyen, Phuong; Pilarski, Robert; Rai, Karan; Schoenfield, Lynn; Senecal, Kelly; Wakely, Paul; Hansen, Paul; Lechan, Ronald; Powers, James; Tischler, Arthur; Grizzle, William E.; Sexton, Katherine C.; Kastl, Alison; Henderson, Joel; Porten, Sima; Waldmann, Jens; Fassnacht, Martin; Asa, Sylvia L.; Schadendorf, Dirk; Couce, Marta; Graefen, Markus; Huland, Hartwig; Sauter, Guido; Schlomm, Thorsten; Simon, Ronald; Tennstedt, Pierre; Olabode, Oluwole; Nelson, Mark; Bathe, Oliver; Carroll, Peter R.; Chan, June M.; Disaia, Philip; Glenn, Pat; Kelley, Robin K.; Landen, Charles N.; Phillips, Joanna; Prados, Michael; Simko, Jeffry; Smith-McCune, Karen; VandenBerg, Scott; Roggin, Kevin; Fehrenbach, Ashley; Kendler, Ady; Sifri, Suzanne; Steele, Ruth; Jimeno, Antonio; Carey, Francis; Forgie, Ian; Mannelli, Massimo; Carney, Michael; Hernandez, Brenda; Campos, Benito; Herold-Mende, Christel; Jungk, Christin; Unterberg, Andreas; von Deimling, Andreas; Bossler, Aaron; Galbraith, Joseph; Jacobus, Laura; Knudson, Michael; Knutson, Tina; Ma, Deqin; Milhem, Mohammed; Sigmund, Rita; Godwin, Andrew K.; Madan, Rashna; Rosenthal, Howard G.; Adebamowo, Clement; Adebamowo, Sally N.; Boussioutas, Alex; Beer, David; Giordano, Thomas; Mes-Masson, Anne Marie; Saad, Fred; Bocklage, Therese; Landrum, Lisa; Mannel, Robert; Moore, Kathleen; Moxley, Katherine; Postier, Russel; Walker, Joan; Zuna, Rosemary; Feldman, Michael; Valdivieso, Federico; Dhir, Rajiv; Luketich, James; Mora Pinero, Edna M.; Quintero-Aguilo, Mario; Carlotti, Carlos Gilberto; Dos Santos, Jose Sebastião; Kemp, Rafael; Sankarankuty, Ajith; Tirapelli, Daniela; Catto, James; Agnew, Kathy; Swisher, Elizabeth; Creaney, Jenette; Robinson, Bruce; Shelley, Carl Simon; Godwin, Eryn M.; Kendall, Sara; Shipman, Cassaundra; Bradford, Carol; Carey, Thomas; Haddad, Andrea; Moyer, Jeffey; Peterson, Lisa; Prince, Mark; Rozek, Laura; Wolf, Gregory; Bowman, Rayleen; Fong, Kwun M.; Yang, Ian; Korst, Robert; Rathmell, W. Kimryn; Fantacone-Campbell, J. Leigh; Hooke, Jeffrey A.; Kovatich, Albert J.; Shriver, Craig D.; DiPersio, John; Drake, Bettina; Govindan, Ramaswamy; Heath, Sharon; Ley, Timothy; Van Tine, Brian; Westervelt, Peter; Rubin, Mark A.; Lee, Jung Il; Aredes, Natália D.; Mariamidze, Armaz

    2018-01-01

    The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a

  6. National Hydrography Dataset Plus High Resolution (NHDPlus HR) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The High Resolution National Hydrography Dataset Plus (NHDPlus HR) is an integrated set of geospatial data layers, including the best available National Hydrography...

  7. Dataset for Testing Contamination Source Identification Methods for Water Distribution Networks

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset includes the results of a simulation study using the source inversion techniques available in the Water Security Toolkit. The data was created to test...

  8. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi el al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There also are a further three papers that the authors know about under review. These papers all use an observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012) show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground, space and reanalysis-based rainfall products become available, all which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to a uncertainty in terms of the reliability and validity of the datasets such as radiance conversion algorithims, the quantity and quality of available station data, interpolation techniques and blending methods used to combine satellite and guage based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  9. Robust computational analysis of rRNA hypervariable tag datasets.

    Directory of Open Access Journals (Sweden)

    Maksim Sipos

    Full Text Available Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10(5-10(6 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.

  10. Resampling Methods Improve the Predictive Power of Modeling in Class-Imbalanced Datasets

    Directory of Open Access Journals (Sweden)

    Paul H. Lee

    2014-09-01

    Full Text Available In the medical field, many outcome variables are dichotomized, and the two possible values of a dichotomized variable are referred to as classes. A dichotomized dataset is class-imbalanced if it consists mostly of one class, and performance of common classification models on this type of dataset tends to be suboptimal. To tackle such a problem, resampling methods, including oversampling and undersampling can be used. This paper aims at illustrating the effect of resampling methods using the National Health and Nutrition Examination Survey (NHANES wave 2009–2010 dataset. A total of 4677 participants aged ≥20 without self-reported diabetes and with valid blood test results were analyzed. The Classification and Regression Tree (CART procedure was used to build a classification model on undiagnosed diabetes. A participant demonstrated evidence of diabetes according to WHO diabetes criteria. Exposure variables included demographics and socio-economic status. CART models were fitted using a randomly selected 70% of the data (training dataset, and area under the receiver operating characteristic curve (AUC was computed using the remaining 30% of the sample for evaluation (testing dataset. CART models were fitted using the training dataset, the oversampled training dataset, the weighted training dataset, and the undersampled training dataset. In addition, resampling case-to-control ratio of 1:1, 1:2, and 1:4 were examined. Resampling methods on the performance of other extensions of CART (random forests and generalized boosted trees were also examined. CARTs fitted on the oversampled (AUC = 0.70 and undersampled training data (AUC = 0.74 yielded a better classification power than that on the training data (AUC = 0.65. Resampling could also improve the classification power of random forests and generalized boosted trees. To conclude, applying resampling methods in a class-imbalanced dataset improved the classification power of CART, random forests

  11. MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB.

    Science.gov (United States)

    Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A

    2013-01-01

    Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED

  12. Prognostic breast cancer signature identified from 3D culture model accurately predicts clinical outcome across independent datasets

    Energy Technology Data Exchange (ETDEWEB)

    Martin, Katherine J.; Patrick, Denis R.; Bissell, Mina J.; Fournier, Marcia V.

    2008-10-20

    One of the major tenets in breast cancer research is that early detection is vital for patient survival by increasing treatment options. To that end, we have previously used a novel unsupervised approach to identify a set of genes whose expression predicts prognosis of breast cancer patients. The predictive genes were selected in a well-defined three dimensional (3D) cell culture model of non-malignant human mammary epithelial cell morphogenesis as down-regulated during breast epithelial cell acinar formation and cell cycle arrest. Here we examine the ability of this gene signature (3D-signature) to predict prognosis in three independent breast cancer microarray datasets having 295, 286, and 118 samples, respectively. Our results show that the 3D-signature accurately predicts prognosis in three unrelated patient datasets. At 10 years, the probability of positive outcome was 52, 51, and 47 percent in the group with a poor-prognosis signature and 91, 75, and 71 percent in the group with a good-prognosis signature for the three datasets, respectively (Kaplan-Meier survival analysis, p<0.05). Hazard ratios for poor outcome were 5.5 (95% CI 3.0 to 12.2, p<0.0001), 2.4 (95% CI 1.6 to 3.6, p<0.0001) and 1.9 (95% CI 1.1 to 3.2, p = 0.016) and remained significant for the two larger datasets when corrected for estrogen receptor (ER) status. Hence the 3D-signature accurately predicts breast cancer outcome in both ER-positive and ER-negative tumors, though individual genes differed in their prognostic ability in the two subtypes. Genes that were prognostic in ER+ patients are AURKA, CEP55, RRM2, EPHA2, FGFBP1, and VRK1, while genes prognostic in ER patients include ACTB, FOXM1 and SERPINE2 (Kaplan-Meier p<0.05). Multivariable Cox regression analysis in the largest dataset showed that the 3D-signature was a strong independent factor in predicting breast cancer outcome. The 3D-signature accurately predicts breast cancer outcome across multiple datasets and holds prognostic

  13. Multiple homicides.

    Science.gov (United States)

    Copeland, A R

    1989-09-01

    A study of multiple homicides or multiple deaths involving a solitary incident of violence by another individual was performed on the case files of the Office of the Medical Examiner of Metropolitan Dade County in Miami, Florida, during 1983-1987. A total of 107 multiple homicides were studied: 88 double, 17 triple, one quadruple, and one quintuple. The 236 victims were analyzed regarding age, race, sex, cause of death, toxicologic data, perpetrator, locale of the incident, and reason for the incident. This article compares this type of slaying with other types of homicide including those perpetrated by serial killers. Suggestions for future research in this field are offered.

  14. Orthology detection combining clustering and synteny for very large datasets.

    Science.gov (United States)

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  15. Orthology detection combining clustering and synteny for very large datasets.

    Directory of Open Access Journals (Sweden)

    Marcus Lechner

    Full Text Available The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  16. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    Science.gov (United States)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases but not even Google can find -or cares to compile- all the data that's relevant for science and particularly geo sciences. If a dataset is not discoverable through a well known search provider it will remain dark data to the scientific world.For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web-scale using the ultimate dataset: The Internet. This stack has 2 principal components, a web-scale crawling infrastructure and a semantic aggregator. The web-crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a python-based workflow that extracts valuable metadata and stores it in the form of triples through the use semantic technologies.While implementing the BCube stack we have run into several challenges such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all

  17. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories; stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12, 1427-1439 or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence or focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition. The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7, 484; Scarfe & Hibbard, 2013, Vision Research. Using state-of-the-art open-source ray-tracing software (PBRT as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intension in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  18. A synthetic dataset for evaluating soft and hard fusion algorithms

    Science.gov (United States)

    Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey

    2011-06-01

    There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.

  19. Efficacy of melflufen, a peptidase targeted therapy, and dexamethasone in an ongoing open-label phase 2a study in patients with relapsed and relapsed-refractory multiple myeloma (RRMM) including an initial report on progression free survival

    DEFF Research Database (Denmark)

    Voorhees, P. M.; Magarotto, V.; Sonneveld, P.

    2015-01-01

    to DNA or is readily metabolized by intracellular peptidases into hydrophilic alkylating metabolites. With targeted delivery of alkylating metabolites to tumor cells in vitro (such as multiple myeloma that are rich in activating peptidase), melflufen exerts a 20-100 fold higher anti-tumor potency...... and produces a 20 fold higher intracellular concentration of alkylating moieties compared with melphalan. Methods: Melflufen is evaluated in combination with dexamethasone (dex) 40 mg weekly in an ongoing Phase 1/2a study. RRMM patients with measurable disease and at least 2 prior lines of therapy are eligible......%) and constipation and epistaxis (13%). Treatment-related Grade 3 or 4 AEs were reported in 27 patients (87%). Those occurring in >5% of patients were thrombocytopenia (68%), neutropenia (55%), anemia (42%), leukopenia (32%) and febrile neutropenia, fatigue, pyrexia, asthenia and hyperglycemia each occurred in 6...

  20. Dataset for Calibration and performance of synchronous SIM/scan mode for simultaneous targeted and discovery (non-targeted) analysis of exhaled breath samples from firefighters

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset includes the tables and supplementary information from the journal article. This dataset is associated with the following publication: Wallace, A., J....

  1. Statistical Analysis of Large Simulated Yield Datasets for Studying Climate Effects

    Science.gov (United States)

    Makowski, David; Asseng, Senthold; Ewert, Frank; Bassu, Simona; Durand, Jean-Louis; Martre, Pierre; Adam, Myriam; Aggarwal, Pramod K.; Angulo, Carlos; Baron, Chritian; hide

    2015-01-01

    Many studies have been carried out during the last decade to study the effect of climate change on crop yields and other key crop characteristics. In these studies, one or several crop models were used to simulate crop growth and development for different climate scenarios that correspond to different projections of atmospheric CO2 concentration, temperature, and rainfall changes (Semenov et al., 1996; Tubiello and Ewert, 2002; White et al., 2011). The Agricultural Model Intercomparison and Improvement Project (AgMIP; Rosenzweig et al., 2013) builds on these studies with the goal of using an ensemble of multiple crop models in order to assess effects of climate change scenarios for several crops in contrasting environments. These studies generate large datasets, including thousands of simulated crop yield data. They include series of yield values obtained by combining several crop models with different climate scenarios that are defined by several climatic variables (temperature, CO2, rainfall, etc.). Such datasets potentially provide useful information on the possible effects of different climate change scenarios on crop yields. However, it is sometimes difficult to analyze these datasets and to summarize them in a useful way due to their structural complexity; simulated yield data can differ among contrasting climate scenarios, sites, and crop models. Another issue is that it is not straightforward to extrapolate the results obtained for the scenarios to alternative climate change scenarios not initially included in the simulation protocols. Additional dynamic crop model simulations for new climate change scenarios are an option but this approach is costly, especially when a large number of crop models are used to generate the simulated data, as in AgMIP. Statistical models have been used to analyze responses of measured yield data to climate variables in past studies (Lobell et al., 2011), but the use of a statistical model to analyze yields simulated by complex

  2. Dataset of statements on policy integration of selected intergovernmental organizations

    Directory of Open Access Journals (Sweden)

    Jale Tosun

    2018-04-01

    Full Text Available This article describes data for 78 intergovernmental organizations (IGOs working on topics related to energy governance, environmental protection, and the economy. The number of IGOs covered also includes organizations active in other sectors. The point of departure for data construction was the Correlates of War dataset, from which we selected this sample of IGOs. We updated and expanded the empirical information on the IGOs selected by manual coding. Most importantly, we collected the primary law texts of the individual IGOs in order to code whether they commit themselves to environmental policy integration (EPI, climate policy integration (CPI and/or energy policy integration (EnPI.

  3. Dataset on records of Hericium erinaceus in Slovakia.

    Science.gov (United States)

    Kunca, Vladimír; Čiliak, Marek

    2017-06-01

    The data presented in this article are related to the research article entitled "Habitat preferences of Hericium erinaceus in Slovakia" (Kunca and Čiliak, 2016) [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence and some result from targeted searches.

  4. Dataset on records of Hericium erinaceus in Slovakia

    Directory of Open Access Journals (Sweden)

    Vladimír Kunca

    2017-06-01

    Full Text Available The data presented in this article are related to the research article entitled “Habitat preferences of Hericium erinaceus in Slovakia” (Kunca and Čiliak, 2016 [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence and some result from targeted searches.

  5. ClimateNet: A Machine Learning dataset for Climate Science Research

    Science.gov (United States)

    Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.

    2017-12-01

    Deep Learning techniques have revolutionized commercial applications in Computer vision, speech recognition and control systems. The key for all of these developments was the creation of a curated, labeled dataset ImageNet, for enabling multiple research groups around the world to develop methods, benchmark performance and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key pre-requisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify, and share datasets across groups and institutions. In this work, we are proposing ClimateNet: a labeled dataset that provides labeled instances of extreme weather patterns, as well as associated raw fields in model and observational output. We develop a schema in NetCDF to enumerate weather pattern classes/types, store bounding boxes, and pixel-masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and are providing a reference convolutional architecture for binary classification tasks. Our hope is that researchers in Climate Science, as well as ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning for Climate Science research.

  6. Common pitfalls in statistical analysis: The perils of multiple testing

    Science.gov (United States)

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2016-01-01

    Multiple testing refers to situations where a dataset is subjected to statistical testing multiple times - either at multiple time-points or through multiple subgroups or for multiple end-points. This amplifies the probability of a false-positive finding. In this article, we look at the consequences of multiple testing and explore various methods to deal with this issue. PMID:27141478

  7. CDW-EIS calculations for multiple ionization of Ne, Ar, Kr and Xe by the impact of H{sup +} and He{sup +}, including post-collisional electron emission

    Energy Technology Data Exchange (ETDEWEB)

    Montanari, C C; Miraglia, J E [Instituto de AstronomIa y FIsica del Espacio, casilla de correo 67, sucursal 28, C1428EGA, Buenos Aires (Argentina); Montenegro, E C, E-mail: mclaudia@iafe.uba.a [Instituto de FIsica, Universidade Federal de Rio de Janeiro, Caixa Postal 68528, Rio de Janeiro 21945-970, RJ (Brazil)

    2010-08-28

    We present theoretical single to quintuple ionization cross sections for Ne, Ar, Kr and Xe bombarded by H{sup +} and He{sup +}. Post-collisional contributions due to Auger-like processes are taken into account using recent photoionization data. The present continuum distorted wave-eikonal initial state (CDW-EIS) and first Born approximation results are compared with the experimental data available in the energy range of 50-10 000 keV amu{sup -1} for H{sup +} on Ne and Ar, and 50-1000 keV amu{sup -1} for the other cases. In general, the combination of the CDW-EIS with the post-collisional branching ratios describes well the multiple ionization data above 300 keV amu{sup -1}, showing a clear tendency to coalesce with the first Born approximation at high energies. The surprising result of this work is the good performance of the first Born approximation which describes rather well the experimental data of double and triple ionization, even in the intermediate energy range (50-300 keV amu{sup -1}), where direct ionization is the dominant contribution.

  8. Dataset of Passerine bird communities in a Mediterranean high mountain (Sierra Nevada, Spain)

    Science.gov (United States)

    Pérez-Luque, Antonio Jesús; Barea-Azcón, José Miguel; Álvarez-Ruiz, Lola; Bonet-García, Francisco Javier; Zamora, Regino

    2016-01-01

    Abstract In this data paper, a dataset of passerine bird communities is described in Sierra Nevada, a Mediterranean high mountain located in southern Spain. The dataset includes occurrence data from bird surveys conducted in four representative ecosystem types of Sierra Nevada from 2008 to 2015. For each visit, bird species numbers as well as distance to the transect line were recorded. A total of 27847 occurrence records were compiled with accompanying measurements on distance to the transect and animal counts. All records are of species in the order Passeriformes. Records of 16 different families and 44 genera were collected. Some of the taxa in the dataset are included in the European Red List. This dataset belongs to the Sierra Nevada Global-Change Observatory (OBSNEV), a long-term research project designed to compile socio-ecological information on the major ecosystem types in order to identify the impacts of global change in this area. PMID:26865820

  9. Dataset of Passerine bird communities in a Mediterranean high mountain (Sierra Nevada, Spain).

    Science.gov (United States)

    Pérez-Luque, Antonio Jesús; Barea-Azcón, José Miguel; Álvarez-Ruiz, Lola; Bonet-García, Francisco Javier; Zamora, Regino

    2016-01-01

    In this data paper, a dataset of passerine bird communities is described in Sierra Nevada, a Mediterranean high mountain located in southern Spain. The dataset includes occurrence data from bird surveys conducted in four representative ecosystem types of Sierra Nevada from 2008 to 2015. For each visit, bird species numbers as well as distance to the transect line were recorded. A total of 27847 occurrence records were compiled with accompanying measurements on distance to the transect and animal counts. All records are of species in the order Passeriformes. Records of 16 different families and 44 genera were collected. Some of the taxa in the dataset are included in the European Red List. This dataset belongs to the Sierra Nevada Global-Change Observatory (OBSNEV), a long-term research project designed to compile socio-ecological information on the major ecosystem types in order to identify the impacts of global change in this area.

  10. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets where used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  11. Predicting dataset popularity for the CMS experiment

    CERN Document Server

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis we rarely use its data to predict the system behavior itself. A basic information about computing resources, user activities and site utilization can be really useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data which can be used as a model for dynamic data placement and provide the foundation of data-driven approach for the CMS computing infrastructure.

  12. Predicting dataset popularity for the CMS experiment

    International Nuclear Information System (INIS)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis we rarely use its data to predict the system behavior itself. A basic information about computing resources, user activities and site utilization can be really useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data which can be used as a model for dynamic data placement and provide the foundation of data-driven approach for the CMS computing infrastructure. (paper)

  13. 2006 Fynmeet sea clutter measurement trial: Datasets

    CSIR Research Space (South Africa)

    Herselman, PLR

    2007-09-06

    Full Text Available -011............................................................................................................................................................................................. 25 iii Dataset CAD14-001 0 5 10 15 20 25 30 35 10 20 30 40 50 60 70 80 90 R an ge G at e # Time [s] A bs ol ut e R an ge [m ] RCS [dBm2] vs. time and range for f1 = 9.000 GHz - CAD14-001 2400 2600 2800... 40 10 20 30 40 50 60 70 80 90 R an ge G at e # Time [s] A bs ol ut e R an ge [m ] RCS [dBm2] vs. time and range for f1 = 9.000 GHz - CAD14-002 2400 2600 2800 3000 3200 3400 3600 -30 -25 -20 -15 -10 -5 0 5 10...

  14. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    DEFF Research Database (Denmark)

    Jensen, Tue Vissing; Pinson, Pierre

    2017-01-01

    , we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven...... to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecastingof renewable power generation....

  15. Dataset of Phenology of Mediterranean high-mountain meadows flora (Sierra Nevada, Spain)

    OpenAIRE

    Antonio Jesús Pérez-Luque; Cristina Patricia Sánchez-Rojas; Regino Zamora; Ramón Pérez-Pérez; Francisco Javier Bonet

    2015-01-01

    Abstract Sierra Nevada mountain range (southern Spain) hosts a high number of endemic plant species, being one of the most important biodiversity hotspots in the Mediterranean basin. The high-mountain meadow ecosystems (borreguiles) harbour a large number of endemic and threatened plant species. In this data paper, we describe a dataset of the flora inhabiting this threatened ecosystem in this Mediterranean mountain. The dataset includes occurrence data for flora collected in those ecosystems...

  16. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    Science.gov (United States)

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, speech recognition, and is being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of the deep learning models with respect to the state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking results for several clinical prediction tasks such as mortality prediction, length of stay prediction, and ICD-9 code group prediction using Deep Learning models, ensemble of machine learning models (Super Learner algorithm), SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.

  17. Condensing Massive Satellite Datasets For Rapid Interactive Analysis

    Science.gov (United States)

    Grant, G.; Gallaher, D. W.; Lv, Q.; Campbell, G. G.; Fowler, C.; LIU, Q.; Chen, C.; Klucik, R.; McAllister, R. A.

    2015-12-01

    Our goal is to enable users to interactively analyze massive satellite datasets, identifying anomalous data or values that fall outside of thresholds. To achieve this, the project seeks to create a derived database containing only the most relevant information, accelerating the analysis process. The database is designed to be an ancillary tool for the researcher, not an archival database to replace the original data. This approach is aimed at improving performance by reducing the overall size by way of condensing the data. The primary challenges of the project include: - The nature of the research question(s) may not be known ahead of time. - The thresholds for determining anomalies may be uncertain. - Problems associated with processing cloudy, missing, or noisy satellite imagery. - The contents and method of creation of the condensed dataset must be easily explainable to users. The architecture of the database will reorganize spatially-oriented satellite imagery into temporally-oriented columns of data (a.k.a., "data rods") to facilitate time-series analysis. The database itself is an open-source parallel database, designed to make full use of clustered server technologies. A demonstration of the system capabilities will be shown. Applications for this technology include quick-look views of the data, as well as the potential for on-board satellite processing of essential information, with the goal of reducing data latency.

  18. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    Integration National Dataset Toolkit Wind Integration National Dataset Toolkit The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies. WIND

  19. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    Solar Integration National Dataset Toolkit Solar Integration National Dataset Toolkit NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S . regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data

  20. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...

  1. Designing the colorectal cancer core dataset in Iran

    Directory of Open Access Journals (Sweden)

    Sara Dorri

    2017-01-01

    Full Text Available Background: There is no need to explain the importance of collection, recording and analyzing the information of disease in any health organization. In this regard, systematic design of standard data sets can be helpful to record uniform and consistent information. It can create interoperability between health care systems. The main purpose of this study was design the core dataset to record colorectal cancer information in Iran. Methods: For the design of the colorectal cancer core data set, a combination of literature review and expert consensus were used. In the first phase, the draft of the data set was designed based on colorectal cancer literature review and comparative studies. Then, in the second phase, this data set was evaluated by experts from different discipline such as medical informatics, oncology and surgery. Their comments and opinion were taken. In the third phase refined data set, was evaluated again by experts and eventually data set was proposed. Results: In first phase, based on the literature review, a draft set of 85 data elements was designed. In the second phase this data set was evaluated by experts and supplementary information was offered by professionals in subgroups especially in treatment part. In this phase the number of elements totally were arrived to 93 numbers. In the third phase, evaluation was conducted by experts and finally this dataset was designed in five main parts including: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study the comprehensive core data set of colorectal cancer was designed. This dataset in the field of collecting colorectal cancer information can be useful through facilitating exchange of health information. Designing such data set for similar disease can help providers to collect standard data from patients and can accelerate retrieval from storage systems.

  2. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n x log(N; the visualization itself can be done with a complexity of O(1 and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms ms. The current 64-bit implementation theoretically supports datasets with up to 2(64 bytes, on the x86_64 architecture currently up to 2(48 bytes are supported, and benchmarks have been conducted with 2(40 bytes/1 TiB or 1.3 x 10(11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.

  3. Multiple Sclerosis.

    Science.gov (United States)

    Plummer, Nancy; Michael, Nancy, Ed.

    This module on multiple sclerosis is intended for use in inservice or continuing education programs for persons who administer medications in long-term care facilities. Instructor information, including teaching suggestions, and a listing of recommended audiovisual materials and their sources appear first. The module goal and objectives are then…

  4. Analysis of Public Datasets for Wearable Fall Detection Systems.

    Science.gov (United States)

    Casilari, Eduardo; Santoyo-Ramón, José-Antonio; Cano-García, José-Manuel

    2017-06-27

    Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention among the research community during the last years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, the access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the found datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). Concerning this, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need of categorizing the ADLs depending on the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  5. Analysis of Public Datasets for Wearable Fall Detection Systems

    Directory of Open Access Journals (Sweden)

    Eduardo Casilari

    2017-06-01

    Full Text Available Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs have become a major focus of attention among the research community during the last years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs. In this regard, the access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve existing available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the found datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.. Concerning this, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need of categorizing the ADLs depending on the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  6. Advanced Neuropsychological Diagnostics Infrastructure (ANDI): A Normative Database Created from Control Datasets.

    Science.gov (United States)

    de Vent, Nathalie R; Agelink van Rentergem, Joost A; Schmand, Ben A; Murre, Jaap M J; Huizenga, Hilde M

    2016-01-01

    In the Advanced Neuropsychological Diagnostics Infrastructure (ANDI), datasets of several research groups are combined into a single database, containing scores on neuropsychological tests from healthy participants. For most popular neuropsychological tests the quantity, and range of these data surpasses that of traditional normative data, thereby enabling more accurate neuropsychological assessment. Because of the unique structure of the database, it facilitates normative comparison methods that were not feasible before, in particular those in which entire profiles of scores are evaluated. In this article, we describe the steps that were necessary to combine the separate datasets into a single database. These steps involve matching variables from multiple datasets, removing outlying values, determining the influence of demographic variables, and finding appropriate transformations to normality. Also, a brief description of the current contents of the ANDI database is given.

  7. Advanced Neuropsychological Diagnostics Infrastructure (ANDI: A Normative Database Created from Control Datasets.

    Directory of Open Access Journals (Sweden)

    Nathalie R. de Vent

    2016-10-01

    Full Text Available In the Advanced Neuropsychological Diagnostics Infrastructure (ANDI, datasets of several research groups are combined into a single database, containing scores on neuropsychological tests from healthy participants. For most popular neuropsychological tests the quantity and range of these data surpasses that of traditional normative data, thereby enabling more accurate neuropsychological assessment. Because of the unique structure of the database, it facilitates normative comparison methods that were not feasible before, in particular those in which entire profiles of scores are evaluated. In this article, we describe the steps that were necessary to combine the separate datasets into a single database. These steps involve matching variables from multiple datasets, removing outlying values, determining the influence of demographic variables, and finding appropriate transformations to normality. Also, a brief description of the current contents of the ANDI database is given.

  8. A first dataset toward a standardized community-driven global mapping of the human immunopeptidome

    Directory of Open Access Journals (Sweden)

    Pouya Faridi

    2016-06-01

    Full Text Available We present the first standardized HLA peptidomics dataset generated by the immunopeptidomics community. The dataset is composed of native HLA class I peptides as well as synthetic HLA class II peptides that were acquired in data-dependent acquisition mode using multiple types of mass spectrometers. All laboratories used the spiked-in landmark iRT peptides for retention time normalization and data analysis. The mass spectrometric data were deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier http://www.ebi.ac.uk/pride/archive/projects/PXD001872. The generated data were used to build HLA allele-specific peptide spectral and assay libraries, which were stored in the SWATHAtlas database. Data presented here are described in more detail in the original eLife article entitled ‘An open-source computational and data resource to analyze digital maps of immunopeptidomes’.

  9. Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session.

    Science.gov (United States)

    Kohli, Marc D; Summers, Ronald M; Geis, J Raymond

    2017-08-01

    At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce, access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.

  10. A multimodal dataset for authoring and editing multimedia content: The MAMEM project

    Directory of Open Access Journals (Sweden)

    Spiros Nikolopoulos

    2017-12-01

    Full Text Available We present a dataset that combines multimodal biosignals and eye tracking information gathered under a human-computer interaction framework. The dataset was developed in the vein of the MAMEM project that aims to endow people with motor disabilities with the ability to edit and author multimedia content through mental commands and gaze activity. The dataset includes EEG, eye-tracking, and physiological (GSR and Heart rate signals collected from 34 individuals (18 able-bodied and 16 motor-impaired. Data were collected during the interaction with specifically designed interface for web browsing and multimedia content manipulation and during imaginary movement tasks. The presented dataset will contribute towards the development and evaluation of modern human-computer interaction systems that would foster the integration of people with severe motor impairments back into society.

  11. Accounting for inertia in modal choices: some new evidence using a RP/SP dataset

    DEFF Research Database (Denmark)

    Cherchi, Elisabetta; Manca, Francesco

    2011-01-01

    effect is stable along the SP experiments. Inertia has been studied more extensively with panel datasets, but few investigations have used RP/SP datasets. In this paper we extend previous work in several ways. We test and compare several ways of measuring inertia, including measures that have been...... proposed for both short and long RP panel datasets. We also explore new measures of inertia to test for the effect of “learning” (in the sense of acquiring experience or getting more familiar with) along the SP experiment and we disentangle this effect from the pure inertia effect. A mixed logit model...... is used that allows us to account for both systematic and random taste variations in the inertia effect and for correlations among RP and SP observations. Finally we explore the relation between the utility specification (especially in the SP dataset) and the role of inertia in explaining current choices....

  12. Topographic and hydrographic GIS dataset for the Afghanistan Geological Survey and U.S. Geological Survey 2010 Minerals Project

    Science.gov (United States)

    Chirico, P.G.; Moran, T.W.

    2011-01-01

    This dataset contains a collection of 24 folders, each representing a specific U.S. Geological Survey area of interest (AOI; fig. 1), as well as datasets for AOI subsets. Each folder includes the extent, contours, Digital Elevation Model (DEM), and hydrography of the corresponding AOI, which are organized into feature vector and raster datasets. The dataset comprises a geographic information system (GIS), which is available upon request from the USGS Afghanistan programs Web site (http://afghanistan.cr.usgs.gov/minerals.php), and the maps of the 24 areas of interest of the USGS AOIs.

  13. DBcloud: Semantic Dataset for the cloud

    NARCIS (Netherlands)

    Morsey, M.; Willner, A.; Loughnane, R.; Giatili, M.; Papagianni, C.; Baldin, I.; Grosso, P.; Al-Hazmi, Y.

    2016-01-01

    In cloud environments, the process of matching requests from users with the available computing resources is a challenging task. This is even more complex in federated environments, where multiple providers cooperate to offer enhanced services, suitable for distributed applications. In order to

  14. Multiresolution comparison of precipitation datasets for large-scale models

    Science.gov (United States)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models which are related to weather forecast and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products along with ground observations provide another avenue for investigating how the precipitation uncertainty would affect the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North America gridded products including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithms (ANUSPLIN) and Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, results provide an assessment of possible applications for various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing performance of spatial coherence. In addition to the products comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  15. Overview of the CERES Edition-4 Multilayer Cloud Property Datasets

    Science.gov (United States)

    Chang, F. L.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, R. A.; Brown, R. R.

    2014-12-01

    Knowledge of the cloud vertical distribution is important for understanding the role of clouds on earth's radiation budget and climate change. Since high-level cirrus clouds with low emission temperatures and small optical depths can provide a positive feedback to a climate system and low-level stratus clouds with high emission temperatures and large optical depths can provide a negative feedback effect, the retrieval of multilayer cloud properties using satellite observations, like Terra and Aqua MODIS, is critically important for a variety of cloud and climate applications. For the objective of the Clouds and the Earth's Radiant Energy System (CERES), new algorithms have been developed using Terra and Aqua MODIS data to allow separate retrievals of cirrus and stratus cloud properties when the two dominant cloud types are simultaneously present in a multilayer system. In this paper, we will present an overview of the new CERES Edition-4 multilayer cloud property datasets derived from Terra as well as Aqua. Assessment of the new CERES multilayer cloud datasets will include high-level cirrus and low-level stratus cloud heights, pressures, and temperatures as well as their optical depths, emissivities, and microphysical properties.

  16. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    Science.gov (United States)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely-accessible source for high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov ). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  17. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    Science.gov (United States)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national scale flood risk analyses, using high resolution Facebook Connectivity Lab population data and data from a hyper resolution flood hazard model. In recent years the field of large scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that for robust flood risk analysis to be undertaken both hazard and exposure data should sufficiently resolve local scale features. Global flood frameworks are enabling flood hazard data to produced at 90m resolution, resulting in a mis-match with available population datasets which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5m resolution, representing a resolution increase over previous countrywide data sets of multiple orders of magnitude. Flood risk analysis undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.

  18. An innovative privacy preserving technique for incremental datasets on cloud computing.

    Science.gov (United States)

    Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Aljeroudi, Yazan

    2016-08-01

    Cloud computing (CC) is a magnificent service-based delivery with gigantic computer processing power and data storage across connected communications channels. It imparted overwhelming technological impetus in the internet (web) mediated IT industry, where users can easily share private data for further analysis and mining. Furthermore, user affable CC services enable to deploy sundry applications economically. Meanwhile, simple data sharing impelled various phishing attacks and malware assisted security threats. Some privacy sensitive applications like health services on cloud that are built with several economic and operational benefits necessitate enhanced security. Thus, absolute cyberspace security and mitigation against phishing blitz became mandatory to protect overall data privacy. Typically, diverse applications datasets are anonymized with better privacy to owners without providing all secrecy requirements to the newly added records. Some proposed techniques emphasized this issue by re-anonymizing the datasets from the scratch. The utmost privacy protection over incremental datasets on CC is far from being achieved. Certainly, the distribution of huge datasets volume across multiple storage nodes limits the privacy preservation. In this view, we propose a new anonymization technique to attain better privacy protection with high data utility over distributed and incremental datasets on CC. The proficiency of data privacy preservation and improved confidentiality requirements is demonstrated through performance evaluation. Copyright © 2016 Elsevier Inc. All rights reserved.

  19. Statistical segmentation of multidimensional brain datasets

    Science.gov (United States)

    Desco, Manuel; Gispert, Juan D.; Reig, Santiago; Santos, Andres; Pascau, Javier; Malpica, Norberto; Garcia-Barreno, Pedro

    2001-07-01

    This paper presents an automatic segmentation procedure for MRI neuroimages that overcomes part of the problems involved in multidimensional clustering techniques like partial volume effects (PVE), processing speed and difficulty of incorporating a priori knowledge. The method is a three-stage procedure: 1) Exclusion of background and skull voxels using threshold-based region growing techniques with fully automated seed selection. 2) Expectation Maximization algorithms are used to estimate the probability density function (PDF) of the remaining pixels, which are assumed to be mixtures of gaussians. These pixels can then be classified into cerebrospinal fluid (CSF), white matter and grey matter. Using this procedure, our method takes advantage of using the full covariance matrix (instead of the diagonal) for the joint PDF estimation. On the other hand, logistic discrimination techniques are more robust against violation of multi-gaussian assumptions. 3) A priori knowledge is added using Markov Random Field techniques. The algorithm has been tested with a dataset of 30 brain MRI studies (co-registered T1 and T2 MRI). Our method was compared with clustering techniques and with template-based statistical segmentation, using manual segmentation as a gold-standard. Our results were more robust and closer to the gold-standard.

  20. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However this approach does not allow an assessment of the force multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and re-sampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.

  1. Multiple transcription factors directly regulate Hox gene lin-39 expression in ventral hypodermal cells of the C. elegans embryo and larva, including the hypodermal fate regulators LIN-26 and ELT-6.

    Science.gov (United States)

    Liu, Wan-Ju; Reece-Hoyes, John S; Walhout, Albertha J M; Eisenmann, David M

    2014-05-13

    Hox genes encode master regulators of regional fate specification during early metazoan development. Much is known about the initiation and regulation of Hox gene expression in Drosophila and vertebrates, but less is known in the non-arthropod invertebrate model system, C. elegans. The C. elegans Hox gene lin-39 is required for correct fate specification in the midbody region, including the Vulval Precursor Cells (VPCs). To better understand lin-39 regulation and function, we aimed to identify transcription factors necessary for lin-39 expression in the VPCs, and in particular sought factors that initiate lin-39 expression in the embryo. We used the yeast one-hybrid (Y1H) method to screen for factors that bound to 13 fragments from the lin-39 region: twelve fragments contained sequences conserved between C. elegans and two other nematode species, while one fragment was known to drive reporter gene expression in the early embryo in cells that generate the VPCs. Sixteen transcription factors that bind to eight lin-39 genomic fragments were identified in yeast, and we characterized several factors by verifying their physical interactions in vitro, and showing that reduction of their function leads to alterations in lin-39 levels and lin-39::GFP reporter expression in vivo. Three factors, the orphan nuclear hormone receptor NHR-43, the hypodermal fate regulator LIN-26, and the GATA factor ELT-6 positively regulate lin-39 expression in the embryonic precursors to the VPCs. In particular, ELT-6 interacts with an enhancer that drives GFP expression in the early embryo, and the ELT-6 site we identified is necessary for proper embryonic expression. These three factors, along with the factors ZTF-17, BED-3 and TBX-9, also positively regulate lin-39 expression in the larval VPCs. These results significantly expand the number of factors known to directly bind and regulate lin-39 expression, identify the first factors required for lin-39 expression in the embryo, and hint at a

  2. Utilizing the Antarctic Master Directory to find orphan datasets

    Science.gov (United States)

    Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

    2011-12-01

    While most Antarctic data are housed at an established disciplinary-specific data repository, there are data types for which no suitable repository exists. In some cases, these "orphan" data, without an appropriate national archive, are served from local servers by the principal investigators who produced the data. There are many pitfalls with data served privately, including the frequent lack of adequate documentation to ensure the data can be understood by others for re-use and the impermanence of personal web sites. For example, if an investigator leaves an institution and the data moves, the link published is no longer accessible. To ensure continued availability of data, submission to long-term national data repositories is needed. As stated in the National Science Foundation Office of Polar Programs (NSF/OPP) Guidelines and Award Conditions for Scientific Data, investigators are obligated to submit their data for curation and long-term preservation; this includes the registration of a dataset description into the Antarctic Master Directory (AMD), http://gcmd.nasa.gov/Data/portals/amd/. The AMD is a Web-based, searchable directory of thousands of dataset descriptions, known as DIF records, submitted by scientists from over 20 countries. It serves as a node of the International Directory Network/Global Change Master Directory (IDN/GCMD). The US Antarctic Program Data Coordination Center (USAP-DCC), http://www.usap-data.org/, funded through NSF/OPP, was established in 2007 to help streamline the process of data submission and DIF record creation. When data does not quite fit within any existing disciplinary repository, it can be registered within the USAP-DCC as the fallback data repository. Within the scope of the USAP-DCC we undertook the challenge of discovering and "rescuing" orphan datasets currently registered within the AMD. In order to find which DIF records led to data served privately, all records relating to US data within the AMD were parsed. After

  3. Intensity-Duration-Frequency curves from remote sensing datasets: direct comparison of weather radar and CMORPH over the Eastern Mediterranean

    Science.gov (United States)

    Morin, Efrat; Marra, Francesco; Peleg, Nadav; Mei, Yiwen; Anagnostou, Emmanouil N.

    2017-04-01

    Rainfall frequency analysis is used to quantify the probability of occurrence of extreme rainfall and is traditionally based on rain gauge records. The limited spatial coverage of rain gauges is insufficient to sample the spatiotemporal variability of extreme rainfall and to provide the areal information required by management and design applications. Conversely, remote sensing instruments, even if quantitative uncertain, offer coverage and spatiotemporal detail that allow overcoming these issues. In recent years, remote sensing datasets began to be used for frequency analyses, taking advantage of increased record lengths and quantitative adjustments of the data. However, the studies so far made use of concepts and techniques developed for rain gauge (i.e. point or multiple-point) data and have been validated by comparison with gauge-derived analyses. These procedures add further sources of uncertainty and prevent from isolating between data and methodological uncertainties and from fully exploiting the available information. In this study, we step out of the gauge-centered concept presenting a direct comparison between at-site Intensity-Duration-Frequency (IDF) curves derived from different remote sensing datasets on corresponding spatial scales, temporal resolutions and records. We analyzed 16 years of homogeneously corrected and gauge-adjusted C-Band weather radar estimates, high-resolution CMORPH and gauge-adjusted high-resolution CMORPH over the Eastern Mediterranean. Results of this study include: (a) good spatial correlation between radar and satellite IDFs ( 0.7 for 2-5 years return period); (b) consistent correlation and dispersion in the raw and gauge adjusted CMORPH; (c) bias is almost uniform with return period for 12-24 h durations; (d) radar identifies thicker tail distributions than CMORPH and the tail of the distributions depends on the spatial and temporal scales. These results demonstrate the potential of remote sensing datasets for rainfall

  4. Location Prediction Based on Transition Probability Matrices Constructing from Sequential Rules for Spatial-Temporal K-Anonymity Dataset

    Science.gov (United States)

    Liu, Zhao; Zhu, Yunhong; Wu, Chenxue

    2016-01-01

    Spatial-temporal k-anonymity has become a mainstream approach among techniques for protection of users’ privacy in location-based services (LBS) applications, and has been applied to several variants such as LBS snapshot queries and continuous queries. Analyzing large-scale spatial-temporal anonymity sets may benefit several LBS applications. In this paper, we propose two location prediction methods based on transition probability matrices constructing from sequential rules for spatial-temporal k-anonymity dataset. First, we define single-step sequential rules mined from sequential spatial-temporal k-anonymity datasets generated from continuous LBS queries for multiple users. We then construct transition probability matrices from mined single-step sequential rules, and normalize the transition probabilities in the transition matrices. Next, we regard a mobility model for an LBS requester as a stationary stochastic process and compute the n-step transition probability matrices by raising the normalized transition probability matrices to the power n. Furthermore, we propose two location prediction methods: rough prediction and accurate prediction. The former achieves the probabilities of arriving at target locations along simple paths those include only current locations, target locations and transition steps. By iteratively combining the probabilities for simple paths with n steps and the probabilities for detailed paths with n-1 steps, the latter method calculates transition probabilities for detailed paths with n steps from current locations to target locations. Finally, we conduct extensive experiments, and correctness and flexibility of our proposed algorithm have been verified. PMID:27508502

  5. A robust post-processing workflow for datasets with motion artifacts in diffusion kurtosis imaging.

    Science.gov (United States)

    Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X; Wan, Mingxi

    2014-01-01

    The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifacts rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of rejected artifacts with information of gradient directions and b values for the parameter estimation was investigated by using mean square error (MSE). The variance of noise was used as the criterion for MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that of the conventional correlation coefficient (pworkflow improved the image quality and reduced the measurement biases significantly on motion-corrupted datasets (pworkflow was reliable to improve the image quality and the measurement precision of the derived parameters on motion-corrupted DKI datasets. The workflow provided an effective post-processing method for clinical applications of DKI in subjects with involuntary movements.

  6. [Parallel virtual reality visualization of extreme large medical datasets].

    Science.gov (United States)

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extreme large medical datasets are discussed in connection with Intranet and common-configuration computers of hospitals. In this paper are introduced several kernel techniques, including the hardware structure, software framework, load balance and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using common PC cluster. In virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through the control panel built on virtual reality modeling language (VRML). Experimental results demonstrate that this method provides promising and real-time results for playing the role in of a good assistant in making clinical diagnosis.

  7. xarray: N-D labeled Arrays and Datasets in Python

    Directory of Open Access Journals (Sweden)

    Stephan Hoyer

    2017-04-01

    Full Text Available xarray is an open source project and Python package that provides a toolkit and data structures for N-dimensional labeled arrays. Our approach combines an application programing interface (API inspired by pandas with the Common Data Model for self-described scientific data. Key features of the xarray package include label-based indexing and arithmetic, interoperability with the core scientific Python packages (e.g., pandas, NumPy, Matplotlib, out-of-core computation on datasets that don’t fit into memory, a wide range of serialization and input/output (I/O options, and advanced multi-dimensional data manipulation tools such as group-by and resampling. xarray, as a data model and analytics toolkit, has been widely adopted in the geoscience community but is also used more broadly for multi-dimensional data analysis in physics, machine learning and finance.

  8. Multiple Improvements of Multiple Imputation Likelihood Ratio Tests

    OpenAIRE

    Chan, Kin Wai; Meng, Xiao-Li

    2017-01-01

    Multiple imputation (MI) inference handles missing data by first properly imputing the missing values $m$ times, and then combining the $m$ analysis results from applying a complete-data procedure to each of the completed datasets. However, the existing method for combining likelihood ratio tests has multiple defects: (i) the combined test statistic can be negative in practice when the reference null distribution is a standard $F$ distribution; (ii) it is not invariant to re-parametrization; ...

  9. Treatment Option Overview (Plasma Cell Neoplasms Including Multiple Myeloma)

    Science.gov (United States)

    ... factors affect prognosis (chance of recovery) and treatment options. The prognosis (chance of recovery ) depends on the ... going up even though treatment is given. Treatment Option Overview Key Points There are different types of ...

  10. Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets.

    Science.gov (United States)

    Narechania, Apurva; Baker, Richard; DeSalle, Rob; Mathema, Barun; Kolokotronis, Sergios-Orestis; Kreiswirth, Barry; Planet, Paul J

    2016-10-24

    Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons. In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data.

  11. The Dataset of Countries at Risk of Electoral Violence

    OpenAIRE

    Birch, Sarah; Muchlinski, David

    2017-01-01

    Electoral violence is increasingly affecting elections around the world, yet researchers have been limited by a paucity of granular data on this phenomenon. This paper introduces and describes a new dataset of electoral violence – the Dataset of Countries at Risk of Electoral Violence (CREV) – that provides measures of 10 different types of electoral violence across 642 elections held around the globe between 1995 and 2013. The paper provides a detailed account of how and why the dataset was ...

  12. BIA Indian Lands Dataset (Indian Lands of the United States)

    Data.gov (United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  13. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back to the client, and to construct professional-quality visualizations of the results.

  14. Socioeconomic Data and Applications Center (SEDAC) Treaty Status Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Socioeconomic Data and Application Center (SEDAC) Treaty Status Dataset contains comprehensive treaty information for multilateral environmental agreements,...

  15. A Scalable Permutation Approach Reveals Replication and Preservation Patterns of Network Modules in Large Datasets.

    Science.gov (United States)

    Ritchie, Scott C; Watts, Stephen; Fearnley, Liam G; Holt, Kathryn E; Abraham, Gad; Inouye, Michael

    2016-07-01

    Network modules-topologically distinct groups of edges and nodes-that are preserved across datasets can reveal common features of organisms, tissues, cell types, and molecules. Many statistics to identify such modules have been developed, but testing their significance requires heuristics. Here, we demonstrate that current methods for assessing module preservation are systematically biased and produce skewed p values. We introduce NetRep, a rapid and computationally efficient method that uses a permutation approach to score module preservation without assuming data are normally distributed. NetRep produces unbiased p values and can distinguish between true and false positives during multiple hypothesis testing. We use NetRep to quantify preservation of gene coexpression modules across murine brain, liver, adipose, and muscle tissues. Complex patterns of multi-tissue preservation were revealed, including a liver-derived housekeeping module that displayed adipose- and muscle-specific association with body weight. Finally, we demonstrate the broader applicability of NetRep by quantifying preservation of bacterial networks in gut microbiota between men and women. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  16. Handling limited datasets with neural networks in medical applications: A small-data approach.

    Science.gov (United States)

    Shaikhina, Torgyn; Khovanova, Natalia A

    2017-01-01

    Single-centre studies in medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for application of artificial neural networks (NNs) for regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the method of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to the state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied for the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85MPa. When evaluated on independent test samples, the NN achieved accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows for an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.

  17. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Science.gov (United States)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    , provenance and user annotation properties. Once generated they are included into a proprietary taxonomy, used by the overall architecture of the web portal. The metadata are made available through a SPARQL endpoint, thus allowing the datasets to be aggregated and shared among users in a meaningful way, enabling at the same time the development of third party visualization tools beyond the portal infrastructure. The SEE-GRID-SCI and the JISC-funded RapidSeis projects investigate the usage of this framework to enable the waveform data processing over the Grid.

  18. Facing the Challenges of Accessing, Managing, and Integrating Large Observational Datasets in Ecology: Enabling and Enriching the Use of NEON's Observational Data

    Science.gov (United States)

    Thibault, K. M.

    2013-12-01

    As the construction of NEON and its transition to operations progresses, more and more data will become available to the scientific community, both from NEON directly and from the concomitant growth of existing data repositories. Many of these datasets include ecological observations of a diversity of taxa in both aquatic and terrestrial environments. Although observational data have been collected and used throughout the history of organismal biology, the field has not yet fully developed a culture of data management, documentation, standardization, sharing and discoverability to facilitate the integration and synthesis of datasets. Moreover, the tools required to accomplish these goals, namely database design, implementation, and management, and automation and parallelization of analytical tasks through computational techniques, have not historically been included in biology curricula, at either the undergraduate or graduate levels. To ensure the success of data-generating projects like NEON in advancing organismal ecology and to increase transparency and reproducibility of scientific analyses, an acceleration of the cultural shift to open science practices, the development and adoption of data standards, such as the DarwinCore standard for taxonomic data, and increased training in computational approaches for biologists need to be realized. Here I highlight several initiatives that are intended to increase access to and discoverability of publicly available datasets and equip biologists and other scientists with the skills that are need to manage, integrate, and analyze data from multiple large-scale projects. The EcoData Retriever (ecodataretriever.org) is a tool that downloads publicly available datasets, re-formats the data into an efficient relational database structure, and then automatically imports the data tables onto a user's local drive into the database tool of the user's choice. The automation of these tasks results in nearly instantaneous execution

  19. Forest restoration: a global dataset for biodiversity and vegetation structure.

    Science.gov (United States)

    Crouzeilles, Renato; Ferreira, Mariana S; Curran, Michael

    2016-08-01

    Restoration initiatives are becoming increasingly applied around the world. Billions of dollars have been spent on ecological restoration research and initiatives, but restoration outcomes differ widely among these initiatives in part due to variable socioeconomic and ecological contexts. Here, we present the most comprehensive dataset gathered to date on forest restoration. It encompasses 269 primary studies across 221 study landscapes in 53 countries and contains 4,645 quantitative comparisons between reference ecosystems (e.g., old-growth forest) and degraded or restored ecosystems for five taxonomic groups (mammals, birds, invertebrates, herpetofauna, and plants) and five measures of vegetation structure reflecting different ecological processes (cover, density, height, biomass, and litter). We selected studies that (1) were conducted in forest ecosystems; (2) had multiple replicate sampling sites to measure indicators of biodiversity and/or vegetation structure in reference and restored and/or degraded ecosystems; and (3) used less-disturbed forests as a reference to the ecosystem under study. We recorded (1) latitude and longitude; (2) study year; (3) country; (4) biogeographic realm; (5) past disturbance type; (6) current disturbance type; (7) forest conversion class; (8) restoration activity; (9) time that a system has been disturbed; (10) time elapsed since restoration started; (11) ecological metric used to assess biodiversity; and (12) quantitative value of the ecological metric of biodiversity and/or vegetation structure for reference and restored and/or degraded ecosystems. These were the most common data available in the selected studies. We also estimated forest cover and configuration in each study landscape using a recently developed 1 km consensus land cover dataset. We measured forest configuration as the (1) mean size of all forest patches; (2) size of the largest forest patch; and (3) edge:area ratio of forest patches. Global analyses of the

  20. Dataset on information strategies for energy conservation: A field experiment in India.

    Science.gov (United States)

    Chen, Victor L; Delmas, Magali A; Locke, Stephen L; Singh, Amarjeet

    2018-02-01

    The data presented in this article are related to the research article entitled: "Information strategies for energy conservation: a field experiment in India" (Chen et al., 2017) [1]. The availability of high-resolution electricity data offers benefits to both utilities and consumers to understand the dynamics of energy consumption for example, between billing periods or times of peak demand. However, few public datasets with high-temporal resolution have been available to researchers on electricity use, especially at the appliance-level. This article describes data collected in a residential field experiment for 19 apartments at an Indian faculty housing complex during the period from August 1, 2013 to May 12, 2014. The dataset includes detailed information about electricity consumption. It also includes information on apartment characteristics and hourly weather variation to enable further studies of energy performance. These data can be used by researchers as training datasets to evaluate electricity usage consumption.

  1. A public dataset of overground and treadmill walking kinematics and kinetics in healthy individuals

    Directory of Open Access Journals (Sweden)

    Claudiane A. Fukuchi

    2018-04-01

    Full Text Available In a typical clinical gait analysis, the gait patterns of pathological individuals are commonly compared with the typically faster, comfortable pace of healthy subjects. However, due to potential bias related to gait speed, this comparison may not be valid. Publicly available gait datasets have failed to address this issue. Therefore, the goal of this study was to present a publicly available dataset of 42 healthy volunteers (24 young adults and 18 older adults who walked both overground and on a treadmill at a range of gait speeds. Their lower-extremity and pelvis kinematics were measured using a three-dimensional (3D motion-capture system. The external forces during both overground and treadmill walking were collected using force plates and an instrumented treadmill, respectively. The results include both raw and processed kinematic and kinetic data in different file formats: c3d and ASCII files. In addition, a metadata file is provided that contain demographic and anthropometric data and data related to each file in the dataset. All data are available at Figshare (DOI: 10.6084/m9.figshare.5722711. We foresee several applications of this public dataset, including to examine the influences of speed, age, and environment (overground vs. treadmill on gait biomechanics, to meet educational needs, and, with the inclusion of additional participants, to use as a normative dataset.

  2. REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations

    Science.gov (United States)

    Moulik, P.; Lekic, V.; Romanowicz, B. A.

    2017-12-01

    A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history

  3. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  4. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  5. An Annotated Dataset of 14 Cardiac MR Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  6. A New Outlier Detection Method for Multidimensional Datasets

    KAUST Repository

    Abdel Messih, Mario A.

    2012-07-01

    This study develops a novel hybrid method for outlier detection (HMOD) that combines the idea of distance based and density based methods. The proposed method has two main advantages over most of the other outlier detection methods. The first advantage is that it works well on both dense and sparse datasets. The second advantage is that, unlike most other outlier detection methods that require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only required parameter to set is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD that made it very efficient in applications. Moreover, we proposed a new way of using the outlier detection for redundancy reduction in datasets where the confidence level that evaluates how accurate the less redundant dataset can be used to represent the original dataset can be specified by users. HMOD is evaluated on synthetic datasets (dense and mixed “dense and sparse”) and a bioinformatics problem of redundancy reduction of dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level of reduced dataset representing the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.

  7. SAMNet: a network-based approach to integrate multi-dimensional high throughput datasets.

    Science.gov (United States)

    Gosline, Sara J C; Spencer, Sarah J; Ursu, Oana; Fraenkel, Ernest

    2012-11-01

    The rapid development of high throughput biotechnologies has led to an onslaught of data describing genetic perturbations and changes in mRNA and protein levels in the cell. Because each assay provides a one-dimensional snapshot of active signaling pathways, it has become desirable to perform multiple assays (e.g. mRNA expression and phospho-proteomics) to measure a single condition. However, as experiments expand to accommodate various cellular conditions, proper analysis and interpretation of these data have become more challenging. Here we introduce a novel approach called SAMNet, for Simultaneous Analysis of Multiple Networks, that is able to interpret diverse assays over multiple perturbations. The algorithm uses a constrained optimization approach to integrate mRNA expression data with upstream genes, selecting edges in the protein-protein interaction network that best explain the changes across all perturbations. The result is a putative set of protein interactions that succinctly summarizes the results from all experiments, highlighting the network elements unique to each perturbation. We evaluated SAMNet in both yeast and human datasets. The yeast dataset measured the cellular response to seven different transition metals, and the human dataset measured cellular changes in four different lung cancer models of Epithelial-Mesenchymal Transition (EMT), a crucial process in tumor metastasis. SAMNet was able to identify canonical yeast metal-processing genes unique to each commodity in the yeast dataset, as well as human genes such as β-catenin and TCF7L2/TCF4 that are required for EMT signaling but escaped detection in the mRNA and phospho-proteomic data. Moreover, SAMNet also highlighted drugs likely to modulate EMT, identifying a series of less canonical genes known to be affected by the BCR-ABL inhibitor imatinib (Gleevec), suggesting a possible influence of this drug on EMT.

  8. Chemical elements in the environment: multi-element geochemical datasets from continental to national scale surveys on four continents

    Science.gov (United States)

    Caritat, Patrice de; Reimann, Clemens; Smith, David; Wang, Xueqiu

    2017-01-01

    During the last 10-20 years, Geological Surveys around the world have undertaken a major effort towards delivering fully harmonized and tightly quality-controlled low-density multi-element soil geochemical maps and datasets of vast regions including up to whole continents. Concentrations of between 45 and 60 elements commonly have been determined in a variety of different regolith types (e.g., sediment, soil). The multi-element datasets are published as complete geochemical atlases and made available to the general public. Several other geochemical datasets covering smaller areas but generally at a higher spatial density are also available. These datasets may, however, not be found by superficial internet-based searches because the elements are not mentioned individually either in the title or in the keyword lists of the original references. This publication attempts to increase the visibility and discoverability of these fundamental background datasets covering large areas up to whole continents.

  9. Merged SAGE II, Ozone_cci and OMPS ozone profile dataset and evaluation of ozone trends in the stratosphere

    Directory of Open Access Journals (Sweden)

    V. F. Sofieva

    2017-10-01

    Full Text Available In this paper, we present a merged dataset of ozone profiles from several satellite instruments: SAGE II on ERBS, GOMOS, SCIAMACHY and MIPAS on Envisat, OSIRIS on Odin, ACE-FTS on SCISAT, and OMPS on Suomi-NPP. The merged dataset is created in the framework of the European Space Agency Climate Change Initiative (Ozone_cci with the aim of analyzing stratospheric ozone trends. For the merged dataset, we used the latest versions of the original ozone datasets. The datasets from the individual instruments have been extensively validated and intercompared; only those datasets which are in good agreement, and do not exhibit significant drifts with respect to collocated ground-based observations and with respect to each other, are used for merging. The long-term SAGE–CCI–OMPS dataset is created by computation and merging of deseasonalized anomalies from individual instruments. The merged SAGE–CCI–OMPS dataset consists of deseasonalized anomalies of ozone in 10° latitude bands from 90° S to 90° N and from 10 to 50 km in steps of 1 km covering the period from October 1984 to July 2016. This newly created dataset is used for evaluating ozone trends in the stratosphere through multiple linear regression. Negative ozone trends in the upper stratosphere are observed before 1997 and positive trends are found after 1997. The upper stratospheric trends are statistically significant at midlatitudes and indicate ozone recovery, as expected from the decrease of stratospheric halogens that started in the middle of the 1990s and stratospheric cooling.

  10. New seismograph includes filters

    Energy Technology Data Exchange (ETDEWEB)

    1979-11-02

    The new Nimbus ES-1210 multichannel signal enhancement seismograph from EG and G geometrics has recently been redesigned to include multimode signal fillers on each amplifier. The ES-1210F is a shallow exploration seismograph for near subsurface exploration such as in depth-to-bedrock, geological hazard location, mineral exploration, and landslide investigations.

  11. Report of the Association of Coloproctology of Great Britain and Ireland/British Society of Gastroenterology Colorectal Polyp Working Group: the development of a complex colorectal polyp minimum dataset.

    Science.gov (United States)

    Chattree, A; Barbour, J A; Thomas-Gibson, S; Bhandari, P; Saunders, B P; Veitch, A M; Anderson, J; Rembacken, B J; Loughrey, M B; Pullan, R; Garrett, W V; Lewis, G; Dolwani, S; Rutter, M D

    2017-01-01

    The management of large non-pedunculated colorectal polyps (LNPCPs) is complex, with widespread variation in management and outcome, even amongst experienced clinicians. Variations in the assessment and decision-making processes are likely to be a major factor in this variability. The creation of a standardized minimum dataset to aid decision-making may therefore result in improved clinical management. An official working group of 13 multidisciplinary specialists was appointed by the Association of Coloproctology of Great Britain and Ireland (ACPGBI) and the British Society of Gastroenterology (BSG) to develop a minimum dataset on LNPCPs. The literature review used to structure the ACPGBI/BSG guidelines for the management of LNPCPs was used by a steering subcommittee to identify various parameters pertaining to the decision-making processes in the assessment and management of LNPCPs. A modified Delphi consensus process was then used for voting on proposed parameters over multiple voting rounds with at least 80% agreement defined as consensus. The minimum dataset was used in a pilot process to ensure rigidity and usability. A 23-parameter minimum dataset with parameters relating to patient and lesion factors, including six parameters relating to image retrieval, was formulated over four rounds of voting with two pilot processes to test rigidity and usability. This paper describes the development of the first reported evidence-based and expert consensus minimum dataset for the management of LNPCPs. It is anticipated that this dataset will allow comprehensive and standardized lesion assessment to improve decision-making in the assessment and management of LNPCPs. Colorectal Disease © 2016 The Association of Coloproctology of Great Britain and Ireland.

  12. Generalized internal multiple imaging

    KAUST Repository

    Zuberi, Mohammad Akbar Hosain

    2014-12-04

    Various examples are provided for generalized internal multiple imaging (GIMI). In one example, among others, a method includes generating a higher order internal multiple image using a background Green\\'s function and rendering the higher order internal multiple image for presentation. In another example, a system includes a computing device and a generalized internal multiple imaging (GIMI) application executable in the computing device. The GIMI application includes logic that generates a higher order internal multiple image using a background Green\\'s function and logic that renders the higher order internal multiple image for display on a display device. In another example, a non-transitory computer readable medium has a program executable by processing circuitry that generates a higher order internal multiple image using a background Green\\'s function and renders the higher order internal multiple image for display on a display device.

  13. Generalized internal multiple imaging

    KAUST Repository

    Zuberi, Mohammad Akbar Hosain; Alkhalifah, Tariq

    2014-01-01

    Various examples are provided for generalized internal multiple imaging (GIMI). In one example, among others, a method includes generating a higher order internal multiple image using a background Green's function and rendering the higher order internal multiple image for presentation. In another example, a system includes a computing device and a generalized internal multiple imaging (GIMI) application executable in the computing device. The GIMI application includes logic that generates a higher order internal multiple image using a background Green's function and logic that renders the higher order internal multiple image for display on a display device. In another example, a non-transitory computer readable medium has a program executable by processing circuitry that generates a higher order internal multiple image using a background Green's function and renders the higher order internal multiple image for display on a display device.

  14. Analytic device including nanostructures

    KAUST Repository

    Di Fabrizio, Enzo M.; Fratalocchi, Andrea; Totero Gongora, Juan Sebastian; Coluccio, Maria Laura; Candeloro, Patrizio; Cuda, Gianni

    2015-01-01

    A device for detecting an analyte in a sample comprising: an array including a plurality of pixels, each pixel including a nanochain comprising: a first nanostructure, a second nanostructure, and a third nanostructure, wherein size of the first nanostructure is larger than that of the second nanostructure, and size of the second nanostructure is larger than that of the third nanostructure, and wherein the first nanostructure, the second nanostructure, and the third nanostructure are positioned on a substrate such that when the nanochain is excited by an energy, an optical field between the second nanostructure and the third nanostructure is stronger than an optical field between the first nanostructure and the second nanostructure, wherein the array is configured to receive a sample; and a detector arranged to collect spectral data from a plurality of pixels of the array.

  15. Saskatchewan resources. [including uranium

    Energy Technology Data Exchange (ETDEWEB)

    1979-09-01

    The production of chemicals and minerals for the chemical industry in Saskatchewan are featured, with some discussion of resource taxation. The commodities mentioned include potash, fatty amines, uranium, heavy oil, sodium sulfate, chlorine, sodium hydroxide, sodium chlorate and bentonite. Following the successful outcome of the Cluff Lake inquiry, the uranium industry is booming. Some developments and production figures for Gulf Minerals, Amok, Cenex and Eldorado are mentioned.

  16. Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.

    Science.gov (United States)

    Mirzaei, Sajad; Wu, Yufeng

    2016-01-01

    Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches, including an exact method and several heuristics, for this NP-hard problem. However, the exact method is only applicable to a limited range of data, and heuristic methods can be less accurate and also slow sometimes. In this paper, we develop a new algorithm for constructing near parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. This new method also produces more parsimonious results on many simulated datasets as well as a real biological dataset than a previous method. We also show that our method produces topologically more accurate networks for many datasets.

  17. Evaluating SPARQL queries on massive RDF datasets

    KAUST Repository

    Al-Harbi, Razen

    2015-08-01

    Distributed RDF systems partition data across multiple computer nodes. Partitioning is typically based on heuristics that minimize inter-node communication and it is performed in an initial, data pre-processing phase. Therefore, the resulting partitions are static and do not adapt to changes in the query workload; as a result, existing systems are unable to consistently avoid communication for queries that are not favored by the initial data partitioning. Furthermore, for very large RDF knowledge bases, the partitioning phase becomes prohibitively expensive, leading to high startup costs. In this paper, we propose AdHash, a distributed RDF system which addresses the shortcomings of previous work. First, AdHash initially applies lightweight hash partitioning, which drastically minimizes the startup cost, while favoring the parallel processing of join patterns on subjects, without any data communication. Using a locality-aware planner, queries that cannot be processed in parallel are evaluated with minimal communication. Second, AdHash monitors the data access patterns and adapts dynamically to the query load by incrementally redistributing and replicating frequently accessed data. As a result, the communication cost for future queries is drastically reduced or even eliminated. Our experiments with synthetic and real data verify that AdHash (i) starts faster than all existing systems, (ii) processes thousands of queries before other systems become online, and (iii) gracefully adapts to the query load, being able to evaluate queries on billion-scale RDF data in sub-seconds. In this demonstration, audience can use a graphical interface of AdHash to verify its performance superiority compared to state-of-the-art distributed RDF systems.

  18. SPICE: exploration and analysis of post-cytometric complex multivariate datasets.

    Science.gov (United States)

    Roederer, Mario; Nozzi, Joshua L; Nason, Martha C

    2011-02-01

    Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.

  19. A Dataset for Education-Related Majors' Performance Measures with Pre/Post-Video Game Practice

    Science.gov (United States)

    Novak, Elena; Tassell, Janet Lynne

    2015-01-01

    This dataset includes a series of 30 education-related majors' performance measures before and after they completed a 10-hour video game practice in a computer lab. The goal of the experimental study was to examine the effects of action video gaming on students' mathematics performance and mathematics anxiety as mediated by the effect of attention…

  20. The role of metadata in managing large environmental science datasets. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Melton, R.B.; DeVaney, D.M. [eds.] [Pacific Northwest Lab., Richland, WA (United States); French, J. C. [Univ. of Virginia, (United States)

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  1. Use of Maple Seeding Canopy Reflectance Dataset for Validation of SART/LEAFMOD Radiative Transfer Model

    Science.gov (United States)

    Bond, Barbara J.; Peterson, David L.

    1999-01-01

    This project was a collaborative effort by researchers at ARC, OSU and the University of Arizona. The goal was to use a dataset obtained from a previous study to "empirically validate a new canopy radiative-transfer model (SART) which incorporates a recently-developed leaf-level model (LEAFMOD)". The document includes a short research summary.

  2. Sigma-2 receptor ligands QSAR model dataset

    Directory of Open Access Journals (Sweden)

    Antonio Rescifina

    2017-08-01

    Full Text Available The data have been obtained from the Sigma-2 Receptor Selective Ligands Database (S2RSLDB and refined according to the QSAR requirements. These data provide information about a set of 548 Sigma-2 (σ2 receptor ligands selective over Sigma-1 (σ1 receptor. The development of the QSAR model has been undertaken with the use of CORAL software using SMILES, molecular graphs and hybrid descriptors (SMILES and graph together. Data here reported include the regression for σ2 receptor pKi QSAR models. The QSAR model was also employed to predict the σ2 receptor pKi values of the FDA approved drugs that are herewith included.

  3. Integrated remotely sensed datasets for disaster management

    OpenAIRE

    McCarthy, Tim; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-01-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, routecorridor infrastructures surveys using vehicle mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced, television standards such as North...

  4. Synthetic ALSPAC longitudinal datasets for the Big Data VR project [version 1; referees: 2 approved

    Directory of Open Access Journals (Sweden)

    Demetris Avraam

    2017-08-01

    Full Text Available Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSAPC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (co-variance matrices, as well as the mean and variance values of the variables without including the original data itself or disclosing participant information.  In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.

  5. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system

    Science.gov (United States)

    Jensen, Tue V.; Pinson, Pierre

    2017-11-01

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset, open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  6. RE-Europe, a large-scale dataset for modeling a highly renewable European electricity system.

    Science.gov (United States)

    Jensen, Tue V; Pinson, Pierre

    2017-11-28

    Future highly renewable energy systems will couple to complex weather and climate dynamics. This coupling is generally not captured in detail by the open models developed in the power and energy system communities, where such open models exist. To enable modeling such a future energy system, we describe a dedicated large-scale dataset for a renewable electric power system. The dataset combines a transmission network model, as well as information for generation and demand. Generation includes conventional generators with their technical and economic characteristics, as well as weather-driven forecasts and corresponding realizations for renewable energy generation for a period of 3 years. These may be scaled according to the envisioned degrees of renewable penetration in a future European energy system. The spatial coverage, completeness and resolution of this dataset, open the door to the evaluation, scaling analysis and replicability check of a wealth of proposals in, e.g., market design, network actor coordination and forecasting of renewable power generation.

  7. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Science.gov (United States)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by Electroencephalograph (EEG) signal datasets for the prosthetic application. The EEG signal datasets were used as to improvise the way to control the prosthetic hand compared to the Electromyograph (EMG). The EMG has disadvantages to a person, who has not used the muscle for a long time and also to person with degenerative issues due to age factor. Thus, the EEG datasets found to be an alternative for EMG. The datasets used in this work were taken from Brain Computer Interface (BCI) Project. The datasets were already classified for open, close and combined movement operations. It served the purpose as an input to control the prosthetic hand by using an Interface system between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets with an additional LiPo (Lithium Polymer) battery attached to the prosthetic. Some limitations were also identified in terms of the hand movements, weight of the prosthetic, and the suggestions to improve were concluded in this paper. Overall, the objective of this paper were achieved when the prosthetic hand found to be feasible in operation utilizing the EEG datasets.

  8. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

    Full Text Available In this study, a prototype service to provide data from Web Feature Service (WFS as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URI are created to all spatial objects in the dataset. The objects are available from those URIs in Resource Description Framework (RDF data format. Next, a Web Ontology Language (OWL ontology is created to describe the dataset information content using the Open Geospatial Consortium’s (OGC GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from existing WFS. Then the Geographic Markup Language (GML format output of the WFS is transformed on-the-fly to the RDF format. Content Negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced by using the Vocabulary of Interlinked Datasets (VoID. The dataset is divided to the subsets and each subset is given its persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.

  9. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins; Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    Full text: The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation.The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: Monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); Annual temperature (commences 1910); Daily temperature (commences 1910, with limited station coverage pre-1957); Twice-daily dewpoint/relative humidity (commences 1957); Monthly pan evaporation (commences 1970); Cloud amount (commences 1957) (Jovanovic etal. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  10. PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets.

    Science.gov (United States)

    Djokic-Petrovic, Marija; Cvjetkovic, Vladimir; Yang, Jeremy; Zivanovic, Marko; Wild, David J

    2017-09-20

    There are a huge variety of data sources relevant to chemical, biological and pharmacological research, but these data sources are highly siloed and cannot be queried together in a straightforward way. Semantic technologies offer the ability to create links and mappings across datasets and manage them as a single, linked network so that searching can be carried out across datasets, independently of the source. We have developed an application called PIBAS FedSPARQL that uses semantic technologies to allow researchers to carry out such searching across a vast array of data sources. PIBAS FedSPARQL is a web-based query builder and result set visualizer of bioinformatics data. As an advanced feature, our system can detect similar data items identified by different Uniform Resource Identifiers (URIs), using a text-mining algorithm based on the processing of named entities to be used in Vector Space Model and Cosine Similarity Measures. According to our knowledge, PIBAS FedSPARQL was unique among the systems that we found in that it allows detecting of similar data items. As a query builder, our system allows researchers to intuitively construct and run Federated SPARQL queries across multiple data sources, including global initiatives, such as Bio2RDF, Chem2Bio2RDF, EMBL-EBI, and one local initiative called CPCTAS, as well as additional user-specified data source. From the input topic, subtopic, template and keyword, a corresponding initial Federated SPARQL query is created and executed. Based on the data obtained, end users have the ability to choose the most appropriate data sources in their area of interest and exploit their Resource Description Framework (RDF) structure, which allows users to select certain properties of data to enhance query results. The developed system is flexible and allows intuitive creation and execution of queries for an extensive range of bioinformatics topics. Also, the novel "similar data items detection" algorithm can be particularly

  11. Being Included and Excluded

    DEFF Research Database (Denmark)

    Korzenevica, Marina

    2016-01-01

    Following the civil war of 1996–2006, there was a dramatic increase in the labor mobility of young men and the inclusion of young women in formal education, which led to the transformation of the political landscape of rural Nepal. Mobility and schooling represent a level of prestige that rural...... politics. It analyzes how formal education and mobility either challenge or reinforce traditional gendered norms which dictate a lowly position for young married women in the household and their absence from community politics. The article concludes that women are simultaneously excluded and included from...... community politics. On the one hand, their mobility and decision-making powers decrease with the increase in the labor mobility of men and their newly gained education is politically devalued when compared to the informal education that men gain through mobility, but on the other hand, schooling strengthens...

  12. Tension in the recent Type Ia supernovae datasets

    International Nuclear Information System (INIS)

    Wei, Hao

    2010-01-01

    In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.

  13. Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center

    Science.gov (United States)

    Moroni, D. F.; Quach, N.; Francis-Curley, W.

    2016-12-01

    As the landscape of data becomes increasingly more diverse in the domain of Earth Science, the challenges of managing and preserving data become more onerous and complex, particularly for data centers on fixed budgets and limited staff. Many solutions already exist to ease the cost burden for the downstream component of the data lifecycle, yet most archive centers are still racing to keep up with the influx of new data that still needs to find a quasi-permanent resting place. For instance, having well-defined metadata that is consistent across the entire data landscape provides for well-managed and preserved datasets throughout the latter end of the data lifecycle. Translators between different metadata dialects are already in operational use, and facilitate keeping older datasets relevant in today's world of rapidly evolving metadata standards. However, very little is done to address the first phase of the lifecycle, which deals with the entry of both data and the corresponding metadata into a system that is traditionally opaque and closed off to external data producers, thus resulting in a significant bottleneck to the dataset submission process. The ATRAC system was the NOAA NCEI's answer to this previously obfuscated barrier to scientists wishing to find a home for their climate data records, providing a web-based entry point to submit timely and accurate metadata and information about a very specific dataset. A couple of NASA's Distributed Active Archive Centers (DAACs) have implemented their own versions of a web-based dataset and metadata submission form including the ASDC and the ORNL DAAC. The Physical Oceanography DAAC is the most recent in the list of NASA-operated DAACs who have begun to offer their own web-based dataset and metadata submission services to data producers. What makes the PO.DAAC dataset and metadata submission service stand out from these pre-existing services is the option of utilizing both a web browser GUI and a RESTful API to

  14. Inquiring with Geoscience Datasets: Instruction and Assessment

    Science.gov (United States)

    Zalles, D.; Quellmalz, E.; Gobert, J.

    2005-12-01

    This session will describe a new NSF-funded project in Geoscience education, Inquiring with Geoscience Data Sets. The goals of the project are to (1) Study the impacts on student learning of Web-based supplementary curriculum modules that engage secondary-level students in inquiry projects addressing important geoscience problems using an Earth System Science approach. Students will use technologies to access real data sets in the geosciences and to interpret, analyze, and communicate findings based on the data sets. The standards addressed will include geoscience concepts, inquiry abilities in NSES and Benchmarks for Science Literacy, data literacy, NCTM standards, and 21st-century skills and technology proficiencies (NETTS/ISTE). (2) Develop design principles, specification templates, and prototype exemplars for technology-based performance assessments that provide evidence of students' geoscientific knowledge and inquiry skills (including data literacy skills) and students' ability to access, use, analyze, and interpret technology-based geoscience data sets. (3) Develop scenarios based on the specification templates that describe curriculum modules and performance assessments that could be developed for other Earth Science standards and curriculum programs. Also to be described in the session are the project's efforts to differentiate among the dimensions of data literacy and scientific inquiry that are relevant for the geoscience discplines, and how recognition and awareness of the differences can be effectively channelled for the betterment of geoscience education.

  15. An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings

    Directory of Open Access Journals (Sweden)

    Hubbard Alan E

    2010-06-01

    Full Text Available Abstract Background As computational power improves, the application of more advanced machine learning techniques to the analysis of large genome-wide association (GWA datasets becomes possible. While most traditional statistical methods can only elucidate main effects of genetic variants on risk for disease, certain machine learning approaches are particularly suited to discover higher order and non-linear effects. One such approach is the Random Forests (RF algorithm. The use of RF for SNP discovery related to human disease has grown in recent years; however, most work has focused on small datasets or simulation studies which are limited. Results Using a multiple sclerosis (MS case-control dataset comprised of 300 K SNP genotypes across the genome, we outline an approach and some considerations for optimally tuning the RF algorithm based on the empirical dataset. Importantly, results show that typical default parameter values are not appropriate for large GWA datasets. Furthermore, gains can be made by sub-sampling the data, pruning based on linkage disequilibrium (LD, and removing strong effects from RF analyses. The new RF results are compared to findings from the original MS GWA study and demonstrate overlap. In addition, four new interesting candidate MS genes are identified, MPHOSPH9, CTNNA3, PHACTR2 and IL7, by RF analysis and warrant further follow-up in independent studies. Conclusions This study presents one of the first illustrations of successfully analyzing GWA data with a machine learning algorithm. It is shown that RF is computationally feasible for GWA data and the results obtained make biologic sense based on previous studies. More importantly, new genes were identified as potentially being associated with MS, suggesting new avenues of investigation for this complex disease.

  16. A comprehensive dataset of genes with a loss-of-function mutant phenotype in Arabidopsis.

    Science.gov (United States)

    Lloyd, Johnny; Meinke, David

    2012-03-01

    Despite the widespread use of Arabidopsis (Arabidopsis thaliana) as a model plant, a curated dataset of Arabidopsis genes with mutant phenotypes remains to be established. A preliminary list published nine years ago in Plant Physiology is outdated, and genome-wide phenotype information remains difficult to obtain. We describe here a comprehensive dataset of 2,400 genes with a loss-of-function mutant phenotype in Arabidopsis. Phenotype descriptions were gathered primarily from manual curation of the scientific literature. Genes were placed into prioritized groups (essential, morphological, cellular-biochemical, and conditional) based on the documented phenotypes of putative knockout alleles. Phenotype classes (e.g. vegetative, reproductive, and timing, for the morphological group) and subsets (e.g. flowering time, senescence, circadian rhythms, and miscellaneous, for the timing class) were also established. Gene identities were classified as confirmed (through molecular complementation or multiple alleles) or not confirmed. Relationships between mutant phenotype and protein function, genetic redundancy, protein connectivity, and subcellular protein localization were explored. A complementary dataset of 401 genes that exhibit a mutant phenotype only when disrupted in combination with a putative paralog was also compiled. The importance of these genes in confirming functional redundancy and enhancing the value of single gene datasets is discussed. With further input and curation from the Arabidopsis community, these datasets should help to address a variety of important biological questions, provide a foundation for exploring the relationship between genotype and phenotype in angiosperms, enhance the utility of Arabidopsis as a reference plant, and facilitate comparative studies with model genetic organisms.

  17. A framework for automatic creation of gold-standard rigid 3D-2D registration datasets.

    Science.gov (United States)

    Madan, Hennadii; Pernuš, Franjo; Likar, Boštjan; Špiclin, Žiga

    2017-02-01

    Advanced image-guided medical procedures incorporate 2D intra-interventional information into pre-interventional 3D image and plan of the procedure through 3D/2D image registration (32R). To enter clinical use, and even for publication purposes, novel and existing 32R methods have to be rigorously validated. The performance of a 32R method can be estimated by comparing it to an accurate reference or gold standard method (usually based on fiducial markers) on the same set of images (gold standard dataset). Objective validation and comparison of methods are possible only if evaluation methodology is standardized, and the gold standard  dataset is made publicly available. Currently, very few such datasets exist and only one contains images of multiple patients acquired during a procedure. To encourage the creation of gold standard 32R datasets, we propose an automatic framework. The framework is based on rigid registration of fiducial markers. The main novelty is spatial grouping of fiducial markers on the carrier device, which enables automatic marker localization and identification across the 3D and 2D images. The proposed framework was demonstrated on clinical angiograms of 20 patients. Rigid 32R computed by the framework was more accurate than that obtained manually, with the respective target registration error below 0.027 mm compared to 0.040 mm. The framework is applicable for gold standard setup on any rigid anatomy, provided that the acquired images contain spatially grouped fiducial markers. The gold standard datasets and software will be made publicly available.

  18. Software ion scan functions in analysis of glycomic and lipidomic MS/MS datasets.

    Science.gov (United States)

    Haramija, Marko

    2018-03-01

    Hardware ion scan functions unique to tandem mass spectrometry (MS/MS) mode of data acquisition, such as precursor ion scan (PIS) and neutral loss scan (NLS), are important for selective extraction of key structural data from complex MS/MS spectra. However, their software counterparts, software ion scan (SIS) functions, are still not regularly available. Software ion scan functions can be easily coded for additional functionalities, such as software multiple precursor ion scan, software no ion scan, and software variable ion scan functions. These are often necessary, since they allow more efficient analysis of complex MS/MS datasets, often encountered in glycomics and lipidomics. Software ion scan functions can be easily coded by using modern script languages and can be independent of instrument manufacturer. Here we demonstrate the utility of SIS functions on a medium-size glycomic MS/MS dataset. Knowledge of sample properties, as well as of diagnostic and conditional diagnostic ions crucial for data analysis, was needed. Based on the tables constructed with the output data from the SIS functions performed, a detailed analysis of a complex MS/MS glycomic dataset could be carried out in a quick, accurate, and efficient manner. Glycomic research is progressing slowly, and with respect to the MS experiments, one of the key obstacles for moving forward is the lack of appropriate bioinformatic tools necessary for fast analysis of glycomic MS/MS datasets. Adding novel SIS functionalities to the glycomic MS/MS toolbox has a potential to significantly speed up the glycomic data analysis process. Similar tools are useful for analysis of lipidomic MS/MS datasets as well, as will be discussed briefly. Copyright © 2017 John Wiley & Sons, Ltd.

  19. Investigating country-specific music preferences and music recommendation algorithms with the LFM-1b dataset.

    Science.gov (United States)

    Schedl, Markus

    2017-01-01

    Recently, the LFM-1b dataset has been proposed to foster research and evaluation in music retrieval and music recommender systems, Schedl (Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR). New York, 2016). It contains more than one billion music listening events created by more than 120,000 users of Last.fm. Each listening event is characterized by artist, album, and track name, and further includes a timestamp. Basic demographic information and a selection of more elaborate listener-specific descriptors are included as well, for anonymized users. In this article, we reveal information about LFM-1b's acquisition and content and we compare it to existing datasets. We furthermore provide an extensive statistical analysis of the dataset, including basic properties of the item sets, demographic coverage, distribution of listening events (e.g., over artists and users), and aspects related to music preference and consumption behavior (e.g., temporal features and mainstreaminess of listeners). Exploiting country information of users and genre tags of artists, we also create taste profiles for populations and determine similar and dissimilar countries in terms of their populations' music preferences. Finally, we illustrate the dataset's usage in a simple artist recommendation task, whose results are intended to serve as baseline against which more elaborate techniques can be assessed.

  20. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and the appropriateness in terms of completeness, precision and methodology. Results show that ELCD electricity datasets have a very good quality in general terms, nevertheless some findings and recommendations in order to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results ensure the quality of the electricity-related datasets to any LCA practitioner, and provide insights related to the limitations and assumptions underlying in the datasets modelling. Giving this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would be also useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  1. Dataset definition for CMS operations and physics analyses

    Science.gov (United States)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. dijet resonance trigger collecting data at a 1 kHz). In this presentation, we review the evolution of the dataset definition during the LHC run I, and we discuss the plans for the run II.

  2. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  3. Karna Particle Size Dataset for Tables and Figures

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  4. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  5. National Hydrography Dataset (NHD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes...

  6. Watershed Boundary Dataset (WBD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....

  7. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  8. USGS National Hydrography Dataset from The National Map

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — USGS The National Map - National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that encodes information about naturally occurring and...

  9. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide list of Phonocardiogram (PCG) features in time and frequency domain along with morphological and statistical features to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open access database, made available in Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smart phone based digital stethoscope from an Indian hospital was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior art approaches, when applied on the same dataset.

  10. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  11. Dataset definition for CMS operations and physics analyses

    CERN Document Server

    AUTHOR|(CDS)2051291

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concept of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. dijet resonance trigger collecting data at a 1 kHz). In this presentation, we review the evolution of the dataset definition during the first run, and we discuss the plans for the second LHC run.

  12. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  13. Global Man-made Impervious Surface (GMIS) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...

  14. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves the computational process to find patterns from large data sets. Classification, one of the main domains of data mining, involves known structure generalizing to apply to a new dataset and predict its class. There are various classification algorithms being used to classify various data sets. They are based on different methods such as probability, decision tree, neural network, nearest neighbor, boolean and fuzzy logic, kernel-based etc. In this paper, we apply three diverse classification algorithms on ten datasets. The datasets have been selected based on their size and/or number and nature of attributes. Results have been discussed using some performance evaluation measures like precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error, ROC Area etc. Comparative analysis has been carried out using the performance evaluation measures of accuracy, precision, and F-measure. We specify features and limitations of the classification algorithms for the diverse nature datasets.

  15. Newton SSANTA Dr Water using POU filters dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...

  16. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20% error. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher

  17. Associating uncertainty with datasets using Linked Data and allowing propagation via provenance chains

    Science.gov (United States)

    Car, Nicholas; Cox, Simon; Fitch, Peter

    2015-04-01

    With earth-science datasets increasingly being published to enable re-use in projects disassociated from the original data acquisition or generation, there is an urgent need for associated metadata to be connected, in order to guide their application. In particular, provenance traces should support the evaluation of data quality and reliability. However, while standards for describing provenance are emerging (e.g. PROV-O), these do not include the necessary statistical descriptors and confidence assessments. UncertML has a mature conceptual model that may be used to record uncertainty metadata. However, by itself UncertML does not support the representation of uncertainty of multi-part datasets, and provides no direct way of associating the uncertainty information - metadata in relation to a dataset - with dataset objects.We present a method to address both these issues by combining UncertML with PROV-O, and delivering resulting uncertainty-enriched provenance traces through the Linked Data API. UncertProv extends the PROV-O provenance ontology with an RDF formulation of the UncertML conceptual model elements, adds further elements to support uncertainty representation without a conceptual model and the integration of UncertML through links to documents. The Linked ID API provides a systematic way of navigating from dataset objects to their UncertProv metadata and back again. The Linked Data API's 'views' capability enables access to UncertML and non-UncertML uncertainty metadata representations for a dataset. With this approach, it is possible to access and navigate the uncertainty metadata associated with a published dataset using standard semantic web tools, such as SPARQL queries. Where the uncertainty data follows the UncertML model it can be automatically interpreted and may also support automatic uncertainty propagation . Repositories wishing to enable uncertainty propagation for all datasets must ensure that all elements that are associated with uncertainty

  18. Creation of the Naturalistic Engagement in Secondary Tasks (NEST) distracted driving dataset.

    Science.gov (United States)

    Owens, Justin M; Angell, Linda; Hankey, Jonathan M; Foley, James; Ebe, Kazutoshi

    2015-09-01

    Distracted driving has become a topic of critical importance to driving safety research over the past several decades. Naturalistic driving data offer a unique opportunity to study how drivers engage with secondary tasks in real-world driving; however, the complexities involved with identifying and coding relevant epochs of naturalistic data have limited its accessibility to the general research community. This project was developed to help address this problem by creating an accessible dataset of driver behavior and situational factors observed during distraction-related safety-critical events and baseline driving epochs, using the Strategic Highway Research Program 2 (SHRP2) naturalistic dataset. The new NEST (Naturalistic Engagement in Secondary Tasks) dataset was created using crashes and near-crashes from the SHRP2 dataset that were identified as including secondary task engagement as a potential contributing factor. Data coding included frame-by-frame video analysis of secondary task and hands-on-wheel activity, as well as summary event information. In addition, information about each secondary task engagement within the trip prior to the crash/near-crash was coded at a higher level. Data were also coded for four baseline epochs and trips per safety-critical event. 1,180 events and baseline epochs were coded, and a dataset was constructed. The project team is currently working to determine the most useful way to allow broad public access to the dataset. We anticipate that the NEST dataset will be extraordinarily useful in allowing qualified researchers access to timely, real-world data concerning how drivers interact with secondary tasks during safety-critical events and baseline driving. The coded dataset developed for this project will allow future researchers to have access to detailed data on driver secondary task engagement in the real world. It will be useful for standalone research, as well as for integration with additional SHRP2 data to enable the

  19. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    OpenAIRE

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health?the human microbiome. Existing limited number of studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples...

  20. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

    This document describes a general purpose multimedia data-set to be used in cross-media machine learning problems. In more detail we describe the genre taxonomy applied at http://www.garageband.com, from where the data-set was collected, and how the taxonomy have been fused into a more human...... understandable taxonomy. Finally, a description of various features extracted from both the audio and text are presented....

  1. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications in medical data have several technical challenges: complex and heterogeneous datasets, noisy medical datasets, and explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  2. Multiple Perspectives / Multiple Readings

    Directory of Open Access Journals (Sweden)

    Simon Biggs

    2005-01-01

    Full Text Available People experience things from their own physical point of view. What they see is usually a function of where they are and what physical attitude they adopt relative to the subject. With augmented vision (periscopes, mirrors, remote cameras, etc we are able to see things from places where we are not present. With time-shifting technologies, such as the video recorder, we can also see things from the past; a time and a place we may never have visited.In recent artistic work I have been exploring the implications of digital technology, interactivity and internet connectivity that allow people to not so much space/time-shift their visual experience of things but rather see what happens when everybody is simultaneously able to see what everybody else can see. This is extrapolated through the remote networking of sites that are actual installation spaces; where the physical movements of viewers in the space generate multiple perspectives, linked to other similar sites at remote locations or to other viewers entering the shared data-space through a web based version of the work.This text explores the processes involved in such a practice and reflects on related questions regarding the non-singularity of being and the sense of self as linked to time and place.

  3. A global dataset of sub-daily rainfall indices

    Science.gov (United States)

    Fowler, H. J.; Lewis, E.; Blenkinsop, S.; Guerreiro, S.; Li, X.; Barbero, R.; Chan, S.; Lenderink, G.; Westra, S.

    2017-12-01

    It is still uncertain how hydrological extremes will change with global warming as we do not fully understand the processes that cause extreme precipitation under current climate variability. The INTENSE project is using a novel and fully-integrated data-modelling approach to provide a step-change in our understanding of the nature and drivers of global precipitation extremes and change on societally relevant timescales, leading to improved high-resolution climate model representation of extreme rainfall processes. The INTENSE project is in conjunction with the World Climate Research Programme (WCRP)'s Grand Challenge on 'Understanding and Predicting Weather and Climate Extremes' and the Global Water and Energy Exchanges Project (GEWEX) Science questions. A new global sub-daily precipitation dataset has been constructed (data collection is ongoing). Metadata for each station has been calculated, detailing record lengths, missing data, station locations. A set of global hydroclimatic indices have been produced based upon stakeholder recommendations including indices that describe maximum rainfall totals and timing, the intensity, duration and frequency of storms, frequency of storms above specific thresholds and information about the diurnal cycle. This will provide a unique global data resource on sub-daily precipitation whose derived indices will be freely available to the wider scientific community.

  4. A time warping approach to multiple sequence alignment.

    Science.gov (United States)

    Arribas-Gil, Ana; Matias, Catherine

    2017-04-25

    We propose an approach for multiple sequence alignment (MSA) derived from the dynamic time warping viewpoint and recent techniques of curve synchronization developed in the context of functional data analysis. Starting from pairwise alignments of all the sequences (viewed as paths in a certain space), we construct a median path that represents the MSA we are looking for. We establish a proof of concept that our method could be an interesting ingredient to include into refined MSA techniques. We present a simple synthetic experiment as well as the study of a benchmark dataset, together with comparisons with 2 widely used MSA softwares.

  5. The impact of the resolution of meteorological datasets on catchment-scale drought studies

    Science.gov (United States)

    Hellwig, Jost; Stahl, Kerstin

    2017-04-01

    Gridded meteorological datasets provide the basis to study drought at a range of scales, including catchment scale drought studies in hydrology. They are readily available to study past weather conditions and often serve real time monitoring as well. As these datasets differ in spatial/temporal coverage and spatial/temporal resolution, for most studies there is a tradeoff between these features. Our investigation examines whether biases occur when studying drought on catchment scale with low resolution input data. For that, a comparison among the datasets HYRAS (covering Central Europe, 1x1 km grid, daily data, 1951 - 2005), E-OBS (Europe, 0.25° grid, daily data, 1950-2015) and GPCC (whole world, 0.5° grid, monthly data, 1901 - 2013) is carried out. Generally, biases in precipitation increase with decreasing resolution. Most important variations are found during summer. In low mountain range of Central Europe the datasets of sparse resolution (E-OBS, GPCC) overestimate dry days and underestimate total precipitation since they are not able to describe high spatial variability. However, relative measures like the correlation coefficient reveal good consistencies of dry and wet periods, both for absolute precipitation values and standardized indices like the Standardized Precipitation Index (SPI) or Standardized Precipitation Evaporation Index (SPEI). Particularly the most severe droughts derived from the different datasets match very well. These results indicate that absolute values of sparse resolution datasets applied to catchment scale might be critical to use for an assessment of the hydrological drought at catchment scale, whereas relative measures for determining periods of drought are more trustworthy. Therefore, studies on drought, that downscale meteorological data, should carefully consider their data needs and focus on relative measures for dry periods if sufficient for the task.

  6. Lunar Meteorites: A Global Geochemical Dataset

    Science.gov (United States)

    Zeigler, R. A.; Joy, K. H.; Arai, T.; Gross, J.; Korotev, R. L.; McCubbin, F. M.

    2017-01-01

    bulk of the chapter will use examples from the lunar meteorite suite to examine important recent advances in lunar science, including (but not limited to the following: (1) Understanding the global compositional diversity of the lunar surface; (2) Understanding the formation of the ancient lunar primary crust; (3) Understanding the diversity and timing of mantle melting, and secondary crust formation; (4) Comparing KREEPy lunar meteorites to KREEPy Apollo samples as evidence of variability within the PKT; and (5) A better understanding of the South Pole Aitken Basin through lunar meteorites whose provenance are within that Terrane.

  7. 3D Modeling of Iran and Surrounding Areas From Simultaneous Inversion of Multiple Geophysical Datasets

    Science.gov (United States)

    2010-09-01

    shorter periods). Figure 4 shows example fits to the dispersion values and the Bouguer gravity variations. As seen in earlier studies (Maceira and Ammon...50 100 150 150 100 50 Figure 4. Sample dispersion (top) and Bouguer gravity (bottom) for the preliminary inversion. As for other

  8. A Cost-Effective Strategy for Storing Scientific Datasets with Multiple Service Providers in the Cloud

    OpenAIRE

    Yuan, Dong; Cui, Lizhen; Liu, Xiao; Fu, Erjiang; Yang, Yun

    2016-01-01

    Cloud computing provides scientists a platform that can deploy computation and data intensive applications without infrastructure investment. With excessive cloud resources and a decision support system, large generated data sets can be flexibly 1 stored locally in the current cloud, 2 deleted and regenerated whenever reused or 3 transferred to cheaper cloud service for storage. However, due to the pay for use model, the total application cost largely depends on the usage of computation, stor...

  9. Integrating Multiple Analytical Datasets to Compare Metabolite Profiles of Mouse Colonic-Cecal Contents and Feces.

    Science.gov (United States)

    Zeng, Huawei; Grapov, Dmitry; Jackson, Matthew I; Fahrmann, Johannes; Fiehn, Oliver; Combs, Gerald F

    2015-09-11

    The pattern of metabolites produced by the gut microbiome comprises a phenotype indicative of the means by which that microbiome affects the gut. We characterized that phenotype in mice by conducting metabolomic analyses of the colonic-cecal contents, comparing that to the metabolite patterns of feces in order to determine the suitability of fecal specimens as proxies for assessing the metabolic impact of the gut microbiome. We detected a total of 270 low molecular weight metabolites in colonic-cecal contents and feces by gas chromatograph, time-of-flight mass spectrometry (GC-TOF) and ultra-high performance liquid chromatography, quadrapole time-of-flight mass spectrometry (UPLC-Q-TOF). Of that number, 251 (93%) were present in both types of specimen, representing almost all known biochemical pathways related to the amino acid, carbohydrate, energy, lipid, membrane transport, nucleotide, genetic information processing, and cancer-related metabolism. A total of 115 metabolites differed significantly in relative abundance between both colonic-cecal contents and feces. These data comprise the first characterization of relationships among metabolites present in the colonic-cecal contents and feces in a healthy mouse model, and shows that feces can be a useful proxy for assessing the pattern of metabolites to which the colonic mucosum is exposed.

  10. Integrating Multiple Analytical Datasets to Compare Metabolite Profiles of Mouse Colonic-Cecal Contents and Feces

    Directory of Open Access Journals (Sweden)

    Huawei Zeng

    2015-09-01

    Full Text Available The pattern of metabolites produced by the gut microbiome comprises a phenotype indicative of the means by which that microbiome affects the gut. We characterized that phenotype in mice by conducting metabolomic analyses of the colonic-cecal contents, comparing that to the metabolite patterns of feces in order to determine the suitability of fecal specimens as proxies for assessing the metabolic impact of the gut microbiome. We detected a total of 270 low molecular weight metabolites in colonic-cecal contents and feces by gas chromatograph, time-of-flight mass spectrometry (GC-TOF and ultra-high performance liquid chromatography, quadrapole time-of-flight mass spectrometry (UPLC-Q-TOF. Of that number, 251 (93% were present in both types of specimen, representing almost all known biochemical pathways related to the amino acid, carbohydrate, energy, lipid, membrane transport, nucleotide, genetic information processing, and cancer-related metabolism. A total of 115 metabolites differed significantly in relative abundance between both colonic-cecal contents and feces. These data comprise the first characterization of relationships among metabolites present in the colonic-cecal contents and feces in a healthy mouse model, and shows that feces can be a useful proxy for assessing the pattern of metabolites to which the colonic mucosum is exposed.

  11. Robo-AO M Dwarf Multiplicity Survey

    Science.gov (United States)

    Lamman, Claire; Baranec, Christoph; Berta-Thompson, Zachory K.; Law, Nicholas M.; Ziegler, Carl; Schonhut-Stasik, Jessica

    2018-06-01

    We analyzed close to 7,000 observations from Robo-AO’s field M dwarf survey taken on the 2.1m Kitt Peak telescope. Results will help determine the total multiplicity fraction and multiplicity functions of M dwarfs, which are crucial steps towards understanding their evolution and formation mechanics. Through its robotic, laser-guided, and automated system, the Robo-AO instrument has yielded the largest adaptive-optics M dwarf survey to date. I developed a graphical user interface to quickly analyze this data. Initial data analysis included assessing data quality, checking the result from Robo-AO’s automatic reduction pipeline, and determining existence as well as the relative position of companions through a visual inspection. This program can be applied to other datasets and was successfully tested by re-analyzing observations from a separate Robo-AO survey. After a conservative initial cut for quality, over 350 companions were found within 4” of a primary star out of 2,746 high quality Robo-AO M dwarf observations, including four triple systems. Further observations were done with the Keck II telescope by using its NIRC2 imager to follow up on ten select targets for the existence and physical association of companions. Future research will yield insights into low-mass stellar formation and provide a database of nearby M dwarf multiples that will potentially assist ongoing and future surveys for planets around these stars, such as the NASA TESS mission.

  12. On the visualization of water-related big data: extracting insights from drought proxies' datasets

    Science.gov (United States)

    Diaz, Vitali; Corzo, Gerald; van Lanen, Henny A. J.; Solomatine, Dimitri

    2017-04-01

    Big data is a growing area of science where hydroinformatics can benefit largely. There have been a number of important developments in the area of data science aimed at analysis of large datasets. Such datasets related to water include measurements, simulations, reanalysis, scenario analyses and proxies. By convention, information contained in these databases is referred to a specific time and a space (i.e., longitude/latitude). This work is motivated by the need to extract insights from large water-related datasets, i.e., transforming large amounts of data into useful information that helps to better understand of water-related phenomena, particularly about drought. In this context, data visualization, part of data science, involves techniques to create and to communicate data by encoding it as visual graphical objects. They may help to better understand data and detect trends. Base on existing methods of data analysis and visualization, this work aims to develop tools for visualizing water-related large datasets. These tools were developed taking advantage of existing libraries for data visualization into a group of graphs which include both polar area diagrams (PADs) and radar charts (RDs). In both graphs, time steps are represented by the polar angles and the percentages of area in drought by the radios. For illustration, three large datasets of drought proxies are chosen to identify trends, prone areas and spatio-temporal variability of drought in a set of case studies. The datasets are (1) SPI-TS2p1 (1901-2002, 11.7 GB), (2) SPI-PRECL0p5 (1948-2016, 7.91 GB) and (3) SPEI-baseV2.3 (1901-2013, 15.3 GB). All of them are on a monthly basis and with a spatial resolution of 0.5 degrees. First two were retrieved from the repository of the International Research Institute for Climate and Society (IRI). They are included into the Analyses Standardized Precipitation Index (SPI) project (iridl.ldeo.columbia.edu/SOURCES/.IRI/.Analyses/.SPI/). The third dataset was

  13. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset, metadata with usage metrics and user feedback to objectively extract relevance for improved...

  14. The mitochondrial genomes of Atlas Geckos (Quedenfeldtia): mitogenome assembly from transcriptomes and anchored hybrid enrichment datasets

    OpenAIRE

    Lyra, Mariana L.; Joger, Ulrich; Schulte, Ulrich; Slimani, Tahar; El Mouden, El Hassan; Bouazza, Abdellah; Künzel, Sven; Lemmon, Alan R.; Moriarty Lemmon, Emily; Vences, Miguel

    2017-01-01

    The nearly complete mitogenomes of the two species of North African Atlas geckos, Quedenfeldtia moerens and Q. trachyblepharus were assembled from anchored hybrid enrichment data and RNAseq data. Congruent assemblies were obtained for four samples included in both datasets. We recovered the 13 protein-coding genes, 22 tRNA genes, and two rRNA genes for both species, including partial control region. The order of genes agrees with that of other geckos.

  15. Towards a climatology of tropical cyclone morphometric structures using a newly standardized passive microwave satellite dataset

    Science.gov (United States)

    Cossuth, J.; Hart, R. E.

    2013-12-01

    The structure of a tropical cyclone (TC) is a spatial representation of its organizational pattern and distribution of energy acquisition and release. Physical processes that react to both the external environment and its own internal dynamics manifest themselves in the TC shape. This structure depicts a specific phase in the TC's meteorological lifecycle, reflecting its past and potentially constraining its future development. For a number of reasons, a thorough objective definition of TC structures and an intercomparison of their varieties have been neglected. This lack of knowledge may be a key reason why TC intensity forecasts, despite numerical model improvements and theoretical advances, have been stagnant in recent years relative to track forecasts. Satellite microwave imagers provide multiple benefits in discerning TC structure, but compiling a research quality data set has been problematic due to several inherent technical and logistical issues. While there are multiple satellite sensors that incorporate microwave frequencies, inter-comparison between such sensors is limited by the different available channels, spatial resolutions, and calibration metrics between satellites, all of which provide inconsistencies in resolving TC structural features. To remedy these difficulties, a global archive of TCs as measured by all available US satellite microwave sensors is compiled and standardized. Using global historical best track data, TC microwave data is retrieved from the Defense Meteorological Satellite Program (DMSP) series (including all SSM/I and SSMIS), TMI, AMSR-E, and WindSat sensors. Standardization between sensors for each TC overpass are performed, including: 1) Recalibration of data from the 'ice scattering' channels to a common frequency (89GHz); 2) Resampling the DMSP series to a higher resolution using the Backus-Gilbert technique; and 3) Re-centering the TC center more precisely using the ARCHER technique (Wimmers and Velden 2010) to analyze the

  16. Water availability and agricultural demand: An assessment framework using global datasets in a data scarce catchment, Rokel-Seli River, Sierra Leone

    Directory of Open Access Journals (Sweden)

    Christopher K. Masafu

    2016-12-01

    New hydrological insights: We find that the hydrological model capably simulates both low and high flows satisfactorily, and that all the input datasets consistently produce similar results for water withdrawal scenarios. The proposed framework is successfully applied to assess the variability of flows available for abstraction against agricultural demand. The assessment framework conclusions are robust despite the different input datasets and calibration scenarios tested, and can be extended to include other global input datasets.

  17. Automatic plankton image classification combining multiple view features via multiple kernel learning.

    Science.gov (United States)

    Zheng, Haiyong; Wang, Ruchen; Yu, Zhibin; Wang, Nan; Gu, Zhaorui; Zheng, Bing

    2017-12-28

    Plankton, including phytoplankton and zooplankton, are the main source of food for organisms in the ocean and form the base of marine food chain. As the fundamental components of marine ecosystems, plankton is very sensitive to environment changes, and the study of plankton abundance and distribution is crucial, in order to understand environment changes and protect marine ecosystems. This study was carried out to develop an extensive applicable plankton classification system with high accuracy for the increasing number of various imaging devices. Literature shows that most plankton image classification systems were limited to only one specific imaging device and a relatively narrow taxonomic scope. The real practical system for automatic plankton classification is even non-existent and this study is partly to fill this gap. Inspired by the analysis of literature and development of technology, we focused on the requirements of practical application and proposed an automatic system for plankton image classification combining multiple view features via multiple kernel learning (MKL). For one thing, in order to describe the biomorphic characteristics of plankton more completely and comprehensively, we combined general features with robust features, especially by adding features like Inner-Distance Shape Context for morphological representation. For another, we divided all the features into different types from multiple views and feed them to multiple classifiers instead of only one by combining different kernel matrices computed from different types of features optimally via multiple kernel learning. Moreover, we also applied feature selection method to choose the optimal feature subsets from redundant features for satisfying different datasets from different imaging devices. We implemented our proposed classification system on three different datasets across more than 20 categories from phytoplankton to zooplankton. The experimental results validated that our system

  18. The LRO Diviner Foundation Dataset: A Comprehensive Temperature Record of the Moon

    Science.gov (United States)

    Sefton-Nash, E.; Aye, K. M.; Williams, J. P.; Greenhagen, B. T.; Sullivan, M.; Paige, D. A.

    2014-12-01

    The Diviner Lunar Radiometer Experiment aboard NASA's Lunar Reconnaissance Orbiter (LRO) has been systematically mapping the thermal state of the Moon at a mean rate of >1400 observations/second since July 2009. Diviner measures solar reflectance and infrared radiance in 9 spectral channels with bandpasses from 0.3 - 400 μm. With more than 5 years of continuous data, complete spatial coverage of the lunar surface is achieved multiple times and coverage of local solar time enables the diurnal curve to be well-resolved for a given subsolar point. The Diviner Foundation Dataset (FDS) represents a coordinated effort to recalibrate raw data to improve quality, and produce a definitive and comprehensive set of products for use by the lunar science community. We present the contents and organization of the FDS, background on the enhanced processing pipeline, show how it is retrieved from NASA's Planetary Data System, and demonstrate its use with common mapping & analysis tools. The FDS comprises level 1 Reduced Data Records (RDRs) and level 2/3 Gridded Data Records (GDRs). We produce new RDRs using improved calibration algorithms that remove instrument artifacts and improve accuracy of measured radiance, particularly for polar data in permanently shadowed regions. GDRs are built using a per-orbit gridding scheme, and data are sourced from a database constructed by modeling the effective field-of-view for each observation. Notable gridded products available for lunar science include: 1) Globally mapped brightness temperatures for all channels in tiled cylindrical and polar stereographic map projections, 2) Global hourly temperature snapshots - maps of bolometric temperature binned into 1 hour intervals of local time, 3) Topographic products (elevation, slope and azimuth) for each map tile, that represent the terrain model used to process the data, and 4) Accompanying gridded maps of auxiliary quantities such as emission angle, local solar time, error etc…, for filtering

  19. Multiple-Ring Digital Communication Network

    Science.gov (United States)

    Kirkham, Harold

    1992-01-01

    Optical-fiber digital communication network to support data-acquisition and control functions of electric-power-distribution networks. Optical-fiber links of communication network follow power-distribution routes. Since fiber crosses open power switches, communication network includes multiple interconnected loops with occasional spurs. At each intersection node is needed. Nodes of communication network include power-distribution substations and power-controlling units. In addition to serving data acquisition and control functions, each node acts as repeater, passing on messages to next node(s). Multiple-ring communication network operates on new AbNET protocol and features fiber-optic communication.

  20. Mango: multiple alignment with N gapped oligos.

    Science.gov (United States)

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2008-06-01

    Multiple sequence alignment is a classical and challenging task. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art works suffer from the "once a gap, always a gap" phenomenon. Is there a radically new way to do multiple sequence alignment? In this paper, we introduce a novel and orthogonal multiple sequence alignment method, using both multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole and tries to build the alignment vertically, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds have proved significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks, showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0, and Kalign 2.0. We have further demonstrated the scalability of MANGO on very large datasets of repeat elements. MANGO can be downloaded at http://www.bioinfo.org.cn/mango/ and is free for academic usage.

  1. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

    DEFF Research Database (Denmark)

    Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte

    2006-01-01

    Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods...... analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset...

  2. Combining multiple decisions: applications to bioinformatics

    International Nuclear Information System (INIS)

    Yukinawa, N; Ishii, S; Takenouchi, T; Oba, S

    2008-01-01

    Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods

  3. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global delayed time mode validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the ARGO DAC, but profiles are also extracted from the World Ocean Database and TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional operationnal oserving system( ROOS) operated by European institutes no managed by National Data Centres and other datasets of profiles povided by scientific sources can also be found (Sea mammals profiles from MEOP, XBT datasets from cruises ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic). First advantage of this new merge product is to enhance the space and time coverage at global and european scales for the period covering 1950 till a year before the current year. This product is updated once a year and T&S gridded fields are alos generated for the period 1990-year n-1. The enhancement compared to the revious CORA product will be presented Despite the fact that the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. Started in 2016 a new study started that aims to compare both validation procedures to move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation.A reference data set composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made thanks to wide range of instruments (XBTs, CTDs, Argo floats, Instrumented sea mammals,...), covering the global ocean. The reference dataset has been validated simultaneously by both teams.An exhaustive comparison of the

  4. Multiple sclerosis

    Science.gov (United States)

    ... indwelling catheter Osteoporosis or thinning of the bones Pressure sores Side effects of medicines used to treat the ... Daily bowel care program Multiple sclerosis - discharge Preventing pressure ulcers Swallowing problems Images Multiple sclerosis MRI of the ...

  5. A conceptual prototype for the next-generation national elevation dataset

    Science.gov (United States)

    Stoker, Jason M.; Heidemann, Hans Karl; Evans, Gayla A.; Greenlee, Susan K.

    2013-01-01

    In 2012 the U.S. Geological Survey's (USGS) National Geospatial Program (NGP) funded a study to develop a conceptual prototype for a new National Elevation Dataset (NED) design with expanded capabilities to generate and deliver a suite of bare earth and above ground feature information over the United States. This report details the research on identifying operational requirements based on prior research, evaluation of what is needed for the USGS to meet these requirements, and development of a possible conceptual framework that could potentially deliver the kinds of information that are needed to support NGP's partners and constituents. This report provides an initial proof-of-concept demonstration using an existing dataset, and recommendations for the future, to inform NGP's ongoing and future elevation program planning and management decisions. The demonstration shows that this type of functional process can robustly create derivatives from lidar point cloud data; however, more research needs to be done to see how well it extends to multiple datasets.

  6. Validating a continental-scale groundwater diffuse pollution model using regional datasets.

    Science.gov (United States)

    Ouedraogo, Issoufou; Defourny, Pierre; Vanclooster, Marnik

    2017-12-11

    In this study, we assess the validity of an African-scale groundwater pollution model for nitrates. In a previous study, we identified a statistical continental-scale groundwater pollution model for nitrate. The model was identified using a pan-African meta-analysis of available nitrate groundwater pollution studies. The model was implemented in both Random Forest (RF) and multiple regression formats. For both approaches, we collected as predictors a comprehensive GIS database of 13 spatial attributes, related to land use, soil type, hydrogeology, topography, climatology, region typology, nitrogen fertiliser application rate, and population density. In this paper, we validate the continental-scale model of groundwater contamination by using a nitrate measurement dataset from three African countries. We discuss the issue of data availability, and quality and scale issues, as challenges in validation. Notwithstanding that the modelling procedure exhibited very good success using a continental-scale dataset (e.g. R 2  = 0.97 in the RF format using a cross-validation approach), the continental-scale model could not be used without recalibration to predict nitrate pollution at the country scale using regional data. In addition, when recalibrating the model using country-scale datasets, the order of model exploratory factors changes. This suggests that the structure and the parameters of a statistical spatially distributed groundwater degradation model for the African continent are strongly scale dependent.

  7. The French Muséum national d'histoire naturelle vascular plant herbarium collection dataset

    Science.gov (United States)

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-02-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d'histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments.

  8. The French Muséum national d’histoire naturelle vascular plant herbarium collection dataset

    Science.gov (United States)

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-01-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d’histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments. PMID:28195585

  9. Re-inspection of small RNA sequence datasets reveals several novel human miRNA genes.

    Directory of Open Access Journals (Sweden)

    Thomas Birkballe Hansen

    Full Text Available BACKGROUND: miRNAs are key players in gene expression regulation. To fully understand the complex nature of cellular differentiation or initiation and progression of disease, it is important to assess the expression patterns of as many miRNAs as possible. Thereby, identifying novel miRNAs is an essential prerequisite to make possible a comprehensive and coherent understanding of cellular biology. METHODOLOGY/PRINCIPAL FINDINGS: Based on two extensive, but previously published, small RNA sequence datasets from human embryonic stem cells and human embroid bodies, respectively [1], we identified 112 novel miRNA-like structures and were able to validate miRNA processing in 12 out of 17 investigated cases. Several miRNA candidates were furthermore substantiated by including additional available small RNA datasets, thereby demonstrating the power of combining datasets to identify miRNAs that otherwise may be assigned as experimental noise. CONCLUSIONS/SIGNIFICANCE: Our analysis highlights that existing datasets are not yet exhaustedly studied and continuous re-analysis of the available data is important to uncover all features of small RNA sequencing.

  10. The largest human cognitive performance dataset reveals insights into the effects of lifestyle factors and aging

    Directory of Open Access Journals (Sweden)

    Daniel A Sternberg

    2013-06-01

    Full Text Available Making new breakthroughs in understanding the processes underlying human cognition may depend on the availability of very large datasets that have not historically existed in psychology and neuroscience. Lumosity is a web-based cognitive training platform that has grown to include over 600 million cognitive training task results from over 35 million individuals, comprising the largest existing dataset of human cognitive performance. As part of the Human Cognition Project, Lumosity’s collaborative research program to understand the human mind, Lumos Labs researchers and external research collaborators have begun to explore this dataset in order uncover novel insights about the correlates of cognitive performance. This paper presents two preliminary demonstrations of some of the kinds of questions that can be examined with the dataset. The first example focuses on replicating known findings relating lifestyle factors to baseline cognitive performance in a demographically diverse, healthy population at a much larger scale than has previously been available. The second example examines a question that would likely be very difficult to study in laboratory-based and existing online experimental research approaches: specifically, how learning ability for different types of cognitive tasks changes with age. We hope that these examples will provoke the imagination of researchers who are interested in collaborating to answer fundamental questions about human cognitive performance.

  11. Discovery of Teleconnections Using Data Mining Technologies in Global Climate Datasets

    Directory of Open Access Journals (Sweden)

    Fan Lin

    2007-10-01

    Full Text Available In this paper, we apply data mining technologies to a 100-year global land precipitation dataset and a 100-year Sea Surface Temperature (SST dataset. Some interesting teleconnections are discovered, including well-known patterns and unknown patterns (to the best of our knowledge, such as teleconnections between the abnormally low temperature events of the North Atlantic and floods in Northern Bolivia, abnormally low temperatures of the Venezuelan Coast and floods in Northern Algeria and Tunisia, etc. In particular, we use a high dimensional clustering method and a method that mines episode association rules in event sequences. The former is used to cluster the original time series datasets into higher spatial granularity, and the later is used to discover teleconnection patterns among events sequences that are generated by the clustering method. In order to verify our method, we also do experiments on the SOI index and a 100-year global land precipitation dataset and find many well-known teleconnections, such as teleconnections between SOI lower events and drought events of Eastern Australia, South Africa, and North Brazil; SOI lower events and flood events of the middle-lower reaches of Yangtze River; etc. We also do explorative experiments to help domain scientists discover new knowledge.

  12. Technical note: Space-time analysis of rainfall extremes in Italy: clues from a reconciled dataset

    Science.gov (United States)

    Libertino, Andrea; Ganora, Daniele; Claps, Pierluigi

    2018-05-01

    Like other Mediterranean areas, Italy is prone to the development of events with significant rainfall intensity, lasting for several hours. The main triggering mechanisms of these events are quite well known, but the aim of developing rainstorm hazard maps compatible with their actual probability of occurrence is still far from being reached. A systematic frequency analysis of these occasional highly intense events would require a complete countrywide dataset of sub-daily rainfall records, but this kind of information was still lacking for the Italian territory. In this work several sources of data are gathered, for assembling the first comprehensive and updated dataset of extreme rainfall of short duration in Italy. The resulting dataset, referred to as the Italian Rainfall Extreme Dataset (I-RED), includes the annual maximum rainfalls recorded in 1 to 24 consecutive hours from more than 4500 stations across the country, spanning the period between 1916 and 2014. A detailed description of the spatial and temporal coverage of the I-RED is presented, together with an exploratory statistical analysis aimed at providing preliminary information on the climatology of extreme rainfall at the national scale. Due to some legal restrictions, the database can be provided only under certain conditions. Taking into account the potentialities emerging from the analysis, a description of the ongoing and planned future work activities on the database is provided.

  13. Risk behaviours among internet-facilitated sex workers: evidence from two new datasets.

    Science.gov (United States)

    Cunningham, Scott; Kendall, Todd D

    2010-12-01

    Sex workers have historically played a central role in STI outbreaks by forming a core group for transmission and due to their higher rates of concurrency and inconsistent condom usage. Over the past 15 years, North American commercial sex markets have been radically reorganised by internet technologies that channelled a sizeable share of the marketplace online. These changes may have had a meaningful impact on the role that sex workers play in STI epidemics. In this study, two new datasets documenting the characteristics and practices of internet-facilitated sex workers are presented and analysed. The first dataset comes from a ratings website where clients share detailed information on over 94,000 sex workers in over 40 cities between 1999 and 2008. The second dataset reflects a year-long field survey of 685 sex workers who advertise online. Evidence from these datasets suggests that internet-facilitated sex workers are dissimilar from the street-based workers who largely populated the marketplace in earlier eras. Differences in characteristics and practices were found which suggest a lower potential for the spread of STIs among internet-facilitated sex workers. The internet-facilitated population appears to include a high proportion of sex workers who are well-educated, hold health insurance and operate only part time. They also engage in relatively low levels of risky sexual practices.

  14. Dimension Reduction Aided Hyperspectral Image Classification with a Small-sized Training Dataset: Experimental Comparisons

    Directory of Open Access Journals (Sweden)

    Jinya Su

    2017-11-01

    Full Text Available Hyperspectral images (HSI provide rich information which may not be captured by other sensing technologies and therefore gradually find a wide range of applications. However, they also generate a large amount of irrelevant or redundant data for a specific task. This causes a number of issues including significantly increased computation time, complexity and scale of prediction models mapping the data to semantics (e.g., classification, and the need of a large amount of labelled data for training. Particularly, it is generally difficult and expensive for experts to acquire sufficient training samples in many applications. This paper addresses these issues by exploring a number of classical dimension reduction algorithms in machine learning communities for HSI classification. To reduce the size of training dataset, feature selection (e.g., mutual information, minimal redundancy maximal relevance and feature extraction (e.g., Principal Component Analysis (PCA, Kernel PCA are adopted to augment a baseline classification method, Support Vector Machine (SVM. The proposed algorithms are evaluated using a real HSI dataset. It is shown that PCA yields the most promising performance in reducing the number of features or spectral bands. It is observed that while significantly reducing the computational complexity, the proposed method can achieve better classification results over the classic SVM on a small training dataset, which makes it suitable for real-time applications or when only limited training data are available. Furthermore, it can also achieve performances similar to the classic SVM on large datasets but with much less computing time.

  15. RetroTransformDB: A Dataset of Generic Transforms for Retrosynthetic Analysis

    Directory of Open Access Journals (Sweden)

    Svetlana Avramova

    2018-04-01

    Full Text Available Presently, software tools for retrosynthetic analysis are widely used by organic, medicinal, and computational chemists. Rule-based systems extensively use collections of retro-reactions (transforms. While there are many public datasets with reactions in synthetic direction (usually non-generic reactions, there are no publicly-available databases with generic reactions in computer-readable format which can be used for the purposes of retrosynthetic analysis. Here we present RetroTransformDB—a dataset of transforms, compiled and coded in SMIRKS line notation by us. The collection is comprised of more than 100 records, with each one including the reaction name, SMIRKS linear notation, the functional group to be obtained, and the transform type classification. All SMIRKS transforms were tested syntactically, semantically, and from a chemical point of view in different software platforms. The overall dataset design and the retrosynthetic fitness were analyzed and curated by organic chemistry experts. The RetroTransformDB dataset may be used by open-source and commercial software packages, as well as chemoinformatics tools.

  16. A Unified Framework for Measuring Stewardship Practices Applied to Digital Environmental Datasets

    Directory of Open Access Journals (Sweden)

    Ge Peng

    2015-01-01

    Full Text Available This paper presents a stewardship maturity assessment model in the form of a matrix for digital environmental datasets. Nine key components are identified based on requirements imposed on digital environmental data and information that are cared for and disseminated by U.S. Federal agencies by U.S. law, i.e., Information Quality Act of 2001, agencies’ guidance, expert bodies’ recommendations, and users. These components include: preservability, accessibility, usability, production sustainability, data quality assurance, data quality control/monitoring, data quality assessment, transparency/traceability, and data integrity. A five-level progressive maturity scale is then defined for each component associated with measurable practices applied to individual datasets, representing Ad Hoc, Minimal, Intermediate, Advanced, and Optimal stages. The rationale for each key component and its maturity levels is described. This maturity model, leveraging community best practices and standards, provides a unified framework for assessing scientific data stewardship. It can be used to create a stewardship maturity scoreboard of dataset(s and a roadmap for scientific data stewardship improvement or to provide data quality and usability information to users, stakeholders, and decision makers.

  17. 2001 - 2010 Danish design reference year. Reference climate dataset for technical dimensioning in building, construction and other sectors

    Energy Technology Data Exchange (ETDEWEB)

    Grunnet Wang, P.; Scharling, M.; Pagh Nielsen, K.; Kern-Hansen, C. [Danish Meteorological Institute (DMI), Copenhagen (Denmark); Wittchen, K.B. [Aalborg Univ., Danish Building Research Institute (SBi), Copenhagen (Denmark)

    2013-09-15

    This report presents the Danish Design Reference Year based on observed data from 2001 - 2010. In various sectors - i.e. building and construction, energy, etc. - the climate and weather usually plays a part in a given project. The Danish Design Reference Year dataset is a collection of data series for eleven specific parameters, that each represents a typical year in Denmark. The uses of the dataset may vary from simulations to statistical analysis, graphical overviews etc. The Danish land areas have been sectionalised into five to six climatological zones depending on the parameter, each characterized by distinct diurnal and yearly variations. The dataset consists of observed data from one station located within and representing each zone. In addition to the complete Danish Design Reference Year dataset, a subset specifically selected to be used for energy performance calculations for obtaining a building permit is included. (Author)

  18. FY13 Summary Report on the Augmentation of the Spent Fuel Composition Dataset for Nuclear Forensics: SFCOMPO/NF

    Energy Technology Data Exchange (ETDEWEB)

    Brady Raap, Michaele C.; Lyons, Jennifer A.; Collins, Brian A.; Livingston, James V.

    2014-03-31

    This report documents the FY13 efforts to enhance a dataset of spent nuclear fuel isotopic composition data for use in developing intrinsic signatures for nuclear forensics. A review and collection of data from the open literature was performed in FY10. In FY11, the Spent Fuel COMPOsition (SFCOMPO) excel-based dataset for nuclear forensics (NF), SFCOMPO/NF was established and measured data for graphite production reactors, Boiling Water Reactors (BWRs) and Pressurized Water Reactors (PWRs) were added to the dataset and expanded to include a consistent set of data simulated by calculations. A test was performed to determine whether the SFCOMPO/NF dataset will be useful for the analysis and identification of reactor types from isotopic ratios observed in interdicted samples.

  19. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit -satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extends prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.

  20. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains calculated radiation, which is calculated by using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as

  1. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled - "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset for Exchange Market Pressure values (EMP) for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed in percentage change in exchange rate, measures the change in exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can interpreted as the change in exchange rate associated with $1 billion of intervention. Estimates of conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of ρ 's. Using the standard errors of estimates of ρ 's, we obtain one sigma intervals around mean estimates of EMP values. These values are also reported in the dataset.

  2. Oil palm mapping for Malaysia using PALSAR-2 dataset

    Science.gov (United States)

    Gong, P.; Qi, C. Y.; Yu, L.; Cracknell, A.

    2016-12-01

    Oil palm is one of the most productive vegetable oil crops in the world. The main oil palm producing areas are distributed in humid tropical areas such as Malaysia, Indonesia, Thailand, western and central Africa, northern South America, and central America. Increasing market demands, high yields and low production costs of palm oil are the primary factors driving large-scale commercial cultivation of oil palm, especially in Malaysia and Indonesia. Global demand for palm oil has grown exponentially during the last 50 years, and the expansion of oil palm plantations is linked directly to the deforestation of natural forests. Satellite remote sensing plays an important role in monitoring expansion of oil palm. However, optical remote sensing images are difficult to acquire in the Tropics because of the frequent occurrence of thick cloud cover. This problem has led to the use of data obtained by synthetic aperture radar (SAR), which is a sensor capable of all-day/all-weather observation for studies in the Tropics. In this study, the ALOS-2 (Advanced Land Observing Satellite) PALSAR-2 (Phased Array type L-band SAR) datasets for year 2015 were used as an input to a support vector machine (SVM) based machine learning algorithm. Oil palm/non-oil palm samples were collected using a hexagonal equal-area sampling design. High-resolution images in Google Earth and PALSAR-2 imagery were used in human photo-interpretation to separate oil palm from others (i.e. cropland, forest, grassland, shrubland, water, hard surface and bareland). The characteristics of oil palms from various aspects, including PALSAR-2 backscattering coefficients (HH, HV), terrain and climate by using this sample set were further explored to post-process the SVM output. The average accuracy of oil palm type is better than 80% in the final oil palm map for Malaysia.

  3. Automatic aortic root segmentation in CTA whole-body dataset

    Science.gov (United States)

    Gao, Xinpei; Kitslaar, Pieter H.; Scholte, Arthur J. H. A.; Lelieveldt, Boudewijn P. F.; Dijkstra, Jouke; Reiber, Johan H. C.

    2016-03-01

    Trans-catheter aortic valve replacement (TAVR) is an evolving technique for patients with serious aortic stenosis disease. Typically, in this application a CTA data set is obtained of the patient's arterial system from the subclavian artery to the femoral arteries, to evaluate the quality of the vascular access route and analyze the aortic root to determine if and which prosthesis should be used. In this paper, we concentrate on the automated segmentation of the aortic root. The purpose of this study was to automatically segment the aortic root in computed tomography angiography (CTA) datasets to support TAVR procedures. The method in this study includes 4 major steps. First, the patient's cardiac CTA image was resampled to reduce the computation time. Next, the cardiac CTA image was segmented using an atlas-based approach. The most similar atlas was selected from a total of 8 atlases based on its image similarity to the input CTA image. Third, the aortic root segmentation from the previous step was transferred to the patient's whole-body CTA image by affine registration and refined in the fourth step using a deformable subdivision surface model fitting procedure based on image intensity. The pipeline was applied to 20 patients. The ground truth was created by an analyst who semi-automatically corrected the contours of the automatic method, where necessary. The average Dice similarity index between the segmentations of the automatic method and the ground truth was found to be 0.965±0.024. In conclusion, the current results are very promising.

  4. Publishing datasets with eSciDoc and panMetaDocs

    Science.gov (United States)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

    Currently serveral research institutions worldwide undertake considerable efforts to have their scientific datasets published and to syndicate them to data portals as extensively described objects identified by a persistent identifier. This is done to foster the reuse of data, to make scientific work more transparent, and to create a citable entity that can be referenced unambigously in written publications. GFZ Potsdam established a publishing workflow for file based research datasets. Key software components are an eSciDoc infrastructure [1] and multiple instances of the data curation tool panMetaDocs [2]. The eSciDoc repository holds data objects and their associated metadata in container objects, called eSciDoc items. A key metadata element in this context is the publication status of the referenced data set. PanMetaDocs, which is based on PanMetaWorks [3], is a PHP based web application that allows to describe data with any XML-based metadata schema. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entries to a minimum and make use of contextual information in a project setting. Access rights can be applied to set visibility of datasets to other project members and allow collaboration on and notifying about datasets (RSS) and interaction with the internal messaging system, that was inherited from panMetaWorks. When a dataset is to be published, panMetaDocs allows to change the publication status of the eSciDoc item from status "private" to "submitted" and prepare the dataset for verification by an external reviewer. After quality checks, the item publication status can be changed to "published". This makes the data and metadata available through the internet worldwide. PanMetaDocs is developed as an eSciDoc application. It is an easy to use graphical user interface to eSciDoc items, their data and metadata. It is also an application supporting a DOI publication agent during the process of

  5. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  6. Traffic sign classification with dataset augmentation and convolutional neural network

    Science.gov (United States)

    Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

    2018-04-01

    This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, firstly we transfer a color image into grayscale, and then normalize it in the range (-1,1) as the preprocessing step. To increase robustness of classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connection layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.

  7. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    Energy Technology Data Exchange (ETDEWEB)

    Caroline Draxl: NREL

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles.As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  8. MULTIPLE OBJECTS

    Directory of Open Access Journals (Sweden)

    A. A. Bosov

    2015-04-01

    Full Text Available Purpose. The development of complicated techniques of production and management processes, information systems, computer science, applied objects of systems theory and others requires improvement of mathematical methods, new approaches for researches of application systems. And the variety and diversity of subject systems makes necessary the development of a model that generalizes the classical sets and their development – sets of sets. Multiple objects unlike sets are constructed by multiple structures and represented by the structure and content. The aim of the work is the analysis of multiple structures, generating multiple objects, the further development of operations on these objects in application systems. Methodology. To achieve the objectives of the researches, the structure of multiple objects represents as constructive trio, consisting of media, signatures and axiomatic. Multiple object is determined by the structure and content, as well as represented by hybrid superposition, composed of sets, multi-sets, ordered sets (lists and heterogeneous sets (sequences, corteges. Findings. In this paper we study the properties and characteristics of the components of hybrid multiple objects of complex systems, proposed assessments of their complexity, shown the rules of internal and external operations on objects of implementation. We introduce the relation of arbitrary order over multiple objects, we define the description of functions and display on objects of multiple structures. Originality.In this paper we consider the development of multiple structures, generating multiple objects.Practical value. The transition from the abstract to the subject of multiple structures requires the transformation of the system and multiple objects. Transformation involves three successive stages: specification (binding to the domain, interpretation (multiple sites and particularization (goals. The proposed describe systems approach based on hybrid sets

  9. Reliability of Source Mechanisms for a Hydraulic Fracturing Dataset

    Science.gov (United States)

    Eyre, T.; Van der Baan, M.

    2016-12-01

    Non-double-couple components have been inferred for induced seismicity due to fluid injection, yet these components are often poorly constrained due to the acquisition geometry. Likewise non-double-couple components in microseismic recordings are not uncommon. Microseismic source mechanisms provide an insight into the fracturing behaviour of a hydraulically stimulated reservoir. However, source inversion in a hydraulic fracturing environment is complicated by the likelihood of volumetric contributions to the source due to the presence of high pressure fluids, which greatly increases the possible solution space and therefore the non-uniqueness of the solutions. Microseismic data is usually recorded on either 2D surface or borehole arrays of sensors. In many cases, surface arrays appear to constrain source mechanisms with high shear components, whereas borehole arrays tend to constrain more variable mechanisms including those with high tensile components. The abilities of each geometry to constrain the true source mechanisms are therefore called into question.The ability to distinguish between shear and tensile source mechanisms with different acquisition geometries is investigated using synthetic data. For both inversions, both P- and S- wave amplitudes recorded on three component sensors need to be included to obtain reliable solutions. Surface arrays appear to give more reliable solutions due to a greater sampling of the focal sphere, but in reality tend to record signals with a low signal to noise ratio. Borehole arrays can produce acceptable results, however the reliability is much more affected by relative source-receiver locations and source orientation, with biases produced in many of the solutions. Therefore more care must be taken when interpreting results.These findings are taken into account when interpreting a microseismic dataset of 470 events recorded by two vertical borehole arrays monitoring a horizontal treatment well. Source locations and

  10. Scalable and portable visualization of large atomistic datasets

    Science.gov (United States)

    Sharma, Ashish; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2004-10-01

    A scalable and portable code named Atomsviewer has been developed to interactively visualize a large atomistic dataset consisting of up to a billion atoms. The code uses a hierarchical view frustum-culling algorithm based on the octree data structure to efficiently remove atoms outside of the user's field-of-view. Probabilistic and depth-based occlusion-culling algorithms then select atoms, which have a high probability of being visible. Finally a multiresolution algorithm is used to render the selected subset of visible atoms at varying levels of detail. Atomsviewer is written in C++ and OpenGL, and it has been tested on a number of architectures including Windows, Macintosh, and SGI. Atomsviewer has been used to visualize tens of millions of atoms on a standard desktop computer and, in its parallel version, up to a billion atoms. Program summaryTitle of program: Atomsviewer Catalogue identifier: ADUM Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADUM Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: 2.4 GHz Pentium 4/Xeon processor, professional graphics card; Apple G4 (867 MHz)/G5, professional graphics card Operating systems under which the program has been tested: Windows 2000/XP, Mac OS 10.2/10.3, SGI IRIX 6.5 Programming languages used: C++, C and OpenGL Memory required to execute with typical data: 1 gigabyte of RAM High speed storage required: 60 gigabytes No. of lines in the distributed program including test data, etc.: 550 241 No. of bytes in the distributed program including test data, etc.: 6 258 245 Number of bits in a word: Arbitrary Number of processors used: 1 Has the code been vectorized or parallelized: No Distribution format: tar gzip file Nature of physical problem: Scientific visualization of atomic systems Method of solution: Rendering of atoms using computer graphic techniques, culling algorithms for data

  11. RAE: The Rainforest Automation Energy Dataset for Smart Grid Meter Data Analysis

    Directory of Open Access Journals (Sweden)

    Stephen Makonin

    2018-02-01

    Full Text Available Datasets are important for researchers to build models and test how well their machine learning algorithms perform. This paper presents the Rainforest Automation Energy (RAE dataset to help smart grid researchers test their algorithms that make use of smart meter data. This initial release of RAE contains 1 Hz data (mains and sub-meters from two residential houses. In addition to power data, environmental and sensor data from the house’s thermostat is included. Sub-meter data from one of the houses includes heat pump and rental suite captures, which is of interest to power utilities. We also show an energy breakdown of each house and show (by example how RAE can be used to test non-intrusive load monitoring (NILM algorithms.

  12. Semi-supervised tracking of extreme weather events in global spatio-temporal climate datasets

    Science.gov (United States)

    Kim, S. K.; Prabhat, M.; Williams, D. N.

    2017-12-01

    Deep neural networks have been successfully applied to solve problem to detect extreme weather events in large scale climate datasets and attend superior performance that overshadows all previous hand-crafted methods. Recent work has shown that multichannel spatiotemporal encoder-decoder CNN architecture is able to localize events in semi-supervised bounding box. Motivated by this work, we propose new learning metric based on Variational Auto-Encoders (VAE) and Long-Short-Term-Memory (LSTM) to track extreme weather events in spatio-temporal dataset. We consider spatio-temporal object tracking problems as learning probabilistic distribution of continuous latent features of auto-encoder using stochastic variational inference. For this, we assume that our datasets are i.i.d and latent features is able to be modeled by Gaussian distribution. In proposed metric, we first train VAE to generate approximate posterior given multichannel climate input with an extreme climate event at fixed time. Then, we predict bounding box, location and class of extreme climate events using convolutional layers given input concatenating three features including embedding, sampled mean and standard deviation. Lastly, we train LSTM with concatenated input to learn timely information of dataset by recurrently feeding output back to next time-step's input of VAE. Our contribution is two-fold. First, we show the first semi-supervised end-to-end architecture based on VAE to track extreme weather events which can apply to massive scaled unlabeled climate datasets. Second, the information of timely movement of events is considered for bounding box prediction using LSTM which can improve accuracy of localization. To our knowledge, this technique has not been explored neither in climate community or in Machine Learning community.

  13. Segmentation of teeth in CT volumetric dataset by panoramic projection and variational level set

    International Nuclear Information System (INIS)

    Hosntalab, Mohammad; Aghaeizadeh Zoroofi, Reza; Abbaspour Tehrani-Fard, Ali; Shirani, Gholamreza

    2008-01-01

    Quantification of teeth is of clinical importance for various computer assisted procedures such as dental implant, orthodontic planning, face, jaw and cosmetic surgeries. In this regard, segmentation is a major step. In this paper, we propose a method for segmentation of teeth in volumetric computed tomography (CT) data using panoramic re-sampling of the dataset in the coronal view and variational level set. The proposed method consists of five steps as follows: first, we extract a mask in a CT images using Otsu thresholding. Second, the teeth are segmented from other bony tissues by utilizing anatomical knowledge of teeth in the jaws. Third, the proposed method is followed by estimating the arc of the upper and lower jaws and panoramic re-sampling of the dataset. Separation of upper and lower jaws and initial segmentation of teeth are performed by employing the horizontal and vertical projections of the panoramic dataset, respectively. Based the above mentioned procedures an initial mask for each tooth is obtained. Finally, we utilize the initial mask of teeth and apply a Variational level set to refine initial teeth boundaries to final contours. The proposed algorithm was evaluated in the presence of 30 multi-slice CT datasets including 3,600 images. Experimental results reveal the effectiveness of the proposed method. In the proposed algorithm, the variational level set technique was utilized to trace the contour of the teeth. In view of the fact that, this technique is based on the characteristic of the overall region of the teeth image, it is possible to extract a very smooth and accurate tooth contour using this technique. In the presence of the available datasets, the proposed technique was successful in teeth segmentation compared to previous techniques. (orig.)

  14. Segmentation of teeth in CT volumetric dataset by panoramic projection and variational level set

    Energy Technology Data Exchange (ETDEWEB)

    Hosntalab, Mohammad [Islamic Azad University, Faculty of Engineering, Science and Research Branch, Tehran (Iran); Aghaeizadeh Zoroofi, Reza [University of Tehran, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, Tehran (Iran); Abbaspour Tehrani-Fard, Ali [Islamic Azad University, Faculty of Engineering, Science and Research Branch, Tehran (Iran); Sharif University of Technology, Department of Electrical Engineering, Tehran (Iran); Shirani, Gholamreza [Faculty of Dentistry Medical Science of Tehran University, Oral and Maxillofacial Surgery Department, Tehran (Iran)

    2008-09-15

    Quantification of teeth is of clinical importance for various computer assisted procedures such as dental implant, orthodontic planning, face, jaw and cosmetic surgeries. In this regard, segmentation is a major step. In this paper, we propose a method for segmentation of teeth in volumetric computed tomography (CT) data using panoramic re-sampling of the dataset in the coronal view and variational level set. The proposed method consists of five steps as follows: first, we extract a mask in a CT images using Otsu thresholding. Second, the teeth are segmented from other bony tissues by utilizing anatomical knowledge of teeth in the jaws. Third, the proposed method is followed by estimating the arc of the upper and lower jaws and panoramic re-sampling of the dataset. Separation of upper and lower jaws and initial segmentation of teeth are performed by employing the horizontal and vertical projections of the panoramic dataset, respectively. Based the above mentioned procedures an initial mask for each tooth is obtained. Finally, we utilize the initial mask of teeth and apply a Variational level set to refine initial teeth boundaries to final contours. The proposed algorithm was evaluated in the presence of 30 multi-slice CT datasets including 3,600 images. Experimental results reveal the effectiveness of the proposed method. In the proposed algorithm, the variational level set technique was utilized to trace the contour of the teeth. In view of the fact that, this technique is based on the characteristic of the overall region of the teeth image, it is possible to extract a very smooth and accurate tooth contour using this technique. In the presence of the available datasets, the proposed technique was successful in teeth segmentation compared to previous techniques. (orig.)

  15. Uncertainty Assessment of the NASA Earth Exchange Global Daily Downscaled Climate Projections (NEX-GDDP) Dataset

    Science.gov (United States)

    Wang, Weile; Nemani, Ramakrishna R.; Michaelis, Andrew; Hashimoto, Hirofumi; Dungan, Jennifer L.; Thrasher, Bridget L.; Dixon, Keith W.

    2016-01-01

    The NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP) dataset is comprised of downscaled climate projections that are derived from 21 General Circulation Model (GCM) runs conducted under the Coupled Model Intercomparison Project Phase 5 (CMIP5) and across two of the four greenhouse gas emissions scenarios (RCP4.5 and RCP8.5). Each of the climate projections includes daily maximum temperature, minimum temperature, and precipitation for the periods from 1950 through 2100 and the spatial resolution is 0.25 degrees (approximately 25 km x 25 km). The GDDP dataset has received warm welcome from the science community in conducting studies of climate change impacts at local to regional scales, but a comprehensive evaluation of its uncertainties is still missing. In this study, we apply the Perfect Model Experiment framework (Dixon et al. 2016) to quantify the key sources of uncertainties from the observational baseline dataset, the downscaling algorithm, and some intrinsic assumptions (e.g., the stationary assumption) inherent to the statistical downscaling techniques. We developed a set of metrics to evaluate downscaling errors resulted from bias-correction ("quantile-mapping"), spatial disaggregation, as well as the temporal-spatial non-stationarity of climate variability. Our results highlight the spatial disaggregation (or interpolation) errors, which dominate the overall uncertainties of the GDDP dataset, especially over heterogeneous and complex terrains (e.g., mountains and coastal area). In comparison, the temporal errors in the GDDP dataset tend to be more constrained. Our results also indicate that the downscaled daily precipitation also has relatively larger uncertainties than the temperature fields, reflecting the rather stochastic nature of precipitation in space. Therefore, our results provide insights in improving statistical downscaling algorithms and products in the future.

  16. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Torres, E., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [CEADEN, Havana 11300, Cuba and INFN, Sezione di Torino, Torino 10125 (Italy); Fiorina, E.; Pennazio, F.; Peroni, C. [Department of Physics, University of Torino, Torino 10125, Italy and INFN, Sezione di Torino, Torino 10125 (Italy); Saletta, M.; Cerello, P., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [INFN, Sezione di Torino, Torino 10125 (Italy); Camarlinghi, N.; Fantacci, M. E. [Department of Physics, University of Pisa, Pisa 56127, Italy and INFN, Sezione di Pisa, Pisa 56127 (Italy)

    2015-04-15

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based of a small number of features (13), so as to minimize the risk of lacking generalization, which could be possible given the large difference between the size of the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in literature. Results: The lungCAM and M5L performance is consistent across the databases, with a sensitivity of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacities (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGOs detection could further improve it, as well as an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  17. Macroeconomic dataset for generating macroeconomic volatility among selected countries in the Asia Pacific region

    OpenAIRE

    Chow, Yee Peng; Muhammad, Junaina; Amin Noordin, Bany Ariffin; Cheng, Fan Fah

    2017-01-01

    This data article provides macroeconomic data that can be used to generate macroeconomic volatility. The data cover a sample of seven selected countries in the Asia Pacific region for the period 2004–2014, including both developing and developed countries. This dataset was generated to enhance our understanding of the sources of macroeconomic volatility affecting the countries in this region. Although the Asia Pacific region continues to remain as the most dynamic part of the world's economy,...

  18. Two Types of Social Grooming discovered in Primitive and Modern Communication Data-Sets

    OpenAIRE

    Takano, Masanori

    2017-01-01

    Social networking sites (SNS) provide innovative social bonding methods known as social grooming. These have drastically decreased time and distance constraints of social grooming. Here we show two type social grooming (elaborate social grooming and lightweight social grooming) discovered in a model constructed by thirty communication data-sets including face to face, SNS, mobile phones, and Chacma baboons. This demarcation is caused by a trade-off between the number and strength of social re...

  19. Dataset of Phenology of Mediterranean high-mountain meadows flora (Sierra Nevada, Spain)

    Science.gov (United States)

    Pérez-Luque, Antonio Jesús; Sánchez-Rojas, Cristina Patricia; Zamora, Regino; Pérez-Pérez, Ramón; Bonet, Francisco Javier

    2015-01-01

    Abstract Sierra Nevada mountain range (southern Spain) hosts a high number of endemic plant species, being one of the most important biodiversity hotspots in the Mediterranean basin. The high-mountain meadow ecosystems (borreguiles) harbour a large number of endemic and threatened plant species. In this data paper, we describe a dataset of the flora inhabiting this threatened ecosystem in this Mediterranean mountain. The dataset includes occurrence data for flora collected in those ecosystems in two periods: 1988–1990 and 2009–2013. A total of 11002 records of occurrences belonging to 19 orders, 28 families 52 genera were collected. 73 taxa were recorded with 29 threatened taxa. We also included data of cover-abundance and phenology attributes for the records. The dataset is included in the Sierra Nevada Global-Change Observatory (OBSNEV), a long-term research project designed to compile socio-ecological information on the major ecosystem types in order to identify the impacts of global change in this area. PMID:25878552

  20. Dataset of Phenology of Mediterranean high-mountain meadows flora (Sierra Nevada, Spain).

    Science.gov (United States)

    Pérez-Luque, Antonio Jesús; Sánchez-Rojas, Cristina Patricia; Zamora, Regino; Pérez-Pérez, Ramón; Bonet, Francisco Javier

    2015-01-01

    Sierra Nevada mountain range (southern Spain) hosts a high number of endemic plant species, being one of the most important biodiversity hotspots in the Mediterranean basin. The high-mountain meadow ecosystems (borreguiles) harbour a large number of endemic and threatened plant species. In this data paper, we describe a dataset of the flora inhabiting this threatened ecosystem in this Mediterranean mountain. The dataset includes occurrence data for flora collected in those ecosystems in two periods: 1988-1990 and 2009-2013. A total of 11002 records of occurrences belonging to 19 orders, 28 families 52 genera were collected. 73 taxa were recorded with 29 threatened taxa. We also included data of cover-abundance and phenology attributes for the records. The dataset is included in the Sierra Nevada Global-Change Observatory (OBSNEV), a long-term research project designed to compile socio-ecological information on the major ecosystem types in order to identify the impacts of global change in this area.

  1. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Science.gov (United States)

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  2. Dataset-driven research for improving recommender systems for learning

    NARCIS (Netherlands)

    Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

    2011-01-01

    Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,

  3. dataTEL - Datasets for Technology Enhanced Learning

    NARCIS (Netherlands)

    Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

    2011-01-01

    Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)

  4. A dataset of forest biomass structure for Eurasia.

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-16

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia have been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below ground); green forest floor (above- and below ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots), consisting of 10,351 unique records of sample plots and 9,613 sample trees from ca 1,200 experiments for the period 1930-2014 where there is overlap between these two datasets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass extension factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  5. Comparision of analysis of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Crooks, Lucy; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    As part of the QTLMAS XII workshop, a simulated dataset was distributed and participants were invited to submit analyses of the data based on genome-wide association, fine mapping and genomic selection. We have evaluated the findings from the groups that reported fine mapping and genome-wide asso...

  6. The LAMBADA dataset: Word prediction requiring a broad discourse context

    NARCIS (Netherlands)

    Paperno, D.; Kruszewski, G.; Lazaridou, A.; Pham, Q.N.; Bernardi, R.; Pezzelle, S.; Baroni, M.; Boleda, G.; Fernández, R.; Erk, K.; Smith, N.A.

    2016-01-01

    We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the

  7. NEW WEB-BASED ACCESS TO NUCLEAR STRUCTURE DATASETS.

    Energy Technology Data Exchange (ETDEWEB)

    WINCHELL,D.F.

    2004-09-26

    As part of an effort to migrate the National Nuclear Data Center (NNDC) databases to a relational platform, a new web interface has been developed for the dissemination of the nuclear structure datasets stored in the Evaluated Nuclear Structure Data File and Experimental Unevaluated Nuclear Data List.

  8. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  9. Dataset: Multi Sensor-Orientation Movement Data of Goats

    NARCIS (Netherlands)

    Kamminga, Jacob Wilhelm

    2018-01-01

    This is a labeled dataset. Motion data were collected from six sensor nodes that were fixed with different orientations to a collar around the neck of goats. These six sensor nodes simultaneously, with different orientations, recorded various activities performed by the goat. We recorded the

  10. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  11. UK surveillance: provision of quality assured information from combined datasets.

    Science.gov (United States)

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  12. participatory development of a minimum dataset for the khayelitsha ...

    African Journals Online (AJOL)

    This dataset was integrated with data requirements at ... model for defining health information needs at district level. This participatory process has enabled health workers to appraise their .... of reproductive health, mental health, disability and community ... each chose a facilitator and met in between the forum meetings.

  13. Comparision of analysis of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals...

  14. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  15. Relative Error Evaluation to Typical Open Global dem Datasets in Shanxi Plateau of China

    Science.gov (United States)

    Zhao, S.; Zhang, S.; Cheng, W.

    2018-04-01

    Produced by radar data or stereo remote sensing image pairs, global DEM datasets are one of the most important types for DEM data. Relative error relates to surface quality created by DEM data, so it relates to geomorphology and hydrologic applications using DEM data. Taking Shanxi Plateau of China as the study area, this research evaluated the relative error to typical open global DEM datasets including Shuttle Radar Terrain Mission (SRTM) data with 1 arc second resolution (SRTM1), SRTM data with 3 arc second resolution (SRTM3), ASTER global DEM data in the second version (GDEM-v2) and ALOS world 3D-30m (AW3D) data. Through process and selection, more than 300,000 ICESat/GLA14 points were used as the GCP data, and the vertical error was computed and compared among four typical global DEM datasets. Then, more than 2,600,000 ICESat/GLA14 point pairs were acquired using the distance threshold between 100 m and 500 m. Meanwhile, the horizontal distance between every point pair was computed, so the relative error was achieved using slope values based on vertical error difference and the horizontal distance of the point pairs. Finally, false slope ratio (FSR) index was computed through analyzing the difference between DEM and ICESat/GLA14 values for every point pair. Both relative error and FSR index were categorically compared for the four DEM datasets under different slope classes. Research results show: Overall, AW3D has the lowest relative error values in mean error, mean absolute error, root mean square error and standard deviation error; then the SRTM1 data, its values are a little higher than AW3D data; the SRTM3 and GDEM-v2 data have the highest relative error values, and the values for the two datasets are similar. Considering different slope conditions, all the four DEM data have better performance in flat areas but worse performance in sloping regions; AW3D has the best performance in all the slope classes, a litter better than SRTM1; with slope increasing

  16. Estimated Perennial Streams of Idaho and Related Geospatial Datasets

    Science.gov (United States)

    Rea, Alan; Skinner, Kenneth D.

    2009-01-01

    The perennial or intermittent status of a stream has bearing on many regulatory requirements. Because of changing technologies over time, cartographic representation of perennial/intermittent status of streams on U.S. Geological Survey (USGS) topographic maps is not always accurate and (or) consistent from one map sheet to another. Idaho Administrative Code defines an intermittent stream as one having a 7-day, 2-year low flow (7Q2) less than 0.1 cubic feet per second. To establish consistency with the Idaho Administrative Code, the USGS developed regional regression equations for Idaho streams for several low-flow statistics, including 7Q2. Using these regression equations, the 7Q2 streamflow may be estimated for naturally flowing streams anywhere in Idaho to help determine perennial/intermittent status of streams. Using these equations in conjunction with a Geographic Information System (GIS) technique known as weighted flow accumulation allows for an automated and continuous estimation of 7Q2 streamflow at all points along a stream, which in turn can be used to determine if a stream is intermittent or perennial according to the Idaho Administrative Code operational definition. The selected regression equations were applied to create continuous grids of 7Q2 estimates for the eight low-flow regression regions of Idaho. By applying the 0.1 ft3/s criterion, the perennial streams have been estimated in each low-flow region. Uncertainty in the estimates is shown by identifying a 'transitional' zone, corresponding to flow estimates of 0.1 ft3/s plus and minus one standard error. Considerable additional uncertainty exists in the model of perennial streams presented in this report. The regression models provide overall estimates based on general trends within each regression region. These models do not include local factors such as a large spring or a losing reach that may greatly affect flows at any given point. Site-specific flow data, assuming a sufficient period of

  17. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    Science.gov (United States)

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, although the results may contain the same query. They may also return highly specific results that depend more on how well metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suits users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting contents based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce

  18. Developing a minimum dataset for nursing team leader handover in the intensive care unit: A focus group study.

    Science.gov (United States)

    Spooner, Amy J; Aitken, Leanne M; Corley, Amanda; Chaboyer, Wendy

    2018-01-01

    Despite increasing demand for structured processes to guide clinical handover, nursing handover tools are limited in the intensive care unit. The study aim was to identify key items to include in a minimum dataset for intensive care nursing team leader shift-to-shift handover. This focus group study was conducted in a 21-bed medical/surgical intensive care unit in Australia. Senior registered nurses involved in team leader handovers were recruited. Focus groups were conducted using a nominal group technique to generate and prioritise minimum dataset items. Nurses were presented with content from previous team leader handovers and asked to select which content items to include in a minimum dataset. Participant responses were summarised as frequencies and percentages. Seventeen senior nurses participated in three focus groups. Participants agreed that ISBAR (Identify-Situation-Background-Assessment-Recommendations) was a useful tool to guide clinical handover. Items recommended to be included in the minimum dataset (≥65% agreement) included Identify (name, age, days in intensive care), Situation (diagnosis, surgical procedure), Background (significant event(s), management of significant event(s)) and Recommendations (patient plan for next shift, tasks to follow up for next shift). Overall, 30 of the 67 (45%) items in the Assessment category were considered important to include in the minimum dataset and focused on relevant observations and treatment within each body system. Other non-ISBAR items considered important to include related to the ICU (admissions to ICU, staffing/skill mix, theatre cases) and patients (infectious status, site of infection, end of life plan). Items were further categorised into those to include in all handovers and those to discuss only when relevant to the patient. The findings suggest a minimum dataset for intensive care nursing team leader shift-to-shift handover should contain items within ISBAR along with unit and patient specific

  19. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

    Full Text Available Aviation emissions are unique from other transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system while emissions of nitrogen oxides (NOx, sulfur oxides, carbon monoxide (CO, and hydrocarbons (HC impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four dimensional (space and time aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006 and aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are however some distinct differences in the altitude distribution

  20. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO, Sequence Read Archive (SRA hosted by the NCBI, or the DNA Data Bank of Japan (ddbj. Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a identify differential isoform expression in mRNA-seq datasets, b identify miRNAs (microRNAs in libraries, and identify mature and star sequences in miRNAS and c to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.

  1. Astronaut Photography of the Earth: A Long-Term Dataset for Earth Systems Research, Applications, and Education

    Science.gov (United States)

    Stefanov, William L.

    2017-01-01

    The NASA Earth observations dataset obtained by humans in orbit using handheld film and digital cameras is freely accessible to the global community through the online searchable database at https://eol.jsc.nasa.gov, and offers a useful compliment to traditional ground-commanded sensor data. The dataset includes imagery from the NASA Mercury (1961) through present-day International Space Station (ISS) programs, and currently totals over 2.6 million individual frames. Geographic coverage of the dataset includes land and oceans areas between approximately 52 degrees North and South latitudes, but is spatially and temporally discontinuous. The photographic dataset includes some significant impediments for immediate research, applied, and educational use: commercial RGB films and camera systems with overlapping bandpasses; use of different focal length lenses, unconstrained look angles, and variable spacecraft altitudes; and no native geolocation information. Such factors led to this dataset being underutilized by the community but recent advances in automated and semi-automated image geolocation, image feature classification, and web-based services are adding new value to the astronaut-acquired imagery. A coupled ground software and on-orbit hardware system for the ISS is in development for planned deployment in mid-2017; this system will capture camera pose information for each astronaut photograph to allow automated, full georegistration of the data. The ground system component of the system is currently in use to fully georeference imagery collected in response to International Disaster Charter activations, and the auto-registration procedures are being applied to the extensive historical database of imagery to add value for research and educational purposes. In parallel, machine learning techniques are being applied to automate feature identification and classification throughout the dataset, in order to build descriptive metadata that will improve search

  2. Quality-control of an hourly rainfall dataset and climatology of extremes for the UK.

    Science.gov (United States)

    Blenkinsop, Stephen; Lewis, Elizabeth; Chan, Steven C; Fowler, Hayley J

    2017-02-01

    Sub-daily rainfall extremes may be associated with flash flooding, particularly in urban areas but, compared with extremes on daily timescales, have been relatively little studied in many regions. This paper describes a new, hourly rainfall dataset for the UK based on ∼1600 rain gauges from three different data sources. This includes tipping bucket rain gauge data from the UK Environment Agency (EA), which has been collected for operational purposes, principally flood forecasting. Significant problems in the use of such data for the analysis of extreme events include the recording of accumulated totals, high frequency bucket tips, rain gauge recording errors and the non-operation of gauges. Given the prospect of an intensification of short-duration rainfall in a warming climate, the identification of such errors is essential if sub-daily datasets are to be used to better understand extreme events. We therefore first describe a series of procedures developed to quality control this new dataset. We then analyse ∼380 gauges with near-complete hourly records for 1992-2011 and map the seasonal climatology of intense rainfall based on UK hourly extremes using annual maxima, n-largest events and fixed threshold approaches. We find that the highest frequencies and intensities of hourly extreme rainfall occur during summer when the usual orographically defined pattern of extreme rainfall is replaced by a weaker, north-south pattern. A strong diurnal cycle in hourly extremes, peaking in late afternoon to early evening, is also identified in summer and, for some areas, in spring. This likely reflects the different mechanisms that generate sub-daily rainfall, with convection dominating during summer. The resulting quality-controlled hourly rainfall dataset will provide considerable value in several contexts, including the development of standard, globally applicable quality-control procedures for sub-daily data, the validation of the new generation of very high

  3. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose: ‘how capable are such stability interpretations in drawing conclusions'. There are at least three possible errors sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale, and (iii) that the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional scale stability variations will be quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined if the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar

  4. Multiple sclerosis

    International Nuclear Information System (INIS)

    Grunwald, I.Q.; Kuehn, A.L.; Backens, M.; Papanagiotou, P.; Shariat, K.; Kostopoulos, P.

    2008-01-01

    Multiple sclerosis is the most common chronic inflammatory disease of myelin with interspersed lesions in the white matter of the central nervous system. Magnetic resonance imaging (MRI) plays a key role in the diagnosis and monitoring of white matter diseases. This article focuses on key findings in multiple sclerosis as detected by MRI. (orig.) [de

  5. Development of Gridded Ensemble Precipitation and Temperature Datasets for the Contiguous United States Plus Hawai'i and Alaska

    Science.gov (United States)

    Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.

    2016-12-01

    Gridded precipitation and temperature products are inherently uncertain due to myriad factors. These include interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this inherent uncertainty, uncertainty is typically not included, or is a specific addition to each dataset without much general applicability across different datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first of its kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic- monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high resolution regional climate model simulations. We will also present results on the new high resolution products for Alaska and Hawaii (2 km and 250 m respectively), to complete the first ensemble observation based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.

  6. Does standardised structured reporting contribute to quality in diagnostic pathology? The importance of evidence-based datasets.

    Science.gov (United States)

    Ellis, D W; Srigley, J

    2016-01-01

    Key quality parameters in diagnostic pathology include timeliness, accuracy, completeness, conformance with current agreed standards, consistency and clarity in communication. In this review, we argue that with worldwide developments in eHealth and big data, generally, there are two further, often overlooked, parameters if our reports are to be fit for purpose. Firstly, population-level studies have clearly demonstrated the value of providing timely structured reporting data in standardised electronic format as part of system-wide quality improvement programmes. Moreover, when combined with multiple health data sources through eHealth and data linkage, structured pathology reports become central to population-level quality monitoring, benchmarking, interventions and benefit analyses in public health management. Secondly, population-level studies, particularly for benchmarking, require a single agreed international and evidence-based standard to ensure interoperability and comparability. This has been taken for granted in tumour classification and staging for many years, yet international standardisation of cancer datasets is only now underway through the International Collaboration on Cancer Reporting (ICCR). In this review, we present evidence supporting the role of structured pathology reporting in quality improvement for both clinical care and population-level health management. Although this review of available evidence largely relates to structured reporting of cancer, it is clear that the same principles can be applied throughout anatomical pathology generally, as they are elsewhere in the health system.

  7. Distributed solar photovoltaic array location and extent dataset for remote sensing object identification

    Science.gov (United States)

    Bradbury, Kyle; Saboo, Raghav; L. Johnson, Timothy; Malof, Jordan M.; Devarajan, Arjun; Zhang, Wuming; M. Collins, Leslie; G. Newell, Richard

    2016-12-01

    Earth-observing remote sensing data, including aerial photography and satellite imagery, offer a snapshot of the world from which we can learn about the state of natural resources and the built environment. The components of energy systems that are visible from above can be automatically assessed with these remote sensing data when processed with machine learning methods. Here, we focus on the information gap in distributed solar photovoltaic (PV) arrays, of which there is limited public data on solar PV deployments at small geographic scales. We created a dataset of solar PV arrays to initiate and develop the process of automatically identifying solar PV locations using remote sensing imagery. This dataset contains the geospatial coordinates and border vertices for over 19,000 solar panels across 601 high-resolution images from four cities in California. Dataset applications include training object detection and other machine learning algorithms that use remote sensing imagery, developing specific algorithms for predictive detection of distributed PV systems, estimating installed PV capacity, and analysis of the socioeconomic correlates of PV deployment.

  8. Integrated dataset of anatomical, morphological, and architectural traits for plant species in Madagascar

    Directory of Open Access Journals (Sweden)

    Amira Azizan

    2017-12-01

    Full Text Available In this work, we present a dataset, which provides information on the structural diversity of some endemic tropical species in Madagascar. The data were from CIRAD xylotheque (since 1937, and were also collected during various fieldworks (since 1964. The field notes and photographs were provided by French botanists; particularly by Francis Hallé. The dataset covers 250 plant species with anatomical, morphological, and architectural traits indexed from digitized wood slides and fieldwork documents. The digitized wood slides were constituted by the transverse, tangential, and radial sections with three optical magnifications. The main specific anatomical traits can be found within the digitized area. Information on morphological and architectural traits were indexed from digitized field drawings including notes and photographs. The data are hosted in the website ArchiWood (http://archiwood.cirad.fr. Keywords: Morpho-architectural traits, Plant architecture, Wood anatomy, Madagascar

  9. Soil moisture datasets at five sites in the central Sierra Nevada and northern Coast Ranges, California

    Science.gov (United States)

    Stern, Michelle A.; Anderson, Frank A.; Flint, Lorraine E.; Flint, Alan L.

    2018-05-03

    In situ soil moisture datasets are important inputs used to calibrate and validate watershed, regional, or statewide modeled and satellite-based soil moisture estimates. The soil moisture dataset presented in this report includes hourly time series of the following: soil temperature, volumetric water content, water potential, and total soil water content. Data were collected by the U.S. Geological Survey at five locations in California: three sites in the central Sierra Nevada and two sites in the northern Coast Ranges. This report provides a description of each of the study areas, procedures and equipment used, processing steps, and time series data from each site in the form of comma-separated values (.csv) tables.

  10. Event recognition in personal photo collections via multiple instance learning-based classification of multiple images

    Science.gov (United States)

    Ahmad, Kashif; Conci, Nicola; Boato, Giulia; De Natale, Francesco G. B.

    2017-11-01

    Over the last few years, a rapid growth has been witnessed in the number of digital photos produced per year. This rapid process poses challenges in the organization and management of multimedia collections, and one viable solution consists of arranging the media on the basis of the underlying events. However, album-level annotation and the presence of irrelevant pictures in photo collections make event-based organization of personal photo albums a more challenging task. To tackle these challenges, in contrast to conventional approaches relying on supervised learning, we propose a pipeline for event recognition in personal photo collections relying on a multiple instance-learning (MIL) strategy. MIL is a modified form of supervised learning and fits well for such applications with weakly labeled data. The experimental evaluation of the proposed approach is carried out on two large-scale datasets including a self-collected and a benchmark dataset. On both, our approach significantly outperforms the existing state-of-the-art.

  11. The MetabolomeExpress Project: enabling web-based processing, analysis and transparent dissemination of GC/MS metabolomics datasets

    Directory of Open Access Journals (Sweden)

    Carroll Adam J

    2010-07-01

    Full Text Available Abstract Background Standardization of analytical approaches and reporting methods via community-wide collaboration can work synergistically with web-tool development to result in rapid community-driven expansion of online data repositories suitable for data mining and meta-analysis. In metabolomics, the inter-laboratory reproducibility of gas-chromatography/mass-spectrometry (GC/MS makes it an obvious target for such development. While a number of web-tools offer access to datasets and/or tools for raw data processing and statistical analysis, none of these systems are currently set up to act as a public repository by easily accepting, processing and presenting publicly submitted GC/MS metabolomics datasets for public re-analysis. Description Here, we present MetabolomeExpress, a new File Transfer Protocol (FTP server and web-tool for the online storage, processing, visualisation and statistical re-analysis of publicly submitted GC/MS metabolomics datasets. Users may search a quality-controlled database of metabolite response statistics from publicly submitted datasets by a number of parameters (eg. metabolite, species, organ/biofluid etc.. Users may also perform meta-analysis comparisons of multiple independent experiments or re-analyse public primary datasets via user-friendly tools for t-test, principal components analysis, hierarchical cluster analysis and correlation analysis. They may interact with chromatograms, mass spectra and peak detection results via an integrated raw data viewer. Researchers who register for a free account may upload (via FTP their own data to the server for online processing via a novel raw data processing pipeline. Conclusions MetabolomeExpress https://www.metabolome-express.org provides a new opportunity for the general metabolomics community to transparently present online the raw and processed GC/MS data underlying their metabolomics publications. Transparent sharing of these data will allow researchers to

  12. K, L, and M shell datasets for PIXE spectrum fitting and analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, David D., E-mail: dcz@ansto.gov.au; Crawford, Jagoda; Siegele, Rainer

    2015-11-15

    Highlights: • Differences between several datasets commonly used by PIXE codes for spectrum fitting and concentration estimates have been highlighted. • A preferred option dataset was selected which includes ionisation cross sections, fluorescence yield, Coster–Kronig probabilities and X-ray line emission rates for K, L and M subshells. • For PIXE codes differences of several tens of percent can be seen for selected elements for L and M lines depending on the data sets selected. - Abstract: Routine PIXE analysis programs, like GUPIX, GEOPIXE and PIXAN generally perform at least two key functions firstly, the fitting of K, L and M characteristic lines X-ray lines to a background, including unfolding of overlapping lines and secondly, the use of a fitted primary Kα, Lα or Mα line area to determine the elemental concentration in a given matrix. To achieve these two results to better than 3–5% the data sets for fluorescence yields, emission rates, Coster–Kronig transitions and ionisation cross sections should be determined to better than 3%. There are many different theoretical and experimental K, L and M datasets for these parameters. How they are applied and used in analysis programs can vary the results obtained for both fitting and concentration determinations. Here we discuss several commonly used datasets for fluorescence yields, emission rates, Coster–Kronig transitions and ionisation cross sections for K, L and M subshells and suggests an optimum set to obtain consistent results for PIXE analyses across a range of elements with atomic numbers from 5 ⩽ Z ⩽ 100.

  13. Using large hydrological datasets to create a robust, physically based, spatially distributed model for Great Britain

    Science.gov (United States)

    Lewis, Elizabeth; Kilsby, Chris; Fowler, Hayley

    2014-05-01

    The impact of climate change on hydrological systems requires further quantification in order to inform water management. This study intends to conduct such analysis using hydrological models. Such models are of varying forms, of which conceptual, lumped parameter models and physically-based models are two important types. The majority of hydrological studies use conceptual models calibrated against measured river flow time series in order to represent catchment behaviour. This method often shows impressive results for specific problems in gauged catchments. However, the results may not be robust under non-stationary conditions such as climate change, as physical processes and relationships amenable to change are not accounted for explicitly. Moreover, conceptual models are less readily applicable to ungauged catchments, in which hydrological predictions are also required. As such, the physically based, spatially distributed model SHETRAN is used in this study to develop a robust and reliable framework for modelling historic and future behaviour of gauged and ungauged catchments across the whole of Great Britain. In order to achieve this, a large array of data completely covering Great Britain for the period 1960-2006 has been collated and efficiently stored ready for model input. The data processed include a DEM, rainfall, PE and maps of geology, soil and land cover. A desire to make the modelling system easy for others to work with led to the development of a user-friendly graphical interface. This allows non-experts to set up and run a catchment model in a few seconds, a process that can normally take weeks or months. The quality and reliability of the extensive dataset for modelling hydrological processes has also been evaluated. One aspect of this has been an assessment of error and uncertainty in rainfall input data, as well as the effects of temporal resolution in precipitation inputs on model calibration. SHETRAN has been updated to accept gridded rainfall

  14. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanism of playing chess. For professional chess players (e.g., chess grand masters and masters or GM/Ms), what are the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, are largely veiled. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms), and 29 age matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  15. Knowledge discovery with classification rules in a cardiovascular dataset.

    Science.gov (United States)

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rules induction called AREX using evolutionary induction of decision trees and automatic programming is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of a possible new medical knowledge in the field of pediatric cardiology.

  16. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    Full Text Available This paper addressed the development of an augmented reality (AR based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakages sensors datasets. Sensors generates significant amount of multivariate datasets during normal and leak situations which made data exploration and visualisation daunting tasks. Therefore a model to manage such data and enhance computational support needed for effective explorations are developed in this paper. A challenge of this approach is to reduce the data inefficiency. This paper presented a model for computing information gain for each data attributes and determine a lead attribute.The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particularly selected region of interest (ROI on the network. Necessary architectural system supports and the interface requirements for such visualizations are also presented.

  17. An integrated dataset for in silico drug discovery

    Directory of Open Access Journals (Sweden)

    Cockell Simon J

    2010-12-01

    Full Text Available Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition for treating a second disease, than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications of existing compounds.

  18. A review of continent scale hydrological datasets available for Africa

    OpenAIRE

    Bonsor, H.C.

    2010-01-01

    As rainfall becomes less reliable with predicted climate change the ability to assess the spatial and seasonal variations in groundwater availability on a large-scale (catchment and continent) is becoming increasingly important (Bates, et al. 2007; MacDonald et al. 2009). The scarcity of observed hydrological data, or difficulty in obtaining such data, within Africa means remotely sensed (RS) datasets must often be used to drive large-scale hydrological models. The different ap...

  19. Dataset of mitochondrial genome variants in oncocytic tumors

    Directory of Open Access Journals (Sweden)

    Lihua Lyu

    2018-04-01

    Full Text Available This dataset presents the mitochondrial genome variants associated with oncocytic tumors. These data were obtained by Sanger sequencing of the whole mitochondrial genomes of oncocytic tumors and the adjacent normal tissues from 32 patients. The mtDNA variants are identified after compared with the revised Cambridge sequence, excluding those defining haplogroups of our patients. The pathogenic prediction for the novel missense variants found in this study was performed with the Mitimpact 2 program.

  20. GLEAM version 3: Global Land Evaporation Datasets and Model

    Science.gov (United States)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical process to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have risen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to model evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to host these data to the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes of the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ