WorldWideScience

Sample records for systematics limited dataset

  1. GO Trimming: Systematically reducing redundancy in large Gene Ontology datasets

    Directory of Open Access Journals (Sweden)

    Koop Ben F

    2011-07-01

    Full Text Available Abstract Background The increased accessibility of gene expression tools has enabled a wide variety of experiments utilizing transcriptomic analyses. As these tools increase in prevalence, the need for improved standardization in processing and presentation of data increases, as does the need to guard against interpretation bias. Gene Ontology (GO) analysis is a powerful method of interpreting and summarizing biological functions. However, while there are many tools available to investigate GO enrichment, there remains a need for methods that directly remove redundant terms from enriched GO lists, terms that often provide little, if any, additional information. Findings Here we present a simple yet novel method called GO Trimming that utilizes an algorithm designed to reduce redundancy in lists of enriched GO categories. Depending on the needs of the user, this method can be performed with variable stringency. In the example presented here, an initial list of 90 terms was reduced to 54, eliminating 36 largely redundant terms. We also compare this method to existing methods and find that GO Trimming, while simple, performs well in eliminating redundant terms throughout the depth of the GO hierarchy in a large dataset. Conclusions The GO Trimming method provides an alternative to other procedures, some of which involve removing large numbers of terms prior to enrichment analysis. This method should free the researcher from analyzing overly large, redundant lists and instead enable the concise presentation of manageable, informative GO lists. The implementation of this tool is freely available at: http://lucy.ceh.uvic.ca/go_trimming/cbr_go_trimming.py
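
    The core of such a trimming pass is easy to sketch. The snippet below is a minimal illustration (not the published GO Trimming implementation), assuming the GO hierarchy is available as a mapping from each term to the set of all its ancestors; it applies one simple redundancy rule, dropping any enriched term that is an ancestor of a more specific enriched term:

```python
# Minimal sketch of ancestor-based trimming of an enriched GO list
# (illustrative only; not the published GO Trimming algorithm).

def trim_go_terms(enriched, ancestors):
    """enriched: set of enriched GO term IDs.
    ancestors: dict mapping each GO term ID to the set of all its ancestors.
    Returns the enriched terms that are not ancestors of other enriched terms."""
    redundant = set()
    for term in enriched:
        # An ancestor of an enriched term is less specific, so if it is
        # also on the enriched list it adds little information.
        redundant |= ancestors.get(term, set()) & enriched
    return enriched - redundant

# Toy hierarchy: lipid metabolic process is a descendant of metabolic process.
ancestors = {"GO:0006629": {"GO:0008152"}, "GO:0008152": set()}
print(trim_go_terms({"GO:0006629", "GO:0008152"}, ancestors))
# -> {'GO:0006629'}
```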

  2. Handling limited datasets with neural networks in medical applications: A small-data approach.

    Science.gov (United States)

    Shaikhina, Torgyn; Khovanova, Natalia A

    2017-01-01

    Single-centre studies in the medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for application of artificial neural networks (NNs) for regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the methods of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to the state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied for the prediction of compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85 MPa. When evaluated on independent test samples, the NN achieved accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows for an NN modelled with as few as 56 samples to generalise on 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with an 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
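
    The "multiple runs" component of such a framework can be sketched briefly: train many identically configured networks from different random initialisations and aggregate their predictions, so the sporadic fluctuations of any single small-data fit average out. The sketch below uses scikit-learn with placeholder data, architecture, and run count; it is an illustration of the idea, not the authors' code:

```python
# Minimal sketch of the "multiple runs" idea for small-data regression NNs
# (illustrative; not the framework's original implementation).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(56, 3))                          # small training set
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 56)

# Train many networks differing only in their random initialisation.
runs = [MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000,
                     random_state=seed).fit(X, y)
        for seed in range(25)]

# Aggregating over runs damps the run-to-run fluctuations of a single fit.
X_test = rng.uniform(size=(10, 3))
y_pred = np.mean([m.predict(X_test) for m in runs], axis=0)
print(y_pred.round(2))
```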

  3. A new dataset for systematic assessments of climate change impacts as a function of global warming

    Science.gov (United States)

    Heinke, J.; Ostberg, S.; Schaphoff, S.; Frieler, K.; Müller, C.; Gerten, D.; Meinshausen, M.; Lucht, W.

    2012-11-01

    In the ongoing political debate on climate change, global mean temperature change (ΔTglob) has become the yardstick by which mitigation costs, impacts from unavoided climate change, and adaptation requirements are discussed. For a scientifically informed discourse along these lines, systematic assessments of climate change impacts as a function of ΔTglob are required. The current availability of climate change scenarios constrains this type of assessment to a narrow range of temperature change and/or a reduced ensemble of climate models. Here, a newly composed dataset of climate change scenarios is presented that addresses the specific requirements for global assessments of climate change impacts as a function of ΔTglob. A pattern-scaling approach is applied to extract generalized patterns of spatially explicit change in temperature, precipitation and cloudiness from 19 AOGCMs. The patterns are combined with scenarios of global mean temperature increase obtained from the reduced-complexity climate model MAGICC6 to create climate scenarios covering warming levels from 1.5 to 5 degrees above pre-industrial levels around the year 2100. The patterns are shown to sufficiently maintain the original AOGCMs' climate change properties, even though they, necessarily, utilize a simplified relationship between ΔTglob and changes in local climate properties. The dataset (made available online upon final publication of this paper) facilitates systematic analyses of climate change impacts as it covers a wider and finer-spaced range of climate change scenarios than the original AOGCM simulations.
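
    Pattern scaling rests on a single linear approximation: the local change in a climate variable is a time-invariant spatial pattern multiplied by global mean warming, ΔV(x) ≈ p(x) · ΔTglob. A minimal sketch of that combination step, with toy numbers standing in for the AOGCM-derived patterns and the MAGICC6 trajectories:

```python
# Minimal sketch of pattern scaling (illustrative; not the paper's pipeline):
# local change = AOGCM-derived pattern (change per degree of global warming)
# times a global mean temperature trajectory, e.g. from MAGICC6.
import numpy as np

pattern = np.array([[0.8, 1.1],      # deg C of local warming per deg C of
                    [1.4, 2.0]])     # global warming, on a toy 2x2 grid

dT_glob = np.array([1.5, 3.0, 5.0])  # warming levels above pre-industrial (deg C)

# Broadcast to get one local-change field per warming level.
local_change = pattern[None, :, :] * dT_glob[:, None, None]
print(local_change[2])               # local warming field at 5 deg of global warming
```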

  4. A new climate dataset for systematic assessments of climate change impacts as a function of global warming

    Science.gov (United States)

    Heinke, J.; Ostberg, S.; Schaphoff, S.; Frieler, K.; Müller, C.; Gerten, D.; Meinshausen, M.; Lucht, W.

    2013-10-01

    In the ongoing political debate on climate change, global mean temperature change (ΔTglob) has become the yardstick by which mitigation costs, impacts from unavoided climate change, and adaptation requirements are discussed. For a scientifically informed discourse along these lines, systematic assessments of climate change impacts as a function of ΔTglob are required. The current availability of climate change scenarios constrains this type of assessment to a narrow range of temperature change and/or a reduced ensemble of climate models. Here, a newly composed dataset of climate change scenarios is presented that addresses the specific requirements for global assessments of climate change impacts as a function of ΔTglob. A pattern-scaling approach is applied to extract generalised patterns of spatially explicit change in temperature, precipitation and cloudiness from 19 Atmosphere-Ocean General Circulation Models (AOGCMs). The patterns are combined with scenarios of global mean temperature increase obtained from the reduced-complexity climate model MAGICC6 to create climate scenarios covering warming levels from 1.5 to 5 degrees above pre-industrial levels around the year 2100. The patterns are shown to sufficiently maintain the original AOGCMs' climate change properties, even though they, necessarily, utilise a simplified relationship between ΔTglob and changes in local climate properties. The dataset (made available online upon final publication of this paper) facilitates systematic analyses of climate change impacts as it covers a wider and finer-spaced range of climate change scenarios than the original AOGCM simulations.

  5. Systematic Discovery of Chromatin-Bound Protein Complexes from ChIP-seq Datasets.

    Science.gov (United States)

    Giannopoulou, Eugenia; Elemento, Olivier

    2017-01-01

    Chromatin immunoprecipitation followed by sequencing is an invaluable assay for identifying the genomic binding sites of transcription factors. However, transcription factors rarely bind chromatin alone but often bind together with other cofactors, forming protein complexes. Here, we describe a computational method that integrates multiple ChIP-seq and RNA-seq datasets to discover protein complexes and determine their role as activators or repressors. This chapter outlines a detailed computational pipeline for discovering and predicting binding partners from ChIP-seq data and inferring their role in regulating gene expression. This work aims at developing hypotheses about gene regulation via binding partners and deciphering the combinatorial nature of DNA-binding proteins.
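
    A first step in any such pipeline is measuring how often the binding sites of two factors coincide. The sketch below illustrates that co-occurrence step on toy interval data, assuming peaks are given as (start, end) tuples per factor; it is a schematic reduction, not the chapter's full pipeline:

```python
# Minimal sketch of a co-binding measure between two ChIP-seq peak sets
# (illustrative; the chapter's pipeline integrates many datasets).

def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]

def co_bound_fraction(peaks_a, peaks_b):
    """Fraction of factor A's peaks overlapping at least one peak of factor B."""
    hits = sum(any(overlaps(p, q) for q in peaks_b) for p in peaks_a)
    return hits / len(peaks_a)

factor_a = [(100, 200), (500, 600), (900, 950)]
factor_b = [(150, 250), (905, 940)]
print(co_bound_fraction(factor_a, factor_b))  # -> 0.666..., a co-binding candidate
```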

  6. Combining global land cover datasets to quantify agricultural expansion into forests in Latin America: Limitations and challenges.

    Directory of Open Access Journals (Sweden)

    Florence Pendrill

    Full Text Available While we know that deforestation in the tropics is increasingly driven by commercial agriculture, most tropical countries still lack recent and spatially-explicit assessments of the relative importance of pasture and cropland expansion in causing forest loss. Here we present a spatially explicit quantification of the extent to which cultivated land and grassland expanded at the expense of forests across Latin America in 2001-2011, by combining two "state-of-the-art" global datasets (Global Forest Change forest loss and GlobeLand30-2010 land cover). We further evaluate some of the limitations and challenges in doing this. We find that this approach does capture some of the major patterns of land cover following deforestation, with GlobeLand30-2010's Grassland class (which we interpret as pasture) being the most common land cover replacing forests across Latin America. However, our analysis also reveals some major limitations to combining these land cover datasets for quantifying pasture and cropland expansion into forest. First, a simple one-to-one translation of GlobeLand30-2010's Cultivated land and Grassland classes into cropland and pasture, respectively, should not be made without caution, as GlobeLand30-2010 defines its Cultivated land to include some pastures. Comparisons with the TerraClass dataset over the Brazilian Amazon and with previous literature indicate that Cultivated land in GlobeLand30-2010 includes notable amounts of pasture and other vegetation (e.g. in Paraguay and the Brazilian Amazon). This further suggests that the approach taken here generally leads to an underestimation (of up to ~60%) of the role of pasture in replacing forest. Second, a large share (~33%) of the Global Forest Change forest loss is found to still be forest according to GlobeLand30-2010, and our analysis suggests that the accuracy of the combined datasets, especially for areas with heterogeneous land cover and/or small-scale forest loss, is still too
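
    The overlay at the heart of this approach reduces to a cross-tabulation: for every pixel flagged as forest loss in Global Forest Change, look up its GlobeLand30-2010 class. A minimal raster sketch, assuming the two layers are already co-registered numpy arrays (the class codes are placeholders, not the real GlobeLand30 legend):

```python
# Minimal sketch of overlaying a forest-loss mask on a land cover raster
# (illustrative; real class codes, resolutions and projections differ).
import numpy as np

# Toy co-registered rasters: True where forest loss is reported for 2001-2011,
# and placeholder land cover codes (1=cultivated, 2=grassland, 3=forest).
forest_loss = np.array([[True, True, False],
                        [True, False, True]])
landcover_2010 = np.array([[2, 1, 3],
                           [2, 3, 3]])

# Cross-tabulate: which 2010 land cover classes occupy the loss pixels?
codes, counts = np.unique(landcover_2010[forest_loss], return_counts=True)
for code, n in zip(codes, counts):
    print(code, n)
# Loss pixels still coded as forest (3) illustrate the dataset-disagreement
# problem quantified in the study (~33% of the forest loss).
```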

  7. A new climate dataset for systematic assessments of climate change impacts as a function of global warming

    Directory of Open Access Journals (Sweden)

    J. Heinke

    2013-10-01

    Full Text Available In the ongoing political debate on climate change, global mean temperature change (ΔTglob) has become the yardstick by which mitigation costs, impacts from unavoided climate change, and adaptation requirements are discussed. For a scientifically informed discourse along these lines, systematic assessments of climate change impacts as a function of ΔTglob are required. The current availability of climate change scenarios constrains this type of assessment to a narrow range of temperature change and/or a reduced ensemble of climate models. Here, a newly composed dataset of climate change scenarios is presented that addresses the specific requirements for global assessments of climate change impacts as a function of ΔTglob. A pattern-scaling approach is applied to extract generalised patterns of spatially explicit change in temperature, precipitation and cloudiness from 19 Atmosphere–Ocean General Circulation Models (AOGCMs). The patterns are combined with scenarios of global mean temperature increase obtained from the reduced-complexity climate model MAGICC6 to create climate scenarios covering warming levels from 1.5 to 5 degrees above pre-industrial levels around the year 2100. The patterns are shown to sufficiently maintain the original AOGCMs' climate change properties, even though they, necessarily, utilise a simplified relationship between ΔTglob and changes in local climate properties. The dataset (made available online upon final publication of this paper) facilitates systematic analyses of climate change impacts as it covers a wider and finer-spaced range of climate change scenarios than the original AOGCM simulations.

  8. The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis

    Directory of Open Access Journals (Sweden)

    Pepper Stuart D

    2008-09-01

    Full Text Available Abstract Background The number of gene expression studies in the public domain is rapidly increasing, representing a highly valuable resource. However, dataset-specific bias precludes meta-analysis at the raw transcript level, even when the RNA is from comparable sources and has been processed on the same microarray platform using similar protocols. Here, we demonstrate, using Affymetrix data, that much of this bias can be removed, allowing multiple datasets to be legitimately combined for meaningful meta-analyses. Results A series of validation datasets comparing breast cancer and normal breast cell lines (MCF7 and MCF10A) were generated to examine the variability between datasets generated using different amounts of starting RNA, alternative protocols, different generations of Affymetrix GeneChip or scanning hardware. We demonstrate that systematic, multiplicative biases are introduced at the RNA, hybridization and image-capture stages of a microarray experiment. Simple batch mean-centering was found to significantly reduce the level of inter-experimental variation, allowing raw transcript levels to be compared across datasets with confidence. By accounting for dataset-specific bias, we were able to assemble the largest gene expression dataset of primary breast tumours to date (1107, from six previously published studies). Using this meta-dataset, we demonstrate that combining greater numbers of datasets or tumours leads to a greater overlap in differentially expressed genes and more accurate prognostic predictions. However, this is highly dependent upon the composition of the datasets and patient characteristics. Conclusion Multiplicative, systematic biases are introduced at many stages of microarray experiments. When these are reconciled, raw data can be directly integrated from different gene expression datasets, leading to new biological findings with increased statistical power.
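
    Because the biases are multiplicative at the intensity level, they become additive offsets after log transformation, so subtracting each gene's within-dataset mean removes them. A minimal sketch of batch mean-centering (illustrative; the genes-by-samples layout and log2 convention are assumptions):

```python
# Minimal sketch of batch mean-centering (illustrative). A multiplicative bias
# at the raw intensity level becomes an additive per-gene offset after log2,
# which subtracting the within-dataset gene means removes.
import numpy as np

def mean_center_by_batch(log2_expr, batch_labels):
    """log2_expr: genes x samples matrix; batch_labels: one label per column."""
    centered = log2_expr.copy()
    for batch in np.unique(batch_labels):
        cols = batch_labels == batch
        centered[:, cols] -= centered[:, cols].mean(axis=1, keepdims=True)
    return centered

rng = np.random.default_rng(1)
expr = rng.normal(size=(5, 6)) + np.array([0, 0, 0, 2, 2, 2])  # batch B offset
batches = np.array(["A", "A", "A", "B", "B", "B"])
print(mean_center_by_batch(expr, batches).round(2))
```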

  9. Systematically biological prioritizing remediation sites based on datasets of biological investigations and heavy metals in soil

    Science.gov (United States)

    Lin, Wei-Chih; Lin, Yu-Pin; Anthony, Johnathen

    2015-04-01

    Heavy metal pollution has adverse effects not only on the focal invertebrate species of this study, such as reduction in pupa weight and increased larval mortality, but also on the higher trophic level organisms which feed on them, either directly or indirectly, through the process of biomagnification. Despite this, few studies regarding remediation prioritization take species distribution or biological conservation priorities into consideration. This study develops a novel approach for delineating sites which are both contaminated by any of 5 readily bioaccumulated heavy metal soil contaminants and of high ecological importance for the highly mobile, low trophic level focal species. The conservation priority of each site was based on the projected distributions of 6 moth species simulated via the presence-only maximum entropy species distribution model, followed by the subsequent application of a systematic conservation tool. In order to increase the number of available samples, we also integrated crowd-sourced data with professionally-collected data via a novel optimization procedure based on a simulated annealing algorithm. This integration procedure was important because, while crowd-sourced data can drastically increase the number of data samples available to ecologists, their quality or reliability can be called into question, adding yet another source of uncertainty in projecting species distributions. The optimization method screens crowd-sourced data in terms of the environmental variables which correspond to professionally-collected data. The sample distribution data were derived from two different sources: the EnjoyMoths project in Taiwan (crowd-sourced data) and the Global Biodiversity Information Facility (GBIF) field data (professional data). The distributions of heavy metal concentrations were generated via 1000 iterations of a geostatistical co-simulation approach. The uncertainties in distributions of the heavy
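
    The screening step can be pictured as a standard simulated annealing loop: propose toggling one crowd-sourced record in or out of the selected subset, score the subset by how closely its environmental covariates match the professional data, and accept worse moves with a temperature-dependent probability. The sketch below is a generic illustration of that loop; the nearest-neighbour distance score is an assumed stand-in for the study's objective function:

```python
# Generic simulated-annealing sketch for screening crowd-sourced records so
# their environmental covariates resemble professionally collected data
# (illustrative; the scoring function is an assumed stand-in).
import math
import random

random.seed(0)
pro = [(0.2, 0.5), (0.3, 0.4)]                  # professional records (env. vars)
crowd = [(0.25, 0.45), (0.9, 0.1), (0.3, 0.5), (0.8, 0.9)]

def cost(selected):
    """Mean distance from each selected crowd record to the nearest
    professional record (lower = more environmentally consistent)."""
    if not selected:
        return float("inf")
    return sum(min(math.dist(crowd[i], p) for p in pro) for i in selected) / len(selected)

state = set(range(len(crowd)))                  # start with all records selected
temp = 1.0
for _ in range(1000):
    proposal = state ^ {random.randrange(len(crowd))}   # toggle one record
    delta = cost(proposal) - cost(state)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        state = proposal
    temp *= 0.995                               # geometric cooling schedule

print(sorted(crowd[i] for i in state))          # retained, screened records
```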

  10. Resolution and systematic limitations in beam based alignment

    Energy Technology Data Exchange (ETDEWEB)

    Tenenbaum, P.G.

    2000-03-15

    Beam-based alignment of quadrupoles by variation of quadrupole strength is a widely-used technique in accelerators today. The authors describe the dominant systematic limitation of this technique, which arises from the change in the center position of the quadrupole as the strength is varied, and derive expressions for the resulting error. In addition, the authors derive an expression for the statistical resolution of such techniques in a periodic transport line, given knowledge of the line's transport matrices, the resolution of the beam position monitor system, and the details of the strength variation procedure. These results are applied to the Next Linear Collider main linear accelerator, an 11 kilometer accelerator containing 750 quadrupoles and 5,000 accelerator structures. The authors find that in principle a statistical resolution of 1 micron is easily achievable, but that the systematic error due to variation of the magnetic centers could be several times larger.
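
    The measurement principle is linear: changing a quadrupole's integrated strength by Δ(kL) kicks the beam in proportion to its offset x in the quadrupole, shifting a downstream BPM reading by Δu ≈ R12 · Δ(kL) · x, so the offset follows from a straight-line fit of Δu against Δ(kL). A minimal sketch of that fit, with the BPM resolution entering as noise (the numbers are placeholders, and the center-shift systematic analyzed in the paper is deliberately ignored):

```python
# Minimal sketch of quad-shunting beam-based alignment (illustrative): the
# downstream BPM shift is linear in the strength change, with slope
# proportional to the beam offset in the quadrupole.
import numpy as np

rng = np.random.default_rng(2)
R12 = 4.0                # m/rad, transfer matrix element quad -> BPM (placeholder)
x_true = 150e-6          # m, true beam offset in the quadrupole
bpm_res = 1e-6           # m, BPM resolution

dkl = np.linspace(-0.05, 0.05, 11)          # integrated strength variation (1/m)
du = R12 * dkl * x_true + rng.normal(0, bpm_res, dkl.size)  # BPM readings

slope = np.polyfit(dkl, du, 1)[0]           # fit du vs d(kL)
print(f"estimated offset: {slope / R12 * 1e6:.1f} um")      # ~150 um
```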

  11. OA 2014-5 Dataset - Limited Entry and Open Access cost earnings survey collecting 2014-15 data

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This project collects economic data from vessel owners participating in the West Coast limited entry fixed gear and open access groundfish, salmon, crab, and shrimp...

  12. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  13. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2...... conducted the sample preparation and liquid chromatography mass spectrometry (LC-MS/MS) analysis of all samples in one batch, enabling label-free comparison between all biopsies. The datasets are made publicly available to enable critical or extended analyses. The proteomics data and search results, have...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  14. Calculation of the detection limit in radiation measurements with systematic uncertainties

    Energy Technology Data Exchange (ETDEWEB)

    Kirkpatrick, J.M., E-mail: john.kirkpatrick@canberra.com; Russ, W.; Venkataraman, R.; Young, B.M.

    2015-06-01

    The detection limit (L_D) or Minimum Detectable Activity (MDA) is an a priori evaluation of assay sensitivity intended to quantify the suitability of an instrument or measurement arrangement for the needs of a given application. Traditional approaches as pioneered by Currie rely on Gaussian approximations to yield simple, closed-form solutions, and neglect the effects of systematic uncertainties in the instrument calibration. These approximations are applicable over a wide range of applications, but are of limited use in low-count applications, when high confidence values are required, or when systematic uncertainties are significant. One proposed modification to the Currie formulation attempts to account for systematic uncertainties within a Gaussian framework. We have previously shown that this approach results in an approximation formula that works best only for small values of the relative systematic uncertainty, for which the modification of Currie's method is the least necessary, and that it significantly overestimates the detection limit or gives infinite or otherwise non-physical results for larger systematic uncertainties, where such a correction would be the most useful. We have developed an alternative approach for calculating detection limits based on realistic statistical modeling of the counting distributions which accurately represents statistical and systematic uncertainties. Instead of a closed form solution, numerical and iterative methods are used to evaluate the result. Accurate detection limits can be obtained by this method for the general case.
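
    For orientation, Currie's closed form for the detection limit in counts (95%/95%, paired-blank convention, background B) is L_D = 2.71 + 4.65·√B. The sketch below contrasts that with a simple Monte Carlo treatment in which a relative systematic uncertainty on the calibration efficiency is propagated by sampling; it illustrates the general numerical idea, not the authors' specific iterative algorithm:

```python
# Currie's closed-form detection limit versus a simple Monte Carlo treatment
# of a systematic (calibration) uncertainty. Illustrative sketch only; the
# paper's iterative method models the counting distributions in more detail.
import numpy as np

B = 25.0                                   # expected background counts
print(f"Currie L_D: {2.71 + 4.65 * np.sqrt(B):.1f} counts")   # ~26.0

rng = np.random.default_rng(3)
eff_rel_sigma = 0.10                       # 10% relative systematic on efficiency
L_C = B + 1.645 * np.sqrt(2 * B)           # decision threshold in gross counts

# Find the smallest true signal (counts at nominal efficiency) that is
# detected 95% of the time once the efficiency uncertainty is sampled.
for signal in np.arange(10.0, 80.0, 0.5):
    eff = rng.normal(1.0, eff_rel_sigma, 20000).clip(min=0.0)
    gross = rng.poisson(B + signal * eff)
    if np.mean(gross > L_C) >= 0.95:
        print(f"MC L_D with systematics: ~{signal:.1f} counts")
        break
```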

  15. Upper limit for Poisson variable incorporating systematic uncertainties by Bayesian approach

    International Nuclear Information System (INIS)

    Zhu, Yongsheng

    2007-01-01

    To calculate the upper limit for the Poisson observable at a given confidence level, with inclusion of systematic uncertainties in the background expectation and signal efficiency, formulations have been established along the lines of the Bayesian approach. A FORTRAN program, BPULE, has been developed to implement the upper limit calculation.
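
    The calculation can be sketched numerically: with n observed events, background expectation b and signal efficiency ε, the Poisson likelihood is marginalised over priors on b and ε, and the upper limit is the signal value at which the posterior's cumulative probability reaches the confidence level. A minimal Python sketch of that recipe (illustrative; BPULE itself is a FORTRAN program, and the Gaussian priors assumed here may differ from its choices):

```python
# Minimal numerical sketch of a Bayesian Poisson upper limit with systematic
# uncertainties marginalised out (illustrative; not the BPULE code itself).
import numpy as np
from scipy.stats import poisson

n_obs = 3                  # observed events
b0, sig_b = 1.2, 0.3       # background expectation and its systematic
e0, sig_e = 0.8, 0.1       # signal efficiency and its systematic
CL = 0.90

# Marginalise the likelihood over (assumed Gaussian) priors on background and
# efficiency by Monte Carlo sampling, on a grid of signal strengths s.
rng = np.random.default_rng(4)
b = rng.normal(b0, sig_b, 20000).clip(min=0.0)
e = rng.normal(e0, sig_e, 20000).clip(min=1e-6)

s_grid = np.linspace(0.0, 20.0, 401)
like = np.array([poisson.pmf(n_obs, b + s * e).mean() for s in s_grid])

# Flat prior in s: the posterior is the normalised marginal likelihood, and
# the upper limit is where its cumulative probability reaches CL.
post = like / np.trapz(like, s_grid)
cdf = np.cumsum(post) * (s_grid[1] - s_grid[0])
print(f"{CL:.0%} CL upper limit: {s_grid[np.searchsorted(cdf, CL)]:.2f} signal events")
```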

  16. Limitations of previously published systematic reviews evaluating the outcome of endodontic treatment

    NARCIS (Netherlands)

    Wu, M.K.; Shemesh, H.; Wesselink, P.R.

    2009-01-01

    The aim of this work was to identify the limitations of previously published systematic reviews evaluating the outcome of root canal treatment. Traditionally, periapical radiography has been used to assess the outcome of root canal treatment with the absence of a periapical radiolucency being

  17. How Limited Systematicity Emerges: A Computational Cognitive Neuroscience Approach (Author’s Manuscript)

    Science.gov (United States)

    2014-09-01

    plausibility. It should be pointed out though that systematicity in ACT-R is limited both by the need to acquire the skills and knowledge needed to... inferential chain. All tokens of a symbol in logic must have identical meaning throughout the proof or else it is not a valid proof. Despite their natural... First, neurons do not communicate with symbols, despite the inevitable urge to think of them in this way (O'Reilly, 2010). Spikes are completely

  18. Accessible Home Environments for People with Functional Limitations: A Systematic Review.

    Science.gov (United States)

    Cho, Hea Young; MacLachlan, Malcolm; Clarke, Michael; Mannan, Hasheem

    2016-08-17

    The aim of this review is to evaluate the health and social effects of accessible home environments for people with functional limitations, in order to provide evidence to promote well-informed decision making for policy guideline development and choices about public health interventions. MEDLINE and nine other electronic databases were searched between December 2014 and January 2015 for articles published since 2004. All study types were included in this review. Two reviewers independently screened 12,544 record titles or titles and abstracts based on our pre-defined eligibility criteria. We identified 94 articles as potentially eligible and assessed their full text. Included studies were critically appraised using the Mixed Method Appraisal Tool, version 2011. Fourteen studies were included in the review. We did not identify any meta-analysis or systematic review directly relevant to the question for this systematic review. A narrative approach was used to synthesise the findings of the included studies due to methodological and statistical heterogeneity. Results suggest that certain interventions to enhance the accessibility of homes can have positive health and social effects. Home environments that lack accessibility modifications appropriate to the needs of their users are likely to result in people with physical impairments becoming disabled at home.

  19. Palliative Oncologic Care Curricula for Providers in Resource-Limited and Underserved Communities: a Systematic Review.

    Science.gov (United States)

    Xu, Melody J; Su, David; Deboer, Rebecca; Garcia, Michael; Tahir, Peggy; Anderson, Wendy; Kinderman, Anne; Braunstein, Steve; Sherertz, Tracy

    2017-12-20

    Familiarity with principles of palliative care, supportive care, and palliative oncological treatment is essential for providers caring for cancer patients, though this may be challenging in global communities where resources are limited. Herein, we describe the scope of literature on palliative oncological care curricula for providers in resource-limited settings. A systematic literature review was conducted using PubMed, Embase, Cochrane Library, Web of Science, Cumulative Index to Nursing and Allied Health Literature, Med Ed Portal databases, and gray literature. All available prospective cohort studies, case reports, and narratives published up to July 2017 were eligible for review. Fourteen articles were identified and referenced palliative care education programs in Argentina, Uganda, Kenya, Australia, Germany, the USA, or multiple countries. The most common teaching strategy was lecture-based, followed by mentorship and experiential learning involving role play and simulation. Education topics included core principles of palliative care, pain and symptom management, and communication skills. Two programs included additional topics specific to the underserved or American Indian/Alaskan Native community. Only one program discussed supportive cancer care, and no program reported educational content on resource-stratified decision-making for palliative oncological treatment. Five programs reported positive participant satisfaction, and three programs described objective metrics of increased educational or research activity. There is scant literature on effective curricula for providers treating cancer patients in resource-limited settings. Emphasizing supportive cancer care and palliative oncologic treatments may help address gaps in education; increased outcome reporting may help define the impact of palliative care curriculum within resource-limited communities.

  20. A systematic review of portable electronic technology for health education in resource-limited settings.

    Science.gov (United States)

    McHenry, Megan S; Fischer, Lydia J; Chun, Yeona; Vreeman, Rachel C

    2017-08-01

    The objective of this study is to conduct a systematic review of the literature of how portable electronic technologies with offline functionality are perceived and used to provide health education in resource-limited settings. Three reviewers evaluated articles and performed a bibliography search to identify studies describing health education delivered by portable electronic device with offline functionality in low- or middle-income countries. Data extracted included: study population; study design and type of analysis; type of technology used; method of use; setting of technology use; impact on caregivers, patients, or overall health outcomes; and reported limitations. Searches yielded 5514 unique titles. Out of 75 critically reviewed full-text articles, 10 met inclusion criteria. Study locations included Botswana, Peru, Kenya, Thailand, Nigeria, India, Ghana, and Tanzania. Topics addressed included: development of healthcare worker training modules, clinical decision support tools, patient education tools, perceptions and usability of portable electronic technology, and comparisons of technologies and/or mobile applications. Studies primarily looked at the assessment of developed educational modules on trainee health knowledge, perceptions and usability of technology, and comparisons of technologies. Overall, studies reported positive results for portable electronic device-based health education, frequently reporting increased provider/patient knowledge, improved patient outcomes in both quality of care and management, increased provider comfort level with technology, and an environment characterized by increased levels of technology-based, informal learning situations. Negative assessments included high investment costs, lack of technical support, and fear of device theft. While the research is limited, portable electronic educational resources present promising avenues to increase access to effective health education in resource-limited settings, contingent

  1. Analysis of polyhydroxybutyrate flux limitations by systematic genetic and metabolic perturbations.

    Science.gov (United States)

    Tyo, Keith E J; Fischer, Curt R; Simeon, Fritz; Stephanopoulos, Gregory

    2010-05-01

    Poly-3-hydroxybutyrate (PHB) titers in Escherichia coli have benefited from 10+ years of metabolic engineering. In the majority of studies, PHB content, expressed as percent PHB (dry cell weight), is increased, although this increase can be explained by decreases in growth rate or increases in PHB flux. In this study, growth rate and PHB flux were quantified directly in response to systematic manipulation of (1) gene expression in the product-forming pathway and (2) growth rates in a nitrogen-limited chemostat. Gene expression manipulation revealed that acetoacetyl-CoA reductase (phaB) limits flux to PHB, although overexpression of the entire pathway pushed the flux even higher. These increases in PHB flux are accompanied by decreases in growth rate, which can be explained by carbon diversion rather than toxic effects of the PHB pathway. In chemostats, PHB flux was insensitive to growth rate. These results imply that PHB flux is primarily controlled by the expression levels of the product-forming pathway and not by the availability of precursors. These results confirm prior in vitro measurements and metabolic models and show that expression level is a major factor affecting PHB flux. Copyright © 2009 Elsevier Inc. All rights reserved.
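
    The carbon-diversion argument can be made concrete with a one-line balance: if a fixed substrate uptake is split between PHB synthesis and biomass formation, the growth rate scales with the carbon left over after PHB. A toy numerical sketch (the uptake and yield values are placeholders, not measurements from the study):

```python
# Toy carbon balance behind the carbon-diversion explanation: routing more
# carbon to PHB leaves less for biomass, lowering the growth rate.
# Placeholder numbers, not measurements from the study.
q_substrate = 10.0          # substrate uptake, mmol C / gDW / h (assumed)
y_biomass = 0.05            # gDW biomass per mmol C kept for growth (assumed)

for q_phb in (0.0, 2.0, 4.0):                # carbon routed to PHB, mmol C / gDW / h
    mu = y_biomass * (q_substrate - q_phb)   # resulting growth rate, 1/h
    print(f"PHB flux {q_phb:.0f} -> growth rate {mu:.2f} 1/h")
```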

  2. A systematic review of the efficacy and limitations of venous intervention in stasis ulceration.

    Science.gov (United States)

    Montminy, Myriam L; Jayaraj, Arjun; Raju, Seshadri

    2018-05-01

    Surgical techniques to address various components of chronic venous disease are rapidly evolving. Their efficacy and generally good results in treating superficial venous reflux (SVR) have been documented and compared in patients presenting with pain and swelling. A growing body of literature is now available suggesting their efficacy in patients with venous leg ulcer (VLU). This review attempts to summarize the efficacy and limitations of commonly used venous interventions in the treatment of SVR and incompetent perforator veins (IPVs) in patients with VLU. A systematic review of the published literature was performed. Two different searches were conducted in MEDLINE, Embase, and EBSCOhost to identify studies that examined the efficacy of SVR ablation and IPV ablation on healing rate and recurrence rate of VLU. In total, 1940 articles were screened. Of those, 45 were included in the SVR ablation review and 4 in the IPV ablation review. Data were too heterogeneous to perform an adequate meta-analysis. The quality of evidence assessed by the Grading of Recommendations Assessment, Development, and Evaluation for the two outcomes varied from very low to moderate. Ulcer healing and recurrence rates were between 70% and 100% and 0% and 49% in the SVR ablation review, and between 59% and 93% and 4% and 33% in the IPV ablation review, respectively. To explain those variable results, limitations such as inadequate diagnostic techniques, saphenous size, concomitant calf pump dysfunction, and associated deep venous reflux are discussed. Currently available minimally invasive techniques correct most venous pathologic processes in chronic venous disease with a good sustainable healing rate. There are still specific diagnostic and efficacy limitations that mandate proper match of individual patients with the planned approach. Copyright © 2017 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.

  3. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  4. Dataset of the first transcriptome assembly of the tree crop “yerba mate” (Ilex paraguariensis) and systematic characterization of protein coding genes

    Directory of Open Access Journals (Sweden)

    Patricia M. Aguilera

    2018-04-01

    Full Text Available This contribution contains data associated with the research article entitled “Exploring the genes of yerba mate (Ilex paraguariensis A. St.-Hil.) by NGS and de novo transcriptome assembly” (Debat et al., 2014 [1]). By means of a bioinformatic approach involving extensive NGS data analyses, we provide a resource encompassing the full transcriptome assembly of yerba mate, the first available reference for the Ilex L. genus. This dataset (Supplementary files 1 and 2) consolidates the transcriptome-wide assembled sequences of I. paraguariensis with further comprehensive annotation of the protein coding genes of yerba mate via the integration of Arabidopsis thaliana databases. The generated data are pivotal for the characterization of agronomically relevant genes in the tree crop yerba mate, a non-model species, and related taxa in Ilex. The raw sequencing data dissected here are available at DDBJ/ENA/GenBank (NCBI Resource Coordinators, 2016 [2]) Sequence Read Archive (SRA) under the accession SRP043293, and the assembled sequences have been deposited at the Transcriptome Shotgun Assembly Sequence Database (TSA) under the accession GFHV00000000.

  5. Potential and Limitations of Cochrane Reviews in Pediatric Cardiology: A Systematic Analysis.

    Science.gov (United States)

    Poryo, Martin; Khosrawikatoli, Sara; Abdul-Khaliq, Hashim; Meyer, Sascha

    2017-04-01

    Evidence-based medicine has contributed substantially to the quality of medical care in pediatric and adult cardiology. However, our impression from the bedside is that a substantial number of Cochrane reviews generate inconclusive data that are of limited clinical benefit. We performed a systematic synopsis of Cochrane reviews published between 2001 and 2015 in the field of pediatric cardiology. Main outcome parameters were the number and percentage of conclusive, partly conclusive, and inconclusive reviews as well as their recommendations and their development over three a priori defined intervals. In total, 69 reviews were analyzed. Most of them examined preterm and term neonates (36.2%), whereas 33.3% included also non-pediatric patients. Leading topics were pharmacological issues (71.0%) followed by interventional (10.1%) and operative procedures (2.9%). The majority of reviews were inconclusive (42.9%), while 36.2% were conclusive and 21.7% partly conclusive. Although the number of published reviews increased during the three a priori defined time intervals, reviews with "no specific recommendations" remained stable while "recommendations in favor of an intervention" clearly increased. Main reasons for missing recommendations were insufficient data (n = 41) as well as an insufficient number of trials (n = 22) or poor study quality (n = 19). There is still need for high-quality research, which will likely yield a greater number of Cochrane reviews with conclusive results.

  6. Chances and Limitations of Video Games in the Fight against Childhood Obesity-A Systematic Review.

    Science.gov (United States)

    Mack, Isabelle; Bayer, Carolin; Schäffeler, Norbert; Reiband, Nadine; Brölz, Ellen; Zurstiege, Guido; Fernandez-Aranda, Fernando; Gawrilow, Caterina; Zipfel, Stephan

    2017-07-01

    A systematic literature search was conducted to assess the chances and limitations of video games to combat and prevent childhood obesity. This search included studies with video or computer games targeting nutrition, physical activity and obesity for children between 7 and 15 years of age. The study distinguished between games that aimed to (i) improve knowledge about nutrition, eating habits and exercise; (ii) increase physical activity; or (iii) combine both approaches. Overall, the games were well accepted. On a qualitative level, most studies reported positive effects on obesity-related outcomes (improvement of weight-related parameters, physical activity or dietary behaviour/knowledge). However, the observed effects were small. The games did not address psychosocial aspects. Using video games for weight management exclusively does not deliver satisfying results. Video games as an additional guided component of prevention and treatment programs have the potential to increase compliance and thus enhance treatment outcome. Copyright © 2017 John Wiley & Sons, Ltd and Eating Disorders Association.

  7. Limiter

    Science.gov (United States)

    Cohen, S.A.; Hosea, J.C.; Timberlake, J.R.

    1984-10-19

    A limiter with a specially contoured front face is provided. The front face of the limiter (the plasma-side face) is flat with a central indentation. In addition, the limiter shape is cylindrically symmetric so that the limiter can be rotated for greater heat distribution. This limiter shape accommodates the various power scrape-off distances λp, which depend on the parallel velocity, V∥, of the impacting particles.

  8. Limited evidence for effects of diet for type 2 diabetes from systematic reviews.

    NARCIS (Netherlands)

    Laar, F.A. van de; Akkermans, R.P.; Binsbergen, J.J. van

    2007-01-01

    OBJECTIVE: Systematic reviews are an established method of summarizing research in a concise and transparent way, and may enable conclusions to be drawn beyond the sum of the results of individual studies. We assessed the results, quality and external validity of systematic reviews on diet in patients with type

  9. Systematic reviewers commonly contact study authors but do so with limited rigor.

    Science.gov (United States)

    Mullan, Rebecca J; Flynn, David N; Carlberg, Bo; Tleyjeh, Imad M; Kamath, Celia C; LaBella, Matthew L; Erwin, Patricia J; Guyatt, Gordon H; Montori, Victor M

    2009-02-01

    Author contact can enhance the quality of systematic reviews. We conducted a systematic review of the practice of author contact in recently published systematic reviews to characterize its prevalence, quality, and results. Eligible studies were systematic reviews of efficacy published in 2005-2006 in the 25 journals with the highest impact factor publishing systematic reviews in clinical medicine and the Cochrane Library, identified by searching MEDLINE, EMBASE, and the Cochrane Library. Two researchers determined whether and why reviewers contacted authors. To assess the accuracy of the abstracted data, we surveyed reviewers by e-mail. Forty-six (50%) of the 93 eligible systematic reviews published in top journals and 46 (85%) of the 54 eligible Cochrane reviews reported contacting authors of eligible studies. Requests were made most commonly for missing information: 40 (76%) clinical medicine reviews and 45 (98%) Cochrane reviews. One hundred and nine of 147 (74%) reviewers responded to the survey, and reported a higher rate of author contact than apparent from the published record. Although common, author contact is not a universal feature of systematic reviews published in top journals and the Cochrane Library. The conduct and reporting of author contact purpose, procedures, and results require improvement.

  10. Network Intrusion Dataset Assessment

    Science.gov (United States)

    2013-03-01

    Conclusions as to its use as a benchmark dataset vary: Cho et al. [10] recommend not using the KDD99 dataset at all, while Engen et al. [16] suggest that more care be taken in interpretation of results, but recommend continued use. As discussed by Engen et al. [16], researchers continue to use the KDD99

  11. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...

  12. Datasets2Tools, repository and search engine for bioinformatics datasets, tools and canned analyses

    Science.gov (United States)

    Torre, Denis; Krawczuk, Patrycja; Jagodnik, Kathleen M.; Lachmann, Alexander; Wang, Zichen; Wang, Lily; Kuleshov, Maxim V.; Ma'Ayan, Avi

    2018-02-01

    Biomedical data repositories such as the Gene Expression Omnibus (GEO) enable the search and discovery of relevant biomedical digital data objects. Similarly, resources such as OMICtools, index bioinformatics tools that can extract knowledge from these digital data objects. However, systematic access to pre-generated 'canned' analyses applied by bioinformatics tools to biomedical digital data objects is currently not available. Datasets2Tools is a repository indexing 31,473 canned bioinformatics analyses applied to 6,431 datasets. The Datasets2Tools repository also contains the indexing of 4,901 published bioinformatics software tools, and all the analyzed datasets. Datasets2Tools enables users to rapidly find datasets, tools, and canned analyses through an intuitive web interface, a Google Chrome extension, and an API. Furthermore, Datasets2Tools provides a platform for contributing canned analyses, datasets, and tools, as well as evaluating these digital objects according to their compliance with the findable, accessible, interoperable, and reusable (FAIR) principles. By incorporating community engagement, Datasets2Tools promotes sharing of digital resources to stimulate the extraction of knowledge from biomedical research data. Datasets2Tools is freely available from: http://amp.pharm.mssm.edu/datasets2tools.

  13. Systematic review of the limited evidence for different surgical techniques at benign hysterectomy

    DEFF Research Database (Denmark)

    Sloth, Sigurd Beier; Schroll, Jeppe Bennekou; Settnes, Annette

    2017-01-01

    guideline on the subject based on a systematic review of the literature. A guideline panel of seven gynecologists formulated the clinical questions for the guideline. A search specialist performed the comprehensive literature search. The guideline panel reviewed the literature and rated the quality...

  14. The limited prosocial effects of meditation: A systematic review and meta-analysis

    NARCIS (Netherlands)

    Kreplin, U.; Farias, M.; Brazil, I.A.

    2018-01-01

    Many individuals believe that meditation has the capacity not only to alleviate mental illness but also to improve prosociality. This article systematically reviewed and meta-analysed the effects of meditation interventions on prosociality in randomized controlled trials of healthy adults. Five types of

  15. Impairments, activity limitations and participation restrictions experienced in the first year following a critical illness: protocol for a systematic review.

    Science.gov (United States)

    Ohtake, Patricia J; Coffey Scott, Jacqueline; Hinman, Rana S; Lee, Alan Chong; Smith, James M

    2017-01-24

    Critical illness requiring intensive care unit (ICU) management is a life-altering event with ∼25% of ICU survivors experiencing persistent reductions in physical functioning, impairments in mental health, cognitive dysfunction and decreased quality of life. This constellation of problems is known as 'postintensive care syndrome' (PICS) and may persist for months and/or years. The purpose of this systematic review is to identify the scope and magnitude of physical problems associated with PICS during the first year after discharge from ICU, using the International Classification of Functioning, Disability and Health framework to elucidate the impairments of body functions and structures, activity limitations and participation restrictions. Medline (Ovid), Cochrane Database of Systematic Reviews (Ovid), Cochrane Central Register of Controlled Trials (Ovid), PubMed, CINAHL (EBSCO), Web of Science and EMBASE will be systematically searched for observational studies reporting the physical impairments of body functions and structures, activity limitations and participation restrictions associated with PICS. Two reviewers will assess the articles for eligibility according to prespecified selection criteria, after which an independent reviewer will perform data extraction which will be validated by a second independent reviewer. Quality appraisal will be performed by two independent reviewers. Outcomes of the included studies will be summarised in tables and in narrative format and meta-analyses will be conducted where appropriate. Formal ethical approval is not required as no primary data is collected. This systematic review will identify the scope and magnitude of physical problems associated with PICS during the first year after discharge from ICU and will be disseminated through a peer-reviewed publication and at conference meetings, to inform practice and future research on the physical problems associated with PICS. CRD42015023520. Published by the BMJ Publishing

  16. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  17. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface (ISD) Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  18. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  19. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  20. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  1. A systematic review and synthesis of the strengths and limitations of measuring malaria mortality through verbal autopsy.

    Science.gov (United States)

    Herrera, Samantha; Enuameh, Yeetey; Adjei, George; Ae-Ngibise, Kenneth Ayuurebobi; Asante, Kwaku Poku; Sankoh, Osman; Owusu-Agyei, Seth; Yé, Yazoume

    2017-10-23

    Lack of valid and reliable data on malaria deaths continues to be a problem that plagues the global health community. To address this gap, the verbal autopsy (VA) method was developed to ascertain cause of death at the population level. Despite the adoption and wide use of VA, there are many recognized limitations of VA tools and methods, especially for measuring malaria mortality. This study synthesizes the strengths and limitations of existing VA tools and methods for measuring malaria mortality (MM) in low- and middle-income countries through a systematic literature review. The authors searched PubMed, Cochrane Library, Popline, WHOLIS, Google Scholar, and INDEPTH Network Health and Demographic Surveillance System sites' websites from 1 January 1990 to 15 January 2016 for articles and reports on MM measurement through VA. Articles were eligible if they presented results from a VA study where malaria was a cause of death, or if they discussed limitations/challenges related to measurement of MM through VA. Two authors independently searched the databases and websites and conducted a synthesis of articles using a standard matrix. The authors identified 828 publications; 88 were included in the final review. Most publications were VA studies; others were systematic reviews discussing VA tools or methods; editorials or commentaries; and studies using VA data to develop MM estimates. The main limitations were the low sensitivity and specificity of VA tools for measuring MM. Other limitations included the lack of standardized VA tools and methods, and the lack of a 'true' gold standard to assess the accuracy of VA malaria mortality measurement. Existing VA tools and methods for measuring MM have limitations. Given the need for data to measure progress toward the World Health Organization's Global Technical Strategy for Malaria 2016-2030 goals, the malaria community should define strategies for improving MM estimates, including exploring whether VA tools and methods could be further improved. Longer term strategies should focus

  2. Maternal health interventions in resource limited countries: a systematic review of packages, impacts and factors for change

    Science.gov (United States)

    2011-01-01

    Background The burden of maternal mortality in resource limited countries is still huge despite being at the top of the global public health agenda for over 20 years. We systematically reviewed the impacts of interventions on maternal health and factors for change in these countries. Methods A systematic review was carried out using the guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Articles published in the English language reporting on implementation of interventions, their impacts and underlying factors for maternal health in resource limited countries in the past 23 years were searched from PubMed, Popline, African Index Medicus, internet sources including reproductive health gateway and Google, hand-searching, reference lists and grey literature. Results Out of a total of 5084 articles resulting from the search, only 58 qualified for systematic review. Programs integrating multiple interventions were more likely to have significant positive impacts on maternal outcomes. Training in emergency obstetric care (EmOC), placement of care providers, refurbishment of existing health facility infrastructure and improved supply of drugs, consumables and equipment for obstetric care were the most frequent interventions, integrated in 52% - 65% of all 54 reviewed programs. Statistically significant reductions of maternal mortality ratio and case fatality rate were reported in 55% and 40% of the programs respectively. Births in EmOC facilities and caesarean section rates increased significantly in 71% - 75% of programs using these indicators. Insufficient implementation of evidence-based interventions in resource limited countries was closely linked to a lack of national resources, leadership skills and end-user factors. Conclusions This article presents a list of evidence-based packages of interventions for maternal health, their impacts and factors for change in resource limited countries. It indicates that no single

  3. Maternal health interventions in resource limited countries: a systematic review of packages, impacts and factors for change

    Directory of Open Access Journals (Sweden)

    Urassa David P

    2011-04-01

    Full Text Available Abstract Background The burden of maternal mortality in resource limited countries is still huge despite being at the top of the global public health agenda for over 20 years. We systematically reviewed the impacts of interventions on maternal health and factors for change in these countries. Methods A systematic review was carried out using the guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Articles published in the English language reporting on implementation of interventions, their impacts and underlying factors for maternal health in resource limited countries in the past 23 years were searched from PubMed, Popline, African Index Medicus, internet sources including reproductive health gateway and Google, hand-searching, reference lists and grey literature. Results Out of a total of 5084 articles resulting from the search, only 58 qualified for systematic review. Programs integrating multiple interventions were more likely to have significant positive impacts on maternal outcomes. Training in emergency obstetric care (EmOC), placement of care providers, refurbishment of existing health facility infrastructure and improved supply of drugs, consumables and equipment for obstetric care were the most frequent interventions, integrated in 52% - 65% of all 54 reviewed programs. Statistically significant reductions of maternal mortality ratio and case fatality rate were reported in 55% and 40% of the programs respectively. Births in EmOC facilities and caesarean section rates increased significantly in 71% - 75% of programs using these indicators. Insufficient implementation of evidence-based interventions in resource limited countries was closely linked to a lack of national resources, leadership skills and end-user factors. Conclusions This article presents a list of evidence-based packages of interventions for maternal health, their impacts and factors for change in resource limited countries

  4. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online repository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database management. The website was entitled "biosigdata.com." It is a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users can download the datasets and can also share their own supplementary materials while maintaining their privacy (citation and fee). Commenting is also available for all datasets, and automatic sitemap generation and semi-automatic SEO indexing have been set up for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary file (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  5. Criminal Systematics and the Limits of Functionalist Proposals (Considerations on Guarantees, Citizenship and Human Rights)

    Directory of Open Access Journals (Sweden)

    Felipe Augusto Forte de Negreiros Deodat

    2016-06-01

    Full Text Available Making a critique of functionalism means looking at the history of the construction of penal systems. Rigor of analysis is something that is imposed when we take a system as a working tool. This is essential because what now arises in legal and criminal terms demands that the study of criminal law be increasingly precise and ever closer to the idea of human dignity. A critique will also be built of the two doctrines that changed the face of the first systematics, designed in the nineteenth century, which will allow us to see more accurately what can, or even should, be changed. One cannot help but praise normativism, especially as it received the indelible strengthening of the cultivators of criminal science over the past half century.

  6. The Systematic Evaluation of Instruments Designed to Assess Pain in Persons with Limited Ability to Communicate*

    Directory of Open Access Journals (Sweden)

    Michèle Aubin

    2007-01-01

    Full Text Available Chronic pain is often underdetected and undertreated in long-term care facilities. The use of self-report measures of pain (such as the visual analogue scale) is often problematic for older adults residing in long-term care because of the high prevalence of visual and auditory deficits and severe cognitive impairment. Observational measures of pain have been developed to address this concern. A systematic grid designed to assess the properties of existing observational measures of pain was used for seniors with dementia. The grid focused on the evaluation of content validity (12 items), construct validity (12 items), reliability (13 items) and clinical utility (10 items). Among the 24 instruments that were evaluated, several were deemed to be promising in the assessment of pain among older persons with severe dementia. Nonetheless, additional research is needed before their routine integration into the practices of long-term care settings.

  7. Assessment of activity limitations and participation restrictions with persons with chronic fatigue syndrome: a systematic review.

    Science.gov (United States)

    Vergauwen, Kuni; Huijnen, Ivan P J; Kos, Daphne; Van de Velde, Dominique; van Eupen, Inge; Meeus, Mira

    2015-01-01

    To summarize measurement instruments used to evaluate activity limitations and participation restrictions in patients with chronic fatigue syndrome (CFS) and review the psychometric properties of these instruments. General information on all included measurement instruments was extracted. The methodological quality was evaluated using the COSMIN checklist. Results of the measurement properties were rated based on the quality criteria of Terwee et al. Finally, overall quality was defined per psychometric property and measurement instrument by use of the quality criteria by Schellingerhout et al. A total of 68 articles were identified, of which eight evaluated the psychometric properties of a measurement instrument assessing activity limitations and participation restrictions. One disease-specific and 37 generic measurement instruments were found. Limited evidence was found for the psychometric properties and clinical usability of these instruments. However, the CFS-activities and participation questionnaire (APQ) is a disease-specific instrument with moderate content and construct validity. The psychometric properties of the reviewed measurement instruments to evaluate activity limitations and participation restrictions are not sufficiently evaluated. Future research is needed to evaluate the psychometric properties of the measurement instruments, including the other properties of the CFS-APQ. If it is necessary to use a measurement instrument, the CFS-APQ is recommended. Chronic fatigue syndrome (CFS). Chronic fatigue syndrome causes activity limitations and participation restrictions in one or more areas of life. Standardized, reliable and valid measurement instruments are necessary to identify these limitations and restrictions. Currently, no measurement instrument is sufficiently evaluated with persons with CFS. If a measurement instrument is needed to identify activity limitations and participation restrictions with persons with CFS, it is recommended to use the CFS-APQ.

  8. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an
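    Since the record distributes the data as a PostgreSQL SQL dump, a minimal sketch of restoring and inspecting such a dump is shown below; the dump file name, database name, and connection credentials are placeholders, not taken from the record.

    ```python
    import subprocess
    import psycopg2  # PostgreSQL driver

    # Restore the SQL dump into a pre-created database; names are hypothetical.
    subprocess.run(["psql", "-d", "ppl2010", "-f", "adviesregel_ppl_2010.sql"], check=True)

    # Inspect which tables the dump created.
    conn = psycopg2.connect(dbname="ppl2010", user="postgres")
    with conn.cursor() as cur:
        cur.execute(
            "SELECT table_name FROM information_schema.tables "
            "WHERE table_schema = 'public' ORDER BY table_name"
        )
        for (table_name,) in cur.fetchall():
            print(table_name)
    conn.close()
    ```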

  9. Resistance training for activity limitations in older adults with skeletal muscle function deficits: a systematic review

    Directory of Open Access Journals (Sweden)

    Papa EV

    2017-06-01

    Full Text Available Evan V Papa,1 Xiaoyang Dong,2 Mahdi Hassan1 1Department of Rehabilitation Medicine, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi Province, People’s Republic of China; 2Department of Physical Therapy, University of North Texas Health Science Center, Fort Worth, TX, USA Abstract: Human aging results in a variety of changes to skeletal muscle. Sarcopenia is the age-associated loss of muscle mass and is one of the main contributors to musculoskeletal impairments in the elderly. Previous research has demonstrated that resistance training can attenuate skeletal muscle function deficits in older adults; however, few articles have focused on the effects of resistance training on functional mobility. The purpose of this systematic review was to 1) present the current state of literature regarding the effects of resistance training on functional mobility outcomes for older adults with skeletal muscle function deficits and 2) provide clinicians with practical guidelines that can be used with seniors during resistance training, or to encourage exercise. We set forth evidence that resistance training can attenuate age-related changes in functional mobility, including improvements in gait speed, static and dynamic balance, and fall risk reduction. Older adults should be encouraged to participate in progressive resistance training activities, and should be urged to move along a continuum of exercise from immobility, toward the recommended daily amounts of activity. Keywords: aging, strength training, sarcopenia, mobility, balance

  10. TESTING THE RELIABILITY OF CLUSTER MASS INDICATORS WITH A SYSTEMATICS LIMITED DATA SET

    International Nuclear Information System (INIS)

    Juett, Adrienne M.; Mushotzky, Richard; Davis, David S.

    2010-01-01

    We present the mass-X-ray observable scaling relationships for clusters of galaxies using the XMM-Newton cluster catalog of Snowden et al. Our results are roughly consistent with previous observational and theoretical work, with one major exception. We find two to three times the scatter around the best-fit mass scaling relationships as expected from cluster simulations or seen in other observational studies. We suggest that this is a consequence of using hydrostatic mass, as opposed to virial mass, and is due to the explicit dependence of the hydrostatic mass on the gradients of the temperature and gas density profiles. We find a larger range of slope in the cluster temperature profiles at r_500 than previous observational studies. Additionally, we find only a weak dependence of the gas mass fraction on cluster mass, consistent with a constant. Our average gas mass fraction results argue for a closer study of the systematic errors due to instrumental calibration and analysis method variations. We suggest that a more careful study of the differences between various observational results and with cluster simulations is needed to understand sources of bias and scatter in cosmological studies of galaxy clusters.
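    For context, the hydrostatic mass estimate referred to here is conventionally written as follows (the standard textbook form, not an equation quoted from the paper); it makes the stated dependence on the temperature and gas-density gradients explicit:

    ```latex
    M_{\mathrm{HSE}}(<r) \;=\; -\,\frac{k_{B}\,T(r)\,r}{G\,\mu m_{p}}
    \left[\frac{d\ln \rho_{g}}{d\ln r} + \frac{d\ln T}{d\ln r}\right]
    ```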

  11. Systematic investigation of NLTE phenomena in the limit of small departures from LTE

    International Nuclear Information System (INIS)

    Libby, S.B.; Graziani, F.R.; More, R.M.; Kato, T.

    1997-01-01

    In this paper, we begin a systematic study of Non-Local Thermal Equilibrium (NLTE) phenomena in near-equilibrium (LTE) high energy density, highly radiative plasmas. It is shown that the principle of minimum entropy production rate characterizes NLTE steady states for average atom rate equations in the case of small departures from LTE. With the aid of a novel hohlraum-reaction box thought experiment, we use the principles of minimum entropy production and detailed balance to derive Onsager reciprocity relations for the NLTE responses of a near-equilibrium sample to non-Planckian perturbations in different frequency groups. This result is a significant symmetry constraint on the linear corrections to Kirchhoff's law. We envisage applying our strategy to a number of test problems which include: the NLTE corrections to the ionization state of an ion located near the edge of an otherwise LTE medium; the effect of a monochromatic radiation field perturbation on an LTE medium; the deviation of Rydberg state populations from LTE in recombining or ionizing plasmas; multi-electron temperature models such as that of Busquet; and finally, the effect of NLTE population shifts on opacity models. copyright 1997 American Institute of Physics
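    The reciprocity constraint invoked here follows the standard Onsager form; stated schematically (a generic statement, not notation taken from the paper), if J_i denotes the linear NLTE response in frequency group i and X_j the non-Planckian perturbation in group j, then the response coefficients form a symmetric matrix:

    ```latex
    J_{i} \;=\; \sum_{j} L_{ij}\, X_{j},
    \qquad
    L_{ij} \;=\; L_{ji}
    ```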

  12. Navigating Language Barriers: A Systematic Review of Patient Navigators' Impact on Cancer Screening for Limited English Proficient Patients.

    Science.gov (United States)

    Genoff, Margaux C; Zaballa, Alexandra; Gany, Francesca; Gonzalez, Javier; Ramirez, Julia; Jewell, Sarah T; Diamond, Lisa C

    2016-04-01

    To systematically review the literature on the impact of patient navigators on cancer screening for limited English proficient (LEP) patients. Electronic databases (PubMed, PsycINFO via OVID, Web of Science, Cochrane, EMBASE, and Scopus) were searched through 8 May 2015. Articles in this review had: (1) a study population of LEP patients eligible for breast, cervical or colorectal cancer screenings, (2) a patient navigator intervention to provide services prior to or during cancer screening, (3) a comparison of the patient navigator intervention to either a control group or another intervention, and (4) language-specific outcomes related to the patient navigator intervention. We assessed the quality of the articles using the Downs and Black Scale. Fifteen studies met the inclusion criteria and evaluated the screening rates for breast, colorectal, and cervical cancer in 15 language populations. Fourteen studies reported improved screening rates for LEP patients, with increases of between 7% and 60%. There was great variability in the patient navigation interventions evaluated. Training received by navigators was not reported in nine of the studies, and no studies assessed the language skills of the patient navigators in English or the target language. This study is limited by the variability in study designs and limited reporting on patient navigator interventions, which reduces the ability to draw conclusions on the full effect of patient navigators. Overall, we found evidence that navigators improved screening rates for breast, cervical and colorectal cancer screening for LEP patients. Future studies should systematically collect data on the training curricula for navigators and assess their English and non-English language skills in order to identify ways to reduce disparities for LEP patients.

  13. Isfahan MISP Dataset

    OpenAIRE

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database management. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacy (citation and fee).

  14. Limited evidence for calcium supplementation in preeclampsia prevention: a meta-analysis and systematic review.

    Science.gov (United States)

    Tang, Reuben; Tang, Ing Ching; Henry, Amanda; Welsh, Alec

    2015-05-01

    This article synthesises evidence for calcium supplementation in preeclampsia prevention. Major databases and trial registries were searched, and comparisons were made against other meta-analyses. Calcium supplementation reduced the overall risk of preeclampsia in 10 trials (n = 24 787; risk ratio (RR) 0.62; 95% confidence interval [CI] 0.47-0.81). Its effect was larger in two subgroups: low-baseline calcium intake (RR 0.42 [0.23-0.76]) and increased risk of developing hypertensive disorders (RR 0.36 [0.10-0.98]). This effect was not significant amongst larger studies (RR 0.93 [0.83-1.04]). Funnel plotting suggested possible publication bias. Some evidence for calcium supplementation exists, but its utility is limited by the possibility of publication bias and a lack of large trials.

  15. Systematic review of power mobility outcomes for infants, children and adolescents with mobility limitations.

    Science.gov (United States)

    Livingstone, Roslyn; Field, Debra

    2014-10-01

    To summarize and critically appraise the evidence related to power mobility use in children (18 years or younger) with mobility limitations. Searches were performed in 12 electronic databases along with hand searching for articles published in English to September 2012 and updated February 2014. The search was restricted to quantitative studies including at least one child with a mobility limitation and measuring an outcome related to power mobility device use. Articles were appraised using American Academy of Cerebral Palsy and Developmental Medicine (AACPDM) criteria for group and single-subject designs. The PRISMA statement was followed with inclusion criteria set a priori. Two reviewers independently screened titles, abstracts and full-text articles. AACPDM quality ratings were completed for levels I-III studies. Of 259 titles, 29 articles met inclusion criteria, describing 28 primary research studies. One study, rated as strong level II evidence, supported a positive impact of power mobility on overall development as well as independent mobility. Another study, rated as moderate level III evidence, supported a positive impact on self-initiated movement. Remaining studies, rated evidence levels IV and V, provided support for a positive impact on a broad range of outcomes across the International Classification of Functioning (ICF) components of body structure and function, activity and participation. Some studies suggest that environmental factors may be influential in successful power mobility use and skill development. The body of evidence supporting outcomes for children using power mobility is primarily descriptive rather than experimental in nature, suggesting research in this area is in its infancy. © The Author(s) 2014.

  16. A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication.

    Science.gov (United States)

    Moorhead, S Anne; Hazlett, Diane E; Harrison, Laura; Carroll, Jennifer K; Irwin, Anthea; Hoving, Ciska

    2013-04-23

    There is currently a lack of information about the uses, benefits, and limitations of social media for health communication among the general public, patients, and health professionals from primary research. To review the current published literature to identify the uses, benefits, and limitations of social media for health communication among the general public, patients, and health professionals, and identify current gaps in the literature to provide recommendations for future health communication research. This paper is a review using a systematic approach. A systematic search of the literature was conducted using nine electronic databases and manual searches to locate peer-reviewed studies published between January 2002 and February 2012. The search identified 98 original research studies that included the uses, benefits, and/or limitations of social media for health communication among the general public, patients, and health professionals. The methodological quality of the studies assessed using the Downs and Black instrument was low; this was mainly due to the fact that the vast majority of the studies in this review used limited methodologies and were mainly exploratory and descriptive in nature. Seven main uses of social media for health communication were identified, including focusing on increasing interactions with others, and facilitating, sharing, and obtaining health messages. The six key overarching benefits were identified as (1) increased interactions with others, (2) more available, shared, and tailored information, (3) increased accessibility and widening access to health information, (4) peer/social/emotional support, (5) public health surveillance, and (6) potential to influence health policy. Twelve limitations were identified, primarily consisting of quality concerns and lack of reliability, confidentiality, and privacy. Social media brings a new dimension to health care as it offers a medium to be used by the public, patients, and health professionals.

  17. Systematic review of electronic surveillance of infectious diseases with emphasis on antimicrobial resistance surveillance in resource-limited settings.

    Science.gov (United States)

    Rattanaumpawan, Pinyo; Boonyasiri, Adhiratha; Vong, Sirenda; Thamlikitkul, Visanu

    2018-02-01

    Electronic surveillance of infectious diseases involves rapidly collecting, collating, and analyzing vast amounts of data from interrelated multiple databases. Although many developed countries have invested in electronic surveillance for infectious diseases, the system still presents a challenge for resource-limited health care settings. We conducted a systematic review by performing a comprehensive literature search on MEDLINE (January 2000-December 2015) to identify studies relevant to electronic surveillance of infectious diseases. Study characteristics and results were extracted and systematically reviewed by 3 infectious disease physicians. A total of 110 studies were included. Most surveillance systems were developed and implemented in high-income countries; less than one-quarter were conducted in low- or middle-income countries. Information technologies can be used to facilitate the process of obtaining laboratory, clinical, and pharmacologic data for the surveillance of infectious diseases, including antimicrobial resistance (AMR) infections. These novel systems require greater resources; however, we found that using electronic surveillance systems could result in shorter times to detect targeted infectious diseases and improvement of data collection. This study highlights a lack of resources in areas where an effective, rapid surveillance system is most needed. The availability of information technology for the electronic surveillance of infectious diseases, including AMR infections, will facilitate the prevention and containment of such emerging infectious diseases. Copyright © 2018 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.

  18. A systematic review of the sleep, sleepiness, and performance implications of limited wake shift work schedules.

    Science.gov (United States)

    Short, Michelle A; Agostini, Alexandra; Lushington, Kurt; Dorrian, Jillian

    2015-09-01

    The aim of this review was to identify which limited wake shift work schedules (LWSW) best promote sleep, alertness, and performance. LWSW are fixed work/rest cycles where time at work is ≤8 hours and there is >1 rest period per day, on average, for ≥2 consecutive days. These schedules are commonly used in safety-critical industries such as the transport and maritime industries. Literature was sourced using the PubMed, Embase, PsycInfo, Scopus, and Google Scholar databases. We identified 20 independent studies (plus a further 2 overlapping studies), including 5 laboratory and 17 field-based studies focused on maritime watch keepers, ship bridge officers, and long-haul train drivers. The measurement of outcome measures was varied, incorporating subjective and objective measures of sleep: sleep diaries (N=5), actigraphy (N=4), and polysomnography (N=3); sleepiness: Karolinska Sleepiness Scale (N=5), visual analog scale (VAS) alertness (N=2) and author-derived measures (N=2); and performance: Psychomotor Vigilance Test (PVT) (N=5), Reaction Time or Vigilance tasks (N=4), Vector and Letter Cancellation Test (N=1), and subjective performance (N=2). Of the three primary rosters examined (6 hours-on/6 hours-off, 8 hours-on/8 hours-off and 4 hours-on/8 hours-off), the 4 hours-on/8 hours-off roster was associated with better sleep and lower levels of sleepiness. Individuals working 4 hours-on/8 hours-off rosters averaged 1 hour more sleep per night than those working 6 hours-on/6 hours-off and 1.3 hours more sleep than those working 8 hours-on/8 hours-off. These rosters suit the workplace as they facilitate at least some sleep during the biological night and minimize deficits associated with time-on-shift through shorter shifts. Overall, the 4 hour-on/8 hour-off roster best promoted sleep and minimized sleepiness compared to other LWSW schedules. Nevertheless, and considering the safety-critical nature of industries which employ LWSW, the limited literature needs to be greatly expanded with

  19. Molecular systematics and species limits in the Philippine fantails (Aves: Rhipidura).

    Science.gov (United States)

    Sánchez-González, Luis A; Moyle, Robert G

    2011-11-01

    Islands have long attracted scientists because of their relatively simple biotas and stark geographic boundaries. However, for many islands and archipelagos, this simplicity may be overstated because of methodological and conceptual limitations when these biotas were described. One archipelago that has received relatively little recent attention is the Philippine islands. Although much of its biota was documented long ago, taxonomic revision and evolutionary study has been surprisingly scarce, and only a few molecular phylogenetic studies are beginning to appear. We present a molecular phylogeny and taxonomic revision for the Philippine fantails (Aves: Rhipidura) using nuclear and mitochondrial DNA sequences. Our results suggest that current taxonomy underestimates diversity in the group. Some morphologically distinct subspecies warrant species status, whereas one was indistinguishable genetically and morphologically and should not be retained. A few taxa require additional sampling for thorough taxonomic assessment. Patterns of diversity within Philippine Rhipidura mostly corroborate predictions of the Pleistocene aggregate island complex (PAIC) hypothesis, in which diversity is expected to be partitioned by deep water channels separating Pleistocene aggregate islands rather than by current islands. Substantial structure within PAIC clades indicates that additional drivers of diversification should be considered. Copyright © 2011 Elsevier Inc. All rights reserved.

  20. Protein biomarkers on tissue as imaged via MALDI mass spectrometry: A systematic approach to study the limits of detection.

    Science.gov (United States)

    van de Ven, Stephanie M W Y; Bemis, Kyle D; Lau, Kenneth; Adusumilli, Ravali; Kota, Uma; Stolowitz, Mark; Vitek, Olga; Mallick, Parag; Gambhir, Sanjiv S

    2016-06-01

    MALDI mass spectrometry imaging (MSI) is emerging as a tool for protein and peptide imaging across tissue sections. Despite extensive study, there does not yet exist a baseline study evaluating the potential capabilities of this technique to detect diverse proteins in tissue sections. In this study, we developed a systematic approach for characterizing MALDI-MSI workflows in terms of limits of detection, coefficients of variation, spatial resolution, and the identification of endogenous tissue proteins. Our goal was to quantify these figures of merit for a number of different proteins and peptides, in order to gain more insight into the feasibility of protein biomarker discovery efforts using this technique. Control proteins and peptides were deposited in serial dilutions on thinly sectioned mouse xenograft tissue. Using our experimental setup, we characterized the coefficients of variation for these control proteins and peptides, providing a baseline for protein biomarker discovery efforts and a new benchmarking strategy that can be used for comparing diverse MALDI-MSI workflows. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.
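    As a toy illustration of the model-based simulation the review describes, the sketch below generates a synthetic binary-sensor event stream for a scripted daily routine; the activities, sensor names, and timing model are invented for illustration and are not drawn from any system discussed in the record.

    ```python
    import random
    from datetime import datetime, timedelta

    # Hypothetical mapping of activities to the binary sensors they trigger.
    ACTIVITY_SENSORS = {
        "sleep": ["bedroom_motion"],
        "make_breakfast": ["kitchen_motion", "fridge_door", "kettle"],
        "watch_tv": ["livingroom_motion", "tv_power"],
    }

    def simulate_day(start, routine, seed=0):
        """Generate (timestamp, sensor, value) events for a scripted routine."""
        rng = random.Random(seed)
        t, events = start, []
        for activity, minutes in routine:
            for sensor in ACTIVITY_SENSORS[activity]:
                # Each sensor fires at a random offset within the activity window.
                offset = timedelta(minutes=rng.uniform(0, minutes))
                events.append((t + offset, sensor, 1))
            t += timedelta(minutes=minutes)
        return sorted(events)

    routine = [("sleep", 420), ("make_breakfast", 30), ("watch_tv", 90)]
    for ts, sensor, value in simulate_day(datetime(2015, 6, 16, 0, 0), routine):
        print(ts.isoformat(), sensor, value)
    ```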

  2. Parental limited English proficiency and health outcomes for children with special health care needs: a systematic review.

    Science.gov (United States)

    Eneriz-Wiemer, Monica; Sanders, Lee M; Barr, Donald A; Mendoza, Fernando S

    2014-01-01

    One in 10 US adults of childbearing age has limited English proficiency (LEP). Parental LEP is associated with worse health outcomes among healthy children. The relationship of parental LEP to health outcomes for children with special health care needs (CSHCN) has not been systematically reviewed. To conduct a systematic review of peer-reviewed literature examining relationships between parental LEP and health outcomes for CSHCN. PubMed, Scopus, Cochrane Library, Social Science Abstracts, bibliographies of included studies. Key search term categories: language, child, special health care needs, and health outcomes. US studies published between 1964 and 2012 were included if: 1) subjects were CSHCN; 2) studies included some measure of parental LEP; 3) at least 1 outcome measure of child health status, access, utilization, costs, or quality; and 4) primary or secondary data analysis. Three trained reviewers independently screened studies and extracted data. Two separate reviewers appraised studies for methodological rigor and quality. From 2765 titles and abstracts, 31 studies met eligibility criteria. Five studies assessed child health status, 12 assessed access, 8 assessed utilization, 2 assessed costs, and 14 assessed quality. Nearly all (29 of 31) studies used only parent- or child-reported outcome measures, rather than objective measures. LEP parents were substantially more likely than English-proficient parents to report that their CSHCN were uninsured and had no usual source of care or medical home. LEP parents were also less likely to report family-centered care and satisfaction with care. Disparities persisted for children with LEP parents after adjustment for ethnicity and socioeconomic status. Parental LEP is independently associated with worse health care access and quality for CSHCN. Health care providers should recognize LEP as an independent risk factor for poor health outcomes among CSHCN. Emerging models of chronic disease care should integrate and

  3. Data Integration for Heterogenous Datasets.

    Science.gov (United States)

    Hendler, James

    2014-12-01

    More and more, data analysts need to use data outside the control of their own organizations. The increasing amount of data available on the Web, the new technologies for linking data across datasets, and the increasing need to integrate structured and unstructured data are all driving this trend. In this article, we provide a technical overview of the emerging "broad data" area, in which the variety of heterogeneous data being used, rather than the scale of the data being analyzed, is the limiting factor in data analysis efforts. The article explores some of the emerging themes in data discovery, data integration, linked data, and the combination of structured and unstructured data.
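    As a toy sketch of the kind of cross-source integration described here, the snippet below joins a structured in-house CSV with an external JSON feed on a shared key; the file names, fields, and join key are all invented for illustration.

    ```python
    import pandas as pd

    # Structured, in-house data (hypothetical file and schema).
    facilities = pd.read_csv("facilities.csv")         # columns: facility_id, name, city

    # External, semi-structured data pulled from the Web (hypothetical).
    readings = pd.read_json("external_readings.json")  # columns: facility_id, pm25, date

    # Integrate on the shared key; the obstacle is variety, not scale.
    merged = readings.merge(facilities, on="facility_id", how="left")
    print(merged.groupby("city")["pm25"].mean())
    ```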

  4. A New Dimension of Health Care: Systematic Review of the Uses, Benefits, and Limitations of Social Media for Health Communication

    Science.gov (United States)

    Hazlett, Diane E; Harrison, Laura; Carroll, Jennifer K; Irwin, Anthea; Hoving, Ciska

    2013-01-01

    Background There is currently a lack of information about the uses, benefits, and limitations of social media for health communication among the general public, patients, and health professionals from primary research. Objective To review the current published literature to identify the uses, benefits, and limitations of social media for health communication among the general public, patients, and health professionals, and identify current gaps in the literature to provide recommendations for future health communication research. Methods This paper is a review using a systematic approach. A systematic search of the literature was conducted using nine electronic databases and manual searches to locate peer-reviewed studies published between January 2002 and February 2012. Results The search identified 98 original research studies that included the uses, benefits, and/or limitations of social media for health communication among the general public, patients, and health professionals. The methodological quality of the studies assessed using the Downs and Black instrument was low; this was mainly due to the fact that the vast majority of the studies in this review used limited methodologies and were mainly exploratory and descriptive in nature. Seven main uses of social media for health communication were identified, including focusing on increasing interactions with others, and facilitating, sharing, and obtaining health messages. The six key overarching benefits were identified as (1) increased interactions with others, (2) more available, shared, and tailored information, (3) increased accessibility and widening access to health information, (4) peer/social/emotional support, (5) public health surveillance, and (6) potential to influence health policy. Twelve limitations were identified, primarily consisting of quality concerns and lack of reliability, confidentiality, and privacy. Conclusions Social media brings a new dimension to health care as it offers a medium to be used by the public, patients, and health professionals.

  5. Limits to modern contraceptive use among young women in developing countries: a systematic review of qualitative research

    Directory of Open Access Journals (Sweden)

    Wight Daniel

    2009-02-01

    Full Text Available Abstract Background Improving the reproductive health of young women in developing countries requires access to safe and effective methods of fertility control, but most rely on traditional rather than modern contraceptives such as condoms or oral/injectable hormonal methods. We conducted a systematic review of qualitative research to examine the limits to modern contraceptive use identified by young women in developing countries. Focusing on qualitative research allows the assessment of complex processes often missed in quantitative analyses. Methods Literature searches of 23 databases, including Medline, Embase and POPLINE®, were conducted. Literature from 1970–2006 concerning the 11–24 years age group was included. Studies were critically appraised and meta-ethnography was used to synthesise the data. Results Of the 12 studies which met the inclusion criteria, seven met the quality criteria and are included in the synthesis (six from sub-Saharan Africa; one from South-East Asia). Sample sizes ranged from 16 to 149 young women (age range 13–19 years). Four of the studies were urban based, one was rural, one semi-rural, and one mixed (predominantly rural). Use of hormonal methods was limited by lack of knowledge, obstacles to access and concern over side effects, especially fear of infertility. Although often more accessible, and sometimes more attractive than hormonal methods, condom use was limited by association with disease and promiscuity, together with greater male control. As a result young women often relied on traditional methods or abortion. Although the review was limited to five countries and conditions are not homogenous for all young women in all developing countries, the overarching themes were common across different settings and contexts, supporting the potential transferability of interventions to improve reproductive health. Conclusion Increasing modern contraceptive method use requires community-wide, multifaceted

  6. Cognitive functioning in children with self-limited epilepsy with centrotemporal spikes: A systematic review and meta-analysis.

    Science.gov (United States)

    Wickens, Steven; Bowden, Stephen C; D'Souza, Wendyl

    2017-10-01

    It is now well appreciated that benign epilepsy with centrotemporal spikes (BECTS, or more recently, ECTS) is associated with a range of cognitive and behavioral disturbances. Despite our improved understanding of cognitive functioning in ECTS, there have been to date no efforts to quantitatively synthesize the available literature within a comprehensive cognitive framework. The present systematic review and meta-analysis was conducted according to PRISMA guidelines. Forty-two case-control samples met eligibility criteria, comprising a total of 1,237 children with ECTS and 1,137 healthy control children. Univariate, random-effects meta-analyses were conducted on eight cognitive factors in accordance with the Cattell-Horn-Carroll model of intelligence. Overall, children with ECTS demonstrated significantly lower scores on neuropsychological tests across all cognitive factors compared to healthy controls. Observed effects ranged from 0.42 to 0.81 pooled standard deviation units, with the largest effect for long-term storage and retrieval and the smallest effect for visual processing. The results of the present meta-analysis provide the first clear evidence that children with ECTS display a profile of pervasive cognitive difficulties and thus challenge current conceptions of ECTS as a benign disease or one of limited, specific, or localized cognitive effect. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.

  7. Spatial Evolution of Openstreetmap Dataset in Turkey

    Science.gov (United States)

    Zia, M.; Seker, D. Z.; Cakir, Z.

    2016-10-01

    A large amount of research work has already been done regarding many aspects of the OpenStreetMap (OSM) dataset in recent years for developed countries and major world cities. On the other hand, limited work is present in the scientific literature for developing or underdeveloped ones, because of poor data coverage. In the presented study, it is demonstrated how the Turkey-OSM dataset has spatially evolved over an 8-year time span (2007-2015) throughout the country. It is observed that there is an east-west spatial bias in OSM feature density across the country. Population density and literacy level are found to be the two main governing factors controlling this spatial trend. Future research paradigms may involve considering contributors' involvement and commenting on dataset health.

  8. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  9. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) projects have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK dataset consists of a rich collection of full-text publications in the domain of

  10. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.
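    A minimal sketch of exploring this dataset with pandas is shown below; the CSV file names and columns follow the layout the dataset is commonly distributed in (e.g. studentInfo.csv and studentVle.csv), which should be verified against the actual download.

    ```python
    import pandas as pd

    # File names assume the commonly distributed OULAD layout; verify after download.
    students = pd.read_csv("studentInfo.csv")  # demographics and final_result per student
    clicks = pd.read_csv("studentVle.csv")     # daily click summaries per student and VLE item

    # Total VLE clicks per student, joined back to demographics.
    total_clicks = clicks.groupby("id_student")["sum_click"].sum().rename("total_clicks")
    merged = students.merge(total_clicks, on="id_student", how="left")

    # Do students who pass interact more with the VLE, on average?
    print(merged.groupby("final_result")["total_clicks"].mean())
    ```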

  11. Open University Learning Analytics dataset

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-01

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.

  12. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Full Text Available Abstract Everyday refinement and development of technology has led to an increase in competition between tech companies, who go out of their way to crack systems and break them down. This makes data mining a strategically and security-wise important area for many business organizations, including the banking sector. It allows the analysis of important information in the data warehouse and assists banks in looking for obscure patterns in a group and discovering unknown relationships in the data. Banking systems need to process ample amounts of data on a daily basis related to customer information, credit card details, limit and collateral details, transaction details, risk profiles, Anti-Money-Laundering-related information and trade finance data. Thousands of decisions based on the related data are taken in a bank daily. This paper analyzes a banking dataset in the Weka environment for the detection of interesting patterns, based on its applications in customer acquisition, customer retention, management and marketing, and management of risk and fraudulence detection.
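    The record names Weka; as a rough Python analogue of the same pattern-detection idea, the sketch below mines association rules from a toy one-hot banking table using mlxtend (an assumption; the record's actual analysis used Weka's own tools, and the columns and thresholds here are invented).

    ```python
    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    # Toy one-hot transactions table; columns and values are invented for illustration.
    df = pd.DataFrame(
        {
            "has_credit_card": [1, 1, 0, 1, 1, 0],
            "has_mortgage":    [1, 0, 0, 1, 1, 0],
            "high_risk":       [0, 0, 1, 0, 0, 1],
            "uses_online":     [1, 1, 1, 1, 0, 0],
        },
        dtype=bool,
    )

    # Frequent itemsets, then rules of the form {A} -> {B}.
    itemsets = apriori(df, min_support=0.3, use_colnames=True)
    rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
    print(rules[["antecedents", "consequents", "support", "confidence"]])
    ```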

  13. Indications, techniques, outcomes, and limitations for minimally ischemic and off-clamp partial nephrectomy: a systematic review of the literature.

    Science.gov (United States)

    Simone, Giuseppe; Gill, Inderbir S; Mottrie, Alexandre; Kutikov, Alexander; Patard, Jean-Jacques; Alcaraz, Antonio; Rogers, Craig G

    2015-10-01

    On-clamp partial nephrectomy (PN) has been considered the standard approach to minimize intraoperative bleeding and thus achieve adequate control of tumor margins. The potential negative impact of ischemia on renal function (RF) led to the development of techniques to minimize or avoid renal ischemia, such as off-clamp PN and minimally ischemic PN techniques. To review current evidence on the indications and techniques for and outcomes of minimally ischemic and off-clamp PN. A systematic review of English-language publications on PN without a main renal artery clamp from January 2005 to July 2014 was performed using the Medline, Embase, and Web of Science databases. The searches retrieved 52 papers. Off-clamp PN has been more commonly applied to small and peripheral renal tumors, while minimally ischemic PN is best suited for hilar and medially located renal tumors. These approaches are associated with increased intraoperative blood loss and perioperative transfusion rates compared to on-clamp PN. Minimally ischemic and off-clamp PN have potential functional benefits when longer ischemia time is anticipated, particularly for patients with lower baseline RF. Limitations include the lack of prospective randomized trials comparing minimally ischemic and off-clamp to on-clamp techniques, and the small sample size and short follow-up of most published series. The impact of different resection and renorrhaphy techniques on postoperative RF and its assessment via renal scintigraphy requires further investigations. Minimally ischemic and off-clamp PN are established procedures that may be particularly applicable for patients with decreased baseline RF. However, these techniques are technically demanding, with potential for increased blood loss, and require considerable experience with PN surgery. The role of ischemia in patients with a contralateral healthy kidney and consequently an indication for elective minimally ischemic or off-clamp PN remains a debatable issue. In

  14. REGISTRATION WITH ARCHIVED LIDAR DATASETS

    Directory of Open Access Journals (Sweden)

    M. S. L. Y. Magtalas

    2016-10-01

    Full Text Available Georeferencing gathered images is a common step before performing spatial analysis and other processes on datasets acquired using unmanned aerial systems (UAS). Methods of applying spatial information to aerial images or their derivatives include onboard GPS (Global Positioning System) geotagging and the tying of models through GCPs (Ground Control Points) acquired in the field. Currently, UAS derivatives are limited to meter-levels of accuracy when their generation is unaided by points of known position on the ground. The use of ground control points established using survey-grade GPS or GNSS receivers can greatly reduce model errors to centimeter levels. However, this comes with additional costs, not only in instrument acquisition and survey operations, but also in actual time spent in the field. This study uses a workflow for cloud-based post-processing of UAS data in combination with already existing LiDAR data. The georeferencing of the UAV point cloud is executed using the Iterative Closest Point (ICP) algorithm. It is applied through the open-source CloudCompare software (Girardeau-Montaut, 2006) on a ‘skeleton point cloud’. This skeleton point cloud consists of manually extracted features consistent in both the LiDAR and UAV data. For this cloud, roads and buildings with minimal deviations given their differing dates of acquisition are considered consistent. Transformation parameters are computed for the skeleton cloud, which can then be applied to the whole UAS dataset. In addition, a separate cloud consisting of non-vegetation features automatically derived using the CANUPO classification algorithm (Brodu and Lague, 2012) was used to generate a separate set of parameters. A ground survey was done to validate the transformed cloud. An RMSE value of around 16 centimeters was found when comparing validation data to the models georeferenced using the CANUPO cloud and the manual skeleton cloud. Cloud-to-cloud distance
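    The ICP step described here can be sketched in Python with the Open3D library (a stand-in for the CloudCompare workflow the record actually uses); the file names and the correspondence-distance threshold are placeholders.

    ```python
    import numpy as np
    import open3d as o3d

    # Skeleton clouds: matching features extracted from the UAV and LiDAR data.
    source = o3d.io.read_point_cloud("uav_skeleton.ply")    # hypothetical file
    target = o3d.io.read_point_cloud("lidar_skeleton.ply")  # hypothetical file

    # Point-to-point ICP; the 1.0 m correspondence threshold is an assumption.
    result = o3d.pipelines.registration.registration_icp(
        source, target, 1.0, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPoint())

    # Apply the transform estimated on the skeleton to the full UAS cloud.
    full_cloud = o3d.io.read_point_cloud("uav_full.ply")    # hypothetical file
    full_cloud.transform(result.transformation)
    o3d.io.write_point_cloud("uav_full_georeferenced.ply", full_cloud)
    ```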

  15. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  16. Combining Evidence from Homologous Datasets

    National Research Council Canada - National Science Library

    Feng, Ao; Allan, James

    2006-01-01

    .... We argue that combining evidence from these "homologous" datasets can give us better representation of the original data, and our experiments show that a model combining all sources outperforms each...

  17. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  18. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  19. Limited evidence for intranasal fentanyl in the emergency department and the prehospital setting--a systematic review

    DEFF Research Database (Denmark)

    Hansen, Morten Sejer; Dahl, Jørgen Berg

    2013-01-01

    The intranasal (IN) mode of application may be a valuable asset in non-invasive pain management. Fentanyl demonstrates pharmacokinetic and pharmacodynamic properties that are desirable in the management of acute pain, and IN fentanyl may be of value in the prehospital setting. The aim of this systematic review was to evaluate the current evidence for the use of IN fentanyl in the emergency department (ED) and prehospital setting.

  20. Protocol for the systematic review of the prevention, treatment and public health management of impetigo, scabies and fungal skin infections in resource-limited settings.

    Science.gov (United States)

    May, Philippa; Bowen, Asha; Tong, Steven; Steer, Andrew; Prince, Sam; Andrews, Ross; Currie, Bart; Carapetis, Jonathan

    2016-09-23

    Impetigo, scabies, and fungal skin infections disproportionately affect populations in resource-limited settings. Evidence for standard treatment of skin infections predominantly stems from hospital-based studies in high-income countries. The evidence for treatment in resource-limited settings is less clear, as studies in these populations may lack randomisation and control groups for cultural, ethical or economic reasons. Likewise, a synthesis of the evidence for public health control within endemic populations is also lacking. We propose a systematic review of the evidence for the prevention, treatment and public health management of skin infections in resource-limited settings, to inform the development of guidelines for the standardised and streamlined clinical and public health management of skin infections in endemic populations. The protocol has been designed in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols statement. All trial designs and analytical observational study designs will be eligible for inclusion. A systematic search of the peer-reviewed literature will include PubMed, Excerpta Medica and Global Health. Grey literature databases will also be systematically searched, and clinical trials registries scanned for future relevant studies. The primary outcome of interest will be the clinical cure or decrease in prevalence of impetigo, scabies, crusted scabies, tinea capitis, tinea corporis or tinea unguium. Two independent reviewers will perform eligibility assessment and data extraction using standardised electronic forms. Risk of bias assessment will be undertaken by two independent reviewers according to the Cochrane Risk of Bias tool. Data will be tabulated and narratively synthesised. We expect there will be insufficient data to conduct meta-analysis. The final body of evidence will be reported against the Grades of Recommendation, Assessment, Development and Evaluation grading system. The evidence

  1. Error characterisation of global active and passive microwave soil moisture datasets

    Science.gov (United States)

    Dorigo, W. A.; Scipal, K.; Parinussa, R. M.; Liu, Y. Y.; Wagner, W.; de Jeu, R. A. M.; Naeimi, V.

    2010-12-01

    Understanding the error structures of remotely sensed soil moisture observations is essential for correctly interpreting observed variations and trends in the data or assimilating them in hydrological or numerical weather prediction models. Nevertheless, a spatially coherent assessment of the quality of the various globally available datasets is often hampered by the limited availability over space and time of reliable in-situ measurements. As an alternative, this study explores the triple collocation error estimation technique for assessing the relative quality of several globally available soil moisture products from active (ASCAT) and passive (AMSR-E and SSM/I) microwave sensors. The triple collocation is a powerful statistical tool to estimate the root mean square error while simultaneously solving for systematic differences in the climatologies of a set of three linearly related data sources with independent error structures. Prerequisite for this technique is the availability of a sufficiently large number of timely corresponding observations. In addition to the active and passive satellite-based datasets, we used the ERA-Interim and GLDAS-NOAH reanalysis soil moisture datasets as a third, independent reference. The prime objective is to reveal trends in uncertainty related to different observation principles (passive versus active), the use of different frequencies (C-, X-, and Ku-band) for passive microwave observations, and the choice of the independent reference dataset (ERA-Interim versus GLDAS-NOAH). The results suggest that the triple collocation method provides realistic error estimates. Observed spatial trends agree well with the existing theory and studies on the performance of different observation principles and frequencies with respect to land cover and vegetation density. In addition, if all theoretical prerequisites are fulfilled (e.g. a sufficiently large number of common observations is available and errors of the different datasets are
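    For reference, the triple collocation estimator described here is conventionally written as follows (the generic textbook form, not notation quoted from the paper): after the three datasets x, y, z have been rescaled to a common climatology and assuming zero-mean, mutually independent errors, each error variance is estimated from cross-pair covariances of the differences.

    ```latex
    \sigma_{\varepsilon_x}^{2} = \big\langle (x - y)(x - z) \big\rangle,\qquad
    \sigma_{\varepsilon_y}^{2} = \big\langle (y - x)(y - z) \big\rangle,\qquad
    \sigma_{\varepsilon_z}^{2} = \big\langle (z - x)(z - y) \big\rangle
    ```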

  2. Error characterisation of global active and passive microwave soil moisture datasets

    Directory of Open Access Journals (Sweden)

    W. A. Dorigo

    2010-12-01

    Full Text Available Understanding the error structures of remotely sensed soil moisture observations is essential for correctly interpreting observed variations and trends in the data or assimilating them in hydrological or numerical weather prediction models. Nevertheless, a spatially coherent assessment of the quality of the various globally available datasets is often hampered by the limited availability over space and time of reliable in-situ measurements. As an alternative, this study explores the triple collocation error estimation technique for assessing the relative quality of several globally available soil moisture products from active (ASCAT) and passive (AMSR-E and SSM/I) microwave sensors. The triple collocation is a powerful statistical tool to estimate the root mean square error while simultaneously solving for systematic differences in the climatologies of a set of three linearly related data sources with independent error structures. Prerequisite for this technique is the availability of a sufficiently large number of timely corresponding observations. In addition to the active and passive satellite-based datasets, we used the ERA-Interim and GLDAS-NOAH reanalysis soil moisture datasets as a third, independent reference. The prime objective is to reveal trends in uncertainty related to different observation principles (passive versus active), the use of different frequencies (C-, X-, and Ku-band) for passive microwave observations, and the choice of the independent reference dataset (ERA-Interim versus GLDAS-NOAH). The results suggest that the triple collocation method provides realistic error estimates. Observed spatial trends agree well with the existing theory and studies on the performance of different observation principles and frequencies with respect to land cover and vegetation density. In addition, if all theoretical prerequisites are fulfilled (e.g. a sufficiently large number of common observations is available and errors of the different

  3. Provenance Datasets Highlighting Capture Disparities

    Science.gov (United States)

    2014-01-01

    academic use case that has corollaries in offices everywhere. We also describe two distinct possibilities for provenance capture methods within this domain. [Figure 1: Sample provenance graph of the librarians preparing the requested report, from the “Complete” dataset.] The tool, SpectorSoft, was

  4. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continue to evolve as well. There are on the order of 120 site contacts, and 60 proposals involving around 120 researchers have been approved to use the data. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and supporting the collaboration. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new website is based on a scientific data server which enables browsing of the data online, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  5. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  6. The impact of initiatives to limit the advertising of food and beverage products to children: a systematic review.

    Science.gov (United States)

    Galbraith-Emami, S; Lobstein, T

    2013-12-01

    In response to increasing evidence that advertising of foods and beverages affects children's food choices and food intake, several national governments and many of the world's larger food and beverage manufacturers have acted to restrict the marketing of their products to children or to advertise only 'better for you' products or 'healthier dietary choices' to children. Independent assessment of the impact of these pledges has been difficult due to the different criteria being used in regulatory and self-regulatory regimes. In this paper, we undertook a systematic review to examine the data available on levels of exposure of children to the advertising of less healthy foods since the introduction of the statutory and voluntary codes. The results indicate a sharp division in the evidence, with scientific, peer-reviewed papers showing that high levels of such advertising of less healthy foods continue to be found in several different countries worldwide. In contrast, the evidence provided in industry-sponsored reports indicates a remarkably high adherence to voluntary codes. We conclude that adherence to voluntary codes may not sufficiently reduce the advertising of foods which undermine healthy diets, or reduce children's exposure to this advertising. © 2013 The Authors. obesity reviews © 2013 International Association for the Study of Obesity.

  7. Limited Evidence for Robot-assisted Surgery: A Systematic Review and Meta-Analysis of Randomized Controlled Trials.

    Science.gov (United States)

    Broholm, Malene; Onsberg Hansen, Iben; Rosenberg, Jacob

    2016-04-01

    To evaluate available evidence on robot-assisted surgery compared with open and laparoscopic surgery. The databases Medline, Embase, and Cochrane Library were systematically searched for randomized controlled trials comparing robot-assisted surgery with open and laparoscopic surgery regardless of surgical procedure. Meta-analyses were performed on each outcome for which appropriate data were available. The Cochrane Collaboration's tool for assessing risk of bias was used to evaluate risk of bias at the study level. The GRADE approach was used to evaluate the quality of evidence of the meta-analyses. This review included 20 studies comprising 981 patients. The meta-analyses found no significant differences between robot-assisted and laparoscopic surgery regarding blood loss, complication rates, and hospital stay. A significantly longer operative time was found for robot-assisted surgery. Open versus robot-assisted surgery was investigated in 3 studies. A lower blood loss and a longer operative time were found after robot-assisted surgery. No other difference was detected. At this point there is not enough evidence to justify the significantly higher costs associated with implementing robot-assisted surgery.

  8. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330.

  9. Matchmaking, datasets and physics analysis

    CERN Document Server

    Donno, Flavia; Eulisse, Giulio; Mazzucato, Mirco; Steenberg, Conrad; CERN. Geneva. IT Department; 10.1109/ICPPW.2005.48

    2005-01-01

    Grid enabled physics analysis requires a workload management system (WMS) that takes care of finding suitable computing resources to execute data intensive jobs. A typical example is the WMS available in the LCG2 (also referred to as EGEE-0) software system, used by several scientific experiments. Like many other current grid systems, LCG2 provides a file level granularity for accessing and analysing data. However, application scientists such as high energy physicists often require a higher abstraction level for accessing data, i.e. they prefer to use datasets rather than files in their physics analysis. We have improved the current WMS (in particular the Matchmaker) to allow physicists to express their analysis job requirements in terms of datasets. This required modifications to the WMS and its interface to potential data catalogues. As a result, we propose a simple data location interface that is based on a Web service approach and allows for interoperability of the WMS with new dataset and file catalogues...

  10. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. A careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, making it possible to recover some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  11. Methodological limitations of psychosocial interventions in patients with an implantable cardioverter-defibrillator (ICD): A systematic review

    Directory of Open Access Journals (Sweden)

    Ockene Ira S

    2009-12-01

    Full Text Available Abstract Background Despite the potentially life-saving benefits of the implantable cardioverter-defibrillator (ICD), a significant group of patients experiences emotional distress after ICD implantation. Different psychosocial interventions have been employed to improve this condition, but previous reviews have suggested that methodological issues may limit the validity of such interventions. Aim: To review the methodology of previously published studies of psychosocial interventions in ICD patients, according to CONSORT statement guidelines for non-pharmacological interventions, and provide recommendations for future research. Methods We electronically searched the PubMed, PsycInfo and Cochrane databases. To be included, studies needed to be published in a peer-reviewed journal between 1980 and 2008, to involve a human population aged 18+ years and to have an experimental design. Results Twelve studies met the eligibility criteria. Samples were generally small. Interventions were very heterogeneous; most studies used cognitive behavioural therapy (CBT) and exercise programs either as unique interventions or as part of a multi-component program. Overall, studies showed a favourable effect on anxiety (6/9) and depression (4/8). CBT appeared to be the most effective intervention. There was no effect on the number of shocks and arrhythmic events, probably because studies were not powered to detect such an effect. Physical functioning improved in the three studies evaluating this outcome. Lack of information about the indication for ICD implantation (primary vs. secondary prevention), limited or no information regarding use of anti-arrhythmic (9/12) and psychotropic (10/12) treatment, and lack of assessments of providers' treatment fidelity (12/12) and patients' adherence to the intervention (11/12) were the most common methodological limitations. Conclusions Overall, this review supports preliminary evidence of a positive effect of psychosocial interventions

  12. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  13. Healthcare users' experiences of communicating with healthcare professionals about children who have life-limiting conditions: a qualitative systematic review protocol.

    Science.gov (United States)

    Ekberg, Stuart; Bradford, Natalie; Herbert, Anthony; Danby, Susan; Yates, Patsy

    2015-11-01

    Although stakeholders value high-quality communication with and about children who have life-limiting conditions, this does not mean that these stakeholders necessarily share the same perspective of what constitutes high-quality communication and the best way of accomplishing this. Focusing on healthcare users' experiences of communication with healthcare professionals about children who have life-limiting conditions, the present review will explore the subjective impact of professionals' communication on the people for whom they provide care. It may be necessary to consider a range of contextual factors to understand healthcare users' experiences of communicating with healthcare professionals about children who have life-limiting conditions. For instance, age, developmental stage, cognitive capacity, emotional and social strengths, and family dynamics can influence a child's level of involvement in discussions about their condition and care. Although there are factors that appear more consistent across the range of pediatric palliative care users, such as parents' preferences for being treated by healthcare professionals as partners in making decisions about the care of their child, there is not always such consistency. Nor is it clear whether such findings can be generalized across different cultural contexts. In appraising existing research, this systematic review will therefore consider the relationship between the context of individual studies and their reported findings. The primary aim of this review is to identify, appraise and synthesize existing qualitative evidence of healthcare users' experiences of communicating with healthcare professionals about children who have life-limiting conditions. The review will consider relevant details of these findings, particularly whether factors like age are relevant for understanding particular experiences of communication. An outcome of this review will be the identification of best available qualitative evidence that can be used to inform professional practice, as well

  14. Selection criteria limit generalizability of smoking pharmacotherapy studies differentially across clinical trials and laboratory studies: A systematic review on varenicline.

    Science.gov (United States)

    Motschman, Courtney A; Gass, Julie C; Wray, Jennifer M; Germeroth, Lisa J; Schlienz, Nicolas J; Munoz, Diana A; Moore, Faith E; Rhodes, Jessica D; Hawk, Larry W; Tiffany, Stephen T

    2016-12-01

    The selection criteria used in clinical trials for smoking cessation and in laboratory studies that seek to understand mechanisms responsible for treatment outcomes may limit their generalizability to one another and to the general population. We reviewed studies on varenicline versus placebo and compared eligibility criteria and participant characteristics of clinical trials (N=23) and laboratory studies (N=22) across study type and to nationally representative survey data on adult, daily USA smokers (2014 National Health Interview Survey; 2014 National Survey on Drug Use and Health). Relative to laboratory studies, clinical trials more commonly reported excluding smokers who were unmotivated to quit and for specific medical conditions (e.g., cardiovascular disease, COPD), although both study types frequently reported excluding for general medical or psychiatric reasons. Laboratory versus clinical samples smoked less, had lower nicotine dependence, were younger, and more homogeneous with respect to smoking level and nicotine dependence. Application of common eligibility criteria to national survey data resulted in considerable elimination of the daily-smoking population for both clinical trials (≥47%) and laboratory studies (≥39%). Relative to the target population, studies in this review recruited participants who smoked considerably more and had a later smoking onset age, and were under-representative of Caucasians. Results suggest that selection criteria of varenicline studies limit generalizability in meaningful ways, and differences in criteria across study type may undermine efforts at translational research. Recommendations for improvements in participant selection and reporting standards are discussed. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  15. Sleep deprivation in resident physicians, work hour limitations, and related outcomes: a systematic review of the literature.

    Science.gov (United States)

    Mansukhani, Meghna P; Kolla, Bhanu Prakash; Surani, Salim; Varon, Joseph; Ramar, Kannan

    2012-07-01

    Extended work hours, interrupted sleep, and shift work are integral parts of medical training among all specialties. The need for 24-hour patient care coverage and economic factors have resulted in prolonged work hours for resident physicians. This has traditionally been thought to enhance medical educational experience. These long and erratic work hours lead to acute and chronic sleep deprivation and poor sleep quality, resulting in numerous adverse consequences. Impairments may occur in several domains, including attention, cognition, motor skills, and mood. Resident performance, professionalism, safety, and well-being are affected by sleep deprivation, causing potentially adverse implications for patient care. Studies have shown adverse health consequences, motor vehicle accidents, increased alcohol and medication use, and serious medical errors to occur in association with both sleep deprivation and shift work. Resident work hour limitations have been mandated by the Accreditation Council for Graduate Medical Education in response to patient safety concerns. Studies evaluating the impact of these regulations on resident physicians have generated conflicting reports on patient outcomes, demonstrating only a modest increase in sleep duration for resident physicians, along with negative perceptions regarding their education. This literature review summarizes research on the effects of sleep deprivation and shift work, and examines current literature on the impact of recent work hour limitations on resident physicians and patient-related outcomes.

  16. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
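
    For readers who want to work with a gridded monthly product of this kind, a minimal Python sketch using xarray is given below. The file name, variable name ("precip") and coordinate names ("lat", "lon", "time") are assumptions for illustration, not the official GPCP distribution format.

        import numpy as np
        import xarray as xr

        # Hypothetical NetCDF copy of a 2.5 x 2.5 degree monthly product.
        ds = xr.open_dataset("gpcp_v1_combined.nc")  # assumed file name
        precip = ds["precip"]                        # assumed variable name

        # Area-weighted global mean: weight each cell by cos(latitude),
        # since grid cells shrink toward the poles.
        weights = np.cos(np.deg2rad(precip["lat"]))
        global_mean = precip.weighted(weights).mean(dim=("lat", "lon"))
        print(global_mean.sel(time=slice("1987-07", "1995-12")))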

  17. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    Full Text Available A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to (i) cover data from a broad range of application domains, (ii) be flexible and extensible to combine many different complex data structures, (iii) include metadata and semantic definitions, (iv) include inferred information, (v) identify the original data source for integrated entities and (vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.

  18. Limited electricity access in health facilities of sub-Saharan Africa: a systematic review of data on electricity access, sources, and reliability.

    Science.gov (United States)

    Adair-Rohani, Heather; Zukor, Karen; Bonjour, Sophie; Wilburn, Susan; Kuesel, Annette C; Hebert, Ryan; Fletcher, Elaine R

    2013-08-01

    Access to electricity is critical to health care delivery and to the overarching goal of universal health coverage. Data on electricity access in health care facilities are rarely collected and have never been reported systematically in a multi-country study. We conducted a systematic review of available national data on electricity access in health care facilities in sub-Saharan Africa. We identified publicly-available data from nationally representative facility surveys through a systematic review of articles in PubMed, as well as through websites of development agencies, ministries of health, and national statistics bureaus. To be included in our analysis, data sets had to be collected in or after 2000, be nationally representative of a sub-Saharan African country, cover both public and private health facilities, and include a clear definition of electricity access. We identified 13 health facility surveys from 11 sub-Saharan African countries that met our inclusion criteria. On average, 26% of health facilities in the surveyed countries reported no access to electricity. Only 28% of health care facilities, on average, had reliable electricity among the 8 countries reporting data. Among 9 countries, an average of 7% of facilities relied solely on a generator. Electricity access in health care facilities increased by 1.5% annually in Kenya between 2004 and 2010, and by 4% annually in Rwanda between 2001 and 2007. Energy access for health care facilities in sub-Saharan African countries varies considerably. An urgent need exists to improve the geographic coverage, quality, and frequency of data collection on energy access in health care facilities. Standardized tools should be used to collect data on all sources of power and supply reliability. The United Nations Secretary-General's "Sustainable Energy for All" initiative provides an opportunity to comprehensively monitor energy access in health care facilities. Such evidence about electricity needs and gaps would

  19. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  20. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets are of very good quality in general terms; nevertheless, some findings and recommendations to improve the quality of Life-Cycle Inventories have been derived. Moreover, these results assure any LCA practitioner of the quality of the electricity-related datasets, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  1. Systematic Review of Measures of Impairment and Activity Limitation for Persons With Upper Limb Trauma and Amputation.

    Science.gov (United States)

    Resnik, Linda; Borgia, Matt; Silver, Ben; Cancio, Jill

    2017-09-01

    (1) To identify outcome measures used in studies of persons with traumatic upper limb injury and/or amputation; and (2) to evaluate the focus, content, and psychometric properties of each measure. Searches of PubMed and CINAHL for terms including upper extremity, function, activities of daily living, outcome assessment, amputation, and traumatic injuries. Included articles had a sample of ≥10 adults with limb trauma or amputation and were in English. Measures in which most items assessed impairment of body function or activity limitation were eligible. A total of 260 articles containing 55 measures were included. Data on internal consistency; test-retest, interrater, and intrarater reliability; content, structural, construct, concurrent, and predictive validity; responsiveness; and floor/ceiling effects were extracted and confirmed by a second investigator. The most highly rated performance measures included 2 amputation-specific measures (Activities Measure for Upper Limb Amputees and University of New Brunswick Test of Prosthetic Function skill and spontaneity subscales) and 2 non-amputation-specific measures (Box and Block Test and modified Jebsen-Taylor Hand Function Test light and heavy cans tests). The most highly rated self-report measures were the Disabilities of the Arm, Shoulder and Hand; Patient Rated Wrist Evaluation; QuickDASH; Hand Assessment Tool; International Osteoporosis Foundation Quality of Life Questionnaire; and Patient Rated Wrist Evaluation functional recovery subscale. None were amputation specific. Few performance measures were recommended for patients with limb trauma and amputation. All top-rated self-report measures were suitable for use in both groups. These results will inform the choice of outcome measures for these patients. Published by Elsevier Inc.

  2. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via Python API, command-line, and Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  3. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via Python API, command-line, and Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  4. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins, Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    Full text: The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation. The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); annual temperature (commences 1910); daily temperature (commences 1910, with limited station coverage pre-1957); twice-daily dewpoint/relative humidity (commences 1957); monthly pan evaporation (commences 1970); cloud amount (commences 1957) (Jovanovic et al. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  5. Long-term neurodevelopmental outcome in high-risk newborns in resource-limited settings: a systematic review of the literature.

    Science.gov (United States)

    Milner, K M; Neal, E F G; Roberts, G; Steer, A C; Duke, T

    2015-08-01

    Improving outcomes beyond survival for high-risk newborns in resource-limited settings is an emerging challenge. Global estimates demonstrate the scale of this challenge and significant gaps in morbidity outcome data in high mortality contexts. A systematic review was conducted to document the prevalence of neurodevelopmental impairment in high-risk newborns who were followed up into childhood in low- and middle-income countries. High-risk newborns were defined as low, very or extremely low birthweight, preterm infants or those surviving birth asphyxia or serious infections. Electronic databases were searched and articles screened for eligibility. Included articles were appraised according to STROBE criteria. Narrative review was performed and the median prevalence of key neurodevelopmental outcomes was calculated where data quality allowed. A total of 6959 articles were identified, with 60 included in the final review. At follow-up in early childhood, median estimated prevalence (inter-quartile range) of overall neurodevelopmental impairment, cognitive impairment and cerebral palsy were: for survivors of prematurity/very low birthweight 21.4% (11.6-30.8), 16.3% (6.3-29.6) and 11.2% (5.9-16.1), respectively, and for survivors of birth asphyxia 34.6% (25.4-51.5), 11.3% (7.7-11.8) and 22.8% (15.7-31.4), respectively. Only three studies reporting outcomes following newborn serious bacterial infections were identified. There was limited reporting of important outcomes such as vision and hearing impairment. Major challenges with standardised reporting of key exposure and developmental outcome variables and lack of control data were identified. Understanding the limitations of the available data on neurodevelopmental outcome in newborns in resource-limited settings provides clear direction for research and efforts to improve long-term outcome in high-risk newborns in these settings.

  6. VT Hydrography Dataset - High Resolution NHD

    Data.gov (United States)

    Vermont Center for Geographic Information — (Link to Metadata) The Vermont Hydrography Dataset (VHD) is compliant with the local resolution (also known as High Resolution) National Hydrography Dataset (NHD)...

  7. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Department of Resources — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  8. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe the data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle mixed data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm for clustering them. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
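
    The authors' similarity measure is not reproduced here, but the general pattern of running affinity propagation on a precomputed similarity matrix for mixed data can be sketched in Python with scikit-learn. The simple Gower-style similarity below is an illustrative stand-in, not the measure proposed in the paper.

        import numpy as np
        from sklearn.cluster import AffinityPropagation

        # Toy mixed dataset: two numeric columns and one categorical column.
        numeric = np.array([[1.0, 2.0], [1.1, 1.9], [8.0, 9.0], [7.9, 9.2]])
        categorical = np.array(["a", "a", "b", "b"])

        # Illustrative Gower-style similarity: negated range-scaled Euclidean
        # distance on the numeric part plus simple matching on the category.
        col_range = numeric.max(axis=0) - numeric.min(axis=0)
        scaled = numeric / col_range
        n = len(numeric)
        sim = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                num_dist = np.linalg.norm(scaled[i] - scaled[j])
                cat_match = float(categorical[i] == categorical[j])
                sim[i, j] = -num_dist + cat_match

        # AP accepts the precomputed similarity matrix directly.
        ap = AffinityPropagation(affinity="precomputed", random_state=0)
        print(ap.fit_predict(sim))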

  9. Evaluation of measurement properties of self-administered PROMs aimed at patients with non-specific shoulder pain and "activity limitations": a systematic review.

    Science.gov (United States)

    Thoomes-de Graaf, M; Scholten-Peeters, G G M; Schellingerhout, J M; Bourne, A M; Buchbinder, R; Koehorst, M; Terwee, C B; Verhagen, A P

    2016-09-01

    To critically appraise and compare the measurement properties of self-administered patient-reported outcome measures (PROMs) focussing on the shoulder, assessing "activity limitations." Systematic review. The study population had to consist of patients with shoulder pain. We excluded postoperative patients or patients with generic diseases. The methodological quality of the selected studies and the results of the measurement properties were critically appraised and rated using the COSMIN checklist. Out of a total of 3427 unique hits, 31 articles, evaluating 7 different questionnaires, were included. The SPADI is the most frequently evaluated PROM and its measurement properties seem adequate, apart from a lack of information regarding its measurement error and content validity. For English, Norwegian and Turkish users, we recommend the SPADI. Dutch users could use either the SDQ or the SST. In German, we recommend the DASH. In the Tamil, Slovene, Spanish and Danish languages, the evaluated PROMs were not yet of acceptable validity. None of these PROMs showed strong positive evidence for all measurement properties. We propose to develop a new shoulder PROM focused on activity limitations, taking new knowledge and techniques into account.

  10. Limited electricity access in health facilities of sub-Saharan Africa: a systematic review of data on electricity access, sources, and reliability

    Science.gov (United States)

    Adair-Rohani, Heather; Zukor, Karen; Bonjour, Sophie; Wilburn, Susan; Kuesel, Annette C; Hebert, Ryan; Fletcher, Elaine R

    2013-01-01

    ABSTRACT Background: Access to electricity is critical to health care delivery and to the overarching goal of universal health coverage. Data on electricity access in health care facilities are rarely collected and have never been reported systematically in a multi-country study. We conducted a systematic review of available national data on electricity access in health care facilities in sub-Saharan Africa. Methods: We identified publicly-available data from nationally representative facility surveys through a systematic review of articles in PubMed, as well as through websites of development agencies, ministries of health, and national statistics bureaus. To be included in our analysis, data sets had to be collected in or after 2000, be nationally representative of a sub-Saharan African country, cover both public and private health facilities, and include a clear definition of electricity access. Results: We identified 13 health facility surveys from 11 sub-Saharan African countries that met our inclusion criteria. On average, 26% of health facilities in the surveyed countries reported no access to electricity. Only 28% of health care facilities, on average, had reliable electricity among the 8 countries reporting data. Among 9 countries, an average of 7% of facilities relied solely on a generator. Electricity access in health care facilities increased by 1.5% annually in Kenya between 2004 and 2010, and by 4% annually in Rwanda between 2001 and 2007. Conclusions: Energy access for health care facilities in sub-Saharan African countries varies considerably. An urgent need exists to improve the geographic coverage, quality, and frequency of data collection on energy access in health care facilities. Standardized tools should be used to collect data on all sources of power and supply reliability. The United Nations Secretary-General's “Sustainable Energy for All” initiative provides an opportunity to comprehensively monitor energy access in health care

  11. Communication and support from health-care professionals to families, with dependent children, following the diagnosis of parental life-limiting illness: A systematic review.

    Science.gov (United States)

    Fearnley, Rachel; Boland, Jason W

    2017-03-01

    Communication between parents and their children about parental life-limiting illness is stressful. Parents want support from health-care professionals; however, the extent of this support is not known. Awareness of families' needs would help ensure appropriate support. To find the current literature exploring (1) how parents with a life-limiting illness, who have dependent children, perceive health-care professionals' communication with them about the illness, diagnosis and treatments, including how social, practical and emotional support is offered to them, and (2) how this contributes to the parents' feelings of supporting their children. A systematic literature review and narrative synthesis. Embase, MEDLINE, PsycINFO, CINAHL and ASSIA ProQuest were searched in November 2015 for studies assessing communication between health-care professionals and parents about how to talk with their children about the parent's illness. There were 1342 records identified; five qualitative studies met the inclusion criteria (55 ill parents, 11 spouses/carers, 26 children and 16 health-care professionals). Parents wanted information from health-care professionals about how to talk to their children about the illness; this was not routinely offered. Children also want to talk with a health-care professional about their parents' illness. Health-care professionals are concerned that conversations with parents and their children will be too difficult and time-consuming. Parents with a life-limiting illness want support from their health-care professionals about how to communicate with their children about the illness. Their children look to health-care professionals for information about their parent's illness. Health-care professionals have an important role, but appear reluctant to address these concerns because of fears of insufficient time and expertise.

  12. SIS - Annual Catch Limit

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Annual Catch Limit (ACL) dataset within the Species Information System (SIS) contains information and data related to management reference points and catch data.

  13. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  14. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  15. Poor quality of external validity reporting limits generalizability of overweight and/or obesity lifestyle prevention interventions in young adults: a systematic review.

    Science.gov (United States)

    Partridge, S R; Juan, S J-H; McGeechan, K; Bauman, A; Allman-Farinelli, M

    2015-01-01

    Young adulthood is a high-risk life stage for weight gain. Evidence is needed to translate behavioural approaches into community practice to prevent weight gain in young adults. This systematic review assessed the effectiveness and reporting of external validity components in prevention interventions. The search was limited to randomized controlled trial (RCT) lifestyle interventions for the prevention of weight gain in young adults (18-35 years). Mean body weight and/or body mass index (BMI) change were the primary outcomes. External validity, quality assessment and risk of bias tools were applied to all studies. Twenty-one RCTs were identified through 14 major electronic databases. Over half of the studies were effective in the short term for significantly reducing body weight and/or BMI; however, few showed long-term maintenance. All studies lacked full reporting on external validity components. Description of the intervention components and participant attrition rates were reported by most studies. However, few studies reported the representativeness of participants, effectiveness of recruitment methods, process evaluation detail or costs. It is unclear from the information reported how to implement the interventions into community practice. Integrated reporting of intervention effectiveness and enhanced reporting of external validity components are needed for the translation and potential upscale of prevention strategies. © 2014 World Obesity.

  16. Retention of HIV-Infected Children in the First 12 Months of Anti-Retroviral Therapy and Predictors of Attrition in Resource Limited Settings: A Systematic Review.

    Science.gov (United States)

    Abuogi, Lisa L; Smith, Christiana; McFarland, Elizabeth J

    2016-01-01

    Current UNAIDS goals, aimed at ending the AIDS epidemic, set out to ensure that 90% of all people living with HIV know their status, 90% initiate and continue life-long anti-retroviral therapy (ART), and 90% achieve viral load suppression. In 2014 there were an estimated 2.6 million children under 15 years of age living with HIV, of whom only one-third were receiving ART. Little literature exists describing retention of HIV-infected children in the first year on ART. We conducted a systematic search for English-language publications reporting on retention of children with a median age at ART initiation of less than ten years in resource-limited settings. The proportion of children retained in care on ART and predictors of attrition were identified. Twelve studies documented retention at one year ranging from 71-95% amongst 31877 African children. Among the 5558 children not retained, 4082 (73%) were reported as lost to follow-up (LFU) and 1476 (27%) were confirmed to have died. No studies confirmed the outcomes of children LFU. Predictors of attrition included younger age, shorter duration of time on ART, and severe immunosuppression. In conclusion, significant attrition occurs in children in the first 12 months after ART initiation, the majority attributed to LFU, although the true outcomes of children labeled as LFU are unknown. Focused efforts to ensure retention and minimize early mortality are needed as universal ART for children is scaled up.

  17. Training in time-limited dynamic psychotherapy: A systematic comparison of pre- and post-training cases treated by one therapist.

    Science.gov (United States)

    Anderson, Timothy; Strupp, Hans H

    2015-01-01

    This qualitative study systematically compared cases treated by the same therapist in order to understand the group-comparison findings of a larger study on the training of experienced therapists (the "Vanderbilt II" psychotherapy project). The therapist, Dr C., was selected based on his overall treatment successes. His two patients were selected based on their outcomes and the training cohort from which they were drawn: a case with successful outcome from the pre-training cohort and a case of negligible improvement from the post-training cohort. Dr C. demonstrated a variety of interpersonal skills throughout his pre-training case, though there was also poor interpersonal process throughout. However, in the second case he had considerable difficulty in adapting his typical therapeutic approach to the requirements of the time-limited dynamic psychotherapy (TLDP) manual, even while appearing to work hard to find ways to use the manual. Dr C.'s spontaneity and his unique set of interpersonal skills may have enhanced his initial rapport and alliance building with clients, yet may not have interfaced well with TLDP. His unique interpersonal skills also may have contributed to problems of interpersonal process. Future research may benefit from examining the interaction between therapists' interpersonal skills and the implementation of the treatment manual.

  18. Dimensionality Reduction Algorithms on High Dimensional Datasets

    Directory of Open Access Journals (Sweden)

    Iwan Syarif

    2014-12-01

    Full Text Available Classification problems, especially for high dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, the classification problem becomes very complicated, especially when the number of possible different combinations of variables is so high. In this research, we evaluate the performance of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms when applied to high dimensional datasets. Our experiments show that in terms of dimensionality reduction, PSO is much better than GA. PSO successfully reduced the number of attributes of 8 datasets to 13.47% of the original on average, while GA only reached 31.36% on average. In terms of classification performance, GA is slightly better than PSO. GA-reduced datasets have better performance than their original ones on 5 of 8 datasets, while PSO-reduced datasets do so on only 3 of 8. Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
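
    As a rough illustration of GA-style feature selection of the kind evaluated here (the paper's exact operators and parameters are not reproduced), the Python sketch below evolves binary feature masks scored by a cross-validated classifier.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=200, n_features=30, random_state=0)

        def fitness(mask):
            # Score a feature subset by cross-validated accuracy.
            if not mask.any():
                return 0.0
            return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

        # Tiny GA: random initialisation, truncation selection, uniform
        # crossover, and bit-flip mutation.
        pop = rng.random((20, X.shape[1])) < 0.5
        for generation in range(15):
            scores = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(scores)[-10:]]  # keep the best half
            children = []
            for _ in range(len(pop) - len(parents)):
                a, b = parents[rng.integers(len(parents), size=2)]
                child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # crossover
                child ^= rng.random(X.shape[1]) < 0.02  # mutation
                children.append(child)
            pop = np.vstack([parents, children])

        best = pop[np.argmax([fitness(ind) for ind in pop])]
        print("selected features:", np.flatnonzero(best))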

  19. Boosting association rule mining in large datasets via Gibbs sampling

    Science.gov (United States)

    Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua

    2016-01-01

    Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling–induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also a general rule importance measure is proposed to direct the stochastic search so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm. PMID:27091963
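
    The Gibbs-sampling search itself is too involved for a short sketch, but the overall pattern of mining a reduced transaction set with Apriori can be illustrated in Python with mlxtend. Plain uniform transaction sampling stands in here for the paper's stochastic search, and the toy data are invented.

        import pandas as pd
        from mlxtend.frequent_patterns import apriori, association_rules

        # Toy one-hot transaction table: rows are transactions, columns items.
        transactions = pd.DataFrame(
            [[1, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]],
            columns=["bread", "milk", "beer", "eggs"],
        ).astype(bool)

        # Stand-in for the reduction step: mine a random subset of the
        # transactions instead of the full dataset.
        reduced = transactions.sample(frac=0.8, random_state=0)

        # Classic mlxtend API; very recent versions may take extra arguments.
        itemsets = apriori(reduced, min_support=0.4, use_colnames=True)
        rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
        print(rules[["antecedents", "consequents", "support", "confidence"]])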

  20. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set of equilibrium datasets is path-connected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non-collinear.
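
    For readers unfamiliar with the setting, the equilibrium condition in question can be written, in a standard exchange-economy formulation (textbook notation, not necessarily the authors' exact one), as the requirement that individual demands exhaust total resources:

        \sum_{i=1}^{m} f_i(p, w_i) = r

    Here f_i is consumer i's demand function, p the observed price vector, w_i consumer i's income and r the vector of total resources; a finite dataset of such observations is an equilibrium dataset when every observation satisfies this condition for one common profile of demand functions.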

  1. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  2. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  3. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  4. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  5. The influence of systematic pulse-limited physical exercise on the parameters of the cardiovascular system in patients over 65 years of age.

    Science.gov (United States)

    Chomiuk, Tomasz; Folga, Andrzej; Mamcarz, Artur

    2013-04-20

    The influence of physical exercise on the parameters of the cardiovascular system of elderly persons has not yet been sufficiently investigated. The aim of the study was to assess the influence of regular 6-week physical exercise using the Nordic walking (NW) method in a group of elderly persons on their physical performance and the regulation of selected parameters of the cardiovascular system. Fifty patients over 65 years of age participated in the study. The study encompassed: medical interview, physical examination, resting ECG, spiroergometry examination, 6MWT (6-minute walk test) and 24-hour ambulatory blood pressure monitoring (ABPM). During the exercise programme, the pulse was monitored using pulsometers. After the completion of the training, check-up tests assessing the same parameters were performed. The control group consisted of 18 persons over 65 years of age with similar cardiovascular problems. In the test group, duration of physical effort increased by 1.02 min (p = 0.0001), the maximum load increased by 10.68 W (p = 0.0001), VO2max by 2.10 (p = 0.0218), and the distance in the 6MWT by 75.04 m (p = 0.00001), while systolic blood pressure decreased by 5.50 mm Hg (p = 0.035) and diastolic blood pressure by 3.50 mm Hg (p = 0.054) as compared to the control group. Systematic pulse-limited NW exercise had a beneficial effect on the physical performance of elderly persons as assessed with the main parameters. A short 6-week programme of endurance exercises had a hypotensive effect in persons over 65 years of age.

  6. Persistent right ventricular dysfunction, functional capacity limitation, exercise intolerance, and quality of life impairment following pulmonary embolism: Systematic review with meta-analysis.

    Science.gov (United States)

    Sista, Akhilesh K; Miller, Larry E; Kahn, Susan R; Kline, Jeffrey A

    2017-02-01

    Long-term right ventricular (RV) function, functional capacity, exercise capacity, and quality of life following pulmonary embolism (PE), and the impact of thrombolysis, are unclear. A systematic review of studies that evaluated these outcomes with ⩾ 3-month mean follow-up after PE diagnosis was performed. For each outcome, random effects meta-analyses were performed. Twenty-six studies (3671 patients) with 18-month median follow-up were included. The pooled prevalence of RV dysfunction was 18.1%. Patients treated with thrombolysis had a lower, but not statistically significant, risk of RV dysfunction versus those treated with anticoagulation (odds ratio: 0.51, 95% CI: 0.24 to 1.13, p=0.10). Pooled prevalence of at least mild functional impairment (NYHA II-IV) was 33.2%, and at least moderate functional impairment (NYHA III-IV) was 11.3%. Patients treated with thrombolysis had a lower, but not statistically significant, risk of at least moderate functional impairment versus those treated with anticoagulation (odds ratio: 0.48, 95% CI: 0.15 to 1.49, p=0.20). Pooled 6-minute walk distance was 415 m (95% CI: 372 to 458 m), SF-36 Physical Component Score was 44.8 (95% CI: 43 to 46), and Pulmonary Embolism Quality of Life (QoL) Questionnaire total score was 9.1. Main limitations included heterogeneity among studies for many outcomes, variation in the completeness of data reported, and inclusion of data from non-randomized, non-controlled, and retrospective studies. Persistent RV dysfunction, impaired functional status, diminished exercise capacity, and reduced QoL are common in PE survivors. The effect of thrombolysis on RV function and functional status remains unclear.
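
    The pooled estimates above come from random-effects meta-analysis. As a rough illustration of how such pooling works, the Python sketch below implements DerSimonian-Laird random-effects pooling of log odds ratios; the input numbers are invented for illustration and are not the studies in this review.

        import numpy as np

        # Invented per-study log odds ratios and their variances.
        yi = np.array([-0.70, -0.20, -0.90, 0.10])
        vi = np.array([0.12, 0.08, 0.20, 0.15])

        # Fixed-effect weights and Cochran's Q.
        w = 1.0 / vi
        fixed = np.sum(w * yi) / np.sum(w)
        q = np.sum(w * (yi - fixed) ** 2)

        # DerSimonian-Laird between-study variance estimate.
        df = len(yi) - 1
        c = np.sum(w) - np.sum(w**2) / np.sum(w)
        tau2 = max(0.0, (q - df) / c)

        # Random-effects pooled estimate and 95% CI on the odds-ratio scale.
        w_re = 1.0 / (vi + tau2)
        mu = np.sum(w_re * yi) / np.sum(w_re)
        se = np.sqrt(1.0 / np.sum(w_re))
        print("pooled OR: %.2f (95%% CI %.2f to %.2f)"
              % (np.exp(mu), np.exp(mu - 1.96 * se), np.exp(mu + 1.96 * se)))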

  7. A Systematic Review of Non-Traumatic Spinal Cord Injuries in Sub-Saharan Africa and a Proposed Diagnostic Algorithm for Resource-Limited Settings

    Directory of Open Access Journals (Sweden)

    Abdu Kisekka Musubire

    2017-12-01

    Full Text Available Background: Non-traumatic myelopathy is common in Africa and there are geographic differences in etiology. Clinical management is challenging due to the broad differential diagnosis and the lack of diagnostics. The objective of this systematic review is to determine the most common etiologies of non-traumatic myelopathy in sub-Saharan Africa to inform a regionally appropriate diagnostic algorithm. Methods: We conducted a systematic review searching the Medline and Embase databases using the following search terms: "non-traumatic spinal cord injury" or "myelopathy" with limitations to epidemiology or etiologies and sub-Saharan Africa. We described the frequencies of the different etiologies and propose a diagnostic algorithm based on the most common diagnoses. Results: We identified 19 studies, all performed at tertiary institutions; 15 were retrospective and 13 were published in the era of the HIV epidemic. Compressive bone lesions accounted for more than 48% of the cases; a majority were Pott’s disease and metastatic disease. No diagnosis was identified in up to 30% of cases in most studies; in particular, definitive diagnoses of non-compressive lesions were rare and a majority were clinical diagnoses of transverse myelitis and HIV myelopathy. Age and HIV were major determinants of etiology. Conclusion: Compressive myelopathies represent a majority of non-traumatic myelopathies in sub-Saharan Africa, and most were due to Pott’s disease. Non-compressive myelopathies have not been well defined and need further research in Africa. We recommend a standardized approach to the management of non-traumatic myelopathy focused on identifying treatable conditions with tests widely available in low-resource settings.

  8. Metadata-catalogue of European spatial datasets

    NARCIS (Netherlands)

    Willemen, J.P.M.; Kooistra, L.

    2004-01-01

    In order to facilitate more effective access to European spatial datasets, an assessment was carried out by the GeoDesk of the WUR to identify and describe key datasets that will be relevant for research carried out within WUR and MNP. The outline of the Metadata catalogue European spatial

  9. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    As more and more advertisements crowd onto radio, it is necessary to establish an audio advertising dataset that can be used to analyze and classify advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav file format and annotated with a txt file containing its file name, sampling frequency, channel number, broadcasting time and class. The soundness of the advertisement classes in this dataset is demonstrated by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.
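
    A minimal sketch of the PCA-based clustering check described above, assuming per-clip feature vectors have already been extracted from the audio; the feature matrix here is synthetic.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cluster import KMeans

        # Synthetic stand-in for per-clip audio features (rows = advertisement clips)
        rng = np.random.default_rng(0)
        features = rng.normal(size=(200, 40))

        # Project onto the leading principal components, then cluster into four classes
        pcs = PCA(n_components=2).fit_transform(features)
        labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pcs)
        print(np.bincount(labels))  # cluster sizes across the four ad categories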

  10. A novel dataset for real-life evaluation of facial expression recognition methodologies

    NARCIS (Netherlands)

    Siddiqi, Muhammad Hameed; Ali, Maqbool; Idris, Muhammad; Banos Legran, Oresti; Lee, Sungyoung; Choo, Hyunseung

    2016-01-01

    One limitation seen among most of the previous methods is that they were evaluated under settings that are far from real-life scenarios. The reason is that the existing facial expression recognition (FER) datasets are mostly pose-based and assume a predefined setup. The expressions in these datasets

  11. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    OpenAIRE

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    International audience; A common approach for the analysis of anatomical variability relies on the estimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  12. Low Tidal Volume versus Non-Volume-Limited Strategies for Patients with Acute Respiratory Distress Syndrome. A Systematic Review and Meta-Analysis.

    Science.gov (United States)

    Walkey, Allan J; Goligher, Ewan C; Del Sorbo, Lorenzo; Hodgson, Carol L; Adhikari, Neill K J; Wunsch, Hannah; Meade, Maureen O; Uleryk, Elizabeth; Hess, Dean; Talmor, Daniel S; Thompson, B Taylor; Brower, Roy G; Fan, Eddy

    2017-10-01

    Trials investigating use of lower tidal volumes and inspiratory pressures for patients with acute respiratory distress syndrome (ARDS) have shown mixed results. To compare clinical outcomes of mechanical ventilation strategies that limit tidal volumes and inspiratory pressures (LTV) to strategies with tidal volumes of 10 to 15 ml/kg among patients with ARDS. This is a systematic review and meta-analysis of clinical trials investigating LTV mechanical ventilation strategies. We used random effects models to evaluate the effect of LTV on 28-day mortality, organ failure, ventilator-free days, barotrauma, oxygenation, and ventilation. Our primary analysis excluded trials for which the LTV strategy was combined with the additional strategy of higher positive end-expiratory pressure (PEEP), but these trials were included in a stratified sensitivity analysis. We performed metaregression of tidal volume gradient achieved between intervention and control groups on mortality effect estimates. We used Grading of Recommendations Assessment, Development, and Evaluation methodology to determine the quality of evidence. Seven randomized trials involving 1,481 patients met eligibility criteria for this review. Mortality was not significantly lower for patients receiving an LTV strategy (33.6%) as compared with control strategies (40.4%) (relative risk [RR], 0.87; 95% confidence interval [CI], 0.70-1.08; heterogeneity statistic I² = 46%), nor did an LTV strategy significantly decrease barotrauma or ventilator-free days when compared with a lower PEEP strategy. Quality of evidence for clinical outcomes was downgraded for imprecision. Metaregression showed a significant inverse association between larger tidal volume gradient between LTV and control groups and log odds ratios for mortality (β, -0.1587; P = 0.0022). Sensitivity analysis including trials that protocolized an LTV/high PEEP cointervention showed lower mortality associated with LTV (nine trials and 1
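
    The metaregression reported above regresses study-level log odds ratios for mortality on the achieved tidal-volume gradient, weighting studies by inverse variance. A minimal sketch under those assumptions, with invented study values:

        import numpy as np
        import statsmodels.api as sm

        # Invented study-level inputs: tidal-volume gradient (ml/kg) between arms,
        # log odds ratio for mortality, and its variance
        gradient = np.array([2.0, 4.5, 5.0, 6.1, 3.2])
        log_or = np.array([-0.05, -0.40, -0.55, -0.70, -0.20])
        var_log_or = np.array([0.04, 0.09, 0.06, 0.10, 0.05])

        X = sm.add_constant(gradient)
        fit = sm.WLS(log_or, X, weights=1 / var_log_or).fit()
        print(fit.params[1])  # slope: change in log OR per ml/kg of gradient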

  13. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains radiation, calculated with a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, as well as evapotranspiration (potential, bare-soil and open-water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as
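
    For reference, a sketch of the FAO-56 form of the Penman-Monteith reference evapotranspiration equation mentioned above; the inputs are illustrative, and site-specific corrections (e.g., an elevation-dependent psychrometric constant) are omitted for brevity.

        import math

        def penman_monteith_et0(rn, g, t_mean, u2, es, ea, gamma=0.066):
            """FAO-56 reference evapotranspiration ET0 in mm/day.

            rn: net radiation (MJ m-2 day-1); g: soil heat flux (MJ m-2 day-1);
            t_mean: mean air temperature (deg C); u2: wind speed at 2 m (m/s);
            es, ea: saturation and actual vapour pressure (kPa); gamma:
            psychrometric constant (kPa/deg C), here a typical sea-level value.
            """
            # Slope of the saturation vapour pressure curve (kPa per deg C)
            delta = (4098 * (0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3)))
                     / (t_mean + 237.3) ** 2)
            num = 0.408 * delta * (rn - g) + gamma * (900 / (t_mean + 273)) * u2 * (es - ea)
            return num / (delta + gamma * (1 + 0.34 * u2))

        print(round(penman_monteith_et0(rn=13.3, g=0.0, t_mean=16.9, u2=2.1, es=1.99, ea=1.41), 2))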

  14. BASE MAP DATASET, WOODWARD COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  15. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  16. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  17. Environmental Dataset Gateway (EDG) Search Widget

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  18. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  19. BASE MAP DATASET, LANCASTER COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  20. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  1. BASE MAP DATASET, LOGAN COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  2. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  3. BASE MAP DATASET, INYO COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  4. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  5. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  6. Dataset Curation through Renders and Ontology Matching

    Science.gov (United States)

    2015-09-01

    Dataset Curation through Renders and Ontology Matching. Yair Movshovitz-Attias, CMU-CS-15-119, September 2015, School of Computer Science, Carnegie Mellon University. ...mapped to an ontology of geographical entities, we are able to extract multiple relevant labels per image. For the viewpoint estimation problem, by...

  7. Managing large SNP datasets with SNPpy.

    Science.gov (United States)

    Mitha, Faheem

    2013-01-01

    Using relational databases to manage SNP datasets is a very useful technique with significant advantages over alternative methods, including built-in data validation and the powerful SQL query language for exporting data. SNPpy is a Python program which uses the PostgreSQL database and the SQLAlchemy Python library to automate SNP data management. This chapter shows how to use SNPpy to store and manage large datasets.
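
    SNPpy's actual schema is not reproduced here; the sketch below only illustrates the general pattern of managing SNP records through SQLAlchemy, with a hypothetical table and an in-memory SQLite engine standing in for PostgreSQL.

        from sqlalchemy import Column, Integer, String, create_engine
        from sqlalchemy.orm import declarative_base, Session

        Base = declarative_base()

        class Snp(Base):
            # Hypothetical table definition, not SNPpy's schema
            __tablename__ = "snp"
            id = Column(Integer, primary_key=True)
            rsid = Column(String, index=True)
            chromosome = Column(String)
            position = Column(Integer)
            genotype = Column(String(2))

        engine = create_engine("sqlite:///:memory:")  # SNPpy proper uses PostgreSQL
        Base.metadata.create_all(engine)
        with Session(engine) as session:
            session.add(Snp(rsid="rs123", chromosome="1", position=12345, genotype="AG"))
            session.commit()
            print(session.query(Snp).filter_by(chromosome="1").count())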

  8. Dimensionality Reduction Algorithms on High Dimensional Datasets

    OpenAIRE

    Iwan Syarif

    2014-01-01

    Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, the classification problem becomes very complicated when the number of possible combinations of variables is high. In this research, we evaluate the performance of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms when applied to high-dimensional datasets. Our experime...
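
    A minimal sketch of genetic-algorithm feature selection of the kind evaluated here, assuming cross-validated accuracy as the fitness function; population size, generation count and mutation rate are arbitrary choices, not the paper's settings.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=200, n_features=50, n_informative=8, random_state=0)
        rng = np.random.default_rng(0)

        def fitness(mask):
            # Cross-validated accuracy of a classifier restricted to the masked features
            if not mask.any():
                return 0.0
            return cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y, cv=3).mean()

        pop = rng.random((20, X.shape[1])) < 0.3           # random initial feature masks
        for _ in range(15):
            scores = np.array([fitness(ind) for ind in pop])
            parents = pop[np.argsort(scores)[-10:]]        # selection: keep the best half
            cuts = rng.integers(1, X.shape[1], size=10)    # one-point crossover
            children = np.array([np.concatenate([parents[i][:c], parents[(i + 1) % 10][c:]])
                                 for i, c in enumerate(cuts)])
            children ^= rng.random(children.shape) < 0.02  # mutation: rare bit flips
            pop = np.vstack([parents, children])

        best = pop[np.argmax([fitness(ind) for ind in pop])]
        print(int(best.sum()), "features selected")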

  9. Limited evidence for the effect of sodium fluoride on deterioration of hearing loss in patients with otosclerosis: a systematic review of the literature

    NARCIS (Netherlands)

    Hentschel, M.A.; Huizinga, P.; van der Velden, D.L.; Wegner, I.; Bittermann, A.J.N.; van der Heijden, G.J.M.; Grolman, W.

    2014-01-01

    OBJECTIVE: To determine the protective effect of sodium fluoride on the deterioration of hearing loss in adult patients with otosclerosis. DATA SOURCES: PubMed, Embase, the Cochrane Library, and CINAHL. STUDY SELECTION: A systematic literature search was conducted. Studies reporting original study

  10. Correction of elevation offsets in multiple co-located lidar datasets

    Science.gov (United States)

    Thompson, David M.; Dalyander, P. Soupy; Long, Joseph W.; Plant, Nathaniel G.

    2017-04-07

    Introduction: Topographic elevation data collected with airborne light detection and ranging (lidar) can be used to analyze short- and long-term changes to beach and dune systems. Analysis of multiple lidar datasets at Dauphin Island, Alabama, revealed systematic, island-wide elevation differences on the order of tens of centimeters that were not attributable to real-world change and, therefore, were likely to represent systematic sampling offsets. These offsets vary between the datasets, but appear spatially consistent within a given survey. This report describes a method that was developed to identify and correct offsets between lidar datasets collected over the same site at different times so that true elevation changes over time, associated with sediment accumulation or erosion, can be analyzed.
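
    The report's procedure is more involved; the sketch below shows only the core idea under simple assumptions: estimate a per-survey vertical offset as the median elevation difference over terrain presumed stable, then remove it. All grids here are synthetic.

        import numpy as np

        # Hypothetical co-registered elevation grids from two lidar surveys (meters)
        rng = np.random.default_rng(1)
        truth = rng.normal(2.0, 0.5, size=(100, 100))
        survey_a = truth + rng.normal(0, 0.03, size=truth.shape)
        survey_b = truth + 0.18 + rng.normal(0, 0.03, size=truth.shape)  # 18 cm offset

        stable = np.ones(truth.shape, dtype=bool)  # mask of presumed-unchanged terrain
        offset = np.nanmedian((survey_b - survey_a)[stable])
        survey_b_corrected = survey_b - offset
        print(round(float(offset), 3), "m removed")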

  11. Improving search efficiency for systematic reviews of diagnostic test accuracy: an exploratory study to assess the viability of limiting to MEDLINE, EMBASE and reference checking.

    Science.gov (United States)

    Preston, Louise; Carroll, Christopher; Gardois, Paolo; Paisley, Suzy; Kaltenthaler, Eva

    2015-06-26

    Increasing numbers of systematic reviews evaluating the diagnostic test accuracy of technologies are being published. Currently, review teams tend to apply conventional systematic review standards to identify relevant studies for inclusion, for example sensitive searches of multiple bibliographic databases. There has been little evaluation of the efficiency of searching only one or two such databases for this type of review. The aim of this study was to assess the viability of an approach that restricted searches to MEDLINE, EMBASE and the reference lists of included studies. A convenience sample of nine Health Technology Assessment (HTA) systematic reviews of diagnostic test accuracy, with 302 included citations, was analysed to determine the number and proportion of included citations that were indexed in and retrieved from MEDLINE and EMBASE. An assessment was also made of the number and proportion of citations not retrieved from these databases but that could have been identified from the reference lists of included citations. 287/302 (95 %) of the included citations in the nine reviews were indexed across MEDLINE and EMBASE. The reviews' searches of MEDLINE and EMBASE accounted for 85 % of the included citations (256/302). Of the forty-six (15 %) included citations not retrieved by the published searches, 24 (8 %) could be found in the reference lists of included citations. Only 22/302 (7 %) of the included citations were not found by the proposed, more efficient approach. The proposed approach would have accounted for 280/302 (93 %) of included citations in this sample of nine systematic reviews. This exploratory study suggests that there might be a case for restricting searches for systematic reviews of diagnostic test accuracy studies to MEDLINE, EMBASE and the reference lists of included citations. The conduct of such reviews might be rendered more efficient by using this approach.

  12. Accounting for inertia in modal choices: some new evidence using a RP/SP dataset

    DEFF Research Database (Denmark)

    Cherchi, Elisabetta; Manca, Francesco

    2011-01-01

    Inertia has been studied more extensively with panel datasets, but few investigations have used RP/SP datasets. In this paper we extend previous work in several ways. We test and compare several ways of measuring inertia, including measures that have been proposed for both short and long RP panel datasets. We also explore new measures of inertia to test for the effect of “learning” (in the sense of acquiring experience or getting more familiar with the task) along the SP experiment, and we disentangle this effect from the pure inertia effect; results suggest that this effect is stable along the SP experiments. A mixed logit model is used that allows us to account for both systematic and random taste variations in the inertia effect and for correlations among RP and SP observations. Finally we explore the relation between the utility specification (especially in the SP dataset) and the role of inertia in explaining current choices.

  13. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    Science.gov (United States)

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results that merely contain the query terms. They may also return highly specific results that depend more on how well the metadata was authored, and they do not account well for variance in metadata due to differing author styles and preferences. The top-ranked results may also come from the same data collection, so users are unlikely to discover new and interesting datasets. These search modes mainly suit users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect the need for a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting content based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce
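
    The portal's production recommender is not reproduced here; the sketch below illustrates one common hybrid ingredient, content-based similarity over metadata text, with hypothetical records.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        # Hypothetical dataset metadata (title and abstract concatenated)
        records = [
            "Gridded daily precipitation over Europe 1990-2011",
            "Soil moisture reanalysis from a simple water balance model",
            "Audio advertisement corpus with class annotations",
            "Pan-European meteorological forcing for hydrologic models",
        ]

        tfidf = TfidfVectorizer(stop_words="english").fit_transform(records)
        sim = cosine_similarity(tfidf)

        # Recommend the record most similar to the first one (excluding itself)
        best = sim[0].argsort()[-2]
        print(records[best])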

  14. Unleashing the Charges: An Improved Reduction of Key Exoplanet Datasets and a Tool for Ramp Effect Correction

    Science.gov (United States)

    Zhou, Yifan

    2017-08-01

    Among HST's most lasting and iconic results are WFC3/IR observations of transiting exoplanets that provided exciting insights into the atmospheres of planets ranging from super-earths to hot Jupiters. However, all time-resolved WFC3/IR observations suffer from an often-limiting detector systematic: the ramp effect. Current efforts are forced to discard the most affected orbits and to apply an empirical procedure to minimize the amplitude. We developed and demonstrated a powerful new, solid-state-physics-motivated detector model that accurately corrects for the ramp effect and reaches essentially photon-noise-limited performance for even the most affected orbits. We propose here to apply our RECTE ramp charge trap correction to key archival datasets for which significant improvements are expected. We will also use these datasets to further test and document the RECTE correction on data acquired in different observing modes and to seek further improvements in RECTE's detector parameters. We will document and release RECTE, along with a data reduction cookbook, to the community. We also expect important improvements in the science results from the four key HST datasets. Our charge trap correction will help increase HST's efficiency for infrared transit spectroscopy by about 20-25% (no more need to discard first orbits), saving dozens of orbits in the future, and will also improve the reliability and reproducibility of infrared time-domain observations. Our work is especially important for the most challenging transit and phase curve observations and will likely provide an example for an approach that can be utilized for JWST instruments with architectures similar to WFC3.

  15. Interoperability of Multiple Datasets with JMARS

    Science.gov (United States)

    Smith, M. E.; Christensen, P. R.; Noss, D.; Anwar, S.; Dickenshied, S.

    2012-12-01

    Planetary Science includes all celestial bodies, including Earth. However, in Geographic Information System (GIS) applications, Earth and the other planetary bodies tend to be treated separately. One reason is that we have been measuring Earth's properties far longer than we have been studying the other planetary bodies, so the archive of geographic coordinate systems (GCS) and projections for Earth is much larger. The first latitude and longitude system for Earth was devised by Eratosthenes (c. 276-194 BC), who was also the first to calculate the circumference of the Earth. As time went on, scientists continued to re-measure the Earth on both local and global scales, which has created a large collection of projections and geographic coordinate systems to choose from. The variety of options can make it a time-consuming task to determine which GCS or projection applies to each dataset and how to convert to the correct GCS or projection. Another issue arises when determining whether a dataset should be referenced to a geocentric sphere or a geodetic spheroid, which measure and determine latitude values differently. This can lead to inconsistent results and frustration for the user. This is not the case with other planetary bodies. Although the existence of other planets has been known since early Babylonian times, the accuracy of the planets' rotation, size and geologic properties was not established until hundreds of years later. Therefore, the options for projections or GCSs are far fewer than the options one has for Earth's data, and even then, the projection and GCS options for other celestial bodies are informal. It can thus be hard for the user to determine which projection or GCS to apply to the other planets. JMARS (Java Mission Analysis for Remote Sensing) is an open source suite that was developed by Arizona State University's Mars Space Flight Facility. The beauty of JMARS is that the tool transforms all datasets behind the scenes

  16. Introduction of a simple-model-based land surface dataset for Europe

    Science.gov (United States)

    Orth, Rene; Seneviratne, Sonia I.

    2015-04-01

    Land surface hydrology can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for Europe that consists of soil moisture, runoff and evapotranspiration (ET). It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset extends over the period 1984-2013 with a daily time step and 0.5° × 0.5° resolution. We employ a novel calibration approach, in which we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), ET and streamflow, we identify the best performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against several state-of-the-art datasets (ERA-Interim/Land, MERRA-Land, GLDAS-2-Noah, simulations of the Community Land Model Version 4), using all validation datasets as reference. For soil moisture dynamics it outperforms the benchmarks. Therefore the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little validated model results, or proxy data such as precipitation indices. Also in terms of runoff the SWBM dataset performs well, whereas the evaluation of the SWBM ET dataset is overall satisfactory, but the dynamics are less well captured for this variable. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence some processes impacting ET dynamics may not be captured, and quality issues may occur in regions with complex terrain. Even though the SWBM is well calibrated, it cannot replace more sophisticated models; but as their calibration is a complex task the present dataset may serve as a benchmark in future. In addition we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar
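
    The SWBM's exact formulation is given in the paper; the sketch below shows only a generic single-bucket daily water-balance step of the kind such models use, with made-up parameter values.

        def daily_step(soil_water, precip, pet, capacity=400.0, gamma=2.0):
            """One day of a toy bucket water balance (all quantities in mm).

            Runoff scales with relative wetness to the power gamma and
            evapotranspiration scales linearly with wetness; capacity and
            gamma are made-up parameters, not the SWBM's calibrated values.
            """
            wetness = soil_water / capacity
            runoff = precip * wetness ** gamma
            et = pet * wetness
            soil_water = min(capacity, max(0.0, soil_water + precip - runoff - et))
            return soil_water, runoff, et

        state = 200.0
        for precip, pet in [(5.0, 2.0), (0.0, 3.0), (12.0, 1.5)]:
            state, runoff, et = daily_step(state, precip, pet)
            print(round(state, 1), round(runoff, 2), round(et, 2))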

  17. Two ultraviolet radiation datasets that cover China

    Science.gov (United States)

    Liu, Hui; Hu, Bo; Wang, Yuesi; Liu, Guangren; Tang, Liqin; Ji, Dongsheng; Bai, Yongfei; Bao, Weikai; Chen, Xin; Chen, Yunming; Ding, Weixin; Han, Xiaozeng; He, Fei; Huang, Hui; Huang, Zhenying; Li, Xinrong; Li, Yan; Liu, Wenzhao; Lin, Luxiang; Ouyang, Zhu; Qin, Boqiang; Shen, Weijun; Shen, Yanjun; Su, Hongxin; Song, Changchun; Sun, Bo; Sun, Song; Wang, Anzhi; Wang, Genxu; Wang, Huimin; Wang, Silong; Wang, Youshao; Wei, Wenxue; Xie, Ping; Xie, Zongqiang; Yan, Xiaoyuan; Zeng, Fanjiang; Zhang, Fawei; Zhang, Yangjian; Zhang, Yiping; Zhao, Chengyi; Zhao, Wenzhi; Zhao, Xueyong; Zhou, Guoyi; Zhu, Bo

    2017-07-01

    Ultraviolet (UV) radiation has significant effects on ecosystems, environments, and human health, as well as atmospheric processes and climate change. Two ultraviolet radiation datasets are described in this paper. One contains hourly observations of UV radiation measured at 40 Chinese Ecosystem Research Network stations from 2005 to 2015. CUV3 broadband radiometers were used to observe the UV radiation, with an accuracy of 5%, which meets the World Meteorology Organization's measurement standards. The extremum method was used to control the quality of the measured datasets. The other dataset contains daily cumulative UV radiation estimates that were calculated using an all-sky estimation model combined with a hybrid model. The reconstructed daily UV radiation data span from 1961 to 2014. The mean absolute bias error and root-mean-square error are smaller than 30% at most stations, and most of the mean bias error values are negative, which indicates underestimation of the UV radiation intensity. These datasets can improve our basic knowledge of the spatial and temporal variations in UV radiation. Additionally, these datasets can be used in studies of potential ozone formation and atmospheric oxidation, as well as simulations of ecological processes.

  18. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this Chapter, we discuss some of the sampling techniques used for balancing the datasets, and the performance measures more appropriate for mining imbalanced datasets.
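
    As a concrete instance of the sampling techniques this chapter discusses, a minimal sketch of naive random oversampling of the minority class (SMOTE and cost-sensitive weighting are common alternatives); the data here are synthetic.

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 5))
        y = (rng.random(1000) < 0.05).astype(int)  # ~5% positives: imbalanced

        # Resample minority rows (with replacement) up to the majority count
        minority = np.flatnonzero(y == 1)
        majority = np.flatnonzero(y == 0)
        boost = rng.choice(minority, size=len(majority) - len(minority), replace=True)
        X_bal = np.vstack([X, X[boost]])
        y_bal = np.concatenate([y, y[boost]])
        print(np.bincount(y), "->", np.bincount(y_bal))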

  19. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for the higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was assembled to examine the complexities of unidentified DNA disclosed in patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from a restriction digestion study is reported, which is helpful for performing studies using short DNA sequences. The dataset disclosed here provides new data for the exploration, evaluation, identification, comparison and analysis of unique DNA sequences.
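
    A small sketch of the AT/GC content computation referenced above; generating the QR codes themselves could be done with, e.g., the third-party qrcode package (not shown). The sequence is a placeholder, not one of the 17 patent sequences.

        def gc_content(seq: str) -> float:
            """Fraction of G and C bases in a DNA sequence."""
            seq = seq.upper()
            return (seq.count("G") + seq.count("C")) / len(seq)

        seq = "ATGCGCGATTTACGCG"  # placeholder sequence
        print(round(gc_content(seq), 3), round(1 - gc_content(seq), 3))  # GC and AT fractions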

  20. 77 FR 15052 - Dataset Workshop-U.S. Billion Dollar Disasters Dataset (1980-2011): Assessing Dataset Strengths...

    Science.gov (United States)

    2012-03-14

    .... Pathways to overcome accuracy and bias issues will be an important focus. Participants will consider: Historical development and current state of the U.S. Billion Dollar Disasters Report; What additional data... dataset; Examination of unique uncertainties related to the cost of each of the major types of weather and...

  1. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new datasets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data is being released and utilized by the public. Datasets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make all data collected with ECS funding available to the public as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have pledged to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  2. Designing the colorectal cancer core dataset in Iran

    Directory of Open Access Journals (Sweden)

    Sara Dorri

    2017-01-01

    Full Text Available Background: The importance of collecting, recording and analyzing disease information in any health organization needs no explanation. In this regard, the systematic design of standard data sets can help record uniform and consistent information and can create interoperability between health care systems. The main purpose of this study was to design a core dataset to record colorectal cancer information in Iran. Methods: For the design of the colorectal cancer core data set, a combination of literature review and expert consensus was used. In the first phase, a draft of the data set was designed based on a colorectal cancer literature review and comparative studies. In the second phase, this data set was evaluated by experts from different disciplines, such as medical informatics, oncology and surgery, and their comments and opinions were collected. In the third phase, the refined data set was evaluated again by experts and the final data set was proposed. Results: In the first phase, based on the literature review, a draft set of 85 data elements was designed. In the second phase this data set was evaluated by experts and supplementary information was offered by professionals in subgroups, especially in the treatment part; in this phase the total number of elements reached 93. In the third phase, evaluation was conducted by experts and the data set was finalized in five main parts: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study a comprehensive core data set for colorectal cancer was designed. This data set can facilitate the exchange of health information and so be useful for collecting colorectal cancer information. Designing such data sets for similar diseases can help providers collect standard data from patients and can accelerate retrieval from storage systems.

  3. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  4. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    We present an approach for analyzing market shares and product price elasticities based on large datasets containing aggregate sales data for many products, several markets and relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  5. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    by the interpolation method used should be considered. The results indicate that conventional interpolation methods can be successfully applied to DWI datasets for mining anatomical details that are normally seen only at higher resolutions, which will aid in tractography and microstructural mapping of tissue...

  6. BMDExpress Data Viewer - a visualization tool to analyze BMDExpress datasets.

    Science.gov (United States)

    Kuo, Byron; Francina Webster, A; Thomas, Russell S; Yauk, Carole L

    2016-08-01

    Regulatory agencies increasingly apply benchmark dose (BMD) modeling to determine points of departure for risk assessment. BMDExpress applies BMD modeling to transcriptomic datasets to identify transcriptional BMDs. However, graphing and analytical capabilities within BMDExpress are limited, and the analysis of output files is challenging. We developed a web-based application, BMDExpress Data Viewer (http://apps.sciome.com:8082/BMDX_Viewer/), for visualizing and graphing BMDExpress output files. The application consists of "Summary Visualization" and "Dataset Exploratory" tools. Through analysis of transcriptomic datasets of the toxicants furan and 4,4'-methylenebis(N,N-dimethyl)benzenamine, we demonstrate that the "Summary Visualization Tools" can be used to examine distributions of gene and pathway BMD values, and to derive a potential point of departure value based on summary statistics. By applying filters on enrichment P-values and minimum number of significant genes, the "Functional Enrichment Analysis" tool enables the user to select biological processes or pathways that are selectively perturbed by chemical exposure and identify the related BMD. The "Multiple Dataset Comparison" tool enables comparison of gene and pathway BMD values across multiple experiments (e.g., across timepoints or tissues). The "BMDL-BMD Range Plotter" tool facilitates the observation of BMD trends across biological processes or pathways. Through our case studies, we demonstrate that BMDExpress Data Viewer is a useful tool to visualize, explore and analyze BMDExpress output files. Visualizing the data in this manner enables rapid assessment of data quality, model fit, doses of peak activity, most sensitive pathway perturbations and other metrics that will be useful in applying toxicogenomics in risk assessment. © 2015 Her Majesty the Queen in Right of Canada. Journal of Applied Toxicology published by John Wiley & Sons, Ltd.

  7. Integrated dataset of screening hits against multiple neglected disease pathogens.

    Directory of Open Access Journals (Sweden)

    Solomon Nwaka

    2011-12-01

    Full Text Available New chemical entities are desperately needed that overcome the limitations of existing drugs for neglected diseases. Screening a diverse library of 10,000 drug-like compounds against 7 neglected disease pathogens resulted in an integrated dataset of 744 hits. We discuss the prioritization of these hits for each pathogen and the strong correlation observed between compounds active against more than two pathogens and mammalian cell toxicity. Our work suggests that the efficiency of early drug discovery for neglected diseases can be enhanced through a collaborative, multi-pathogen approach.

  8. Omicseq: a web-based search engine for exploring omics datasets.

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
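
    trackRank itself is described in the paper and not reproduced here; as a rough stand-in for content-based (rather than metadata-based) ranking, the sketch below orders hypothetical genomic signal tracks by signal enrichment within a query gene's region.

        import numpy as np

        rng = np.random.default_rng(0)
        # Hypothetical per-bin signal tracks for three omics datasets (shared genome binning)
        tracks = {name: rng.exponential(1.0, size=10_000) for name in ("dsetA", "dsetB", "dsetC")}
        tracks["dsetB"][500:510] += 8.0  # dsetB carries strong signal at the query gene

        query_bins = slice(500, 510)  # bins covering the query gene

        def score(track):
            # Enrichment of signal in the query region over the genome-wide mean
            return track[query_bins].mean() / track.mean()

        ranked = sorted(tracks, key=lambda name: score(tracks[name]), reverse=True)
        print(ranked)  # dsetB should rank first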

  9. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias

    2018-03-28

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

  10. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    Science.gov (United States)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences to create large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  11. Bayesian Method for Building Frequent Landsat-Like NDVI Datasets by Integrating MODIS and Landsat NDVI

    OpenAIRE

    Limin Liao; Jinling Song; Jindi Wang; Zhiqiang Xiao; Jian Wang

    2016-01-01

    Studies related to vegetation dynamics in heterogeneous landscapes often require Normalized Difference Vegetation Index (NDVI) datasets with both high spatial resolution and frequent coverage, which cannot be satisfied by a single sensor due to technical limitations. In this study, we propose a new method called NDVI-Bayesian Spatiotemporal Fusion Model (NDVI-BSFM) for accurately and effectively building frequent high spatial resolution Landsat-like NDVI datasets by integrating Moderate Resol...

  12. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to address the core scientific problems during experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific dataset outputs, such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  13. Scalable persistent identifier systems for dynamic datasets

    Science.gov (United States)

    Golodoniuc, P.; Cox, S. J. D.; Klump, J. F.

    2016-12-01

    Reliable and persistent identification of objects, whether tangible or not, is essential in information management. Many Internet-based systems have been developed to identify digital data objects, e.g., PURL, LSID, Handle, ARK. These were largely designed for identification of static digital objects. The amount of data made available online has grown exponentially over the last two decades and fine-grained identification of dynamically generated data objects within large datasets using conventional systems (e.g., PURL) has become impractical. We have compared capabilities of various technological solutions to enable resolvability of data objects in dynamic datasets, and developed a dataset-centric approach to resolution of identifiers. This is particularly important in Semantic Linked Data environments where dynamic frequently changing data is delivered live via web services, so registration of individual data objects to obtain identifiers is impractical. We use identifier patterns and pattern hierarchies for identification of data objects, which allows relationships between identifiers to be expressed, and also provides means for resolving a single identifier into multiple forms (i.e. views or representations of an object). The latter can be implemented through (a) HTTP content negotiation, or (b) use of URI querystring parameters. The pattern and hierarchy approach has been implemented in the Linked Data API supporting the United Nations Spatial Data Infrastructure (UNSDI) initiative and later in the implementation of geoscientific data delivery for the Capricorn Distal Footprints project using International Geo Sample Numbers (IGSN). This enables flexible resolution of multi-view persistent identifiers and provides a scalable solution for large heterogeneous datasets.
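
    As a small illustration of option (a), resolving one identifier pattern into multiple representations via HTTP content negotiation, here is a hypothetical Flask resolver; the URL scheme and landing page are invented.

        from flask import Flask, request, jsonify, redirect

        app = Flask(__name__)

        @app.route("/id/sample/<igsn>")
        def resolve(igsn):
            # One persistent identifier, multiple representations via the Accept header
            accept = request.headers.get("Accept", "text/html")
            if "application/json" in accept:
                return jsonify({"igsn": igsn, "type": "physical sample"})  # metadata view
            # Default: redirect to a (hypothetical) human-readable landing page
            return redirect(f"https://example.org/samples/{igsn}")

        if __name__ == "__main__":
            app.run()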

  14. Limited evidence to assess the impact of primary health care system or service level attributes on health outcomes of Indigenous people with type 2 diabetes: a systematic review.

    Science.gov (United States)

    Gibson, Odette R; Segal, Leonie

    2015-04-11

    To describe reported studies of the impact on HbA1C levels, diabetes-related hospitalisations, and other primary care health endpoints of initiatives aimed at improving the management of diabetes in Indigenous adult populations of Australia, Canada, New Zealand and the United States. Systematic literature review using data sources of MEDLINE, Embase, the Cochrane Library, CINHAL and PsycInfo from January 1985 to March 2012. Inclusion criteria were a clearly described primary care intervention, model of care or service, delivered to Indigenous adults with type 2 diabetes reporting a program impact on at least one quantitative diabetes-related health outcome, and where results were reported separately for Indigenous persons. Joanna Briggs Institute critical appraisal tools were used to assess the study quality. PRISMA guidelines were used for reporting. The search strategy retrieved 2714 articles. Of these, 13 studies met the review inclusion criteria. Three levels of primary care initiatives were identified: 1) addition of a single service component to the existing service, 2) system-level improvement processes to enhance the quality of diabetes care, 3) change in primary health funding to support better access to care. Initiatives included in the review were diverse and included comprehensive multi-disciplinary diabetes care, specific workforce development, systematic foot care and intensive individual hypertension management. Twelve studies reported HbA1C, of those one also reported hospitalisations and one reported the incidence of lower limb amputation. The methodological quality of the four comparable cohort and seven observational studies was good, and moderate for the two randomised control trials. The current literature provides an inadequate evidence base for making important policy and practice decisions in relation to primary care initiatives for Indigenous persons with type 2 diabetes. This reflects a very small number of published studies, the general

  15. A global gridded dataset of daily precipitation going back to 1950, ideal for analysing precipitation extremes

    Science.gov (United States)

    Contractor, S.; Donat, M.; Alexander, L. V.

    2017-12-01

    Reliable observations of precipitation are necessary to determine past changes in precipitation and to validate models, allowing for reliable future projections. Existing gauge-based gridded datasets of daily precipitation and satellite-based observations contain artefacts and have a short length of record, making them unsuitable for analysing precipitation extremes. The largest limiting factor for the gauge-based datasets is a dense and reliable station network. Currently, there are two major archives of global in situ daily rainfall data: the Global Historical Climatology Network (GHCN-Daily), hosted by the National Oceanic and Atmospheric Administration (NOAA), and that of the Global Precipitation Climatology Centre (GPCC), part of the Deutscher Wetterdienst (DWD). We combine the two data archives and use automated quality control techniques to create a reliable long-term network of raw station data, which we then interpolate using block kriging to create a global gridded dataset of daily precipitation going back to 1950. We compare our interpolated dataset with existing global gridded data of daily precipitation: NOAA Climate Prediction Center (CPC) Global V1.0 and GPCC Full Data Daily Version 1.0, as well as various regional datasets. We find that our raw station density is much higher than that of other datasets. To avoid artefacts due to station network variability, we provide multiple versions of our dataset based on various completeness criteria, as well as the standard deviation, kriging error and number of stations for each grid cell and timestep, to encourage responsible use of our dataset. Despite our efforts to increase the raw data density, the in situ station network remains sparse in India after the 1960s and in Africa throughout the timespan of the dataset. Our dataset would allow for more reliable global analyses of rainfall, including its extremes, and pave the way for better global precipitation observations with lower and more transparent uncertainties.
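
    A minimal sketch of kriging station values onto a regular grid, using ordinary kriging from the third-party pykrige package as a stand-in for the block kriging used for the dataset; the stations and rainfall values are synthetic.

        import numpy as np
        from pykrige.ok import OrdinaryKriging

        rng = np.random.default_rng(0)
        lon = rng.uniform(0, 10, 40)    # synthetic station longitudes
        lat = rng.uniform(40, 50, 40)   # synthetic station latitudes
        rain = rng.gamma(2.0, 3.0, 40)  # synthetic daily rainfall (mm)

        ok = OrdinaryKriging(lon, lat, rain, variogram_model="spherical")
        grid_lon = np.arange(0, 10, 0.5)
        grid_lat = np.arange(40, 50, 0.5)
        field, variance = ok.execute("grid", grid_lon, grid_lat)  # values + kriging error
        print(field.shape, float(variance.mean()))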

  16. Brief report on a systematic review of youth violence prevention through media campaigns: Does the limited yield of strong evidence imply methodological challenges or absence of effect?

    Science.gov (United States)

    Cassidy, Tali; Bowman, Brett; McGrath, Chloe; Matzopoulos, Richard

    2016-10-01

    We present a brief report on a systematic review which identified, assessed and synthesized the existing evidence of the effectiveness of media campaigns in reducing youth violence. Search strategies made use of terms for youth, violence and a range of terms relating to the intervention. An array of academic databases and websites were searched. Although media campaigns to reduce violence are widespread, only six studies met the inclusion criteria. There is little strong evidence to support a direct link between media campaigns and a reduction in youth violence. Several studies measure proxies for violence such as empathy or opinions related to violence, but the link between these measures and violence perpetration is unclear. Nonetheless, some evidence suggests that a targeted and context-specific campaign, especially when combined with other measures, can reduce violence. However, such campaigns are less cost-effective to replicate over large populations than generalised campaigns. It is unclear whether the paucity of evidence represents a null effect or methodological challenges with evaluating media campaigns. Future studies need to be carefully planned to accommodate for methodological difficulties as well as to identify the specific elements of campaigns that work, especially in lower and middle income countries. Copyright © 2016 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.

  17. The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap

    Science.gov (United States)

    Mann, Richard P.; Mushtaq, Faisal; White, Alan D.; Mata-Cervantes, Gabriel; Pike, Tom; Coker, Dalton; Murdoch, Stuart; Hiles, Tim; Smith, Clare; Berridge, David; Hinchliffe, Suzanne; Hall, Geoff; Smye, Stephen; Wilkie, Richard M.; Lodge, J. Peter A.; Mon-Williams, Mark

    2016-01-01

    Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public’s perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating health-care delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit “big data.” PMID:27990415

  18. Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions

    DEFF Research Database (Denmark)

    Kim, Yohan; Sidney, John; Buus, Søren

    2014-01-01

    Two approaches commonly used to estimate prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB), which served as a blind set. Results: We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative...
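
    To make the comparison concrete, a generic sketch contrasting cross-validated performance with performance on a separately generated blind set, using synthetic data and scikit-learn; it illustrates the evaluation design only, not the study's predictors.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.metrics import roc_auc_score

        X, y = make_classification(n_samples=600, n_features=20, random_state=0)
        X_train, y_train = X[:400], y[:400]   # data available when building the method
        X_blind, y_blind = X[400:], y[400:]   # later additions, serving as a blind set

        clf = RandomForestClassifier(random_state=0)
        cv_auc = cross_val_score(clf, X_train, y_train, cv=5, scoring="roc_auc").mean()
        blind_auc = roc_auc_score(y_blind,
                                  clf.fit(X_train, y_train).predict_proba(X_blind)[:, 1])
        print(round(cv_auc, 3), round(blind_auc, 3))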

  20. Development and validation of a national data registry for midwife-led births: the Midwives Alliance of North America Statistics Project 2.0 dataset.

    Science.gov (United States)

    Cheyney, Melissa; Bovbjerg, Marit; Everson, Courtney; Gordon, Wendy; Hannibal, Darcy; Vedam, Saraswathi

    2014-01-01

    In 2004, the Midwives Alliance of North America's (MANA's) Division of Research developed a Web-based data collection system to gather information on the practices and outcomes associated with midwife-led births in the United States. This system, called the MANA Statistics Project (MANA Stats), grew out of a widely acknowledged need for more reliable data on outcomes by intended place of birth. This article describes the history and development of the MANA Stats birth registry and provides an analysis of the 2.0 dataset's content, strengths, and limitations. Data collection and review procedures for the MANA Stats 2.0 dataset are described, along with methods for the assessment of data accuracy. We calculated descriptive statistics for client demographics and contributing midwife credentials, and assessed the quality of data by calculating point estimates, 95% confidence intervals, and kappa statistics for key outcomes on pre- and postreview samples of records. The MANA Stats 2.0 dataset (2004-2009) contains 24,848 courses of care, 20,893 of which are for women who planned a home or birth center birth at the onset of labor. The majority of these records were planned home births (81%). Births were attended primarily by certified professional midwives (73%), and clients were largely white (92%), married (87%), and college-educated (49%). Data quality analyses of 9932 records revealed no differences between pre- and postreviewed samples for 7 key benchmarking variables (kappa, 0.98-1.00). The MANA Stats 2.0 data were accurately entered by participants; any errors in this dataset are likely random and not systematic. The primary limitation of the 2.0 dataset is that the sample was captured through voluntary participation; thus, it may not accurately reflect population-based outcomes. The dataset's primary strength is that it will allow for the examination of research questions on normal physiologic birth and midwife-led birth outcomes by intended place of birth.
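    The agreement statistic used in this validation, Cohen's kappa between pre-review and post-review values of a key variable, is quick to compute; the records in the sketch below are invented for illustration.

```python
# Cohen's kappa on a hypothetical benchmarking variable, comparing values
# before and after data review (1.0 indicates perfect agreement).
from sklearn.metrics import cohen_kappa_score

pre_review  = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]  # value before review
post_review = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]  # value after review
print(f"kappa = {cohen_kappa_score(pre_review, post_review):.2f}")
```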

  1. Isokinetic strength assessment offers limited predictive validity for detecting risk of future hamstring strain in sport: a systematic review and meta-analysis.

    Science.gov (United States)

    Green, Brady; Bourne, Matthew N; Pizzari, Tania

    2018-03-01

    To examine the value of isokinetic strength assessment for predicting risk of hamstring strain injury, and to direct future research into hamstring strain injuries. Systematic review. Database searches for Medline, CINAHL, Embase, AMED, AUSPORT, SPORTDiscus, PEDro and Cochrane Library from inception to April 2017. Manual reference checks, ahead-of-press and citation tracking. Prospective studies evaluating isokinetic hamstrings, quadriceps and hip extensor strength testing as a risk factor for occurrence of hamstring muscle strain. Independent search result screening. Risk of bias assessment by independent reviewers using Quality in Prognosis Studies tool. Best evidence synthesis and meta-analyses of standardised mean difference (SMD). Twelve studies were included, capturing 508 hamstring strain injuries in 2912 athletes. Isokinetic knee flexor, knee extensor and hip extensor outputs were examined at angular velocities ranging 30-300°/s, concentric or eccentric, and relative (Nm/kg) or absolute (Nm) measures. Strength ratios ranged between 30°/s and 300°/s. Meta-analyses revealed a small, significant predictive effect for absolute (SMD=-0.16, P=0.04, 95% CI -0.31 to -0.01) and relative (SMD=-0.17, P=0.03, 95% CI -0.33 to -0.014) eccentric knee flexor strength (60°/s). No other testing speed or strength ratio showed statistical association. Best evidence synthesis found over half of all variables had moderate or strong evidence for no association with future hamstring injury. Despite an isolated finding for eccentric knee flexor strength at slow speeds, the role and application of isokinetic assessment for predicting hamstring strain risk should be reconsidered, particularly given costs and specialised training required. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
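    The pooling behind such a meta-analysis is a short computation: inverse-variance fixed-effect weighting of the per-study standardised mean differences. The SMDs and standard errors below are hypothetical, not those of the twelve included studies.

```python
# Inverse-variance fixed-effect pooling of standardised mean differences.
import numpy as np

smd = np.array([-0.25, -0.10, -0.18])  # per-study SMDs (hypothetical)
se  = np.array([0.12, 0.09, 0.15])     # their standard errors (hypothetical)

w = 1.0 / se**2                        # inverse-variance weights
pooled = np.sum(w * smd) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled SMD = {pooled:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```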

  2. Contemporary adjuvant polymethyl methacrylate cementation optimally limits recurrence in primary giant cell tumor of bone patients compared to bone grafting: a systematic review and meta-analysis.

    Science.gov (United States)

    Zuo, Dongqing; Zheng, Longpo; Sun, Wei; Fu, Dong; Hua, Yingqi; Cai, Zhengdong

    2013-07-16

    Reports of recurrence following restructuring of primary giant cell tumor (GCT) defects using polymethyl methacrylate (PMMA) bone cementation or allogeneic bone graft with and without adjuvants for intralesional curettage vary widely. Systematic review and meta-analysis were conducted to investigate efficacy of PMMA bone cementation and allogeneic bone grafting following intralesional curettage for GCT. Medline, EMBASE, Google Scholar, and Cochrane databases were searched for studies reporting GCT of bone treatment with PMMA cementation and/or bone grafting with or without adjuvant therapy following intralesional curettage of primary GCTs. Pooled risk ratios and 95% confidence intervals (CIs) for local recurrence risks were calculated by fixed-effects methods. Of 1,690 relevant titles, 6 eligible studies (1,293 patients) spanning March 2008 to December 2011 were identified in published data. Treatment outcomes of PMMA-only (n = 374), bone graft-only (n = 436), PMMA with or without adjuvant (PMMA + adjuvant; n = 594), and bone graft filling with or without adjuvant (bone graft + adjuvant; n = 699) were compared. Bone graft-only patients exhibited higher recurrence rates than PMMA-treated patients (RR 2.09, 95% CI (1.64, 2.66), Overall effect: Z = 6.00; P <0.001), and bone graft + adjuvant patients exhibited higher recurrence rates than PMMA + adjuvant patients (RR 1.66, 95% CI (1.21, 2.28), Overall effect: Z = 3.15, P = 0.002). Local recurrence was minimal in PMMA cementation patients, suggesting that PMMA is preferable for routine clinical restructuring in eligible GCT patients. Relationships between tumor characteristics, other modern adjuvants, and recurrence require further exploration.
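    Fixed-effects pooling of risk ratios follows the same inverse-variance pattern, applied on the log scale to 2x2 counts. The counts below are hypothetical, not the data of the six included studies.

```python
# Fixed-effect pooling of risk ratios on the log scale from 2x2 counts.
import numpy as np

# (events_graft, n_graft, events_pmma, n_pmma) per study -- hypothetical
studies = [(30, 120, 15, 110), (22, 90, 12, 95), (18, 100, 8, 90)]

log_rr, w = [], []
for eg, ng, ep, n_p in studies:
    rr = (eg / ng) / (ep / n_p)
    var = 1/eg - 1/ng + 1/ep - 1/n_p   # variance of log risk ratio
    log_rr.append(np.log(rr))
    w.append(1.0 / var)

log_rr, w = np.array(log_rr), np.array(w)
pooled = np.exp(np.sum(w * log_rr) / np.sum(w))
print(f"pooled RR (bone graft vs PMMA): {pooled:.2f}")
```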

  3. Manual therapy for the management of pain and limited range of motion in subjects with signs and symptoms of temporomandibular disorder: a systematic review of randomised controlled trials.

    Science.gov (United States)

    Calixtre, L B; Moreira, R F C; Franchini, G H; Alburquerque-Sendín, F; Oliveira, A B

    2015-11-01

There is a lack of knowledge about the effectiveness of manual therapy (MT) on subjects with temporomandibular disorders (TMD). The aim of this systematic review is to synthesise evidence regarding the isolated effect of MT in improving maximum mouth opening (MMO) and pain in subjects with signs and symptoms of TMD. MEDLINE®, Cochrane, Web of Science, SciELO and EMBASE™ electronic databases were consulted, searching for randomised controlled trials applying MT for TMD compared to other interventions, no intervention or placebo. Two authors independently extracted data, the PEDro scale was used to assess risk of bias, and GRADE (Grading of Recommendations Assessment, Development and Evaluation) was applied to synthesise the overall quality of the body of evidence. Treatment effect size was calculated for pain, MMO and pressure pain threshold (PPT). Eight trials were included, seven of high methodological quality. Myofascial release and massage techniques applied on the masticatory muscles are more effective than control (low to moderate evidence) but as effective as botulinum toxin injections (moderate evidence). Upper cervical spine thrust manipulation or mobilisation techniques are more effective than control (low to high evidence), while thoracic manipulations are not. There is moderate-to-high evidence that MT technique protocols are effective. The methodological heterogeneity across trial protocols frequently contributed to decreased quality of evidence. In conclusion, there is widely varying evidence that MT improves pain, MMO and PPT in subjects with TMD signs and symptoms, depending on the technique. Further studies should consider using standardised evaluations and better study designs to strengthen clinical relevance. © 2015 John Wiley & Sons Ltd.

  4. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability to capture safeguards knowledge that will exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  5. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  6. Analysis of Public Datasets for Wearable Fall Detection Systems.

    Science.gov (United States)

    Casilari, Eduardo; Santoyo-Ramón, José-Antonio; Cano-García, José-Manuel

    2017-06-27

Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention among the research community in recent years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve publicly available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of these datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). In this respect, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need to categorize the ADLs by the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  7. Analysis of Public Datasets for Wearable Fall Detection Systems

    Directory of Open Access Journals (Sweden)

    Eduardo Casilari

    2017-06-01

Full Text Available Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention among the research community in recent years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve publicly available data repositories containing measurements of ADLs and emulated falls envisaged for the evaluation of fall detection algorithms in wearable FDSs. The analysis of these datasets is performed in a comprehensive way, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). In this respect, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need to categorize the ADLs by the intensity of the movements in order to evaluate the capability of a certain detection algorithm to discriminate falls from ADLs.

  8. Data quality control methodologies for large, non-conventional DC resistivity datasets

    Science.gov (United States)

    Mitchell, Michael A.; Oldenburg, Douglas W.

    2016-12-01

    With developments in instrumentation and computational resources, the collection of large, non-conventional DC resistivity datasets has become commonplace. While the increased data content of these large datasets can significantly improve the resolution of inverse models, these datasets also present challenges for standard data quality control (QC) methodologies. Standard QC methodologies for DC resistivity datasets typically rely on our ability to decompose the dataset into 2D lines and/or reciprocal measurements. Non-conventional electrode geometries and the cost of collecting a large number of reciprocal measurements can severely limit the applicability of standard DC resistivity QC methodologies. To address these limitations, we developed a more generalized data QC methodology which utilizes statistical analysis and classification tools. The merit of this methodology is illustrated using a field dataset collected in an underground potash mine and several synthetic examples. Results from these applications show that the methodology has the ability to identify and characterize highly noise-contaminated data from a number of different sources. The flexibility of the 4-stage methodology allows it be tailored to accommodate data from any type of DC resistivity survey and the use of statistical analysis and classification tools decreases the subjectivity of the process. Although this study focuses on the applicability of this methodology for DC resistivity data, it is potentially applicable to a variety of geophysical surveys.
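    As a toy illustration of the general idea, statistical screening of measurements to flag noise-contaminated data for review, rather than the authors' four-stage methodology itself, robust z-scores on log apparent resistivity can flag gross outliers:

```python
# Robust outlier screening sketch for apparent-resistivity data (synthetic).
import numpy as np

rng = np.random.default_rng(2)
rho_app = rng.lognormal(mean=3.0, sigma=0.3, size=1000)   # clean data
rho_app[rng.choice(1000, 20, replace=False)] *= 50.0      # inject noise

log_rho = np.log10(rho_app)
med = np.median(log_rho)
mad = np.median(np.abs(log_rho - med))      # robust measure of spread
z = 0.6745 * (log_rho - med) / mad          # robust (MAD-based) z-score
flagged = np.abs(z) > 3.5
print(f"flagged {flagged.sum()} of {len(rho_app)} measurements for review")
```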

  9. A systematic review of air pollution as a risk factor for cardiovascular disease in South Asia: limited evidence from India and Pakistan.

    Science.gov (United States)

    Yamamoto, S S; Phalkey, R; Malik, A A

    2014-03-01

    Cardiovascular diseases (CVD) are major contributors to mortality and morbidity in South Asia. Chronic exposure to air pollution is an important risk factor for cardiovascular diseases, although the majority of studies to date have been conducted in developed countries. Both indoor and outdoor air pollution are growing problems in developing countries in South Asia yet the impact on rising rates of CVD in these regions has largely been ignored. We aimed to assess the evidence available regarding air pollution effects on CVD and CVD risk factors in lower income countries in South Asia. A literature search was conducted in PubMed and Web of Science. Our inclusion criteria included peer-reviewed, original, empirical articles published in English between the years 1990 and 2012, conducted in the World Bank South Asia region (Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan and Sri Lanka). This resulted in 30 articles. Nine articles met our inclusion criteria and were assessed for this systematic review. Most of the studies were cross-sectional and examined measured particulate matter effects on CVD outcomes and indicators. We observed a bias as nearly all of the studies were from India. Hypertension and CVD deaths were positively associated with higher particulate matter levels. Biomarkers of oxidative stress such as increased levels of P-selection expressing platelets, depleted superoxide dismutase and reactive oxygen species generation as well as elevated levels of inflammatory-related C-reactive protein, interleukin-6 and interleukin-8 were also positively associated with biomass use or elevated particulate matter levels. An important outcome of this investigation was the evidence suggesting important air pollution effects regarding CVD risk in South Asia. However, too few studies have been conducted. There is as an urgent need for longer term investigations using robust measures of air pollution with different population groups that include a wider

  10. Animated analysis of geoscientific datasets: An interactive graphical application

    Science.gov (United States)

    Morse, Peter; Reading, Anya; Lueg, Christopher

    2017-12-01

Geoscientists are required to analyze and draw conclusions from increasingly large volumes of data. There is a need to recognise and characterise features and changing patterns of Earth observables within such large datasets. It is also necessary to identify significant subsets of the data for more detailed analysis. We present an innovative, interactive software tool and workflow to visualise, characterise, sample and tag large geoscientific datasets from both local and cloud-based repositories. It uses an animated interface and human-computer interaction to utilise the capacity of human expert observers to identify features via enhanced visual analytics. 'Tagger' enables users to analyze datasets that are too large in volume to be drawn legibly on a reasonable number of single static plots. Users interact with the moving graphical display, tagging data ranges of interest for subsequent attention. The tool provides a rapid pre-pass process using fast GPU-based OpenGL graphics and data-handling and is coded in the Quartz Composer visual programming language (VPL) on Mac OS X. It makes use of interoperable data formats, and cloud-based (or local) data storage and compute. In a case study, Tagger was used to characterise a decade (2000-2009) of data recorded by the Cape Sorell Waverider Buoy, located approximately 10 km off the west coast of Tasmania, Australia. These data serve as a proxy for the understanding of Southern Ocean storminess, which has both local and global implications. This example shows use of the tool to identify and characterise 4 different types of storm and non-storm events during this time. Events characterised in this way are compared with conventional analysis, noting advantages and limitations of data analysis using animation and human interaction. Tagger provides a new ability to make use of humans as feature detectors in computer-based analysis of large-volume geosciences and other data.

  11. CLARA-A1: a cloud, albedo, and radiation dataset from 28 yr of global AVHRR data

    Directory of Open Access Journals (Sweden)

    K.-G. Karlsson

    2013-05-01

Full Text Available A new satellite-derived climate dataset – denoted CLARA-A1 ("The CM SAF cLoud, Albedo and RAdiation dataset from AVHRR data") – is described. The dataset covers the 28 yr period from 1982 until 2009 and consists of cloud, surface albedo, and radiation budget products derived from the AVHRR (Advanced Very High Resolution Radiometer) sensor carried by polar-orbiting operational meteorological satellites. Its content, anticipated accuracies, limitations, and potential applications are described. The dataset is produced by the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF) project. The dataset has its strengths in the long duration, its foundation upon a homogenized AVHRR radiance data record, and in some unique features, e.g. the availability of 28 yr of summer surface albedo and cloudiness parameters over the polar regions. Quality characteristics are also well investigated and particularly useful results can be found over the tropics, mid to high latitudes and over nearly all oceanic areas. Being the first CM SAF dataset of its kind, an intensive evaluation of the quality of the datasets was performed and major findings with regard to merits and shortcomings of the datasets are reported. However, the CM SAF's long-term commitment to perform two additional reprocessing events within the time frame 2013–2018 will allow proper handling of limitations (e.g. uncertainty estimates) as well as upgrading the dataset with new features and extension of the temporal coverage.

  12. Hydrologic information server for benchmark precipitation dataset

    Science.gov (United States)

    McEnery, John A.; McKee, Paul W.; Shelton, Gregory P.; Ramsey, Ryan W.

    2013-01-01

This paper will present the methodology and overall system development by which a benchmark dataset of precipitation information has been made available. Rainfall is the primary driver of the hydrologic cycle. High quality precipitation data is vital for hydrologic models, hydrometeorologic studies and climate analysis, and hydrologic time series observations are important to many water resources applications. Over the past two decades, with the advent of NEXRAD radar, the science of measuring and recording rainfall has improved dramatically. However, much existing data has not been readily available for public access or transferable among the agricultural, engineering and scientific communities. This project takes advantage of the existing CUAHSI Hydrologic Information System ODM model and tools to bridge the gap between data storage and data access, providing an accepted standard interface for internet access to the largest time-series dataset of NEXRAD precipitation data ever assembled. This research effort has produced an operational data system to ingest, transform, load and then serve one of the most important hydrologic variable sets.

  13. Quality Controlling CMIP datasets at GFDL

    Science.gov (United States)

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

As GFDL makes the switch from model development to production in light of the Coupled Model Intercomparison Project (CMIP), GFDL's efforts have shifted to testing and, more importantly, establishing guidelines and protocols for quality control and semi-automated data publishing. Every CMIP cycle introduces key challenges and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before the data are published and their findings make their way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets, including Jupyter notebooks, PrePARE, and LAMP (Linux, Apache, MySQL, PHP/Python/Perl): a technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively, provide additional metadata and analysis services along with some in-built controlled-vocabulary validations in the workflow. In addition to this, we also discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.

  14. SDCLIREF - A sub-daily gridded reference dataset

    Science.gov (United States)

    Wood, Raul R.; Willkofer, Florian; Schmid, Franz-Josef; Trentini, Fabian; Komischke, Holger; Ludwig, Ralf

    2017-04-01

Climate change is expected to impact the intensity and frequency of hydrometeorological extreme events. In order to adequately capture and analyze extreme rainfall events, in particular when assessing flood and flash flood situations, data is required at high spatial and sub-daily resolution which is often not available in sufficient density and over extended time periods. The ClimEx project (Climate Change and Hydrological Extreme Events) addresses the alteration of hydrological extreme events under climate change conditions. In order to differentiate between a clear climate change signal and the limits of natural variability, unique Single-Model Regional Climate Model Ensembles (CRCM5 driven by CanESM2, RCP8.5) were created for a European and North-American domain, each comprising 50 members of 150 years (1951-2100). In combination with the CORDEX-Database, this newly created ClimEx-Ensemble is a one-of-a-kind model dataset to analyze changes of sub-daily extreme events. For the purpose of bias-correcting the regional climate model ensembles as well as for the baseline calibration and validation of hydrological catchment models, a new sub-daily (3h) high-resolution (500m) gridded reference dataset (SDCLIREF) was created for a domain covering the Upper Danube and Main watersheds (~100,000 km²). As the sub-daily observations lack a continuous time series for the reference period 1980-2010, the need for a suitable method to bridge the gap of the discontinuous time series arose. The Method of Fragments (Sharma and Srikanthan (2006); Westra et al. (2012)) was applied to transform daily observations to sub-daily rainfall events to extend the time series and densify the station network. Prior to applying the Method of Fragments and creating the gridded dataset using rigorous interpolation routines, data collection of observations, operated by several institutions in three countries (Germany, Austria, Switzerland), and the subsequent quality control of the observations
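    The Method of Fragments itself is straightforward to sketch: a daily total is distributed over sub-daily steps using the normalised within-day pattern of a donor day observed at a sub-daily station. The sketch below is a bare-bones illustration with random donor selection; practical implementations, presumably including SDCLIREF's, match donors by season and rainfall magnitude.

```python
# Bare-bones Method of Fragments sketch: disaggregate a daily rainfall total
# into eight 3-hourly values using the pattern of a donor day.
import numpy as np

def fragments(subdaily_day):
    """Normalise one day of 3-hourly rain (8 values) to fractions."""
    total = subdaily_day.sum()
    return subdaily_day / total if total > 0 else np.full(8, 1 / 8)

def disaggregate(daily_total, donor_days):
    """Apply the fragment pattern of a randomly chosen donor day."""
    donor = donor_days[np.random.randint(len(donor_days))]
    return daily_total * fragments(donor)

# Hypothetical pool of observed 3-hourly days from a neighbouring station:
donor_days = np.array([[0.0, 0.0, 2.0, 5.5, 1.5, 0.0, 0.0, 0.0],
                       [0.5, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]])
print(disaggregate(12.0, donor_days))  # 12 mm spread over eight 3 h steps
```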

  15. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

    Science.gov (United States)

    Bose, Tungadri; Haque, Mohammed Monzoorul; Reddy, Cvsk; Mande, Sharmila S

    2015-01-01

Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools require end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although web-based functional annotation servers address to some extent the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER. The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms. Multiple search options in COGNIZER provide end-users the flexibility of

  16. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

    Directory of Open Access Journals (Sweden)

    Tungadri Bose

Full Text Available Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools require end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although web-based functional annotation servers address to some extent the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER. The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms. Multiple search options in COGNIZER provide end-users the

  17. Agile data management for curation of genomes to watershed datasets

    Science.gov (United States)

    Varadharajan, C.; Agarwal, D.; Faybishenko, B.; Versteeg, R.

    2015-12-01

    through the portal. The resulting product has an interface that is more intuitive and presents the highest priority datasets that are needed by the users. Our agile approach has enabled us to build a system that is keeping pace with the science needs while utilizing limited resources.

  18. Oxygen and carbon stable isotope systematics in Porites coral near its latitudinal limit: The coral response to low-thermal temperature stress

    Science.gov (United States)

    Omata, Tamano; Suzuki, Atsushi; Kawahata, Hodaka; Nojima, Satoshi; Minoshima, Kayo; Hata, Akiko

    2006-08-01

We investigated oxygen and carbon isotopes (δ18O and δ13C, respectively) along the growth axis of a Porites coral living near the northern limit of hermatypic corals, off Ushibuka, Japan, where winter temperatures fall below the minimum required by most hermatypic corals. The coral's seasonal δ18O cycle depended mainly on seawater temperature, and the slope of the regression line between δ18O and sea-surface temperature for this coral was within reported values. The coral's growth was inhibited in 1968, and at around this time the annual growth rate was reduced. This growth inhibition began in winter 1967/1968, a period of extraordinarily low seawater temperature. Moreover, the amplitude of the annual δ18O fluctuation was small from winter 1967/1968 to winter 1969/1970. Although δ18O and δ13C fluctuations were out of phase most years, they were in phase some years. The in-phase fluctuations of δ18O and δ13C indicate that kinetic isotope effects may have been more important than metabolic isotope effects during those years. Sclerochronologic records thus reveal the coral response to low-temperature stress.
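    For reference, the temperature dependence invoked here is conventionally expressed as a linear calibration of the form below; the slope magnitude shown is a commonly reported value for Porites corals in general, not the one fitted in this study.

```latex
\delta^{18}\mathrm{O}_{\mathrm{coral}} \;=\; a + b\,T_{\mathrm{SST}},
\qquad b \approx -0.2\ \text{‰}\ {}^{\circ}\mathrm{C}^{-1}
```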

  19. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets where used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  20. ASTER Global Emissivity Dataset, 100 meter, Binary V003

    Data.gov (United States)

    National Aeronautics and Space Administration — The AG100B.003 dataset was decommissioned as of December 14, 2016. Users are encouraged to use the ASTER Global Emissivity Dataset 100-meter (AG100.003 -...

  1. ASTER Global Emissivity Dataset, 1 kilometer, Binary V003

    Data.gov (United States)

    National Aeronautics and Space Administration — The AG1kmB.003 dataset was decommissioned as of December 14, 2016. Users are encouraged to use the ASTER Global Emissivity Dataset 1-kilometer (AG1km.003 -...

  2. A Method for Automating Geospatial Dataset Metadata

    Directory of Open Access Journals (Sweden)

    Robert I. Dunfey

    2009-11-01

Full Text Available Metadata have long been recognised as crucial to geospatial asset management and discovery, and yet undertaking their creation remains an unenviable task often to be avoided. This paper proposes a practical approach designed to address such concerns, decomposing various data creation, management, update and documentation process steps that are subsequently leveraged to contribute towards metadata record completion. Using a customised utility embedded within a common GIS application, metadata elements are computationally derived from an imposed feature metadata standard, dataset geometry, an integrated storage protocol and pre-prepared content, and instantiated within a common geospatial discovery convention. Yielding 27 out of 32 total metadata elements (or 15 out of 17 mandatory elements), the approach demonstrably lessens the burden of metadata authorship. It also encourages improved geospatial asset management whilst outlining core requisites for developing a more open metadata strategy not bound to any particular application domain.
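    The simplest metadata element to derive computationally is the spatial extent. A minimal sketch follows; the coordinates are invented, and the element names follow the ISO 19115 bounding-box convention, which is assumed here rather than taken from the paper.

```python
# Derive a bounding-box metadata element from dataset geometry instead of
# typing it by hand. Coordinates are hypothetical.
coords = [(-3.2, 55.9), (-3.1, 56.0), (-3.4, 55.8)]  # (lon, lat) vertices

xs, ys = zip(*coords)
metadata = {
    "westBoundLongitude": min(xs),
    "eastBoundLongitude": max(xs),
    "southBoundLatitude": min(ys),
    "northBoundLatitude": max(ys),
}
print(metadata)
```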

  3. Controlled Vocabulary Standards for Anthropological Datasets

    Directory of Open Access Journals (Sweden)

    Celia Emmelhainz

    2014-07-01

    Full Text Available This article seeks to outline the use of controlled vocabulary standards for qualitative datasets in cultural anthropology, which are increasingly held in researcher-accessible government repositories and online digital libraries. As a humanistic science that can address almost any aspect of life with meaning to humans, cultural anthropology has proven difficult for librarians and archivists to effectively organize. Yet as anthropology moves onto the web, the challenge of organizing and curating information within the field only grows. In considering the subject classification of digital information in anthropology, I ask how we might best use controlled vocabularies for indexing digital anthropological data. After a brief discussion of likely concerns, I outline thesauri which may potentially be used for vocabulary control in metadata fields for language, location, culture, researcher, and subject. The article concludes with recommendations for those existing thesauri most suitable to provide a controlled vocabulary for describing digital objects in the anthropological world.

  4. 2006 Fynmeet sea clutter measurement trial: Datasets

    CSIR Research Space (South Africa)

    Herselman, PLR

    2007-09-06

Full Text Available Original path: \\20060731_ifs_g_contd_2. Waveform bandwidth: 83.333 MHz; processor version: FMSCP Ver 01.22; waveform file: SC_FAW2_45m.txt; inst. wind: 16.5 kts, 149 deg. N; grazing angle: N/A. [Table-of-contents and figure residue not reproduced; recoverable figure caption: RCS [dBm2] vs. time and range for f1 = 9.000 GHz, dataset CAD14-001.]

  5. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.
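    The quoted sea-level equivalent can be sanity-checked with a one-line computation under standard assumed constants for ice density, seawater density, and global ocean area; none of these values are taken from the abstract itself.

```python
# Back-of-envelope sea-level-equivalent check (all constants assumed).
RHO_ICE = 917.0        # kg m^-3 (assumed ice density)
RHO_SEAWATER = 1027.0  # kg m^-3 (assumed seawater density)
OCEAN_AREA = 3.62e14   # m^2 (assumed global ocean area)
ICE_VOLUME = 2.99e15   # m^3, ~2.99 million km^3 (approximate)

sle = ICE_VOLUME * (RHO_ICE / RHO_SEAWATER) / OCEAN_AREA
print(f"sea-level equivalent ~ {sle:.2f} m")  # ~7.4 m, consistent with 7.36 m
```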

  6. A Discretized Method for Deriving Vortex Impulse from Volumetric Datasets

    Science.gov (United States)

    Buckman, Noam; Mendelson, Leah; Techet, Alexandra

    2015-11-01

Many biological and mechanical systems transfer momentum through a fluid by creating vortical structures. To study this mechanism, we derive a method for extracting impulse and its time derivative from flow fields observed in experiments and simulations. We begin by discretizing a thin-cored vortex filament, and extend the model to account for finite vortex core thickness and asymmetric distributions of vorticity. Because it uses only velocity fields to extract vortex cores and calculate circulation, this method is applicable to 3D PIV datasets, even with low spatial resolution flow fields and measurement noise. To assess the performance of this analysis method, we simulate vortex rings and arbitrary vortex structures using OpenFOAM computational fluid dynamics software and analyze the wake momentum with the model to validate it. We further examine a piston-vortex experiment, using 3D synthetic aperture particle image velocimetry (SAPIV) to capture velocity fields. Strengths, limitations, and improvements to the framework are discussed.
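    The quantity being extracted is the hydrodynamic impulse. For reference, its standard volume-integral definition, the thin-filament form that such a discretisation starts from, and the familiar result for a circular vortex ring of circulation Γ and radius R are:

```latex
\mathbf{I} \;=\; \frac{\rho}{2}\int_{V}\mathbf{x}\times\boldsymbol{\omega}\,\mathrm{d}V
\;\longrightarrow\;
\frac{\rho\,\Gamma}{2}\oint \mathbf{x}\times \mathrm{d}\boldsymbol{\ell}
\quad\text{(thin filament)},
\qquad
\lvert\mathbf{I}\rvert_{\text{ring}} \;=\; \rho\,\Gamma\,\pi R^{2}.
```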

  7. SAGE Research Methods Datasets: A Data Analysis Educational Tool.

    Science.gov (United States)

    Vardell, Emily

    2016-01-01

SAGE Research Methods Datasets (SRMD) is an educational tool designed to offer users the opportunity to obtain hands-on experience with data analysis. Users can search for and browse authentic datasets by method, discipline, and data type. Each of the datasets is supplemented with educational material on the research method and clear guidelines for how to approach data analysis.

  8. BDML Datasets - SSBD | LSDB Archive [Life Science Database Archive metadata

    Lifescience Database Archive (English)

Full Text Available Data name: BDML Datasets. DOI: 10.18908/lsdba.nbdc01349-001. (BDML Datasets - SSBD | LSDB Archive; remaining page content was site navigation and is not reproduced.)

  9. Systematic optimization of multiplex zymography protocol to detect active cathepsins K, L, S, and V in healthy and diseased tissue: compromise among limits of detection, reduced time, and resources.

    Science.gov (United States)

    Dumas, Jerald E; Platt, Manu O

    2013-07-01

Cysteine cathepsins are a family of proteases identified in cancer, atherosclerosis, osteoporosis, arthritis, and a number of other diseases. As this number continues to rise, so does the need for low cost, broad use quantitative assays that detect their activity and can be translated to the clinic in the hospital or in low resource settings. Multiplex cathepsin zymography is one such assay that detects subnanomolar levels of active cathepsins K, L, S, and V in cell or tissue preparations observed as clear bands of proteolytic activity after gelatin substrate SDS-PAGE with conditions optimal for cathepsin renaturing and activity. Densitometric analysis of the zymogram provides quantitative information from this low cost assay. After systematic modifications to optimize cathepsin zymography, we describe reduced electrophoresis time from 2 h to 10 min, incubation assay time from overnight to 4 h, and reduced minimal tissue protein necessary while maintaining sensitive detection limits; an evaluation of the pros and cons of each modification is also included. We further describe image acquisition by smartphone camera, export to Matlab, and densitometric analysis code to quantify and report cathepsin activity, adding portability and replacing large scale, darkbox imaging equipment that could be cost prohibitive in limited resource settings.
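    The densitometric step lends itself to a compact sketch. The authors describe Matlab code; the Python analogue below integrates the background-corrected intensity of one cleared band. The filename and the lane/band window coordinates are hypothetical.

```python
# Simplified densitometry sketch (Python analogue of the Matlab step):
# integrate the intensity of one cleared band in a fixed lane window.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("zymogram.png").convert("L"), dtype=float)
lane = img[:, 100:140]        # hypothetical lane bounds (columns)
band = lane[200:230, :]       # hypothetical band bounds (rows)
background = np.median(lane)  # crude lane-wide background estimate

# On a stained gelatin gel, cleared (digested) bands are brighter than the
# surrounding background, so integrate the positive excess over background.
activity = np.clip(band - background, 0, None).sum()
print(f"integrated band signal (arbitrary units): {activity:.0f}")
```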

  10. Systematic optimization of multiplex zymography protocol to detect active cathepsins K, L, S, and V in healthy and diseased tissue: compromise between limits of detection, reduced time, and resources

    Science.gov (United States)

    Dumas, Jerald E.; Platt, Manu O.

    2013-01-01

Cysteine cathepsins are a family of proteases identified in cancer, atherosclerosis, osteoporosis, arthritis and a number of other diseases. As this number continues to rise, so does the need for low cost, broad use quantitative assays that detect their activity and can be translated to the clinic in the hospital or in low resource settings. Multiplex cathepsin zymography is one such assay that detects subnanomolar levels of active cathepsins K, L, S, and V in cell or tissue preparations observed as cleared bands of proteolytic activity after gelatin substrate SDS-PAGE with conditions optimal for cathepsin renaturing and activity. Densitometric analysis of the zymogram provides quantitative information from this low cost assay. After systematic modifications to optimize cathepsin zymography, we describe reduced electrophoresis time from 2 hours to 10 minutes, incubation assay time from overnight to 4 hours, and reduced minimal tissue protein necessary while maintaining sensitive detection limits; an evaluation of the pros and cons of each modification is also included. We further describe image acquisition by smartphone camera, export to Matlab, and densitometric analysis code to quantify and report cathepsin activity, adding portability and replacing large scale, darkbox imaging equipment that could be cost prohibitive in limited resource settings. PMID:23532386

  11. Utilizing Multiple Datasets for Snow Cover Mapping

    Science.gov (United States)

    Tait, Andrew B.; Hall, Dorothy K.; Foster, James L.; Armstrong, Richard L.

    1999-01-01

Snow-cover maps generated from surface data are based on direct measurements; however, they are prone to interpolation errors where climate stations are sparsely distributed. Snow cover is clearly discernible using satellite-attained optical data because of the high albedo of snow, yet the surface is often obscured by cloud cover. Passive microwave (PM) data are unaffected by clouds; however, the snow-cover signature is significantly affected by melting snow and the microwaves may be transparent to thin snow (less than 3 cm). Both optical and microwave sensors have problems discerning snow beneath forest canopies. This paper describes a method that combines ground and satellite data to produce a Multiple-Dataset Snow-Cover Product (MDSCP). Comparisons with current snow-cover products show that the MDSCP draws together the advantages of each of its component products while minimizing their potential errors. Improved estimates of the snow-covered area are derived through the addition of two snow-cover classes ("thin or patchy" and "high elevation" snow cover) and from the analysis of the climate station data within each class. The compatibility of this method for use with Moderate Resolution Imaging Spectroradiometer (MODIS) data, which will be available in 2000, is also discussed. With the assimilation of these data, the resolution of the MDSCP would be improved both spatially and temporally and the analysis would become completely automated.
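    The combination logic can be sketched as a simple priority rule: trust the optical classification where the sky is clear, fall back to passive microwave under cloud, and use station depths to assign the added "thin or patchy" class. The codes and threshold below are invented for illustration, not the MDSCP rules.

```python
# Toy priority-rule merge of optical, passive-microwave, and station data.
import numpy as np

optical = np.array([1, 1, -1, 0, -1])    # 1 snow, 0 no snow, -1 cloud-obscured
microwave = np.array([1, 0, 1, 0, 0])    # 1 snow, 0 no snow (cloud-immune)
station_depth_cm = np.array([12, 2, 8, 0, 1])

merged = np.where(optical >= 0, optical, microwave)   # fill cloud gaps with PM
thin = (merged == 0) & (station_depth_cm > 0) & (station_depth_cm < 3)
merged = np.where(thin, 2, merged)                    # 2 = "thin or patchy"
print(merged)
```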

  12. Analyzing large datasets with bootstrap penalization.

    Science.gov (United States)

    Fang, Kuangnan; Ma, Shuangge

    2017-03-01

    Data with a large p (number of covariates) and/or a large n (sample size) are now commonly encountered. For many problems, regularization especially penalization is adopted for estimation and variable selection. The straightforward application of penalization to large datasets demands a "big computer" with high computational power. To improve computational feasibility, we develop bootstrap penalization, which dissects a big penalized estimation into a set of small ones, which can be executed in a highly parallel manner and each only demands a "small computer". The proposed approach takes different strategies for data with different characteristics. For data with a large p but a small to moderate n, covariates are first clustered into relatively homogeneous blocks. The proposed approach consists of two sequential steps. In each step and for each bootstrap sample, we select blocks of covariates and run penalization. The results from multiple bootstrap samples are pooled to generate the final estimate. For data with a large n but a small to moderate p, we bootstrap a small number of subjects, apply penalized estimation, and then conduct a weighted average over multiple bootstrap samples. For data with a large p and a large n, the natural marriage of the previous two methods is applied. Numerical studies, including simulations and data analysis, show that the proposed approach has computational and numerical advantages over the straightforward application of penalization. An R package has been developed to implement the proposed methods. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
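    A minimal sketch of the large-n variant described above: fit a lasso on many small bootstrap subsamples, each cheap enough for a "small computer", and average the coefficient estimates. Equal weights are assumed here; the paper's weighting and tuning details are not reproduced.

```python
# Bootstrap-penalization sketch for large n: many small lasso fits, averaged.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 100_000, 50                        # large n, moderate p
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]    # true sparse signal
y = X @ beta + rng.normal(size=n)

B, m = 20, 2_000                          # 20 bootstrap subsets of 2,000 rows
coefs = np.zeros((B, p))
for b in range(B):                        # each fit is small and parallelizable
    idx = rng.choice(n, size=m, replace=True)
    coefs[b] = Lasso(alpha=0.05).fit(X[idx], y[idx]).coef_

beta_hat = coefs.mean(axis=0)             # simple (equal-weight) average
print("recovered nonzeros:", np.flatnonzero(np.abs(beta_hat) > 0.1))
```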

  13. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However this approach does not allow an assessment of the force multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments, it is expensive, both in terms of time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and re-sampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.
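    The two stronger procedures compared in the paper are easy to run side by side. The sketch below applies the Mann-Whitney U test and a simple permutation (resampling) test to two small synthetic samples standing in for war-game outcomes.

```python
# Mann-Whitney U test vs. a permutation test on two small synthetic samples.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)
a = rng.normal(10.0, 2.0, size=8)   # baseline force, 8 game runs
b = rng.normal(11.5, 2.0, size=8)   # enhanced force, 8 game runs

u_p = mannwhitneyu(a, b, alternative="two-sided").pvalue

obs = b.mean() - a.mean()
pooled = np.concatenate([a, b])
diffs = []
for _ in range(10_000):             # re-sample under the null of no difference
    s = rng.permutation(pooled)
    diffs.append(s[8:].mean() - s[:8].mean())
perm_p = np.mean(np.abs(np.array(diffs)) >= abs(obs))
print(f"Mann-Whitney p = {u_p:.3f}, permutation p = {perm_p:.3f}")
```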

  14. The French Muséum national d'histoire naturelle vascular plant herbarium collection dataset

    Science.gov (United States)

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-02-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d'histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments.

  15. The French Muséum national d’histoire naturelle vascular plant herbarium collection dataset

    Science.gov (United States)

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-01-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d’histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments. PMID:28195585

  16. Refugees welcome? A dataset on anti-refugee violence in Germany

    Directory of Open Access Journals (Sweden)

    David Benček

    2016-11-01

Full Text Available The recent rise of xenophobic attacks against refugees in Germany has sparked both political and scholarly debates on the drivers, dynamics, and consequences of right-wing violence. Thus far, a lack of systematic data collection and data processing has inhibited quantitative analysis to help explain this current social phenomenon. This paper presents a georeferenced event dataset on anti-refugee violence and social unrest in Germany in 2014 and 2015 that is based on information collected by two civil society organizations, the Amadeu Antonio Foundation and PRO ASYL, who publicize their data in an online chronicle. We webscraped this information to create a scientifically usable dataset that includes information on 1,645 events of four different types of right-wing violence and social unrest: xenophobic demonstrations, assault, arson attacks, and miscellaneous attacks against refugee housing (such as swastika graffiti). After discussing how the dataset was constructed, we offer a descriptive analysis of patterns of right-wing violence and unrest in Germany in 2014 and 2015. This article concludes by outlining preliminary ideas on how the dataset can be used in future research of various disciplines in the social sciences.

  17. A Manual Segmentation Tool for Three-Dimensional Neuron Datasets

    Directory of Open Access Journals (Sweden)

    Chiara Magliaro

    2017-05-01

Full Text Available To date, automated or semi-automated software and algorithms for segmentation of neurons from three-dimensional imaging datasets have had limited success. The gold standard for neural segmentation is considered to be the manual isolation performed by an expert. To facilitate the manual isolation of complex objects from image stacks, such as neurons in their native arrangement within the brain, a new Manual Segmentation Tool (ManSegTool) has been developed. ManSegTool allows users to load an image stack, scroll through the images and manually draw the structures of interest stack-by-stack. Users can eliminate unwanted regions or split structures (i.e., branches from different neurons that are too close to each other but, to the experienced eye, clearly belong to a unique cell), view the object in 3D and save the results obtained. The tool can be used for testing the performance of a single-neuron segmentation algorithm or to extract complex objects, where the available automated methods still fail. Here we describe the software's main features and then show an example of how ManSegTool can be used to segment neuron images acquired using a confocal microscope. In particular, expert neuroscientists were asked to segment different neurons from which morphometric variables were subsequently extracted as a benchmark for precision. In addition, a literature-defined index for evaluating the goodness of segmentation was used as a benchmark for accuracy. Neocortical layer axons from a DIADEM challenge dataset were also segmented with ManSegTool and compared with the manual “gold-standard” generated for the competition.

  18. Integrating diverse datasets improves developmental enhancer prediction.

    Directory of Open Access Journals (Sweden)

    Genevieve D Erwin

    2014-06-01

    Full Text Available Gene-regulatory enhancers have been identified using various approaches, including evolutionary conservation, regulatory protein binding, chromatin modifications, and DNA sequence motifs. To integrate these different approaches, we developed EnhancerFinder, a two-step method for distinguishing developmental enhancers from the genomic background and then predicting their tissue specificity. EnhancerFinder uses a multiple kernel learning approach to integrate DNA sequence motifs, evolutionary patterns, and diverse functional genomics datasets from a variety of cell types. In contrast with prediction approaches that define enhancers based on histone marks or p300 sites from a single cell line, we trained EnhancerFinder on hundreds of experimentally verified human developmental enhancers from the VISTA Enhancer Browser. We comprehensively evaluated EnhancerFinder using cross validation and found that our integrative method improves the identification of enhancers over approaches that consider a single type of data, such as sequence motifs, evolutionary conservation, or the binding of enhancer-associated proteins. We find that VISTA enhancers active in embryonic heart are easier to identify than enhancers active in several other embryonic tissues, likely due to their uniquely high GC content. We applied EnhancerFinder to the entire human genome and predicted 84,301 developmental enhancers and their tissue specificity. These predictions provide specific functional annotations for large amounts of human non-coding DNA, and are significantly enriched near genes with annotated roles in their predicted tissues and lead SNPs from genome-wide association studies. We demonstrate the utility of EnhancerFinder predictions through in vivo validation of novel embryonic gene regulatory enhancers from three developmental transcription factor loci. Our genome-wide developmental enhancer predictions are freely available as a UCSC Genome Browser track, which we hope
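
    The integration step can be pictured with a much simpler stand-in for multiple kernel learning: compute one kernel per data type and combine them with weights before training a kernel classifier. The sketch below (Python/scikit-learn) uses fixed weights and random stand-in features; a true MKL method learns the weights, and none of the names here come from EnhancerFinder itself.

        import numpy as np
        from sklearn.metrics.pairwise import rbf_kernel
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        X_motif = rng.normal(size=(100, 20))  # stand-in sequence-motif features
        X_cons = rng.normal(size=(100, 5))    # stand-in conservation features
        y = rng.integers(0, 2, size=100)      # enhancer vs. background labels

        # Combine per-data-type kernels with fixed weights (MKL would learn these)
        K = 0.6 * rbf_kernel(X_motif) + 0.4 * rbf_kernel(X_cons)
        clf = SVC(kernel="precomputed").fit(K, y)
        print(clf.score(K, y))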

  19. The need for a national LIDAR dataset

    Science.gov (United States)

    Stoker, Jason M.; Harding, David; Parrish, Jay

    2008-01-01

    On May 21st and 22nd 2008, the U.S. Geological Survey (USGS), the National Aeronautics and Space Administration (NASA), and the Association of American State Geologists (AASG) hosted the Second National Light Detection and Ranging (Lidar) Initiative Strategy Meeting at USGS Headquarters in Reston, Virginia. The USGS is taking the lead in cooperation with many partners to design and implement a future high-resolution National Lidar Dataset. Initial work is focused on determining viability, developing requirements and specifications, establishing what types of information contained in a lidar signal are most important, and identifying key stakeholders and their respective roles. In February 2007, USGS hosted the first National Lidar Initiative Strategy Meeting at USGS Headquarters in Virginia. The presentations and a published summary report from the first meeting can be found on the Center for Lidar Information Coordination and Knowledge (CLICK) Website: http://lidar.cr.usgs.gov. The first meeting demonstrated the public need for consistent lidar data at the national scale. The goals of the second meeting were to further expand on the ideas and information developed in the first meeting, to bring more stakeholders together, to both refine and expand on the requirements and capabilities needed, and to discuss an organizational and funding approach for an initiative of this magnitude. The approximately 200 participants represented Federal, State, local, commercial and academic interests. The second meeting included a public solicitation for presentations and posters to better democratize the workshop. All of the oral presentation abstracts that were submitted were accepted, and the 25 poster submissions augmented and expanded upon the oral presentations. The presentations from this second meeting, including audio, can be found on CLICK at http://lidar.cr.usgs.gov/national_lidar_2008.php. Based on the presentations and the discussion sessions, the following

  20. REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations

    Science.gov (United States)

    Moulik, P.; Lekic, V.; Romanowicz, B. A.

    2017-12-01

    A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history

  1. Evaluation of Methods for Comparison of Spatiotemporal and Time Series Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Getman, Dan [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Bush, Brian [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Inman, Danny [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Elmore, Ryan [National Renewable Energy Laboratory (NREL), Golden, CO (United States)

    2015-04-07

    Data used by the National Renewable Energy Laboratory (NREL) in energy analysis are often produced by industry and licensed or purchased for analysis. While this practice provides needed flexibility in selecting data for analysis, it presents challenges in understanding the differences among multiple, ostensibly similar, datasets. As options for source data become more varied, it is important to be able to articulate why certain datasets were chosen and to ensure those include the data that best meet the boundaries and/or limitations of a particular analysis. This report represents the first of three phases of research intended to develop methods to quantitatively assess and compare both input datasets and the results of analyses performed at NREL. This capability is critical to identifying tipping points in the costs or benefits of achieving high spatial and temporal resolution of input data.

  2. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes. (Author)

  3. The WiLI benchmark dataset for written language identification

    OpenAIRE

    Thoma, Martin

    2018-01-01

    This paper describes the WiLI-2018 benchmark dataset for monolingual written natural language identification. WiLI-2018 is a publicly available, free-of-charge dataset of short text extracts from Wikipedia. It contains 1,000 paragraphs for each of 235 languages, totaling 235,000 paragraphs. WiLI is a classification dataset: given an unknown paragraph written in one dominant language, the task is to decide which language it is.
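
    A minimal baseline for this task is a character n-gram classifier. The sketch below (Python/scikit-learn) trains one on a four-sentence toy corpus; it only illustrates the task setup, not the dataset's published baselines.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Toy corpus standing in for WiLI paragraphs (one per language here)
        texts = ["the quick brown fox", "der schnelle braune fuchs",
                 "le renard brun rapide", "el zorro marrón rápido"]
        labels = ["eng", "deu", "fra", "spa"]

        clf = make_pipeline(
            TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
            LogisticRegression(max_iter=1000),
        )
        clf.fit(texts, labels)
        print(clf.predict(["ein brauner fuchs"]))  # 'deu' expected on toy data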

  4. Dataset definition for CMS operations and physics analyses

    OpenAIRE

    Franzoni, Giovanni

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to explo...

  5. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back at the client, and construct professional-quality visualizations of the results.

  6. BIA Indian Lands Dataset (Indian Lands of the United States)

    Data.gov (United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  7. Socioeconomic Data and Applications Center (SEDAC) Treaty Status Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Socioeconomic Data and Application Center (SEDAC) Treaty Status Dataset contains comprehensive treaty information for multilateral environmental agreements,...

  8. ASTER Global Emissivity Dataset, 100 meter, HDF5 V003

    Data.gov (United States)

    National Aeronautics and Space Administration — Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Emissivity Dataset (GED) land surface temperature and emissivity (LST&E) data...

  9. ASTER Global Emissivity Dataset, 1 kilometer, HDF5 V003

    Data.gov (United States)

    National Aeronautics and Space Administration — Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Global Emissivity Dataset (GED) land surface temperature and emissivity (LST&E) data...

  10. A method for generating large datasets of organ geometries for radiotherapy treatment planning studies

    International Nuclear Information System (INIS)

    Hu, Nan; Cerviño, Laura; Segars, Paul; Lewis, John; Shan, Jinlu; Jiang, Steve; Zheng, Xiaolin; Wang, Ge

    2014-01-01

    With the rapidly increasing application of adaptive radiotherapy, large datasets of organ geometries based on the patient’s anatomy are desired to support clinical application or research work, such as image segmentation, re-planning, and organ deformation analysis. Sometimes only limited datasets are available in clinical practice. In this study, we propose a new method to generate large datasets of organ geometries to be utilized in adaptive radiotherapy. Given a training dataset of organ shapes derived from daily cone-beam CT, we align them into a common coordinate frame and select one of the training surfaces as reference surface. A statistical shape model of organs was constructed, based on the establishment of point correspondence between surfaces and non-uniform rational B-spline (NURBS) representation. A principal component analysis is performed on the sampled surface points to capture the major variation modes of each organ. A set of principal components and their respective coefficients, which represent organ surface deformation, were obtained, and a statistical analysis of the coefficients was performed. New sets of statistically equivalent coefficients can be constructed and assigned to the principal components, resulting in a larger geometry dataset for the patient’s organs. These generated organ geometries are realistic and statistically representative
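
    The core generation step, PCA on corresponding surface points followed by sampling of statistically equivalent coefficients, can be sketched in a few lines. In the sketch below (Python/NumPy) the training matrix is random and purely illustrative; real inputs would be aligned, NURBS-sampled organ surfaces.

        import numpy as np

        rng = np.random.default_rng(1)
        n_shapes, n_points = 20, 300          # 100 3-D points flattened to 300 values
        X = rng.normal(size=(n_shapes, n_points))

        mean = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
        coeffs = U * s                         # per-shape PCA coefficients
        std = coeffs.std(axis=0)

        # Draw statistically equivalent coefficients and reconstruct a new shape
        new_coeffs = rng.normal(scale=std)
        new_shape = mean + new_coeffs @ Vt
        print(new_shape.shape)                 # (300,) -> reshape to (100, 3) points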

  11. Self-reported juvenile firesetting: Results from two national survey datasets

    Directory of Open Access Journals (Sweden)

    Carrie Howell Bowling

    2013-12-01

    Full Text Available The main purpose of this study was to address gaps in existing research by examining the relationship between academic performance and attention problems with juvenile firesetting. Two datasets from the Achenbach System for Empirically Based Assessment (ASEBA) were used. The Factor Analysis Dataset (N = 975) was utilized and results indicated that adolescents who report lower academic performance are more likely to set fires. Additionally, adolescents who report a poor attitude toward school are even more likely to set fires. Results also indicated that attention problems are predictive of self-reported firesetting. The National Survey Dataset (N = 1,158) was analyzed to determine the prevalence of firesetting in a normative sample and also examine whether these children reported higher levels of internalizing and externalizing behavior problems. It was found that 4.5% of adolescents in the generalized sample reported firesetting. Firesetters reported more internalizing, externalizing and total problems than their non-firesetting peers. In this normative sample, firesetters were found to have lower academic performance and more attention problems. Limitations include the low overall number of firesetters in each dataset (Factor Analysis n = 123 and National Survey n = 53) and the inclusion of children who had been referred for services in the Factor Analysis Dataset.

  12. An innovative privacy preserving technique for incremental datasets on cloud computing.

    Science.gov (United States)

    Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Aljeroudi, Yazan

    2016-08-01

    Cloud computing (CC) is a service-based delivery model offering large-scale processing power and data storage across connected communication channels. It has given strong technological impetus to the web-mediated IT industry, where users can easily share private data for further analysis and mining, and its user-friendly services enable diverse applications to be deployed economically. At the same time, simple data sharing has invited phishing attacks and malware-assisted security threats. Privacy-sensitive applications such as cloud-based health services, built for their economic and operational benefits, require enhanced security; robust cyberspace security and mitigation against phishing attacks are therefore mandatory to protect overall data privacy. Typically, datasets from diverse applications are anonymized to give their owners better privacy, but without providing all secrecy requirements for newly added records. Some proposed techniques address this issue by re-anonymizing the datasets from scratch, yet full privacy protection over incremental datasets on CC remains far from achieved. In particular, the distribution of huge dataset volumes across multiple storage nodes limits privacy preservation. In this view, we propose a new anonymization technique to attain better privacy protection with high data utility over distributed and incremental datasets on CC. The proficiency of data privacy preservation and improved confidentiality requirements is demonstrated through performance evaluation. Copyright © 2016 Elsevier Inc. All rights reserved.

  13. Self-Reported Juvenile Firesetting: Results from Two National Survey Datasets

    Science.gov (United States)

    Howell Bowling, Carrie; Merrick, Joav; Omar, Hatim A.

    2013-01-01

    The main purpose of this study was to address gaps in existing research by examining the relationship between academic performance and attention problems with juvenile firesetting. Two datasets from the Achenbach System for Empirically Based Assessment (ASEBA) were used. The Factor Analysis Dataset (N = 975) was utilized and results indicated that adolescents who report lower academic performance are more likely to set fires. Additionally, adolescents who report a poor attitude toward school are even more likely to set fires. Results also indicated that attention problems are predictive of self-reported firesetting. The National Survey Dataset (N = 1158) was analyzed to determine the prevalence of firesetting in a normative sample and also examine whether these children reported higher levels of internalizing and externalizing behavior problems. It was found that 4.5% of adolescents in the generalized sample reported firesetting. Firesetters reported more internalizing, externalizing, and total problems than their non-firesetting peers. In this normative sample, firesetters were found to have lower academic performance and more attention problems. Limitations include the low overall number of firesetters in each dataset (Factor Analysis n = 123 and National Survey n = 53) and the inclusion of children who had been referred for services in the Factor Analysis Dataset. PMID:24350229

  14. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  15. Automatic identification of variables in epidemiological datasets using logic regression

    NARCIS (Netherlands)

    M.W. Lorenz (Matthias W.); Abdi, N.A. (Negin Ashtiani); F. Scheckenbach (Frank); A. Pflug (Anja); A. Bulbul (Alpaslan); A.L. Catapano (Alberico); S. Agewall (Stefan); M. Ezhov (Marat); M.L. Bots (Michiel); S. Kiechl (Stefan); Orth, A. (Andreas); G.D. Norata (Giuseppe); J.P. Empana (Jean Philippe); Lin, H.-J. (Hung-Ju); S. McLachlan (Stela); L. Bokemark (Lena); K. Ronkainen (Kimmo); Amato, M. (Mauro); U. Schminke (Ulf); Srinivasan, S.R. (Sathanur R.); L. Lind (Lars); Kato, A. (Akihiko); Dimitriadis, C. (Chrystosomos); Przewlocki, T. (Tadeusz); Okazaki, S. (Shuhei); C.D. Stehouwer (Coen); Lazarevic, T. (Tatjana); J. Willeit (Johann); Yanez, D.N. (David N.); H. Steinmetz (helmuth); Sander, D. (Dirk); H. Poppert (Holger); M. Desvarieux (Moise); M.A. Ikram (Arfan); Bevc, S. (Sebastjan); Staub, D. (Daniel); Sirtori, C.R. (Cesare R.); B. Iglseder (Bernhard); G. Engström; G.L. Tripepi (Giovanni); Beloqui, O. (Oscar); Lee, M.-S. (Moo-Sik); A. Friera (Alfonsa); W. Xie (Wuxiang); L. Grigore (Liliana); M. Plichart (Matthieu); Su, T.-C. (Ta-Chen); C.M. Robertson (Christine M); C. Schmidt (Caroline); Tuomainen, T.-P. (Tomi-Pekka); F. Veglia (Fabrizio); H. Völzke (Henry); M.G.A.A.M. Nijpels (Giel); Jovanovic, A. (Aleksandar); J. Willeit (Johann); Sacco, R.L. (Ralph L.); O.H. Franco (Oscar); Hojs, R. (Radovan); Uthoff, H. (Heiko); B. Hedblad (Bo); Park, H.W. (Hyun Woong); Suarez, C. (Carmen); Zhao, D. (Dong); Catapano, A. (Alberico); P. Ducimetiere (P.); Chien, K.-L. (Kuo-Liong); Price, J.F. (Jackie F.); G. Bergstrom (Goran); J. Kauhanen (Jussi); E. Tremoli (Elena); M. Dörr (Marcus); Berenson, G. (Gerald); A. Papagianni (Aikaterini); Kablak-Ziembicka, A. (Anna); Kitagawa, K. (Kazuo); J.M. Dekker (Jacqueline); Stolic, R. (Radojica); J.F. Polak (Joseph F.); M. Sitzer (Matthias); H. Bickel (Horst); T. Rundek (Tatjana); A. Hofman (Albert); Ekart, R. (Robert); Frauchiger, B. (Beat); Castelnuovo, S. (Samuela); M. Rosvall (Maria); C. Zoccali (Carmine); Landecho, M.F. (Manuel F.); Bae, J.-H. (Jang-Ho); Gabriel, R. (Rafael); Liu, J. (Jing); D. Baldassarre (Damiano); M. Kavousi (Maryam)

    2017-01-01

    Background: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed in a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or

  16. Automatic identification of variables in epidemiological datasets using logic regression

    NARCIS (Netherlands)

    Lorenz, Matthias W.; Abdi, Negin Ashtiani; Scheckenbach, Frank; Pflug, Anja; Bülbül, Alpaslan; Catapano, Alberico L.; Agewall, Stefan; Ezhov, Marat; Bots, Michiel L.; Kiechl, Stefan; Orth, Andreas; Norata, Giuseppe D.; Empana, Jean Philippe; Lin, Hung Ju; McLachlan, Stela; Bokemark, Lena; Ronkainen, Kimmo; Amato, Mauro; Schminke, Ulf; Srinivasan, Sathanur R.; Lind, Lars; Kato, Akihiko; Dimitriadis, Chrystosomos; Przewlocki, Tadeusz; Okazaki, Shuhei; Stehouwer, C. D.A.; Lazarevic, Tatjana; Willeit, Peter; Yanez, David N.; Steinmetz, Helmuth; Sander, Dirk; Poppert, Holger; Desvarieux, Moise; Ikram, M. Arfan; Bevc, Sebastjan; Staub, Daniel; Sirtori, Cesare R.; Iglseder, Bernhard; Engström, Gunnar; Tripepi, Giovanni; Beloqui, Oscar; Lee, Moo Sik; Friera, Alfonsa; Xie, Wuxiang; Grigore, Liliana; Plichart, Matthieu; Su, Ta Chen; Robertson, Christine; Schmidt, Caroline; Tuomainen, Tomi Pekka; Veglia, Fabrizio; Völzke, Henry; Nijpels, Giel; Jovanovic, Aleksandar; Willeit, Johann; Sacco, Ralph L.; Franco, Oscar H.; Hojs, Radovan; Uthoff, Heiko; Hedblad, Bo; Park, Hyun Woong; Suarez, Carmen; Zhao, Dong; Catapano, Alberico; Ducimetiere, Pierre; Chien, Kuo Liong; Price, Jackie F.; Bergström, Göran; Kauhanen, Jussi; Tremoli, Elena; Dörr, Marcus; Berenson, Gerald; Papagianni, Aikaterini; Kablak-Ziembicka, Anna; Kitagawa, Kazuo; Dekker, Jaqueline M.; Stolic, Radojica; Polak, Joseph F.; Sitzer, Matthias; Bickel, Horst; Rundek, Tatjana; Hofman, Albert; Ekart, Robert; Frauchiger, Beat; Castelnuovo, Samuela; Rosvall, Maria; Zoccali, Carmine; Landecho, Manuel F.; Bae, Jang Ho; Gabriel, Rafael; Liu, Jing; Baldassarre, Damiano; Kavousi, Maryam

    2017-01-01

    Background: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed in a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated

  17. Automatic identification of variables in epidemiological datasets using logic regression

    NARCIS (Netherlands)

    M.W. Lorenz (Matthias W.); N.A. Abdi (Negin Ashtiani); F. Scheckenbach (Frank); A. Pflug (Anja); A. Bulbul (Alpaslan); A.L. Catapano (Alberico L.); S. Agewall (Stefan); M. Ezhov (Marat); M.L. Bots (Michiel); S. Kiechl (Stefan); A. Orth (Andreas)

    2017-01-01

    Background: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed in a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or

  18. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, has never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  19. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

    This document describes a general purpose multimedia data-set to be used in cross-media machine learning problems. In more detail we describe the genre taxonomy applied at http://www.garageband.com, from where the data-set was collected, and how the taxonomy has been fused into a more human...

  20. Global Drought Assessment using a Multi-Model Dataset

    NARCIS (Netherlands)

    Lanen, van H.A.J.; Huijgevoort, van M.H.J.; Corzo Perez, G.; Wanders, N.; Hazenberg, P.; Loon, van A.F.; Estifanos, S.; Melsen, L.A.

    2011-01-01

    Large-scale models are often applied to study past drought (forced with global reanalysis datasets) and to assess future drought (using downscaled, bias-corrected forcing from climate models). The EU project WATer and global CHange (WATCH) provides a 0.5o degree global dataset of meteorological

  1. A New Outlier Detection Method for Multidimensional Datasets

    KAUST Repository

    Abdel Messih, Mario A.

    2012-07-01

    This study develops a novel hybrid method for outlier detection (HMOD) that combines the ideas of distance-based and density-based methods. The proposed method has two main advantages over most other outlier detection methods. The first is that it works well on both dense and sparse datasets. The second is that, unlike most other outlier detection methods, which require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain ranges; the only required parameter is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD, which makes it very efficient in applications. Moreover, we proposed a new way of using outlier detection for redundancy reduction in datasets, in which users can specify a confidence level evaluating how accurately the less redundant dataset represents the original one. HMOD is evaluated on synthetic datasets (dense and mixed “dense and sparse”) and on a bioinformatics problem: redundancy reduction of a dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level with which the reduced dataset represents the original. The evaluation of the results shows that our method can be used in a wide range of problems.
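
    A simplified distance-based stand-in for the idea, not the paper's exact hybrid algorithm, is to score each point by its mean distance to its k nearest neighbors, with k as the single parameter. A sketch in Python/scikit-learn:

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        rng = np.random.default_rng(2)
        X = np.vstack([rng.normal(size=(200, 2)),          # dense cluster
                       rng.normal(5, 3, size=(20, 2))])    # sparse points

        k = 10                                             # the single parameter
        dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
        score = dists[:, 1:].mean(axis=1)                  # skip self-distance
        outliers = np.argsort(score)[-5:]                  # top-5 most isolated
        print(outliers)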

  2. An Annotated Dataset of 14 Cardiac MR Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  3. Interface between astrophysical datasets and distributed database management systems (DAVID)

    Science.gov (United States)

    Iyengar, S. S.

    1988-01-01

    This is a status report on the progress of the DAVID (Distributed Access View Integrated Database Management System) project being carried out at Louisiana State University, Baton Rouge, Louisiana. The objective is to implement an interface between Astrophysical datasets and DAVID. Discussed are design details and implementation specifics between DAVID and astrophysical datasets.

  4. Primary Datasets for Case Studies of River-Water Quality

    Science.gov (United States)

    Goulder, Raymond

    2008-01-01

    Level 6 (final-year BSc) students undertook case studies on between-site and temporal variation in river-water quality. They used professionally-collected datasets supplied by the Environment Agency. The exercise gave students the experience of working with large, real-world datasets and led to their understanding of how the quality of river water is…

  5. Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center

    Science.gov (United States)

    Moroni, D. F.; Quach, N.; Francis-Curley, W.

    2016-12-01

    As the landscape of data becomes increasingly more diverse in the domain of Earth Science, the challenges of managing and preserving data become more onerous and complex, particularly for data centers on fixed budgets and limited staff. Many solutions already exist to ease the cost burden for the downstream component of the data lifecycle, yet most archive centers are still racing to keep up with the influx of new data that still needs to find a quasi-permanent resting place. For instance, having well-defined metadata that is consistent across the entire data landscape provides for well-managed and preserved datasets throughout the latter end of the data lifecycle. Translators between different metadata dialects are already in operational use, and facilitate keeping older datasets relevant in today's world of rapidly evolving metadata standards. However, very little is done to address the first phase of the lifecycle, which deals with the entry of both data and the corresponding metadata into a system that is traditionally opaque and closed off to external data producers, thus resulting in a significant bottleneck to the dataset submission process. The ATRAC system was the NOAA NCEI's answer to this previously obfuscated barrier to scientists wishing to find a home for their climate data records, providing a web-based entry point to submit timely and accurate metadata and information about a very specific dataset. A couple of NASA's Distributed Active Archive Centers (DAACs) have implemented their own versions of a web-based dataset and metadata submission form including the ASDC and the ORNL DAAC. The Physical Oceanography DAAC is the most recent in the list of NASA-operated DAACs who have begun to offer their own web-based dataset and metadata submission services to data producers. What makes the PO.DAAC dataset and metadata submission service stand out from these pre-existing services is the option of utilizing both a web browser GUI and a RESTful API to
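
    The browser-plus-API pattern described here can be pictured with a generic HTTP sketch (Python/requests). Everything below, including the endpoint URL, the metadata fields, and the token, is invented for illustration; the actual PO.DAAC submission interface is documented by the DAAC itself.

        import requests

        # Hypothetical metadata payload for a dataset submission
        metadata = {
            "title": "Example SST granule collection",
            "short_name": "EXAMPLE_SST_L2",
            "contact": "submitter@example.org",
        }
        resp = requests.post(
            "https://podaac.example.org/api/submissions",  # hypothetical endpoint
            json=metadata,
            headers={"Authorization": "Bearer <token>"},   # placeholder credential
            timeout=30,
        )
        print(resp.status_code)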

  6. A dataset on tail risk of commodities markets.

    Science.gov (United States)

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

    This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets" (Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices including World, Asia, and South East Asia for the period 2004-2015. Then, the dataset is divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient for examining the tail risk of different commodities as measured by Conditional Value at Risk (CVaR), as well as their changes over periods. The datasets can also be used to investigate the association between commodity markets and share markets.
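
    CVaR at the 5% level is simply the mean of the worst 5% of daily returns, i.e. the mean of returns at or below the 5% Value at Risk quantile. A minimal sketch (Python/NumPy) on simulated returns, not the actual dataset:

        import numpy as np

        rng = np.random.default_rng(3)
        returns = rng.normal(0.0, 0.02, size=1000)   # stand-in daily price movements

        alpha = 0.05
        var = np.quantile(returns, alpha)            # 5% Value at Risk threshold
        cvar = returns[returns <= var].mean()        # mean of the worst 5% of days
        print(f"VaR={var:.4f}, CVaR={cvar:.4f}")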

  7. ATLAS File and Dataset Metadata Collection and Use

    CERN Document Server

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment including software release management, tracking of reconstructed event sizes and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example the total number of events contained in a dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  8. The effects of spatial population dataset choice on estimates of population at risk of disease

    Directory of Open Access Journals (Sweden)

    Gething Peter W

    2011-02-01

    Full Text Available Abstract Background The spatial modeling of infectious disease distributions and dynamics is increasingly being undertaken for health services planning and disease control monitoring, implementation, and evaluation. Where risks are heterogeneous in space or dependent on person-to-person transmission, spatial data on human population distributions are required to estimate infectious disease risks, burdens, and dynamics. Several different modeled human population distribution datasets are available and widely used, but the disparities among them and the implications for enumerating disease burdens and populations at risk have not been considered systematically. Here, we quantify some of these effects using global estimates of populations at risk (PAR) of P. falciparum malaria as an example. Methods The recent construction of a global map of P. falciparum malaria endemicity enabled the testing of different gridded population datasets for providing estimates of PAR by endemicity class. The estimated population numbers within each class were calculated for each country using four different global gridded human population datasets: GRUMP (~1 km spatial resolution), LandScan (~1 km), UNEP Global Population Databases (~5 km), and GPW3 (~5 km). More detailed assessments of PAR variation and accuracy were conducted for three African countries where census data were available at a higher administrative-unit level than used by any of the four gridded population datasets. Results The estimates of PAR based on the datasets varied by more than 10 million people for some countries, even accounting for the fact that estimates of population totals made by different agencies are used to correct national totals in these datasets and can vary by more than 5% for many low-income countries. In many cases, these variations in PAR estimates comprised more than 10% of the total national population. The detailed country-level assessments suggested that none of the datasets was
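
    The PAR calculation itself is a raster overlay: sum the population surface within each endemicity class. A toy sketch (Python/NumPy) with random stand-ins for grids such as GRUMP or LandScan:

        import numpy as np

        rng = np.random.default_rng(4)
        population = rng.integers(0, 1000, size=(100, 100)).astype(float)
        endemicity = rng.integers(0, 3, size=(100, 100))   # 0=free, 1=hypo, 2=meso

        # Population at risk per endemicity class (classes 1 and 2 here)
        par = {cls: population[endemicity == cls].sum() for cls in (1, 2)}
        print(par)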

  9. Synthetic Dataset To Benchmark Global Tomographic Methods

    Science.gov (United States)

    Qin, Yilong; Capdeville, Yann; Maupin, Valerie; Montagner, Jean-Paul

    2006-11-01

    A new set of global synthetic seismograms calculated in a three-dimensional (3-D), heterogeneous, anisotropic, anelastic model of the Earth using the spectral element method has been released by the European network SPICE (Seismic Wave Propagation and Imaging in Complex Media: a European Network). The set consists of 7424 three-component records with a minimum period of 32 seconds, a sampling rate of one second, and a duration of 10,500 seconds. The aim of this synthetic data set is to conduct a blind test of existing global tomographic methods based on long-period data, in order to test how current imaging techniques are limited by approximations in theory and by the inadequacy of data quality and coverage.

  10. HLA diversity in the 1000 genomes dataset.

    Directory of Open Access Journals (Sweden)

    Pierre-Antoine Gourraud

    Full Text Available The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic diversity or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies.

  11. The effect of oral appliances that advanced the mandible forward and limited mouth opening in patients with obstructive sleep apnea: a systematic review and meta-analysis of randomised controlled trials.

    Science.gov (United States)

    Okuno, K; Sato, K; Arisaka, T; Hosohama, K; Gotoh, M; Taga, H; Sasao, Y; Hamada, S

    2014-07-01

    Oral appliances (OAs) have demonstrated efficacy in treating obstructive sleep apnea (OSA), but many different OA devices are available. The Japanese Academy of Dental Sleep Medicine supported the use of OAs that advanced the mandible forward and limited mouth opening, and suggested an evaluation of their effects in comparison with no treatment or with CPAP. A systematic search was undertaken on 16 April 2012. The outcome measures of interest were as follows: Apnea Hypopnea Index (AHI), lowest SpO2, arousal index, Epworth Sleepiness Scale (ESS), and the SF-36 Health Survey. We performed this meta-analysis using the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) system. Five studies remained eligible after applying the exclusion criteria. Comparing OA and control appliance, OA significantly reduced the weighted mean difference (WMD) in both AHI and the arousal index (favouring OA, AHI: -7.05 events/h; 95% CI, -12.07 to -2.03; P = 0.006; arousal index: -6.95 events/h; 95% CI, -11.75 to -2.15; P = 0.005). OAs were significantly less effective than CPAP at reducing the WMD in AHI and improving lowest SpO2 and SF-36 (favouring CPAP, AHI: 6.11 events/h; 95% CI, 3.24 to 8.98; P = 0.0001; lowest SpO2: -2.52%; 95% CI, -4.81 to -0.23; P = 0.03; SF-36: -1.80; 95% CI, -3.17 to -0.42; P = 0.01). The Apnea Hypopnea Index and arousal index were significantly improved by OA relative to the untreated disease. The Apnea Hypopnea Index, lowest SpO2 and SF-36 were significantly better with CPAP than with OA. The results of this study suggest that OAs improve OSA compared with no treatment. CPAP appears to be more effective in improving OSA than OAs. © 2014 John Wiley & Sons Ltd.

  12. Systematic review and meta-analysis of randomized controlled trials for the management of limited vertical height in the posterior region: short implants (5 to 8 mm) vs longer implants (> 8 mm) in vertically augmented sites.

    Science.gov (United States)

    Lee, Sung-Ah; Lee, Chun-Teh; Fu, Martin M; Elmisalati, Waiel; Chuang, Sung-Kiang

    2014-01-01

    The aim of this study was to undertake a systematic review with meta-analysis on randomized controlled trials (RCTs) to compare the rates of survival, success, and complications of short implants to those of longer implants in the posterior regions. Electronic literature searches were conducted through the MEDLINE (PubMed) and EMBASE databases to locate all relevant articles published between January 1, 1990, and April 30, 2013. Eligible studies were selected based on inclusion criteria, and quality assessments were conducted. After data extraction, meta-analyses were performed. In total, 539 dental implants (265 short implants [length 5 to 8 mm] and 274 control implants [length > 8 mm]) from four RCTs were included. The fixed prostheses of multiple short and control implants were all splinted. The mean follow-up period was 2.1 years. The 1-year and 5-year cumulative survival rates (CSR) were 98.7% (95% confidence interval [CI], 97.8% to 99.5%) and 93.6% (95% CI, 89.8% to 97.5%), respectively, for the short implant group and 98.0% (95% CI, 96.9% to 99.1%) and 90.3% (95% CI, 85.2% to 95.4%), respectively, for the control implant group. The CSRs of the two groups did not demonstrate a statistically significant difference. There were also no statistically significant differences in success rates, failure rates, or complications between the two groups. Placement of short dental implants could be a predictable alternative to longer implants to reduce surgical complications and patient morbidity in situations where vertical augmentation procedures are needed. However, only four studies with potential risk of bias were selected in this meta-analysis. Within the limitations of this meta-analysis, these results should be confirmed with robust methodology and RCTs with longer follow-up duration.

  13. A simplified Excel tool for implementation of RUSLE2 in vineyards for stakeholders with limited dataset

    Science.gov (United States)

    Gomez, Jose Alfonso; Biddoccu, Marcella; Guzmán, Gema; Cavallo, Eugenio

    2016-04-01

    Analysis with simulation models is in many situations the only way to evaluate the impact of changes in soil management on soil erosion risk, and the Revised Universal Soil Loss Equation, RUSLE (Renard et al. 1997; Dabney et al. 2012), remains the most widely used. Even with its relative simplicity compared to other, more process-based erosion models, proper RUSLE calibration for a given situation can be challenging outside the modelling community, especially in situations beyond those widely covered in the USA. An approach pursued by Gómez et al. (2003) to overcome these problems for calibrating RUSLE, especially the cover-management factor C, was to build a summary model using the equations defined by the RUSLE manual (Renard et al. 1997), but with the basic information required to calibrate the subfactors (such as soil surface roughness, ground cover and soil moisture) calculated elsewhere, or taken from available sources, and added to the summary model instead of being calculated by the RUSLE software. This strategy simplified the calibration process, as well as the understanding and interpretation of the RUSLE parameters and model behavior by non-expert users, for application in olive orchards under a broad range of management conditions. Gómez et al. (2003) built this summary model in Excel and demonstrated the ability to calibrate RUSLE for a broad range of management conditions. Later, several studies (Vanwalleghem et al. 2011; Marín 2013) demonstrated that this summary model successfully predicted soil losses at hillslope scale close to those determined experimentally. Vines are one of the most extensive tree crops, covering a wide range of environmental and management conditions, and in terms of soil conservation they present several conceptual analogies with olives, especially in relation to soil management (Gómez et al. 2011). In vine-growing areas, besides topographic and rainfall characteristics, the soil management practices adopted in vineyards can favor erosion. Cultivation with rows running up and down the slope in sloping vineyards, maintenance of bare soil, and compaction due to heavy machinery traffic are some of the vineyard management practices that expose soil to degradation, favoring runoff and soil erosion processes. On the other hand, the adoption of grass cover in vineyards plays a fundamental role in protecting soil against erosion under high rainfall intensity and erosivity. This communication presents a preliminary version of a summary model to calibrate RUSLE for vines under different soil management options, following an approach analogous to that used by Gómez et al. (2003) for olive orchards in the simplified situation of a homogeneous hillslope, and including the latest RUSLE conceptual updates (RUSLE2; Dabney et al. 2012). It also presents preliminary results for different values of the C factor under different soil management and environmental conditions, as well as their impact on predicted long-term soil losses in vineyards located in southern Spain and northern Italy.

    Keywords: vines, erosion, soil management, RUSLE, model.

    References:
    Dabney, S.M., Yoder, D.C., Vieira, D.A.N. 2012. The application of the Revised Universal Soil Loss Equation, Version 2, to evaluate the impacts of alternative climate change scenarios on runoff and sediment yield. Journal of Soil and Water Conservation 67: 343-353.
    Gómez, J.A., Battany, M., Renschler, C.S., Fereres, E. 2003. Evaluating the impact of soil management on soil loss in olive orchards. Soil Use and Management 19: 127-134.
    Gómez, J.A., Llewellyn, C., Basch, G., Sutton, P.B., Dyson, J.S., Jones, C.A. 2011. The effects of cover crops and conventional tillage on soil and runoff loss in vineyards and olive groves in several Mediterranean countries. Soil Use and Management 27: 502-514.
    Marín, V. 2013. Interfaz gráfica para la valoración de la pérdida de suelo en parcelas de olivar [Graphical interface for assessing soil loss in olive plots]. Final degree project, University of Córdoba.
    Vanwalleghem, T., Infante, J.A., González, M., Soto, D., Gómez, J.A. 2011. Quantifying the effect of historical soil management on soil erosion rates in Mediterranean olive orchards. Agriculture, Ecosystems & Environment 142: 341-351.
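
    The multiplicative structure that such a spreadsheet tool implements is the standard RUSLE equation A = R * K * LS * C * P. A minimal sketch (Python) with placeholder factor values, not calibrated vineyard parameters:

        def rusle_soil_loss(R, K, LS, C, P=1.0):
            """Return mean annual soil loss A (t/ha/yr) from the RUSLE factors."""
            return R * K * LS * C * P

        # Hypothetical comparison: bare soil vs. grass cover on a sloping vineyard
        bare = rusle_soil_loss(R=900, K=0.035, LS=2.5, C=0.45)
        grass = rusle_soil_loss(R=900, K=0.035, LS=2.5, C=0.10)
        print(f"bare={bare:.1f} t/ha/yr, grass cover={grass:.1f} t/ha/yr")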

  14. SPECTRUS: A Dimensionality Reduction Approach for Identifying Dynamical Domains in Protein Complexes from Limited Structural Datasets.

    Science.gov (United States)

    Ponzoni, Luca; Polles, Guido; Carnevale, Vincenzo; Micheletti, Cristian

    2015-08-04

    Identifying dynamical, quasi-rigid domains in proteins provides a powerful means for characterizing functionally oriented structural changes via a parsimonious set of degrees of freedom. In fact, the relative displacements of few dynamical domains usually suffice to rationalize the mechanics underpinning biological functionality in proteins and can even be exploited for structure determination or refinement purposes. Here we present SPECTRUS, a general scheme that, by solely using amino acid distance fluctuations, can pinpoint the innate quasi-rigid domains of single proteins or large complexes in a robust way. Consistent domains are usually obtained by using either a pair of representative structures or thousands of conformers. The functional insights offered by the approach are illustrated for biomolecular systems of very different size and complexity such as kinases, ion channels, and viral capsids. The decomposition tool is available as a software package and web server at spectrus.sissa.it. Copyright © 2015 Elsevier Ltd. All rights reserved.
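
    The idea can be caricatured in a few lines: measure inter-residue distance fluctuations across conformers, turn low fluctuation into high similarity, and cluster with a spectral method. A sketch (Python/SciPy/scikit-learn) on random stand-in conformers, not the SPECTRUS implementation itself:

        import numpy as np
        from scipy.spatial.distance import pdist, squareform
        from sklearn.cluster import SpectralClustering

        rng = np.random.default_rng(5)
        conformers = rng.normal(size=(50, 40, 3))       # 50 conformers, 40 residues

        dists = np.array([squareform(pdist(c)) for c in conformers])
        fluct = dists.std(axis=0)                       # distance fluctuation matrix
        sim = np.exp(-fluct / fluct.mean())             # low fluctuation = same domain

        labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                    random_state=0).fit_predict(sim)
        print(labels)                                   # quasi-rigid domain labels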

  15. Comparison of automated and human assignment of MeSH terms on publicly-available molecular datasets.

    Science.gov (United States)

    Ruau, David; Mbagwu, Michael; Dudley, Joel T; Krishnan, Vijay; Butte, Atul J

    2011-12-01

    Publicly available molecular datasets can be used for independent verification or investigative repurposing, but depends on the presence, consistency and quality of descriptive annotations. Annotation and indexing of molecular datasets using well-defined controlled vocabularies or ontologies enables accurate and systematic data discovery, yet the majority of molecular datasets available through public data repositories lack such annotations. A number of automated annotation methods have been developed; however few systematic evaluations of the quality of annotations supplied by application of these methods have been performed using annotations from standing public data repositories. Here, we compared manually-assigned Medical Subject Heading (MeSH) annotations associated with experiments by data submitters in the PRoteomics IDEntification (PRIDE) proteomics data repository to automated MeSH annotations derived through the National Center for Biomedical Ontology Annotator and National Library of Medicine MetaMap programs. These programs were applied to free-text annotations for experiments in PRIDE. As many submitted datasets were referenced in publications, we used the manually curated MeSH annotations of those linked publications in MEDLINE as "gold standard". Annotator and MetaMap exhibited recall performance 3-fold greater than that of the manual annotations. We connected PRIDE experiments in a network topology according to shared MeSH annotations and found 373 distinct clusters, many of which were found to be biologically coherent by network analysis. The results of this study suggest that both Annotator and MetaMap are capable of annotating public molecular datasets with a quality comparable, and often exceeding, that of the actual data submitters, highlighting a continuous need to improve and apply automated methods to molecular datasets in public data repositories to maximize their value and utility. Copyright © 2011 Elsevier Inc. All rights reserved.
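
    The comparison reduces to set overlap between each method's MeSH terms and the curated MEDLINE terms used as the gold standard. A toy sketch (Python) with invented term sets:

        # Invented term sets, for illustration only
        gold = {"Proteomics", "Mass Spectrometry", "Neoplasms"}
        automated = {"Proteomics", "Mass Spectrometry", "Humans"}

        recall = len(gold & automated) / len(gold)
        precision = len(gold & automated) / len(automated)
        print(f"recall={recall:.2f}, precision={precision:.2f}")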

  16. Hydrological simulation of the Brahmaputra basin using global datasets

    Science.gov (United States)

    Bhattacharya, Biswa; Conway, Crystal; Craven, Joanne; Masih, Ilyas; Mazzolini, Maurizio; Shrestha, Shreedeepy; Ugay, Reyne; van Andel, Schalk Jan

    2017-04-01

    The Brahmaputra River flows through China, India and Bangladesh to the Bay of Bengal and is one of the largest rivers of the world, with a catchment size of 580,000 km2. The catchment is largely hilly and/or forested, sparsely populated, and has limited urbanisation and economic activity. It experiences heavy monsoon rainfall leading to very high flood discharges. Large inter-annual variation of discharge leading to flooding, erosion and morphological changes is among the major challenges. The catchment is largely ungauged; moreover, limited availability of hydro-meteorological data restricts the possibility of carrying out evidence-based research, which could provide trustworthy information for managing and, when needed, controlling the basin processes by the riparian countries for overall basin development. The paper presents initial results of a current research project on the Brahmaputra basin. A set of hydrological and hydraulic models (SWAT, HMS, RAS) is developed by employing publicly available datasets of DEM, land use and soil, and simulated using satellite-based rainfall products, evapotranspiration and temperature estimates. Remotely sensed data are compared with sporadically available ground data. The set of models is able to produce catchment-wide hydrological information that can potentially be used in the future for managing the basin's water resources. The model predictions should be used with caution due to the high level of uncertainty, because the semi-calibrated models are developed with uncertain physical representation (e.g. cross-sections) and simulated with global meteorological forcing (e.g. TRMM) with limited validation. Major scientific challenges are seen in producing robust information that can be reliably used in managing the basin. The information generated by the models is uncertain and, as a result, instead of using it per se, it is used to improve the understanding of the catchment, and by running several scenarios with varying

  17. Fast Multivariate Search on Large Aviation Datasets

    Science.gov (United States)

    Bhaduri, Kanishka; Zhu, Qiang; Oza, Nikunj C.; Srivastava, Ashok N.

    2010-01-01

    Multivariate Time-Series (MTS) are ubiquitous, and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several million observations. Both these tests show that our algorithms have very high prune rates (>95%), thus needing actual
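
    For context, the brute-force baseline that indexed methods such as RBS and LBS are designed to beat is a full sliding-window scan over the series. A sketch (Python/NumPy) with a planted match:

        import numpy as np

        rng = np.random.default_rng(6)
        series = rng.normal(size=(10_000, 4))      # long 4-variable time series
        query = series[5000:5050]                  # a known 50-step pattern

        w = len(query)
        dists = np.array([np.linalg.norm(series[i:i + w] - query)
                          for i in range(len(series) - w + 1)])
        print(dists.argmin())                      # -> 5000, the planted match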

  18. “Controlled, cross-species dataset for exploring biases in genome annotation and modification profiles”

    Directory of Open Access Journals (Sweden)

    Alison McAfee

    2015-12-01

    Full Text Available Since the sequencing of the honey bee genome, proteomics by mass spectrometry has become increasingly popular for biological analyses of this insect; but we have observed that the number of honey bee protein identifications is consistently low compared to other organisms [1]. In this dataset, we use nanoelectrospray ionization-coupled liquid chromatography–tandem mass spectrometry (nLC–MS/MS) to systematically investigate the root cause of low honey bee proteome coverage. To this end, we present here data from three key experiments: a controlled, cross-species analysis of samples from Apis mellifera, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Mus musculus and Homo sapiens; a proteomic analysis of an individual honey bee whose genome was also sequenced; and a cross-tissue honey bee proteome comparison. The cross-species dataset was interrogated to determine relative proteome coverages between species, and the other two datasets were used to search for polymorphic sequences and to compare protein cleavage profiles, respectively.

  19. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Full Text Available Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  20. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

    Full Text Available In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response are first fetched from the existing WFS. The Geography Markup Language (GML) output of the WFS is then transformed on-the-fly into the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can easily be extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced, using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets, and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
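
    Content negotiation over RDF serializations, as described above, can be sketched as a mapping from Accept headers to serializer names. The fragment below assumes the Python rdflib library and is an illustrative stand-in for the prototype service, not its actual code:

        from rdflib import Graph, URIRef, Literal

        # Map common Accept-header MIME types to rdflib serialization formats.
        FORMATS = {
            "text/turtle": "turtle",
            "application/rdf+xml": "xml",
            "application/n-triples": "nt",
        }

        def serialize_for(graph, accept_header):
            """Pick an RDF serialization based on the client's Accept header."""
            fmt = FORMATS.get(accept_header, "turtle")  # default to Turtle
            return graph.serialize(format=fmt)

        g = Graph()
        g.add((URIRef("http://example.org/object/1"),
               URIRef("http://www.w3.org/2000/01/rdf-schema#label"),
               Literal("a spatial object")))
        print(serialize_for(g, "application/rdf+xml"))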

  1. Display and retrieve datasets from a Starcam format archive floppy

    International Nuclear Information System (INIS)

    Kara, G.; Aras, G.; Kir, M.

    2002-01-01

    Aim: The aim of this study was to retrieve datasets from a Starcam-format archive floppy, a format widely used in nuclear medicine imaging systems (Star/Starcam 2000/3000/3200/4000, Camstar, Maxxus and Optima, GE Medical Systems), and to display and process the images on a personal computer without the original operating system and equipment. Materials and Methods: Datasets of two patients, stored on a floppy formatted in the Starcam format on a Starcam 3200 XCT single-head SPECT camera, were used for this study. The datasets contained whole-body bone scans and planar cisternography images. Floppy disk reader software for the Starcam floppy, RMX v1.53 (GE Medical Systems software), and image display and processing software (Medic imaging, University of Geneva Hospital) were used to retrieve the datasets on a personal computer with a 600 MHz Intel CPU, 256 MB SDRAM at 100 MHz, and the Windows ME operating system (Microsoft Co.). The retrieved datasets were not recognizable by the PC. We removed the dataset summary header, which contains patient and study data, with a hex editor. Results: The retrieved datasets of the two patients, with the summary header removed, were displayed and processed with medical imaging software and easily converted to other imaging formats such as DICOM and raw. Conclusion: Floppy and optical disks formatted in the Starcam format are not easily readable by a PC, and the retrieved datasets cannot be displayed without commercial Medvision and NEMA software. We offer a free display and processing method for this problem, based on removing the dataset summary header on a personal computer, that can be routinely used in daily practice.
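
    Removing a fixed-size proprietary header, as described above, generalizes to a few lines of Python. A minimal sketch; the 2048-byte header size and the file names are hypothetical placeholders, not the documented Starcam layout:

        HEADER_BYTES = 2048  # hypothetical size of the dataset summary header

        def strip_header(src_path, dst_path, header_bytes=HEADER_BYTES):
            """Copy a raw dataset file, skipping the proprietary summary header
            so generic imaging software can read the pixel data that follows."""
            with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
                src.seek(header_bytes)   # skip patient/study summary header
                dst.write(src.read())    # write the remaining raw image data

        # Example (paths are placeholders):
        # strip_header("starcam_scan.dat", "starcam_scan_raw.img")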

  2. Tension in the recent Type Ia supernovae datasets

    International Nuclear Information System (INIS)

    Wei, Hao

    2010-01-01

    In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.

  3. A nonparametric statistical technique for combining global precipitation datasets: development and hydrological evaluation over the Iberian Peninsula

    Science.gov (United States)

    Abul Ehsan Bhuiyan, Md; Nikolopoulos, Efthymios I.; Anagnostou, Emmanouil N.; Quintana-Seguí, Pere; Barella-Ortiz, Anaïs

    2018-02-01

    This study investigates the use of a nonparametric, tree-based model, quantile regression forests (QRF), for combining multiple global precipitation datasets and characterizing the uncertainty of the combined product. We used the Iberian Peninsula as the study area, with a study period spanning 11 years (2000-2010). Inputs to the QRF model included three satellite precipitation products, CMORPH, PERSIANN, and 3B42 (V7); an atmospheric reanalysis precipitation and air temperature dataset; satellite-derived near-surface daily soil moisture data; and a terrain elevation dataset. We calibrated the QRF model for two seasons and two terrain elevation categories and used it to generate ensembles for these conditions. Evaluation of the combined product was based on a high-resolution, ground-reference precipitation dataset (SAFRAN) available at 5 km / 1 h resolution. Furthermore, to evaluate relative improvements and the overall impact of the combined product on hydrological response, we used the generated ensembles to force a distributed hydrological model (the SURFEX land surface model and the RAPID river routing scheme) and compared its streamflow simulation results with the corresponding simulations from the individual global precipitation and reference datasets. We concluded that the proposed technique could generate realizations that successfully encapsulate the reference precipitation and provide significant improvement in streamflow simulations, with reductions in systematic and random error on the order of 20-99% and 44-88%, respectively, when considering the ensemble mean.
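
    Quantile regression forests keep the conditional distribution in the leaves rather than only the mean. A rough stand-in, assuming scikit-learn is available, approximates the conditional quantiles from the spread of per-tree predictions (the genuine QRF estimator weights training observations in the leaves, so treat this as illustrative only):

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 6))             # stand-ins for the 6 predictors
        y = X[:, 0] + 0.5 * rng.normal(size=500)  # synthetic "precipitation"

        forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

        def tree_quantiles(forest, X_new, qs=(0.05, 0.5, 0.95)):
            """Approximate conditional quantiles from per-tree predictions."""
            preds = np.stack([t.predict(X_new) for t in forest.estimators_])
            return np.quantile(preds, qs, axis=0)

        lo, med, hi = tree_quantiles(forest, X[:5])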

  4. TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

    Directory of Open Access Journals (Sweden)

    Lim Yan

    2010-06-01

    Full Text Available Abstract Background Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data. Results TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences. Conclusions TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner.
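
    Tag removal that tolerates insertions and deletions can be expressed with fuzzy pattern matching. The sketch below assumes the third-party Python regex package; TagCleaner's internals are not shown here, so this is an independent illustration of the idea:

        import regex  # third-party package supporting fuzzy matching

        def trim_tag(read, tag, max_edits=2):
            """Remove a 5' tag from `read`, tolerating up to `max_edits`
            insertions, deletions, or substitutions in the tag match."""
            m = regex.match(r"(?:%s){e<=%d}" % (tag, max_edits), read)
            return read[m.end():] if m else read

        print(trim_tag("ACGTACGTTTGGCCAA", "ACGTACGT"))   # exact tag, trimmed
        print(trim_tag("ACGAACGTTTGGCCAA", "ACGTACGT"))   # 1 mismatch, trimmed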

  5. Dimension Reduction Aided Hyperspectral Image Classification with a Small-sized Training Dataset: Experimental Comparisons

    Directory of Open Access Journals (Sweden)

    Jinya Su

    2017-11-01

    Full Text Available Hyperspectral images (HSI) provide rich information which may not be captured by other sensing technologies and therefore gradually find a wide range of applications. However, they also generate a large amount of irrelevant or redundant data for a specific task. This causes a number of issues, including significantly increased computation time, complexity and scale of the prediction models mapping the data to semantics (e.g., classification), and the need for a large amount of labelled data for training. In particular, it is generally difficult and expensive for experts to acquire sufficient training samples in many applications. This paper addresses these issues by exploring a number of classical dimension reduction algorithms from the machine learning community for HSI classification. To reduce the size of the training dataset, feature selection (e.g., mutual information, minimal-redundancy-maximal-relevance) and feature extraction (e.g., Principal Component Analysis (PCA), Kernel PCA) are adopted to augment a baseline classification method, the Support Vector Machine (SVM). The proposed algorithms are evaluated using a real HSI dataset. It is shown that PCA yields the most promising performance in reducing the number of features or spectral bands. It is observed that while significantly reducing the computational complexity, the proposed method can achieve better classification results over the classic SVM on a small training dataset, which makes it suitable for real-time applications or when only limited training data are available. Furthermore, it can also achieve performances similar to the classic SVM on large datasets but with much less computing time.
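
    The PCA-plus-SVM combination evaluated in the paper is straightforward to reproduce with scikit-learn. A minimal sketch on synthetic data; the band count, training fraction, and 30-component choice are placeholders, not the paper's settings:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.decomposition import PCA
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.normal(size=(400, 200))          # stand-in for 200 spectral bands
        y = (X[:, :5].sum(axis=1) > 0).astype(int)

        # Small training set, as in the paper's limited-labels scenario.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.1,
                                                  random_state=0)
        model = make_pipeline(PCA(n_components=30), SVC(kernel="rbf"))
        model.fit(X_tr, y_tr)
        print("test accuracy:", model.score(X_te, y_te))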

  6. Being an honest broker of hydrology: Uncovering, communicating and addressing model error in a climate change streamflow dataset

    Science.gov (United States)

    Chegwidden, O.; Nijssen, B.; Pytlak, E.

    2017-12-01

    Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. We will describe how encountering earlier techniques' pitfalls allowed us
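
    The abstract does not name the bias-correction techniques that were developed, but a common starting point for streamflow bias correction is empirical quantile mapping, which replaces each simulated value with the observed value at the same quantile. Below is a hedged sketch of that generic idea, not the collaboration's method:

        import numpy as np

        def quantile_map(simulated, obs_reference, sim_reference):
            """Empirical quantile mapping: for each simulated value, find its
            quantile in the simulated reference period and return the observed
            value at that same quantile."""
            sim_sorted = np.sort(sim_reference)
            obs_sorted = np.sort(obs_reference)
            # Quantile of each value within the simulated reference distribution.
            q = np.searchsorted(sim_sorted, simulated) / len(sim_sorted)
            q = np.clip(q, 0.0, 1.0)
            return np.quantile(obs_sorted, q)

        rng = np.random.default_rng(0)
        obs = rng.gamma(2.0, 10.0, size=1000)   # "observed" flows
        sim = rng.gamma(2.0, 12.0, size=1000)   # biased "simulated" flows
        corrected = quantile_map(sim, obs, sim)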

  7. Dataset definition for CMS operations and physics analyses

    CERN Document Server

    AUTHOR|(CDS)2051291

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during the first run, and we discuss the plans for the second LHC run.

  8. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  9. Newton SSANTA Dr Water using POU filters dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...

  10. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  11. Visualisation of Massive Military Datasets: Human Factors, Applications, and Technologies

    National Research Council Canada - National Science Library

    2001-01-01

    This final report of IST-013/RTG-002 "Visualisation of Massive Military Datasets" presents some of the issues involved in visualisation as well as techniques that have been used in support of visualisation for military applications...

  12. An Evaluation of Knowledge Base Systems for Large OWL Datasets

    National Research Council Canada - National Science Library

    Guo, Yuanbo; Pan, Zhengxiang; Heflin, Jeff

    2004-01-01

    .... To this end, we have developed the Lehigh University Benchmark (LUBM). The benchmark is intended to evaluate knowledge base systems with respect to extensional queries over a large dataset that commits to a single realistic ontology...

  13. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  14. Karna Particle Size Dataset for Tables and Figures

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  15. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  16. Client-server multitask learning from distributed datasets.

    Science.gov (United States)

    Dinuzzo, Francesco; Pillonetto, Gianluigi; De Nicolao, Giuseppe

    2011-02-01

    A client-server architecture to simultaneously solve multiple learning tasks from distributed datasets is described. In such architecture, each client corresponds to an individual learning task and the associated dataset of examples. The goal of the architecture is to perform information fusion from multiple datasets while preserving privacy of individual data. The role of the server is to collect data in real time from the clients and codify the information in a common database. Such information can be used by all the clients to solve their individual learning task, so that each client can exploit the information content of all the datasets without actually having access to private data of others. The proposed algorithmic framework, based on regularization and kernel methods, uses a suitable class of "mixed effect" kernels. The methodology is illustrated through a simulated recommendation system, as well as an experiment involving pharmacological data coming from a multicentric clinical trial.
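
    A "mixed effect" kernel of the kind mentioned is commonly written as K((x,t),(x',t')) = alpha*K(x,x') + (1 - alpha)*1[t = t']*K(x,x'): a shared term for all tasks plus a task-specific term active only within a task. The sketch below illustrates that form under assumed Gaussian base kernels and an arbitrary alpha; it is not the paper's exact construction:

        import numpy as np

        def rbf(x, z, gamma=1.0):
            return np.exp(-gamma * np.sum((x - z) ** 2))

        def mixed_effect_kernel(x, t, z, s, alpha=0.5):
            """Mixed-effect kernel: a shared component for all tasks plus a
            task-specific component active only when both points share a task."""
            shared = alpha * rbf(x, z)
            specific = (1 - alpha) * rbf(x, z) if t == s else 0.0
            return shared + specific

        x1, x2 = np.array([0.0, 1.0]), np.array([0.2, 0.9])
        print(mixed_effect_kernel(x1, "clientA", x2, "clientA"))  # same task
        print(mixed_effect_kernel(x1, "clientA", x2, "clientB"))  # tasks differ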

  17. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide range of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large and open-access database made available in the Physionet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smartphone-based digital stethoscope in an Indian hospital, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75 respectively on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior-art approaches when applied to the same dataset.

  18. Dataset definition for CMS operations and physics analyses

    Science.gov (United States)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme, to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC Run I, and we discuss the plans for Run II.

  19. Global Man-made Impervious Surface (GMIS) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...

  20. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  1. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20% error. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher
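
    The Bloom-filter encoding underlying this kind of privacy-preserved linkage is typically built from character bigrams hashed into a bit array, with records compared by a set similarity such as the Dice coefficient. A minimal sketch; the filter length, number of hash functions, and double-hashing scheme are illustrative assumptions:

        import hashlib

        def bloom_encode(value, m=1000, k=20):
            """Encode a string field as a Bloom filter over its bigrams."""
            bits = set()
            padded = "_" + value.lower() + "_"
            bigrams = [padded[i:i + 2] for i in range(len(padded) - 1)]
            for bg in bigrams:
                h1 = int(hashlib.md5(bg.encode()).hexdigest(), 16)
                h2 = int(hashlib.sha1(bg.encode()).hexdigest(), 16)
                for i in range(k):                 # double hashing: h1 + i*h2
                    bits.add((h1 + i * h2) % m)
            return bits

        def dice(a, b):
            """Dice coefficient between two Bloom filters (as bit-index sets)."""
            return 2 * len(a & b) / (len(a) + len(b))

        print(dice(bloom_encode("johnson"), bloom_encode("jonson")))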

  2. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications to medical data face several technical challenges: complex and heterogeneous datasets, noise in medical data, and the need to explain their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  3. Sampling Within k-Means Algorithm to Cluster Large Datasets

    Energy Technology Data Exchange (ETDEWEB)

    Bejarano, Jeremy [Brigham Young University; Bose, Koushiki [Brown University; Brannan, Tyler [North Carolina State University; Thomas, Anita [Illinois Institute of Technology; Adragni, Kofi [University of Maryland; Neerchal, Nagaraj [University of Maryland; Ostrouchov, George [ORNL

    2011-08-01

    Due to current data collection technology, our ability to gather data has surpassed our ability to analyze it. In particular, k-means, one of the simplest and fastest clustering algorithms, is ill-equipped to handle extremely large datasets on even the most powerful machines. Our new algorithm uses a sample from a dataset to decrease runtime by reducing the amount of data analyzed. We perform a simulation study to compare our sampling-based k-means to the standard k-means algorithm by analyzing both the speed and accuracy of the two methods. Results show that our algorithm is significantly more efficient than the existing algorithm with comparable accuracy. Further work on this project might include a more comprehensive study on both more varied test datasets and real weather datasets. This is especially important considering that this preliminary study was performed on rather tame datasets. These studies should also analyze the performance of the algorithm for varied values of k. Lastly, this paper showed that the algorithm was accurate for relatively low sample sizes. We would like to analyze this further to see how accurate the algorithm is for even lower sample sizes; we could find the lowest sample sizes, by manipulating width and confidence level, for which the algorithm would be acceptably accurate. In order for our algorithm to be a success, it needs to meet two benchmarks: match the accuracy of the standard k-means algorithm and significantly reduce runtime. Both goals are accomplished for all six datasets analyzed. However, on datasets of three and four dimensions, as the data become more difficult to cluster, both algorithms fail to obtain the correct classifications on some trials. Nevertheless, our algorithm consistently matches the performance of the standard algorithm while becoming remarkably more efficient with time. Therefore, we conclude that analysts can use our algorithm, expecting accurate results in considerably less time.
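
    The core idea, fitting k-means on a random sample and then assigning the full dataset to the learned centroids, can be sketched with scikit-learn (an assumption; the sample size below is an arbitrary placeholder rather than one derived from width and confidence level):

        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        X = rng.normal(size=(100_000, 4))        # large synthetic dataset

        # Fit k-means on a small random sample only.
        sample_idx = rng.choice(len(X), size=2_000, replace=False)
        km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X[sample_idx])

        # Assign every point in the full dataset to the sampled centroids.
        labels = km.predict(X)
        print(np.bincount(labels))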

  4. Economics of Development NGOs: a survey of existing datasets

    OpenAIRE

    Cecilia Navarra

    2013-01-01

    This work is a survey of the existing datasets on development NGOs (Non-Governmental Organizations) and of the related empirical literature. We define NGOs as non-profit and non-governmental aid intermediaries, public good providers that channel donors' funds; these can be both international NGOs and local NGOs in recipient countries. We organize the surveyed datasets in four categories following the unit of observation: information at the NGO level on Northern NGOs, accounts of aid flows th...

  5. Classification of Intrusion Detection Dataset using machine learning Approaches

    OpenAIRE

    Neethu B

    2012-01-01

    The paper describes a method of intrusion detection that uses machine learning algorithms. Here we discuss the combined use of two machine learning algorithms: Principal Component Analysis and the Naive Bayes classifier. The dimensionality of the dataset is reduced by using principal component analysis, and the classification of the dataset into normal and attack classes is done by using the Naive Bayes classifier. The experiments were conducted on the intrusion detection d...
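
    The described combination maps directly onto a two-stage pipeline. A minimal sketch with scikit-learn on synthetic data; the feature count and component count are placeholders, not the paper's dataset or settings:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.decomposition import PCA
        from sklearn.naive_bayes import GaussianNB

        rng = np.random.default_rng(1)
        X = rng.normal(size=(1_000, 41))         # 41 features (an assumption)
        y = (X[:, 0] + X[:, 1] > 0).astype(int)  # 0 = normal, 1 = attack (synthetic)

        # Reduce dimensionality with PCA, then classify with Naive Bayes.
        model = make_pipeline(PCA(n_components=10), GaussianNB())
        model.fit(X, y)
        print("training accuracy:", model.score(X, y))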

  6. Seeing the forests and the trees--innovative approaches to exploring heterogeneity in systematic reviews of complex interventions to enhance health system decision-making: a protocol.

    Science.gov (United States)

    Ivers, Noah; Tricco, Andrea C; Trikalinos, Thomas A; Dahabreh, Issa J; Danko, Kristin J; Moher, David; Straus, Sharon E; Lavis, John N; Yu, Catherine H; Shojania, Kaveh; Manns, Braden; Tonelli, Marcello; Ramsay, Timothy; Edwards, Alun; Sargious, Peter; Paprica, Alison; Hillmer, Michael; Grimshaw, Jeremy M

    2014-08-12

    To improve quality of care and patient outcomes, health system decision-makers need to identify and implement effective interventions. An increasing number of systematic reviews document the effects of quality improvement programs to assist decision-makers in developing new initiatives. However, limitations in the reporting of primary studies and current meta-analysis methods (including approaches for exploring heterogeneity) reduce the utility of existing syntheses for health system decision-makers. This study will explore the role of innovative meta-analysis approaches and the added value of enriched and updated data for increasing the utility of systematic reviews of complex interventions. We will use the dataset from our recent systematic review of 142 randomized trials of diabetes quality improvement programs to evaluate novel approaches for exploring heterogeneity. These will include exploratory methods, such as multivariate meta-regression analyses and all-subsets combinatorial meta-analysis. We will then update our systematic review to include new trials and enrich the dataset by surveying authors of all included trials. In doing so, we will explore the impact of variables not reported in previous publications, such as details of study context, on the effectiveness of the intervention. We will use innovative analytical methods on the enriched and updated dataset to identify key success factors in the implementation of quality improvement interventions for diabetes. Decision-makers will be involved throughout to help identify and prioritize variables to be explored and to aid in the interpretation and dissemination of results. This study will inform future systematic reviews of complex interventions and describe the value of enriching and updating data for exploring heterogeneity in meta-analysis. It will also result in an updated comprehensive systematic review of diabetes quality improvement interventions that will be useful to health system decision-makers.

  7. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as a web page. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time range and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
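
    Several of the heuristics named here (matching essential measurements, time-range overlap, preferring newer versions) reduce to simple score components. The sketch below is an illustrative toy scorer, not the actual Common Metadata Repository algorithm:

        def interval_overlap(a_start, a_end, b_start, b_end):
            """Fraction of the query interval covered by the dataset interval."""
            overlap = max(0.0, min(a_end, b_end) - max(a_start, b_start))
            return overlap / (a_end - a_start) if a_end > a_start else 0.0

        def relevance(query, dataset):
            """Toy relevance score combining measurement, time, and version cues."""
            score = 0.0
            if query["measurement"] in dataset["measurements"]:
                score += 2.0                       # essential-measurement match
            score += interval_overlap(query["t0"], query["t1"],
                                      dataset["t0"], dataset["t1"])
            score += 0.1 * dataset["version"]      # prefer later dataset versions
            return score

        q = {"measurement": "sea surface temperature", "t0": 2000, "t1": 2010}
        d = {"measurements": {"sea surface temperature"}, "t0": 1998, "t1": 2005,
             "version": 4}
        print(relevance(q, d))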

  8. Visual Analysis and Processing of Clusters Structures in Multidimensional Datasets

    Science.gov (United States)

    Bondarev, A. E.

    2017-05-01

    The article is devoted to problems of visual analysis of cluster structures in multidimensional datasets. For visual analysis, an approach based on elastic map design [1,2] is applied. This approach is well suited to processing and visualizing multidimensional datasets. To analyze clusters in the original data volume, elastic maps are used to map the original data points onto enclosed manifolds of lower dimensionality. By decreasing the elasticity parameters, one can design a map surface that better approximates the multidimensional dataset in question. The points of the dataset are then projected onto the map. Unfolding the designed map onto a flat plane gives insight into the cluster structure of the multidimensional dataset. The elastic-map approach does not require any a priori information about the data in question and does not depend on the data's nature, origin, etc. Elastic maps are usually combined with the PCA approach. Presented in the space of the first three principal components, elastic maps provide quite good results. The article describes the results of applying the elastic-map approach to visual cluster analysis of different multidimensional datasets, including medical data.

  9. Current limiters

    Energy Technology Data Exchange (ETDEWEB)

    Loescher, D.H. [Sandia National Labs., Albuquerque, NM (United States). Systems Surety Assessment Dept.; Noren, K. [Univ. of Idaho, Moscow, ID (United States). Dept. of Electrical Engineering

    1996-09-01

    The current that flows between the electrical test equipment and the nuclear explosive must be limited to safe levels during electrical tests conducted on nuclear explosives at the DOE Pantex facility. The safest way to limit the current is to use batteries that can provide only acceptably low current into a short circuit; unfortunately this is not always possible. When it is not possible, current limiters, along with other design features, are used to limit the current. Three types of current limiters, the fuse blower, the resistor limiter, and the MOSFET-pass-transistor limiters, are used extensively in Pantex test equipment. Detailed failure mode and effects analyses were conducted on these limiters. Two other types of limiters were also analyzed. It was found that there is no best type of limiter that should be used in all applications. The fuse blower has advantages when many circuits must be monitored, a low insertion voltage drop is important, and size and weight must be kept low. However, this limiter has many failure modes that can lead to the loss of over current protection. The resistor limiter is simple and inexpensive, but is normally usable only on circuits for which the nominal current is less than a few tens of milliamperes. The MOSFET limiter can be used on high current circuits, but it has a number of single point failure modes that can lead to a loss of protective action. Because bad component placement or poor wire routing can defeat any limiter, placement and routing must be designed carefully and documented thoroughly.

  10. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset, metadata with usage metrics and user feedback to objectively extract relevance for improved...

  11. An Improved Method for Producing High Spatial-Resolution NDVI Time Series Datasets with Multi-Temporal MODIS NDVI Data and Landsat TM/ETM+ Images

    OpenAIRE

    Rao, Yuhan; Zhu, Xiaolin; Chen, Jin; Wang, Jianmin

    2015-01-01

    Due to technical limitations, it is impossible for current NDVI datasets to have high resolution in both the spatial and temporal dimensions. Therefore, several methods have been developed to produce NDVI time-series datasets with high spatial and temporal resolution, but these face limitations including high computational loads and unreasonable assumptions. In this study, an unmixing-based method, the NDVI Linear Mixing Growth Model (NDVI-LMGM), is proposed to achieve the goal of accurately and efficiently bl...

  12. Analysis of plant-derived miRNAs in animal small RNA datasets

    Directory of Open Access Journals (Sweden)

    Zhang Yuanji

    2012-08-01

    Full Text Available Abstract Background Plants contain significant quantities of small RNAs (sRNAs) derived from various sRNA biogenesis pathways. Many of these sRNAs play regulatory roles in plants. Previous analysis revealed that numerous sRNAs in corn, rice and soybean seeds have high sequence similarity to animal genes. However, exogenous RNA is considered to be unstable within the gastrointestinal tract of many animals, thus limiting the potential for any adverse effects from consumption of dietary RNA. A recent paper reported that putative plant miRNAs were detected in animal plasma and serum, presumably acquired through ingestion, and may have a functional impact in the consuming organisms. Results To address the question of how common this phenomenon could be, we searched for plant miRNA sequences in public sRNA datasets from various tissues of mammals, chicken and insects. Our analyses revealed that plant miRNAs were present in the animal sRNA datasets, and significantly, miR168 was extremely over-represented. Furthermore, all or nearly all (>96%) miR168 sequences were monocot-derived for most datasets, including datasets for two insects reared on dicot plants in their respective experiments. To investigate whether plant-derived miRNAs, including miR168, could accumulate and move systemically in insects, we conducted insect feeding studies for three insects including corn rootworm, which has been shown to be responsive to plant-produced long double-stranded RNAs. Conclusions Our analyses suggest that the observed plant miRNAs in animal sRNA datasets can originate in the process of sequencing, and that accumulation of plant miRNAs via dietary exposure is not universal in animals.

  13. Quench limits

    International Nuclear Information System (INIS)

    Sapinski, M.

    2012-01-01

    With thirteen beam-induced quenches and numerous Machine Development tests, the current knowledge of LHC magnet quench limits still contains many unknowns. Various approaches to determining the quench limits are reviewed and the results of the tests are presented. An attempt is made to reconstruct a coherent picture emerging from these results. The available methods for computing quench levels are presented, together with the dedicated particle-shower simulations that are necessary to understand the tests. The future experiments needed to reach a better understanding of quench limits, as well as limits for machine operation, are investigated. Possible strategies for setting BLM (Beam Loss Monitor) thresholds are discussed. (author)

  14. Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    International Nuclear Information System (INIS)

    Sim, Alexander; Balman, Mehmet; Williams, Dean; Shoshani, Arie; Natarajan, Vijaya

    2010-01-01

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity for moving large datasets to explore and manage. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing the massive dataset transfers efficiently with the pre-configured transfer properties in the environment where the network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environment as well as to control the data transfers for the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.

  15. Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    Energy Technology Data Exchange (ETDEWEB)

    Sim, Alexander [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Balman, Mehmet [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Dean N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shoshani, Arie [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Natarajan, Vijaya [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2010-07-16

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity for moving large datasets to explore and manage. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing the massive dataset transfers efficiently with the pre-configured transfer properties in the environment where the network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in the dynamic network environment as well as to control the data transfers for the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.
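
    The dynamic transfer adjustment described here is, at heart, a feedback loop on measured throughput. The sketch below shows one speculative policy (additive increase, multiplicative decrease over the number of concurrent streams); the BDM's actual policy is not detailed in the abstract:

        def adjust_streams(streams, throughput, prev_throughput,
                           min_streams=1, max_streams=32):
            """AIMD-style adjustment of concurrent transfer streams based on
            whether measured throughput improved since the last interval."""
            if throughput >= prev_throughput:
                streams = min(streams + 1, max_streams)   # additive increase
            else:
                streams = max(streams // 2, min_streams)  # multiplicative decrease
            return streams

        streams, prev = 4, 0.0
        for measured in [80.0, 95.0, 110.0, 60.0, 70.0]:  # MB/s samples
            streams = adjust_streams(streams, measured, prev)
            prev = measured
            print("streams:", streams)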

  16. Enhancing Conservation with High Resolution Productivity Datasets for the Conterminous United States

    Science.gov (United States)

    Robinson, Nathaniel Paul

    Human driven alteration of the earth's terrestrial surface is accelerating through land use changes, intensification of human activity, climate change, and other anthropogenic pressures. These changes occur at broad spatio-temporal scales, challenging our ability to effectively monitor and assess the impacts and subsequent conservation strategies. While satellite remote sensing (SRS) products enable monitoring of the earth's terrestrial surface continuously across space and time, the practical applications for conservation and management of these products are limited. Often the processes driving ecological change occur at fine spatial resolutions and are undetectable given the resolution of available datasets. Additionally, the links between SRS data and ecologically meaningful metrics are weak. Recent advances in cloud computing technology along with the growing record of high resolution SRS data enable the development of SRS products that quantify ecologically meaningful variables at relevant scales applicable for conservation and management. The focus of my dissertation is to improve the applicability of terrestrial gross and net primary productivity (GPP/NPP) datasets for the conterminous United States (CONUS). In chapter one, I develop a framework for creating high resolution datasets of vegetation dynamics. I use the entire archive of Landsat 5, 7, and 8 surface reflectance data and a novel gap filling approach to create spatially continuous 30 m, 16-day composites of the normalized difference vegetation index (NDVI) from 1986 to 2016. In chapter two, I integrate this with other high resolution datasets and the MOD17 algorithm to create the first high resolution GPP and NPP datasets for CONUS. I demonstrate the applicability of these products for conservation and management, showing the improvements beyond currently available products. In chapter three, I utilize this dataset to evaluate the relationships between land ownership and terrestrial production
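
    Temporal gap filling of NDVI composites, as in chapter one, is often approximated by interpolating each pixel's series across its cloud- or snow-masked dates. The dissertation's own gap-filling approach is not specified in this abstract, so the sketch below uses simple linear interpolation as a stand-in:

        import numpy as np

        def fill_gaps(series):
            """Linearly interpolate NaN gaps in one pixel's NDVI time series."""
            t = np.arange(len(series))
            good = ~np.isnan(series)
            if good.sum() < 2:
                return series                      # too few observations to fill
            return np.interp(t, t[good], series[good])

        ndvi = np.array([0.2, np.nan, np.nan, 0.5, 0.6, np.nan, 0.4])
        print(fill_gaps(ndvi))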

  17. Secondary Analysis of Existing Datasets for Dementia and Palliative Care Research: High-Value Applications and Key Considerations.

    Science.gov (United States)

    Hunt, Lauren J; Lee, See J; Harrison, Krista L; Smith, Alexander K

    2018-02-01

    To provide a guide to researchers selecting a dataset pertinent to the study of palliative care for people with dementia and to aid readers who seek to critically evaluate a secondary analysis study in this domain. The impact of dementia at end of life is large and growing. Secondary dataset analysis can play a critical role in advancing research on palliative care for people with dementia. We conducted a broad search of a variety of resources to: 1. identify datasets that include information germane to dementia and palliative care research; 2. review relevant applications of secondary dataset analysis in the published literature; and 3. explore potential validity and reliability concerns. We synthesize findings regarding: 1. methodological approaches for determining the presence of dementia; 2. inclusion and measurement of key palliative care items as they relate to people with dementia; and 3. sampling and study design issues, including the role and implications of proxy respondents. We describe and compare a selection of high-value existing datasets relevant to palliative care and dementia research. While secondary analysis of existing datasets requires consideration of key limitations, it can be a powerful tool for efficiently enhancing knowledge of palliative care needs among people with dementia.

  18. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global, delayed-mode, validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, along with TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional Operational Observing Systems (ROOS) operated by European institutes not managed by National Data Centres, and other profile datasets provided by scientific sources, can also be found (sea mammal profiles from MEOP, XBT datasets from cruises, ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until the year before the current year. This product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Although the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016, a new study was started that aims to compare both validation procedures and move towards a Copernicus Marine Service dataset with the best features of the CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  19. Dose limits

    International Nuclear Information System (INIS)

    Fitoussi, L.

    1987-12-01

    The dose limit is defined as the level of harm that must not be exceeded, so that an activity can be carried out in a regular manner without incurring a risk unacceptable to individuals or society. The paper examines the effects of radiation, categorised as stochastic and non-stochastic. Dose limits for workers and the public are discussed.

  20. UNDERSTANDING THE DETERMINANTS OF FIRMS’ PERFORMANCE. EMPIRICAL STUDY USING A ROMANIAN DATASET

    Directory of Open Access Journals (Sweden)

    Gyula Laszlo I. FLORIAN

    2013-10-01

    Full Text Available After initially depicting the multidimensionality of organizational performance, I use a pooled dataset of 1204 observations across the years 2005, 2006 and 2007. The observations included in the analysis all have non-missing data; an initial selection from a much larger dataset was conducted in order to eliminate observations with missing data. Our results show that the time effect in itself is not enough to impact organizational performance significantly. Yet we have found evidence for structural changes in the Romanian economy affecting organizational performance. We have also documented that the effect of financial leverage on organizational performance is limited to certain industries. The model is valid even though it does not account for the (full) multidimensionality of organizational performance.

  1. Geostatistical exploration of dataset assessing the heavy metal contamination in Ewekoro limestone, Southwestern Nigeria

    Directory of Open Access Journals (Sweden)

    Kehinde D. Oyeyemi

    2017-10-01

    Full Text Available The dataset for this article contains a geostatistical analysis of heavy metal contamination in limestone samples collected from the Ewekoro Formation in the eastern Dahomey basin, Ogun State, Nigeria. The samples were manually collected and analysed using a Microwave Plasma Atomic Absorption Spectrometer (MPAS). Analysis of the twenty different samples showed different levels of heavy metal concentration. The nine analysed elements are arsenic, mercury, cadmium, cobalt, chromium, nickel, lead, vanadium and zinc. Descriptive statistics were used to explore the heavy metal concentrations individually. Pearson, Kendall tau and Spearman rho correlation coefficients were used to establish the relationships among the elements, and analysis of variance showed that there is a significant difference in the mean distribution of the heavy metal concentrations within and between the groups of the 20 samples analysed. The dataset can provide insights into the health implications of the contaminants, especially when the mean concentration levels of the heavy metals are compared with the recommended regulatory limit concentrations.
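
    The statistical workflow described (per-element descriptive statistics, three correlation measures, and ANOVA) maps directly onto pandas and SciPy. A sketch on synthetic concentrations; the column names and distributions are placeholders, not the article's data:

        import numpy as np
        import pandas as pd
        from scipy.stats import f_oneway

        rng = np.random.default_rng(0)
        df = pd.DataFrame({el: rng.lognormal(mean=0.0, sigma=0.5, size=20)
                           for el in ["As", "Hg", "Cd", "Co", "Cr",
                                      "Ni", "Pb", "V", "Zn"]})

        print(df.describe())                       # descriptive statistics
        for method in ("pearson", "kendall", "spearman"):
            print(method, "\n", df.corr(method=method).round(2))

        # One-way ANOVA across the element groups' concentrations.
        print(f_oneway(*[df[c] for c in df.columns]))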

  2. jMOSAiCS: joint analysis of multiple ChIP-seq datasets

    Science.gov (United States)

    2013-01-01

    The ChIP-seq technique enables genome-wide mapping of in vivo protein-DNA interactions and chromatin states. Current analytical approaches for ChIP-seq analysis are largely geared towards single-sample investigations, and have limited applicability in comparative settings that aim to identify combinatorial patterns of enrichment across multiple datasets. We describe a novel probabilistic method, jMOSAiCS, for jointly analyzing multiple ChIP-seq datasets. We demonstrate its usefulness with a wide range of data-driven computational experiments and with a case study of histone modifications on GATA1-occupied segments during erythroid differentiation. jMOSAiCS is open source software and can be downloaded from Bioconductor [1]. PMID:23844871

  3. Applying MetaMap to Medline for identifying novel associations in a large clinical dataset: a feasibility analysis.

    Science.gov (United States)

    Hanauer, David A; Saeed, Mohammed; Zheng, Kai; Mei, Qiaozhu; Shedden, Kerby; Aronson, Alan R; Ramakrishnan, Naren

    2014-01-01

    We describe experiments designed to determine the feasibility of distinguishing known from novel associations based on a clinical dataset comprised of International Classification of Disease, V.9 (ICD-9) codes from 1.6 million patients by comparing them to associations of ICD-9 codes derived from 20.5 million Medline citations processed using MetaMap. Associations appearing only in the clinical dataset, but not in Medline citations, are potentially novel. Pairwise associations of ICD-9 codes were independently identified in both the clinical and Medline datasets, which were then compared to quantify their degree of overlap. We also performed a manual review of a subset of the associations to validate how well MetaMap performed in identifying diagnoses mentioned in Medline citations that formed the basis of the Medline associations. The overlap of associations based on ICD-9 codes in the clinical and Medline datasets was low: only 6.6% of the 3.1 million associations found in the clinical dataset were also present in the Medline dataset. Further, a manual review of a subset of the associations that appeared in both datasets revealed that co-occurring diagnoses from Medline citations do not always represent clinically meaningful associations. Identifying novel associations derived from large clinical datasets remains challenging. Medline as a sole data source for existing knowledge may not be adequate to filter out widely known associations. In this study, novel associations were not readily identified. Further improvements in accuracy and relevance for tools such as MetaMap are needed to realize their expected utility. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  4. Wind and wave dataset for Matara, Sri Lanka

    Directory of Open Access Journals (Sweden)

    Y. Luo

    2018-01-01

    Full Text Available We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive information possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  5. Securely measuring the overlap between private datasets with cryptosets.

    Science.gov (United States)

    Swamidass, S Joshua; Matlock, Matthew; Rozenblit, Leon

    2015-01-01

    Many scientific questions are best approached by sharing data--collected by different groups or across large collaborative networks--into a combined analysis. Unfortunately, some of the most interesting and powerful datasets--like health records, genetic data, and drug discovery data--cannot be freely shared because they contain sensitive information. In many situations, knowing if private datasets overlap determines if it is worthwhile to navigate the institutional, ethical, and legal barriers that govern access to sensitive, private data. We report the first method of publicly measuring the overlap between private datasets that is secure under a malicious model without relying on private protocols or message passing. This method uses a publicly shareable summary of a dataset's contents, its cryptoset, to estimate its overlap with other datasets. Cryptosets approach "information-theoretic" security, the strongest type of security possible in cryptography, which is not even crackable with infinite computing power. We empirically and theoretically assess both the accuracy of these estimates and the security of the approach, demonstrating that cryptosets are informative, with a stable accuracy, and secure.
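
    One plausible reading of the cryptoset construction: hash each private identifier into a fixed-length vector of bin counts, publish the counts, and estimate the overlap of two datasets from the inner product of their vectors minus the collisions expected by chance. The sketch below follows that reading and is an interpretation, not the paper's exact estimator:

        import hashlib
        import numpy as np

        def cryptoset(ids, length=512):
            """Publicly shareable summary: counts of identifiers per hash bin."""
            vec = np.zeros(length, dtype=float)
            for i in ids:
                vec[int(hashlib.sha256(i.encode()).hexdigest(), 16) % length] += 1
            return vec

        def estimate_overlap(a, b, length=512):
            """Inner product minus the chance collisions expected between
            independent sets of the same sizes."""
            return float(a @ b - a.sum() * b.sum() / length)

        ids_a = {f"patient{i}" for i in range(1000)}
        ids_b = {f"patient{i}" for i in range(700, 1500)}   # true overlap: 300
        a, b = cryptoset(ids_a), cryptoset(ids_b)
        print(round(estimate_overlap(a, b)))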

  6. Wind and wave dataset for Matara, Sri Lanka

    Science.gov (United States)

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorological observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive information possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).
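
    The evaluation of ERA-Interim against the buoy records comes down to standard skill metrics on co-located series. A minimal sketch, assuming the two series are already matched in time and expressed as arrays; names are hypothetical:

        import numpy as np

        def skill(reanalysis, buoy):
            """Bias, RMSE and correlation of a reanalysis series against
            co-located, time-synchronized buoy observations."""
            r, b = np.asarray(reanalysis, float), np.asarray(buoy, float)
            ok = ~(np.isnan(r) | np.isnan(b))   # drop missing pairs
            r, b = r[ok], b[ok]
            bias = np.mean(r - b)
            rmse = np.sqrt(np.mean((r - b) ** 2))
            corr = np.corrcoef(r, b)[0, 1]
            return bias, rmse, corr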

  7. Systematic review

    DEFF Research Database (Denmark)

    Enggaard, Helle

    Title: Systematic review: a method to promote nursing students' skills in Evidence Based Practice. Background: The Department of Nursing educates students to practice Evidence Based Practice (EBP), where clinical decisions are based on the best available evidence, patient preference, clinical experience...... and resources available. In order to incorporate evidence in clinical decisions, nursing students need to learn how to transfer knowledge so as to utilize evidence in clinical decisions. The method of systematic review can be one approach to achieve this in nursing education. Method: As an associate lecturer...... I have taken a Comprehensive Systematic Review Training course provided by the Centre for Clinical Guidelines in Denmark and the Joanna Briggs Institute (JBI) and have practiced developing a systematic review on how patients with ischemic heart disease experience peer support. This insight and experience......

  8. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Full Text Available Signaling via B cell receptors (BCR) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but the transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure (0.5 and 2 h), RNA-seq was performed, allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input), RNA Pol II, H3K4me3, and H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  9. Visualization of conserved structures by fusing highly variable datasets.

    Science.gov (United States)

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (different scale and resolution) from the same person. The next step in developing our technique aimed to accommodate the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used as reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusions 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2 respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1, and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual
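
    The registration metric underlying this fusion approach is mutual information. For orientation, here is the textbook joint-histogram form of that metric in Python; the actual system layers warping, control points, and optimization on top of it, none of which is reproduced here:

        import numpy as np

        def mutual_information(img_a, img_b, bins=32):
            """Mutual information from the joint intensity histogram of two
            aligned images; higher values indicate better alignment."""
            joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
            pxy = joint / joint.sum()
            px = pxy.sum(axis=1, keepdims=True)     # marginal of img_a
            py = pxy.sum(axis=0, keepdims=True)     # marginal of img_b
            nz = pxy > 0                            # avoid log(0)
            return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

    Registration then amounts to searching for the transform that maximizes this quantity between the atlas and the patient scan.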

  10. Review Studies for the ATLAS Open Data Dataset

    CERN Document Server

    The ATLAS collaboration

    2016-01-01

    This document presents approval plots from selected analyses using the ATLAS Open Data dataset. This dataset, containing 1 fb⁻¹ of 8 TeV data collected by ATLAS along with a selection of Monte Carlo simulated events, is intended to be released to the public for educational use only, alongside tools to enable students to get started quickly and easily. The corrections applied to the Monte Carlo have been simplified for the purposes of the intended use and to reduce processing time, and the approval plots should indicate clearly the reasons for disagreement between Monte Carlo and data. As the dataset is for educational purposes only, some low-statistics analyses can be done and the educational objectives achieved, but it will be clear that the user cannot use it beyond this use case due to the low statistics.

  11. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries, along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence interval (high and low values) for the point estimates of the ρ's. Using the standard errors of the estimates of the ρ's, we obtain one-sigma intervals around mean estimates of the EMP values. These values are also reported in the dataset.
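
    The definitions in the abstract translate directly into a one-line computation. The sketch below is a plain transcription of those definitions with hypothetical names; sign conventions are glossed over here, and the paper's conventions should be consulted for real use.

        def emp(pct_change_exchange_rate, intervention_bn_usd, rho):
            """Exchange Market Pressure: the exchange-rate change (%) that
            would have occurred had the central bank not intervened; rho is
            the % change associated with $1bn of intervention."""
            return pct_change_exchange_rate + rho * intervention_bn_usd

        # Toy numbers: the currency moved 1.2% while the bank intervened
        # with $3bn; with rho = 0.5 %/bn, EMP = 1.2 + 0.5 * 3 = 2.7%.
        print(emp(1.2, 3.0, 0.5))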

  12. Inverse Limits

    CERN Document Server

    Ingram, WT

    2012-01-01

    Inverse limits provide a powerful tool for constructing complicated spaces from simple ones. They also turn the study of a dynamical system consisting of a space and a self-map into a study of a (likely more complicated) space and a self-homeomorphism. In four chapters, along with an appendix containing background material, the authors develop the theory of inverse limits. The book begins with an introduction through inverse limits on [0,1] before moving to a general treatment of the subject. Special topics in continuum theory complete the book. Although it is not a book on dynamics, the influen
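
    For readers new to the construction, the standard definition (a textbook fact, not specific to this book) of the inverse limit of spaces X_i with bonding maps f_i : X_{i+1} → X_i is:

        \varprojlim\,(X_i, f_i) \;=\;
          \Bigl\{ (x_1, x_2, x_3, \dots) \in \prod_{i=1}^{\infty} X_i
                  \;:\; x_i = f_i(x_{i+1}) \text{ for all } i \Bigr\}

    When every X_i is the same space X and every bonding map is the same map f, the shift σ(x_1, x_2, ...) = (x_2, x_3, ...) is a self-homeomorphism of the limit space that encodes the dynamics of f, which is the connection to dynamical systems mentioned above.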

  13. Fast randomization of large genomic datasets while preserving alteration counts.

    Science.gov (United States)

    Gobbi, Andrea; Iorio, Francesco; Dawson, Kevin J; Wedge, David C; Tamborero, David; Alexandrov, Ludmil B; Lopez-Bigas, Nuria; Garnett, Mathew J; Jurman, Giuseppe; Saez-Rodriguez, Julio

    2014-09-01

    Studying combinatorial patterns in cancer genomic datasets has recently emerged as a tool for identifying novel cancer driver networks. Approaches have been devised to quantify, for example, the tendency of a set of genes to be mutated in a 'mutually exclusive' manner. The significance of the proposed metrics is usually evaluated by computing P-values under appropriate null models. To this end, a Monte Carlo method (the switching-algorithm) is used to sample simulated datasets under a null model that preserves patient- and gene-wise mutation rates. In this method, a genomic dataset is represented as a bipartite network, to which Markov chain updates (switching-steps) are applied. These steps modify the network topology, and a minimal number of them must be executed to draw simulated datasets independently under the null model. This number has previously been deduced empirically to be a linear function of the total number of variants, making this process computationally expensive. We present a novel approximate lower bound for the number of switching-steps, derived analytically. Additionally, we have developed the R package BiRewire, including new efficient implementations of the switching-algorithm. We illustrate the performance of BiRewire by applying it to large real cancer genomics datasets. We report vast reductions in time requirement, with respect to existing implementations/bounds and equivalent P-value computations. Thus, we propose BiRewire to study statistical properties in genomic datasets, and other data that can be modeled as bipartite networks. BiRewire is available on BioConductor at http://www.bioconductor.org/packages/2.13/bioc/html/BiRewire.html. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
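
    For orientation, a switching-step on a bipartite mutation network can be sketched in a few lines of Python. This is the generic degree-preserving edge swap the abstract describes, with hypothetical names, not the optimized BiRewire implementation; choosing n_steps large enough to obtain independent samples is precisely the quantity the paper bounds analytically.

        import random

        def switching_randomize(edges, n_steps, seed=0):
            """Randomize a binary mutation matrix, held as a set of
            (patient, gene) pairs, while preserving every patient's and
            every gene's mutation count."""
            rng = random.Random(seed)
            edges = set(edges)
            edge_list = list(edges)
            for _ in range(n_steps):
                i, j = rng.sample(range(len(edge_list)), 2)
                (a, b), (c, d) = edge_list[i], edge_list[j]
                # One switching-step: rewire two edges if no duplicate arises.
                if a != c and b != d and (a, d) not in edges and (c, b) not in edges:
                    edges -= {(a, b), (c, d)}
                    edges |= {(a, d), (c, b)}
                    edge_list[i], edge_list[j] = (a, d), (c, b)
            return edges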

  14. Dataset of herbarium specimens of threatened vascular plants in Catalonia

    Directory of Open Access Journals (Sweden)

    Neus Nualart

    2017-02-01

    Full Text Available This data paper describes a specimens' dataset of the Catalonian threatened vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that includes a large diversity of autochthonous plants and 199 taxa with IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from the 17th century to the present. For each specimen, the species name, locality indication, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which evidences the role of botanical collections as an essential source of occurrence data.

  15. Comprehensive comparison of large-scale tissue expression datasets

    DEFF Research Database (Denmark)

    Santos Delgado, Alberto; Tsafou, Kalliopi; Stolte, Christian

    2015-01-01

    a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between...... tissue-specific and ubiquitous expression. By developing comparable confidence scores for all types of evidence, we show that it is possible to improve both quality and coverage by combining the datasets. To facilitate use and visualization of our work, we have developed the TISSUES resource (http...

  16. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  17. A Critical Review of Automated Photogrammetric Processing of Large Datasets

    Science.gov (United States)

    Remondino, F.; Nocerino, E.; Toschi, I.; Menna, F.

    2017-08-01

    The paper reports comparisons between commercial software packages able to automatically process image datasets for 3D reconstruction purposes. The main aspects investigated are the capability to correctly orient large sets of images of complex environments, the metric quality of the results, replicability, and redundancy. Different datasets are employed, each one featuring a different number of images, GSDs at cm and mm resolutions, and ground-truth information to allow statistical analyses of the 3D results. A summary of photogrammetric terms is also provided, to establish rigorous terms of reference for the comparisons and critical analyses.

  18. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Science.gov (United States)

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

    This data paper describes a specimens' dataset of the Catalonian threatened vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that includes a large diversity of autochthonous plants and 199 taxa with IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from the 17th century to the present. For each specimen, the species name, locality indication, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which evidences the role of botanical collections as an essential source of occurrence data.

  19. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    Energy Technology Data Exchange (ETDEWEB)

    Draxl, Caroline (NREL)

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as be time-synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production, and forecast dataset.

  20. The Challenges of Searching, Finding, Reading, Understanding and Using Mars Mission Datasets for Science Analysis

    Science.gov (United States)

    Johnson, Jeffrey R.

    2006-01-01

    This viewgraph presentation reviews the problems that non-mission researchers have in accessing data for use in their analysis of Mars. The increasing complexity of Mars datasets means that the custom software developed by instrument teams is often the only means to visualize and analyze the data. The proposed solutions are to continue efforts toward synergizing data from multiple missions; to make the data, software, and derived products available in standardized, easily accessible formats; to encourage release of "lite" versions of mission-related software prior to end-of-mission; and to process planetary image data systematically, in a coordinated way, and make it available in an easily accessed form. The recommendations of the Mars Environmental GIS Workshop are reviewed.

  1. State hydrocarbon rents, authoritarian survival and the onset of democracy: Evidence from a new dataset

    Directory of Open Access Journals (Sweden)

    Viola Lucas

    2016-08-01

    Full Text Available This article surveys the effects of state hydrocarbon rents—defined as government income from oil and natural gas—on authoritarian survival and the onset of democracy. We also examine the association of changing state hydrocarbon rents with state spending and taxation based on a new collection of historical data, the Global State Revenues and Expenditures dataset. Using these novel data, we provide evidence that increasing state rents from oil and gas hinder democratization by reducing citizens’ tax burden. However, an increase in the oil and gas income flowing directly into state coffers does not appear to lower the average risk of ouster by rival authoritarian elites. We have found no evidence of the systematic distributional effects of state hydrocarbon income on regime survival.

  2. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media, and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data, and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). Key considerations for the system are high-speed access (due to the large data volume), persistence of the large data volumes, and a precise process-time scheduling capability.

  3. A dataset comprising 141 magnetic resonance imaging scans of 98 extant sea urchin species.

    Science.gov (United States)

    Ziegler, Alexander; Faber, Cornelius; Mueller, Susanne; Nagelmann, Nina; Schröder, Leif

    2014-01-01

    Apart from its application in human diagnostics, magnetic resonance imaging (MRI) can also be used to study the internal anatomy of zoological specimens. As a non-invasive imaging technique, MRI has several advantages, such as rapid data acquisition, output of true three-dimensional imagery, and provision of digital data right from the onset of a study. Of particular importance for comparative zoological studies is the capacity of MRI to conduct high-throughput analyses of multiple specimens. In this study, MRI was applied to systematically document the internal anatomy of 98 representative species of sea urchins (Echinodermata: Echinoidea). The dataset includes raw and derived image data from 141 MRI scans. Most of the whole sea urchin specimens analyzed were obtained from museum collections. The attained scan resolutions permit differentiation of various internal organs, including the digestive tract, reproductive system, coelomic compartments, and lantern musculature. All data deposited in the GigaDB repository can be accessed using open source software. Potential uses of the dataset include interactive exploration of sea urchin anatomy, morphometric and volumetric analyses of internal organs observed in their natural context, as well as correlation of hard and soft tissue structures. The dataset covers a broad taxonomical and morphological spectrum of the Echinoidea, focusing on 'regular' sea urchin taxa. The deposited files significantly expand the amount of morphological data on echinoids that are electronically available. The approach chosen here can be extended to various other vertebrate and invertebrate taxa. We argue that publicly available digital anatomical and morphological data gathered during experiments involving non-invasive imaging techniques constitute one of the prerequisites for future large-scale genotype-phenotype correlations.

  4. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    International Nuclear Information System (INIS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climatic Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed. (paper)

  5. Review of access, licenses and understandability of open datasets used in hydrology research

    Science.gov (United States)

    Falkenroth, Esa; Arheimer, Berit; Lagerbäck Adolphi, Emma

    2015-04-01

    The amount of open data available for hydrology research is continually growing. In the EU-funded project SWITCH-ON (Sharing Water-related Information to Tackle Changes in the Hydrosphere - for Operational Needs), we are addressing water concerns by exploring and exploiting the untapped potential of these new open data. This work is enabled by many ongoing efforts to facilitate the use of open data. For instance, a number of portals (such as the GEOSS Portal and the INSPIRE community geoportal) provide the means to search for such open data sets and open spatial data services. However, in general, the systematic use of available open data is still fairly uncommon in hydrology research. Factors that limit (re)usability of a data set include: (1) accessibility, (2) understandability, and (3) licences. If you cannot access the data set, you cannot use it for research. If you cannot understand the data set, you cannot use it for research. Finally, if you are not permitted to use the data, you cannot use it for research. Early on in the project, we sent out a questionnaire to our research partners (SMHI, Universita di Bologna, University of Bristol, Technische Universiteit Delft and Technische Universitaet Wien) to find out what data sets they were planning to use in their experiments. The result was a comprehensive list of useful open data sets. Later, this list of data sets was extended with additional information on data sets for planned commercial water-information products and services. With the list of 50 common data sets as a starting point, we reviewed issues related to access, understandability and licence conditions. Regarding access to data sets, a majority of data sets were available through direct internet download via some well-known transfer protocol such as ftp or http. However, several data sets were found to be inaccessible due to server downtime, incorrect links or problems with the host database management system. One possible explanation for this

  6. Finding the Maine Story in Huge, Cumbersome National Monitoring Datasets

    Science.gov (United States)

    What’s a manager, analyst, or concerned citizen to do with the complex datasets generated by State and Federal monitoring efforts? Is it possible to use such information to address Maine’s environmental issues without having a degree in informatics and statistics? This presentati...

  7. Basin-scale water-balance dataset (BSWB): an update

    Science.gov (United States)

    Hirschi, Martin; Seneviratne, Sonia I.

    2017-04-01

    This contribution presents an update of a basin-scale diagnostic dataset of monthly variations in terrestrial water storage for large river basins worldwide (BSWB v2016; Hirschi et al., in review). Terrestrial water storage comprises all forms of water storage on land surfaces, and its seasonal and inter-annual variations are mostly determined by soil moisture, groundwater, snow cover, and surface water. The presented dataset is derived using a combined atmospheric and terrestrial water-balance approach with conventional streamflow measurements and re-analysis data of atmospheric moisture flux convergence and water vapor content. It extends a previously existing version of the dataset (Mueller et al., 2011) temporally and spatially. Comparison of BSWB v2016 to independent estimates of terrestrial water storage from the Gravity Recovery and Climate Experiment (GRACE) shows good agreement. Hirschi, M., and Seneviratne, S. I.: Basin-scale water-balance dataset (BSWB): an update. Earth Syst. Sci. Data Discuss., doi:10.5194/essd-2016-33, in review, 2016. Mueller, B., Hirschi, M., and Seneviratne, S. I.: New diagnostic estimates of variations in terrestrial water storage based on ERA-Interim data. Hydrol. Process., 25, 996-1008, doi:10.1002/hyp.7652, 2011.
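
    The combined atmospheric and terrestrial water-balance approach eliminates the poorly observed precipitation-minus-evaporation term by summing the two budgets. In standard notation (assumed here, consistent with the cited papers), with S the terrestrial water storage, W the column water vapor content, Q the vertically integrated moisture flux, and R streamflow:

        \frac{\mathrm{d}S}{\mathrm{d}t} = P - E - R, \qquad
        \frac{\mathrm{d}W}{\mathrm{d}t} = E - P - \nabla\cdot Q
        \quad\Longrightarrow\quad
        \frac{\mathrm{d}S}{\mathrm{d}t} = -\,\nabla\cdot Q - \frac{\mathrm{d}W}{\mathrm{d}t} - R

    Monthly storage variations therefore follow from re-analysis moisture flux convergence and water vapor tendency plus conventional streamflow, which are exactly the inputs listed above.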

  8. Image dataset for testing search and detection models

    NARCIS (Netherlands)

    Toet, A.; Bijl, P.; Valeton, J.M.

    2001-01-01

    The TNO Human Factors Search_2 image dataset consists of: a set of 44 high-resolution digital color images of different complex natural scenes, the ground truth corresponding to each of these scenes, and the results of psychophysical experiments on each of these images. The images in the Search_2

  9. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

    Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...

  10. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    The dataset includes the major ion chemical composition and numerous calculated variables that can, amongst others, be used to determine accuracy of the analysis. The methods described here have potential for improving quality control measures in water chemistry laboratories by detecting anomalous samples.

  11. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  12. The IZA Evaluation Dataset Survey: A Scientific Use File

    NARCIS (Netherlands)

    P., Arni,; Caliendo, M.; Künn, Steffen; Zimmermann, K.F.

    2014-01-01

    This reference paper describes the sampling and contents of the IZA Evaluation Dataset Survey and outlines its vast potential for research in labor economics. The data have been part of a unique IZA project to connect administrative data from the German Federal Employment Agency with innovative

  13. Dataset-driven research for improving recommender systems for learning

    NARCIS (Netherlands)

    Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

    2011-01-01

    Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,

  14. dataTEL - Datasets for Technology Enhanced Learning

    NARCIS (Netherlands)

    Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

    2011-01-01

    Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)

  15. DATS, the data tag suite to enable discoverability of datasets.

    Science.gov (United States)

    Sansone, Susanna-Assunta; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Alter, George; Grethe, Jeffrey S; Xu, Hua; Fore, Ian M; Lyle, Jared; Gururaj, Anupama E; Chen, Xiaoling; Kim, Hyeon-Eui; Zong, Nansu; Li, Yueling; Liu, Ruiling; Ozyurt, I Burak; Ohno-Machado, Lucila

    2017-06-06

    Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.
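
    As a rough illustration of the schema.org serialization route mentioned above, a minimal Dataset annotation can be written as follows. Every field value below is hypothetical, and the DATS-specific element names (which map onto this vocabulary) are defined in the paper rather than reproduced here.

        import json

        # A minimal schema.org Dataset description of the kind DATS can be
        # serialized into; all values are placeholders.
        dataset_metadata = {
            "@context": "https://schema.org",
            "@type": "Dataset",
            "name": "Example clinical imaging dataset",
            "description": "De-identified MRI scans with structured annotations.",
            "identifier": "https://doi.org/10.0000/example",
            "keywords": ["MRI", "imaging", "clinical"],
            "distribution": {
                "@type": "DataDownload",
                "encodingFormat": "application/zip",
                "contentUrl": "https://example.org/data.zip",
            },
        }

        print(json.dumps(dataset_metadata, indent=2))  # what an indexer would harvest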

  17. Automated single particle detection and tracking for large microscopy datasets.

    Science.gov (United States)

    Wilson, Rhodri S; Yang, Lei; Dun, Alison; Smyth, Annya M; Duncan, Rory R; Rickman, Colin; Lu, Weiping

    2016-05-01

    Recent advances in optical microscopy have enabled the acquisition of very large datasets from living cells with unprecedented spatial and temporal resolution. Our ability to process these datasets now plays an essential role in understanding many biological processes. In this paper, we present an automated particle detection algorithm capable of operating in low signal-to-noise fluorescence microscopy environments and handling large datasets. When combined with our particle linking framework, it can provide hitherto intractable quantitative measurements describing the dynamics of large cohorts of cellular components, from organelles to single molecules. We begin by validating the performance of our method on synthetic image data, and then extend the validation to include experimental images with ground truth. Finally, we apply the algorithm to two single-particle-tracking photo-activated localization microscopy biological datasets, acquired from living primary cells at very high temporal rates. Our analysis of the dynamics of very large cohorts of 10,000s of membrane-associated protein molecules shows that they behave as if caged in nanodomains. We show that the robustness and efficiency of our method provide a tool for the examination of single-molecule behaviour with unprecedented spatial detail and high acquisition rates.
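
    The detection stage can be approximated with an off-the-shelf Laplacian-of-Gaussian blob detector. The sketch below (Python, using scikit-image) is a common stand-in, assuming 2-D frames; it is not the authors' algorithm, which is specifically tuned for low signal-to-noise data, and the linking stage is omitted.

        import numpy as np
        from skimage.feature import blob_log  # scikit-image

        def detect_particles(frame, max_sigma=3, threshold=0.05):
            """Detect diffraction-limited spots in one 2-D fluorescence
            frame with a Laplacian-of-Gaussian filter."""
            frame = frame.astype(float)
            frame = (frame - frame.min()) / (np.ptp(frame) + 1e-12)  # scale to [0, 1]
            blobs = blob_log(frame, min_sigma=1, max_sigma=max_sigma,
                             num_sigma=5, threshold=threshold)
            return blobs[:, :2]  # (row, col) centres; third column is the scale

    A linking framework would then connect these per-frame detections into trajectories, for example by nearest-neighbour matching within a gating radius.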

  18. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Science.gov (United States)

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  19. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  20. Malaysian sign language dataset for automatic sign language ...

    African Journals Online (AJOL)

    advancements in computing technologies have the potential to be applied in the field of SL recognition. These computer-based approaches are able to translate the SL into verbal language and vice-versa. This paper describes the development of a dataset for an automated SL recognition system based on the Malaysian ...

  1. Performance evaluation of Apache Mahout for mining large datasets

    OpenAIRE

    Bogza, Adriana Maria

    2016-01-01

    The main purpose of this project is to evaluate the performance of the Apache Mahout library, which contains data mining algorithms for data processing, using a Twitter dataset. Performance is evaluated in terms of processing time, in-memory usage, I/O performance, and algorithmic accuracy.

  2. A dataset of forest biomass structure for Eurasia

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-01

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below-ground); green forest floor (above- and below-ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots). It consists of 10,351 unique records of sample plots and 9,613 sample trees from ca. 1,200 experiments for the period 1930-2014, with some overlap between these two datasets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass extension factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  3. Level-1 muon trigger performance with the full 2017 dataset

    CERN Document Server

    CMS Collaboration

    2018-01-01

    This document describes the performance of the CMS Level-1 Muon Trigger with the full dataset of 2017. Efficiency plots are included for each track finder (TF) individually and for the system as a whole. The efficiency is measured to be greater than 90% for all track finders.

  4. Homogenization of a surface solar radiation dataset over Italy

    Science.gov (United States)

    Manara, Veronica; Brunetti, Michele; Maugeri, Maurizio; Sanchez-Lorenzo, Arturo; Wild, Martin

    2017-02-01

    Observational data cannot be used for climate research without a clear knowledge about the state of the data in terms of temporal homogeneity. The main steps and results of the homogenization procedure applied to a surface solar radiation dataset over the Italian territory for the period 1959-2013 are discussed.

  5. Large Dataset of Acute Oral Toxicity Data Created for Testing ...

    Science.gov (United States)

    Acute toxicity data is a common requirement for substance registration in the US. Currently only data derived from animal tests are accepted by regulatory agencies, and the standard in vivo tests use lethality as the endpoint. Non-animal alternatives such as in silico models are being developed due to animal welfare and resource considerations. We compiled a large dataset of oral rat LD50 values to assess the predictive performance of currently available in silico models. Our dataset combines LD50 values from five different sources: literature data provided by The Dow Chemical Company, REACH data from eChemportal, HSDB (Hazardous Substances Data Bank), RTECS data from Leadscope, and the training set underpinning TEST (Toxicity Estimation Software Tool). Combined, these data sources yield 33,848 chemical-LD50 pairs (data points), with 23,475 unique data points covering 16,439 compounds. The entire dataset was loaded into a chemical properties database. All of the compounds were registered in DSSTox and 59.5% have publicly available structures. Compounds without a structure in DSSTox are currently having their structures registered. The structural data will be used to evaluate the predictive performance and applicable chemical domains of three QSAR models (TIMES, PROTOX, and TEST). Future work will combine the dataset with information from ToxCast assays and, using random forest modeling, assess whether ToxCast assays are useful in predicting acute oral toxicity. Pre

  6. Comparison of analyses of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals...

  7. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of turbulence in jet flows. This report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, e.g., establishing uncertainties for the data. This paper covers the following five tasks: (1) Document acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  8. Use of country of birth as an indicator of refugee background in health datasets

    Science.gov (United States)

    2014-01-01

    Background Routine public health databases contain a wealth of data useful for research among vulnerable or isolated groups, who may be under-represented in traditional medical research. Identifying specific vulnerable populations, such as resettled refugees, can be particularly challenging; often country of birth is the sole indicator of whether an individual has a refugee background. The objective of this article was to review strengths and weaknesses of different methodological approaches to identifying resettled refugees and comparison groups from routine health datasets and to propose the application of additional methodological rigour in future research. Discussion Methodological approaches to selecting refugee and comparison groups from existing routine health datasets vary widely and are often explained in insufficient detail. Linked data systems or datasets from specialized refugee health services can accurately select resettled refugee and asylum seeker groups but have limited availability and can be selective. In contrast, country of birth is commonly collected in routine health datasets but a robust method for selecting humanitarian source countries based solely on this information is required. The authors recommend use of national immigration data to objectively identify countries of birth with high proportions of humanitarian entrants, matched by time period to the study dataset. When available, additional migration indicators may help to better understand migration as a health determinant. Methodologically, if multiple countries of birth are combined, the proportion of the sample represented by each country of birth should be included, with sub-analysis of individual countries of birth potentially providing further insights, if population size allows. United Nations-defined world regions provide an objective framework for combining countries of birth when necessary. A comparison group of economic migrants from the same world region may be appropriate

  9. A new dataset validation system for the Planetary Science Archive

    Science.gov (United States)

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

    The Planetary Science Archive is the official archive for the Mars Express mission. It received its first data by the end of 2004. These data are delivered by the PI teams to the PSA team as datasets formatted in conformance with the Planetary Data System (PDS) standard. The PI teams are responsible for analyzing and calibrating the instrument data as well as for the production of reduced and calibrated data. They are also responsible for the scientific validation of these data. ESA is responsible for the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality standards. To do so, an archive peer review is used to control the quality of the Mars Express science data archiving process. However, a full validation of its content is missing. An independent review board recently recommended that the completeness of the archive, as well as the consistency of the delivered data, be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall make it possible to track anomalies in, and control the completeness of, datasets. It shall ensure that PSA end-users: (1) can rely on the results of their queries, (2) will get data products that are suitable for scientific analysis, and (3) can find all science data acquired during a mission. We define dataset validation as the verification and assessment process that checks dataset content against pre-defined top-level criteria, which represent the general characteristics of good-quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that

  10. VENNTURE--a novel Venn diagram investigational tool for multiple pharmacological dataset analysis.

    Directory of Open Access Journals (Sweden)

    Bronwen Martin

    Full Text Available As pharmacological data sets become increasingly large and complex, new visual analysis and filtering programs are needed to aid their appreciation. One of the most commonly used methods for visualizing biological data is the Venn diagram. Currently used Venn analysis software often presents multiple problems to biological scientists, in that only a limited number of simultaneous data sets can be analyzed. An improved appreciation of the connectivity between multiple, highly complex datasets is crucial for the next generation of analysis of genomic and proteomic data streams. We describe the development of VENNTURE, a program that facilitates visualization of up to six datasets in a user-friendly manner. This program includes versatile output features, where grouped data points can be easily exported into a spreadsheet. To demonstrate its unique experimental utility, we applied VENNTURE to a highly complex parallel paradigm, i.e. comparison of multiple G protein-coupled receptor drug dose phosphoproteomic data in multiple cellular physiological contexts. VENNTURE was able to reliably and simply dissect six complex data sets into easily identifiable groups for straightforward analysis and data output. Applied to complex pharmacological datasets, VENNTURE's features and ease of analysis are much improved over currently available Venn diagram programs. VENNTURE enabled the delineation of highly complex patterns of dose-dependent G protein-coupled receptor activity and its dependence on physiological cellular contexts. This study highlights the potential for such a program in fields such as pharmacology, genomics, and bioinformatics.
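
    The bookkeeping behind an N-set Venn analysis is simple to state: every element belongs to exactly one region, identified by the subset of datasets that contain it. A minimal Python sketch of that assignment follows (names hypothetical, and none of VENNTURE's visualization or export features are reproduced):

        from functools import reduce

        def venn_regions(named_sets):
            """Assign every element to exactly one Venn region, keyed by the
            frozenset of dataset names that contain it."""
            universe = reduce(set.union, named_sets.values(), set())
            regions = {}
            for x in universe:
                key = frozenset(name for name, s in named_sets.items() if x in s)
                regions.setdefault(key, set()).add(x)
            return regions

        # Toy example with three "datasets" of regulated phosphosites.
        r = venn_regions({"doseA": {1, 2, 3}, "doseB": {2, 3, 4}, "doseC": {3, 5}})
        print(r[frozenset({"doseA", "doseB", "doseC"})])  # -> {3}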

  11. Systematic reviews on leptospirosis

    OpenAIRE

    Guidugli, Fabio; Castro, Aldemar Araujo [UNIFESP; Atallah, Álvaro Nagib [UNIFESP

    2000-01-01

    OBJECTIVES: To find the existing clinical evidence on interventions for leptospirosis. The objective is to evaluate the effectiveness and safety of any intervention on leptospirosis through systematic reviews of randomized controlled trials (RCTs). DATA SOURCE: The sources of studies used (where there were no limitations concerning language, date, or other restrictions) were: EMBASE, LILACS, MEDLINE, the Cochrane Controlled Clinical Trials Database, and the Cochrane Hepato-Biliary Group Ra...

  12. Systematic review

    DEFF Research Database (Denmark)

    Bager, Palle; Chauhan, Usha; Greveson, Kay

    2017-01-01

    OBJECTIVE: Advice lines for patients with inflammatory bowel diseases (IBD) have been introduced internationally. However, only a few publications have described the advice line service and evaluated the efficiency of it with many results presented as conference posters. A systematic synthesis...... of evidence is needed and the aim of this article was to systematically review the evidence of IBD advice lines. MATERIALS AND METHODS: A broad systematic literature search was performed to identify relevant studies addressing the effect of advice lines. The process of selection of the retrieved studies...... congress abstracts were included in the review. The studies were heterogeneous both in scientific quality and in the focus of the study. No rigorous evidence was found to support that advice lines improve disease activity in IBD and correspondingly no studies reported worsening in disease activity. Advice...

  13. Status and Preliminary Evaluation for Chinese Re-Analysis Datasets

    Science.gov (United States)

    bin, zhao; chunxiang, shi; tianbao, zhao; dong, si; jingwei, liu

    2016-04-01

    Based on the operational T639L60 spectral model, combined with the Hybrid_GSI assimilation system and using meteorological observations including radiosondes, buoys, satellites, etc., a set of Chinese Re-Analysis (CRA) datasets is being developed by the National Meteorological Information Center (NMIC) of the China Meteorological Administration (CMA). The datasets are run at 30 km (0.28° latitude/longitude) resolution, which is higher than that of most existing reanalysis datasets. The reanalysis is carried out in an effort to enhance the accuracy of historical synoptic analysis and to support detailed investigation of various weather and climate systems. The reanalysis is currently at the stage of preliminary experimental analysis. One year of forecast data, covering June 2013 to May 2014, has been simulated and used in synoptic and climate evaluation. We first examine the model's prediction ability with the new assimilation system and find significant improvement in the Northern and Southern hemispheres due to the addition of new satellite data: compared with the operational T639L60 model, upper-level prediction is clearly improved and overall prediction stability is enhanced. In the climatological analysis, compared with the ERA-40, NCEP/NCAR and NCEP/DOE reanalyses, the results show that surface temperature is simulated somewhat lower over land and higher over ocean, 850-hPa specific humidity reflects a weakened anomaly, and the zonal wind anomaly is focused on the equatorial tropics. Meanwhile, the reanalysis dataset shows good ability to reproduce various climate indices, such as the subtropical high index and the ESMI (East-Asia subtropical Summer Monsoon Index), especially the Indian and western North Pacific monsoon indices. Later, we will further improve the assimilation system and dynamical simulation performance, and produce a 40-year (1979-2018) reanalysis dataset. It will provide a more comprehensive basis for synoptic and climate diagnosis.

  14. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions: Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries, including mature and star sequences, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
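
    The tiling idea is easy to sketch: break the reference into overlapping tiles and count how often each tile occurs in the library. The toy Python below substitutes a k-mer dictionary for Geoseq's suffix arrays, which serve the same lookup role at far greater speed; names are hypothetical.

        def tile_coverage(reference, reads, tile=20):
            """Count, for every tile of the reference, how many times that
            tile occurs across the sequenced reads."""
            counts = {}
            for read in reads:
                for i in range(len(read) - tile + 1):
                    k = read[i:i + tile]
                    counts[k] = counts.get(k, 0) + 1
            return [counts.get(reference[i:i + tile], 0)
                    for i in range(len(reference) - tile + 1)]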

  15. Management and assimilation of diverse, distributed watershed datasets

    Science.gov (United States)

    Varadharajan, C.; Faybishenko, B.; Versteeg, R.; Agarwal, D.; Hubbard, S. S.; Hendrix, V.

    2016-12-01

    The U.S. Department of Energy's (DOE) Watershed Function Scientific Focus Area (SFA) seeks to determine how perturbations to mountainous watersheds (e.g., floods, drought, early snowmelt) impact the downstream delivery of water, nutrients, carbon, and metals over seasonal to decadal timescales. We are building a software platform that enables integration of diverse and disparate field, laboratory, and simulation datasets, of various types including hydrological, geological, meteorological, geophysical, geochemical, ecological and genomic datasets, across a range of spatial and temporal scales within the Rifle floodplain and the East River watershed, Colorado. We are using agile data management and assimilation approaches to enable web-based integration of heterogeneous, multi-scale data. Sensor-based observations of water level, vadose zone and groundwater temperature, water quality, and meteorology, as well as biogeochemical analyses of soil and groundwater samples, have been curated and archived in federated databases. Quality Assurance and Quality Control (QA/QC) are performed on priority datasets needed for on-going scientific analyses and hydrological and geochemical modeling. Automated QA/QC methods are used to identify and flag issues in the datasets. Data integration is achieved via a brokering service that dynamically integrates data from distributed databases via web services, based on user queries. The integrated results are presented to users in a portal that enables intuitive search, interactive visualization and download of integrated datasets. The concepts, approaches and codes being used are shared across various data science components of various large DOE-funded projects such as the Watershed Function SFA, Next Generation Ecosystem Experiment (NGEE) Tropics, Ameriflux/FLUXNET, and Advanced Simulation Capability for Environmental Management (ASCEM), and together contribute towards DOE's cyberinfrastructure for data management and model-data integration.

  16. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

    Full Text Available Aviation emissions are unique among transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality (NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006), along with aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are, however, some distinct differences in the altitude distributions.
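
    Hemispheric and cruise-altitude shares such as the 90% and 60% figures above are plain reductions over a four-dimensional emissions grid. A sketch, with a random array standing in for a real inventory and an assumed (invented) set of cruise levels:

    ```python
    import numpy as np

    # Hypothetical gridded emissions: dimensions (time, level, lat, lon),
    # here 12 months, 30 flight levels, and 2-degree lat/lon cells.
    rng = np.random.default_rng(0)
    fuelburn = rng.random((12, 30, 90, 180))

    lats = np.linspace(-89, 89, 90)   # cell-centre latitudes
    nh = lats > 0                     # Northern Hemisphere mask
    cruise = slice(20, 30)            # assume levels 20-29 are cruise altitudes

    total = fuelburn.sum()
    nh_frac = fuelburn[:, :, nh, :].sum() / total
    nh_cruise_frac = fuelburn[:, cruise, nh, :].sum() / total
    print(f"NH share: {nh_frac:.0%}, NH cruise share: {nh_cruise_frac:.0%}")
    ```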

  17. Sharing is caring? Measurement error and the issues arising from combining 3D morphometric datasets.

    Science.gov (United States)

    Fruciano, Carmelo; Celik, Mélina A; Butler, Kaylene; Dooley, Tom; Weisbecker, Vera; Phillips, Matthew J

    2017-09-01

    Geometric morphometrics is routinely used in ecology and evolution and morphometric datasets are increasingly shared among researchers, allowing for more comprehensive studies and higher statistical power (as a consequence of increased sample size). However, sharing of morphometric data opens up the question of how much nonbiologically relevant variation (i.e., measurement error) is introduced in the resulting datasets and how this variation affects analyses. We perform a set of analyses based on an empirical 3D geometric morphometric dataset. In particular, we quantify the amount of error associated with combining data from multiple devices and digitized by multiple operators, and test for the presence of bias. We also extend these analyses to a dataset obtained with a recently developed automated method, which does not require human-digitized landmarks. Further, we analyze how measurement error affects estimates of phylogenetic signal and how its effect compares with the effect of phylogenetic uncertainty. We show that measurement error can be substantial when combining surface models produced by different devices, and even more so among landmarks digitized by different operators. We also document the presence of small, but significant, amounts of nonrandom error (i.e., bias). Measurement error is heavily reduced by excluding landmarks that are difficult to digitize. The automated method we tested had low levels of error if used in combination with a procedure for dimensionality reduction. Estimates of phylogenetic signal can be more affected by measurement error than by phylogenetic uncertainty. Our results generally highlight the importance of landmark choice and the usefulness of estimating measurement error. Further, measurement error may limit comparisons of estimates of phylogenetic signal across studies if these have been performed using different devices or by different operators. Finally, we also show how widely held assumptions do not always hold true.
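
    Operator-induced measurement error of this kind is commonly quantified as repeatability, a one-way ANOVA intraclass correlation across repeated digitizations of the same specimens. The sketch below uses simulated values, not the paper's data:

    ```python
    import numpy as np

    def repeatability(measurements):
        """One-way ANOVA repeatability (intraclass correlation).

        measurements: array of shape (specimens, replicates), e.g. the same
        landmark coordinate digitized by several operators.
        """
        s, r = measurements.shape
        grand = measurements.mean()
        ms_among = r * ((measurements.mean(axis=1) - grand) ** 2).sum() / (s - 1)
        ms_within = ((measurements - measurements.mean(axis=1, keepdims=True)) ** 2).sum() / (s * (r - 1))
        s2_among = (ms_among - ms_within) / r
        return s2_among / (s2_among + ms_within)

    rng = np.random.default_rng(1)
    true_vals = rng.normal(10, 2, size=(50, 1))          # biological variation
    obs = true_vals + rng.normal(0, 0.5, size=(50, 3))   # 3 operators, digitizing error
    print(f"repeatability: {repeatability(obs):.2f}")
    ```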

  18. City Limits, 2004, East Baton Rouge Parish, Louisiana

    Data.gov (United States)

    Louisiana Geographic Information Center — This is a graphical polygon dataset depicting the polygon boundaries of the incorporated city limits of Baton Rouge, Baker, and Zachary within East Baton Rouge...

  19. Development of Gridded Ensemble Precipitation and Temperature Datasets for the Contiguous United States Plus Hawai'i and Alaska

    Science.gov (United States)

    Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.

    2016-12-01

    Gridded precipitation and temperature products are inherently uncertain due to myriad factors. These include interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this inherent uncertainty, uncertainty is typically not included, or is a specific addition to each dataset without much general applicability across different datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher-resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high-resolution regional climate model simulations. We will also present results on the new high-resolution products for Alaska and Hawaii (2 km and 250 m respectively), to complete the first ensemble observation-based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.

  20. Systematic review

    DEFF Research Database (Denmark)

    Lødrup, Anders Bergh; Reimer, Christina; Bytzer, Peter

    2013-01-01

    in getting off acid-suppressive medication and partly explain the increase in long-term use of PPI. A number of studies addressing this issue have been published recently. The authors aimed to systematically review the existing evidence of clinically relevant symptoms caused by acid rebound following PPI...

  1. Age Limits

    OpenAIRE

    Jan Antfolk

    2017-01-01

    Whereas women of all ages prefer slightly older sexual partners, men—regardless of their age—have a preference for women in their 20s. Earlier research has suggested that this difference between the sexes’ age preferences is resolved according to women’s preferences. This research has not, however, sufficiently considered that the age range of considered partners might change over the life span. Here we investigated the age limits (youngest and oldest) of considered and actual sex partners in...

  2. A nonparametric statistical technique for combining global precipitation datasets: development and hydrological evaluation over the Iberian Peninsula

    Directory of Open Access Journals (Sweden)

    M. A. E. Bhuiyan

    2018-02-01

    Full Text Available This study investigates the use of a nonparametric, tree-based model, quantile regression forests (QRF), for combining multiple global precipitation datasets and characterizing the uncertainty of the combined product. We used the Iberian Peninsula as the study area, with a study period spanning 11 years (2000–2010). Inputs to the QRF model included three satellite precipitation products, CMORPH, PERSIANN, and 3B42 (V7); an atmospheric reanalysis precipitation and air temperature dataset; satellite-derived near-surface daily soil moisture data; and a terrain elevation dataset. We calibrated the QRF model for two seasons and two terrain elevation categories and used it to generate precipitation ensembles for these conditions. Evaluation of the combined product was based on a high-resolution, ground-reference precipitation dataset (SAFRAN) available at 5 km / 1 h resolution. Furthermore, to evaluate relative improvements and the overall impact of the combined product on hydrological response, we used the generated ensemble to force a distributed hydrological model (the SURFEX land surface model and the RAPID river routing scheme) and compared its streamflow simulation results with the corresponding simulations from the individual global precipitation and reference datasets. We concluded that the proposed technique can generate realizations that successfully encapsulate the reference precipitation and provide significant improvement in streamflow simulations, with reductions in systematic and random error on the order of 20–99% and 44–88%, respectively, when considering the ensemble mean.
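
    The core QRF idea can be sketched on top of an ordinary random forest: pool the training targets that share leaves with a new sample and take empirical quantiles. This is a simplified stand-in for the study's implementation (a true QRF weights observations per leaf), with invented toy predictors:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def qrf_quantiles(forest, X_train, y_train, X_new, quantiles=(0.1, 0.5, 0.9)):
        """Approximate quantile regression forest prediction: for each new
        sample, pool the training targets landing in the same leaves across
        all trees and take empirical quantiles of the pooled sample."""
        train_leaves = forest.apply(X_train)   # (n_train, n_trees) leaf indices
        new_leaves = forest.apply(X_new)       # (n_new, n_trees)
        out = []
        for row in new_leaves:
            pooled = np.concatenate([y_train[train_leaves[:, t] == leaf]
                                     for t, leaf in enumerate(row)])
            out.append(np.quantile(pooled, quantiles))
        return np.array(out)

    rng = np.random.default_rng(0)
    X = rng.random((500, 3))   # e.g. satellite product, reanalysis, elevation
    y = X @ [2.0, 1.0, 0.5] + rng.normal(0, 0.3, 500)
    rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=10).fit(X, y)
    print(qrf_quantiles(rf, X, y, X[:2]))   # 10th/50th/90th percentile bands
    ```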

  3. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

    Full Text Available This paper addressed the development of an augmented reality (AR) based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensor datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore a model to manage such data and enhance the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce data inefficiency. This paper presents a model for computing the information gain of each data attribute in order to determine a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all necessary data relevant to a particularly selected region of interest (ROI) on the network. The necessary architectural system support and the interface requirements for such visualizations are also presented.
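
    Selecting a lead attribute by information gain amounts to comparing the entropy reduction each attribute offers on the class variable. A minimal sketch with invented sensor attributes and leak states:

    ```python
    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    def information_gain(attribute, labels):
        """Entropy of the labels minus the attribute-conditional entropy."""
        h = entropy(labels)
        for value in np.unique(attribute):
            mask = attribute == value
            h -= mask.mean() * entropy(labels[mask])
        return h

    # Hypothetical discretized sensor attributes vs. leak/no-leak state.
    state = np.array(["leak", "leak", "ok", "ok", "ok", "leak"])
    pressure = np.array(["hi", "hi", "lo", "lo", "hi", "hi"])
    flow = np.array(["a", "b", "a", "b", "a", "b"])
    gains = {"pressure": information_gain(pressure, state),
             "flow": information_gain(flow, state)}
    print(max(gains, key=gains.get), gains)   # the lead attribute
    ```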

  4. Adaptive Gaussian Predictive Process Models for Large Spatial Datasets

    Science.gov (United States)

    Guhaniyogi, Rajarshi; Finley, Andrew O.; Banerjee, Sudipto; Gelfand, Alan E.

    2011-01-01

    Large point-referenced datasets occur frequently in the environmental and natural sciences. Use of Bayesian hierarchical spatial models for analyzing these datasets is undermined by onerous computational burdens associated with parameter estimation. Low-rank spatial process models attempt to resolve this problem by projecting spatial effects to a lower-dimensional subspace. This subspace is determined by a judicious choice of “knots” or locations that are fixed a priori. One such representation yields a class of predictive process models (e.g., Banerjee et al., 2008) for spatial and spatial-temporal data. Our contribution here expands upon predictive process models with fixed knots to models that accommodate stochastic modeling of the knots. We view the knots as emerging from a point pattern and investigate how such adaptive specifications can yield more flexible hierarchical frameworks that lead to automated knot selection and substantial computational benefits. PMID:22298952

  5. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model for studying high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers investigating the underlying neural mechanisms of playing chess. For professional chess players (e.g., chess grand masters and masters, or GM/Ms), the structural and functional alterations that arise from long-term professional practice, and how these alterations relate to behavior, remain largely veiled. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms) and 29 age-matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  6. Serial femtosecond crystallography datasets from G protein-coupled receptors.

    Science.gov (United States)

    White, Thomas A; Barty, Anton; Liu, Wei; Ishchenko, Andrii; Zhang, Haitao; Gati, Cornelius; Zatsepin, Nadia A; Basu, Shibom; Oberthür, Dominik; Metz, Markus; Beyerlein, Kenneth R; Yoon, Chun Hong; Yefanov, Oleksandr M; James, Daniel; Wang, Dingjie; Messerschmidt, Marc; Koglin, Jason E; Boutet, Sébastien; Weierstall, Uwe; Cherezov, Vadim

    2016-08-01

    We describe the deposition of four datasets consisting of X-ray diffraction images acquired using serial femtosecond crystallography experiments on microcrystals of human G protein-coupled receptors, grown and delivered in lipidic cubic phase, at the Linac Coherent Light Source. The receptors are: the human serotonin receptor 2B in complex with an agonist ergotamine, the human δ-opioid receptor in complex with a bi-functional peptide ligand DIPP-NH2, the human smoothened receptor in complex with an antagonist cyclopamine, and finally the human angiotensin II type 1 receptor in complex with the selective antagonist ZD7155. All four datasets have been deposited, with minimal processing, in an HDF5-based file format, which can be used directly for crystallographic processing with CrystFEL or other software. We have provided processing scripts and supporting files for recent versions of CrystFEL, which can be used to validate the data.

  7. European industrial water use: a new dataset with high spatial and sectorial detail

    Science.gov (United States)

    Bernhard, Jeroen; Reynaud, Arnaud; de Roo, Ad; Karssenberg, Derek

    2017-04-01

    One of the most important components of the water balance in terms of water scarcity modelling is an accurate quantification of water abstractions by water-using sectors. Data availability on this topic is strikingly limited, most notably for the industry sector. Due to the lack of data, many global and continental scale modelling studies rely on relatively outdated water use datasets with coarse resolution which generally treat the industry sector as a single unit. The lack of spatial and sectorial detail hurts the local relevance and applicability of these large-scale models, to the point that results might be meaningless for regional policy support, especially because economic assessments of potential water allocation policies require the separation of economic activities with different water use behavior and water productivity (industrial production per unit of water). With this work, we aim to close this knowledge gap for Europe by providing a pan-European dataset with regional relevance of water use and water productivity values at the highest sectorial and spatial detail possible. We gathered industrial water use data from national statistical offices and other organizational bodies, separating ten different industry subsections of the NACE classification (Nomenclature of Economic Activities). Where data were not adequately available from national databases, we used complementary figures from EUROSTAT (the official database of the European Commission). Then we used national GVA (Gross Value Added) to calculate water productivity values per country for all industrial subsections. As a final step, we used a database with locations and production records of nearly 20,000 individual industrial activities to proportionally distribute the national water use values for each industry section to roughly 1200 regions in Europe. This resulted in a pan-European dataset of water use at the regional level and water productivity at the national level for ten industry sections.

  8. A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study.

    Science.gov (United States)

    AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku

    2014-05-27

    The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets, which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as being strong, regular, or weak. The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting

  9. Microscopic images dataset for automation of RBCs counting

    Directory of Open Access Journals (Sweden)

    Sherif Abbas

    2015-12-01

    Full Text Available A method for Red Blood Corpuscle (RBC) counting has been developed using light microscopic images of RBCs and a Matlab algorithm. The dataset consists of Red Blood Corpuscle (RBC) images and their corresponding segmented images. A detailed description using a flow chart is given in order to show how to produce the RBC mask. The RBC mask was used to count the number of RBCs in the blood smear image.
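
    Counting cells from a binary mask is essentially connected-component labelling plus an area filter. A minimal Python sketch of the idea (the paper's pipeline is in Matlab, and min_area is an illustrative threshold):

    ```python
    import numpy as np
    from scipy import ndimage

    def count_rbcs(mask, min_area=10):
        """Count connected components in a binary RBC mask, ignoring
        specks smaller than min_area pixels."""
        labeled, n = ndimage.label(mask)
        areas = ndimage.sum(mask, labeled, index=range(1, n + 1))
        return int((np.asarray(areas) >= min_area).sum())

    mask = np.zeros((20, 20), bool)
    mask[2:8, 2:8] = True      # one cell
    mask[12:18, 10:16] = True  # another cell
    mask[0, 19] = True         # a single-pixel speck to ignore
    print(count_rbcs(mask))    # -> 2
    ```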

  10. Identifying Food Insecurity in Africa Using Remote Sensing Datasets

    Science.gov (United States)

    Husak, G. J.; Davenport, F.; Shukla, S.; McNally, A.; Turner, W.

    2016-12-01

    The Famine Early Warning Systems Network (FEWS NET) monitors critical environmental variables that impact food production in developing countries, including over 30 countries in Africa. However, there is a notable lack of consistent quantitative data accurately capturing crop yields or the number of people facing food insecurity. The recently implemented Integrated Food Security Phase Classification (IPC) protocol seeks to address this issue through a set of protocols that define the severity of food insecurity, ranging from "None/Minimal" to "Humanitarian Catastrophe/Famine". The IPC framework considers both the severity of the hazard and the vulnerability of the population, as well as the four dimensions of food security (availability, access, utilization and stability). This framework is applied at a fairly fine sub-national level and its consistent application across national borders provides a large dataset to work with. This presentation reports on an ongoing project to examine correlations between a number of geophysical variables and IPC condition. These variables are rainfall, reference evapotranspiration (ETo) and soil moisture (SM), along with combinations of these variables, such as the Standardized Precipitation Evapotranspiration Index (SPEI). We use the Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) dataset as the rainfall product and an experimental ETo dataset generated using NASA's MERRA-2 atmospheric forcings. Measures of SM use simulations coming from the FEWS NET Land Data Assimilation System (FLDAS). The variables will be compared based on predictive accuracy for IPC and how that accuracy varies across regions and calendar year. The goal is to identify the optimal geophysical predictor of agricultural drought and food insecurity. The results of this research could prioritize the datasets used in identifying and quantifying food insecurity in Africa, and may allow for accurate and frequent updates of the food security conditions.

  11. The Changing Shape of Global Inequality - exploring a new dataset

    OpenAIRE

    Jan Luiten van Zanden; Joerg Baten; Peter Foldvari; Bas van Leeuwen

    2011-01-01

    A new dataset for estimating the development of global inequality between 1820 and 2000 is presented, based on a large variety of sources and methods for estimating (gross household) income inequality. On this basis, and two sets of benchmarks for estimating between-country inequality (the Maddison 1990 benchmark and the recent 2005 ICP round), we estimate the evolution of global income inequality and of the number of people below various poverty lines over the past two centuries. We find tha...

  12. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    Science.gov (United States)

    Bern, Carleton R.

    2009-01-01

    National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength elements (HFS) Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
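
    The quartz-free transformation itself is a one-line renormalization: divide each concentration by the non-quartz fraction. A sketch with invented column names and values:

    ```python
    import pandas as pd

    def quartz_free(df, quartz_col="quartz_wt_pct"):
        """Renormalize element concentrations to a quartz-free basis:
        c_qf = c / (1 - quartz fraction)."""
        factor = 1.0 / (1.0 - df[quartz_col] / 100.0)
        return df.drop(columns=quartz_col).mul(factor, axis=0)

    # Hypothetical soil samples spanning the quartz range quoted above.
    soils = pd.DataFrame({"quartz_wt_pct": [6.2, 45.0, 81.7],
                          "Ca_wt_pct": [2.1, 1.0, 0.3],
                          "Ti_wt_pct": [0.6, 0.4, 0.1]})
    print(quartz_free(soils))
    ```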

  13. Benchmark of Deep Learning Models on Large Healthcare MIMIC Datasets

    OpenAIRE

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2017-01-01

    Deep learning models (aka Deep Neural Networks) have revolutionized many fields including computer vision, natural language processing, and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of deep learning models with respect to state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present the benchmarking res...

  14. Circumpolar dataset of sequenced specimens of Promachocrinus kerguelensis (Echinodermata, Crinoidea).

    Science.gov (United States)

    Hemery, Lenaïg G; Améziane, Nadia; Eléaume, Marc

    2013-01-01

    This circumpolar dataset of the comatulid (Echinodermata: Crinoidea) Promachocrinus kerguelensis (Carpenter, 1888) from the Southern Ocean documents biodiversity associated with the specimens sequenced in Hemery et al. (2012). The aim of the Hemery et al. (2012) paper was to use phylogeographic and phylogenetic tools to assess the genetic diversity, demographic history and evolutionary relationships of this very common and abundant comatulid, in the context of the glacial history of the Antarctic and Sub-Antarctic shelves (Thatje et al. 2005, 2008). Over one thousand three hundred specimens (1307) used in this study were collected during seventeen cruises from 1996 to 2010, in eight regions of the Southern Ocean: Kerguelen Plateau, Davis Sea, Dumont d'Urville Sea, Ross Sea, Amundsen Sea, West Antarctic Peninsula, East Weddell Sea and Scotia Arc including the tip of the Antarctic Peninsula and the Bransfield Strait. We give here the metadata of this dataset, which lists sampling sources (cruise ID, ship name, sampling date, sampling gear), sampling sites (station, geographic coordinates, depth) and genetic data (phylogroup, haplotype, sequence ID) for each of the 1307 specimens. The identification of the specimens was checked by an expert taxonomist specializing in crinoids (Marc Eléaume, Muséum national d'Histoire naturelle, Paris) and all the COI sequences were matched against those available on the Barcode of Life Data System (BOLD: http://www.boldsystems.org/index.php/IDS_OpenIdEngine). This dataset can be used by studies dealing with, among other interests, Antarctic and/or crinoid diversity (species richness, distribution patterns), biogeography or habitat / ecological niche modeling. This dataset is accessible through the GBIF network at http://ipt.biodiversity.aq/resource.do?r=proke.

  15. Systematic reviews on leptospirosis

    Directory of Open Access Journals (Sweden)

    GUIDUGLI Fabio

    2000-01-01

    Full Text Available OBJECTIVES: To find the existing clinical evidence on interventions for leptospirosis. The objective is to evaluate the effectiveness and safety of any intervention for leptospirosis through systematic reviews of randomized controlled trials (RCTs). DATA SOURCES: The sources of studies used (with no limitations concerning language, date, or other restrictions) were: EMBASE, LILACS, MEDLINE, the Cochrane Controlled Clinical Trials Database, and the Cochrane Hepato-Biliary Group Randomized Trials register. SELECTION OF STUDIES: Type of study: all systematic reviews of randomized controlled trials. Participants: patients with clinical and/or laboratory diagnosis of leptospirosis, and subjects potentially exposed to leptospirosis as defined by the authors. Interventions: any intervention for leptospirosis (such as antibiotics or vaccines) for prevention or treatment. DATA COLLECTION: The assessment was made independently by the reviewers and cross-checked. The external validity was assessed by analysis of studies, interventions, and outcomes. DATA SYNTHESIS: We located 163 studies in the electronic databases above using the search strategy described. Only 2 hits were selected, both of which are protocols of systematic reviews of the Cochrane Collaboration, not full reviews. One of the protocols evaluates antibiotics for treatment, and the other evaluates antibiotics for prevention of leptospirosis. CONCLUSIONS: There were no complete systematic reviews on interventions for leptospirosis. The effectiveness of any intervention for leptospirosis, whether for prevention or treatment, remains unclear for guidelines and practice.

  16. A gridded dataset of hourly precipitation in Germany: Its construction, climatology and application

    Directory of Open Access Journals (Sweden)

    Marcus Paulat

    2008-12-01

    Full Text Available A so-called disaggregation technique is used to combine daily rain gauge measurements and hourly radar composites in order to produce a dataset of hourly precipitation in Germany on a grid with a horizontal resolution of 7 km for the years 2001-2004. This state-of-the-art observation-based dataset of precipitation has a high temporal and spatial resolution and will be extended continuously during the upcoming years. Limitations of its quality, which are due to intrinsic problems with observing the highly variable field of precipitation, are discussed and quantified where possible. The dataset offers novel possibilities to investigate the climatology of precipitation and to verify precipitation forecasts from numerical weather prediction models. The frequency of hourly precipitation in Germany above the detection limit of 0.1 mm/h amounts to 10-30 % in winter, with clear maxima in the mountainous regions, and to 6-20 % in summer, when the spatial variability is considerably reduced. The 95th percentile of the frequency distribution is significantly larger in summer than in winter, with local maxima in the mountainous regions in winter, and in the Alpine Foreland and upper Elbe catchment in summer. It is shown that the operational model COSMO-7 with a horizontal resolution of 7 km captures the geographical distribution of the frequency and of the 95th percentile of hourly precipitation in Germany very well. In contrast, the model is not able to realistically simulate the diurnal cycle of precipitation in any region of Germany during summer.

  17. Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets

    Science.gov (United States)

    Drouart, G.; Falkendal, T.

    2018-04-01

    We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from a heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper limits during the fitting process in a continuous way, allowing models to become gradually less probable as upper limits are approached. The aim is to propose a simple-to-use, yet highly versatile fitting tool for handling increasing source complexity when combining multi-wavelength datasets, with fully customisable filter/model databases. The complete control given to the user is one advantage, which avoids the traditional problems related to the "black box" effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting makes Mr-Moose a particularly attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.

  18. Development of an Operational TS Dataset Production System for the Data Assimilation System

    Science.gov (United States)

    Kim, Sung Dae; Park, Hyuk Min; Kim, Young Ho; Park, Kwang Soon

    2017-04-01

    An operational TS (Temperature and Salinity) dataset production system was developed to provide near real-time data to the data assimilation system periodically. It collects the latest 15 days' TS data of the northwestern Pacific area (20°N - 55°N, 110°E - 150°E), applies QC tests to the archived data and supplies them to numerical prediction models of KIOST (Korea Institute of Ocean Science and Technology). The latest real-time TS data are collected from the Argo GDAC and the GTSPP data server every week. Argo data are downloaded from the /latest_data directory of the Argo GDAC. Because many duplicated data exist when all profile data are extracted from all Argo netCDF files, a DB system is used to avoid duplication. All metadata (float ID, location, observation date and time, etc.) of all Argo floats are stored in a database system, and a Matlab program was developed to manipulate the DB data, check for duplication and exclude duplicated data. GTSPP data are downloaded from the /realtime directory of the GTSPP data service. The latest data except Argo data are extracted from the original data. Another Matlab program was coded to inspect all collected data using 10 QC tests and produce the final dataset, which can be used by the assimilation system. Three regional range tests to inspect annual, seasonal and monthly variations are included in the QC procedures. A C program was developed to provide regional ranges to data managers. It can calculate the upper and lower limits of temperature and salinity at depths from 0 to 1550 m. The final TS dataset contains the latest 15 days' TS data in netCDF format. It is updated every week and transmitted to the numerical modelers of KIOST for operational use.
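
    A regional range test of the kind described reduces to interpolating depth-dependent limits and flagging values outside them. The sketch below is in Python rather than C, with invented limit values, not KIOST's:

    ```python
    import numpy as np

    def regional_range_test(depths, temps, limits):
        """Flag temperatures outside depth-dependent regional limits.

        limits: rows of (depth_m, lower, upper), interpolated linearly.
        """
        limits = np.asarray(limits, float)
        lo = np.interp(depths, limits[:, 0], limits[:, 1])
        hi = np.interp(depths, limits[:, 0], limits[:, 2])
        return (temps < lo) | (temps > hi)

    # Illustrative limits for 0-1550 m; real tables would vary by region/season.
    limits = [(0, -2, 32), (500, -2, 20), (1550, -2, 8)]
    depths = np.array([10, 400, 1200])
    temps = np.array([18.0, 25.0, 3.5])
    print(regional_range_test(depths, temps, limits))  # [False  True False]
    ```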

  19. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    Science.gov (United States)

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  20. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Full Text Available Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building. This article considers missing data within the context of principal component analysis (PCA, which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
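
    The alternating SVD-based algorithm favoured by the review can be sketched compactly: fill missing entries, fit a rank-k reconstruction, refill from it, and iterate. A minimal version, assuming mean-centred PCA and column-mean initialization:

    ```python
    import numpy as np

    def pca_with_missing(X, n_components, n_iter=200, tol=1e-8):
        """Alternating imputation PCA: impute, fit rank-k SVD, re-impute,
        repeat until the imputed entries stop changing."""
        miss = np.isnan(X)
        Xf = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
        prev = np.inf
        for _ in range(n_iter):
            mu = Xf.mean(axis=0)
            U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
            recon = mu + (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
            Xf[miss] = recon[miss]                     # refill only the gaps
            err = ((Xf - recon) ** 2).mean()           # misfit on observed entries
            if abs(prev - err) < tol:
                break
            prev = err
        return Xf, Vt[:n_components]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 6))  # rank-2 process data
    X[rng.random(X.shape) < 0.1] = np.nan                    # 10% missing
    Xhat, components = pca_with_missing(X, n_components=2)
    print("NaNs remaining:", np.isnan(Xhat).sum())
    ```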

  1. A dataset on human navigation strategies in foreign networked systems

    Science.gov (United States)

    Kőrösi, Attila; Csoma, Attila; Rétvári, Gábor; Heszberger, Zalán; Bíró, József; Tapolcai, János; Pelle, István; Klajbár, Dávid; Novák, Márton; Halasi, Valentina; Gulyás, András

    2018-03-01

    Humans are involved in various real-life networked systems. The most obvious examples are social and collaboration networks, but the language and the related mental lexicon they use, or the physical map of their territory, can also be interpreted as networks. How do they find paths between endpoints in these networks? How do they obtain information about a foreign networked world they find themselves in, how do they build a mental model of it, and how well do they succeed in using it? Large, open datasets allowing the exploration of such questions are hard to find. Here we report a dataset collected by a smartphone application, in which players navigate between fixed-length source and destination English words step-by-step by changing only one letter at a time. The paths reflect how the players master their navigation skills in such a foreign networked world. The dataset can be used in the study of human mental models for the world around us, or in a broader scope to investigate navigation strategies in complex networked systems.
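
    The game's underlying network is a classic word ladder: words are nodes, and two words are linked if they differ in one letter. A breadth-first search gives the shortest path a player could take; the small vocabulary here is invented:

    ```python
    from collections import deque

    def word_ladder(source, target, vocabulary):
        """BFS for the shortest one-letter-at-a-time path between words."""
        words = {w for w in vocabulary if len(w) == len(source)}

        def neighbours(w):
            for i in range(len(w)):
                for c in "abcdefghijklmnopqrstuvwxyz":
                    cand = w[:i] + c + w[i + 1:]
                    if cand != w and cand in words:
                        yield cand

        queue, seen = deque([[source]]), {source}
        while queue:
            path = queue.popleft()
            if path[-1] == target:
                return path
            for nxt in neighbours(path[-1]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None  # no path in this vocabulary

    vocab = ["cold", "cord", "card", "ward", "warm", "worm", "word"]
    print(word_ladder("cold", "warm", vocab))
    ```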

  2. Enhanced Data Discoverability for in Situ Hyperspectral Datasets

    Science.gov (United States)

    Rasaiah, B.; Bellman, C.; Hewson, R. D.; Jones, S. D.; Malthus, T. J.

    2016-06-01

    Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets, to exploit the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015) with extended support for specific applications. This paper presents a prototype model for an OGC and ISO compliant platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  4. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273,780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which is, to our knowledge, the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications to arbitrary data sets, such as social networks, biological networks, and graphs.
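
    The rigidity density at the heart of the method can be sketched as a kernel sum over atoms whose width sets the resolution. A schematic version with a Gaussian kernel and invented coordinates (the flexibility-rigidity index admits other kernels and parameters):

    ```python
    import numpy as np

    def rigidity_density(grid_points, atoms, eta):
        """Gaussian-kernel density over a grid; the resolution parameter
        eta sets the scale the topological lens focuses on."""
        d2 = ((grid_points[:, None, :] - atoms[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / eta ** 2).sum(axis=1)

    atoms = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
    grid = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
    for eta in (0.5, 2.0, 8.0):   # coarser eta merges the two atoms' peaks
        print(eta, rigidity_density(grid, atoms, eta))
    ```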

  5. Synchronization of networks of chaotic oscillators: Structural and dynamical datasets

    Directory of Open Access Journals (Sweden)

    Ricardo Sevilla-Escoboza

    2016-06-01

    Full Text Available We provide the topological structure of a series of N=28 Rössler chaotic oscillators diffusively coupled through one of their variables. The dynamics of the y variable describing the evolution of the individual nodes of the network are given for a wide range of coupling strengths. The datasets capture the transition from unsynchronized to synchronized behavior as a function of the coupling strength between oscillators. The fact that both the underlying topology of the system and the dynamics of the nodes are given together makes this dataset a suitable candidate to evaluate the interplay between functional and structural networks and to serve as a benchmark to quantify the ability of a given algorithm to extract the structural network of connections from the observation of the dynamics of the nodes. At the same time, it is possible to use the dataset to analyze the different dynamical properties (randomness, complexity, reproducibility, etc.) of an ensemble of oscillators as a function of the coupling strength.
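
    Data of this kind can be regenerated in a few lines: integrate N Rössler systems diffusively coupled through y via a graph Laplacian. A sketch with standard Rössler parameters and a random topology rather than the authors' actual network:

    ```python
    import numpy as np
    from scipy.integrate import solve_ivp

    def coupled_rossler(t, state, A, sigma, a=0.2, b=0.2, c=5.7):
        """N Rössler oscillators diffusively coupled through y;
        A is the adjacency matrix, sigma the coupling strength."""
        x, y, z = state.reshape(3, -1)
        L = np.diag(A.sum(axis=1)) - A        # graph Laplacian
        dx = -y - z
        dy = x + a * y - sigma * (L @ y)      # diffusive coupling in y
        dz = b + z * (x - c)
        return np.concatenate([dx, dy, dz])

    N = 28
    rng = np.random.default_rng(0)
    A = (rng.random((N, N)) < 0.15).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                               # undirected, no self-loops
    y0 = rng.normal(0, 1, 3 * N)
    sol = solve_ivp(coupled_rossler, (0, 200), y0, args=(A, 0.3), rtol=1e-6)
    sync_err = np.std(sol.y[N:2 * N, -1])     # spread of y across nodes at t_end
    print(f"final y-dispersion: {sync_err:.3f}")  # shrinks as sigma grows
    ```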

  6. Automatic run-time provenance capture for scientific dataset generation

    Science.gov (United States)

    Frew, J.; Slaughter, P.

    2008-12-01

    Provenance---the directed graph of a dataset's processing history---is difficult to capture effectively. Human-generated provenance, as narrative metadata, is labor-intensive and thus often incorrect, incomplete, or simply not recorded. Workflow systems capture some provenance implicitly in workflow specifications, but these systems are not ubiquitous or standardized, and a workflow specification may not capture all of the factors involved in a dataset's production. System audit trails capture potentially all processing activities, but not the relationships between them. We describe a system that transparently (i.e., without any modification to science codes) and automatically (i.e., without any human intervention) captures the low-level interactions (files read/written, parameters accessed, etc.) between scientific processes, and then synthesizes these relationships into a provenance graph. This system---the Earth System Science Server (ES3)---is sufficiently general that it can accommodate any combination of stand-alone programs, interpreted codes (e.g. IDL), and command-language scripts. Provenance in ES3 can be published in well-defined XML formats (including formats suitable for graphical visualization), and queried to determine the ancestors or descendants of any specific data file or process invocation. We demonstrate how ES3 can be used to capture the provenance of a large operational ocean color dataset.
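
    Synthesizing audit events into a provenance graph amounts to adding file-to-process edges for reads and process-to-file edges for writes, then querying ancestry. A sketch with a hypothetical event format (ES3's actual record structure is richer):

    ```python
    import networkx as nx

    def provenance_graph(events):
        """Build a provenance DAG from (process, action, path) audit events."""
        g = nx.DiGraph()
        for process, action, path in events:
            if action == "read":
                g.add_edge(path, process)   # file -> process that read it
            elif action == "write":
                g.add_edge(process, path)   # process -> file it produced
        return g

    # Invented two-step ocean-color-style pipeline.
    events = [("calibrate.py", "read", "raw_scene.hdf"),
              ("calibrate.py", "write", "calibrated.nc"),
              ("chlorophyll.py", "read", "calibrated.nc"),
              ("chlorophyll.py", "write", "chl_a.nc")]
    g = provenance_graph(events)
    print(nx.ancestors(g, "chl_a.nc"))  # everything chl_a.nc was derived from
    ```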

  7. Evaluation of Greenland near surface air temperature datasets

    Science.gov (United States)

    Reeves Eyre, J. E. Jack; Zeng, Xubin

    2017-07-01

    Near-surface air temperature (SAT) over Greenland has important effects on the mass balance of the ice sheet, but it is unclear which SAT datasets are reliable in the region. Here extensive in situ SAT measurements (~1400 station-years) are used to assess monthly mean SAT from seven global reanalysis datasets, five gridded SAT analyses, one satellite retrieval and three dynamically downscaled reanalyses. Strengths and weaknesses of these products are identified, and their biases are found to vary by season and glaciological regime. The MERRA2 reanalysis performs best overall, with mean absolute error less than 2 °C in all months. Ice sheet-average annual mean SAT from different datasets are highly correlated in recent decades, but their 1901-2000 trends differ even in sign. Compared with the MERRA2 climatology combined with gridded SAT analysis anomalies, thirty-one earth system model historical runs from the CMIP5 archive reach ~5 °C for the 1901-2000 average bias and have opposite trends for a number of sub-periods.

  8. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  9. Clustering of Biological Datasets in the Era of Big Data

    Directory of Open Access Journals (Sweden)

    Röttger Richard

    2016-03-01

    Full Text Available Clustering is a long-standing problem in computer science and is applied in virtually every scientific field for exploring the inherent structure of datasets. In biomedical research, clustering tools have been utilized in manifold areas, among many others in expression analysis, disease subtyping and protein research. A plethora of different approaches have been developed, but there is little guidance on which approach is optimal in which particular situation. Furthermore, a typical cluster analysis is an entire process with several highly interconnected steps, from preprocessing and proximity calculation through the actual clustering to evaluation and optimization. Only when all steps seamlessly work together can an optimal result be achieved. This renders a cluster analysis tiresome and error-prone, especially for non-experts. A mere trial-and-error approach becomes increasingly infeasible when considering the tremendous growth of available datasets; thus, a strategic and thoughtful course of action is crucial for a cluster analysis. This manuscript provides an overview of the crucial steps and the most common techniques involved in conducting a state-of-the-art cluster analysis of biomedical datasets.

  10. The case for developing publicly-accessible datasets for health services research in the Middle East and North Africa (MENA region

    Directory of Open Access Journals (Sweden)

    El-Jardali Fadi

    2009-10-01

    Full Text Available Abstract Background The existence of publicly-accessible datasets represents a significant opportunity for health services research to evolve into a science that supports health policy making and evaluation, proper inter- and intra-organizational decisions and optimal clinical interventions. This paper investigated the role of publicly-accessible datasets in the enhancement of health care systems in the developed world and highlighted the importance of their wide existence and use in the Middle East and North Africa (MENA) region. Discussion A search was conducted to explore the availability of publicly-accessible datasets in the MENA region. Although datasets were found in most countries in the region, these were limited in terms of their relevance, quality and public accessibility. With rare exceptions, publicly-accessible datasets - as present in the developed world - were absent. Based on this, we proposed a gradual approach and a set of recommendations to promote the development and use of publicly-accessible datasets in the region. These recommendations target potential actions by governments, researchers, policy makers and international organizations. Summary We argue that the limited number of publicly-accessible datasets in the MENA region represents a lost opportunity for the evidence-based advancement of health systems in the region. The availability and use of publicly-accessible datasets would encourage policy makers in this region to base their decisions on solid representative data and not on estimates or small-scale studies; researchers would be able to exercise their expertise in a meaningful manner to both policy makers and the public. The population of the MENA countries would exercise the right to benefit from locally- or regionally-based studies, versus imported and in 'best cases' customized ones. Furthermore, on a macro scale, the availability of regionally comparable publicly-accessible datasets would allow for the

  11. Systematic Avocating

    Directory of Open Access Journals (Sweden)

    Jan Green

    2014-12-01

    Full Text Available Feeling obliged to undertake complex research tasks outside core working hours is a common occurrence in academia. Detailed and timely research projects are expected; the creation and defence of sufficient intervals within a crowded working schedule is one concern explored in this short version paper. Merely working longer hours fails to provide a satisfactory solution for individuals experiencing concerns of this nature. Personal effort and drive are utilised, requiring the application of mental mustering and systematic procedures. The attitude to research work is to treat the task as a hobby, conceptualised as avocating. Whilst this provides a personal solution through immersion in the task, the approach should raise concerns for employers. The flexibility of grounded theory is evident, and the freedom to draw on various bodies of knowledge provides fresh insight into a problem that occurs in organizations in many sectors experiencing multiple priorities. The application of the core category, systematic avocating, may prove beneficial.

  12. Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation

    Science.gov (United States)

    Liu, Qian; Pineda-García, Garibaldi; Stromatias, Evangelos; Serrano-Gotarredona, Teresa; Furber, Steve B.

    2016-01-01

    Today, increasing attention is being paid to research into spike-based neural computation, both to gain a better understanding of the brain and to explore biologically-inspired computation. Within this field, the primate visual pathway and its hierarchical organization have been extensively studied. Spiking Neural Networks (SNNs), inspired by the understanding of observed biological structure and function, have been successfully applied to visual recognition and classification tasks. In addition, implementations on neuromorphic hardware have enabled large-scale networks to run in (or even faster than) real time, making spike-based neural vision processing accessible on mobile robots. Neuromorphic sensors such as silicon retinas are able to feed such mobile systems with real-time visual stimuli. A new set of vision benchmarks for spike-based neural processing are now needed to measure progress quantitatively within this rapidly advancing field. We propose that a large dataset of spike-based visual stimuli is needed to provide meaningful comparisons between different systems, and a corresponding evaluation methodology is also required to measure the performance of SNN models and their hardware implementations. In this paper we first propose an initial NE (Neuromorphic Engineering) dataset based on standard computer vision benchmarks and that uses digits from the MNIST database. This dataset is compatible with the state of current research on spike-based image recognition. The corresponding spike trains are produced using a range of techniques: rate-based Poisson spike generation, rank order encoding, and recorded output from a silicon retina with both flashing and oscillating input stimuli. In addition, a complementary evaluation methodology is presented to assess both model-level and hardware-level performance. Finally, we demonstrate the use of the dataset and the evaluation methodology using two SNN models to validate the performance of the models and their hardware

  13. The Role of Datasets on Scientific Influence within Conflict Research.

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested whether a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis, on this citation network (~1.5 million works) to highlight the main contributions in conflict research and to test whether research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry; this means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971, where ideas did not persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped shape the

  14. Systematic study on the radiation exposure of flora and fauna in case of compliance with the dose limits of the StrlSchV (radiation protection regulation) for men. Final report

    International Nuclear Information System (INIS)

    Kueppers, Christian; Ustohalova, Veronika; Ulanovsky, Alexander

    2012-01-01

    Dose limits for members of the public exposed to the discharge of radioactive substances into the air or water bodies are defined in the German Radiation Protection Ordinance. This study tested, for all 750 radionuclides covered, whether non-human species are adequately protected when the human dose limits are complied with, using a set of reference biota. External and, where possible, internal doses were calculated for the reference biota. In addition, new exposure pathways such as submersion and inhalation (for rat and deer) were incorporated. The reference dose rate limit for adequate biota protection is 10 μGy/h. This study found that radionuclide discharges into the air never exceeded the reference dose rate limit. However, violations were detected for discharges of some very short-lived radionuclides into freshwater or seawater, if the maximum water contamination is assumed. Protection of non-human species is guaranteed for more realistic emission and immission situations. Nevertheless, damage to populations living in small water volumes cannot be excluded solely on the basis of regulations for the human dose limit; it is therefore necessary to judge individual cases in very unfavourable immission situations. (orig.)

  15. Reproducibility of studies on text mining for citation screening in systematic reviews: Evaluation and checklist.

    Science.gov (United States)

    Olorisade, Babatunde Kazeem; Brereton, Pearl; Andras, Peter

    2017-09-01

    Independent validation of published scientific results through study replication is a pre-condition for accepting the validity of such results. In computational research, full replication is often unrealistic for independent results validation; therefore, study reproduction has been justified as the minimum acceptable standard to evaluate the validity of scientific claims. The application of text mining techniques to citation screening in the context of systematic literature reviews is a relatively young and growing computational field with high relevance for software engineering, medical research and other fields. However, there is little work so far on reproduction studies in the field. In this paper, we investigate the reproducibility of studies in this area based on information contained in published articles, and we propose reporting guidelines that could improve reproducibility. The study was approached in two ways. Initially we attempted to reproduce results from six studies, which were based on the same raw dataset. Then, based on this experience, we identified steps considered essential to successful reproduction of text mining experiments and characterized them to measure how reproducible a study is, given the information provided on these steps. Thirty-three articles were systematically assessed for reproducibility using this approach. Our work revealed that it is currently difficult if not impossible to independently reproduce the results published in any of the studies investigated. The lack of information about the datasets used limits reproducibility of about 80% of the studies assessed. Also, information about the machine learning algorithms is inadequate in about 27% of the papers. On the plus side, the third party software tools used are mostly free and available. The reproducibility potential of most of the studies can be significantly improved if more attention is paid to information provided on the datasets used, how they were partitioned and utilized, and

  16. Developing a minimum dataset for nursing team leader handover in the intensive care unit: A focus group study.

    Science.gov (United States)

    Spooner, Amy J; Aitken, Leanne M; Corley, Amanda; Chaboyer, Wendy

    2018-01-01

    Despite increasing demand for structured processes to guide clinical handover, nursing handover tools are limited in the intensive care unit. The study aim was to identify key items to include in a minimum dataset for intensive care nursing team leader shift-to-shift handover. This focus group study was conducted in a 21-bed medical/surgical intensive care unit in Australia. Senior registered nurses involved in team leader handovers were recruited. Focus groups were conducted using a nominal group technique to generate and prioritise minimum dataset items. Nurses were presented with content from previous team leader handovers and asked to select which content items to include in a minimum dataset. Participant responses were summarised as frequencies and percentages. Seventeen senior nurses participated in three focus groups. Participants agreed that ISBAR (Identify-Situation-Background-Assessment-Recommendations) was a useful tool to guide clinical handover. Items recommended to be included in the minimum dataset (≥65% agreement) included Identify (name, age, days in intensive care), Situation (diagnosis, surgical procedure), Background (significant event(s), management of significant event(s)) and Recommendations (patient plan for next shift, tasks to follow up for next shift). Overall, 30 of the 67 (45%) items in the Assessment category were considered important to include in the minimum dataset and focused on relevant observations and treatment within each body system. Other non-ISBAR items considered important to include related to the ICU (admissions to ICU, staffing/skill mix, theatre cases) and patients (infectious status, site of infection, end of life plan). Items were further categorised into those to include in all handovers and those to discuss only when relevant to the patient. The findings suggest a minimum dataset for intensive care nursing team leader shift-to-shift handover should contain items within ISBAR along with unit and patient specific

  17. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

    Science.gov (United States)

    Sauzet, Odile; Peacock, Janet L

    2017-07-20

    The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants, we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes, but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants, we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for logistic random intercept models, and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters, while a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide estimates similar to logistic regression. The method that seems to provide the best balance between estimation of the standard errors and of the parameters, for any percentage of twins, is generalised estimating equations. This study has shown that the number of covariates or the level-two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins, but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
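    The comparison described here can be reproduced in outline with statsmodels: fit an ordinary logistic model that ignores clustering, then a GEE with an exchangeable working correlation that accounts for it. The simulation settings below (cluster sizes, effect sizes, twin percentage) are illustrative assumptions, not the paper's design.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulate a cohort in which ~10% of mothers have twins, so most clusters
# have size one and a few have size two.
rng = np.random.default_rng(1)
sizes = rng.choice([1, 2], size=500, p=[0.9, 0.1])
rows = []
for m, size in enumerate(sizes):
    u = rng.normal(0, 1.0)  # shared (cluster-level) random effect
    for _ in range(size):
        x = rng.normal()
        logit = -0.5 + 0.8 * x + u
        rows.append({"mother": m, "x": x,
                     "y": int(rng.random() < 1 / (1 + np.exp(-logit)))})
df = pd.DataFrame(rows)

# Ordinary logistic regression ignores the twin dependence entirely.
naive = smf.logit("y ~ x", data=df).fit(disp=0)

# GEE with an exchangeable working correlation accounts for clustering.
gee = smf.gee("y ~ x", groups="mother", data=df,
              family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(naive.params["x"], gee.params["x"])
```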

  18. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants

    Directory of Open Access Journals (Sweden)

    Odile Sauzet

    2017-07-01

    Full Text Available Abstract Background The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants, we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes, but very little is known about their reliability when only a limited number of small clusters are present. Methods Using simulated data based on a dataset of preterm infants, we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for logistic random intercept models, and generalised estimating equations were compared. Results The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters, while a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide estimates similar to logistic regression. The method that seems to provide the best balance between estimation of the standard errors and of the parameters, for any percentage of twins, is generalised estimating equations. Conclusions This study has shown that the number of covariates or the level-two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins, but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

  19. Age Limits.

    Science.gov (United States)

    Antfolk, Jan

    2017-03-01

    Whereas women of all ages prefer slightly older sexual partners, men, regardless of their age, have a preference for women in their 20s. Earlier research has suggested that this difference between the sexes' age preferences is resolved according to women's preferences. This research has not, however, sufficiently considered that the age range of considered partners might change over the life span. Here we investigated the age limits (youngest and oldest) of considered and actual sex partners in a population-based sample of 2,655 adults (aged 18-50 years). Over the investigated age span, women reported a narrower age range than men, and women tended to prefer slightly older men. We also show that men's age range widens as they get older: while they continue to consider sex with young women, men also consider sex with women their own age or older. Contrary to earlier suggestions, men's sexual activity thus also reflects their own age range, although their potential interest in younger women is not likely converted into sexual activity. Compared to homosexual men, bisexual and heterosexual men were less likely to convert young preferences into actual behavior, supporting female-choice theory.

  20. Age Limits

    Directory of Open Access Journals (Sweden)

    Jan Antfolk

    2017-01-01

    Full Text Available Whereas women of all ages prefer slightly older sexual partners, men, regardless of their age, have a preference for women in their 20s. Earlier research has suggested that this difference between the sexes' age preferences is resolved according to women's preferences. This research has not, however, sufficiently considered that the age range of considered partners might change over the life span. Here we investigated the age limits (youngest and oldest) of considered and actual sex partners in a population-based sample of 2,655 adults (aged 18-50 years). Over the investigated age span, women reported a narrower age range than men, and women tended to prefer slightly older men. We also show that men's age range widens as they get older: while they continue to consider sex with young women, men also consider sex with women their own age or older. Contrary to earlier suggestions, men's sexual activity thus also reflects their own age range, although their potential interest in younger women is not likely converted into sexual activity. Compared to homosexual men, bisexual and heterosexual men were less likely to convert young preferences into actual behavior, supporting female-choice theory.

  1. Rapid global fitting of large fluorescence lifetime imaging microscopy datasets.

    Directory of Open Access Journals (Sweden)

    Sean C Warren

    Full Text Available Fluorescence lifetime imaging (FLIM) is widely applied to obtain quantitative information from fluorescence signals, particularly using Förster Resonant Energy Transfer (FRET) measurements to map, for example, protein-protein interactions. Extracting FRET efficiencies or population fractions typically entails fitting data to complex fluorescence decay models, but such experiments are frequently photon constrained, particularly for live cell or in vivo imaging, and this leads to unacceptable errors when analysing data on a pixel-wise basis. Lifetimes and population fractions may, however, be more robustly extracted using global analysis to simultaneously fit the fluorescence decay data of all pixels in an image or dataset to a multi-exponential model under the assumption that the lifetime components are invariant across the image (dataset). This approach is often considered to be prohibitively slow and/or computationally expensive, but we present here a computationally efficient global analysis algorithm for the analysis of time-correlated single photon counting (TCSPC) or time-gated FLIM data based on variable projection. It makes efficient use of both computer processor and memory resources, requiring less than a minute to analyse time series and multiwell plate datasets with hundreds of FLIM images on standard personal computers. This lifetime analysis takes account of repetitive excitation, including fluorescence photons excited by earlier pulses contributing to the fit, and is able to accommodate time-varying backgrounds and instrument response functions. We demonstrate that this global approach allows us to readily fit time-resolved fluorescence data to complex models including a four-exponential model of a FRET system, for which the FRET efficiencies of the two species of a bi-exponential donor are linked, and polarisation-resolved lifetime data, where a fluorescence intensity and bi-exponential anisotropy decay model is applied to the analysis
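    Variable projection exploits the fact that, for fixed lifetimes, a multi-exponential decay model is linear in the amplitudes: the inner linear problem is solved exactly by least squares and only the lifetimes are optimized nonlinearly. The toy sketch below fits a bi-exponential decay this way; it ignores the instrument response, repetitive excitation and global (multi-pixel) linking that the real algorithm handles.

```python
import numpy as np
from scipy.optimize import least_squares

t = np.linspace(0, 10, 256)                              # time bins (ns)
true = 2.0 * np.exp(-t / 0.5) + 1.0 * np.exp(-t / 3.0)   # bi-exponential decay
y = np.random.default_rng(0).poisson(50 * true) / 50.0   # photon-noise data

def residuals(log_taus):
    taus = np.exp(log_taus)                      # enforce tau > 0
    basis = np.exp(-t[:, None] / taus[None, :])  # (n_bins, n_components)
    amps, *_ = np.linalg.lstsq(basis, y, rcond=None)  # inner linear solve
    return basis @ amps - y

fit = least_squares(residuals, x0=np.log([0.2, 5.0]))
print("lifetimes:", np.exp(fit.x))
```

    In the global-analysis case, y would be the stacked decays of all pixels sharing the same lifetimes, with one amplitude set per pixel solved in the inner step.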

  2. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n log(n)); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms delay. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes (1 TiB) or 1.3 × 10^11 double precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
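    The hierarchic level-of-detail idea can be illustrated with a min/max pyramid: each level stores per-block envelopes of the level below, so any zoom window can be drawn from a roughly constant number of blocks. A simplified NumPy sketch follows; the block factor is an assumed value, not FTSPlot's actual parameter.

```python
import numpy as np

def build_lod_pyramid(samples, factor=64):
    """Precompute per-block (min, max) envelopes at coarser resolutions.

    Level k summarizes level k-1 in blocks of `factor`, so drawing any zoom
    window touches only a bounded number of blocks.
    """
    levels = []
    mins = maxs = samples.astype(float)
    while mins.size > factor:
        n = (mins.size // factor) * factor
        mins = mins[:n].reshape(-1, factor).min(axis=1)
        maxs = maxs[:n].reshape(-1, factor).max(axis=1)
        levels.append((mins, maxs))
    return levels

rng = np.random.default_rng(0)
data = np.sin(np.linspace(0, 200 * np.pi, 1_000_000)) + rng.normal(0, 0.1, 1_000_000)
pyramid = build_lod_pyramid(data)
print([lvl[0].size for lvl in pyramid])  # sizes shrink by `factor` per level
```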

  3. Equalizing imbalanced imprecise datasets for genetic fuzzy classifiers

    Directory of Open Access Journals (Sweden)

    AnaM. Palacios

    2012-04-01

    Full Text Available Determining whether an imprecise dataset is imbalanced is not immediate. The vagueness in the data means that the prior probabilities of the classes are not precisely known, and therefore the degree of imbalance can also be uncertain. In this paper we propose suitable extensions of different resampling algorithms that can be applied to interval-valued, multi-labelled data. By means of these extended preprocessing algorithms, certain classification systems designed for minimizing the fraction of misclassifications are able to produce knowledge bases that are also adequate under common metrics for imbalanced classification.
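    For crisp data, the simplest of the resampling algorithms being extended is random oversampling of the minority class. The baseline sketch below shows that crisp version; the paper's contribution is extending such preprocessing to interval-valued, multi-labelled data, which this snippet does not attempt.

```python
import numpy as np
from collections import Counter

def random_oversample(X, y, seed=0):
    """Equalize class counts by resampling minority-class rows with replacement."""
    rng = np.random.default_rng(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [X], [y]
    for cls, n in counts.items():
        if n < target:
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - n, replace=True)
            X_out.append(X[extra])
            y_out.append(y[extra])
    return np.vstack(X_out), np.concatenate(y_out)

X = np.random.default_rng(1).normal(size=(100, 4))
y = np.array([0] * 90 + [1] * 10)   # 9:1 imbalance
Xb, yb = random_oversample(X, y)
print(Counter(yb))                  # balanced classes
```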

  4. Dataset of statements on policy integration of selected intergovernmental organizations

    Directory of Open Access Journals (Sweden)

    Jale Tosun

    2018-04-01

    Full Text Available This article describes data for 78 intergovernmental organizations (IGOs) working on topics related to energy governance, environmental protection, and the economy. The number of IGOs covered also includes organizations active in other sectors. The point of departure for data construction was the Correlates of War dataset, from which we selected this sample of IGOs. We updated and expanded the empirical information on the IGOs selected by manual coding. Most importantly, we collected the primary law texts of the individual IGOs in order to code whether they commit themselves to environmental policy integration (EPI), climate policy integration (CPI) and/or energy policy integration (EnPI).

  5. Dataset of statements on policy integration of selected intergovernmental organizations.

    Science.gov (United States)

    Tosun, Jale; Peters, B Guy

    2018-04-01

    This article describes data for 78 intergovernmental organizations (IGOs) working on topics related to energy governance, environmental protection, and the economy. The number of IGOs covered also includes organizations active in other sectors. The point of departure for data construction was the Correlates of War dataset, from which we selected this sample of IGOs. We updated and expanded the empirical information on the IGOs selected by manual coding. Most importantly, we collected the primary law texts of the individual IGOs in order to code whether they commit themselves to environmental policy integration (EPI), climate policy integration (CPI) and/or energy policy integration (EnPI).

  6. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    Science.gov (United States)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find, or cares to compile, all the data that is relevant for science, particularly the geosciences. If a dataset is not discoverable through a well-known search provider, it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has two principal components: a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a Python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all
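    The "triples" produced by the semantic aggregation step can be pictured with rdflib; the dataset URI and vocabulary below (DCAT/Dublin Core) are illustrative choices, not BCube's actual schema.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# Hypothetical harvested metadata stored as triples.
DCAT = Namespace("http://www.w3.org/ns/dcat#")
g = Graph()
ds = URIRef("http://example.org/dataset/sea-ice-2015")
g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Arctic sea ice extent, 2015")))
g.add((ds, DCTERMS.spatial, Literal("Arctic")))
g.add((ds, DCAT.downloadURL, URIRef("http://example.org/data/sea-ice-2015.nc")))
print(g.serialize(format="turtle"))
```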

  7. A Validation Dataset for CryoSat Sea Ice Investigators

    DEFF Research Database (Denmark)

    Julia, Gaudelli,; Baker, Steve; Haas, Christian

    Since its launch in April 2010, CryoSat has been collecting valuable sea ice data over the Arctic region. Over the same period, ESA's CryoVEx and NASA IceBridge validation campaigns have been collecting a unique set of coincident airborne measurements in the Arctic. The CryoVal-SI project has...... community. In this talk we will describe the composition of the validation dataset, summarising how it was processed and how to understand the content and format of the data. We will also explain how to access the data and the supporting documentation....

  8. Dataset concerning the analytical approximation of the Ae3 temperature

    Directory of Open Access Journals (Sweden)

    B.L. Ennis

    2017-02-01

    The dataset includes the terms of the function and the values for the polynomial coefficients for major alloying elements in steel. A short description of the approximation method used to derive and validate the coefficients has also been included. For discussion and application of this model, please refer to the full length article entitled "The role of aluminium in chemical and phase segregation in a TRIP-assisted dual phase steel", 10.1016/j.actamat.2016.05.046 (Ennis et al., 2016) [1].
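    An approximation of this kind is typically evaluated by summing per-element polynomial contributions on top of the A3 point of pure iron (912 °C). The sketch below shows the mechanics only; the coefficients are made-up placeholders, since the real values are the content of the dataset itself.

```python
import numpy as np

# Hypothetical illustration only: the real coefficients live in the dataset.
COEFFS = {
    # element: polynomial coefficients, highest power first (made-up numbers)
    "C":  [-15.0, -180.0, 0.0],
    "Mn": [0.0, -25.0, 0.0],
    "Si": [0.0, 12.0, 0.0],
}
A3_PURE_IRON_C = 912.0  # degrees C, the A3 point of pure iron

def ae3_estimate(composition):
    """Sum per-element polynomial contributions on top of the pure-iron A3."""
    t = A3_PURE_IRON_C
    for element, wt_pct in composition.items():
        t += np.polyval(COEFFS[element], wt_pct)
    return t

print(round(ae3_estimate({"C": 0.2, "Mn": 1.5, "Si": 0.5}), 1), "C (illustrative)")
```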

  9. Power analysis dataset for QCA based multiplexer circuits

    Directory of Open Access Journals (Sweden)

    Md. Abdullah-Al-Shafi

    2017-04-01

    Full Text Available Power consumption in irreversible QCA logic circuits is a vital and major issue; however, in practical cases this aspect is mostly ignored. The complete power dissipation dataset of different QCA multiplexers has been worked out in this paper. At a temperature of −271.15 °C, the dissipation is evaluated under three separate tunneling energy levels. All the circuits are designed with QCADesigner, a broadly used simulation engine, and the QCAPro tool has been applied for estimating the power dissipation.

  10. Identifying frauds and anomalies in Medicare-B dataset.

    Science.gov (United States)

    Jiwon Seo; Mendelevitch, Ofer

    2017-07-01

    The healthcare industry is growing at a rapid rate to reach a market value of $7 trillion worldwide. At the same time, fraud in healthcare is becoming a serious problem, amounting to 5% of total healthcare spending, or $100 billion each year in the US. Manually detecting healthcare fraud requires much effort. Recently, machine learning and data mining techniques have been applied to automatically detect healthcare fraud. This paper proposes a novel PageRank-based algorithm to detect healthcare frauds and anomalies. We apply the algorithm to the Medicare-B dataset, a real-life dataset with 10 million healthcare insurance claims. The algorithm successfully identifies tens of previously unreported anomalies.
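    The flavor of a PageRank-based anomaly score can be sketched with networkx on a toy bipartite provider-procedure graph; the paper's algorithm is a novel PageRank variant, so this is only an illustration of the underlying idea, with made-up claim counts.

```python
import networkx as nx

# Bipartite graph of providers and procedure codes, weighted by claim counts.
claims = [
    ("prov_A", "99213", 120), ("prov_A", "99214", 80),
    ("prov_B", "99213", 150), ("prov_B", "99215", 30),
    ("prov_C", "99215", 400),  # unusually heavy use of one expensive code
]
g = nx.Graph()
for provider, code, n in claims:
    g.add_edge(provider, code, weight=n)

scores = nx.pagerank(g, weight="weight")
providers = {n: s for n, s in scores.items() if n.startswith("prov_")}
for provider, score in sorted(providers.items(), key=lambda kv: -kv[1]):
    print(provider, round(score, 3))  # high-scoring outliers merit manual review
```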

  11. Systematic review

    DEFF Research Database (Denmark)

    Christensen, Troels Dreier; Spindler, Karen-Lise Garm; Palshof, Jesper Andreas

    2016-01-01

    Background: Brain metastases (BM) from colorectal cancer (CRC) are a rare event. However, the implications for affected patients are severe, and the incidence has been reported to be increasing. For clinicians, knowledge about the characteristics associated with BM is important and could lead...... to earlier diagnosis and improved survival. Method: In this paper, we describe the incidence as well as characteristics associated with BM based on a systematic review of the current literature, following the PRISMA guidelines. Results: We show that the incidence of BM in CRC patients ranges from 0.6 to 3...... of brain involvement in patients with these characteristics is necessary....

  12. Systematic review

    DEFF Research Database (Denmark)

    Borup, H; Kirkeskov, L; Hanskov, Dorte Jessing Agerby

    2017-01-01

    To assess the occurrence of COPD among construction workers. Methods: We performed a systematic search in PubMed and Embase between 1 January 1990 and 31 August 2016 in order to identify epidemiological studies with a risk estimate for either COPD morbidity/mortality or a spirometry-based definition....... Conclusions: This review suggests that COPD occurs more often among construction workers than among workers who are not exposed to construction dust. It is not possible to draw any conclusions on specific subgroups, as most studies analysed construction workers as one united group. In addition, no potential...

  13. Gaussian processes retrieval of leaf parameters from a multi-species reflectance, absorbance and fluorescence dataset.

    Science.gov (United States)

    Van Wittenberghe, Shari; Verrelst, Jochem; Rivera, Juan Pablo; Alonso, Luis; Moreno, José; Samson, Roeland

    2014-05-05

    Biochemical and structural leaf properties such as chlorophyll content (Chl), nitrogen content (N), leaf water content (LWC), and specific leaf area (SLA) have the benefit that they can be estimated through nondestructive spectral measurements. Current practices, however, mainly focus on a limited number of wavelength bands, while more information could be extracted from other wavelengths in the full-range (400-2500 nm) spectrum. In this research, leaf characteristics were estimated from a field-based multi-species dataset, covering a wide range of leaf structures and Chl concentrations. The dataset contains leaves with extremely high Chl concentrations (>100 μg cm−2), which are seldom estimated. Parameter retrieval was conducted with the machine learning regression algorithm Gaussian Processes (GP), which is able to perform adaptive, nonlinear data fitting for complex datasets. Moreover, insight into relevant bands is provided during the development of a regression model; consequently, the physical meaning of the model can be explored. The best estimates of SLA, LWC and Chl yielded normalized root mean square errors of 6.0%, 7.7% and 9.1%, respectively. Several distinct wavebands were chosen across the whole spectrum. A band in the red edge (710 nm) appeared to be most important for the estimation of Chl. Interestingly, spectral features related to biochemicals with a structural or carbon storage function (e.g. 1090, 1550, 1670, 1730 nm) were found to be important not only for estimation of SLA, but also for LWC, Chl or N estimation. Similarly, Chl estimation was also helped by some wavebands related to water content (950, 1430 nm) due to correlation between the parameters. It is shown that leaf parameter retrieval by GP regression is successful, and able to cope with large structural differences between leaves. Copyright © 2014 Elsevier B.V. All rights reserved.
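    GP retrieval with per-band relevance can be sketched with scikit-learn: an anisotropic RBF kernel learns one length-scale per wavelength band, and bands ending up with short length-scales are the informative ones. Everything below is synthetic stand-in data, not the field dataset.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Synthetic stand-in: rows are leaves, columns are wavelength bands.
rng = np.random.default_rng(0)
n_leaves, n_bands = 60, 50
spectra = rng.normal(size=(n_leaves, n_bands))
chl = 3.0 * spectra[:, 10] - 2.0 * spectra[:, 30] + rng.normal(0, 0.1, n_leaves)

# One length-scale per band (automatic relevance determination style):
# bands with a short learned length-scale drive the prediction.
kernel = RBF(length_scale=np.ones(n_bands)) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(spectra, chl)

pred, std = gp.predict(spectra[:5], return_std=True)
print(pred.round(2), std.round(2))
```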

  14. Sensitivity of the interannual variability of mineral aerosol simulations to meteorological forcing dataset

    Science.gov (United States)

    Smith, Molly B.; Mahowald, Natalie M.; Albani, Samuel; Perry, Aaron; Losno, Remi; Qu, Zihan; Marticorena, Beatrice; Ridley, David A.; Heald, Colette L.

    2017-03-01

    Interannual variability in desert dust is widely observed and simulated, yet the sensitivity of desert dust simulations to a particular meteorological dataset, as well as to a particular model construction, is not well known. Here we use version 4 of the Community Atmospheric Model (CAM4) with the Community Earth System Model (CESM) to simulate dust forced by three different reanalysis meteorological datasets for the period 1990-2005. We then contrast the results of these simulations with dust simulated using online winds dynamically generated from sea surface temperatures, as well as with simulations conducted using other modeling frameworks but the same meteorological forcings, in order to determine the sensitivity of climate model output to the specific reanalysis dataset used. For the seven cases considered in our study, the different model configurations are able to simulate the annual mean of the global dust cycle, its seasonality and its interannual variability approximately equally well (or poorly) at the limited observational sites available. Overall, aerosol dust-source strength has remained fairly constant during the period from 1990 to 2005, although strong seasonal and some interannual variability is simulated in the models and seen in the observations over this time period. Model interannual variability comparisons to observations, as well as comparisons between models, suggest that interannual variability in dust is still difficult to simulate accurately, with averaged correlation coefficients of 0.1 to 0.6. Because of the large variability, at least 1 year of observations at most sites is needed to correctly observe the mean, but in some regions, particularly the remote oceans of the Southern Hemisphere, where interannual variability may be larger than in the Northern Hemisphere, 2-3 years of data are likely to be needed.

  15. Quantitative super-resolution single molecule microscopy dataset of YFP-tagged growth factor receptors.

    Science.gov (United States)

    Lukeš, Tomáš; Pospíšil, Jakub; Fliegel, Karel; Lasser, Theo; Hagen, Guy M

    2018-01-19

    Super-resolution single molecule localization microscopy (SMLM) is a method for achieving resolution beyond the classical limit in optical microscopes (approx. 200 nm laterally). Yellow fluorescent protein (YFP) has been used for super-resolution single molecule localization microscopy, but less frequently than other fluorescent probes. Working with YFP in SMLM is a challenge because fewer photons are emitted per molecule compared to the organic dyes that are more commonly used. Publicly available experimental data can facilitate development of new data analysis algorithms. Four complete, freely available single molecule super-resolution microscopy datasets on YFP-tagged growth factor receptors expressed in a human cell line are presented, including both raw and analyzed data. We report methods for sample preparation, data acquisition, and data analysis, as well as examples of the acquired images. We also analyzed the SMLM data sets using a different method: super-resolution optical fluctuation imaging (SOFI). The two modes of analysis offer complementary information about the sample. A fifth single molecule super-resolution microscopy dataset, acquired with the dye Alexa 532, is included for comparison purposes. This dataset has potential for extensive reuse. Complete raw data from SMLM experiments have typically not been published. The YFP data exhibit low signal to noise ratios, making data analysis a challenge. These data sets will be useful to investigators developing their own algorithms for SMLM, SOFI, and related methods. The data will also be useful for researchers investigating growth factor receptors such as ErbB3. © The Author(s) 2018. Published by Oxford University Press.
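    The complementary SOFI analysis mentioned above is easy to sketch at second order: it is the zero-lag second cumulant, i.e. the temporal variance of each pixel's intensity fluctuations. A minimal NumPy illustration on a synthetic blinking movie:

```python
import numpy as np

def sofi2(stack):
    """Second-order SOFI image from a (time, y, x) stack.

    The zero-lag second cumulant is the temporal variance of each pixel's
    fluctuations, which sharpens blinking emitters relative to the mean image.
    """
    fluctuations = stack - stack.mean(axis=0, keepdims=True)
    return (fluctuations ** 2).mean(axis=0)

# Synthetic stand-in for an SMLM/SOFI movie: one blinking "emitter".
rng = np.random.default_rng(0)
movie = rng.poisson(10.0, size=(500, 32, 32)).astype(float)  # background
on = rng.random(500) < 0.3                                   # blinking trace
movie[on, 16, 16] += 100.0
img = sofi2(movie)
print(img[16, 16] > img[0, 0])  # True: the emitter pixel stands out
```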

  16. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

    Science.gov (United States)

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware, that provides a single entry point and a single sign-on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the required software infrastructure. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computing resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.

  17. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    Science.gov (United States)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely-accessible source for high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  18. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  19. Predicting weather regime transitions in Northern Hemisphere datasets

    Energy Technology Data Exchange (ETDEWEB)

    Kondrashov, D. [University of California, Department of Atmospheric and Oceanic Sciences and Institute of Geophysics and Planetary Physics, Los Angeles, CA (United States); Shen, J. [UCLA, Department of Statistics, Los Angeles, CA (United States); Berk, R. [UCLA, Department of Statistics, Los Angeles, CA (United States); University of Pennsylvania, Department of Criminology, Philadelphia, PA (United States); D' Andrea, F.; Ghil, M. [Ecole Normale Superieure, Departement Terre-Atmosphere-Ocean and Laboratoire de Meteorologie Dynamique (CNRS and IPSL), Paris Cedex 05 (France)

    2007-10-15

    A statistical learning method called random forests is applied to the prediction of transitions between weather regimes of wintertime Northern Hemisphere (NH) atmospheric low-frequency variability. A dataset composed of 55 winters of NH 700-mb geopotential height anomalies is used in the present study. A mixture model finds that the three Gaussian components that were statistically significant in earlier work are robust; they are the Pacific-North American (PNA) regime, its approximate reverse (the reverse PNA, or RNA), and the blocked phase of the North Atlantic Oscillation (BNAO). The most significant and robust transitions in the Markov chain generated by these regimes are PNA → BNAO, PNA → RNA and BNAO → PNA. The break of a regime and subsequent onset of another one is forecast for these three transitions. Taking the relative costs of false positives and false negatives into account, the random-forests method shows useful forecasting skill. The calculations are carried out in the phase space spanned by a few leading empirical orthogonal functions of dataset variability. Plots of estimated response functions to a given predictor confirm the crucial influence of the exit angle on a preferred transition path. This result points to the dynamic origin of the transitions. (orig.)
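    A simplified version of this forecasting setup can be sketched with scikit-learn: classify, in the phase space of leading principal components, whether a regime break leads to a given transition, with class weights standing in for the unequal costs of false positives and false negatives. The data below are random stand-ins, not the 55-winter dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in data: each row holds the leading EOF principal components of
# height anomalies before a regime break; the label marks whether the break
# led to the transition of interest (e.g. PNA -> BNAO).
rng = np.random.default_rng(0)
n_events, n_pcs = 200, 5
X = rng.normal(size=(n_events, n_pcs))
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n_events)) > 0).astype(int)

# class_weight mimics unequal costs of false positives vs false negatives.
clf = RandomForestClassifier(n_estimators=300, class_weight={0: 1, 1: 3},
                             random_state=0)
print(cross_val_score(clf, X, y, cv=5).round(2))
```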

  20. Robust computational analysis of rRNA hypervariable tag datasets.

    Directory of Open Access Journals (Sweden)

    Maksim Sipos

    Full Text Available Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10^5-10^6) 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.
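    Complete-linkage OTU clustering itself is compact to express with SciPy: cluster a pairwise-dissimilarity matrix and cut the dendrogram at the conventional 3% dissimilarity (97% identity) threshold. The 5x5 distance matrix below is made up for illustration; in practice it would hold pairwise sequence dissimilarities from an aligner.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy pairwise sequence dissimilarities (symmetric, zero diagonal).
dist = np.array([
    [0.00, 0.01, 0.02, 0.20, 0.22],
    [0.01, 0.00, 0.02, 0.21, 0.20],
    [0.02, 0.02, 0.00, 0.19, 0.21],
    [0.20, 0.21, 0.19, 0.00, 0.01],
    [0.22, 0.20, 0.21, 0.01, 0.00],
])
z = linkage(squareform(dist), method="complete")
# 97% identity convention: cut the dendrogram at 3% dissimilarity.
otus = fcluster(z, t=0.03, criterion="distance")
print(otus)  # e.g. [1 1 1 2 2] -> two OTUs
```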

  1. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  2. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.

  3. Open Dataset for the Automatic Recognition of Sedentary Behaviors.

    Science.gov (United States)

    Possos, William; Cruz, Robinson; Cerón, Jesús D; López, Diego M; Sierra-Torres, Carlos H

    2017-01-01

    Sedentarism is associated with the development of noncommunicable diseases (NCD) such as cardiovascular diseases (CVD), type 2 diabetes, and cancer. Therefore, the identification of specific sedentary behaviors (TV viewing, sitting at work, driving, relaxing, etc.) is especially relevant for planning personalized prevention programs. The aim was to build and evaluate a public dataset for the automatic recognition (classification) of sedentary behaviors. The dataset included data from 30 subjects, who performed 23 sedentary behaviors while wearing a commercial wearable on the wrist, a smartphone on the hip and another on the thigh. Bluetooth Low Energy (BLE) beacons were used in order to improve the automatic classification of the different sedentary behaviors. The study also compared six well-known data mining classification techniques in order to identify the most precise method for solving the classification problem over the 23 defined behaviors. The best classification accuracy was obtained using the Random Forest algorithm and when data were collected from the phone on the hip. Furthermore, the use of beacons as a reference for obtaining the symbolic location of the individual improved the precision of the classification.
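    For datasets like this one, a subject-independent evaluation (leave-one-subject-out) is the usual way to avoid inflating accuracy; a sketch with scikit-learn follows, using random stand-in features rather than the actual wearable data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Stand-in features: windowed accelerometer statistics per sample; `groups`
# holds the subject id, so each fold leaves one whole subject out.
rng = np.random.default_rng(0)
n_samples, n_features, n_subjects = 600, 12, 30
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 23, size=n_samples)        # 23 sedentary behaviors
groups = rng.integers(0, n_subjects, size=n_samples)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
print(round(scores.mean(), 3))  # chance level here, since the labels are random
```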

  4. BLAST-EXPLORER helps you building datasets for phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2010-01-01

    Full Text Available Abstract Background The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way of constructing datasets suitable for phylogenetic analysis, because this task intimately depends on the scientific question we want to address. Moreover, database mining software such as BLAST, which is routinely used for searching homologous sequences, is not specifically optimized for this task. Results To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools that allows interactive, phylogeny-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the selection of the BLAST hits is done using BLAST-Explorer, the corresponding sequences can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform. Conclusions BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieval of the BLAST hit sequences based on a wide range of criteria. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr

  5. Comparing species tree estimation with large anchored phylogenomic and small Sanger-sequenced molecular datasets: an empirical study on Malagasy pseudoxyrhophiine snakes.

    Science.gov (United States)

    Ruane, Sara; Raxworthy, Christopher J; Lemmon, Alan R; Lemmon, Emily Moriarty; Burbrink, Frank T

    2015-10-12

    Using molecular data generated by high throughput next generation sequencing (NGS) platforms to infer phylogeny is becoming common as costs go down and the ability to capture loci from across the genome goes up. While there is a general consensus that greater numbers of independent loci should result in more robust phylogenetic estimates, few studies have compared phylogenies resulting from smaller datasets for commonly used genetic markers with the large datasets captured using NGS. Here, we determine how a 5-locus Sanger dataset compares with a 377-locus anchored genomics dataset for understanding the evolutionary history of the pseudoxyrhophiine snake radiation centered in Madagascar. The Pseudoxyrhophiinae comprise ~86 % of Madagascar's serpent diversity, yet they are poorly known with respect to ecology, behavior, and systematics. Using the 377-locus NGS dataset and the summary statistics species-tree methods STAR and MP-EST, we estimated a well-supported species tree that provides new insights concerning intergeneric relationships for the pseudoxyrhophiines. We also compared how these and other methods performed with respect to estimating tree topology using datasets with varying numbers of loci. Using Sanger sequencing and an anchored phylogenomics approach, we sequenced datasets comprised of 5 and 377 loci, respectively, for 23 pseudoxyrhophiine taxa. For each dataset, we estimated phylogenies using both gene-tree (concatenation) and species-tree (STAR, MP-EST) approaches. We determined the similarity of resulting tree topologies from the different datasets using Robinson-Foulds distances. In addition, we examined how subsets of these data performed compared to the complete Sanger and anchored datasets for phylogenetic accuracy using the same tree inference methodologies, as well as the program *BEAST to determine if a full coalescent model for species tree estimation could generate robust results with fewer loci compared to the summary statistics species
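    Topological agreement between trees estimated from the small and large datasets is what the Robinson-Foulds distance quantifies. A sketch with dendropy on toy newick trees (not the published phylogenies):

```python
import dendropy
from dendropy.calculate import treecompare

# Both trees must share one taxon namespace for a valid comparison.
tns = dendropy.TaxonNamespace()
sanger = dendropy.Tree.get(data="((A,B),(C,(D,E)));", schema="newick",
                           taxon_namespace=tns)
anchored = dendropy.Tree.get(data="((A,(B,C)),(D,E));", schema="newick",
                             taxon_namespace=tns)
# Unweighted RF distance: number of bipartitions found in one tree but not
# the other (0 would mean identical topologies).
print(treecompare.symmetric_difference(sanger, anchored))
```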

  6. MOBBED: A computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB

    Directory of Open Access Journals (Sweden)

    Jeremy eCockfield

    2013-10-01

    Full Text Available Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context, leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed.

  7. MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB.

    Science.gov (United States)

    Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A

    2013-01-01

    Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED

  8. Explaining diversity in metagenomic datasets by phylogenetic-based feature weighting.

    Science.gov (United States)

    Albanese, Davide; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio

    2015-03-01

    Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schema do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, thus including in the analysis also those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information.

  9. Open Source Bayesian Models. 1. Application to ADME/Tox and Drug Discovery Datasets

    Science.gov (United States)

    2015-01-01

    On the order of hundreds of absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) models have been described in the literature in the past decade which are more often than not inaccessible to anyone but their authors. Public accessibility is also an issue with computational models for bioactivity, and the ability to share such models still remains a major challenge limiting drug discovery. We describe the creation of a reference implementation of a Bayesian model-building software module, which we have released as an open source component that is now included in the Chemistry Development Kit (CDK) project, as well as implemented in the CDD Vault and in several mobile apps. We use this implementation to build an array of Bayesian models for ADME/Tox, in vitro and in vivo bioactivity, and other physicochemical properties. We show that these models possess cross-validation receiver operator curve values comparable to those generated previously in prior publications using alternative tools. We have now described how the implementation of Bayesian models with FCFP6 descriptors generated in the CDD Vault enables the rapid production of robust machine learning models from public data or the user’s own datasets. The current study sets the stage for generating models in proprietary software (such as CDD) and exporting these models in a format that could be run in open source software using CDK components. This work also demonstrates that we can enable biocomputation across distributed private or public datasets to enhance drug discovery. PMID:25994950
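    A model of this kind can be approximated in Python with RDKit and scikit-learn: FCFP6 corresponds to Morgan fingerprints with radius 3 and feature-based invariants, fed to a Bernoulli naive Bayes classifier. The molecules and activity labels below are toy placeholders, and the pairing with BernoulliNB is an assumption about one plausible open source route, not the paper's exact implementation.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.naive_bayes import BernoulliNB

# Toy molecules with hypothetical active(1)/inactive(0) labels.
smiles = ["CCO", "CCN", "c1ccccc1O", "c1ccccc1N", "CC(=O)O", "CCC"]
labels = [1, 1, 0, 0, 1, 0]

def fcfp6_bits(smi, n_bits=2048):
    """FCFP6-like fingerprint: Morgan, radius 3, feature-based invariants."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(
        mol, radius=3, nBits=n_bits, useFeatures=True)
    arr = np.zeros((n_bits,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr

X = np.array([fcfp6_bits(s) for s in smiles])
model = BernoulliNB().fit(X, labels)
print(model.predict_proba(fcfp6_bits("CCCO").reshape(1, -1))[0].round(3))
```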

  10. Northern Chinese dental ages estimated from southern Chinese reference datasets closely correlate with chronological age

    Directory of Open Access Journals (Sweden)

    Hai Ming Wong

    2016-12-01

    Full Text Available While northern and southern Chinese are genetically correlated, there exist notable environmental differences in their living conditions. This study aimed to evaluate the validity of the southern Chinese reference dataset for dental age estimation when applied to northern Chinese. Dental panoramic tomographs of 437 northern Chinese aged 3 to 21 years were analysed. All the left maxillary and mandibular permanent teeth plus the 2 third molars on the right side were scored based on Demirjian’s classification of tooth development stages. Mean and standard error of dental age were obtained for each tooth development stage, followed by random effect meta-analysis for mean dental age estimation. Validity of the method was examined through measures of agreement (95% limits of agreement, standard error of measurement, and Lin’s concordance correlation coefficient) and a measure of reliability (intraclass correlation coefficient). On average, the estimated dental age overestimated chronological age by only around 1 month in both females and males. The intraclass correlation coefficient values were 0.99 for both sexes, suggesting excellent reliability of the method. The reference dataset for dental age estimation developed on the basis of southern Chinese was applicable for use among the northern Chinese.
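
    Lin's concordance correlation coefficient used above is simple enough to sketch directly: CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The ages below are invented for illustration, not data from the study.

```python
# Minimal sketch of Lin's concordance correlation coefficient (CCC), one of
# the agreement measures named in the abstract above.
import numpy as np

def lins_ccc(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()                  # population (biased) variances
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

chronological = [6.2, 7.8, 9.1, 10.4, 12.0, 13.5]   # hypothetical ages (years)
estimated     = [6.4, 7.7, 9.3, 10.2, 12.3, 13.4]
print(f"Lin's CCC: {lins_ccc(chronological, estimated):.3f}")
```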

  11. A semi-simulated EEG/EOG dataset for the comparison of EOG artifact rejection techniques.

    Science.gov (United States)

    Klados, Manousos A; Bamidis, Panagiotis D

    2016-09-01

    Artifact rejection techniques are used to recover the brain signals underlying artifactual electroencephalographic (EEG) segments. Although over the last few years many different artifact rejection techniques have been proposed (http://dx.doi.org/10.1109/JSEN.2011.2115236 [1], http://dx.doi.org/10.1016/j.clinph.2006.09.003 [2], http://dx.doi.org/10.3390/e16126553 [3]), none has been established as a gold standard so far, because assessing their performance is difficult and subjective (http://dx.doi.org/10.1109/ITAB.2009.5394295 [4], http://dx.doi.org/10.1016/j.bspc.2011.02.001 [5], http://dx.doi.org/10.1007/978-3-540-89208-3_300 [6]). This limitation stems mainly from the fact that the underlying artifact-free brain signal is unknown, so there is no objective way to measure how close the retrieved signal is to the real one. This article addresses the aforementioned problem by presenting a semi-simulated EEG dataset, where artifact-free EEG signals are manually contaminated with ocular artifacts, using a realistic head model. The significant feature of this dataset is that it contains the pre-contamination EEG signals, so the brain signals underlying the EOG artifacts are known and thus the performance of every artifact rejection technique can be objectively assessed.
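
    A minimal sketch of the semi-simulation idea, with synthetic signals and an assumed scalar propagation factor in place of the article's realistic head model: the clean EEG is kept as ground truth, a blink-like EOG transient is mixed in, and any rejection method can then be scored objectively against the known truth.

```python
# Sketch of semi-simulated contamination: ground-truth EEG plus scaled EOG.
import numpy as np

fs = 250
t = np.arange(0, 4, 1 / fs)                              # 4 s at 250 Hz
clean_eeg = np.random.default_rng(1).normal(0, 1, t.size)  # stand-in brain signal
eog = 50 * np.exp(-((t - 2.0) ** 2) / 0.01)              # blink-like transient
lam = 0.3                                                # assumed propagation factor
contaminated = clean_eeg + lam * eog                     # semi-simulated channel

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# Any artifact-rejection method can now be scored against the known truth:
print(f"RMSE before correction: {rmse(contaminated, clean_eeg):.2f}")
```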

  12. DataPflex: a MATLAB-based tool for the manipulation and visualization of multidimensional datasets.

    Science.gov (United States)

    Hendriks, Bart S; Espelin, Christopher W

    2010-02-01

    DataPflex is a MATLAB-based application that facilitates the manipulation and visualization of multidimensional datasets. The strength of DataPflex lies in its intuitive graphical user interface for the efficient incorporation, manipulation and visualization of high-dimensional data generated by multiplexed protein measurement platforms including, but not limited to, Luminex or Meso-Scale Discovery. Such data can generally be represented in the form of multidimensional datasets (for example, time × stimulation × inhibitor × inhibitor concentration × cell type × measurement). For cases where measurements are made in a combinatorial fashion across multiple dimensions, there is a need for a tool to efficiently manipulate and reorganize such data for visualization. DataPflex accepts data consisting of up to five arbitrary dimensions in addition to a measurement dimension. Data are imported from a simple .xls format and can be exported to MATLAB or .xls. Data dimensions can be reordered, subdivided, merged, normalized and visualized in the form of collections of line graphs, bar graphs, surface plots, heatmaps, IC50 plots and other custom plots. Open source implementation in MATLAB enables easy extension for custom plotting routines and integration with more sophisticated analysis tools. DataPflex is distributed under the GPL license (http://www.gnu.org/licenses/) together with documentation, source code and sample data files at: http://code.google.com/p/datapflex. Supplementary data are available at Bioinformatics online.

  13. Smed454 dataset: unravelling the transcriptome of Schmidtea mediterranea

    Directory of Open Access Journals (Sweden)

    Fraguas Susanna

    2010-12-01

    Full Text Available Abstract Background Freshwater planarians are an attractive model for regeneration and stem cell research and have become a promising tool in the field of regenerative medicine. With the availability of a sequenced planarian genome, the recent application of modern genetic and high-throughput tools has resulted in revitalized interest in these animals, long known for their amazing regenerative capabilities, which enable them to regrow even a new head after decapitation. However, a detailed description of the planarian transcriptome is essential for future investigation into regenerative processes using planarians as a model system. Results In order to complement and improve existing gene annotations, we used a 454 pyrosequencing approach to analyze the transcriptome of the planarian species Schmidtea mediterranea. Altogether, 598,435 454-sequencing reads, with an average length of 327 bp, were assembled together with the ~10,000 sequences of the S. mediterranea UniGene set using different similarity cutoffs. The assembly was then mapped onto the current genome data. Remarkably, our Smed454 dataset contains more than 3 million novel transcribed nucleotides sequenced for the first time. A descriptive analysis of planarian splice sites was conducted on those Smed454 contigs that mapped unambiguously to the current genome assembly. Sequence analysis allowed us to identify genes encoding putative proteins with defined structural properties, such as transmembrane domains. Moreover, we annotated the Smed454 dataset using Gene Ontology, and identified putative homologues of several gene families that may play a key role during regeneration, such as neurotransmitter and hormone receptors, homeobox-containing genes, and genes related to eye function. Conclusions We report the first planarian transcript dataset, Smed454, as an open resource tool that can be accessed via a web interface. Smed454 contains significant novel sequence information about most

  14. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Science.gov (United States)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the main objectives of the NERIES EC project is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable users' activities such as discovering, aggregating, describing and sharing datasets, in order to reduce the replication of similar data queries towards the network and spare the data centers from having to guess at and create pre-packaged products. We have started to transfer this task more and more towards the user community, where user-composed data products can be extensively re-used. The main link to the data is represented by a centralized webservice (SeismoLink) acting as a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs, which identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (e.g. the EMSC-QuakeML earthquake catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map-based portlet which allows the dynamic composition of a data product, binding seismic event parameters with a set of seismic stations. The requested data are collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on-demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  15. SOME LIMIT-THEOREMS IN LOG DENSITY

    NARCIS (Netherlands)

    BERKES, [No Value; DEHLING, H

    Motivated by recent results on pathwise central limit theorems, we study in a systematic way log-average versions of classical limit theorems. For partial sums S(k) of independent r.v.'s we prove under mild technical conditions that (1/log N) Σ_{k ≤ N} (1/k) I{S(k)/a(k) …

  16. Evaluating satellite-derived long-term historical precipitation datasets for drought monitoring in Chile

    Science.gov (United States)

    Zambrano, Francisco; Wardlow, Brian; Tadesse, Tsegaye; Lillo-Saavedra, Mario; Lagos, Octavio

    2017-04-01

    Precipitation is a key parameter for the study of climate change and variability and for the detection and monitoring of natural disasters such as drought. Precipitation datasets that accurately capture the amount and spatial variability of rainfall are critical for drought monitoring and a wide range of other climate applications. This is challenging in many parts of the world, which often have a limited number of weather stations and/or limited historical data records. Satellite-derived precipitation products offer a viable alternative, with several remotely sensed precipitation datasets now available with long historical data records (30+ years), including the Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) datasets. This study presents a comparative analysis of three historical satellite-based precipitation datasets, Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) 3B43 version 7 (1998-2015), PERSIANN-CDR (1983-2015) and CHIRPS 2.0 (1981-2015), over Chile to assess their performance across the country; for the two long-term products, their applicability for agricultural drought monitoring was also evaluated by using them to calculate a commonly used drought indicator, the Standardized Precipitation Index (SPI). In this analysis, in situ rainfall measurements from 278 weather stations across Chile were initially compared to the satellite data. The study area (Chile) was divided into five latitudinal zones: North, North-Central, Central, South-Central and South, to determine whether there were regional differences among these satellite products, and nine statistics were used to evaluate their performance in estimating the amount and spatial distribution of historical rainfall across Chile. Hierarchical cluster analysis, k-means and singular value decomposition were used to analyze
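
    The SPI calculation mentioned above follows a standard recipe: fit a gamma distribution to the nonzero precipitation totals, mix in the probability of zero rainfall, and transform to standard normal quantiles. A minimal sketch with invented monthly totals (in mm), not data from the study:

```python
# Minimal Standardized Precipitation Index (SPI) sketch.
import numpy as np
from scipy import stats

def spi(precip):
    precip = np.asarray(precip, float)
    q = np.mean(precip == 0)                        # probability of zero rain
    nonzero = precip[precip > 0]
    shape, _, scale = stats.gamma.fit(nonzero, floc=0)
    # Mixed CDF: mass q at zero, gamma elsewhere; then map to normal quantiles.
    cdf = q + (1 - q) * stats.gamma.cdf(precip, shape, loc=0, scale=scale)
    return stats.norm.ppf(np.clip(cdf, 1e-6, 1 - 1e-6))

monthly_totals = np.array([0, 12, 55, 80, 33, 0, 5, 120, 60, 25, 90, 40])
print(np.round(spi(monthly_totals), 2))
```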

  17. Last of the Wild Project, Version 1, 2002 (LWP-1): Last of the Wild Dataset (IGHP)

    Data.gov (United States)

    National Aeronautics and Space Administration — The Last of the Wild Dataset of the Last of the Wild Project, Version 1, 2002 (LWP-1) is derived from the LWP-1 Human Footprint Dataset. The gridded data are...

  18. Ecohydrological Index, Native Fish, and Climate Trends and Relationships in the Kansas River Basin_dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset is an excel file that contain data for the figures in the manuscript. This dataset is associated with the following publication: Sinnathamby, S., K....

  19. Last of the Wild Project, Version 2, 2005 (LWP-2): Last of the Wild Dataset (Geographic)

    Data.gov (United States)

    National Aeronautics and Space Administration — The Last of the Wild Dataset of the Last of the Wild Project, Version 2, 2005 (LWP-2) is derived from the LWP-2 Human Footprint Dataset. The gridded data are...

  20. Last of the Wild Project, Version 1, 2002 (LWP-1): Last of the Wild Dataset (Geographic)

    Data.gov (United States)

    National Aeronautics and Space Administration — The Last of the Wild Dataset of the Last of the Wild Project, Version 1, 2002 (LWP-1) is derived from the LWP-1 Human Footprint Dataset. The gridded data are...

  1. Last of the Wild Project, Version 2, 2005 (LWP-2): Last of the Wild Dataset (IGHP)

    Data.gov (United States)

    National Aeronautics and Space Administration — The Last of the Wild Dataset of the Last of the Wild Project, Version 2, 2005 (LWP-2) is derived from the LWP-2 Human Footprint Dataset. The gridded data are...

  2. Gridded 5km GHCN-Daily Temperature and Precipitation Dataset, Version 1

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Gridded 5km GHCN-Daily Temperature and Precipitation Dataset (nClimGrid) consists of four climate variables derived from the GHCN-D dataset: maximum temperature,...

  3. Dataset for Probabilistic estimation of residential air exchange rates for population-based exposure modeling

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset provides the city-specific air exchange rate measurements, modeled, literature-based as well as housing characteristics. This dataset is associated with...

  4. Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

    Directory of Open Access Journals (Sweden)

    Mingwei Leng

    2013-01-01

    Full Text Available The accuracy of most existing semisupervised clustering algorithms based on a small labeled dataset is low when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering in multidensity and imbalanced datasets, and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it utilizes a multi-threshold strategy to expand the labeled dataset on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has a higher accuracy and a more stable performance in comparison to other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
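
    For a flavor of the semisupervised setting, the sketch below uses scikit-learn's generic LabelSpreading (graph-based label propagation) on a toy dataset. This is a related off-the-shelf method, not the authors' algorithm; the blob data and the number of revealed labels are arbitrary.

```python
# Toy semisupervised example: propagate a handful of labels to all points.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelSpreading

X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
y = np.full(y_true.shape, -1)          # -1 marks unlabeled points
rng = np.random.default_rng(0)
labeled_idx = rng.choice(len(y), size=9, replace=False)
y[labeled_idx] = y_true[labeled_idx]   # reveal only a few labels

model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y)
accuracy = (model.transduction_ == y_true).mean()
print(f"Transductive accuracy with 9 labels: {accuracy:.2f}")
```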

  5. Global Human Built-up And Settlement Extent (HBASE) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Human Built-up And Settlement Extent (HBASE) Dataset from Landsat is a global map of HBASE derived from the Global Land Survey (GLS) Landsat dataset for...

  6. Studying prescription drug use and outcomes with medicaid claims data: strengths, limitations, and strategies.

    Science.gov (United States)

    Crystal, Stephen; Akincigil, Ayse; Bilder, Scott; Walkup, James T

    2007-10-01

    Medicaid claims and eligibility data, particularly when linked to other sources of patient-level and contextual information, represent a powerful and under-used resource for health services research on the use and outcomes of prescription drugs. However, their effective use poses many methodological and inferential challenges. This article reviews strengths, limitations, challenges, and recommended strategies in using Medicaid data for research on the initiation, continuation, and outcomes of prescription drug therapies. Drawing from published research using Medicaid data by the investigators and other groups, we review several key validity and methodological issues. We discuss strategies for claims-based identification of diagnostic subgroups and procedures, measuring and modeling initiation and persistence of regimens, analysis of treatment disparities, and examination of comorbidity patterns. Based on this review, we discuss "best practices" for appropriate data use and validity checking, approaches to statistical modeling of longitudinal patterns in the presence of typical challenges, and strategies for strengthening the power and potential of Medicaid datasets. Finally, we discuss policy implications, including the potential for the research use of Medicare Part D data and the need for further initiatives to systematically develop and optimally use research datasets that link Medicaid and other sources of clinical and outcome information.

  7. Planck 2015 results: III. LFI systematic uncertainties

    DEFF Research Database (Denmark)

    Ade, P. A R; Aumont, J.; Baccigalupi, C.

    2016-01-01

    Systematic effects are assessed using two complementary approaches: (i) simulations based on measured data and physical models of the known systematic effects; and (ii) analysis of difference maps containing the same sky signal ("null-maps"). The LFI temperature data are limited by instrumental noise. At large angular scales the systematic effects are below the cosmic microwave background (CMB) temperature power spectrum by several orders of magnitude. In polarization the systematic uncertainties are dominated by calibration uncertainties and compete with the CMB E-modes in the multipole range 10-20. Based on our model of all known systematic effects…

  8. An Automatic Matcher and Linker for Transportation Datasets

    Directory of Open Access Journals (Sweden)

    Ali Masri

    2017-01-01

    Full Text Available Multimodality requires the integration of heterogeneous transportation data to construct a broad view of the transportation network. Many new transportation services are emerging while being isolated from previously-existing networks. This leads them to publish their data sources to the web, according to linked data principles, in order to gain visibility. Our interest is to use these data to construct an extended transportation network that links these new services to existing ones. The main problems we tackle in this article fall in the categories of automatic schema matching and data interlinking. We propose an approach that uses web services as mediators to help in automatically detecting geospatial properties and mapping them between two different schemas. On the other hand, we propose a new interlinking approach that enables the user to define rich semantic links between datasets in a flexible and customizable way.

  9. [Parallel virtual reality visualization of extreme large medical datasets].

    Science.gov (United States)

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in the context of hospital intranets and commodity computers. Several kernel techniques are introduced, including the hardware structure, software framework, load balancing and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through a control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can serve as a useful aid in making clinical diagnoses.
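
    The Maximum Intensity Projection itself reduces to an elementwise maximum, which is what makes it easy to parallelize across slabs of the volume. A minimal sketch with a random stand-in volume, where the "workers" are simulated sequentially:

```python
# Parallelizable Maximum Intensity Projection (MIP) sketch.
import numpy as np

volume = np.random.default_rng(2).random((128, 256, 256))  # z, y, x voxels

# Decompose the volume into slabs; each (simulated) worker projects its slab,
# and the partial projections are combined with another elementwise max.
slabs = np.array_split(volume, 4, axis=0)
partial = [slab.max(axis=0) for slab in slabs]
mip = np.maximum.reduce(partial)

assert np.allclose(mip, volume.max(axis=0))   # matches the serial MIP
print(mip.shape)                              # (256, 256) projected image
```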

  10. The wildland-urban interface raster dataset of Catalonia

    Directory of Open Access Journals (Sweden)

    Fermín J. Alcasena

    2018-04-01

    Full Text Available We provide the wildland-urban interface (WUI) map of the autonomous community of Catalonia (Northeastern Spain). The map encompasses an area of some 3.21 million ha and is presented as a 150-m resolution raster dataset. Individual housing location, structure density and vegetation cover data were used to spatially assess in detail the interface, intermix and dispersed rural WUI communities with a geographical information system. Most WUI areas concentrate in the coastal belt, where suburban sprawl has occurred near or within unmanaged forests. This geospatial dataset provides an approximation of the potential for residential housing loss in a wildfire, and represents a valuable contribution to landscape and urban planning in the region. Keywords: Wildland-urban interface, Wildfire risk, Urban planning, Human communities, Catalonia

  11. The wildland-urban interface raster dataset of Catalonia.

    Science.gov (United States)

    Alcasena, Fermín J; Evers, Cody R; Vega-Garcia, Cristina

    2018-04-01

    We provide the wildland-urban interface (WUI) map of the autonomous community of Catalonia (Northeastern Spain). The map encompasses an area of some 3.21 million ha and is presented as a 150-m resolution raster dataset. Individual housing location, structure density and vegetation cover data were used to spatially assess in detail the interface, intermix and dispersed rural WUI communities with a geographical information system. Most WUI areas concentrate in the coastal belt, where suburban sprawl has occurred near or within unmanaged forests. This geospatial dataset provides an approximation of the potential for residential housing loss in a wildfire, and represents a valuable contribution to landscape and urban planning in the region.

  12. xarray: N-D labeled Arrays and Datasets in Python

    Directory of Open Access Journals (Sweden)

    Stephan Hoyer

    2017-04-01

    Full Text Available xarray is an open source project and Python package that provides a toolkit and data structures for N-dimensional labeled arrays. Our approach combines an application programming interface (API) inspired by pandas with the Common Data Model for self-described scientific data. Key features of the xarray package include label-based indexing and arithmetic, interoperability with the core scientific Python packages (e.g., pandas, NumPy, Matplotlib), out-of-core computation on datasets that don’t fit into memory, a wide range of serialization and input/output (I/O) options, and advanced multi-dimensional data manipulation tools such as group-by and resampling. xarray, as a data model and analytics toolkit, has been widely adopted in the geoscience community but is also used more broadly for multi-dimensional data analysis in physics, machine learning and finance.
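
    A short usage sketch of two of the features named above (label-based indexing and group-by aggregation), with random data:

```python
# Labeled 3-D array with xarray: select by label, aggregate by group.
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2016-01-01", periods=365)
data = np.random.default_rng(3).random((365, 4, 5))
da = xr.DataArray(
    data,
    dims=("time", "lat", "lon"),
    coords={"time": time, "lat": [10, 20, 30, 40], "lon": list(range(5))},
    name="precip",
)

january = da.sel(time="2016-01")                 # label-based indexing
monthly_mean = da.groupby("time.month").mean()   # group-by aggregation
print(monthly_mean.sizes)                        # month: 12, lat: 4, lon: 5
```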

  13. Automatic identification of variables in epidemiological datasets using logic regression.

    Science.gov (United States)

    Lorenz, Matthias W; Abdi, Negin Ashtiani; Scheckenbach, Frank; Pflug, Anja; Bülbül, Alpaslan; Catapano, Alberico L; Agewall, Stefan; Ezhov, Marat; Bots, Michiel L; Kiechl, Stefan; Orth, Andreas

    2017-04-13

    For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve data quality. For semi-automation, high sensitivity in the recognition of matching variables is particularly important, because it allows creating software which, for a target variable, presents a choice of source variables from which a user can choose the matching one, with only a low risk of having missed a correct source variable. For each variable in a set of target variables, a number of simple rules were manually created. With logic regression, an optimal Boolean combination of these rules was searched for every target variable, using a random subset of a large database of epidemiological and clinical cohort data (construction subset). In a second subset of this database (validation subset), these optimal rule combinations were validated. In the construction sample, the 41 target variables were allocated with an average positive predictive value (PPV) of 34% and a negative predictive value (NPV) of 95%. In the validation sample, the PPV was 33%, whereas the NPV remained at 94%. In the construction sample, the PPV was 50% or less for 63% of all variables, and in the validation sample for 71% of all variables. We demonstrated that the application of logic regression to a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, which may require backup strategies.
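
    As a rough illustration of the underlying idea (not the authors' logic regression, which optimizes Boolean rule trees by stochastic search), the sketch below brute-forces two-rule AND/OR combinations of hand-written matching rules and scores them against a tiny, made-up gold standard of variable names:

```python
# Toy search for the best Boolean combination of simple matching rules.
from itertools import combinations

rules = {
    "has_sys": lambda name: "sys" in name.lower(),
    "has_bp": lambda name: "bp" in name.lower() or "press" in name.lower(),
    "not_dia": lambda name: "dia" not in name.lower(),
}
variables = ["sysbp_v1", "diastolic_bp", "sbp_sys", "heightcm", "sys_press"]
is_match = {"sysbp_v1": True, "diastolic_bp": False, "sbp_sys": True,
            "heightcm": False, "sys_press": True}   # invented gold standard

def accuracy(pred):
    return sum(pred(v) == is_match[v] for v in variables) / len(variables)

best, best_acc = None, -1.0
for (na, fa), (nb, fb) in combinations(rules.items(), 2):
    for op_name, op in (("AND", lambda p, q: p and q),
                        ("OR", lambda p, q: p or q)):
        acc = accuracy(lambda v, fa=fa, fb=fb, op=op: op(fa(v), fb(v)))
        if acc > best_acc:
            best, best_acc = f"{na} {op_name} {nb}", acc
print(best, best_acc)   # e.g. "has_sys AND has_bp" with accuracy 1.0
```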

  14. Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets.

    Science.gov (United States)

    Narechania, Apurva; Baker, Richard; DeSalle, Rob; Mathema, Barun; Kolokotronis, Sergios-Orestis; Kreiswirth, Barry; Planet, Paul J

    2016-10-24

    Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons. In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data.
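
    The flocking heuristic lends itself to a compact illustration. Below is a toy sketch, not the authors' implementation: agents start at random 2-D positions and iteratively attract neighbors with small pairwise distance while repelling dissimilar ones, so flocks emerge without partitional seeds. The distance matrix, neighborhood radius and step sizes are all invented for the demo; Clusterflock derives its distances from phylogenetic incongruence between gene families.

```python
# Toy distance-driven flocking: similar items self-organize into flocks.
import numpy as np

rng = np.random.default_rng(4)
n = 30
# Two hidden groups: small within-group distance, large between-group distance.
group = np.array([0] * 15 + [1] * 15)
dist = np.where(group[:, None] == group[None, :], 0.1, 0.9)

pos = rng.random((n, 2)) * 10.0                # random starting positions
for _ in range(200):
    for i in range(n):
        sep = pos - pos[i]                     # vectors toward other agents
        norm = np.linalg.norm(sep, axis=1) + 1e-9
        near = norm < 2.0                      # react only to local neighborhood
        near[i] = False
        if near.any():
            # Attract to similar neighbors (dist < 0.5), repel dissimilar ones.
            weight = (0.5 - dist[i, near]) / norm[near]
            pos[i] += 0.05 * (weight[:, None] * sep[near]).sum(axis=0)

# After iterating, members of the same group sit close together:
print(np.linalg.norm(pos[group == 0].mean(0) - pos[group == 1].mean(0)))
```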

  15. GIS measured environmental correlates of active school transport: a systematic review of 14 studies.

    Science.gov (United States)

    Wong, Bonny Yee-Man; Faulkner, Guy; Buliung, Ron

    2011-05-06

    Emerging frameworks to examine active school transportation (AST) commonly emphasize the built environment (BE) as having an influence on travel mode decisions. Objective measures of BE attributes have been recommended for advancing knowledge about the influence of the BE on school travel mode choice. An updated systematic review on the relationships between GIS-measured BE attributes and AST is required to inform future research in this area. The objectives of this review are: i) to examine and summarize the relationships between objectively measured BE features and AST in children and adolescents and ii) to critically discuss GIS methodologies used in this context. Six electronic databases, and websites were systematically searched, and reference lists were searched and screened to identify studies examining AST in students aged five to 18 and reporting GIS as an environmental measurement tool. Fourteen cross-sectional studies were identified. The analyses were classified in terms of density, diversity, and design and further differentiated by the measures used or environmental condition examined. Only distance was consistently found to be negatively associated with AST. Consistent findings of positive or negative associations were not found for land use mix, residential density, and intersection density. Potential modifiers of any relationship between these attributes and AST included age, school travel mode, route direction (e.g., to/from school), and trip-end (home or school). Methodological limitations included inconsistencies in geocoding, selection of study sites, buffer methods and the shape of zones (Modifiable Areal Unit Problem [MAUP]), the quality of road and pedestrian infrastructure data, and school route estimation. The inconsistent use of spatial concepts limits the ability to draw conclusions about the relationship between objectively measured environmental attributes and AST. Future research should explore standardizing buffer size, assess the

  16. LS-GKM: a new gkm-SVM for large-scale datasets.

    Science.gov (United States)

    Lee, Dongwon

    2016-07-15

    gkm-SVM is a sequence-based method for predicting and detecting the regulatory vocabulary encoded in functional DNA elements, and is a commonly used tool for studying gene regulatory mechanisms. Here we introduce new software, LS-GKM, which removes several limitations of our previous releases, enabling training on much larger scale (LS) datasets. LS-GKM also provides additional advanced gapped k-mer based kernel functions. With these improvements, LS-GKM achieves considerably higher accuracy than the original gkm-SVM. C/C ++ source codes and related scripts are freely available from http://github.com/Dongwon-Lee/lsgkm/, and supported on Linux and Mac OS X. dwlee@jhu.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Something From Nothing (There): Collecting Global IPv6 Datasets from DNS

    NARCIS (Netherlands)

    Fiebig, T.; Borgolte, Kevin; Hao, Shuang; Kruegel, Christopher; Vigna, Giovanny; Spring, Neil; Riley, George F.

    2017-01-01

    Current large-scale IPv6 studies mostly rely on non-public datasets, asmost public datasets are domain specific. For instance, traceroute-based datasetsare biased toward network equipment. In this paper, we present a new methodologyto collect IPv6 address datasets that does not require access to

  18. Analysis of GRACE Range-rate Residuals with Emphasis on Reprocessed Star-Camera Datasets

    Science.gov (United States)

    Goswami, S.; Flury, J.; Naeimi, M.; Bandikova, T.; Guerr, T. M.; Klinger, B.

    2015-12-01

    Since March 2002 the two GRACE satellites orbit the Earth at rela-tively low altitude. Determination of the gravity field of the Earth including itstemporal variations from the satellites' orbits and the inter-satellite measure-ments is the goal of the mission. Yet, the time-variable gravity signal has notbeen fully exploited. This can be seen better in the computed post-fit range-rateresiduals. The errors reflected in the range-rate residuals are due to the differ-ent sources as systematic errors, mismodelling errors and tone errors. Here, weanalyse the effect of three different star-camera data sets on the post-fit range-rate residuals. On the one hand, we consider the available attitude data andon other hand we take the two different data sets which has been reprocessedat Institute of Geodesy, Hannover and Institute of Theoretical Geodesy andSatellite Geodesy, TU Graz Austria respectively. Then the differences in therange-rate residuals computed from different attitude dataset are analyzed inthis study. Details will be given and results will be discussed.

  19. A Scalable Permutation Approach Reveals Replication and Preservation Patterns of Network Modules in Large Datasets.

    Science.gov (United States)

    Ritchie, Scott C; Watts, Stephen; Fearnley, Liam G; Holt, Kathryn E; Abraham, Gad; Inouye, Michael

    2016-07-01

    Network modules (topologically distinct groups of edges and nodes) that are preserved across datasets can reveal common features of organisms, tissues, cell types, and molecules. Many statistics to identify such modules have been developed, but testing their significance requires heuristics. Here, we demonstrate that current methods for assessing module preservation are systematically biased and produce skewed p values. We introduce NetRep, a rapid and computationally efficient method that uses a permutation approach to score module preservation without assuming data are normally distributed. NetRep produces unbiased p values and can distinguish between true and false positives during multiple hypothesis testing. We use NetRep to quantify preservation of gene coexpression modules across murine brain, liver, adipose, and muscle tissues. Complex patterns of multi-tissue preservation were revealed, including a liver-derived housekeeping module that displayed adipose- and muscle-specific association with body weight. Finally, we demonstrate the broader applicability of NetRep by quantifying preservation of bacterial networks in gut microbiota between men and women. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
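
    The permutation principle can be sketched in a few lines. The toy example below uses synthetic data and an invented module statistic (mean within-module correlation), scoring the observed statistic against a null distribution from random node assignments; it illustrates the general approach, not the NetRep procedure itself.

```python
# Permutation test sketch for module preservation.
import numpy as np

rng = np.random.default_rng(5)
n_samples, n_genes = 100, 50
data = rng.normal(size=(n_samples, n_genes))
data[:, :10] += rng.normal(size=(n_samples, 1))   # genes 0-9 form a real module
module = np.arange(10)

def mean_within_corr(nodes, X):
    c = np.corrcoef(X[:, nodes], rowvar=False)
    iu = np.triu_indices_from(c, k=1)
    return c[iu].mean()

observed = mean_within_corr(module, data)
null = np.array([
    mean_within_corr(rng.choice(n_genes, size=module.size, replace=False), data)
    for _ in range(1000)
])
p = (1 + (null >= observed).sum()) / (1 + null.size)   # unbiased estimator
print(f"observed={observed:.2f}, permutation p={p:.4f}")
```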

  20. Improving AfriPop dataset with settlement extents extracted from RapidEye for the border region comprising South-Africa, Swaziland and Mozambique

    Directory of Open Access Journals (Sweden)

    Julie Deleu

    2015-11-01

    Full Text Available For modelling the spatial distribution of malaria incidence, accurate and detailed information on population size and distribution is of great importance. Various global, spatial, standard datasets of population distribution have been developed and are widely used. However, most of them are not up-to-date, and the low spatial resolution of the input census data has limitations for contemporary, national-scale analyses. The AfriPop project, launched in July 2009, was initiated with the aim of producing detailed, contemporary and easily updatable population distribution datasets for the whole of Africa. High-resolution satellite sensors can help to further improve this dataset through the generation of high-resolution settlement layers at greater spatial detail. In the present study, the settlement extents included in the MALAREO land use classification were used to generate an enhanced and updated version of the AfriPop dataset for the study area covering southern Mozambique, eastern Swaziland and the malarious part of KwaZulu-Natal in South Africa. Results show that it is possible to easily produce a detailed and updated population distribution dataset by applying the AfriPop modelling approach with the use of high-resolution settlement layers and population growth rates. The 2007 and 2011 population datasets are freely available as a product of the MALAREO project and can be downloaded from the project website.

  1. Exploring the Limitations of Measures of Students' Socioeconomic Status (SES)

    Science.gov (United States)

    Dickinson, Emily R.; Adelson, Jill L.

    2014-01-01

    This study uses a nationally representative student dataset to explore the limitations of commonly used measures of socioeconomic status (SES). Among the identified limitations are patterns of missing data that conflate the traditional conceptualization of SES with differences in family structure that have emerged in recent years and a lack of…

  2. Privacy preserving data anonymization of spontaneous ADE reporting system dataset.

    Science.gov (United States)

    Lin, Wen-Yang; Yang, Duen-Chuan; Wang, Jie-Teng

    2016-07-18

    To facilitate long-term safety surveillance of marketed drugs, many spontaneous reporting systems (SRSs) for ADR events have been established world-wide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification of individuals, this raises the issue of privacy preserving data publishing (PPDP), that is, how to sanitize (anonymize) raw data before publishing. Although much work has been done on PPDP, very few studies have focused on protecting the privacy of SRS data, and none of the existing anonymization methods is suitable for SRS datasets, which exhibit characteristics such as rare events, multiple records per individual, and multi-valued sensitive attributes. We propose a new privacy model called MS(k, θ*)-bounding for protecting published spontaneous ADE reporting data from privacy attacks. Our model has the flexibility of varying privacy thresholds, i.e., θ*, for different sensitive values and takes the characteristics of SRS data into consideration. We also propose an anonymization algorithm for sanitizing the raw data to meet the requirements specified through the proposed model. Our algorithm adopts a greedy-based clustering strategy to group the records into clusters, conforming to an innovative anonymization metric aiming to minimize the privacy risk as well as maintain the data utility for ADR detection. An empirical study was conducted using the FAERS dataset from 2004Q1 to 2011Q4. We compared our model with four prevailing methods, including k-anonymity, (X, Y)-anonymity, multi-sensitive l-diversity, and (α, k)-anonymity, evaluated via two measures, Danger Ratio (DR) and Information Loss (IL), and considered three different scenarios of threshold setting for θ*, including uniform setting, level-wise setting and frequency-based setting. We also conducted experiments to inspect the impact of anonymized data on the strengths of discovered ADR signals. With all three

  3. Mining hydrogeological data from existing AEM datasets for mineral Mining

    Science.gov (United States)

    Menghini, Antonio; Viezzoli, Andrea; Teatini, Pietro; Cattarossi, Andrea

    2017-04-01

    Large amounts of existing Airborne Electromagnetic (AEM) data are potentially available all over the world. Originally acquired for mining purposes, AEM data traditionally do not get processed in detail and inverted: most orebodies can be detected simply by analyzing the peak anomaly directly evidenced by the voltage values (so-called "bump detection"). However, AEM acquisitions can be accurately re-processed and inverted to provide detailed 3D models of resistivity: a first step towards hydrogeological studies and modelling. This is a great opportunity especially for the African continent, where the detection of exploitable groundwater resources is a crucial issue. In many cases, a while after AEM data have been acquired by a mining company, governments become owners of those datasets and have the opportunity to develop detailed hydrogeological characterizations at very low cost. We report a case in which existing VTEM (Versatile Time Domain Electromagnetic - Geotech Ltd) data, originally acquired to detect gold deposits, are used to improve the hydrogeological knowledge of a roughly 50 km2 pilot-test area in Sierra Leone. Thanks to an accurate processing workflow and an advanced data inversion, based on the Spatially Constrained Inversion (SCI) algorithm, we have been able to resolve the thickness of the regolith aquifer and the top of the granitic-gneiss or greenstone belt bedrock. Moreover, the occurrence of different lithological units (more or less conductive) directly related to groundwater flow, sometimes also having a high chargeability (e.g. in the case of lateritic units), has been detailed within the regolith. The most promising areas to drill new productive wells have been recognized where the bedrock is deeper and the regolith thickness is larger. A further piece of information considered in the hydrogeological mapping is the resistivity of the regolith, given that the most permeable layers coincide with the most resistive units. The

  4. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

    Directory of Open Access Journals (Sweden)

    Mithun Biswas

    2017-06-01

    Full Text Available BanglaLekha-Isolated, a Bangla handwritten isolated character dataset, is presented in this article. This dataset contains 84 different characters, comprising 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  5. MCIndoor20000: A fully-labeled image dataset to advance indoor objects detection

    Directory of Open Access Journals (Sweden)

    Fereshteh S. Bashiri

    2018-04-01

    Full Text Available A fully-labeled image dataset provides a unique resource for reproducible research inquiries and data analyses in several computational fields, such as computer vision, machine learning and deep learning. With the present contribution, a large-scale fully-labeled image dataset is provided and made publicly and freely available to the research community. The dataset, entitled MCIndoor20000, includes more than 20,000 digital images from three different indoor object categories: doors, stairs, and hospital signs. To make a comprehensive dataset that addresses current challenges in indoor object modeling, we cover a wide set of variations in the images, such as rotation and intra-class variation, plus various noise models. The dataset is freely and publicly available at https://github.com/bircatmcri/MCIndoor20000. Keywords: Image dataset, Large-scale dataset, Image classification, Supervised learning, Indoor objects, Deep learning

  6. Defining datasets and creating data dictionaries for quality improvement and research in chronic disease using routinely collected data: an ontology-driven approach

    Directory of Open Access Journals (Sweden)

    Simon de Lusignan

    2011-06-01

    Conclusion: Adopting an ontology-driven approach to case finding could improve the quality of disease registers and of research based on routine data. It would offer considerable advantages over using limited datasets to define cases. This approach should be considered by those involved in research and quality improvement projects which utilise routine data.

  7. Daily precipitation grids for Austria since 1961—development and evaluation of a spatial dataset for hydroclimatic monitoring and modelling

    Science.gov (United States)

    Hiebl, Johann; Frei, Christoph

    2017-03-01

    Spatial precipitation datasets that are long-term consistent, highly resolved and extend over several decades are an increasingly popular basis for modelling and monitoring environmental processes and planning tasks in hydrology, agriculture, energy resources management, etc. Here, we present a grid dataset of daily precipitation for Austria meant to promote such applications. It has a grid spacing of 1 km, extends back till 1961 and is continuously updated. It is constructed with the classical two-tier analysis, involving separate interpolations for mean monthly precipitation and daily relative anomalies. The former was accomplished by kriging with topographic predictors as external drift utilising 1249 stations. The latter is based on angular distance weighting and uses 523 stations. The input station network was kept largely stationary over time to avoid artefacts on long-term consistency. Example cases suggest that the new analysis is at least as plausible as previously existing datasets. Cross-validation and comparison against experimental high-resolution observations (WegenerNet) suggest that the accuracy of the dataset depends on interpretation. Users interpreting grid point values as point estimates must expect systematic overestimates for light and underestimates for heavy precipitation as well as substantial random errors. Grid point estimates are typically within a factor of 1.5 from in situ observations. Interpreting grid point values as area mean values, conditional biases are reduced and the magnitude of random errors is considerably smaller. Together with a similar dataset of temperature, the new dataset (SPARTACUS) is an interesting basis for modelling environmental processes, studying climate change impacts and monitoring the climate of Austria.

  8. Daily precipitation grids for Austria since 1961—development and evaluation of a spatial dataset for hydroclimatic monitoring and modelling

    Science.gov (United States)

    Hiebl, Johann; Frei, Christoph

    2018-04-01

    Spatial precipitation datasets that are long-term consistent, highly resolved and extend over several decades are an increasingly popular basis for modelling and monitoring environmental processes and planning tasks in hydrology, agriculture, energy resources management, etc. Here, we present a grid dataset of daily precipitation for Austria meant to promote such applications. It has a grid spacing of 1 km, extends back till 1961 and is continuously updated. It is constructed with the classical two-tier analysis, involving separate interpolations for mean monthly precipitation and daily relative anomalies. The former was accomplished by kriging with topographic predictors as external drift utilising 1249 stations. The latter is based on angular distance weighting and uses 523 stations. The input station network was kept largely stationary over time to avoid artefacts on long-term consistency. Example cases suggest that the new analysis is at least as plausible as previously existing datasets. Cross-validation and comparison against experimental high-resolution observations (WegenerNet) suggest that the accuracy of the dataset depends on interpretation. Users interpreting grid point values as point estimates must expect systematic overestimates for light and underestimates for heavy precipitation as well as substantial random errors. Grid point estimates are typically within a factor of 1.5 from in situ observations. Interpreting grid point values as area mean values, conditional biases are reduced and the magnitude of random errors is considerably smaller. Together with a similar dataset of temperature, the new dataset (SPARTACUS) is an interesting basis for modelling environmental processes, studying climate change impacts and monitoring the climate of Austria.
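
    As a rough illustration of the second interpolation tier described above, the sketch below uses plain inverse distance weighting in place of the angular distance weighting used for SPARTACUS: daily relative anomalies at stations are interpolated to a grid point and multiplied by its (separately interpolated) monthly mean. Station coordinates, anomalies and the monthly mean are invented numbers.

```python
# Two-tier daily precipitation sketch: interpolated anomaly x monthly mean.
import numpy as np

station_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
daily_anomaly = np.array([1.2, 0.8, 1.5, 0.6])   # daily / monthly-mean ratios
grid_monthly_mean = 3.4                           # mm/day, from the kriging tier

def idw(target, xy, values, power=2.0):
    d = np.linalg.norm(xy - target, axis=1)
    if np.any(d == 0):                            # exactly on a station
        return values[np.argmin(d)]
    w = 1.0 / d ** power
    return (w * values).sum() / w.sum()

target = np.array([4.0, 6.0])
print(idw(target, station_xy, daily_anomaly) * grid_monthly_mean)  # mm/day
```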

  9. Challenges and Experiences of Building Multidisciplinary Datasets across Cultures

    Science.gov (United States)

    Jamiyansharav, K.; Laituri, M.; Fernandez-Gimenez, M.; Fassnacht, S. R.; Venable, N. B. H.; Allegretti, A. M.; Reid, R.; Baival, B.; Jamsranjav, C.; Ulambayar, T.; Linn, S.; Angerer, J.

    2017-12-01

    Efficient data sharing and management are key challenges to multidisciplinary scientific research. These challenges are further complicated by adding a multicultural component. We address the construction of a complex database for social-ecological analysis in Mongolia. Funded by the National Science Foundation (NSF) Dynamics of Coupled Natural and Human (CNH) Systems, the Mongolian Rangelands and Resilience (MOR2) project focuses on the vulnerability of Mongolian pastoral systems to climate change and adaptive capacity. The MOR2 study spans over three years of fieldwork in 36 paired districts (Soum) from 18 provinces (Aimag) of Mongolia that covers steppe, mountain forest steppe, desert steppe and eastern steppe ecological zones. Our project team is composed of hydrologists, social scientists, geographers, and ecologists. The MOR2 database includes multiple ecological, social, meteorological, geospatial and hydrological datasets, as well as archives of original data and survey in multiple formats. Managing this complex database requires significant organizational skills, attention to detail and ability to communicate within collective team members from diverse disciplines and across multiple institutions in the US and Mongolia. We describe the database's rich content, organization, structure and complexity. We discuss lessons learned, best practices and recommendations for complex database management, sharing, and archiving in creating a cross-cultural and multi-disciplinary database.

  10. Structural dataset for the PPARγ V290M mutant

    Directory of Open Access Journals (Sweden)

    Ana C. Puhl

    2016-06-01

    Full Text Available The loss-of-function mutation V290M in the ligand-binding domain of the peroxisome proliferator activated receptor γ (PPARγ) is associated with a PPARγ ligand resistance syndrome (PLRS), characterized by partial lipodystrophy and severe insulin resistance. In this data article we discuss an X-ray diffraction dataset that yielded the structure of the PPARγ LBD V290M mutant refined at 2.3 Å resolution, which allowed the building of a 3D model of the receptor mutant with high confidence and revealed continuous, well-defined electron density for the partial agonist diclofenac bound to the hydrophobic pocket of PPARγ. These structural data provide significant insights into the molecular basis of PLRS caused by the V290M mutation and are correlated with the receptor's impaired rosiglitazone binding and increased affinity for corepressors. Furthermore, our structural evidence helps to explain clinical observations which point to a failure to restore receptor function through treatment with rosiglitazone, a full PPARγ agonist.

  11. Mathematical modelling of the MAP kinase pathway using proteomic datasets.

    Science.gov (United States)

    Tian, Tianhai; Song, Jiangning

    2012-01-01

    The advances in proteomics technologies offer an unprecedented opportunity and valuable resources to understand how living organisms execute necessary functions at the systems level. However, little work has been done to date to utilize the highly accurate spatio-temporal dynamic proteome data generated by phosphoproteomics for mathematical modeling of complex cell signaling pathways. This work proposed a novel computational framework to develop mathematical models based on proteomic datasets. Using the MAP kinase pathway as the test system, we developed a mathematical model including the cytosolic and nuclear subsystems, and applied a genetic algorithm to infer unknown model parameters. The robustness property of the mathematical model was used as a criterion to select the appropriate rate constants from the estimated candidates. Quantitative information regarding the absolute protein concentrations was used to refine the mathematical model. We have demonstrated that the incorporation of more experimental data could significantly enhance both the simulation accuracy and the robustness property of the proposed model. In addition, we used the MAP kinase pathway inhibited by phosphatases with different concentrations to predict the signal output influenced by different cellular conditions. Our predictions are in good agreement with the experimental observations when the MAP kinase pathway was inhibited by the phosphatases PP2A and MKP3. The successful application of the proposed modeling framework to the MAP kinase pathway suggests that our method is very promising for developing accurate mathematical models and yielding insights into the regulatory mechanisms of complex cell signaling pathways.
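
    The parameter-inference step can be illustrated with a toy example. The sketch below fits the two rate constants of a one-step activation model dP/dt = k1·(1 − P) − k2·P, a stand-in for the article's MAP kinase model, to synthetic time-course data with a bare-bones genetic algorithm. The model, population size and mutation scale are all assumptions for the demo.

```python
# Toy genetic-algorithm fit of ODE rate constants to time-course data.
import numpy as np
from scipy.integrate import solve_ivp

t_obs = np.linspace(0, 10, 20)
k_true = (0.8, 0.3)

def simulate(k1, k2):
    sol = solve_ivp(lambda t, p: k1 * (1 - p) - k2 * p, (0, 10), [0.0],
                    t_eval=t_obs)
    return sol.y[0]

rng = np.random.default_rng(6)
observed = simulate(*k_true) + rng.normal(0, 0.01, t_obs.size)

def fitness(k):
    return -np.mean((simulate(*k) - observed) ** 2)   # negative squared error

pop = rng.uniform(0.01, 2.0, size=(40, 2))            # initial population
for _ in range(30):
    scores = np.array([fitness(k) for k in pop])
    parents = pop[np.argsort(scores)[-20:]]           # truncation selection
    children = parents[rng.integers(0, 20, 40)]       # clone parents
    children += rng.normal(0, 0.05, children.shape)   # Gaussian mutation
    pop = np.clip(children, 0.01, 2.0)

best = pop[np.argmax([fitness(k) for k in pop])]
print("estimated (k1, k2):", np.round(best, 2), "true:", k_true)
```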

  12. Creating a global sub-daily precipitation dataset

    Science.gov (United States)

    Lewis, Elizabeth; Blenkinsop, Stephen; Fowler, Hayley

    2017-04-01

    Extremes of precipitation can cause flooding and droughts, which can lead to substantial damage to infrastructure and ecosystems and can result in loss of life. It is still uncertain how hydrological extremes will change with global warming, as we do not fully understand the processes that cause extreme precipitation under current climate variability. The INTENSE project is using a novel and fully-integrated data-modelling approach to provide a step-change in our understanding of the nature and drivers of global precipitation extremes and change on societally relevant timescales, leading to improved high-resolution climate model representation of extreme rainfall processes. The INTENSE project is run in conjunction with the World Climate Research Programme (WCRP)'s Grand Challenge on 'Understanding and Predicting Weather and Climate Extremes' and the Global Water and Energy Exchanges Project (GEWEX) science questions. The first step towards achieving this is to construct a new global sub-daily precipitation dataset. Data collection is ongoing and already covers North America, Europe, Asia and Australasia. Comprehensive, open source quality control software is being developed to set a new standard for verifying sub-daily precipitation data, and a set of global hydroclimatic indices will be produced based upon stakeholder recommendations. This will provide a unique global data resource on sub-daily precipitation whose derived indices, e.g. monthly/annual maxima, will be freely available to the wider scientific community.

  13. Privacy-preserving record linkage on large real world datasets.

    Science.gov (United States)

    Randall, Sean M; Ferrante, Anna M; Boyd, James H; Bauer, Jacqueline K; Semmens, James B

    2014-08-01

    Record linkage typically involves the use of dedicated linkage units who are supplied with personally identifying information to determine individuals from within and across datasets. The personally identifying information supplied to linkage units is separated from clinical information prior to release by data custodians. While this substantially reduces the risk of disclosure of sensitive information, some residual risks still exist and remain a concern for some custodians. In this paper we trial a method of record linkage which reduces privacy risk still further on large real world administrative data. The method uses encrypted personal identifying information (bloom filters) in a probability-based linkage framework. The privacy preserving linkage method was tested on ten years of New South Wales (NSW) and Western Australian (WA) hospital admissions data, comprising in total over 26 million records. No difference in linkage quality was found when the results were compared to traditional probabilistic methods using full unencrypted personal identifiers. This presents as a possible means of reducing privacy risks related to record linkage in population level research studies. It is hoped that through adaptations of this method or similar privacy preserving methods, risks related to information disclosure can be reduced so that the benefits of linked research taking place can be fully realised. Copyright © 2013 Elsevier Inc. All rights reserved.
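
    The bloom-filter encoding at the heart of this style of privacy-preserving linkage can be sketched briefly: character bigrams of a name are hashed into a fixed-length bit array, and encoded records are compared with the Dice coefficient. The bit-array length, hash construction and names below are illustrative choices, not the parameters used in the study.

```python
# Bloom-filter encoding sketch for privacy-preserving name comparison.
import hashlib

def bloom(name, n_bits=100, n_hashes=4):
    bits = set()
    padded = f"_{name.lower()}_"
    for a, b in zip(padded, padded[1:]):        # character bigrams
        for k in range(n_hashes):               # k keyed hash slots per bigram
            h = hashlib.sha1(f"{k}{a}{b}".encode()).hexdigest()
            bits.add(int(h, 16) % n_bits)
    return bits

def dice(bf1, bf2):
    return 2 * len(bf1 & bf2) / (len(bf1) + len(bf2))

# Encoded names can be compared without revealing the originals:
print(dice(bloom("catherine"), bloom("katherine")))   # high similarity
print(dice(bloom("catherine"), bloom("robert")))      # low similarity
```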

  14. Archive Issues Associated with NASA Earth Science Datasets

    Science.gov (United States)

    Behnke, J.; Moses, J.; Byrnes, J.

    2008-12-01

    The Earth Science Data and Information System (ESDIS) Project at NASA Goddard Space Flight Center was established in the early 1990s to develop and maintain a core collection of NASA's critical earth science data. Part of its mission was to provide a home for legacy earth science data from early NASA missions. Examples of these datasets include data from such missions as NIMBUS (1960s) and the Heat Capacity Mapping Mission (HCMM) from the late 1970s at GSFC and the Earth Radiation Budget Experiment (ERBE) from the late 1970s at Langley Research Center. Much of this information has been kept on old media and in many cases is not readily accessible to the science community. This presentation will describe several science data issues we have experienced as part of our efforts to recover data from these missions. We will share problems encountered with data formats, data resolution, representation, and documentation. The presentation will also suggest best practices and identify key missing elements that would enable easier recovery if incorporated into future archives. The authors offer an opportunity to discuss plans for NASA's heritage assets and their disposition.

  15. Genomics dataset on unclassified published organism (patent US 7547531)

    Directory of Open Access Journals (Sweden)

    Mohammad Mahfuz Ali Khan Shawan

    2016-12-01

    Full Text Available Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism; it is therefore crucial for establishing the hierarchical classification of a particular organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which helps to open doors for new collaborations. In this dataset, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes of those DNA sequences were constructed with the DNA BarID tool; QR codes are useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences was determined using the ENDMEMO GC Content Calculator, which indicates their stability at different temperatures. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify cleavage codes indicating 5', 3' and blunt ends, and enzyme codes indicating the methylation sites of the DNA sequences. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
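
    The GC-content computation behind those figures is a one-liner; the sketch below uses a made-up sequence, not one of the patent sequences:

        def gc_content(seq):
            # percentage of G and C bases in a DNA sequence
            seq = seq.upper()
            return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

        print(gc_content("ATGCGCGTTA"))  # 50.0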

  16. Vectorized Radviz and its application to multiple cluster datasets.

    Science.gov (United States)

    Sharko, John; Grinstein, Georges; Marx, Kenneth A

    2008-01-01

    Radviz is a radial visualization with dimensions assigned to points called dimensional anchors (DAs) placed on the circumference of a circle. Records are assigned locations within the circle as a function of their relative attraction to each of the DAs. The DAs can be moved either interactively or algorithmically to reveal different meaningful patterns in the dataset. In this paper we describe Vectorized Radviz (VRV), which extends the number of dimensions through data flattening. We show how VRV increases the power of Radviz through these extra dimensions by enhancing the flexibility in the layout of the DAs. We apply VRV to the problem of analyzing the results of multiple clusterings of the same data set, called multiple cluster sets or cluster ensembles. We show how features of VRV help discern patterns across the multiple cluster sets. We use the Iris data set to explain VRV and a newt gene microarray data set used in studying limb regeneration to show its utility. We then discuss further applications of VRV.
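
    The Radviz placement rule, each dimension pulling a record toward its anchor in proportion to the record's normalised value, can be sketched as follows (a minimal version; production Radviz normalises each dimension over the whole dataset rather than within one record):

        import numpy as np

        def radviz_position(values):
            # anchors equally spaced on the unit circle, one per dimension
            v = np.asarray(values, dtype=float)
            rng = v.max() - v.min()
            w = (v - v.min()) / rng if rng > 0 else np.full_like(v, 0.5)
            theta = 2 * np.pi * np.arange(len(v)) / len(v)
            anchors = np.column_stack([np.cos(theta), np.sin(theta)])
            # weighted average of anchor positions = point inside the circle
            return (w[:, None] * anchors).sum(axis=0) / w.sum()

        print(radviz_position([5.1, 3.5, 1.4, 0.2]))  # e.g. one Iris record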

  17. A dataset from bottom trawl survey around Taiwan

    Directory of Open Access Journals (Sweden)

    Kwang-tsao Shao

    2012-05-01

    Full Text Available Bottom trawl fishery is one of the most important coastal fisheries in Taiwan, in both production and economic value. However, its annual production has declined since the 1980s due to overfishing, and its bycatch problem also damages the fishery resource seriously. Thus, the government banned bottom trawling within 3 nautical miles of the shoreline in 1989. To evaluate the effectiveness of this policy, a four-year survey was conducted from 2000–2003 in the waters around Taiwan and Penghu (the Pescadores Islands), covering one region each year. All fish specimens collected from trawling were brought back to the lab for identification, counting of individuals and body weight measurement. These raw data have been integrated and established in the Taiwan Fish Database (http://fishdb.sinica.edu.tw). They have also been published through TaiBIF (http://taibif.tw), FishBase and GBIF (websites see below). This dataset contains 631 fish species and 3,529 records, making it the most complete dataset on the demersal fish fauna and its temporal and spatial distribution on soft marine habitats in Taiwan.

  18. The Centennial Trends Greater Horn of Africa precipitation dataset.

    Science.gov (United States)

    Funk, Chris; Nicholson, Sharon E; Landsfeld, Martin; Klotter, Douglas; Peterson, Pete; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded 'Centennial Trends' precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data.
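
    The interpolation step can be illustrated with a minimal ordinary-kriging solver; the exponential variogram and its range parameter here are illustrative assumptions, not the CenTrends settings:

        import numpy as np

        def ordinary_kriging(xy, z, xy0, variogram=lambda h: 1.0 - np.exp(-h / 50.0)):
            # solve the ordinary kriging system [Gamma 1; 1' 0][w; mu] = [gamma0; 1]
            n = len(z)
            dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
            A = np.ones((n + 1, n + 1))
            A[:n, :n] = variogram(dists)
            A[n, n] = 0.0
            b = np.ones(n + 1)
            b[:n] = variogram(np.linalg.norm(xy - xy0, axis=-1))
            w = np.linalg.solve(A, b)
            return w[:n] @ z  # interpolated value at xy0

        xy = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])  # station coords, km
        z = np.array([0.3, -0.1, 0.5])                           # seasonal anomalies
        print(ordinary_kriging(xy, z, np.array([30.0, 30.0])))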

  19. Observational Evidence for Desert Amplification Using Multiple Satellite Datasets.

    Science.gov (United States)

    Wei, Nan; Zhou, Liming; Dai, Yongjiu; Xia, Geng; Hua, Wenjian

    2017-05-17

    Desert amplification identified in recent studies has large uncertainties due to data paucity over remote deserts. Here we present observational evidence, using multiple satellite-derived datasets, that desert amplification is a real large-scale pattern of warming in near-surface and lower-tropospheric temperatures. Trend analyses of three long-term temperature products consistently confirm that near-surface warming is generally strongest over the driest climate regions, and that this spatial pattern of warming maximizes near the surface, gradually decays with height, and disappears in the upper troposphere. Short-term anomaly analyses show a strong spatial and temporal coupling of changes in temperatures, water vapor and downward longwave radiation (DLR), indicating that the large increase in DLR primarily drives near-surface warming and is tightly associated with increasing water vapor over deserts. Atmospheric soundings of temperature and water vapor anomalies support the results of the long-term temperature trend analysis and suggest that desert amplification is due to comparable warming and moistening effects of the troposphere. Desert amplification likely results from the strongest water vapor feedbacks near the surface over the driest deserts, where the air is very sensitive to changes in water vapor and thus efficient in enhancing the longwave greenhouse effect in a warming climate.

  20. CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset

    Science.gov (United States)

    Cao, Houwei; Cooper, David G.; Keutmann, Michael K.; Gur, Ruben C.; Nenkova, Ani; Verma, Ragini

    2014-01-01

    People convey their emotional state in their face and voice. We present an audio-visual data set uniquely suited for the study of multi-modal emotion expression and perception. The data set consists of facial and vocal emotional expressions in sentences spoken in a range of basic emotional states (happy, sad, anger, fear, disgust, and neutral). 7,442 clips of 91 actors with diverse ethnic backgrounds were rated by multiple raters in three modalities: audio, visual, and audio-visual. Categorical emotion labels and real-value intensity values for the perceived emotion were collected using crowd-sourcing from 2,443 raters. The human recognition of intended emotion for the audio-only, visual-only, and audio-visual data are 40.9%, 58.2% and 63.6% respectively. Recognition rates are highest for neutral, followed by happy, anger, disgust, fear, and sad. Average intensity levels of emotion are rated highest for visual-only perception. The accurate recognition of disgust and fear requires simultaneous audio-visual cues, while anger and happiness can be well recognized based on evidence from a single modality. The large dataset we introduce can be used to probe other questions concerning the audio-visual perception of emotion. PMID:25653738

  1. Computational fluid dynamics benchmark dataset of airflow in tracheas

    Directory of Open Access Journals (Sweden)

    A.J. Bates

    2017-02-01

    Full Text Available Computational Fluid Dynamics (CFD) is fast becoming a useful tool to aid clinicians in pre-surgical planning through its ability to provide information that could otherwise be extremely difficult, if not impossible, to obtain. However, in order to provide clinically relevant metrics, the accuracy of the computational method must be sufficiently high. There are many alternative methods employed in the process of performing CFD simulations within the airways, including different segmentation and meshing strategies, as well as alternative approaches to solving the Navier–Stokes equations. However, as in vivo validation of the simulated flow patterns within the airways is not possible, little exists in the way of validation of the various simulation techniques. The data presented here consist of very highly resolved flow data, with the resolution compared against the Kolmogorov length and time scales, the finest scales of turbulent motion. These data are therefore ideally suited to act as a benchmark case against which cheaper computational methods may be compared. A dataset and solution setup for one such more efficient method, large eddy simulation (LES), are also presented.
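
    The Kolmogorov scales used as the resolution yardstick follow directly from the kinematic viscosity nu and the turbulent dissipation rate epsilon; the values below are illustrative, not those of the tracheal flows:

        # Kolmogorov scales: eta = (nu^3 / eps)^(1/4), tau = (nu / eps)^(1/2)
        nu = 1.5e-5   # kinematic viscosity, m^2/s (air; illustrative)
        eps = 1.0     # turbulent kinetic energy dissipation rate, m^2/s^3 (illustrative)

        eta = (nu**3 / eps) ** 0.25   # smallest eddy length scale, m
        tau = (nu / eps) ** 0.5       # smallest eddy time scale, s
        print(f"eta = {eta:.2e} m, tau = {tau:.2e} s")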

  2. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying

    2014-11-07

    For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n³) operations and O(n²) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.
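
    For a zero-mean Gaussian vector z with covariance \Sigma(\theta), the construction described above can plausibly be written as follows (a sketch of the idea, not the paper's exact notation). The exact score equations are

        \frac{\partial \ell}{\partial \theta_i} = \tfrac{1}{2}\, z^\top \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_i} \Sigma^{-1} z - \tfrac{1}{2}\, \operatorname{tr}\!\left( \Sigma^{-1} \frac{\partial \Sigma}{\partial \theta_i} \right) = 0 ,

    and substituting a sparse approximation \widetilde{\Sigma}^{-1} for \Sigma^{-1}, then equating each quadratic form to its expectation (using E[z^\top A z] = \operatorname{tr}(A\Sigma)), gives estimating equations

        g_i(\theta) = z^\top \widetilde{\Sigma}^{-1} \frac{\partial \Sigma}{\partial \theta_i} \widetilde{\Sigma}^{-1} z - \operatorname{tr}\!\left( \widetilde{\Sigma}^{-1} \frac{\partial \Sigma}{\partial \theta_i} \widetilde{\Sigma}^{-1} \Sigma(\theta) \right) = 0 ,

    which satisfy E[g_i(\theta)] = 0 for any choice of \widetilde{\Sigma}^{-1}, hence "unbiased".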

  3. Science Mapping: A Systematic Review of the Literature

    Directory of Open Access Journals (Sweden)

    Chaomei Chen

    2017-03-01

    Full Text Available Purpose: We present a systematic review of the literature concerning major aspects of science mapping to serve two primary purposes: First, to demonstrate the use of a science mapping approach to perform the review so that researchers may apply the procedure to the review of a scientific domain of their own interest, and second, to identify major areas of research activities concerning science mapping, intellectual milestones in the development of key specialties, evolutionary stages of major specialties involved, and the dynamics of transitions from one specialty to another. Design/methodology/approach: We first introduce a theoretical framework of the evolution of a scientific specialty. Then we demonstrate a generic search strategy that can be used to construct a representative dataset of bibliographic records of a domain of research. Next, progressively synthesized co-citation networks are constructed and visualized to aid visual analytic studies of the domain’s structural and dynamic patterns and trends. Finally, trajectories of citations made by particular types of authors and articles are presented to illustrate the predictive potential of the analytic approach. Findings: The evolution of the science mapping research involves the development of a number of interrelated specialties. Four major specialties are discussed in detail in terms of four evolutionary stages: conceptualization, tool construction, application, and codification. Underlying connections between major specialties are also explored. The predictive analysis demonstrates citations trajectories of potentially transformative contributions. Research limitations: The systematic review is primarily guided by citation patterns in the dataset retrieved from the literature. The scope of the data is limited by the source of the retrieval, i.e. the Web of Science, and the composite query used. An iterative query refinement is possible if one would like to improve the data quality

  4. CHASE-PL Climate Projection dataset over Poland - bias adjustment of EURO-CORDEX simulations

    Science.gov (United States)

    Mezghani, Abdelkader; Dobler, Andreas; Haugen, Jan Erik; Benestad, Rasmus E.; Parding, Kajsa M.; Piniewski, Mikołaj; Kardel, Ignacy; Kundzewicz, Zbigniew W.

    2017-11-01

    The CHASE-PL (Climate change impact assessment for selected sectors in Poland) Climate Projections - Gridded Daily Precipitation and Temperature dataset 5 km (CPLCP-GDPT5) consists of projected daily minimum and maximum air temperatures and precipitation totals from nine EURO-CORDEX regional climate model outputs, bias-corrected and downscaled to a 5 km × 5 km grid. Simulations of one historical period (1971-2000) and two future horizons (2021-2050 and 2071-2100) assuming two representative concentration pathways (RCP4.5 and RCP8.5) were produced. We used the quantile mapping method and corrected any systematic seasonal bias in these simulations before assessing the changes in annual and seasonal means of precipitation and temperature over Poland. Projected changes estimated from the multi-model ensemble mean showed that annual means of temperature are expected to increase steadily by 1 °C until 2021-2050 and by 2 °C until 2071-2100 under the RCP4.5 emission scenario, and by almost 4 °C by 2071-2100 under RCP8.5. Regional annual means of precipitation are projected to increase by 6 to 10 % and by 8 to 16 % for the two future horizons and RCPs, respectively. Individual model simulations likewise exhibited warmer and wetter conditions on an annual scale, showing an intensification of the magnitude of the change at the end of the 21st century. The same applies to projected changes in seasonal means of temperature, with a winter warming rate higher by up to 0.5 °C than in the other seasons. However, projected changes in seasonal means of precipitation by the individual models differ widely and are sometimes inconsistent, exhibiting spatial variations which depend on the selected season, location, future horizon, and RCP. The overall range of the 90 % confidence interval predicted by the ensemble of multi-model simulations was found to likely vary between -7
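
    Empirical quantile mapping, the bias-correction family named above, can be sketched as follows. This is a minimal version; the study's implementation details, such as seasonal windows and the treatment of drizzle days, are not reproduced here:

        import numpy as np

        def quantile_map(model_hist, obs_hist, model_fut):
            # locate each future model value on the historical model's empirical CDF,
            # then map it onto the corresponding observed quantile
            srt = np.sort(model_hist)
            q = np.searchsorted(srt, model_fut) / len(srt)
            return np.quantile(obs_hist, np.clip(q, 0.0, 1.0))

        rng = np.random.default_rng(0)
        model_hist = rng.gamma(2.0, 3.0, 1000)   # stand-in daily precipitation, mm
        obs_hist = rng.gamma(2.0, 4.0, 1000)
        print(quantile_map(model_hist, obs_hist, model_hist[:5]))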

  5. Avulsion research using flume experiments and highly accurate and temporal-rich SfM datasets

    Science.gov (United States)

    Javernick, L.; Bertoldi, W.; Vitti, A.

    2017-12-01

    SfM's ability to produce high-quality, large-scale digital elevation models (DEMs) of complicated and rapidly evolving systems has made it a valuable technique for low-budget researchers and practitioners. While SfM has provided valuable datasets that capture single-flood-event DEMs, there is an increasing scientific need for higher-temporal-resolution datasets that can quantify the evolutionary processes rather than pre- and post-flood snapshots. However, flood events' dangerous field conditions and image matching challenges (e.g. wind, rain) prevent quality SfM-image acquisition. Conversely, flume experiments offer opportunities to document flood events, but achieving consistent and accurate DEMs to detect subtle changes in dry and inundated areas remains a challenge for SfM (e.g. parabolic error signatures). This research aimed at investigating the impact of naturally occurring and manipulated avulsions on braided river morphology and on the encroachment of floodplain vegetation, using laboratory experiments. This required DEMs with millimeter accuracy and precision and a temporal resolution sufficient to capture the processes; SfM was chosen as it offered the most practical method. Through redundant local network design and a meticulous ground control point (GCP) survey with a Leica Total Station in red laser configuration (reported 2 mm accuracy), the SfM residual errors compared to separate ground-truthing data produced mean errors of 1.5 mm (accuracy) and standard deviations of 1.4 mm (precision) without parabolic error signatures. Lighting conditions in the flume were limited to uniform, oblique, and filtered LED strips, which removed glint and thus improved bed elevation mean errors to 4 mm; errors were further reduced by means of open-source software for refraction correction. The obtained datasets have provided the ability to quantify how small flood events with avulsion can have similar morphologic and vegetation impacts as large flood events.

  6. Image-based Exploration of Iso-surfaces for Large Multi-Variable Datasets using Parameter Space.

    KAUST Repository

    Binyahib, Roba S.

    2013-05-13

    With an increase in processing power, more complex simulations have resulted in larger data size, with higher resolution and more variables. Many techniques have been developed to help the user to visualize and analyze data from such simulations. However, dealing with a large amount of multivariate data is challenging, time-consuming and often requires high-end clusters. Consequently, novel visualization techniques are needed to explore such data. Many users would like to visually explore their data and change certain visual aspects without the need to use special clusters or having to load a large amount of data. This is the idea behind explorable images (EI). Explorable images are a novel approach that provides limited interactive visualization without the need to re-render from the original data [40]. In this work, the concept of EI has been used to create a workflow that deals with explorable iso-surfaces for scalar fields in a multivariate, time-varying dataset. As a pre-processing step, a set of iso-values for each scalar field is inferred and extracted from a user-assisted sampling technique in time-parameter space. These iso-values are then used to generate iso-surfaces that are then pre-rendered (from a fixed viewpoint) along with additional buffers (i.e. normals, depth, values of other fields, etc.) to provide a compressed representation of iso-surfaces in the dataset. We present a tool that at run-time allows the user to interactively browse and calculate a combination of iso-surfaces superimposed on each other. The result is the same as calculating multiple iso-surfaces from the original data but without the memory and processing overhead. Our tool also allows the user to change the (scalar) values superimposed on each of the surfaces, modify their color map, and interactively re-light the surfaces. We demonstrate the effectiveness of our approach over a multi-terabyte combustion dataset. We also illustrate the efficiency and accuracy of our

  7. Lessons learned in the generation of biomedical research datasets using Semantic Open Data technologies.

    Science.gov (United States)

    Legaz-García, María del Carmen; Miñarro-Giménez, José Antonio; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás

    2015-01-01

    Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources. Such heterogeneity makes difficult not only the generation of research-oriented datasets but also their exploitation. In recent years, the Open Data paradigm has proposed new ways of making data available that facilitate sharing and integration. Open Data approaches may pursue the generation of content readable only by humans or by both humans and machines; the latter is the one of interest in our work. The Semantic Web provides a natural technological space for data integration and exploitation and offers a range of technologies for generating not only Open Datasets but also Linked Datasets, that is, open datasets linked to other open datasets. According to Berners-Lee's classification, each open dataset can be given a rating between one and five stars depending on the formats in which it is published and on its links to other datasets. In recent years, we have developed and applied our SWIT tool, which automates the generation of semantic datasets from heterogeneous data sources. SWIT produces four-star datasets; the fifth star can be obtained only when the dataset is linked to from external datasets. In this paper, we describe how we have applied the tool in two projects related to health care records and orthology data, as well as the major lessons learned from such efforts.

  8. Using pre-existing microarray datasets to increase experimental power: application to insulin resistance.

    Directory of Open Access Journals (Sweden)

    Bernie J Daigle

    2010-03-01

    Full Text Available Although they have become a widely used experimental technique for identifying differentially expressed (DE) genes, DNA microarrays are notorious for generating noisy data. A common strategy for mitigating the effects of noise is to perform many experimental replicates. This approach is often costly and sometimes impossible given limited resources; thus, analytical methods are needed which increase accuracy at no additional cost. One inexpensive source of microarray replicates comes from prior work: to date, data from hundreds of thousands of microarray experiments are in the public domain. Although these data assay a wide range of conditions, they cannot be used directly to inform any particular experiment and are thus ignored by most DE gene methods. We present the SVD Augmented Gene expression Analysis Tool (SAGAT), a mathematically principled, data-driven approach for identifying DE genes. SAGAT increases the power of a microarray experiment by using observed coexpression relationships from publicly available microarray datasets to reduce uncertainty in individual genes' expression measurements. We tested the method on three well-replicated human microarray datasets and demonstrate that use of SAGAT increased effective sample sizes by as many as 2.72 arrays. We applied SAGAT to unpublished data from a microarray study investigating transcriptional responses to insulin resistance, resulting in a 50% increase in the number of significant genes detected. We evaluated 11 (58%) of these genes experimentally using qPCR, confirming the directions of expression change for all 11 and statistical significance for three. Use of SAGAT revealed coherent biological changes in three pathways: inflammation, differentiation, and fatty acid synthesis, furthering our molecular understanding of a type 2 diabetes risk factor. We envision SAGAT as a means to maximize the potential for biological discovery from subtle transcriptional responses, and we provide it as a

  9. Explaining diversity in metagenomic datasets by phylogenetic-based feature weighting.

    Directory of Open Access Journals (Sweden)

    Davide Albanese

    2015-03-01

    Full Text Available Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schema do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, thus including in the analysis also those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information.

  10. Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing

    Science.gov (United States)

    Jaenicke, Sebastian; Ander, Christina; Bekel, Thomas; Bisdorf, Regina; Dröge, Marcus; Gartemann, Karl-Heinz; Jünemann, Sebastian; Kaiser, Olaf; Krause, Lutz; Tille, Felix; Zakrzewski, Martha; Pühler, Alfred

    2011-01-01

    Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche's GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed Clostridia as the most prevalent taxonomic class, whereas species of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that Firmicutes, followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic for acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen Methanoculleus

  11. Comparative and joint analysis of two metagenomic datasets from a biogas fermenter obtained by 454-pyrosequencing.

    Directory of Open Access Journals (Sweden)

    Sebastian Jaenicke

    Full Text Available Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche's GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed Clostridia as the most prevalent taxonomic class, whereas species of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that Firmicutes, followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic for acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen

  12. VarB Plus: An Integrated Tool for Visualization of Genome Variation Datasets

    KAUST Repository

    Hidayah, Lailatul

    2012-07-01

    Research on genomic sequences has been improving significantly as more advanced technology for sequencing has been developed. This opens enormous opportunities for sequence analysis. Various analytical tools have been built for purposes such as sequence assembly, read alignments, genome browsing, comparative genomics, and visualization. From the visualization perspective, there is an increasing trend towards use of large-scale computation. However, more than power is required to produce an informative image. This is a challenge that we address by providing several ways of representing biological data in order to advance the inference endeavors of biologists. This thesis focuses on visualization of variations found in genomic sequences. We develop several visualization functions and embed them in an existing variation visualization tool as extensions. The tool we improved is named VarB, hence the nomenclature for our enhancement is VarB Plus. To the best of our knowledge, besides VarB, there is no tool that provides the capability of dynamic visualization of genome variation datasets as well as statistical analysis. Dynamic visualization allows users to toggle different parameters on and off and see the results on the fly. The statistical analysis includes Fixation Index, Relative Variant Density, and Tajima’s D. Hence we focused our efforts on this tool. The scope of our work includes plots of per-base genome coverage, Principal Coordinate Analysis (PCoA), integration with a read alignment viewer named LookSeq, and visualization of geo-biological data. In addition to description of embedded functionalities, significance, and limitations, future improvements are discussed. The result is four extensions embedded successfully in the original tool, which is built on the Qt framework in C++. Hence it is portable to numerous platforms. Our extensions have shown acceptable execution time in a beta testing with various high-volume published datasets, as well as positive

  13. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

    Directory of Open Access Journals (Sweden)

    Dongdong Lin

    2014-10-01

    Full Text Available A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparse constraint on groups of variables to overcome the 'small sample, but large variables' problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, was identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed
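
    The sparse group lasso penalty named above has a closed-form proximal operator, which is the building block of the usual fitting algorithms; a minimal generic sketch, not the authors' algorithm:

        import numpy as np

        def prox_sparse_group_lasso(beta, groups, lam1, lam2):
            # lasso part: element-wise soft-thresholding
            b = np.sign(beta) * np.maximum(np.abs(beta) - lam1, 0.0)
            out = np.zeros_like(b)
            # group part: shrink each group's coefficient vector toward zero
            for g in groups:
                norm = np.linalg.norm(b[g])
                if norm > 0.0:
                    out[g] = max(0.0, 1.0 - lam2 / norm) * b[g]
            return out

        beta = np.array([0.9, -0.2, 0.05, 1.5, -1.1])
        groups = [np.array([0, 1, 2]), np.array([3, 4])]  # two hypothetical gene groups
        print(prox_sparse_group_lasso(beta, groups, lam1=0.1, lam2=0.3))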

  14. Systematic evaluation of atmospheric chemistry-transport model CHIMERE

    Science.gov (United States)

    Khvorostyanov, Dmitry; Menut, Laurent; Mailler, Sylvain; Siour, Guillaume; Couvidat, Florian; Bessagnet, Bertrand; Turquety, Solene

    2017-04-01

    Regional-scale atmospheric chemistry-transport models (CTM) are used to develop air quality regulatory measures, to support environmentally sensitive decisions in industry, and to address a variety of scientific questions involving atmospheric composition. Model performance evaluation against measurement data is critical to understand the models' limits and the degree of confidence in their results. The CHIMERE CTM (http://www.lmd.polytechnique.fr/chimere/) is a French national tool for operational forecasting and decision support and is widely used in the international research community in various areas of atmospheric chemistry and physics, climate, and environment (http://www.lmd.polytechnique.fr/chimere/CW-articles.php). This work presents the model evaluation framework applied systematically to new CHIMERE CTM versions in the course of continuous model development. The framework uses three of the four CTM evaluation types identified by the Environmental Protection Agency (EPA) and the American Meteorological Society (AMS): operational, diagnostic, and dynamic. It makes it possible to compare the overall model performance of subsequent model versions (operational evaluation), to identify specific processes and/or model inputs that could be improved (diagnostic evaluation), and to test the model sensitivity to changes in air quality, such as emission reductions and meteorological events (dynamic evaluation). The observation datasets currently used for the evaluation are EMEP (surface concentrations), AERONET (optical depths), and WOUDC (ozone sounding profiles). The framework is implemented as an automated processing chain and allows interactive exploration of the results via a web interface.

  15. Oil palm mapping for Malaysia using PALSAR-2 dataset

    Science.gov (United States)

    Gong, P.; Qi, C. Y.; Yu, L.; Cracknell, A.

    2016-12-01

    Oil palm is one of the most productive vegetable oil crops in the world. The main oil palm producing areas are distributed in humid tropical areas such as Malaysia, Indonesia, Thailand, western and central Africa, northern South America, and Central America. Increasing market demand, high yields and low production costs of palm oil are the primary factors driving large-scale commercial cultivation of oil palm, especially in Malaysia and Indonesia. Global demand for palm oil has grown exponentially during the last 50 years, and the expansion of oil palm plantations is linked directly to the deforestation of natural forests. Satellite remote sensing plays an important role in monitoring this expansion. However, optical remote sensing images are difficult to acquire in the Tropics because of the frequent occurrence of thick cloud cover. This problem has led to the use of data obtained by synthetic aperture radar (SAR), a sensor capable of all-day/all-weather observation, for studies in the Tropics. In this study, the ALOS-2 (Advanced Land Observing Satellite) PALSAR-2 (Phased Array type L-band SAR) datasets for the year 2015 were used as input to a support vector machine (SVM) based machine learning algorithm. Oil palm/non-oil palm samples were collected using a hexagonal equal-area sampling design. High-resolution images in Google Earth and PALSAR-2 imagery were used in human photo-interpretation to separate oil palm from other cover types (i.e. cropland, forest, grassland, shrubland, water, hard surface and bareland). Using this sample set, the characteristics of oil palm, including PALSAR-2 backscattering coefficients (HH, HV), terrain and climate, were further explored to post-process the SVM output. The average accuracy for the oil palm class is better than 80% in the final oil palm map for Malaysia.
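
    A minimal version of the classification step, an RBF-kernel SVM on (HH, HV) backscatter features, might look as follows; the feature values, labelling rule and hyperparameters are synthetic stand-ins, not the study's training data or settings:

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        # stand-ins for per-sample PALSAR-2 backscatter features (HH, HV), in dB
        X = rng.normal(loc=[[-8.0, -14.0]], scale=2.0, size=(200, 2))
        y = (X[:, 0] - X[:, 1] > 6.0).astype(int)  # 1 = oil palm (synthetic rule)

        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
        clf.fit(X[:150], y[:150])
        print("held-out accuracy:", clf.score(X[150:], y[150:]))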

  16. Automatic aortic root segmentation in CTA whole-body dataset

    Science.gov (United States)

    Gao, Xinpei; Kitslaar, Pieter H.; Scholte, Arthur J. H. A.; Lelieveldt, Boudewijn P. F.; Dijkstra, Jouke; Reiber, Johan H. C.

    2016-03-01

    Trans-catheter aortic valve replacement (TAVR) is an evolving technique for patients with severe aortic stenosis. Typically, in this application a CTA dataset is obtained of the patient's arterial system from the subclavian artery to the femoral arteries, to evaluate the quality of the vascular access route and to analyze the aortic root to determine if and which prosthesis should be used. In this paper, we concentrate on the automated segmentation of the aortic root. The purpose of this study was to automatically segment the aortic root in computed tomography angiography (CTA) datasets to support TAVR procedures. The method includes four major steps. First, the patient's cardiac CTA image was resampled to reduce the computation time. Next, the cardiac CTA image was segmented using an atlas-based approach; the most similar atlas was selected from a total of 8 atlases based on its image similarity to the input CTA image. Third, the aortic root segmentation from the previous step was transferred to the patient's whole-body CTA image by affine registration and refined in the fourth step using a deformable subdivision surface model fitting procedure based on image intensity. The pipeline was applied to 20 patients. The ground truth was created by an analyst who semi-automatically corrected the contours of the automatic method, where necessary. The average Dice similarity index between the segmentations of the automatic method and the ground truth was found to be 0.965±0.024. In conclusion, the current results are very promising.
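
    The reported agreement measure, the Dice similarity index, is straightforward to compute from two binary masks; a small sketch with synthetic masks in place of real segmentations:

        import numpy as np

        def dice_index(a, b):
            # Dice similarity between two binary segmentation masks
            a, b = a.astype(bool), b.astype(bool)
            return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

        auto = np.zeros((64, 64), dtype=bool); auto[16:48, 16:48] = True
        truth = np.zeros((64, 64), dtype=bool); truth[18:48, 16:46] = True
        print(f"Dice = {dice_index(auto, truth):.3f}")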

  17. Multivariate Spatial Data Fusion for Very Large Remote Sensing Datasets

    Directory of Open Access Journals (Sweden)

    Hai Nguyen

    2017-02-01

    Full Text Available Global maps of total-column carbon dioxide (CO2) mole fraction (in units of parts per million) are important tools for climate research since they provide insights into the spatial distribution of carbon intake and emissions as well as their seasonal and annual evolutions. Currently, the two main remote sensing instruments for total-column CO2 are the Orbiting Carbon Observatory-2 (OCO-2) and the Greenhouse gases Observing SATellite (GOSAT), both of which produce estimates of CO2 concentration, called profiles, at 20 different pressure levels. Operationally, each profile estimate is then convolved into a single estimate of column-averaged CO2 using a linear pressure weighting function. This total-column CO2 is then used for subsequent analyses such as Level 3 map generation and colocation for validation. In principle, total-column CO2 in these applications may be more efficiently estimated by making optimal estimates of the vector-valued CO2 profiles and applying the pressure weighting function afterwards. These estimates will be more efficient if there is multivariate dependence between CO2 values in the profile. In this article, we describe a methodology that uses a modified Spatial Random Effects model to account for the multivariate nature of the data fusion of OCO-2 and GOSAT. We show that multivariate fusion of the profiles has improved mean squared error relative to scalar fusion of the column-averaged CO2 values from OCO-2 and GOSAT. The computations scale linearly with the number of data points, making the method suitable for typically massive remote sensing datasets. Furthermore, the methodology properly accounts for differences in instrument footprint, measurement-error characteristics, and data coverage.
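
    The pressure-weighted column average, and the reason profile-level dependence matters for its uncertainty, can be sketched as follows; the weights and covariance below are illustrative stand-ins, not the operational values:

        import numpy as np

        levels = 20
        h = np.full(levels, 1.0 / levels)          # stand-in pressure weights, sum to 1
        x_hat = np.linspace(395.0, 410.0, levels)  # fused CO2 profile estimate, ppm
        # illustrative profile covariance with inter-level correlation
        lag = np.abs(np.subtract.outer(np.arange(levels), np.arange(levels)))
        Sigma = 4.0 * np.exp(-lag / 5.0)

        xco2 = h @ x_hat     # column-averaged CO2, ppm
        var = h @ Sigma @ h  # its variance; the off-diagonal terms are exactly the
                             # multivariate dependence the article exploits
        print(xco2, var)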

  18. Appraising city-scale pollution monitoring capabilities of multi-satellite datasets using portable pollutant monitors

    Science.gov (United States)

    Aliyu, Yahaya A.; Botai, Joel O.

    2018-04-01

    The retrieval characteristics of a city-scale satellite experiment were explored over a Nigerian city. The study evaluated carbon monoxide and aerosol contents in the city atmosphere. We utilized the MSA Altair 5× gas detector and CW-HAT200 particulate counter to investigate the city-scale monitoring capabilities of satellite pollution observing instruments: the atmospheric infrared sounder (AIRS), measurement of pollution in the troposphere (MOPITT), moderate resolution imaging spectroradiometer (MODIS), multi-angle imaging spectroradiometer (MISR) and ozone monitoring instrument (OMI). To achieve this, we employed the kriging interpolation technique to collocate the satellite pollutant estimations over 19 ground sample sites for the period 2015-2016. The portable pollutant devices were validated using the WHO air filter sampling model. To determine the city-scale performance of the satellite datasets, four performance indicators were adopted: correlation coefficient, model efficiency, reliability index and root mean square error. The comparative analysis revealed that MOPITT carbon monoxide (CO) and MODIS aerosol optical depth (AOD) estimates are the appropriate satellite measurements for ground equivalents in Zaria, Nigeria. Our findings were within the acceptable limits of similar studies that utilized reference stations. In conclusion, this study offers direction to Nigeria's air quality policymakers regarding available alternative air pollution measurements for mitigating air quality effects within its limited-resource environment.
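
    Three of the four performance indicators are standard and easy to reproduce; reading "model efficiency" as the Nash-Sutcliffe efficiency is an assumption on our part, and the values below are made-up stand-ins:

        import numpy as np

        def agreement_scores(sat, ground):
            sat, ground = np.asarray(sat, float), np.asarray(ground, float)
            r = np.corrcoef(sat, ground)[0, 1]            # correlation coefficient
            rmse = np.sqrt(np.mean((sat - ground) ** 2))  # root mean square error
            nse = 1.0 - np.sum((sat - ground) ** 2) / np.sum(
                (ground - ground.mean()) ** 2)            # model efficiency (assumed NSE)
            return r, rmse, nse

        ground = np.array([1.2, 0.8, 1.5, 2.1, 1.0])  # made-up ground CO values, ppm
        sat = np.array([1.0, 0.9, 1.4, 1.8, 1.1])     # made-up collocated satellite CO
        print(agreement_scores(sat, ground))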

  19. New systematic review methodology for visual impairment and blindness for the 2010 Global Burden of Disease study.

    Science.gov (United States)

    Bourne, Rupert; Price, Holly; Taylor, Hugh; Leasher, Janet; Keeffe, Jill; Glanville, Julie; Sieving, Pamela C; Khairallah, Moncef; Wong, Tien Yin; Zheng, Yingfeng; Mathew, Anu; Katiyar, Suchitra; Mascarenhas, Maya; Stevens, Gretchen A; Resnikoff, Serge; Gichuhi, Stephen; Naidoo, Kovin; Wallace, Diane; Kymes, Steven; Peters, Colleen; Pesudovs, Konrad; Braithwaite, Tasanee; Limburg, Hans

    2013-01-01

    To describe a systematic review of population-based prevalence studies of visual impairment (VI) and blindness worldwide over the past 32 years that informs the Global Burden of Diseases, Injuries and Risk Factors Study. A systematic review (Stage 1) of medical literature from 1 January 1980 to 31 January 2012 identified indexed articles containing data on incidence, prevalence and causes of blindness and VI. Only cross-sectional population-based representative studies were selected, from which data were extracted into a database of age- and sex-specific prevalence for four categories of distance vision loss and one of near vision loss (presenting and best-corrected). Unpublished data and data from studies using rapid assessment methodology were later added (Stage 2). Stage 1 identified 14,908 references, of which 204 articles met the inclusion criteria. Stage 2 added unpublished data from 44 rapid assessment studies and four other surveys. This resulted in a final dataset of 252 articles from 243 studies, of which 238 (98%) reported distance vision loss categories. A total of 37 studies in the final dataset reported prevalence of mild VI and four reported near VI. We report a comprehensive systematic review of over 30 years of VI/blindness studies. While there has been an increase in population-based studies conducted in the 2000s compared to previous decades, there is limited information from certain regions (eg, Central Africa, Central and Eastern Europe, and the Caribbean and Latin America) and younger age groups, and minimal data regarding prevalence of near vision loss and mild distance VI.

  20. External validation of a publicly available computer assisted diagnostic tool for mammographic mass lesions with two high prevalence research datasets

    Science.gov (United States)

    Benndorf, Matthias; Burnside, Elizabeth S.; Herda, Christoph; Langer, Mathias; Kotter, Elmar

    2015-01-01

    DDSM data is 0.876/0.895 (MLO/CC view) and AUC for the MMassDx (inclusive) model in the DDSM data is 0.891/0.900 (MLO/CC view). AUC for the MMassDx (descriptor) model in the MM data is 0.862 and AUC for the MMassDx (inclusive) model in the MM data is 0.900. In all scenarios, MMassDx performs significantly better than clinical performance, P < 0.05 each. The authors furthermore demonstrate that the MMassDx algorithm systematically underestimates the risk of malignancy in the DDSM and MM datasets, especially when low probabilities of malignancy are assigned. Conclusions: The authors’ results reveal that the MMassDx algorithms have good discriminatory performance but less accurate calibration when tested on two independent validation datasets. Improvement in calibration and testing in a prospective clinical population will be important steps in the pursuit of translation of these algorithms to the clinic. PMID:26233224
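
    The distinction the record draws, good discrimination (AUC) but weaker calibration, can be illustrated with synthetic probabilities; this is not the MMassDx model or its data:

        import numpy as np
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(1)
        y = rng.integers(0, 2, 500)                                    # 1 = malignant (synthetic)
        p = np.clip(0.35 * y + rng.uniform(0.0, 0.65, 500), 0.0, 1.0)  # model probabilities

        print("AUC (discrimination):", round(roc_auc_score(y, p), 3))

        # calibration: compare mean predicted risk with observed rate per bin;
        # a systematic underestimate shows up as observed > predicted
        bins = np.linspace(0.0, 1.0, 6)
        idx = np.minimum(np.digitize(p, bins) - 1, len(bins) - 2)
        for b in range(len(bins) - 1):
            mask = idx == b
            if mask.any():
                print(f"bin {b}: predicted {p[mask].mean():.2f}, observed {y[mask].mean():.2f}")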