WorldWideScience

Sample records for systematics limited dataset

  1. Resolution testing and limitations of geodetic and tsunami datasets for finite fault inversions along subduction zones

    Science.gov (United States)

    Williamson, A.; Newman, A. V.

    2017-12-01

    Finite fault inversions utilizing multiple datasets have become commonplace for large earthquakes, depending on data availability. The mixture of geodetic datasets such as Global Navigation Satellite Systems (GNSS) and InSAR, seismic waveforms, and, when applicable, tsunami waveforms from Deep-ocean Assessment and Reporting of Tsunamis (DART) gauges provides slightly different observations that, when incorporated together, lead to a more robust model of the fault slip distribution. The merging of different datasets is of particular importance along subduction zones, where direct observations of seafloor deformation over the rupture area are extremely limited. Instead, instrumentation measures related ground motion from tens to hundreds of kilometers away. The distance from the event and the dataset type can lead to a variable degree of resolution, affecting the ability to accurately model the spatial distribution of slip. This study analyzes the spatial resolution attained from the geodetic and tsunami datasets individually as well as combined. We constrain the importance of distance between estimated parameters and observed data and how it varies between land-based and open-ocean datasets. The analysis focuses on accurately scaled subduction zone synthetic models as well as on the relationship between slip and data in recent large subduction zone earthquakes. This study shows that datasets sensitive to seafloor deformation, like open-ocean tsunami waveforms or seafloor geodetic instrumentation, can provide unique offshore resolution for understanding most large and particularly tsunamigenic megathrust earthquake activity. In most environments, we simply lack the capability to resolve static displacements using land-based geodetic observations.
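
    As a toy illustration of the resolution analysis described above, the sketch below runs a checkerboard-style recovery test for a damped least-squares slip inversion. The Green's-function matrix here is random and purely illustrative (a real study would compute it from an elastic dislocation model for each station or gauge), and the fault grid, observation count, noise level, and damping factor are invented.

```python
import numpy as np

# Minimal checkerboard-style resolution test for a linear slip inversion
# d = G @ s: Green's functions G map fault-patch slip s to observations d.
rng = np.random.default_rng(0)
n_patches, n_obs = 100, 40          # 10x10 fault grid, 40 observations
G = rng.normal(size=(n_obs, n_patches))   # illustrative stand-in only

# Synthetic "checkerboard" slip: alternating patches of 0 and 1 m
s_true = (np.indices((10, 10)).sum(axis=0) % 2).ravel().astype(float)

d = G @ s_true + rng.normal(scale=0.05, size=n_obs)   # noisy synthetic data

# Damped least-squares recovery; the damping factor lam is a tuning choice
lam = 1.0
s_hat = np.linalg.solve(G.T @ G + lam * np.eye(n_patches), G.T @ d)

# Patches where the checkerboard is recovered are "resolved" by this geometry
print("recovery correlation:", np.corrcoef(s_true, s_hat)[0, 1])
```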

  2. Handling limited datasets with neural networks in medical applications: A small-data approach.

    Science.gov (United States)

    Shaikhina, Torgyn; Khovanova, Natalia A

    2017-01-01

    Single-centre studies in the medical domain are often characterised by limited samples due to the complexity and high costs of patient data collection. Machine learning methods for regression modelling of small datasets (less than 10 observations per predictor variable) remain scarce. Our work bridges this gap by developing a novel framework for the application of artificial neural networks (NNs) to regression tasks involving small medical datasets. In order to address the sporadic fluctuations and validation issues that appear in regression NNs trained on small datasets, the methods of multiple runs and surrogate data analysis were proposed in this work. The approach was compared to state-of-the-art ensemble NNs; the effect of dataset size on NN performance was also investigated. The proposed framework was applied to the prediction of the compressive strength (CS) of femoral trabecular bone in patients suffering from severe osteoarthritis. The NN model was able to estimate the CS of osteoarthritic trabecular bone from its structural and biological properties with a standard error of 0.85 MPa. When evaluated on independent test samples, the NN achieved an accuracy of 98.3%, outperforming an ensemble NN model by 11%. We reproduce this result on CS data of another porous solid (concrete) and demonstrate that the proposed framework allows an NN modelled with as few as 56 samples to generalise to 300 independent test samples with 86.5% accuracy, which is comparable to the performance of an NN developed with an 18 times larger dataset (1030 samples). The significance of this work is two-fold: the practical application allows for non-destructive prediction of bone fracture risk, while the novel methodology extends beyond the task considered in this study and provides a general framework for the application of regression NNs to medical problems characterised by limited dataset sizes. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
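
    The multiple-runs idea can be sketched in a few lines: train many small networks that differ only in their random initialisation and aggregate their predictions, so conclusions drawn from a tiny dataset do not hinge on one lucky or unlucky seed. The sketch below uses scikit-learn on synthetic data; the architecture, number of runs, and data are illustrative assumptions, and the paper's surrogate data analysis is not reproduced.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Illustrative only: synthetic stand-in for a small medical dataset
# (~56 samples, echoing the concrete example in the abstract).
rng = np.random.default_rng(1)
X = rng.normal(size=(56, 4))
y = X @ np.array([1.5, -2.0, 0.5, 1.0]) + rng.normal(scale=0.3, size=56)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# "Multiple runs": train many NNs that differ only in random initialisation,
# then aggregate, so the result does not depend on a single seed.
preds = []
for seed in range(30):
    nn = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=seed)
    nn.fit(X_tr, y_tr)
    preds.append(nn.predict(X_te))

pred_mean = np.mean(preds, axis=0)
rmse = np.sqrt(np.mean((pred_mean - y_te) ** 2))
print(f"ensemble-of-runs test RMSE: {rmse:.3f}")
```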

  3. Improved statistical models for limited datasets in uncertainty quantification using stochastic collocation

    Energy Technology Data Exchange (ETDEWEB)

    Alwan, Aravind; Aluru, N.R.

    2013-12-15

    This paper presents a data-driven framework for performing uncertainty quantification (UQ) by choosing a stochastic model that accurately describes the sources of uncertainty in a system. This model is propagated through an appropriate response surface function that approximates the behavior of this system using stochastic collocation. Given a sample of data describing the uncertainty in the inputs, our goal is to estimate a probability density function (PDF) using the kernel moment matching (KMM) method so that this PDF can be used to accurately reproduce statistics like mean and variance of the response surface function. Instead of constraining the PDF to be optimal for a particular response function, we show that we can use the properties of stochastic collocation to make the estimated PDF optimal for a wide variety of response functions. We contrast this method with other traditional procedures that rely on the Maximum Likelihood approach, like kernel density estimation (KDE) and its adaptive modification (AKDE). We argue that this modified KMM method tries to preserve what is known from the given data and is the better approach when the available data is limited in quantity. We test the performance of these methods for both univariate and multivariate density estimation by sampling random datasets from known PDFs and then measuring the accuracy of the estimated PDFs, using the known PDF as a reference. Comparing the output mean and variance estimated with the empirical moments using the raw data sample as well as the actual moments using the known PDF, we show that the KMM method performs better than KDE and AKDE in predicting these moments with greater accuracy. This improvement in accuracy is also demonstrated for the case of UQ in electrostatic and electrothermomechanical microactuators. We show how our framework results in the accurate computation of statistics in micromechanical systems.
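
    The comparison at the heart of this abstract, judging an input-density estimate by how well it reproduces output statistics of a response function, can be illustrated as follows. A simple moment-matched normal distribution stands in for the paper's kernel moment matching estimator (which is more elaborate) and is contrasted with a Gaussian KDE; the sample, response function, and integration grid are invented.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(2)
sample = rng.gamma(shape=3.0, scale=1.0, size=25)      # limited input data
g = lambda x: np.sin(x) + 0.1 * x**2                   # toy response surface

x = np.linspace(0, 15, 2000)
pdf_kde = gaussian_kde(sample)(x)                      # ML-style estimate
pdf_mm = norm.pdf(x, loc=sample.mean(), scale=sample.std(ddof=1))

for name, pdf in [("KDE", pdf_kde), ("moment-matched", pdf_mm)]:
    pdf = pdf / np.trapz(pdf, x)                       # normalise on the grid
    mean_g = np.trapz(g(x) * pdf, x)
    var_g = np.trapz(g(x) ** 2 * pdf, x) - mean_g**2
    print(f"{name:>15}: E[g]={mean_g:.3f}, Var[g]={var_g:.3f}")

# Reference values from the known gamma PDF used to draw the sample
big = rng.gamma(3.0, 1.0, size=200_000)
print(f"{'true':>15}: E[g]={g(big).mean():.3f}, Var[g]={g(big).var():.3f}")
```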

  4. Improved statistical models for limited datasets in uncertainty quantification using stochastic collocation

    International Nuclear Information System (INIS)

    Alwan, Aravind; Aluru, N.R.

    2013-01-01

    This paper presents a data-driven framework for performing uncertainty quantification (UQ) by choosing a stochastic model that accurately describes the sources of uncertainty in a system. This model is propagated through an appropriate response surface function that approximates the behavior of this system using stochastic collocation. Given a sample of data describing the uncertainty in the inputs, our goal is to estimate a probability density function (PDF) using the kernel moment matching (KMM) method so that this PDF can be used to accurately reproduce statistics like mean and variance of the response surface function. Instead of constraining the PDF to be optimal for a particular response function, we show that we can use the properties of stochastic collocation to make the estimated PDF optimal for a wide variety of response functions. We contrast this method with other traditional procedures that rely on the Maximum Likelihood approach, like kernel density estimation (KDE) and its adaptive modification (AKDE). We argue that this modified KMM method tries to preserve what is known from the given data and is the better approach when the available data is limited in quantity. We test the performance of these methods for both univariate and multivariate density estimation by sampling random datasets from known PDFs and then measuring the accuracy of the estimated PDFs, using the known PDF as a reference. Comparing the output mean and variance estimated with the empirical moments using the raw data sample as well as the actual moments using the known PDF, we show that the KMM method performs better than KDE and AKDE in predicting these moments with greater accuracy. This improvement in accuracy is also demonstrated for the case of UQ in electrostatic and electrothermomechanical microactuators. We show how our framework results in the accurate computation of statistics in micromechanical systems.

  5. Combining global land cover datasets to quantify agricultural expansion into forests in Latin America: Limitations and challenges

    Science.gov (United States)

    Persson, U. Martin

    2017-01-01

    While we know that deforestation in the tropics is increasingly driven by commercial agriculture, most tropical countries still lack recent and spatially-explicit assessments of the relative importance of pasture and cropland expansion in causing forest loss. Here we present a spatially explicit quantification of the extent to which cultivated land and grassland expanded at the expense of forests across Latin America in 2001–2011, by combining two “state-of-the-art” global datasets (Global Forest Change forest loss and GlobeLand30-2010 land cover). We further evaluate some of the limitations and challenges in doing this. We find that this approach does capture some of the major patterns of land cover following deforestation, with GlobeLand30-2010’s Grassland class (which we interpret as pasture) being the most common land cover replacing forests across Latin America. However, our analysis also reveals some major limitations to combining these land cover datasets for quantifying pasture and cropland expansion into forest. First, a simple one-to-one translation of GlobeLand30-2010’s Cultivated land and Grassland classes into cropland and pasture, respectively, should not be made without caution, as GlobeLand30-2010 defines its Cultivated land to include some pastures. Comparisons with the TerraClass dataset over the Brazilian Amazon and with previous literature indicate that Cultivated land in GlobeLand30-2010 includes notable amounts of pasture and other vegetation (e.g. in Paraguay and the Brazilian Amazon). This further suggests that the approach taken here generally leads to an underestimation (of up to ~60%) of the role of pasture in replacing forest. Second, a large share (~33%) of the Global Forest Change forest loss is found to still be forest according to GlobeLand30-2010 and our analysis suggests that the accuracy of the combined datasets, especially for areas with heterogeneous land cover and/or small-scale forest loss, is still too poor for
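
    The core overlay operation, tabulating which land-cover class replaced forest in pixels flagged as loss, is simple to sketch. Below, tiny random arrays stand in for the co-registered Global Forest Change loss mask and the GlobeLand30-2010 class raster; the class codes and proportions are invented, so only the cross-tabulation pattern matters.

```python
import numpy as np

# Schematic cross-tabulation: for every pixel flagged as forest loss in one
# dataset, look up the follow-up class in a land-cover dataset.
rng = np.random.default_rng(3)
CLASSES = {10: "Cultivated land", 30: "Grassland", 20: "Forest", 80: "Other"}

forest_loss = rng.random((100, 100)) < 0.1                # boolean loss mask
landcover_2010 = rng.choice(list(CLASSES), size=(100, 100),
                            p=[0.25, 0.35, 0.30, 0.10])   # synthetic raster

codes, counts = np.unique(landcover_2010[forest_loss], return_counts=True)
total = counts.sum()
for code, n in zip(codes, counts):
    print(f"{CLASSES[code]:>15}: {100 * n / total:.1f}% of loss pixels")
# Loss pixels still mapped as "Forest" mimic the ~33% disagreement the
# abstract reports between the two products.
```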

  6. Combining global land cover datasets to quantify agricultural expansion into forests in Latin America: Limitations and challenges.

    Directory of Open Access Journals (Sweden)

    Florence Pendrill

    While we know that deforestation in the tropics is increasingly driven by commercial agriculture, most tropical countries still lack recent and spatially-explicit assessments of the relative importance of pasture and cropland expansion in causing forest loss. Here we present a spatially explicit quantification of the extent to which cultivated land and grassland expanded at the expense of forests across Latin America in 2001-2011, by combining two "state-of-the-art" global datasets (Global Forest Change forest loss and GlobeLand30-2010 land cover). We further evaluate some of the limitations and challenges in doing this. We find that this approach does capture some of the major patterns of land cover following deforestation, with GlobeLand30-2010's Grassland class (which we interpret as pasture) being the most common land cover replacing forests across Latin America. However, our analysis also reveals some major limitations to combining these land cover datasets for quantifying pasture and cropland expansion into forest. First, a simple one-to-one translation of GlobeLand30-2010's Cultivated land and Grassland classes into cropland and pasture, respectively, should not be made without caution, as GlobeLand30-2010 defines its Cultivated land to include some pastures. Comparisons with the TerraClass dataset over the Brazilian Amazon and with previous literature indicate that Cultivated land in GlobeLand30-2010 includes notable amounts of pasture and other vegetation (e.g. in Paraguay and the Brazilian Amazon). This further suggests that the approach taken here generally leads to an underestimation (of up to ~60%) of the role of pasture in replacing forest. Second, a large share (~33%) of the Global Forest Change forest loss is found to still be forest according to GlobeLand30-2010, and our analysis suggests that the accuracy of the combined datasets, especially for areas with heterogeneous land cover and/or small-scale forest loss, is still too

  7. A new climate dataset for systematic assessments of climate change impacts as a function of global warming

    Directory of Open Access Journals (Sweden)

    J. Heinke

    2013-10-01

    In the ongoing political debate on climate change, global mean temperature change (ΔTglob) has become the yardstick by which mitigation costs, impacts from unavoided climate change, and adaptation requirements are discussed. For a scientifically informed discourse along these lines, systematic assessments of climate change impacts as a function of ΔTglob are required. The current availability of climate change scenarios constrains this type of assessment to a narrow range of temperature change and/or a reduced ensemble of climate models. Here, a newly composed dataset of climate change scenarios is presented that addresses the specific requirements for global assessments of climate change impacts as a function of ΔTglob. A pattern-scaling approach is applied to extract generalised patterns of spatially explicit change in temperature, precipitation and cloudiness from 19 Atmosphere–Ocean General Circulation Models (AOGCMs). The patterns are combined with scenarios of global mean temperature increase obtained from the reduced-complexity climate model MAGICC6 to create climate scenarios covering warming levels from 1.5 to 5 degrees above pre-industrial levels around the year 2100. The patterns are shown to sufficiently maintain the original AOGCMs' climate change properties, even though they necessarily rely on a simplified relationship between ΔTglob and changes in local climate properties. The dataset (made available online upon final publication of this paper) facilitates systematic analyses of climate change impacts as it covers a wider and finer-spaced range of climate change scenarios than the original AOGCM simulations.
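
    Pattern scaling itself is essentially a one-line operation: a per-degree change pattern extracted from AOGCM output is multiplied by a global mean temperature change from a simple climate model. The sketch below uses an invented pattern and warming levels; deriving the pattern from real AOGCM fields (and handling precipitation or cloudiness) is not shown.

```python
import numpy as np

# Minimal pattern-scaling sketch: local change = pattern * global warming.
rng = np.random.default_rng(4)
pattern_T = 0.5 + rng.random((90, 180))    # deg C local change per deg C global
                                           # (invented 2-degree lat/lon grid)

for dT_glob in [1.5, 2.0, 3.0, 5.0]:       # warming levels above pre-industrial
    local_change = pattern_T * dT_glob     # spatially explicit scenario field
    print(f"dT_glob={dT_glob:.1f} C -> local warming "
          f"{local_change.min():.2f}..{local_change.max():.2f} C")
```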

  8. Systematically biological prioritizing remediation sites based on datasets of biological investigations and heavy metals in soil

    Science.gov (United States)

    Lin, Wei-Chih; Lin, Yu-Pin; Anthony, Johnathen

    2015-04-01

    Heavy metal pollution has adverse effects on not only the focal invertebrate species of this study, such as reduction in pupa weight and increased larval mortality, but also on the higher trophic level organisms which feed on them, either directly or indirectly, through the process of biomagnification. Despite this, few studies regarding remediation prioritization take species distribution or biological conservation priorities into consideration. This study develops a novel approach for delineating sites which are both contaminated by any of 5 readily bioaccumulated heavy metal soil contaminants and are of high ecological importance for the highly mobile, low trophic level focal species. The conservation priority of each site was based on the projected distributions of 6 moth species simulated via the presence-only maximum entropy species distribution model followed by the subsequent application of a systematic conservation tool. In order to increase the number of available samples, we also integrated crowd-sourced data with professionally-collected data via a novel optimization procedure based on a simulated annealing algorithm. This integration procedure was important since while crowd-sourced data can drastically increase the number of data samples available to ecologists, still the quality or reliability of crowd-sourced data can be called into question, adding yet another source of uncertainty in projecting species distributions. The optimization method screens crowd-sourced data in terms of the environmental variables which correspond to professionally-collected data. The sample distribution data was derived from two different sources, including the EnjoyMoths project in Taiwan (crowd-sourced data) and the Global Biodiversity Information Facility (GBIF) field data (professional data). The distributions of heavy metal concentrations were generated via 1000 iterations of a geostatistical co-simulation approach. The uncertainties in distributions of the heavy
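
    The data-screening step can be caricatured with a generic simulated-annealing loop: keep the subset of crowd-sourced records whose environmental covariates best match the professionally collected ones. This is only a guess at the spirit of the paper's optimization (its actual objective function is not given here), and all data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
pro = rng.normal(loc=0.0, size=(40, 3))            # professional records
crowd = rng.normal(loc=0.5, size=(300, 3))         # noisier crowd records

def energy(mask):
    """Mismatch between covariate means of the kept crowd subset and pros."""
    if not mask.any():
        return np.inf
    return np.linalg.norm(crowd[mask].mean(axis=0) - pro.mean(axis=0))

mask = rng.random(300) < 0.5                        # initial random subset
e, temp = energy(mask), 1.0
for step in range(5000):
    i = rng.integers(300)
    mask[i] = ~mask[i]                              # propose flipping one record
    e_new = energy(mask)
    if e_new < e or rng.random() < np.exp((e - e_new) / temp):
        e = e_new                                   # accept the move
    else:
        mask[i] = ~mask[i]                          # reject: flip back
    temp *= 0.999                                   # cool down

print(f"kept {mask.sum()} of 300 crowd records, covariate mismatch {e:.3f}")
```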

  9. Resolution and systematic limitations in beam based alignment

    Energy Technology Data Exchange (ETDEWEB)

    Tenenbaum, P.G.

    2000-03-15

    Beam based alignment of quadrupoles by variation of quadrupole strength is a widely-used technique in accelerators today. The authors describe the dominant systematic limitation of this technique, which arises from the change in the center position of the quadrupole as the strength is varied, and derive expressions for the resulting error. In addition, the authors derive an expression for the statistical resolution of such techniques in a periodic transport line, given knowledge of the line's transport matrices, the resolution of the beam position monitor system, and the details of the strength variation procedure. These results are applied to the Next Linear Collider main linear accelerator, an 11 kilometer accelerator containing 750 quadrupoles and 5,000 accelerator structures. The authors find that in principle a statistical resolution of 1 micron is easily achievable but the systematic error due to variation of the magnetic centers could be several times larger.
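
    A thin-lens toy model makes the method concrete: vary the quadrupole strength, record the downstream BPM response, and fit the slope, which is proportional to the beam's offset in the magnet. All numbers below (transfer element, strength steps, BPM noise) are invented; if the magnetic centre itself moves with strength, that motion is absorbed into the fitted offset, which is exactly the systematic limitation the abstract derives.

```python
import numpy as np

rng = np.random.default_rng(6)
R12 = 10.0            # m, transfer element from quad to BPM (assumed)
x_off = 150e-6        # m, true beam offset in the quad (to be recovered)
bpm_noise = 1e-6      # m, BPM resolution

dk = np.linspace(-0.2, 0.2, 9)               # integrated strength changes (1/m)
# Kick from a strength change dk at offset x_off: theta = -dk * x_off,
# giving a BPM shift y = R12 * theta (thin-lens approximation).
y = R12 * (-dk) * x_off + rng.normal(scale=bpm_noise, size=dk.size)

slope = np.polyfit(dk, y, 1)[0]              # fit y = slope * dk
print(f"fitted offset: {-slope / R12 * 1e6:.1f} um (true {x_off*1e6:.0f} um)")
```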

  10. Reliability analysis - systematic approach based on limited data

    International Nuclear Information System (INIS)

    Bourne, A.J.

    1975-11-01

    The initial approaches required for reliability analysis are outlined. These approaches highlight the system boundaries, examine the conditions under which the system is required to operate, and define the overall performance requirements. The discussion is illustrated by a simple example of an automatic protective system for a nuclear reactor. It is then shown how the initial approach leads to a method of defining the system, establishing performance parameters of interest and determining the general form of reliability models to be used. The overall system model and the availability of reliability data at the system level are next examined. An iterative process is then described whereby the reliability model and data requirements are systematically refined at progressively lower hierarchic levels of the system. At each stage, the approach is illustrated with examples from the protective system previously described. The main advantages of the approach put forward are the systematic process of analysis, the concentration of assessment effort in the critical areas and the maximum use of limited reliability data. (author)

  11. Neonatal Encephalopathy With Group B Streptococcal Disease Worldwide: Systematic Review, Investigator Group Datasets, and Meta-analysis.

    Science.gov (United States)

    Tann, Cally J; Martinello, Kathryn A; Sadoo, Samantha; Lawn, Joy E; Seale, Anna C; Vega-Poblete, Maira; Russell, Neal J; Baker, Carol J; Bartlett, Linda; Cutland, Clare; Gravett, Michael G; Ip, Margaret; Le Doare, Kirsty; Madhi, Shabir A; Rubens, Craig E; Saha, Samir K; Schrag, Stephanie; Sobanjo-Ter Meulen, Ajoke; Vekemans, Johan; Heath, Paul T

    2017-11-06

    Neonatal encephalopathy (NE) is a leading cause of child mortality and longer-term impairment. Infection can sensitize the newborn brain to injury; however, the role of group B streptococcal (GBS) disease has not been reviewed. This paper is the ninth in an 11-article series estimating the burden of GBS disease; here we aim to assess the proportion of GBS in NE cases. We conducted systematic literature reviews (PubMed/Medline, Embase, Latin American and Caribbean Health Sciences Literature [LILACS], World Health Organization Library Information System [WHOLIS], and Scopus) and sought unpublished data from investigator groups reporting GBS-associated NE. Meta-analyses estimated the proportion of GBS disease in NE and mortality risk. UK population-level data estimated the incidence of GBS-associated NE. Four published and 25 unpublished datasets were identified from 13 countries (N = 10436). The proportion of NE associated with GBS was 0.58% (95% confidence interval [CI], 0.18%-0.98%). Mortality was significantly increased in GBS-associated NE vs NE alone (risk ratio, 2.07 [95% CI, 1.47-2.91]). This equates to a UK incidence of GBS-associated NE of 0.019 per 1000 live births. The consistently increased proportion of GBS disease in NE and the significantly increased risk of mortality provide evidence that GBS infection contributes to NE. Increased information regarding this and other organisms is important to inform interventions, especially in low- and middle-resource contexts. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America.
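
    For the flavor of the pooled effect measure, the snippet below computes a risk ratio and its 95% confidence interval on the log scale. The 2x2 counts are hypothetical placeholders, not the study's data; only the formula is the point.

```python
import numpy as np

# Risk ratio with a 95% CI via the standard log-scale approximation.
a, n1 = 30, 100   # deaths / total, GBS-associated NE (hypothetical counts)
b, n2 = 15, 100   # deaths / total, NE alone (hypothetical counts)

rr = (a / n1) / (b / n2)
se_log_rr = np.sqrt(1/a - 1/n1 + 1/b - 1/n2)
lo, hi = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se_log_rr)
print(f"RR = {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```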

  12. Assessment of average of normals (AON) procedure for outlier-free datasets including qualitative values below limit of detection (LoD): an application within tumor markers such as CA 15-3, CA 125, and CA 19-9.

    Science.gov (United States)

    Usta, Murat; Aral, Hale; Mete Çilingirtürk, Ahmet; Kural, Alev; Topaç, Ibrahim; Semerci, Tuna; Hicri Köseoğlu, Mehmet

    2016-11-01

    Average of normals (AON) is a quality control procedure that is sensitive only to systematic errors that can occur in an analytical process in which patient test results are used. The aim of this study was to develop an alternative model in order to apply the AON quality control procedure to datasets that include qualitative values below limit of detection (LoD). The reported patient test results for tumor markers, such as CA 15-3, CA 125, and CA 19-9, analyzed by two instruments, were retrieved from the information system over a period of 5 months, using the calibrator and control materials with the same lot numbers. The median as a measure of central tendency and the median absolute deviation (MAD) as a measure of dispersion were used for the complementary model of AON quality control procedure. The u bias values, which were determined for the bias component of the measurement uncertainty, were partially linked to the percentages of the daily median values of the test results that fall within the control limits. The results for these tumor markers, in which lower limits of reference intervals are not medically important for clinical diagnosis and management, showed that the AON quality control procedure, using the MAD around the median, can be applied for datasets including qualitative values below LoD.
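
    A median/MAD control chart of daily patient-result medians, in the spirit of the abstract's replacement for the classical mean-based AON, can be sketched as follows. The simulated result distribution, the control multiplier k, and the 25% bias injected on the test day are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
baseline = rng.lognormal(mean=3.0, sigma=0.4, size=(150, 60))  # 150 days x 60 results
daily_median = np.median(baseline, axis=1)

center = np.median(daily_median)
mad = np.median(np.abs(daily_median - center))
k = 3.0
# 1.4826 rescales MAD to be consistent with a standard deviation for normal data
lower, upper = center - k * 1.4826 * mad, center + k * 1.4826 * mad

shifted_day = np.median(rng.lognormal(3.0, 0.4, size=60) * 1.25)  # +25% systematic bias
print(f"control limits: {lower:.2f}..{upper:.2f}; shifted-day median: {shifted_day:.2f}")
print("flag systematic error:", not lower <= shifted_day <= upper)
```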

  13. OA 2014-5 Dataset - Limited Entry and Open Access cost earnings survey collecting 2014-15 data

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This project collects economic data from vessel owners participating in the West Coast limited entry fixed gear and open access groundfish, salmon, crab, and shrimp...

  14. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2])...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples....

  15. Calculation of the detection limit in radiation measurements with systematic uncertainties

    International Nuclear Information System (INIS)

    Kirkpatrick, J.M.; Russ, W.; Venkataraman, R.; Young, B.M.

    2015-01-01

    The detection limit (L_D) or Minimum Detectable Activity (MDA) is an a priori evaluation of assay sensitivity intended to quantify the suitability of an instrument or measurement arrangement for the needs of a given application. Traditional approaches as pioneered by Currie rely on Gaussian approximations to yield simple, closed-form solutions, and neglect the effects of systematic uncertainties in the instrument calibration. These approximations are applicable over a wide range of applications, but are of limited use in low-count applications, when high confidence values are required, or when systematic uncertainties are significant. One proposed modification to the Currie formulation attempts to account for systematic uncertainties within a Gaussian framework. We have previously shown that this approach results in an approximation formula that works best only for small values of the relative systematic uncertainty, for which the modification of Currie's method is the least necessary, and that it significantly overestimates the detection limit or gives infinite or otherwise non-physical results for larger systematic uncertainties where such a correction would be the most useful. We have developed an alternative approach for calculating detection limits based on realistic statistical modeling of the counting distributions which accurately represents statistical and systematic uncertainties. Instead of a closed form solution, numerical and iterative methods are used to evaluate the result. Accurate detection limits can be obtained by this method for the general case.
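
    Currie's closed-form Gaussian result, and the kind of naive widening for a relative systematic uncertainty that the authors criticise, fit in a few lines. The formulas below are the traditional paired-blank approximations; the paper's numerical/iterative method is not reproduced, and the 10% systematic is an illustrative value.

```python
import numpy as np
from scipy import stats

alpha = beta = 0.05
k = stats.norm.ppf(1 - alpha)          # 1.645 for 5% error rates
B = 100.0                              # expected background counts (assumed)

L_C = k * np.sqrt(2 * B)               # critical level (decision threshold)
L_D = k**2 + 2 * L_C                   # Currie detection limit, in counts
print(f"L_C = {L_C:.1f}, L_D = {L_D:.1f} counts")

# Naive Gaussian widening for a relative systematic uncertainty u_sys on the
# calibration: blows up as k*u_sys -> 1, per the abstract's criticism.
u_sys = 0.10
L_D_sys = L_D / (1 - k * u_sys) if k * u_sys < 1 else float("inf")
print(f"L_D with {u_sys:.0%} systematic: {L_D_sys:.1f} counts")
```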

  16. Upper limit for Poisson variable incorporating systematic uncertainties by Bayesian approach

    International Nuclear Information System (INIS)

    Zhu, Yongsheng

    2007-01-01

    To calculate the upper limit for the Poisson observable at a given confidence level with inclusion of systematic uncertainties in the background expectation and signal efficiency, formulations have been established along the lines of the Bayesian approach. A FORTRAN program, BPULE, has been developed to implement the upper limit calculation.
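
    The same calculation is easy to do numerically in Python: marginalise the Poisson likelihood over a nuisance background with a Gaussian prior, take a flat prior on the non-negative signal, and read the limit off the posterior CDF. This is a generic sketch in the spirit of the abstract, not a port of BPULE; the counts, priors, and grids are illustrative.

```python
import numpy as np
from scipy import stats, integrate

n_obs, b0, sigma_b = 5, 3.0, 0.5       # observed counts; background prior
cl = 0.90                              # credibility level

s = np.linspace(0, 20, 2001)           # signal grid (flat prior, s >= 0)
b = np.linspace(max(0.0, b0 - 5 * sigma_b), b0 + 5 * sigma_b, 401)
S, B = np.meshgrid(s, b, indexing="ij")

# Poisson likelihood times Gaussian background prior, marginalised over b
like = stats.poisson.pmf(n_obs, S + B) * stats.norm.pdf(B, b0, sigma_b)
post = integrate.trapezoid(like, b, axis=1)       # integrate out b
post /= integrate.trapezoid(post, s)              # normalise posterior in s

cdf = integrate.cumulative_trapezoid(post, s, initial=0.0)
upper = s[np.searchsorted(cdf, cl)]
print(f"{cl:.0%} Bayesian upper limit on signal: {upper:.2f} counts")
```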

  17. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10]. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We...... been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples.

  18. Instruments used to assess functional limitations in workers applying for disability benefit : a systematic review

    NARCIS (Netherlands)

    Spanjer, Jerry; Groothoff, Johan W.; Brouwer, Sandra

    2011-01-01

    Purpose. To systematically review the quality of the psychometric properties of instruments for assessing functional limitations in workers applying for disability benefit. Method. Electronic searches of Medline, Embase, CINAHL and PsycINFO were performed to identify studies focusing on the

  19. Registration factors that limit international mobility of people holding physiotherapy qualifications: A systematic review.

    Science.gov (United States)

    Foo, Jonathan S; Storr, Michael; Maloney, Stephen

    2016-06-01

    There is no enforced international standardisation of the physiotherapy profession. Thus, registration is used in many countries to maintain standards of care and to protect the public. However, registration may also limit international workforce mobility. What is known about the professional registration factors that may limit the international mobility of people holding physiotherapy qualifications? Systematic review using an electronic database search and hand searching of the World Confederation for Physical Therapy and International Network of Physiotherapy Regulatory Authorities websites. Analysis was conducted using thematic analysis. Ten articles and eight websites were included from the search strategy. Data are representative of high-income English-speaking countries. Four themes emerged regarding limitations to professional mobility: practice context, qualification recognition, verification of fitness to practice, and incidental limitations arising from the registration process. Professional mobility is limited by differences in physiotherapy education programmes, resulting in varying standards of competency. Thus, it is often necessary to verify clinical competencies through assessments, as well as determining professional attributes and ability to apply competencies in a different practice context, as part of the registration process. There has been little evaluation of registration practices, and at present, there is a need to re-evaluate current registration processes to ensure they are efficient and effective, thereby enhancing workforce mobility. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. Assessment of the effects and limitations of the 1998 to 2008 Abbreviated Injury Scale map using a large population-based dataset

    Directory of Open Access Journals (Sweden)

    Franklyn Melanie

    2011-01-01

    Background: Trauma systems should consistently monitor a given trauma population over a period of time. The Abbreviated Injury Scale (AIS) and derived scores such as the Injury Severity Score (ISS) are commonly used to quantify injury severities in trauma registries. To reflect contemporary trauma management and treatment, the most recent version of the AIS (AIS08) contains many codes which differ in severity from their equivalents in the earlier 1998 version (AIS98). Consequently, the adoption of AIS08 may impede comparisons between data coded using different AIS versions. It may also affect the number of patients classified as major trauma. Methods: The entire AIS98-coded injury dataset of a large population-based trauma registry was retrieved and mapped to AIS08 using the currently available AIS98-AIS08 dictionary map. The percentage of codes which had increased or decreased in severity, or could not be mapped, was examined in conjunction with the effect of these changes on the calculated ISS. The potential for free-text information accompanying AIS coding to improve the quality of AIS mapping was explored. Results: A total of 128280 AIS98-coded injuries were evaluated in 32134 patients, 15471 of whom were classified as major trauma. Although only 4.5% of dictionary codes decreased in severity from AIS98 to AIS08, this represented almost 13% of injuries in the registry. In 4.9% of patients, no injuries could be mapped. ISS was potentially unreliable in one-third of patients, as they had at least one AIS98 code which could not be mapped. Using AIS08, the number of patients classified as major trauma decreased by between 17.3% and 30.3%. Evaluation of free-text descriptions for some injuries demonstrated the potential to improve mapping between AIS versions. Conclusions: Converting AIS98-coded data to AIS08 results in a significant decrease in the number of patients classified as major trauma. Many AIS98 codes are missing from the
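
    Since the headline statistic is driven by how AIS severities feed the ISS, a minimal ISS calculator shows the mechanics: the ISS is the sum of squares of the highest AIS severity in each of the three most severely injured body regions (75 if any injury is AIS 6). The patient below is invented; remapping its codes from AIS98 to AIS08 severities (the paper's subject) would change the inputs and hence the score.

```python
def iss(injuries):
    """Injury Severity Score from (body_region, ais_severity) tuples."""
    if any(sev == 6 for _, sev in injuries):
        return 75                              # AIS 6 anywhere: ISS = 75
    worst = {}                                 # highest severity per region
    for region, sev in injuries:
        worst[region] = max(worst.get(region, 0), sev)
    top3 = sorted(worst.values(), reverse=True)[:3]
    return sum(sev ** 2 for sev in top3)

patient = [("head", 4), ("chest", 3), ("chest", 2), ("abdomen", 2)]
print("ISS =", iss(patient))                   # 4^2 + 3^2 + 2^2 = 29
```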

  1. Limited-sampling strategies for anti-infective agents: systematic review.

    Science.gov (United States)

    Sprague, Denise A; Ensom, Mary H H

    2009-09-01

    Area under the concentration-time curve (AUC) is a pharmacokinetic parameter that represents overall exposure to a drug. For selected anti-infective agents, pharmacokinetic-pharmacodynamic parameters, such as AUC/MIC (where MIC is the minimal inhibitory concentration), have been correlated with outcome in a few studies. A limited-sampling strategy may be used to estimate pharmacokinetic parameters such as AUC, without the frequent, costly, and inconvenient blood sampling that would be required to directly calculate the AUC. To discuss, by means of a systematic review, the strengths, limitations, and clinical implications of published studies involving a limited-sampling strategy for anti-infective agents and to propose improvements in methodology for future studies. The PubMed and EMBASE databases were searched using the terms "anti-infective agents", "limited sampling", "optimal sampling", "sparse sampling", "AUC monitoring", "abbreviated AUC", "abbreviated sampling", and "Bayesian". The reference lists of retrieved articles were searched manually. Included studies were classified according to modified criteria from the US Preventive Services Task Force. Twenty studies met the inclusion criteria. Six of the studies (involving didanosine, zidovudine, nevirapine, ciprofloxacin, efavirenz, and nelfinavir) were classified as providing level I evidence, 4 studies (involving vancomycin, didanosine, lamivudine, and lopinavir-ritonavir) provided level II-1 evidence, 2 studies (involving saquinavir and ceftazidime) provided level II-2 evidence, and 8 studies (involving ciprofloxacin, nelfinavir, vancomycin, ceftazidime, ganciclovir, pyrazinamide, meropenem, and alpha interferon) provided level III evidence. All of the studies providing level I evidence used prospectively collected data and proper validation procedures with separate, randomly selected index and validation groups. However, most of the included studies did not provide an adequate description of the methods or
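
    The idea of a limited-sampling strategy can be shown end-to-end on simulated pharmacokinetics: compute reference AUCs from dense sampling in an index group, regress them on two conveniently timed concentrations, and reuse the fitted equation for new patients. The one-compartment model, sampling times, and population parameters are invented, and the separate validation group the review calls for is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(8)
t_rich = np.linspace(0.25, 12, 48)             # dense sampling grid (h)

def conc(t, dose, V, k):                       # 1-compartment IV bolus model
    return dose / V * np.exp(-k * t)

# Simulate an index group of patients with variable PK parameters
V = rng.normal(30, 5, size=40)                 # volume of distribution (L)
k = rng.normal(0.25, 0.05, size=40)            # elimination rate (1/h)
C = conc(t_rich[None, :], 500.0, V[:, None], k[:, None])
auc_true = np.trapz(C, t_rich, axis=1)         # reference AUC (mg*h/L)

# Limited sampling: concentrations at t = 1 h and t = 6 h only
X = np.column_stack([np.ones(40),
                     conc(1.0, 500.0, V, k),
                     conc(6.0, 500.0, V, k)])
coef, *_ = np.linalg.lstsq(X, auc_true, rcond=None)
pred = X @ coef
r2 = 1 - np.sum((auc_true - pred)**2) / np.sum((auc_true - auc_true.mean())**2)
print(f"AUC ~ {coef[0]:.1f} + {coef[1]:.2f}*C1h + {coef[2]:.2f}*C6h, R^2 = {r2:.3f}")
```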

  2. Palliative Oncologic Care Curricula for Providers in Resource-Limited and Underserved Communities: a Systematic Review.

    Science.gov (United States)

    Xu, Melody J; Su, David; Deboer, Rebecca; Garcia, Michael; Tahir, Peggy; Anderson, Wendy; Kinderman, Anne; Braunstein, Steve; Sherertz, Tracy

    2017-12-20

    Familiarity with principles of palliative care, supportive care, and palliative oncological treatment is essential for providers caring for cancer patients, though this may be challenging in global communities where resources are limited. Herein, we describe the scope of literature on palliative oncological care curricula for providers in resource-limited settings. A systematic literature review was conducted using PubMed, Embase, Cochrane Library, Web of Science, Cumulative Index to Nursing and Allied Health Literature, Med Ed Portal databases, and gray literature. All available prospective cohort studies, case reports, and narratives published up to July 2017 were eligible for review. Fourteen articles were identified and referenced palliative care education programs in Argentina, Uganda, Kenya, Australia, Germany, the USA, or multiple countries. The most common teaching strategy was lecture-based, followed by mentorship and experiential learning involving role play and simulation. Education topics included core principles of palliative care, pain and symptom management, and communication skills. Two programs included additional topics specific to the underserved or American Indian/Alaskan Native community. Only one program discussed supportive cancer care, and no program reported educational content on resource-stratified decision-making for palliative oncological treatment. Five programs reported positive participant satisfaction, and three programs described objective metrics of increased educational or research activity. There is scant literature on effective curricula for providers treating cancer patients in resource-limited settings. Emphasizing supportive cancer care and palliative oncologic treatments may help address gaps in education; increased outcome reporting may help define the impact of palliative care curriculum within resource-limited communities.

  3. A systematic review of portable electronic technology for health education in resource-limited settings.

    Science.gov (United States)

    McHenry, Megan S; Fischer, Lydia J; Chun, Yeona; Vreeman, Rachel C

    2017-08-01

    The objective of this study is to conduct a systematic review of the literature on how portable electronic technologies with offline functionality are perceived and used to provide health education in resource-limited settings. Three reviewers evaluated articles and performed a bibliography search to identify studies describing health education delivered by portable electronic device with offline functionality in low- or middle-income countries. Data extracted included: study population; study design and type of analysis; type of technology used; method of use; setting of technology use; impact on caregivers, patients, or overall health outcomes; and reported limitations. Searches yielded 5514 unique titles. Out of 75 critically reviewed full-text articles, 10 met inclusion criteria. Study locations included Botswana, Peru, Kenya, Thailand, Nigeria, India, Ghana, and Tanzania. Topics addressed included: development of healthcare worker training modules, clinical decision support tools, patient education tools, perceptions and usability of portable electronic technology, and comparisons of technologies and/or mobile applications. Studies primarily looked at the assessment of developed educational modules on trainee health knowledge, perceptions and usability of technology, and comparisons of technologies. Overall, studies reported positive results for portable electronic device-based health education, frequently reporting increased provider/patient knowledge, improved patient outcomes in both quality of care and management, increased provider comfort level with technology, and an environment characterized by increased levels of technology-based, informal learning situations. Negative assessments included high investment costs, lack of technical support, and fear of device theft. While the research is limited, portable electronic educational resources present promising avenues to increase access to effective health education in resource-limited settings, contingent

  4. A systematic review of the efficacy and limitations of venous intervention in stasis ulceration.

    Science.gov (United States)

    Montminy, Myriam L; Jayaraj, Arjun; Raju, Seshadri

    2018-05-01

    Surgical techniques to address various components of chronic venous disease are rapidly evolving. Their efficacy and generally good results in treating superficial venous reflux (SVR) have been documented and compared in patients presenting with pain and swelling. A growing amount of literature is now available suggesting their efficacy in patients with venous leg ulcer (VLU). This review attempts to summarize the efficacy and limitations of commonly used venous interventions in the treatment of SVR and incompetent perforator veins (IPVs) in patients with VLU. A systematic review of the published literature was performed. Two different searches were conducted in MEDLINE, Embase, and EBSCOhost to identify studies that examined the efficacy of SVR ablation and IPV ablation on healing rate and recurrence rate of VLU. In the whole review, 1940 articles were screened. Of those, 45 were included in the SVR ablation review and 4 in the IPV ablation review. Data were too heterogeneous to perform an adequate meta-analysis. The quality of evidence assessed by the Grading of Recommendations Assessment, Development, and Evaluation for the two outcomes varied from very low to moderate. Ulcer healing rate and recurrence rate were between 70% and 100% and 0% and 49% in the SVR ablation review and between 59% and 93% and 4% and 33% in the IPV ablation review, respectively. To explain those variable results, limitations such as inadequate diagnostic techniques, saphenous size, concomitant calf pump dysfunction, and associated deep venous reflux are discussed. Currently available minimally invasive techniques correct most venous pathologic processes in chronic venous disease with a good sustainable healing rate. There are still specific diagnostic and efficacy limitations that mandate proper match of individual patients with the planned approach. Copyright © 2017 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.

  5. Potential and Limitations of Cochrane Reviews in Pediatric Cardiology: A Systematic Analysis.

    Science.gov (United States)

    Poryo, Martin; Khosrawikatoli, Sara; Abdul-Khaliq, Hashim; Meyer, Sascha

    2017-04-01

    Evidence-based medicine has contributed substantially to the quality of medical care in pediatric and adult cardiology. However, our impression from the bedside is that a substantial number of Cochrane reviews generate inconclusive data that are of limited clinical benefit. We performed a systematic synopsis of Cochrane reviews published between 2001 and 2015 in the field of pediatric cardiology. Main outcome parameters were the number and percentage of conclusive, partly conclusive, and inconclusive reviews as well as their recommendations and their development over three a priori defined intervals. In total, 69 reviews were analyzed. Most of them examined preterm and term neonates (36.2%), whereas 33.3% included also non-pediatric patients. Leading topics were pharmacological issues (71.0%) followed by interventional (10.1%) and operative procedures (2.9%). The majority of reviews were inconclusive (42.9%), while 36.2% were conclusive and 21.7% partly conclusive. Although the number of published reviews increased during the three a priori defined time intervals, reviews with "no specific recommendations" remained stable while "recommendations in favor of an intervention" clearly increased. Main reasons for missing recommendations were insufficient data (n = 41) as well as an insufficient number of trials (n = 22) or poor study quality (n = 19). There is still need for high-quality research, which will likely yield a greater number of Cochrane reviews with conclusive results.

  6. Chances and Limitations of Video Games in the Fight against Childhood Obesity-A Systematic Review.

    Science.gov (United States)

    Mack, Isabelle; Bayer, Carolin; Schäffeler, Norbert; Reiband, Nadine; Brölz, Ellen; Zurstiege, Guido; Fernandez-Aranda, Fernando; Gawrilow, Caterina; Zipfel, Stephan

    2017-07-01

    A systematic literature search was conducted to assess the chances and limitations of video games to combat and prevent childhood obesity. This search included studies with video or computer games targeting nutrition, physical activity and obesity for children between 7 and 15 years of age. The study distinguished between games that aimed to (i) improve knowledge about nutrition, eating habits and exercise; (ii) increase physical activity; or (iii) combine both approaches. Overall, the games were well accepted. On a qualitative level, most studies reported positive effects on obesity-related outcomes (improvement of weight-related parameters, physical activity or dietary behaviour/knowledge). However, the observed effects were small. The games did not address psychosocial aspects. Using video games for weight management exclusively does not deliver satisfying results. Video games as an additional guided component of prevention and treatment programs have the potential to increase compliance and thus enhance treatment outcome. Copyright © 2017 John Wiley & Sons, Ltd and Eating Disorders Association.

  7. Dataset of the first transcriptome assembly of the tree crop “yerba mate” (Ilex paraguariensis) and systematic characterization of protein coding genes

    Directory of Open Access Journals (Sweden)

    Patricia M. Aguilera

    2018-04-01

    This contribution contains data associated with the research article entitled “Exploring the genes of yerba mate (Ilex paraguariensis A. St.-Hil.) by NGS and de novo transcriptome assembly” (Debat et al., 2014 [1]). By means of a bioinformatic approach involving extensive NGS data analyses, we provide a resource encompassing the full transcriptome assembly of yerba mate, the first available reference for the Ilex L. genus. This dataset (Supplementary files 1 and 2) consolidates the transcriptome-wide assembled sequences of I. paraguariensis with further comprehensive annotation of the protein coding genes of yerba mate via the integration of Arabidopsis thaliana databases. The generated data are pivotal for the characterization of agronomically relevant genes in the tree crop yerba mate, a non-model species, and related taxa in Ilex. The raw sequencing data dissected here are available at DDBJ/ENA/GenBank (NCBI Resource Coordinators, 2016 [2]) Sequence Read Archive (SRA) under the accession SRP043293, and the assembled sequences have been deposited at the Transcriptome Shotgun Assembly Sequence Database (TSA) under the accession GFHV00000000.

  8. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  9. Limited evidence of abnormal intra-colonic pressure profiles in diverticular disease - a systematic review.

    Science.gov (United States)

    Jaung, R; Robertson, J; O'Grady, G; Milne, T; Rowbotham, D; Bissett, I P

    2017-06-01

    Abnormal colonic pressure profiles and high intraluminal pressures are postulated to contribute to the formation of sigmoid colon diverticulosis and to the pathophysiology of diverticular disease. This study aimed to review the evidence for abnormal colonic pressure profiles in diverticulosis. Three databases (Medline, Embase, Scopus) were searched for all published studies investigating colonic pressure in patients with diverticulosis. No language restrictions were applied. Any manometry studies in which patients with diverticulosis were compared with controls were included. The Newcastle-Ottawa Quality Assessment Scale (NOS) for case-control studies was used as a measure of risk of bias. A cut-off of five or more points on the NOS (fair quality in terms of risk of bias) was chosen for inclusion in the meta-analysis. Ten studies (published 1962-2005) met the inclusion criteria. The studies followed a wide variety of protocols and all used low-resolution manometry (sensor spacing range 7.5-15 cm). Six studies compared intra-sigmoid pressure, with five of six showing higher pressure in diverticulosis vs controls, but only two reached statistical significance. A meta-analysis was not performed as only two studies were above the cut-off and these did not have comparable outcomes. This systematic review of manometry data shows that evidence for abnormal pressure in the sigmoid colon in patients with diverticulosis is weak. Existing studies utilized inconsistent methodology, showed heterogeneous results and are of limited quality. Higher-quality studies using modern manometric techniques and standardized reporting methods are needed to clarify the role of colonic pressure in diverticulosis. Colorectal Disease © 2017 The Association of Coloproctology of Great Britain and Ireland.

  10. Limited Evidence on the Management of Respiratory Tract Infections in Down's Syndrome : A Systematic Review

    NARCIS (Netherlands)

    Manikam, Logan; Reed, Kate; Venekamp, Roderick P; Hayward, Andrew; Littlejohns, Peter; Schilder, Anne; Lakhanpaul, Monica

    2016-01-01

    AIMS: To systematically review the effectiveness of preventative and therapeutic interventions for respiratory tract infections (RTIs) in people with Down's syndrome. METHODS: Databases were searched for any published and ongoing studies of respiratory tract diseases in children and adults with

  11. Systematic review of the limited evidence for different surgical techniques at benign hysterectomy

    DEFF Research Database (Denmark)

    Sloth, Sigurd Beier; Schroll, Jeppe Bennekou; Settnes, Annette

    2017-01-01

    guideline on the subject based on a systematic review of the literature. A guideline panel of seven gynecologists formulated the clinical questions for the guideline. A search specialist performed the comprehensive literature search. The guideline panel reviewed the literature and rated the quality...

  12. The limited prosocial effects of meditation: A systematic review and meta-analysis

    NARCIS (Netherlands)

    Kreplin, U.; Farias, M.; Brazil, I.A.

    2018-01-01

    Many individuals believe that meditation has the capacity to not only alleviate mental-illness but to improve prosociality. This article systematically reviewed and meta-analysed the effects of meditation interventions on prosociality in randomized controlled trials of healthy adults. Five types of

  13. Limited evidence for the use of imaging to detect prostate cancer: A systematic review

    International Nuclear Information System (INIS)

    Blomqvist, L.; Carlsson, S.; Gjertsson, P.; Heintz, E.; Hultcrantz, M.; Mejare, I.; Andrén, O.

    2014-01-01

    Highlights: • In men with a clinical suspicion of prostate cancer, ultrasound-guided systematic biopsy is the gold standard for diagnosis. • Diagnostic imaging techniques, especially magnetic resonance imaging, are being used in trials to aid the detection of prostate cancer. • To date, there is insufficient scientific evidence for the use of imaging techniques to detect prostate cancer. - Abstract: Objective: To assess the diagnostic accuracy of imaging technologies for detecting prostate cancer in patients with elevated PSA values or suspicious findings on clinical examination. Methods: The databases Medline, EMBASE, Cochrane, CRD HTA/DARE/NHS EED and EconLit were searched until June 2013. Pre-determined inclusion criteria were used to select full-text articles. Risk of bias in individual studies was rated according to QUADAS or AMSTAR. Abstracts and full-text articles were assessed independently by two reviewers. The performance of diagnostic imaging was compared with systematic biopsies (reference standard) and sensitivity and specificity were calculated. Results: The literature search yielded 5141 abstracts, which were reviewed by two independent reviewers. Of these, 4852 were excluded since they did not meet the inclusion criteria; 288 articles were reviewed in full text for quality assessment. Six studies, three using MRI and three using transrectal ultrasound, were included. All were rated as high risk of bias. Relevant studies on PET/CT were not identified. Conclusion: Despite clinical use, there is insufficient evidence regarding the accuracy of imaging technologies for detecting cancer in patients with suspected prostate cancer using TRUS-guided systematic biopsies as the reference standard.
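
    For reference, the accuracy measures computed in the included studies reduce to two ratios from a 2x2 table of the index imaging test against the biopsy reference standard; the counts in this sketch are hypothetical.

```python
# Sensitivity and specificity from a 2x2 table (hypothetical counts):
# rows = imaging result, columns = biopsy reference standard.
tp, fn, fp, tn = 45, 15, 20, 120
sensitivity = tp / (tp + fn)    # cancers correctly flagged by imaging
specificity = tn / (tn + fp)    # non-cancers correctly cleared
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```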

  14. Limited evidence for the use of imaging to detect prostate cancer: A systematic review

    Energy Technology Data Exchange (ETDEWEB)

    Blomqvist, L., E-mail: lennart.k.blomqvist@ki.se [Department of Diagnostic Radiology, Karolinska University Hospital, Solna (Sweden); Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm (Sweden); Carlsson, S. [Department of Molecular Medicine and Surgery, Karolinska Institutet, Stockholm (Sweden); Department of Urology, Karolinska University Hospital, Solna (Sweden); Gjertsson, P. [Department of Clinical Physiology, Sahlgrenska University Hospital, Gothenburg (Sweden); Heintz, E.; Hultcrantz, M.; Mejare, I. [The Swedish Council on Health Technology Assessment, Stockholm (Sweden); Andrén, O. [School of Health and Medical Sciences, Örebro University, Örebro (Sweden); Department of Urology, Örebro University Hospital, Örebro (Sweden)

    2014-09-15

    Highlights: • In men with a clinical suspicion of prostate cancer, ultrasound-guided systematic biopsy is the gold standard for diagnosis. • Diagnostic imaging techniques, especially magnetic resonance imaging, are being used in trials to aid the detection of prostate cancer. • To date, there is insufficient scientific evidence for the use of imaging techniques to detect prostate cancer. - Abstract: Objective: To assess the diagnostic accuracy of imaging technologies for detecting prostate cancer in patients with elevated PSA values or suspicious findings on clinical examination. Methods: The databases Medline, EMBASE, Cochrane, CRD HTA/DARE/NHS EED and EconLit were searched until June 2013. Pre-determined inclusion criteria were used to select full-text articles. Risk of bias in individual studies was rated according to QUADAS or AMSTAR. Abstracts and full-text articles were assessed independently by two reviewers. The performance of diagnostic imaging was compared with systematic biopsies (reference standard) and sensitivity and specificity were calculated. Results: The literature search yielded 5141 abstracts, which were reviewed by two independent reviewers. Of these, 4852 were excluded since they did not meet the inclusion criteria; 288 articles were reviewed in full text for quality assessment. Six studies, three using MRI and three using transrectal ultrasound, were included. All were rated as high risk of bias. Relevant studies on PET/CT were not identified. Conclusion: Despite clinical use, there is insufficient evidence regarding the accuracy of imaging technologies for detecting cancer in patients with suspected prostate cancer using TRUS-guided systematic biopsies as the reference standard.

  15. Aaron Journal article datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty , and S. Raman....

  16. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface (ISD) Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  17. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  18. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  19. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  20. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  1. Why Are Older People Often So Responsible and Considerate Even When Their Future Seems Limited? A Systematic Review.

    Science.gov (United States)

    Moss, Simon A; Wilson, Samuel G

    2018-01-01

    Socioemotional selectivity theory assumes that older individuals tend to perceive their identity or life as limited in time and, therefore, prioritize meaningful relationships. Yet other research shows that people who perceive their identity as limited in time tend to behave impulsively, contrary to the behavior of many older individuals. To redress this paradox, this article reports a systematic review, comprising 86 papers, that examined the consequences of whether individuals perceive their identity as limited or enduring. To reconcile conflicts in the literature, we propose that, before an impending transition, some individuals perceive their life now as dissociated from their future goals and, therefore, tend to behave impulsively. Other individuals, however, especially if older, tend to pursue a quest or motivation that transcends the transition, fostering delayed gratification and responsible behavior.

  2. A systematic review and synthesis of the strengths and limitations of measuring malaria mortality through verbal autopsy.

    Science.gov (United States)

    Herrera, Samantha; Enuameh, Yeetey; Adjei, George; Ae-Ngibise, Kenneth Ayuurebobi; Asante, Kwaku Poku; Sankoh, Osman; Owusu-Agyei, Seth; Yé, Yazoume

    2017-10-23

    Lack of valid and reliable data on malaria deaths continues to be a problem that plagues the global health community. To address this gap, the verbal autopsy (VA) method was developed to ascertain cause of death at the population level. Despite the adoption and wide use of VA, there are many recognized limitations of VA tools and methods, especially for measuring malaria mortality. This study synthesizes the strengths and limitations of existing VA tools and methods for measuring malaria mortality (MM) in low- and middle-income countries through a systematic literature review. The authors searched PubMed, Cochrane Library, Popline, WHOLIS, Google Scholar, and INDEPTH Network Health and Demographic Surveillance System sites' websites from 1 January 1990 to 15 January 2016 for articles and reports on MM measurement through VA. Inclusion criteria were that the article presented results from a VA study in which malaria was a cause of death, or that it discussed limitations/challenges related to measurement of MM through VA. Two authors independently searched the databases and websites and conducted a synthesis of articles using a standard matrix. The authors identified 828 publications; 88 were included in the final review. Most publications were VA studies; others were systematic reviews discussing VA tools or methods, editorials or commentaries, and studies using VA data to develop MM estimates. The main limitations were the low sensitivity and specificity of VA tools for measuring MM. Other limitations included the lack of standardized VA tools and methods, and the lack of a 'true' gold standard for assessing the accuracy of VA-based malaria mortality measurement. Existing VA tools and methods for measuring MM have limitations. Given the need for data to measure progress toward the World Health Organization's Global Technical Strategy for Malaria 2016-2030 goals, the malaria community should define strategies for improving MM estimates, including exploring whether VA tools and methods could be further improved. Longer term strategies should focus

  3. Criminal Systematics and the Limits of Functionalist Proposals (Considerations on Guarantees, Citizenship and Human Rights)

    Directory of Open Access Journals (Sweden)

    Felipe Augusto Forte de Negreiros Deodat

    2016-06-01

    Full Text Available To critique functionalism means looking at the history of the construction of penal systems. Rigor of analysis is imposed whenever a system is our working tool. It is essential that what now arises in legal-criminal terms sees the study of criminal law becoming increasingly precise and also closer to the idea of human dignity. A critique is also built of the two doctrines that changed the face of the first systematics, designed in the nineteenth century, which will allow us to see more accurately what can, or even should, be changed. One cannot help but praise normativism, especially as it received the indelible strengthening of the cultivators of criminal science over the past half century.

  4. Assessment of activity limitations and participation restrictions with persons with chronic fatigue syndrome: a systematic review.

    Science.gov (United States)

    Vergauwen, Kuni; Huijnen, Ivan P J; Kos, Daphne; Van de Velde, Dominique; van Eupen, Inge; Meeus, Mira

    2015-01-01

    To summarize measurement instruments used to evaluate activity limitations and participation restrictions in patients with chronic fatigue syndrome (CFS) and review the psychometric properties of these instruments. General information on all included measurement instruments was extracted. The methodological quality was evaluated using the COSMIN checklist. Results on the measurement properties were rated based on the quality criteria of Terwee et al. Finally, overall quality was defined per psychometric property and measurement instrument using the quality criteria of Schellingerhout et al. A total of 68 articles were identified, of which eight evaluated the psychometric properties of a measurement instrument assessing activity limitations and participation restrictions. One disease-specific and 37 generic measurement instruments were found. Limited evidence was found for the psychometric properties and clinical usability of these instruments. However, the CFS-activities and participation questionnaire (APQ) is a disease-specific instrument with moderate content and construct validity. The psychometric properties of the reviewed measurement instruments have not been sufficiently evaluated. Future research is needed to evaluate the psychometric properties of these measurement instruments, including the other properties of the CFS-APQ. If it is necessary to use a measurement instrument, the CFS-APQ is recommended. Chronic fatigue syndrome causes activity limitations and participation restrictions in one or more areas of life. Standardized, reliable and valid measurement instruments are necessary to identify these limitations and restrictions. Currently, no measurement instrument has been sufficiently evaluated with persons with CFS. If a measurement instrument is needed to identify activity limitations and participation restrictions in persons with CFS, it is recommended to use the CFS-APQ.

  5. Colorado River sediment transport: 2. Systematic bed‐elevation and grain‐size effects of sand supply limitation

    Science.gov (United States)

    Topping, David J.; Rubin, David M.; Nelson, Jonathan M.; Kinzel, Paul J.; Corson, Ingrid C.

    2000-01-01

    The Colorado River in Marble and Grand Canyons displays evidence of annual supply limitation with respect to sand both prior to [Topping et al., this issue] and after the closure of Glen Canyon Dam in 1963. Systematic changes in bed elevation and systematic coupled changes in suspended‐sand concentration and grain size result from this supply limitation. During floods, sand supply limitation either causes or modifies a lag between the time of maximum discharge and the time of either maximum or minimum (depending on reach geometry) bed elevation. If, at a cross section where the bed aggrades with increasing flow, the maximum bed elevation is observed to lead the peak or the receding limb of a flood, then this observed response of the bed is due to sand supply limitation. Sand supply limitation also leads to the systematic evolution of sand grain size (both on the bed and in suspension) in the Colorado River. Sand input during a tributary flood travels down the Colorado River as an elongating sediment wave, with the finest sizes (because of their lower settling velocities) traveling the fastest. As the fine front of a sediment wave arrives at a given location, the bed fines and suspended‐sand concentrations increase in response to the enhanced upstream supply of finer sand. Then, as the front of the sediment wave passes that location, the bed is winnowed and suspended‐sand concentrations decrease in response to the depletion of the upstream supply of finer sand. The grain‐size effects of depletion of the upstream sand supply are most obvious during periods of higher dam releases (e.g., the 1996 flood experiment and the 1997 test flow). Because of substantial changes in the grain‐size distribution of the bed, stable relationships between the discharge of water and sand‐transport rates (i.e., stable sand rating curves) are precluded. Sand budgets in a supply‐limited river like the Colorado River can only be constructed through inclusion of the physical

  6. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online depository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database management. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacy (citation and fee). Commenting was also available for all datasets, and an automatic sitemap and semi-automatic SEO indexing were set up for the site. A comprehensive list of available websites for medical datasets is also presented as a Supplementary (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  7. Mridangam stroke dataset

    OpenAIRE

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist under semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16-bit wav files. The dataset can be used for training models for each Mridangam stroke. A detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. Akshay Anantapadman...
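
    A minimal sketch in Python (standard-library wave module; the file name is a hypothetical placeholder) of how the 44.1 kHz, 16-bit recordings described above could be sanity-checked before training stroke models:

      import wave

      # "stroke_example.wav" is a hypothetical name for one dataset item.
      with wave.open("stroke_example.wav", "rb") as w:
          assert w.getframerate() == 44100  # dataset is sampled at 44.1 kHz
          assert w.getsampwidth() == 2      # 16-bit samples are 2 bytes wide
          n = w.getnframes()
          print(f"{n} frames, {n / w.getframerate():.2f} s of audio")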

  8. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge ... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN

  9. Resistance training for activity limitations in older adults with skeletal muscle function deficits: a systematic review

    Directory of Open Access Journals (Sweden)

    Papa EV

    2017-06-01

    Full Text Available Evan V Papa,1 Xiaoyang Dong,2 Mahdi Hassan1 1Department of Rehabilitation Medicine, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi Province, People’s Republic of China; 2Department of Physical Therapy, University of North Texas Health Science Center, Fort Worth, TX, USA Abstract: Human aging results in a variety of changes to skeletal muscle. Sarcopenia is the age-associated loss of muscle mass and is one of the main contributors to musculoskeletal impairments in the elderly. Previous research has demonstrated that resistance training can attenuate skeletal muscle function deficits in older adults, however few articles have focused on the effects of resistance training on functional mobility. The purpose of this systematic review was to (1) present the current state of literature regarding the effects of resistance training on functional mobility outcomes for older adults with skeletal muscle function deficits and (2) provide clinicians with practical guidelines that can be used with seniors during resistance training, or to encourage exercise. We set forth evidence that resistance training can attenuate age-related changes in functional mobility, including improvements in gait speed, static and dynamic balance, and fall risk reduction. Older adults should be encouraged to participate in progressive resistance training activities, and should be admonished to move along a continuum of exercise from immobility toward the recommended daily amounts of activity. Keywords: aging, strength training, sarcopenia, mobility, balance

  10. Systematic investigation of NLTE phenomena in the limit of small departures from LTE

    Science.gov (United States)

    Libby, S. B.; Graziani, F. R.; More, R. M.; Kato, T.

    1997-04-01

    In this paper, we begin a systematic study of Non-Local Thermal Equilibrium (NLTE) phenomena in near equilibrium (LTE) high energy density, highly radiative plasmas. It is shown that the principle of minimum entropy production rate characterizes NLTE steady states for average atom rate equations in the case of small departures from LTE. With the aid of a novel hohlraum-reaction box thought experiment, we use the principles of minimum entropy production and detailed balance to derive Onsager reciprocity relations for the NLTE responses of a near equilibrium sample to non-Planckian perturbations in different frequency groups. This result is a significant symmetry constraint on the linear corrections to Kirchhoff's law. We envisage applying our strategy to a number of test problems which include: the NLTE corrections to the ionization state of an ion located near the edge of an otherwise LTE medium; the effect of a monochromatic radiation field perturbation on an LTE medium; the deviation of Rydberg state populations from LTE in recombining or ionizing plasmas; multi-electron temperature models such as that of Busquet; and finally, the effect of NLTE population shifts on opacity models.
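
    For readers unfamiliar with the linear-response framework invoked here, the textbook relations behind the argument (standard forms, not reproduced from the article) can be written in LaTeX as:

      % Entropy production rate as a bilinear form in fluxes J_i and forces X_i
      \sigma = \sum_i J_i X_i , \qquad J_i = \sum_j L_{ij} X_j
      % Onsager reciprocity: the linear-response matrix is symmetric
      L_{ij} = L_{ji}
      % Minimum entropy production: near-equilibrium steady states minimize
      % \sigma with respect to the unconstrained forces X_k
      \frac{\partial \sigma}{\partial X_k} = 0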

  11. Systematic investigation of NLTE phenomena in the limit of small departures from LTE

    International Nuclear Information System (INIS)

    Libby, S.B.; Graziani, F.R.; More, R.M.; Kato, T.

    1997-01-01

    In this paper, we begin a systematic study of Non-Local Thermal Equilibrium (NLTE) phenomena in near equilibrium (LTE) high energy density, highly radiative plasmas. It is shown that the principle of minimum entropy production rate characterizes NLTE steady states for average atom rate equations in the case of small departures from LTE. With the aid of a novel hohlraum-reaction box thought experiment, we use the principles of minimum entropy production and detailed balance to derive Onsager reciprocity relations for the NLTE responses of a near equilibrium sample to non-Planckian perturbations in different frequency groups. This result is a significant symmetry constraint on the linear corrections to Kirchhoff's law. We envisage applying our strategy to a number of test problems which include: the NLTE corrections to the ionization state of an ion located near the edge of an otherwise LTE medium; the effect of a monochromatic radiation field perturbation on an LTE medium; the deviation of Rydberg state populations from LTE in recombining or ionizing plasmas; multi-electron temperature models such as that of Busquet; and finally, the effect of NLTE population shifts on opacity models. © 1997 American Institute of Physics.

  12. Systematic investigation of NLTE phenomena in the limit of small departures from LTE

    International Nuclear Information System (INIS)

    Libby, S. B.; Graziani, F. R.; More, R. M.; Kato, T.

    1997-01-01

    In this paper, we begin a systematic study of Non-Local Thermal Equilibrium (NLTE) phenomena in near equilibrium (LTE) high energy density, highly radiative plasmas. It is shown that the principle of minimum entropy production rate characterizes NLTE steady states for average atom rate equations in the case of small departures from LTE. With the aid of a novel hohlraum-reaction box thought experiment, we use the principles of minimum entropy production and detailed balance to derive Onsager reciprocity relations for the NLTE responses of a near equilibrium sample to non-Planckian perturbations in different frequency groups. This result is a significant symmetry constraint on the linear corrections to Kirchhoff's law. We envisage applying our strategy to a number of test problems which include: the NLTE corrections to the ionization state of an ion located near the edge of an otherwise LTE medium; the effect of a monochromatic radiation field perturbation on an LTE medium; the deviation of Rydberg state populations from LTE in recombining or ionizing plasmas; multi-electron temperature models such as that of Busquet; and finally, the effect of NLTE population shifts on opacity models

  13. Neuropathic pain screening questionnaires have limited measurement properties. A systematic review.

    Science.gov (United States)

    Mathieson, Stephanie; Maher, Christopher G; Terwee, Caroline B; Folly de Campos, Tarcisio; Lin, Chung-Wei Christine

    2015-08-01

    The Douleur Neuropathique 4 (DN4), ID Pain, Leeds Assessment of Neuropathic Symptoms and Signs (LANSS), PainDETECT, and Neuropathic Pain Questionnaire have been recommended as screening questionnaires for neuropathic pain. This systematic review aimed to evaluate the measurement properties (eg, criterion validity and reliability) of these questionnaires. Online database searches were conducted and two independent reviewers screened studies and extracted data. Methodological quality of included studies and the measurement properties were assessed against established criteria. A modified Grading of Recommendations Assessment, Development and Evaluation approach was used to summarize the level of evidence. Thirty-seven studies were included. Most studies recruited participants from pain clinics. The original versions of the DN4 (French) and Neuropathic Pain Questionnaire (English) had the greatest number of satisfactory measurement properties. The ID Pain (English) demonstrated satisfactory hypothesis testing and reliability, but all other properties tested were unsatisfactory. The LANSS (English) was unsatisfactory for all properties, except specificity. The PainDETECT (English) demonstrated satisfactory hypothesis testing and criterion validity. In general, the cross-cultural adaptations had less evidence than the original versions. Overall, the DN4 and Neuropathic Pain Questionnaire were most suitable for clinical use. These screening questionnaires should not replace a thorough clinical assessment. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  14. Systematic classical continuum limits of integrable spin chains and emerging novel dualities

    International Nuclear Information System (INIS)

    Avan, Jean; Doikou, Anastasia; Sfetsos, Konstadinos

    2010-01-01

    We examine certain classical continuum long-wavelength limits of prototype integrable quantum spin chains. We define the corresponding construction of classical continuum Lax operators. Our discussion starts with the XXX chain, the anisotropic Heisenberg model and their generalizations, and extends to the generic isotropic and anisotropic gl(n) magnets. Certain classical and quantum integrable models emerging from special 'dualities' of quantum spin chains, parametrized by c-number matrices, are also presented.

  15. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  16. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, from an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving; the ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High and Event modes at 20 and 1 Hz, respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.
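
    Because the converted records are plain ASCII tables, they can be read with any table parser. A minimal sketch in Python/pandas (the file and column names here are hypothetical; the real ones are defined in the PDS labels) of correlating seismic amplitude with the bundled meteorological summaries:

      import pandas as pd

      # Hypothetical file/column names; consult the PDS labels for the real ones.
      seis = pd.read_csv("viking_event_mode.tab", sep=r"\s+",
                         names=["sol", "seconds", "amplitude"])
      met = pd.read_csv("viking_met_summary.tab", sep=r"\s+",
                        names=["sol", "wind_speed"])

      # Join on sol to check whether seismic amplitude tracks wind speed,
      # since wind-driven lander motion dominates the signal.
      daily_max = seis.groupby("sol")["amplitude"].max().to_frame()
      merged = daily_max.join(met.set_index("sol"))
      print(merged.corr())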

  17. SIMADL: Simulated Activities of Daily Living Dataset

    Directory of Open Access Journals (Sweden)

    Talal Alshammari

    2018-04-01

    Full Text Available With the realisation of the Internet of Things (IoT) paradigm, the analysis of the Activities of Daily Living (ADLs) in a smart home environment is becoming an active research domain. The existence of representative datasets is a key requirement to advance research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets, for classification and anomaly detection. The datasets were generated using OpenSHS (Open Smart Home Simulator), a simulation package for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs in different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days' worth of activities. Forty-two files of simulated ADLs make up the classification dataset, and the other forty-two make up the anomaly detection dataset, into which simulated anomalous patterns were injected.
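
    A minimal sketch in Python (pandas/scikit-learn; the file name and label column are assumptions, the real layout is documented with the dataset) of using one simulated file for the classification task described above:

      import pandas as pd
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      # Assumed layout: binary sensor-state columns plus an "activity" label.
      df = pd.read_csv("participant1_weekday.csv")
      X, y = df.drop(columns=["activity"]), df["activity"]

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
      clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
      print("held-out accuracy:", clf.score(X_te, y_te))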

  18. Systematic review of power mobility outcomes for infants, children and adolescents with mobility limitations.

    Science.gov (United States)

    Livingstone, Roslyn; Field, Debra

    2014-10-01

    To summarize and critically appraise the evidence related to power mobility use in children (18 years or younger) with mobility limitations. Searches were performed in 12 electronic databases along with hand searching for articles published in English to September 2012 and updated February 2014. The search was restricted to quantitative studies including at least one child with a mobility limitation and measuring an outcome related to power mobility device use. Articles were appraised using American Academy of Cerebral Palsy and Developmental Medicine (AACPDM) criteria for group and single-subject designs. The PRISMA statement was followed, with inclusion criteria set a priori. Two reviewers independently screened titles, abstracts and full-text articles. AACPDM quality ratings were completed for levels I-III studies. Of 259 titles, 29 articles met inclusion criteria, describing 28 primary research studies. One study, rated as strong level II evidence, supported a positive impact of power mobility on overall development as well as independent mobility. Another study, rated as moderate level III evidence, supported a positive impact on self-initiated movement. The remaining studies, rated evidence levels IV and V, provided support for a positive impact on a broad range of outcomes across International Classification of Functioning (ICF) components of body structure and function, activity and participation. Some studies suggest that environmental factors may be influential in successful power mobility use and skill development. The body of evidence supporting outcomes for children using power mobility is primarily descriptive rather than experimental in nature, suggesting research in this area is in its infancy. © The Author(s) 2014.

  19. A systematic review of the sleep, sleepiness, and performance implications of limited wake shift work schedules.

    Science.gov (United States)

    Short, Michelle A; Agostini, Alexandra; Lushington, Kurt; Dorrian, Jillian

    2015-09-01

    The aim of this review was to identify which limited wake shift work (LWSW) schedules best promote sleep, alertness, and performance. LWSW schedules are fixed work/rest cycles in which the time at work is ≤8 hours and there is >1 rest period per day, on average, for ≥2 consecutive days. These schedules are commonly used in safety-critical industries such as the transport and maritime industries. Literature was sourced using the PubMed, Embase, PsycInfo, Scopus, and Google Scholar databases. We identified 20 independent studies (plus a further 2 overlapping studies), including 5 laboratory and 17 field-based studies focused on maritime watch keepers, ship bridge officers, and long-haul train drivers. The outcome measures were varied, incorporating subjective and objective measures of sleep: sleep diaries (N=5), actigraphy (N=4), and polysomnography (N=3); sleepiness: Karolinska Sleepiness Scale (N=5), visual analog scale (VAS) alertness (N=2) and author-derived measures (N=2); and performance: Psychomotor Vigilance Test (PVT) (N=5), reaction time or vigilance tasks (N=4), Vector and Letter Cancellation Test (N=1), and subjective performance (N=2). Of the three primary rosters examined (6 hours-on/6 hours-off, 8 hours-on/8 hours-off and 4 hours-on/8 hours-off), the 4 hours-on/8 hours-off roster was associated with better sleep and lower levels of sleepiness. Individuals working 4 hours-on/8 hours-off rosters averaged 1 hour more sleep per night than those working 6 hours-on/6 hours-off and 1.3 hours more sleep than those working 8 hours-on/8 hours-off. Outcomes were also better with (i) shorter periods of time at work, (ii) more frequent rest breaks, (iii) shifts that start and end at the same clock time every 24 hours, and (iv) work shifts commencing in the daytime (as opposed to night). The findings for performance remain incomplete due to the small number of studies containing a performance measure and the heterogeneity of performance measures within those that did. The literature supports the utility of LWSW in

  20. A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication.

    Science.gov (United States)

    Moorhead, S Anne; Hazlett, Diane E; Harrison, Laura; Carroll, Jennifer K; Irwin, Anthea; Hoving, Ciska

    2013-04-23

    There is currently a lack of information about the uses, benefits, and limitations of social media for health communication among the general public, patients, and health professionals from primary research. To review the current published literature to identify the uses, benefits, and limitations of social media for health communication among the general public, patients, and health professionals, and identify current gaps in the literature to provide recommendations for future health communication research. This paper is a review using a systematic approach. A systematic search of the literature was conducted using nine electronic databases and manual searches to locate peer-reviewed studies published between January 2002 and February 2012. The search identified 98 original research studies that included the uses, benefits, and/or limitations of social media for health communication among the general public, patients, and health professionals. The methodological quality of the studies assessed using the Downs and Black instrument was low; this was mainly because the vast majority of the studies in this review used limited methodologies and were mainly exploratory and descriptive in nature. Seven main uses of social media for health communication were identified, including increasing interactions with others and facilitating, sharing, and obtaining health messages. The six key overarching benefits were identified as (1) increased interactions with others, (2) more available, shared, and tailored information, (3) increased accessibility and widening access to health information, (4) peer/social/emotional support, (5) public health surveillance, and (6) potential to influence health policy. Twelve limitations were identified, primarily consisting of quality concerns and lack of reliability, confidentiality, and privacy. Social media brings a new dimension to health care as it offers a medium to be used by the public, patients, and health

  1. Systematic review of electronic surveillance of infectious diseases with emphasis on antimicrobial resistance surveillance in resource-limited settings.

    Science.gov (United States)

    Rattanaumpawan, Pinyo; Boonyasiri, Adhiratha; Vong, Sirenda; Thamlikitkul, Visanu

    2018-02-01

    Electronic surveillance of infectious diseases involves rapidly collecting, collating, and analyzing vast amounts of data from interrelated multiple databases. Although many developed countries have invested in electronic surveillance for infectious diseases, the system still presents a challenge for resource-limited health care settings. We conducted a systematic review by performing a comprehensive literature search on MEDLINE (January 2000-December 2015) to identify studies relevant to electronic surveillance of infectious diseases. Study characteristics and results were extracted and systematically reviewed by 3 infectious disease physicians. A total of 110 studies were included. Most surveillance systems were developed and implemented in high-income countries; less than one-quarter were conducted in low- or middle-income countries. Information technologies can be used to facilitate the process of obtaining laboratory, clinical, and pharmacologic data for the surveillance of infectious diseases, including antimicrobial resistance (AMR) infections. These novel systems require greater resources; however, we found that using electronic surveillance systems could result in shorter times to detect targeted infectious diseases and improved data collection. This study highlights a lack of resources in areas where an effective, rapid surveillance system is most needed. The availability of information technology for the electronic surveillance of infectious diseases, including AMR infections, will facilitate the prevention and containment of such emerging infectious diseases. Copyright © 2018 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.

  2. Systematic Expansion of Active Spaces beyond the CASSCF Limit: A GASSCF/SplitGAS Benchmark Study.

    Science.gov (United States)

    Vogiatzis, Konstantinos D; Li Manni, Giovanni; Stoneburner, Samuel J; Ma, Dongxia; Gagliardi, Laura

    2015-07-14

    The applicability and accuracy of the generalized active space self-consistent field (GASSCF) and SplitGAS methods are presented. The GASSCF method enables the exploration of larger active spaces than the conventional complete active space SCF (CASSCF) by fragmentation of a large space into subspaces and by controlling the interspace excitations. In the SplitGAS method, the GAS configuration interaction (CI) expansion is further partitioned into two parts: the principal part, which includes the most important configuration state functions, and an extended part, containing less relevant but not negligible ones. An effective Hamiltonian is then generated, with the extended part acting as a perturbation to the principal space. Excitation energies of ozone, furan, pyrrole, nickel dioxide, and the copper tetrachloride dianion are reported. Various partitioning schemes of the GASSCF and SplitGAS CI expansions are considered and compared with complete active space second-order perturbation theory (CASPT2) and the multireference CI (MRCI) method, or available experimental data. General guidelines for the optimum applicability of these methods are discussed together with their current limitations.

  3. Protein biomarkers on tissue as imaged via MALDI mass spectrometry: A systematic approach to study the limits of detection.

    Science.gov (United States)

    van de Ven, Stephanie M W Y; Bemis, Kyle D; Lau, Kenneth; Adusumilli, Ravali; Kota, Uma; Stolowitz, Mark; Vitek, Olga; Mallick, Parag; Gambhir, Sanjiv S

    2016-06-01

    MALDI mass spectrometry imaging (MSI) is emerging as a tool for protein and peptide imaging across tissue sections. Despite extensive study, there does not yet exist a baseline study evaluating the potential capabilities of this technique to detect diverse proteins in tissue sections. In this study, we developed a systematic approach for characterizing MALDI-MSI workflows in terms of limits of detection, coefficients of variation, spatial resolution, and the identification of endogenous tissue proteins. Our goal was to quantify these figures of merit for a number of different proteins and peptides, in order to gain more insight into the feasibility of protein biomarker discovery efforts using this technique. Control proteins and peptides were deposited in serial dilutions on thinly sectioned mouse xenograft tissue. Using our experimental setup, we characterized coefficients of variation and propose a new benchmarking strategy that can be used for comparing diverse MALDI-MSI workflows. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
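
    Two of the figures of merit named above have simple standard definitions. A minimal sketch in Python (the intensity values are invented) of how a coefficient of variation and a blank-based limit of detection are typically computed:

      import statistics as st

      # Hypothetical replicate intensities for one analyte at one dilution.
      replicates = [1040.0, 980.0, 1010.0, 995.0]
      cv = st.stdev(replicates) / st.mean(replicates)  # coefficient of variation

      # A common limit-of-detection rule: mean blank + 3 standard deviations.
      blanks = [52.0, 48.0, 55.0, 50.0]
      lod = st.mean(blanks) + 3 * st.stdev(blanks)
      print(f"CV = {cv:.1%}, LOD = {lod:.1f} (arbitrary intensity units)")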

  4. Effective convergence to complete orbital bases and to the atomic Hartree--Fock limit through systematic sequences of Gaussian primitives

    International Nuclear Information System (INIS)

    Schmidt, M.W.; Ruedenberg, K.

    1979-01-01

    Optimal starting points for expanding molecular orbitals in terms of atomic orbitals are the self-consistent-field orbitals of the free atoms, and accurate information about the latter is essential for the construction of effective AO bases for molecular calculations. For expansions of atomic SCF orbitals in terms of Gaussian primitives, which are of particular interest for applications in polyatomic quantum chemistry, previously available information has been limited in accuracy. In the present investigation a simple procedure is given for finding expansions of atomic self-consistent-field orbitals in terms of Gaussian primitives to arbitrarily high accuracy. The method furthermore opens the first avenue for approaching complete basis sets through systematic sequences of atomic orbitals.
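
    The idea of expanding an orbital in Gaussian primitives can be made concrete with a small least-squares fit. A minimal sketch in Python (fitting a hydrogen-like 1s Slater function with three s-type Gaussians; this illustrates the expansion only and is not the authors' procedure):

      import numpy as np
      from scipy.optimize import curve_fit

      r = np.linspace(0.01, 8.0, 400)
      slater = np.exp(-r)  # unnormalized 1s Slater-type orbital

      def gto_sum(r, c1, a1, c2, a2, c3, a3):
          # Sum of three s-type Gaussian primitives.
          return (c1 * np.exp(-a1 * r**2) + c2 * np.exp(-a2 * r**2)
                  + c3 * np.exp(-a3 * r**2))

      p0 = [0.4, 0.15, 0.5, 0.6, 0.3, 2.5]  # rough starting guesses
      popt, _ = curve_fit(gto_sum, r, slater, p0=p0, maxfev=20000)
      print("max fit error:", np.abs(gto_sum(r, *popt) - slater).max())

    Adding more primitives systematically reduces the fit error, which is the sense in which such sequences approach the complete-basis limit.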

  5. Polysomnographic Aspects of Sleep Architecture on Self-limited Epilepsy with Centrotemporal Spikes: A Systematic Review and Meta-analysis

    Directory of Open Access Journals (Sweden)

    Camila dos Santos Halal

    Full Text Available Self-limited epilepsy with centrotemporal spikes is the most common paediatric epileptic syndrome, with growing evidence linking it to various degrees and presentations of neuropsychological dysfunction. The objective of this study was to evaluate possible macro- and microstructural sleep alterations in children with this diagnosis. A systematic review of published manuscripts was carried out in the Medline, LILACS and SciELO databases, using the MeSH terms epilepsy, sleep and polysomnography. From 753 retrieved references, 5 were selected, and data on the macrostructure and, when available, microstructure of sleep were extracted. Meta-analysis was performed with data from 4 studies using standardized mean differences. Findings were heterogeneous between studies: the most frequent macrostructural findings were a smaller proportion and greater latency of REM sleep (two studies), and in the meta-analysis a longer sleep latency was the most significant finding among epileptic patients. Only one study evaluated sleep microstructure, suggesting possible alterations in the cyclic alternating pattern in diagnosed children. Studies evaluating the macro- and microstructure of sleep in children with self-limited epilepsy with centrotemporal spikes are necessary for a better understanding of the mechanisms underlying the neuropsychological disturbances frequently seen in children with this diagnosis.

  6. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

    Full Text Available A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  7. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendations for future work in intelligent environment simulation.

  8. Parental limited English proficiency and health outcomes for children with special health care needs: a systematic review.

    Science.gov (United States)

    Eneriz-Wiemer, Monica; Sanders, Lee M; Barr, Donald A; Mendoza, Fernando S

    2014-01-01

    One in 10 US adults of childbearing age has limited English proficiency (LEP). Parental LEP is associated with worse health outcomes among healthy children. The relationship of parental LEP to health outcomes for children with special health care needs (CSHCN) has not been systematically reviewed. To conduct a systematic review of peer-reviewed literature examining relationships between parental LEP and health outcomes for CSHCN. PubMed, Scopus, Cochrane Library, Social Science Abstracts, and the bibliographies of included studies were searched. Key search term categories: language, child, special health care needs, and health outcomes. US studies published between 1964 and 2012 were included if: 1) subjects were CSHCN; 2) studies included some measure of parental LEP; 3) studies reported at least 1 outcome measure of child health status, access, utilization, costs, or quality; and 4) studies involved primary or secondary data analysis. Three trained reviewers independently screened studies and extracted data. Two separate reviewers appraised studies for methodological rigor and quality. From 2765 titles and abstracts, 31 studies met eligibility criteria. Five studies assessed child health status, 12 assessed access, 8 assessed utilization, 2 assessed costs, and 14 assessed quality. Nearly all (29 of 31) studies used only parent- or child-reported outcome measures, rather than objective measures. LEP parents were substantially more likely than English-proficient parents to report that their CSHCN were uninsured and had no usual source of care or medical home. LEP parents were also less likely to report family-centered care and satisfaction with care. Disparities persisted for children with LEP parents after adjustment for ethnicity and socioeconomic status. Parental LEP is independently associated with worse health care access and quality for CSHCN. Health care providers should recognize LEP as an independent risk factor for poor health outcomes among CSHCN. Emerging models of chronic disease care should integrate and

  9. A New Dimension of Health Care: Systematic Review of the Uses, Benefits, and Limitations of Social Media for Health Communication

    Science.gov (United States)

    Hazlett, Diane E; Harrison, Laura; Carroll, Jennifer K; Irwin, Anthea; Hoving, Ciska

    2013-01-01

    Background There is currently a lack of information about the uses, benefits, and limitations of social media for health communication among the general public, patients, and health professionals from primary research. Objective To review the current published literature to identify the uses, benefits, and limitations of social media for health communication among the general public, patients, and health professionals, and identify current gaps in the literature to provide recommendations for future health communication research. Methods This paper is a review using a systematic approach. A systematic search of the literature was conducted using nine electronic databases and manual searches to locate peer-reviewed studies published between January 2002 and February 2012. Results The search identified 98 original research studies that included the uses, benefits, and/or limitations of social media for health communication among the general public, patients, and health professionals. The methodological quality of the studies assessed using the Downs and Black instrument was low; this was mainly because the vast majority of the studies in this review used limited methodologies and were mainly exploratory and descriptive in nature. Seven main uses of social media for health communication were identified, including increasing interactions with others and facilitating, sharing, and obtaining health messages. The six key overarching benefits were identified as (1) increased interactions with others, (2) more available, shared, and tailored information, (3) increased accessibility and widening access to health information, (4) peer/social/emotional support, (5) public health surveillance, and (6) potential to influence health policy. Twelve limitations were identified, primarily consisting of quality concerns and lack of reliability, confidentiality, and privacy. Conclusions Social media brings a new dimension to health care as it offers a

  10. Limits to modern contraceptive use among young women in developing countries: a systematic review of qualitative research

    Directory of Open Access Journals (Sweden)

    Wight Daniel

    2009-02-01

    Full Text Available Abstract Background Improving the reproductive health of young women in developing countries requires access to safe and effective methods of fertility control, but most rely on traditional rather than modern contraceptives such as condoms or oral/injectable hormonal methods. We conducted a systematic review of qualitative research to examine the limits to modern contraceptive use identified by young women in developing countries. Focusing on qualitative research allows the assessment of complex processes often missed in quantitative analyses. Methods Literature searches of 23 databases, including Medline, Embase and POPLINE®, were conducted. Literature from 1970–2006 concerning the 11–24 years age group was included. Studies were critically appraised and meta-ethnography was used to synthesise the data. Results Of the 12 studies which met the inclusion criteria, seven met the quality criteria and are included in the synthesis (six from sub-Saharan Africa; one from South-East Asia). Sample sizes ranged from 16 to 149 young women (age range 13–19 years). Four of the studies were urban based, one was rural, one semi-rural, and one mixed (predominantly rural). Use of hormonal methods was limited by lack of knowledge, obstacles to access and concern over side effects, especially fear of infertility. Although often more accessible, and sometimes more attractive than hormonal methods, condom use was limited by association with disease and promiscuity, together with greater male control. As a result young women often relied on traditional methods or abortion. Although the review was limited to five countries and conditions are not homogenous for all young women in all developing countries, the overarching themes were common across different settings and contexts, supporting the potential transferability of interventions to improve reproductive health. Conclusion Increasing modern contraceptive method use requires community-wide, multifaceted

  11. Current Guidelines Have Limited Applicability to Patients with Comorbid Conditions: A Systematic Analysis of Evidence-Based Guidelines

    Science.gov (United States)

    Lugtenberg, Marjolein; Burgers, Jako S.; Clancy, Carolyn; Westert, Gert P.; Schneider, Eric C.

    2011-01-01

    Background Guidelines traditionally focus on the diagnosis and treatment of single diseases. As almost half of the patients with a chronic disease have more than one disease, the applicability of guidelines may be limited. The aim of this study was to assess the extent to which guidelines address comorbidity and to assess the supporting evidence of recommendations related to comorbidity. Methodology/Principal Findings We conducted a systematic analysis of evidence-based guidelines focusing on four highly prevalent chronic conditions with a high impact on quality of life: chronic obstructive pulmonary disease, depressive disorder, diabetes mellitus type 2, and osteoarthritis. Data were abstracted from each guideline on the extent to which comorbidity was addressed (general comments, specific recommendations), the type of comorbidity discussed (concordant, discordant), and the supporting evidence of the comorbidity-related recommendations (level of evidence, translation of evidence). Of the 20 guidelines, 17 (85%) addressed the issue of comorbidity and 14 (70%) provided specific recommendations on comorbidity. In general, the guidelines included few recommendations on patients with comorbidity (mean 3 recommendations per guideline, range 0 to 26). Of the 59 comorbidity-related recommendations provided, 46 (78%) addressed concordant comorbidities, 8 (14%) discordant comorbidities, and for 5 (8%) the type of comorbidity was not specified. The strength of the supporting evidence was moderate for 25% (15/59) and low for 37% (22/59) of the recommendations. In addition, for 73% (43/59) of the recommendations the evidence was not adequately translated into the guidelines. Conclusions/Significance Our study showed that the applicability of current evidence-based guidelines to patients with comorbid conditions is limited. Most guidelines do not provide explicit guidance on treatment of patients with comorbidity, particularly for discordant combinations. Guidelines should be more

  12. Internal limiting membrane peeling and gas tamponade for myopic foveoschisis: a systematic review and meta-analysis.

    Science.gov (United States)

    Meng, Bo; Zhao, Lu; Yin, Yi; Li, Hongyang; Wang, Xiaolei; Yang, Xiufen; You, Ran; Wang, Jialin; Zhang, Youjing; Wang, Hui; Du, Ran; Wang, Ningli; Zhan, Siyan; Wang, Yanling

    2017-09-08

    Myopic foveoschisis (MF) is among the leading causes of visual loss in high myopia. However, it remains controversial whether internal limiting membrane (ILM) peeling or gas tamponade is a necessary treatment option for MF. The PubMed, EMBASE, CBM, CNKI, WANFANG DATA and VIP databases were systematically reviewed. Outcome indicators were myopic foveoschisis resolution rate, visual acuity improvement and postoperative complications. Nine studies that included 239 eyes were selected. The proportion of resolution of foveoschisis was higher in the ILM peeling group than in the non-ILM peeling group (OR = 2.15, 95% CI: 1.06-4.35; P = 0.03). The proportion of postoperative complications was higher in the tamponade group than in the non-tamponade group (OR = 10.81, 95% CI: 1.26-93.02; P = 0.03). However, the proportion of visual acuity improvement (OR = 1.63, 95% CI: 0.56-4.80; P = 0.37) between the ILM peeling and non-ILM peeling groups and the proportion of resolution of foveoschisis (OR = 1.80, 95% CI: 0.76-4.28; P = 0.18) between the tamponade and non-tamponade groups were similar. Vitrectomy with internal limiting membrane peeling could contribute to better resolution of myopic foveoschisis than non-peeling; however, it does not significantly influence the proportion of visual acuity improvement or postoperative complications. Vitrectomy with gas tamponade is associated with more complications than non-tamponade and does not significantly influence the proportion of visual acuity improvement or resolution of myopic foveoschisis.

  13. Internal limiting membrane peeling or not: a systematic review and meta-analysis of idiopathic macular pucker surgery.

    Science.gov (United States)

    Fang, Xiao-Ling; Tong, Yao; Zhou, Ya-Li; Zhao, Pei-Quan; Wang, Zhao-Yang

    2017-11-01

    To determine whether internal limiting membrane (ILM) peeling improves anatomical and functional outcomes in idiopathic macular pucker (IMP)/epiretinal membrane (ERM) surgery in this systematic review and meta-analysis. We searched the PubMed, Medline, Web of Science, Cochrane, Ovid MEDLINE, ClinicalTrials.gov and CNKI databases for studies published before 15 September 2016. The eligibility criteria included studies comparing ILM peeling versus no peeling for IMP surgery. Thirteen articles (10 retrospective cohort studies, 1 prospective cohort study and 2 randomised controlled trials (RCTs)) were included in the review. Primary outcomes: no differences were observed in best-corrected visual acuity (BCVA) or central macular thickness (CMT) at 12 months; however, lower ERM recurrence (OR, 0.13; 95% CI 0.04 to 0.41; p=0.0004) and reoperation rates (OR, 0.10; 95% CI 0.02 to 0.49; p=0.004) that favoured ILM peeling were observed at the final follow-up. Secondary outcomes: no difference was observed in BCVA at 3 or 6 months or at the final follow-up, nor in CMT at 3 or 6 months or at the final follow-up. Significantly increased CMT, which favoured ILM peeling, was observed at the final follow-up (p=0.002) in the RCTs. ILM peeling yielded greater anatomical success, but no improvement in functional outcomes, as the treatment of choice for patients undergoing IMP surgery. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.

  14. EFFECT OF INTERNAL LIMITING MEMBRANE PEELING DURING VITRECTOMY FOR DIABETIC MACULAR EDEMA: Systematic Review and Meta-analysis.

    Science.gov (United States)

    Nakajima, Takuya; Roggia, Murilo F; Noda, Yasuo; Ueta, Takashi

    2015-09-01

    To evaluate the effect of internal limiting membrane (ILM) peeling during vitrectomy for diabetic macular edema. MEDLINE, EMBASE, and CENTRAL were systematically reviewed. Eligible studies included randomized or nonrandomized studies that compared surgical outcomes of vitrectomy with or without ILM peeling for diabetic macular edema. The primary and secondary outcome measures were postoperative best-corrected visual acuity (BCVA) and central macular thickness, respectively. Meta-analysis of the mean differences between vitrectomy with and without ILM peeling was performed using the inverse variance method in random effects. Five studies (7 articles) with 741 patients were eligible for analysis. The superiority (95% confidence interval) in postoperative BCVA of the ILM peeling group compared with the nonpeeling group was 0.04 (-0.05 to 0.13) logMAR (equivalent to 2.0 ETDRS letters, P = 0.37), and the superiority in BCVA change in the ILM peeling group was 0.04 (-0.02 to 0.09) logMAR (equivalent to 2.0 ETDRS letters, P = 0.16). There was no significant difference in postoperative central macular thickness or central macular thickness reduction between the two groups. The visual acuity outcomes of pars plana vitrectomy with ILM peeling versus no ILM peeling were not significantly different. A larger randomized prospective study would be necessary to adequately address the effectiveness of ILM peeling on visual acuity outcomes.
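
    The pooling named above (inverse-variance weighting under a random-effects model) is standard. A minimal DerSimonian-Laird sketch in Python with invented study effects, not the data from this meta-analysis:

      import numpy as np

      # Hypothetical per-study mean differences (logMAR) and their variances.
      effects = np.array([0.05, 0.02, 0.08, -0.01, 0.04])
      variances = np.array([0.010, 0.008, 0.020, 0.015, 0.012])

      w = 1.0 / variances                     # fixed-effect weights
      fixed = np.sum(w * effects) / np.sum(w)
      q = np.sum(w * (effects - fixed) ** 2)  # Cochran's Q statistic
      df = len(effects) - 1
      tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

      w_re = 1.0 / (variances + tau2)         # random-effects weights
      pooled = np.sum(w_re * effects) / np.sum(w_re)
      se = np.sqrt(1.0 / np.sum(w_re))
      lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
      print(f"pooled = {pooled:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")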

  15. IMPACT OF INTERNAL LIMITING MEMBRANE PEELING ON MACULAR HOLE REOPENING: A Systematic Review and Meta-Analysis.

    Science.gov (United States)

    Rahimy, Ehsan; McCannel, Colin A

    2016-04-01

    To assess the literature regarding macular hole reopening rates stratified by whether the internal limiting membrane (ILM) was peeled during vitrectomy surgery. Systematic review and meta-analysis of studies reporting on macular hole reopenings among previously surgically closed idiopathic macular holes. A comprehensive literature search using the National Library of Medicine PubMed interface was used to identify potentially eligible publications in English. The minimum mean follow-up period for reports to be included in this study was 12 months. Analysis was divided into eyes that underwent vitrectomy with and without ILM peeling. The primary outcome parameter was the proportion of macular hole reopenings among previously closed holes between the two groups. Secondary outcome parameters included duration from initial surgery to hole reopening and preoperative and postoperative best-corrected visual acuities among the non-ILM peeling and ILM peeling groups. A total of 50 publications reporting on 5,480 eyes met inclusion criteria and were assessed in this meta-analysis. The reopening rate without ILM peeling was 7.12% (125 of 1,756 eyes), compared with 1.18% (44 of 3,724 eyes) with ILM peeling (odds ratio: 0.16; 95% confidence interval: 0.11-0.22; Fisher's exact test). ILM peeling during macular hole surgery reduces the likelihood of macular hole reopening.

  16. National Elevation Dataset

    Science.gov (United States)

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide National elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as the horizontal datum, and all the data are recast in a geographic projection. Older DEMs produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact removal filtering. NED processing also includes steps to adjust values where adjacent DEMs do not match well, and to fill sliver areas of missing data between DEMs. These processing steps ensure that NED has no void areas and that artificial discontinuities have been minimized. The artifact removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM is produced by older methods, "striping" may still occur.

  17. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  18. The Dataset of Countries at Risk of Electoral Violence

    OpenAIRE

    Birch, Sarah; Muchlinski, David

    2017-01-01

    Electoral violence is increasingly affecting elections around the world, yet researchers have been limited by a paucity of granular data on this phenomenon. This paper introduces and describes a new dataset of electoral violence – the Dataset of Countries at Risk of Electoral Violence (CREV) – that provides measures of 10 different types of electoral violence across 642 elections held around the globe between 1995 and 2013. The paper provides a detailed account of how and why the dataset was ...

  19. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

    Dietze, Stefan; George, Siemens; Davide, Taibi; Drachsler, Hendrik

    2018-01-01

    The European LinkedUp and LACE (Learning Analytics Community Exchange) projects have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK dataset consists of a rich collection of full text publications in the domain of

  20. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support research in this area we have developed a dataset containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.
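
    Because the data ship as linked tables, demographic and clickstream records can be joined directly. A minimal pandas sketch, assuming the CSV file and column names published with OULAD:

```python
import pandas as pd

info = pd.read_csv('studentInfo.csv')
vle = pd.read_csv('studentVle.csv')   # one row per student, VLE site and day

# Total clicks per student, joined onto the demographic table
clicks = (vle.groupby(['code_module', 'code_presentation', 'id_student'])
             ['sum_click'].sum().rename('total_clicks').reset_index())
merged = info.merge(clicks, how='left',
                    on=['code_module', 'code_presentation', 'id_student'])

# Mean total clicks by final course outcome
print(merged.groupby('final_result')['total_clicks'].mean())
```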

  1. Pattern Analysis On Banking Dataset

    Directory of Open Access Journals (Sweden)

    Amritpal Singh

    2015-06-01

    Full Text Available Abstract Everyday refinement and development of technology has increased competition among technology companies and the attempts made to crack or break down their systems, making data mining a strategically and security-wise important area for many business organizations, including the banking sector. Data mining allows the analysis of important information in the data warehouse and assists banks in looking for obscure patterns and discovering unknown relationships in the data. Banking systems need to process ample amounts of data daily, relating to customer information, credit card details, limit and collateral details, transaction details, risk profiles, Anti-Money-Laundering information and trade finance data; thousands of decisions based on these data are taken in a bank every day. This paper analyzes a banking dataset in the Weka environment to detect interesting patterns, with applications in customer acquisition, customer retention, management and marketing, and the management of risk and fraud detection.

  2. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Landfill emissions measurements for the Turkey Run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  3. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  4. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs, K., M. Goldsmith, P. Egeghy, K....

  5. Limited evidence for intranasal fentanyl in the emergency department and the prehospital setting--a systematic review

    DEFF Research Database (Denmark)

    Hansen, Morten Sejer; Dahl, Jørgen Berg

    2013-01-01

    The intranasal (IN) mode of application may be a valuable asset in non-invasive pain management. Fentanyl demonstrates pharmacokinetic and pharmacodynamic properties that are desirable in the management of acute pain, and IN fentanyl may be of value in the prehospital setting. The aim of this systematic review was to evaluate the current evidence for the use of IN fentanyl in the emergency department (ED) and prehospital setting.

  6. Protocol for the systematic review of the prevention, treatment and public health management of impetigo, scabies and fungal skin infections in resource-limited settings.

    Science.gov (United States)

    May, Philippa; Bowen, Asha; Tong, Steven; Steer, Andrew; Prince, Sam; Andrews, Ross; Currie, Bart; Carapetis, Jonathan

    2016-09-23

    Impetigo, scabies, and fungal skin infections disproportionately affect populations in resource-limited settings. Evidence for standard treatment of skin infections predominantly stems from hospital-based studies in high-income countries. The evidence for treatment in resource-limited settings is less clear, as studies in these populations may lack randomisation and control groups for cultural, ethical or economic reasons. Likewise, a synthesis of the evidence for public health control within endemic populations is also lacking. We propose a systematic review of the evidence for the prevention, treatment and public health management of skin infections in resource-limited settings, to inform the development of guidelines for the standardised and streamlined clinical and public health management of skin infections in endemic populations. The protocol has been designed in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols statement. All trial designs and analytical observational study designs will be eligible for inclusion. A systematic search of the peer-reviewed literature will include PubMed, Excerpta Medica and Global Health. Grey literature databases will also be systematically searched, and clinical trials registries scanned for future relevant studies. The primary outcome of interest will be the clinical cure or decrease in prevalence of impetigo, scabies, crusted scabies, tinea capitis, tinea corporis or tinea unguium. Two independent reviewers will perform eligibility assessment and data extraction using standardised electronic forms. Risk of bias assessment will be undertaken by two independent reviewers according to the Cochrane Risk of Bias tool. Data will be tabulated and narratively synthesised. We expect there will be insufficient data to conduct meta-analysis. The final body of evidence will be reported against the Grades of Recommendation, Assessment, Development and Evaluation grading system. The evidence

  7. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal ... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...
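
    The preprocessing step evaluated here, upsampling a volume before model fitting, is easy to illustrate. A minimal sketch using scipy's spline interpolation on a synthetic volume (not the study's own pipeline):

```python
import numpy as np
from scipy.ndimage import zoom

dwi = np.random.rand(32, 32, 16)      # synthetic low-resolution volume

# Upsample each spatial axis by 2 with cubic b-spline interpolation,
# prior to any tensor fitting
dwi_hr = zoom(dwi, zoom=2, order=3)
print(dwi.shape, '->', dwi_hr.shape)  # (32, 32, 16) -> (64, 64, 32)
```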

  8. The NOAA Dataset Identifier Project

    Science.gov (United States)

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  9. The Harvard organic photovoltaic dataset.

    Science.gov (United States)

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  10. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature, and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use in both relating electronic structure calculations to experimental observations through the generation of calibration schemes, as well as for the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  11. Associating uncertainty with datasets using Linked Data and allowing propagation via provenance chains

    Science.gov (United States)

    Car, Nicholas; Cox, Simon; Fitch, Peter

    2015-04-01

    With earth-science datasets increasingly being published to enable re-use in projects disassociated from the original data acquisition or generation, there is an urgent need for associated metadata to be connected, in order to guide their application. In particular, provenance traces should support the evaluation of data quality and reliability. However, while standards for describing provenance are emerging (e.g. PROV-O), these do not include the necessary statistical descriptors and confidence assessments. UncertML has a mature conceptual model that may be used to record uncertainty metadata. However, by itself UncertML does not support the representation of uncertainty of multi-part datasets, and provides no direct way of associating the uncertainty information - metadata in relation to a dataset - with dataset objects. We present a method to address both these issues by combining UncertML with PROV-O, and delivering the resulting uncertainty-enriched provenance traces through the Linked Data API. UncertProv extends the PROV-O provenance ontology with an RDF formulation of the UncertML conceptual model elements, adds further elements to support uncertainty representation without a conceptual model, and integrates UncertML through links to documents. The Linked Data API provides a systematic way of navigating from dataset objects to their UncertProv metadata and back again, and its 'views' capability enables access to UncertML and non-UncertML uncertainty metadata representations for a dataset. With this approach, it is possible to access and navigate the uncertainty metadata associated with a published dataset using standard semantic web tools, such as SPARQL queries. Where the uncertainty data follow the UncertML model they can be automatically interpreted and may also support automatic uncertainty propagation. Repositories wishing to enable uncertainty propagation for all datasets must ensure that all elements that are associated with uncertainty
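
    The navigation pattern described, from a dataset object to its uncertainty metadata, can be sketched with standard semantic web tooling. Below is a minimal rdflib example; the PROV namespace is real, but the UncertProv terms and URIs are stand-ins, since the ontology's actual vocabulary is not given here:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

PROV = Namespace('http://www.w3.org/ns/prov#')
UN = Namespace('http://example.org/uncertprov#')   # hypothetical namespace

g = Graph()
ds = URIRef('http://example.org/dataset/soil-moisture-v2')
u = URIRef('http://example.org/uncert/soil-moisture-v2-rmse')
g.add((ds, RDF.type, PROV.Entity))
g.add((ds, UN.hasUncertainty, u))
g.add((u, UN.statistic, Literal('RMSE')))
g.add((u, UN.value, Literal(0.04)))

# Navigate from dataset objects to their uncertainty metadata via SPARQL
q = """
PREFIX un: <http://example.org/uncertprov#>
SELECT ?ds ?stat ?val WHERE {
  ?ds un:hasUncertainty ?u .
  ?u un:statistic ?stat ; un:value ?val .
}"""
for row in g.query(q):
    print(row.ds, row.stat, row.val)
```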

  12. Error characterisation of global active and passive microwave soil moisture datasets

    Directory of Open Access Journals (Sweden)

    W. A. Dorigo

    2010-12-01

    Full Text Available Understanding the error structures of remotely sensed soil moisture observations is essential for correctly interpreting observed variations and trends in the data or assimilating them in hydrological or numerical weather prediction models. Nevertheless, a spatially coherent assessment of the quality of the various globally available datasets is often hampered by the limited availability over space and time of reliable in-situ measurements. As an alternative, this study explores the triple collocation error estimation technique for assessing the relative quality of several globally available soil moisture products from active (ASCAT) and passive (AMSR-E and SSM/I) microwave sensors. The triple collocation is a powerful statistical tool to estimate the root mean square error while simultaneously solving for systematic differences in the climatologies of a set of three linearly related data sources with independent error structures. Prerequisite for this technique is the availability of a sufficiently large number of timely corresponding observations. In addition to the active and passive satellite-based datasets, we used the ERA-Interim and GLDAS-NOAH reanalysis soil moisture datasets as a third, independent reference. The prime objective is to reveal trends in uncertainty related to different observation principles (passive versus active), the use of different frequencies (C-, X-, and Ku-band) for passive microwave observations, and the choice of the independent reference dataset (ERA-Interim versus GLDAS-NOAH). The results suggest that the triple collocation method provides realistic error estimates. Observed spatial trends agree well with the existing theory and studies on the performance of different observation principles and frequencies with respect to land cover and vegetation density. In addition, if all theoretical prerequisites are fulfilled (e.g. a sufficiently large number of common observations is available and errors of the different
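
    The triple collocation estimator itself is short. Assuming three collocated, linearly related series with mutually independent errors (synthetic data standing in for real retrievals), the standard covariance-based form reads:

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.standard_normal(5000)
x = truth + 0.10 * rng.standard_normal(5000)   # e.g. active microwave
y = truth + 0.20 * rng.standard_normal(5000)   # e.g. passive microwave
z = truth + 0.30 * rng.standard_normal(5000)   # e.g. reanalysis reference

def triple_collocation_rmse(x, y, z):
    """Covariance-based triple collocation error estimates."""
    Q = np.cov(np.vstack([x, y, z]))
    ex = np.sqrt(max(Q[0, 0] - Q[0, 1] * Q[0, 2] / Q[1, 2], 0.0))
    ey = np.sqrt(max(Q[1, 1] - Q[0, 1] * Q[1, 2] / Q[0, 2], 0.0))
    ez = np.sqrt(max(Q[2, 2] - Q[0, 2] * Q[1, 2] / Q[0, 1], 0.0))
    return ex, ey, ez

print(triple_collocation_rmse(x, y, z))   # close to (0.10, 0.20, 0.30)
```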

  13. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  14. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in increasing amounts of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  15. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

    The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site years. Since the workshop, several additional site years have been added and the dataset now contains over 920 site years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continue to evolve as well. There are on the order of 120 site contacts, and 60 proposals involving around 120 researchers have been approved to use the data. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and supporting the collaboration. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new web site is based on a scientific data server which enables browsing of the data on-line, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  16. The impact of initiatives to limit the advertising of food and beverage products to children: a systematic review.

    Science.gov (United States)

    Galbraith-Emami, S; Lobstein, T

    2013-12-01

    In response to increasing evidence that advertising of foods and beverages affects children's food choices and food intake, several national governments and many of the world's larger food and beverage manufacturers have acted to restrict the marketing of their products to children or to advertise only 'better for you' products or 'healthier dietary choices' to children. Independent assessment of the impact of these pledges has been difficult due to the different criteria being used in regulatory and self-regulatory regimes. In this paper, we undertook a systematic review to examine the data available on levels of exposure of children to the advertising of less healthy foods since the introduction of the statutory and voluntary codes. The results indicate a sharp division in the evidence, with scientific, peer-reviewed papers showing that high levels of such advertising of less healthy foods continue to be found in several different countries worldwide. In contrast, the evidence provided in industry-sponsored reports indicates a remarkably high adherence to voluntary codes. We conclude that adherence to voluntary codes may not sufficiently reduce the advertising of foods which undermine healthy diets, or reduce children's exposure to this advertising. © 2013 The Authors. obesity reviews © 2013 International Association for the Study of Obesity.

  17. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    OpenAIRE

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health: the human microbiome. The limited number of existing studies has reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples...

  18. Methodological limitations of psychosocial interventions in patients with an implantable cardioverter-defibrillator (ICD): A systematic review

    Directory of Open Access Journals (Sweden)

    Ockene Ira S

    2009-12-01

    Full Text Available Abstract Background Despite the potentially life-saving benefits of the implantable cardioverter-defibrillator (ICD), a significant group of patients experiences emotional distress after ICD implantation. Different psychosocial interventions have been employed to improve this condition, but previous reviews have suggested that methodological issues may limit the validity of such interventions. Aim: To review the methodology of previously published studies of psychosocial interventions in ICD patients, according to CONSORT statement guidelines for non-pharmacological interventions, and provide recommendations for future research. Methods We electronically searched the PubMed, PsycInfo and Cochrane databases. To be included, studies needed to be published in a peer-reviewed journal between 1980 and 2008, to involve a human population aged 18+ years and to have an experimental design. Results Twelve studies met the eligibility criteria. Samples were generally small. Interventions were very heterogeneous; most studies used cognitive behavioural therapy (CBT) and exercise programs either as unique interventions or as part of a multi-component program. Overall, studies showed a favourable effect on anxiety (6/9) and depression (4/8). CBT appeared to be the most effective intervention. There was no effect on the number of shocks and arrhythmic events, probably because studies were not powered to detect such an effect. Physical functioning improved in the three studies evaluating this outcome. Lack of information about the indication for ICD implantation (primary vs. secondary prevention), limited or no information regarding use of anti-arrhythmic (9/12) and psychotropic (10/12) treatment, lack of assessments of providers' treatment fidelity (12/12) and patients' adherence to the intervention (11/12) were the most common methodological limitations. Conclusions Overall, this review supports preliminary evidence of a positive effect of psychosocial interventions

  19. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330.

  20. Selection criteria limit generalizability of smoking pharmacotherapy studies differentially across clinical trials and laboratory studies: A systematic review on varenicline.

    Science.gov (United States)

    Motschman, Courtney A; Gass, Julie C; Wray, Jennifer M; Germeroth, Lisa J; Schlienz, Nicolas J; Munoz, Diana A; Moore, Faith E; Rhodes, Jessica D; Hawk, Larry W; Tiffany, Stephen T

    2016-12-01

    The selection criteria used in clinical trials for smoking cessation and in laboratory studies that seek to understand mechanisms responsible for treatment outcomes may limit their generalizability to one another and to the general population. We reviewed studies on varenicline versus placebo and compared eligibility criteria and participant characteristics of clinical trials (N=23) and laboratory studies (N=22) across study type and to nationally representative survey data on adult, daily USA smokers (2014 National Health Interview Survey; 2014 National Survey on Drug Use and Health). Relative to laboratory studies, clinical trials more commonly reported excluding smokers who were unmotivated to quit and for specific medical conditions (e.g., cardiovascular disease, COPD), although both study types frequently reported excluding for general medical or psychiatric reasons. Laboratory versus clinical samples smoked less, had lower nicotine dependence, were younger, and more homogeneous with respect to smoking level and nicotine dependence. Application of common eligibility criteria to national survey data resulted in considerable elimination of the daily-smoking population for both clinical trials (≥47%) and laboratory studies (≥39%). Relative to the target population, studies in this review recruited participants who smoked considerably more and had a later smoking onset age, and were under-representative of Caucasians. Results suggest that selection criteria of varenicline studies limit generalizability in meaningful ways, and differences in criteria across study type may undermine efforts at translational research. Recommendations for improvements in participant selection and reporting standards are discussed. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  1. Sleep deprivation in resident physicians, work hour limitations, and related outcomes: a systematic review of the literature.

    Science.gov (United States)

    Mansukhani, Meghna P; Kolla, Bhanu Prakash; Surani, Salim; Varon, Joseph; Ramar, Kannan

    2012-07-01

    Extended work hours, interrupted sleep, and shift work are integral parts of medical training among all specialties. The need for 24-hour patient care coverage and economic factors have resulted in prolonged work hours for resident physicians. This has traditionally been thought to enhance medical educational experience. These long and erratic work hours lead to acute and chronic sleep deprivation and poor sleep quality, resulting in numerous adverse consequences. Impairments may occur in several domains, including attention, cognition, motor skills, and mood. Resident performance, professionalism, safety, and well-being are affected by sleep deprivation, causing potentially adverse implications for patient care. Studies have shown adverse health consequences, motor vehicle accidents, increased alcohol and medication use, and serious medical errors to occur in association with both sleep deprivation and shift work. Resident work hour limitations have been mandated by the Accreditation Council for Graduate Medical Education in response to patient safety concerns. Studies evaluating the impact of these regulations on resident physicians have generated conflicting reports on patient outcomes, demonstrating only a modest increase in sleep duration for resident physicians, along with negative perceptions regarding their education. This literature review summarizes research on the effects of sleep deprivation and shift work, and examines current literature on the impact of recent work hour limitations on resident physicians and patient-related outcomes.

  2. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analysis targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed profiting from improved alignment and calibration conditions for all the sub-detectors. Careful planning of the resources was essential in order to deliver the datasets well in time to the analysts, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, allowing the recovery of some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  3. Healthcare users' experiences of communicating with healthcare professionals about children who have life-limiting conditions: a qualitative systematic review protocol.

    Science.gov (United States)

    Ekberg, Stuart; Bradford, Natalie; Herbert, Anthony; Danby, Susan; Yates, Patsy

    2015-11-01

    -quality communication with and about children who have life-limiting conditions, this does not mean that these stakeholders necessarily share the same perspective of what constitutes high-quality communication and the best way of accomplishing this. Focusing on healthcare users' experiences of communication with healthcare professionals about children who have life-limiting conditions, the present review will explore the subjective impact of professionals' communication on the people for whom they provide care. It may be necessary to consider a range of contextual factors to understand healthcare users' experiences of communicating with healthcare professionals about children who have life-limiting conditions. For instance, age, developmental stage, cognitive capacity, emotional and social strengths, and family dynamics can influence a child's level of involvement in discussions about their condition and care. Although there are factors that appear more consistent across the range of pediatric palliative care users, such as parents' preferences for being treated by healthcare professionals as partners in making decisions about the care of their child, there is not always such consistency. Nor is it clear whether such findings can be generalized across different cultural contexts. In appraising existing research, this systematic review will therefore consider the relationship between the context of individual studies and their reported findings. The primary aim of this review is to identify, appraise and synthesize existing qualitative evidence of healthcare users' experiences of communicating with healthcare professionals about children who have life-limiting conditions. The review will consider relevant details of these findings, particularly whether factors like age are relevant for understanding particular experiences of communication. An outcome of this review will be the identification of best available qualitative evidence that can be used to inform professional practice, as well

  4. Analysis and mitigation of systematic errors in spectral shearing interferometry of pulses approaching the single-cycle limit [Invited]

    International Nuclear Information System (INIS)

    Birge, Jonathan R.; Kaertner, Franz X.

    2008-01-01

    We derive an analytical approximation for the measured pulse width error in spectral shearing methods, such as spectral phase interferometry for direct electric-field reconstruction (SPIDER), caused by an anomalous delay between the two sheared pulse components. This analysis suggests that, as pulses approach the single-cycle limit, the resulting requirements on the calibration and stability of this delay become significant, requiring precision orders of magnitude higher than the scale of a wavelength. This is demonstrated by numerical simulations of SPIDER pulse reconstruction using actual data from a sub-two-cycle laser. We briefly propose methods to minimize the effects of this sensitivity in SPIDER and review variants of spectral shearing that attempt to avoid this difficulty

  5. Measuring disability: a systematic review of the validity and reliability of the Global Activity Limitations Indicator (GALI).

    Science.gov (United States)

    Van Oyen, Herman; Bogaert, Petronille; Yokota, Renata T C; Berger, Nicolas

    2018-01-01

    GALI, or the Global Activity Limitation Indicator, is a global survey instrument measuring participation restriction. GALI is the measure underlying the European indicator Healthy Life Years (HLY) and has substantial policy use within the EU and its Member States. The objective of the current paper is to bring together what is known from published manuscripts on the validity and the reliability of GALI. Following the PRISMA guidelines, two search strategies (PUBMED, Google Scholar) were combined to identify manuscripts published in English with publication date 2000 or beyond. Articles were classified as reliability studies, or concurrent or predictive validity studies, in national or international populations. Four cross-sectional studies (of which 2 international) studied how GALI relates to other health measures (concurrent validity). A dose-response effect by GALI severity level on the association with the other health status measures was observed in the national studies. The 2 international studies (SHARE, EHIS) concluded that the odds of reporting participation restriction were higher in subjects with self-reported or observed functional limitations. In SHARE, the size of the Odds Ratios (ORs) in the different countries was homogeneous, while in EHIS the size of the ORs varied more strongly. For the predictive validity, subjects were followed over time (4 studies, of which one international). GALI proved, both in national and international data, to be a consistent predictor of future health outcomes, both in terms of mortality and health care expenditure. As predictors of mortality, the two distinct health concepts, self-rated health and GALI, acted independently and complementarily of each other. The one reliability study identified reported sufficient reliability of GALI. GALI, as an inclusive one-question instrument, fits all conceptual characteristics specified for a global measure on participation restriction. In none of the studies included in the review there was

  6. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  7. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. A further three papers known to the authors are under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in terms of the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques and the blending methods used to combine satellite and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded
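
    The "spread at a particular space-time coordinate" can be quantified directly once the products are on a common grid. A minimal numpy sketch with synthetic stand-ins for the gridded products:

```python
import numpy as np

# (product, lat, lon): synthetic monthly rainfall fields from five products
rng = np.random.default_rng(1)
products = rng.gamma(shape=2.0, scale=40.0, size=(5, 90, 180))

spread = products.max(axis=0) - products.min(axis=0)   # range across products
rel_spread = spread / products.mean(axis=0)            # relative to ensemble mean
print(float(rel_spread.mean()))
```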

  8. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    Full Text Available A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however, they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to (i) cover data from a broad range of application domains, (ii) be flexible and extensible to combine many different complex data structures, (iii) include metadata and semantic definitions, (iv) include inferred information, (v) identify the original data source for integrated entities and (vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.

  9. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
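
    Merging several estimates, each with its own error estimate, is classically done with inverse-error-variance weights; the sketch below shows that principle for a single grid cell with illustrative numbers (it is not the exact GPCP merge algorithm):

```python
import numpy as np

est = np.array([110.0, 95.0, 102.0])   # microwave, infrared, gauge (mm/month)
err = np.array([20.0, 30.0, 10.0])     # corresponding 1-sigma error estimates

w = 1.0 / err**2                       # inverse-error-variance weights
merged = np.sum(w * est) / np.sum(w)
merged_err = np.sqrt(1.0 / np.sum(w))
print(round(merged, 1), round(merged_err, 1))
```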

  10. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Science.gov (United States)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by Electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the control of the prosthetic hand compared to Electromyograph (EMG) control. EMG has disadvantages for a person who has not used the relevant muscles for a long time, and also for people with degenerative issues due to age. Thus, the EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from a Brain Computer Interface (BCI) project and were already classified for open, close and combined movement operations. They served as the input to control the prosthetic hand through an interface between Microsoft Visual Studio and an Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in responding to the EEG datasets with an additional LiPo (Lithium Polymer) battery attached to the prosthetic. Some limitations were also identified in terms of the hand movements and the weight of the prosthetic, and suggestions for improvement are concluded in this paper. Overall, the objective of this paper was achieved, as the prosthetic hand was found to be feasible in operation utilizing the EEG datasets.
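
    The PC-to-Arduino link described can be approximated with a few lines of Python; the port name and command bytes below are hypothetical, and the classifier is assumed to emit one of the three movement labels per decision window:

```python
import serial  # pyserial

COMMANDS = {'open': b'O', 'close': b'C', 'combined': b'B'}

# The Arduino sketch is assumed to map each byte to a servo movement
with serial.Serial('/dev/ttyACM0', 9600, timeout=1) as arduino:
    for decision in ('open', 'close', 'combined'):   # classified EEG output
        arduino.write(COMMANDS[decision])
```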

  11. Psychometric properties of self-reported questionnaires for the evaluation of symptoms and functional limitations in individuals with rotator cuff disorders: a systematic review.

    Science.gov (United States)

    St-Pierre, Corinne; Desmeules, François; Dionne, Clermont E; Frémont, Pierre; MacDermid, Joy C; Roy, Jean-Sébastien

    2016-01-01

    To conduct a systematic review of the psychometric properties (reliability, validity and responsiveness) of self-report questionnaires used to assess symptoms and functional limitations of individuals with rotator cuff (RC) disorders. A systematic search in three databases (Cinahl, Medline and Embase) was conducted. Data extraction and critical methodological appraisal were performed independently by three raters using structured tools, and agreement was achieved by consensus. A descriptive synthesis was performed. One hundred and twenty articles reporting on 11 questionnaires were included. All questionnaires were highly reliable and responsive to change, and showed construct validity; seven questionnaires also showed known-group validity. The minimal detectable change ranged from 6.4% to 20.8% of total score; only two questionnaires (American Shoulder and Elbow Surgeon questionnaire [ASES] and Upper Limb Functional Index [ULFI]) had a measurement error below 10% of global score. Minimal clinically important differences were established for eight questionnaires, and ranged from 8% to 20% of total score. Overall, included questionnaires showed acceptable psychometric properties for individuals with RC disorders. The ASES and ULFI have the smallest absolute error of measurement, while the Western Ontario RC Index is one of the most responsive questionnaires for individuals suffering from RC disorders. All included questionnaires are reliable, valid and responsive for the evaluation of individuals with RC disorders. As all included questionnaires showed good psychometric properties for the targeted population, the choice should be made according to the purpose of the evaluation and to the construct being evaluated by the questionnaire. The WORC, a RC-specific questionnaire, appeared to be more responsive. It should therefore be used to evaluate change in time. If the evaluation is time-limited, shorter questionnaires or short versions should be considered (such as

  12. Passive Containment DataSet

    Science.gov (United States)

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system. This dataset is associated with the following publication: Grayman, W., R. Murray, and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  13. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves the computational process of finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing a known structure to apply it to a new dataset and predict its class. Various classification algorithms are used to classify data sets; they are based on different methods such as probability, decision trees, neural networks, nearest neighbours, Boolean and fuzzy logic, and kernel-based methods. In this paper, we apply three diverse classification algorithms to ten datasets, selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistic, mean absolute error, relative absolute error, and ROC area. A comparative analysis has been carried out using the measures of accuracy, precision, and F-measure, and we specify features and limitations of the classification algorithms for datasets of diverse nature.
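
    The comparison procedure generalises readily outside Weka; a minimal scikit-learn sketch with three algorithm families (probabilistic, tree-based, nearest-neighbour) and several of the measures named above, on a stand-in dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import (accuracy_score, cohen_kappa_score, f1_score,
                             mean_absolute_error, precision_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

for clf in (GaussianNB(), DecisionTreeClassifier(random_state=0),
            KNeighborsClassifier()):
    pred = clf.fit(Xtr, ytr).predict(Xte)
    print(type(clf).__name__,
          f"acc={accuracy_score(yte, pred):.3f}",
          f"prec={precision_score(yte, pred):.3f}",
          f"F={f1_score(yte, pred):.3f}",
          f"kappa={cohen_kappa_score(yte, pred):.3f}",
          f"MAE={mean_absolute_error(yte, pred):.3f}")
```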

  14. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Full Text Available Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  15. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

    The aim of this paper is to identify areas of potential improvement of the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets have a very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of Life-Cycle Inventories have been derived. Moreover, these results confirm the quality of the electricity-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the dataset modelling. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  16. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins, Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    Full text: The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation. The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); annual temperature (commences 1910); daily temperature (commences 1910, with limited station coverage pre-1957); twice-daily dewpoint/relative humidity (commences 1957); monthly pan evaporation (commences 1970); and cloud amount (commences 1957) (Jovanovic et al. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  17. The CMS dataset bookkeeping service

    Science.gov (United States)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  18. The CMS dataset bookkeeping service

    Energy Technology Data Exchange (ETDEWEB)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  19. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, Command Line, and a Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems

  20. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels, and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via Python API, command-line, and Discovery web page interfaces. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  1. Evaluation of measurement properties of self-administered PROMs aimed at patients with non-specific shoulder pain and "activity limitations": a systematic review.

    Science.gov (United States)

    Thoomes-de Graaf, M; Scholten-Peeters, G G M; Schellingerhout, J M; Bourne, A M; Buchbinder, R; Koehorst, M; Terwee, C B; Verhagen, A P

    2016-09-01

    To critically appraise and compare the measurement properties of self-administered patient-reported outcome measures (PROMs) focussing on the shoulder, assessing "activity limitations." Systematic review. The study population had to consist of patients with shoulder pain. We excluded postoperative patients or patients with generic diseases. The methodological quality of the selected studies and the results of the measurement properties were critically appraised and rated using the COSMIN checklist. Out of a total of 3427 unique hits, 31 articles, evaluating 7 different questionnaires, were included. The SPADI is the most frequently evaluated PROM and its measurement properties seem adequate, apart from a lack of information regarding its measurement error and content validity. For English, Norwegian, and Turkish users, we recommend using the SPADI. Dutch users could use either the SDQ or the SST. In German, we recommend the DASH. In Tamil, Slovene, Spanish, and Danish, the evaluated PROMs were not yet of acceptable validity. None of these PROMs showed strong positive evidence for all measurement properties. We propose to develop a new shoulder PROM focused on activity limitations, taking new knowledge and techniques into account.

  2. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  3. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Full Text Available Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe the data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle mixed data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets. However, it has limitations on processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets and an adaptive AP clustering algorithm for clustering them. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
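    As a rough illustration of the exemplar-based approach described above, the sketch below clusters a toy mixed dataset with scikit-learn's AffinityPropagation and a precomputed Gower-style similarity. The similarity measure is a generic stand-in, not the one proposed in the paper (which the abstract does not specify).

    # Affinity propagation on mixed numeric/categorical data with a
    # precomputed Gower-style similarity (negative normalized distance).
    import numpy as np
    from sklearn.cluster import AffinityPropagation

    def mixed_similarity(X_num, X_cat):
        """Negative sum of range-scaled numeric differences and
        categorical mismatch counts."""
        n = X_num.shape[0]
        col_range = X_num.max(axis=0) - X_num.min(axis=0)
        col_range[col_range == 0] = 1.0
        sim = np.zeros((n, n))
        for i in range(n):
            num_d = np.abs(X_num - X_num[i]) / col_range   # each in [0, 1]
            cat_d = (X_cat != X_cat[i]).astype(float)      # 0/1 mismatches
            sim[i] = -(num_d.sum(axis=1) + cat_d.sum(axis=1))
        return sim

    # Toy mixed dataset: two numeric features and one categorical feature.
    X_num = np.array([[1.0, 2.0], [1.1, 1.9], [8.0, 9.0], [8.2, 9.1]])
    X_cat = np.array([["a"], ["a"], ["b"], ["b"]])

    S = mixed_similarity(X_num, X_cat)
    labels = AffinityPropagation(affinity="precomputed", random_state=0).fit_predict(S)
    print(labels)  # two clusters expected, e.g. [0 0 1 1]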

  4. Satellite-Based Precipitation Datasets

    Science.gov (United States)

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.
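    The grid dimensions implied by the quoted resolutions are easy to verify; a quick arithmetic check, assuming full 360° longitude coverage:

    # Grid sizes implied by product resolution and latitude band.
    def grid_shape(res_deg, lat_band_deg):
        # round() guards against float artifacts such as 360/0.1 = 3599.999...
        nlon = round(360 / res_deg)
        nlat = round(2 * lat_band_deg / res_deg)
        return nlon, nlat

    print("TMPA :", grid_shape(0.25, 50))  # (1440, 400) cells, 3-hourly
    print("IMERG:", grid_shape(0.1, 60))   # (3600, 1200) cells, half-hourly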

  5. Communication and support from health-care professionals to families, with dependent children, following the diagnosis of parental life-limiting illness: A systematic review.

    Science.gov (United States)

    Fearnley, Rachel; Boland, Jason W

    2017-03-01

    Communication between parents and their children about parental life-limiting illness is stressful. Parents want support from health-care professionals; however, the extent of this support is not known. Awareness of families' needs would help ensure appropriate support. To find the current literature exploring (1) how parents with a life-limiting illness, who have dependent children, perceive health-care professionals' communication with them about the illness, diagnosis and treatments, including how social, practical and emotional support is offered to them and (2) how this contributes to the parents' feelings of supporting their children. A systematic literature review and narrative synthesis. Embase, MEDLINE, PsycINFO, CINAHL and ASSIA ProQuest were searched in November 2015 for studies assessing communication between health-care professionals and parents about how to talk with their children about the parent's illness. Of 1342 records identified, five qualitative studies met the inclusion criteria (55 ill parents, 11 spouses/carers, 26 children and 16 health-care professionals). Parents wanted information from health-care professionals about how to talk to their children about the illness; this was not routinely offered. Children also want to talk with a health-care professional about their parents' illness. Health-care professionals are concerned that conversations with parents and their children will be too difficult and time-consuming. Parents with a life-limiting illness want support from their health-care professionals about how to communicate with their children about the illness. Their children look to health-care professionals for information about their parent's illness. Health-care professionals have an important role, but appear reluctant to address these concerns because of fears of insufficient time and expertise.

  6. Vitrectomy with internal limiting membrane peeling versus inverted internal limiting membrane flap technique for macular hole-induced retinal detachment: a systematic review of literature and meta-analysis.

    Science.gov (United States)

    Yuan, Jing; Zhang, Ling-Lin; Lu, Yu-Jie; Han, Meng-Yao; Yu, Ai-Hua; Cai, Xiao-Jun

    2017-11-28

    To evaluate the effects of vitrectomy with internal limiting membrane (ILM) peeling versus vitrectomy with the inverted ILM flap technique for macular hole-induced retinal detachment (MHRD). PubMed, the Cochrane Library, and Embase were systematically searched for studies that compared ILM peeling with the inverted ILM flap technique for macular hole-induced retinal detachment. The primary outcomes are the rate of retinal reattachment and the rate of macular hole closure 6 months after initial surgery; the secondary outcome is the postoperative best-corrected visual acuity (BCVA) 6 months after initial surgery. Four studies that included 98 eyes were selected. All the included studies were retrospective comparative studies. The preoperative best-corrected visual acuity was equal between the ILM peeling and inverted ILM flap technique groups. The rates of retinal reattachment (odds ratio (OR) = 0.14, 95% confidence interval (CI): 0.03 to 0.69; P = 0.02) and macular hole closure (OR = 0.06, 95% CI: 0.02 to 0.19; P < 0.05) were significantly lower with ILM peeling than with the inverted ILM flap technique. However, there was no statistically significant difference in postoperative best-corrected visual acuity (mean difference (MD) 0.18 logarithm of the minimum angle of resolution; 95% CI -0.06 to 0.43; P = 0.14) between the two surgery groups. Compared with ILM peeling, vitrectomy with the inverted ILM flap technique resulted in significantly higher rates of retinal reattachment and macular hole closure, but did not appear to improve postoperative best-corrected visual acuity.
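    For readers unfamiliar with how pooled odds ratios of this kind are produced, the sketch below implements DerSimonian-Laird random-effects pooling on the log-odds scale. The study inputs are hypothetical, not the four studies in this review.

    # DerSimonian-Laird random-effects pooling of odds ratios (log scale).
    import math

    # (log OR, variance of log OR) for three hypothetical studies
    studies = [(math.log(0.20), 0.30), (math.log(0.10), 0.45), (math.log(0.05), 0.50)]

    w = [1 / v for _, v in studies]
    ybar = sum(wi * y for (y, _), wi in zip(studies, w)) / sum(w)
    Q = sum(wi * (y - ybar) ** 2 for (y, _), wi in zip(studies, w))
    k = len(studies)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (Q - (k - 1)) / c)  # between-study variance estimate

    w_star = [1 / (v + tau2) for _, v in studies]
    pooled = sum(wi * y for (y, _), wi in zip(studies, w_star)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    lo, hi = math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)
    print(f"pooled OR = {math.exp(pooled):.2f}, 95% CI ({lo:.2f}, {hi:.2f})")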

  7. SIS - Annual Catch Limit

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Annual Catch Limit (ACL) dataset within the Species Information System (SIS) contains information and data related to management reference points and catch data.

  8. Process mining in oncology using the MIMIC-III dataset

    Science.gov (United States)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology, with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option, and this paper describes the potential to use MIMIC-III for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them have used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining, and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
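    A minimal sketch of the first step of such an analysis: converting a MIMIC-III-style event table into per-patient activity traces, the raw material for process discovery. Column names and values below are illustrative, not the actual MIMIC-III schema.

    # Turn an event table into per-patient activity traces with pandas.
    import pandas as pd

    events = pd.DataFrame({
        "subject_id": [1, 1, 1, 2, 2],
        "activity":   ["admission", "chemotherapy", "discharge",
                       "admission", "discharge"],
        "charttime":  pd.to_datetime([
            "2130-01-01", "2130-01-03", "2130-01-10",
            "2131-05-02", "2131-05-04"]),
    })

    traces = (events.sort_values(["subject_id", "charttime"])
                    .groupby("subject_id")["activity"]
                    .apply(list))
    print(traces)
    # Variant frequencies: how many patients follow each pathway.
    print(traces.apply(tuple).value_counts())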

  9. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2012-01-01

      Introduction The first part of the year presented an important test for the new Physics Performance and Dataset (PPD) group (cf. its mandate: http://cern.ch/go/8f77). The activity was focused on the validation of the new releases meant for the Monte Carlo (MC) production and the data-processing in 2012 (CMSSW 50X and 52X), and on the preparation of the 2012 operations. In view of the Chamonix meeting, the PPD and physics groups worked to understand the impact of the higher pile-up scenario on some of the flagship Higgs analyses to better quantify the impact of the high luminosity on the CMS physics potential. A task force is working on the optimisation of the reconstruction algorithms and on the code to cope with the performance requirements imposed by the higher event occupancy as foreseen for 2012. Concerning the preparation for the analysis of the new data, a new MC production has been prepared. The new samples, simulated at 8 TeV, are already being produced and the digitisation and recons...

  10. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The PPD activities, in the first part of 2013, have been focused mostly on the final physics validation and preparation for the data reprocessing of the full 8 TeV datasets with the latest calibrations. These samples will be the basis for the preliminary results for summer 2013 but most importantly for the final publications on the 8 TeV Run 1 data. The reprocessing involves also the reconstruction of a significant fraction of “parked data” that will allow CMS to perform a whole new set of precision analyses and searches. In this way the CMSSW release 53X is becoming the legacy release for the 8 TeV Run 1 data. The regular operation activities have included taking care of the prolonged proton-proton data taking and the run with proton-lead collisions that ended in February. The DQM and Data Certification team has deployed a continuous effort to promptly certify the quality of the data. The luminosity-weighted certification efficiency (requiring all sub-detectors to be certified as usab...

  11. Training in time-limited dynamic psychotherapy: A systematic comparison of pre- and post-training cases treated by one therapist.

    Science.gov (United States)

    Anderson, Timothy; Strupp, Hans H

    2015-01-01

    This qualitative study systematically compared cases treated by the same therapist in order to understand the group comparison findings of a larger study on training of experienced therapists (the "Vanderbilt II" psychotherapy project). The therapist, Dr C., was selected based on the therapist's overall treatment successes. His two patients were selected based on their outcomes and the relative training cohort from which they were drawn: a case with successful outcome from the pre-training cohort and a case of negligible improvement from the post-training cohort. Dr C. demonstrated a variety of interpersonal skills throughout his pre-training case, though there was also poor interpersonal process throughout. However, in the second case he had considerable difficulty in adapting his typical therapeutic approach to the requirements of the time-limited dynamic psychotherapy (TLDP) manual, even while appearing to work hard to find ways to use the manual. Dr C.'s spontaneity and his unique set of interpersonal skills may have enhanced his initial rapport and alliance building with clients, and yet may not have interfaced well with TLDP. His unique interpersonal skills also may have contributed to problems of interpersonal process. Future research may benefit from examining the interaction between therapist interpersonal skills and the implementation of the treatment manual.

  12. Poor quality of external validity reporting limits generalizability of overweight and/or obesity lifestyle prevention interventions in young adults: a systematic review.

    Science.gov (United States)

    Partridge, S R; Juan, S J-H; McGeechan, K; Bauman, A; Allman-Farinelli, M

    2015-01-01

    Young adulthood is a high-risk life stage for weight gain. Evidence is needed to translate behavioural approaches into community practice to prevent weight gain in young adults. This systematic review assessed the effectiveness and reporting of external validity components in prevention interventions. The search was limited to randomized controlled trial (RCT) lifestyle interventions for the prevention of weight gain in young adults (18-35 years). Mean body weight and/or body mass index (BMI) change were the primary outcomes. External validity, quality assessment and risk of bias tools were applied to all studies. Twenty-one RCTs were identified through 14 major electronic databases. Over half of the studies were effective in the short term for significantly reducing body weight and/or BMI; however, few showed long-term maintenance. All studies lacked full reporting on external validity components. Description of the intervention components and participant attrition rates were reported by most studies. However, few studies reported the representativeness of participants, effectiveness of recruitment methods, process evaluation detail or costs. It is unclear from the information reported how to implement the interventions into community practice. Integrated reporting of intervention effectiveness and enhanced reporting of external validity components are needed for the translation and potential upscale of prevention strategies. © 2014 World Obesity.

  13. Eddington-limited X-Ray Bursts as Distance Indicators. I. Systematic Trends and Spherical Symmetry in Bursts from 4U 1728-34

    Science.gov (United States)

    Galloway, Duncan K.; Psaltis, Dimitrios; Chakrabarty, Deepto; Muno, Michael P.

    2003-06-01

    We investigate the limitations of thermonuclear X-ray bursts as a distance indicator for the weakly magnetized accreting neutron star 4U 1728-34. We measured the unabsorbed peak flux of 81 bursts in public data from the Rossi X-Ray Timing Explorer (RXTE). The distribution of peak fluxes was bimodal: 66 bursts exhibited photospheric radius expansion (presumably reaching the local Eddington limit) and were distributed about a mean bolometric flux of 9.2×10⁻⁸ erg cm⁻² s⁻¹, while the remaining (non-radius expansion) bursts reached 4.5×10⁻⁸ erg cm⁻² s⁻¹, on average. The peak fluxes of the radius expansion bursts were not constant, exhibiting a standard deviation of 9.4% and a total variation of 46%. These bursts showed significant correlations between their peak flux and the X-ray colors of the persistent emission immediately prior to the burst. We also found evidence for quasi-periodic variation of the peak fluxes of radius expansion bursts, with a timescale of ≈40 days. The persistent flux observed with RXTE/ASM over 5.8 yr exhibited quasi-periodic variability on a similar timescale. We suggest that these variations may have a common origin in reflection from a warped accretion disk. Once the systematic variation of the peak burst fluxes is subtracted, the residual scatter is only ≈3%, roughly consistent with the measurement uncertainties. The narrowness of this distribution strongly suggests that (1) the radiation from the neutron star atmosphere during radius expansion episodes is nearly spherically symmetric and (2) the radius expansion bursts reach a common peak flux that may be interpreted as a standard candle intensity. Adopting the minimum peak flux for the radius expansion bursts as the Eddington flux limit, we derive a distance for the source of 4.4-4.8 kpc (assuming R_NS = 10 km), with the uncertainty arising from the probable range of the neutron star mass M_NS = 1.4-2 M_⊙.
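    The distance estimate follows from treating the peak flux of radius-expansion bursts as the Eddington limit, d = sqrt(L_Edd / (4πF)). A back-of-envelope check is sketched below; the Eddington luminosities used are textbook values for a 1.4 M_⊙ neutron star and ignore the gravitational redshift and atmosphere corrections applied in the paper, so the result only brackets the quoted 4.4-4.8 kpc.

    # Standard-candle distance: d = sqrt(L_Edd / (4 pi F_peak)).
    import math

    KPC_CM = 3.086e21   # centimetres per kiloparsec
    F_PEAK = 9.2e-8     # mean peak flux of radius-expansion bursts (erg cm^-2 s^-1)

    def distance_kpc(l_edd, f_peak=F_PEAK):
        return math.sqrt(l_edd / (4 * math.pi * f_peak)) / KPC_CM

    # The Eddington luminosity scales as 1/(1+X) with hydrogen mass fraction X.
    print(f"H-rich  (X=0.7): {distance_kpc(1.8e38):.1f} kpc")
    print(f"He-rich (X=0.0): {distance_kpc(3.5e38):.1f} kpc")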

  14. An integrated dataset for in silico drug discovery

    Directory of Open Access Journals (Sweden)

    Cockell Simon J

    2010-12-01

    Full Text Available Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition to treat a second disease than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications of existing compounds.

  15. Assessment of Global Cloud Datasets from Satellites: Project and Database Initiated by the GEWEX Radiation Panel

    Science.gov (United States)

    Stubenrauch, C. J.; Rossow, W. B.; Kinne, S.; Ackerman, S.; Cesana, G.; Chepfer, H.; Getzewich, B.; Di Girolamo, L.; Guignard, A.; Heidinger, A.

    2012-01-01

    Clouds cover about 70% of the Earth's surface and play a dominant role in the energy and water cycle of our planet. Only satellite observations provide a continuous survey of the state of the atmosphere over the whole globe and across the wide range of spatial and temporal scales that comprise weather and climate variability. Satellite cloud data records now exceed more than 25 years in length. However, climatologies compiled from different satellite datasets can exhibit systematic biases. Questions therefore arise as to the accuracy and limitations of the various sensors. The Global Energy and Water cycle Experiment (GEWEX) Cloud Assessment, initiated in 2005 by the GEWEX Radiation Panel, provided the first coordinated intercomparison of publicly available, standard global cloud products (gridded, monthly statistics) retrieved from measurements of multi-spectral imagers (some with multiangle view and polarization capabilities), IR sounders and lidar. Cloud properties under study include cloud amount, cloud height (in terms of pressure, temperature or altitude), cloud radiative properties (optical depth or emissivity), cloud thermodynamic phase and bulk microphysical properties (effective particle size and water path). Differences in average cloud properties, especially in the amount of high-level clouds, are mostly explained by the inherent instrument measurement capability for detecting and/or identifying optically thin cirrus, especially when overlying low-level clouds. The study of long-term variations with these datasets requires consideration of many factors. A monthly, gridded database, in common format, facilitates further assessments, climate studies and the evaluation of climate models.

  16. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set of equilibrium datasets is path-connected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non-collinear.

  17. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  18. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  19. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  20. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  1. The influence of systematic pulse-limited physical exercise on the parameters of the cardiovascular system in patients over 65 years of age.

    Science.gov (United States)

    Chomiuk, Tomasz; Folga, Andrzej; Mamcarz, Artur

    2013-04-20

    The influence of physical exercise on the parameters of the cardiovascular system of elderly persons has not been sufficiently investigated yet. The aim of the study was to assess the influence of regular 6-week physical exercise using the Nordic walking (NW) method in a group of elderly persons on their physical performance and regulation of selected parameters assessing the cardiovascular system. Fifty patients over 65 years of age participated in the study. The study encompassed: medical interview, physical examination, resting ECG, spiroergometry examination, 6MWT (6-minute walk test) and 24-hour ambulatory blood pressure monitoring (ABPM). During the exercise programme, the pulse was monitored using pulsometers. After the completion of the training, check-up tests assessing the same parameters were performed. The control group consisted of 18 persons over 65 years of age with similar cardiovascular problems. In the test group, duration of the physical effort increased by 1.02 min (p = 0.0001), the maximum load increased by 10.68 W (p = 0.0001), values of VO2max by 2.10 (p = 0.0218), distance improved in 6MWT by 75.04 m (p = 0.00001), systolic blood pressure decreased by 5.50 mm Hg (p = 0.035) and diastolic blood pressure by 3.50 mm Hg (p = 0.054) as compared to the control group. Systematic NW physical exercise limited by the pulse had a beneficial effect on the physical performance of elderly persons as assessed with main parameters. A short 6-week programme of endurance exercises had a hypotensive effect in elderly persons over 65 years of age.

  2. A Systematic Review of Non-Traumatic Spinal Cord Injuries in Sub-Saharan Africa and a Proposed Diagnostic Algorithm for Resource-Limited Settings

    Directory of Open Access Journals (Sweden)

    Abdu Kisekka Musubire

    2017-12-01

    Full Text Available Background: Non-traumatic myelopathy is common in Africa and there are geographic differences in etiology. Clinical management is challenging due to the broad differential diagnosis and the lack of diagnostics. The objective of this systematic review is to determine the most common etiologies of non-traumatic myelopathy in sub-Saharan Africa to inform a regionally appropriate diagnostic algorithm. Methods: We conducted a systematic review searching the Medline and Embase databases using the following search terms: "non-traumatic spinal cord injury" or "myelopathy", with limits to epidemiology or etiologies and sub-Saharan Africa. We described the frequencies of the different etiologies and proposed a diagnostic algorithm based on the most common diagnoses. Results: We identified 19 studies, all performed at tertiary institutions; 15 were retrospective and 13 were published in the era of the HIV epidemic. Compressive bone lesions accounted for more than 48% of the cases; a majority were Pott's disease and metastatic disease. No diagnosis was identified in up to 30% of cases in most studies; in particular, definitive diagnoses of non-compressive lesions were rare and a majority were clinical diagnoses of transverse myelitis and HIV myelopathy. Age and HIV were major determinants of etiology. Conclusion: Compressive myelopathies represent a majority of non-traumatic myelopathies in sub-Saharan Africa, and most were due to Pott's disease. Non-compressive myelopathies have not been well defined and need further research in Africa. We recommend a standardized approach to management of non-traumatic myelopathy focused on identifying treatable conditions with tests widely available in low-resource settings.

  3. EFFECTS OF INTERNAL LIMITING MEMBRANE PEELING COMBINED WITH REMOVAL OF IDIOPATHIC EPIRETINAL MEMBRANE: A Systematic Review of Literature and Meta-Analysis.

    Science.gov (United States)

    Azuma, Kunihiro; Ueta, Takashi; Eguchi, Shuichiro; Aihara, Makoto

    2017-10-01

    To evaluate the effects on postoperative prognosis of internal limiting membrane (ILM) peeling in conjunction with removal of idiopathic epiretinal membranes (ERMs). MEDLINE, Cochrane Central Register of Controlled Trials (CENTRAL), and EMBASE were systematically searched for studies that compared ILM peeling with no ILM peeling in surgery to remove idiopathic ERM. Outcome measures were best-corrected visual acuity, central macular thickness, and ERM recurrence. Studies that compared ILM peeling with no ILM peeling for the treatment of idiopathic ERM were selected. Sixteen studies that included 1,286 eyes were selected. All the included studies were retrospective or prospective comparative studies; no randomized controlled study was identified. Baseline preoperative best-corrected visual acuity and central macular thickness were equal between the ILM peeling and no ILM peeling groups. Postoperatively, there was no statistically significant difference in best-corrected visual acuity (mean difference 0.01 logarithm of the minimum angle of resolution [equivalent to 0.5 Early Treatment Diabetic Retinopathy Study letter]; 95% CI -0.05 to 0.07 [-3.5 to 2.5 Early Treatment Diabetic Retinopathy Study letters]; P = 0.83) or central macular thickness (mean difference 13.13 μm; 95% CI -10.66 to 36.93; P = 0.28). However, the recurrence rate of ERM was significantly lower with ILM peeling than with no ILM peeling (odds ratio 0.25; 95% CI 0.12-0.49; P < 0.05). ILM peeling in vitrectomy for idiopathic ERM could result in a significantly lower ERM recurrence rate, but it does not significantly influence postoperative best-corrected visual acuity and central macular thickness.

  4. A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery.

    Science.gov (United States)

    Ahmidi, Narges; Tao, Lingling; Sefati, Shahin; Gao, Yixin; Lea, Colin; Haro, Benjamin Bejar; Zappella, Luca; Khudanpur, Sanjeev; Vidal, Rene; Hager, Gregory D

    2017-09-01

    State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. In this paper, we address two major problems for surgical data analysis: First, lack of uniform-shared datasets and benchmarks, and second, lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model, sparse hidden Markov model (HMM), Markov semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based ones that aim to classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
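    Of the six techniques, the hidden Markov model family is the most classical. A minimal sketch of per-gesture HMM classification (one GaussianHMM per gesture class, label by maximum log-likelihood) is shown below using the hmmlearn package; the data are synthetic stand-ins for JIGSAWS kinematics, not the benchmark itself.

    # One GaussianHMM per gesture class; classify a sequence by the model
    # with the highest log-likelihood.
    import numpy as np
    from hmmlearn.hmm import GaussianHMM

    rng = np.random.default_rng(0)

    def make_sequences(mean, n_seq=5, length=40, dim=3):
        return [mean + rng.normal(size=(length, dim)) for _ in range(n_seq)]

    train = {"reach": make_sequences(0.0), "cut": make_sequences(3.0)}

    models = {}
    for gesture, seqs in train.items():
        X = np.concatenate(seqs)
        lengths = [len(s) for s in seqs]
        m = GaussianHMM(n_components=3, covariance_type="diag",
                        n_iter=20, random_state=0)
        m.fit(X, lengths)
        models[gesture] = m

    test = 3.0 + rng.normal(size=(40, 3))  # an unlabeled "cut"-like sequence
    scores = {g: m.score(test) for g, m in models.items()}
    print(max(scores, key=scores.get))     # expected: cut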

  5. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Science.gov (United States)

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  6. Synthetic and Empirical Capsicum Annuum Image Dataset

    NARCIS (Netherlands)

    Barth, R.

    2016-01-01

    This dataset consists of per-pixel annotated synthetic (10500) and empirical images (50) of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models to generate the synthetic images are included. The aim of the datasets is to

  7. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    As more and more advertisements crowd radio broadcasts, an audio advertisement dataset is needed for analyzing and classifying them. This paper presents a method for establishing a complete audio advertisement dataset. The dataset is divided into four different kinds of advertisements. Each sample is provided in *.wav format and annotated with a txt file containing its file name, sampling frequency, channel number, broadcast time and class. The soundness of this class structure is demonstrated by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.

  8. A novel dataset for real-life evaluation of facial expression recognition methodologies

    NARCIS (Netherlands)

    Siddiqi, Muhammad Hameed; Ali, Maqbool; Idris, Muhammad; Banos Legran, Oresti; Lee, Sungyoung; Choo, Hyunseung

    2016-01-01

    One limitation seen among most of the previous methods is that they were evaluated under settings that are far from real-life scenarios. The reason is that the existing facial expression recognition (FER) datasets are mostly pose-based and assume a predefined setup. The expressions in these datasets

  9. Low Tidal Volume versus Non-Volume-Limited Strategies for Patients with Acute Respiratory Distress Syndrome. A Systematic Review and Meta-Analysis.

    Science.gov (United States)

    Walkey, Allan J; Goligher, Ewan C; Del Sorbo, Lorenzo; Hodgson, Carol L; Adhikari, Neill K J; Wunsch, Hannah; Meade, Maureen O; Uleryk, Elizabeth; Hess, Dean; Talmor, Daniel S; Thompson, B Taylor; Brower, Roy G; Fan, Eddy

    2017-10-01

    Trials investigating use of lower tidal volumes and inspiratory pressures for patients with acute respiratory distress syndrome (ARDS) have shown mixed results. To compare clinical outcomes of mechanical ventilation strategies that limit tidal volumes and inspiratory pressures (LTV) to strategies with tidal volumes of 10 to 15 ml/kg among patients with ARDS. This is a systematic review and meta-analysis of clinical trials investigating LTV mechanical ventilation strategies. We used random effects models to evaluate the effect of LTV on 28-day mortality, organ failure, ventilator-free days, barotrauma, oxygenation, and ventilation. Our primary analysis excluded trials for which the LTV strategy was combined with the additional strategy of higher positive end-expiratory pressure (PEEP), but these trials were included in a stratified sensitivity analysis. We performed metaregression of tidal volume gradient achieved between intervention and control groups on mortality effect estimates. We used Grading of Recommendations Assessment, Development, and Evaluation methodology to determine the quality of evidence. Seven randomized trials involving 1,481 patients met eligibility criteria for this review. Mortality was not significantly lower for patients receiving an LTV strategy (33.6%) as compared with control strategies (40.4%) (relative risk [RR], 0.87; 95% confidence interval [CI], 0.70-1.08; heterogeneity statistic I² = 46%), nor did an LTV strategy significantly decrease barotrauma or ventilator-free days when compared with a lower PEEP strategy. Quality of evidence for clinical outcomes was downgraded for imprecision. Metaregression showed a significant inverse association between larger tidal volume gradient between LTV and control groups and log odds ratios for mortality (β, -0.1587; P = 0.0022). Sensitivity analysis including trials that protocolized an LTV/high PEEP cointervention showed lower mortality associated with LTV (nine trials and 1

  10. Limited evidence on persistence with anticoagulants, and its effect on the risk of recurrence of venous thromboembolism: a systematic review of observational studies

    Directory of Open Access Journals (Sweden)

    Vora P

    2016-08-01

    Full Text Available Pareen Vora, Montse Soriano-Gabarró, Kiliana Suzart, Gunnar Persson Brobert Department of Epidemiology, Bayer Pharma AG, Berlin, Germany Purpose: The risk of venous thromboembolism (VTE) recurrence is high following an initial VTE event, and it persists over time. This recurrence risk decreases rapidly after starting anticoagulation treatment and is reduced by ~80%–90% with prolonged anticoagulation. Nonpersistence with anticoagulants could lead to increased risk of VTE recurrence. This systematic review aimed to estimate persistence at 3, 6, and 12 months with anticoagulants in patients with VTE, and to evaluate the risk of VTE recurrence in nonpersistent patients. Methods: PubMed and Embase® were searched up to May 3, 2014 and the search results updated to May 31, 2015. Studies involving patients with VTE aged ≥18 years, treatment with anticoagulants intended for 3 months or more, and reporting data for persistence were included. Proportions were transformed using the Freeman–Tukey double arcsine transformation and pooled using the DerSimonian–Laird random-effects approach. Results: In total, 12 observational studies (7 of 12 were conference abstracts) were included in the review. The total number of patients meta-analyzed to estimate persistence at 3, 6, and 12 months was 71,969 patients, 58,940 patients, and 68,235 patients, respectively. The estimated persistence for 3, 6, and 12 months of therapy was 83% (95% confidence interval [CI], 78–87; I²=99.3%), 62% (95% CI, 58–66; I²=98.1%), and 31% (95% CI, 22–40; I²=99.8%), respectively. Only two studies reported the risk of VTE recurrence based on nonpersistence – one at 3 months and the other at 12 months. Conclusion: Limited evidence showed that persistence was suboptimal, with an estimated 17% of patients being nonpersistent with anticoagulants in the crucial first 3 months. Persistence declined over 6 and 12 months

  11. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    OpenAIRE

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    A common approach for the analysis of anatomical variability relies on the estimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping (LDDMM) is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  12. The Kinetics Human Action Video Dataset

    OpenAIRE

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  13. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large scale hydrological models not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large scale datasets is challenging for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo) which was designed with the aim to drive a large scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains calculated radiation, which is calculated by using a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as

  14. Bias correction of surface downwelling longwave and shortwave radiation for the EWEMBI dataset

    Science.gov (United States)

    Lange, Stefan

    2018-05-01

    Many meteorological forcing datasets include bias-corrected surface downwelling longwave and shortwave radiation (rlds and rsds). Methods used for such bias corrections range from multi-year monthly mean value scaling to quantile mapping at the daily timescale. An additional downscaling is necessary if the data to be corrected have a higher spatial resolution than the observational data used to determine the biases. This was the case when EartH2Observe (E2OBS; Calton et al., 2016) rlds and rsds were bias-corrected using more coarsely resolved Surface Radiation Budget (SRB; Stackhouse Jr. et al., 2011) data for the production of the meteorological forcing dataset EWEMBI (Lange, 2016). This article systematically compares various parametric quantile mapping methods designed specifically for this purpose, including those used for the production of EWEMBI rlds and rsds. The methods vary in the timescale at which they operate, in their way of accounting for physical upper radiation limits, and in their approach to bridging the spatial resolution gap between E2OBS and SRB. It is shown how temporal and spatial variability deflation related to bilinear interpolation and other deterministic downscaling approaches can be overcome by downscaling the target statistics of quantile mapping from the SRB to the E2OBS grid such that the sub-SRB-grid-scale spatial variability present in the original E2OBS data is retained. Cross validations at the daily and monthly timescales reveal that it is worthwhile to take empirical estimates of physical upper limits into account when adjusting either radiation component and that, overall, bias correction at the daily timescale is more effective than bias correction at the monthly timescale if sampling errors are taken into account.

  15. Bias correction of surface downwelling longwave and shortwave radiation for the EWEMBI dataset

    Directory of Open Access Journals (Sweden)

    S. Lange

    2018-05-01

    Full Text Available Many meteorological forcing datasets include bias-corrected surface downwelling longwave and shortwave radiation (rlds and rsds). Methods used for such bias corrections range from multi-year monthly mean value scaling to quantile mapping at the daily timescale. An additional downscaling is necessary if the data to be corrected have a higher spatial resolution than the observational data used to determine the biases. This was the case when EartH2Observe (E2OBS; Calton et al., 2016) rlds and rsds were bias-corrected using more coarsely resolved Surface Radiation Budget (SRB; Stackhouse Jr. et al., 2011) data for the production of the meteorological forcing dataset EWEMBI (Lange, 2016). This article systematically compares various parametric quantile mapping methods designed specifically for this purpose, including those used for the production of EWEMBI rlds and rsds. The methods vary in the timescale at which they operate, in their way of accounting for physical upper radiation limits, and in their approach to bridging the spatial resolution gap between E2OBS and SRB. It is shown how temporal and spatial variability deflation related to bilinear interpolation and other deterministic downscaling approaches can be overcome by downscaling the target statistics of quantile mapping from the SRB to the E2OBS grid such that the sub-SRB-grid-scale spatial variability present in the original E2OBS data is retained. Cross validations at the daily and monthly timescales reveal that it is worthwhile to take empirical estimates of physical upper limits into account when adjusting either radiation component and that, overall, bias correction at the daily timescale is more effective than bias correction at the monthly timescale if sampling errors are taken into account.
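    For orientation, the sketch below shows the core of quantile mapping with a physical upper limit, in a simple empirical (non-parametric) form rather than the parametric variants compared in the article; the data and the limit value are synthetic.

    # Empirical quantile mapping with an upper physical limit.
    import numpy as np

    def quantile_map(model, obs, target, upper_limit=None):
        """Map `target` values through the model->observation quantile relation."""
        q = np.linspace(0.0, 1.0, 101)
        model_q = np.quantile(model, q)
        obs_q = np.quantile(obs, q)
        corrected = np.interp(target, model_q, obs_q)
        if upper_limit is not None:
            corrected = np.minimum(corrected, upper_limit)  # e.g. clear-sky rsds
        return corrected

    rng = np.random.default_rng(1)
    obs = rng.gamma(4.0, 40.0, size=5000)    # pseudo-observed rsds (W m^-2)
    model = rng.gamma(4.0, 55.0, size=5000)  # model values with a bright bias
    corrected = quantile_map(model, obs, model, upper_limit=1000.0)
    print(f"model {model.mean():.0f}, obs {obs.mean():.0f}, corrected {corrected.mean():.0f}")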

  16. Evaluation of Uncertainty in Precipitation Datasets for New Mexico, USA

    Science.gov (United States)

    Besha, A. A.; Steele, C. M.; Fernald, A.

    2014-12-01

    Climate change, population growth and other factors are endangering water availability and sustainability in semiarid/arid areas, particularly in the southwestern United States. Wide spatial and temporal coverage of precipitation measurements is key for regional water budget analysis and hydrological operations, which are themselves valuable tools for water resource planning and management. Rain gauge measurements are usually reliable and accurate at a point. They measure rainfall continuously, but spatial sampling is limited. Ground-based radar and satellite remotely sensed precipitation have wide spatial and temporal coverage. However, these measurements are indirect and subject to errors because of equipment, meteorological variability, the heterogeneity of the land surface itself and lack of regular recording. This study seeks to understand precipitation uncertainty and, in doing so, lessen uncertainty propagation into hydrological applications and operations. We reviewed, compared and evaluated the TRMM (Tropical Rainfall Measuring Mission) precipitation products, NOAA's (National Oceanic and Atmospheric Administration) Global Precipitation Climatology Centre (GPCC) monthly precipitation dataset, PRISM (Parameter elevation Regression on Independent Slopes Model) data and data from individual climate stations including Cooperative Observer Program (COOP), Remote Automated Weather Stations (RAWS), Soil Climate Analysis Network (SCAN) and Snowpack Telemetry (SNOTEL) stations. Though not yet finalized, this study finds that the uncertainty within precipitation datasets is influenced by regional topography, season, climate and precipitation rate. Ongoing work aims to further evaluate precipitation datasets based on the relative influence of these phenomena so that we can identify the optimum datasets for input to statewide water budget analysis.
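    Dataset evaluation of this kind typically reduces to a few agreement statistics against the gauge record; a minimal sketch with synthetic daily values:

    # Basic agreement metrics between a gridded product and gauge data.
    import numpy as np

    rng = np.random.default_rng(2)
    gauge = rng.gamma(2.0, 5.0, size=365)              # daily gauge totals (mm)
    product = gauge * 1.1 + rng.normal(0, 2.0, 365)    # product with bias + noise

    bias = np.mean(product - gauge)
    rmse = np.sqrt(np.mean((product - gauge) ** 2))
    corr = np.corrcoef(product, gauge)[0, 1]
    print(f"bias={bias:.2f} mm  rmse={rmse:.2f} mm  r={corr:.2f}")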

  17. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  18. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  19. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  20. Harvard Aging Brain Study : Dataset and accessibility

    NARCIS (Netherlands)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  1. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  2. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  3. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  4. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  5. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  6. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  7. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  8. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  9. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  10. Comparison of recent SnIa datasets

    International Nuclear Information System (INIS)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

    We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w₀ + w₁(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3, compared to the highest FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical with the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets however changes when we consider the consistency with an expansion history corresponding to evolving dark energy (w₀, w₁) = (−1.4, 2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample
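    For reference, the CPL parametrization and the expansion history it implies can be evaluated in a few lines. The sketch below uses the standard flat-universe expression for the dark-energy density evolution under CPL, rho_DE(a)/rho_DE0 = a^(-3(1 + w₀ + w₁)) exp(-3 w₁ (1 − a)); the cosmological parameter values are illustrative, not the paper's fits.

    # CPL equation of state and the corresponding Hubble rate H(z)/H0.
    import math

    def w_cpl(a, w0, w1):
        return w0 + w1 * (1.0 - a)

    def E(z, omega_m=0.3, w0=-1.0, w1=0.0):
        """Dimensionless Hubble rate for a flat CPL cosmology."""
        a = 1.0 / (1.0 + z)
        f_de = a ** (-3.0 * (1.0 + w0 + w1)) * math.exp(-3.0 * w1 * (1.0 - a))
        return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m) * f_de)

    # LambdaCDM (w0=-1, w1=0) vs the phantom-divide-crossing example in the text:
    for z in (0.0, 0.5, 1.0):
        print(z, E(z), E(z, w0=-1.4, w1=2.0))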

  11. Limited evidence for the effect of sodium fluoride on deterioration of hearing loss in patients with otosclerosis: a systematic review of the literature

    NARCIS (Netherlands)

    Hentschel, M.A.; Huizinga, P.; van der Velden, D.L.; Wegner, I.; Bittermann, A.J.N.; van der Heijden, G.J.M.; Grolman, W.

    2014-01-01

    OBJECTIVE: To determine the protective effect of sodium fluoride on the deterioration of hearing loss in adult patients with otosclerosis. DATA SOURCES: PubMed, Embase, the Cochrane Library, and CINAHL. STUDY SELECTION: A systematic literature search was conducted. Studies reporting original study

  12. Correction of elevation offsets in multiple co-located lidar datasets

    Science.gov (United States)

    Thompson, David M.; Dalyander, P. Soupy; Long, Joseph W.; Plant, Nathaniel G.

    2017-04-07

    Introduction: Topographic elevation data collected with airborne light detection and ranging (lidar) can be used to analyze short- and long-term changes to beach and dune systems. Analysis of multiple lidar datasets at Dauphin Island, Alabama, revealed systematic, island-wide elevation differences on the order of tens of centimeters that were not attributable to real-world change and, therefore, were likely to represent systematic sampling offsets. These offsets vary between the datasets, but appear spatially consistent within a given survey. This report describes a method that was developed to identify and correct offsets between lidar datasets collected over the same site at different times so that true elevation changes over time, associated with sediment accumulation or erosion, can be analyzed.
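
    The report's exact correction procedure is not reproduced in this abstract; the sketch below only illustrates the core idea of estimating and removing a spatially consistent vertical offset between two co-located surveys. Using the median difference over a user-supplied mask of presumed-stable ground is an assumption of this illustration, not necessarily the authors' method:

    ```python
    import numpy as np

    def correct_vertical_offset(dem_a, dem_b, stable_mask):
        """Remove a systematic vertical offset between two co-located DEMs.

        dem_a, dem_b : 2-D elevation arrays on the same grid (metres).
        stable_mask  : boolean array marking cells assumed unchanged
                       between surveys (e.g. roads, building pads).

        Returns dem_b shifted onto dem_a's datum, plus the offset removed.
        """
        diff = dem_b[stable_mask] - dem_a[stable_mask]
        offset = np.nanmedian(diff)  # median resists outliers from real change
        return dem_b - offset, offset
    ```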

  13. Kernel-based discriminant feature extraction using a representative dataset

    Science.gov (United States)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified on both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.
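
    As an illustration of the centroid-selection step, the following is a minimal sketch of Frequency Sensitive Competitive Learning, in which each unit's distance to a sample is scaled by its win count so that frequent winners are handicapped and all units end up being used. The learning rate, epoch count and initialisation are illustrative choices, not the paper's settings:

    ```python
    import numpy as np

    def fscl_centroids(X, k, epochs=10, lr=0.05, seed=0):
        """Pick k centroids from data X (n_samples, n_features) via FSCL."""
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        wins = np.ones(k)  # win counts act as a "conscience" term
        for _ in range(epochs):
            for x in rng.permutation(X):
                # Frequency-scaled distances: busy units look farther away.
                d = wins * np.linalg.norm(centroids - x, axis=1)
                j = int(np.argmin(d))
                centroids[j] += lr * (x - centroids[j])
                wins[j] += 1
        return centroids
    ```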

  14. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    Science.gov (United States)

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, even when those results contain the query terms. They may also return highly specific results whose ranking depends more on how well the metadata was authored. They do not account well for variance in metadata due to differences in author styles and preferences. The top-ranked results may also come from the same data collection, so users are unlikely to discover new and interesting datasets. These search modes mainly suit users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting content based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce
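
    Of the recommendation techniques named above, item co-occurrence is the simplest to illustrate: datasets accessed together in the same session are treated as related. A minimal sketch with hypothetical session data (the portal's actual signals and weighting are not described in this abstract):

    ```python
    from collections import defaultdict
    from itertools import combinations

    def cooccurrence_scores(sessions):
        """Count how often pairs of dataset IDs appear in the same session.

        sessions : iterable of sets of dataset IDs accessed together.
        Returns {dataset: {other: count}} for "users who viewed this also
        viewed..." style recommendations.
        """
        scores = defaultdict(lambda: defaultdict(int))
        for items in sessions:
            for a, b in combinations(sorted(items), 2):
                scores[a][b] += 1
                scores[b][a] += 1
        return scores

    sessions = [{"ds1", "ds2"}, {"ds1", "ds2", "ds3"}, {"ds2", "ds3"}]
    print(dict(cooccurrence_scores(sessions)["ds2"]))  # {'ds1': 2, 'ds3': 2}
    ```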

  15. Accounting for inertia in modal choices: some new evidence using a RP/SP dataset

    DEFF Research Database (Denmark)

    Cherchi, Elisabetta; Manca, Francesco

    2011-01-01

    effect is stable along the SP experiments. Inertia has been studied more extensively with panel datasets, but few investigations have used RP/SP datasets. In this paper we extend previous work in several ways. We test and compare several ways of measuring inertia, including measures that have been...... proposed for both short and long RP panel datasets. We also explore new measures of inertia to test for the effect of “learning” (in the sense of acquiring experience or getting more familiar with) along the SP experiment and we disentangle this effect from the pure inertia effect. A mixed logit model...... is used that allows us to account for both systematic and random taste variations in the inertia effect and for correlations among RP and SP observations. Finally we explore the relation between the utility specification (especially in the SP dataset) and the role of inertia in explaining current choices....

  16. Introduction of a simple-model-based land surface dataset for Europe

    Science.gov (United States)

    Orth, Rene; Seneviratne, Sonia I.

    2015-04-01

    Land surface hydrology can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for Europe that consists of soil moisture, runoff and evapotranspiration (ET). It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset extends over the period 1984-2013 with a daily time step and 0.5° × 0.5° resolution. We employ a novel calibration approach, in which we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), ET and streamflow, we identify the best-performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against several state-of-the-art datasets (ERA-Interim/Land, MERRA-Land, GLDAS-2-Noah, simulations of the Community Land Model Version 4), using all validation datasets as reference. For soil moisture dynamics it outperforms the benchmarks. Therefore, the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little-validated model results, or proxy data such as precipitation indices. In terms of runoff, the SWBM dataset also performs well, whereas the evaluation of the SWBM ET dataset is overall satisfactory, but the dynamics are less well captured for this variable. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence, some processes impacting ET dynamics may not be captured, and quality issues may occur in regions with complex terrain. Even though the SWBM is well calibrated, it cannot replace more sophisticated models, but as their calibration is a complex task, the present dataset may serve as a benchmark in the future. In addition, we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar
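
    The SWBM's exact equations are not given in this abstract, but the general shape of such a model can be sketched as a single-bucket daily update in which runoff and evapotranspiration scale with relative soil wetness. All functional forms and parameter values below are illustrative assumptions, not the calibrated SWBM:

    ```python
    def bucket_step(soil_moisture, precip, pot_evap, capacity=400.0,
                    runoff_exp=4.0, et_efficiency=0.7):
        """One daily step of a toy single-bucket water balance (all in mm).

        Runoff is a power function of relative wetness; actual ET is
        potential evaporation throttled by wetness and an efficiency.
        """
        wetness = soil_moisture / capacity
        runoff = precip * wetness ** runoff_exp
        et = et_efficiency * wetness * pot_evap
        soil_moisture = max(soil_moisture + precip - runoff - et, 0.0)
        return soil_moisture, runoff, et
    ```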

  17. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

    The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011: Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area were selected since they have a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, the multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and

  18. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM) as a 2D imaging instrument has been widely used in many scientific disciplines including the biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However, SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide the anatomic shape of micro-samples, which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM)

  19. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years have brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally, the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating the performance of a classifier, might not be appropriate when the data is imbalanced and/or the costs of different errors vary markedly. In this chapter, we discuss some of the sampling techniques used for balancing datasets, and the performance measures more appropriate for mining imbalanced datasets.
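
    As a concrete example of the sampling techniques discussed, the simplest option is random oversampling of the minority class; methods such as SMOTE instead synthesise new minority samples rather than duplicating existing ones. A minimal sketch for a binary problem:

    ```python
    import numpy as np

    def random_oversample(X, y, rng=None):
        """Balance a binary dataset by duplicating minority-class rows."""
        rng = rng or np.random.default_rng(0)
        classes, counts = np.unique(y, return_counts=True)
        minority = classes[np.argmin(counts)]
        deficit = counts.max() - counts.min()
        idx = rng.choice(np.flatnonzero(y == minority), size=deficit)
        return np.vstack([X, X[idx]]), np.concatenate([y, y[idx]])
    ```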

  20. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Full Text Available Analysis of DNA sequences is necessary for the higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and an AT/GC content analysis of the DNA sequences was carried out. The QR codes are helpful for quick identification of isolates, and the AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, was reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends
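
    The AT/GC computation itself is straightforward; a minimal sketch (the sequence shown is a made-up example, not one of the 17 patent sequences):

    ```python
    def at_gc_content(seq):
        """Return (AT fraction, GC fraction) of a DNA sequence."""
        seq = seq.upper()
        at = sum(seq.count(base) for base in "AT")
        gc = sum(seq.count(base) for base in "GC")
        total = at + gc
        return at / total, gc / total

    at, gc = at_gc_content("ATGCGCGTATAT")
    print(f"AT: {at:.2f}, GC: {gc:.2f}")  # higher GC -> higher thermal stability
    ```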

  1. Harvard Aging Brain Study: Dataset and accessibility.

    Science.gov (United States)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  2. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new data sets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data is being released and utilized by the public. Data sets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and

  3. Comprehensive comparison of large-scale tissue expression datasets

    DEFF Research Database (Denmark)

    Santos Delgado, Alberto; Tsafou, Kalliopi; Stolte, Christian

    2015-01-01

    For tissues to carry out their functions, they rely on the right proteins to be present. Several high-throughput technologies have been used to map out which proteins are expressed in which tissues; however, the data have not previously been systematically compared and integrated. We present a comprehensive evaluation of tissue expression data from a variety of experimental techniques and show that these agree surprisingly well with each other and with results from literature curation and text mining. We further found that most datasets support the assumed but not demonstrated distinction between...... (http://tissues.jensenlab.org), which makes all the scored and integrated data available through a single user-friendly web interface....

  4. Designing the colorectal cancer core dataset in Iran

    Directory of Open Access Journals (Sweden)

    Sara Dorri

    2017-01-01

    Full Text Available Background: There is no need to explain the importance of collecting, recording and analyzing disease information in any health organization. In this regard, the systematic design of standard data sets can help to record uniform and consistent information and can create interoperability between health care systems. The main purpose of this study was to design a core dataset to record colorectal cancer information in Iran. Methods: For the design of the colorectal cancer core data set, a combination of literature review and expert consensus was used. In the first phase, a draft of the data set was designed based on a colorectal cancer literature review and comparative studies. In the second phase, this data set was evaluated by experts from different disciplines such as medical informatics, oncology and surgery, and their comments and opinions were collected. In the third phase, the refined data set was evaluated again by experts and the final data set was proposed. Results: In the first phase, based on the literature review, a draft set of 85 data elements was designed. In the second phase this data set was evaluated by experts, and supplementary information was offered by professionals in subgroups, especially for the treatment part, bringing the total number of elements to 93. In the third phase, evaluation was conducted by experts and the data set was finalized in five main parts: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study a comprehensive core data set for colorectal cancer was designed. By facilitating the exchange of health information, this dataset can be useful for collecting colorectal cancer information. Designing such data sets for similar diseases can help providers collect standard data from patients and can accelerate retrieval from storage systems.

  5. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    We present an approach for analyzing market shares and product price elasticities based on large datasets containing aggregate sales data for many products, several markets and relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,

  6. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  7. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    Science.gov (United States)

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds capable of interacting with a given target and potentially modulating its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is best adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compound subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds, which has changed considerably over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoy selection in benchmarking databases as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509
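
    As an illustration of the property-matched decoy selection used by modern benchmarks in the spirit of DUD-E, candidates are retained only if they resemble an active on simple physicochemical descriptors while being assumed non-binders. The descriptor names and tolerance below are illustrative assumptions:

    ```python
    def property_matched_decoys(active, candidates, tol=0.15):
        """Keep candidate decoys whose simple descriptors match an active.

        active, candidates : dict / list of dicts with hypothetical keys
        'mw' (molecular weight), 'logp' and 'rotb' (rotatable bonds).
        """
        keys = ("mw", "logp", "rotb")

        def close(a, c):
            return all(abs(a[k] - c[k]) <= tol * max(abs(a[k]), 1.0)
                       for k in keys)

        return [c for c in candidates if close(active, c)]
    ```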

  8. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias; Bibi, Adel Aamer; Giancola, Silvio; Al-Subaihi, Salman; Ghanem, Bernard

    2018-01-01

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

  9. Omicseq: a web-based search engine for exploring omics datasets

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462

  10. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    Science.gov (United States)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as the ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  11. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias

    2018-03-28

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

  12. Omicseq: a web-based search engine for exploring omics datasets.

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
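
    The trackRank algorithm itself is not specified in this abstract; the sketch below is only a naive stand-in for the idea of ranking datasets by their numerical content rather than by metadata, scoring each dataset by how extreme the query gene's value is within it:

    ```python
    import numpy as np

    def rank_datasets(query_gene, datasets):
        """Rank datasets by the query gene's z-score magnitude.

        datasets : {name: {gene: value}} of processed omics measurements.
        """
        scores = {}
        for name, profile in datasets.items():
            if query_gene not in profile:
                continue
            values = np.array(list(profile.values()), dtype=float)
            z = (profile[query_gene] - values.mean()) / (values.std() + 1e-9)
            scores[name] = abs(z)
        return sorted(scores, key=scores.get, reverse=True)
    ```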

  13. GLEAM version 3: Global Land Evaporation Datasets and Model

    Science.gov (United States)

    Martens, B.; Miralles, D. G.; Lievens, H.; van der Schalie, R.; de Jeu, R.; Fernandez-Prieto, D.; Verhoest, N.

    2015-12-01

    Terrestrial evaporation links the energy, water and carbon cycles over land and is therefore a key variable of the climate system. However, the global-scale magnitude and variability of the flux, and the sensitivity of the underlying physical processes to changes in environmental factors, are still poorly understood due to limitations in in situ measurements. As a result, several methods have arisen to estimate global patterns of land evaporation from satellite observations. However, these algorithms generally differ in their approach to modelling evaporation, resulting in large differences in their estimates. One of these methods is GLEAM, the Global Land Evaporation: the Amsterdam Methodology. GLEAM estimates terrestrial evaporation based on daily satellite observations of meteorological variables, vegetation characteristics and soil moisture. Since the publication of the first version of the algorithm (2011), the model has been widely applied to analyse trends in the water cycle and land-atmospheric feedbacks during extreme hydrometeorological events. A third version of the GLEAM global datasets is foreseen by the end of 2015. Given the relevance of having a continuous and reliable record of global-scale evaporation estimates for climate and hydrological research, the establishment of an online data portal to host these data for the public is also foreseen. In this new release of the GLEAM datasets, different components of the model have been updated, with the most significant change being the revision of the data assimilation algorithm. In this presentation, we will highlight the most important changes to the methodology and present three new GLEAM datasets and their validation against in situ observations and an alternative dataset of terrestrial evaporation (ERA-Land). Results of the validation exercise indicate that the magnitude and the spatiotemporal variability of the modelled evaporation agree reasonably well with the estimates of ERA-Land and the in situ

  14. Bayesian Method for Building Frequent Landsat-Like NDVI Datasets by Integrating MODIS and Landsat NDVI

    OpenAIRE

    Limin Liao; Jinling Song; Jindi Wang; Zhiqiang Xiao; Jian Wang

    2016-01-01

    Studies related to vegetation dynamics in heterogeneous landscapes often require Normalized Difference Vegetation Index (NDVI) datasets with both high spatial resolution and frequent coverage, which cannot be satisfied by a single sensor due to technical limitations. In this study, we propose a new method called NDVI-Bayesian Spatiotemporal Fusion Model (NDVI-BSFM) for accurately and effectively building frequent high spatial resolution Landsat-like NDVI datasets by integrating Moderate Resol...

  15. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings...... with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  16. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific datasets, such as those output by chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  17. Brief report on a systematic review of youth violence prevention through media campaigns: Does the limited yield of strong evidence imply methodological challenges or absence of effect?

    Science.gov (United States)

    Cassidy, Tali; Bowman, Brett; McGrath, Chloe; Matzopoulos, Richard

    2016-10-01

    We present a brief report on a systematic review which identified, assessed and synthesized the existing evidence of the effectiveness of media campaigns in reducing youth violence. Search strategies made use of terms for youth, violence and a range of terms relating to the intervention. An array of academic databases and websites were searched. Although media campaigns to reduce violence are widespread, only six studies met the inclusion criteria. There is little strong evidence to support a direct link between media campaigns and a reduction in youth violence. Several studies measure proxies for violence such as empathy or opinions related to violence, but the link between these measures and violence perpetration is unclear. Nonetheless, some evidence suggests that a targeted and context-specific campaign, especially when combined with other measures, can reduce violence. However, such campaigns are less cost-effective to replicate over large populations than generalised campaigns. It is unclear whether the paucity of evidence represents a null effect or methodological challenges with evaluating media campaigns. Future studies need to be carefully planned to accommodate for methodological difficulties as well as to identify the specific elements of campaigns that work, especially in lower and middle income countries. Copyright © 2016 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.

  18. Data assimilation and model evaluation experiment datasets

    Science.gov (United States)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need for data in the four phases of the experiment are briefly stated. The preparation of DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analysis of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested this data was incorporated into its refinement. Suggestions for DAMEE data usage include (1) ocean modeling and data assimilation studies, (2) diagnosis and theoretical studies, and (3) comparisons with locally detailed observations.

  19. A hybrid organic-inorganic perovskite dataset

    Science.gov (United States)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  20. A global gridded dataset of daily precipitation going back to 1950, ideal for analysing precipitation extremes

    Science.gov (United States)

    Contractor, S.; Donat, M.; Alexander, L. V.

    2017-12-01

    Reliable observations of precipitation are necessary to determine past changes in precipitation and validate models, allowing for reliable future projections. Existing gauge-based gridded datasets of daily precipitation and satellite-based observations contain artefacts and have a short length of record, making them unsuitable for analysing precipitation extremes. The largest limiting factor for the gauge-based datasets is the need for a dense and reliable station network. Currently, there are two major data archives of global in situ daily rainfall data: the first is the Global Historical Climatology Network (GHCN-Daily), hosted by the National Oceanic and Atmospheric Administration (NOAA), and the other is hosted by the Global Precipitation Climatology Centre (GPCC), part of the Deutsche Wetterdienst (DWD). We combine the two data archives and use automated quality control techniques to create a reliable long-term network of raw station data, which we then interpolate using block kriging to create a global gridded dataset of daily precipitation going back to 1950. We compare our interpolated dataset with existing global gridded data of daily precipitation: NOAA Climate Prediction Centre (CPC) Global V1.0 and GPCC Full Data Daily Version 1.0, as well as various regional datasets. We find that our raw station density is much higher than that of other datasets. To avoid artefacts due to station network variability, we provide multiple versions of our dataset based on various completeness criteria, as well as the standard deviation, kriging error and number of stations for each grid cell and timestep, to encourage responsible use of our dataset. Despite our efforts to increase the raw data density, the in situ station network remains sparse in India after the 1960s and in Africa throughout the timespan of the dataset. Our dataset would allow for more reliable global analyses of rainfall including its extremes and pave the way for better global precipitation observations with lower and more transparent uncertainties.
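
    The block kriging details (variogram fitting, block averaging) are beyond this abstract, but point ordinary kriging conveys the core interpolation idea: weights come from solving a linear system built on a covariance model, with a Lagrange multiplier enforcing that the weights sum to one. A sketch with an illustrative exponential covariance, not the study's fitted model:

    ```python
    import numpy as np

    def ordinary_kriging(xy_obs, z_obs, xy_target, sill=1.0, range_km=200.0):
        """Estimate the value at xy_target from stations (xy_obs, z_obs)."""
        def cov(d):
            return sill * np.exp(-d / range_km)

        n = len(z_obs)
        d_obs = np.linalg.norm(xy_obs[:, None] - xy_obs[None, :], axis=-1)
        # Kriging system with a Lagrange multiplier (last row/column)
        # enforcing unbiasedness: the weights must sum to one.
        A = np.ones((n + 1, n + 1))
        A[:n, :n] = cov(d_obs)
        A[n, n] = 0.0
        b = np.append(cov(np.linalg.norm(xy_obs - xy_target, axis=-1)), 1.0)
        w = np.linalg.solve(A, b)
        return float(w[:n] @ z_obs)
    ```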

  1. Clinical efficacy and safety of limited internal fixation combined with external fixation for Pilon fracture: A systematic review and meta-analysis

    OpenAIRE

    Zhang, Shaobo; Zhang, Yibao; Wang, Shenghong; Zhang, Hua; Liu, Peng; Zhang, Wei; Ma, Jing-Lin; Wang, Jing

    2017-01-01

    Purpose: To compare the clinical efficacy and complications of limited internal fixation combined with external fixation (LIFEF) and open reduction and internal fixation (ORIF) in the treatment of Pilon fracture. Methods: We searched databases including PubMed, Embase, Web of Science, the Cochrane Library and the China Biology Medicine disc for studies comparing the clinical efficacy and complications of LIFEF and ORIF in the treatment of Pilon fracture. The clinical efficacy was evaluated by the ...

  2. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets.

    Science.gov (United States)

    Li, Lianwei; Ma, Zhanshan Sam

    2016-08-16

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem for human health: the human microbiome. The limited number of existing studies has reported conflicting evidence regarding the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples, we discovered that only 49 communities (less than 1%) satisfied the neutral theory, and concluded that human microbial communities are not neutral in general. The 49 positive cases, although only a tiny minority, do demonstrate the existence of neutral processes. We realize that the traditional doctrine of microbial biogeography, "Everything is everywhere, but the environment selects", first proposed by Baas-Becking, resolves the apparent contradiction. The first part of the Baas-Becking doctrine states that microbes are not dispersal-limited and are therefore prone to neutrality, and the second part reiterates that the freely dispersed microbes must endure selection by the environment. Therefore, in most cases, it is the host environment that ultimately shapes the community assembly and tips the human microbiome toward the niche regime.

  3. Isokinetic strength assessment offers limited predictive validity for detecting risk of future hamstring strain in sport: a systematic review and meta-analysis.

    Science.gov (United States)

    Green, Brady; Bourne, Matthew N; Pizzari, Tania

    2018-03-01

    To examine the value of isokinetic strength assessment for predicting risk of hamstring strain injury, and to direct future research into hamstring strain injuries. Systematic review. Database searches for Medline, CINAHL, Embase, AMED, AUSPORT, SPORTDiscus, PEDro and Cochrane Library from inception to April 2017. Manual reference checks, ahead-of-press and citation tracking. Prospective studies evaluating isokinetic hamstrings, quadriceps and hip extensor strength testing as a risk factor for occurrence of hamstring muscle strain. Independent search result screening. Risk of bias assessment by independent reviewers using the Quality in Prognosis Studies tool. Best evidence synthesis and meta-analyses of standardised mean difference (SMD). Twelve studies were included, capturing 508 hamstring strain injuries in 2912 athletes. Isokinetic knee flexor, knee extensor and hip extensor outputs were examined at angular velocities ranging from 30 to 300°/s, concentric or eccentric, and as relative (Nm/kg) or absolute (Nm) measures. Strength ratios ranged between 30°/s and 300°/s. Meta-analyses revealed a small, significant predictive effect for absolute (SMD=-0.16, P=0.04, 95% CI -0.31 to -0.01) and relative (SMD=-0.17, P=0.03, 95% CI -0.33 to -0.014) eccentric knee flexor strength (60°/s). No other testing speed or strength ratio showed a statistical association. Best evidence synthesis found over half of all variables had moderate or strong evidence for no association with future hamstring injury. Despite an isolated finding for eccentric knee flexor strength at slow speeds, the role and application of isokinetic assessment for predicting hamstring strain risk should be reconsidered, particularly given the costs and specialised training required. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
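
    The pooled effects quoted above are standardised mean differences combined by inverse-variance weighting; a minimal sketch of both steps, using the standard Cohen's d and fixed-effect formulas rather than any code from the review:

    ```python
    import numpy as np

    def smd(mean1, sd1, n1, mean2, sd2, n2):
        """Cohen's d between two groups, with its approximate variance."""
        sd_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                            / (n1 + n2 - 2))
        d = (mean1 - mean2) / sd_pooled
        var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
        return d, var

    def pool_fixed(effects, variances):
        """Inverse-variance fixed-effect pooled SMD and its 95% CI."""
        w = 1.0 / np.asarray(variances)
        est = np.sum(w * np.asarray(effects)) / w.sum()
        se = np.sqrt(1.0 / w.sum())
        return est, (est - 1.96 * se, est + 1.96 * se)
    ```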

  4. Manual therapy for the management of pain and limited range of motion in subjects with signs and symptoms of temporomandibular disorder: a systematic review of randomised controlled trials.

    Science.gov (United States)

    Calixtre, L B; Moreira, R F C; Franchini, G H; Alburquerque-Sendín, F; Oliveira, A B

    2015-11-01

    There is a lack of knowledge about the effectiveness of manual therapy (MT) on subjects with temporomandibular disorders (TMD). The aim of this systematic review is to synthesise evidence regarding the isolated effect of MT in improving maximum mouth opening (MMO) and pain in subjects with signs and symptoms of TMD. The MEDLINE®, Cochrane, Web of Science, SciELO and EMBASE™ electronic databases were consulted, searching for randomised controlled trials applying MT for TMD compared to other interventions, no intervention or placebo. Two authors independently extracted data, the PEDro scale was used to assess risk of bias, and GRADE (Grading of Recommendations Assessment, Development and Evaluation) was applied to synthesise the overall quality of the body of evidence. Treatment effect size was calculated for pain, MMO and pressure pain threshold (PPT). Eight trials were included, seven of high methodological quality. Myofascial release and massage techniques applied on the masticatory muscles are more effective than control (low to moderate evidence) but as effective as botulinum toxin injections (moderate evidence). Upper cervical spine thrust manipulation or mobilisation techniques are more effective than control (low to high evidence), while thoracic manipulations are not. There is moderate-to-high evidence that MT technique protocols are effective. The methodological heterogeneity across trial protocols frequently contributed to a decrease in the quality of evidence. In conclusion, there is widely varying evidence that MT improves pain, MMO and PPT in subjects with TMD signs and symptoms, depending on the technique. Further studies should consider using standardised evaluations and better study designs to strengthen clinical relevance. © 2015 John Wiley & Sons Ltd.

  5. The Problem with Big Data: Operating on Smaller Datasets to Bridge the Implementation Gap.

    Science.gov (United States)

    Mann, Richard P; Mushtaq, Faisal; White, Alan D; Mata-Cervantes, Gabriel; Pike, Tom; Coker, Dalton; Murdoch, Stuart; Hiles, Tim; Smith, Clare; Berridge, David; Hinchliffe, Suzanne; Hall, Geoff; Smye, Stephen; Wilkie, Richard M; Lodge, J Peter A; Mon-Williams, Mark

    2016-01-01

    Big datasets have the potential to revolutionize public health. However, there is a mismatch between the political and scientific optimism surrounding big data and the public's perception of its benefit. We suggest a systematic and concerted emphasis on developing models derived from smaller datasets to illustrate to the public how big data can produce tangible benefits in the long term. In order to highlight the immediate value of a small data approach, we produced a proof-of-concept model predicting hospital length of stay. The results demonstrate that existing small datasets can be used to create models that generate a reasonable prediction, facilitating health-care delivery. We propose that greater attention (and funding) needs to be directed toward the utilization of existing information resources in parallel with current efforts to create and exploit "big data."

  6. Development and validation of a national data registry for midwife-led births: the Midwives Alliance of North America Statistics Project 2.0 dataset.

    Science.gov (United States)

    Cheyney, Melissa; Bovbjerg, Marit; Everson, Courtney; Gordon, Wendy; Hannibal, Darcy; Vedam, Saraswathi

    2014-01-01

    In 2004, the Midwives Alliance of North America's (MANA's) Division of Research developed a Web-based data collection system to gather information on the practices and outcomes associated with midwife-led births in the United States. This system, called the MANA Statistics Project (MANA Stats), grew out of a widely acknowledged need for more reliable data on outcomes by intended place of birth. This article describes the history and development of the MANA Stats birth registry and provides an analysis of the 2.0 dataset's content, strengths, and limitations. Data collection and review procedures for the MANA Stats 2.0 dataset are described, along with methods for the assessment of data accuracy. We calculated descriptive statistics for client demographics and contributing midwife credentials, and assessed the quality of data by calculating point estimates, 95% confidence intervals, and kappa statistics for key outcomes on pre- and postreview samples of records. The MANA Stats 2.0 dataset (2004-2009) contains 24,848 courses of care, 20,893 of which are for women who planned a home or birth center birth at the onset of labor. The majority of these records were planned home births (81%). Births were attended primarily by certified professional midwives (73%), and clients were largely white (92%), married (87%), and college-educated (49%). Data quality analyses of 9932 records revealed no differences between pre- and postreviewed samples for 7 key benchmarking variables (kappa, 0.98-1.00). The MANA Stats 2.0 data were accurately entered by participants; any errors in this dataset are likely random and not systematic. The primary limitation of the 2.0 dataset is that the sample was captured through voluntary participation; thus, it may not accurately reflect population-based outcomes. The dataset's primary strength is that it will allow for the examination of research questions on normal physiologic birth and midwife-led birth outcomes by intended place of birth.
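
    Agreement between pre- and post-review records on a categorical benchmarking variable can be quantified with Cohen's kappa, as reported above. A minimal sketch with made-up delivery-mode labels (not MANA Stats data):

    ```python
    from collections import Counter

    def cohens_kappa(ratings_a, ratings_b):
        """Cohen's kappa for two parallel lists of categorical ratings."""
        n = len(ratings_a)
        observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
        freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
        expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
        return (observed - expected) / (1 - expected)

    pre  = ["vaginal", "cesarean", "vaginal", "vaginal"]
    post = ["vaginal", "cesarean", "vaginal", "cesarean"]
    print(round(cohens_kappa(pre, post), 2))  # 0.5
    ```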

  7. A systematic review of air pollution as a risk factor for cardiovascular disease in South Asia: limited evidence from India and Pakistan.

    Science.gov (United States)

    Yamamoto, S S; Phalkey, R; Malik, A A

    2014-03-01

    Cardiovascular diseases (CVD) are major contributors to mortality and morbidity in South Asia. Chronic exposure to air pollution is an important risk factor for cardiovascular diseases, although the majority of studies to date have been conducted in developed countries. Both indoor and outdoor air pollution are growing problems in developing countries in South Asia, yet the impact on rising rates of CVD in these regions has largely been ignored. We aimed to assess the evidence available regarding air pollution effects on CVD and CVD risk factors in lower income countries in South Asia. A literature search was conducted in PubMed and Web of Science. Our inclusion criteria included peer-reviewed, original, empirical articles published in English between the years 1990 and 2012, conducted in the World Bank South Asia region (Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan and Sri Lanka). This resulted in 30 articles. Nine articles met our inclusion criteria and were assessed for this systematic review. Most of the studies were cross-sectional and examined measured particulate matter effects on CVD outcomes and indicators. We observed a bias, as nearly all of the studies were from India. Hypertension and CVD deaths were positively associated with higher particulate matter levels. Biomarkers of oxidative stress such as increased levels of P-selectin-expressing platelets, depleted superoxide dismutase and reactive oxygen species generation, as well as elevated levels of the inflammation-related C-reactive protein, interleukin-6 and interleukin-8, were also positively associated with biomass use or elevated particulate matter levels. An important outcome of this investigation was the evidence suggesting substantial air pollution effects on CVD risk in South Asia. However, too few studies have been conducted. There is an urgent need for longer term investigations using robust measures of air pollution with different population groups that include a wider

  8. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. The NGSI program carries out activities not only across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability that captures safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  9. Development of a SPARK Training Dataset

    International Nuclear Information System (INIS)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-01-01

    In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. The NGSI program carries out activities not only across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability that captures safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer

  10. Analysis of Public Datasets for Wearable Fall Detection Systems.

    Science.gov (United States)

    Casilari, Eduardo; Santoyo-Ramón, José-Antonio; Cano-García, José-Manuel

    2017-06-27

    Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention in the research community in recent years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve publicly available data repositories containing measurements of ADLs and emulated falls intended for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the identified datasets is performed comprehensively, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). In this respect, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need to categorize the ADLs by the intensity of the movements in order to evaluate the capability of a given detection algorithm to discriminate falls from ADLs.
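
    As a concrete illustration of how such repositories are used, the sketch below implements the simplest possible detection baseline, a peak-magnitude threshold on a triaxial accelerometer trace, of the kind these datasets are meant to benchmark. The 2.5 g threshold, the 50-sample traces, and the synthetic impact are assumptions for illustration, not values drawn from any of the reviewed datasets.

```python
# Minimal peak-magnitude fall-detection baseline; threshold is an
# illustrative assumption, not a value from any reviewed dataset.
import numpy as np

def detect_fall(accel_xyz: np.ndarray, threshold_g: float = 2.5) -> bool:
    """Flag a candidate fall if acceleration magnitude exceeds threshold_g.

    accel_xyz: array of shape (n_samples, 3), in units of g.
    """
    magnitude = np.linalg.norm(accel_xyz, axis=1)
    return bool(np.any(magnitude > threshold_g))

# Example: a quiet ADL trace versus a trace with an impact-like spike.
rng = np.random.default_rng(0)
adl = rng.normal(0.0, 0.05, size=(500, 3)) + np.array([0.0, 0.0, 1.0])
fall = adl.copy()
fall[250] = [2.0, 1.5, 3.0]  # synthetic impact sample (|a| ~ 3.9 g)

print(detect_fall(adl))   # False
print(detect_fall(fall))  # True
```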

  11. Analysis of Public Datasets for Wearable Fall Detection Systems

    Directory of Open Access Journals (Sweden)

    Eduardo Casilari

    2017-06-01

    Full Text Available Due to the boom of wireless handheld devices such as smartwatches and smartphones, wearable Fall Detection Systems (FDSs) have become a major focus of attention in the research community in recent years. The effectiveness of a wearable FDS must be contrasted against a wide variety of measurements obtained from inertial sensors during the occurrence of falls and Activities of Daily Living (ADLs). In this regard, access to public databases constitutes the basis for an open and systematic assessment of fall detection techniques. This paper reviews and appraises twelve publicly available data repositories containing measurements of ADLs and emulated falls intended for the evaluation of fall detection algorithms in wearable FDSs. The analysis of the identified datasets is performed comprehensively, taking into account the multiple factors involved in the definition of the testbeds deployed for the generation of the mobility samples. The study of the traces brings to light the lack of a common experimental benchmarking procedure and, consequently, the large heterogeneity of the datasets from a number of perspectives (length and number of samples, typology of the emulated falls and ADLs, characteristics of the test subjects, features and positions of the sensors, etc.). In this respect, the statistical analysis of the samples reveals the impact of the sensor range on the reliability of the traces. In addition, the study evidences the importance of the selection of the ADLs and the need to categorize the ADLs by the intensity of the movements in order to evaluate the capability of a given detection algorithm to discriminate falls from ADLs.

  12. Heat stress and fetal risk. Environmental limits for exercise and passive heat stress during pregnancy: a systematic review with best evidence synthesis.

    Science.gov (United States)

    Ravanelli, Nicholas; Casasola, William; English, Timothy; Edwards, Kate M; Jay, Ollie

    2018-03-01

    Pregnant women are advised to avoid heat stress (eg, excessive exercise and/or heat exposure) due to the risk of teratogenicity associated with maternal hyperthermia, defined as a core temperature (Tcore) ≥39.0°C. However, guidelines are ambiguous about the critical combinations of climate and activity to avoid and may therefore unnecessarily discourage physical activity during pregnancy. Thus, the primary aim was to assess Tcore elevations under the different characteristics defining exercise and passive heat stress (intensity, mode, ambient conditions, duration) during pregnancy, relative to the critical maternal Tcore of ≥39.0°C. Systematic review with best evidence synthesis. EMBASE, MEDLINE, SCOPUS, CINAHL and Web of Science were searched from inception to 12 July 2017. Studies reporting the Tcore response of pregnant women, at any period of gestation, to exercise or passive heat stress were included. 12 studies satisfied our inclusion criteria (n=347). No woman exceeded a Tcore of 39.0°C. The highest Tcore was 38.9°C, reported during land-based exercise. The highest mean end-trial Tcore was 38.3°C (95% CI 37.7°C to 38.9°C) for land-based exercise, 37.5°C (95% CI 37.3°C to 37.7°C) for water immersion exercise, 36.9°C (95% CI 36.8°C to 37.0°C) for hot water bathing and 37.6°C (95% CI 37.5°C to 37.7°C) for sauna exposure. The highest individual core temperature reported was 38.9°C. Immediately after exercise (either land based or water immersion), the highest mean core temperature was 38.3°C, 0.7°C below the proposed teratogenic threshold. Pregnant women can safely engage in: (1) exercise for up to 35 min at 80%-90% of their maximum heart rate in 25°C and 45% relative humidity (RH); (2) water immersion (≤33.4°C) exercise for up to 45 min; and (3) sitting in hot baths (40°C) or hot/dry saunas (70°C; 15% RH) for up to 20 min, irrespective of pregnancy stage, without reaching a core temperature exceeding the teratogenic

  13. CLARA-A1: a cloud, albedo, and radiation dataset from 28 yr of global AVHRR data

    Directory of Open Access Journals (Sweden)

    K.-G. Karlsson

    2013-05-01

    Full Text Available A new satellite-derived climate dataset – denoted CLARA-A1 ("The CM SAF cLoud, Albedo and RAdiation dataset from AVHRR data") – is described. The dataset covers the 28 yr period from 1982 until 2009 and consists of cloud, surface albedo, and radiation budget products derived from the AVHRR (Advanced Very High Resolution Radiometer) sensor carried by polar-orbiting operational meteorological satellites. Its content, anticipated accuracies, limitations, and potential applications are described. The dataset is produced by the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF) project. The dataset has its strengths in the long duration, its foundation upon a homogenized AVHRR radiance data record, and in some unique features, e.g. the availability of 28 yr of summer surface albedo and cloudiness parameters over the polar regions. Quality characteristics are also well investigated and particularly useful results can be found over the tropics, mid to high latitudes and over nearly all oceanic areas. Being the first CM SAF dataset of its kind, an intensive evaluation of the quality of the datasets was performed and major findings with regard to merits and shortcomings of the datasets are reported. However, the CM SAF's long-term commitment to perform two additional reprocessing events within the time frame 2013–2018 will allow proper handling of limitations as well as upgrading the dataset with new features (e.g. uncertainty estimates) and extension of the temporal coverage.

  14. Animated analysis of geoscientific datasets: An interactive graphical application

    Science.gov (United States)

    Morse, Peter; Reading, Anya; Lueg, Christopher

    2017-12-01

    Geoscientists are required to analyze and draw conclusions from increasingly large volumes of data. There is a need to recognise and characterise features and changing patterns of Earth observables within such large datasets. It is also necessary to identify significant subsets of the data for more detailed analysis. We present an innovative, interactive software tool and workflow to visualise, characterise, sample and tag large geoscientific datasets from both local and cloud-based repositories. It uses an animated interface and human-computer interaction to exploit the capacity of human expert observers to identify features via enhanced visual analytics. 'Tagger' enables users to analyze datasets that are too large in volume to be drawn legibly on a reasonable number of single static plots. Users interact with the moving graphical display, tagging data ranges of interest for subsequent attention. The tool provides a rapid pre-pass process using fast GPU-based OpenGL graphics and data handling and is coded in the Quartz Composer visual programming language (VPL) on Mac OS X. It makes use of interoperable data formats and cloud-based (or local) data storage and compute. In a case study, Tagger was used to characterise a decade (2000-2009) of data recorded by the Cape Sorell Waverider Buoy, located approximately 10 km off the west coast of Tasmania, Australia. These data serve as a proxy for understanding Southern Ocean storminess, which has both local and global implications. This example shows use of the tool to identify and characterise four different types of storm and non-storm events during this time. Events characterised in this way are compared with conventional analysis, noting the advantages and limitations of data analysis using animation and human interaction. Tagger provides a new ability to make use of humans as feature detectors in computer-based analysis of large-volume geoscience and other data.

  15. A synthetic dataset for evaluating soft and hard fusion algorithms

    Science.gov (United States)

    Graham, Jacob L.; Hall, David L.; Rimland, Jeffrey

    2011-06-01

    There is an emerging demand for the development of data fusion techniques and algorithms that are capable of combining conventional "hard" sensor inputs such as video, radar, and multispectral sensor data with "soft" data including textual situation reports, open-source web information, and "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications due to the lack of critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" to evaluate the ability to make connections and inferences that span the soft and hard data sets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.

  16. Quality Controlling CMIP datasets at GFDL

    Science.gov (United States)

    Horowitz, L. W.; Radhakrishnan, A.; Balaji, V.; Adcroft, A.; Krasting, J. P.; Nikonov, S.; Mason, E. E.; Schweitzer, R.; Nadeau, D.

    2017-12-01

    As GFDL makes the switch from model development to production in light of the Coupled Model Intercomparison Project (CMIP), its efforts have shifted to testing and, more importantly, to establishing guidelines and protocols for quality control and semi-automated data publishing. Every CMIP cycle introduces key challenges, and the upcoming CMIP6 is no exception. The new CMIP experimental design comprises multiple MIPs facilitating research in different focus areas. This paradigm has implications not only for the groups that develop the models and conduct the runs, but also for the groups that monitor, analyze and quality control the datasets before data publishing and before their contents make their way into reports like the IPCC (Intergovernmental Panel on Climate Change) Assessment Reports. In this talk, we discuss some of the paths taken at GFDL to quality control the CMIP-ready datasets, including: Jupyter notebooks; PrePARE; and a LAMP (Linux, Apache, MySQL, PHP/Python/Perl) technology-driven tracker system to monitor the status of experiments qualitatively and quantitatively and to provide additional metadata and analysis services, along with some built-in controlled-vocabulary validations in the workflow. In addition, we discuss the integration of community-based model evaluation software (ESMValTool, PCMDI Metrics Package, and ILAMB) as part of our CMIP6 workflow.

  17. Integrated remotely sensed datasets for disaster management

    Science.gov (United States)

    McCarthy, Timothy; Farrell, Ronan; Curtis, Andrew; Fotheringham, A. Stewart

    2008-10-01

    Video imagery can be acquired from aerial, terrestrial and marine based platforms and has been exploited for a range of remote sensing applications over the past two decades. Examples include coastal surveys using aerial video, route-corridor infrastructure surveys using vehicle-mounted video cameras, aerial surveys over forestry and agriculture, underwater habitat mapping and disaster management. Many of these video systems are based on interlaced television standards such as North America's NTSC and Europe's SECAM and PAL, with footage recorded in various video formats. This technology has recently been employed as a front-line remote sensing technology for damage assessment post-disaster. This paper traces the development of spatial video as a remote sensing tool from the early 1980s to the present day. The background to a new spatial-video research initiative based at the National University of Ireland, Maynooth (NUIM) is described. New improvements are proposed, including low-cost encoders, easy-to-use software decoders, timing issues and interoperability. These developments will enable specialists and non-specialists to collect, process and integrate these datasets with minimal support. This integrated approach will enable decision makers to access relevant remotely sensed datasets quickly and so carry out rapid damage assessment during and post-disaster.

  18. Limitations of studies on school-based nutrition education interventions for obesity in China: a systematic review and meta-analysis.

    Science.gov (United States)

    Kong, Kaimeng; Liu, Jie; Tao, Yexuan

    2016-01-01

    School-based nutrition education has been widely implemented in recent years to fight the increasing prevalence of childhood obesity in China. A comprehensive literature search was performed using six databases to identify studies of school-based nutrition education interventions in China. The methodological quality and the risk of bias of the selected literature were evaluated. Stratified analysis was performed to identify whether different methodologies influenced the estimated effect of the intervention. Seventeen articles were included in the analysis. Several of the included studies had inadequate intervention duration, inappropriate randomization methods, selection bias, unbalanced baseline characteristics between control and intervention groups, and absent sample size calculations. Overall, the studies showed no significant impact of nutrition education on obesity (OR=0.76; 95% CI=0.55-1.05; p=0.09). This can be compared with an OR of 0.68 for interventions aimed at preventing malnutrition and an OR of 0.49 for interventions aimed at preventing iron-deficiency anemia. When studies with unbalanced baseline characteristics between groups and selection bias in the study subjects were excluded, the impact of nutrition education on obesity was significant (OR=0.73; 95% CI=0.55-0.98; p=0.003). An analysis stratified by duration of intervention revealed that the intervention was effective only when it lasted for more than 2 years (OR=0.49, 95% CI=0.42-0.58; p<0.001). In summary, studies of school-based nutrition education programs in China have some important limitations that might affect the estimated effectiveness of the intervention.
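
    The stratified result quoted above lost its inequality sign in extraction (the source read "pnutrition"); the reconstruction p<0.001 follows from the reported confidence interval itself, as the short computation below verifies: a 95% CI of 0.42-0.58 around an OR of 0.49 corresponds to a z statistic near 8.7 on the log-odds scale.

```python
# Recover the z statistic and two-sided p-value of a pooled odds ratio
# from its reported 95% CI, working on the log-odds scale.
import math

def p_from_or_ci(or_point: float, ci_low: float, ci_high: float) -> float:
    log_or = math.log(or_point)
    # The 95% CI spans 2 * 1.96 standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = log_or / se
    # Two-sided p-value under the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))

print(p_from_or_ci(0.49, 0.42, 0.58))  # ~5e-18, far below 0.001
```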

  19. Clinical efficacy and safety of limited internal fixation combined with external fixation for Pilon fracture: A systematic review and meta-analysis.

    Science.gov (United States)

    Zhang, Shao-Bo; Zhang, Yi-Bao; Wang, Sheng-Hong; Zhang, Hua; Liu, Peng; Zhang, Wei; Ma, Jing-Lin; Wang, Jing

    2017-04-01

    To compare the clinical efficacy and complications of limited internal fixation combined with external fixation (LIFEF) and open reduction and internal fixation (ORIF) in the treatment of Pilon fracture. We searched databases including Pubmed, Embase, Web of Science, Cochrane Library and China Biology Medicine disc for studies comparing the clinical efficacy and complications of LIFEF and ORIF in the treatment of Pilon fracture. Clinical efficacy was evaluated by the rates of nonunion and malunion/delayed union and by the excellent/good rate assessed by the Mazur ankle score. Complications, including infections and arthritis symptoms after surgery, were also investigated. Nine trials including 498 pilon fractures in 494 patients were identified. The meta-analysis found no significant differences in the nonunion rate (RR = 1.60, 95% CI: 0.66 to 3.86, p = 0.30) or the excellent/good rate (RR = 0.95, 95% CI: 0.86 to 1.04, p = 0.28) between the LIFEF and ORIF groups. For the assessment of infections, there were significant differences in the rate of deep infection (RR = 2.18, 95% CI: 1.34 to 3.55, p = 0.002) and the rate of arthritis (RR = 1.26, 95% CI: 1.03 to 1.53, p = 0.02) between the LIFEF and ORIF groups. LIFEF has a similar effect to ORIF in the treatment of pilon fractures; however, the LIFEF group has a significantly higher risk of complications than the ORIF group. LIFEF is therefore not recommended in the treatment of pilon fracture. Copyright © 2017 Daping Hospital and the Research Institute of Surgery of the Third Military Medical University. Production and hosting by Elsevier B.V. All rights reserved.

  20. Pelvic lymph node dissection during robot-assisted radical prostatectomy: efficacy, limitations, and complications-a systematic review of the literature.

    Science.gov (United States)

    Ploussard, Guillaume; Briganti, Alberto; de la Taille, Alexandre; Haese, Alexander; Heidenreich, Axel; Menon, Mani; Sulser, Tullio; Tewari, Ashutosh K; Eastham, James A

    2014-01-01

    Pelvic lymph node dissection (PLND) in prostate cancer is the most effective method for detecting lymph node metastases. However, a decline in the rate of PLND during radical prostatectomy (RP) has been noted. This is likely the result of prostate cancer stage migration in the prostate-specific antigen screening era and of the introduction of minimally invasive approaches such as robot-assisted radical prostatectomy (RARP). To assess the efficacy, limitations, and complications of PLND during RARP, a review of the literature was performed using the Medline, Scopus, and Web of Science databases, with no language restriction, from January 1990 to December 2012. The literature search used the following terms: prostate cancer, radical prostatectomy, robot-assisted, and lymph node dissection. The median nodal yield at PLND during RARP ranged from 3 to 24 nodes. As seen in open and laparoscopic RP series, the lymph node positivity rate increased with the extent of dissection during RARP. Overall, PLND-only related complications are rare. The most frequent complication after PLND is symptomatic pelvic lymphocele, occurring in 0% to 8% of cases. The rate of PLND-associated grade 3-4 complications ranged from 0% to 5%. PLND is associated with increased operative time. Available data suggest equivalence of PLND between RARP and other surgical approaches in terms of nodal yield, node positivity, and intraoperative and postoperative complications. PLND during RARP can be performed effectively and safely. The overall number of nodes removed, the likelihood of node positivity, and the types and rates of complications of PLND are similar to pure laparoscopic and open retropubic procedures. Copyright © 2013 European Association of Urology. Published by Elsevier B.V. All rights reserved.

  1. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets were used to generate data demonstrating strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  2. Predicting dataset popularity for the CMS experiment

    CERN Document Server

    INSPIRE-00005122; Li, Ting; Giommi, Luca; Bonacorsi, Daniele; Wildish, Tony

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach to the CMS computing infrastructure.
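
    A minimal sketch of the popularity-prediction idea, framed as supervised classification over access meta-data, is given below. The feature set (past accesses, distinct users, replica count) and the labeling rule are hypothetical placeholders standing in for the actual CMS meta-data schema, which the paper does not spell out.

```python
# Dataset-popularity prediction sketched as supervised learning.
# Features and labels are synthetic placeholders, not CMS meta-data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([
    rng.poisson(20, n),      # accesses in previous week (assumed feature)
    rng.poisson(5, n),       # distinct users (assumed feature)
    rng.integers(1, 10, n),  # number of site replicas (assumed feature)
])
# Label a dataset "popular" via a synthetic rule standing in for the
# real next-week access counts.
y = (X[:, 0] + 3 * X[:, 1] + rng.normal(0, 5, n) > 40).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([[30, 8, 4]]))  # predicted popularity class
```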

  3. Predicting dataset popularity for the CMS experiment

    International Nuclear Information System (INIS)

    Kuznetsov, V.; Li, T.; Giommi, L.; Bonacorsi, D.; Wildish, T.

    2016-01-01

    The CMS experiment at the LHC accelerator at CERN relies on its computing infrastructure to stay at the frontier of High Energy Physics, searching for new phenomena and making discoveries. Even though computing plays a significant role in physics analysis, we rarely use its data to predict the behavior of the system itself. Basic information about computing resources, user activities and site utilization can be very useful for improving the throughput of the system and its management. In this paper, we discuss a first CMS analysis of dataset popularity based on CMS meta-data, which can be used as a model for dynamic data placement and provide the foundation of a data-driven approach to the CMS computing infrastructure. (paper)

  4. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

    (c) the Randolph Glacier Inventory (RGI), a new and globally complete digital dataset of outlines from about 180,000 glaciers with some meta-information, which has been used for many applications relating to the IPCC AR5 report. Concerning glacier changes, a database (Fluctuations of Glaciers) exists containing information about mass balance, front variations including past reconstructed time series, geodetic changes and special events. Annual mass balance reporting contains information for about 125 glaciers, with a subset of 37 glaciers with continuous observational series since 1980 or earlier. Front variation observations of around 1800 glaciers are available from most of the mountain ranges world-wide. This database was recently updated with 26 glaciers having an unprecedented dataset of length changes from reconstructions of well-dated historical evidence going back as far as the 16th century. Geodetic observations of about 430 glaciers are available. The database is completed by a dataset containing information on special events, including glacier surges, glacier lake outbursts, ice avalanches, eruptions of ice-clad volcanoes, etc., relating to about 200 glaciers. A special database of glacier photographs contains 13,000 pictures from around 500 glaciers, some of them dating back to the 19th century. A key challenge is to combine and extend the traditional observations with fast-evolving datasets from new technologies.

  5. MIPS bacterial genomes functional annotation benchmark dataset.

    Science.gov (United States)

    Tetko, Igor V; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Fobo, Gisela; Ruepp, Andreas; Antonov, Alexey V; Surmeli, Dimitrij; Mewes, Hans-Werner

    2005-05-15

    Any development of new methods for automatic functional annotation of proteins according to their sequences requires high-quality data (as a benchmark) as well as tedious preparatory work to generate the sequence parameters required as input for the machine learning methods. Different program settings and incompatible protocols make a comparison of the analyzed methods difficult. The MIPS Bacterial Functional Annotation Benchmark dataset (MIPS-BFAB) is a new, high-quality resource comprising four bacterial genomes manually annotated according to the MIPS functional catalogue (FunCat). These resources include precalculated sequence parameters, such as sequence similarity scores, InterPro domain composition and other parameters that can be used to develop and benchmark methods for functional annotation of bacterial protein sequences. These data are provided in XML format and can be used by scientists who are not necessarily experts in genome annotation. MIPS-BFAB is available at http://mips.gsf.de/proj/bfab

  6. 2006 Fynmeet sea clutter measurement trial: Datasets

    CSIR Research Space (South Africa)

    Herselman, PLR

    2007-09-06

    Full Text Available [Figure residue removed: plots of RCS [dBm2] versus time and absolute range at f1 = 9.000 GHz for datasets CAD14-001 and CAD14-002; only the dataset identifiers and axis labels survived text extraction.]

  7. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    Full Text Available We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island, including across the glaciated–ice-free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined, where an ice shelf exists, from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced, alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.
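
    The regridding step, interpolating scattered flight-line soundings onto regular postings, can be sketched as follows. scipy's griddata stands in for the compilation's actual (and more sophisticated) gridding scheme, and all coordinates and values below are synthetic.

```python
# Gridding scattered flight-line measurements onto regular postings via
# simple linear interpolation; a shape-of-the-problem sketch only.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(2)
# Synthetic "flight line" samples: coordinates in km, bed elevation in m.
xy = rng.uniform(0, 100, size=(2000, 2))
bed = 500 * np.sin(xy[:, 0] / 20) + 10 * rng.normal(size=2000)

# Regular 1 km postings over the same domain.
gx, gy = np.meshgrid(np.arange(0, 100), np.arange(0, 100))
grid = griddata(xy, bed, (gx, gy), method="linear")
print(grid.shape)  # (100, 100)
```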

  8. Feedback control in deep drawing based on experimental datasets

    Science.gov (United States)

    Fischer, P.; Heingärtner, J.; Aichholzer, W.; Hortig, D.; Hora, P.

    2017-09-01

    In large-scale production of deep-drawing parts, as in the automotive industry, the effects of scattering material properties and warming of the tools have a significant impact on the drawing result. Within the scope of this work, an approach is presented to minimize the influence of these effects on part quality by optically measuring the draw-in of each part and adjusting the press settings to keep the strain distribution, which is represented by the draw-in, inside certain limits. For the design of the control algorithm, a design of experiments for in-line tests is used to quantify the influence of the blank holder force, as well as the force distribution, on the draw-in. The results of this experimental dataset are used to model the process behavior. Based on this model, a feedback control loop is designed. Finally, the performance of the control algorithm is validated in the production line.
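
    A minimal sketch of the control idea is shown below, assuming a simple linear process model in place of the experimentally derived one; the target draw-in, initial force, and controller gain are all invented for illustration.

```python
# Proportional feedback on blank holder force from measured draw-in.
# The linear process model and all numbers are assumptions; the paper
# derives its model from in-line design-of-experiments data.
def draw_in(blank_holder_force: float) -> float:
    # Assumed process: higher force -> less material flow -> smaller draw-in.
    return 60.0 - 0.02 * blank_holder_force  # mm

target_draw_in = 42.0  # mm (assumed target)
force = 800.0          # kN (assumed initial press setting)
gain = 20.0            # kN per mm of draw-in error (assumed gain)

for part in range(5):
    measured = draw_in(force)
    error = measured - target_draw_in
    force += gain * error  # proportional correction applied to next part
    print(f"part {part}: draw-in {measured:.2f} mm -> force {force:.1f} kN")
```

    With these assumed values the loop converges toward the 42 mm target within a few parts, since the closed-loop error shrinks by a factor of 0.6 per part.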

  9. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.

  10. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

    NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S. regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data.

  11. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...

  12. QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity

    Directory of Open Access Journals (Sweden)

    Davy Guan

    2018-04-01

    Full Text Available Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, the GreenScreen GADD-45a-GFP assay, the Syrian Hamster Embryo (SHE) assay, and 2-year rat carcinogenicity assays. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.

  13. Systematic optimization of multiplex zymography protocol to detect active cathepsins K, L, S, and V in healthy and diseased tissue: compromise among limits of detection, reduced time, and resources.

    Science.gov (United States)

    Dumas, Jerald E; Platt, Manu O

    2013-07-01

    Cysteine cathepsins are a family of proteases identified in cancer, atherosclerosis, osteoporosis, arthritis, and a number of other diseases. As this number continues to rise, so does the need for low-cost, broadly applicable quantitative assays that detect their activity and can be translated to the clinic, whether in the hospital or in low-resource settings. Multiplex cathepsin zymography is one such assay; it detects subnanomolar levels of active cathepsins K, L, S, and V in cell or tissue preparations, observed as clear bands of proteolytic activity after gelatin-substrate SDS-PAGE under conditions optimal for cathepsin renaturing and activity. Densitometric analysis of the zymogram provides quantitative information from this low-cost assay. After systematic modifications to optimize cathepsin zymography, we describe reduced electrophoresis time from 2 h to 10 min, reduced incubation assay time from overnight to 4 h, and reduced minimal tissue protein necessary while maintaining sensitive detection limits; an evaluation of the pros and cons of each modification is also included. We further describe image acquisition by smartphone camera, export to Matlab, and densitometric analysis code to quantify and report cathepsin activity, adding portability and replacing large-scale darkbox imaging equipment that could be cost-prohibitive in limited-resource settings.
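
    The densitometric step can be sketched as follows, in Python rather than the Matlab code the authors describe; the synthetic gel image, lane coordinates, and crude background estimate are assumptions for illustration only.

```python
# Band densitometry on a zymogram: integrate a lane's intensity profile
# above a background level. Cleared (digested) bands are bright against
# the dark Coomassie-stained gelatin background.
import numpy as np

def band_activity(gel: np.ndarray, lane: slice, band_rows: slice) -> float:
    """Integrated background-subtracted intensity for one cleared band."""
    profile = gel[:, lane].astype(float).mean(axis=1)  # per-row lane profile
    background = np.median(profile)                    # crude background level
    signal = np.clip(profile[band_rows] - background, 0.0, None)
    return float(signal.sum())

# Synthetic 100x60 "gel": dark background, one cleared band in rows 40-50.
gel = np.full((100, 60), 40, dtype=np.uint8)
gel[40:50, 10:25] = 220
print(band_activity(gel, lane=slice(10, 25), band_rows=slice(40, 50)))
```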

  14. Systematic optimization of multiplex zymography protocol to detect active cathepsins K, L, S, and V in healthy and diseased tissue: compromise between limits of detection, reduced time, and resources

    Science.gov (United States)

    Dumas, Jerald E.; Platt, Manu O.

    2013-01-01

    Cysteine cathepsins are a family of proteases identified in cancer, atherosclerosis, osteoporosis, arthritis and a number of other diseases. As this number continues to rise, so does the need for low-cost, broadly applicable quantitative assays that detect their activity and can be translated to the clinic, whether in the hospital or in low-resource settings. Multiplex cathepsin zymography is one such assay; it detects subnanomolar levels of active cathepsins K, L, S, and V in cell or tissue preparations, observed as cleared bands of proteolytic activity after gelatin-substrate SDS-PAGE under conditions optimal for cathepsin renaturing and activity. Densitometric analysis of the zymogram provides quantitative information from this low-cost assay. After systematic modifications to optimize cathepsin zymography, we describe reduced electrophoresis time from 2 hours to 10 minutes, reduced incubation assay time from overnight to 4 hours, and reduced minimal tissue protein necessary while maintaining sensitive detection limits; an evaluation of the pros and cons of each modification is also included. We further describe image acquisition by smartphone camera, export to Matlab, and densitometric analysis code to quantify and report cathepsin activity, adding portability and replacing large-scale darkbox imaging equipment that could be cost-prohibitive in limited-resource settings. PMID:23532386

  15. TIMPs of parasitic helminths - a large-scale analysis of high-throughput sequence datasets.

    Science.gov (United States)

    Cantacessi, Cinzia; Hofmann, Andreas; Pickering, Darren; Navarro, Severine; Mitreva, Makedonka; Loukas, Alex

    2013-05-30

    Tissue inhibitors of metalloproteases (TIMPs) are a multifunctional family of proteins that orchestrate extracellular matrix turnover, tissue remodelling and other cellular processes. In parasitic helminths, such as hookworms, TIMPs have been proposed to play key roles in the host-parasite interplay, including invasion of and establishment in the vertebrate animal hosts. Currently, knowledge of helminth TIMPs is limited to a small number of studies on canine hookworms, whereas no information is available on the occurrence of TIMPs in other parasitic helminths causing neglected diseases. In the present study, we conducted a large-scale investigation of TIMP proteins of a range of neglected human parasites including the hookworm Necator americanus, the roundworm Ascaris suum, the liver flukes Clonorchis sinensis and Opisthorchis viverrini, as well as the schistosome blood flukes. This entailed mining available transcriptomic and/or genomic sequence datasets for the presence of homologues of known TIMPs, predicting secondary structures of defined protein sequences, systematic phylogenetic analyses and assessment of differential expression of genes encoding putative TIMPs in the developmental stages of A. suum, N. americanus and Schistosoma haematobium which infect the mammalian hosts. A total of 15 protein sequences with high homology to known eukaryotic TIMPs were predicted from the complement of sequence data available for parasitic helminths and subjected to in-depth bioinformatic analyses. Supported by the availability of gene manipulation technologies such as RNA interference and/or transgenesis, this work provides a basis for future functional explorations of helminth TIMPs and, in particular, of their role/s in fundamental biological pathways linked to long-term establishment in the vertebrate hosts, with a view towards the development of novel approaches for the control of neglected helminthiases.

  16. Statistical segmentation of multidimensional brain datasets

    Science.gov (United States)

    Desco, Manuel; Gispert, Juan D.; Reig, Santiago; Santos, Andres; Pascau, Javier; Malpica, Norberto; Garcia-Barreno, Pedro

    2001-07-01

    This paper presents an automatic segmentation procedure for MRI neuroimages that overcomes some of the problems of multidimensional clustering techniques, such as partial volume effects (PVE), processing speed and the difficulty of incorporating a priori knowledge. The method is a three-stage procedure: 1) exclusion of background and skull voxels using threshold-based region-growing techniques with fully automated seed selection; 2) Expectation-Maximization algorithms are used to estimate the probability density function (PDF) of the remaining pixels, which are assumed to be mixtures of Gaussians; these pixels can then be classified into cerebrospinal fluid (CSF), white matter and grey matter. With this procedure, our method takes advantage of using the full covariance matrix (instead of the diagonal) for the joint PDF estimation; on the other hand, logistic discrimination techniques are more robust against violation of multi-Gaussian assumptions. 3) A priori knowledge is added using Markov Random Field techniques. The algorithm has been tested with a dataset of 30 brain MRI studies (co-registered T1 and T2 MRI). Our method was compared with clustering techniques and with template-based statistical segmentation, using manual segmentation as a gold standard. Our results were more robust and closer to the gold standard.
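
    The second stage, EM estimation of a full-covariance Gaussian mixture over paired intensities, can be sketched as below; the tissue intensity means are invented, and scikit-learn's GaussianMixture stands in for the authors' implementation.

```python
# Fit a three-class Gaussian mixture with EM to paired (T1, T2)
# intensities using full covariance matrices, then classify voxels.
# The intensity means are invented for illustration.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
means = np.array([[60.0, 180.0],   # "CSF" (assumed intensities)
                  [110.0, 110.0],  # "grey matter" (assumed)
                  [160.0, 70.0]])  # "white matter" (assumed)
voxels = np.vstack([m + rng.normal(0, 12, size=(500, 2)) for m in means])

gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=0).fit(voxels)
labels = gmm.predict(voxels)
print(np.bincount(labels))  # roughly 500 voxels per tissue class
```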

  17. ASSESSING SMALL SAMPLE WAR-GAMING DATASETS

    Directory of Open Access Journals (Sweden)

    W. J. HURLEY

    2013-10-01

    Full Text Available One of the fundamental problems faced by military planners is the assessment of changes to force structure. An example is whether to replace an existing capability with an enhanced system. This can be done directly with a comparison of measures such as accuracy, lethality, survivability, etc. However, this approach does not allow an assessment of the force-multiplier effects of the proposed change. To gauge these effects, planners often turn to war-gaming. For many war-gaming experiments it is expensive, in terms of both time and dollars, to generate a large number of sample observations. This puts a premium on the statistical methodology used to examine these small datasets. In this paper we compare the power of three tests to assess population differences: the Wald-Wolfowitz test, the Mann-Whitney U test, and resampling. We employ a series of Monte Carlo simulation experiments. Not unexpectedly, we find that the Mann-Whitney test performs better than the Wald-Wolfowitz test. Resampling is judged to perform slightly better than the Mann-Whitney test.
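
    For reference, the sketch below runs two of the compared procedures, the Mann-Whitney U test and a permutation-style resampling test on the difference of means, on a small synthetic sample of the size typical of war-gaming experiments; the group sizes and score distributions are invented.

```python
# Mann-Whitney U test (scipy) versus a permutation resampling test on
# the mean difference, for two small synthetic samples.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)
baseline = rng.normal(50, 10, size=8)   # e.g. scores with current system
enhanced = rng.normal(60, 10, size=8)   # e.g. scores with proposed system

u_stat, p_mw = mannwhitneyu(baseline, enhanced, alternative="two-sided")

# Resampling: permute group labels and recompute the mean difference.
observed = enhanced.mean() - baseline.mean()
pooled = np.concatenate([baseline, enhanced])
n_perm, count = 10000, 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    if abs(perm[8:].mean() - perm[:8].mean()) >= abs(observed):
        count += 1
p_perm = count / n_perm

print(f"Mann-Whitney p = {p_mw:.4f}, permutation p = {p_perm:.4f}")
```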

  18. The French Muséum national d'histoire naturelle vascular plant herbarium collection dataset

    Science.gov (United States)

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-02-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d'histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety-nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plant macroecological studies or conservation assessments.

  19. The French Muséum national d’histoire naturelle vascular plant herbarium collection dataset

    Science.gov (United States)

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-01-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d’histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety-nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plant macroecological studies or conservation assessments. PMID:28195585

  20. Technical note: Space-time analysis of rainfall extremes in Italy: clues from a reconciled dataset

    Science.gov (United States)

    Libertino, Andrea; Ganora, Daniele; Claps, Pierluigi

    2018-05-01

    Like other Mediterranean areas, Italy is prone to events of significant rainfall intensity lasting several hours. The main triggering mechanisms of these events are quite well known, but the aim of developing rainstorm hazard maps compatible with their actual probability of occurrence is still far from being reached. A systematic frequency analysis of these occasional, highly intense events would require a complete countrywide dataset of sub-daily rainfall records, but this kind of information has been lacking for the Italian territory. In this work several sources of data are gathered to assemble the first comprehensive and updated dataset of extreme rainfall of short duration in Italy. The resulting dataset, referred to as the Italian Rainfall Extreme Dataset (I-RED), includes the annual maximum rainfalls recorded over 1 to 24 consecutive hours from more than 4500 stations across the country, spanning the period between 1916 and 2014. A detailed description of the spatial and temporal coverage of the I-RED is presented, together with an exploratory statistical analysis aimed at providing preliminary information on the climatology of extreme rainfall at the national scale. Due to some legal restrictions, the database can be provided only under certain conditions. Taking into account the potentialities emerging from the analysis, a description of the ongoing and planned future work on the database is provided.

  1. Benchmarking of Typical Meteorological Year datasets dedicated to Concentrated-PV systems

    Science.gov (United States)

    Realpe, Ana Maria; Vernay, Christophe; Pitaval, Sébastien; Blanc, Philippe; Wald, Lucien; Lenoir, Camille

    2016-04-01

    suitable for CPV systems. For these systems, the TMY datasets obtained using dedicated drivers (DNI only, or a more precise one) are more representative when deriving TMY datasets from a limited long-term meteorological dataset.

  2. REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations

    Science.gov (United States)

    Moulik, P.; Lekic, V.; Romanowicz, B. A.

    2017-12-01

    A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history

  3. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes.(Author)

  4. BIA Indian Lands Dataset (Indian Lands of the United States)

    Data.gov (United States)

    Federal Geographic Data Committee — The American Indian Reservations / Federally Recognized Tribal Entities dataset depicts feature location, selected demographics and other associated data for the 561...

  5. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back at the client, and construct professional-quality visualizations of the results.

  6. Socioeconomic Data and Applications Center (SEDAC) Treaty Status Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Socioeconomic Data and Application Center (SEDAC) Treaty Status Dataset contains comprehensive treaty information for multilateral environmental agreements,...

  7. A method for generating large datasets of organ geometries for radiotherapy treatment planning studies

    International Nuclear Information System (INIS)

    Hu, Nan; Cerviño, Laura; Segars, Paul; Lewis, John; Shan, Jinlu; Jiang, Steve; Zheng, Xiaolin; Wang, Ge

    2014-01-01

    With the rapidly increasing application of adaptive radiotherapy, large datasets of organ geometries based on the patient’s anatomy are desired to support clinical application or research work, such as image segmentation, re-planning, and organ deformation analysis. Sometimes only limited datasets are available in clinical practice. In this study, we propose a new method to generate large datasets of organ geometries to be utilized in adaptive radiotherapy. Given a training dataset of organ shapes derived from daily cone-beam CT, we align the shapes into a common coordinate frame and select one of the training surfaces as the reference surface. A statistical shape model of the organs is constructed, based on the establishment of point correspondence between surfaces and a non-uniform rational B-spline (NURBS) representation. A principal component analysis is performed on the sampled surface points to capture the major variation modes of each organ. A set of principal components and their respective coefficients, which represent organ surface deformation, is obtained, and a statistical analysis of the coefficients is performed. New sets of statistically equivalent coefficients can be constructed and assigned to the principal components, resulting in a larger geometry dataset for the patient’s organs. These generated organ geometries are realistic and statistically representative
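
    The generation step can be sketched with ordinary PCA on flattened, corresponded surface points, drawing new mode coefficients within the observed variance; the random stand-in shapes and the choice of five modes below are assumptions for illustration.

```python
# PCA shape model: learn variation modes from corresponded surfaces,
# then synthesize a new shape from statistically equivalent coefficients.
import numpy as np

rng = np.random.default_rng(5)
n_shapes, n_points = 20, 300
# Each training shape: 300 corresponded 3-D points, flattened to 900 values.
shapes = rng.normal(size=(n_shapes, n_points * 3))  # stand-in surfaces

mean_shape = shapes.mean(axis=0)
centered = shapes - mean_shape
# PCA via SVD; rows of vt are the principal variation modes.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
std_coeff = s / np.sqrt(n_shapes - 1)  # std dev of each mode's coefficient

k = 5  # number of leading modes to keep (assumed)
new_coeff = rng.normal(0.0, std_coeff[:k])   # statistically equivalent draw
new_shape = mean_shape + new_coeff @ vt[:k]  # synthesized geometry
print(new_shape.reshape(n_points, 3).shape)  # (300, 3)
```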

  8. An innovative privacy preserving technique for incremental datasets on cloud computing.

    Science.gov (United States)

    Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Aljeroudi, Yazan

    2016-08-01

    Cloud computing (CC) is a service-based delivery model offering vast computer processing power and data storage across connected communication channels. It has imparted an overwhelming technological impetus to the internet-mediated IT industry, where users can easily share private data for further analysis and mining. Furthermore, user-friendly CC services make it economical to deploy sundry applications. Meanwhile, simple data sharing has invited various phishing attacks and malware-assisted security threats. Some privacy-sensitive applications, such as health services on the cloud, which are built with several economic and operational benefits, necessitate enhanced security. Thus, absolute cyberspace security and mitigation against phishing attacks have become mandatory to protect overall data privacy. Typically, the datasets of diverse applications are anonymized to give better privacy to their owners, without providing all secrecy requirements for newly added records. Some proposed techniques address this issue by re-anonymizing the datasets from scratch. Utmost privacy protection over incremental datasets on CC is far from being achieved. Certainly, the distribution of huge dataset volumes across multiple storage nodes limits privacy preservation. In this view, we propose a new anonymization technique to attain better privacy protection with high data utility over distributed and incremental datasets on CC. The proficiency of data privacy preservation and improved confidentiality requirements is demonstrated through performance evaluation. Copyright © 2016 Elsevier Inc. All rights reserved.

  9. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, have never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine

  10. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  11. An Annotated Dataset of 14 Cardiac MR Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

    This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given....

  12. A New Outlier Detection Method for Multidimensional Datasets

    KAUST Repository

    Abdel Messih, Mario A.

    2012-07-01

    This study develops a novel hybrid method for outlier detection (HMOD) that combines the ideas of distance-based and density-based methods. The proposed method has two main advantages over most other outlier detection methods. The first is that it works well on both dense and sparse datasets. The second is that, unlike most other outlier detection methods, which require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only parameter to set is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD that makes it very efficient in applications. Moreover, we propose a new way of using outlier detection for redundancy reduction in datasets, in which users can specify a confidence level evaluating how accurately the less redundant dataset represents the original dataset. HMOD is evaluated on synthetic datasets (dense and mixed "dense and sparse") and on a bioinformatics problem: redundancy reduction of a dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level with which a reduced dataset represents the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.
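
    Since the exact HMOD algorithm is not given in the abstract, the sketch below shows a generic kNN outlier score in the same spirit, combining a distance criterion (mean kNN distance) with a density criterion (that distance relative to the neighbours' own); it is not the authors' method, and the single k parameter mirrors HMOD's one-parameter design.

```python
# Generic kNN-based outlier score mixing distance and density ideas;
# illustrative only, not the published HMOD algorithm.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_outlier_scores(X: np.ndarray, k: int = 10) -> np.ndarray:
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)
    mean_kdist = dist[:, 1:].mean(axis=1)            # skip self-distance
    neighbour_mean = mean_kdist[idx[:, 1:]].mean(axis=1)
    return mean_kdist / neighbour_mean               # >1 => sparser than peers

rng = np.random.default_rng(6)
inliers = rng.normal(0, 1, size=(200, 2))
outliers = rng.uniform(6, 8, size=(5, 2))
X = np.vstack([inliers, outliers])
scores = knn_outlier_scores(X, k=10)
print(np.argsort(scores)[-5:])  # indices 200-204 should rank among the highest
```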

  13. Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center

    Science.gov (United States)

    Moroni, D. F.; Quach, N.; Francis-Curley, W.

    2016-12-01

    As the landscape of data becomes increasingly more diverse in the domain of Earth Science, the challenges of managing and preserving data become more onerous and complex, particularly for data centers on fixed budgets with limited staff. Many solutions already exist to ease the cost burden for the downstream component of the data lifecycle, yet most archive centers are still racing to keep up with the influx of new data that needs to find a quasi-permanent resting place. For instance, having well-defined metadata that is consistent across the entire data landscape provides for well-managed and preserved datasets throughout the latter end of the data lifecycle. Translators between different metadata dialects are already in operational use and help keep older datasets relevant in today's world of rapidly evolving metadata standards. However, very little is done to address the first phase of the lifecycle, which deals with the entry of both data and the corresponding metadata into a system that is traditionally opaque and closed off to external data producers, thus resulting in a significant bottleneck in the dataset submission process. The ATRAC system was NOAA NCEI's answer to this previously obfuscated barrier for scientists wishing to find a home for their climate data records, providing a web-based entry point to submit timely and accurate metadata and information about a very specific dataset. A couple of NASA's Distributed Active Archive Centers (DAACs) have implemented their own versions of a web-based dataset and metadata submission form, including the ASDC and the ORNL DAAC. The Physical Oceanography DAAC is the most recent in the list of NASA-operated DAACs that have begun to offer their own web-based dataset and metadata submission services to data producers. What makes the PO.DAAC dataset and metadata submission service stand out from these pre-existing services is the option of utilizing both a web browser GUI and a RESTful API to
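
    Purely as an illustration of how such a RESTful submission interface is typically exercised, the sketch below posts dataset metadata as JSON; the endpoint URL, authorization header, and field names are invented placeholders, not the actual PO.DAAC API.

```python
# Hypothetical RESTful metadata submission; every endpoint and field
# name below is a placeholder, not the real PO.DAAC interface.
import requests

metadata = {
    "shortName": "EXAMPLE_SST_L4_V1",   # hypothetical dataset identifier
    "title": "Example L4 SST analysis",
    "processingLevel": "4",
    "contact": "producer@example.org",
}

response = requests.post(
    "https://podaac.example.org/api/submissions",  # placeholder URL
    json=metadata,
    headers={"Authorization": "Bearer <token>"},   # placeholder credential
    timeout=30,
)
response.raise_for_status()
print(response.json().get("submissionId"))
```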

  14. ATLAS File and Dataset Metadata Collection and Use

    CERN Document Server

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment, including software release management, tracking of reconstructed event sizes, and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example, the total number of events contained in a dataset, and possible reasons for missing events, such as a lost file. Finally, we describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  15. A dataset on tail risk of commodities markets.

    Science.gov (United States)

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

    This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets" (Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices covering World, Asia, and South East Asia for the period 2004-2015. The dataset is then divided into annual periods, showing the worst 5% of price movements for each year. The datasets make it convenient to examine the tail risk of different commodities, as measured by Conditional Value at Risk (CVaR), and how it changes over time. The datasets can also be used to investigate the association between commodity markets and share markets.
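
    The tail-risk measure itself is simple to state: CVaR at the 5% level is the mean of the worst 5% of price movements. Below is a minimal sketch under that definition; the synthetic data and helper name are illustrative, and the paper's exact estimator (e.g. per-year windows) may differ in detail.

```python
import numpy as np

def cvar(returns, alpha=0.05):
    """Mean of the worst alpha-fraction of returns (illustrative estimator)."""
    ordered = np.sort(returns)                        # most negative first
    cutoff = max(1, int(np.ceil(alpha * len(ordered))))
    return ordered[:cutoff].mean()

# stand-in for one year of a commodity's daily price movements
rng = np.random.default_rng(0)
daily_moves = rng.normal(0.0, 0.02, 250)
print(cvar(daily_moves))   # average loss on the worst 5% of days
```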

  16. The effects of spatial population dataset choice on estimates of population at risk of disease

    Directory of Open Access Journals (Sweden)

    Gething Peter W

    2011-02-01

    Background: The spatial modeling of infectious disease distributions and dynamics is increasingly being undertaken for health services planning and disease control monitoring, implementation, and evaluation. Where risks are heterogeneous in space or dependent on person-to-person transmission, spatial data on human population distributions are required to estimate infectious disease risks, burdens, and dynamics. Several different modeled human population distribution datasets are available and widely used, but the disparities among them and the implications for enumerating disease burdens and populations at risk have not been considered systematically. Here, we quantify some of these effects using global estimates of populations at risk (PAR) of P. falciparum malaria as an example. Methods: The recent construction of a global map of P. falciparum malaria endemicity enabled the testing of different gridded population datasets for providing estimates of PAR by endemicity class. The estimated population numbers within each class were calculated for each country using four different global gridded human population datasets: GRUMP (~1 km spatial resolution), LandScan (~1 km), UNEP Global Population Databases (~5 km), and GPW3 (~5 km). More detailed assessments of PAR variation and accuracy were conducted for three African countries where census data were available at a higher administrative-unit level than used by any of the four gridded population datasets. Results: The estimates of PAR based on the datasets varied by more than 10 million people for some countries, even accounting for the fact that estimates of population totals made by different agencies are used to correct national totals in these datasets and can vary by more than 5% for many low-income countries. In many cases, these variations in PAR estimates comprised more than 10% of the total national population. The detailed country-level assessments suggested that none of the datasets was

  17. Systematic review and meta-analysis of randomized controlled trials for the management of limited vertical height in the posterior region: short implants (5 to 8 mm) vs longer implants (> 8 mm) in vertically augmented sites.

    Science.gov (United States)

    Lee, Sung-Ah; Lee, Chun-Teh; Fu, Martin M; Elmisalati, Waiel; Chuang, Sung-Kiang

    2014-01-01

    The aim of this study was to undertake a systematic review with meta-analysis on randomized controlled trials (RCTs) to compare the rates of survival, success, and complications of short implants to those of longer implants in the posterior regions. Electronic literature searches were conducted through the MEDLINE (PubMed) and EMBASE databases to locate all relevant articles published between January 1, 1990, and April 30, 2013. Eligible studies were selected based on inclusion criteria, and quality assessments were conducted. After data extraction, meta-analyses were performed. In total, 539 dental implants (265 short implants [length 5 to 8 mm] and 274 control implants [length > 8 mm]) from four RCTs were included. The fixed prostheses of multiple short and control implants were all splinted. The mean follow-up period was 2.1 years. The 1-year and 5-year cumulative survival rates (CSR) were 98.7% (95% confidence interval [CI], 97.8% to 99.5%) and 93.6% (95% CI, 89.8% to 97.5%), respectively, for the short implant group and 98.0% (95% CI, 96.9% to 99.1%) and 90.3% (95% CI, 85.2% to 95.4%), respectively, for the control implant group. The CSRs of the two groups did not demonstrate a statistically significant difference. There were also no statistically significant differences in success rates, failure rates, or complications between the two groups. Placement of short dental implants could be a predictable alternative to longer implants to reduce surgical complications and patient morbidity in situations where vertical augmentation procedures are needed. However, only four studies with potential risk of bias were selected in this meta-analysis. Within the limitations of this meta-analysis, these results should be confirmed with robust methodology and RCTs with longer follow-up duration.

  18. HLA diversity in the 1000 genomes dataset.

    Directory of Open Access Journals (Sweden)

    Pierre-Antoine Gourraud

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range, whereas thousands of haplotypes are present at lower frequencies. Given the limitations of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic diversity or similar recombination rates have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of the MHC in population and disease studies.

  19. Deep-learning derived features for lung nodule classification with limited datasets

    Science.gov (United States)

    Thammasorn, P.; Wu, W.; Pierce, L. A.; Pipavath, S. N.; Lampe, P. D.; Houghton, A. M.; Haynor, D. R.; Chaovalitwongse, W. A.; Kinahan, P. E.

    2018-02-01

    Only a few percent of the indeterminate nodules found in lung CT images are cancer. However, enabling earlier diagnosis is important to spare patients with benign nodules invasive procedures or long-term surveillance. We are evaluating a classification framework using radiomics features derived with a machine learning approach from a small dataset of indeterminate CT lung nodule images. We used a retrospective analysis of 194 cases with pulmonary nodules in CT images, with or without contrast enhancement, from lung cancer screening clinics. The nodules were contoured by a radiologist and texture features of the lesion were calculated. In addition, semantic features describing shape were categorized. We also explored a Multiband network, a feature derivation path that uses a modified convolutional neural network (CNN) with a Triplet Network, trained to create discriminative feature representations useful for variable-sized nodule classification. The diagnostic accuracy was evaluated for multiple machine learning algorithms using texture, shape, and CNN features. In the CT contrast-enhanced group, the texture or semantic shape features yielded an overall diagnostic accuracy of 80%. Use of a standard deep learning network in the framework for feature derivation yielded features that substantially underperformed compared to texture and/or semantic features. However, the proposed Multiband approach of feature derivation produced results similar in diagnostic accuracy to the texture and semantic features. While the Multiband feature derivation approach did not outperform the texture and/or semantic features, its equivalent performance indicates promise for future improvements to increase diagnostic accuracy. Importantly, the Multiband approach adapts readily to different-sized lesions without interpolation, and performed well with a relatively small amount of training data.
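
    The Triplet Network's role is to shape the embedding space so that same-class nodules sit closer together than different-class ones. The modified CNN itself is not reproduced here; below is only the standard triplet hinge loss that such a network is trained with, sketched on plain embedding vectors.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss: pull anchor-positive pairs together,
    push anchor-negative pairs at least `margin` further apart."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin).mean()
```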

  20. A simplified Excel tool for implementation of RUSLE2 in vineyards for stakeholders with limited dataset

    Science.gov (United States)

    Gomez, Jose Alfonso; Biddoccu, Marcella; Guzmán, Gema; Cavallo, Eugenio

    2016-04-01

    Analysis with simulation models is in many situations the only way to evaluate the impact of changes in soil management on soil erosion risk, and the Revised Universal Soil Loss Equation RUSLE (Renard et al., 1997; Dabney et al., 2012) remains the most widely used model. Yet even with its relative simplicity compared to other, more process-based erosion models, proper RUSLE calibration for a given situation can be challenging outside the modelling community, especially for situations not widely covered in the USA. An approach pursued by Gómez et al. (2003) to overcome this problem when calibrating RUSLE, especially the cover-management factor C, was to build a summary model using the equations of the RUSLE manual (Renard et al., 1997), with the basic information required to calibrate the subfactors (soil surface roughness, ground cover, soil moisture, etc.) calculated elsewhere or taken from available sources and entered into the summary model rather than calculated by the RUSLE software. This strategy simplified the calibration process, as well as the understanding and interpretation of RUSLE parameters and model behavior by non-expert users, for application in olive orchards under a broad range of management conditions. Gómez et al. (2003) built this summary model in Excel and demonstrated the ability to calibrate RUSLE for a broad range of management conditions. Later, several studies (Vanwalleghem et al., 2011; Marín, 2013) demonstrated that this summary model successfully predicted soil losses at hillslope scale close to those determined experimentally. Vines are among the most widely grown tree crops, covering a wide range of environmental and management conditions, and in terms of soil conservation they present several analogies with olives, especially in relation to soil management (Gómez et al., 2011). In vine-growing areas, besides topographic and rainfall characteristics, the soil management practices adopted in vineyards can favor erosion. Cultivation with rows running up and down the slope on sloping vineyards, maintenance of bare soil, and compaction due to heavy machinery traffic are some of the vineyard management practices that expose soil to degradation, favoring runoff and soil erosion processes. Conversely, the adoption of grass cover in vineyards plays a fundamental role in protecting soil against erosion under high rainfall intensity and erosivity. This communication presents a preliminary version of a summary model to calibrate RUSLE for vines under different soil management options, following an approach analogous to that used by Gómez et al. (2003) for olive orchards, for the simplified situation of a homogeneous hillslope and including the latest RUSLE conceptual updates (RUSLE2; Dabney et al., 2012). It also presents preliminary results for different values of the C factor under different soil management and environmental conditions, as well as their impact on predicted long-term soil losses in vineyards located in southern Spain and northern Italy. Keywords: vines, erosion, soil management, RUSLE, model. References: Dabney, S.M., Yoder, D.C., Vieira, D.A.N. 2012. The application of the Revised Universal Soil Loss Equation, Version 2, to evaluate the impacts of alternative climate change scenarios on runoff and sediment yield. Journal of Soil and Water Conservation 67: 343-353. Gómez, J.A., Battany, M., Renschler, C.S., Fereres, E. 2003. Evaluating the impact of soil management on soil loss in olive orchards. Soil Use and Management 19: 127-134. Gómez, J.A., Llewellyn, C., Basch, G., Sutton, P.B., Dyson, J.S., Jones, C.A. 2011. The effects of cover crops and conventional tillage on soil and runoff loss in vineyards and olive groves in several Mediterranean countries. Soil Use and Management 27: 502-514. Marín, V. 2013. Interfaz gráfica para la valoración de la pérdida de suelo en parcelas de olivar. Final degree project, University of Cordoba. Vanwalleghem, T., Infante, J.A., González, M., Soto, D., Gómez, J.A. 2011. Quantifying the effect of historical soil management on soil erosion rates in Mediterranean olive orchards. Agriculture, Ecosystems & Environment 142: 341-351.
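
    The summary-model approach rests on RUSLE's multiplicative structure, A = R · K · LS · C · P. Below is a minimal sketch of that structure with placeholder factor values (not calibrated vineyard parameters from the study), showing how a lower cover-management factor C translates directly into lower predicted loss.

```python
def rusle_soil_loss(R, K, LS, C, P=1.0):
    """Mean annual soil loss A = R * K * LS * C * P (RUSLE structure).
    All inputs below are illustrative placeholders, not study values."""
    return R * K * LS * C * P

# hypothetical hillslope: erosivity R, erodibility K, topographic factor LS
A_bare = rusle_soil_loss(R=1200, K=0.035, LS=1.8, C=0.45)   # bare soil
A_grass = rusle_soil_loss(R=1200, K=0.035, LS=1.8, C=0.10)  # grass cover
print(A_bare, A_grass)   # cover management lowers predicted loss
```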

  1. Validity and reliability of stillbirth data using linked self-reported and administrative datasets.

    Science.gov (United States)

    Hure, Alexis J; Chojenta, Catherine L; Powers, Jennifer R; Byles, Julie E; Loxton, Deborah

    2015-01-01

    A high rate of stillbirth was previously observed in the Australian Longitudinal Study on Women's Health (ALSWH). Our primary objective was to test the validity and reliability of self-reported stillbirth data linked to state-based administrative datasets. Self-reported data, collected as part of the ALSWH cohort born in 1973-1978, were linked to three administrative datasets for women in New South Wales, Australia (n = 4374): the Midwives Data Collection, the Admitted Patient Data Collection, and the Perinatal Death Review Database. Linkages were obtained from the Centre for Health Record Linkage for the period 1996-2009. True cases of stillbirth were defined as those consistently recorded in two or more independent data sources. Sensitivity, specificity, positive predictive value, negative predictive value, percent agreement, and kappa statistics were calculated for each dataset. Forty-nine women reported 53 stillbirths. No dataset was 100% accurate. The administrative datasets performed better than self-reported data, with high accuracy and agreement. Self-reported data showed high sensitivity (100%) but low specificity (30%), meaning women who had a stillbirth always reported it, but stillbirths were also over-reported. About half of the misreported cases in the ALSWH could be removed by identifying inconsistencies in longitudinal data. Data linkage provides a great opportunity to assess the validity and reliability of self-reported study data. Conversely, self-reported study data can help to resolve inconsistencies in administrative datasets. Quantifying the strengths and limitations of both self-reported and administrative data can improve epidemiological research, especially by guiding methods and the interpretation of findings.
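
    The agreement statistics named above all follow from a 2x2 table of each dataset against the reference standard (records consistent in two or more independent sources). A generic sketch; the function name is illustrative.

```python
def validity_stats(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV and Cohen's kappa from a 2x2 table,
    comparing one dataset against the reference standard."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    po = (tp + tn) / n                                          # observed agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    return sens, spec, ppv, npv, kappa
```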

  2. Lunar Meteorites: A Global Geochemical Dataset

    Science.gov (United States)

    Zeigler, R. A.; Joy, K. H.; Arai, T.; Gross, J.; Korotev, R. L.; McCubbin, F. M.

    2017-01-01

    To date, the world's meteorite collections contain over 260 lunar meteorite stones representing at least 120 different lunar meteorites. Additionally, there are 20-30 as yet unnamed stones currently in the process of being classified. Collectively these lunar meteorites likely represent 40-50 distinct, random sampling locations on the Moon. Although the exact provenance of each individual lunar meteorite is unknown, collectively the lunar meteorites represent the best global average of the lunar crust. The Apollo sites are all within or near the Procellarum KREEP Terrane (PKT), thus lithologies from the PKT are overrepresented in the Apollo sample suite. Nearly all of the lithologies present in the Apollo sample suite are found within the lunar meteorites (high-Ti basalts are a notable exception), and the lunar meteorites contain several lithologies not present in the Apollo sample suite (e.g., magnesian anorthosite). This chapter will not be a sample-by-sample summary of each individual lunar meteorite. Rather, the chapter will summarize the different types of lunar meteorites and their relative abundances, comparing and contrasting the lunar meteorite sample suite with the Apollo sample suite. This chapter will act as one of the introductory chapters to the volume, introducing lunar samples in general and setting the stage for more detailed discussions in later, more specialized chapters. The chapter will begin with a description of how lunar meteorites are ejected from the Moon, the depths from which samples are excavated, the likely pairing relationships among the lunar meteorite samples, and how the lunar meteorites can help to constrain the impactor flux in the inner solar system. There will be a discussion of the biases inherent to the lunar meteorite sample suite in terms of underrepresented lithologies or regions of the Moon, and an examination of the contamination and limitations of lunar meteorites due to terrestrial weathering. The

  3. Hydrological simulation of the Brahmaputra basin using global datasets

    Science.gov (United States)

    Bhattacharya, Biswa; Conway, Crystal; Craven, Joanne; Masih, Ilyas; Mazzolini, Maurizio; Shrestha, Shreedeepy; Ugay, Reyne; van Andel, Schalk Jan

    2017-04-01

    The Brahmaputra River flows through China, India and Bangladesh to the Bay of Bengal and is one of the largest rivers of the world, with a catchment size of 580,000 km2. The catchment is largely hilly and/or forested, with a sparse population and limited urbanisation and economic activity. It experiences heavy monsoon rainfall leading to very high flood discharges. Large inter-annual variations of discharge leading to flooding, erosion and morphological changes are among the major challenges. The catchment is largely ungauged; moreover, the limited availability of hydro-meteorological data restricts the possibility of carrying out evidence-based research, which could provide trustworthy information for managing and, when needed, controlling the basin processes by the riparian countries for overall basin development. The paper presents initial results of a current research project on the Brahmaputra basin. A set of hydrological and hydraulic models (SWAT, HMS, RAS) are developed by employing publicly available datasets of DEM, land use and soil, and simulated using satellite-based rainfall products, evapotranspiration and temperature estimates. Remotely sensed data are compared with sporadically available ground data. The set of models is able to produce catchment-wide hydrological information that can potentially be used in the future for managing the basin's water resources. The model predictions should be used with caution due to the high level of uncertainty, because the semi-calibrated models are developed with uncertain physical representation (e.g. cross-sections) and simulated with global meteorological forcing (e.g. TRMM) with limited validation. Major scientific challenges are seen in producing robust information that can be reliably used in managing the basin. The information generated by the models is uncertain and as a result, instead of using it per se, it is used to improve the understanding of the catchment, and by running several scenarios with varying

  4. “Controlled, cross-species dataset for exploring biases in genome annotation and modification profiles”

    Directory of Open Access Journals (Sweden)

    Alison McAfee

    2015-12-01

    Since the sequencing of the honey bee genome, proteomics by mass spectrometry has become increasingly popular for biological analyses of this insect; but we have observed that the number of honey bee protein identifications is consistently low compared to other organisms [1]. In this dataset, we use nanoelectrospray ionization-coupled liquid chromatography–tandem mass spectrometry (nLC–MS/MS) to systematically investigate the root cause of low honey bee proteome coverage. To this end, we present here data from three key experiments: a controlled, cross-species analysis of samples from Apis mellifera, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Mus musculus and Homo sapiens; a proteomic analysis of an individual honey bee whose genome was also sequenced; and a cross-tissue honey bee proteome comparison. The cross-species dataset was interrogated to determine relative proteome coverage between species, and the other two datasets were used to search for polymorphic sequences and to compare protein cleavage profiles, respectively.

  5. Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control

    Science.gov (United States)

    Kirwan, Jennifer A; Weber, Ralf J M; Broadhurst, David I; Viant, Mark R

    2014-01-01

    Direct-infusion mass spectrometry (DIMS) metabolomics is an important approach for characterising the molecular responses of organisms to disease, drugs and the environment. Increasingly large-scale metabolomics studies are being conducted, necessitating improvements in both bioanalytical and computational workflows to maintain data quality. This dataset represents a systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts. It comprises twenty biological samples (cow vs. sheep) that were analysed repeatedly, in 8 batches across 7 days, together with a concurrent set of quality control (QC) samples. Data are presented from each step of the workflow and are available in MetaboLights. The strength of the dataset is that intra- and inter-batch variation can be corrected using QC spectra and the quality of this correction assessed independently using the repeatedly measured biological samples. Originally designed to test the efficacy of a batch-correction algorithm, it will enable others to evaluate novel data processing algorithms. Furthermore, this dataset serves as a benchmark for DIMS metabolomics, derived using best-practice workflows and rigorous quality assessment. PMID:25977770
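
    The design, repeated QC samples spread across batches, supports corrections far more sophisticated than the deliberately minimal sketch below, which simply rescales each batch so its QC median matches the global QC median, per feature. The function name and array layout (samples x features) are assumptions, not the benchmarked algorithm.

```python
import numpy as np

def qc_batch_correct(intensities, batch_ids, is_qc):
    """Scale each batch so its QC-sample medians match the global QC median.
    `intensities` is samples x features; `is_qc` marks the QC rows."""
    corrected = intensities.astype(float).copy()
    global_qc = np.median(corrected[is_qc], axis=0)
    for b in np.unique(batch_ids):
        in_batch = batch_ids == b
        batch_qc = np.median(corrected[in_batch & is_qc], axis=0)
        corrected[in_batch] *= global_qc / batch_qc
    return corrected
```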

  6. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  7. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    SUMMARY In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. Most existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers, and several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival; this model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111

  8. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

    In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. First, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geographic Markup Language (GML) output of the WFS is transformed on-the-fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referenced with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced, using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
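
    The core pattern described, minting a persistent URI per spatial object and exposing its GML geometry in RDF, can be sketched with rdflib. The base URI, feature ID and geometry literal below are purely illustrative, not the prototype's actual scheme.

```python
from rdflib import Graph, Literal, Namespace, URIRef

GEO = Namespace("http://www.opengis.net/ont/geosparql#")  # OGC GeoSPARQL

g = Graph()
feature = URIRef("http://data.example.org/features/12345")  # persistent URI
g.add((feature, GEO.asGML,
       Literal("<gml:Point><gml:pos>60.2 24.9</gml:pos></gml:Point>",
               datatype=GEO.gmlLiteral)))
# one of several serializations a content-negotiating service could return
print(g.serialize(format="turtle"))
```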

  9. Tension in the recent Type Ia supernovae datasets

    International Nuclear Information System (INIS)

    Wei, Hao

    2010-01-01

    In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.

  10. Bayes classifiers for imbalanced traffic accidents datasets.

    Science.gov (United States)

    Mujalli, Randa Oqab; López, Griselda; Garach, Laura

    2016-03-01

    Traffic accident data sets are usually imbalanced: the number of instances classified under the killed or severe injuries class (the minority) is much lower than the number classified under the slight injuries class (the majority). This, however, poses a challenge for classification algorithms and may yield a model that covers the slight-injuries instances well while frequently misclassifying the killed or severe-injuries instances. Based on traffic accident data collected on urban and suburban roads in Jordan over three years (2009-2011), three different data balancing techniques were used: under-sampling, which removes some instances of the majority class; oversampling, which creates new instances of the minority class; and a mixed technique that combines both. In addition, different Bayes classifiers were compared on the imbalanced and balanced data sets: Averaged One-Dependence Estimators, Weightily Averaged One-Dependence Estimators, and Bayesian networks, in order to identify factors that affect the severity of an accident. The results indicated that using the balanced data sets, especially those created using oversampling techniques, with Bayesian networks improved the classification of a traffic accident according to its severity and reduced the misclassification of killed and severe injuries instances. On the other hand, the following variables were found to contribute to the occurrence of a killed casualty or a severe injury in a traffic accident: number of vehicles involved, accident pattern, number of directions, accident type, lighting, surface condition, and speed limit. This work, to the knowledge of the authors, is the first that aims at analyzing historical data records for traffic accidents occurring in Jordan and the first to apply balancing techniques to analyze the injury severity of traffic accidents. Copyright © 2015 Elsevier Ltd. All rights reserved.
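
    Of the balancing techniques compared, random oversampling is the simplest to illustrate: duplicate minority-class rows until the classes are balanced. A minimal sketch (the study may well have used a more elaborate scheme, such as synthetic minority examples); the function name is illustrative.

```python
import numpy as np

def random_oversample(X, y, minority_label, seed=0):
    """Duplicate randomly chosen minority-class rows until classes balance."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    return X[keep], y[keep]
```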

  11. Creation of the Naturalistic Engagement in Secondary Tasks (NEST) distracted driving dataset.

    Science.gov (United States)

    Owens, Justin M; Angell, Linda; Hankey, Jonathan M; Foley, James; Ebe, Kazutoshi

    2015-09-01

    Distracted driving has become a topic of critical importance to driving safety research over the past several decades. Naturalistic driving data offer a unique opportunity to study how drivers engage with secondary tasks in real-world driving; however, the complexities involved with identifying and coding relevant epochs of naturalistic data have limited its accessibility to the general research community. This project was developed to help address this problem by creating an accessible dataset of driver behavior and situational factors observed during distraction-related safety-critical events and baseline driving epochs, using the Strategic Highway Research Program 2 (SHRP2) naturalistic dataset. The new NEST (Naturalistic Engagement in Secondary Tasks) dataset was created using crashes and near-crashes from the SHRP2 dataset that were identified as including secondary task engagement as a potential contributing factor. Data coding included frame-by-frame video analysis of secondary task and hands-on-wheel activity, as well as summary event information. In addition, information about each secondary task engagement within the trip prior to the crash/near-crash was coded at a higher level. Data were also coded for four baseline epochs and trips per safety-critical event. 1,180 events and baseline epochs were coded, and a dataset was constructed. The project team is currently working to determine the most useful way to allow broad public access to the dataset. We anticipate that the NEST dataset will be extraordinarily useful in allowing qualified researchers access to timely, real-world data concerning how drivers interact with secondary tasks during safety-critical events and baseline driving. The coded dataset developed for this project will allow future researchers to have access to detailed data on driver secondary task engagement in the real world. It will be useful for standalone research, as well as for integration with additional SHRP2 data to enable the

  12. Dimension Reduction Aided Hyperspectral Image Classification with a Small-sized Training Dataset: Experimental Comparisons

    Directory of Open Access Journals (Sweden)

    Jinya Su

    2017-11-01

    Hyperspectral images (HSI) provide rich information which may not be captured by other sensing technologies, and are therefore gradually finding a wide range of applications. However, they also generate a large amount of irrelevant or redundant data for a specific task. This causes a number of issues, including significantly increased computation time, complexity and scale of the prediction models mapping the data to semantics (e.g., classification), and the need for a large amount of labelled data for training. In particular, it is generally difficult and expensive for experts to acquire sufficient training samples in many applications. This paper addresses these issues by exploring a number of classical dimension reduction algorithms from the machine learning community for HSI classification. To reduce the size of the training dataset, feature selection (e.g., mutual information, minimal redundancy maximal relevance) and feature extraction (e.g., Principal Component Analysis (PCA), Kernel PCA) are adopted to augment a baseline classification method, the Support Vector Machine (SVM). The proposed algorithms are evaluated using a real HSI dataset. It is shown that PCA yields the most promising performance in reducing the number of features or spectral bands. It is observed that while significantly reducing the computational complexity, the proposed method can achieve better classification results than the classic SVM on a small training dataset, which makes it suitable for real-time applications or when only limited training data are available. Furthermore, it can also achieve performance similar to the classic SVM on large datasets, but with much less computing time.
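
    The best-performing combination reported, PCA to compress the spectral bands followed by an SVM, maps naturally onto a scikit-learn pipeline. A sketch with an illustrative component count (the paper's tuned value is not given here).

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scale bands, project onto a handful of principal components, classify
model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
# model.fit(X_train, y_train); accuracy = model.score(X_test, y_test)
```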

  13. Being an honest broker of hydrology: Uncovering, communicating and addressing model error in a climate change streamflow dataset

    Science.gov (United States)

    Chegwidden, O.; Nijssen, B.; Pytlak, E.

    2017-12-01

    Any model simulation has errors, including errors in meteorological data, process understanding, model structure, and model parameters. These errors may express themselves as bias, timing lags, and differences in sensitivity between the model and the physical world. The evaluation and handling of these errors can greatly affect the legitimacy, validity and usefulness of the resulting scientific product. In this presentation we will discuss a case study of handling and communicating model errors during the development of a hydrologic climate change dataset for the Pacific Northwestern United States. The dataset was the result of a four-year collaboration between the University of Washington, Oregon State University, the Bonneville Power Administration, the United States Army Corps of Engineers and the Bureau of Reclamation. Along the way, the partnership facilitated the discovery of multiple systematic errors in the streamflow dataset. Through an iterative review process, some of those errors could be resolved. For the errors that remained, honest communication of the shortcomings promoted the dataset's legitimacy. Thoroughly explaining errors also improved ways in which the dataset would be used in follow-on impact studies. Finally, we will discuss the development of the "streamflow bias-correction" step often applied to climate change datasets that will be used in impact modeling contexts. We will describe the development of a series of bias-correction techniques through close collaboration among universities and stakeholders. Through that process, both universities and stakeholders learned about the others' expectations and workflows. This mutual learning process allowed for the development of methods that accommodated the stakeholders' specific engineering requirements. The iterative revision process also produced a functional and actionable dataset while preserving its scientific merit. We will describe how encountering earlier techniques' pitfalls allowed us
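
    The presentation does not fix a single correction technique, but quantile mapping is one common form of the "streamflow bias-correction" step discussed here: simulated flows are mapped onto the observed distribution quantile by quantile. A minimal sketch under that assumption; the function name is illustrative.

```python
import numpy as np

def quantile_map(simulated, observed, values):
    """Map `values` from the simulated flow distribution onto the observed one."""
    qs = np.linspace(0.0, 1.0, 101)
    sim_q = np.quantile(simulated, qs)   # simulated flow at each quantile
    obs_q = np.quantile(observed, qs)    # observed flow at each quantile
    # look up each value's position in the simulated record, then read off
    # the observed flow at that same quantile
    return np.interp(values, sim_q, obs_q)
```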

  14. Dataset definition for CMS operations and physics analyses

    Science.gov (United States)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC run I, and we discuss the plans for run II.

  15. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  16. Karna Particle Size Dataset for Tables and Figures

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains 1) table of bulk Pb-XAS LCF results, 2) table of bulk As-XAS LCF results, 3) figure data of particle size distribution, and 4) figure data for...

  17. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  18. National Hydrography Dataset (NHD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS National Hydrography Dataset (NHD) Downloadable Data Collection from The National Map (TNM) is a comprehensive set of digital spatial data that encodes...

  19. Watershed Boundary Dataset (WBD) - USGS National Map Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....

  20. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  1. USGS National Hydrography Dataset from The National Map

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — USGS The National Map - National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that encodes information about naturally occurring and...

  2. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we analyse a wide range of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large, open-access database made available in the PhysioNet 2016 challenge was used for feature selection, internal validation and creation of training models. A second dataset of 41 PCG segments, collected using our in-house smartphone-based digital stethoscope at an Indian hospital, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75, respectively, on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior-art approaches when applied to the same dataset.

  3. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  4. Dataset definition for CMS operations and physics analyses

    CERN Document Server

    AUTHOR|(CDS)2051291

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows were added to this canonical scheme to best exploit the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during the first run, and we discuss the plans for the second LHC run.

  5. USGS National Boundary Dataset (NBD) Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS Governmental Unit Boundaries dataset from The National Map (TNM) represents major civil areas for the Nation, including States or Territories, counties (or...

  6. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  7. Global Man-made Impervious Surface (GMIS) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...

  8. Newton SSANTA Dr Water using POU filters dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...

  9. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20% error. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher
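
    The filters in question follow the standard privacy-preserving record-linkage construction (Schnell-style): each field's character q-grams are hashed into a bit set, and record pairs are compared with a set similarity such as the Dice coefficient. A sketch under those assumptions, with illustrative parameters; the EM-based probability estimation itself is not reproduced here.

```python
import hashlib

def bloom_encode(value, num_bits=1000, num_hashes=30, q=2):
    """Encode a field into a Bloom filter (as a set of bit positions)
    via its padded character q-grams."""
    bits = set()
    padded = f"_{value.lower()}_"
    for i in range(len(padded) - q + 1):
        gram = padded[i:i + q]
        for seed in range(num_hashes):
            digest = hashlib.sha1(f"{seed}:{gram}".encode()).hexdigest()
            bits.add(int(digest, 16) % num_bits)
    return bits

def dice_similarity(a, b):
    """Dice coefficient between two encoded fields (pairwise comparison)."""
    return 2 * len(a & b) / (len(a) + len(b))

print(dice_similarity(bloom_encode("smith"), bloom_encode("smyth")))
```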

  10. Toward computational cumulative biology by combining models of biological datasets.

    Science.gov (United States)

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of the data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven, to enable new findings beyond the state of the art of keyword searches in annotations; (ii) modeling-driven, to include both biological knowledge and insights learned from data; and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces in determining relevant datasets, the relationships found were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations, for instance between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most-linked datasets were not highly cited and turned out to have wrong publication entries in the database.

  11. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

    This document describes a general-purpose multimedia dataset to be used in cross-media machine learning problems. In more detail, we describe the genre taxonomy applied at http://www.garageband.com, from where the dataset was collected, and how that taxonomy has been fused into a more human-understandable taxonomy. Finally, a description of various features extracted from both the audio and the text is presented.

  12. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications to medical data face several technical challenges: complex and heterogeneous datasets, noisy medical data, and the difficulty of explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  13. Seeing the forests and the trees--innovative approaches to exploring heterogeneity in systematic reviews of complex interventions to enhance health system decision-making: a protocol.

    Science.gov (United States)

    Ivers, Noah; Tricco, Andrea C; Trikalinos, Thomas A; Dahabreh, Issa J; Danko, Kristin J; Moher, David; Straus, Sharon E; Lavis, John N; Yu, Catherine H; Shojania, Kaveh; Manns, Braden; Tonelli, Marcello; Ramsay, Timothy; Edwards, Alun; Sargious, Peter; Paprica, Alison; Hillmer, Michael; Grimshaw, Jeremy M

    2014-08-12

    To improve quality of care and patient outcomes, health system decision-makers need to identify and implement effective interventions. An increasing number of systematic reviews document the effects of quality improvement programs to assist decision-makers in developing new initiatives. However, limitations in the reporting of primary studies and in current meta-analysis methods (including approaches for exploring heterogeneity) reduce the utility of existing syntheses for health system decision-makers. This study will explore the role of innovative meta-analysis approaches and the added value of enriched and updated data in increasing the utility of systematic reviews of complex interventions. We will use the dataset from our recent systematic review of 142 randomized trials of diabetes quality improvement programs to evaluate novel approaches for exploring heterogeneity. These will include exploratory methods, such as multivariate meta-regression analyses and all-subsets combinatorial meta-analysis. We will then update our systematic review to include new trials and enrich the dataset by surveying the authors of all included trials. In doing so, we will explore the impact of variables not reported in previous publications, such as details of study context, on the effectiveness of the intervention. We will use innovative analytical methods on the enriched and updated dataset to identify key success factors in the implementation of quality improvement interventions for diabetes. Decision-makers will be involved throughout to help identify and prioritize variables to be explored and to aid in the interpretation and dissemination of results. This study will inform future systematic reviews of complex interventions and describe the value of enriching and updating data for exploring heterogeneity in meta-analysis. It will also result in an updated comprehensive systematic review of diabetes quality improvement interventions that will be useful to health system decision-makers.

  14. Current limiters

    Energy Technology Data Exchange (ETDEWEB)

    Loescher, D.H. [Sandia National Labs., Albuquerque, NM (United States). Systems Surety Assessment Dept.; Noren, K. [Univ. of Idaho, Moscow, ID (United States). Dept. of Electrical Engineering

    1996-09-01

    The current that flows between the electrical test equipment and the nuclear explosive must be limited to safe levels during electrical tests conducted on nuclear explosives at the DOE Pantex facility. The safest way to limit the current is to use batteries that can provide only acceptably low current into a short circuit; unfortunately this is not always possible. When it is not possible, current limiters, along with other design features, are used to limit the current. Three types of current limiter (the fuse blower, the resistor limiter, and the MOSFET pass-transistor limiter) are used extensively in Pantex test equipment. Detailed failure mode and effects analyses were conducted on these limiters, and two other types of limiter were also analyzed. It was found that no single type of limiter is best for all applications. The fuse blower has advantages when many circuits must be monitored, a low insertion voltage drop is important, and size and weight must be kept low; however, it has many failure modes that can lead to the loss of overcurrent protection. The resistor limiter is simple and inexpensive, but is normally usable only on circuits for which the nominal current is less than a few tens of milliamperes. The MOSFET limiter can be used on high-current circuits, but it has a number of single-point failure modes that can lead to a loss of protective action. Because bad component placement or poor wire routing can defeat any limiter, placement and routing must be designed carefully and documented thoroughly.
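
    The resistor limiter's constraint follows from Ohm's law: even a dead short downstream cannot draw more than V/R. A small illustration with hypothetical values, not Pantex design numbers.

```python
# Worst-case fault current through a series resistor limiter (Ohm's law).
V_SOURCE = 10.0    # volts: maximum test-equipment output (illustrative)
R_LIMIT = 1000.0   # ohms: series limiting resistor (illustrative)

i_max_ma = V_SOURCE / R_LIMIT * 1000.0
print(f"worst-case fault current: {i_max_ma:.1f} mA")  # 10.0 mA
```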

  15. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as web pages. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure in the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time ranges and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
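
    These heuristics can be pictured as a weighted composite score per dataset. In the toy sketch below every weight and field name is invented for illustration; the Common Metadata Repository's actual terms and weights are not given in the abstract.

```python
def interval_overlap(a, b):
    """Fraction of interval b=(lo, hi) covered by interval a."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0.0, hi - lo) / (b[1] - b[0])

def relevance_score(dataset, query):
    """Toy composite of the heuristics described above (all weights invented)."""
    score = 0.0
    if query["measurement"] in dataset["measurements"]:
        score += 3.0                            # essential-measurement match
    score += 2.0 * interval_overlap(dataset["time_range"], query["time_range"])
    score += 0.5 * dataset["version"]           # newer versions rank higher
    return score
```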

  16. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset metadata with usage metrics and user feedback to objectively extract relevance for improved...

  17. An Improved Method for Producing High Spatial-Resolution NDVI Time Series Datasets with Multi-Temporal MODIS NDVI Data and Landsat TM/ETM+ Images

    OpenAIRE

    Rao, Yuhan; Zhu, Xiaolin; Chen, Jin; Wang, Jianmin

    2015-01-01

    Due to technical limitations, it is impossible for current NDVI datasets to have high resolution in both the spatial and temporal dimensions. Therefore, several methods have been developed to produce NDVI time-series datasets with high spatial and temporal resolution, but these methods face limitations, including high computational loads and unreasonable assumptions. In this study, an unmixing-based method, the NDVI Linear Mixing Growth Model (NDVI-LMGM), is proposed to achieve the goal of accurately and efficiently bl...

  18. Quench limits

    International Nuclear Information System (INIS)

    Sapinski, M.

    2012-01-01

    With thirteen beam-induced quenches and numerous Machine Development tests, the current knowledge of LHC magnet quench limits still contains many unknowns. Various approaches to determining the quench limits are reviewed and the results of the tests are presented. An attempt is made to reconstruct a coherent picture emerging from these results. The available methods for computing quench levels are presented, together with the dedicated particle shower simulations that are necessary to understand the tests. The future experiments needed to reach a better understanding of quench limits, as well as limits for machine operation, are investigated. The possible strategies for setting BLM (Beam Loss Monitor) thresholds are discussed. (author)

  19. An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings

    Directory of Open Access Journals (Sweden)

    Hubbard Alan E

    2010-06-01

    Full Text Available Abstract Background As computational power improves, the application of more advanced machine learning techniques to the analysis of large genome-wide association (GWA) datasets becomes possible. While most traditional statistical methods can only elucidate main effects of genetic variants on risk for disease, certain machine learning approaches are particularly suited to discovering higher-order and non-linear effects. One such approach is the Random Forests (RF) algorithm. The use of RF for SNP discovery related to human disease has grown in recent years; however, most work has focused on small datasets or simulation studies, which are limited. Results Using a multiple sclerosis (MS) case-control dataset comprising 300 K SNP genotypes across the genome, we outline an approach and some considerations for optimally tuning the RF algorithm based on the empirical dataset. Importantly, results show that typical default parameter values are not appropriate for large GWA datasets. Furthermore, gains can be made by sub-sampling the data, pruning based on linkage disequilibrium (LD), and removing strong effects from RF analyses. The new RF results are compared to findings from the original MS GWA study and demonstrate overlap. In addition, four new interesting candidate MS genes, MPHOSPH9, CTNNA3, PHACTR2 and IL7, are identified by RF analysis and warrant further follow-up in independent studies. Conclusions This study presents one of the first illustrations of successfully analyzing GWA data with a machine learning algorithm. It is shown that RF is computationally feasible for GWA data, and the results obtained make biological sense based on previous studies. More importantly, new genes were identified as potentially being associated with MS, suggesting new avenues of investigation for this complex disease.
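
    A minimal sketch of the kind of tuning the authors discuss, using scikit-learn on a toy genotype matrix; the parameter values are illustrative, not those reported in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_subjects, n_snps = 200, 5000                      # toy stand-in for 300K SNPs
X = rng.integers(0, 3, size=(n_subjects, n_snps))   # genotypes coded 0/1/2
y = rng.integers(0, 2, size=n_subjects)             # case/control labels

# For wide GWA-style data, defaults (e.g. sqrt(p) candidate features per
# split) are often too small; more trees and more candidate features help.
rf = RandomForestClassifier(n_estimators=1000, max_features=0.1,
                            oob_score=True, n_jobs=-1, random_state=0)
rf.fit(X, y)
print("OOB accuracy:", rf.oob_score_)
top_snps = np.argsort(rf.feature_importances_)[::-1][:10]  # candidate SNPs
```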

  20. Systematic review

    DEFF Research Database (Denmark)

    Enggaard, Helle

    Title: Systematic review - a method to promote nursing students' skills in Evidence Based Practice. Background: The Department of Nursing educates students to practice Evidence Based Practice (EBP), where clinical decisions are based on the best available evidence, patient preference, clinical experience...... and resources available. In order to incorporate evidence in clinical decisions, nursing students need to learn how to transfer knowledge in order to utilize evidence in clinical decisions. The method of systematic review can be one approach to achieve this in nursing education. Method: As an associate lecturer...... I have taken a Comprehensive Systematic Review Training course provided by the Center of Clinical Guidelines in Denmark and the Joanna Briggs Institute (JBI) and have practiced developing a systematic review on how patients with ischemic heart disease experience peer support. This insight and experience......

  1. EEG datasets for motor imagery brain-computer interface.

    Science.gov (United States)

    Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

    2017-07-01

    Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.
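
    The ERD/ERS measure used to validate the datasets is a simple relative band-power change; a minimal sketch with hypothetical array shapes:

```python
import numpy as np

def erd_ers(trials, fs, baseline=(0.0, 2.0), task=(3.0, 5.0)):
    """trials: (n_trials, n_samples) of band-pass filtered EEG.
    Returns percentage power change vs. baseline (negative = ERD)."""
    power = trials ** 2                              # instantaneous power
    b0, b1 = int(baseline[0] * fs), int(baseline[1] * fs)
    t0, t1 = int(task[0] * fs), int(task[1] * fs)
    p_ref = power[:, b0:b1].mean()
    p_task = power[:, t0:t1].mean()
    return 100.0 * (p_task - p_ref) / p_ref

# Example: 40 trials of 7 s mu-band filtered EEG sampled at 512 Hz.
trials = np.random.randn(40, 7 * 512)
print(f"ERD/ERS: {erd_ers(trials, fs=512):.1f} %")
```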

  2. Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    International Nuclear Information System (INIS)

    Sim, Alexander; Balman, Mehmet; Williams, Dean; Shoshani, Arie; Natarajan, Vijaya

    2010-01-01

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data comprising large numbers of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and large volumes of data are moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity for moving large datasets for exploration and management. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing massive dataset transfers efficiently using pre-configured transfer properties in environments where network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in dynamic network environments, as well as to control the data transfers to achieve the desired transfer performance. We describe the results from the BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.
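
    A minimal sketch of the dynamic-adjustment idea (not the BDM code): adapt the number of concurrent transfer streams to the measured end-to-end throughput. The thresholds are hypothetical.

```python
def adjust_concurrency(streams, throughput_mbps, target_mbps,
                       min_streams=1, max_streams=32):
    if throughput_mbps < 0.8 * target_mbps:
        return min(streams + 1, max_streams)    # under target: ramp up
    if throughput_mbps > 1.2 * target_mbps:
        return max(streams - 1, min_streams)    # over target: back off
    return streams                              # within band: hold steady

streams = 4
for measured in (300.0, 450.0, 900.0, 820.0):   # sampled throughput, Mb/s
    streams = adjust_concurrency(streams, measured, target_mbps=800.0)
    print(streams)
```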

  3. Enhancing Conservation with High Resolution Productivity Datasets for the Conterminous United States

    Science.gov (United States)

    Robinson, Nathaniel Paul

    Human-driven alteration of the Earth's terrestrial surface is accelerating through land use changes, intensification of human activity, climate change, and other anthropogenic pressures. These changes occur at broad spatio-temporal scales, challenging our ability to effectively monitor and assess the impacts and subsequent conservation strategies. While satellite remote sensing (SRS) products enable monitoring of the Earth's terrestrial surface continuously across space and time, the practical applications of these products for conservation and management are limited. Often the processes driving ecological change occur at fine spatial resolutions and are undetectable given the resolution of available datasets. Additionally, the links between SRS data and ecologically meaningful metrics are weak. Recent advances in cloud computing technology, along with the growing record of high resolution SRS data, enable the development of SRS products that quantify ecologically meaningful variables at relevant scales applicable for conservation and management. The focus of my dissertation is to improve the applicability of terrestrial gross and net primary productivity (GPP/NPP) datasets for the conterminous United States (CONUS). In chapter one, I develop a framework for creating high resolution datasets of vegetation dynamics. I use the entire archive of Landsat 5, 7, and 8 surface reflectance data and a novel gap filling approach to create spatially continuous 30 m, 16-day composites of the normalized difference vegetation index (NDVI) from 1986 to 2016. In chapter two, I integrate this with other high resolution datasets and the MOD17 algorithm to create the first high resolution GPP and NPP datasets for CONUS. I demonstrate the applicability of these products for conservation and management, showing the improvements beyond currently available products. In chapter three, I utilize this dataset to evaluate the relationships between land ownership and terrestrial production
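
    The NDVI at the core of these composites is a simple band ratio; a minimal sketch on hypothetical surface-reflectance values:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

nir = np.array([[0.45, 0.50], [0.30, 0.55]])  # near-infrared reflectance
red = np.array([[0.10, 0.08], [0.20, 0.05]])  # red reflectance
print(ndvi(nir, red))
```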

  4. Dose limits

    International Nuclear Information System (INIS)

    Fitoussi, L.

    1987-12-01

    The dose limit is defined as the level of harmfulness that must not be exceeded, so that an activity can be carried out in a regular manner without running a risk unacceptable to man and society. The paper examines the effects of radiation, categorised into stochastic and non-stochastic. Dose limits for workers and the public are discussed.

  5. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

    CORA and EN4 are both global, delayed-mode, validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, along with TESAC profiles from GTSPP. In the case of CORA, data coming from the EuroGOOS Regional Operational Observing Systems (ROOS) operated by European institutes not managed by National Data Centres, as well as other profile datasets provided by scientific sources, can also be found (sea mammal profiles from MEOP, XBT datasets from cruises, etc.). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is to enhance the space and time coverage at global and European scales for the period from 1950 until a year before the current year. This product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Although the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016, a new study was started that aims to compare both validation procedures and move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  6. Geostatistical exploration of dataset assessing the heavy metal contamination in Ewekoro limestone, Southwestern Nigeria

    Directory of Open Access Journals (Sweden)

    Kehinde D. Oyeyemi

    2017-10-01

    Full Text Available The dataset for this article contains a geostatistical analysis of heavy metal contamination in limestone samples collected from the Ewekoro Formation in the eastern Dahomey Basin, Ogun State, Nigeria. The samples were collected manually and analysed using a Microwave Plasma Atomic Absorption Spectrometer (MPAS). Analysis of the twenty different samples showed different levels of heavy metal concentration. The nine analysed elements are arsenic, mercury, cadmium, cobalt, chromium, nickel, lead, vanadium and zinc. Descriptive statistics were used to explore the heavy metal concentrations individually. Pearson, Kendall tau and Spearman rho correlation coefficients were used to establish the relationships among the elements, and analysis of variance showed that there is a significant difference in the mean distribution of heavy metal concentrations within and between the groups of the 20 samples analysed. The dataset can provide insights into the health implications of the contaminants, especially when the mean concentration levels of the heavy metals are compared with the recommended regulatory limit concentrations.
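
    The three correlation measures named above are standard; a minimal sketch on hypothetical concentration vectors for two of the elements:

```python
import numpy as np
from scipy import stats

lead = np.array([12.1, 8.4, 15.2, 9.9, 11.3, 14.8])   # hypothetical, mg/kg
zinc = np.array([40.2, 31.5, 55.1, 36.0, 42.7, 51.9])

r, p_r = stats.pearsonr(lead, zinc)         # linear association
tau, p_tau = stats.kendalltau(lead, zinc)   # rank concordance
rho, p_rho = stats.spearmanr(lead, zinc)    # monotonic rank association
print(f"Pearson r={r:.2f}, Kendall tau={tau:.2f}, Spearman rho={rho:.2f}")
```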

  7. Wind and wave dataset for Matara, Sri Lanka

    Science.gov (United States)

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive information possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  8. Interactive visualization and analysis of multimodal datasets for surgical applications.

    Science.gov (United States)

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  9. Wind and wave dataset for Matara, Sri Lanka

    Directory of Open Access Journals (Sweden)

    Y. Luo

    2018-01-01

    Full Text Available We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset the most comprehensive information possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  10. Inverse Limits

    CERN Document Server

    Ingram, WT

    2012-01-01

    Inverse limits provide a powerful tool for constructing complicated spaces from simple ones. They also turn the study of a dynamical system consisting of a space and a self-map into the study of a (likely more complicated) space and a self-homeomorphism. In four chapters, along with an appendix containing background material, the authors develop the theory of inverse limits. The book begins with an introduction through inverse limits on [0,1] before moving to a general treatment of the subject. Special topics in continuum theory complete the book. Although it is not a book on dynamics, the influen
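
    For reference, the standard construction the book develops can be stated compactly (a textbook definition, not a quotation from this book):

```latex
% Inverse limit of an inverse sequence of spaces X_i with bonding maps
% f_i : X_{i+1} -> X_i, topologized as a subspace of the product space.
\[
\varprojlim \{X_i, f_i\} \;=\;
\Bigl\{ (x_1, x_2, \dots) \in \prod_{i=1}^{\infty} X_i \;:\;
x_i = f_i(x_{i+1}) \ \text{for all } i \ge 1 \Bigr\}.
\]
```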

  11. Recent Development on the NOAA's Global Surface Temperature Dataset

    Science.gov (United States)

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

    Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). The NOAAGlobalTemp was recently updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature, or ERSST, from version 3b to version 4), which resulted in increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and the land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of the ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.

  12. Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

    Science.gov (United States)

    Avraam, Demetris; Wilson, Rebecca C; Burton, Paul

    2017-01-01

    Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSPAC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (covariance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information. In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build prototype virtual reality data analysis software, but they could have broader use in method and software development projects where sensitive data cannot be freely shared.
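
    A minimal sketch of the general idea (not the authors' procedure): draw a synthetic sample whose mean vector and covariance matrix match a source dataset without reusing any original records. The stand-in data are random.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in "source" data with some correlation structure.
mixing = np.array([[1.0, 0.3, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
source = rng.normal(size=(1000, 3)) @ mixing

mu = source.mean(axis=0)                 # properties to preserve
cov = np.cov(source, rowvar=False)

synthetic = rng.multivariate_normal(mu, cov, size=15000)
print(np.allclose(np.cov(synthetic, rowvar=False), cov, atol=0.1))
```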

  13. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Full Text Available Signaling via B cell receptors (BCR) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but the transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed, allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq; ChIP-seq of control (Input), RNA Pol II, H3K4me3, and H3K27me3; and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  14. Visualization of conserved structures by fusing highly variable datasets.

    Science.gov (United States)

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three-dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (different scale and resolution) from the same person. The next step of development of our technique was aimed at accommodating the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used for reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusion 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2 respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1 and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual

  15. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries, along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as the percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence intervals (high and low values) for the point estimates of the ρ's. Using the standard errors of the estimates of the ρ's, we obtain one-sigma intervals around the mean estimates of the EMP values. These values are also reported in the dataset.
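
    A minimal sketch of the EMP construction as the abstract describes it; the sign convention and values are assumptions for illustration, not taken from Patnaik et al.

```python
def emp(delta_e_pct, intervention_bn, rho):
    """EMP in percent: observed exchange-rate change plus the change the
    estimated intervention would have produced (rho in % per $1bn)."""
    return delta_e_pct + rho * intervention_bn

# E.g. the currency moved -0.8% while the bank intervened with $2bn;
# with rho = 0.5 the counterfactual pressure is +0.2%.
print(emp(delta_e_pct=-0.8, intervention_bn=2.0, rho=0.5))
```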

  16. Dataset of herbarium specimens of threatened vascular plants in Catalonia.

    Science.gov (United States)

    Nualart, Neus; Ibáñez, Neus; Luque, Pere; Pedrol, Joan; Vilar, Lluís; Guàrdia, Roser

    2017-01-01

    This data paper describes a dataset of specimens of threatened Catalonian vascular plants conserved in five public Catalonian herbaria (BC, BCN, HGI, HBIL and MTTE). Catalonia is an administrative region of Spain that harbours a large diversity of autochthonous plants, including 199 taxa with IUCN threatened categories (EX, EW, RE, CR, EN and VU). This dataset includes 1,618 records collected from the 17th century to the present. For each specimen, the species name, locality, collection date, collector, ecology and revision label are recorded. More than 94% of the taxa are represented in the herbaria, which evidences the role of botanical collections as an essential source of occurrence data.

  17. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  18. Traffic sign classification with dataset augmentation and convolutional neural network

    Science.gov (United States)

    Tang, Qing; Kurnianggoro, Laksono; Jo, Kang-Hyun

    2018-04-01

    This paper presents a method for traffic sign classification using a convolutional neural network (CNN). In this method, we first convert the color image to grayscale and then normalize it into the range (-1,1) as the preprocessing step. To increase the robustness of the classification model, we apply a dataset augmentation algorithm and create new images to train the model. To avoid overfitting, we utilize a dropout module before the last fully connected layer. To assess the performance of the proposed method, the German traffic sign recognition benchmark (GTSRB) dataset is utilized. Experimental results show that the method is effective in classifying traffic signs.
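
    A minimal sketch of the preprocessing and augmentation steps described above (not the authors' exact code):

```python
import numpy as np

def preprocess(rgb):
    """Convert an (H, W, 3) uint8 image to grayscale scaled into [-1, 1]."""
    gray = rgb.astype(np.float32) @ np.array([0.299, 0.587, 0.114],
                                             dtype=np.float32)
    return gray / 127.5 - 1.0

def shift_augment(img, max_shift=2):
    """Create an extra training image by a small random horizontal shift."""
    s = np.random.randint(-max_shift, max_shift + 1)
    return np.roll(img, s, axis=1)

img = np.random.randint(0, 256, size=(32, 32, 3), dtype=np.uint8)
x = shift_augment(preprocess(img))
print(x.shape, float(x.min()), float(x.max()))
```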

  19. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    Science.gov (United States)

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse, which simplifies the setup of QSAR datasets and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates the addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join, extend, combine datasets and hence work collectively, but

  20. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Directory of Open Access Journals (Sweden)

    Spjuth Ola

    2010-06-01

    Full Text Available Abstract Background QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. Results We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML), which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse, which simplifies the setup of QSAR datasets and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates the addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join

  1. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    Energy Technology Data Exchange (ETDEWEB)

    Caroline Draxl: NREL

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time-synchronized with available load profiles. As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  2. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The resulting global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing has been used (Matlab). Key considerations for the system are high-speed access due to the large data volume, persistence of the large data volumes, and a precise process time scheduling capability.

  3. A Snow Density Dataset for Improving Surface Boundary Conditions in Greenland Ice Sheet Firn Modeling

    DEFF Research Database (Denmark)

    S. Fausto, Robert; E. Box, Jason; Vandecrux, Baptiste Robert Marcel

    2018-01-01

    The surface snow density of glaciers and ice sheets is of fundamental importance in converting volume to mass in both altimetry and surface mass balance studies, yet it is often poorly constrained. Site-specific surface snow densities are typically derived from empirical relations based...... on temperature and wind speed. These parameterizations commonly calculate the average density of the top meter of snow, thereby systematically overestimating snow density at the actual surface. Therefore, constraining surface snow density to the top 0.1 m can improve boundary conditions in high-resolution firn......-evolution modeling. We have compiled an extensive dataset of 200 point measurements of surface snow density from firn cores and snow pits on the Greenland ice sheet. We find that surface snow density within 0.1 m of the surface has an average value of 315 kg m−3 with a standard deviation of 44 kg m−3, and has...

  4. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    International Nuclear Information System (INIS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climatic Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed. (paper)

  5. Use of country of birth as an indicator of refugee background in health datasets

    Science.gov (United States)

    2014-01-01

    Background Routine public health databases contain a wealth of data useful for research among vulnerable or isolated groups, who may be under-represented in traditional medical research. Identifying specific vulnerable populations, such as resettled refugees, can be particularly challenging; often country of birth is the sole indicator of whether an individual has a refugee background. The objective of this article was to review strengths and weaknesses of different methodological approaches to identifying resettled refugees and comparison groups from routine health datasets and to propose the application of additional methodological rigour in future research. Discussion Methodological approaches to selecting refugee and comparison groups from existing routine health datasets vary widely and are often explained in insufficient detail. Linked data systems or datasets from specialized refugee health services can accurately select resettled refugee and asylum seeker groups but have limited availability and can be selective. In contrast, country of birth is commonly collected in routine health datasets but a robust method for selecting humanitarian source countries based solely on this information is required. The authors recommend use of national immigration data to objectively identify countries of birth with high proportions of humanitarian entrants, matched by time period to the study dataset. When available, additional migration indicators may help to better understand migration as a health determinant. Methodologically, if multiple countries of birth are combined, the proportion of the sample represented by each country of birth should be included, with sub-analysis of individual countries of birth potentially providing further insights, if population size allows. United Nations-defined world regions provide an objective framework for combining countries of birth when necessary. A comparison group of economic migrants from the same world region may be appropriate

  6. Review of access, licenses and understandability of open datasets used in hydrology research

    Science.gov (United States)

    Falkenroth, Esa; Arheimer, Berit; Lagerbäck Adolphi, Emma

    2015-04-01

    The amount of open data available for hydrology research is continually growing. In the EU-funded project SWITCH-ON (Sharing Water-related Information to Tackle Changes in the Hydrosphere - for Operational Needs), we are addressing water concerns by exploring and exploiting the untapped potential of these new open data. This work is enabled by many ongoing efforts to facilitate the use of open data. For instance, a number of portals (such as the GEOSS Portal and the INSPIRE community geoportal) provide the means to search for such open data sets and open spatial data services. However, in general, the systematic use of available open data is still fairly uncommon in hydrology research. Factors that limit the (re)usability of a data set include: (1) accessibility, (2) understandability and (3) licences. If you cannot access the data set, you cannot use it for research. If you cannot understand the data set, you cannot use it for research. Finally, if you are not permitted to use the data, you cannot use it for research. Early on in the project, we sent out a questionnaire to our research partners (SMHI, Universita di Bologna, University of Bristol, Technische Universiteit Delft and Technische Universitaet Wien) to find out what data sets they were planning to use in their experiments. The result was a comprehensive list of useful open data sets. Later, this list of data sets was extended with additional information on data sets for planned commercial water-information products and services. With the list of 50 common data sets as a starting point, we reviewed issues related to access, understandability and licence conditions. Regarding access to data sets, a majority of data sets were available through direct internet download via some well-known transfer protocol such as ftp or http. However, several data sets were found to be inaccessible due to server downtime, incorrect links or problems with the host database management system. One possible explanation for this

  7. Using Real Datasets for Interdisciplinary Business/Economics Projects

    Science.gov (United States)

    Goel, Rajni; Straight, Ronald L.

    2005-01-01

    The workplace's global and dynamic nature allows and requires improved approaches for providing business and economics education. In this article, the authors explore ways of enhancing students' understanding of course material by using nontraditional, real-world datasets of particular interest to them. Teaching at a historically Black university,…

  8. Dataset-driven research for improving recommender systems for learning

    NARCIS (Netherlands)

    Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

    2011-01-01

    Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,

  9. dataTEL - Datasets for Technology Enhanced Learning

    NARCIS (Netherlands)

    Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

    2011-01-01

    Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)

  10. A dataset of forest biomass structure for Eurasia.

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-16

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and data from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below-ground); green forest floor (above- and below-ground); and coarse woody debris (snags, logs, dead branches of living trees and dead roots). It consists of 10,351 unique records of sample plots and 9,613 sample trees from ca. 1,200 experiments for the period 1930-2014, where there is overlap between these two datasets. The dataset also contains other forest stand parameters such as tree species composition, average age, tree height, growing stock volume, etc., when available. Such a dataset can be used for the development of models of biomass structure, biomass expansion factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  11. A reanalysis dataset of the South China Sea

    Science.gov (United States)

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803

  12. Comparision of analysis of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Crooks, Lucy; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    As part of the QTLMAS XII workshop, a simulated dataset was distributed and participants were invited to submit analyses of the data based on genome-wide association, fine mapping and genomic selection. We have evaluated the findings from the groups that reported fine mapping and genome-wide asso...

  13. The LAMBADA dataset: Word prediction requiring a broad discourse context

    NARCIS (Netherlands)

    Paperno, D.; Kruszewski, G.; Lazaridou, A.; Pham, Q.N.; Bernardi, R.; Pezzelle, S.; Baroni, M.; Boleda, G.; Fernández, R.; Erk, K.; Smith, N.A.

    2016-01-01

    We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the

  14. NEW WEB-BASED ACCESS TO NUCLEAR STRUCTURE DATASETS.

    Energy Technology Data Exchange (ETDEWEB)

    WINCHELL,D.F.

    2004-09-26

    As part of an effort to migrate the National Nuclear Data Center (NNDC) databases to a relational platform, a new web interface has been developed for the dissemination of the nuclear structure datasets stored in the Evaluated Nuclear Structure Data File and Experimental Unevaluated Nuclear Data List.

  15. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  16. Level-1 muon trigger performance with the full 2017 dataset

    CERN Document Server

    CMS Collaboration

    2018-01-01

    This document describes the performance of the CMS Level-1 Muon Trigger with the full dataset of 2017. Efficiency plots are included for each track finder (TF) individually and for the system as a whole. The efficiency is measured to be greater than 90% for all track finders.

  17. A Dataset for Visual Navigation with Neuromorphic Methods

    Directory of Open Access Journals (Sweden)

    Francisco Barranco

    2016-02-01

    Full Text Available Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to conventional Computer Vision approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS) and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

  18. Dataset: Multi Sensor-Orientation Movement Data of Goats

    NARCIS (Netherlands)

    Kamminga, Jacob Wilhelm

    2018-01-01

    This is a labeled dataset. Motion data were collected from six sensor nodes that were fixed with different orientations to a collar around the neck of goats. These six sensor nodes simultaneously, with different orientations, recorded various activities performed by the goat. We recorded the

  19. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  20. UK surveillance: provision of quality assured information from combined datasets.

    Science.gov (United States)

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  1. participatory development of a minimum dataset for the khayelitsha ...

    African Journals Online (AJOL)

    This dataset was integrated with data requirements at ... model for defining health information needs at district level. This participatory process has enabled health workers to appraise their .... of reproductive health, mental health, disability and community ... each chose a facilitator and met in between the forum meetings.

  2. Comparision of analysis of the QTLMAS XII common dataset

    DEFF Research Database (Denmark)

    Lund, Mogens Sandø; Sahana, Goutam; de Koning, Dirk-Jan

    2009-01-01

    A dataset was simulated and distributed to participants of the QTLMAS XII workshop who were invited to develop genomic selection models. Each contributing group was asked to describe the model development and validation as well as to submit genomic predictions for three generations of individuals...

  3. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of turbulence in jet flows. This report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, e.g., establishing uncertainties for the data. This paper covers the following five tasks: (1) Document the acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.

  4. A Hybrid Method for Interpolating Missing Data in Heterogeneous Spatio-Temporal Datasets

    Directory of Open Access Journals (Sweden)

    Min Deng

    2016-02-01

Full Text Available Space-time interpolation is widely used to estimate missing or unobserved values in a dataset integrating both spatial and temporal records. Although space-time interpolation plays a key role in space-time modeling, existing methods were mainly developed for space-time processes that exhibit stationarity in space and time. It is still challenging to model the heterogeneity of space-time data in the interpolation model. To overcome this limitation, in this study, a novel space-time interpolation method considering both spatial and temporal heterogeneity is developed for estimating missing data in space-time datasets. The interpolation operation is first implemented in the spatial and temporal dimensions. Heterogeneous covariance functions are constructed to obtain the best linear unbiased estimates in the spatial and temporal dimensions. Spatial and temporal correlations are then considered to combine the interpolation results in the spatial and temporal dimensions to estimate the missing data. The proposed method is tested on annual average temperature and precipitation data in China (1984–2009). Experimental results show that, for these datasets, the proposed method outperforms three state-of-the-art methods: spatio-temporal kriging, spatio-temporal inverse distance weighting, and the point estimation model of biased hospitals-based area disease estimation.
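
    For intuition, the simplest of the comparison methods above, spatio-temporal inverse distance weighting, can be sketched in a few lines. The time-to-distance scaling, the power parameter, and the toy observations below are illustrative assumptions, not values from the article.

```python
import numpy as np

def st_idw(obs_xy, obs_t, obs_vals, query_xy, query_t,
           time_scale=30.0, power=2.0):
    """Spatio-temporal inverse distance weighting: temporal separation
    is converted into a pseudo-distance via `time_scale` (distance units
    per time unit) so that space and time share one metric."""
    d_space = np.linalg.norm(obs_xy - query_xy, axis=1)
    d_time = np.abs(obs_t - query_t) * time_scale
    d = np.maximum(np.sqrt(d_space**2 + d_time**2), 1e-9)  # avoid div by zero
    w = 1.0 / d**power
    return np.sum(w * obs_vals) / np.sum(w)

# Toy usage: four stations observed at two time steps, one gap to fill.
xy = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
est = st_idw(np.vstack([xy, xy]),
             np.array([0., 0., 0., 0., 1., 1., 1., 1.]),
             np.array([10., 12., 11., 13., 11., 13., 12., 14.]),
             query_xy=np.array([0.5, 0.5]), query_t=0.5)
print(round(est, 2))  # weighted average of the eight observations
```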

  5. Sparse multivariate measures of similarity between intra-modal neuroimaging datasets

    Directory of Open Access Journals (Sweden)

    Maria J. Rosa

    2015-10-01

Full Text Available An increasing number of neuroimaging studies are now based on either combining more than one data modality (inter-modal) or combining more than one measurement from the same modality (intra-modal). To date, most intra-modal studies using multivariate statistics have focused on differences between datasets, for instance relying on classifiers to differentiate between effects in the data. However, to fully characterize these effects, multivariate methods able to measure similarities between datasets are needed. One classical technique for estimating the relationship between two datasets is canonical correlation analysis (CCA). However, in the context of high-dimensional data the application of CCA is extremely challenging. A recent extension of CCA, sparse CCA (SCCA), overcomes this limitation by regularizing the model parameters while yielding a sparse solution. In this work, we modify SCCA with the aim of facilitating its application to high-dimensional neuroimaging data and finding meaningful multivariate image-to-image correspondences in intra-modal studies. In particular, we show how the optimal subset of variables can be estimated independently and we look at the information encoded in more than one set of SCCA transformations. We illustrate our framework using Arterial Spin Labelling data to investigate multivariate similarities between the effects of two antipsychotic drugs on cerebral blood flow.
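
    Since SCCA extends classical CCA, the quantity being estimated can be illustrated with scikit-learn's standard CCA. The data shapes and noise level below are invented, and the sparsity penalty that distinguishes SCCA is not shown.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Two synthetic "datasets" (e.g., voxel features under two conditions)
# sharing one latent signal; all dimensions are illustrative.
latent = rng.normal(size=(100, 1))
X = latent @ rng.normal(size=(1, 20)) + 0.5 * rng.normal(size=(100, 20))
Y = latent @ rng.normal(size=(1, 30)) + 0.5 * rng.normal(size=(100, 30))

cca = CCA(n_components=2)
X_c, Y_c = cca.fit_transform(X, Y)

# Canonical correlations: similarity of the two datasets along each
# pair of canonical variates.
for k in range(2):
    r = np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1]
    print(f"component {k}: canonical correlation = {r:.2f}")
```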

  6. Distributed solar photovoltaic array location and extent dataset for remote sensing object identification

    Science.gov (United States)

    Bradbury, Kyle; Saboo, Raghav; L. Johnson, Timothy; Malof, Jordan M.; Devarajan, Arjun; Zhang, Wuming; M. Collins, Leslie; G. Newell, Richard

    2016-12-01

Earth-observing remote sensing data, including aerial photography and satellite imagery, offer a snapshot of the world from which we can learn about the state of natural resources and the built environment. The components of energy systems that are visible from above can be automatically assessed with these remote sensing data when processed with machine learning methods. Here, we focus on the information gap surrounding distributed solar photovoltaic (PV) arrays, for which there is limited public data on deployments at small geographic scales. We created a dataset of solar PV arrays to initiate and develop the process of automatically identifying solar PV locations using remote sensing imagery. This dataset contains the geospatial coordinates and border vertices for over 19,000 solar panels across 601 high-resolution images from four cities in California. Dataset applications include training object detection and other machine learning algorithms that use remote sensing imagery, developing specific algorithms for predictive detection of distributed PV systems, estimating installed PV capacity, and analysis of the socioeconomic correlates of PV deployment.

  7. A new dataset validation system for the Planetary Science Archive

    Science.gov (United States)

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

The Planetary Science Archive is the official archive for the Mars Express mission. It received its first data by the end of 2004. These data are delivered by the PI teams to the PSA team as datasets, which are formatted in conformance with the Planetary Data System (PDS). The PI teams are responsible for analyzing and calibrating the instrument data, as well as for the production of reduced and calibrated data. They are also responsible for the scientific validation of these data. ESA is responsible for the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality standards. To do so, an archive peer review is used to control the quality of the Mars Express science data archiving process. However, a full validation of its content is missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data should be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall allow anomalies in datasets to be tracked and their completeness to be controlled. It shall ensure that the PSA end-users: (1) can rely on the results of their queries, (2) will get data products that are suitable for scientific analysis, (3) can find all science data acquired during a mission. We defined dataset validation as the verification and assessment process that checks the dataset content against pre-defined top-level criteria, which represent the general characteristics of good quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that

  8. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

Full Text Available Aviation emissions are unique among transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality: NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006, as well as aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are however some distinct differences in the altitude distribution

  9. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
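
    The tile-and-count idea can be sketched directly. Geoseq performs the lookup with suffix arrays over precomputed libraries; the hash-based counter, the k-mer length, and the toy sequences below are simple stand-ins for illustration.

```python
from collections import Counter

def tile_coverage(reference, reads, k=20):
    """Split `reference` into overlapping k-mer tiles and count how often
    each tile occurs in the read library, yielding a coverage profile
    along the reference."""
    read_kmers = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            read_kmers[r[i:i + k]] += 1
    tiles = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    return [read_kmers[t] for t in tiles]

# Toy usage: two short reads checked against a 24-base reference.
reads = ["ACGTACGTACGTACGTACGTA", "CGTACGTACGTACGTACGTAC"]
profile = tile_coverage("ACGTACGTACGTACGTACGTACGT", reads, k=20)
print(profile)  # per-tile coverage along the reference
```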

  10. Systematic review

    DEFF Research Database (Denmark)

    Lødrup, Anders Bergh; Reimer, Christina; Bytzer, Peter

    2013-01-01

    in getting off acid-suppressive medication and partly explain the increase in long-term use of PPI. A number of studies addressing this issue have been published recently. The authors aimed to systematically review the existing evidence of clinically relevant symptoms caused by acid rebound following PPI...

  11. Systematic review

    DEFF Research Database (Denmark)

    Christensen, Troels Dreier; Spindler, Karen-Lise Garm; Palshof, Jesper Andreas

    2016-01-01

    to earlier diagnosis and improved survival. Method: In this paper, we describe the incidence as well as characteristics associated with BM based on a systematic review of the current literature, following the PRISMA guidelines. Results: We show that the incidence of BM in CRC patients ranges from 0.6 to 3...

  12. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test, or combinations of various tests, in order to detect differences in aspect and elevation. The question arose: how capable are such stability interpretations of supporting conclusions? There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale; and (iii) the possibility that the stability interpretation is not directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional scale stability variations are quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined for a given test, significance level and power, using the mean and standard deviation of the complete dataset. With this method it can also be determined whether the complete dataset constitutes an appropriate sample size. (ii) Smaller subsets were created with similar
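
    Step (i), finding the sample size for a given test, significance level and power from the moments of the complete dataset, is standard power analysis. A minimal sketch with statsmodels follows; the group means and pooled standard deviation are invented for illustration, not taken from the study.

```python
from statsmodels.stats.power import TTestIndPower

# Suppose the full dataset gives mean stability scores of 2.1 vs 2.6
# for two aspects, with a pooled standard deviation of 1.0
# (illustrative numbers only).
effect_size = (2.6 - 2.1) / 1.0  # Cohen's d

# Sample size per group for a two-sided t-test at alpha=0.05, power=0.8.
n_per_group = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=0.05, power=0.8)
print(f"required sample size per group: {n_per_group:.0f}")
```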

  13. BayesMotif: de novo protein sorting motif discovery from impure datasets.

    Science.gov (United States)

    Hu, Jianjun; Zhang, Fan

    2010-01-18

Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals consists of amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets. They also show that the false positive removal procedure can help to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which may help to overcome the limitations of
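
    The classification step of such a Bayesian motif classifier reduces to a position-specific log-odds score of a candidate window under the motif model versus the background. The sketch below is a generic naive-Bayes-style scorer with an invented three-position motif (conserved anchor 'K' at position 0); it is not BayesMotif's actual model.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def motif_log_odds(window, motif_freqs, background_freqs):
    """Score a candidate terminal window under a position-specific motif
    model versus a background model. motif_freqs[i] maps residues to
    their frequency at motif position i; all numbers here are invented."""
    score = 0.0
    for i, aa in enumerate(window):
        p = motif_freqs[i].get(aa, 1e-4)    # smoothed motif probability
        q = background_freqs.get(aa, 0.05)  # background probability
        score += math.log(p / q)
    return score  # > 0 favours "window contains the motif"

# Toy 3-position motif with a highly conserved anchor 'K' at position 0.
motif = [{"K": 0.9}, {"D": 0.4, "E": 0.4}, {"L": 0.5, "I": 0.3}]
background = {aa: 0.05 for aa in AMINO}
print(motif_log_odds("KDL", motif, background))  # strongly positive
print(motif_log_odds("AAA", motif, background))  # strongly negative
```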

  14. Hydrodynamic modelling and global datasets: Flow connectivity and SRTM data, a Bangkok case study.

    Science.gov (United States)

    Trigg, M. A.; Bates, P. B.; Michaelides, K.

    2012-04-01

The rise of globally interconnected manufacturing supply chains requires an understanding and consistent quantification of flood risk at a global scale. Flood risk is often better quantified (or at least more precisely defined) in regions where there has been investment in comprehensive topographical data collection, such as LiDAR, coupled with detailed hydrodynamic modelling. Yet in regions where these data and modelling are unavailable, the implications of flooding and the knock-on effects for global industries can be dramatic, as evidenced by the recent floods in Bangkok, Thailand. There is growing momentum in global modelling initiatives to address this lack of a consistent understanding of flood risk, and they will rely heavily on the application of available global datasets relevant to hydrodynamic modelling, such as Shuttle Radar Topography Mission (SRTM) data and its derivatives. These global datasets bring opportunities to apply consistent methodologies on an automated basis in all regions, while the use of coarser scale datasets also brings many challenges, such as sub-grid process representation and downscaled hydrology data from global climate models. There are significant opportunities for hydrological science in helping define new, realistic and physically based methodologies that can be applied globally, as well as the possibility of gaining new insights into flood risk through analysis of the many large datasets that will be derived from this work. We use Bangkok as a case study to explore some of the issues related to using these available global datasets for hydrodynamic modelling, with particular focus on using SRTM data to represent topography. Research has shown that flow connectivity on the floodplain is an important component in the dynamics of flood flows on to and off the floodplain, and indeed within different areas of the floodplain. A lack of representation of flow connectivity, often due to data resolution limitations, means

  15. City Limits, 2004, East Baton Rouge Parish, Louisiana

    Data.gov (United States)

    Louisiana Geographic Information Center — This is a graphical polygon dataset depicting the polygon boundaries of the incorporated city limits of Baton Rouge, Baker, and Zachary within East Baton Rouge...

  16. Development of Gridded Ensemble Precipitation and Temperature Datasets for the Contiguous United States Plus Hawai'i and Alaska

    Science.gov (United States)

    Newman, A. J.; Clark, M. P.; Nijssen, B.; Wood, A.; Gutmann, E. D.; Mizukami, N.; Longman, R. J.; Giambelluca, T. W.; Cherry, J.; Nowak, K.; Arnold, J.; Prein, A. F.

    2016-12-01

Gridded precipitation and temperature products are inherently uncertain due to myriad factors, including interpolation from a sparse observation network, measurement representativeness, and measurement errors. Despite this, uncertainty estimates are typically not included, or are specific to a single dataset without much general applicability across datasets. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits their utility to support land surface and hydrologic modeling techniques such as data assimilation, probabilistic forecasting and verification. To address this gap, we have developed a first-of-its-kind gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980-2012 over the United States (including Alaska and Hawaii). A longer, higher resolution version (1970-present, 1/16th degree) has also been implemented to support real-time hydrologic monitoring and prediction in several regional US domains. We will present the development and evaluation of the dataset, along with initial applications of the dataset for ensemble data assimilation and probabilistic evaluation of high resolution regional climate model simulations. We will also present results on the new high resolution products for Alaska and Hawaii (2 km and 250 m respectively), completing the first ensemble observation-based product suite for the entire 50 states. Finally, we will present plans to improve the ensemble dataset, focusing on efforts to improve the methods used for station interpolation and ensemble generation, as well as methods to fuse station data with numerical weather prediction model output.

  17. Systematic review

    DEFF Research Database (Denmark)

    Bager, Palle; Chauhan, Usha; Greveson, Kay

    2017-01-01

of evidence is needed and the aim of this article was to systematically review the evidence of IBD advice lines. MATERIALS AND METHODS: A broad systematic literature search was performed to identify relevant studies addressing the effect of advice lines. The process of selection of the retrieved studies...... was undertaken in two phases. In phase one, all abstracts were reviewed by two independent reviewers. In phase two, the full text of all included studies was independently reviewed by two reviewers. The included studies underwent quality assessment and data synthesis. RESULTS: Ten published studies and 10...... congress abstracts were included in the review. The studies were heterogeneous both in scientific quality and in the focus of the study. No rigorous evidence was found to support that advice lines improve disease activity in IBD and, correspondingly, no studies reported worsening in disease activity. Advice...

  18. Systematic Avocating

    Directory of Open Access Journals (Sweden)

    Jan Green

    2014-12-01

Full Text Available Feeling obliged to undertake complex research tasks outside core working hours is a common occurrence in academia. Detailed and timely research projects are expected; the creation and defence of sufficient intervals within a crowded working schedule is one concern explored in this short version paper. Merely working longer hours fails to provide a satisfactory solution for individuals experiencing concerns of this nature. Personal effort and drive are utilised, requiring the application of mental mustering and systematic procedures. The attitude to research work is to treat the task as a hobby, conceptualised as avocating. Whilst this provides a personal solution through immersion in the task, this approach should raise concerns for employers. The flexibility of grounded theory is evident, and the freedom to draw on various bodies of knowledge provides fresh insight into a problem that occurs in organizations in many sectors experiencing multiple priorities. The application of the core category, systematic avocating, may prove beneficial.

  19. A Snow Density Dataset for Improving Surface Boundary Conditions in Greenland Ice Sheet Firn Modeling

    Directory of Open Access Journals (Sweden)

    Robert S. Fausto

    2018-05-01

Full Text Available The surface snow density of glaciers and ice sheets is of fundamental importance in converting volume to mass in both altimetry and surface mass balance studies, yet it is often poorly constrained. Site-specific surface snow densities are typically derived from empirical relations based on temperature and wind speed. These parameterizations commonly calculate the average density of the top meter of snow, thereby systematically overestimating snow density at the actual surface. Therefore, constraining surface snow density to the top 0.1 m can improve boundary conditions in high-resolution firn-evolution modeling. We have compiled an extensive dataset of 200 point measurements of surface snow density from firn cores and snow pits on the Greenland ice sheet. We find that snow density within 0.1 m of the surface has an average value of 315 kg m−3 with a standard deviation of 44 kg m−3, and shows no significant dependence on annual air temperature. We demonstrate that two widely-used surface snow density parameterizations dependent on temperature systematically overestimate surface snow density over the Greenland ice sheet by 17–19%, and that using a constant density of 315 kg m−3 may give superior results when applied in surface mass budget modeling.

  20. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

Chess is a good model to study high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for researchers to investigate the underlying neural mechanism of playing chess. For professional chess players (grandmasters and masters, or GM/Ms), the structural and functional alterations arising from long-term professional practice, and how these alterations relate to behavior, remain largely unknown. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms) and 29 age-matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  1. Knowledge discovery with classification rules in a cardiovascular dataset.

    Science.gov (United States)

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

  2. Augmented Reality Prototype for Visualizing Large Sensors’ Datasets

    Directory of Open Access Journals (Sweden)

    Folorunso Olufemi A.

    2011-04-01

Full Text Available This paper addresses the development of an augmented reality (AR) based scientific visualization system prototype that supports identification, localisation, and 3D visualisation of oil leakage sensor datasets. Sensors generate significant amounts of multivariate data during normal and leak situations, which makes data exploration and visualisation daunting tasks. Therefore, a model to manage such data and enhance the computational support needed for effective exploration is developed in this paper. A challenge of this approach is to reduce data inefficiency. This paper presents a model for computing the information gain for each data attribute and determining a lead attribute. The computed lead attribute is then used for the development of an AR-based scientific visualization interface which automatically identifies, localises and visualizes all data relevant to a particular selected region of interest (ROI) on the network. The necessary architectural system supports and interface requirements for such visualizations are also presented.
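
    Selecting the lead attribute amounts to computing the information gain of each candidate attribute with respect to the class label and keeping the highest scorer. A minimal sketch with invented field names ('pressure', 'flow', 'leak') and toy records follows.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, label_key="leak"):
    """Entropy of the labels minus the weighted entropy after splitting
    on the attribute's values; higher gain = better lead attribute."""
    labels = [r[label_key] for r in records]
    total, n, remainder = entropy(labels), len(records), 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[label_key] for r in records if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

# Toy sensor readings: pick the attribute with the highest gain as lead.
data = [
    {"pressure": "low", "flow": "steady", "leak": True},
    {"pressure": "low", "flow": "erratic", "leak": True},
    {"pressure": "high", "flow": "steady", "leak": False},
    {"pressure": "high", "flow": "erratic", "leak": False},
]
for attr in ("pressure", "flow"):
    print(attr, information_gain(data, attr))  # pressure: 1.0, flow: 0.0
```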

  3. Application of Density Estimation Methods to Datasets from a Glider

    Science.gov (United States)

    2014-09-30

humpback and sperm whales as well as different dolphin species. OBJECTIVES: The objective of this research is to extend existing methods for cetacean ... estimation from single sensor datasets. Required steps for a cue counting approach, where a cue has been defined as a clicking event (Küsel et al., 2011), to

  4. A review of continent scale hydrological datasets available for Africa

    OpenAIRE

    Bonsor, H.C.

    2010-01-01

As rainfall becomes less reliable with predicted climate change, the ability to assess the spatial and seasonal variations in groundwater availability on a large scale (catchment and continent) is becoming increasingly important (Bates et al. 2007; MacDonald et al. 2009). The scarcity of observed hydrological data within Africa, and the difficulty in obtaining such data, means that remotely sensed (RS) datasets must often be used to drive large-scale hydrological models. The different ap...

  5. Dataset of mitochondrial genome variants in oncocytic tumors

    Directory of Open Access Journals (Sweden)

    Lihua Lyu

    2018-04-01

Full Text Available This dataset presents the mitochondrial genome variants associated with oncocytic tumors. These data were obtained by Sanger sequencing of the whole mitochondrial genomes of oncocytic tumors and the adjacent normal tissues from 32 patients. The mtDNA variants were identified by comparison with the revised Cambridge Reference Sequence, excluding those defining the haplogroups of our patients. Pathogenicity prediction for the novel missense variants found in this study was performed with the MitImpact 2 program.

  6. Soil chemistry in lithologically diverse datasets: the quartz dilution effect

    Science.gov (United States)

    Bern, Carleton R.

    2009-01-01

National- and continental-scale soil geochemical datasets are likely to move our understanding of broad soil geochemistry patterns forward significantly. Patterns of chemistry and mineralogy delineated from these datasets are strongly influenced by the composition of the soil parent material, which itself is largely a function of lithology and particle size sorting. Such controls present a challenge by obscuring subtler patterns arising from subsequent pedogenic processes. Here the effect of quartz concentration is examined in moist-climate soils from a pilot dataset of the North American Soil Geochemical Landscapes Project. Due to variable and high quartz contents (6.2–81.7 wt.%), and its residual and inert nature in soil, quartz is demonstrated to influence broad patterns in soil chemistry. A dilution effect is observed whereby concentrations of various elements are significantly and strongly negatively correlated with quartz. Quartz content drives artificial positive correlations between concentrations of some elements and obscures negative correlations between others. Unadjusted soil data show the highly mobile base cations Ca, Mg, and Na to be often strongly positively correlated with the intermediately mobile Al or Fe, and generally uncorrelated with the relatively immobile high-field-strength (HFS) elements Ti and Nb. Both patterns are contrary to broad expectations for soils being weathered and leached. After transforming bulk soil chemistry to a quartz-free basis, the base cations are generally uncorrelated with Al and Fe, and negative correlations generally emerge with the HFS elements. Quartz-free element data may be a useful tool for elucidating patterns of weathering or parent-material chemistry in large soil datasets.
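
    The quartz-free transformation itself is a simple renormalization: divide each element's concentration by the non-quartz mass fraction. A toy sketch (all numbers invented) shows how two soils with different quartz contents but identical parent chemistry become directly comparable.

```python
import numpy as np

def quartz_free(concentrations_wt_pct, quartz_wt_pct):
    """Renormalize element concentrations to a quartz-free basis by
    dividing by the non-quartz fraction of the soil. Correlations driven
    purely by quartz dilution disappear after this transformation."""
    return np.asarray(concentrations_wt_pct) / (1.0 - quartz_wt_pct / 100.0)

# Two soils with the same quartz-free chemistry but different dilution:
print(quartz_free([2.0, 1.0], quartz_wt_pct=20.0))  # [2.5  1.25]
print(quartz_free([1.0, 0.5], quartz_wt_pct=60.0))  # [2.5  1.25]
```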

  7. Dataset on records of Hericium erinaceus in Slovakia

    OpenAIRE

    Vladimír Kunca; Marek Čiliak

    2017-01-01

    The data presented in this article are related to the research article entitled ?Habitat preferences of Hericium erinaceus in Slovakia? (Kunca and ?iliak, 2016) [FUNECO607] [2]. The dataset include all available and unpublished data from Slovakia, besides the records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status,...

  8. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    Science.gov (United States)

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  9. An Analysis on Better Testing than Training Performances on the Iris Dataset

    NARCIS (Netherlands)

    Schutten, Marten; Wiering, Marco

    2016-01-01

The Iris dataset is a well known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so, the dataset is separated into a set used
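
    For reference, a minimal scikit-learn version of the setup the paper analyzes is shown below. The 50/50 split and hyperparameters are illustrative, and for some random splits the test accuracy can come out at or above the training accuracy, which is the phenomenon under study.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Split the Iris data into equally sized training and testing sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Fit an RBF-kernel SVM and compare training vs testing performance.
clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print(f"train accuracy: {clf.score(X_train, y_train):.3f}")
print(f"test accuracy:  {clf.score(X_test, y_test):.3f}")
```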

  10. Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets

    Science.gov (United States)

    Drouart, G.; Falkendal, T.

    2018-04-01

We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from a heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper limits during the fitting process in a continuous way, allowing models to become gradually less probable as upper limits are approached. The aim is to propose a simple-to-use yet highly versatile fitting tool for handling increasing source complexity when combining multi-wavelength datasets with fully customisable filter/model databases. The complete control of the user is one advantage, which avoids the traditional problems related to the "black box" effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting makes Mr-Moose a particularly attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.

  11. Development of an Operational TS Dataset Production System for the Data Assimilation System

    Science.gov (United States)

    Kim, Sung Dae; Park, Hyuk Min; Kim, Young Ho; Park, Kwang Soon

    2017-04-01

An operational TS (temperature and salinity) dataset production system was developed to provide near real-time data to the data assimilation system periodically. It collects the latest 15 days' TS data for the northwestern Pacific (20°N–55°N, 110°E–150°E), applies QC tests to the archived data and supplies them to the numerical prediction models of KIOST (Korea Institute of Ocean Science and Technology). The latest real-time TS data are collected from the Argo GDAC and GTSPP data servers every week. Argo data are downloaded from the /latest_data directory of the Argo GDAC. Because many duplicated data exist when all profile data are extracted from all Argo netCDF files, a DB system is used to avoid duplication. All metadata (float ID, location, observation date and time, etc.) of all Argo floats are stored in the database system, and a Matlab program was developed to manipulate the DB data, check for duplication and exclude duplicated data. GTSPP data are downloaded from the /realtime directory of the GTSPP data service. The latest data except Argo data are extracted from the original data. Another Matlab program was coded to inspect all collected data using 10 QC tests and produce the final dataset, which can be used by the assimilation system. Three regional range tests to inspect annual, seasonal and monthly variations are included in the QC procedures. A C program was developed to provide regional ranges to data managers. It can calculate the upper and lower limits of temperature and salinity at depths from 0 to 1550 m. The final TS dataset contains the latest 15 days' TS data in netCDF format. It is updated every week and transmitted to the numerical modelers of KIOST for operational use.
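
    The duplicate check amounts to dropping profiles whose identifying metadata repeat. The operational system does this with a database and Matlab programs; a pandas equivalent with invented example values looks like this:

```python
import pandas as pd

# Illustrative profile metadata; in the real system these fields come
# from the Argo netCDF headers and live in a database.
profiles = pd.DataFrame({
    "float_id": [6901234, 6901234, 6905678],
    "obs_time": pd.to_datetime(
        ["2017-03-01 06:00", "2017-03-01 06:00", "2017-03-02 12:00"]),
    "lat": [35.1, 35.1, 40.2],
    "lon": [130.5, 130.5, 145.0],
})

# A profile is a duplicate if float ID, time and position all repeat.
deduped = profiles.drop_duplicates(
    subset=["float_id", "obs_time", "lat", "lon"])
print(len(profiles), "->", len(deduped), "profiles")
```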

  12. Parton Distributions based on a Maximally Consistent Dataset

    Science.gov (United States)

    Rojo, Juan

    2016-04-01

The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this has to do with the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective, definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be mutually in agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of their implications for LHC phenomenology, finding also good consistency with the global fit result. These results provide a non-trivial validation test of the new NNPDF3.0 fitting methodology, and indicate that possible inconsistencies in the fitted dataset do not substantially affect the global fit PDFs.

  13. New public dataset for spotting patterns in medieval document images

    Science.gov (United States)

    En, Sovann; Nicolas, Stéphane; Petitjean, Caroline; Jurie, Frédéric; Heutte, Laurent

    2017-01-01

With advances in technology, a large part of our cultural heritage is becoming digitally available. In particular, in the field of historical document image analysis, there is now a growing need for indexing and data mining tools, thus allowing us to spot and retrieve the occurrences of an object of interest, called a pattern, in a large database of document images. Patterns may present some variability in terms of color, shape, or context, making the spotting of patterns a challenging task. Pattern spotting is a relatively new field of research, still hampered by the lack of available annotated resources. We present a new publicly available dataset named DocExplore dedicated to spotting patterns in historical document images. The dataset contains 1500 images and 1464 queries, and allows the evaluation of two tasks: image retrieval and pattern localization. A standardized benchmark protocol along with ad hoc metrics is provided for a fair comparison of the submitted approaches. We also provide some first results obtained with our baseline system on this new dataset, which show that there is room for improvement and which should encourage researchers of the document image analysis community to design new systems and submit improved results.

  14. ENHANCED DATA DISCOVERABILITY FOR IN SITU HYPERSPECTRAL DATASETS

    Directory of Open Access Journals (Sweden)

    B. Rasaiah

    2016-06-01

Full Text Available Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and prevents exploiting the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015), with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue is described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  15. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable with coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification, which is, to our knowledge, the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
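
    To illustrate the resolution idea on a toy problem: build a kernel density over point data (a crude stand-in for the flexibility-rigidity-based rigidity density), vary the kernel width to set the scale of interest, and compute the persistence of the sublevel-set filtration with the GUDHI library's cubical complexes. All sizes, the kernel form, and parameters below are invented for illustration; this is not the paper's algorithm.

```python
import numpy as np
import gudhi

def rigidity_density(atoms, grid, eta):
    """Gaussian kernel density of point positions on a grid; the kernel
    width `eta` is the resolution knob that focuses the topological lens
    on a chosen scale."""
    d2 = ((grid[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / eta**2).sum(axis=1)

rng = np.random.default_rng(1)
atoms = rng.uniform(0, 10, size=(50, 2))           # toy 2-D "atoms"
xs = np.linspace(0, 10, 40)
grid = np.array([[x, y] for x in xs for y in xs])

for eta in (0.5, 2.0):                             # fine vs coarse scale
    rho = rigidity_density(atoms, grid, eta).reshape(40, 40)
    # Sublevel-set filtration of -rho: dense regions enter first.
    cc = gudhi.CubicalComplex(top_dimensional_cells=-rho)
    diag = cc.persistence()
    n_loops = sum(1 for dim, _ in diag if dim == 1)
    print(f"eta={eta}: {n_loops} one-dimensional features")
```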

  16. Tissue-Based MRI Intensity Standardization: Application to Multicentric Datasets

    Directory of Open Access Journals (Sweden)

    Nicolas Robitaille

    2012-01-01

Full Text Available Intensity standardization in MRI aims at correcting scanner-dependent intensity variations. Existing simple and robust techniques aim at matching the input image histogram onto a standard, while we think that standardization should aim at matching spatially corresponding tissue intensities. In this study, we present a novel automatic technique, called STI for STandardization of Intensities, which not only shares the simplicity and robustness of histogram-matching techniques, but also incorporates tissue spatial intensity information. STI uses joint intensity histograms to determine intensity correspondence in each tissue between the input and standard images. We compared STI to an existing histogram-matching technique on two multicentric datasets, Pilot E-ADNI and ADNI, by measuring the intensity error with respect to the standard image after performing nonlinear registration. The Pilot E-ADNI dataset consisted of 3 subjects, each scanned at 7 different sites. The ADNI dataset consisted of 795 subjects scanned at more than 50 different sites. STI was superior to the histogram-matching technique, showing significantly better intensity matching for the brain white matter with respect to the standard image.
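
    The global histogram-matching baseline that STI is compared against is available off the shelf, for instance in scikit-image. The synthetic "scans" below are invented for illustration; STI's tissue-wise joint-histogram mapping itself is not shown.

```python
import numpy as np
from skimage import exposure

rng = np.random.default_rng(0)

# Two synthetic "scans" with different intensity scales, standing in
# for images of the same anatomy acquired on two scanners.
standard = rng.normal(100, 20, size=(64, 64))
input_img = 1.8 * rng.normal(100, 20, size=(64, 64)) + 50

# Map the input image's histogram onto the standard image's histogram.
matched = exposure.match_histograms(input_img, standard)
print(round(input_img.mean(), 1), "->", round(matched.mean(), 1),
      "| standard:", round(standard.mean(), 1))
```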

  17. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander; Mularoni, Loris; Cope, Leslie M.; Medvedeva, Yulia; Mironov, Andrey A.; Makeev, Vsevolod J.; Wheelan, Sarah J.

    2012-01-01

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  18. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

With the advent of modern machine learning methods and fully automated image analysis, there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes is achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  20. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

Full Text Available Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but can also occur during model building. This article considers missing data within the context of principal component analysis (PCA), which is a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms, and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
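
    One alternating SVD scheme of the kind the case study favours can be sketched as: impute the gaps, fit a low-rank model, refill the missing cells from the reconstruction, and iterate. This is a minimal sketch of the alternating idea, not necessarily the article's exact algorithm; all data below are synthetic.

```python
import numpy as np

def pca_missing(X, n_components, n_iter=100):
    """PCA for data with NaNs via alternating SVD: impute with column
    means, take a rank-k SVD, refill the missing entries from the
    low-rank reconstruction, and repeat."""
    mask = np.isnan(X)
    Xf = np.where(mask, np.nanmean(X, axis=0), X)   # initial imputation
    for _ in range(n_iter):
        mu = Xf.mean(axis=0)
        U, s, Vt = np.linalg.svd(Xf - mu, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        Xf[mask] = recon[mask]                      # update only missing cells
    return Xf, Vt[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))  # rank-2 "process" data
X[rng.random(X.shape) < 0.1] = np.nan                    # 10% missing
X_filled, components = pca_missing(X, n_components=2)
print(np.isnan(X_filled).any())  # False: all gaps reconstructed
```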

  1. A cross-country Exchange Market Pressure (EMP dataset

    Directory of Open Access Journals (Sweden)

    Mohit Desai

    2017-06-01

Full Text Available The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence intervals (high and low values) for the point estimates of the ρ's. Using the standard errors of the estimates of the ρ's, we obtain one-sigma intervals around the mean estimates of the EMP values. These values are also reported in the dataset.
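
    Reading the description literally, the measure adds the observed exchange-rate change to the intervention scaled by ρ. The following is a sketch of that structure in notation paraphrased from the text above, not the authors' exact equation:

```latex
% EMP_t : exchange market pressure in month t (percent-change units)
% \Delta e_t : observed percentage change in the exchange rate
% I_t : central bank intervention in USD billion
% \rho : percent change in the exchange rate per $1 billion of intervention
\[
  \mathrm{EMP}_t = \Delta e_t + \rho \, I_t
\]
```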

  2. Systematic study on the radiation exposure of flora and fauna in case of compliance with the dose limits of the StrlSchV (radiation protection regulation) for men. Final report

    International Nuclear Information System (INIS)

    Kueppers, Christian; Ustohalova, Veronika; Ulanovsky, Alexander

    2012-01-01

Dose limits for members of the public exposed to the discharge of radioactive substances into the air or water bodies are defined in the German Radiation Protection Ordinance. This study tested, for all 750 radionuclides, whether non-human species are protected when the human dose limits are complied with, using a set of reference biota for comparison. External and, where possible, internal doses were calculated for the reference biota. In addition, new exposure pathways such as submersion and inhalation (for rat and deer) were incorporated. The reference dose rate for adequate biota protection is 10 μGy/h. This study found that radionuclide discharges into the air never exceeded the reference dose rate limit. However, violations were detected for discharges of some very short-lived radionuclides into freshwater or seawater, if the maximum water contamination is assumed. Protection of non-human species is guaranteed in more realistic emission and immission situations. This means that damage to populations living in small water volumes cannot be excluded solely on the basis of regulations for the human dose limit. Therefore, it is necessary to judge individual cases in very unfavourable immission situations. (orig.)

3. The case for developing publicly-accessible datasets for health services research in the Middle East and North Africa (MENA) region

    Directory of Open Access Journals (Sweden)

    El-Jardali Fadi

    2009-10-01

Full Text Available Abstract Background The existence of publicly-accessible datasets represents a significant opportunity for health services research to evolve into a science that supports health policy making and evaluation, proper inter- and intra-organizational decisions and optimal clinical interventions. This paper investigated the role of publicly-accessible datasets in the enhancement of health care systems in the developed world and highlighted the importance of their wide existence and use in the Middle East and North Africa (MENA) region. Discussion A search was conducted to explore the availability of publicly-accessible datasets in the MENA region. Although datasets were found in most countries in the region, these were limited in terms of their relevance, quality and public accessibility. With rare exceptions, publicly-accessible datasets - as present in the developed world - were absent. Based on this, we proposed a gradual approach and a set of recommendations to promote the development and use of publicly-accessible datasets in the region. These recommendations target potential actions by governments, researchers, policy makers and international organizations. Summary We argue that the limited number of publicly-accessible datasets in the MENA region represents a lost opportunity for the evidence-based advancement of health systems in the region. The availability and use of publicly-accessible datasets would encourage policy makers in this region to base their decisions on solid representative data and not on estimates or small-scale studies; researchers would be able to exercise their expertise in a meaningful manner to both policy makers and the public. The populations of the MENA countries would exercise the right to benefit from locally- or regionally-based studies, versus imported and, in the best cases, customized ones. Furthermore, on a macro scale, the availability of regionally comparable publicly-accessible datasets would allow for the

  4. Reproducibility of studies on text mining for citation screening in systematic reviews: Evaluation and checklist.

    Science.gov (United States)

    Olorisade, Babatunde Kazeem; Brereton, Pearl; Andras, Peter

    2017-09-01

Independent validation of published scientific results through study replication is a pre-condition for accepting the validity of such results. In computational research, full replication is often unrealistic for independent results validation; therefore, study reproduction has been justified as the minimum acceptable standard to evaluate the validity of scientific claims. The application of text mining techniques to citation screening in the context of systematic literature reviews is a relatively young and growing computational field with high relevance for software engineering, medical research and other fields. However, there is little work so far on reproduction studies in the field. In this paper, we investigate the reproducibility of studies in this area based on information contained in published articles, and we propose reporting guidelines that could improve reproducibility. The study was approached in two ways. Initially we attempted to reproduce results from six studies, which were based on the same raw dataset. Then, based on this experience, we identified the steps considered essential to successful reproduction of text mining experiments and characterized them to measure how reproducible a study is given the information provided on these steps. 33 articles were systematically assessed for reproducibility using this approach. Our work revealed that it is currently difficult if not impossible to independently reproduce the results published in any of the studies investigated. The lack of information about the datasets used limits the reproducibility of about 80% of the studies assessed. Also, information about the machine learning algorithms is inadequate in about 27% of the papers. On the plus side, the third-party software tools used are mostly free and available. The reproducibility potential of most of the studies can be significantly improved if more attention is paid to information provided on the datasets used, how they were partitioned and utilized, and

  5. SatelliteDL: a Toolkit for Analysis of Heterogeneous Satellite Datasets

    Science.gov (United States)

    Galloy, M. D.; Fillmore, D.

    2014-12-01

    SatelliteDL is an IDL toolkit for the analysis of satellite Earth observations from a diverse set of platforms and sensors. The core function of the toolkit is the spatial and temporal alignment of satellite swath and geostationary data. The design features an abstraction layer that allows for easy inclusion of new datasets in a modular way. Our overarching objective is to create utilities that automate the mundane aspects of satellite data analysis, are extensible and maintainable, and do not place limitations on the analysis itself. IDL has a powerful suite of statistical and visualization tools that can be used in conjunction with SatelliteDL. Toward this end we have constructed SatelliteDL to include (1) HTML and LaTeX API document generation, (2) a unit test framework, (3) automatic message and error logs, (4) HTML and LaTeX plot and table generation, and (5) several real world examples with bundled datasets available for download. For ease of use, datasets, variables and optional workflows may be specified in a flexible format configuration file. Configuration statements may specify, for example, a region and date range, and the creation of images, plots and statistical summary tables for a long list of variables. SatelliteDL enforces data provenance; all data should be traceable and reproducible. The output NetCDF file metadata holds a complete history of the original datasets and their transformations, and a method exists to reconstruct a configuration file from this information. Release 0.1.0 distributes with ingest methods for GOES, MODIS, VIIRS and CERES radiance data (L1) as well as select 2D atmosphere products (L2) such as aerosol and cloud (MODIS and VIIRS) and radiant flux (CERES). Future releases will provide ingest methods for ocean and land surface products, gridded and time averaged datasets (L3 Daily, Monthly and Yearly), and support for 3D products such as temperature and water vapor profiles. Emphasis will be on NPP Sensor, Environmental and

  6. The Role of Datasets on Scientific Influence within Conflict Research

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA suggesting a coherent field of inquiry; which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed—such as interpersonal conflict or conflict among pharmaceuticals, for example, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971 where ideas didn’t persist in that multiple paths existed and died or emerged reflecting lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publically available conflict datasets developed early on helped
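
    As an illustration of the path-tracing idea, here is a minimal sketch of a main-path-style analysis on a citation DAG using networkx and search path counts. It is a generic stand-in under stated assumptions (edges run from a cited work to the works that cite it), not the authors' CPA implementation.

        import networkx as nx

        def spc_weights(G):
            """Search path count (SPC) for each edge of a citation DAG."""
            order = list(nx.topological_sort(G))
            from_src = {v: 1 if G.in_degree(v) == 0 else 0 for v in G}
            for v in order:                      # paths reaching v from any source
                for w in G.successors(v):
                    from_src[w] += from_src[v]
            to_sink = {v: 1 if G.out_degree(v) == 0 else 0 for v in G}
            for v in reversed(order):            # paths from v to any sink
                to_sink[v] += sum(to_sink[w] for w in G.successors(v))
            return {(v, w): from_src[v] * to_sink[w] for v, w in G.edges}

        def main_path(G):
            """Greedy main path: start at the strongest source edge, follow onward."""
            spc = spc_weights(G)
            sources = [v for v in G if G.in_degree(v) == 0]
            start = max(((s, w) for s in sources for w in G.successors(s)),
                        key=lambda e: spc[e])
            path, cur = [start[0], start[1]], start[1]
            while G.out_degree(cur) > 0:
                cur = max(G.successors(cur), key=lambda u, c=cur: spc[(c, u)])
                path.append(cur)
            return path

        G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")])
        print(main_path(G))                      # ['A', 'B', 'D', 'E']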

  7. The Role of Datasets on Scientific Influence within Conflict Research.

    Directory of Open Access Journals (Sweden)

    Tracy Van Holt

    Full Text Available We inductively tested whether a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis, on this citation network (~1.5 million works) to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry; this means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971, where ideas didn't persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from earlier interstate studies on democratic peace on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped

  8. The Role of Datasets on Scientific Influence within Conflict Research.

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested whether a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis, on this citation network (~1.5 million works) to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry; this means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971, where ideas didn't persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from earlier interstate studies on democratic peace on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped shape the

  9. Developing a minimum dataset for nursing team leader handover in the intensive care unit: A focus group study.

    Science.gov (United States)

    Spooner, Amy J; Aitken, Leanne M; Corley, Amanda; Chaboyer, Wendy

    2018-01-01

    Despite increasing demand for structured processes to guide clinical handover, nursing handover tools are limited in the intensive care unit. The study aim was to identify key items to include in a minimum dataset for intensive care nursing team leader shift-to-shift handover. This focus group study was conducted in a 21-bed medical/surgical intensive care unit in Australia. Senior registered nurses involved in team leader handovers were recruited. Focus groups were conducted using a nominal group technique to generate and prioritise minimum dataset items. Nurses were presented with content from previous team leader handovers and asked to select which content items to include in a minimum dataset. Participant responses were summarised as frequencies and percentages. Seventeen senior nurses participated in three focus groups. Participants agreed that ISBAR (Identify-Situation-Background-Assessment-Recommendations) was a useful tool to guide clinical handover. Items recommended to be included in the minimum dataset (≥65% agreement) included Identify (name, age, days in intensive care), Situation (diagnosis, surgical procedure), Background (significant event(s), management of significant event(s)) and Recommendations (patient plan for next shift, tasks to follow up for next shift). Overall, 30 of the 67 (45%) items in the Assessment category were considered important to include in the minimum dataset and focused on relevant observations and treatment within each body system. Other non-ISBAR items considered important to include related to the ICU (admissions to ICU, staffing/skill mix, theatre cases) and patients (infectious status, site of infection, end of life plan). Items were further categorised into those to include in all handovers and those to discuss only when relevant to the patient. The findings suggest a minimum dataset for intensive care nursing team leader shift-to-shift handover should contain items within ISBAR along with unit and patient specific

  10. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

    Science.gov (United States)

    Sauzet, Odile; Peacock, Janet L

    2017-07-20

    The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is the generalised estimating equations. This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.
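
    For readers wanting to try the recommended approach, a hedged sketch of fitting a binomial outcome on a mix of singletons and twin pairs via generalised estimating equations in Python statsmodels; the synthetic data and effect sizes are placeholders, not the paper's simulation design.

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(42)
        rows, cluster = [], 0
        for size in [1] * 900 + [2] * 50:      # ~10% of infants are twins
            u = rng.normal(0, 0.8)             # shared cluster (mother) effect
            for _ in range(size):
                x = rng.normal()               # a standardised covariate
                p = 1 / (1 + np.exp(-(-1.0 + 0.5 * x + u)))
                rows.append((cluster, x, rng.binomial(1, p)))
            cluster += 1
        df = pd.DataFrame(rows, columns=["mother", "x", "y"])

        # Exchangeable working correlation accounts for within-pair dependence
        model = smf.gee("y ~ x", groups="mother", data=df,
                        family=sm.families.Binomial(),
                        cov_struct=sm.cov_struct.Exchangeable())
        print(model.fit().summary())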

  11. Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants

    Directory of Open Access Journals (Sweden)

    Odile Sauzet

    2017-07-01

    Full Text Available Abstract Background The analysis of perinatal outcomes often involves datasets with some multiple births. These are datasets mostly formed of independent observations and a limited number of clusters of size two (twins) and maybe of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes but very little is known about their reliability when only a limited number of small clusters are present. Methods Using simulated data based on a dataset of preterm infants we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. Logistic models, several methods of estimation for the logistic random intercept models and generalised estimating equations were compared. Results The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters, but a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small and will provide similar estimates to logistic regression. The method which seems to provide the best balance between estimation of the standard error and the parameter for any percentage of twins is generalised estimating equations. Conclusions This study has shown that the number of covariates or the level two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins, but when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

  12. Mock Quasar-Lyman-α forest data-sets for the SDSS-III Baryon Oscillation Spectroscopic Survey

    Energy Technology Data Exchange (ETDEWEB)

    Bautista, Julian E.; Busca, Nicolas G. [APC, Université Paris Diderot-Paris 7, CNRS/IN2P3, CEA, Observatoire de Paris, 10, rue A. Domon and L. Duquet, Paris (France); Bailey, Stephen; Font-Ribera, Andreu; Schlegel, David [Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA (United States); Pieri, Matthew M. [Aix Marseille Université, CNRS, LAM (Laboratoire d'Astrophysique de Marseille) UMR 7326, 38 rue Frédéric Joliot-Curie, 13388, Marseille (France); Miralda-Escudé, Jordi; Gontcho, Satya Gontcho A. [Institut de Ciències del Cosmos, Universitat de Barcelona/IEEC, 1 Martí i Franquès, Barcelona 08028, Catalonia (Spain); Palanque-Delabrouille, Nathalie; Rich, James; Goff, Jean Marc Le [CEA, Centre de Saclay, Irfu/SPP, D128, F-91191 Gif-sur-Yvette (France); Dawson, Kyle [Department of Physics and Astronomy, University of Utah, 115 S 100 E, RM 201, Salt Lake City, UT 84112 (United States); Feng, Yu; Ho, Shirley [McWilliams Center for Cosmology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA, 15213 (United States); Ge, Jian [Department of Astronomy, University of Florida, 211 Bryant Space Science Center, Gainesville, FL 32611-2055 (United States); Noterdaeme, Pasquier; Pâris, Isabelle [Université Paris 6 et CNRS, Institut d'Astrophysique de Paris, 98bis blvd. Arago, 75014 Paris (France); Rossi, Graziano, E-mail: bautista@astro.utah.edu [Department of Astronomy and Space Science, Sejong University, 209 Neungdong-ro, Gwangjin-gu, Seoul, 143-747 (Korea, Republic of)

    2015-05-01

    We describe mock data-sets generated to simulate the high-redshift quasar sample in Data Release 11 (DR11) of the SDSS-III Baryon Oscillation Spectroscopic Survey (BOSS). The mock spectra contain Lyα forest correlations useful for studying the 3D correlation function, including Baryon Acoustic Oscillations (BAO). They also include astrophysical effects such as quasar continuum diversity and high-density absorbers, instrumental effects such as noise and spectral resolution, as well as imperfections introduced by the SDSS pipeline treatment of the raw data. The Lyα forest BAO analysis of the BOSS collaboration, described in Delubac et al. 2014, has used these mock data-sets to develop and cross-check analysis procedures prior to performing the BAO analysis on real data, and for continued systematic cross-checks. Tests presented here show that the simulations reproduce the important characteristics of real spectra sufficiently well. These mock data-sets will be made available together with the data at the time of Data Release 11.

  13. FTSPlot: fast time series visualization for large datasets.

    Directory of Open Access Journals (Sweden)

    Michael Riss

    Full Text Available The analysis of electrophysiological recordings often involves visual inspection of time series data to locate specific experiment epochs, mask artifacts, and verify the results of signal processing steps, such as filtering or spike detection. Long-term experiments with continuous data acquisition generate large amounts of data. Rapid browsing through these massive datasets poses a challenge to conventional data plotting software because the plotting time increases proportionately to the increase in the volume of data. This paper presents FTSPlot, which is a visualization concept for large-scale time series datasets using techniques from the field of high performance computer graphics, such as hierarchic level of detail and out-of-core data handling. In a preprocessing step, time series data, event, and interval annotations are converted into an optimized data format, which then permits fast, interactive visualization. The preprocessing step has a computational complexity of O(n × log(N)); the visualization itself can be done with a complexity of O(1) and is therefore independent of the amount of data. A demonstration prototype has been implemented and benchmarks show that the technology is capable of displaying large amounts of time series data, event, and interval annotations lag-free with < 20 ms delay. The current 64-bit implementation theoretically supports datasets with up to 2^64 bytes; on the x86_64 architecture currently up to 2^48 bytes are supported, and benchmarks have been conducted with 2^40 bytes (1 TiB), or 1.3 × 10^11 double-precision samples. The presented software is freely available and can be included as a Qt GUI component in future software projects, providing a standard visualization method for long-term electrophysiological experiments.
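
    The level-of-detail idea can be sketched in a few lines: precompute per-block min/max summaries so that any zoom window is drawn from a bounded number of values. The Python below illustrates the general technique under that assumption; it is not FTSPlot's actual C++/Qt implementation.

        import numpy as np

        def build_pyramid(x, factor=2):
            """Precompute (min, max) per block at successively coarser levels.
            Preprocessing is O(n); drawing any window then touches a bounded
            number of samples, mirroring FTSPlot's O(1) visualization step."""
            pyramid = []
            mins = maxs = x
            while mins.size >= factor:
                n = (mins.size // factor) * factor
                mins = mins[:n].reshape(-1, factor).min(axis=1)
                maxs = maxs[:n].reshape(-1, factor).max(axis=1)
                pyramid.append((mins, maxs))
            return pyramid

        def window(pyramid, start, stop, budget=2000):
            """Envelopes for [start, stop) from the coarsest level that still
            yields at least `budget` blocks on screen."""
            level = 0
            while level + 1 < len(pyramid) and (stop - start) >> (level + 2) >= budget:
                level += 1
            f = 2 ** (level + 1)              # samples per block at this level
            mins, maxs = pyramid[level]
            return mins[start // f: stop // f], maxs[start // f: stop // f]

        x = np.random.randn(10_000_000)
        pyr = build_pyramid(x)
        lo_env, hi_env = window(pyr, 0, x.size)   # coarse overview of the trace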

  14. What value, detection limits

    International Nuclear Information System (INIS)

    Currie, L.A.

    1986-01-01

    Specific approaches and applications of LLDs to nuclear and "nuclear-related" measurements are presented in connection with work undertaken for the U.S. Nuclear Regulatory Commission and the International Atomic Energy Agency. In this work, special attention was given to assumptions and potential error sources, as well as to different types of analysis. For the former, the authors considered random and systematic error associated with the blank and the calibration and sample preparation processes, as well as issues relating to the nature of the random error distributions. Analysis types considered included continuous monitoring, "simple counting" involving scalar quantities, and spectrum fitting involving data vectors. The investigation of data matrices and multivariate analysis is also described. The most important conclusions derived from this study are: that there is a significant lack of communication and compatibility resulting from diverse terminology and conceptual bases - including no-basis "ad hoc" definitions; that the distinction between detection decisions and detection limits is frequently lost sight of; and that quite erroneous LOD estimates follow from inadequate consideration of the actual variability of the blank, and systematic error associated with the blank, the calibration-recovery factor, matrix effects, and "black box" data reduction models.
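
    For reference, Currie's widely cited formulation separates the detection decision from the detection limit. In the common notation (a standard summary, not a quotation from this report), with sigma_0 the standard deviation of the net signal under the null hypothesis:

        % Decision (critical) level and detection limit:
        L_C = k_{1-\alpha}\,\sigma_0, \qquad L_D = L_C + k_{1-\beta}\,\sigma_D .
        % For \alpha = \beta = 0.05 and a paired-blank Poisson counting
        % measurement with mean background B, this gives the familiar
        L_C \approx 2.33\sqrt{B}, \qquad L_D \approx 2.71 + 4.65\sqrt{B} .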

  15. Identifying frauds and anomalies in Medicare-B dataset.

    Science.gov (United States)

    Seo, Jiwon; Mendelevitch, Ofer

    2017-07-01

    The healthcare industry is growing at a rapid rate, reaching a market value of $7 trillion worldwide. At the same time, fraud in healthcare is becoming a serious problem, amounting to 5% of total healthcare spending, or $100 billion each year in the US. Manually detecting healthcare fraud requires much effort. Recently, machine learning and data mining techniques have been applied to automatically detect healthcare fraud. This paper proposes a novel PageRank-based algorithm to detect healthcare frauds and anomalies. We apply the algorithm to the Medicare-B dataset, a real-life dataset with 10 million healthcare insurance claims. The algorithm successfully identifies tens of previously unreported anomalies.
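
    A toy sketch of the general approach: score nodes of a provider-procedure graph with PageRank and flag outliers. The graph construction, weighting and threshold here are illustrative assumptions, not the paper's algorithm.

        import networkx as nx
        import numpy as np

        claims = [  # (provider_id, procedure_code, n_claims) -- toy data
            ("prov_A", "99213", 120), ("prov_A", "99214", 80),
            ("prov_B", "99213", 100), ("prov_B", "99214", 95),
            ("prov_C", "99213", 5),   ("prov_C", "J9310", 900),  # unusual mix
        ]

        G = nx.Graph()
        for prov, code, n in claims:
            G.add_edge(prov, code, weight=n)   # bipartite provider-procedure graph

        scores = nx.pagerank(G, weight="weight")
        prov_scores = {n: s for n, s in scores.items() if n.startswith("prov_")}

        mu = np.mean(list(prov_scores.values()))
        sd = np.std(list(prov_scores.values()))
        for prov, s in prov_scores.items():
            if abs(s - mu) > 2 * sd:           # simple z-score anomaly threshold
                print(f"flag {prov}: pagerank={s:.4f}")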

  16. Power analysis dataset for QCA based multiplexer circuits

    Directory of Open Access Journals (Sweden)

    Md. Abdullah-Al-Shafi

    2017-04-01

    Full Text Available Power consumption in irreversible QCA logic circuits is a vital and major issue; however, in practical cases this focus is mostly omitted. The complete power depletion dataset of different QCA multiplexers has been worked out in this paper. At −271.15 °C, the depletion is evaluated under three separate tunneling energy levels. All the circuits are designed with QCADesigner, a broadly used simulation engine, and the QCAPro tool has been applied for estimating the power dissipation.

  17. Equalizing imbalanced imprecise datasets for genetic fuzzy classifiers

    Directory of Open Access Journals (Sweden)

    Ana M. Palacios

    2012-04-01

    Full Text Available Determining whether an imprecise dataset is imbalanced is not immediate. The vagueness in the data causes that the prior probabilities of the classes are not precisely known, and therefore the degree of imbalance can also be uncertain. In this paper we propose suitable extensions of different resampling algorithms that can be applied to interval valued, multi-labelled data. By means of these extended preprocessing algorithms, certain classification systems designed for minimizing the fraction of misclassifications are able to produce knowledge bases that are also adequate under common metrics for imbalanced classification.
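
    To give the flavour of resampling interval-valued data, here is a toy Python sketch of oversampling a minority class while preserving each feature's imprecision. The paper's actual extensions cover several established resampling algorithms and multi-labelled data; everything below is illustrative.

        import random

        # each sample: a list of (lower, upper) feature intervals plus a class label
        minority = [([(1.0, 1.4), (0.2, 0.5)], "pos"),
                    ([(0.8, 1.1), (0.3, 0.7)], "pos")]

        def oversample(samples, n_new):
            """Duplicate-with-perturbation: widen each interval slightly so the
            imprecision of the original observation is preserved."""
            out = []
            for _ in range(n_new):
                feats, label = random.choice(samples)
                jitter = [(lo - 0.05 * (hi - lo), hi + 0.05 * (hi - lo))
                          for lo, hi in feats]
                out.append((jitter, label))
            return out

        print(oversample(minority, 3))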

  18. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    Science.gov (United States)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

    One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find - or cares to compile - all the data that's relevant for science, and particularly the geosciences. If a dataset is not discoverable through a well-known search provider it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has two principal components, a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a Python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all
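
    The 'triplelizing' step can be pictured with rdflib; the vocabulary, names and URLs below are illustrative assumptions rather than BCube's actual schema or workflow.

        from rdflib import Graph, Literal, Namespace, RDF

        EX = Namespace("http://example.org/dataset/")        # hypothetical
        DCAT = Namespace("http://www.w3.org/ns/dcat#")

        g = Graph()
        ds = EX["seaice-extent-2015"]                        # a crawled dataset
        g.add((ds, RDF.type, DCAT.Dataset))
        g.add((ds, DCAT.keyword, Literal("sea ice")))
        g.add((ds, DCAT.accessURL, EX["data/seaice.nc"]))

        print(g.serialize(format="turtle"))                  # triples, ready to query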

  19. Dataset concerning the analytical approximation of the Ae3 temperature

    Directory of Open Access Journals (Sweden)

    B.L. Ennis

    2017-02-01

    The dataset includes the terms of the function and the values for the polynomial coefficients for major alloying elements in steel. A short description of the approximation method used to derive and validate the coefficients has also been included. For discussion and application of this model, please refer to the full-length article entitled “The role of aluminium in chemical and phase segregation in a TRIP-assisted dual phase steel”, 10.1016/j.actamat.2016.05.046 (Ennis et al., 2016) [1].
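
    The approximation being documented is polynomial in the alloying-element contents; a generic rendering of such a function (an assumption for illustration; the dataset itself supplies the actual terms and coefficient values) is:

        % Generic form only; the terms and coefficients c_{i,j} come from the dataset,
        % with w_i the weight percent of alloying element i.
        Ae_3(\mathbf{w}) \approx A_0 + \sum_{i \in \text{elements}} \sum_{j=1}^{n_i} c_{i,j}\, w_i^{\,j}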

  20. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

     Abstract Background: Gene set analysis is considered to be a way of improving our biological interpretation of the observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set...... analyses to the chicken expression data led to different rankings of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses depended on the statistical method used. Methods for predicting the possible...

  1. A Validation Dataset for CryoSat Sea Ice Investigators

    DEFF Research Database (Denmark)

    Gaudelli, Julia; Baker, Steve; Haas, Christian

    Since its launch in April 2010 CryoSat has been collecting valuable sea ice data over the Arctic region. Over the same period ESA’s CryoVEx and NASA IceBridge validation campaigns have been collecting a unique set of coincident airborne measurements in the Arctic. The CryoVal-SI project has...... community. In this talk we will describe the composition of the validation dataset, summarising how it was processed and how to understand the content and format of the data. We will also explain how to access the data and the supporting documentation....

  2. Dataset of statements on policy integration of selected intergovernmental organizations

    Directory of Open Access Journals (Sweden)

    Jale Tosun

    2018-04-01

    Full Text Available This article describes data for 78 intergovernmental organizations (IGOs) working on topics related to energy governance, environmental protection, and the economy. The number of IGOs covered also includes organizations active in other sectors. The point of departure for data construction was the Correlates of War dataset, from which we selected this sample of IGOs. We updated and expanded the empirical information on the selected IGOs by manual coding. Most importantly, we collected the primary law texts of the individual IGOs in order to code whether they commit themselves to environmental policy integration (EPI), climate policy integration (CPI) and/or energy policy integration (EnPI).

  3. Dataset on the energy performance of atrium type hotel buildings.

    Science.gov (United States)

    Vujosevic, Milica; Krstic-Furundzic, Aleksandra

    2018-04-01

    The data presented in this article are related to the research article entitled "The Influence of Atrium on Energy Performance of Hotel Building" (Vujosevic and Krstic-Furundzic, 2017) [1], which describes the annual energy performance of an atrium-type hotel building in Belgrade climate conditions, with the objective of presenting the impact of the atrium on the hotel building's energy demands for space heating and cooling. This dataset is made publicly available to show the energy performance of selected hotel design alternatives, in order to enable other researchers to extend the analyses of these data.

  4. Dataset on records of Hericium erinaceus in Slovakia.

    Science.gov (United States)

    Kunca, Vladimír; Čiliak, Marek

    2017-06-01

    The data presented in this article are related to the research article entitled "Habitat preferences of Hericium erinaceus in Slovakia" (Kunca and Čiliak, 2016) [FUNECO607] [2]. The dataset includes all available and unpublished data from Slovakia, excluding repeat records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence, and some records resulted from targeted searches.

  5. Dataset on records of Hericium erinaceus in Slovakia

    Directory of Open Access Journals (Sweden)

    Vladimír Kunca

    2017-06-01

    Full Text Available The data presented in this article are related to the research article entitled “Habitat preferences of Hericium erinaceus in Slovakia” (Kunca and Čiliak, 2016) [FUNECO607] [2]. The dataset includes all available and unpublished data from Slovakia, excluding repeat records from the same tree or stem. We compiled a database of records of collections by processing data from herbaria, personal records and communication with mycological activists. Data on altitude, tree species, host tree vital status, host tree position and intensity of management of forest stands were evaluated in this study. All surveys were based on basidioma occurrence, and some records resulted from targeted searches.

  6. Comparison of Four Precipitation Forcing Datasets in Land Information System Simulations over the Continental U.S.

    Science.gov (United States)

    Case, Jonathan L.; Kumar, Sujay V.; Kuligowski, Robert J.; Langston, Carrie

    2013-01-01

    The NASA Short-term Prediction Research and Transition (SPoRT) Center in Huntsville, AL is running a real-time configuration of the NASA Land Information System (LIS) with the Noah land surface model (LSM). Output from the SPoRT-LIS run is used to initialize land surface variables for local modeling applications at select National Weather Service (NWS) partner offices, and can be displayed in decision support systems for situational awareness and drought monitoring. The SPoRT-LIS is run over a domain covering the southern and eastern United States, fully nested within the National Centers for Environmental Prediction Stage IV precipitation analysis grid, which provides precipitation forcing to the offline LIS-Noah runs. The SPoRT Center seeks to expand the real-time LIS domain to the entire Continental U.S. (CONUS); however, geographical limitations with the Stage IV analysis product have inhibited this expansion. Therefore, a goal of this study is to test alternative precipitation forcing datasets that can enable the LIS expansion by improving upon the current geographical limitations of the Stage IV product. The four precipitation forcing datasets that are inter-compared on a 4-km resolution CONUS domain include the Stage IV, an experimental GOES quantitative precipitation estimate (QPE) from NESDIS/STAR, the National Mosaic and QPE (NMQ) product from the National Severe Storms Laboratory, and the North American Land Data Assimilation System phase 2 (NLDAS-2) analyses. The NLDAS-2 dataset is used as the control run, with each of the other three datasets considered experimental runs compared against the control. The regional strengths, weaknesses, and biases of each precipitation analysis are identified relative to the NLDAS-2 control in terms of accumulated precipitation pattern and amount, and the impacts on the subsequent LSM spin-up simulations. The ultimate goal is to identify an alternative precipitation forcing dataset that can best support an

  7. Detecting telomere elongation in longitudinal datasets: analysis of a proposal by Simons, Stulp and Nakagawa

    Directory of Open Access Journals (Sweden)

    Daniel Nettle

    2017-04-01

    Full Text Available Telomere shortening has emerged as an important biomarker of aging. Longitudinal studies consistently find that, although telomere length shortens over time on average, there is a subset of individuals for whom telomere length is observed to increase. This apparent lengthening could either be a genuine biological phenomenon, or simply due to measurement and sampling error. Simons, Stulp & Nakagawa (2014) recently proposed a statistical test for detecting when the amount of apparent lengthening in a dataset exceeds that which should be expected due to error, thus indicating that genuine elongation may be operative in some individuals. However, the test is based on a restrictive assumption, namely that each individual's true rate of telomere change is constant over time. It is not currently known whether this assumption is true. Here we show, using simulated datasets, that with perfect measurement and large sample size, the test has high power to detect true lengthening as long as the true rate of change is either constant, or moderately stable, over time. If the true rate of change varies randomly from year to year, the test systematically returns type-II errors (false negatives); that is, failures to detect lengthening even when a substantial fraction of the population truly lengthens each year. We also consider the impact of measurement error. Using estimates of the magnitude of annual attrition and of measurement error derived from the human telomere literature, we show that the power of the test is likely to be low in several empirically realistic scenarios, even in large samples. Thus, whilst a significant result of the proposed test is likely to indicate that true lengthening is present in a dataset, type-II errors are a likely outcome, either if measurement error is substantial, and/or the true rate of telomere change varies substantially over time within individuals.
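
    The core point is easy to reproduce in miniature: with measurement error, a fraction of individuals appears to lengthen even when every telomere truly shortens. The magnitudes below are placeholders, not the calibrated literature estimates used in the article.

        import numpy as np

        rng = np.random.default_rng(1)
        n, years = 1000, 2
        true_baseline = rng.normal(7000, 700, n)         # base pairs
        true_attrition = rng.normal(-25, 10, n)          # bp/year
        true_attrition = np.minimum(true_attrition, -1)  # force true shortening

        meas_error = 150                                 # bp measurement SD
        t0 = true_baseline + rng.normal(0, meas_error, n)
        t1 = true_baseline + years * true_attrition + rng.normal(0, meas_error, n)

        apparent_lengthening = np.mean(t1 > t0)
        print(f"{apparent_lengthening:.1%} appear to lengthen despite true shortening")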

  8. Gaussian processes retrieval of leaf parameters from a multi-species reflectance, absorbance and fluorescence dataset.

    Science.gov (United States)

    Van Wittenberghe, Shari; Verrelst, Jochem; Rivera, Juan Pablo; Alonso, Luis; Moreno, José; Samson, Roeland

    2014-05-05

    Biochemical and structural leaf properties such as chlorophyll content (Chl), nitrogen content (N), leaf water content (LWC), and specific leaf area (SLA) have the benefit that they can be estimated through nondestructive spectral measurements. Current practices, however, mainly focus on a limited number of wavelength bands, while more information could be extracted from other wavelengths in the full-range (400-2500 nm) spectrum. In this research, leaf characteristics were estimated from a field-based multi-species dataset covering a wide range of leaf structures and Chl concentrations. The dataset contains leaves with extremely high Chl concentrations (>100 μg cm^-2), which are seldom estimated. Parameter retrieval was conducted with the machine learning regression algorithm Gaussian Processes (GP), which is able to perform adaptive, nonlinear data fitting for complex datasets. Moreover, insight into relevant bands is provided during the development of a regression model; consequently, the physical meaning of the model can be explored. Best estimates of SLA, LWC and Chl yielded normalized root mean square errors of 6.0%, 7.7% and 9.1%, respectively. Several distinct wavebands were chosen across the whole spectrum. A band in the red edge (710 nm) appeared to be most important for the estimation of Chl. Interestingly, spectral features related to biochemicals with a structural or carbon-storage function (e.g. 1090, 1550, 1670, 1730 nm) were found important not only for estimation of SLA, but also for LWC, Chl or N estimation. Similarly, Chl estimation was also helped by some wavebands related to water content (950, 1430 nm) due to correlation between the parameters. It is shown that leaf parameter retrieval by GP regression is successful and able to cope with large structural differences between leaves. Copyright © 2014 Elsevier B.V. All rights reserved.
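
    As a hedged illustration of the retrieval setup, a scikit-learn Gaussian process regression on synthetic 'spectra'; the study used its own GP tooling and real leaf measurements, so the band count, kernel and data here are assumptions.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(0)
        n_leaves, n_bands = 80, 211               # 400-2500 nm, every 10 nm
        X = rng.random((n_leaves, n_bands))       # stand-in for reflectance spectra
        band_710 = (710 - 400) // 10              # red-edge band index
        chl = 120 * X[:, band_710] + rng.normal(0, 3, n_leaves)

        # RBF + noise kernel; GP regression also returns predictive uncertainty
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
        gp.fit(X, chl)
        mean, std = gp.predict(X[:5], return_std=True)
        print(mean.round(1), std.round(2))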

  9. Bionimbus: a cloud for managing, analyzing and sharing large genomics datasets.

    Science.gov (United States)

    Heath, Allison P; Greenway, Matthew; Powell, Raymond; Spring, Jonathan; Suarez, Rafael; Hanley, David; Bandlamudi, Chai; McNerney, Megan E; White, Kevin P; Grossman, Robert L

    2014-01-01

    As large genomics and phenotypic datasets are becoming more common, it is increasingly difficult for most researchers to access, manage, and analyze them. One possible approach is to provide the research community with several petabyte-scale cloud-based computing platforms containing these data, along with tools and resources to analyze them. Bionimbus is an open source cloud-computing platform that is based primarily upon OpenStack, which manages on-demand virtual machines that provide the required computational resources, and GlusterFS, which is a high-performance clustered file system. Bionimbus also includes Tukey, which is a portal, and associated middleware, that provides a single entry point and a single sign-on for the various Bionimbus resources; and Yates, which automates the installation, configuration, and maintenance of the software infrastructure required. Bionimbus is used by a variety of projects to process genomics and phenotypic data. For example, it is used by an acute myeloid leukemia resequencing project at the University of Chicago. The project requires several computational pipelines, including pipelines for quality control, alignment, variant calling, and annotation. For each sample, the alignment step requires eight CPUs for about 12 h. BAM file sizes ranged from 5 GB to 10 GB for each sample. Most members of the research community have difficulty downloading large genomics datasets and obtaining sufficient storage and computer resources to manage and analyze the data. Cloud computing platforms, such as Bionimbus, with data commons that contain large genomics datasets, are one choice for broadening access to research data in genomics.

  10. Using kittens to unlock photo-sharing website datasets for environmental applications

    Science.gov (United States)

    Gascoin, Simon

    2016-04-01

    Mining photo-sharing websites is a promising approach to complement in situ and satellite observations of the environment; however, a challenge is to deal with the large degree of noise inherent to online social datasets. Here I explored the value of the Flickr image-hosting website database to monitor the snow cover in the Pyrenees. Using the Flickr application programming interface (API), I queried the metadata of all public images tagged with at least one of the following words: "snow", "neige", "nieve", "neu" (snow in French, Spanish and Catalan). The search was limited to the geo-tagged pictures taken in the Pyrenees area. However, the number of public pictures available in the Flickr database for a given time interval depends on several factors, including the Flickr website's popularity and the development of digital photography. Thus, I also searched for all Flickr images tagged with "chat", "gat" or "gato" (cat in French, Spanish and Catalan). The tag "cat" was not considered, in order to exclude results from North America, where Flickr became popular earlier than in Europe. The number of "cat" images per month was used to fit a model of the number of images uploaded to Flickr over time. This model was used to remove this trend from the number of snow-tagged photographs. The resulting time series was compared to a time series of the snow cover area derived from the MODIS satellite over the same region. Both datasets are well correlated; in particular, they exhibit the same seasonal evolution, although the inter-annual variabilities are less similar. I will also discuss which other factors may explain the main discrepancies, in order to further decrease the noise in the Flickr dataset.
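
    The detrending trick is simple enough to sketch: use monthly "cat" counts as a proxy for overall Flickr growth and divide it out of the snow-tagged series. The numbers and column names below are invented for illustration.

        import pandas as pd

        df = pd.DataFrame({   # monthly counts from the Flickr API (toy numbers)
            "month": pd.period_range("2008-01", periods=6, freq="M"),
            "snow_photos": [30, 42, 25, 8, 3, 2],
            "cat_photos":  [100, 110, 120, 130, 140, 150],
        }).set_index("month")

        # cat uploads track site popularity, not snow: divide them out
        df["snow_index"] = df["snow_photos"] / df["cat_photos"]
        print(df["snow_index"])   # compare against MODIS snow-cover area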

  11. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Full Text Available Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
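
    The serial core that such frameworks parallelize looks roughly like this in scikit-learn (a single-node stand-in, not the authors' distributed implementation): build a neighbourhood graph, form its Laplacian, and keep a few trailing eigenvectors.

        import numpy as np
        from sklearn.manifold import SpectralEmbedding

        X = np.random.rand(2000, 50)          # toy points; real data is far larger
        emb = SpectralEmbedding(n_components=2, n_neighbors=15).fit_transform(X)
        print(emb.shape)                      # (2000, 2) low-dimensional coordinates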

  12. The Path from Large Earth Science Datasets to Information

    Science.gov (United States)

    Vicente, G. A.

    2013-12-01

    The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is one of the major Science Mission Directorate (SMD) centers for archiving and distribution of Earth science remote sensing data, products and services. This virtual portal provides convenient access to Atmospheric Composition and Dynamics, Hydrology, Precipitation, Ozone, and model-derived datasets (generated by GSFC's Global Modeling and Assimilation Office), the North American Land Data Assimilation System (NLDAS) and the Global Land Data Assimilation System (GLDAS) data products (both generated by GSFC's Hydrological Sciences Branch). This presentation demonstrates various tools and computational technologies developed at the GES DISC to manage the huge volume of data and products acquired from various missions and programs over the years. It explores approaches to archive, document, distribute, access and analyze Earth science data and information, as well as addresses the technical and scientific issues, governance and user support problems faced by scientists in need of multi-disciplinary datasets. It also discusses data and product metrics, user distribution profiles and lessons learned through interactions with the science communities around the world. Finally, it demonstrates some of the most used data and product visualization and analysis tools developed and maintained by the GES DISC.

  13. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E.coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E.coli and 53.5% (95% CI: 34.4-72.6) for human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for E.coli and human assemblies respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present available ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  14. Robust computational analysis of rRNA hypervariable tag datasets.

    Directory of Open Access Journals (Sweden)

    Maksim Sipos

    Full Text Available Next-generation DNA sequencing is increasingly being utilized to probe microbial communities, such as gastrointestinal microbiomes, where it is important to be able to quantify measures of abundance and diversity. The fragmented nature of the 16S rRNA datasets obtained, coupled with their unprecedented size, has led to the recognition that the results of such analyses are potentially contaminated by a variety of artifacts, both experimental and computational. Here we quantify how multiple alignment and clustering errors contribute to overestimates of abundance and diversity, reflected by incorrect OTU assignment, corrupted phylogenies, inaccurate species diversity estimators, and rank abundance distribution functions. We show that straightforward procedural optimizations, combining preexisting tools, are effective in handling large (10^5-10^6) 16S rRNA datasets, and we describe metrics to measure the effectiveness and quality of the estimators obtained. We introduce two metrics to ascertain the quality of clustering of pyrosequenced rRNA data, and show that complete linkage clustering greatly outperforms other widely used methods.
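
    Complete-linkage OTU clustering, the method the study found most effective, can be sketched with SciPy; the toy distance matrix below stands in for pairwise 16S rRNA sequence dissimilarities.

        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        D = np.array([[0.00, 0.02, 0.10, 0.12],
                      [0.02, 0.00, 0.11, 0.13],
                      [0.10, 0.11, 0.00, 0.01],
                      [0.12, 0.13, 0.01, 0.00]])

        Z = linkage(squareform(D), method="complete")        # complete linkage
        otus = fcluster(Z, t=0.03, criterion="distance")     # 97%-identity-style cutoff
        print(otus)                                          # e.g. [1 1 2 2]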

  15. BLAST-EXPLORER helps you building datasets for phylogenetic analysis

    Directory of Open Access Journals (Sweden)

    Claverie Jean-Michel

    2010-01-01

    Full Text Available Abstract Background The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way of constructing datasets suitable for phylogenetic analysis, because this task intimately depends on the scientific question we want to address. Moreover, database mining software such as BLAST, which is routinely used for searching homologous sequences, is not specifically optimized for this task. Results To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools allowing interactive, phylogenetic-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the selection of the BLAST hits is done using BLAST-Explorer, the corresponding sequences can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform. Conclusions BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieval of the BLAST hit sequences based on a wide range of criteria. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr

  16. Multiresolution comparison of precipitation datasets for large-scale models

    Science.gov (United States)

    Chun, K. P.; Sapriza Azuri, G.; Davison, B.; DeBeer, C. M.; Wheater, H. S.

    2014-12-01

    Gridded precipitation datasets are crucial for driving large-scale models used in weather forecasting and climate research. However, the quality of precipitation products is usually validated individually. Comparisons between gridded precipitation products, along with ground observations, provide another avenue for investigating how precipitation uncertainty affects the performance of large-scale models. In this study, using data from a set of precipitation gauges over British Columbia and Alberta, we evaluate several widely used North American gridded products, including the Canadian Gridded Precipitation Anomalies (CANGRD), the National Center for Environmental Prediction (NCEP) reanalysis, the Water and Global Change (WATCH) project, the thin plate spline smoothing algorithms (ANUSPLIN) and the Canadian Precipitation Analysis (CaPA). Based on verification criteria for various temporal and spatial scales, the results provide an assessment of possible applications for the various precipitation datasets. For long-term climate variation studies (~100 years), CANGRD, NCEP, WATCH and ANUSPLIN have different comparative advantages in terms of their resolution and accuracy. For synoptic and mesoscale precipitation patterns, CaPA provides appealing spatial coherence. In addition to the product comparison, various downscaling methods are also surveyed to explore new verification and bias-reduction methods for improving gridded precipitation outputs for large-scale models.

  17. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    Science.gov (United States)

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

    Deep learning models (aka deep neural networks) have revolutionized many fields, including computer vision, natural language processing and speech recognition, and are being increasingly used in clinical healthcare applications. However, few works exist which have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks, such as mortality prediction, length-of-stay prediction, and ICD-9 code group prediction, using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. We used the Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) publicly available dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data is used as input features to the models. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. Overview of the CERES Edition-4 Multilayer Cloud Property Datasets

    Science.gov (United States)

    Chang, F. L.; Minnis, P.; Sun-Mack, S.; Chen, Y.; Smith, R. A.; Brown, R. R.

    2014-12-01

    Knowledge of the cloud vertical distribution is important for understanding the role of clouds in Earth's radiation budget and climate change. Since high-level cirrus clouds with low emission temperatures and small optical depths can provide a positive feedback to the climate system, and low-level stratus clouds with high emission temperatures and large optical depths can provide a negative feedback effect, the retrieval of multilayer cloud properties using satellite observations, like Terra and Aqua MODIS, is critically important for a variety of cloud and climate applications. For the objectives of the Clouds and the Earth's Radiant Energy System (CERES), new algorithms have been developed using Terra and Aqua MODIS data to allow separate retrievals of cirrus and stratus cloud properties when the two dominant cloud types are simultaneously present in a multilayer system. In this paper, we will present an overview of the new CERES Edition-4 multilayer cloud property datasets derived from Terra as well as Aqua. Assessment of the new CERES multilayer cloud datasets will include high-level cirrus and low-level stratus cloud heights, pressures, and temperatures, as well as their optical depths, emissivities, and microphysical properties.

  19. Predicting weather regime transitions in Northern Hemisphere datasets

    Energy Technology Data Exchange (ETDEWEB)

    Kondrashov, D. [University of California, Department of Atmospheric and Oceanic Sciences and Institute of Geophysics and Planetary Physics, Los Angeles, CA (United States); Shen, J. [UCLA, Department of Statistics, Los Angeles, CA (United States); Berk, R. [UCLA, Department of Statistics, Los Angeles, CA (United States); University of Pennsylvania, Department of Criminology, Philadelphia, PA (United States); D' Andrea, F.; Ghil, M. [Ecole Normale Superieure, Departement Terre-Atmosphere-Ocean and Laboratoire de Meteorologie Dynamique (CNRS and IPSL), Paris Cedex 05 (France)

    2007-10-15

    A statistical learning method called random forests is applied to the prediction of transitions between weather regimes of wintertime Northern Hemisphere (NH) atmospheric low-frequency variability. A dataset composed of 55 winters of NH 700-mb geopotential height anomalies is used in the present study. A mixture model finds that the three Gaussian components that were statistically significant in earlier work are robust; they are the Pacific-North American (PNA) regime, its approximate reverse (the reverse PNA, or RNA), and the blocked phase of the North Atlantic Oscillation (BNAO). The most significant and robust transitions in the Markov chain generated by these regimes are PNA {yields} BNAO, PNA {yields} RNA and BNAO {yields} PNA. The break of a regime and subsequent onset of another one is forecast for these three transitions. Taking the relative costs of false positives and false negatives into account, the random-forests method shows useful forecasting skill. The calculations are carried out in the phase space spanned by a few leading empirical orthogonal functions of dataset variability. Plots of estimated response functions to a given predictor confirm the crucial influence of the exit angle on a preferred transition path. This result points to the dynamic origin of the transitions. (orig.)
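
    A hedged sketch of the set-up: a random-forest classifier predicting whether a regime break follows from leading EOF amplitudes. The synthetic features, labels and class weights below are placeholders, not the paper's 700-mb height data or tuned costs.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(1500, 10))        # leading EOF/PC amplitudes (toy)
        # toy rule: a transition is likelier when PC1 is strongly negative
        y = (X[:, 0] + 0.5 * rng.normal(size=1500) < -1.0).astype(int)

        # class_weight tunes the false-positive / false-negative trade-off the
        # authors discuss via relative misclassification costs
        rf = RandomForestClassifier(n_estimators=300, class_weight={0: 1, 1: 4})
        print(cross_val_score(rf, X, y, cv=5, scoring="roc_auc").mean())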

  20. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    Science.gov (United States)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely-accessible source for high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  1. ISC-EHB: Reconstruction of a robust earthquake dataset

    Science.gov (United States)

    Weston, J.; Engdahl, E. R.; Harris, J.; Di Giacomo, D.; Storchak, D. A.

    2018-04-01

    The EHB Bulletin of hypocentres and associated travel-time residuals was originally developed with procedures described by Engdahl, Van der Hilst and Buland (1998) and currently ends in 2008. It is a widely used seismological dataset, which is now expanded and reconstructed, partly by exploiting updated procedures at the International Seismological Centre (ISC), to produce the ISC-EHB. The reconstruction begins in the modern period (2000-2013), to which new and more rigorous procedures for event selection, data preparation, processing, and relocation are applied. The selection criteria minimise the location bias produced by unmodelled 3D Earth structure, resulting in events that are relatively well located in any given region. Depths of the selected events are significantly improved by a more comprehensive review of near-station and secondary-phase travel-time residuals based on ISC data, especially for the depth phases pP, pwP and sP, as well as by a rigorous review of the event depths in subduction zone cross sections. The resulting cross sections and associated maps depict seismicity in subduction zones in much greater detail than previously achievable. The new ISC-EHB dataset will be especially useful for global seismicity studies and high-frequency regional and global tomographic inversions.

  2. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    Full Text Available A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with the costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E. coli and human assemblies, respectively. Thus, GCE was found to outperform EMR both in terms of cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  3. Condensing Massive Satellite Datasets For Rapid Interactive Analysis

    Science.gov (United States)

    Grant, G.; Gallaher, D. W.; Lv, Q.; Campbell, G. G.; Fowler, C.; LIU, Q.; Chen, C.; Klucik, R.; McAllister, R. A.

    2015-12-01

    Our goal is to enable users to interactively analyze massive satellite datasets, identifying anomalous data or values that fall outside of thresholds. To achieve this, the project seeks to create a derived database containing only the most relevant information, accelerating the analysis process. The database is designed to be an ancillary tool for the researcher, not an archival database to replace the original data. This approach is aimed at improving performance by reducing the overall size by way of condensing the data. The primary challenges of the project include: the nature of the research question(s) may not be known ahead of time; the thresholds for determining anomalies may be uncertain; cloudy, missing, or noisy satellite imagery must be processed; and the contents and method of creation of the condensed dataset must be easily explainable to users. The architecture of the database will reorganize spatially-oriented satellite imagery into temporally-oriented columns of data (a.k.a., "data rods") to facilitate time-series analysis. The database itself is an open-source parallel database, designed to make full use of clustered server technologies. A demonstration of the system capabilities will be shown. Applications for this technology include quick-look views of the data, as well as the potential for on-board satellite processing of essential information, with the goal of reducing data latency.
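
    The "data rods" reorganisation described above is easy to sketch: a spatially-oriented (time, y, x) image stack is transposed into per-pixel time-series columns so that threshold screening reads one rod at a time. The shapes and threshold below are placeholders, not the project's actual schema.

    ```python
    # Sketch: reorganise an image stack into time-series "data rods" and flag
    # pixels whose values ever exceed a threshold.
    import numpy as np

    stack = np.random.rand(365, 180, 360)         # one year of daily grids (toy data)
    rods = stack.reshape(stack.shape[0], -1).T    # shape (n_pixels, n_times)

    threshold = 0.999
    anomalous = np.where((rods > threshold).any(axis=1))[0]
    print(f"{anomalous.size} pixels exceed the threshold at least once")
    ```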

  4. Application of global datasets for hydrological modelling of a remote, snowmelt driven catchment in the Canadian Sub-Arctic

    Science.gov (United States)

    Casson, David; Werner, Micha; Weerts, Albrecht; Schellekens, Jaap; Solomatine, Dimitri

    2017-04-01

    Hydrological modelling in the Canadian Sub-Arctic is hindered by the limited spatial and temporal coverage of local meteorological data. Local watershed modelling often relies on data from a sparse network of meteorological stations, with a rough density of 3 active stations per 100,000 km2. Global datasets hold great promise for application due to more comprehensive spatial and extended temporal coverage. A key objective of this study is to demonstrate the application of global datasets and data assimilation techniques for hydrological modelling of a data-sparse, Sub-Arctic watershed. Application of available datasets and modelling techniques is currently limited in practice due to a lack of local capacity and understanding of available tools. Due to the importance of snow processes in the region, this study also aims to evaluate the performance of global snow water equivalent (SWE) products for snowpack modelling. The Snare Watershed is a 13,300 km2 snowmelt-driven sub-basin of the Mackenzie River Basin, Northwest Territories, Canada. The Snare watershed is data sparse in terms of meteorological data, but is well gauged, with consistent discharge records since the late 1970s. End-of-winter snowpack surveys have been conducted every year from 1978 to present. The application of global re-analysis datasets from the EU FP7 eartH2Observe project is investigated in this study. Precipitation data are taken from Multi-Source Weighted-Ensemble Precipitation (MSWEP) and temperature data from WATCH Forcing Data applied to European Reanalysis (ERA)-Interim data (WFDEI). GlobSnow-2 is a global SWE measurement product funded by the European Space Agency (ESA) and is also evaluated over the local watershed. Downscaled precipitation, temperature and potential evaporation datasets are used as forcing data in a distributed version of the HBV model implemented in the WFLOW framework. Results demonstrate the successful application of global datasets in local watershed modelling, but
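
    Snow accumulation and melt in HBV-type models are commonly handled with a degree-day routine; the sketch below is a generic textbook formulation with assumed parameter values, not the WFLOW/HBV source code.

    ```python
    # Sketch: degree-day snow routine. Precipitation accumulates as SWE below
    # the threshold temperature tt and melts at cfmax mm/degC/day above it.
    import numpy as np

    def degree_day_snow(precip, temp, tt=0.0, cfmax=3.5):
        swe, series = 0.0, []
        for p, t in zip(precip, temp):
            if t < tt:
                swe += p                           # snowfall accumulates
            else:
                swe -= min(swe, cfmax * (t - tt))  # melt, bounded by available SWE
            series.append(swe)
        return np.array(series)

    print(degree_day_snow(np.array([5.0, 4.0, 0.0, 0.0]),
                          np.array([-8.0, -3.0, 2.0, 6.0])))  # -> [5. 9. 2. 0.]
    ```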

  5. Utilizing the Antarctic Master Directory to find orphan datasets

    Science.gov (United States)

    Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

    2011-12-01

    While most Antarctic data are housed at an established disciplinary-specific data repository, there are data types for which no suitable repository exists. In some cases, these "orphan" data, without an appropriate national archive, are served from local servers by the principal investigators who produced the data. There are many pitfalls with data served privately, including the frequent lack of adequate documentation to ensure the data can be understood by others for re-use and the impermanence of personal web sites. For example, if an investigator leaves an institution and the data moves, the published link is no longer accessible. To ensure continued availability of data, submission to long-term national data repositories is needed. As stated in the National Science Foundation Office of Polar Programs (NSF/OPP) Guidelines and Award Conditions for Scientific Data, investigators are obligated to submit their data for curation and long-term preservation; this includes the registration of a dataset description into the Antarctic Master Directory (AMD), http://gcmd.nasa.gov/Data/portals/amd/. The AMD is a Web-based, searchable directory of thousands of dataset descriptions, known as DIF records, submitted by scientists from over 20 countries. It serves as a node of the International Directory Network/Global Change Master Directory (IDN/GCMD). The US Antarctic Program Data Coordination Center (USAP-DCC), http://www.usap-data.org/, funded through NSF/OPP, was established in 2007 to help streamline the process of data submission and DIF record creation. When data do not quite fit within any existing disciplinary repository, they can be registered within the USAP-DCC as the fallback data repository. Within the scope of the USAP-DCC we undertook the challenge of discovering and "rescuing" orphan datasets currently registered within the AMD. In order to find which DIF records led to data served privately, all records relating to US data within the AMD were parsed. After
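
    A scan of the kind described, walking DIF records and flagging data links that point outside known long-term repositories, might look like the sketch below. The XML element name and the trusted-host list are assumptions for illustration, not the actual AMD schema.

    ```python
    # Sketch: flag DIF records whose data URLs are served from private hosts.
    import xml.etree.ElementTree as ET
    from urllib.parse import urlparse

    TRUSTED_HOSTS = {"www.usap-data.org", "gcmd.nasa.gov"}   # assumed examples

    def orphan_links(dif_xml):
        root = ET.fromstring(dif_xml)
        for elem in root.iter("URL"):                        # assumed element name
            host = urlparse(elem.text.strip()).netloc
            if host not in TRUSTED_HOSTS:
                yield elem.text.strip()

    record = "<DIF><Related_URL><URL>http://myserver.univ.edu/data</URL></Related_URL></DIF>"
    print(list(orphan_links(record)))
    ```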

  6. MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB.

    Science.gov (United States)

    Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A

    2013-01-01

    Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED

  7. A re-analysis of the Lake Suigetsu terrestrial radiocarbon calibration dataset

    International Nuclear Information System (INIS)

    Staff, R.A.; Bronk Ramsey, C.; Nakagawa, T.

    2010-01-01

    Lake Suigetsu, Honshu Island, Japan provides an ideal sedimentary sequence from which to derive a wholly terrestrial radiocarbon calibration curve back to the limits of radiocarbon detection (circa 60 ka bp). The presence of well-defined, annually-deposited laminae (varves) throughout the entirety of this period provides an independent, high-resolution chronometer against which radiocarbon measurements of plant macrofossils from the sediment column can be directly related. However, data from the initial Lake Suigetsu project were found to diverge significantly from alternative, marine-based calibration datasets released around the same time. The main source of this divergence is thought to be inaccuracies in the absolute age profile of the Suigetsu project, caused by both varve counting uncertainties and gaps of unknown duration in the sediment column between successively-drilled core sections. Here, a re-analysis of the previously-published Lake Suigetsu data is conducted. The most recent developments in Bayesian statistical modelling techniques (OxCal v4.1) are implemented to fit the Suigetsu data to the latest radiocarbon calibration datasets and thereby estimate the duration of the inter-core section gaps in the Suigetsu data. In this way, the absolute age of the Lake Suigetsu sediment profile is more accurately defined, providing significant information for both radiocarbon calibration and palaeoenvironmental reconstruction purposes.

  8. Northern Chinese dental ages estimated from southern Chinese reference datasets closely correlate with chronological age

    Directory of Open Access Journals (Sweden)

    Hai Ming Wong

    2016-12-01

    Full Text Available While northern and southern Chinese are genetically correlated, there exist notable environmental differences in their living conditions. This study aimed to evaluate the validity of the southern Chinese reference dataset for dental age estimation when applied to northern Chinese. Dental panoramic tomographs of 437 northern Chinese aged 3 to 21 years were analysed. All the left maxillary and mandibular permanent teeth, plus the two third molars on the right side, were scored based on Demirjian's classification of tooth development stages. The mean and standard error of dental age were obtained for each tooth development stage, followed by random-effect meta-analysis for mean dental age estimation. Validity of the method was examined through measures of agreement (95% limits of agreement, standard error of measurement, and Lin's concordance correlation coefficient) and a measure of reliability (the intraclass correlation coefficient). On average, the estimated dental age overestimated chronological age by only around 1 month in both females and males. The intraclass correlation coefficient values were 0.99 for both sexes, suggesting excellent reliability of the method. The reference dataset for dental age estimation developed on the basis of southern Chinese was applicable for use among the northern Chinese.
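
    The agreement measures named above are standard and straightforward to compute; the sketch below evaluates 95% limits of agreement and Lin's concordance correlation coefficient on fabricated placeholder ages, not the study's data.

    ```python
    # Sketch: Bland-Altman 95% limits of agreement and Lin's CCC for
    # estimated dental age versus chronological age.
    import numpy as np

    chrono = np.array([6.2, 9.1, 12.4, 15.8, 18.3])   # placeholder ages (years)
    dental = np.array([6.4, 9.0, 12.9, 15.6, 18.6])

    diff = dental - chrono
    loa = (diff.mean() - 1.96 * diff.std(ddof=1),
           diff.mean() + 1.96 * diff.std(ddof=1))

    def lins_ccc(x, y):
        sxy = np.cov(x, y, ddof=1)[0, 1]
        return 2 * sxy / (x.var(ddof=1) + y.var(ddof=1) + (x.mean() - y.mean()) ** 2)

    print("95% limits of agreement:", loa)
    print("Lin's CCC:", round(lins_ccc(chrono, dental), 4))
    ```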

  9. A New Heuristic Anonymization Technique for Privacy Preserved Datasets Publication on Cloud Computing

    Science.gov (United States)

    Aldeen Yousra, S.; Mazleena, Salleh

    2018-05-01

    Recent advancements in Information and Communication Technologies (ICT) have increased the demand for cloud services that share users' private data. Data from various organizations are a vital information source for analysis and research. Generally, this sensitive or private information involves medical, census, voter registration, social network, and customer services data. The primary concern of cloud service providers in data publishing is to hide the sensitive information of individuals. One of the cloud services that fulfills these confidentiality concerns is Privacy Preserving Data Mining (PPDM). The PPDM service in Cloud Computing (CC) enables data publishing with minimized distortion and absolute privacy. In this method, datasets are anonymized via generalization to accomplish the privacy requirements. However, the well-known privacy preserving data mining technique called K-anonymity suffers from several limitations. To surmount those shortcomings, we propose a new heuristic anonymization framework for preserving the privacy of sensitive datasets when published on the cloud. The advantages of the K-anonymity, L-diversity and (α, k)-anonymity methods for efficient information utilization and privacy protection are emphasized. Experimental results reveal that the developed technique outperforms K-anonymity, L-diversity, and (α, k)-anonymity.
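
    For readers unfamiliar with anonymization via generalization, the following is a textbook k-anonymity sketch, coarsening quasi-identifiers until every combination is shared by at least k records. It illustrates the baseline the proposed framework builds on, not the proposed heuristic itself; the table is fabricated.

    ```python
    # Sketch: generalize age into ranges and suppress zip-code digits, then
    # check the achieved k (size of the smallest quasi-identifier group).
    import pandas as pd

    df = pd.DataFrame({"age": [23, 27, 31, 36, 43, 48],
                       "zip": ["47301", "47302", "47305", "47309", "47310", "47311"]})

    df["age"] = pd.cut(df["age"], bins=[20, 30, 40, 50]).astype(str)  # age ranges
    df["zip"] = df["zip"].str[:3] + "**"                              # digit suppression

    k = df.groupby(["age", "zip"]).size().min()
    print(df, f"\nachieved k = {k}")   # k = 2 for this toy table
    ```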

  10. Evaluating the use of different precipitation datasets in simulating a flood event

    Science.gov (United States)

    Akyurek, Z.; Ozkaya, A.

    2016-12-01

    Floods caused by convective storms in mountainous regions are sensitive to the temporal and spatial variability of rainfall. Space-time estimates of rainfall from weather radar, satellites and numerical weather prediction models can be a remedy to represent the pattern of rainfall, albeit with some inaccuracy. However, there is a strong need to evaluate the performance and limitations of these estimates in hydrology. This study provides a comparison of gauge, radar, satellite (Hydro-Estimator (HE)) and numerical weather prediction model (Weather Research and Forecasting (WRF)) precipitation datasets during an extreme flood event (22.11.2014) lasting 40 hours in Samsun, Turkey. Hourly rainfall data from 13 ground observation stations were used in the analyses. This event, with a peak discharge of 541 m3/s, created flooding downstream of the Terme Basin. Comparisons were performed in two parts. First, the analyses were performed in an areal and point-based manner. Second, a semi-distributed hydrological model was used to assess the accuracy of the rainfall datasets in simulating river flows for the flood event. Kalman filtering was used for bias correction of the radar rainfall data against gauge measurements. Radar, gauge, corrected radar, HE and WRF rainfall data were used as model inputs. Generally, the HE product underestimates cumulative rainfall amounts at all stations; the radar data also underestimate in a cumulative sense but remain consistent. On the other hand, the WRF mean statistics at almost all stations are better than those of the HE product but worse than those of the radar dataset. Point comparisons indicated that the trend of the rainfall is captured well by the radar rainfall estimation, but radar underestimates the maximum values. Relative to the cumulative gauge value, radar underestimated the cumulative rainfall amount by 32%. Contrary to the other datasets, the bias of WRF is positive
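
    The bias correction step can be sketched with a scalar Kalman filter that tracks the multiplicative (log) bias between gauge and radar rainfall; the noise variances below are illustrative choices, not the study's calibrated values, and nonzero rainfall is assumed.

    ```python
    # Sketch: recursively estimate the log-bias b = log(gauge/radar) and use
    # it to correct the radar series.
    import numpy as np

    def kalman_bias(gauge, radar, q=0.01, r=0.25):
        b, p, estimates = 0.0, 1.0, []
        for g, rd in zip(gauge, radar):
            p += q                        # predict: state variance grows
            z = np.log(g / rd)            # observe the current log bias
            k = p / (p + r)               # Kalman gain
            b += k * (z - b)              # update the bias estimate
            p *= (1.0 - k)
            estimates.append(b)
        return np.array(estimates)

    gauge = np.array([4.0, 6.0, 10.0, 8.0])    # placeholder hourly values (mm)
    radar = np.array([3.0, 4.2, 7.5, 6.0])
    print(radar * np.exp(kalman_bias(gauge, radar)))   # bias-corrected radar
    ```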

  11. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    Science.gov (United States)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the main objectives of the NERIES EC project is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable user activities such as discovering, aggregating, describing and sharing datasets, so as to reduce the replication of similar data queries towards the network and spare the data centers from having to guess at and create pre-packaged products. We have started to transfer this task more and more towards the user community, where user-composed data products can be extensively re-used. The main link to the data is a centralized webservice (SeismoLink) acting as a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs, which identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (e.g. the EMSC QuakeML earthquake catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map-based portlet which allows the dynamic composition of a data product, binding seismic event parameters with a set of seismic stations. The requested data are collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  12. Accuracy assessment of seven global land cover datasets over China

    Science.gov (United States)

    Yang, Yongke; Xiao, Pengfeng; Feng, Xuezhi; Li, Haixing

    2017-03-01

    Land cover (LC) is a vital foundation for Earth science. To date, several global LC datasets have arisen through the efforts of many scientific communities. To provide guidelines for data usage over China, nine LC maps from seven global LC datasets (IGBP DISCover, UMD, GLC, MCD12Q1, GLCNMO, CCI-LC, and GlobeLand30) were evaluated in this study. First, we compared their similarities and discrepancies in both area and spatial patterns, and analysed their inherent relations to data sources and classification schemes and methods. Next, five sets of validation sample units (VSUs) were collected to calculate their accuracy quantitatively. Further, we built a spatial analysis model and depicted their spatial variation in accuracy based on the five sets of VSUs. The results show that there are evident discrepancies among these LC maps in both area and spatial patterns. For LC maps produced by different institutes, GLC 2000 and CCI-LC 2000 have the highest overall spatial agreement (53.8%). For LC maps produced by the same institutes, the overall spatial agreement of CCI-LC 2000 and 2010, and of MCD12Q1 2001 and 2010, reaches 99.8% and 73.2%, respectively; more effort is still needed, however, if these LC maps are to be used as time-series data for model input, since both CCI-LC and MCD12Q1 fail to represent the rapidly changing trends of several key LC classes in the early 21st century, in particular urban and built-up, snow and ice, water bodies, and permanent wetlands. With the highest spatial resolution, the overall accuracy of GlobeLand30 2010 is 82.39%. For the other six LC datasets with coarse resolution, CCI-LC 2010/2000 has the highest overall accuracy, followed by MCD12Q1 2010/2001, GLC 2000, GLCNMO 2008, IGBP DISCover, and UMD in turn. While all maps exhibit high accuracy in homogeneous regions, local accuracies in other regions differ considerably, particularly in the Farming-Pastoral Zone of North China, the mountains of Northeast China, and the Southeast Hills. Special
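
    The headline numbers above come from standard map-comparison arithmetic; the sketch below shows overall accuracy against validation sample units and pairwise spatial agreement between two maps, using toy arrays rather than the study's data.

    ```python
    # Sketch: overall accuracy from VSUs and cell-by-cell agreement of two maps.
    import numpy as np

    reference  = np.array([1, 1, 2, 3, 3, 3, 4, 4])    # VSU classes
    classified = np.array([1, 2, 2, 3, 3, 1, 4, 4])    # map classes at the VSUs
    overall_accuracy = (reference == classified).mean()

    map_a = np.random.randint(1, 5, size=(100, 100))   # toy LC maps
    map_b = np.random.randint(1, 5, size=(100, 100))
    spatial_agreement = (map_a == map_b).mean()

    print(f"overall accuracy: {overall_accuracy:.2%}")
    print(f"spatial agreement: {spatial_agreement:.2%}")
    ```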

  13. Telerheumatology: A Systematic Review.

    Science.gov (United States)

    McDougall, John A; Ferucci, Elizabeth D; Glover, Janis; Fraenkel, Liana

    2017-10-01

    To identify and summarize the published and gray literature on the use of telemedicine for the diagnosis and management of inflammatory and/or autoimmune rheumatic disease. We performed a registered systematic search (CRD42015025382) for studies using MEDLINE (1946 to July 2015), Embase (1974 to July 2015), Web of Science (1900 to July 2015), and Scopus (1946 to July 2015) databases. We included studies that demonstrated the use of telemedicine for diagnosis and/or management of inflammatory/autoimmune rheumatic disease. Following data extraction, we performed a descriptive analysis. Our literature search identified 1,468 potentially eligible studies. Of these studies, 20 were ultimately included in this review. Studies varied significantly in publication type, quality of evidence, and the reporting of methods. Most demonstrated a high risk of bias. Rheumatoid arthritis was the most commonly studied rheumatic disease (42% of patients). Studies demonstrated conflicting results regarding the effectiveness of telemedicine (18 found it effective, 1 found it effective but possibly harmful, and 1 found it ineffective). A limited number of studies included some component of a cost analysis (n = 6; 16% of patients); all of these found telemedicine to be cost-effective. Studies identified by this systematic review generally found telemedicine to be effective for the diagnosis and management of autoimmune/inflammatory rheumatic disease; however, there is limited evidence to support this conclusion. Further studies are needed to determine the best uses of telemedicine for the diagnosis and management of these conditions. © 2016, American College of Rheumatology.

  14. Evaluating satellite-derived long-term historical precipitation datasets for drought monitoring in Chile

    Science.gov (United States)

    Zambrano, Francisco; Wardlow, Brian; Tadesse, Tsegaye; Lillo-Saavedra, Mario; Lagos, Octavio

    2017-04-01

    Precipitation is a key parameter for the study of climate change and variability and for the detection and monitoring of natural disasters such as drought. Precipitation datasets that accurately capture the amount and spatial variability of rainfall are critical for drought monitoring and a wide range of other climate applications. This is challenging in many parts of the world, which often have a limited number of weather stations and/or limited historical data records. Satellite-derived precipitation products offer a viable alternative, with several remotely sensed precipitation datasets now available with long historical records (30+ years), including the Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Climate Data Record (PERSIANN-CDR) datasets. This study presents a comparative analysis of three historical satellite-based precipitation datasets, Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) 3B43 version 7 (1998-2015), PERSIANN-CDR (1983-2015) and CHIRPS 2.0 (1981-2015), over Chile to assess their performance across the country and, for the two long-term products, to evaluate their applicability to agricultural drought monitoring when used to calculate a commonly used drought indicator, the Standardized Precipitation Index (SPI). In this analysis, in situ rainfall measurements from 278 weather stations across Chile were initially compared to the satellite data. The study area (Chile) was divided into five latitudinal zones, North, North-Central, Central, South-Central and South, to determine whether there were regional differences among these satellite products, and nine statistics were used to evaluate their performance in estimating the amount and spatial distribution of historical rainfall across Chile. Hierarchical cluster analysis, k-means and singular value decomposition were used to analyze
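
    The SPI named above is computed by fitting a distribution (commonly a gamma) to accumulated precipitation and mapping the fitted quantiles to standard normal deviates; a minimal sketch with synthetic monthly totals follows. A production SPI would also handle zero-precipitation months with a mixed distribution.

    ```python
    # Sketch: Standardized Precipitation Index via a gamma fit.
    import numpy as np
    from scipy import stats

    precip = np.random.gamma(shape=2.0, scale=30.0, size=420)   # synthetic totals (mm)

    def spi(series):
        shape, loc, scale = stats.gamma.fit(series, floc=0)
        cdf = stats.gamma.cdf(series, shape, loc=loc, scale=scale)
        return stats.norm.ppf(np.clip(cdf, 1e-6, 1 - 1e-6))    # normal deviates

    print(spi(precip)[:5])   # negative values indicate drier-than-normal months
    ```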

  15. Gridded 5km GHCN-Daily Temperature and Precipitation Dataset, Version 1

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Gridded 5km GHCN-Daily Temperature and Precipitation Dataset (nClimGrid) consists of four climate variables derived from the GHCN-D dataset: maximum temperature,...

  16. Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

    Directory of Open Access Journals (Sweden)

    Mingwei Leng

    2013-01-01

    Full Text Available The accuracy of most existing semisupervised clustering algorithms based on a small labeled dataset is low when dealing with multidensity and imbalanced datasets, and labeling data is quite expensive and time consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering for multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it utilizes multiple thresholds to expand the labeled datasets on multidensity and imbalanced datasets. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has higher accuracy and more stable performance in comparison to other clustering and semisupervised clustering algorithms, especially when the datasets are multidensity and imbalanced.
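
    The paper's own algorithm is multithreshold and active, but the core semisupervised idea, propagating a few labels through the data's neighbourhood structure, can be sketched with scikit-learn's LabelPropagation on synthetic data:

    ```python
    # Sketch: propagate a handful of labels to the remaining unlabeled points.
    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.semi_supervised import LabelPropagation

    X, y = make_moons(n_samples=300, noise=0.08, random_state=0)
    labels = np.full_like(y, -1)      # -1 marks unlabeled points
    labels[::30] = y[::30]            # keep only a few "actively selected" labels

    model = LabelPropagation(kernel="knn", n_neighbors=7).fit(X, labels)
    mask = labels == -1
    print("accuracy on unlabeled points:", (model.transduction_[mask] == y[mask]).mean())
    ```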

  17. Dataset for Probabilistic estimation of residential air exchange rates for population-based exposure modeling

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset provides the city-specific air exchange rate measurements, modeled, literature-based as well as housing characteristics. This dataset is associated with...

  18. Ecohydrological Index, Native Fish, and Climate Trends and Relationships in the Kansas River Basin_dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The dataset is an excel file that contain data for the figures in the manuscript. This dataset is associated with the following publication: Sinnathamby, S., K....

  19. Global Human Built-up And Settlement Extent (HBASE) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Human Built-up And Settlement Extent (HBASE) Dataset from Landsat is a global map of HBASE derived from the Global Land Survey (GLS) Landsat dataset for...

  20. GIS measured environmental correlates of active school transport: A systematic review of 14 studies

    Directory of Open Access Journals (Sweden)

    Faulkner Guy

    2011-05-01

    Full Text Available Background: Emerging frameworks to examine active school transportation (AST) commonly emphasize the built environment (BE) as having an influence on travel mode decisions. Objective measures of BE attributes have been recommended for advancing knowledge about the influence of the BE on school travel mode choice. An updated systematic review on the relationships between GIS-measured BE attributes and AST is required to inform future research in this area. The objectives of this review are: (i) to examine and summarize the relationships between objectively measured BE features and AST in children and adolescents, and (ii) to critically discuss GIS methodologies used in this context. Methods: Six electronic databases and websites were systematically searched, and reference lists were searched and screened to identify studies examining AST in students aged five to 18 and reporting GIS as an environmental measurement tool. Fourteen cross-sectional studies were identified. The analyses were classified in terms of density, diversity, and design and further differentiated by the measures used or environmental condition examined. Results: Only distance was consistently found to be negatively associated with AST. Consistent findings of positive or negative associations were not found for land use mix, residential density, and intersection density. Potential modifiers of any relationship between these attributes and AST included age, school travel mode, route direction (e.g., to/from school), and trip-end (home or school). Methodological limitations included inconsistencies in geocoding, selection of study sites, buffer methods and the shape of zones (Modifiable Areal Unit Problem [MAUP]), the quality of road and pedestrian infrastructure data, and school route estimation. Conclusions: The inconsistent use of spatial concepts limits the ability to draw conclusions about the relationship between objectively measured environmental attributes and AST. Future

  1. An Automatic Matcher and Linker for Transportation Datasets

    Directory of Open Access Journals (Sweden)

    Ali Masri

    2017-01-01

    Full Text Available Multimodality requires the integration of heterogeneous transportation data to construct a broad view of the transportation network. Many new transportation services are emerging while being isolated from previously-existing networks. This leads them to publish their data sources to the web, according to linked data principles, in order to gain visibility. Our interest is to use these data to construct an extended transportation network that links these new services to existing ones. The main problems we tackle in this article fall in the categories of automatic schema matching and data interlinking. We propose an approach that uses web services as mediators to help in automatically detecting geospatial properties and mapping them between two different schemas. On the other hand, we propose a new interlinking approach that enables the user to define rich semantic links between datasets in a flexible and customizable way.

  2. [Parallel virtual reality visualization of extreme large medical datasets].

    Science.gov (United States)

    Tang, Min

    2010-04-01

    On the basis of a brief description of grid computing, the essence and critical techniques of parallel visualization of extremely large medical datasets are discussed in connection with hospital intranets and common-configuration computers. Several kernel techniques are introduced, including the hardware structure, software framework, load balancing and virtual reality visualization. The Maximum Intensity Projection algorithm is realized in parallel using a common PC cluster. In the virtual reality world, three-dimensional models can be rotated, zoomed, translated and cut interactively and conveniently through a control panel built on the Virtual Reality Modeling Language (VRML). Experimental results demonstrate that this method provides promising, real-time results and can play the role of a good assistant in clinical diagnosis.
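
    The Maximum Intensity Projection itself is a one-line reduction, and the slab decomposition used for parallelism is equally compact; the sketch below uses a process pool on a synthetic volume rather than the paper's PC-cluster setup.

    ```python
    # Sketch: parallel MIP by splitting the volume into slabs, projecting each,
    # and combining the partial projections with an element-wise maximum.
    import numpy as np
    from multiprocessing import Pool

    volume = np.random.rand(256, 512, 512)       # toy (slices, rows, cols) volume

    def slab_mip(slab):
        return slab.max(axis=0)                  # project along the view axis

    if __name__ == "__main__":
        slabs = np.array_split(volume, 8, axis=0)
        with Pool(4) as pool:
            partial = pool.map(slab_mip, slabs)
        mip = np.maximum.reduce(partial)         # max of slab maxima = global MIP
        print(mip.shape)                         # (512, 512)
    ```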

  3. The wildland-urban interface raster dataset of Catalonia.

    Science.gov (United States)

    Alcasena, Fermín J; Evers, Cody R; Vega-Garcia, Cristina

    2018-04-01

    We provide the wildland-urban interface (WUI) map of the autonomous community of Catalonia (Northeastern Spain). The map encompasses an area of some 3.21 million ha and is presented as a 150-m resolution raster dataset. Individual housing locations, structure density and vegetation cover data were used to spatially assess in detail the interface, intermix and dispersed rural WUI communities with a geographical information system. Most WUI areas concentrate in the coastal belt, where suburban sprawl has occurred near or within unmanaged forests. This geospatial dataset provides an approximation of the residential housing potentially exposed to loss in a wildfire, and represents a valuable contribution to assist landscape and urban planning in the region.

  4. xarray: N-D labeled Arrays and Datasets in Python

    Directory of Open Access Journals (Sweden)

    Stephan Hoyer

    2017-04-01

    Full Text Available xarray is an open source project and Python package that provides a toolkit and data structures for N-dimensional labeled arrays. Our approach combines an application programming interface (API) inspired by pandas with the Common Data Model for self-described scientific data. Key features of the xarray package include label-based indexing and arithmetic, interoperability with the core scientific Python packages (e.g., pandas, NumPy, Matplotlib), out-of-core computation on datasets that don't fit into memory, a wide range of serialization and input/output (I/O) options, and advanced multi-dimensional data manipulation tools such as group-by and resampling. xarray, as a data model and analytics toolkit, has been widely adopted in the geoscience community but is also used more broadly for multi-dimensional data analysis in physics, machine learning and finance.
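
    The features listed above map directly onto xarray's public API; a short usage example with synthetic data:

    ```python
    # Example: label-based indexing, group-by arithmetic, and resampling.
    import numpy as np
    import pandas as pd
    import xarray as xr

    time = pd.date_range("2016-01-01", periods=366)
    da = xr.DataArray(np.random.rand(366, 3), dims=("time", "city"),
                      coords={"time": time, "city": ["Paris", "Oslo", "Rome"]},
                      name="temperature")

    print(da.sel(city="Oslo", time="2016-06"))        # label-based selection
    print(da.groupby("time.season").mean("time"))     # seasonal means
    print(da.resample(time="MS").mean())              # monthly resampling
    ```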

  5. The wildland-urban interface raster dataset of Catalonia

    Directory of Open Access Journals (Sweden)

    Fermín J. Alcasena

    2018-04-01

    Full Text Available We provide the wildland-urban interface (WUI) map of the autonomous community of Catalonia (Northeastern Spain). The map encompasses an area of some 3.21 million ha and is presented as a 150-m resolution raster dataset. Individual housing locations, structure density and vegetation cover data were used to spatially assess in detail the interface, intermix and dispersed rural WUI communities with a geographical information system. Most WUI areas concentrate in the coastal belt, where suburban sprawl has occurred near or within unmanaged forests. This geospatial dataset provides an approximation of the residential housing potentially exposed to loss in a wildfire, and represents a valuable contribution to assist landscape and urban planning in the region. Keywords: Wildland-urban interface, Wildfire risk, Urban planning, Human communities, Catalonia

  6. Reconstructing flaw image using dataset of full matrix capture technique

    Energy Technology Data Exchange (ETDEWEB)

    Lee, Tae Hun; Kim, Yong Sik; Lee, Jeong Seok [KHNP Central Research Institute, Daejeon (Korea, Republic of)]

    2017-02-15

    A conventional phased array ultrasonic system offers the ability to steer an ultrasonic beam by applying independent time delays to individual elements in the array, producing an ultrasonic image. In contrast, full matrix capture (FMC) is a data acquisition process that collects a complete matrix of A-scans from every possible independent transmit-receive combination in a phased array transducer, making it possible, through post-processing, to reconstruct various images that cannot be produced by a conventional phased array, as well as images equivalent to conventional phased array images. In this paper, a basic algorithm based on the LLL-mode total focusing method (TFM) that can image crack-type flaws is described. This technique was then applied to reconstruct flaw images from FMC datasets obtained from experiments and ultrasonic simulation.
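
    A delay-and-sum TFM over an FMC dataset reduces to computing, for every image pixel, the transmit and receive travel times to each element and summing the correspondingly delayed A-scan samples. The sketch below uses illustrative geometry and a random placeholder array in place of measured FMC data.

    ```python
    # Sketch: contact TFM (single-mode, direct path) from an FMC array.
    import numpy as np

    c, fs = 5900.0, 100e6                  # wave speed (m/s), sampling rate (Hz)
    elem_x = np.arange(16) * 0.6e-3        # element x-positions (m)
    fmc = np.random.rand(16, 16, 2000)     # placeholder (tx, rx, samples) data
    tx, rx = np.meshgrid(np.arange(16), np.arange(16), indexing="ij")

    xs = np.linspace(0.0, 9e-3, 60)        # image grid (m)
    zs = np.linspace(1e-3, 20e-3, 120)
    image = np.zeros((zs.size, xs.size))

    for i, z in enumerate(zs):
        for j, x in enumerate(xs):
            d = np.sqrt((elem_x - x) ** 2 + z ** 2) / c        # one-way delays (s)
            idx = np.round((d[:, None] + d[None, :]) * fs).astype(int)
            idx = np.clip(idx, 0, fmc.shape[2] - 1)
            image[i, j] = abs(fmc[tx, rx, idx].sum())          # delay and sum

    print(image.shape)                     # (120, 60) focused image
    ```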

  7. Survey dataset on occupational hazards on construction sites

    Directory of Open Access Journals (Sweden)

    Patience F. Tunji-Olayeni

    2018-06-01

    Full Text Available Construction sites provide unfriendly working conditions, exposing workers to one of the harshest workplace environments. For this dataset, a structured questionnaire was designed and administered to thirty-five (35) craftsmen, selected through a purposive sampling technique, on various construction sites in one of the most populous cities in sub-Saharan Africa. The descriptive statistics are presented with tables, stacked bar charts and pie charts. Common occupational health conditions affecting the cardiovascular, respiratory and musculoskeletal systems of craftsmen on construction sites were identified. The effects of occupational health hazards on craftsmen and on construction project performance can be determined when the data are analyzed. Moreover, contractors' commitment to occupational health and safety (OHS) can be obtained from analysis of the survey data. Keywords: Accidents, Construction industry, Craftsmen, Health, Occupational hazards

  8. Orthology detection combining clustering and synteny for very large datasets.

    Science.gov (United States)

    Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

    2014-01-01

    The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large datasets because more exact approaches incur prohibitively high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  9. Orthology detection combining clustering and synteny for very large datasets.

    Directory of Open Access Journals (Sweden)

    Marcus Lechner

    Full Text Available The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large datasets because more exact approaches incur prohibitively high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.

  10. The SAIL databank: linking multiple health and social care datasets

    Directory of Open Access Journals (Sweden)

    Ford David V

    2009-01-01

    Full Text Available Background: Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Methods: Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process, and the optimum matching technique. Results: The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were … Conclusion: With the infrastructure that has been put in place, the reliable matching process that has been developed enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.

  11. Clusterflock: a flocking algorithm for isolating congruent phylogenomic datasets.

    Science.gov (United States)

    Narechania, Apurva; Baker, Richard; DeSalle, Rob; Mathema, Barun; Kolokotronis, Sergios-Orestis; Kreiswirth, Barry; Planet, Paul J

    2016-10-24

    Collective animal behavior, such as the flocking of birds or the shoaling of fish, has inspired a class of algorithms designed to optimize distance-based clusters in various applications, including document analysis and DNA microarrays. In a flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize, and clusters emerge without the need for partitional seeds. In addition to its unsupervised nature, flocking offers several computational advantages, including the potential to reduce the number of required comparisons. In the tool presented here, Clusterflock, we have implemented a flocking algorithm designed to locate groups (flocks) of orthologous gene families (OGFs) that share an evolutionary history. Pairwise distances that measure phylogenetic incongruence between OGFs guide flock formation. We tested this approach on several simulated datasets by varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, Clusterflock outperforms other well-established clustering techniques. We also verified its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signals, we were able to pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. Clusterflock is an open-source tool that can be used to discover horizontally transferred genes, recombined areas of chromosomes, and the phylogenetic 'core' of a genome. Although we used it here in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval, and can use these distances to 'flock' any type of data.
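
    A highly simplified version of distance-guided flocking can convey the mechanism: each agent (one per gene family) moves toward nearby agents with low pairwise incongruence and away from high-incongruence ones, and flocks emerge without seeds. The parameters and random distance matrix below are illustrative, not Clusterflock's implementation.

    ```python
    # Sketch: agents attract/repel on a 2D board according to a distance matrix.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 40
    dist = rng.uniform(0, 1, (n, n)); dist = (dist + dist.T) / 2   # incongruence
    pos = rng.uniform(0, 10, (n, 2))                               # agent positions

    for _ in range(200):
        for i in range(n):
            delta = pos - pos[i]                        # vectors toward others
            near = np.linalg.norm(delta, axis=1) < 2.0  # local neighbourhood only
            near[i] = False
            if near.any():
                sign = np.where(dist[i, near] < 0.5, 1.0, -1.0)[:, None]
                pos[i] += 0.05 * (sign * delta[near]).mean(axis=0)

    print(pos[:5])   # congruent agents end up spatially clustered
    ```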

  12. The SAIL databank: linking multiple health and social care datasets.

    Science.gov (United States)

    Lyons, Ronan A; Jones, Kerina H; John, Gareth; Brooks, Caroline J; Verplancke, Jean-Philippe; Ford, David V; Brown, Ginevra; Leake, Ken

    2009-01-16

    Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process, and the optimum matching technique. The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were … The SAIL databank represents a research-ready platform for record-linkage studies.
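
    In the spirit of the deterministic-then-probabilistic matching described above, a toy sketch follows; the field names, weights and records are illustrative assumptions, not the SAIL/MACRAL implementation.

    ```python
    # Sketch: exact-identifier match first, otherwise a weighted field score
    # accepted at the 50% threshold.
    WEIGHTS = {"surname": 0.4, "dob": 0.4, "postcode": 0.2}   # assumed weights

    def match_score(rec_a, rec_b):
        if rec_a.get("nhs_number") and rec_a["nhs_number"] == rec_b.get("nhs_number"):
            return 1.0                                        # deterministic match
        return sum(w for f, w in WEIGHTS.items() if rec_a.get(f) == rec_b.get(f))

    a = {"nhs_number": None, "surname": "JONES", "dob": "1970-01-02", "postcode": "SA2"}
    b = {"nhs_number": None, "surname": "JONES", "dob": "1970-01-02", "postcode": "SA1"}
    print(match_score(a, b) >= 0.5)   # True: accepted at the 50% threshold
    ```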

  13. A high quality finger vascular pattern dataset collected using a custom designed capturing device

    NARCIS (Netherlands)

    Ton, B.T.; Veldhuis, Raymond N.J.

    2013-01-01

    Finger vascular pattern datasets available to the research community are scarce; therefore, a new finger vascular pattern dataset containing 1440 images is presented. This dataset is unique in its kind, as the images are of high resolution and have a known pixel density. Furthermore this

  14. Something From Nothing (There): Collecting Global IPv6 Datasets from DNS

    NARCIS (Netherlands)

    Fiebig, T.; Borgolte, Kevin; Hao, Shuang; Kruegel, Christopher; Vigna, Giovanny; Spring, Neil; Riley, George F.

    2017-01-01

    Current large-scale IPv6 studies mostly rely on non-public datasets, as most public datasets are domain specific. For instance, traceroute-based datasets are biased toward network equipment. In this paper, we present a new methodology to collect IPv6 address datasets that does not require access to

  15. Ocean heat content and ocean energy budget: make better use of historical global subsurface temperature dataset

    Science.gov (United States)

    Cheng, L.; Zhu, J.

    2016-02-01

    Ocean heat content (OHC) change contributes substantially to global sea level rise and is a key metric of the ocean and global energy budget, so estimating historical OHC is a vital task for the climate research community. While there are large uncertainties regarding its value, here we review the OHC calculation based on the historical global subsurface temperature dataset and discuss the sources of its uncertainty. The presentation briefly introduces how to correct the systematic biases in expendable bathythermograph (XBT) data, an alternative way of filling data gaps (the main focus of this talk), and how to choose a proper climatology. A new reconstruction of historical upper (0-700 m) OHC change will be presented: the Institute of Atmospheric Physics (IAP) version of the historical upper OHC assessment. The authors also want to highlight the impact of observation system changes on the OHC calculation, which could bias OHC estimates. Furthermore, we compare the updated observation-based estimates of ocean heat content change since the 1970s with CMIP5 results. This comparison shows good agreement, increasing confidence in the climate models' representation of climate history.
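
    The quantity under discussion is an integral of temperature over depth; a worked sketch of the upper-700 m heat content per unit area for a single synthetic profile, OHC = ρ·cp·∫T(z)dz, follows.

    ```python
    # Sketch: upper-700 m ocean heat content per unit area from one profile.
    import numpy as np

    rho, cp = 1025.0, 3985.0                 # density (kg/m3), specific heat (J/kg/K)
    depth = np.linspace(0.0, 700.0, 71)      # m
    temp = 20.0 * np.exp(-depth / 250.0)     # synthetic temperature anomaly (degC)

    # trapezoidal integration of T(z) over 0-700 m
    integral = np.sum(0.5 * (temp[1:] + temp[:-1]) * np.diff(depth))
    print(f"OHC: {rho * cp * integral:.3e} J/m^2")
    ```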

  16. A Scalable Permutation Approach Reveals Replication and Preservation Patterns of Network Modules in Large Datasets.

    Science.gov (United States)

    Ritchie, Scott C; Watts, Stephen; Fearnley, Liam G; Holt, Kathryn E; Abraham, Gad; Inouye, Michael

    2016-07-01

    Network modules (topologically distinct groups of edges and nodes) that are preserved across datasets can reveal common features of organisms, tissues, cell types, and molecules. Many statistics to identify such modules have been developed, but testing their significance requires heuristics. Here, we demonstrate that current methods for assessing module preservation are systematically biased and produce skewed p values. We introduce NetRep, a rapid and computationally efficient method that uses a permutation approach to score module preservation without assuming data are normally distributed. NetRep produces unbiased p values and can distinguish between true and false positives during multiple hypothesis testing. We use NetRep to quantify preservation of gene coexpression modules across murine brain, liver, adipose, and muscle tissues. Complex patterns of multi-tissue preservation were revealed, including a liver-derived housekeeping module that displayed adipose- and muscle-specific association with body weight. Finally, we demonstrate the broader applicability of NetRep by quantifying preservation of bacterial networks in gut microbiota between men and women. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
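
    The permutation idea is simple to sketch: compare an observed module-preservation statistic with a null distribution built from random node sets, so no normality assumption is needed. The statistic and synthetic matrices below are illustrative, not NetRep's actual test statistics.

    ```python
    # Sketch: permutation p-value for preservation of one module between two
    # correlation networks.
    import numpy as np

    rng = np.random.default_rng(1)
    corr_a = rng.uniform(-1, 1, (50, 50)); corr_a = (corr_a + corr_a.T) / 2
    corr_b = corr_a + rng.normal(0, 0.1, (50, 50))     # a noisy "replication"
    module = np.arange(10)                             # node indices of the module

    def preservation(a, b, nodes):
        sub_a = a[np.ix_(nodes, nodes)].ravel()        # within-module edge weights
        sub_b = b[np.ix_(nodes, nodes)].ravel()
        return np.corrcoef(sub_a, sub_b)[0, 1]

    observed = preservation(corr_a, corr_b, module)
    null = [preservation(corr_a, corr_b, rng.choice(50, 10, replace=False))
            for _ in range(1000)]
    p = (np.sum(np.array(null) >= observed) + 1) / (len(null) + 1)
    print(observed, p)
    ```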

  17. Improving AfriPop dataset with settlement extents extracted from RapidEye for the border region comprising South-Africa, Swaziland and Mozambique

    Directory of Open Access Journals (Sweden)

    Julie Deleu

    2015-11-01

    Full Text Available For modelling the spatial distribution of malaria incidence, accurate and detailed information on population size and distribution is of significant importance. Various standard global spatial datasets of population distribution have been developed and are widely used. However, most of them are not up-to-date, and the low spatial resolution of the input census data has limitations for contemporary, national-scale analyses. The AfriPop project, launched in July 2009, was initiated with the aim of producing detailed, contemporary and easily updatable population distribution datasets for the whole of Africa. High-resolution satellite sensors can help to further improve this dataset through the generation of high-resolution settlement layers at greater spatial detail. In the present study, the settlement extents included in the MALAREO land use classification were used to generate an enhanced and updated version of the AfriPop dataset for the study area covering southern Mozambique, eastern Swaziland and the malarious part of KwaZulu-Natal in South Africa. Results show that it is possible to easily produce a detailed and updated population distribution dataset by applying the AfriPop modelling approach with the use of high-resolution settlement layers and population growth rates. The 2007 and 2011 population datasets are freely available as a product of the MALAREO project and can be downloaded from the project website.
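
    The settlement-constrained redistribution at the heart of this approach can be sketched in a few lines: an administrative unit's census total is spread over only those grid cells mapped as settlement. The arrays below are toy inputs, not the MALAREO layers.

    ```python
    # Sketch: dasymetric redistribution of a census total over settled cells.
    import numpy as np

    settlement = np.array([[0, 1, 1],
                           [0, 0, 1],
                           [1, 0, 0]])     # 1 = settled cell from the imagery
    unit_population = 1200                 # census total for this admin unit

    weights = settlement / settlement.sum()
    population_grid = unit_population * weights
    print(population_grid)                 # people per cell, zero outside settlements
    ```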

  18. Exploring the Limitations of Measures of Students' Socioeconomic Status (SES)

    Science.gov (United States)

    Dickinson, Emily R.; Adelson, Jill L.

    2014-01-01

    This study uses a nationally representative student dataset to explore the limitations of commonly used measures of socioeconomic status (SES). Among the identified limitations are patterns of missing data that conflate the traditional conceptualization of SES with differences in family structure that have emerged in recent years and a lack of…

  19. Systematic Applications of Metabolomics in Metabolic Engineering

    Directory of Open Access Journals (Sweden)

    Robert A. Dromms

    2012-12-01

    Full Text Available The goals of metabolic engineering are well-served by the biological information provided by metabolomics: information on how the cell is currently using its biochemical resources is perhaps one of the best ways to inform strategies to engineer a cell to produce a target compound. Using the analysis of extracellular or intracellular levels of the target compound (or a few closely related molecules) to drive metabolic engineering is quite common. However, there is surprisingly little systematic use of metabolomics datasets, which simultaneously measure hundreds of metabolites rather than just a few, for that same purpose. Here, we review the most common systematic approaches to integrating metabolite data with metabolic engineering, with emphasis on existing efforts to use whole-metabolome datasets. We then review some of the most common approaches for computational modeling of cell-wide metabolism, including constraint-based models, and discuss current computational approaches that explicitly use metabolomics data. We conclude with discussion of the broader potential of computational approaches that systematically use metabolomics data to drive metabolic engineering.
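
    The constraint-based models mentioned above reduce, in their simplest flux balance analysis form, to a linear program: maximise a biomass flux subject to steady-state mass balance S·v = 0 and flux bounds. A toy three-reaction sketch follows; the network is illustrative.

    ```python
    # Sketch: flux balance analysis on a toy network
    #   v0: uptake -> A,  v1: A -> B,  v2: B -> biomass
    import numpy as np
    from scipy.optimize import linprog

    S = np.array([[1, -1,  0],     # metabolite A balance
                  [0,  1, -1]])    # metabolite B balance
    bounds = [(0, 10), (0, 10), (0, 10)]

    res = linprog(c=[0, 0, -1],    # maximise v2 by minimising -v2
                  A_eq=S, b_eq=np.zeros(2), bounds=bounds)
    print("optimal biomass flux:", res.x[2])   # 10.0, limited by the uptake bound
    ```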

  20. Privacy preserving data anonymization of spontaneous ADE reporting system dataset.

    Science.gov (United States)

    Lin, Wen-Yang; Yang, Duen-Chuan; Wang, Jie-Teng

    2016-07-18

    To facilitate long-term safety surveillance of marketed drugs, many spontaneous reporting systems (SRSs) for ADR events have been established worldwide. Since the data collected by SRSs contain sensitive personal health information that should be protected to prevent the identification of individuals, this raises the issue of privacy preserving data publishing (PPDP), that is, how to sanitize (anonymize) raw data before publishing. Although much work has been done on PPDP, very few studies have focused on protecting the privacy of SRS data, and none of the anonymization methods is favorable for SRS datasets, which contain characteristics such as rare events, multiple records per individual, and multi-valued sensitive attributes. We propose a new privacy model called MS(k, θ*)-bounding for protecting published spontaneous ADE reporting data from privacy attacks. Our model has the flexibility of varying privacy thresholds, i.e., θ*, for different sensitive values and takes the characteristics of SRS data into consideration. We also propose an anonymization algorithm for sanitizing the raw data to meet the requirements specified through the proposed model. Our algorithm adopts a greedy-based clustering strategy to group the records into clusters, conforming to an innovative anonymization metric that aims to minimize the privacy risk as well as maintain the data utility for ADR detection. An empirical study was conducted using the FAERS dataset from 2004Q1 to 2011Q4. We compared our model with four prevailing methods, including k-anonymity, (X, Y)-anonymity, multi-sensitive l-diversity, and (α, k)-anonymity, evaluated via two measures, Danger Ratio (DR) and Information Loss (IL), and considered three different scenarios of threshold setting for θ*, including uniform setting, level-wise setting and frequency-based setting. We also conducted experiments to inspect the impact of anonymized data on the strengths of discovered ADR signals. With all three

  1. Standardization of GIS datasets for emergency preparedness of NPPs

    International Nuclear Information System (INIS)

    Saindane, Shashank S.; Suri, M.M.K.; Otari, Anil; Pradeepkumar, K.S.

    2012-01-01

    The probability of a major nuclear accident that could lead to a large-scale release of radioactivity into the environment is extremely small, owing to the incorporation of safety systems and the defence-in-depth philosophy. Nevertheless, emergency preparedness for the implementation of countermeasures to reduce the consequences is required for all major nuclear facilities. Iodine prophylaxis, sheltering, evacuation, etc. are protective measures to be implemented for members of the public in the unlikely event of any significant release from a nuclear facility. Bhabha Atomic Research Centre has developed a GIS-supported Nuclear Emergency Preparedness Program. Preparedness for response to nuclear emergencies needs geographical details of the affected locations, especially nuclear power plant sites and the nearby public domain. The GIS datasets that planners require must contain appropriate details so that decisions can be taken and resources mobilized in time, following the Standard Operating Procedures. Maps are 2-dimensional representations of our real world, and GIS makes it possible to manipulate large amounts of geo-spatially referenced data and convert it into information. This has become an integral part of nuclear emergency preparedness and response planning. These GIS datasets, consisting of layers such as village settlements, roads, hospitals, police stations, and shelters, are standardized and effectively used during an emergency. The paper focuses on the need for standardization of GIS datasets, which in turn can be used as a tool to display and evaluate the impact of standoff distances and selected zones in community planning. It also highlights the database specifications that will help in fast processing of data and analysis to derive useful and helpful information. GIS has the capability to store, manipulate, analyze and display the large amount of required spatial and tabular data. This study intends to carry out a proper response and preparedness

  2. Forest restoration: a global dataset for biodiversity and vegetation structure.

    Science.gov (United States)

    Crouzeilles, Renato; Ferreira, Mariana S; Curran, Michael

    2016-08-01

    Restoration initiatives are becoming increasingly applied around the world. Billions of dollars have been spent on ecological restoration research and initiatives, but restoration outcomes differ widely among these initiatives in part due to variable socioeconomic and ecological contexts. Here, we present the most comprehensive dataset gathered to date on forest restoration. It encompasses 269 primary studies across 221 study landscapes in 53 countries and contains 4,645 quantitative comparisons between reference ecosystems (e.g., old-growth forest) and degraded or restored ecosystems for five taxonomic groups (mammals, birds, invertebrates, herpetofauna, and plants) and five measures of vegetation structure reflecting different ecological processes (cover, density, height, biomass, and litter). We selected studies that (1) were conducted in forest ecosystems; (2) had multiple replicate sampling sites to measure indicators of biodiversity and/or vegetation structure in reference and restored and/or degraded ecosystems; and (3) used less-disturbed forests as a reference to the ecosystem under study. We recorded (1) latitude and longitude; (2) study year; (3) country; (4) biogeographic realm; (5) past disturbance type; (6) current disturbance type; (7) forest conversion class; (8) restoration activity; (9) time that a system has been disturbed; (10) time elapsed since restoration started; (11) ecological metric used to assess biodiversity; and (12) quantitative value of the ecological metric of biodiversity and/or vegetation structure for reference and restored and/or degraded ecosystems. These were the most common data available in the selected studies. We also estimated forest cover and configuration in each study landscape using a recently developed 1 km consensus land cover dataset. We measured forest configuration as the (1) mean size of all forest patches; (2) size of the largest forest patch; and (3) edge:area ratio of forest patches. Global analyses of the

  3. External validation of a publicly available computer assisted diagnostic tool for mammographic mass lesions with two high prevalence research datasets.

    Science.gov (United States)

    Benndorf, Matthias; Burnside, Elizabeth S; Herda, Christoph; Langer, Mathias; Kotter, Elmar

    2015-08-01

    for the MMassDx (inclusive) model in the DDSM data is 0.891/0.900 (MLO/CC view). AUC for the MMassDx (descriptor) model in the MM data is 0.862 and AUC for the MMassDx (inclusive) model in the MM data is 0.900. In all scenarios, MMassDx performs significantly better than clinical performance, P < 0.05 each. The authors furthermore demonstrate that the MMassDx algorithm systematically underestimates the risk of malignancy in the DDSM and MM datasets, especially when low probabilities of malignancy are assigned. The authors' results reveal that the MMassDx algorithms have good discriminatory performance but less accurate calibration when tested on two independent validation datasets. Improvement in calibration and testing in a prospective clinical population will be important steps in the pursuit of translation of these algorithms to the clinic.

  4. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

    Directory of Open Access Journals (Sweden)

    Mithun Biswas

    2017-06-01

    BanglaLekha-Isolated, a Bangla handwritten isolated-character dataset, is presented in this article. This dataset contains 84 different characters, comprising 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  5. A Research Graph dataset for connecting research data repositories using RD-Switchboard.

    Science.gov (United States)

    Aryani, Amir; Poblet, Marta; Unsworth, Kathryn; Wang, Jingbo; Evans, Ben; Devaraju, Anusuriya; Hausstein, Brigitte; Klas, Claus-Peter; Zapilko, Benjamin; Kaplun, Samuele

    2018-05-29

    This paper describes the open access graph dataset that shows the connections between Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures. The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim to discover and connect the related research datasets based on publication co-authorship or jointly funded grants. The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation. Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

  6. Defining datasets and creating data dictionaries for quality improvement and research in chronic disease using routinely collected data: an ontology-driven approach

    Directory of Open Access Journals (Sweden)

    Simon de Lusignan

    2011-06-01

    Conclusion: Adopting an ontology-driven approach to case finding could improve the quality of disease registers and of research based on routine data. It would offer considerable advantages over using limited datasets to define cases. This approach should be considered by those involved in research and quality improvement projects which utilise routine data.

  7. Daily precipitation grids for Austria since 1961—development and evaluation of a spatial dataset for hydroclimatic monitoring and modelling

    Science.gov (United States)

    Hiebl, Johann; Frei, Christoph

    2018-04-01

    Spatial precipitation datasets that are long-term consistent, highly resolved and extend over several decades are an increasingly popular basis for modelling and monitoring environmental processes and planning tasks in hydrology, agriculture, energy resources management, etc. Here, we present a grid dataset of daily precipitation for Austria meant to promote such applications. It has a grid spacing of 1 km, extends back to 1961 and is continuously updated. It is constructed with the classical two-tier analysis, involving separate interpolations for mean monthly precipitation and daily relative anomalies. The former was accomplished by kriging with topographic predictors as external drift, utilising 1249 stations. The latter is based on angular distance weighting and uses 523 stations. The input station network was kept largely stationary over time to avoid artefacts in long-term consistency. Example cases suggest that the new analysis is at least as plausible as previously existing datasets. Cross-validation and comparison against experimental high-resolution observations (WegenerNet) suggest that the accuracy of the dataset depends on interpretation. Users interpreting grid point values as point estimates must expect systematic overestimates for light and underestimates for heavy precipitation, as well as substantial random errors. Grid point estimates are typically within a factor of 1.5 of in situ observations. Interpreting grid point values as area mean values, conditional biases are reduced and the magnitude of random errors is considerably smaller. Together with a similar dataset of temperature, the new dataset (SPARTACUS) is an interesting basis for modelling environmental processes, studying climate change impacts and monitoring the climate of Austria.
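
    The angular distance weighting used for the daily anomalies can be sketched as follows (a simplified Python version; the decay constant, exponent and exact weight formula are assumptions based on common ADW formulations, not necessarily those of the SPARTACUS analysis, and at least two stations are assumed).

    import numpy as np

    def adw_weights(dx, dy, c=60.0, m=4.0):
        """Angular distance weights for one target grid point. dx, dy are
        station offsets from the target in km; c is a correlation decay
        distance and m a shape exponent (both illustrative)."""
        d = np.hypot(dx, dy)
        wd = np.exp(-d / c) ** m                  # distance-only weight
        theta = np.arctan2(dy, dx)                # bearing of each station
        a = np.empty_like(wd)
        for i in range(len(wd)):
            others = np.arange(len(wd)) != i
            # stations alone in their direction get a bonus; clustered ones are damped
            a[i] = np.sum(wd[others] * (1 - np.cos(theta[others] - theta[i]))) / np.sum(wd[others])
        w = wd * (1 + a)
        return w / w.sum()

    def interpolate_anomaly(station_anomalies, dx, dy):
        # daily relative anomaly at the grid point = weighted mean of station anomalies
        return float(np.dot(adw_weights(dx, dy), station_anomalies))

    # three stations east, north and far west of the grid point (toy values)
    print(interpolate_anomaly([1.2, 0.8, 0.5],
                              dx=np.array([10.0, 0.0, -80.0]),
                              dy=np.array([0.0, 12.0, 5.0])))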

  8. Science Mapping: A Systematic Review of the Literature

    Directory of Open Access Journals (Sweden)

    Chaomei Chen

    2017-03-01

    Full Text Available Purpose: We present a systematic review of the literature concerning major aspects of science mapping to serve two primary purposes: First, to demonstrate the use of a science mapping approach to perform the review so that researchers may apply the procedure to the review of a scientific domain of their own interest, and second, to identify major areas of research activities concerning science mapping, intellectual milestones in the development of key specialties, evolutionary stages of major specialties involved, and the dynamics of transitions from one specialty to another. Design/methodology/approach: We first introduce a theoretical framework of the evolution of a scientific specialty. Then we demonstrate a generic search strategy that can be used to construct a representative dataset of bibliographic records of a domain of research. Next, progressively synthesized co-citation networks are constructed and visualized to aid visual analytic studies of the domain’s structural and dynamic patterns and trends. Finally, trajectories of citations made by particular types of authors and articles are presented to illustrate the predictive potential of the analytic approach. Findings: The evolution of the science mapping research involves the development of a number of interrelated specialties. Four major specialties are discussed in detail in terms of four evolutionary stages: conceptualization, tool construction, application, and codification. Underlying connections between major specialties are also explored. The predictive analysis demonstrates citations trajectories of potentially transformative contributions. Research limitations: The systematic review is primarily guided by citation patterns in the dataset retrieved from the literature. The scope of the data is limited by the source of the retrieval, i.e. the Web of Science, and the composite query used. An iterative query refinement is possible if one would like to improve the data quality

  9. Provenance of Earth Science Datasets - How Deep Should One Go?

    Science.gov (United States)

    Ramapriyan, H.; Manipon, G. J. M.; Aulenbach, S.; Duggan, B.; Goldstein, J.; Hua, H.; Tan, D.; Tilmes, C.; Wilson, B. D.; Wolfe, R.; Zednik, S.

    2015-12-01

    For credibility of scientific research, transparency and reproducibility are essential. This fundamental tenet has been emphasized for centuries, and has been receiving increased attention in recent years. The Office of Management and Budget (2002) addressed reproducibility and other aspects of quality and utility of information from federal agencies. Specific guidelines from NASA (2002) are derived from the above. According to these guidelines, "NASA requires a higher standard of quality for information that is considered influential. Influential scientific, financial, or statistical information is defined as NASA information that, when disseminated, will have or does have clear and substantial impact on important public policies or important private sector decisions." For information to be compliant, "the information must be transparent and reproducible to the greatest possible extent." We present how the principles of transparency and reproducibility have been applied to NASA data supporting the Third National Climate Assessment (NCA3). The depth of trace needed of provenance of data used to derive conclusions in NCA3 depends on how the data were used (e.g., qualitatively or quantitatively). Given that the information is diligently maintained in the agency archives, it is possible to trace from a figure in the publication through the datasets, specific files, algorithm versions, instruments used for data collection, and satellites, as well as the individuals and organizations involved in each step. Such trace back permits transparency and reproducibility.

  10. A dataset from bottom trawl survey around Taiwan

    Directory of Open Access Journals (Sweden)

    Kwang-tsao Shao

    2012-05-01

    Bottom trawl fishery is one of the most important coastal fisheries in Taiwan in both production and economic value. However, its annual production started to decline due to overfishing in the 1980s, and its bycatch problem also damages the fishery resource seriously. Thus, the government banned bottom trawling within 3 nautical miles of the shoreline in 1989. To evaluate the effectiveness of this policy, a four-year survey was conducted from 2000-2003 in the waters around Taiwan and Penghu (the Pescadores Islands), one region each year. All fish specimens collected from trawling were brought back to the lab for identification, counting of individuals and body weight measurement. These raw data have been integrated and established in the Taiwan Fish Database (http://fishdb.sinica.edu.tw). They have also been published through TaiBIF (http://taibif.tw), FishBase and GBIF (websites below). This dataset contains 631 fish species and 3,529 records, making it the most complete dataset of demersal fish fauna and their temporal and spatial distribution on soft marine habitat in Taiwan.

  11. Integrated interpretation of overlapping AEM datasets achieved through standardisation

    Science.gov (United States)

    Sørensen, Camilla C.; Munday, Tim; Heinson, Graham

    2015-12-01

    Numerous airborne electromagnetic surveys have been acquired in Australia using a variety of systems. It is not uncommon to find two or more surveys covering the same ground, but acquired using different systems and at different times. Being able to combine overlapping datasets and get a spatially coherent resistivity-depth image of the ground can assist geological interpretation, particularly when more subtle geophysical responses are important. Combining resistivity-depth models obtained from the inversion of airborne electromagnetic (AEM) data can be challenging, given differences in system configuration, geometry, flying height and preservation or monitoring of system acquisition parameters such as waveform. In this study, we define and apply an approach to overlapping AEM surveys, acquired by fixed wing and helicopter time domain electromagnetic (EM) systems flown in the vicinity of the Goulds Dam uranium deposit in the Frome Embayment, South Australia, with the aim of mapping the basement geometry and the extent of the Billeroo palaeovalley. Ground EM soundings were used to standardise the AEM data, although results indicated that only data from the REPTEM system needed to be corrected to bring the two surveys into agreement and to achieve coherent spatial resistivity-depth intervals.

  12. A global dataset of sub-daily rainfall indices

    Science.gov (United States)

    Fowler, H. J.; Lewis, E.; Blenkinsop, S.; Guerreiro, S.; Li, X.; Barbero, R.; Chan, S.; Lenderink, G.; Westra, S.

    2017-12-01

    It is still uncertain how hydrological extremes will change with global warming, as we do not fully understand the processes that cause extreme precipitation under current climate variability. The INTENSE project is using a novel and fully integrated data-modelling approach to provide a step-change in our understanding of the nature and drivers of global precipitation extremes and change on societally relevant timescales, leading to improved high-resolution climate model representation of extreme rainfall processes. The INTENSE project is carried out in conjunction with the World Climate Research Programme (WCRP)'s Grand Challenge on 'Understanding and Predicting Weather and Climate Extremes' and the Global Water and Energy Exchanges Project (GEWEX) science questions. A new global sub-daily precipitation dataset has been constructed (data collection is ongoing). Metadata for each station have been calculated, detailing record lengths, missing data, and station locations. A set of global hydroclimatic indices has been produced based upon stakeholder recommendations, including indices that describe maximum rainfall totals and timing, the intensity, duration and frequency of storms, the frequency of storms above specific thresholds, and information about the diurnal cycle. This will provide a unique global data resource on sub-daily precipitation whose derived indices will be freely available to the wider scientific community.
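
    To make the kinds of indices described above concrete, here is an illustrative Python sketch computing a few from an hourly series; the index names and thresholds are assumptions, not the INTENSE definitions.

    import numpy as np

    def subdaily_indices(hourly_mm, wet=0.1, heavy=10.0):
        """Illustrative indices from a series of hourly rainfall totals (mm)."""
        x = np.asarray(hourly_mm, float)
        rx1hr = x.max()                                       # max 1-hour total
        rx24hr = np.convolve(x, np.ones(24), "valid").max()   # max running 24-hour total
        sdii = x[x >= wet].mean() if (x >= wet).any() else 0.0  # mean wet-hour intensity
        n_heavy = int((x >= heavy).sum())                     # hours above a heavy threshold
        diurnal = x[: x.size // 24 * 24].reshape(-1, 24).mean(axis=0)  # mean diurnal cycle
        return {"rx1hr": rx1hr, "rx24hr": rx24hr, "sdii": sdii,
                "hours_heavy": n_heavy, "diurnal_peak_hour": int(diurnal.argmax())}

    hourly = np.random.default_rng(1).gamma(0.2, 2.0, 24 * 365)  # toy year of data
    print(subdaily_indices(hourly))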

  13. The Centennial Trends Greater Horn of Africa precipitation dataset

    Science.gov (United States)

    Funk, Chris; Nicholson, Sharon E.; Landsfeld, Martin F.; Klotter, Douglas; Peterson, Pete J.; Harrison, Laura

    2015-01-01

    East Africa is a drought prone, food and water insecure region with a highly variable climate. This complexity makes rainfall estimation challenging, and this challenge is compounded by low rain gauge densities and inhomogeneous monitoring networks. The dearth of observations is particularly problematic over the past decade, since the number of records in globally accessible archives has fallen precipitously. This lack of data coincides with an increasing scientific and humanitarian need to place recent seasonal and multi-annual East African precipitation extremes in a deep historic context. To serve this need, scientists from the UC Santa Barbara Climate Hazards Group and Florida State University have pooled their station archives and expertise to produce a high quality gridded ‘Centennial Trends’ precipitation dataset. Additional observations have been acquired from the national meteorological agencies and augmented with data provided by other universities. Extensive quality control of the data was carried out and seasonal anomalies interpolated using kriging. This paper documents the CenTrends methodology and data.

  14. Dataset on daytime outdoor thermal comfort for Belo Horizonte, Brazil.

    Science.gov (United States)

    Hirashima, Simone Queiroz da Silveira; Assis, Eleonora Sad de; Nikolopoulou, Marialena

    2016-12-01

    This dataset describes microclimatic parameters of two urban open public spaces in the city of Belo Horizonte, Brazil; physiological equivalent temperature (PET) index values; and the related subjective responses of interviewees regarding thermal sensation perception, thermal preference and thermal comfort evaluation. Individual and behavioral characteristics of respondents are also presented. Data were collected at daytime, in summer and winter, 2013. Statistical treatment of these data was first presented in a PhD thesis ("Percepção sonora e térmica e avaliação de conforto em espaços urbanos abertos do município de Belo Horizonte - MG, Brasil" (Hirashima, 2014) [1]), providing relevant information on thermal conditions in these locations and on thermal comfort assessment. The data were also explored in the article "Daytime Thermal Comfort in Urban Spaces: A Field Study in Brazil" (Hirashima et al., in press) [2]. These references are recommended for further interpretation and discussion.

  15. Statistically and Computationally Efficient Estimating Equations for Large Spatial Datasets

    KAUST Repository

    Sun, Ying

    2014-11-07

    For Gaussian process models, likelihood-based methods are often difficult to use with large irregularly spaced spatial datasets, because exact calculations of the likelihood for n observations require O(n^3) operations and O(n^2) memory. Various approximation methods have been developed to address the computational difficulties. In this paper, we propose new unbiased estimating equations based on score equation approximations that are both computationally and statistically efficient. We replace the inverse covariance matrix that appears in the score equations by a sparse matrix to approximate the quadratic forms, then set the resulting quadratic forms equal to their expected values to obtain unbiased estimating equations. The sparse matrix is constructed by a sparse inverse Cholesky approach to approximate the inverse covariance matrix. The statistical efficiency of the resulting unbiased estimating equations is evaluated both in theory and by numerical studies. Our methods are applied to nearly 90,000 satellite-based measurements of water vapor levels over a region in the Southeast Pacific Ocean.
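
    In a sketch of the construction (notation assumed: a mean-zero observation vector z with covariance K(θ), parameters θ_1, ..., θ_p, and A the sparse approximation to K(θ)^{-1} from the sparse inverse Cholesky step), replacing K^{-1} in the score equations and re-centering each quadratic form at its expectation yields

    \[
      g_i(\theta) \;=\; \mathbf{z}^{\top} A \,\frac{\partial K(\theta)}{\partial \theta_i}\, A \,\mathbf{z}
      \;-\; \operatorname{tr}\!\left( A \,\frac{\partial K(\theta)}{\partial \theta_i}\, A \,K(\theta) \right) \;=\; 0,
      \qquad i = 1, \dots, p.
    \]

    Because E[z^T M z] = tr(M K(θ)) for any fixed matrix M, each g_i has zero expectation at the true θ, so the equations are unbiased regardless of how closely A approximates K(θ)^{-1}.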

  17. Challenges and Experiences of Building Multidisciplinary Datasets across Cultures

    Science.gov (United States)

    Jamiyansharav, K.; Laituri, M.; Fernandez-Gimenez, M.; Fassnacht, S. R.; Venable, N. B. H.; Allegretti, A. M.; Reid, R.; Baival, B.; Jamsranjav, C.; Ulambayar, T.; Linn, S.; Angerer, J.

    2017-12-01

    Efficient data sharing and management are key challenges to multidisciplinary scientific research. These challenges are further complicated by adding a multicultural component. We address the construction of a complex database for social-ecological analysis in Mongolia. Funded by the National Science Foundation (NSF) Dynamics of Coupled Natural and Human (CNH) Systems program, the Mongolian Rangelands and Resilience (MOR2) project focuses on the vulnerability of Mongolian pastoral systems to climate change and their adaptive capacity. The MOR2 study spans three years of fieldwork in 36 paired districts (Soum) from 18 provinces (Aimag) of Mongolia, covering the steppe, mountain forest steppe, desert steppe and eastern steppe ecological zones. Our project team is composed of hydrologists, social scientists, geographers, and ecologists. The MOR2 database includes multiple ecological, social, meteorological, geospatial and hydrological datasets, as well as archives of original data and surveys in multiple formats. Managing this complex database requires significant organizational skills, attention to detail and the ability to communicate within a team drawn from diverse disciplines and multiple institutions in the US and Mongolia. We describe the database's rich content, organization, structure and complexity. We discuss lessons learned, best practices and recommendations for complex database management, sharing, and archiving in creating a cross-cultural and multi-disciplinary database.

  18. Automated Fault Interpretation and Extraction using Improved Supplementary Seismic Datasets

    Science.gov (United States)

    Bollmann, T. A.; Shank, R.

    2017-12-01

    During the interpretation of seismic volumes, it is necessary to interpret faults along with horizons of interest. With the improvement of technology, the interpretation of faults can be expedited with the aid of different algorithms that create supplementary seismic attributes, such as semblance and coherency. These products highlight discontinuities, but still need a large amount of human interaction to interpret faults and are plagued by noise and stratigraphic discontinuities. Hale (2013) presents a method to improve on these datasets by creating what is referred to as a Fault Likelihood volume. In general, these volumes contain less noise and do not emphasize stratigraphic features. Instead, planar features within a specified strike and dip range are highlighted. Once a satisfactory Fault Likelihood Volume is created, extraction of fault surfaces is much easier. The extracted fault surfaces are then exported to interpretation software for QC. Numerous software packages have implemented this methodology with varying results. After investigating these platforms, we developed a preferred Automated Fault Interpretation workflow.

  19. Privacy-preserving record linkage on large real world datasets.

    Science.gov (United States)

    Randall, Sean M; Ferrante, Anna M; Boyd, James H; Bauer, Jacqueline K; Semmens, James B

    2014-08-01

    Record linkage typically involves the use of dedicated linkage units that are supplied with personally identifying information to determine individuals from within and across datasets. The personally identifying information supplied to linkage units is separated from clinical information prior to release by data custodians. While this substantially reduces the risk of disclosure of sensitive information, some residual risks still exist and remain a concern for some custodians. In this paper we trial a method of record linkage which reduces privacy risk still further on large real-world administrative data. The method uses encrypted personal identifying information (Bloom filters) in a probability-based linkage framework. The privacy-preserving linkage method was tested on ten years of New South Wales (NSW) and Western Australian (WA) hospital admissions data, comprising in total over 26 million records. No difference in linkage quality was found when the results were compared to traditional probabilistic methods using full unencrypted personal identifiers. This presents as a possible means of reducing privacy risks related to record linkage in population-level research studies. It is hoped that through adaptations of this method or similar privacy-preserving methods, risks related to information disclosure can be reduced so that the benefits of linked research can be fully realised. Copyright © 2013 Elsevier Inc. All rights reserved.
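
    A minimal sketch of Bloom-filter encoding for approximate matching of encrypted identifiers, in the style commonly attributed to Schnell et al.; the filter size, number of hashes and bigram padding are illustrative assumptions, not the parameters used in this study.

    import hashlib

    def bloom_encode(name, num_bits=256, num_hashes=4):
        """Encode a name as the set bit positions of a bigram Bloom filter."""
        padded = f"_{name.lower()}_"
        grams = {padded[i:i + 2] for i in range(len(padded) - 1)}
        bits = set()
        for gram in grams:
            for seed in range(num_hashes):
                digest = hashlib.sha256(f"{seed}:{gram}".encode()).hexdigest()
                bits.add(int(digest, 16) % num_bits)
        return bits

    def dice_similarity(b1, b2):
        # set-based Dice coefficient; links are accepted above a chosen threshold
        return 2 * len(b1 & b2) / (len(b1) + len(b2))

    # similar spellings share most bigrams, hence most bit positions:
    print(dice_similarity(bloom_encode("smith"), bloom_encode("smyth")))  # high
    print(dice_similarity(bloom_encode("smith"), bloom_encode("jones")))  # low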

  20. Genomics dataset on unclassified published organism (patent US 7547531

    Directory of Open Access Journals (Sweden)

    Mohammad Mahfuz Ali Khan Shawan

    2016-12-01

    Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism, and is therefore crucial for learning about the hierarchical classification of that particular organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which helps to open doors for new collaborations. In this dataset, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes of those DNA sequences were constructed by the DNA BarID tool. QR codes are useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences was determined using the ENDMEMO GC Content Calculator, which indicates their stability at different temperatures. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify cleavage codes indicating the 5′, 3′ and blunt ends, and enzyme codes indicating the methylation sites of the DNA sequences. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
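
    The GC-content computation used here is simple enough to sketch directly; the example sequence below is illustrative only, not one of the patent sequences.

    def gc_content(seq):
        """Percentage of G and C bases in a DNA sequence; higher GC content
        generally implies greater thermal stability of the duplex."""
        seq = seq.upper()
        return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

    print(round(gc_content("ATGCGCGCTA"), 2))  # 60.0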

  1. Structural dataset for the PPARγ V290M mutant

    Directory of Open Access Journals (Sweden)

    Ana C. Puhl

    2016-06-01

    The loss-of-function mutation V290M in the ligand-binding domain of the peroxisome proliferator-activated receptor γ (PPARγ) is associated with a PPARγ ligand resistance syndrome (PLRS), characterized by partial lipodystrophy and severe insulin resistance. In this data article we discuss an X-ray diffraction dataset that yielded the structure of the PPARγ LBD V290M mutant refined at 2.3 Å resolution, which allowed building of a 3D model of the receptor mutant with high confidence and revealed continuous, well-defined electron density for the partial agonist diclofenac bound to the hydrophobic pocket of PPARγ. These structural data provide significant insights into the molecular basis of PLRS caused by the V290M mutation and are correlated with the receptor's inability to bind rosiglitazone and its increased affinity for corepressors. Furthermore, our structural evidence helps to explain the clinical observation that treatment with the full PPARγ agonist rosiglitazone fails to restore receptor function.

  2. A wavelet-coupled support vector machine model for forecasting global incident solar radiation using limited meteorological dataset

    International Nuclear Information System (INIS)

    Deo, Ravinesh C.; Wen, Xiaohu; Qi, Feng

    2016-01-01

    Highlights: • A forecasting model for short- and long-term global incident solar radiation (R_n) has been developed. • The support vector machine and discrete wavelet transformation algorithms have been integrated. • The precision of the wavelet-coupled hybrid model is assessed using several prediction score metrics. • The proposed model is an appealing tool for forecasting R_n in the present study region. - Abstract: A solar radiation forecasting model can be utilized as a scientific tool for investigating the future viability of solar energy potentials. In this paper, a wavelet-coupled support vector machine (W-SVM) model was adopted to forecast global incident solar radiation based on sunshine hours (S_t), minimum temperature (T_min), maximum temperature (T_max), windspeed (U), evaporation (E) and precipitation (P) as the predictor variables. To ascertain conclusive results, the merit of the W-SVM was benchmarked against the classical SVM model. For daily forecasting, sixteen months of data (01-March-2014 to 30-June-2015), partitioned into train (65%) and test (35%) sets for three metropolitan stations (Brisbane City, Cairns Aero and Townsville Aero), were utilized. Data were decomposed into their wavelet sub-series by the discrete wavelet transformation algorithm and summed up to create new series with one approximation and four levels of detail using the Daubechies-2 mother wavelet. For daily forecasting, six model scenarios were formulated in which the number of inputs was increased, and the forecasts were assessed by statistical metrics (correlation coefficient r; Willmott's index d; Nash-Sutcliffe coefficient E_NS; peak deviation P_dv), distribution statistics and prediction errors (mean absolute error MAE; root mean square error RMSE; mean absolute percentage error MAPE; relative root mean square error RRMSE). Results for daily forecasts showed that the W-SVM model outperformed the classical SVM model for optimum input combinations. A sensitivity analysis with single predictor variables yielded the best performance with S_t as an input, confirming that the largest contribution for forecasting solar energy is derived from the sunshine hours per day compared to the other prescribed inputs. All six inputs were required in the optimum W-SVM model for the Brisbane and Cairns stations to yield r = 0.928, d = 0.927, E_NS = 0.858, P_dv = 1.757%, MAE = 1.819 MJ m^-2 and r = 0.881, d = 0.870, E_NS = 0.762, P_dv = 9.633%, MAE = 2.086 MJ m^-2, respectively. However, for Townsville, the time series of S_t, T_min, T_max and E were required in the optimum model to yield r = 0.858, d = 0.886, E_NS = 0.722, P_dv = 10.282% and MAE = 2.167 MJ m^-2. In terms of relative model errors over the daily forecast horizon, the W-SVM model was most precise for Townsville (RRMSE = 12.568%; MAPE = 12.666%), followed by the Brisbane (13.313%; 13.872%) and Cairns (14.467%; 15.675%) weather stations. A set of alternative models developed over monthly, seasonal and annual forecast horizons verified the long-term forecasting skill, where lagged inputs of T_max, T_min, E, P and VP for the Roma Post Office and Toowoomba Regional stations were employed. The wavelet-coupled model performed well, with r = 0.965, d = 0.964, P_dv = 2.249%, RRMSE = 5.942% and MAPE = 4.696% (Roma) and r = 0.958, d = 0.943, P_dv = 0.979%, RRMSE = 7.66% and MAPE = 6.20% (Toowoomba). Accordingly, the results conclusively ascertained the importance of the wavelet-coupled SVM predictive model as a qualified stratagem for short- and long-term forecasting of solar energy for assessment of solar energy prospectivity in this study region.
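
    As a hedged illustration of the wavelet-coupling step, the sketch below (assuming the PyWavelets and scikit-learn packages) decomposes a single predictor with the Daubechies-2 wavelet into one approximation and four detail sub-series, as in the abstract, and feeds them to a support vector regression; the synthetic data, scaling and kernel settings are illustrative only.

    import numpy as np
    import pywt
    from sklearn.svm import SVR

    def wavelet_features(series, wavelet="db2", level=4):
        """Decompose a predictor series into approximation + detail sub-series,
        each reconstructed to the original length, stacked as model inputs."""
        coeffs = pywt.wavedec(series, wavelet, level=level)
        parts = []
        for i in range(len(coeffs)):
            keep = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
            parts.append(pywt.waverec(keep, wavelet)[: len(series)])
        return np.column_stack(parts)   # one column per sub-series

    # hypothetical daily predictor (sunshine hours) and target (radiation)
    rng = np.random.default_rng(0)
    sunshine_hours = rng.uniform(0, 12, 480)
    radiation = 2.0 * sunshine_hours + rng.normal(0, 1, 480)

    X = wavelet_features(sunshine_hours)
    split = int(0.65 * len(X))          # 65/35 train/test split as in the abstract
    model = SVR(kernel="rbf").fit(X[:split], radiation[:split])
    pred = model.predict(X[split:])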

  3. Avulsion research using flume experiments and highly accurate and temporal-rich SfM datasets

    Science.gov (United States)

    Javernick, L.; Bertoldi, W.; Vitti, A.

    2017-12-01

    SfM's ability to produce high-quality, large-scale digital elevation models (DEMs) of complicated and rapidly evolving systems has made it a valuable technique for low-budget researchers and practitioners. While SfM has provided valuable datasets that capture single-flood-event DEMs, there is an increasing scientific need to capture higher temporal resolution datasets that can quantify the evolutionary processes instead of pre- and post-flood snapshots. However, the dangerous field conditions and image-matching challenges (e.g. wind, rain) of flood events prevent quality SfM image acquisition. Conversely, flume experiments offer opportunities to document flood events, but achieving DEMs consistent and accurate enough to detect subtle changes in dry and inundated areas remains a challenge for SfM (e.g. parabolic error signatures). This research aimed to investigate the impact of naturally occurring and manipulated avulsions on braided river morphology and on the encroachment of floodplain vegetation, using laboratory experiments. This required DEMs with millimeter accuracy and precision, at a temporal resolution sufficient to capture the processes. SfM was chosen as it offered the most practical method. Through a redundant local network design and a meticulous ground control point (GCP) survey with a Leica Total Station in red-laser configuration (reported 2 mm accuracy), the SfM residual errors compared to separate ground-truthing data produced mean errors of 1.5 mm (accuracy) and standard deviations of 1.4 mm (precision) without parabolic error signatures. Lighting conditions in the flume were limited to uniform, oblique, and filtered LED strips, which removed glint and thus improved bed elevation mean errors to 4 mm; errors were further reduced by means of open-source software for refraction correction. The obtained datasets have provided the ability to quantify how small flood events with avulsion can have similar morphologic and vegetation impacts as large flood events

  4. High End Visualization of Geophysical Datasets Using Immersive Technology: The SIO Visualization Center.

    Science.gov (United States)

    Newman, R. L.

    2002-12-01

    How many images can you display at one time with PowerPoint without getting "postage stamps"? Do you have fantastic datasets that you cannot view because your computer is too slow/small? Do you assume a few 2-D images of a 3-D picture are sufficient? High-end visualization centers can minimize and often eliminate these problems. The new visualization center [http://siovizcenter.ucsd.edu] at Scripps Institution of Oceanography [SIO] immerses users into a virtual world by projecting 3-D images onto a Panoram GVR-120E wall-sized floor-to-ceiling curved screen [7' x 23'] that has 3.2 mega-pixels of resolution. The Infinite Reality graphics subsystem is driven by a single-pipe SGI Onyx 3400 with a system bandwidth of 44 Gbps. The Onyx is powered by 16 MIPS R12K processors and 16 GB of addressable memory. The system is also equipped with transmitters and LCD shutter glasses which permit stereographic 3-D viewing of high-resolution images. This center is ideal for groups of up to 60 people who can simultaneously view these large-format images. A wide range of hardware and software is available, giving the users a totally immersive working environment in which to display, analyze, and discuss large datasets. The system enables simultaneous display of video and audio streams from sources such as SGI megadesktop and stereo megadesktop, S-VHS video, DVD video, and video from a Macintosh or PC. For instance, one-third of the screen might be displaying S-VHS video from a remotely operated vehicle [ROV], while the remaining portion of the screen might be used for an interactive 3-D flight over the same parcel of seafloor. The video and audio combinations using this system are numerous, allowing users to combine and explore data and images in innovative ways, greatly enhancing scientists' ability to visualize, understand and collaborate on complex datasets. In the not-too-distant future, with the rapid growth in networking speeds in the US, it will be possible for Earth Sciences

  5. Image-based Exploration of Iso-surfaces for Large Multi-Variable Datasets using Parameter Space.

    KAUST Repository

    Binyahib, Roba S.

    2013-05-13

    With an increase in processing power, more complex simulations have resulted in larger data size, with higher resolution and more variables. Many techniques have been developed to help the user to visualize and analyze data from such simulations. However, dealing with a large amount of multivariate data is challenging, time-consuming and often requires high-end clusters. Consequently, novel visualization techniques are needed to explore such data. Many users would like to visually explore their data and change certain visual aspects without the need to use special clusters or having to load a large amount of data. This is the idea behind explorable images (EI). Explorable images are a novel approach that provides limited interactive visualization without the need to re-render from the original data [40]. In this work, the concept of EI has been used to create a workflow that deals with explorable iso-surfaces for scalar fields in a multivariate, time-varying dataset. As a pre-processing step, a set of iso-values for each scalar field is inferred and extracted from a user-assisted sampling technique in time-parameter space. These iso-values are then used to generate iso-surfaces that are then pre-rendered (from a fixed viewpoint) along with additional buffers (i.e. normals, depth, values of other fields, etc.) to provide a compressed representation of iso-surfaces in the dataset. We present a tool that at run-time allows the user to interactively browse and calculate a combination of iso-surfaces superimposed on each other. The result is the same as calculating multiple iso-surfaces from the original data but without the memory and processing overhead. Our tool also allows the user to change the (scalar) values superimposed on each of the surfaces, modify their color map, and interactively re-light the surfaces. We demonstrate the effectiveness of our approach over a multi-terabyte combustion dataset. We also illustrate the efficiency and accuracy of our

  6. Large Survey Database: A Distributed Framework for Storage and Analysis of Large Datasets

    Science.gov (United States)

    Juric, Mario

    2011-01-01

    The Large Survey Database (LSD) is a Python framework and DBMS for distributed storage, cross-matching and querying of large survey catalogs (>10^9 rows, >1 TB). The primary driver behind its development is the analysis of Pan-STARRS PS1 data. It is specifically optimized for fast queries and parallel sweeps of positionally and temporally indexed datasets. It transparently scales to more than 10^2 nodes, and can be made to function in "shared nothing" architectures. An LSD database consists of a set of vertically and horizontally partitioned tables, physically stored as compressed HDF5 files. Vertically, we partition the tables into groups of related columns ('column groups'), storing together logically related data (e.g., astrometry, photometry). Horizontally, the tables are partitioned into partially overlapping "cells" by position in space (lon, lat) and time (t). This organization allows for fast lookups based on spatial and temporal coordinates, as well as data and task distribution. The design was inspired by the success of Google BigTable (Chang et al., 2006). Our programming model is a pipelined extension of MapReduce (Dean and Ghemawat, 2004). An SQL-like query language is used to access data. For complex tasks, map-reduce "kernels" that operate on query results on a per-cell basis can be written, with the framework taking care of scheduling and execution. The combination leverages users' familiarity with SQL, while offering a fully distributed computing environment. LSD adds little overhead compared to direct Python file I/O. In tests, we swept through 1.1 billion rows of Pan-STARRS+SDSS data (220 GB) in less than 15 minutes on a dual-CPU machine. In a cluster environment, we achieved bandwidths of 17 Gbit/s (I/O limited). Based on current experience, we believe LSD should scale to be useful for analysis and storage of LSST-scale datasets. It can be downloaded from http://mwscience.net/lsd.

  7. Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing

    Science.gov (United States)

    Jaenicke, Sebastian; Ander, Christina; Bekel, Thomas; Bisdorf, Regina; Dröge, Marcus; Gartemann, Karl-Heinz; Jünemann, Sebastian; Kaiser, Olaf; Krause, Lutz; Tille, Felix; Zakrzewski, Martha; Pühler, Alfred

    2011-01-01

    Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche's GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed Clostridia as the most prevalent taxonomic class, whereas species of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that Firmicutes, followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic for acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen Methanoculleus

  8. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression

    Directory of Open Access Journals (Sweden)

    Dongdong eLin

    2014-10-01

    A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; and (3) enforce a sparse constraint on groups of variables to overcome the 'small sample, but large variables' problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with the conventional meta-analysis method. The results show that our sparse group multitask method outperforms the meta-analysis method significantly. In an application to our osteoporosis studies, 7 genes are identified as significant genes by our method and are found to have significant effects in three other independent studies for validation. The most significant gene, SOD2, had been identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed
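
    A plausible form of the sparse group lasso variant of the multitask objective (a sketch under assumed notation: T studies/tasks with design X_t and response y_t, a weight matrix W whose t-th column w_t belongs to task t, and W_g the rows of W belonging to gene group g):

    \[
      \min_{W}\; \sum_{t=1}^{T} \left\lVert \mathbf{y}_t - X_t \mathbf{w}_t \right\rVert_2^2
      \;+\; \lambda_1 \sum_{g} \left\lVert W_g \right\rVert_F
      \;+\; \lambda_2 \left\lVert W \right\rVert_1 .
    \]

    The group (Frobenius-norm) term encourages whole genes to be selected or dropped jointly across studies, while the l1 term keeps within-group solutions sparse; replacing the l1 term with a squared l2 penalty would give the sparse group ridge variant.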

  10. VarB Plus: An Integrated Tool for Visualization of Genome Variation Datasets

    KAUST Repository

    Hidayah, Lailatul

    2012-07-01

    Research on genomic sequences has been improving significantly as more advanced technology for sequencing has been developed. This opens enormous opportunities for sequence analysis. Various analytical tools have been built for purposes such as sequence assembly, read alignments, genome browsing, comparative genomics, and visualization. From the visualization perspective, there is an increasing trend towards use of large-scale computation. However, more than power is required to produce an informative image. This is a challenge that we address by providing several ways of representing biological data in order to advance the inference endeavors of biologists. This thesis focuses on visualization of variations found in genomic sequences. We develop several visualization functions and embed them in an existing variation visualization tool as extensions. The tool we improved is named VarB, hence the nomenclature for our enhancement is VarB Plus. To the best of our knowledge, besides VarB, there is no tool that provides the capability of dynamic visualization of genome variation datasets as well as statistical analysis. Dynamic visualization allows users to toggle different parameters on and off and see the results on the fly. The statistical analysis includes Fixation Index, Relative Variant Density, and Tajima’s D. Hence we focused our efforts on this tool. The scope of our work includes plots of per-base genome coverage, Principal Coordinate Analysis (PCoA), integration with a read alignment viewer named LookSeq, and visualization of geo-biological data. In addition to a description of the embedded functionalities, significance, and limitations, future improvements are discussed. The result is four extensions embedded successfully in the original tool, which is built on the Qt framework in C++. Hence it is portable to numerous platforms. Our extensions have shown acceptable execution time in beta testing with various high-volume published datasets, as well as positive

  11. Oil palm mapping for Malaysia using PALSAR-2 dataset

    Science.gov (United States)

    Gong, P.; Qi, C. Y.; Yu, L.; Cracknell, A.

    2016-12-01

    Oil palm is one of the most productive vegetable oil crops in the world. The main oil palm producing areas are distributed in humid tropical areas such as Malaysia, Indonesia, Thailand, western and central Africa, northern South America, and Central America. Increasing market demand, high yields and low production costs of palm oil are the primary factors driving large-scale commercial cultivation of oil palm, especially in Malaysia and Indonesia. Global demand for palm oil has grown exponentially during the last 50 years, and the expansion of oil palm plantations is linked directly to the deforestation of natural forests. Satellite remote sensing plays an important role in monitoring the expansion of oil palm. However, optical remote sensing images are difficult to acquire in the Tropics because of the frequent occurrence of thick cloud cover. This problem has led to the use of data obtained by synthetic aperture radar (SAR), a sensor capable of all-day/all-weather observation, for studies in the Tropics. In this study, the ALOS-2 (Advanced Land Observing Satellite) PALSAR-2 (Phased Array type L-band SAR) datasets for the year 2015 were used as input to a support vector machine (SVM)-based machine learning algorithm. Oil palm/non-oil palm samples were collected using a hexagonal equal-area sampling design. High-resolution images in Google Earth and PALSAR-2 imagery were used in human photo-interpretation to separate oil palm from other classes (i.e. cropland, forest, grassland, shrubland, water, hard surface and bareland). Using this sample set, the characteristics of oil palm, including PALSAR-2 backscattering coefficients (HH, HV), terrain and climate, were further explored to post-process the SVM output. The average accuracy of the oil palm class is better than 80% in the final oil palm map for Malaysia.
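
    A minimal sketch of SVM classification on PALSAR-2 backscatter, assuming scikit-learn; the two features (HH and HV backscatter in dB) follow the abstract, while the sample arrays are purely illustrative stand-ins for real training data.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # rows: [HH_dB, HV_dB]; labels: 1 = oil palm, 0 = other cover types
    X_train = np.array([[-7.2, -13.1], [-6.8, -12.5], [-11.4, -18.9], [-10.2, -17.3]])
    y_train = np.array([1, 1, 0, 0])

    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(X_train, y_train)

    # classify new pixels (e.g. a flattened block of backscatter pairs)
    X_new = np.array([[-7.0, -12.8], [-10.9, -18.0]])
    print(clf.predict(X_new))  # -> [1 0]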

  12. Automatic aortic root segmentation in CTA whole-body dataset

    Science.gov (United States)

    Gao, Xinpei; Kitslaar, Pieter H.; Scholte, Arthur J. H. A.; Lelieveldt, Boudewijn P. F.; Dijkstra, Jouke; Reiber, Johan H. C.

    2016-03-01

    Trans-catheter aortic valve replacement (TAVR) is an evolving technique for patients with serious aortic stenosis disease. Typically, in this application a CTA dataset of the patient's arterial system, from the subclavian artery to the femoral arteries, is obtained to evaluate the quality of the vascular access route and analyze the aortic root to determine if and which prosthesis should be used. In this paper, we concentrate on the automated segmentation of the aortic root. The purpose of this study was to automatically segment the aortic root in computed tomography angiography (CTA) datasets to support TAVR procedures. The method in this study includes 4 major steps. First, the patient's cardiac CTA image was resampled to reduce the computation time. Next, the cardiac CTA image was segmented using an atlas-based approach. The most similar atlas was selected from a total of 8 atlases based on its image similarity to the input CTA image. Third, the aortic root segmentation from the previous step was transferred to the patient's whole-body CTA image by affine registration and refined in the fourth step using a deformable subdivision surface model fitting procedure based on image intensity. The pipeline was applied to 20 patients. The ground truth was created by an analyst who semi-automatically corrected the contours of the automatic method, where necessary. The average Dice similarity index between the segmentations of the automatic method and the ground truth was found to be 0.965±0.024. In conclusion, the current results are very promising.
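
    The Dice similarity index used for evaluation is standard and easy to sketch; the toy masks below are illustrative stand-ins for the automatic and ground-truth aortic root labels.

    import numpy as np

    def dice_index(seg_a, seg_b):
        """Dice similarity index between two binary segmentation masks:
        2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap."""
        a = np.asarray(seg_a, dtype=bool)
        b = np.asarray(seg_b, dtype=bool)
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    auto = np.zeros((8, 8), bool); auto[2:6, 2:6] = True
    truth = np.zeros((8, 8), bool); truth[3:7, 2:6] = True
    print(round(dice_index(auto, truth), 3))  # 0.75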

  13. Appraising city-scale pollution monitoring capabilities of multi-satellite datasets using portable pollutant monitors

    Science.gov (United States)

    Aliyu, Yahaya A.; Botai, Joel O.

    2018-04-01

    The retrieval characteristics of a city-scale satellite experiment were explored over a Nigerian city. The study evaluated carbon monoxide and aerosol contents in the city's atmosphere. We utilized the MSA Altair 5X gas detector and CW-HAT200 particulate counter to investigate the city-scale monitoring capabilities of satellite pollution observing instruments: the atmospheric infrared sounder (AIRS), measurement of pollution in the troposphere (MOPITT), moderate resolution imaging spectroradiometer (MODIS), multi-angle imaging spectroradiometer (MISR), and ozone monitoring instrument (OMI). To achieve this, we employed the Kriging interpolation technique to collocate the satellite pollutant estimations over 19 ground sample sites for the period 2015-2016. The portable pollutant devices were validated using the WHO air filter sampling model. To determine the city-scale performance of the satellite datasets, four performance indicators were adopted as measures: correlation coefficient, model efficiency, reliability index, and root mean square error. The comparative analysis revealed that MOPITT carbon monoxide (CO) and MODIS aerosol optical depth (AOD) estimates are the appropriate satellite measurements for ground equivalents in Zaria, Nigeria. Our findings were within the acceptable limits of similar studies that utilized reference stations. In conclusion, this study offers direction to Nigeria's air quality policymakers about available alternative air pollution measurements for mitigating air quality effects within its limited-resource environment.
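
    The four agreement measures named above can be computed as in the sketch below. The model-efficiency form is Nash-Sutcliffe and the reliability index follows Leggett and Williams (1981); whether the authors used these exact formulations is an assumption, and the collocated values shown are synthetic stand-ins.

        # Agreement measures between ground observations and satellite estimates.
        import numpy as np

        def performance_indicators(obs, est):
            obs, est = np.asarray(obs, float), np.asarray(est, float)
            r = np.corrcoef(obs, est)[0, 1]                       # correlation coefficient
            rmse = np.sqrt(np.mean((est - obs) ** 2))             # root mean square error
            nse = 1 - np.sum((est - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)  # model efficiency
            ri = np.exp(np.sqrt(np.mean(np.log(obs / est) ** 2))) # reliability index (positive values only)
            return {"r": r, "RMSE": rmse, "NSE": nse, "RI": ri}

        # Toy CO mixing ratios (ppm) at 19 hypothetical sample sites.
        rng = np.random.default_rng(1)
        ground = rng.uniform(0.5, 3.0, 19)
        satellite = ground * rng.normal(1.0, 0.15, 19)  # noisy satellite equivalent
        print(performance_indicators(ground, satellite))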

  14. VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

    Science.gov (United States)

    Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

    Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging, labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe the VideoWeb Activities dataset is unique and one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.

  15. Calibrating a numerical model's morphology using high-resolution spatial and temporal datasets from multithread channel flume experiments.

    Science.gov (United States)

    Javernick, L.; Bertoldi, W.; Redolfi, M.

    2017-12-01

    Accessing or acquiring high-quality, low-cost topographic data has never been easier, owing to recent developments in the photogrammetric technique of Structure-from-Motion (SfM). Researchers can acquire the necessary SfM imagery from various platforms, capturing millimetre resolution and accuracy at small scales or covering large areas with the help of unmanned platforms. Such datasets, in combination with numerical modelling, have opened up new opportunities to study the physical and ecological relationships of river environments. While a numerical model's overall predictive accuracy is most influenced by topography, proper model calibration requires hydraulic and morphological data; however, rich hydraulic and morphological datasets remain scarce. This lack of field and laboratory data has limited model advancement through the inability to properly calibrate, assess the sensitivity of, and validate model performance. However, new time-lapse imagery techniques have shown success in identifying instantaneous sediment transport in flume experiments and in improving hydraulic model calibration. With new capabilities to capture high-resolution spatial and temporal datasets of flume experiments, there is a need to further assess model performance. To address this demand, this research used braided river flume experiments, capturing time-lapse observations of sediment transport and repeat SfM elevation surveys to provide unprecedented spatial and temporal datasets. The numerical model Delft3D was calibrated through newly created metrics that quantified observed and modelled activation, deactivation, and bank erosion rates. These temporal data, combining high-resolution time series with long-term coverage, significantly improved the calibration routines and refined the calibration parameterization. Model results show that there is a trade-off between achieving quantitative statistical and qualitative morphological representations. Specifically, statistical
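
    As a minimal sketch of one such morphological metric, the code below computes an activation fraction from repeat elevation surveys: the share of cells whose elevation change between two DEMs exceeds a detection threshold. The metric definition, threshold, and DEMs are illustrative assumptions, not the authors' exact formulation.

        # Activation fraction from a DEM of difference (DoD) with a threshold.
        import numpy as np

        def activation_fraction(dem_t0, dem_t1, threshold=0.005):
            """Fraction of cells with |elevation change| above threshold (m)."""
            dz = dem_t1 - dem_t0
            return np.mean(np.abs(dz) > threshold)

        # Toy flume DEMs (elevations in metres) on a 100 x 200 grid.
        rng = np.random.default_rng(2)
        dem_a = rng.normal(0.0, 0.01, (100, 200))
        dem_b = dem_a + rng.normal(0.0, 0.004, dem_a.shape)  # survey noise
        dem_b[40:60, 80:120] -= 0.02                         # a scoured patch
        print(f"Active fraction: {activation_fraction(dem_a, dem_b):.3f}")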

  16. Learning analytics: Dataset for empirical evaluation of entry requirements into engineering undergraduate programs in a Nigerian university.

    Science.gov (United States)

    Odukoya, Jonathan A; Popoola, Segun I; Atayero, Aderemi A; Omole, David O; Badejo, Joke A; John, Temitope M; Olowo, Olalekan O

    2018-04-01

    In Nigerian universities, enrolment into any engineering undergraduate program requires that the minimum entry criteria established by the National Universities Commission (NUC) be satisfied. Candidates seeking admission to study an engineering discipline must have reached a predetermined entry age and met the cut-off marks set for the Senior School Certificate Examination (SSCE), the Unified Tertiary Matriculation Examination (UTME), and the post-UTME screening. However, limited effort has been made to show that these entry requirements eventually guarantee successful academic performance in engineering programs, because the data required for such validation are not readily available. In this data article, a comprehensive dataset for empirical evaluation of entry requirements into engineering undergraduate programs in a Nigerian university is presented and carefully analyzed. A total sample of 1445 undergraduates who were admitted between 2005 and 2009 to study Chemical Engineering (CHE), Civil Engineering (CVE), Computer Engineering (CEN), Electrical and Electronics Engineering (EEE), Information and Communication Engineering (ICE), Mechanical Engineering (MEE), and Petroleum Engineering (PET) at Covenant University, Nigeria, were randomly selected. The entry age, SSCE aggregate, UTME score, Covenant University Scholastic Aptitude Screening (CUSAS) score, and Cumulative Grade Point Average (CGPA) of the undergraduates were obtained from the Student Records and Academic Affairs unit. In order to facilitate evidence-based evaluation, the robust dataset is made publicly available in a Microsoft Excel spreadsheet file. On a yearly basis, first-order descriptive statistics of the dataset are presented in tables. Box plot representations, frequency distribution plots, and scatter plots of the dataset are provided to enrich its value. Furthermore, correlation and linear regression analyses are performed to understand the relationship between the entry requirements and the
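
    A sketch of the correlation and linear-regression step is given below on stand-in data. The column names, the synthetic records, and the 5-point CGPA scale shown are hypothetical; the real analysis uses the authors' published Excel spreadsheet.

        # Correlation and linear regression between an entry score and CGPA.
        import numpy as np
        import pandas as pd
        from scipy import stats

        # Synthetic stand-in records (the real data come from the Excel file).
        rng = np.random.default_rng(3)
        n = 200
        utme = rng.normal(230, 25, n)
        df = pd.DataFrame({
            "UTME": utme,
            "CGPA": np.clip(1.5 + 0.01 * (utme - 180) + rng.normal(0, 0.5, n), 0, 5),
        })

        r, p = stats.pearsonr(df["UTME"], df["CGPA"])       # correlation analysis
        slope, intercept, *_ = stats.linregress(df["UTME"], df["CGPA"])  # regression
        print(f"Pearson r = {r:.2f} (p = {p:.3g}); "
              f"CGPA = {intercept:.2f} + {slope:.4f} x UTME")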

  17. USGS Watershed Boundary Dataset (WBD) Overlay Map Service from The National Map - National Geospatial Data Asset (NGDA) Watershed Boundary Dataset (WBD)

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The Watershed Boundary Dataset (WBD) from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics....

  18. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 Catchments (Version 2.1) for the Conterminous United States: National Coal Resource Dataset System

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset represents the coal mine density and storage volumes within individual, local NHDPlusV2 catchments and upstream, contributing watersheds based on the...

  19. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: National Anthropogenic Barrier Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset represents the dam density and storage volumes within individual, local NHDPlusV2 catchments and upstream, contributing watersheds based on the National...

  20. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 (Version 2.1) Catchments for the Conterminous United States: National Elevation Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset represents the elevation values within individual local NHDPlusV2 catchments and upstream, contributing watersheds based on the National Elevation...