WorldWideScience

Sample records for internally consistent dataset

  1. Parton Distributions based on a Maximally Consistent Dataset

    Science.gov (United States)

    Rojo, Juan

    2016-04-01

    The choice of data that enters a global QCD analysis can have a substantial impact on the resulting parton distributions and their predictions for collider observables. One of the main reasons for this has to do with the possible presence of inconsistencies, either internal within an experiment or external between different experiments. In order to assess the robustness of the global fit, different definitions of a conservative PDF set, that is, a PDF set based on a maximally consistent dataset, have been introduced. However, these approaches are typically affected by theory biases in the selection of the dataset. In this contribution, after a brief overview of recent NNPDF developments, we propose a new, fully objective, definition of a conservative PDF set, based on the Bayesian reweighting approach. Using the new NNPDF3.0 framework, we produce various conservative sets, which turn out to be mutually in agreement within the respective PDF uncertainties, as well as with the global fit. We explore some of their implications for LHC phenomenology, finding also good consistency with the global fit result. These results provide a non-trivial validation test of the new NNPDF3.0 fitting methodology, and indicate that possible inconsistencies in the fitted dataset do not affect substantially the global fit PDFs.
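
    As a rough illustration of the Bayesian reweighting idea invoked above (not the NNPDF code itself), the sketch below applies Giele-Keller style weights to an ensemble of Monte Carlo PDF replicas. The replica chi-square values and the number of new data points are assumed inputs; all names are illustrative. Python is used here and in the other sketches in this listing.

        import numpy as np

        def reweight(chi2_replicas, n_data):
            """Bayesian replica weights w_k ~ chi2_k**((n-1)/2) * exp(-chi2_k/2),
            computed in log space to avoid overflow and normalised to sum to N."""
            logw = 0.5 * (n_data - 1) * np.log(chi2_replicas) - 0.5 * chi2_replicas
            logw -= logw.max()
            w = np.exp(logw)
            return w * len(w) / w.sum()

        def n_effective(w):
            """Effective number of replicas; a sharp drop below the ensemble
            size signals tension between the fit and the new dataset."""
            N = len(w)
            w = np.clip(w, 1e-12, None)
            return np.exp(np.sum(w * np.log(N / w)) / N)

    A conservative set in this spirit would keep only data whose inclusion leaves the effective replica number close to the ensemble size.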

  2. Comparison of recent SnIa datasets

    International Nuclear Information System (INIS)

    Sanchez, J.C. Bueno; Perivolaropoulos, L.; Nesseris, S.

    2009-01-01

We rank the six latest Type Ia supernova (SnIa) datasets (Constitution (C), Union (U), ESSENCE (Davis) (E), Gold06 (G), SNLS 1yr (S) and SDSS-II (D)) in the context of the Chevallier-Polarski-Linder (CPL) parametrization w(a) = w0 + w1(1 − a), according to their Figure of Merit (FoM), their consistency with the cosmological constant (ΛCDM), their consistency with standard rulers (Cosmic Microwave Background (CMB) and Baryon Acoustic Oscillations (BAO)) and their mutual consistency. We find a significant improvement of the FoM (defined as the inverse area of the 95.4% parameter contour) with the number of SnIa of these datasets ((C) highest FoM, (U), (G), (D), (E), (S) lowest FoM). Standard rulers (CMB+BAO) have a better FoM by about a factor of 3 compared to the highest-FoM SnIa dataset (C). We also find that the ranking sequence based on consistency with ΛCDM is identical to the corresponding ranking based on consistency with standard rulers ((S) most consistent, (D), (C), (E), (U), (G) least consistent). The ranking sequence of the datasets changes, however, when we consider consistency with an expansion history corresponding to evolving dark energy (w0, w1) = (−1.4, 2) crossing the phantom divide line w = −1 (it is practically reversed to (G), (U), (E), (S), (D), (C)). The SALT2 and MLCS2k2 fitters are also compared, and some peculiar features of the SDSS-II dataset when standardized with the MLCS2k2 fitter are pointed out. Finally, we construct a statistic to estimate the internal consistency of a collection of SnIa datasets. We find that even though there is good consistency among most samples taken from the above datasets, this consistency decreases significantly when the Gold06 (G) dataset is included in the sample.
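
    To make the two quoted quantities concrete, here is a minimal sketch (illustrative names, Gaussian-contour assumption) of the CPL equation of state and a figure of merit defined, as in the abstract, as the inverse area of the 95.4% (w0, w1) contour:

        import numpy as np
        from scipy.stats import chi2

        def w_cpl(a, w0, w1):
            """CPL dark energy equation of state: w(a) = w0 + w1 * (1 - a)."""
            return w0 + w1 * (1.0 - a)

        def figure_of_merit(cov):
            """Inverse area of the 95.4% confidence ellipse of (w0, w1).
            For a Gaussian posterior, the contour chi^2 = dchi2 encloses
            an area pi * dchi2 * sqrt(det C)."""
            dchi2 = chi2.ppf(0.954, df=2)  # ~6.17 for 2 degrees of freedom
            return 1.0 / (np.pi * dchi2 * np.sqrt(np.linalg.det(cov)))

        # Example with a hypothetical (w0, w1) covariance matrix:
        # figure_of_merit(np.array([[0.04, -0.10], [-0.10, 0.64]]))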

  3. Choice, internal consistency, and rationality

    OpenAIRE

    Aditi Bhattacharyya; Prasanta K. Pattanaik; Yongsheng Xu

    2010-01-01

    The classical theory of rational choice is built on several important internal consistency conditions. In recent years, the reasonableness of those internal consistency conditions has been questioned and criticized, and several responses to accommodate such criticisms have been proposed in the literature. This paper develops a general framework to accommodate the issues raised by the criticisms of classical rational choice theory, and examines the broad impact of these criticisms from both no...

  4. Internationally coordinated glacier monitoring: strategy and datasets

    Science.gov (United States)

    Hoelzle, Martin; Armstrong, Richard; Fetterer, Florence; Gärtner-Roer, Isabelle; Haeberli, Wilfried; Kääb, Andreas; Kargel, Jeff; Nussbaumer, Samuel; Paul, Frank; Raup, Bruce; Zemp, Michael

    2014-05-01

Internationally coordinated monitoring of long-term glacier changes provides key indicator data about global climate change; it began in the year 1894 as an internationally coordinated effort to establish standardized observations. Today, world-wide monitoring of glaciers and ice caps is embedded within the Global Climate Observing System (GCOS) in support of the United Nations Framework Convention on Climate Change (UNFCCC) as an important Essential Climate Variable (ECV). The Global Terrestrial Network for Glaciers (GTN-G) was established in 1999 with the task of coordinating measurements and ensuring the continuous development and adaptation of the international strategies to the long-term needs of users in science and policy. The basic monitoring principles must be relevant, feasible, comprehensive and understandable to a wider scientific community as well as to policy makers and the general public. Data access has to be free and unrestricted, the quality of the standardized and calibrated data must be high, and a combination of detailed process studies at selected field sites with global coverage by satellite remote sensing is envisaged. Recently a GTN-G Steering Committee was established to guide and advise the operational bodies responsible for international glacier monitoring: the World Glacier Monitoring Service (WGMS), the US National Snow and Ice Data Center (NSIDC), and the Global Land Ice Measurements from Space (GLIMS) initiative. Several online databases containing a wealth of diverse data types, having different levels of detail and global coverage, provide fast access to continuously updated information on glacier fluctuation and inventory data. For world-wide inventories, data are now available through (a) the World Glacier Inventory containing tabular information on about 130,000 glaciers covering an area of around 240,000 km2, (b) the GLIMS database containing digital outlines of around 118,000 glaciers with different time stamps and

  5. Datasets collected in general practice: an international comparison using the example of obesity.

    Science.gov (United States)

    Sturgiss, Elizabeth; van Boven, Kees

    2018-06-04

International datasets from general practice enable comparison of how conditions are managed within consultations in different primary healthcare settings. The Australian Bettering the Evaluation and Care of Health (BEACH) and TransHIS from the Netherlands collect in-consultation general practice data that have been used extensively to inform local policy and practice. Obesity is a global health issue, with different countries applying varying approaches to management. The objective of the present paper is to compare the primary care management of obesity in Australia and the Netherlands using data collected from consultations. Despite the different prevalence of obesity in the two countries, the number of patients per 1000 patient-years seen with obesity is similar. Patients in Australia with obesity are referred to allied health practitioners more often than Dutch patients. Without quality general practice data, primary care researchers will not have data about the management of conditions within consultations. We use obesity to highlight the strengths of these general practice data sources and to compare their differences. What is known about the topic? Australia had one of the longest-running consecutive datasets about general practice activity in the world, but it has recently lost government funding. The Netherlands has a longitudinal general practice dataset of information collected within consultations since 1985. What does this paper add? We discuss the benefits of general practice-collected data in two countries. Using obesity as a case example, we compare management in general practice between Australia and the Netherlands. This type of analysis should be the starting point for international collaborations on the primary care management of any health condition. Having a national general practice dataset allows international comparisons of the management of conditions within primary care. Without a current, quality general practice dataset, primary care researchers will not

  6. Internally consistent thermodynamic data for aqueous species in the system Na-K-Al-Si-O-H-Cl

    Science.gov (United States)

    Miron, George D.; Wagner, Thomas; Kulik, Dmitrii A.; Heinrich, Christoph A.

    2016-08-01

    A large amount of critically evaluated experimental data on mineral solubility, covering the entire Na-K-Al-Si-O-H-Cl system over wide ranges in temperature and pressure, was used to simultaneously refine the standard state Gibbs energies of aqueous ions and complexes in the framework of the revised Helgeson-Kirkham-Flowers equation of state. The thermodynamic properties of the solubility-controlling minerals were adopted from the internally consistent dataset of Holland and Powell (2002; Thermocalc dataset ds55). The global optimization of Gibbs energies of aqueous species, performed with the GEMSFITS code (Miron et al., 2015), was set up in such a way that the association equilibria for ion pairs and complexes, independently derived from conductance and potentiometric data, are always maintained. This was achieved by introducing reaction constraints into the parameter optimization that adjust Gibbs energies of complexes by their respective Gibbs energy effects of reaction, whenever the Gibbs energies of reactant species (ions) are changed. The optimized thermodynamic dataset is reported with confidence intervals for all parameters evaluated by Monte Carlo trial calculations. The new thermodynamic dataset is shown to reproduce all available fluid-mineral phase equilibria and mineral solubility data with good accuracy and precision over wide ranges in temperature (25-800 °C), pressure (1 bar to 5 kbar) and composition (salt concentrations up to 5 molal). The global data optimization process adopted in this study can be readily repeated any time when extensions to new chemical elements and species are needed, when new experimental data become available, or when a different aqueous activity model or equation of state should be used. This work serves as a proof of concept that our optimization strategy is feasible and successful in generating a thermodynamic dataset reproducing all fluid-mineral and aqueous speciation equilibria in the Na-K-Al-Si-O-H-Cl system within
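
    The reaction-constraint device described above can be paraphrased in a few lines: the association constant of each complex is held fixed, so its standard Gibbs energy is recomputed whenever the optimizer moves an ion's value. The sketch below is illustrative only; the example reaction and numbers are not from the paper.

        import math

        R = 8.31446  # gas constant, J/(mol*K)

        def g_complex(g_ions, stoich, logK, T):
            """G(complex) = sum(nu_i * G(ion_i)) + dG_r, with the reaction
            Gibbs energy dG_r = -R*T*ln(10)*logK held fixed, so the
            association constant survives any shift of the ion values."""
            dG_r = -R * T * math.log(10.0) * logK
            return sum(nu * g for nu, g in zip(stoich, g_ions)) + dG_r

        # E.g. Na+ + Cl- = NaCl(aq): if the optimizer shifts G(Na+) by
        # +500 J/mol, G(NaCl(aq)) shifts by the same amount; logK is unchanged.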

  7. Delimiting Coefficient a from Internal Consistency and Unidimensionality

    Science.gov (United States)

    Sijtsma, Klaas

    2015-01-01

I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient α to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient α is a lower bound to reliability and that concepts of internal consistency and…

  8. Internal Branding and Employee Brand Consistent Behaviours

    DEFF Research Database (Denmark)

    Mazzei, Alessandra; Ravazzani, Silvia

    2017-01-01

Employee behaviours conveying brand values, named brand consistent behaviours, affect the overall brand evaluation. Internal branding literature highlights a knowledge gap in terms of communication practices intended to sustain such behaviours. This study contributes to the development of a nonnormative and constitutive internal branding process. In particular, the paper places emphasis on the role and kinds of communication practices as a central part of the nonnormative and constitutive internal branding process. The paper also discusses an empirical study based on interviews with 32 Italian and American communication managers and 2 focus groups with Italian communication managers. Findings show that, in order to enhance employee brand consistent behaviours, the most effective communication practices are those characterised as enablement-oriented. Such a communication creates the organizational conditions adequate to sustain…

  9. Comment on the internal consistency of thermodynamic databases supporting repository safety assessments

    International Nuclear Information System (INIS)

    Arthur, R.C.

    2001-11-01

This report addresses the concept of internal consistency and its relevance to the reliability of thermodynamic databases used in repository safety assessments. In addition to being internally consistent, a reliable database should be accurate over a range of relevant temperatures and pressures, complete in the sense that all important aqueous species, gases and solid phases are represented, and traceable to original experimental results. No single definition of internal consistency need be universally accepted as the most appropriate under all conditions, however. As a result, two databases that are each internally consistent may be inconsistent with respect to each other, and a database derived from two or more such databases must itself be internally inconsistent. The consequences of alternative definitions that are reasonably attributable to the concept of internal consistency can be illustrated with reference to the thermodynamic database supporting SKB's recent SR 97 safety assessment. This database is internally inconsistent because it includes equilibrium constants calculated over a range of temperatures: using conflicting reference values for some solids, gases and aqueous species that are common to two internally consistent databases (the OECD/NEA database for radioelements and the SUPCRT databases for non-radioactive elements) that serve as source databases for the SR 97 TDB; using different definitions in these source databases of standard states for condensed phases and aqueous species; based on different mathematical expressions used in these source databases to represent the temperature dependence of the heat capacity; and based on different chemical models adopted in these source databases for the aqueous phase. The importance of such inconsistencies must be considered in relation to the other database reliability criteria noted above, however. Thus, accepting a certain level of internal inconsistency in a database, it is probably preferable to use a

  10. Comment on the internal consistency of thermodynamic databases supporting repository safety assessments

    Energy Technology Data Exchange (ETDEWEB)

Arthur, R.C. [Monitor Scientific, LLC, Denver, CO (United States)]

    2001-11-01

This report addresses the concept of internal consistency and its relevance to the reliability of thermodynamic databases used in repository safety assessments. In addition to being internally consistent, a reliable database should be accurate over a range of relevant temperatures and pressures, complete in the sense that all important aqueous species, gases and solid phases are represented, and traceable to original experimental results. No single definition of internal consistency need be universally accepted as the most appropriate under all conditions, however. As a result, two databases that are each internally consistent may be inconsistent with respect to each other, and a database derived from two or more such databases must itself be internally inconsistent. The consequences of alternative definitions that are reasonably attributable to the concept of internal consistency can be illustrated with reference to the thermodynamic database supporting SKB's recent SR 97 safety assessment. This database is internally inconsistent because it includes equilibrium constants calculated over a range of temperatures: using conflicting reference values for some solids, gases and aqueous species that are common to two internally consistent databases (the OECD/NEA database for radioelements and the SUPCRT databases for non-radioactive elements) that serve as source databases for the SR 97 TDB; using different definitions in these source databases of standard states for condensed phases and aqueous species; based on different mathematical expressions used in these source databases to represent the temperature dependence of the heat capacity; and based on different chemical models adopted in these source databases for the aqueous phase. The importance of such inconsistencies must be considered in relation to the other database reliability criteria noted above, however. Thus, accepting a certain level of internal inconsistency in a database, it is probably preferable to

  11. Comparison of Shallow Survey 2012 Multibeam Datasets

    Science.gov (United States)

    Ramirez, T. M.

    2012-12-01

The purpose of the Shallow Survey common dataset is a comparison of the different technologies utilized for data acquisition in the shallow survey marine environment. The common dataset consists of a series of surveys conducted over a common area of seabed using a variety of systems. It provides equipment manufacturers the opportunity to showcase their latest systems while giving hydrographic researchers and scientists a chance to test their latest algorithms on the dataset so that rigorous comparisons can be made. Five companies collected data for the Common Dataset in the Wellington Harbor area in New Zealand between May 2010 and May 2011: Kongsberg, Reson, R2Sonic, GeoAcoustics, and Applied Acoustics. The Wellington harbor and surrounding coastal area was selected since it has a number of well-defined features, including the HMNZS South Seas and HMNZS Wellington wrecks, an armored seawall constructed of Tetrapods and Akmons, aquifers, wharves and marinas. The seabed inside the harbor basin is largely fine-grained sediment, with gravel and reefs around the coast. The area outside the harbor on the southern coast is an active environment, with moving sand and exposed reefs. A marine reserve is also in this area. For consistency between datasets, the coastal research vessel R/V Ikatere and crew were used for all surveys conducted for the common dataset. Using Triton's Perspective processing software, the multibeam datasets collected for the Shallow Survey were processed for detailed analysis. Datasets from each sonar manufacturer were processed using the CUBE algorithm developed by the Center for Coastal and Ocean Mapping/Joint Hydrographic Center (CCOM/JHC). Each dataset was gridded at 0.5 and 1.0 meter resolutions for cross comparison and compliance with International Hydrographic Organization (IHO) requirements. Detailed comparisons were made of equipment specifications (transmit frequency, number of beams, beam width), data density, total uncertainty, and
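
    The record describes CUBE-based gridding at 0.5 and 1.0 m resolution; as a much simplified stand-in (plain cell averaging rather than CUBE's hypothesis tracking), a gridding routine might look like this:

        import numpy as np

        def grid_soundings(x, y, z, cell=0.5):
            """Bin soundings into square cells; return per-cell mean depth,
            standard deviation, and count. A simplified stand-in for
            CUBE-style gridding (illustrative only)."""
            ix = np.floor((x - x.min()) / cell).astype(int)
            iy = np.floor((y - y.min()) / cell).astype(int)
            nx, ny = ix.max() + 1, iy.max() + 1
            sums = np.zeros((nx, ny)); sq = np.zeros((nx, ny)); n = np.zeros((nx, ny))
            np.add.at(sums, (ix, iy), z)
            np.add.at(sq, (ix, iy), z * z)
            np.add.at(n, (ix, iy), 1)
            with np.errstate(invalid="ignore", divide="ignore"):
                mean = sums / n
                std = np.sqrt(np.maximum(sq / n - mean**2, 0.0))
            return mean, std, n

    Running the same routine at cell=0.5 and cell=1.0 mimics the cross-resolution comparison described above.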

  12. An Annotated Dataset of 14 Meat Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

This note describes a dataset consisting of 14 annotated images of meat. Points of correspondence are placed on each image. As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  13. Delimiting coefficient alpha from internal consistency and unidimensionality

    NARCIS (Netherlands)

    Sijtsma, K.

    2015-01-01

    I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient α to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient α is a lower bound to reliability and

  14. Editorial: Datasets for Learning Analytics

    NARCIS (Netherlands)

Dietze, Stefan; Siemens, George; Taibi, Davide; Drachsler, Hendrik

    2018-01-01

The European LinkedUp and LACE (Learning Analytics Community Exchange) projects have been responsible for setting up a series of data challenges at the LAK conferences 2013 and 2014 around the LAK dataset. The LAK dataset consists of a rich collection of full-text publications in the domain of

  15. Psychometrics and the neuroscience of individual differences: Internal consistency limits between-subjects effects.

    Science.gov (United States)

    Hajcak, Greg; Meyer, Alexandria; Kotov, Roman

    2017-08-01

In the clinical neuroscience literature, between-subjects differences in neural activity are presumed to reflect reliable measures, even though the psychometric properties of neural measures are almost never reported. The current article focuses on the critical importance of assessing and reporting internal consistency reliability, the homogeneity of the "items" that comprise a neural "score." We demonstrate how variability in the internal consistency of neural measures limits between-subjects (i.e., individual differences) effects. To this end, we utilize error-related brain activity (i.e., the error-related negativity or ERN) in both healthy and generalized anxiety disorder (GAD) participants to demonstrate options for psychometric analyses of neural measures; we examine between-groups differences in internal consistency, between-groups effect sizes, and between-groups discriminability (i.e., ROC analyses), all as a function of increasing items (i.e., number of trials). Overall, internal consistency should be used to inform experimental design and the choice of neural measures in individual differences research. The internal consistency of neural measures is necessary for interpreting results and guiding progress in clinical neuroscience, and should be routinely reported in all individual differences studies. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
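
    A common way to quantify the trial-count dependence described above is split-half reliability with a Spearman-Brown correction. A minimal sketch for a subjects-by-trials matrix of ERN amplitudes (illustrative, not the authors' code):

        import numpy as np

        def split_half_reliability(trials):
            """Odd/even split-half internal consistency of a subjects x trials
            matrix, Spearman-Brown corrected: r_sb = 2r / (1 + r)."""
            odd = trials[:, 0::2].mean(axis=1)
            even = trials[:, 1::2].mean(axis=1)
            r = np.corrcoef(odd, even)[0, 1]
            return 2 * r / (1 + r)

        # Reliability as a function of the number of trials retained:
        # [split_half_reliability(ern[:, :k]) for k in range(2, n_trials + 1)]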

  16. Comparison of global 3-D aviation emissions datasets

    Directory of Open Access Journals (Sweden)

    S. C. Olsen

    2013-01-01

Aviation emissions are unique among transportation emissions, e.g., from road transportation and shipping, in that they occur at higher altitudes as well as at the surface. Aviation emissions of carbon dioxide, soot, and water vapor have direct radiative impacts on the Earth's climate system, while emissions of nitrogen oxides (NOx), sulfur oxides, carbon monoxide (CO), and hydrocarbons (HC) impact air quality and climate through their effects on ozone, methane, and clouds. The most accurate estimates of the impact of aviation on air quality and climate utilize three-dimensional chemistry-climate models and gridded four-dimensional (space and time) aviation emissions datasets. We compare five available aviation emissions datasets currently and historically used to evaluate the impact of aviation on climate and air quality (NASA-Boeing 1992, NASA-Boeing 1999, QUANTIFY 2000, Aero2k 2002, and AEDT 2006), as well as aviation fuel usage estimates from the International Energy Agency. Roughly 90% of all aviation emissions are in the Northern Hemisphere, and nearly 60% of all fuelburn and NOx emissions occur at cruise altitudes in the Northern Hemisphere. While these datasets were created by independent methods and are thus not strictly suitable for analyzing trends, they suggest that commercial aviation fuelburn and NOx emissions increased over the last two decades while HC emissions likely decreased and CO emissions did not change significantly. The bottom-up estimates compared here are consistently lower than International Energy Agency fuelburn statistics, although the gap is significantly smaller in the more recent datasets. Overall the emissions distributions are quite similar for fuelburn and NOx, with regional peaks over the populated land masses of North America, Europe, and East Asia. For CO and HC there are relatively larger differences. There are, however, some distinct differences in the altitude distribution

  17. On the internal consistency of the term structure of forecasts of housing starts

    DEFF Research Database (Denmark)

    Pierdzioch, C.; Rulke, J. C.; Stadtmann, G.

    2013-01-01

    We use the term structure of forecasts of housing starts to test for rationality of forecasts. Our test is based on the idea that short-term and long-term forecasts should be internally consistent. We test the internal consistency of forecasts using data for Australia, Canada, Japan and the United...

  18. Assessing the internal consistency of the CARINA database in the Indian sector of the Southern Ocean

    Directory of Open Access Journals (Sweden)

    C. Lo Monaco

    2010-02-01

Carbon and carbon-relevant hydrographic and hydrochemical ancillary data from previously not publicly available cruises were retrieved and recently merged into a new database, CARINA (CARbon IN the Atlantic). The initial North Atlantic project, an international effort for ocean carbon synthesis, was extended to include the Arctic Mediterranean Seas (Arctic Ocean and Nordic Seas) and all three sectors of the Southern Ocean. Of a total of 188 cruises, 37 cruises are part of the Southern Ocean. The present work focuses on data collected in the Indian sector (20° S–70° S; 30° E–150° E). The Southern Indian Ocean dataset covers the period 1992–2004 and includes seasonal repeated observations. Parameters including salinity, dissolved inorganic carbon (TCO2), total alkalinity (TA), oxygen, nitrate, phosphate and silicate were examined for cruise-to-cruise and overall consistency. In addition, data from an existing, quality-controlled database (GLODAP) were introduced in the CARINA analysis to improve data coverage in the Southern Ocean. A global inversion was performed to synthesize the information deduced from objective comparisons of deep measurements (>1500 m) at nearby stations (generally <220 km apart). The corrections suggested by the inversion were allowed to vary within a fixed envelope, thus accounting for ocean interior variability. The adjustments applied to CARINA data and those recommended for GLODAP data, in order to obtain a consistent merged dataset, are presented and discussed. The final outcome of this effort is a new quality-controlled database for TCO2 and other properties of the carbon system that can now be used to investigate the natural variability or stability of ocean chemistry and the accumulation of anthropogenic carbon. This data product also offers an important new synthesis of seasonal to decadal observations to validate ocean biogeochemical models in a region where available historical data were very
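
    The "global inversion within a fixed envelope" can be caricatured as a bounded least-squares problem over per-cruise offsets. A toy sketch under that assumption (additive adjustments only; names illustrative; scipy assumed available):

        import numpy as np
        from scipy.optimize import lsq_linear

        def crossover_adjustments(pairs, offsets, n_cruises, envelope):
            """Solve offsets[k] ~ c[i] - c[j] for per-cruise corrections c,
            bounded by +/-envelope (the 'fixed envelope' of the text).
            The system only fixes differences c[i] - c[j]; the bounds and
            the least-squares solver pick a particular member of that family."""
            A = np.zeros((len(pairs), n_cruises))
            for k, (i, j) in enumerate(pairs):
                A[k, i], A[k, j] = 1.0, -1.0
            res = lsq_linear(A, np.asarray(offsets), bounds=(-envelope, envelope))
            return res.x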

  19. Norwegian Hydrological Reference Dataset for Climate Change Studies

    Energy Technology Data Exchange (ETDEWEB)

    Magnussen, Inger Helene; Killingland, Magnus; Spilde, Dag

    2012-07-01

    Based on the Norwegian hydrological measurement network, NVE has selected a Hydrological Reference Dataset for studies of hydrological change. The dataset meets international standards with high data quality. It is suitable for monitoring and studying the effects of climate change on the hydrosphere and cryosphere in Norway. The dataset includes streamflow, groundwater, snow, glacier mass balance and length change, lake ice and water temperature in rivers and lakes.(Author)

  20. IPCC Socio-Economic Baseline Dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) Socio-Economic Baseline Dataset consists of population, human development, economic, water resources, land...

  1. Validity and internal consistency of a whiplash-specific disability measure

    NARCIS (Netherlands)

    Pinfold, Melanie; Niere, Ken R.; O'Leary, Elizabeth F.; Hoving, Jan Lucas; Green, Sally; Buchbinder, Rachelle

    2004-01-01

    STUDY DESIGN: Cross-sectional study of patients with whiplash-associated disorders investigating the internal consistency, factor structure, response rates, and presence of floor and ceiling effects of the Whiplash Disability Questionnaire (WDQ). OBJECTIVES: The aim of this study was to confirm the

  2. Validity and internal consistency of a whiplash-specific disability measure.

    Science.gov (United States)

    Pinfold, Melanie; Niere, Ken R; O'Leary, Elizabeth F; Hoving, Jan Lucas; Green, Sally; Buchbinder, Rachelle

    2004-02-01

Cross-sectional study of patients with whiplash-associated disorders investigating the internal consistency, factor structure, response rates, and presence of floor and ceiling effects of the Whiplash Disability Questionnaire (WDQ). The aim of this study was to confirm the appropriateness of the proposed WDQ items. Whiplash injuries are a common cause of pain and disability after motor vehicle accidents. Neck disability questionnaires are often used in whiplash studies to assess neck pain but lack content validity for patients with whiplash-associated disorders. The newly developed WDQ measures functional limitations associated with whiplash injury and was designed after interviews with 83 patients with whiplash in a previous study. Researchers sought expert opinion on items of the WDQ, and items were then tested on a clinical whiplash population. Data were inspected to determine floor and ceiling effects, response rates, factor structure, and internal consistency. Packages of questionnaires were distributed to 55 clinicians, whose patients with whiplash completed and returned 101 questionnaires to researchers. No substantial floor or ceiling effects were identified on inspection of data. The overall floor effect was 12%, and the overall ceiling effect was 4%. Principal component analysis identified one broad factor that accounted for 65% of the variance in responses. Internal consistency was high; Cronbach's alpha = 0.96. Results of the study supported the retention of the 13 proposed items in a whiplash-specific disability questionnaire. Depending on the results of further psychometric testing, the WDQ is likely to be an appropriate outcome measure for patients with whiplash.
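
    The reported Cronbach's alpha (0.96) follows the standard formula; a minimal implementation for a subjects-by-items score matrix (illustrative, not the study's code):

        import numpy as np

        def cronbach_alpha(items):
            """Cronbach's alpha for a subjects x items score matrix:
            alpha = k/(k-1) * (1 - sum(item variances) / var(total score))."""
            items = np.asarray(items, dtype=float)
            k = items.shape[1]
            item_var = items.var(axis=0, ddof=1).sum()
            total_var = items.sum(axis=1).var(ddof=1)
            return k / (k - 1) * (1 - item_var / total_var)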

  3. The Karen instruments for measuring quality of nursing care: construct validity and internal consistency.

    Science.gov (United States)

    Lindgren, Margareta; Andersson, Inger S

    2011-06-01

Valid and reliable instruments for measuring the quality of care are needed for evaluation and improvement of nursing care. The previously developed and evaluated instruments, the Karen-patient and the Karen-personnel, based on Donabedian's Structure-Process-Outcome triad (S-P-O triad), had promising content validity, discriminative power and internal consistency. The objective of this study was to further develop the instruments with regard to construct validity and internal consistency. This prospective study was carried out in medical and surgical wards at a hospital in Sweden. A total of 95 patients and 120 personnel were included. The instruments were tested for construct validity by performing factor analyses in two steps and for internal consistency using Cronbach's alpha coefficient. The first, confirmatory factor analysis, with a pre-determined three-factor solution, did not load well according to the S-P-O triad, but the second, exploratory factor analysis with a six-factor solution appeared to be more coherent, and the distribution of variables seemed to be logical. The reliability, i.e. internal consistency, was good in both factor analyses. The Karen-patient and the Karen-personnel instruments have achieved acceptable levels of construct validity. The internal consistency of the instruments is good. This indicates that the instruments may be suitable for use in clinical practice for measuring the quality of nursing care.

  4. Synthetic and Empirical Capsicum Annuum Image Dataset

    NARCIS (Netherlands)

    Barth, R.

    2016-01-01

This dataset consists of per-pixel annotated synthetic (10,500) and empirical (50) images of Capsicum annuum, also known as sweet or bell pepper, situated in a commercial greenhouse. Furthermore, the source models used to generate the synthetic images are included. The aim of the datasets is to

  5. Background qualitative analysis of the European reference life cycle database (ELCD) energy datasets - part II: electricity datasets.

    Science.gov (United States)

    Garraín, Daniel; Fazio, Simone; de la Rúa, Cristina; Recchioni, Marco; Lechón, Yolanda; Mathieux, Fabrice

    2015-01-01

The aim of this paper is to identify areas of potential improvement in the European Reference Life Cycle Database (ELCD) electricity datasets. The revision is based on the data quality indicators described by the International Life Cycle Data system (ILCD) Handbook, applied on a sectorial basis. These indicators evaluate the technological, geographical and time-related representativeness of the dataset and its appropriateness in terms of completeness, precision and methodology. Results show that the ELCD electricity datasets have very good quality in general terms; nevertheless, some findings and recommendations for improving the quality of Life Cycle Inventories have been derived. Moreover, these results confirm the quality of the electricity-related datasets for any LCA practitioner, and provide insights into the limitations and assumptions underlying the modelling of the datasets. Given this information, the LCA practitioner will be able to decide whether the use of the ELCD electricity datasets is appropriate based on the goal and scope of the analysis to be conducted. The methodological approach would also be useful for dataset developers and reviewers, in order to improve the overall Data Quality Requirements of databases.

  6. Internal consistency of a Spanish translation of the Francis Scale of Attitude Toward Christianity Short Form.

    Science.gov (United States)

    Campo-Arias, Adalberto; Oviedo, Heidi Celina; Díaz, Carmen Elena; Cogollo, Zuleima

    2006-12-01

    This study evaluated the internal consistency of a Spanish version of the short form of the Francis Scale of Attitude Toward Christianity based on responses of 405 Colombian adolescent students ages 13 to 17 years. This translated short-form version of the scale had an internal consistency of .80. This estimate indicates suitable internal consistency reliability for research use in this population.

  7. [Factor analysis and internal consistency of pedagogical practices questionnaire among health care teachers].

    Science.gov (United States)

    Pérez V, Cristhian; Vaccarezza G, Giulietta; Aguilar A, César; Coloma N, Katherine; Salgado F, Horacio; Baquedano R, Marjorie; Chavarría R, Carla; Bastías V, Nancy

    2016-06-01

Teaching practice is one of the most complex topics of the training process in medicine and other health care careers. The Teaching Practices Questionnaire (TPQ) evaluates teaching skills. To assess the factor structure and internal consistency of the Spanish version of the TPQ among health care teachers. The TPQ was answered by 315 university teachers from 13 of the 15 administrative Chilean regions, who were selected through non-probabilistic volunteer sampling. The internal consistency of the TPQ factors was calculated and the correlation between them was analyzed. Six factors were identified: Student-centered teaching, Teaching planning, Assessment process, Dialogue relationship, Teacher-centered teaching and Use of technological resources. They had Cronbach alphas ranging from 0.60 to 0.85. The factorial structure of the TPQ differentiates the most important functions of teaching. It also shows theoretical consistency and practical relevance for performing a diagnosis and continuous evaluation of teaching practices. Additionally, it has adequate internal consistency. Thus, the TPQ is valid and reliable for evaluating pedagogical practices in health care careers.

  8. ASSISTments Dataset from Multiple Randomized Controlled Experiments

    Science.gov (United States)

    Selent, Douglas; Patikorn, Thanaporn; Heffernan, Neil

    2016-01-01

    In this paper, we present a dataset consisting of data generated from 22 previously and currently running randomized controlled experiments inside the ASSISTments online learning platform. This dataset provides data mining opportunities for researchers to analyze ASSISTments data in a convenient format across multiple experiments at the same time.…

  9. Minimum energy requirements for desalination of brackish groundwater in the United States with comparison to international datasets

    Science.gov (United States)

    Ahdab, Yvana D.; Thiel, Gregory P.; Böhlke, John Karl; Stanton, Jennifer S.; Lienhard, John H.

    2018-01-01

This paper uses chemical and physical data from a large 2017 U.S. Geological Survey groundwater dataset with wells in the U.S. and three smaller international groundwater datasets with wells primarily in Australia and Spain to carry out a comprehensive investigation of brackish groundwater composition in relation to minimum desalination energy costs. First, we compute the site-specific least work required for groundwater desalination. Least work of separation represents a baseline for the specific energy consumption of desalination systems. We develop simplified equations based on the U.S. data for least work as a function of water recovery ratio and a proxy variable for composition, either total dissolved solids, specific conductance, molality or ionic strength. We show that the U.S. correlations for total dissolved solids and molality may be applied to the international datasets. We find that total molality can be used to calculate the least work of dilute solutions with very high accuracy. Then, we examine the effects of groundwater solute composition on minimum energy requirements, showing that separation requirements increase from calcium to sodium for cations and from sulfate to bicarbonate to chloride for anions, for any given TDS concentration. We study the geographic distribution of least work, total dissolved solids, and major ion concentrations across the U.S. We determine areas with both low least work and high water stress in order to highlight regions holding potential for desalination to decrease the disparity between high water demand and low water supply. Finally, we discuss the implications of the USGS results on water resource planning, by comparing least work to the specific energy consumption of brackish water reverse osmosis plants and showing the scaling propensity of major electrolytes and silica in the U.S. groundwater samples.
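
    For intuition only, the dilute-solution (van 't Hoff) limit of the least work of separation can be written in a few lines. This is a textbook approximation with assumed parameter values, not the electrolyte-model calculation used in the paper:

        import numpy as np

        R = 8.314  # gas constant, J/(mol*K)

        def least_work_per_m3(molality, recovery, T=298.15, ions=2, rho=997.0):
            """Least work of separation per m^3 of permeate in the dilute
            van 't Hoff limit with complete salt rejection:
                W/V_p = (pi_f / r) * ln(1 / (1 - r)),  pi_f = i * c * R * T,
            with molality (mol/kg) converted to mol/m^3 via water density rho."""
            c = molality * rho                 # mol of solute per m^3 of water
            pi_f = ions * c * R * T            # feed osmotic pressure, Pa
            return pi_f / recovery * np.log(1.0 / (1.0 - recovery))  # J/m^3

        # Example: ~0.034 mol/kg NaCl at 50% recovery, in kWh/m^3:
        # least_work_per_m3(0.034, 0.5) / 3.6e6   -> roughly 0.065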

  10. [Validity and internal consistency of the Maslach Burnout Inventory in Dental Students from Cartagena, Colombia].

    Science.gov (United States)

    Simancas-Pallares, Miguel Angel; Fortich Mesa, Natalia; González Martínez, Farith Damián

To determine the internal consistency and content validity of the Maslach Burnout Inventory-Student Survey (MBI-SS) in dental students from Cartagena, Colombia. Scale validation study in 886 dental students from Cartagena, Colombia. The factor structure was determined through exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Internal consistency was measured using Cronbach's alpha coefficient. Analyses were performed using the Stata v.13.2 for Windows (Statacorp., USA) and Mplus v.7.31 for Windows (Muthén & Muthén, USA) software. Internal consistency was α=.806. The factor structure showed three factors that accounted for 56.6% of the variance. CFA revealed: χ2=926.036; df=85; RMSEA=.106 (90%CI, .100-.112); CFI=.947; TLI=.934. The MBI-SS showed adequate internal consistency and a factor structure consistent with the originally proposed structure but with a poor fit, which does not reflect adequate content validity in this sample. Copyright © 2016 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.
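
    The reported RMSEA can be checked from the quoted χ², degrees of freedom, and sample size with the standard formula:

        import math

        def rmsea(chi2, df, n):
            """RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
            return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

        # Reproduces the reported value:
        # rmsea(926.036, 85, 886)  -> 0.1057 (~ .106)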

  11. Construct validity and internal consistency in the Leisure Practices Scale (EPL) for adults.

    Science.gov (United States)

    Andrade, Rubian Diego; Schwartz, Gisele Maria; Tavares, Giselle Helena; Pelegrini, Andreia; Teixeira, Clarissa Stefani; Felden, Érico Pereira Gomes

    2018-02-01

This study proposes and analyzes the construct validity and internal consistency of the Leisure Practices Scale (EPL) for adults. The instrument seeks to identify preferences and involvement in different leisure practices among adults. It was constructed based on the cultural leisure contents (artistic, manual, physical, sports, intellectual, social, tourist, virtual and contemplation/leisure). The validation process was conducted with: a) content analysis by leisure experts, who evaluated the instrument for clarity of language and practical relevance, which allowed the calculation of the content validity coefficient (CVC); b) test-retest reproducibility with 51 subjects to calculate the temporal variation coefficient; c) internal consistency analysis with 885 participants. The evaluation presented appropriate coefficients, both with respect to language clarity (CVCt = 0.883) and practical relevance (CVCt = 0.879). The reproducibility coefficients were moderate to excellent. The scale showed adequate internal consistency (0.72). The EPL has acceptable psychometric quality and structure, and can be used to investigate adult involvement in leisure activities.

12. Internal Consistency and Convergent Validity of the Klontz Money Behavior Inventory (KMBI)

    Directory of Open Access Journals (Sweden)

    Colby D. Taylor

    2015-12-01

The Klontz Money Behavior Inventory (KMBI) is a standalone, multi-scale measure that can screen for the presence of eight distinct money disorders. Given the well-established relationship between mental health and financial behaviors, results from the KMBI can be used to inform both mental health care professionals and financial planners. The present study examined the internal consistency and convergent validity of the KMBI, through comparison with similar measures, among a sample of college students (n = 232). Results indicate that the KMBI demonstrates acceptable internal consistency reliability and some convergence for most subscales when compared to other analogous measures. These findings highlight a need for literature and assessments to identify and describe disordered money behaviors.

  13. Online self-report questionnaire on computer work-related exposure (OSCWE): validity and internal consistency.

    Science.gov (United States)

    Mekhora, Keerin; Jalayondeja, Wattana; Jalayondeja, Chutima; Bhuanantanondh, Petcharatana; Dusadiisariyavong, Asadang; Upiriyasakul, Rujiret; Anuraktam, Khajornyod

    2014-07-01

To develop an online, self-report questionnaire on computer work-related exposure (OSCWE) and to determine the internal consistency, face and content validity of the questionnaire. The online, self-report questionnaire was developed to determine the risk factors related to musculoskeletal disorders in computer users. It comprised five domains: personal, work-related, work environment, physical health and psychosocial factors. The questionnaire's content was validated by an occupational medical doctor and three physical therapy lecturers involved in ergonomic teaching. Twenty-five lay people examined the feasibility of computer administration and the user-friendliness of the language. The item correlation in each domain was analyzed by internal consistency (Cronbach's alpha). The content of the questionnaire was considered congruent with the testing purposes. Eight hundred and thirty-five computer users at the PTT Exploration and Production Public Company Limited responded to the online self-report questionnaire. The internal consistency of the five domains was: personal (alpha = 0.58), work-related (alpha = 0.348), work environment (alpha = 0.72), physical health (alpha = 0.68) and psychosocial factors (alpha = 0.93). The findings suggested that the OSCWE had acceptable internal consistency for the work environment and psychosocial factor domains. The OSCWE is available for use in population-based survey research among computer office workers.

  14. An Annotated Dataset of 14 Cardiac MR Images

    DEFF Research Database (Denmark)

    Stegmann, Mikkel Bille

    2002-01-01

This note describes a dataset consisting of 14 annotated cardiac MR images. Points of correspondence are placed on each image at the left ventricle (LV). As such, the dataset can be readily used for building statistical models of shape. Further, format specifications and terms of use are given.

  15. Reliability, Dimensionality, and Internal Consistency as Defined by Cronbach: Distinct Albeit Related Concepts

    Science.gov (United States)

    Davenport, Ernest C.; Davison, Mark L.; Liou, Pey-Yan; Love, Quintin U.

    2015-01-01

This article uses definitions provided by Cronbach in his seminal paper for coefficient α to show the concepts of reliability, dimensionality, and internal consistency are distinct but interrelated. The article begins with a critique of the definition of reliability and then explores mathematical properties of Cronbach's α. Internal consistency…

  16. Internal Consistency, Retest Reliability, and their Implications For Personality Scale Validity

    Science.gov (United States)

    McCrae, Robert R.; Kurtz, John E.; Yamagata, Shinji; Terracciano, Antonio

    2010-01-01

We examined data (N = 34,108) on the differential reliability and validity of facet scales from the NEO Inventories. We evaluated the extent to which (a) psychometric properties of facet scales are generalizable across ages, cultures, and methods of measurement; and (b) validity criteria are associated with different forms of reliability. Composite estimates of facet scale stability, heritability, and cross-observer validity were broadly generalizable. Two estimates of retest reliability were independent predictors of the three validity criteria; none of the three estimates of internal consistency was. Available evidence suggests the same pattern of results for other personality inventories. Internal consistency of scales can be useful as a check on data quality, but appears to be of limited utility for evaluating the potential validity of developed scales, and it should not be used as a substitute for retest reliability. Further research on the nature and determinants of retest reliability is needed. PMID:20435807

  17. A diagnostic test for apraxia in stroke patients: internal consistency and diagnostic value.

    NARCIS (Netherlands)

    Heugten, C.M. van; Dekker, J.; Deelman, B.G.; Stehmann-Saris, F.C.; Kinebanian, A.

    1999-01-01

    The internal consistency and the diagnostic value of a test for apraxia in patients having had a stroke are presented. Results indicate that the items of the test form a strong and consistent scale: Cronbach's alpha as well as the results of a Mokken scale analysis present good reliability and good

  18. Consistency of two global MODIS aerosol products over ocean on Terra and Aqua CERES SSF datasets

    Science.gov (United States)

    Ignatov, Alexander; Minnis, Patrick; Wielicki, Bruce; Loeb, Norman G.; Remer, Lorraine A.; Kaufman, Yoram J.; Miller, Walter F.; Sun-Mack, Sunny; Laszlo, Istvan; Geier, Erika B.

    2004-12-01

MODIS aerosol retrievals over ocean from the Terra and Aqua platforms are available from the Clouds and the Earth's Radiant Energy System (CERES) Single Scanner Footprint (SSF) datasets generated at NASA Langley Research Center (LaRC). Two aerosol products are reported side by side. The primary M product is generated by subsetting and remapping the multi-spectral (0.44-2.1 μm) MOD04 aerosols onto CERES footprints. MOD04 processing uses cloud screening and aerosol algorithms developed by the MODIS science team. The secondary (AVHRR-like) A product is generated in only two MODIS bands: 1 and 6 on Terra, and 1 and 7 on Aqua. The A processing uses NASA/LaRC cloud screening and the NOAA/NESDIS single-channel aerosol algorithm. The M and A products have been documented elsewhere and preliminarily compared using two weeks of global Terra CERES SSF (Edition 1A) data in December 2000 and June 2001. In this study, the M and A aerosol optical depths (AOD) in MODIS band 1 (0.64 μm), τ1M and τ1A, are further checked for cross-platform consistency using 9 days of global Terra CERES SSF (Edition 2A) and Aqua CERES SSF (Edition 1A) data from 13-21 October 2002.

  19. The internal consistency of the standard gamble: tests after adjusting for prospect theory.

    Science.gov (United States)

    Oliver, Adam

    2003-07-01

This article reports a study that tests whether the internal consistency of the standard gamble can be improved upon by incorporating loss weighting and probability transformation parameters in the standard gamble valuation procedure. Five alternatives to the standard expected utility (EU) formulation are considered: (1) probability transformation within an EU framework; and, within a prospect theory framework, (2) loss weighting and full probability transformation, (3) no loss weighting and full probability transformation, (4) loss weighting and no probability transformation, and (5) loss weighting and partial probability transformation. Of the five alternatives, only the prospect theory formulation with loss weighting and no probability transformation offers an improvement in internal consistency over the standard EU valuation procedure.
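
    As a sketch of alternative (1), probability transformation within an EU framework, the indifference probability p elicited by the standard gamble is replaced by a weighted w(p). The Tversky-Kahneman weighting function and the parameter value below are illustrative assumptions, not the paper's estimates:

        def tk_weight(p, gamma=0.61):
            """Tversky-Kahneman probability weighting:
            w(p) = p^g / (p^g + (1-p)^g)^(1/g)."""
            return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

        def sg_utility(p_indiff, gamma=None):
            """Standard gamble utility of a health state from the indifference
            probability: U = p under standard EU, or U = w(p) with probability
            transformation (alternative 1 in the abstract)."""
            return p_indiff if gamma is None else tk_weight(p_indiff, gamma)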

  20. Assessing motivation for work environment improvements: internal consistency, reliability and factorial structure.

    Science.gov (United States)

    Hedlund, Ann; Ateg, Mattias; Andersson, Ing-Marie; Rosén, Gunnar

    2010-04-01

Workers' motivation to actively take part in improvements to the work environment is assumed to be important for the efficiency of investments for that purpose. This gives rise to the need for a tool to measure this motivation. A questionnaire to measure motivation for improvements to the work environment has been designed. The internal consistency and test-retest reliability of the domains of the questionnaire have been measured, and the factorial structure has been explored, from the answers of 113 employees. The internal consistency is high (0.94), as is the correlation for the total score (0.84). Three factors are identified, accounting for 61.6% of the total variance. The questionnaire can be a useful tool in improving intervention methods. The expectation is that the tool can be useful, particularly with the aim of improving the efficiency of companies' investments in work environment improvements. Copyright 2010 Elsevier Ltd. All rights reserved.

  1. Studies on the consistency of internally taken contrast medium for pancreas CT

    Energy Technology Data Exchange (ETDEWEB)

    Matsushima, Kishio; Mimura, Seiichi; Tahara, Seiji; Kitayama, Takuichi; Inamura, Keiji; Mikami, Yasutaka; Hashimoto, Keiji; Hiraki, Yoshio; Aono, Kaname

    1985-02-01

A problem in pancreatic CT scanning is discriminating between the pancreas and the adjacent gastrointestinal tract. Generally we administer a dilution of gastrografin internally to make the discrimination. The degree of dilution has been decided by experience at each hospital. When the concentration of the contrast medium is too low, an enhancement effect cannot be expected, but when it is too high, artifacts appear. We have experimented on the degree of dilution and the CT number (CT-No) to decide the optimum concentration of gastrografin for the diagnosis of pancreatic disease. Statistical analysis of the results shows the optimum dilution of gastrografin to be 1.5%.

2. Evaluation of the Consistency of the MODIS Land Cover Product (MCD12Q1) Based on Chinese 30 m GlobeLand30 Datasets: A Case Study in Anhui Province, China

    Directory of Open Access Journals (Sweden)

    Dong Liang

    2015-11-01

Land cover plays an important role in the climate and biogeochemistry of the Earth system, and it is important to evaluate global land cover (GLC) data before applying them at a specific spatial scale. The objective of this study is to evaluate and validate the consistency of the Moderate Resolution Imaging Spectroradiometer (MODIS) land cover product (MCD12Q1) at a provincial scale (Anhui Province, China) based on the Chinese 30 m GLC product (GlobeLand30). A harmonization method is first used to reclassify the land cover types between the five classification schemes of MCD12Q1 (International Geosphere-Biosphere Programme (IGBP) global vegetation classification, University of Maryland (UMD), MODIS-derived Leaf Area Index and Fraction of Photosynthetically Active Radiation (LAI/FPAR), MODIS-derived Net Primary Production (NPP), and Plant Functional Type (PFT)) and the ten classes of GlobeLand30, based on a knowledge rule (KR) and the C4.5 decision tree (DT) classification algorithm. A total of five harmonized land cover types are derived, including woodland, grassland, cropland, wetland and artificial surfaces, and four evaluation indicators are selected, including area consistency, spatial consistency, classification accuracy and landscape diversity, in the three sub-regions of Wanbei, Wanzhong and Wannan. The results indicate that the consistency of IGBP is the best among the five schemes of MCD12Q1 according to the correlation coefficient (R). The "woodland" consistency of LAI/FPAR is the worst, with a spatial similarity (O) of 58.17% due to the misclassification between "woodland" and "others". The consistency of NPP is the worst among the five schemes, as the agreement varied from 1.61% to 56.23% in the three sub-regions. Furthermore, with the biggest difference of diversity indices between LAI/FPAR and GlobeLand30, the consistency of LAI/FPAR is the weakest. This study provides a methodological reference for evaluating the

  3. Internal consistency of the CHAMPS physical activity questionnaire for Spanish speaking older adults.

    Science.gov (United States)

    Rosario, Martín G; Vázquez, Jenniffer M; Cruz, Wanda I; Ortiz, Alexis

    2008-09-01

    The Community Healthy Activities Model Program for Seniors (CHAMPS) is a physical activity monitoring questionnaire for people between 65 to 90 years old. This questionnaire has been previously translated to Spanish to be used in the Latin American population. To adapt the Spanish version of the CHAMPS questionnaire to Puerto Rico and assess its internal consistency. An external review committee adapted the existent Spanish version of the CHAMPS to be used in the Puerto Rican population. Three older adults participated in a second phase with the purpose of training the research team. After the second phase, 35 older adults participated in a third content adaptation phase. During the third phase, the preliminary Spanish version for Puerto Rico of the CHAMPS was given to the 35 participants to assess for clarity, vocabulary and understandability. Interviews to each participant in the third phase were carried out to obtain feedback and create a final Spanish version of the CHAMPS for Puerto Rico. After analyses of this phase, the external review committee prepared a final Spanish version of the CHAMPS for Puerto Rico. The final version was administered to 15 older adults (76 +/- 6.5 years) to assess the internal consistency by using Cronbach's Alpha analysis. The questionnaire showed a strong internal consistency of 0.76. The total time to answer the questionnaire was 17.4 minutes. The Spanish version of the CHAMPS questionnaire for Puerto Rico suggested being an easy to administer and consistent measurement tool to assess physical activity in older adults.

  4. Evaluating the consistency of the 1982-1999 NDVI trends in the Iberian Peninsula across four time-series derived from the AVHRR sensor: LTDR, GIMMS, FASIR, and PAL-II.

    Science.gov (United States)

    Alcaraz-Segura, Domingo; Liras, Elisa; Tabik, Siham; Paruelo, José; Cabello, Javier

    2010-01-01

Successive efforts have processed the Advanced Very High Resolution Radiometer (AVHRR) sensor archive to produce Normalized Difference Vegetation Index (NDVI) datasets (i.e., PAL, FASIR, GIMMS, and LTDR) under different corrections and processing schemes. Since NDVI datasets are used to evaluate carbon gains, differences among them may affect nations' carbon budgets in meeting international targets (such as the Kyoto Protocol). This study addresses the consistency across AVHRR NDVI datasets in the Iberian Peninsula (Spain and Portugal) by evaluating whether their 1982-1999 NDVI trends show similar spatial patterns. Significant trends were calculated with the seasonal Mann-Kendall trend test and their spatial consistency with partial Mantel tests. Over 23% of the Peninsula (N, E, and central mountain ranges) showed positive and significant NDVI trends across the four datasets, and an additional 18% across three datasets. In 20% of Iberia (SW quadrant), the four datasets exhibited an absence of significant trends, and an additional 22% across three datasets. Significant NDVI decreases were scarce (croplands in the Guadalquivir and Segura basins, La Mancha plains, and Valencia). Spatial consistency of significant trends across at least three datasets was observed in 83% of the Peninsula, but this decreased to 47% when comparing across all four datasets. FASIR, PAL, and LTDR were the most spatially similar datasets, while GIMMS was the most different. The different ability of each AVHRR dataset to detect significant NDVI trends (e.g., LTDR detected greater significant trends (both positive and negative) and in 32% more pixels than GIMMS) has great implications for evaluating carbon budgets. The lack of spatial consistency across NDVI datasets derived from the same AVHRR sensor archive makes it advisable to evaluate carbon gain trends using several satellite datasets and, where possible, independent or additional data sources for contrast.
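
    A bare-bones version of the seasonal Mann-Kendall test used above (no correction for ties or serial correlation; monthly seasons assumed; illustrative, not the authors' code):

        import numpy as np
        from scipy.stats import norm

        def seasonal_mann_kendall(x, n_seasons=12):
            """Seasonal Mann-Kendall test: Kendall's S summed over seasons,
            z-score from the summed per-season variances."""
            x = np.asarray(x, dtype=float)
            s_total, var_total = 0.0, 0.0
            for season in range(n_seasons):
                xs = x[season::n_seasons]
                n = len(xs)
                s = sum(np.sign(xs[j] - xs[i])
                        for i in range(n) for j in range(i + 1, n))
                s_total += s
                var_total += n * (n - 1) * (2 * n + 5) / 18.0
            z = (s_total - np.sign(s_total)) / np.sqrt(var_total) if var_total else 0.0
            p = 2 * (1 - norm.cdf(abs(z)))
            return z, p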

  5. Framework for Interactive Parallel Dataset Analysis on the Grid

    Energy Technology Data Exchange (ETDEWEB)

    Alexander, David A.; Ananthan, Balamurali; /Tech-X Corp.; Johnson, Tony; Serbo, Victor; /SLAC

    2007-01-10

    We present a framework for use at a typical Grid site to facilitate custom interactive parallel dataset analysis targeting terabyte-scale datasets of the type typically produced by large multi-institutional science experiments. We summarize the needs for interactive analysis and show a prototype solution that satisfies those needs. The solution consists of a desktop client tool and a set of Web Services that allow scientists to sign onto a Grid site, compose analysis script code to carry out physics analysis on datasets, distribute the code and datasets to worker nodes, collect the results back at the client, and construct professional-quality visualizations of the results.

  6. Gridded 5km GHCN-Daily Temperature and Precipitation Dataset, Version 1

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Gridded 5km GHCN-Daily Temperature and Precipitation Dataset (nClimGrid) consists of four climate variables derived from the GHCN-D dataset: maximum temperature,...

  7. Evaluating the Consistency of the 1982–1999 NDVI Trends in the Iberian Peninsula across Four Time-series Derived from the AVHRR Sensor: LTDR, GIMMS, FASIR, and PAL-II

    Science.gov (United States)

    Alcaraz-Segura, Domingo; Liras, Elisa; Tabik, Siham; Paruelo, José; Cabello, Javier

    2010-01-01

    Successive efforts have processed the Advanced Very High Resolution Radiometer (AVHRR) sensor archive to produce Normalized Difference Vegetation Index (NDVI) datasets (i.e., PAL, FASIR, GIMMS, and LTDR) under different corrections and processing schemes. Since NDVI datasets are used to evaluate carbon gains, differences among them may affect nations’ carbon budgets in meeting international targets (such as the Kyoto Protocol). This study addresses the consistency across AVHRR NDVI datasets in the Iberian Peninsula (Spain and Portugal) by evaluating whether their 1982–1999 NDVI trends show similar spatial patterns. Significant trends were calculated with the seasonal Mann-Kendall trend test, and their spatial consistency with partial Mantel tests. Over 23% of the Peninsula (N, E, and central mountain ranges) showed positive and significant NDVI trends across the four datasets, and an additional 18% across three datasets. In 20% of Iberia (SW quadrant), the four datasets exhibited an absence of significant trends, and an additional 22% agreed across three datasets. Significant NDVI decreases were scarce (croplands in the Guadalquivir and Segura basins, La Mancha plains, and Valencia). Spatial consistency of significant trends across at least three datasets was observed in 83% of the Peninsula, but it decreased to 47% when comparing across all four datasets. FASIR, PAL, and LTDR were the most spatially similar datasets, while GIMMS was the most different. The differing capacity of each AVHRR dataset to detect significant NDVI trends (e.g., LTDR detected stronger significant trends, both positive and negative, and in 32% more pixels than GIMMS) has great implications for evaluating carbon budgets. The lack of spatial consistency across NDVI datasets derived from the same AVHRR sensor archive makes it advisable to evaluate trends in carbon gains using several satellite datasets and, where possible, independent or additional data sources for contrast. PMID:22205868

  8. PHYSICS PERFORMANCE AND DATASET (PPD)

    CERN Multimedia

    L. Silvestris

    2013-01-01

    The first part of the Long Shutdown period has been dedicated to the preparation of the samples for the analyses targeting the summer conferences. In particular, the 8 TeV data acquired in 2012, including most of the “parked datasets”, have been reconstructed, profiting from improved alignment and calibration conditions for all the sub-detectors. Careful planning of the resources was essential in order to deliver the datasets to the analysts well in time, and to schedule the update of all the conditions and calibrations needed at the analysis level. The newly reprocessed data have undergone detailed scrutiny by the Dataset Certification team, allowing the recovery of some of the data for analysis usage and further improving the certification efficiency, which is now at 91% of the recorded luminosity. With the aim of delivering a consistent dataset for 2011 and 2012, both in terms of conditions and release (53X), the PPD team is now working to set up a data re-reconstruction and a new MC pro...

  9. Expanding the Reach of Participatory Risk Management: Testing an Online Decision-Aiding Framework for Informing Internally Consistent Choices.

    Science.gov (United States)

    Bessette, Douglas L; Campbell-Arvai, Victoria; Arvai, Joseph

    2016-05-01

    This article presents research aimed at developing and testing an online, multistakeholder decision-aiding framework for informing multiattribute risk management choices associated with energy development and climate change. The framework was designed to provide necessary background information and facilitate internally consistent choices, or choices that are in line with users' prioritized objectives. In order to test different components of the decision-aiding framework, a six-part, 2 × 2 × 2 factorial experiment was conducted, yielding eight treatment scenarios. The three factors included: (1) whether or not users could construct their own alternatives; (2) the level of detail regarding the composition of alternatives users would evaluate; and (3) the way in which a final choice between users' own constructed (or highest-ranked) portfolio and an internally consistent portfolio was presented. Participants' self-reports revealed the framework was easy to use and providing an opportunity to develop one's own risk-management alternatives (Factor 1) led to the highest knowledge gains. Empirical measures showed the internal consistency of users' decisions across all treatments to be lower than expected and confirmed that providing information about alternatives' composition (Factor 2) resulted in the least internally consistent choices. At the same time, those users who did not develop their own alternatives and were not shown detailed information about the composition of alternatives believed their choices to be the most internally consistent. These results raise concerns about how the amount of information provided and the ability to construct alternatives may inversely affect users' real and perceived internal consistency. © 2015 Society for Risk Analysis.

  10. Essays on Multinational Production and International Trade

    DEFF Research Database (Denmark)

    Clementi, Federico

    This Thesis consists of an introduction followed by three independent chapters. Each chapter is a self-contained paper that can be read independently. They cover different topics of international economics with a specific focus on multinational production and international trade. A common feature...... the intensity of spillovers to local suppliers. Domestic firms benefit only from the activity of foreign clients that are not vertically integrated in their industry. In the last chapter, I use a detailed dataset of international transactions of Danish companies to study the impact of Chinese competition...

  11. Harvard Aging Brain Study : Dataset and accessibility

    NARCIS (Netherlands)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G.; Chatwal, Jasmeer P.; Papp, Kathryn V.; Amariglio, Rebecca E.; Blacker, Deborah; Rentz, Dorene M.; Johnson, Keith A.; Sperling, Reisa A.; Schultz, Aaron P.

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging.

  12. A Research Graph dataset for connecting research data repositories using RD-Switchboard.

    Science.gov (United States)

    Aryani, Amir; Poblet, Marta; Unsworth, Kathryn; Wang, Jingbo; Evans, Ben; Devaraju, Anusuriya; Hausstein, Brigitte; Klas, Claus-Peter; Zapilko, Benjamin; Kaplun, Samuele

    2018-05-29

    This paper describes the open access graph dataset that shows the connections between Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures. The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) with the aim of discovering and connecting related research datasets based on publication co-authorship or jointly funded grants. The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation. Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

  13. Consistent adoption of the International System of Units (SI) in nuclear science and technology

    Energy Technology Data Exchange (ETDEWEB)

    Klumpar, J; Kovar, Z [Ceskoslovenska Akademie Ved, Prague. Laborator Radiologicke Dozimetrie; Sacha, J [Slovenska Akademia Vied, Bratislava (Czechoslovakia). Fyzikalny Ustav

    1975-11-01

    The principles behind a consistent introduction of the International System of Units (SI) in Czechoslovakia are stressed, in compliance with the latest edition of the Czechoslovak Standard CSN 01 1300 on the prescribed system of national and international units. The use of special and auxiliary units in nuclear physics and technology is discussed, with particular attention devoted to the units of activity and to the time units applied in radiology. A conversion graph and tables are annexed.

  14. Factorial validity and internal consistency of the motivational climate in physical education scale.

    Science.gov (United States)

    Soini, Markus; Liukkonen, Jarmo; Watt, Anthony; Yli-Piipari, Sami; Jaakkola, Timo

    2014-01-01

    The aim of the study was to examine the construct validity and internal consistency of the Motivational Climate in Physical Education Scale (MCPES). A key element of the development process of the scale was establishing a theoretical framework that integrated the dimensions of task- and ego-involving climates in conjunction with autonomy- and social relatedness-supporting climates. These constructs were adopted from the self-determination and achievement goal theories. A sample of Finnish Grade 9 students, comprising 2,594 girls and 1,803 boys, completed the 18-item MCPES during one physical education class. The results of the study demonstrated that participants had the highest mean in task-involving climate and the lowest in autonomy and ego-involving climates. Additionally, autonomy, social relatedness, and task-involving climates were significantly and strongly correlated with each other, whereas the ego-involving climate had low or negligible correlations with the other climate dimensions. The construct validity of the MCPES was analyzed using confirmatory factor analysis. The statistical fit of the four-factor model, consisting of motivational climate factors supporting perceived autonomy, social relatedness, task-involvement, and ego-involvement, was satisfactory. The results of the reliability analysis showed acceptable internal consistencies for all four dimensions. The Motivational Climate in Physical Education Scale can be considered a psychometrically valid tool to measure motivational climate in Finnish Grade 9 students. Key Points: This study developed the Motivational Climate in School Physical Education Scale (MCPES). During the development process of the scale, a theoretical framework using dimensions of task- and ego-involving as well as autonomy- and social relatedness-supporting climates was constructed. These constructs were adopted from the self-determination and achievement goal theories. The statistical fit of the four-factor model of the

  15. Harvard Aging Brain Study: Dataset and accessibility.

    Science.gov (United States)

    Dagley, Alexander; LaPoint, Molly; Huijbers, Willem; Hedden, Trey; McLaren, Donald G; Chatwal, Jasmeer P; Papp, Kathryn V; Amariglio, Rebecca E; Blacker, Deborah; Rentz, Dorene M; Johnson, Keith A; Sperling, Reisa A; Schultz, Aaron P

    2017-01-01

    The Harvard Aging Brain Study is sharing its data with the global research community. The longitudinal dataset consists of a 284-subject cohort with the following modalities acquired: demographics, clinical assessment, comprehensive neuropsychological testing, clinical biomarkers, and neuroimaging. To promote more extensive analyses, imaging data was designed to be compatible with other publicly available datasets. A cloud-based system enables access to interested researchers with blinded data available contingent upon completion of a data usage agreement and administrative approval. Data collection is ongoing and currently in its fifth year. Copyright © 2015 Elsevier Inc. All rights reserved.

  16. Test of Gross Motor Development: Expert Validity, confirmatory validity and internal consistency

    Directory of Open Access Journals (Sweden)

    Nadia Cristina Valentini

    2008-12-01

    The Test of Gross Motor Development (TGMD-2) is an instrument used to evaluate children’s level of motor development. The objective of this study was to translate and verify the clarity and pertinence of the TGMD-2 items by experts, and the confirmatory factorial validity and the internal consistency, by means of test-retest, of the Portuguese TGMD-2. A cross-cultural translation was used to construct the Portuguese version. The participants of this study were 7 professionals and 587 children, from 27 schools (kindergarten and elementary), from 3 to 10 years old (51.1% boys and 48.9% girls). Each child was videotaped performing the test twice. The videotaped tests were then scored. The results indicated that the Portuguese version of the TGMD-2 contains clear and pertinent motor items, and demonstrated satisfactory indices of confirmatory factorial validity (χ2/gl = 3.38; Goodness-of-Fit Index = 0.95; Adjusted Goodness-of-Fit Index = 0.92; Tucker and Lewis’s Index of Fit = 0.83) and test-retest internal consistency (locomotion: r = 0.82; control of object: r = 0.88). The Portuguese TGMD-2 demonstrated validity and reliability for the sample investigated.

  17. Test of Gross Motor Development: expert validity, confirmatory validity and internal consistency

    Directory of Open Access Journals (Sweden)

    Nadia Cristina Valentini

    2008-01-01

    The Test of Gross Motor Development (TGMD-2) is an instrument used to evaluate children’s level of motor development. The objective of this study was to translate and verify the clarity and pertinence of the TGMD-2 items by experts, and the confirmatory factorial validity and the internal consistency, by means of test-retest, of the Portuguese TGMD-2. A cross-cultural translation was used to construct the Portuguese version. The participants of this study were 7 professionals and 587 children, from 27 schools (kindergarten and elementary), from 3 to 10 years old (51.1% boys and 48.9% girls). Each child was videotaped performing the test twice. The videotaped tests were then scored. The results indicated that the Portuguese version of the TGMD-2 contains clear and pertinent motor items, and demonstrated satisfactory indices of confirmatory factorial validity (χ2/gl = 3.38; Goodness-of-Fit Index = 0.95; Adjusted Goodness-of-Fit Index = 0.92; Tucker and Lewis’s Index of Fit = 0.83) and test-retest internal consistency (locomotion: r = 0.82; control of object: r = 0.88). The Portuguese TGMD-2 demonstrated validity and reliability for the sample investigated.

  18. WOrk-Related Questionnaire for UPper extremity disorders (WORQ-UP): Factor Analysis and Internal Consistency.

    Science.gov (United States)

    Aerts, Bas R; Kuijer, P Paul; Beumer, Annechien; Eygendaal, Denise; Frings-Dresen, Monique H

    2018-04-17

    To test a 17-item questionnaire, the WOrk-Related Questionnaire for UPper extremity disorders (WORQ-UP), for dimensionality of the items (factor analysis) and internal consistency. Cross-sectional study. Outpatient clinic. A consecutive sample of patients (N=150) consisting of all new referral patients (either from a general physician or another hospital) who visited the orthopedic outpatient clinic because of an upper extremity musculoskeletal disorder. Not applicable. Number and dimensionality of the factors in the WORQ-UP. Four factors with eigenvalues (EVs) >1.0 were found. The factors were named exertion, dexterity, tools & equipment, and mobility. The EVs of the factors were, respectively, 5.78, 2.38, 1.81, and 1.24. Together, the factors explained 65.9% of the variance. The Cronbach alpha values for these factors were, respectively, .88, .74, .87, and .66. The 17 items of the WORQ-UP map onto 4 factors (exertion, dexterity, tools & equipment, and mobility) with good internal consistency. Copyright © 2018 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.

  19. Internal Consistency and Concurrent Validity of the Questionnaire for Limitations and Restrictions Assessment in Children with ADHD

    Directory of Open Access Journals (Sweden)

    Luisa Matilde Salamanca-Duque

    2014-09-01

    Introduction: ADHD is one of the most common diagnoses in child psychiatry; its early diagnosis is of great importance for intervention in the family, school, and social environment. Based on the International Classification of Functioning, Disability and Health (ICF), a questionnaire was designed to assess activity limitations and participation restrictions in children with ADHD. The questionnaire was called the “CLARP-ADHD Parent and Teacher Version”. Objective: To determine the degree of internal consistency of the CLARP-ADHD questionnaire, and its concurrent validity with the Strengths and Difficulties Questionnaire (SDQ), parent and teacher version. Material and Methods: A sample of 203 children aged 6 to 12 with ADHD, currently attending school in five Colombian cities. The questionnaires were applied to parents and teachers. The internal consistency analysis was performed with Cronbach’s coefficient, and concurrent validity was assessed with the Spearman correlation coefficient, using multiple and single predictors in multiple and simple linear regression models. Results: A high internal consistency was found for the global questionnaires and for each of their domains: the CLARP-ADHD for parents showed an internal consistency of 0.83, and the CLARP-ADHD for teachers one of 0.93. Concurrent validity was found between the CLARP-ADHD and the SDQ parent and teacher versions; concurrence was also found between the CLARP-ADHD for Teachers and the SDQ for Teachers, as well as between the CLARP-ADHD for Parents and the CLARP-ADHD for Teachers (p < 0.001).

  20. Sharing Video Datasets in Design Research

    DEFF Research Database (Denmark)

    Christensen, Bo; Abildgaard, Sille Julie Jøhnk

    2017-01-01

    This paper examines how design researchers, design practitioners and design education can benefit from sharing a dataset. We present the Design Thinking Research Symposium 11 (DTRS11) as an exemplary project that implied sharing video data of design processes and design activity in natural settings...... with a large group of fellow academics from the international community of Design Thinking Research, for the purpose of facilitating research collaboration and communication within the field of Design and Design Thinking. This approach emphasizes the social and collaborative aspects of design research, where...... a multitude of appropriate perspectives and methods may be utilized in analyzing and discussing the singular dataset. The shared data is, from this perspective, understood as a design object in itself, which facilitates new ways of working, collaborating, studying, learning and educating within the expanding...

  1. A robust dataset-agnostic heart disease classifier from Phonocardiogram.

    Science.gov (United States)

    Banerjee, Rohan; Dutta Choudhury, Anirban; Deshpande, Parijat; Bhattacharya, Sakyajit; Pal, Arpan; Mandana, K M

    2017-07-01

    Automatic classification of normal and abnormal heart sounds is a popular area of research. However, building a robust algorithm unaffected by signal quality and patient demography is a challenge. In this paper we have analysed a wide range of Phonocardiogram (PCG) features in the time and frequency domains, along with morphological and statistical features, to construct a robust and discriminative feature set for dataset-agnostic classification of normal and cardiac patients. The large, open-access database made available in the Physionet 2016 challenge was used for feature selection, internal validation, and creation of training models. A second dataset of 41 PCG segments, collected from an Indian hospital using our in-house smartphone-based digital stethoscope, was used for performance evaluation. Our proposed methodology yielded sensitivity and specificity scores of 0.76 and 0.75, respectively, on the test dataset in classifying cardiovascular diseases. The methodology also outperformed three popular prior-art approaches when applied to the same dataset.
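
    The reported scores are standard confusion-matrix ratios. A minimal sketch of their computation with made-up labels (the label convention, 1 = cardiac patient and 0 = normal, is illustrative):

        import numpy as np

        def sensitivity_specificity(y_true, y_pred):
            """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
            tp = np.sum((y_true == 1) & (y_pred == 1))
            fn = np.sum((y_true == 1) & (y_pred == 0))
            tn = np.sum((y_true == 0) & (y_pred == 0))
            fp = np.sum((y_true == 0) & (y_pred == 1))
            return tp / (tp + fn), tn / (tn + fp)

        # Demo with fabricated predictions for 10 PCG segments.
        y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
        y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])
        print(sensitivity_specificity(y_true, y_pred))  # (0.8, 0.8)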

  2. The Internal Consistency and Validity of the Vaccination Attitudes Examination Scale: A Replication Study.

    Science.gov (United States)

    Wood, Louise; Smith, Michael; Miller, Christopher B; O'Carroll, Ronan E

    2018-06-19

    Vaccinations are important preventative health behaviors. The recently developed Vaccination Attitudes Examination (VAX) Scale aims to measure the reasons behind refusal/hesitancy regarding vaccinations. The aim of this replication study is to conduct an independent test of the newly developed VAX Scale in the UK. We tested (a) internal consistency (Cronbach's α); (b) convergent validity by assessing its relationships with beliefs about medication, medical mistrust, and perceived sensitivity to medicines; and (c) construct validity by testing how well the VAX Scale discriminated between vaccinators and nonvaccinators. A sample of 243 UK adults completed the VAX Scale, the Beliefs About Medicines Questionnaire, the Perceived Sensitivity to Medicines Scale, and the Medical Mistrust Index, in addition to demographics of age, gender, education levels, and social deprivation. Participants were asked (a) whether they received an influenza vaccination in the past year and (b) if they had a young child, whether they had vaccinated the young child against influenza in the past year. The VAX (a) demonstrated high internal consistency (α = .92); (b) was positively correlated with medical mistrust and beliefs about medicines, and less strongly correlated with perceived sensitivity to medicines; and (c) successfully differentiated parental influenza vaccinators from nonvaccinators. The VAX demonstrated good internal consistency, convergent validity, and construct validity in an independent UK sample. It appears to be a useful measure to help us understand the health beliefs that promote or deter vaccination behavior.

  3. The LANDFIRE Refresh strategy: updating the national dataset

    Science.gov (United States)

    Nelson, Kurtis J.; Connot, Joel A.; Peterson, Birgit E.; Martin, Charley

    2013-01-01

    The LANDFIRE Program provides comprehensive vegetation and fuel datasets for the entire United States. As with many large-scale ecological datasets, vegetation and landscape conditions must be updated periodically to account for disturbances, growth, and natural succession. The LANDFIRE Refresh effort was the first attempt to consistently update these products nationwide. It incorporated a combination of specific systematic improvements to the original LANDFIRE National data, remote sensing based disturbance detection methods, field collected disturbance information, vegetation growth and succession modeling, and vegetation transition processes. This resulted in the creation of two complete datasets for all 50 states: LANDFIRE Refresh 2001, which includes the systematic improvements, and LANDFIRE Refresh 2008, which includes the disturbance and succession updates to the vegetation and fuel data. The new datasets are comparable for studying landscape changes in vegetation type and structure over a decadal period, and provide the most recent characterization of fuel conditions across the country. The applicability of the new layers is discussed and the effects of using the new fuel datasets are demonstrated through a fire behavior modeling exercise using the 2011 Wallow Fire in eastern Arizona as an example.

  4. Tissue-Based MRI Intensity Standardization: Application to Multicentric Datasets

    Directory of Open Access Journals (Sweden)

    Nicolas Robitaille

    2012-01-01

    Intensity standardization in MRI aims at correcting scanner-dependent intensity variations. Existing simple and robust techniques aim at matching the input image histogram onto a standard, while we think that standardization should aim at matching spatially corresponding tissue intensities. In this study, we present a novel automatic technique, called STI for STandardization of Intensities, which not only shares the simplicity and robustness of histogram-matching techniques, but also incorporates tissue spatial intensity information. STI uses joint intensity histograms to determine intensity correspondence in each tissue between the input and standard images. We compared STI to an existing histogram-matching technique on two multicentric datasets, Pilot E-ADNI and ADNI, by measuring the intensity error with respect to the standard image after performing nonlinear registration. The Pilot E-ADNI dataset consisted of 3 subjects, each scanned at 7 different sites. The ADNI dataset consisted of 795 subjects scanned at more than 50 different sites. STI was superior to the histogram-matching technique, showing significantly better intensity matching for the brain white matter with respect to the standard image.
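
    For reference, the whole-image histogram-matching baseline that STI is compared against can be sketched as below. This simplified version matches global intensity CDFs only, whereas STI adds per-tissue correspondence via joint histograms; the array names and synthetic images are illustrative:

        import numpy as np

        def match_histogram(source, reference):
            """Remap source intensities so their CDF matches the reference CDF."""
            s_values, s_counts = np.unique(source.ravel(), return_counts=True)
            r_values, r_counts = np.unique(reference.ravel(), return_counts=True)
            s_cdf = np.cumsum(s_counts) / source.size
            r_cdf = np.cumsum(r_counts) / reference.size
            # For each source intensity, take the reference intensity at the same quantile.
            matched = np.interp(s_cdf, r_cdf, r_values)
            idx = np.searchsorted(s_values, source.ravel())
            return matched[idx].reshape(source.shape)

        # Demo: standardize a scanner-shifted "input scan" against a "standard" image.
        rng = np.random.default_rng(2)
        standard = rng.normal(100, 15, size=(64, 64))
        input_scan = rng.normal(120, 25, size=(64, 64))
        out = match_histogram(input_scan, standard)
        print(out.mean(), out.std())  # close to the standard image statistics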

  5. Dataset-driven research for improving recommender systems for learning

    NARCIS (Netherlands)

    Verbert, Katrien; Drachsler, Hendrik; Manouselis, Nikos; Wolpers, Martin; Vuorikari, Riina; Duval, Erik

    2011-01-01

    Verbert, K., Drachsler, H., Manouselis, N., Wolpers, M., Vuorikari, R., & Duval, E. (2011). Dataset-driven research for improving recommender systems for learning. In Ph. Long, & G. Siemens (Eds.), Proceedings of 1st International Conference Learning Analytics & Knowledge (pp. 44-53). February,

  6. Global Man-made Impervious Surface (GMIS) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Man-made Impervious Surface (GMIS) Dataset From Landsat consists of global estimates of fractional impervious cover derived from the Global Land Survey...

  7. Data assimilation and model evaluation experiment datasets

    Science.gov (United States)

    Lai, Chung-Cheng A.; Qian, Wen; Glenn, Scott M.

    1994-01-01

    The Institute for Naval Oceanography, in cooperation with Naval Research Laboratories and universities, executed the Data Assimilation and Model Evaluation Experiment (DAMEE) for the Gulf Stream region during fiscal years 1991-1993. Enormous effort has gone into the preparation of several high-quality and consistent datasets for model initialization and verification. This paper describes the preparation process, the temporal and spatial scopes, the contents, the structure, etc., of these datasets. The goal of DAMEE and the need for data in the four phases of the experiment are briefly stated. The preparation of the DAMEE datasets consisted of a series of processes: (1) collection of observational data; (2) analysis and interpretation; (3) interpolation using the Optimum Thermal Interpolation System package; (4) quality control and re-analysis; and (5) data archiving and software documentation. The data products from these processes included a time series of 3D fields of temperature and salinity, 2D fields of surface dynamic height and mixed-layer depth, analyses of the Gulf Stream and rings system, and bathythermograph profiles. To date, these are the most detailed and high-quality data for mesoscale ocean modeling, data assimilation, and forecasting research. Feedback from ocean modeling groups who tested these data was incorporated into their refinement. Suggested DAMEE data usages include (1) ocean modeling and data assimilation studies, (2) diagnostic and theoretical studies, and (3) comparisons with locally detailed observations.

  8. Factorial Validity and Internal Consistency of the Motivational Climate in Physical Education Scale

    Directory of Open Access Journals (Sweden)

    Markus Soini

    2014-03-01

    The aim of the study was to examine the construct validity and internal consistency of the Motivational Climate in Physical Education Scale (MCPES). A key element of the development process of the scale was establishing a theoretical framework that integrated the dimensions of task- and ego-involving climates in conjunction with autonomy- and social relatedness-supporting climates. These constructs were adopted from the self-determination and achievement goal theories. A sample of Finnish Grade 9 students, comprising 2,594 girls and 1,803 boys, completed the 18-item MCPES during one physical education class. The results of the study demonstrated that participants had the highest mean in task-involving climate and the lowest in autonomy and ego-involving climates. Additionally, autonomy, social relatedness, and task-involving climates were significantly and strongly correlated with each other, whereas the ego-involving climate had low or negligible correlations with the other climate dimensions. The construct validity of the MCPES was analyzed using confirmatory factor analysis. The statistical fit of the four-factor model, consisting of motivational climate factors supporting perceived autonomy, social relatedness, task-involvement, and ego-involvement, was satisfactory. The results of the reliability analysis showed acceptable internal consistencies for all four dimensions. The Motivational Climate in Physical Education Scale can be considered a psychometrically valid tool to measure motivational climate in Finnish Grade 9 students.

  9. Internal consistency of a five-item form of the Francis Scale of Attitude Toward Christianity among adolescent students.

    Science.gov (United States)

    Campo-Arias, Adalberto; Oviedo, Heidi Celina; Cogollo, Zuleima

    2009-04-01

    The short form of the Francis Scale of Attitude Toward Christianity (L. J. Francis, 1992) is a 7-item Likert-type scale that shows high homogeneity among adolescents. The psychometric performance of a shorter version of this scale has not been explored. The authors aimed to determine the internal consistency of a 5-item form of the Francis Scale of Attitude Toward Christianity among 405 students from a school in Cartagena, Colombia. The authors computed Cronbach's alpha coefficient for the 5 items with the highest corrected item-total score correlations. The version without Items 2 and 7 showed an internal consistency of .87. The 5-item version of the Francis Scale of Attitude Toward Christianity exhibited higher internal consistency than did the 7-item version. Future researchers should corroborate this finding.
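
    The item-selection step, retaining the items with the highest corrected item-total correlations, is mechanical enough to sketch. The following demo uses random responses and illustrative names; it is not the authors' procedure verbatim:

        import numpy as np

        def corrected_item_total(items):
            """Correlate each item with the total score of the *other* items."""
            k = items.shape[1]
            total = items.sum(axis=1)
            return np.array([np.corrcoef(items[:, i], total - items[:, i])[0, 1]
                             for i in range(k)])

        # Demo: keep the 5 of 7 items with the highest corrected correlations.
        rng = np.random.default_rng(3)
        responses = rng.integers(1, 6, size=(405, 7)).astype(float)
        r = corrected_item_total(responses)
        keep = sorted(np.argsort(r)[-5:].tolist())
        print("retained items:", keep)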

  10. Assessment of disabilities in stroke patients with apraxia : Internal consistency and inter-observer reliability

    NARCIS (Netherlands)

    van Heugten, CM; Dekker, J; Deelman, BG; Stehmann-Saris, JC; Kinebanian, A

    1999-01-01

    In this paper the internal consistency and inter-observer reliability of the assessment of disabilities in stroke patients with apraxia is presented. Disabilities were assessed by means of observation of activities of daily living (ADL). The study was conducted at occupational therapy departments in

  11. Assessment of disabilities in stroke patients with apraxia: internal consistency and inter-observer reliability.

    NARCIS (Netherlands)

    Heugten, C.M. van; Dekker, J.; Deelman, B.G.; Stehmann-Saris, J.C.; Kinebanian, A.

    1999-01-01

    In this paper the internal consistency and inter-observer reliability of the assessment of disabilities in stroke patients with apraxia is presented. Disabilities were assessed by means of observation of activities of daily living (ADL). The study was conducted at occupational therapy departments in

  12. A dataset of forest biomass structure for Eurasia.

    Science.gov (United States)

    Schepaschenko, Dmitry; Shvidenko, Anatoly; Usoltsev, Vladimir; Lakyda, Petro; Luo, Yunjian; Vasylyshyn, Roman; Lakyda, Ivan; Myklush, Yuriy; See, Linda; McCallum, Ian; Fritz, Steffen; Kraxner, Florian; Obersteiner, Michael

    2017-05-16

    The most comprehensive dataset of in situ destructive sampling measurements of forest biomass in Eurasia has been compiled from a combination of experiments undertaken by the authors and from scientific publications. Biomass is reported as four components: live trees (stem, bark, branches, foliage, roots); understory (above- and below-ground); green forest floor (above- and below-ground); and coarse woody debris (snags, logs, dead branches of living trees, and dead roots). The dataset consists of 10,351 unique records of sample plots and 9,613 sample trees from ca. 1,200 experiments for the period 1930-2014; there is some overlap between these two datasets. The dataset also contains other forest stand parameters, such as tree species composition, average age, tree height, and growing stock volume, when available. Such a dataset can be used for the development of models of biomass structure, biomass expansion factors, change detection in biomass structure, investigations into biodiversity and species distribution and the biodiversity-productivity relationship, as well as the assessment of the carbon pool and its dynamics, among many others.

  13. EEG datasets for motor imagery brain-computer interface.

    Science.gov (United States)

    Cho, Hohyun; Ahn, Minkyu; Ahn, Sangtae; Kwon, Moonyoung; Jun, Sung Chan

    2017-07-01

    Most investigators of brain-computer interface (BCI) research believe that BCI can be achieved through induced neuronal activity from the cortex, but not by evoked neuronal activity. Motor imagery (MI)-based BCI is one of the standard concepts of BCI, in that the user can generate induced activity by imagining motor movements. However, variations in performance over sessions and subjects are too severe to overcome easily; therefore, a basic understanding and investigation of BCI performance variation is necessary to find critical evidence of performance variation. Here we present not only EEG datasets for MI BCI from 52 subjects, but also the results of a psychological and physiological questionnaire, EMG datasets, the locations of 3D EEG electrodes, and EEGs for non-task-related states. We validated our EEG datasets by using the percentage of bad trials, event-related desynchronization/synchronization (ERD/ERS) analysis, and classification analysis. After conventional rejection of bad trials, we showed contralateral ERD and ipsilateral ERS in the somatosensory area, which are well-known patterns of MI. Finally, we showed that 73.08% of datasets (38 subjects) included reasonably discriminative information. Our EEG datasets included the information necessary to determine statistical significance; they consisted of well-discriminated datasets (38 subjects) and less-discriminative datasets. These may provide researchers with opportunities to investigate human factors related to MI BCI performance variation, and may also achieve subject-to-subject transfer by using metadata, including a questionnaire, EEG coordinates, and EEGs for non-task-related states. © The Authors 2017. Published by Oxford University Press.
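
    The ERD/ERS validation mentioned above is conventionally computed as a percentage band-power change relative to a pre-cue baseline window. A minimal single-channel sketch under that convention, on synthetic mu-band data (not the authors' pipeline):

        import numpy as np
        from scipy.signal import butter, filtfilt

        def erd_ers_percent(trials, fs, band=(8.0, 12.0), baseline=(0.0, 1.0)):
            """ERD/ERS curve for (n_trials, n_samples) epochs; negative
            values indicate desynchronization (ERD), positive ERS."""
            b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
            power = filtfilt(b, a, trials, axis=1) ** 2   # band power per sample
            mean_power = power.mean(axis=0)               # average over trials
            i0, i1 = int(baseline[0] * fs), int(baseline[1] * fs)
            ref = mean_power[i0:i1].mean()                # baseline reference power
            return 100.0 * (mean_power - ref) / ref

        # Demo: 20 synthetic 10 Hz trials whose amplitude halves after t = 2 s.
        fs = 250.0
        t = np.arange(0, 4, 1 / fs)
        rng = np.random.default_rng(4)
        signal = np.sin(2 * np.pi * 10 * t) * np.where(t < 2, 1.0, 0.5)
        trials = signal + rng.normal(0, 0.2, size=(20, t.size))
        curve = erd_ers_percent(trials, fs)
        print(curve[int(2.5 * fs)])  # clearly negative: ERD after t = 2 s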

  14. REM-3D Reference Datasets: Reconciling large and diverse compilations of travel-time observations

    Science.gov (United States)

    Moulik, P.; Lekic, V.; Romanowicz, B. A.

    2017-12-01

    A three-dimensional Reference Earth model (REM-3D) should ideally represent the consensus view of long-wavelength heterogeneity in the Earth's mantle through the joint modeling of large and diverse seismological datasets. This requires reconciliation of datasets obtained using various methodologies and identification of consistent features. The goal of REM-3D datasets is to provide a quality-controlled and comprehensive set of seismic observations that would not only enable construction of REM-3D, but also allow identification of outliers and assist in more detailed studies of heterogeneity. The community response to data solicitation has been enthusiastic with several groups across the world contributing recent measurements of normal modes, (fundamental mode and overtone) surface waves, and body waves. We present results from ongoing work with body and surface wave datasets analyzed in consultation with a Reference Dataset Working Group. We have formulated procedures for reconciling travel-time datasets that include: (1) quality control for salvaging missing metadata; (2) identification of and reasons for discrepant measurements; (3) homogenization of coverage through the construction of summary rays; and (4) inversions of structure at various wavelengths to evaluate inter-dataset consistency. In consultation with the Reference Dataset Working Group, we retrieved the station and earthquake metadata in several legacy compilations and codified several guidelines that would facilitate easy storage and reproducibility. We find strong agreement between the dispersion measurements of fundamental-mode Rayleigh waves, particularly when made using supervised techniques. The agreement deteriorates substantially in surface-wave overtones, for which discrepancies vary with frequency and overtone number. A half-cycle band of discrepancies is attributed to reversed instrument polarities at a limited number of stations, which are not reflected in the instrument response history

  15. International cooperation for the development of consistent and stable transportation regulations to promote and enhance safety and security

    International Nuclear Information System (INIS)

    Strosnider, J.

    2004-01-01

    International commerce of radioactive materials crosses national boundaries, linking separate regulatory institutions with a common purpose and making it necessary for these institutions to work together in order to achieve common safety goals in a manner that does not place an undue burden on industry and commerce. Widespread and increasing use of radioactive materials across the world has led to increases in the transport of radioactive materials. The demand for consistency in the oversight of international transport has also increased to prevent unnecessary delays and costs associated with incongruent or redundant regulatory requirements by the various countries through which radioactive material is transported. The International Atomic Energy Agency (IAEA) is the authority for international regulation of transportation of radioactive materials responsible for promulgation of regulations and guidance for the establishment of acceptable methods of transportation for the international community. As such, the IAEA is seen as the focal point for consensus building between its Member States to develop consistency in transportation regulations and reviews and to ensure the safe and secure transport of radioactive material. International cooperation is also needed to ensure stability in our regulatory processes. Changes to transportation regulations should be based on an anticipated safety benefit supported by risk information and insights gained from continuing experience, evaluation, and research studies. If we keep safety as the principle basis for regulatory changes, regulatory stability will be enhanced. Finally, as we endeavour to maintain consistency and stability in our international regulations, we must be mindful of the new security challenges that lay before the international community as a result of a changing terrorist environment. Terrorism is a problem of global concern that also requires international cooperation and support, as we look for ways to

  16. Internal Consistency of Performance Evaluations as a Function of Music Expertise and Excerpt Familiarity

    Science.gov (United States)

    Kinney, Daryl W.

    2009-01-01

    The purpose of this study was to examine the effects of music experience and excerpt familiarity on the internal consistency of performance evaluations. Participants included nonmusic majors who had not participated in high school music ensembles, nonmusic majors who had participated in high school music ensembles, music majors, and experts…

  17. Consistência interna da versão em português do Mini-Inventário de Fobia Social (Mini-SPIN) / Internal consistency of the Portuguese version of the Mini-Social Phobia Inventory (Mini-SPIN)

    Directory of Open Access Journals (Sweden)

    Gustavo J. Fonseca D'El Rey

    2007-01-01

    BACKGROUND: Social phobia is a severe anxiety disorder that brings disability and distress. OBJECTIVES: To investigate the internal consistency of the Portuguese version of the Mini-Social Phobia Inventory (Mini-SPIN). METHODS: We conducted a study of the internal consistency of the Mini-SPIN in a sample of 206 college students from the city of São Paulo, SP. RESULTS: The internal consistency of the instrument, analyzed by Cronbach's alpha coefficient, was 0.81. CONCLUSIONS: These findings suggest that the Portuguese version of the Mini-SPIN has good internal consistency, similar to that of the original English version.

  18. Internal consistency & validity of Indian Disability Evaluation and Assessment Scale (IDEAS in patients with schizophrenia

    Directory of Open Access Journals (Sweden)

    Sandeep Grover

    2014-01-01

    Background & objectives: The Indian Disability Evaluation and Assessment Scale (IDEAS) has been recommended for assessment and certification of disability by the Government of India (GOI). However, the psychometric properties of IDEAS as adopted by the GOI remain understudied. Our aim, thus, was to study the internal consistency and validity of IDEAS in patients with schizophrenia. Methods: A total of 103 consenting patients with residual schizophrenia were assessed for disability, quality of life (QOL) and psychopathology using the IDEAS, WHOQOL-100 and Positive and Negative Syndrome Scale (PANSS), respectively. Internal consistency was calculated using Cronbach's alpha. For construct validity, relations between IDEAS, psychopathology and QOL were studied. Results: The inter-item correlations for IDEAS were significant, with a Cronbach's alpha of 0.721. All item scores other than the score on communication and understanding, as well as the total and global IDEAS scores, correlated significantly with the positive, negative and general sub-scale scores and the total PANSS score. Communication and understanding was significantly related to the negative sub-scale score only. Total and global disability scores correlated negatively with all the domains of WHOQOL-100 (P < 0.01). The individual IDEAS item scores correlated negatively with the various WHOQOL-100 domains (P < 0.01). Interpretation & conclusions: This study's findings showed that the GOI-modified IDEAS has good internal consistency and construct validity as tested in patients with residual schizophrenia. Similar studies need to be done with other groups of patients.

  19. Do Countries Consistently Engage in Misinforming the International Community about Their Efforts to Combat Money Laundering? Evidence Using Benford's Law.

    Science.gov (United States)

    Deleanu, Ioana Sorina

    2017-01-01

    Indicators of compliance and efficiency in combatting money laundering, collected by EUROSTAT, are plagued with shortcomings. In this paper, I have carried out a forensic analysis on a 2003-2010 dataset of indicators of compliance and efficiency in combatting money laundering that European Union member states self-reported to EUROSTAT, and on the basis of which their efforts were evaluated. I used Benford's law to detect any anomalous statistical patterns and found that the statistical anomalies were also consistent with strategic manipulation. According to Benford's law, if we pick a random sample of numbers representing natural processes and look at the distribution of the first digits of these numbers, we see that, contrary to popular belief, digit 1 occurs most often, then digit 2, and so on, with digit 9 occurring in less than 5% of the sample. Since people are not intuitively good at creating truly random numbers, deviations from Benford's law can capture strategic alterations made without prior knowledge of the law. In order to eliminate other sources of deviation, I have compared deviations in situations where incentives and opportunities for manipulation existed and in situations where they did not. While my results are not conclusive proof of strategic manipulation, they signal that countries that faced incentives and opportunities to misinform the international community about their efforts to combat money laundering may have manipulated these indicators. Finally, my analysis points to the high potential for disruption that the manipulation of national statistics has, and calls for the acknowledgment that strategic manipulation can be an unintended consequence of the international community's pressure on countries to put combatting money laundering at the top of their national agenda.
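
    Benford's law itself is a one-line formula, P(d) = log10(1 + 1/d) for first digits d = 1..9, so a digit-distribution check is easy to sketch. The example below uses fabricated numbers, not the EUROSTAT indicators analyzed in the paper:

        import numpy as np
        from scipy.stats import chisquare

        # Benford's expected first-digit probabilities, d = 1..9.
        benford = np.log10(1 + 1 / np.arange(1, 10))

        def first_digits(values):
            """Leading decimal digit of each positive number."""
            v = np.abs(np.asarray(values, dtype=float))
            v = v[v > 0]
            return (v / 10 ** np.floor(np.log10(v))).astype(int)

        # Demo: lognormal data, which typically conform to Benford's law.
        rng = np.random.default_rng(5)
        reported = rng.lognormal(mean=4, sigma=2, size=500)
        observed = np.bincount(first_digits(reported), minlength=10)[1:]
        stat, p = chisquare(observed, f_exp=benford * observed.sum())
        print(f"chi2 = {stat:.1f}, p = {p:.3f}")  # a low p would flag anomalous digits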

  20. Do Countries Consistently Engage in Misinforming the International Community about Their Efforts to Combat Money Laundering? Evidence Using Benford's Law.

    Directory of Open Access Journals (Sweden)

    Ioana Sorina Deleanu

    Indicators of compliance and efficiency in combatting money laundering, collected by EUROSTAT, are plagued with shortcomings. In this paper, I have carried out a forensic analysis on a 2003-2010 dataset of indicators of compliance and efficiency in combatting money laundering that European Union member states self-reported to EUROSTAT, and on the basis of which their efforts were evaluated. I used Benford's law to detect any anomalous statistical patterns and found that the statistical anomalies were also consistent with strategic manipulation. According to Benford's law, if we pick a random sample of numbers representing natural processes and look at the distribution of the first digits of these numbers, we see that, contrary to popular belief, digit 1 occurs most often, then digit 2, and so on, with digit 9 occurring in less than 5% of the sample. Since people are not intuitively good at creating truly random numbers, deviations from Benford's law can capture strategic alterations made without prior knowledge of the law. In order to eliminate other sources of deviation, I have compared deviations in situations where incentives and opportunities for manipulation existed and in situations where they did not. While my results are not conclusive proof of strategic manipulation, they signal that countries that faced incentives and opportunities to misinform the international community about their efforts to combat money laundering may have manipulated these indicators. Finally, my analysis points to the high potential for disruption that the manipulation of national statistics has, and calls for the acknowledgment that strategic manipulation can be an unintended consequence of the international community's pressure on countries to put combatting money laundering at the top of their national agenda.

  1. Dataset of anomalies and malicious acts in a cyber-physical subsystem.

    Science.gov (United States)

    Laso, Pedro Merino; Brosset, David; Puentes, John

    2017-10-01

    This article presents a dataset produced to investigate how data and information quality estimations enable the detection of anomalies and malicious acts in cyber-physical systems. Data were acquired making use of a cyber-physical subsystem consisting of liquid containers for fuel or water, along with its automated control and data-acquisition infrastructure. The described data consist of temporal series representing five operational scenarios - normal, anomalies, breakdown, sabotages, and cyber-attacks - corresponding to 15 different real situations. The dataset is publicly available in the .zip file published with the article, in order to investigate and compare faulty operation detection and characterization methods for cyber-physical systems.

  2. Development of a SPARK Training Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Sayre, Amanda M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Olson, Jarrod R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2015-03-01

    In its first five years, the National Nuclear Security Administration’s (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability that captures safeguards knowledge to exist beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK’s intended analysis capability. The analysis demonstration sought to answer the

  3. A Dataset of Three Educational Technology Experiments on Differentiation, Formative Testing and Feedback

    Science.gov (United States)

    Haelermans, Carla; Ghysels, Joris; Prince, Fernao

    2015-01-01

    This paper describes a dataset with data from three individually randomized educational technology experiments on differentiation, formative testing and feedback during one school year for a group of 8th grade students in the Netherlands, using administrative data and the online motivation questionnaire of Boekaerts. The dataset consists of pre-…

  4. Review of ATLAS Open Data 8 TeV datasets, tools and activities

    CERN Document Server

    The ATLAS collaboration

    2018-01-01

    The ATLAS Collaboration has released two 8 TeV datasets and relevant simulated samples to the public for educational use. A number of groups within ATLAS have used these ATLAS Open Data 8 TeV datasets, developing tools and educational material to promote particle physics. The general aim of these activities is to provide simple and user-friendly interactive interfaces to simulate the procedures used by high-energy physics researchers. International Masterclasses introduce particle physics to high school students and have been studying 8 TeV ATLAS Open Data since 2015. Inspired by this success, a new ATLAS Open Data initiative was launched in 2016 for university students. A comprehensive educational platform was thus developed featuring a second 8 TeV dataset and a new set of educational tools. The 8 TeV datasets and associated tools are presented and discussed here, as well as a selection of activities studying the ATLAS Open Data 8 TeV datasets.

  5. Internal consistency, reliability, and temporal stability of the Oxford Happiness Questionnaire short-form: Test-retest data over two weeks

    OpenAIRE

    MCGUCKIN, CONOR

    2006-01-01

    The Oxford Happiness Questionnaire short-form is a recently developed eight-item measure of happiness. This study evaluated the internal consistency reliability and test-retest reliability of the Oxford Happiness Questionnaire short-form among 55 Northern Irish undergraduate university students who completed the measure on two occasions separated by two weeks. Internal consistency of the measure was satisfactory at both Time 1 (alpha = .62) and Time 2 (alpha = ....

  6. Correction of elevation offsets in multiple co-located lidar datasets

    Science.gov (United States)

    Thompson, David M.; Dalyander, P. Soupy; Long, Joseph W.; Plant, Nathaniel G.

    2017-04-07

    Topographic elevation data collected with airborne light detection and ranging (lidar) can be used to analyze short- and long-term changes to beach and dune systems. Analysis of multiple lidar datasets at Dauphin Island, Alabama, revealed systematic, island-wide elevation differences on the order of tens of centimeters that were not attributable to real-world change and, therefore, were likely to represent systematic sampling offsets. These offsets vary between the datasets but appear spatially consistent within a given survey. This report describes a method that was developed to identify and correct offsets between lidar datasets collected over the same site at different times, so that true elevation changes over time, associated with sediment accumulation or erosion, can be analyzed.
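
    One common way to estimate such a systematic offset (not necessarily the exact method of this report) is to difference co-registered elevation grids over terrain assumed stable between surveys and take a robust statistic of the residuals; everything below is an illustrative sketch:

        import numpy as np

        def estimate_offset(z_ref, z_new, stable_mask):
            """Median elevation difference over ground assumed unchanged
            between surveys; the median resists outliers from true change."""
            dz = z_new[stable_mask] - z_ref[stable_mask]
            return float(np.median(dz))

        # Demo: recover and remove a synthetic 12 cm island-wide offset.
        rng = np.random.default_rng(6)
        z_ref = rng.normal(2.0, 0.5, size=(200, 200))
        z_new = z_ref + 0.12 + rng.normal(0, 0.03, size=z_ref.shape)
        stable = np.ones_like(z_ref, dtype=bool)  # in practice: roads, structures, etc.
        offset = estimate_offset(z_ref, z_new, stable)
        z_corrected = z_new - offset
        print(f"estimated offset = {offset:.3f} m")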

  7. Factor Structure, Internal Consistency, and Screening Sensitivity of the GARS-2 in a Developmental Disabilities Sample

    Directory of Open Access Journals (Sweden)

    Martin A. Volker

    2016-01-01

    The Gilliam Autism Rating Scale-Second Edition (GARS-2) is a widely used screening instrument that assists in the identification and diagnosis of autism. The purpose of this study was to examine the factor structure, internal consistency, and screening sensitivity of the GARS-2 using ratings from special education teaching staff for a sample of 240 individuals with autism or other significant developmental disabilities. Exploratory factor analysis yielded a correlated three-factor solution similar to that found in 2005 by Lecavalier for the original GARS. Though the three factors appeared to be reasonably consistent with the intended constructs of the three GARS-2 subscales, the analysis indicated that more than a third of the GARS-2 items were assigned to the wrong subscale. Internal consistency estimates met or exceeded standards for screening and were generally higher than those in previous studies. Screening sensitivity was .65 and specificity was .81 for the Autism Index using a cut score of 85. Based on these findings, recommendations are made for instrument revision.
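
    An exploratory factor analysis with a correlated (oblique) solution of this kind can be sketched with the third-party factor_analyzer package, assuming it is installed; the item count and random ratings below are demo input, not the study's data:

        import numpy as np
        from factor_analyzer import FactorAnalyzer  # third-party package

        # Demo: rating-scale responses for 240 individuals on 42 items.
        rng = np.random.default_rng(7)
        ratings = rng.integers(0, 4, size=(240, 42)).astype(float)

        # Correlated three-factor solution via oblique (oblimin) rotation.
        fa = FactorAnalyzer(n_factors=3, rotation="oblimin")
        fa.fit(ratings)

        eigenvalues, _ = fa.get_eigenvalues()
        print("eigenvalues > 1:", int(np.sum(eigenvalues > 1)))
        # Assign each item to the factor with its largest absolute loading.
        assignment = np.abs(fa.loadings_).argmax(axis=1)
        print("items per factor:", np.bincount(assignment, minlength=3))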

  8. EPA Nanorelease Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — EPA Nanorelease Dataset. This dataset is associated with the following publication: Wohlleben, W., C. Kingston, J. Carter, E. Sahle-Demessie, S. Vazquez-Campos, B....

  9. Development of a SPARK Training Dataset

    International Nuclear Information System (INIS)

    Sayre, Amanda M.; Olson, Jarrod R.

    2015-01-01

    In its first five years, the National Nuclear Security Administration's (NNSA) Next Generation Safeguards Initiative (NGSI) sponsored more than 400 undergraduate, graduate, and post-doctoral students in internships and research positions (Wyse 2012). In the past seven years, the NGSI program has produced, and continues to produce, a large body of scientific, technical, and policy work in targeted core safeguards capabilities and human capital development activities. Not only does the NGSI program carry out activities across multiple disciplines, but also across all U.S. Department of Energy (DOE)/NNSA locations in the United States. However, products are not readily shared among disciplines and across locations, nor are they archived in a comprehensive library. Rather, knowledge of NGSI-produced literature is localized to the researchers, clients, and internal laboratory/facility publication systems such as the Electronic Records and Information Capture Architecture (ERICA) at the Pacific Northwest National Laboratory (PNNL). There is also no incorporated way of analyzing existing NGSI literature to determine whether the larger NGSI program is achieving its core safeguards capabilities and activities. A complete library of NGSI literature could prove beneficial to a cohesive, sustainable, and more economical NGSI program. The Safeguards Platform for Automated Retrieval of Knowledge (SPARK) has been developed as a knowledge storage, retrieval, and analysis capability that captures safeguards knowledge so that it exists beyond the lifespan of NGSI. During the development process, it was necessary to build a SPARK training dataset (a corpus of documents) for initial entry into the system and for demonstration purposes. We manipulated these data to gain new information about the breadth of NGSI publications, and evaluated the science-policy interface at PNNL as a practical demonstration of SPARK's intended analysis capability. The analysis demonstration sought to answer ...

  10. Internal Affairs Allegations

    Data.gov (United States)

    Montgomery County of Maryland — This dataset contains allegations brought to the attention of the Internal Affairs Division either through external complaints or internal complaint or recognition....

  11. Introduction of a simple-model-based land surface dataset for Europe

    Science.gov (United States)

    Orth, Rene; Seneviratne, Sonia I.

    2015-04-01

    Land surface hydrology can play a crucial role during extreme events such as droughts, floods and even heat waves. We introduce in this study a new hydrological dataset for Europe that consists of soil moisture, runoff and evapotranspiration (ET). It is derived with a simple water balance model (SWBM) forced with precipitation, temperature and net radiation. The SWBM dataset extends over the period 1984-2013 with a daily time step and 0.5° × 0.5° resolution. We employ a novel calibration approach, in which we consider 300 random parameter sets chosen from an observation-based range. Using several independent validation datasets representing soil moisture (or terrestrial water content), ET and streamflow, we identify the best performing parameter set and hence the new dataset. To illustrate its usefulness, the SWBM dataset is compared against several state-of-the-art datasets (ERA-Interim/Land, MERRA-Land, GLDAS-2-Noah, simulations of the Community Land Model Version 4), using all validation datasets as reference. For soil moisture dynamics it outperforms the benchmarks. Therefore the SWBM soil moisture dataset constitutes a reasonable alternative to sparse measurements, little-validated model results, or proxy data such as precipitation indices. Also in terms of runoff the SWBM dataset performs well, whereas the evaluation of the SWBM ET dataset is overall satisfactory, but the dynamics are less well captured for this variable. This highlights the limitations of the dataset, as it is based on a simple model that uses uniform parameter values. Hence some processes impacting ET dynamics may not be captured, and quality issues may occur in regions with complex terrain. Even though the SWBM is well calibrated, it cannot replace more sophisticated models; but as their calibration is a complex task the present dataset may serve as a benchmark in future. In addition we investigate the sources of skill of the SWBM dataset and find that the parameter set has a similar ...
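The calibration strategy described here (sample a few hundred parameter sets from plausible ranges, run the model for each, and keep the best-scoring set) is easy to sketch. The following toy example substitutes a single-bucket water balance for the actual SWBM equations, which are not given in the record; the model structure, parameter names, and skill metric are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def run_bucket(precip, pet, capacity, gamma):
    """Toy single-bucket water balance (NOT the published SWBM): runoff
    and ET scale with relative soil wetness raised to a shape parameter."""
    storage = np.empty_like(precip)
    s = 0.5 * capacity
    for t in range(len(precip)):
        wet = (s / capacity) ** gamma
        s += precip[t] * (1 - wet) - pet[t] * wet   # infiltration minus ET
        s = min(max(s, 0.0), capacity)
        storage[t] = s
    return storage

# Synthetic forcing and pseudo-observations, for demonstration only.
precip = rng.exponential(2.0, 365)
pet = np.full(365, 3.0)
obs = run_bucket(precip, pet, capacity=400.0, gamma=1.2)

# Draw 300 random parameter sets from plausible ranges; keep the best.
best = max(
    ({"capacity": rng.uniform(100, 800), "gamma": rng.uniform(0.5, 3.0)}
     for _ in range(300)),
    key=lambda p: np.corrcoef(run_bucket(precip, pet, **p), obs)[0, 1],
)
print(best)
```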

  12. The NASA Subsonic Jet Particle Image Velocimetry (PIV) Dataset

    Science.gov (United States)

    Bridges, James; Wernet, Mark P.

    2011-01-01

    Many tasks in fluids engineering require prediction of the turbulence of jet flows. The present report documents the single-point statistics of velocity, mean and variance, of cold and hot jet flows. The jet velocities ranged from 0.5 to 1.4 times the ambient speed of sound, and temperatures ranged from unheated to a static temperature ratio of 2.7. Further, the report assesses the accuracy of the data, i.e., establishes uncertainties for the data. This paper covers the following five tasks: (1) Document acquisition and processing procedures used to create the particle image velocimetry (PIV) datasets. (2) Compare PIV data with hotwire and laser Doppler velocimetry (LDV) data published in the open literature. (3) Compare different datasets acquired at the same flow conditions in multiple tests to establish uncertainties. (4) Create a consensus dataset for a range of hot jet flows, including uncertainty bands. (5) Analyze this consensus dataset for self-consistency and compare jet characteristics to those of the open literature. The final objective was fulfilled by using the potential core length and the spread rate of the half-velocity radius to collapse the mean and turbulent velocity fields over the first 20 jet diameters.
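The collapse in task (5) rests on rescaling the axial coordinate by a jet-specific length scale. A minimal sketch of one such rescaling, assuming hypothetical centerline profile arrays and taking the potential core length as the station where the centerline velocity decays to 90% of the exit velocity (one common convention; the report's exact definition may differ):

```python
import numpy as np

def collapse_centerline(x_over_d, u_centerline, u_exit):
    """Rescale axial distance by the potential core length Xc so that
    centerline decay curves from different jets fall on one curve."""
    u_norm = u_centerline / u_exit
    # Interpolate for the x/D where u_norm first drops to 0.9; np.interp
    # needs an increasing abscissa, so traverse the decaying curve backwards.
    xc = np.interp(0.9, u_norm[::-1], x_over_d[::-1])
    return x_over_d / xc, u_norm

# Hypothetical usage for one operating condition:
# x_scaled, u_norm = collapse_centerline(x_over_d, u_cl, u_exit)
```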

  13. Public Availability to ECS Collected Datasets

    Science.gov (United States)

    Henderson, J. F.; Warnken, R.; McLean, S. J.; Lim, E.; Varner, J. D.

    2013-12-01

    Coastal nations have spent considerable resources exploring the limits of their extended continental shelf (ECS) beyond 200 nm. Although these studies are funded to fulfill requirements of the UN Convention on the Law of the Sea, the investments are producing new datasets in frontier areas of Earth's oceans that will be used to understand, explore, and manage the seafloor and sub-seafloor for decades to come. Although many of these datasets are considered proprietary until a nation's potential ECS has become 'final and binding', an increasing amount of data are being released and utilized by the public. Datasets include multibeam, seismic reflection/refraction, bottom sampling, and geophysical data. The U.S. ECS Project, a multi-agency collaboration whose mission is to establish the full extent of the continental shelf of the United States consistent with international law, relies heavily on data and accurate, standard metadata. The United States has made it a priority to make available to the public all data collected with ECS funding as quickly as possible. The National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) supports this objective by partnering with academia and other federal government mapping agencies to archive, inventory, and deliver marine mapping data in a coordinated, consistent manner. This includes ensuring quality, standard metadata and developing and maintaining data delivery capabilities built on modern digital data archives. Other countries, such as Ireland, have submitted their ECS data for public availability and many others have made pledges to participate in the future. The data services provided by NGDC support the U.S. ECS effort as well as many developing nations' ECS efforts through the U.N. Environmental Program. Modern discovery, visualization, and delivery of scientific data and derived products that span national and international sources of data ensure the greatest re-use of data and ...

  14. International research networks in pharmaceuticals

    DEFF Research Database (Denmark)

    Cantner, Uwe; Rake, Bastian

    2014-01-01

    Knowledge production and scientific research have become increasingly more collaborative and international, particularly in pharmaceuticals. We analyze this tendency in general, and tie formation in international research networks on the country level in particular. Based on a unique dataset of scientific publications related to pharmaceutical research and applying social network analysis, we find that both the number of countries and their connectivity increase in almost all disease-group-specific networks. The cores of the networks consist of high-income OECD countries and remain rather stable over time. Using network regression techniques to analyze the network dynamics, our results indicate that accumulative advantages based on connectedness and multi-connectivity are positively related to changes in the countries' collaboration intensity, whereas various indicators on similarity between ...

  15. An overview of coefficient alpha and a reliability matrix for estimating adequacy of internal consistency coefficients with psychological research measures.

    Science.gov (United States)

    Ponterotto, Joseph G; Ruckdeschel, Daniel E

    2007-12-01

    The present article addresses issues in reliability assessment that are often neglected in psychological research, such as acceptable levels of internal consistency for research purposes, factors affecting the magnitude of coefficient alpha (alpha), and considerations for interpreting alpha within the research context. A new reliability matrix anchored in classical test theory is introduced to help researchers judge the adequacy of internal consistency coefficients for research measures. Guidelines and cautions in applying the matrix are provided.
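For reference, coefficient alpha is computed from the item variances and the variance of the total score. A minimal sketch, assuming a hypothetical respondents-by-items score matrix:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_respondents, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars / total_var)

# Toy usage: five respondents answering three items on a 1-5 scale.
print(cronbach_alpha([[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]))
```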

  16. Glyco-centric lectin magnetic bead array (LeMBA) proteomics dataset of human serum samples from healthy, Barrett's esophagus and esophageal adenocarcinoma individuals

    Directory of Open Access Journals (Sweden)

    Alok K. Shah

    2016-06-01

    This data article describes serum glycoprotein biomarker discovery and qualification datasets generated using lectin magnetic bead array (LeMBA)-mass spectrometry techniques, as reported in "Serum glycoprotein biomarker discovery and qualification pipeline reveals novel diagnostic biomarker candidates for esophageal adenocarcinoma" [1]. Serum samples collected from healthy, metaplastic Barrett's esophagus (BE) and esophageal adenocarcinoma (EAC) individuals were profiled for glycoprotein subsets via differential lectin binding. The biomarker discovery proteomics dataset, consisting of 20 individual lectin pull-downs for 29 serum samples with a spiked-in internal standard chicken ovalbumin protein, has been deposited in the PRIDE partner repository of the ProteomeXchange Consortium with the dataset identifier PRIDE: http://www.ebi.ac.uk/pride/archive/projects/PXD002442. Annotated MS/MS spectra for the peptide identifications can be viewed using MS-Viewer (http://prospector2.ucsf.edu/prospector/cgi-bin/msform.cgi?form=msviewer) using search key "jn7qafftux". The qualification dataset contained 6-lectin pulldown-coupled multiple reaction monitoring-mass spectrometry (MRM-MS) data for 41 protein candidates from 60 serum samples. This dataset is available as supplemental files with the original publication [1].

  17. Internal Consistency of the easyCBM© CCSS Reading Measures: Grades 3-8. Technical Report #1407

    Science.gov (United States)

    Guerreiro, Meg; Alonzo, Julie; Tindal, Gerald

    2014-01-01

    This technical report documents findings from a study of the internal consistency and split-half reliability of the easyCBM© CCSS Reading measures, grades 3-8. Data, drawn from an extant data set gathered in school year 2013-2014, include scores from over 150,000 students' fall and winter benchmark assessments. Findings suggest that the easyCBM©…
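Split-half reliability of the kind reported here is typically computed by correlating two half-tests and applying the Spearman-Brown correction. A minimal sketch, assuming a hypothetical students-by-items score matrix and an odd-even split:

```python
import numpy as np

def split_half_reliability(items):
    """Odd-even split-half reliability with the Spearman-Brown correction:
    r_sb = 2r / (1 + r), where r correlates the two half-test scores."""
    items = np.asarray(items, dtype=float)
    odd_total = items[:, 0::2].sum(axis=1)
    even_total = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd_total, even_total)[0, 1]
    return 2 * r / (1 + r)
```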

  18. How Well Does the Sum Score Summarize the Test? Summability as a Measure of Internal Consistency

    NARCIS (Netherlands)

    Goeman, J.J.; De, Jong N.H.

    2018-01-01

    Many researchers use Cronbach's alpha to demonstrate internal consistency, even though it has been shown numerous times that Cronbach's alpha is not suitable for this. Because the intention of questionnaire and test constructers is to summarize the test by its overall sum score, we advocate ...

  19. The memory failures of everyday questionnaire (MFE): internal consistency and reliability.

    Science.gov (United States)

    Montejo Carrasco, Pedro; Montenegro, Peña Mercedes; Sueiro, Manuel J

    2012-07-01

    The Memory Failures of Everyday Questionnaire (MFE) is one of the most widely-used instruments to assess memory failures in daily life. The original scale has nine response options, making it difficult to apply; we created a three-point scale (0-1-2) with response choices that make it easier to administer. We examined the two versions' equivalence in a sample of 193 participants between 19 and 64 years of age. The test-retest reliability and internal consistency of the version we propose were also computed in a sample of 113 people. Several indicators attest to the two forms' equivalence, among them the correlation between the items' means (r = .94; p < .001) [...]. The MFE 0-2 provides a brief, simple evaluation, so we recommend it for use in clinical practice as well as research.

  20. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    The datasets presented in this article are related to the research articles entitled "Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies" (Bennike et al., 2015 [1]) and "Proteome Analysis of Rheumatoid Arthritis Gut Mucosa" (Bennike et al., 2017 [2]). The data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples.

  1. [Discomfort associated with dental extraction surgery and development of a questionnaire (QCirDental). Part I: Impacts and internal consistency].

    Science.gov (United States)

    Bortoluzzi, Marcelo Carlos; Martins, Luciana Dorochenko; Takahashi, André; Ribeiro, Bianca; Martins, Ligiane; Pinto, Marcia Helena Baldani

    2018-01-01

    The scope of this study was to develop and validate a questionnaire (QCirDental) to measure the impacts associated with dental extraction surgery. The QCirDental questionnaire was developed in two steps: (1) question and item generation and selection, and (2) pretest of the questionnaire with evaluation of its measurement properties (internal consistency and responsiveness). The sample was composed of 123 patients. None of the patients had any difficulty in understanding the QCirDental. The instrument was found to have excellent internal consistency, with a Cronbach's alpha reliability coefficient of 0.83. The principal component analysis (Kaiser-Meyer-Olkin measure of sampling adequacy of 0.72 and Bartlett's test of sphericity with p < 0.001) showed six dimensions explaining 67.5% of the variance. The QCirDental presented excellent internal consistency, being a questionnaire that is easy to read and understand with adequate semantic and content validity. More than 80% of the patients who underwent dental extraction reported some degree of discomfort within the perioperative period, which highlights the necessity of assessing the quality of care and the impacts of dental extraction surgery.

  2. Do Countries Consistently Engage in Misinforming the International Community about Their Efforts to Combat Money Laundering? Evidence Using Benford’s Law

    Science.gov (United States)

    2017-01-01

    Indicators of compliance and efficiency in combatting money laundering, collected by EUROSTAT, are plagued with shortcomings. In this paper, I have carried out a forensic analysis on a 2003–2010 dataset of indicators of compliance and efficiency in combatting money laundering that European Union member states self-reported to EUROSTAT, and on the basis of which their efforts were evaluated. I used Benford's law to detect any anomalous statistical patterns and found statistical anomalies that were also consistent with strategic manipulation. According to Benford's law, if we pick a random sample of numbers representing natural processes and look at the distribution of the first digits of these numbers, we see that, contrary to popular belief, digit 1 occurs most often, then digit 2, and so on, with digit 9 occurring in less than 5% of the sample. Since people are not intuitively good at creating truly random numbers, deviations from Benford's law can capture strategic alterations made without prior knowledge of the law. In order to eliminate other sources of deviation, I have compared deviations in situations where incentives and opportunities for manipulation existed and in situations where they did not. While my results are not conclusive proof of strategic manipulation, they signal that countries that faced incentives and opportunities to misinform the international community about their efforts to combat money laundering may have manipulated these indicators. Finally, my analysis points to the high potential for disruption that the manipulation of national statistics carries, and calls for the acknowledgment that strategic manipulation can be an unintended consequence of the international community's pressure on countries to put combatting money laundering at the top of their national agenda. PMID:28122058
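A first-digit test of the kind used here is straightforward to reproduce. A minimal sketch, assuming a hypothetical one-dimensional array of reported indicator values:

```python
import numpy as np
from scipy.stats import chisquare

def benford_first_digit_test(values):
    """Compare observed leading-digit frequencies with Benford's law,
    P(d) = log10(1 + 1/d), via a chi-square goodness-of-fit test."""
    v = np.abs(np.asarray(values, dtype=float))
    v = v[v > 0]
    first = (v / 10 ** np.floor(np.log10(v))).astype(int)  # leading digit, 1-9
    observed = np.bincount(first, minlength=10)[1:10]
    expected = np.log10(1 + 1 / np.arange(1, 10)) * observed.sum()
    return chisquare(observed, expected)

# Under Benford's law digit 1 leads ~30.1% of values and digit 9 only ~4.6%.
```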

  3. Link between self-consistent pressure profiles and electron internal transport barriers in tokamaks

    Energy Technology Data Exchange (ETDEWEB)

    Razumova, K A [Nuclear Fusion Institute, RRC 'Kurchatov Institute', 123182 Moscow (Russian Federation); Andreev, V F [Nuclear Fusion Institute, RRC 'Kurchatov Institute', 123182 Moscow (Russian Federation); Donne, A J H [FOM-Institute for Plasma Physics Rijnhuizen, Association EURATOM-FOM, partner in the Trilateral Euregio Cluster, PO Box 1207, 3430 BE Nieuwegein (Netherlands); Hogeweij, G M D [FOM-Institute for Plasma Physics Rijnhuizen, Association EURATOM-FOM, partner in the Trilateral Euregio Cluster, PO Box 1207, 3430 BE Nieuwegein (Netherlands); Lysenko, S E [Nuclear Fusion Institute, RRC 'Kurchatov Institute', 123182 Moscow (Russian Federation); Shelukhin, D A [Nuclear Fusion Institute, RRC 'Kurchatov Institute', 123182 Moscow (Russian Federation); Spakman, G W [FOM-Institute for Plasma Physics Rijnhuizen, Association EURATOM-FOM, partner in the Trilateral Euregio Cluster, PO Box 1207, 3430 BE Nieuwegein (Netherlands); Vershkov, V A [Nuclear Fusion Institute, RRC 'Kurchatov Institute', 123182 Moscow (Russian Federation); Zhuravlev, V A [Nuclear Fusion Institute, RRC 'Kurchatov Institute', 123182 Moscow (Russian Federation)]

    2006-09-15

    Tokamak plasmas have a tendency to self-organization: the plasma pressure profiles obtained in different operational regimes and even in various tokamaks may be represented by a single typical curve, called the self-consistent pressure profile. About a decade ago, local zones with enhanced confinement were discovered in tokamak plasmas. These zones are referred to as internal transport barriers (ITBs) and they can act on the electron and/or ion fluid. Here the pressure gradients can largely exceed the gradients dictated by profile consistency. So the existence of ITBs seems to be in contradiction with the self-consistent pressure profiles (this is also often referred to as profile resilience or profile stiffness). In this paper we will discuss the interplay between profile consistency and ITBs. A summary of the cumulative information obtained from T-10, RTP and TEXTOR is given, and a coherent explanation of the main features of the observed phenomena is suggested. Both phenomena, the self-consistent profile and the ITB, are connected with the density of rational magnetic surfaces, where the turbulent cells are situated. The distance between these cells determines the level of their interaction, and therefore the level of the turbulent transport. This process regulates the plasma pressure profile. If the distance is wide, the turbulent flux may be diminished and the ITB may be formed. In regions with rarefied surfaces steeper pressure gradients are possible without instantaneously inducing pressure-driven instabilities, which force the profiles back to their self-consistent shapes. Also it can be expected that the ITB region is wider for lower dq/dρ (more rarefied surfaces).

  4. Towards interoperable and reproducible QSAR analyses: Exchange of datasets.

    Science.gov (United States)

    Spjuth, Ola; Willighagen, Egon L; Guha, Rajarshi; Eklund, Martin; Wikberg, Jarl Es

    2010-06-30

    QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join, extend, and combine datasets and hence work collectively, but ...

  5. Towards interoperable and reproducible QSAR analyses: Exchange of datasets

    Directory of Open Access Journals (Sweden)

    Spjuth Ola

    2010-06-01

    Background: QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises addition of chemical structures as well as selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses and drastically constraining collaborations and re-use of data. Results: We present a step towards standardizing QSAR analyses by defining interoperable and reproducible QSAR datasets, consisting of an open XML format (QSAR-ML) which builds on an open and extensible descriptor ontology. The ontology provides an extensible way of uniquely defining descriptors for use in QSAR experiments, and the exchange format supports multiple versioned implementations of these descriptors. Hence, a dataset described by QSAR-ML makes its setup completely reproducible. We also provide a reference implementation as a set of plugins for Bioclipse which simplifies setup of QSAR datasets, and allows for exporting in QSAR-ML as well as old-fashioned CSV formats. The implementation facilitates addition of new descriptor implementations from locally installed software and remote Web services; the latter is demonstrated with REST and XMPP Web services. Conclusions: Standardized QSAR datasets open up new ways to store, query, and exchange data for subsequent analyses. QSAR-ML supports completely reproducible creation of datasets, solving the problems of defining which software components were used and their versions, and the descriptor ontology eliminates confusion regarding descriptors by defining them crisply. This makes it easy to join ...

  6. EPA Office of Water (OW): 2002 Impaired Waters Baseline NHDPlus Indexed Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset consists of geospatial and attribute data identifying the spatial extent of state-reported impaired waters (EPA's Integrated Reporting categories 4a,...

  7. NGO Presence and Activity in Afghanistan, 2000–2014: A Provincial-Level Dataset

    Directory of Open Access Journals (Sweden)

    David F. Mitchell

    2017-06-01

    This article introduces a new provincial-level dataset on non-governmental organizations (NGOs) in Afghanistan. The data—which are freely available for download—provide information on the locations and sectors of activity of 891 international and local (Afghan) NGOs that operated in the country between 2000 and 2014. A summary and visualization of the data is presented in the article following a brief historical overview of NGOs in Afghanistan. Links to download the full dataset are provided in the conclusion.

  8. Evaluating the factor structure, item analyses, and internal consistency of hospital anxiety and depression scale in Iranian infertile patients

    Directory of Open Access Journals (Sweden)

    Payam Amini

    2017-09-01

    Background: The hospital anxiety and depression scale (HADS) is a common screening tool designed to measure the level of anxiety and depression in different factor structures, and it has been extensively used in non-psychiatric populations and individuals experiencing fertility problems. Objective: The aims of this study were to evaluate the factor structure, item analyses, and internal consistency of the HADS in Iranian infertile patients. Materials and Methods: This cross-sectional study included 651 infertile patients (248 men and 403 women) referred to a referral infertility center in Tehran, Iran between January 2014 and January 2015. Confirmatory factor analysis was used to determine the underlying factor structure of the HADS among one-, two-, and three-factor models. Several goodness-of-fit indices were utilized, such as the comparative, normed and goodness-of-fit indices, the Akaike information criterion, and the root mean squared error of approximation. In addition to the HADS, the Satisfaction with Life Scale (SWLS) questionnaire as well as demographic and clinical information were administered to all patients. Results: The goodness-of-fit indices from the CFAs showed that the three-factor and one-factor models provided the best and worst fit, respectively, to the total, male, and female datasets compared to the other factor-structure models for the infertile patients. The Cronbach's alphas for the anxiety and depression subscales were 0.866 and 0.753, respectively. The HADS subscales significantly correlated with the SWLS, indicating acceptable convergent validity. Conclusion: The HADS was found to be a three-factor structure screening instrument in the field of infertility.

  9. A global water resources ensemble of hydrological models: the eartH2Observe Tier-1 dataset

    Science.gov (United States)

    Schellekens, Jaap; Dutra, Emanuel; Martínez-de la Torre, Alberto; Balsamo, Gianpaolo; van Dijk, Albert; Sperna Weiland, Frederiek; Minvielle, Marie; Calvet, Jean-Christophe; Decharme, Bertrand; Eisner, Stephanie; Fink, Gabriel; Flörke, Martina; Peßenteiner, Stefanie; van Beek, Rens; Polcher, Jan; Beck, Hylke; Orth, René; Calton, Ben; Burke, Sophia; Dorigo, Wouter; Weedon, Graham P.

    2017-07-01

    The dataset presented here consists of an ensemble of 10 global hydrological and land surface models for the period 1979-2012, produced using a reanalysis-based meteorological forcing dataset (0.5° resolution). The dataset represents the current state of the art in global hydrological modelling and serves as a benchmark for further improvements in the coming years. A signal-to-noise ratio analysis revealed low inter-model agreement over (i) snow-dominated regions and (ii) tropical rainforest and monsoon areas. The large uncertainty of precipitation in the tropics is not reflected in the ensemble runoff. Verification of the results against benchmark datasets for evapotranspiration, snow cover, snow water equivalent, soil moisture anomaly and total water storage anomaly using the tools from the International Land Model Benchmarking Project (ILAMB) showed overall useful model performance, while the ensemble mean generally outperformed the single model estimates. The results also show that there is currently no single best model for all variables and that model performance is spatially variable. In our unconstrained model runs the ensemble mean of total runoff into the ocean was 46 268 km3 yr-1 (334 kg m-2 yr-1), while the ensemble mean of total evaporation was 537 kg m-2 yr-1. All data are made available openly through a Water Cycle Integrator portal (WCI, wci.earth2observe.eu), and via direct HTTP and FTP download. The portal follows the protocols of the Open Geospatial Consortium, such as OPeNDAP, WCS and WMS. The DOI for the data is https://doi.org/10.5281/zenodo.167070.

  10. A global water resources ensemble of hydrological models: the eartH2Observe Tier-1 dataset

    Directory of Open Access Journals (Sweden)

    J. Schellekens

    2017-07-01

    The dataset presented here consists of an ensemble of 10 global hydrological and land surface models for the period 1979–2012, produced using a reanalysis-based meteorological forcing dataset (0.5° resolution). The dataset represents the current state of the art in global hydrological modelling and serves as a benchmark for further improvements in the coming years. A signal-to-noise ratio analysis revealed low inter-model agreement over (i) snow-dominated regions and (ii) tropical rainforest and monsoon areas. The large uncertainty of precipitation in the tropics is not reflected in the ensemble runoff. Verification of the results against benchmark datasets for evapotranspiration, snow cover, snow water equivalent, soil moisture anomaly and total water storage anomaly using the tools from the International Land Model Benchmarking Project (ILAMB) showed overall useful model performance, while the ensemble mean generally outperformed the single model estimates. The results also show that there is currently no single best model for all variables and that model performance is spatially variable. In our unconstrained model runs the ensemble mean of total runoff into the ocean was 46 268 km3 yr−1 (334 kg m−2 yr−1), while the ensemble mean of total evaporation was 537 kg m−2 yr−1. All data are made available openly through a Water Cycle Integrator portal (WCI, wci.earth2observe.eu), and via direct HTTP and FTP download. The portal follows the protocols of the Open Geospatial Consortium, such as OPeNDAP, WCS and WMS. The DOI for the data is https://doi.org/10.5281/zenodo.167070.
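The signal-to-noise analysis mentioned in both records reduces, in one common formulation, to comparing the ensemble mean against the inter-model spread at each grid cell. A minimal sketch under that assumption (the paper's exact definition may differ), for a hypothetical array of model runs:

```python
import numpy as np

def ensemble_snr(runs):
    """Signal-to-noise ratio for runs stacked as (n_models, time, lat, lon):
    the ensemble-mean signal divided by the inter-model spread. Low values
    flag regions of poor inter-model agreement, e.g. snow-dominated areas."""
    signal = runs.mean(axis=0)
    noise = runs.std(axis=0, ddof=1)
    return np.abs(signal) / np.where(noise > 0, noise, np.nan)

# Hypothetical usage: snr = ensemble_snr(runoff_runs); low-SNR cells can then
# be masked or mapped to locate where the ensemble disagrees most.
```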

  11. Productivity Levels in Distributive Trades : A New ICOP Dataset for OECD Countries

    NARCIS (Netherlands)

    Timmer, Marcel P.; Ypma, Gerard

    2006-01-01

    This study provides a new dataset for international comparisons of labour productivity levels in distributive trade (retail and wholesale trade) between OECD countries. The productivity level comparisons are based on a harmonised set of Purchasing Power Parities (PPPs) for 1997 using the ...

  12. The Internal Consistency Reliability of the Katz-Francis Scale of Attitude toward Judaism among Australian Jews

    Directory of Open Access Journals (Sweden)

    Patrick Lumbroso

    2016-09-01

    The Katz-Francis Scale of Attitude toward Judaism was developed initially to extend among the Hebrew-speaking Jewish community in Israel a growing body of international research concerned with mapping the correlates, antecedents and consequences of individual differences in attitude toward religion as assessed by the Francis Scale of Attitude toward Christianity. The present paper explored the internal consistency reliability and construct validity of the English translation of the Katz-Francis Scale of Attitude toward Judaism among 101 Australian Jews. On the basis of these data, this instrument is commended for application in further research.

  13. Content validation: clarity/relevance, reliability and internal consistency of enunciative signs of language acquisition.

    Science.gov (United States)

    Crestani, Anelise Henrich; Moraes, Anaelena Bragança de; Souza, Ana Paula Ramos de

    2017-08-10

    To analyze the results of the validation of building enunciative signs of language acquisition for children aged 3 to 12 months. The signs were built based on mechanisms of language acquisition in an enunciative perspective and on clinical experience with language disorders. The signs were submitted to judgments of clarity and relevance by a sample of six experts, doctors in linguistics with knowledge of psycholinguistics and the language clinic. In the validation of reliability, two judges/evaluators helped to apply the instruments to videos of 20% of the total sample of mother-infant dyads, using the inter-evaluator method. The internal consistency method was applied to the total sample, which consisted of 94 mother-infant dyads for the contents of Phase 1 (3-6 months) and 61 mother-infant dyads for the contents of Phase 2 (7 to 12 months). The data were collected through the analysis of mother-infant interaction based on filming of the dyads and application of the parameters to be validated according to the child's age. Data were organized in a spreadsheet and then converted to computer applications for statistical analysis. The judgments of clarity/relevance indicated no modifications to be made in the instruments. The reliability test showed almost perfect agreement between judges (0.8 ≤ Kappa ≤ 1.0); only item 2 of Phase 1 showed substantial agreement (0.6 ≤ Kappa ≤ 0.79). The internal consistency for Phase 1 had alpha = 0.84, and Phase 2, alpha = 0.74. This demonstrates the reliability of the instruments. The results suggest adequacy of the content validity of the instruments created for both age groups, demonstrating the relevance of the content of enunciative signs of language acquisition.
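Inter-rater agreement of the kind reported here is commonly quantified with Cohen's kappa, whose conventional bands (0.61-0.80 substantial, 0.81-1.00 almost perfect) match the thresholds quoted above. A minimal sketch with hypothetical binary ratings from two judges:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical presence/absence codings of one enunciative sign by two
# judges over the same ten dyad videos (illustrative data only).
judge_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
judge_2 = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]

kappa = cohen_kappa_score(judge_1, judge_2)
print(f"kappa = {kappa:.2f}")  # 0.61-0.80 substantial, >0.80 almost perfect
```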

  14. Statistical exploration of dataset examining key indicators influencing housing and urban infrastructure investments in megacities

    Directory of Open Access Journals (Sweden)

    Adedeji O. Afolabi

    2018-06-01

    Lagos, by UN standards, has attained megacity status, with the attendant challenge of living up to that titanic position; regrettably, it struggles with its present stock of housing and infrastructural facilities to match its new status. Based on a survey of the perceptions of construction professionals residing within the state, a questionnaire instrument was used to gather the dataset. The statistical exploration contains data on the state of housing and urban infrastructural deficit, key indicators spurring investment by government to upturn the deficit, and improvement mechanisms to tackle the infrastructural dearth. Descriptive and inferential statistics were used to present the dataset. The dataset, when analyzed, can be useful for policy makers, local and international governments, world funding bodies, researchers and infrastructural investors. Keywords: Construction, Housing, Megacities, Population, Urban infrastructures

  15. Resampling Methods Improve the Predictive Power of Modeling in Class-Imbalanced Datasets

    Directory of Open Access Journals (Sweden)

    Paul H. Lee

    2014-09-01

    In the medical field, many outcome variables are dichotomized, and the two possible values of a dichotomized variable are referred to as classes. A dichotomized dataset is class-imbalanced if it consists mostly of one class, and the performance of common classification models on this type of dataset tends to be suboptimal. To tackle such a problem, resampling methods, including oversampling and undersampling, can be used. This paper aims at illustrating the effect of resampling methods using the National Health and Nutrition Examination Survey (NHANES) wave 2009–2010 dataset. A total of 4677 participants aged ≥20 without self-reported diabetes and with valid blood test results were analyzed. The Classification and Regression Tree (CART) procedure was used to build a classification model on undiagnosed diabetes, where a participant demonstrated evidence of diabetes according to WHO diabetes criteria. Exposure variables included demographics and socio-economic status. CART models were fitted using a randomly selected 70% of the data (training dataset), and the area under the receiver operating characteristic curve (AUC) was computed using the remaining 30% of the sample for evaluation (testing dataset). CART models were fitted using the training dataset, the oversampled training dataset, the weighted training dataset, and the undersampled training dataset. In addition, resampling case-to-control ratios of 1:1, 1:2, and 1:4 were examined. The effect of resampling methods on the performance of other extensions of CART (random forests and generalized boosted trees) was also examined. CARTs fitted on the oversampled (AUC = 0.70) and undersampled training data (AUC = 0.74) yielded better classification power than that on the training data (AUC = 0.65). Resampling could also improve the classification power of random forests and generalized boosted trees. To conclude, applying resampling methods to a class-imbalanced dataset improved the classification power of CART, random forests ...
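The oversampling experiment described above is easy to reproduce in outline. A minimal sketch using scikit-learn's decision tree as a stand-in for CART and a synthetic imbalanced dataset in place of NHANES (the sample size and the 70/30 split mirror the abstract; everything else is an illustrative assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the class-imbalanced task (~5% positives).
X, y = make_classification(n_samples=4677, weights=[0.95], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=1)

def cart_auc(X_fit, y_fit):
    """Fit a CART-style tree and evaluate AUC on the held-out 30%."""
    tree = DecisionTreeClassifier(max_depth=5, random_state=1).fit(X_fit, y_fit)
    return roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])

# 1:1 random oversampling: duplicate minority cases in the training set only.
rng = np.random.default_rng(1)
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, size=(y_tr == 0).sum() - minority.size)
X_over = np.vstack([X_tr, X_tr[extra]])
y_over = np.concatenate([y_tr, y_tr[extra]])

print("plain CART AUC:      ", cart_auc(X_tr, y_tr))
print("oversampled CART AUC:", cart_auc(X_over, y_over))
```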

  16. Internal consistency and validity of an observational method for assessing disability in mobility in patients with osteoarthritis.

    NARCIS (Netherlands)

    Steultjens, M.P.M.; Dekker, J.; Baar, M.E. van; Oostendorp, R.A.B.; Bijlsma, J.W.J.

    1999-01-01

    Objective: To establish the internal consistency and validity of an observational method for assessing disability in mobility in patients with osteoarthritis (OA). Methods: Data were obtained from 198 patients with OA of the hip or knee. Results of the observational method were compared with results ...

  17. Satellite-Based Precipitation Datasets

    Science.gov (United States)

    Munchak, S. J.; Huffman, G. J.

    2017-12-01

    Of the possible sources of precipitation data, those based on satellites provide the greatest spatial coverage. There is a wide selection of datasets, algorithms, and versions from which to choose, which can be confusing to non-specialists wishing to use the data. The International Precipitation Working Group (IPWG) maintains tables of the major publicly available, long-term, quasi-global precipitation data sets (http://www.isac.cnr.it/ipwg/data/datasets.html), and this talk briefly reviews the various categories. As examples, NASA provides two sets of quasi-global precipitation data sets: the older Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) and the current Integrated Multi-satellitE Retrievals for Global Precipitation Measurement (GPM) mission (IMERG). Both provide near-real-time and post-real-time products that are uniformly gridded in space and time. The TMPA products are 3-hourly 0.25°x0.25° on the latitude band 50°N-S for about 16 years, while the IMERG products are half-hourly 0.1°x0.1° on 60°N-S for over 3 years (with plans to go to 16+ years in Spring 2018). In addition to the precipitation estimates, each data set provides fields of other variables, such as the satellite sensor providing estimates and estimated random error. The discussion concludes with advice about determining suitability for use, the necessity of being clear about product names and versions, and the need for continued support for satellite- and surface-based observation.

  18. RARD: The Related-Article Recommendation Dataset

    OpenAIRE

    Beel, Joeran; Carevic, Zeljko; Schaible, Johann; Neusch, Gabor

    2017-01-01

    Recommender-system datasets are used for recommender-system evaluations, training machine-learning algorithms, and exploring user behavior. While there are many datasets for recommender systems in the domains of movies, books, and music, there are rather few datasets from research-paper recommender systems. In this paper, we introduce RARD, the Related-Article Recommendation Dataset, from the digital library Sowiport and the recommendation-as-a-service provider Mr. DLib. The dataset contains ...

  19. Internal consistency and validity of an observational method for assessing disability in mobility in patients with osteoarthritis

    NARCIS (Netherlands)

    Steultjens, M. P.; Dekker, J.; van Baar, M. E.; Oostendorp, R. A.; Bijlsma, J. W.

    1999-01-01

    To establish the internal consistency and validity of an observational method for assessing disability in mobility in patients with osteoarthritis (OA). Data were obtained from 198 patients with OA of the hip or knee. Results of the observational method were compared with results of self-report ...

  20. Isfahan MISP Dataset.

    Science.gov (United States)

    Kashefpur, Masoud; Kafieh, Rahele; Jorjandi, Sahar; Golmohammadi, Hadis; Khodabande, Zahra; Abbasi, Mohammadreza; Teifuri, Nilufar; Fakharzadeh, Ali Akbar; Kashefpoor, Maryam; Rabbani, Hossein

    2017-01-01

    An online repository was introduced to share clinical ground truth with the public and provide open access for researchers to evaluate their computer-aided algorithms. PHP was used for web programming and MySQL for database management. The website was entitled "biosigdata.com." It was a fast, secure, and easy-to-use online database for medical signals and images. Freely registered users could download the datasets and could also share their own supplementary materials while maintaining their privacy (citation and fee). Commenting was also available for all datasets, and automatic sitemap and semi-automatic SEO indexing have been set up for the site. A comprehensive list of available websites for medical datasets is also presented as a supplementary file (http://journalonweb.com/tempaccess/4800.584.JMSS_55_16I3253.pdf).

  1. Evaluation of Modified Categorical Data Fuzzy Clustering Algorithm on the Wisconsin Breast Cancer Dataset

    Directory of Open Access Journals (Sweden)

    Amir Ahmad

    2016-01-01

    The early diagnosis of breast cancer is an important step in the fight against the disease. Machine learning techniques have shown promise in improving our understanding of the disease. As medical datasets consist of data points which cannot be precisely assigned to a class, fuzzy methods have been useful for studying these datasets. Sometimes breast cancer datasets are described by categorical features, and many fuzzy clustering algorithms have been developed for categorical datasets. However, in most of these methods the Hamming distance is used to define the distance between two categorical feature values. In this paper, we use a probabilistic distance measure for the distance computation between a pair of categorical feature values. Experiments demonstrate that this distance measure performs better than the Hamming distance for the Wisconsin breast cancer data.
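The contrast drawn here is between the Hamming view, where two distinct category values are always at distance 1, and a probability-based view, where two values are close if they behave similarly in the data. A minimal sketch of one such data-driven dissimilarity, assuming a hypothetical pandas DataFrame of categorical attributes (a simplified single-co-attribute variant, not the paper's exact measure):

```python
import pandas as pd

def value_distance(df, attr, a, b, co_attr):
    """Dissimilarity between categories a and b of column `attr`: the total
    variation distance between the conditional distributions of `co_attr`
    given each value. Hamming distance would simply return 0 or 1 instead."""
    pa = df.loc[df[attr] == a, co_attr].value_counts(normalize=True)
    pb = df.loc[df[attr] == b, co_attr].value_counts(normalize=True)
    support = pa.index.union(pb.index)
    return 0.5 * sum(abs(pa.get(v, 0.0) - pb.get(v, 0.0)) for v in support)

# Hypothetical usage on a categorical table with columns "shape" and "class":
# d = value_distance(df, "shape", "round", "oval", co_attr="class")
```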

  2. Astronaut Photography of the Earth: A Long-Term Dataset for Earth Systems Research, Applications, and Education

    Science.gov (United States)

    Stefanov, William L.

    2017-01-01

    The NASA Earth observations dataset obtained by humans in orbit using handheld film and digital cameras is freely accessible to the global community through the online searchable database at https://eol.jsc.nasa.gov, and offers a useful complement to traditional ground-commanded sensor data. The dataset includes imagery from the NASA Mercury (1961) through present-day International Space Station (ISS) programs, and currently totals over 2.6 million individual frames. Geographic coverage of the dataset includes land and ocean areas between approximately 52 degrees North and South latitudes, but is spatially and temporally discontinuous. The photographic dataset includes some significant impediments for immediate research, applied, and educational use: commercial RGB films and camera systems with overlapping bandpasses; use of different focal length lenses, unconstrained look angles, and variable spacecraft altitudes; and no native geolocation information. Such factors led to this dataset being underutilized by the community, but recent advances in automated and semi-automated image geolocation, image feature classification, and web-based services are adding new value to the astronaut-acquired imagery. A coupled ground software and on-orbit hardware system for the ISS is in development for planned deployment in mid-2017; this system will capture camera pose information for each astronaut photograph to allow automated, full georegistration of the data. The ground system component is currently in use to fully georeference imagery collected in response to International Disaster Charter activations, and the auto-registration procedures are being applied to the extensive historical database of imagery to add value for research and educational purposes. In parallel, machine learning techniques are being applied to automate feature identification and classification throughout the dataset, in order to build descriptive metadata that will improve search ...

  3. Gene-Environment Interplay in Internalizing Disorders: Consistent Findings across Six Environmental Risk Factors

    Science.gov (United States)

    Hicks, Brian M.; DiRago, Ana C.; Iacono, William G.; McGue, Matt

    2009-01-01

    Background: Newer behavior genetic methods can better elucidate gene-environment (G-E) interplay in the development of internalizing (INT) disorders (i.e., major depression and anxiety disorders). However, no study to date has conducted a comprehensive analysis examining multiple environmental risks with the purpose of delineating how general G-E mechanisms influence the development of INT disorders. Methods: The sample consisted of 1315 male and female twin pairs participating in the age 17 assessment of the Minnesota Twin Family Study. Quantitative G-E interplay models were used to examine how genetic and environmental risk for INT disorders changes as a function of environmental context. Multiple measures and informants were employed to construct composite measures of INT disorders and 6 environmental risk factors including: stressful life events, mother-child and father-child relationship problems, antisocial and prosocial peer affiliation, and academic achievement and engagement. Results: Significant moderation effects were detected between each environmental risk factor and INT such that in the context of greater environmental adversity, nonshared environmental factors became more important in the etiology of INT symptoms. Conclusion: Our results are consistent with the interpretation that environmental stressors have a causative effect on the emergence of INT disorders. The consistency of our results suggests a general mechanism of environmental influence on INT disorders regardless of the specific form of environmental risk. PMID:19594836

  4. The eye-complaint questionnaire in a visual display unit work environment: Internal consistency and test-retest reliability

    NARCIS (Netherlands)

    Steenstra, Ivan A.; Sluiter, Judith K.; Frings-Dresen, Monique H. W.

    2009-01-01

    The internal consistency and test-retest reliability of a 10-item eye-complaint questionnaire (ECQ) were examined within a sample of office workers. Repeated within-subjects measures were performed within a single day and over intervals of 1 and 7 d. Questionnaires were completed by 96 workers (70% ...)

  5. Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Directory of Open Access Journals (Sweden)

    Sai Kiranmayee Samudrala

    2015-01-01

    Dimensionality reduction refers to a set of mathematical techniques used to reduce complexity of the original high-dimensional data, while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques, and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework we perform dimensionality reduction of 75,000 images representing morphology evolution during manufacturing of organic solar cells in order to identify how processing parameters affect morphology evolution.
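The "key components" of spectral dimensionality reduction that such frameworks parallelize are typically the neighborhood-graph construction and a large sparse eigensolve. A minimal serial sketch of that pipeline (Laplacian eigenmaps via scikit-learn and SciPy, shown only to make the components concrete; the paper's own implementation is distributed):

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh
from sklearn.neighbors import kneighbors_graph

def laplacian_eigenmaps(X, n_components=2, k=10):
    """Serial core of a spectral method: k-NN graph -> graph Laplacian ->
    smallest nontrivial eigenvectors. A parallel framework distributes the
    graph construction and this sparse eigensolve across compute nodes."""
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T)                       # symmetrize the k-NN graph
    L = laplacian(W, normed=True)
    vals, vecs = eigsh(L, k=n_components + 1, which="SM")
    return vecs[:, 1:n_components + 1]        # drop the trivial eigenvector

# Toy usage: embed 500 random 50-dimensional points into 2-D.
print(laplacian_eigenmaps(np.random.rand(500, 50)).shape)
```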

  6. Open University Learning Analytics dataset.

    Science.gov (United States)

    Kuzilek, Jakub; Hlosta, Martin; Zdrahal, Zdenek

    2017-11-28

    Learning Analytics focuses on the collection and analysis of learners' data to improve their learning experience by providing informed guidance and to optimise learning materials. To support the research in this area we have developed a dataset, containing data from courses presented at the Open University (OU). What makes the dataset unique is the fact that it contains demographic data together with aggregated clickstream data of students' interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains the information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available at https://analyse.kmi.open.ac.uk/open_dataset under a CC-BY 4.0 license.

  7. National Elevation Dataset

    Science.gov (United States)

    ,

    2002-01-01

    The National Elevation Dataset (NED) is a new raster product assembled by the U.S. Geological Survey. NED is designed to provide National elevation data in a seamless form with a consistent datum, elevation unit, and projection. Data corrections were made in the NED assembly process to minimize artifacts, perform edge matching, and fill sliver areas of missing data. NED has a resolution of one arc-second (approximately 30 meters) for the conterminous United States, Hawaii, Puerto Rico and the island territories, and a resolution of two arc-seconds for Alaska. NED data sources have a variety of elevation units, horizontal datums, and map projections. In the NED assembly process the elevation values are converted to decimal meters as a consistent unit of measure, NAD83 is consistently used as the horizontal datum, and all the data are recast in a geographic projection. Older DEMs produced by methods that are now obsolete have been filtered during the NED assembly process to minimize artifacts that are commonly found in data produced by these methods. Artifact removal greatly improves the quality of the slope, shaded-relief, and synthetic drainage information that can be derived from the elevation data. Figure 2 illustrates the results of this artifact-removal filtering. NED processing also includes steps to adjust values where adjacent DEMs do not match well, and to fill sliver areas of missing data between DEMs. These processing steps ensure that NED has no void areas and that artificial discontinuities are minimized. The artifact-removal filtering process does not eliminate all of the artifacts. In areas where the only available DEM is produced by older methods, "striping" may still occur.

  8. Kernel-based discriminant feature extraction using a representative dataset

    Science.gov (United States)

    Li, Honglin; Sancho Gomez, Jose-Luis; Ahalt, Stanley C.

    2002-07-01

    Discriminant Feature Extraction (DFE) is widely recognized as an important pre-processing step in classification applications. Most DFE algorithms are linear and thus can only explore the linear discriminant information among the different classes. Recently, there have been several promising attempts to develop nonlinear DFE algorithms, among which is Kernel-based Feature Extraction (KFE). The efficacy of KFE has been experimentally verified with both synthetic data and real problems. However, KFE has some known limitations. First, KFE does not work well for strongly overlapped data. Second, KFE employs all of the training set samples during the feature extraction phase, which can result in significant computation when applied to very large datasets. Finally, KFE can result in overfitting. In this paper, we propose a substantial improvement to KFE that overcomes the above limitations by using a representative dataset, which consists of critical points that are generated from data-editing techniques and centroid points that are determined by using the Frequency Sensitive Competitive Learning (FSCL) algorithm. Experiments show that this new KFE algorithm performs well on significantly overlapped datasets, and it also reduces computational complexity. Further, by controlling the number of centroids, the overfitting problem can be effectively alleviated.

  9. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide

    DEFF Research Database (Denmark)

    Kissling, W. Daniel; Dalby, Lars; Fløjgaard, Camilla

    2014-01-01

    [...] the importance of diet for macroevolutionary and macroecological dynamics remains little explored, partly because of the lack of comprehensive trait datasets. We compiled and evaluated a comprehensive global dataset of diet preferences of mammals ("MammalDIET"). Diet information was digitized from two global [...] species within the same genus, or family) and this extrapolation was subsequently validated both internally (with a jack-knife approach applied to the compiled species-level diet data) and externally (using independent species-level diet information from a comprehensive continent-wide data source). Finally [...] information (48% of all terrestrial mammal species), and only rarely from other species within the same genus (6%) or from family level (8%). Internal and external validation showed that: (1) extrapolations were most reliable for primary food items; (2) several diet categories ("Animal", "Mammal" ...
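The genus- and family-level extrapolation described here (fill a species' missing diet from congeners first, then confamilials) can be sketched compactly. A hedged example assuming a hypothetical DataFrame with "Genus", "Family", and "Diet" columns; the published workflow is more elaborate:

```python
import pandas as pd

def fill_diet_by_taxonomy(df):
    """Species missing diet data inherit the modal diet of their genus,
    and any still-missing species inherit the modal diet of their family."""
    for level in ["Genus", "Family"]:
        modal = (df.dropna(subset=["Diet"])
                   .groupby(level)["Diet"]
                   .agg(lambda s: s.mode().iat[0]))
        df["Diet"] = df["Diet"].fillna(df[level].map(modal))
    return df

# Hypothetical usage: df = fill_diet_by_taxonomy(mammal_traits)
# A provenance column could record whether each value is species-level,
# genus-level, or family-level, mirroring the 48% / 6% / 8% split above.
```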

  10. Knowledge discovery with classification rules in a cardiovascular dataset.

    Science.gov (United States)

    Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

    2005-12-01

    In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rule induction called AREX, using evolutionary induction of decision trees and automatic programming, is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of possible new medical knowledge in the field of pediatric cardiology.

  11. Structural covariance networks across healthy young adults and their consistency.

    Science.gov (United States)

    Guo, Xiaojuan; Wang, Yan; Guo, Taomei; Chen, Kewei; Zhang, Jiacai; Li, Ke; Jin, Zhen; Yao, Li

    2015-08-01

    To investigate structural covariance networks (SCNs) as measured by regional gray matter volumes with structural magnetic resonance imaging (MRI) from healthy young adults, and to examine their consistency and stability. Two independent cohorts were included in this study: Group 1 (82 healthy subjects aged 18-28 years) and Group 2 (109 healthy subjects aged 20-28 years). Structural MRI data were acquired at 3.0T and 1.5T using a magnetization prepared rapid-acquisition gradient echo sequence for these two groups, respectively. We applied independent component analysis (ICA) to construct SCNs and further applied the spatial overlap ratio and correlation coefficient to evaluate the spatial consistency of the SCNs between these two datasets. Seven and six independent components were identified for Group 1 and Group 2, respectively. Moreover, six SCNs including the posterior default mode network, the visual and auditory networks consistently existed across the two datasets. The overlap ratios and correlation coefficients of the visual network reached the maximums of 72% and 0.71. This study demonstrates the existence of consistent SCNs corresponding to general functional networks. These structural covariance findings may provide insight into the underlying organizational principles of brain anatomy. © 2014 Wiley Periodicals, Inc.
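
    The spatial overlap ratio used to compare SCNs across cohorts is, in essence, the proportion of suprathreshold voxels shared by two component maps. A minimal sketch under assumed conventions (z-score thresholding, overlap normalized by the smaller region; the paper's exact definition may differ):

```python
import numpy as np

def overlap_ratio(map1, map2, z=1.5):
    """Binarize two component maps at a z-score threshold and return the
    overlap relative to the smaller suprathreshold region."""
    a = (map1 - map1.mean()) / map1.std() > z
    b = (map2 - map2.mean()) / map2.std() > z
    return (a & b).sum() / min(a.sum(), b.sum())

rng = np.random.default_rng(0)
base = rng.normal(size=10000)                 # a shared underlying "network"
m1 = base + 0.5 * rng.normal(size=10000)      # noisy version from cohort 1
m2 = base + 0.5 * rng.normal(size=10000)      # noisy version from cohort 2
print(f"overlap = {overlap_ratio(m1, m2):.2f}, r = {np.corrcoef(m1, m2)[0, 1]:.2f}")
```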

  12. Diffeomorphic Iterative Centroid Methods for Template Estimation on Large Datasets

    OpenAIRE

    Cury , Claire; Glaunès , Joan Alexis; Colliot , Olivier

    2014-01-01

    A common approach for analysis of anatomical variability relies on the estimation of a template representative of the population. The Large Deformation Diffeomorphic Metric Mapping is an attractive framework for that purpose. However, template estimation using LDDMM is computationally expensive, which is a limitation for the study of large datasets. This paper presents an iterative method which quickly provides a centroid of the population in the shape space. This centr...

  13. International standards for monoclonal antibodies to support pre- and post-marketing product consistency: Evaluation of a candidate international standard for the bioactivities of rituximab.

    Science.gov (United States)

    Prior, Sandra; Hufton, Simon E; Fox, Bernard; Dougall, Thomas; Rigsby, Peter; Bristow, Adrian

    2018-01-01

    The intrinsic complexity and heterogeneity of therapeutic monoclonal antibodies is built into the biosimilarity paradigm, where critical quality attributes are controlled in exhaustive comparability studies with the reference medicinal product. The long-term success of biosimilars will depend on reassuring healthcare professionals and patients of consistent product quality, safety and efficacy. With this aim, the World Health Organization has endorsed the need for public bioactivity standards for therapeutic monoclonal antibodies in support of current controls. We have developed a candidate international potency standard for rituximab that was evaluated in a multi-center collaborative study using participants' own qualified Fc-effector function and cell-based binding bioassays. Dose-response curve model parameters were shown to reflect similar behavior amongst rituximab preparations, albeit with some differences in potency. In the absence of a common reference standard, potency estimates were in poor agreement amongst laboratories, but the use of the candidate preparation significantly reduced this variability. Our results suggest that the candidate rituximab standard can support bioassay performance and improve data harmonization, which when implemented will promote consistency of rituximab products over their life-cycles. These data provide the first scientific evidence that a classical standardization exercise allowing traceability of bioassay data to an international standard is also applicable to rituximab. However, we submit that this new type of international standard needs to be used appropriately and that its role should not be mistaken for that of the reference medicinal product.
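
    Potency in such collaborative studies is commonly estimated by fitting dose-response curves, often a four-parameter logistic (4PL) model, and comparing EC50s against the standard. The sketch below uses synthetic data and makes no claim about the assays actually used in the study:

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, bottom, top, ec50, hill):
    """Four-parameter logistic model commonly used for bioassay dose-response."""
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)

dose = np.logspace(-2, 2, 9)  # hypothetical dilution series
resp = four_pl(dose, 5, 95, 1.2, 1.0) + np.random.default_rng(0).normal(0, 2, 9)
params, _ = curve_fit(four_pl, dose, resp, p0=[resp.min(), resp.max(), 1.0, 1.0])
print(dict(zip(["bottom", "top", "ec50", "hill"], params.round(2))))
# Relative potency versus a standard is typically the ratio of EC50s when
# the curves are parallel (similar bottom, top and hill slope).
```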

  14. Towards consistent and reliable Dutch and international energy statistics for the chemical industry

    International Nuclear Information System (INIS)

    Neelis, M.L.; Pouwelse, J.W.

    2008-01-01

    Consistent and reliable energy statistics are of vital importance for proper monitoring of energy-efficiency policies. In recent studies, irregularities have been reported in the Dutch energy statistics for the chemical industry. We studied in depth the company data that form the basis of the energy statistics in the Netherlands between 1995 and 2004 to find causes for these irregularities. We discovered that chemical products have occasionally been included, resulting in statistics with an inconsistent system boundary. Lack of guidance in the survey on the complex energy conversions in the chemical industry also resulted in large fluctuations for certain energy commodities. The findings of our analysis have been the basis for a new survey that has been used since 2007. We demonstrate that the annual questionnaire used for the international energy statistics can result in problems comparable to those observed in the Netherlands. We suggest including chemical residual gas as an energy commodity in the questionnaire and including the energy conversions in the chemical industry in the international energy statistics. In addition, we think the questionnaire should be explicit about the treatment of basic chemical products produced at refineries and in the petrochemical industry to avoid system boundary problems.

  15. Mridangam stroke dataset

    OpenAIRE

    CompMusic

    2014-01-01

    The audio examples were recorded from a professional Carnatic percussionist under semi-anechoic studio conditions by Akshay Anantapadmanabhan using SM-58 microphones and an H4n ZOOM recorder. The audio was sampled at 44.1 kHz and stored as 16-bit wav files. The dataset can be used for training models for each Mridangam stroke. A detailed description of the Mridangam and its strokes can be found in the paper below. A part of the dataset was used in the following paper. Akshay Anantapadman...

  16. Factor Structure, Internal Consistency, and Screening Sensitivity of the GARS-2 in a Developmental Disabilities Sample

    OpenAIRE

    Martin A. Volker; Elissa H. Dua; Christopher Lopata; Marcus L. Thomeer; Jennifer A. Toomey; Audrey M. Smerbeck; Jonathan D. Rodgers; Joshua R. Popkin; Andrew T. Nelson; Gloria K. Lee

    2016-01-01

    The Gilliam Autism Rating Scale-Second Edition (GARS-2) is a widely used screening instrument that assists in the identification and diagnosis of autism. The purpose of this study was to examine the factor structure, internal consistency, and screening sensitivity of the GARS-2 using ratings from special education teaching staff for a sample of 240 individuals with autism or other significant developmental disabilities. Exploratory factor analysis yielded a correlated three-factor solution si...

  17. 2008 TIGER/Line Nationwide Dataset

    Data.gov (United States)

    California Natural Resource Agency — This dataset contains a nationwide build of the 2008 TIGER/Line datasets from the US Census Bureau downloaded in April 2009. The TIGER/Line Shapefiles are an extract...

  18. Design of an audio advertisement dataset

    Science.gov (United States)

    Fu, Yutao; Liu, Jihong; Zhang, Qi; Geng, Yuting

    2015-12-01

    As more and more advertisements crowd into radio broadcasts, it is necessary to establish an audio advertising dataset that can be used to analyze and classify the advertisements. A method for establishing a complete audio advertising dataset is presented in this paper. The dataset is divided into four different kinds of advertisements. Each advertisement sample is given in *.wav file format and annotated with a txt file which contains its file name, sampling frequency, channel number, broadcasting time and its class. The soundness of the advertisement classes in this dataset is demonstrated by clustering the different advertisements based on Principal Component Analysis (PCA). The experimental results show that this audio advertisement dataset offers a reliable set of samples for related audio advertisement studies.
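
    The clustering check described above can be approximated as follows. The feature matrix here is a random stand-in for whatever per-clip audio features (e.g., MFCC statistics) were actually extracted from the *.wav files:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Stand-in features: one row per advertisement clip, four synthetic classes.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(loc=c, scale=0.5, size=(25, 13)) for c in range(4)])

reduced = PCA(n_components=2).fit_transform(features)  # project to 2 components
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(reduced)
print(np.bincount(labels))  # roughly 25 clips per cluster if the classes separate
```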

  19. Consistent errors in first strand cDNA due to random hexamer mispriming.

    Directory of Open Access Journals (Sweden)

    Thomas P van Gurp

    Priming of random hexamers in cDNA synthesis is known to show sequence bias, but in addition it has been suggested recently that mismatches in random hexamer priming could be a cause of mismatches between the original RNA fragment and observed sequence reads. To explore random hexamer mispriming as a potential source of these errors, we analyzed two independently generated RNA-seq datasets of synthetic ERCC spikes for which the reference is known. First strand cDNA synthesized by random hexamer priming on RNA showed consistent position and nucleotide-specific mismatch errors in the first seven nucleotides. The mismatch errors found in both datasets are consistent in distribution and thermodynamically stable mismatches are more common. This strongly indicates that RNA-DNA mispriming of specific random hexamers causes these errors. Due to their consistency and specificity, mispriming errors can have profound implications for downstream applications if not dealt with properly.

  20. SAR image dataset of military ground targets with multiple poses for ATR

    Science.gov (United States)

    Belloni, Carole; Balleri, Alessio; Aouf, Nabil; Merlet, Thomas; Le Caillec, Jean-Marc

    2017-10-01

    Automatic Target Recognition (ATR) is the task of automatically detecting and classifying targets. Recognition using Synthetic Aperture Radar (SAR) images is interesting because SAR images can be acquired at night and under any weather conditions, whereas optical sensors operating in the visible band do not have this capability. Existing SAR ATR algorithms have mostly been evaluated using the MSTAR dataset.1 The problem with the MSTAR is that some of the proposed ATR methods have shown good classification performance even when targets were hidden,2 suggesting the presence of a bias in the dataset. Evaluations of SAR ATR techniques are currently challenging due to the lack of publicly available data in the SAR domain. In this paper, we present a high resolution SAR dataset consisting of images of a set of ground military target models taken at various aspect angles. The dataset can be used for a fair evaluation and comparison of SAR ATR algorithms. We applied the Inverse Synthetic Aperture Radar (ISAR) technique to echoes from targets rotating on a turntable and illuminated with a stepped frequency waveform. The targets in the database consist of four variants of two 1.7m-long models of T-64 and T-72 tanks. The gun, the turret position and the depression angle are varied to form 26 different sequences of images. The emitted signal spanned the frequency range from 13 GHz to 18 GHz to achieve a bandwidth of 5 GHz sampled with 4001 frequency points. The resolution obtained with respect to the size of the model targets is comparable to typical values obtained using SAR airborne systems. Single polarized images (Horizontal-Horizontal) are generated using the backprojection algorithm.3 A total of 1480 images are produced using a 20° integration angle. The images in the dataset are organized into suggested training and testing sets to facilitate a standard evaluation of SAR ATR algorithms.
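
    The quoted waveform parameters pin down the standard stepped-frequency relations: range resolution follows from the total bandwidth, and the alias-free range window from the frequency step. A quick check, using only numbers given in the abstract:

```python
C = 3e8                                    # speed of light, m/s
f_lo, f_hi, n_points = 13e9, 18e9, 4001

bandwidth = f_hi - f_lo                    # 5 GHz
range_resolution = C / (2 * bandwidth)     # ~3 cm slant-range resolution
freq_step = bandwidth / (n_points - 1)     # 1.25 MHz between frequency points
unambiguous_range = C / (2 * freq_step)    # ~120 m alias-free range window

print(f"resolution = {range_resolution * 100:.1f} cm, "
      f"unambiguous range = {unambiguous_range:.0f} m")
```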

  1. International migration to and from the United Kingdom, 1975-1999: consistency, change and implications for the labour market.

    Science.gov (United States)

    Dobson, J; McLaughlan, G

    2001-01-01

    This article presents some findings of a recent study carried out for the Home Office by the Migration Research Unit (MRU) in the Department of Geography at UCL. The study was concerned with patterns and trends in international migration to and from the United Kingdom since 1975, with a particular focus on those in employment, and drew on many sources. The statistics analysed here derive from the International Passenger Survey, including hitherto unpublished tables provided by the Office for National Statistics on migration of the employed by citizenship. They indicate remarkable consistency in some aspects of migration flows and major change in others.

  2. Flood damage curves for consistent global risk assessments

    Science.gov (United States)

    de Moel, Hans; Huizinga, Jan; Szewczyk, Wojtek

    2016-04-01

    Assessing potential damage of flood events is an important component in flood risk management. Determining direct flood damage is commonly done using depth-damage curves, which denote the flood damage that would occur at specific water depths per asset or land-use class. Many countries around the world have developed flood damage models using such curves which are based on analysis of past flood events and/or on expert judgement. However, such damage curves are not available for all regions, which hampers damage assessments in those regions. Moreover, due to different methodologies employed for various damage models in different countries, damage assessments cannot be directly compared with each other, obstructing also supra-national flood damage assessments. To address these problems, a globally consistent dataset of depth-damage curves has been developed. This dataset contains damage curves depicting percent of damage as a function of water depth as well as maximum damage values for a variety of assets and land use classes (i.e. residential, commercial, agriculture). Based on an extensive literature survey, concave damage curves have been developed for each continent, while differentiation in flood damage between countries is established by determining maximum damage values at the country scale. These maximum damage values are based on construction cost surveys from multinational construction companies, which provide a coherent set of detailed building cost data across dozens of countries. A consistent set of maximum flood damage values for all countries was computed using statistical regressions with socio-economic World Development Indicators from the World Bank. Further, based on insights from the literature survey, guidance is also given on how the damage curves and maximum damage values can be adjusted for specific local circumstances, such as urban vs. rural locations, use of specific building material, etc. This dataset can be used for consistent supra-national flood damage assessments.
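
    Applying such a curve reduces to interpolating a damage fraction at the modelled water depth and scaling by the country-level maximum damage. The curve values below are hypothetical, not taken from the dataset:

```python
import numpy as np

# Hypothetical concave residential curve: damage fraction vs. water depth (m).
depths = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 6.0])
fractions = np.array([0.00, 0.25, 0.40, 0.60, 0.75, 1.00])

def flood_damage(depth_m, max_damage_per_m2, area_m2):
    """Direct damage = interpolated damage fraction x country-level maximum."""
    frac = np.interp(depth_m, depths, fractions)
    return frac * max_damage_per_m2 * area_m2

# e.g. 1.5 m of water in a 100 m2 dwelling with a 600 EUR/m2 maximum value
print(f"{flood_damage(1.5, 600, 100):,.0f} EUR")
```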

  3. Felder-Soloman's Index of Learning Styles: internal consistency, temporal stability, and factor structure.

    Science.gov (United States)

    Hosford, Charles C; Siders, William A

    2010-10-01

    Strategies to facilitate learning include using knowledge of students' learning style preferences to inform students and their teachers. Aims of this study were to evaluate the factor structure, internal consistency, and temporal stability of medical student responses to the Index of Learning Styles (ILS) and determine its appropriateness as an instrument for medical education. The ILS assesses preferences on four dimensions: sensing/intuitive information perceiving, visual/verbal information receiving, active/reflective information processing, and sequential/global information understanding. Students entering the 2002-2007 classes completed the ILS; some completed the ILS again after 2 and 4 years. Analyses of responses supported the ILS's intended structure and moderate reliability. Students had moderate preferences for sensing and visual learning. This study provides evidence supporting the appropriateness of the ILS for assessing learning style preferences in medical students.

  4. The role of interactive control systems in obtaining internal consistency in the management control system package

    DEFF Research Database (Denmark)

    Toldbod, Thomas; Israelsen, Poul

    2014-01-01

    Companies rely on multiple Management Control Systems to obtain their short and long term objectives. When applying a multifaceted perspective on Management Control System the concept of internal consistency has been found to be important in obtaining goal congruency in the company. However, to d...... management is aware of this shortcoming they use the cybernetic controls more interactively to overcome this shortcoming, whereby the cybernetic controls are also used as a learning platform and not just for performance control....

  5. The GTZAN dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2013-01-01

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge...... of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN...

  6. Consistent Differential Expression Pattern (CDEP) on microarray to identify genes related to metastatic behavior.

    Science.gov (United States)

    Tsoi, Lam C; Qin, Tingting; Slate, Elizabeth H; Zheng, W Jim

    2011-11-11

    To utilize the large volume of gene expression information generated from different microarray experiments, several meta-analysis techniques have been developed. Despite these efforts, there remain significant challenges to effectively increasing the statistical power and decreasing the Type I error rate while pooling the heterogeneous datasets from public resources. The objective of this study is to develop a novel meta-analysis approach, Consistent Differential Expression Pattern (CDEP), to identify genes with common differential expression patterns across different datasets. We combined False Discovery Rate (FDR) estimation and the non-parametric RankProd approach to estimate the Type I error rate in each microarray dataset of the meta-analysis. These Type I error rates from all datasets were then used to identify genes with common differential expression patterns. Our simulation study showed that CDEP achieved higher statistical power and maintained low Type I error rate when compared with two recently proposed meta-analysis approaches. We applied CDEP to analyze microarray data from different laboratories that compared transcription profiles between metastatic and primary cancer of different types. Many genes identified as differentially expressed consistently across different cancer types are in pathways related to metastatic behavior, such as ECM-receptor interaction, focal adhesion, and blood vessel development. We also identified novel genes such as AMIGO2, Gem, and CXCL11 that have not been shown to associate with, but may play roles in, metastasis. CDEP is a flexible approach that borrows information from each dataset in a meta-analysis in order to identify genes being differentially expressed consistently. We have shown that CDEP can gain higher statistical power than other existing approaches under a variety of settings considered in the simulation study, suggesting its robustness and insensitivity to data variation commonly associated with microarray

  7. Factorial Validity and Internal Consistency of Malaysian Adapted Depression Anxiety Stress Scale - 21 in an Adolescent Sample

    OpenAIRE

    Hairul Anuar Hashim; Freddy Golok; Rosmatunisah Ali

    2011-01-01

    Background: Psychometrically sound measurement instrument is a fundamental requirement across broad range of research areas. In negative affect research, Depression Anxiety Stress Scale (DASS) has been identified as a psychometrically sound instrument to measure depression, anxiety and stress, especially the 21-item version. However, its psychometric properties in adolescents have been less consistent. Objectives: Thus, the present study sought to examine the factorial validity and internal c...

  8. BLOND, a building-level office environment dataset of typical electrical appliances

    Science.gov (United States)

    Kriechbaumer, Thomas; Jacobsen, Hans-Arno

    2018-03-01

    Energy metering has gained popularity as conventional meters are replaced by electronic smart meters that promise energy savings and higher comfort levels for occupants. Achieving these goals requires a deeper understanding of consumption patterns to reduce the energy footprint: load profile forecasting, power disaggregation, appliance identification, startup event detection, etc. Publicly available datasets are used to test, verify, and benchmark possible solutions to these problems. For this purpose, we present the BLOND dataset: continuous energy measurements of a typical office environment at high sampling rates with common appliances and load profiles. We provide voltage and current readings for aggregated circuits and matching fully-labeled ground truth data (individual appliance measurements). The dataset contains 53 appliances (16 classes) in a 3-phase power grid. BLOND-50 contains 213 days of measurements sampled at 50kSps (aggregate) and 6.4kSps (individual appliances). BLOND-250 consists of the same setup: 50 days, 250kSps (aggregate), 50kSps (individual appliances). These are the longest continuous measurements at such high sampling rates and fully-labeled ground truth we are aware of.
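
    A back-of-the-envelope calculation shows why such recordings are unusually large. The channel count and sample width below are assumptions for illustration, not specifications from the dataset paper:

```python
# Rough size of the BLOND-50 aggregate stream, assuming 16-bit samples and
# one current plus one voltage channel per phase of the 3-phase grid.
days = 213
sps = 50_000                  # 50 kSps aggregate sampling rate
channels = 3 * 2              # assumed: 3 phases x (voltage, current)
bytes_per_sample = 2          # assumed: 16-bit ADC words

total_bytes = days * 24 * 3600 * sps * channels * bytes_per_sample
print(f"~{total_bytes / 1e12:.1f} TB uncompressed")  # ~11 TB
```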

  10. Reliability, factor analysis and internal consistency calculation of the Insomnia Severity Index (ISI) in French and in English among Lebanese adolescents.

    Science.gov (United States)

    Chahoud, M; Chahine, R; Salameh, P; Sauleau, E A

    2017-06-01

    Our goal is to validate and to verify the reliability of the French and English versions of the Insomnia Severity Index (ISI) in Lebanese adolescents. A cross-sectional study was implemented. 104 Lebanese students aged between 14 and 19 years participated in the study. The English version of the questionnaire was distributed to English-speaking students and the French version was administered to French-speaking students. A scale (1 to 7 with 1 = very well understood and 7 = not at all) was used to identify the level of the students' understanding of each instruction, question and answer of the ISI. The scale's structural validity was assessed. The factor structure of the ISI was evaluated by principal component analysis. The internal consistency of this scale was evaluated by Cronbach's alpha. To assess test-retest reliability, the intraclass correlation coefficient (ICC) was used. The principal component analysis confirmed the presence of a two-component factor structure in the English version and a three-component factor structure in the French version with eigenvalues > 1. The English version of the ISI had an excellent internal consistency (α = 0.90), while the French version had a good internal consistency (α = 0.70). The ICC presented an excellent agreement in the French version (ICC = 0.914, CI = 0.856-0.949) and a good agreement in the English one (ICC = 0.762, CI = 0.481-0.890). The Bland-Altman plots of the two versions of the ISI showed that the responses over two weeks were comparable and very few outliers were detected. The results of our analyses reveal that both English and French versions of the ISI scale have good internal consistency and are reproducible and reliable. Therefore, it can be used to assess the prevalence of insomnia in Lebanese adolescents.
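
    Cronbach's alpha, the internal-consistency statistic reported above, is straightforward to compute from an item-response matrix. A minimal sketch with simulated responses (the 7-item, 104-subject shape mirrors the ISI and this study's sample size; the data themselves are synthetic):

```python
import numpy as np

def cronbach_alpha(items):
    """items: (n_subjects, n_items) matrix of scale responses.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
trait = rng.normal(size=(104, 1))                          # shared construct
responses = trait + rng.normal(scale=0.8, size=(104, 7))   # 7 ISI-like items
print(f"alpha = {cronbach_alpha(responses):.2f}")
```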

  11. A new dataset validation system for the Planetary Science Archive

    Science.gov (United States)

    Manaud, N.; Zender, J.; Heather, D.; Martinez, S.

    2007-08-01

    The Planetary Science Archive is the official archive for the Mars Express mission. It received its first data at the end of 2004. These data are delivered by the PI teams to the PSA team as datasets formatted in conformance with the Planetary Data System (PDS). The PI teams are responsible for analyzing and calibrating the instrument data as well as for the production of reduced and calibrated data. They are also responsible for the scientific validation of these data. ESA is responsible for the long-term data archiving and distribution to the scientific community and must ensure, in this regard, that all archived products meet quality standards. To do so, an archive peer review is used to control the quality of the Mars Express science data archiving process. However, a full validation of the archive's content has been missing. An independent review board recently recommended that the completeness of the archive as well as the consistency of the delivered data should be validated following well-defined procedures. A new validation software tool is being developed to complete the overall data quality control system functionality. This new tool aims to improve the quality of data and services provided to the scientific community through the PSA, and shall allow anomalies to be tracked and the completeness of datasets to be controlled. It shall ensure that the PSA end-users: (1) can rely on the results of their queries, (2) will get data products that are suitable for scientific analysis, (3) can find all science data acquired during a mission. We defined dataset validation as the verification and assessment process that checks the dataset content against pre-defined top-level criteria, which represent the general characteristics of good quality datasets. The dataset content that is checked includes the data and all types of information that are essential in the process of deriving scientific results and those interfacing with the PSA database. The validation software tool is a multi-mission tool that

  12. Internal consistency and construct validity of the Quality of Life in Alzheimer's Disease (QoL-AD) proxy – a secondary data analysis

    Science.gov (United States)

    Hylla, Jonas; Schwab, Christian G G; Isfort, Michael; Halek, Margareta; Dichter, Martin N

    2016-07-01

    Background: The maintenance and promotion of Quality of Life (QoL) of people with dementia is a major outcome in intervention studies and health care. The Quality of Life Alzheimer's Disease (QoL-AD) is an internationally recommended QoL measurement also available in German language. Until now, only a few results on the psychometric properties of the German QoL-AD were available. Objective: Evaluation of internal consistency and construct validity of the QoL-AD proxy. Method: A principal component analysis (secondary data analysis) of the 13 QoL-AD items was carried out based on the total sample of 234 people with dementia from nine nursing homes in Germany. Subsequently, the internal consistency of the identified factors was examined using Cronbach's alpha. Results: Two factors physical and mental health and social network were determined. Both factors explain 53 % of the total variance. The stability of both factors was validated in two sensitivity analyses. The internal consistency is good for both factors with a Cronbach's alpha of 0.88 (physical and mental health) and 0.75 (social network). Conclusion: The QoL-AD proxy allows the assessment of two relevant health-related QoL domains of people with dementia. However, in future studies especially the inter-rater reliability of the QoL-AD proxy has to be examined.

  13. CLARA-A1: a cloud, albedo, and radiation dataset from 28 yr of global AVHRR data

    Directory of Open Access Journals (Sweden)

    K.-G. Karlsson

    2013-05-01

    A new satellite-derived climate dataset – denoted CLARA-A1 ("The CM SAF cLoud, Albedo and RAdiation dataset from AVHRR data") – is described. The dataset covers the 28-yr period from 1982 until 2009 and consists of cloud, surface albedo, and radiation budget products derived from the AVHRR (Advanced Very High Resolution Radiometer) sensor carried by polar-orbiting operational meteorological satellites. Its content, anticipated accuracies, limitations, and potential applications are described. The dataset is produced by the EUMETSAT Climate Monitoring Satellite Application Facility (CM SAF) project. The dataset has its strengths in the long duration, its foundation upon a homogenized AVHRR radiance data record, and in some unique features, e.g. the availability of 28 yr of summer surface albedo and cloudiness parameters over the polar regions. Quality characteristics are also well investigated and particularly useful results can be found over the tropics, mid to high latitudes and over nearly all oceanic areas. Being the first CM SAF dataset of its kind, an intensive evaluation of the quality of the datasets was performed and major findings with regard to merits and shortcomings of the datasets are reported. However, the CM SAF's long-term commitment to perform two additional reprocessing events within the time frame 2013–2018 will allow proper handling of limitations as well as upgrading the dataset with new features (e.g. uncertainty estimates and extension of the temporal coverage).

  14. Check your biosignals here: a new dataset for off-the-person ECG biometrics.

    Science.gov (United States)

    da Silva, Hugo Plácido; Lourenço, André; Fred, Ana; Raposo, Nuno; Aires-de-Sousa, Marta

    2014-02-01

    The Check Your Biosignals Here initiative (CYBHi) was developed as a way of creating a dataset and consistently repeatable acquisition framework, to further extend research in electrocardiographic (ECG) biometrics. In particular, our work targets the novel trend towards off-the-person data acquisition, which opens a broad new set of challenges and opportunities both for research and industry. While datasets with ECG signals collected using medical grade equipment at the chest can be easily found, for off-the-person ECG data the solution is generally for each team to collect their own corpus at considerable expense of resources. In this paper we describe the context, experimental considerations, methods, and preliminary findings of two public datasets created by our team, one for short-term and another for long-term assessment, with ECG data collected at the hand palms and fingers. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  15. IPCC IS92 Emissions Scenarios (A, B, C, D, E, F) Dataset Version 1.1

    Data.gov (United States)

    National Aeronautics and Space Administration — The Intergovernmental Panel on Climate Change (IPCC) IS92 Emissions Scenarios (A, B, C, D, E, F) Dataset Version 1.1 consists of six global and regional greenhouse...

  16. International Boundary United States Mexico Minute 315

    Data.gov (United States)

    International Boundary & Water Commission — This dataset was created to provide resource managers, public officials, researchers, and the general public with ready access to the location of the international...

  17. The French Muséum national d'histoire naturelle vascular plant herbarium collection dataset

    Science.gov (United States)

    Le Bras, Gwenaël; Pignal, Marc; Jeanson, Marc L.; Muller, Serge; Aupic, Cécile; Carré, Benoît; Flament, Grégoire; Gaudeul, Myriam; Gonçalves, Claudia; Invernón, Vanessa R.; Jabbour, Florian; Lerat, Elodie; Lowry, Porter P.; Offroy, Bérangère; Pimparé, Eva Pérez; Poncy, Odile; Rouhan, Germinal; Haevermans, Thomas

    2017-02-01

    We provide a quantitative description of the French national herbarium vascular plants collection dataset. Held at the Muséum national d'histoire naturelle, Paris, it currently comprises records for 5,400,000 specimens, representing 90% of the estimated total of specimens. Ninety nine percent of the specimen entries are linked to one or more images and 16% have field-collecting information available. This major botanical collection represents the results of over three centuries of exploration and study. The sources of the collection are global, with a strong representation for France, including overseas territories, and former French colonies. The compilation of this dataset was made possible through numerous national and international projects, the most important of which was linked to the renovation of the herbarium building. The vascular plant collection is actively expanding today, hence the continuous growth exhibited by the dataset, which can be fully accessed through the GBIF portal or the MNHN database portal (available at: https://science.mnhn.fr/institution/mnhn/collection/p/item/search/form). This dataset is a major source of data for systematics, global plants macroecological studies or conservation assessments.

  19. Does standardised structured reporting contribute to quality in diagnostic pathology? The importance of evidence-based datasets.

    Science.gov (United States)

    Ellis, D W; Srigley, J

    2016-01-01

    Key quality parameters in diagnostic pathology include timeliness, accuracy, completeness, conformance with current agreed standards, consistency and clarity in communication. In this review, we argue that with worldwide developments in eHealth and big data, generally, there are two further, often overlooked, parameters if our reports are to be fit for purpose. Firstly, population-level studies have clearly demonstrated the value of providing timely structured reporting data in standardised electronic format as part of system-wide quality improvement programmes. Moreover, when combined with multiple health data sources through eHealth and data linkage, structured pathology reports become central to population-level quality monitoring, benchmarking, interventions and benefit analyses in public health management. Secondly, population-level studies, particularly for benchmarking, require a single agreed international and evidence-based standard to ensure interoperability and comparability. This has been taken for granted in tumour classification and staging for many years, yet international standardisation of cancer datasets is only now underway through the International Collaboration on Cancer Reporting (ICCR). In this review, we present evidence supporting the role of structured pathology reporting in quality improvement for both clinical care and population-level health management. Although this review of available evidence largely relates to structured reporting of cancer, it is clear that the same principles can be applied throughout anatomical pathology generally, as they are elsewhere in the health system.

  20. On sample size and different interpretations of snow stability datasets

    Science.gov (United States)

    Schirmer, M.; Mitterer, C.; Schweizer, J.

    2009-04-01

    Interpretations of snow stability variations need an assessment of the stability itself, independent of the scale investigated in the study. Studies on stability variations at a regional scale have often chosen stability tests such as the Rutschblock test or combinations of various tests in order to detect differences in aspect and elevation. The question arose of how capable such stability interpretations are of supporting conclusions. There are at least three possible error sources: (i) the variance of the stability test itself; (ii) the stability variance at an underlying slope scale; and (iii) the stability interpretation might not be directly related to the probability of skier triggering. Various stability interpretations have been proposed in the past that provide partly different results. We compared a subjective one based on expert knowledge with a more objective one based on a measure derived from comparing skier-triggered slopes vs. slopes that have been skied but not triggered. In this study, the uncertainties are discussed and their effects on regional-scale stability variations will be quantified in a pragmatic way. An existing dataset with very large sample sizes was revisited. This dataset contained the variance of stability at a regional scale for several situations. The stability in this dataset was determined using the subjective interpretation scheme based on expert knowledge. The question to be answered was how many measurements were needed to obtain similar results (mainly stability differences in aspect or elevation) as with the complete dataset. The optimal sample size was obtained in several ways: (i) assuming a nominal data scale, the sample size was determined with a given test, significance level and power, and by calculating the mean and standard deviation of the complete dataset. With this method it can also be determined whether the complete dataset consists of an appropriate sample size. (ii) Smaller subsets were created with similar
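
    The sample-size calculation mentioned in (i) follows the classical two-sample normal approximation. A sketch, with illustrative effect-size numbers that are not from the study:

```python
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    """Subjects per group needed to detect a mean difference `delta`
    given within-group standard deviation `sd` (two-sided test)."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) * sd / delta) ** 2

# e.g. detect a 1-point difference on a stability score with sd = 2
print(f"n per group = {n_per_group(delta=1.0, sd=2.0):.0f}")  # ~63
```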

  1. The Geometry of Finite Equilibrium Datasets

    DEFF Research Database (Denmark)

    Balasko, Yves; Tvede, Mich

    We investigate the geometry of finite datasets defined by equilibrium prices, income distributions, and total resources. We show that the equilibrium condition imposes no restrictions if total resources are collinear, a property that is robust to small perturbations. We also show that the set...... of equilibrium datasets is path-connected when the equilibrium condition does impose restrictions on datasets, as for example when total resources are widely non-collinear....

  2. The International Human Epigenome Consortium Data Portal.

    Science.gov (United States)

    Bujold, David; Morais, David Anderson de Lima; Gauthier, Carol; Côté, Catherine; Caron, Maxime; Kwan, Tony; Chen, Kuang Chung; Laperle, Jonathan; Markovits, Alexei Nordell; Pastinen, Tomi; Caron, Bryan; Veilleux, Alain; Jacques, Pierre-Étienne; Bourque, Guillaume

    2016-11-23

    The International Human Epigenome Consortium (IHEC) coordinates the production of reference epigenome maps through the characterization of the regulome, methylome, and transcriptome from a wide range of tissues and cell types. To define conventions ensuring the compatibility of datasets and establish an infrastructure enabling data integration, analysis, and sharing, we developed the IHEC Data Portal (http://epigenomesportal.ca/ihec). The portal provides access to >7,000 reference epigenomic datasets, generated from >600 tissues, which have been contributed by seven international consortia: ENCODE, NIH Roadmap, CEEHRC, Blueprint, DEEP, AMED-CREST, and KNIH. The portal enhances the utility of these reference maps by facilitating the discovery, visualization, analysis, download, and sharing of epigenomics data. The IHEC Data Portal is the official source to navigate through IHEC datasets and represents a strategy for unifying the distributed data produced by international research consortia. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.

  3. 26 CFR 301.6224(c)-3 - Consistent settlements.

    Science.gov (United States)

    2010-04-01

    ... 26 Internal Revenue 18 2010-04-01 2010-04-01 false Consistent settlements. 301.6224(c)-3 Section... settlements. (a) In general. If the Internal Revenue Service enters into a settlement agreement with any..., settlement terms consistent with those contained in the settlement agreement entered into. (b) Requirements...

  4. Validity and reliability of stillbirth data using linked self-reported and administrative datasets.

    Science.gov (United States)

    Hure, Alexis J; Chojenta, Catherine L; Powers, Jennifer R; Byles, Julie E; Loxton, Deborah

    2015-01-01

    A high rate of stillbirth was previously observed in the Australian Longitudinal Study of Women's Health (ALSWH). Our primary objective was to test the validity and reliability of self-reported stillbirth data linked to state-based administrative datasets. Self-reported data, collected as part of the ALSWH cohort born in 1973-1978, were linked to three administrative datasets for women in New South Wales, Australia (n = 4374): the Midwives Data Collection; Admitted Patient Data Collection; and Perinatal Death Review Database. Linkages were obtained from the Centre for Health Record Linkage for the period 1996-2009. True cases of stillbirth were defined by being consistently recorded in two or more independent data sources. Sensitivity, specificity, positive predictive value, negative predictive value, percent agreement, and kappa statistics were calculated for each dataset. Forty-nine women reported 53 stillbirths. No dataset was 100% accurate. The administrative datasets performed better than self-reported data, with high accuracy and agreement. Self-reported data showed high sensitivity (100%) but low specificity (30%), meaning women who had a stillbirth always reported it, but there was also over-reporting of stillbirths. About half of the misreported cases in the ALSWH could be removed by identifying inconsistencies in longitudinal data. Data linkage provides a great opportunity to assess the validity and reliability of self-reported study data. Conversely, self-reported study data can help to resolve inconsistencies in administrative datasets. Quantifying the strengths and limitations of both self-reported and administrative data can improve epidemiological research, especially by guiding methods and interpretation of findings.
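
    The validity statistics used here all derive from a two-by-two table of self-report against linked-record "truth". A compact sketch with purely illustrative counts, chosen only to mimic the reported pattern of perfect sensitivity and low specificity:

```python
def agreement_stats(tp, fp, fn, tn):
    """Validity of self-report against 'true' linked-record cases."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n                                         # percent agreement
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2  # chance agreement
    return dict(sensitivity=tp / (tp + fn), specificity=tn / (tn + fp),
                ppv=tp / (tp + fp), npv=tn / (tn + fn),
                agreement=po, kappa=(po - pe) / (1 - pe))

# Hypothetical counts: every true stillbirth reported (sensitivity 1.0),
# many over-reports (specificity 0.30). Numbers are illustrative only.
print(agreement_stats(tp=25, fp=28, fn=0, tn=12))
```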

  5. FY13 Summary Report on the Augmentation of the Spent Fuel Composition Dataset for Nuclear Forensics: SFCOMPO/NF

    Energy Technology Data Exchange (ETDEWEB)

    Brady Raap, Michaele C.; Lyons, Jennifer A.; Collins, Brian A.; Livingston, James V.

    2014-03-31

    This report documents the FY13 efforts to enhance a dataset of spent nuclear fuel isotopic composition data for use in developing intrinsic signatures for nuclear forensics. A review and collection of data from the open literature was performed in FY10. In FY11, the Excel-based Spent Fuel COMPOsition dataset for nuclear forensics (SFCOMPO/NF) was established; measured data for graphite production reactors, Boiling Water Reactors (BWRs) and Pressurized Water Reactors (PWRs) were added to the dataset, which was expanded to include a consistent set of data simulated by calculations. A test was performed to determine whether the SFCOMPO/NF dataset would be useful for the analysis and identification of reactor types from isotopic ratios observed in interdicted samples.

  6. Water availability and agricultural demand: An assessment framework using global datasets in a data scarce catchment, Rokel-Seli River, Sierra Leone

    Directory of Open Access Journals (Sweden)

    Christopher K. Masafu

    2016-12-01

    New hydrological insights: We find that the hydrological model simulates both low and high flows satisfactorily, and that all the input datasets consistently produce similar results for water withdrawal scenarios. The proposed framework is successfully applied to assess the variability of flows available for abstraction against agricultural demand. The assessment framework conclusions are robust despite the different input datasets and calibration scenarios tested, and can be extended to include other global input datasets.

  7. Five-Item Francis Scale of Attitude toward Christianity: Construct and Nomological Validity and Internal Consistency among Colombian College Students

    Science.gov (United States)

    Ceballos, Guillermo A.; Suescun, Jesus D.; Oviedo, Heidi C.; Herazo, Edwin; Campo-Arias, Adalberto

    2015-01-01

    The Spanish version of the five-item Francis scale of attitude toward Christianity is a refinement of the short version of the Francis scale of attitude toward Christianity. The scale is a good measurement for intrinsic religiosity. It has been applied previously among Colombian adolescent students. The internal consistency and construct and…

  8. A robust post-processing workflow for datasets with motion artifacts in diffusion kurtosis imaging.

    Science.gov (United States)

    Li, Xianjun; Yang, Jian; Gao, Jie; Luo, Xue; Zhou, Zhenyu; Hu, Yajie; Wu, Ed X; Wan, Mingxi

    2014-01-01

    The aim of this study was to develop a robust post-processing workflow for motion-corrupted datasets in diffusion kurtosis imaging (DKI). The proposed workflow consisted of brain extraction, rigid registration, distortion correction, artifact rejection, spatial smoothing and tensor estimation. Rigid registration was utilized to correct misalignments. Motion artifacts were rejected by using the local Pearson correlation coefficient (LPCC). The performance of LPCC in characterizing relative differences between artifacts and artifact-free images was compared with that of the conventional correlation coefficient in 10 randomly selected DKI datasets. The influence of rejected artifacts, with information on gradient directions and b values, on the parameter estimation was investigated by using the mean square error (MSE). The variance of noise was used as the criterion for the MSEs. The clinical practicality of the proposed workflow was evaluated by the image quality and measurements in regions of interest on 36 DKI datasets, including 18 artifact-free (18 pediatric subjects) and 18 motion-corrupted datasets (15 pediatric subjects and 3 essential tremor patients). The relative difference between artifacts and artifact-free images calculated by LPCC was larger than that of the conventional correlation coefficient (p < 0.05). The proposed workflow improved the image quality and reduced the measurement biases significantly on motion-corrupted datasets (p < 0.05). The proposed workflow was reliable in improving the image quality and the measurement precision of the derived parameters on motion-corrupted DKI datasets. The workflow provides an effective post-processing method for clinical applications of DKI in subjects with involuntary movements.
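
    The paper's exact LPCC definition is not reproduced here, but the idea of scoring a volume by averaging patch-wise Pearson correlations against a reference can be sketched as follows; the window size and looping scheme are assumptions:

```python
import numpy as np

def lpcc(img_a, img_b, win=5):
    """Mean local Pearson correlation between two slices: correlate
    corresponding win x win patches, then average over the image.
    Motion-corrupted volumes score low against the rest of the series."""
    scores = []
    for i in range(0, img_a.shape[0] - win, win):
        for j in range(0, img_a.shape[1] - win, win):
            a = img_a[i:i + win, j:j + win].ravel()
            b = img_b[i:i + win, j:j + win].ravel()
            if a.std() > 0 and b.std() > 0:
                scores.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(scores))

rng = np.random.default_rng(0)
clean = rng.normal(size=(64, 64))
print(lpcc(clean, clean + 0.1 * rng.normal(size=(64, 64))))  # high: aligned
print(lpcc(clean, np.roll(clean, 8, axis=0)))                # low: misaligned
```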

  9. SIMADL: Simulated Activities of Daily Living Dataset

    Directory of Open Access Journals (Sweden)

    Talal Alshammari

    2018-04-01

    With the realisation of the Internet of Things (IoT) paradigm, the analysis of the Activities of Daily Living (ADLs) in a smart home environment is becoming an active research domain. The existence of representative datasets is a key requirement to advance the research in smart home design. Such datasets are an integral part of the visualisation of new smart home concepts as well as the validation and evaluation of emerging machine learning models. Machine learning techniques that can learn ADLs from sensor readings are used to classify, predict and detect anomalous patterns. Such techniques require data that represent relevant smart home scenarios, for training, testing and validation. However, the development of such machine learning techniques is limited by the lack of real smart home datasets, due to the excessive cost of building real smart homes. This paper provides two datasets for classification and anomaly detection. The datasets are generated using OpenSHS (Open Smart Home Simulator), a simulation software package for dataset generation. OpenSHS records the daily activities of a participant within a virtual environment. Seven participants simulated their ADLs for different contexts, e.g., weekdays, weekends, mornings and evenings. Eighty-four files in total were generated, representing approximately 63 days' worth of activities. Forty-two files simulating the classification of ADLs form the classification dataset, and the other forty-two files, into which simulated anomalous patterns were injected, form the anomaly detection dataset.

  10. The NOAA Dataset Identifier Project

    Science.gov (United States)

    de la Beaujardiere, J.; Mccullough, H.; Casey, K. S.

    2013-12-01

    The US National Oceanic and Atmospheric Administration (NOAA) initiated a project in 2013 to assign persistent identifiers to datasets archived at NOAA and to create informational landing pages about those datasets. The goals of this project are to enable the citation of datasets used in products and results in order to help provide credit to data producers, to support traceability and reproducibility, and to enable tracking of data usage and impact. A secondary goal is to encourage the submission of datasets for long-term preservation, because only archived datasets will be eligible for a NOAA-issued identifier. A team was formed with representatives from the National Geophysical, Oceanographic, and Climatic Data Centers (NGDC, NODC, NCDC) to resolve questions including which identifier scheme to use (answer: Digital Object Identifier - DOI), whether or not to embed semantics in identifiers (no), the level of granularity at which to assign identifiers (as coarsely as reasonable), how to handle ongoing time-series data (do not break into chunks), creation mechanism for the landing page (stylesheet from formal metadata record preferred), and others. Decisions made and implementation experience gained will inform the writing of a Data Citation Procedural Directive to be issued by the Environmental Data Management Committee in 2014. Several identifiers have been issued as of July 2013, with more on the way. NOAA is now reporting the number as a metric to federal Open Government initiatives. This paper will provide further details and status of the project.

  11. Digital Astronaut Photography: A Discovery Dataset for Archaeology

    Science.gov (United States)

    Stefanov, William L.

    2010-01-01

    Astronaut photography acquired from the International Space Station (ISS) using commercial off-the-shelf cameras offers a freely-accessible source for high to very high resolution (4-20 m/pixel) visible-wavelength digital data of Earth. Since ISS Expedition 1 in 2000, over 373,000 images of the Earth-Moon system (including land surface, ocean, atmospheric, and lunar images) have been added to the Gateway to Astronaut Photography of Earth online database (http://eol.jsc.nasa.gov ). Handheld astronaut photographs vary in look angle, time of acquisition, solar illumination, and spatial resolution. These attributes of digital astronaut photography result from a unique combination of ISS orbital dynamics, mission operations, camera systems, and the individual skills of the astronaut. The variable nature of astronaut photography makes the dataset uniquely useful for archaeological applications in comparison with more traditional nadir-viewing multispectral datasets acquired from unmanned orbital platforms. For example, surface features such as trenches, walls, ruins, urban patterns, and vegetation clearing and regrowth patterns may be accentuated by low sun angles and oblique viewing conditions (Fig. 1). High spatial resolution digital astronaut photographs can also be used with sophisticated land cover classification and spatial analysis approaches like Object Based Image Analysis, increasing the potential for use in archaeological characterization of landscapes and specific sites.

  12. ISC-EHB: Reconstruction of a robust earthquake dataset

    Science.gov (United States)

    Weston, J.; Engdahl, E. R.; Harris, J.; Di Giacomo, D.; Storchak, D. A.

    2018-04-01

    The EHB Bulletin of hypocentres and associated travel-time residuals was originally developed with procedures described by Engdahl, Van der Hilst and Buland (1998) and currently ends in 2008. It is a widely used seismological dataset, which is now expanded and reconstructed, partly by exploiting updated procedures at the International Seismological Centre (ISC), to produce the ISC-EHB. The reconstruction begins in the modern period (2000-2013) to which new and more rigorous procedures for event selection, data preparation, processing, and relocation are applied. The selection criteria minimise the location bias produced by unmodelled 3D Earth structure, resulting in events that are relatively well located in any given region. Depths of the selected events are significantly improved by a more comprehensive review of near station and secondary phase travel-time residuals based on ISC data, especially for the depth phases pP, pwP and sP, as well as by a rigorous review of the event depths in subduction zone cross sections. The resulting cross sections and associated maps are shown to provide details of seismicity in subduction zones in much greater detail than previously achievable. The new ISC-EHB dataset will be especially useful for global seismicity studies and high-frequency regional and global tomographic inversions.

  13. Creating a Regional MODIS Satellite-Driven Net Primary Production Dataset for European Forests

    Directory of Open Access Journals (Sweden)

    Mathias Neumann

    2016-06-01

    Net primary production (NPP) is an important ecological metric for studying forest ecosystems and their carbon sequestration, for assessing the potential supply of food or timber and for quantifying the impacts of climate change on ecosystems. The global MODIS NPP dataset using the MOD17 algorithm provides valuable information for monitoring NPP at 1-km resolution. Since coarse-resolution global climate data are used, the global dataset may contain uncertainties for Europe. We used a 1-km daily gridded European climate dataset with the MOD17 algorithm to create the regional NPP dataset MODIS EURO. For evaluation of this new dataset, we compare MODIS EURO with terrestrially driven NPP from analyzing and harmonizing forest inventory data (NFI) from 196,434 plots in 12 European countries as well as with the global MODIS NPP dataset for the years 2000 to 2012. Comparing these three NPP datasets, we found that the global MODIS NPP dataset differs from NFI NPP by 26%, while MODIS EURO differs by only 7%. MODIS EURO also agrees with NFI NPP across scales (from continental and regional down to country level) and gradients (elevation, location, tree age, dominant species, etc.). The agreement is particularly good for elevation, dominant species and tree height. This suggests that using improved climate data allows the MOD17 algorithm to provide realistic NPP estimates for Europe. Local discrepancies between MODIS EURO and NFI NPP can be related to differences in stand density due to forest management and to the national carbon estimation methods. With this study, we provide a consistent, temporally continuous and spatially explicit productivity dataset for the years 2000 to 2012 at a 1-km resolution, which can be used to assess climate change impacts on ecosystems or the potential biomass supply of the European forests for an increasing bio-based economy. MODIS EURO data are made freely available at ftp://palantir.boku.ac.at/Public/MODIS_EURO.

  14. Benchmarking Deep Learning Models on Large Healthcare Datasets.

    Science.gov (United States)

    Purushotham, Sanjay; Meng, Chuizheng; Che, Zhengping; Liu, Yan

    2018-06-04

Deep learning models (aka deep neural networks) have revolutionized many fields, including computer vision, natural language processing and speech recognition, and are increasingly being used in clinical healthcare applications. However, few works have benchmarked the performance of deep learning models against state-of-the-art machine learning models and prognostic scoring systems on publicly available healthcare datasets. In this paper, we present benchmarking results for several clinical prediction tasks, such as mortality prediction, length-of-stay prediction, and ICD-9 code group prediction, using deep learning models, an ensemble of machine learning models (the Super Learner algorithm), and the SAPS II and SOFA scores. We used the publicly available Medical Information Mart for Intensive Care III (MIMIC-III) (v1.4) dataset, which includes all patients admitted to an ICU at the Beth Israel Deaconess Medical Center from 2001 to 2012, for the benchmarking tasks. Our results show that deep learning models consistently outperform all the other approaches, especially when the 'raw' clinical time series data are used as input features to the models.
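    A minimal sketch of the kind of benchmark comparison the abstract describes, assuming a feature matrix and binary mortality labels have already been extracted from MIMIC-III (the random arrays below are placeholders, and the MLP merely stands in for a deep model; the paper's Super Learner ensemble is not reproduced):

    ```python
    # Hedged sketch: compare several classifiers on the same mortality task.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))      # placeholder for extracted ICU features
    y = rng.integers(0, 2, size=1000)    # placeholder mortality labels

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    models = {
        "logistic": LogisticRegression(max_iter=1000),
        "gbm": GradientBoostingClassifier(),
        "mlp (stand-in for a deep model)": MLPClassifier(hidden_layer_sizes=(64, 64),
                                                         max_iter=500),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        print(f"{name}: AUROC = {auc:.3f}")
    ```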

  15. Control Measure Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — The EPA Control Measure Dataset is a collection of documents describing air pollution control available to regulated facilities for the control and abatement of air...

  16. The Kinetics Human Action Video Dataset

    OpenAIRE

    Kay, Will; Carreira, Joao; Simonyan, Karen; Zhang, Brian; Hillier, Chloe; Vijayanarasimhan, Sudheendra; Viola, Fabio; Green, Tim; Back, Trevor; Natsev, Paul; Suleyman, Mustafa; Zisserman, Andrew

    2017-01-01

    We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action. Each clip lasts around 10s and is taken from a different YouTube video. The actions are human focussed and cover a broad range of classes including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands. We describe the statistics of the dataset, how it was collected, and give some ...

  17. Temporal and Geographic variation in the validity and internal consistency of the Nursing Home Resident Assessment Minimum Data Set 2.0.

    Science.gov (United States)

    Mor, Vincent; Intrator, Orna; Unruh, Mark Aaron; Cai, Shubing

    2011-04-15

The Minimum Data Set (MDS) for nursing home resident assessment has been required in all U.S. nursing homes since 1990 and has been universally computerized since 1998. Initially intended to structure clinical care planning, uses of the MDS expanded to include policy applications such as case-mix reimbursement, quality monitoring and research. The purpose of this paper is to summarize a series of analyses examining the internal consistency and predictive validity of the MDS data as used in the "real world" in all U.S. nursing homes between 1999 and 2007. We used person-level linked MDS and Medicare denominator and all institutional claim files, including inpatient (hospital and skilled nursing facility) claims, for all Medicare fee-for-service beneficiaries entering U.S. nursing homes during the period 1999 to 2007. We calculated the sensitivity and positive predictive value (PPV) of diagnoses taken from Medicare hospital claims and from the MDS among all new admissions from hospitals to nursing homes, and the internal consistency (alpha reliability) of pairs of items within the MDS that logically should be related. We also tested the internal consistency of commonly used MDS-based multi-item scales and examined the predictive validity of an MDS-based severity measure, viz. one-year survival. Finally, we examined the correspondence of the MDS discharge record to hospitalizations and deaths seen in Medicare claims, and the completeness of MDS assessments upon skilled nursing facility (SNF) admission. Each year there were some 800,000 new admissions directly from hospital to U.S. nursing homes and some 900,000 uninterrupted SNF stays. Comparing Medicare enrollment records and claims with MDS records revealed reasonably good correspondence that improved over time (by 2006 only 3% of deaths had no MDS discharge record and only 5% of SNF stays had no MDS, but over 20% of MDS discharges indicating hospitalization had no associated Medicare claim). The PPV and sensitivity levels of
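    The sensitivity and PPV computation described here reduces to a comparison of paired 0/1 diagnosis indicators per admission; a minimal sketch, with toy flags in place of the actual MDS and Medicare claims extracts:

    ```python
    # Hedged sketch: sensitivity and PPV of an MDS diagnosis flag, taking the
    # Medicare hospital claim as the reference standard.
    def sensitivity_ppv(mds_flag, claim_flag):
        """mds_flag, claim_flag: iterables of 0/1 per admission."""
        tp = sum(1 for m, c in zip(mds_flag, claim_flag) if m == 1 and c == 1)
        fn = sum(1 for m, c in zip(mds_flag, claim_flag) if m == 0 and c == 1)
        fp = sum(1 for m, c in zip(mds_flag, claim_flag) if m == 1 and c == 0)
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        ppv = tp / (tp + fp) if tp + fp else float("nan")
        return sensitivity, ppv

    # toy example: a diagnosis flagged on the MDS vs. on the hospital claim
    print(sensitivity_ppv([1, 1, 0, 1, 0], [1, 1, 1, 0, 0]))  # (0.667, 0.667)
    ```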

  18. The test of variables of attention (TOVA): Internal consistency (Q1 vs. Q2 and Q3 vs. Q4) in children with Attention Deficit/Hyperactivity Disorder (ADHD)

    Science.gov (United States)

    The internal consistency of the Test of Variables of Attention (TOVA) was examined in a cohort of 6- to 12-year-old children (N = 63) strictly diagnosed with ADHD. The internal consistency of errors of omission (OMM), errors of commission (COM), response time (RT), and response time variability (RTV...
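    A quarter-versus-quarter internal consistency check of this kind is typically a split-half correlation with a Spearman-Brown correction; a hedged sketch with hypothetical per-child scores (the study's exact procedure is not specified in this excerpt):

    ```python
    # Sketch of split-half reliability (e.g., Q1 vs. Q2 omission errors).
    import numpy as np

    def split_half_reliability(half_a, half_b):
        r = np.corrcoef(half_a, half_b)[0, 1]  # Pearson r between the halves
        return 2 * r / (1 + r)                 # Spearman-Brown step-up

    q1_omissions = np.array([3, 7, 2, 9, 4, 6])  # toy scores, first quarter
    q2_omissions = np.array([4, 6, 3, 8, 5, 7])  # toy scores, second quarter
    print(round(split_half_reliability(q1_omissions, q2_omissions), 3))
    ```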

  19. Comparison of CORA and EN4 in-situ datasets validation methods, toward a better quality merged dataset.

    Science.gov (United States)

    Szekely, Tanguy; Killick, Rachel; Gourrion, Jerome; Reverdin, Gilles

    2017-04-01

CORA and EN4 are both global, delayed-mode, validated in-situ ocean temperature and salinity datasets distributed by the Met Office (http://www.metoffice.gov.uk/) and Copernicus (www.marine.copernicus.eu). A large part of the profiles distributed by CORA and EN4 in recent years are Argo profiles from the Argo DAC, but profiles are also extracted from the World Ocean Database, and TESAC profiles from GTSPP. In the case of CORA, data coming from the EUROGOOS Regional Operational Observing System (ROOS), operated by European institutes not managed by National Data Centres, and other profile datasets provided by scientific sources can also be found (sea mammal profiles from MEOP, XBT datasets from cruises, ...). (EN4 also takes data from the ASBO dataset to supplement observations in the Arctic.) The first advantage of this new merged product is enhanced space and time coverage at global and European scales for the period from 1950 until a year before the current year. This product is updated once a year, and T&S gridded fields are also generated for the period from 1990 to year n-1. The enhancement compared to the previous CORA product will be presented. Although the profiles distributed by both datasets are mostly the same, the quality control procedures developed by the Met Office and Copernicus teams differ, sometimes leading to different quality control flags for the same profile. In 2016, a new study began that aims to compare both validation procedures and move towards a Copernicus Marine Service dataset with the best features of CORA and EN4 validation. A reference dataset composed of the full set of in-situ temperature and salinity measurements collected by Coriolis during 2015 is used. These measurements have been made with a wide range of instruments (XBTs, CTDs, Argo floats, instrumented sea mammals, ...), covering the global ocean. The reference dataset has been validated simultaneously by both teams. An exhaustive comparison of the

  20. Determining the Feasibility, Content Validity, and Internal Consistency of a Newly Developed Care Coordination Scale for People with Brain Injury

    Directory of Open Access Journals (Sweden)

    Brian P. Johnson

    2017-07-01

Background: With the increasing complexity of care, people with disabilities and supportive significant others (SSO) must often coordinate key aspects of their own care, but no validated scale currently exists to comprehensively characterize the activities done to manage and coordinate their care. Method: This study aimed to improve the feasibility, acceptability, and content validity of the Care and Service Coordination and Management (CASCAM) scale and to test its internal consistency. Questionnaire items were administered to 23 individuals with acquired brain injury and 17 SSO. Results: Respondents confirmed content validity and that the instrument addresses important care coordination and management issues. The internal consistency of care coordination domains for medical/rehabilitative and independent living needs for people with brain injury and their SSO ranged from α = .774 to .945. Conclusion: Care coordination activities by persons with disabilities, including brain injury, and their SSO are multifaceted but feasibly measurable and should be assessed to improve care.
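    The domain-level α values reported above are Cronbach's alpha; a minimal sketch of the computation, with a hypothetical respondents-by-items score matrix:

    ```python
    # Sketch of Cronbach's alpha for one scale domain.
    import numpy as np

    def cronbach_alpha(items):
        items = np.asarray(items, dtype=float)
        k = items.shape[1]                           # number of items
        item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
        total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
        return k / (k - 1) * (1 - item_vars / total_var)

    scores = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 4, 3], [1, 2, 2]]  # toy data
    print(round(cronbach_alpha(scores), 3))
    ```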

  1. Hydrodynamic modelling and global datasets: Flow connectivity and SRTM data, a Bangkok case study.

    Science.gov (United States)

    Trigg, M. A.; Bates, P. B.; Michaelides, K.

    2012-04-01

The rise of globally interconnected manufacturing supply chains requires an understanding and consistent quantification of flood risk at a global scale. Flood risk is often better quantified (or at least more precisely defined) in regions where there has been investment in comprehensive topographical data collection, such as LiDAR, coupled with detailed hydrodynamic modelling. Yet in regions where these data and modelling are unavailable, the implications of flooding and the knock-on effects for global industries can be dramatic, as evidenced by the recent floods in Bangkok, Thailand. There is a growing momentum in terms of global modelling initiatives to address this lack of a consistent understanding of flood risk, and they will rely heavily on the application of available global datasets relevant to hydrodynamic modelling, such as Shuttle Radar Topography Mission (SRTM) data and its derivatives. These global datasets bring opportunities to apply consistent methodologies on an automated basis in all regions, while the use of coarser-scale datasets also brings many challenges, such as sub-grid process representation and downscaled hydrology data from global climate models. There are significant opportunities for hydrological science in helping define new, realistic and physically based methodologies that can be applied globally, as well as the possibility of gaining new insights into flood risk through analysis of the many large datasets that will be derived from this work. We use Bangkok as a case study to explore some of the issues related to using these available global datasets for hydrodynamic modelling, with particular focus on using SRTM data to represent topography. Research has shown that flow connectivity on the floodplain is an important component in the dynamics of flood flows on to and off the floodplain, and indeed within different areas of the floodplain. A lack of representation of flow connectivity, often due to data resolution limitations, means

2. Early Trauma Inventory (ETI): cross-cultural adaptation and internal consistency

    Directory of Open Access Journals (Sweden)

    Marcelo Feijó de Mello

    2010-04-01

Early life stress is a strong predictor of future psychopathology during adulthood. The Early Trauma Inventory (ETI) is an instrument that assesses, in adults, traumatic experiences that occurred before 18 years of age. The instrument was translated and cross-culturally adapted, and its internal consistency was evaluated. Victims of violence who met the inclusion and exclusion criteria were submitted to a diagnostic interview (SCID-I) and to the ETI. Ninety-one patients with post-traumatic stress disorder (PTSD) were included. Cronbach's alpha in the different domains varied from 0.595 to 0.793, and for the total score it was 0.878. Except for emotional abuse, most items in the various domains displayed inter-item correlation rates of 0.51 to 0.99. The adapted version was useful for both clinical and research purposes and showed good internal consistency and inter-item correlation. The ETI is a valid instrument with good consistency for assessing the history of early trauma in adults.

  3. HEp-2 cell image classification method based on very deep convolutional networks with small datasets

    Science.gov (United States)

    Lu, Mengchi; Gao, Long; Guo, Xifeng; Liu, Qiang; Yin, Jianping

    2017-07-01

Classification of Human Epithelial-2 (HEp-2) cell image staining patterns has been widely used to identify autoimmune diseases via the anti-nuclear antibody (ANA) test in the Indirect Immunofluorescence (IIF) protocol. Because manual testing is time-consuming, subjective and labor-intensive, image-based Computer Aided Diagnosis (CAD) systems for HEp-2 cell classification are being developed. However, recently proposed methods mostly rely on manual feature extraction and achieve low accuracy. Besides, the available benchmark datasets are small in scale, which is not well suited to deep learning methods; this directly limits classification accuracy, even after data augmentation. To address these issues, this paper presents a high-accuracy automatic HEp-2 cell classification method for small datasets, utilizing very deep convolutional networks (VGGNet). Specifically, the proposed method consists of three main phases, namely image preprocessing, feature extraction and classification. Moreover, an improved VGGNet is presented to address the challenges of small-scale datasets. Experimental results over two benchmark datasets demonstrate that the proposed method achieves superior performance in terms of accuracy compared with existing methods.
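    A hedged sketch of the general transfer-learning recipe for small datasets (freeze pretrained convolutional features, retrain the classifier head), using torchvision's stock VGG16; the paper's improved VGGNet and preprocessing pipeline are not reproduced, and the class count and batch are illustrative:

    ```python
    # Sketch: adapt a VGG-style network to a small HEp-2 dataset.
    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 6  # typical number of HEp-2 staining-pattern classes

    model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    for p in model.features.parameters():
        p.requires_grad = False                         # keep pretrained features

    model.classifier[6] = nn.Linear(4096, NUM_CLASSES)  # new output layer

    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    x = torch.randn(8, 3, 224, 224)                     # stand-in image batch
    y = torch.randint(0, NUM_CLASSES, (8,))
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    print(float(loss))
    ```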

  4. A bivariate contaminated binormal model for robust fitting of proper ROC curves to a pair of correlated, possibly degenerate, ROC datasets.

    Science.gov (United States)

    Zhai, Xuetong; Chakraborty, Dev P

    2017-06-01

The objective was to design and implement a bivariate extension to the contaminated binormal model (CBM) to fit paired receiver operating characteristic (ROC) datasets (possibly degenerate) with proper ROC curves. Paired datasets yield two correlated ratings per case. Degenerate datasets have no interior operating points, and proper ROC curves do not inappropriately cross the chance diagonal. The existing method, developed more than three decades ago, utilizes a bivariate extension to the binormal model, implemented in CORROC2 software, which yields improper ROC curves and cannot fit degenerate datasets. CBM can fit proper ROC curves to unpaired (i.e., yielding one rating per case) and degenerate datasets, and there is a clear scientific need to extend it to handle paired datasets. In CBM, nondiseased cases are modeled by a probability density function (pdf) consisting of a unit-variance peak centered at zero. Diseased cases are modeled with a mixture distribution whose pdf consists of two unit-variance peaks, one centered at positive μ with integrated probability α, the mixing fraction parameter, corresponding to the fraction of diseased cases where the disease was visible to the radiologist, and one centered at zero, with integrated probability (1-α), corresponding to disease that was not visible. It is shown that: (a) for nondiseased cases the bivariate extension is a unit-variance bivariate normal distribution centered at (0,0) with a specified correlation ρ1; (b) for diseased cases the bivariate extension is a mixture distribution with four peaks, corresponding to disease not visible in either condition, disease visible in only one condition (contributing two peaks), and disease visible in both conditions. An expression for the likelihood function is derived. A maximum likelihood estimation (MLE) algorithm, CORCBM, was implemented in the R programming language that yields parameter estimates and the covariance matrix of the parameters, and other statistics
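    Following the abstract's description, the CBM densities can be sketched in standard notation, with φ the unit-variance normal pdf and φ₂ its bivariate counterpart (a reading of the abstract, not the paper's own notation):

    ```latex
    % Univariate CBM densities: nondiseased (f_0) and diseased mixture (f_1).
    \[
    f_0(x) = \phi(x), \qquad
    f_1(x) = \alpha\,\phi(x-\mu) + (1-\alpha)\,\phi(x),
    \]
    % Bivariate extension for nondiseased cases: unit-variance bivariate
    % normal centered at (0,0) with correlation \rho_1.
    \[
    f_0(x,y) = \phi_2\big((x,y);\,(0,0),\,\rho_1\big).
    \]
    ```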

5. Reliability, factor analysis and internal consistency calculation of the Insomnia Severity Index (ISI) in French and in English among Lebanese adolescents

    Directory of Open Access Journals (Sweden)

    M. Chahoud

    2017-06-01

Conclusion: The results of our analyses reveal that both the English and French versions of the ISI scale have good internal consistency and are reproducible and reliable. They can therefore be used to assess the prevalence of insomnia in Lebanese adolescents.

  6. Identification of the Consistently Altered Metabolic Targets in Human Hepatocellular Carcinoma.

    Science.gov (United States)

    Nwosu, Zeribe Chike; Megger, Dominik Andre; Hammad, Seddik; Sitek, Barbara; Roessler, Stephanie; Ebert, Matthias Philip; Meyer, Christoph; Dooley, Steven

    2017-09-01

Cancer cells rely on metabolic alterations to enhance proliferation and survival. Metabolic gene alterations that repeatedly occur in liver cancer are largely unknown. We aimed to identify metabolic genes that are consistently deregulated and are of potential clinical significance in human hepatocellular carcinoma (HCC). We studied the expression of 2,761 metabolic genes in 8 microarray datasets comprising 521 human HCC tissues. Genes exclusively up-regulated or down-regulated in 6 or more datasets were defined as consistently deregulated. The consistent genes that correlated with tumor progression markers (ECM2 and MMP9) (Pearson correlation P < .05) were used for Kaplan-Meier overall survival analysis in a patient cohort. We further compared proteomic expression of metabolic genes in 19 tumors vs adjacent normal liver tissues. We identified 634 consistent metabolic genes, ∼60% of which are not yet described in HCC. The down-regulated genes (n = 350) are mostly involved in physiologic hepatocyte metabolic functions (e.g., xenobiotic, fatty acid, and amino acid metabolism). In contrast, the consistently up-regulated metabolic genes (n = 284) include those involved in glycolysis, the pentose phosphate pathway, nucleotide biosynthesis, the tricarboxylic acid cycle, oxidative phosphorylation, proton transport, membrane lipid, and glycan metabolism. Several metabolic genes (n = 434) correlated with progression markers, and of these, 201 predicted overall survival outcome in the patient cohort analyzed. Over 90% of the metabolic targets significantly altered at the protein level were similarly up- or down-regulated as in the genomic profile. We provide the first exposition of the consistently altered metabolic genes in HCC and show that these genes are potentially relevant targets for onward studies in preclinical and clinical contexts.
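    The selection rule (exclusively up- or down-regulated in 6 or more of the 8 datasets) is easy to express over a gene-by-dataset direction matrix; a sketch with hypothetical gene symbols and directions:

    ```python
    # Sketch of the "consistent deregulation" criterion: same direction in at
    # least 6 of 8 datasets and never the opposite direction. Data are toy.
    import pandas as pd

    # rows: genes; columns: 8 datasets; values: +1 up, -1 down, 0 unchanged
    directions = pd.DataFrame(
        {f"dataset_{i}": [1, -1, 1, 0] for i in range(1, 9)},
        index=["HK2", "CYP2E1", "RRM2", "ACTB"],   # hypothetical genes
    )

    ups = (directions == 1).sum(axis=1)
    downs = (directions == -1).sum(axis=1)
    consistent_up = directions.index[(ups >= 6) & (downs == 0)]
    consistent_down = directions.index[(downs >= 6) & (ups == 0)]
    print(list(consistent_up), list(consistent_down))
    ```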

  7. Crosslinguistic Developmental Consistency in the Composition of Toddlers’ Internal State Vocabulary: Evidence from Four Languages

    Directory of Open Access Journals (Sweden)

    Susanne Kristen

    2014-01-01

Mental state language, emerging in the second and third years of life in typically developing children, is one of the first signs of an explicit psychological understanding. While mental state vocabulary may serve a variety of conversational functions in discourse, and thus might not always indicate psychological comprehension, there is evidence for genuine references to mental states (desires, knowledge, beliefs, and emotions) early in development across languages. This study presents parental questionnaire data on the composition of 297 toddler-aged (30- to 32-month-old) children's internal state vocabulary in four languages: Italian, German, English, and French. The results demonstrated that across languages, expressions for physiological states (e.g., hungry and tired) were among the most varied, while children's vocabulary for cognitive entities (e.g., know and think) proved to be least varied. Further, consistent with studies on children's comprehension of these concepts, across languages children's mastery of volition terms (e.g., like to do and want) preceded their mastery of cognition terms. These findings confirm the cross-linguistic consistency of children's emerging expression of abstract psychological concepts.

  8. Internal consistency and content validity of a questionnaire aimed to assess the stages of behavioral lifestyle changes in Colombian schoolchildren: The Fuprecol study

    Directory of Open Access Journals (Sweden)

    Yasmira CARRILLO-BERNATE

Objective: To assess the internal consistency and content validity of a questionnaire aimed at assessing the stages of behavioural lifestyle changes in a sample of school-aged children and adolescents aged 9 to 17 years. Methods: This validation study involved 675 schoolchildren from three official schools in the city of Bogotá, Colombia. A self-administered questionnaire called Behavioural Lifestyle Changes was designed to explore stages of change regarding physical activity/exercise, fruit and vegetable consumption, alcohol abuse, tobacco use, and drug abuse. Cronbach's α, the Kappa index and exploratory factor analysis were used to evaluate internal consistency and content validity, respectively. Results: The study population consisted of 51.1% males, and the participants' average age was 12.7±2.4 years. Behavioural Lifestyle Changes scored 0.720 (range 0.691 to 0.730) on Cronbach's α, and intra-observer reproducibility was good (Kappa = 0.71). Exploratory factor analysis determined two factors (factor 1: physical activity/exercise and fruit and vegetable consumption; factor 2: alcohol abuse, tobacco use and drug abuse), explaining 67.78% of the variance of the items and six interactions (χ²/gl = 11649.833; p < 0.001). Conclusion: The Behavioural Lifestyle Changes questionnaire was seen to have suitable internal consistency and validity. This instrument can be recommended, mainly within the context of primary care, for studying the stages involved in the lifestyle behavioural change model in a school-based population.

  9. “Controlled, cross-species dataset for exploring biases in genome annotation and modification profiles”

    Directory of Open Access Journals (Sweden)

    Alison McAfee

    2015-12-01

Since the sequencing of the honey bee genome, proteomics by mass spectrometry has become increasingly popular for biological analyses of this insect; but we have observed that the number of honey bee protein identifications is consistently low compared to other organisms [1]. In this dataset, we use nanoelectrospray ionization-coupled liquid chromatography–tandem mass spectrometry (nLC–MS/MS) to systematically investigate the root cause of low honey bee proteome coverage. To this end, we present here data from three key experiments: a controlled, cross-species analysis of samples from Apis mellifera, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Mus musculus and Homo sapiens; a proteomic analysis of an individual honey bee whose genome was also sequenced; and a cross-tissue honey bee proteome comparison. The cross-species dataset was interrogated to determine relative proteome coverage between species, and the other two datasets were used to search for polymorphic sequences and to compare protein cleavage profiles, respectively.

  10. Fluxnet Synthesis Dataset Collaboration Infrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Agarwal, Deborah A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Humphrey, Marty [Univ. of Virginia, Charlottesville, VA (United States); van Ingen, Catharine [Microsoft. San Francisco, CA (United States); Beekwilder, Norm [Univ. of Virginia, Charlottesville, VA (United States); Goode, Monte [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jackson, Keith [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Rodriguez, Matt [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Weber, Robin [Univ. of California, Berkeley, CA (United States)

    2008-02-06

The Fluxnet synthesis dataset originally compiled for the La Thuile workshop contained approximately 600 site-years. Since the workshop, several additional site-years have been added and the dataset now contains over 920 site-years from over 240 sites. A data refresh update is expected to increase those numbers in the next few months. The ancillary data describing the sites continue to evolve as well. There are on the order of 120 site contacts, and 60 proposals have been approved to use the data. These proposals involve around 120 researchers. The size and complexity of the dataset and collaboration have led to a new approach to providing access to the data and collaboration support. The support team attended the workshop and worked closely with the attendees and the Fluxnet project office to define the requirements for the support infrastructure. As a result of this effort, a new website (http://www.fluxdata.org) has been created to provide access to the Fluxnet synthesis dataset. This new website is based on a scientific data server which enables browsing of the data on-line, data download, and version tracking. We leverage database and data analysis tools such as OLAP data cubes and web reports to enable browser and Excel pivot table access to the data.

  11. Simulation of Smart Home Activity Datasets

    Directory of Open Access Journals (Sweden)

    Jonathan Synnott

    2015-06-01

A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendation for future work in intelligent environment simulation.

  12. Simulation of Smart Home Activity Datasets.

    Science.gov (United States)

    Synnott, Jonathan; Nugent, Chris; Jeffers, Paul

    2015-06-16

    A globally ageing population is resulting in an increased prevalence of chronic conditions which affect older adults. Such conditions require long-term care and management to maximize quality of life, placing an increasing strain on healthcare resources. Intelligent environments such as smart homes facilitate long-term monitoring of activities in the home through the use of sensor technology. Access to sensor datasets is necessary for the development of novel activity monitoring and recognition approaches. Access to such datasets is limited due to issues such as sensor cost, availability and deployment time. The use of simulated environments and sensors may address these issues and facilitate the generation of comprehensive datasets. This paper provides a review of existing approaches for the generation of simulated smart home activity datasets, including model-based approaches and interactive approaches which implement virtual sensors, environments and avatars. The paper also provides recommendation for future work in intelligent environment simulation.

  13. Algorithms for assessing person-based consistency among linked records for the investigation of maternal use of medications and safety

    Directory of Open Access Journals (Sweden)

    Duong Tran

    2017-04-01

    Quality assessment indicated high consistency among linked records. The set of algorithms developed in this project can be applied to similar linked perinatal datasets to promote a consistent approach and comparability across studies.

  14. The impact of the resolution of meteorological datasets on catchment-scale drought studies

    Science.gov (United States)

    Hellwig, Jost; Stahl, Kerstin

    2017-04-01

Gridded meteorological datasets provide the basis for studying drought at a range of scales, including catchment-scale drought studies in hydrology. They are readily available for studying past weather conditions and often serve real-time monitoring as well. As these datasets differ in spatial/temporal coverage and spatial/temporal resolution, most studies face a tradeoff between these features. Our investigation examines whether biases occur when studying drought at catchment scale with low-resolution input data. For that, a comparison among the datasets HYRAS (covering Central Europe, 1x1 km grid, daily data, 1951-2005), E-OBS (Europe, 0.25° grid, daily data, 1950-2015) and GPCC (whole world, 0.5° grid, monthly data, 1901-2013) is carried out. Generally, biases in precipitation increase with decreasing resolution. The most important variations are found during summer. In the low mountain ranges of Central Europe, the coarser-resolution datasets (E-OBS, GPCC) overestimate dry days and underestimate total precipitation, since they are not able to describe the high spatial variability. However, relative measures like the correlation coefficient reveal good consistency of dry and wet periods, both for absolute precipitation values and for standardized indices like the Standardized Precipitation Index (SPI) or the Standardized Precipitation Evapotranspiration Index (SPEI). In particular, the most severe droughts derived from the different datasets match very well. These results indicate that absolute values from coarse-resolution datasets can be problematic for assessing hydrological drought at catchment scale, whereas relative measures for determining periods of drought are more trustworthy. Therefore, drought studies that downscale meteorological data should carefully consider their data needs and focus on relative measures for dry periods if these are sufficient for the task.
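    For reference, the SPI mentioned above is conventionally computed by fitting a gamma distribution to precipitation totals and mapping the cumulative probabilities to standard-normal quantiles; a minimal sketch for one calendar month with toy values:

    ```python
    # Hedged sketch of a Standardized Precipitation Index (SPI) computation.
    import numpy as np
    from scipy import stats

    monthly_precip = np.array([55., 80., 34., 120., 66., 90., 41., 75.])  # toy mm

    shape, loc, scale = stats.gamma.fit(monthly_precip, floc=0)  # fix loc at 0
    cdf = stats.gamma.cdf(monthly_precip, shape, loc=loc, scale=scale)
    spi = stats.norm.ppf(cdf)                  # standard-normal transform
    print(np.round(spi, 2))
    ```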

  15. Solar Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

NREL is working on a Solar Integration National Dataset (SIND) Toolkit to enable researchers to perform U.S. regional solar generation integration studies. It will provide modeled, coherent subhourly solar power data

  16. Daily precipitation grids for Austria since 1961—development and evaluation of a spatial dataset for hydroclimatic monitoring and modelling

    Science.gov (United States)

    Hiebl, Johann; Frei, Christoph

    2018-04-01

Spatial precipitation datasets that are long-term consistent, highly resolved and extend over several decades are an increasingly popular basis for modelling and monitoring environmental processes and planning tasks in hydrology, agriculture, energy resources management, etc. Here, we present a grid dataset of daily precipitation for Austria meant to promote such applications. It has a grid spacing of 1 km, extends back to 1961 and is continuously updated. It is constructed with the classical two-tier analysis, involving separate interpolations for mean monthly precipitation and daily relative anomalies. The former was accomplished by kriging with topographic predictors as external drift, utilising 1249 stations. The latter is based on angular distance weighting and uses 523 stations. The input station network was kept largely stationary over time to avoid artefacts in long-term consistency. Example cases suggest that the new analysis is at least as plausible as previously existing datasets. Cross-validation and comparison against experimental high-resolution observations (WegenerNet) suggest that the accuracy of the dataset depends on interpretation. Users interpreting grid point values as point estimates must expect systematic overestimates for light and underestimates for heavy precipitation, as well as substantial random errors. Grid point estimates are typically within a factor of 1.5 of in situ observations. Interpreting grid point values as area mean values, conditional biases are reduced and the magnitude of random errors is considerably smaller. Together with a similar dataset of temperature, the new dataset (SPARTACUS) is an interesting basis for modelling environmental processes, studying climate change impacts and monitoring the climate of Austria.
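    The two-tier analysis amounts to multiplying a monthly background field by interpolated daily relative anomalies; a toy sketch of that final combination step (the actual interpolations, kriging with external drift and angular distance weighting, are not reproduced):

    ```python
    # Sketch of the two-tier combination: daily analysis = monthly background
    # field x daily relative anomaly. Grids here are toy 2x2 arrays.
    import numpy as np

    monthly_mean = np.array([[90., 110.], [130., 150.]])        # mm/month
    daily_rel_anomaly = np.array([[0.02, 0.05], [0.00, 0.10]])  # fraction of
                                                                # monthly total
    daily_precip = monthly_mean * daily_rel_anomaly
    print(daily_precip)   # mm/day analysis on the grid
    ```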

  17. PROVIDING GEOGRAPHIC DATASETS AS LINKED DATA IN SDI

    Directory of Open Access Journals (Sweden)

    E. Hietanen

    2016-06-01

In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content, using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified in order to take into account the linked data principles. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geography Markup Language (GML) output of the WFS is transformed on-the-fly to the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs. A solution for linking data objects to the dataset URI is also introduced, using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets and each subset is given its persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
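    A hedged sketch of the central step, minting a persistent URI for one spatial object and expressing its geometry as GeoSPARQL triples with rdflib; the URI scheme and feature are hypothetical, and the WFS/GML fetch is omitted:

    ```python
    # Sketch: one spatial object as GeoSPARQL triples.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    GEO = Namespace("http://www.opengis.net/ont/geosparql#")
    g = Graph()
    g.bind("geo", GEO)

    feature = URIRef("http://example.org/so/1001")       # persistent object URI
    geometry = URIRef("http://example.org/so/1001/geom")

    g.add((feature, RDF.type, GEO.Feature))
    g.add((feature, GEO.hasGeometry, geometry))
    g.add((geometry, RDF.type, GEO.Geometry))
    g.add((geometry, GEO.asWKT,
           Literal("POINT(24.94 60.17)", datatype=GEO.wktLiteral)))

    print(g.serialize(format="turtle"))
    ```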

  18. Wind Integration National Dataset Toolkit | Grid Modernization | NREL

    Science.gov (United States)

The Wind Integration National Dataset (WIND) Toolkit is an update and expansion of the Eastern Wind Integration Data Set and Western Wind Integration Data Set. It supports the next generation of wind integration studies.

  19. Scientific Datasets: Discovery and Aggregation for Semantic Interpretation.

    Science.gov (United States)

    Lopez, L. A.; Scott, S.; Khalsa, S. J. S.; Duerr, R.

    2015-12-01

One of the biggest challenges that interdisciplinary researchers face is finding suitable datasets in order to advance their science; this problem remains consistent across multiple disciplines. A surprising number of scientists, when asked what tool they use for data discovery, reply "Google", which is an acceptable solution in some cases, but not even Google can find (or cares to compile) all the data that is relevant for science, particularly the geosciences. If a dataset is not discoverable through a well-known search provider, it will remain dark data to the scientific world. For the past year, BCube, an EarthCube Building Block project, has been developing, testing and deploying a technology stack capable of data discovery at web scale using the ultimate dataset: the Internet. This stack has two principal components, a web-scale crawling infrastructure and a semantic aggregator. The web crawler is a modified version of Apache Nutch (the originator of Hadoop and other big data technologies) that has been improved and tailored for data and data service discovery. The second component is semantic aggregation, carried out by a Python-based workflow that extracts valuable metadata and stores it in the form of triples through the use of semantic technologies. While implementing the BCube stack we have run into several challenges, such as a) scaling the project to cover big portions of the Internet at a reasonable cost, b) making sense of very diverse and non-homogeneous data, and lastly, c) extracting facts about these datasets using semantic technologies in order to make them usable for the geosciences community. Despite all these challenges we have proven that we can discover and characterize data that otherwise would have remained in the dark corners of the Internet. Having all this data indexed and 'triplelized' will enable scientists to access a trove of information relevant to their work in a more natural way. An important characteristic of the BCube stack is that all

  20. Investigating country-specific music preferences and music recommendation algorithms with the LFM-1b dataset.

    Science.gov (United States)

    Schedl, Markus

    2017-01-01

    Recently, the LFM-1b dataset has been proposed to foster research and evaluation in music retrieval and music recommender systems, Schedl (Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR). New York, 2016). It contains more than one billion music listening events created by more than 120,000 users of Last.fm. Each listening event is characterized by artist, album, and track name, and further includes a timestamp. Basic demographic information and a selection of more elaborate listener-specific descriptors are included as well, for anonymized users. In this article, we reveal information about LFM-1b's acquisition and content and we compare it to existing datasets. We furthermore provide an extensive statistical analysis of the dataset, including basic properties of the item sets, demographic coverage, distribution of listening events (e.g., over artists and users), and aspects related to music preference and consumption behavior (e.g., temporal features and mainstreaminess of listeners). Exploiting country information of users and genre tags of artists, we also create taste profiles for populations and determine similar and dissimilar countries in terms of their populations' music preferences. Finally, we illustrate the dataset's usage in a simple artist recommendation task, whose results are intended to serve as baseline against which more elaborate techniques can be assessed.
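    A minimal sketch of a simple artist-recommendation baseline of the kind the abstract mentions: nearest-neighbour collaborative filtering over a user-artist play-count matrix (toy data; not the article's exact task setup):

    ```python
    # Sketch: recommend artists from the most similar user's listening history.
    import numpy as np

    artists = ["artist_a", "artist_b", "artist_c", "artist_d"]
    plays = np.array([                 # rows: users, cols: artists (toy counts)
        [120, 0, 30, 0],
        [100, 5, 25, 0],
        [0, 80, 0, 60],
    ])

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

    target = 0
    others = [u for u in range(len(plays)) if u != target]
    sims = [cosine(plays[target], plays[u]) for u in others]
    nearest = others[int(np.argmax(sims))]

    # recommend artists the neighbour plays but the target has not heard
    recs = [a for i, a in enumerate(artists)
            if plays[nearest, i] > 0 and plays[target, i] == 0]
    print(recs)   # ['artist_b']
    ```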

  1. Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center

    Science.gov (United States)

    Moroni, D. F.; Quach, N.; Francis-Curley, W.

    2016-12-01

    As the landscape of data becomes increasingly more diverse in the domain of Earth Science, the challenges of managing and preserving data become more onerous and complex, particularly for data centers on fixed budgets and limited staff. Many solutions already exist to ease the cost burden for the downstream component of the data lifecycle, yet most archive centers are still racing to keep up with the influx of new data that still needs to find a quasi-permanent resting place. For instance, having well-defined metadata that is consistent across the entire data landscape provides for well-managed and preserved datasets throughout the latter end of the data lifecycle. Translators between different metadata dialects are already in operational use, and facilitate keeping older datasets relevant in today's world of rapidly evolving metadata standards. However, very little is done to address the first phase of the lifecycle, which deals with the entry of both data and the corresponding metadata into a system that is traditionally opaque and closed off to external data producers, thus resulting in a significant bottleneck to the dataset submission process. The ATRAC system was the NOAA NCEI's answer to this previously obfuscated barrier to scientists wishing to find a home for their climate data records, providing a web-based entry point to submit timely and accurate metadata and information about a very specific dataset. A couple of NASA's Distributed Active Archive Centers (DAACs) have implemented their own versions of a web-based dataset and metadata submission form including the ASDC and the ORNL DAAC. The Physical Oceanography DAAC is the most recent in the list of NASA-operated DAACs who have begun to offer their own web-based dataset and metadata submission services to data producers. What makes the PO.DAAC dataset and metadata submission service stand out from these pre-existing services is the option of utilizing both a web browser GUI and a RESTful API to
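    A hedged sketch of what a RESTful dataset-submission call could look like from the data producer's side; the endpoint, token and payload fields below are hypothetical, as the abstract does not document the actual PO.DAAC API:

    ```python
    # Sketch: submitting dataset metadata over a hypothetical REST endpoint.
    import requests

    payload = {
        "title": "Example SST granule collection",       # hypothetical fields
        "abstract": "Daily sea-surface temperature fields.",
        "contact": "pi@example.org",
    }
    resp = requests.post(
        "https://podaac.example.org/api/v1/submissions",  # hypothetical URL
        json=payload,
        headers={"Authorization": "Bearer <token>"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())
    ```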

  2. Publishing datasets with eSciDoc and panMetaDocs

    Science.gov (United States)

    Ulbricht, D.; Klump, J.; Bertelmann, R.

    2012-04-01

Currently several research institutions worldwide undertake considerable efforts to have their scientific datasets published and to syndicate them to data portals as extensively described objects identified by persistent identifiers. This is done to foster the reuse of data, to make scientific work more transparent, and to create a citable entity that can be referenced unambiguously in written publications. GFZ Potsdam established a publishing workflow for file-based research datasets. Key software components are an eSciDoc infrastructure [1] and multiple instances of the data curation tool panMetaDocs [2]. The eSciDoc repository holds data objects and their associated metadata in container objects, called eSciDoc items. A key metadata element in this context is the publication status of the referenced dataset. PanMetaDocs, which is based on PanMetaWorks [3], is a PHP-based web application that allows data to be described with any XML-based metadata schema. The metadata fields can be filled with static or dynamic content, to reduce the number of fields that require manual entry to a minimum and to make use of contextual information in a project setting. Access rights can be applied to set the visibility of datasets to other project members and to allow collaboration on, and notification about, datasets (RSS), with interaction through the internal messaging system inherited from PanMetaWorks. When a dataset is to be published, panMetaDocs allows the publication status of the eSciDoc item to be changed from "private" to "submitted" and the dataset to be prepared for verification by an external reviewer. After quality checks, the item publication status can be changed to "published". This makes the data and metadata available through the internet worldwide. PanMetaDocs is developed as an eSciDoc application. It is an easy-to-use graphical user interface to eSciDoc items, their data and metadata. It is also an application supporting a DOI publication agent during the process of

  3. 2001 - 2010 Danish design reference year. Reference climate dataset for technical dimensioning in building, construction and other sectors

    Energy Technology Data Exchange (ETDEWEB)

    Grunnet Wang, P.; Scharling, M.; Pagh Nielsen, K.; Kern-Hansen, C. [Danish Meteorological Institute (DMI), Copenhagen (Denmark); Wittchen, K.B. [Aalborg Univ., Danish Building Research Institute (SBi), Copenhagen (Denmark)

    2013-09-15

This report presents the Danish Design Reference Year based on observed data from 2001-2010. In various sectors - i.e. building and construction, energy, etc. - climate and weather usually play a part in a given project. The Danish Design Reference Year dataset is a collection of data series for eleven specific parameters, each of which represents a typical year in Denmark. Uses of the dataset may vary from simulations to statistical analyses, graphical overviews, etc. The Danish land areas have been sectionalised into five to six climatological zones depending on the parameter, each characterized by distinct diurnal and yearly variations. The dataset consists of observed data from one station located within and representing each zone. In addition to the complete Danish Design Reference Year dataset, a subset specifically selected for energy performance calculations for obtaining a building permit is included.

  4. Designing the colorectal cancer core dataset in Iran

    Directory of Open Access Journals (Sweden)

    Sara Dorri

    2017-01-01

Background: The importance of collecting, recording and analyzing disease information in any health organization needs no explanation. In this regard, the systematic design of standard datasets can help record uniform and consistent information and can create interoperability between health care systems. The main purpose of this study was to design a core dataset to record colorectal cancer information in Iran. Methods: For the design of the colorectal cancer core dataset, a combination of literature review and expert consensus was used. In the first phase, a draft of the dataset was designed based on a review of the colorectal cancer literature and comparative studies. Then, in the second phase, this dataset was evaluated by experts from different disciplines, such as medical informatics, oncology and surgery, and their comments and opinions were collected. In the third phase, the refined dataset was evaluated again by experts and the final dataset was proposed. Results: In the first phase, based on the literature review, a draft set of 85 data elements was designed. In the second phase this set was evaluated by experts, and supplementary information was offered by professionals in subgroups, especially in the treatment part, bringing the total to 93 elements. In the third phase, evaluation was conducted by experts, and the dataset was finally organized into five main parts: demographic information, diagnostic information, treatment information, clinical status assessment information, and clinical trial information. Conclusion: In this study a comprehensive core dataset for colorectal cancer was designed. Such a dataset can facilitate the exchange of health information on colorectal cancer; designing similar datasets for other diseases can help providers collect standard data from patients and can accelerate retrieval from storage systems.

  5. The influence of international and domestic events in the evolution of forest inventory and reporting consistency in the United States

    Science.gov (United States)

    W. Brad Smith

    2009-01-01

This article takes a brief chronological look at resource inventory and reporting and links to international influences. It explores events as drivers of more consistent data within the United States and highlights key dates and events in the evolution of inventory policy and practice. From King George to l'École nationale forestière to the Food and Agriculture...

  6. Merged SAGE II, Ozone_cci and OMPS ozone profile dataset and evaluation of ozone trends in the stratosphere

    Directory of Open Access Journals (Sweden)

    V. F. Sofieva

    2017-10-01

In this paper, we present a merged dataset of ozone profiles from several satellite instruments: SAGE II on ERBS; GOMOS, SCIAMACHY and MIPAS on Envisat; OSIRIS on Odin; ACE-FTS on SCISAT; and OMPS on Suomi-NPP. The merged dataset is created in the framework of the European Space Agency Climate Change Initiative (Ozone_cci) with the aim of analyzing stratospheric ozone trends. For the merged dataset, we used the latest versions of the original ozone datasets. The datasets from the individual instruments have been extensively validated and intercompared; only those datasets which are in good agreement, and do not exhibit significant drifts with respect to collocated ground-based observations and with respect to each other, are used for merging. The long-term SAGE–CCI–OMPS dataset is created by computing and merging deseasonalized anomalies from the individual instruments. The merged SAGE–CCI–OMPS dataset consists of deseasonalized anomalies of ozone in 10° latitude bands from 90° S to 90° N and from 10 to 50 km in steps of 1 km, covering the period from October 1984 to July 2016. This newly created dataset is used to evaluate ozone trends in the stratosphere through multiple linear regression. Negative ozone trends in the upper stratosphere are observed before 1997 and positive trends are found after 1997. The upper stratospheric trends are statistically significant at midlatitudes and indicate ozone recovery, as expected from the decrease of stratospheric halogens that started in the middle of the 1990s and from stratospheric cooling.
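    The two processing steps named above, deseasonalizing into anomalies and fitting a linear trend, can be sketched as follows (synthetic monthly series; the paper's multiple linear regression with additional proxies is not reproduced):

    ```python
    # Sketch: deseasonalize a monthly series, then fit a linear trend segment.
    import numpy as np

    rng = np.random.default_rng(1)
    months = np.arange(240)                          # 20 years of monthly data
    ozone = 6.0 + 0.4 * np.sin(2 * np.pi * months / 12) \
            - 0.001 * months + rng.normal(0, 0.05, months.size)

    clim = np.array([ozone[months % 12 == m].mean() for m in range(12)])
    anom = ozone - clim[months % 12]                 # deseasonalized anomalies

    slope, intercept = np.polyfit(months, anom, 1)   # ordinary least squares
    print(f"trend: {slope * 120:.3f} per decade")    # months -> decade
    ```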

  7. A New Outlier Detection Method for Multidimensional Datasets

    KAUST Repository

    Abdel Messih, Mario A.

    2012-07-01

This study develops a novel hybrid method for outlier detection (HMOD) that combines the ideas of distance-based and density-based methods. The proposed method has two main advantages over most other outlier detection methods. The first is that it works well on both dense and sparse datasets. The second is that, unlike most other outlier detection methods, which require careful parameter setting and prior knowledge of the data, HMOD is not very sensitive to small changes in parameter values within certain parameter ranges. The only required parameter is the number of nearest neighbors. In addition, we made a fully parallelized implementation of HMOD that makes it very efficient in applications. Moreover, we propose a new way of using outlier detection for redundancy reduction in datasets, where users can specify a confidence level evaluating how accurately the reduced dataset represents the original dataset. HMOD is evaluated on synthetic datasets (dense and mixed "dense and sparse") and on a bioinformatics problem: redundancy reduction of a dataset of position weight matrices (PWMs) of transcription factor binding sites. In addition, in the process of assessing the performance of our redundancy reduction method, we developed a simple tool that can be used to evaluate the confidence level with which a reduced dataset represents the original dataset. The evaluation of the results shows that our method can be used in a wide range of problems.
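    HMOD's exact formulation is not given in the abstract; the sketch below only illustrates combining the two ideas, a k-nearest-neighbour distance component and a derived density component, into a single outlier score:

    ```python
    # Sketch of a distance/density hybrid outlier score via k nearest neighbours.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (200, 2)), [[8.0, 8.0]]])  # planted outlier

    k = 10
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)               # column 0 is the point itself
    mean_knn = dist[:, 1:].mean(axis=1)        # distance component
    density = 1.0 / (mean_knn + 1e-12)         # density component

    # score: how much sparser a point is than its own neighbourhood
    score = density[idx[:, 1:]].mean(axis=1) / density
    print(int(np.argmax(score)))               # 200: the planted outlier
    ```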

  8. Assessment of test-retest reliability and internal consistency of the Wisconsin Gait Scale in hemiparetic post-stroke patients

    Directory of Open Access Journals (Sweden)

    Guzik Agnieszka

    2016-09-01

Introduction: A proper assessment of gait pattern is a significant aspect in planning the process of teaching gait in hemiparetic post-stroke patients. The Wisconsin Gait Scale (WGS) is an observational tool for assessing post-stroke patients' gait. The aim of the study was to assess test-retest reliability and internal consistency of the WGS and examine correlations between gait assessment made with the WGS and gait speed, the Brunnström scale, Ashworth's scale and the Barthel Index.

  9. Establishing macroecological trait datasets: digitalization, extrapolation, and validation of diet preferences in terrestrial mammals worldwide.

    Science.gov (United States)

    Kissling, Wilm Daniel; Dalby, Lars; Fløjgaard, Camilla; Lenoir, Jonathan; Sandel, Brody; Sandom, Christopher; Trøjelsgaard, Kristian; Svenning, Jens-Christian

    2014-07-01

Ecological trait data are essential for understanding the broad-scale distribution of biodiversity and its response to global change. For animals, diet represents a fundamental aspect of species' evolutionary adaptations, ecological and functional roles, and trophic interactions. However, the importance of diet for macroevolutionary and macroecological dynamics remains little explored, partly because of the lack of comprehensive trait datasets. We compiled and evaluated a comprehensive global dataset of diet preferences of mammals ("MammalDIET"). Diet information was digitized from two global and clade-wide data sources, and errors of data entry by multiple data recorders were assessed. We then developed a hierarchical extrapolation procedure to fill in diet information for species with missing information. Missing data were extrapolated with information from other taxonomic levels (genus, other species within the same genus, or family), and this extrapolation was subsequently validated both internally (with a jack-knife approach applied to the compiled species-level diet data) and externally (using independent species-level diet information from a comprehensive continent-wide data source). Finally, we grouped mammal species into trophic levels and dietary guilds, and their species richness as well as their proportion of total richness were mapped at a global scale for those diet categories with good validation results. The success rate of correctly digitizing data was 94%, indicating that the consistency in data entry among multiple recorders was high. Data sources provided species-level diet information for a total of 2033 species (38% of all 5364 terrestrial mammal species, based on the IUCN taxonomy). For the remaining 3331 species, diet information was mostly extrapolated from genus-level diet information (48% of all terrestrial mammal species), and only rarely from other species within the same genus (6%) or from family level (8%). Internal and external
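    The hierarchical extrapolation can be sketched as a cascade of group-wise fills, first from the genus, then from the family; column names and records below are illustrative, not the MammalDIET schema:

    ```python
    # Sketch: fill a species' missing diet value from genus, then family.
    import pandas as pd

    df = pd.DataFrame({
        "species": ["Canis lupus", "Canis aureus", "Vulpes vulpes", "Vulpes lagopus"],
        "genus":   ["Canis", "Canis", "Vulpes", "Vulpes"],
        "family":  ["Canidae", "Canidae", "Canidae", "Canidae"],
        "diet":    ["Carnivore", None, None, None],
    })

    for level in ["genus", "family"]:          # genus first, then family
        mode = df.dropna(subset=["diet"]).groupby(level)["diet"] \
                 .agg(lambda s: s.mode().iat[0])   # most common diet per group
        df["diet"] = df["diet"].fillna(df[level].map(mode))

    print(df)
    ```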

  10. VisIVO: A Library and Integrated Tools for Large Astrophysical Dataset Exploration

    Science.gov (United States)

    Becciani, U.; Costa, A.; Ersotelos, N.; Krokos, M.; Massimino, P.; Petta, C.; Vitello, F.

    2012-09-01

VisIVO provides an integrated suite of tools and services that can be used in many scientific fields. VisIVO development started in the Virtual Observatory framework. VisIVO allows users to meaningfully visualize highly complex, large-scale datasets and create movies of these visualizations based on distributed infrastructures. VisIVO supports high-performance, multi-dimensional visualization of large-scale astrophysical datasets. Users can rapidly obtain meaningful visualizations while preserving full and intuitive control of the relevant parameters. VisIVO consists of VisIVO Desktop, a stand-alone application for interactive visualization on standard PCs; VisIVO Server, a platform for high-performance visualization; VisIVO Web, a custom-designed web portal; VisIVOSmartphone, an application to exploit the VisIVO Server functionality; and the latest VisIVO feature, the VisIVO Library, which allows a job running on a computational system (grid, HPC, etc.) to produce movies directly from the code's internal data arrays, without the need to produce intermediate files. This is particularly important when running on large computational facilities, where the user wants to look at the results during the data production phase. For example, in grid computing facilities, images can be produced directly in the grid catalogue while the user code is running in a system that cannot be directly accessed by the user (a worker node). The deployment of VisIVO on the DG and gLite is carried out with the support of the EDGI and EGI-Inspire projects. Depending on the structure and size of the datasets under consideration, the data exploration process could take several hours of CPU time for creating customized views, and the production of movies could potentially last several days. For this reason an MPI-parallel version of VisIVO could play a fundamental role in increasing performance, e.g. it could be automatically deployed on MPI-aware nodes. A central concept in our development is thus to

  11. Cross-Cultural Adaptation of the Profile Fitness Mapping Neck Questionnaire to Brazilian Portuguese: Internal Consistency, Reliability, and Construct and Structural Validity.

    Science.gov (United States)

    Ferreira, Mariana Cândido; Björklund, Martin; Dach, Fabiola; Chaves, Thais Cristina

    The purpose of this study was to adapt and evaluate the psychometric properties of the ProFitMap-neck to Brazilian Portuguese. The cross-cultural adaptation consisted of 5 stages, and 180 female patients with chronic neck pain participated in the study. A subsample (n = 30) answered the pretest, and another subsample (n = 100) answered the questionnaire a second time. Internal consistency, test-retest reliability, and construct validity (hypothesis testing and structural validity) were estimated. For construct validity, the scores of the questionnaire were correlated with the Neck Disability Index (NDI), the Hospital Anxiety and Depression Scale (HADS), the Tampa Scale of Kinesiophobia (TSK), and the 36-item Short-Form Health Survey (SF-36). Internal consistency was determined by adequate Cronbach's α values (α > 0.70). Strong reliability was identified by high intraclass correlation coefficients (ICC > 0.75). Construct validity was identified by moderate and strong correlations of the Br-ProFitMap-neck with total NDI score (-0.56). Structural validity was supported by explained variance > 50%, Kaiser-Meyer-Olkin index > 0.50, eigenvalues > 1, and factor loadings > 0.2. Br-ProFitMap-neck had adequate psychometric properties and can be used in clinical settings, as well as in research, in patients with chronic neck pain. Copyright © 2017. Published by Elsevier Inc.
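
    Cronbach's alpha, the internal consistency statistic used here, can be computed directly from an item-score matrix. A minimal Python sketch with synthetic data (not the study's code):

        # Minimal sketch of the Cronbach's alpha computation used to judge
        # internal consistency (the study's criterion was alpha > 0.70).
        import numpy as np

        def cronbach_alpha(items):
            """items: (n_respondents, n_items) array of item scores."""
            items = np.asarray(items, dtype=float)
            k = items.shape[1]
            item_var = items.var(axis=0, ddof=1).sum()      # sum of item variances
            total_var = items.sum(axis=1).var(ddof=1)       # variance of total score
            return k / (k - 1) * (1 - item_var / total_var)

        rng = np.random.default_rng(0)
        latent = rng.normal(size=(100, 1))
        scores = latent + 0.5 * rng.normal(size=(100, 6))   # 6 correlated items
        print(round(cronbach_alpha(scores), 3))             # well above 0.70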

  12. NP-PAH Interaction Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  13. A dataset on tail risk of commodities markets.

    Science.gov (United States)

    Powell, Robert J; Vo, Duc H; Pham, Thach N; Singh, Abhay K

    2017-12-01

    This article contains the datasets related to the research article "The long and short of commodity tails and their relationship to Asian equity markets" (Powell et al., 2017) [1]. The datasets contain the daily prices (and price movements) of 24 different commodities decomposed from the S&P GSCI index and the daily prices (and price movements) of three share market indices covering the World, Asia, and South East Asia for the period 2004-2015. The dataset is then divided into annual periods, showing the worst 5% of price movements for each year. The datasets are convenient for examining the tail risk of different commodities, as measured by Conditional Value at Risk (CVaR), as well as its changes across periods. The datasets can also be used to investigate the association between commodity markets and share markets.
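
    CVaR at the 95% level, the tail-risk measure these datasets support, is simply the mean of the worst 5% of price movements. A minimal Python sketch with synthetic daily returns (illustrative only):

        # Minimal sketch of Conditional Value at Risk: the mean of the worst
        # alpha fraction of returns (here alpha = 5%, matching the dataset).
        import numpy as np

        def cvar(returns, alpha=0.05):
            r = np.sort(np.asarray(returns))          # ascending: worst losses first
            tail = r[: max(1, int(len(r) * alpha))]   # worst 5% of observations
            return tail.mean()

        rng = np.random.default_rng(1)
        daily_returns = rng.normal(0.0, 0.02, size=252)   # one synthetic year
        print(f"95% CVaR: {cvar(daily_returns):.4f}")     # mean of the worst 5%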

  14. The Role of Datasets on Scientific Influence within Conflict Research.

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C; Moates, Shiloh; Carley, Kathleen M

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971, where ideas did not persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped shape the

  15. Evaluating Temporal Consistency in Marine Biodiversity Hotspots.

    Science.gov (United States)

    Piacenza, Susan E; Thurman, Lindsey L; Barner, Allison K; Benkwitt, Cassandra E; Boersma, Kate S; Cerny-Chipman, Elizabeth B; Ingeman, Kurt E; Kindinger, Tye L; Lindsley, Amy J; Nelson, Jake; Reimer, Jessica N; Rowe, Jennifer C; Shen, Chenchen; Thompson, Kevin A; Heppell, Selina S

    2015-01-01

    With the ongoing crisis of biodiversity loss and limited resources for conservation, the concept of biodiversity hotspots has been useful in determining conservation priority areas. However, there has been limited research into how temporal variability in biodiversity may influence conservation area prioritization. To address this information gap, we present an approach to evaluate the temporal consistency of biodiversity hotspots in large marine ecosystems. Using a large-scale, public monitoring dataset collected over an eight-year period off the US Pacific Coast, we developed a methodological approach for avoiding biases associated with hotspot delineation. We aggregated benthic fish species data from research trawls and calculated mean hotspot thresholds for fish species richness and Shannon's diversity indices over the eight-year dataset. We used a spatial frequency distribution method to assign hotspot designations to the grid cells annually. We found no areas containing consistently high biodiversity through the entire study period based on the mean thresholds, and no grid cell was designated as a hotspot for greater than 50% of the time-series. To test if our approach was sensitive to sampling effort and the geographic extent of the survey, we followed a similar routine for the northern region of the survey area. Our finding of low consistency in benthic fish biodiversity hotspots over time was upheld, regardless of the biodiversity metric used, whether thresholds were calculated per year or across all years, or the spatial extent for which we calculated thresholds and identified hotspots. Our results suggest that static measures of benthic fish biodiversity off the US West Coast are insufficient for identification of hotspots and that long-term data are required to appropriately identify patterns of high temporal variability in biodiversity for these highly mobile taxa. Given that ecological communities are responding to a changing climate and other
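
    The per-cell metrics behind the hotspot designations are straightforward to reproduce in outline. A minimal Python sketch with a hypothetical cells-by-species abundance matrix and a mean-based threshold, in the spirit of (but not identical to) the study's procedure:

        # Minimal sketch of per-grid-cell biodiversity metrics: species richness,
        # Shannon's H, and a mean-based hotspot threshold. Data are synthetic.
        import numpy as np

        def shannon_h(counts):
            counts = np.asarray(counts, dtype=float)
            p = counts[counts > 0] / counts.sum()     # relative abundances
            return -(p * np.log(p)).sum()

        rng = np.random.default_rng(2)
        cells = rng.poisson(3, size=(50, 20))         # 50 cells x 20 species

        richness = (cells > 0).sum(axis=1)
        diversity = np.array([shannon_h(row) for row in cells])

        threshold = diversity.mean()                  # cf. the mean hotspot thresholds
        hotspots = diversity > threshold
        print(f"{hotspots.sum()} of {len(cells)} cells exceed the mean-H threshold")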

  16. Evaluating soil moisture retrievals from ESA’s SMOS and NASA’s SMAP brightness temperature datasets

    Science.gov (United States)

    Al-Yaari, A.; Wigneron, J.-P.; Kerr, Y.; Rodriguez-Fernandez, N.; O’Neill, P. E.; Jackson, T. J.; De Lannoy, G.J.M.; Al Bitar, A; Mialon, A.; Richaume, P.; Walker, JP; Mahmoodi, A.; Yueh, S.

    2018-01-01

    Two satellites are currently monitoring surface soil moisture (SM) using L-band observations: SMOS (Soil Moisture and Ocean Salinity), a joint ESA (European Space Agency), CNES (Centre national d'études spatiales), and CDTI (the Spanish government agency with responsibility for space) satellite launched on November 2, 2009, and SMAP (Soil Moisture Active Passive), a National Aeronautics and Space Administration (NASA) satellite successfully launched in January 2015. In this study, we used a multilinear regression approach to retrieve SM from SMAP data and create a global dataset of SM which is consistent with SM data retrieved from SMOS. This was achieved by calibrating the coefficients of the regression model using the CATDS (Centre Aval de Traitement des Données) SMOS Level 3 SM and the horizontally and vertically polarized brightness temperatures (TB) at 40° incidence angle, over the 2013-2014 period. Next, this model was applied to SMAP L3 TB data from April 2015 to July 2016. The retrieved SM from SMAP (referred to here as SMAP_Reg) was compared to: (i) the operational SMAP L3 SM (SMAP_SCA), retrieved using the baseline Single Channel retrieval Algorithm (SCA); and (ii) the operational SMOS L3 SM, derived from the multiangular inversion of the L-MEB model (L-MEB algorithm) (SMOSL3). This inter-comparison was made against in situ soil moisture measurements from more than 400 sites spread over the globe, which are used here as a reference soil moisture dataset. The in situ observations were obtained from the International Soil Moisture Network (ISMN; https://ismn.geo.tuwien.ac.at/) in North America (PBO_H2O, SCAN, SNOTEL, iRON, and USCRN), Australia (Oznet), Africa (DAHRA), and Europe (REMEDHUS, SMOSMANIA, FMI, and RSMN). The agreement was analyzed in terms of four classical statistical criteria: Root Mean Squared Error (RMSE), Bias, Unbiased RMSE (UnbRMSE), and correlation coefficient (R). Results of the comparison of these various products with in
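
    The two steps described, calibrating a multilinear regression from brightness temperatures to SMOS soil moisture and then scoring retrievals with the four criteria, can be sketched as follows (all data synthetic; coefficients and noise levels are arbitrary, not the study's values):

        # Minimal sketch: (i) regress reference SM on H- and V-pol brightness
        # temperatures, (ii) score retrievals with RMSE, bias, ubRMSE, and R.
        import numpy as np

        rng = np.random.default_rng(3)
        tb_h, tb_v = rng.normal(250, 10, 500), rng.normal(260, 10, 500)
        sm_smos = 0.9 - 0.002 * tb_h - 0.001 * tb_v + rng.normal(0, 0.02, 500)

        X = np.column_stack([np.ones_like(tb_h), tb_h, tb_v])
        coef, *_ = np.linalg.lstsq(X, sm_smos, rcond=None)  # calibration step
        sm_retrieved = X @ coef          # in practice, applied to the SMAP TBs

        def scores(est, ref):
            bias = (est - ref).mean()
            rmse = np.sqrt(((est - ref) ** 2).mean())
            ubrmse = np.sqrt(rmse ** 2 - bias ** 2)          # standard identity
            r = np.corrcoef(est, ref)[0, 1]
            return rmse, bias, ubrmse, r

        in_situ = sm_smos + rng.normal(0, 0.03, 500)         # stand-in reference
        print(scores(sm_retrieved, in_situ))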

  17. Proteomics dataset

    DEFF Research Database (Denmark)

    Bennike, Tue Bjerg; Carlsen, Thomas Gelsing; Ellingsen, Torkell

    2017-01-01

    … patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8–10]. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. … The data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples.

  18. Rosenberg Self-Esteem Scale (RSS): factorial validity and internal consistency

    Directory of Open Access Journals (Sweden)

    Juliana Burges Sbicigo

    2010-12-01

    The aim of this study was to investigate the psychometric properties of the Rosenberg Self-Esteem Scale (RSS) for adolescents. The sample was composed of 4,757 adolescents, aged between 14 and 18 years (M = 15.77; SD = 1.22), from nine Brazilian cities. Participants responded to a version of the RSS adapted for Brazil. Exploratory factor analysis showed a bidimensional structure, with 51.4% of explained variance; this result was supported by confirmatory factor analysis. Internal consistency analysis using the Cronbach alpha coefficient, composite reliability, and extracted variance indicated good reliability. Differences in self-esteem for gender and age were not found. These findings show that the RSS has satisfactory psychometric qualities and is a reliable instrument for assessing self-esteem in Brazilian adolescents.

  19. Sixteen-item Anxiety Sensitivity Index: Confirmatory factor analytic evidence, internal consistency, and construct validity in a young adult sample from the Netherlands

    NARCIS (Netherlands)

    Vujanovic, Anka A.; Arrindell, Willem A.; Bernstein, Amit; Norton, Peter J.; Zvolensky, Michael J.

    The present investigation examined the factor structure, internal consistency, and construct validity of the 16-item Anxiety Sensitivity Index (ASI; Reiss, Peterson, Gursky, & McNally, 1986) in a young adult sample (n = 420) from the Netherlands. Confirmatory factor analysis was used to comparatively

  20. Psychometric analyses and internal consistency of the PHEEM questionnaire to measure the clinical learning environment in the clerkship of a Medical School in Chile.

    Science.gov (United States)

    Riquelme, Arnoldo; Herrera, Cristian; Aranis, Carolina; Oporto, Jorge; Padilla, Oslando

    2009-06-01

    The Spanish version of the Postgraduate Hospital Educational Environment Measure (PHEEM) was evaluated in this study to determine its psychometric properties, validity, and internal consistency for measuring the clinical learning environment in the hospital setting of Pontificia Universidad Católica de Chile Medical School's internship. The 40-item PHEEM questionnaire was translated from English to Spanish and retranslated to English. Content validity was tested by a focus group and minor differences in meaning were adjusted. The PHEEM was administered to clerks in years 6 and 7. Construct validity was assessed using exploratory factor analysis followed by a Varimax rotation. Internal consistency was measured using Cronbach's alpha. A total of 125 out of 220 students responded to the PHEEM. The overall response rate was 56.8%, and compliance with each item ranged from 99.2% to 100%. Analyses indicate a five-factor instrument accounting for 58% of the variance; the internal consistency of the 40-item questionnaire is 0.955 (Cronbach's alpha). The 40-item questionnaire had a mean score of 98.21 ± 21.2 (maximum score of 160). The Spanish version of the PHEEM is a multidimensional, valid and highly reliable instrument for measuring the educational environment among undergraduate medical students working in hospital-based clerkships.

  1. K, L, and M shell datasets for PIXE spectrum fitting and analysis

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, David D., E-mail: dcz@ansto.gov.au; Crawford, Jagoda; Siegele, Rainer

    2015-11-15

    Highlights: • Differences between several datasets commonly used by PIXE codes for spectrum fitting and concentration estimates have been highlighted. • A preferred option dataset was selected which includes ionisation cross sections, fluorescence yields, Coster–Kronig probabilities and X-ray line emission rates for K, L and M subshells. • For PIXE codes, differences of several tens of percent can be seen for selected elements for L and M lines, depending on the datasets selected. - Abstract: Routine PIXE analysis programs, like GUPIX, GEOPIXE and PIXAN, generally perform at least two key functions: first, the fitting of characteristic K, L and M X-ray lines to a background, including unfolding of overlapping lines; and second, the use of a fitted primary Kα, Lα or Mα line area to determine the elemental concentration in a given matrix. To achieve these two results to better than 3–5%, the datasets for fluorescence yields, emission rates, Coster–Kronig transitions and ionisation cross sections should be determined to better than 3%. There are many different theoretical and experimental K, L and M datasets for these parameters. How they are applied and used in analysis programs can vary the results obtained for both fitting and concentration determinations. Here we discuss several commonly used datasets for fluorescence yields, emission rates, Coster–Kronig transitions and ionisation cross sections for K, L and M subshells, and suggest an optimum set to obtain consistent results for PIXE analyses across a range of elements with atomic numbers from 5 ⩽ Z ⩽ 100.

  2. National Hydrography Dataset (NHD)

    Data.gov (United States)

    Kansas Data Access and Support Center — The National Hydrography Dataset (NHD) is a feature-based database that interconnects and uniquely identifies the stream segments or reaches that comprise the...

  3. The Harvard organic photovoltaic dataset.

    Science.gov (United States)

    Lopez, Steven A; Pyzer-Knapp, Edward O; Simm, Gregor N; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-09-27

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use both in relating electronic structure calculations to experimental observations through the generation of calibration schemes, and in the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications.

  4. Consistent guiding center drift theories

    International Nuclear Information System (INIS)

    Wimmel, H.K.

    1982-04-01

    Various guiding-center drift theories are presented that are optimized with respect to consistency. They satisfy exact energy conservation theorems (in time-independent fields), Liouville's theorems, and appropriate power balance equations. A theoretical framework is given that allows direct and exact derivation of associated drift-kinetic equations from the respective guiding-center drift-orbit theories. These drift-kinetic equations are listed. Northrop's non-optimized theory is discussed for reference, and internal consistency relations of G.C. drift theories are presented. (orig.)

  5. Tables and figure datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — Soil and air concentrations of asbestos in Sumas study. This dataset is associated with the following publication: Wroble, J., T. Frederick, A. Frame, and D....

  6. A Dataset and Benchmarks for Segmentation and Recognition of Gestures in Robotic Surgery.

    Science.gov (United States)

    Ahmidi, Narges; Tao, Lingling; Sefati, Shahin; Gao, Yixin; Lea, Colin; Haro, Benjamin Bejar; Zappella, Luca; Khudanpur, Sanjeev; Vidal, Rene; Hager, Gregory D

    2017-09-01

    State-of-the-art techniques for surgical data analysis report promising results for automated skill assessment and action recognition. The contributions of many of these techniques, however, are limited to study-specific data and validation metrics, making assessment of progress across the field extremely challenging. In this paper, we address two major problems for surgical data analysis: first, the lack of uniform, shared datasets and benchmarks; and second, the lack of consistent validation processes. We address the former by presenting the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), a public dataset that we have created to support comparative research benchmarking. JIGSAWS contains synchronized video and kinematic data from multiple performances of robotic surgical tasks by operators of varying skill. We address the latter by presenting a well-documented evaluation methodology and reporting results for six techniques for automated segmentation and classification of time-series data on JIGSAWS. These techniques comprise four temporal approaches for joint segmentation and classification: hidden Markov model (HMM), sparse HMM, Markov semi-Markov conditional random field, and skip-chain conditional random field; and two feature-based ones that aim to classify fixed segments: bag of spatiotemporal features and linear dynamical systems. Most methods recognize gesture activities with approximately 80% overall accuracy under both leave-one-super-trial-out and leave-one-user-out cross-validation settings. Current methods show promising results on this shared dataset, but room for significant progress remains, particularly for consistent prediction of gesture activities across different surgeons. The results reported in this paper provide the first systematic and uniform evaluation of surgical activity recognition techniques on the benchmark database.
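
    The leave-one-user-out protocol reported here is easy to express in outline. A minimal Python sketch, with a nearest-centroid placeholder standing in for the paper's HMM/CRF models (all names illustrative):

        # Minimal sketch of leave-one-user-out cross-validation: every user's
        # trials are held out once while the remaining users train the model.
        import numpy as np

        def leave_one_user_out(trials):
            """trials: list of (user_id, feature_vector, label) tuples."""
            users = sorted({u for u, _, _ in trials})
            accs = []
            for held_out in users:
                train = [(x, y) for u, x, y in trials if u != held_out]
                test = [(x, y) for u, x, y in trials if u == held_out]
                model = fit_placeholder(train)
                accs.append(np.mean([model(x) == y for x, y in test]))
            return np.mean(accs)

        def fit_placeholder(train):
            # nearest-centroid stand-in for an HMM/CRF gesture classifier
            labels = sorted({y for _, y in train})
            cent = {y: np.mean([x for x, yy in train if yy == y], axis=0)
                    for y in labels}
            return lambda x: min(cent, key=lambda y: np.linalg.norm(x - cent[y]))

        trials = [(u, np.array([u + g, g]), g) for u in range(4) for g in (0, 1, 2)]
        print(leave_one_user_out(trials))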

  7. 101 labeled brain images and a consistent human cortical labeling protocol

    Directory of Open Access Journals (Sweden)

    Arno eKlein

    2012-12-01

    We introduce the Mindboggle-101 dataset, the largest and most complete set of free, publicly accessible, manually labeled human brain images. To manually label the macroscopic anatomy in magnetic resonance images of 101 healthy participants, we created a new cortical labeling protocol that relies on robust anatomical landmarks and minimal manual edits after initialization with automated labels. The Desikan-Killiany-Tourville (DKT) protocol is intended to improve the ease, consistency, and accuracy of labeling human cortical areas. Given how difficult it is to label brains, the Mindboggle-101 dataset is intended to serve as brain atlases for use in labeling other brains, as a normative dataset to establish morphometric variation in a healthy population for comparison against clinical populations, and to contribute to the development, training, testing, and evaluation of automated registration and labeling algorithms. To this end, we also introduce benchmarks for the evaluation of such algorithms by comparing our manual labels with labels automatically generated by probabilistic and multi-atlas registration-based approaches. All data and related software and updated information are available on the http://www.mindboggle.info/data/ website.

  8. 101 Labeled Brain Images and a Consistent Human Cortical Labeling Protocol

    Science.gov (United States)

    Klein, Arno; Tourville, Jason

    2012-01-01

    We introduce the Mindboggle-101 dataset, the largest and most complete set of free, publicly accessible, manually labeled human brain images. To manually label the macroscopic anatomy in magnetic resonance images of 101 healthy participants, we created a new cortical labeling protocol that relies on robust anatomical landmarks and minimal manual edits after initialization with automated labels. The “Desikan–Killiany–Tourville” (DKT) protocol is intended to improve the ease, consistency, and accuracy of labeling human cortical areas. Given how difficult it is to label brains, the Mindboggle-101 dataset is intended to serve as brain atlases for use in labeling other brains, as a normative dataset to establish morphometric variation in a healthy population for comparison against clinical populations, and to contribute to the development, training, testing, and evaluation of automated registration and labeling algorithms. To this end, we also introduce benchmarks for the evaluation of such algorithms by comparing our manual labels with labels automatically generated by probabilistic and multi-atlas registration-based approaches. All data and related software and updated information are available on the http://mindboggle.info/data website. PMID:23227001

  9. Integrated Surface Dataset (Global)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Integrated Surface (ISD) Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is...

  10. Aaron Journal article datasets

    Data.gov (United States)

    U.S. Environmental Protection Agency — All figures used in the journal article are in netCDF format. This dataset is associated with the following publication: Sims, A., K. Alapaty , and S. Raman....

  11. Market Squid Ecology Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains ecological information collected on the major adult spawning and juvenile habitats of market squid off California and the US Pacific Northwest....

  12. International and internal migration measured from the School Census in England.

    Science.gov (United States)

    Simpson, Ludi; Marquis, Naomi; Jivraj, Stephen

    2010-01-01

    The School Census is the only regularly updated dataset covering almost all of the population of a specific age which records changes of address along with ethnicity and some family economic circumstances. It can be used to measure internal and international family migration, as shown in this report. The School Census is suited to identifying and quantifying new local migration streams between censuses, and successfully identified the local distribution of Eastern European immigration in the decade since 2000. The data do not provide a complete measure of migration, either internal or international. The exclusion of those outside the state school system means that internal migration is under-estimated, and international migration is approximately measured. The advantages of the School Census are its frequent updates, its fine geographical information, and its indicators of ethnicity and low family income, which powerfully complement other sources.

  13. ATLAS File and Dataset Metadata Collection and Use

    CERN Document Server

    Albrand, S; The ATLAS collaboration; Lambert, F; Gallas, E J

    2012-01-01

    The ATLAS Metadata Interface (“AMI”) was designed as a generic cataloguing system, and as such it has found many uses in the experiment, including software release management, tracking of reconstructed event sizes, and control of dataset nomenclature. The primary use of AMI is to provide a catalogue of datasets (file collections) which is searchable using physics criteria. In this paper we discuss the various mechanisms used for filling the AMI dataset and file catalogues. By correlating information from different sources we can derive aggregate information which is important for physics analysis; for example, the total number of events contained in a dataset, and possible reasons for missing events such as a lost file. Finally we will describe some specialized interfaces which were developed for the Data Preparation and reprocessing coordinators. These interfaces manipulate information from both the dataset domain held in AMI, and the run-indexed information held in the ATLAS COMA application (Conditions and ...

  14. One click dataset transfer: toward efficient coupling of distributed storage resources and CPUs

    Czech Academy of Sciences Publication Activity Database

    Zerola, Michal; Lauret, J.; Barták, R.; Šumbera, Michal

    2012-01-01

    Roč. 368, 012022 (2012), s. 1-10 ISSN 1742-6588. [14th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT). Uxbridge, 05.09.2011-09.09.2011] R&D Projects: GA MŠk LC07048; GA MŠk LA09013 Institutional support: RVO:61389005 Keywords : distributed storage * Grid computing * dataset transfer Subject RIV: BG - Nuclear, Atomic and Molecular Physics, Colliders http://iopscience.iop.org/1742-6596/368/1/012022/pdf/1742-6596_368_1_012022.pdf

  15. The Harvard organic photovoltaic dataset

    Science.gov (United States)

    Lopez, Steven A.; Pyzer-Knapp, Edward O.; Simm, Gregor N.; Lutzow, Trevor; Li, Kewei; Seress, Laszlo R.; Hachmann, Johannes; Aspuru-Guzik, Alán

    2016-01-01

    The Harvard Organic Photovoltaic Dataset (HOPV15) presented in this work is a collation of experimental photovoltaic data from the literature and corresponding quantum-chemical calculations performed over a range of conformers, each with quantum chemical results using a variety of density functionals and basis sets. It is anticipated that this dataset will be of use both in relating electronic structure calculations to experimental observations through the generation of calibration schemes, and in the creation of new semi-empirical methods and the benchmarking of current and future model chemistries for organic electronic applications. PMID:27676312

  16. Identification of the Consistently Altered Metabolic Targets in Human Hepatocellular Carcinoma

    Directory of Open Access Journals (Sweden)

    Zeribe Chike Nwosu

    2017-09-01

    Background & Aims: Cancer cells rely on metabolic alterations to enhance proliferation and survival. Metabolic gene alterations that repeatedly occur in liver cancer are largely unknown. We aimed to identify metabolic genes that are consistently deregulated, and are of potential clinical significance, in human hepatocellular carcinoma (HCC). Methods: We studied the expression of 2,761 metabolic genes in 8 microarray datasets comprising 521 human HCC tissues. Genes exclusively up-regulated or down-regulated in 6 or more datasets were defined as consistently deregulated. The consistent genes that correlated with tumor progression markers (ECM2 and MMP9) (Pearson correlation P < .05) were used for Kaplan-Meier overall survival analysis in a patient cohort. We further compared proteomic expression of metabolic genes in 19 tumors vs adjacent normal liver tissues. Results: We identified 634 consistent metabolic genes, ∼60% of which are not yet described in HCC. The down-regulated genes (n = 350) are mostly involved in physiologic hepatocyte metabolic functions (eg, xenobiotic, fatty acid, and amino acid metabolism). In contrast, among consistently up-regulated metabolic genes (n = 284) are those involved in glycolysis, pentose phosphate pathway, nucleotide biosynthesis, tricarboxylic acid cycle, oxidative phosphorylation, proton transport, membrane lipid, and glycan metabolism. Several metabolic genes (n = 434) correlated with progression markers, and of these, 201 predicted overall survival outcome in the patient cohort analyzed. Over 90% of the metabolic targets significantly altered at the protein level were similarly up- or down-regulated as in genomic profile. Conclusions: We provide the first exposition of the consistently altered metabolic genes in HCC and show that these genes are potentially relevant targets for onward studies in preclinical and clinical contexts. Keywords: Liver Cancer, HCC, Tumor Metabolism

  17. Georeferencing UAS Derivatives Through Point Cloud Registration with Archived Lidar Datasets

    Science.gov (United States)

    Magtalas, M. S. L. Y.; Aves, J. C. L.; Blanco, A. C.

    2016-10-01

    Georeferencing gathered images is a common step before performing spatial analysis and other processes on datasets acquired using unmanned aerial systems (UAS). Methods of applying spatial information to aerial images or their derivatives include onboard GPS (Global Positioning Systems) geotagging and the tying of models to GCPs (Ground Control Points) acquired in the field. Currently, UAS derivatives are limited to meter-level accuracy when their generation is unaided by points of known position on the ground. The use of ground control points established using survey-grade GPS or GNSS receivers can greatly reduce model errors to centimeter levels. However, this comes with additional costs, not only for instrument acquisition and survey operations, but also in actual time spent in the field. This study uses a workflow for cloud-based post-processing of UAS data in combination with already existing LiDAR data. The georeferencing of the UAV point cloud is executed using the Iterative Closest Point (ICP) algorithm. It is applied through the open-source CloudCompare software (Girardeau-Montaut, 2006) on a 'skeleton point cloud'. This skeleton point cloud consists of manually extracted features consistent in both the LiDAR and UAV data. For this cloud, roads and buildings with minimal deviations given their differing dates of acquisition are considered consistent. Transformation parameters are computed for the skeleton cloud and can then be applied to the whole UAS dataset. In addition, a separate cloud consisting of non-vegetation features automatically derived using the CANUPO classification algorithm (Brodu and Lague, 2012) was used to generate a separate set of parameters. A ground survey was done to validate the transformed cloud. An RMSE value of around 16 centimeters was found when comparing validation data to the models georeferenced using the CANUPO cloud and the manual skeleton cloud. Cloud-to-cloud distance computations of
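
    The registration step can be outlined with a bare-bones rigid ICP; the following NumPy/SciPy sketch is a stand-in for CloudCompare's implementation, not the study's code:

        # Minimal sketch of rigid ICP: iterate nearest-neighbour correspondences
        # and a Kabsch/SVD best-fit transform until the mean distance stabilizes.
        import numpy as np
        from scipy.spatial import cKDTree

        def best_fit_transform(src, dst):
            """Least-squares rigid (R, t) mapping src onto dst."""
            cs, cd = src.mean(0), dst.mean(0)
            H = (src - cs).T @ (dst - cd)
            U, _, Vt = np.linalg.svd(H)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:                 # avoid reflections
                Vt[-1] *= -1
                R = Vt.T @ U.T
            return R, cd - R @ cs

        def icp(src, dst, iters=50, tol=1e-8):
            tree = cKDTree(dst)
            cur, prev_err = src.copy(), np.inf
            for _ in range(iters):
                d, idx = tree.query(cur)             # closest-point correspondences
                R, t = best_fit_transform(cur, dst[idx])
                cur = cur @ R.T + t
                if abs(prev_err - d.mean()) < tol:
                    break
                prev_err = d.mean()
            return best_fit_transform(src, cur)      # net transform for the cloud

        # Synthetic check: recover a small rotation plus offset of a point set.
        rng = np.random.default_rng(4)
        lidar = rng.uniform(0, 10, (200, 3))
        th = 0.05
        Rz = np.array([[np.cos(th), -np.sin(th), 0],
                       [np.sin(th),  np.cos(th), 0],
                       [0, 0, 1]])
        uas = lidar @ Rz.T + np.array([0.5, -0.3, 0.1])
        R, t = icp(uas, lidar)
        print(np.round(t, 3))

    As in the study's workflow, the recovered parameters would then be applied to the complete UAS dataset rather than only to the skeleton cloud.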

  18. A high-resolution European dataset for hydrologic modeling

    Science.gov (United States)

    Ntegeka, Victor; Salamon, Peter; Gomes, Goncalo; Sint, Hadewij; Lorini, Valerio; Thielen, Jutta

    2013-04-01

    There is an increasing demand for large-scale hydrological models, not only in the field of modeling the impact of climate change on water resources but also for disaster risk assessments and flood or drought early warning systems. These large-scale models need to be calibrated and verified against large amounts of observations in order to judge their capabilities to predict the future. However, the creation of large-scale datasets is challenging, for it requires collection, harmonization, and quality checking of large amounts of observations. For this reason, only a limited number of such datasets exist. In this work, we present a pan-European, high-resolution gridded dataset of meteorological observations (EFAS-Meteo), which was designed with the aim to drive a large-scale hydrological model. Similar European and global gridded datasets already exist, such as the HadGHCND (Caesar et al., 2006), the JRC MARS-STAT database (van der Goot and Orlandi, 2003) and the E-OBS gridded dataset (Haylock et al., 2008). However, none of those provide similarly high spatial resolution and/or a complete set of variables to force a hydrologic model. EFAS-Meteo contains daily maps of precipitation, surface temperature (mean, minimum and maximum), wind speed and vapour pressure at a spatial grid resolution of 5 x 5 km for the time period 1 January 1990 - 31 December 2011. It furthermore contains radiation, which is calculated with a staggered approach depending on the availability of sunshine duration, cloud cover and minimum and maximum temperature, and evapotranspiration (potential evapotranspiration, bare soil and open water evapotranspiration). The potential evapotranspiration was calculated using the Penman-Monteith equation with the above-mentioned meteorological variables. The dataset was created as part of the development of the European Flood Awareness System (EFAS) and has been continuously updated throughout the last years. The dataset variables are used as
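
    The reference evapotranspiration component can be sketched with the FAO-56 form of the Penman-Monteith equation; the exact EFAS-Meteo formulation may differ in detail, so this is an illustration of the equation, not the dataset's code:

        # Minimal sketch of FAO-56 Penman-Monteith reference evapotranspiration
        # (daily, grass reference), driven by the variables the dataset provides.
        import math

        def et0_fao56(t_mean, rn, u2, ea, pressure=101.3, g=0.0):
            """t_mean degC, rn net radiation MJ/m2/day, u2 wind m/s, ea kPa."""
            es = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))  # sat. vapour pressure
            delta = 4098 * es / (t_mean + 237.3) ** 2                  # slope of es curve
            gamma = 0.000665 * pressure                                # psychrometric const.
            num = (0.408 * delta * (rn - g)
                   + gamma * 900 / (t_mean + 273) * u2 * (es - ea))
            return num / (delta + gamma * (1 + 0.34 * u2))             # mm/day

        # A mild day: ~20 degC, moderate radiation and wind, ea = 1.4 kPa.
        print(round(et0_fao56(t_mean=20.0, rn=15.0, u2=2.0, ea=1.4), 2))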

  19. Consistent force fields for saccharides

    DEFF Research Database (Denmark)

    Rasmussen, Kjeld

    1999-01-01

    Consistent force fields for carbohydrates were hitherto developed by extensive optimization of potential energy function parameters on experimental data and on ab initio results. A wide range of experimental data is used: internal structures obtained from gas phase electron diffraction and from x-ray … anomeric effects are accounted for without addition of specific terms. The work is done in the framework of the Consistent Force Field, which originated in Israel and was further developed in Denmark. The actual methods and strategies employed have been described previously. Extensive testing of the force field...

  20. The Role of Datasets on Scientific Influence within Conflict Research.

    Directory of Open Access Journals (Sweden)

    Tracy Van Holt

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving "conflict" in the Web of Science (WoS) over a 66-year period (1945-2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957-1971, where ideas did not persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped

  1. The Role of Datasets on Scientific Influence within Conflict Research

    Science.gov (United States)

    Van Holt, Tracy; Johnson, Jeffery C.; Moates, Shiloh; Carley, Kathleen M.

    2016-01-01

    We inductively tested if a coherent field of inquiry in human conflict research emerged in an analysis of published research involving “conflict” in the Web of Science (WoS) over a 66-year period (1945–2011). We created a citation network that linked the 62,504 WoS records and their cited literature. We performed a critical path analysis (CPA), a specialized social network analysis on this citation network (~1.5 million works), to highlight the main contributions in conflict research and to test if research on conflict has in fact evolved to represent a coherent field of inquiry. Out of this vast dataset, 49 academic works were highlighted by the CPA, suggesting a coherent field of inquiry, which means that researchers in the field acknowledge seminal contributions and share a common knowledge base. Other conflict concepts that were also analyzed, such as interpersonal conflict or conflict among pharmaceuticals, did not form their own CP. A single path formed, meaning that there was a cohesive set of ideas that built upon previous research. This is in contrast to a main path analysis of conflict from 1957–1971, where ideas didn’t persist: multiple paths existed and died or emerged, reflecting a lack of scientific coherence (Carley, Hummon, and Harty, 1993). The critical path consisted of a number of key features: 1) Concepts that built throughout include the notion that resource availability drives conflict, which emerged in the 1960s-1990s and continued on until 2011. More recent intrastate studies that focused on inequalities emerged from interstate studies on the democracy of peace earlier on the path. 2) Recent research on the path focused on forecasting conflict, which depends on well-developed metrics and theories to model. 3) We used keyword analysis to independently show how the CP was topically linked (i.e., through democracy, modeling, resources, and geography). Publicly available conflict datasets developed early on helped

  2. Would the ‘real’ observed dataset stand up? A critical examination of eight observed gridded climate datasets for China

    International Nuclear Information System (INIS)

    Sun, Qiaohong; Miao, Chiyuan; Duan, Qingyun; Kong, Dongxian; Ye, Aizhong; Di, Zhenhua; Gong, Wei

    2014-01-01

    This research compared and evaluated the spatio-temporal similarities and differences of eight widely used gridded datasets. The datasets include daily precipitation over East Asia (EA), the Climatic Research Unit (CRU) product, the Global Precipitation Climatology Centre (GPCC) product, the University of Delaware (UDEL) product, Precipitation Reconstruction over Land (PREC/L), the Asian Precipitation Highly Resolved Observational (APHRO) product, the Institute of Atmospheric Physics (IAP) dataset from the Chinese Academy of Sciences, and the National Meteorological Information Center dataset from the China Meteorological Administration (CN05). The meteorological variables focus on surface air temperature (SAT) or precipitation (PR) in China. All datasets presented general agreement on the whole spatio-temporal scale, but some differences appeared for specific periods and regions. On a temporal scale, EA shows the highest amount of PR, while APHRO shows the lowest. CRU and UDEL show higher SAT than IAP or CN05. On a spatial scale, the most significant differences occur in western China for PR and SAT. For PR, the difference between EA and CRU is the largest. When compared with CN05, CRU shows higher SAT in the central and southern Northwest river drainage basin, UDEL exhibits higher SAT over the Southwest river drainage system, and IAP has lower SAT in the Tibetan Plateau. The differences in annual mean PR and SAT primarily come from summer and winter, respectively. Finally, potential factors impacting agreement among gridded climate datasets are discussed, including raw data sources, quality control (QC) schemes, orographic correction, and interpolation techniques. The implications and challenges of these results for climate research are also briefly addressed. (paper)

  3. Guided color consistency optimization for image mosaicking

    Science.gov (United States)

    Xie, Renping; Xia, Menghan; Yao, Jian; Li, Li

    2018-01-01

    This paper studies the problem of color consistency correction for sequential images with diverse color characteristics. Existing algorithms try to adjust all images to minimize color differences among images under a unified energy framework; however, the results are prone to presenting a consistent but unnatural appearance when the color difference between images is large and diverse. In our approach, this problem is addressed effectively by providing a guided initial solution for the global consistency optimization, which avoids converging to a meaningless integrated solution. First of all, to obtain reliable intensity correspondences in overlapping regions between image pairs, we propose the histogram extreme point matching algorithm, which is robust to image geometrical misalignment to some extent. In the absence of extra reference information, the guided initial solution is learned from the major tone of the original images by searching for an image subset to serve as the reference, whose color characteristics will be transferred to the others via the paths of graph analysis. Thus, the final results via global adjustment will take on a consistent color similar to the appearance of the reference image subset. Several groups of experiments on both a synthetic dataset and challenging real ones demonstrate that the proposed approach can achieve as good or even better results compared with the state-of-the-art approaches.

  4. Bulk Data Movement for Climate Dataset: Efficient Data Transfer Management with Dynamic Transfer Adjustment

    International Nuclear Information System (INIS)

    Sim, Alexander; Balman, Mehmet; Williams, Dean; Shoshani, Arie; Natarajan, Vijaya

    2010-01-01

    Many scientific applications and experiments, such as high energy and nuclear physics, astrophysics, climate observation and modeling, combustion, nano-scale material sciences, and computational biology, generate extreme volumes of data with a large number of files. These data sources are distributed among national and international data repositories, and are shared by large numbers of geographically distributed scientists. A large portion of the data is frequently accessed, and a large volume of data is moved from one place to another for analysis and storage. One challenging issue in such efforts is the limited network capacity for moving the large datasets that are to be explored and managed. The Bulk Data Mover (BDM), a data transfer management tool in the Earth System Grid (ESG) community, has been managing massive dataset transfers efficiently with pre-configured transfer properties in environments where network bandwidth is limited. Dynamic transfer adjustment was studied to enhance the BDM to handle significant end-to-end performance changes in dynamic network environments, as well as to control the data transfers for the desired transfer performance. We describe the results from BDM transfer management for the climate datasets. We also describe the transfer estimation model and results from the dynamic transfer adjustment.

  5. Estimating parameters for probabilistic linkage of privacy-preserved datasets.

    Science.gov (United States)

    Brown, Adrian P; Randall, Sean M; Ferrante, Anna M; Semmens, James B; Boyd, James H

    2017-07-10

    Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters. Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20%. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data. Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher
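
    The field encoding behind such privacy-preserved comparisons is compact to sketch: character bigrams hashed into a Bloom filter, compared with a Dice coefficient (after the construction of Schnell et al.; the parameters here are illustrative, not the paper's):

        # Minimal sketch of Bloom-filter field encoding for privacy-preserving
        # record linkage. A set of bit positions stands in for the bit array;
        # the Dice coefficient over set bits scores candidate pairs.
        import hashlib

        def bloom(value, size=100, hashes=4):
            bits = set()
            grams = [value[i:i + 2] for i in range(len(value) - 1)]  # bigrams
            for g in grams:
                for k in range(hashes):
                    h = hashlib.sha1(f"{k}:{g}".encode()).hexdigest()
                    bits.add(int(h, 16) % size)
            return bits

        def dice(a, b):
            return 2 * len(a & b) / (len(a) + len(b))

        print(dice(bloom("catherine"), bloom("katherine")))  # high similarity
        print(dice(bloom("catherine"), bloom("john")))       # low similarity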

  6. Viking Seismometer PDS Archive Dataset

    Science.gov (United States)

    Lorenz, R. D.

    2016-12-01

    The Viking Lander 2 seismometer operated successfully for over 500 Sols on the Martian surface, recording at least one likely candidate Marsquake. The Viking mission, in an era when data handling hardware (both on board and on the ground) was limited in capability, predated modern planetary data archiving; ad-hoc repositories of the data, and the very low-level record at NSSDC, were neither convenient to process nor well-known. In an effort supported by the NASA Mars Data Analysis Program, we have converted the bulk of the Viking dataset (namely the 49,000 and 270,000 records made in High and Event modes at 20 and 1 Hz respectively) into a simple ASCII table format. Additionally, since wind-generated lander motion is a major component of the signal, contemporaneous meteorological data are included in summary records to facilitate correlation. These datasets are being archived at the PDS Geosciences Node. In addition to brief instrument and dataset descriptions, the archive includes code snippets in the freely-available language 'R' to demonstrate plotting and analysis. Further, we present examples of lander-generated noise, associated with the sampler arm, instrument dumps and other mechanical operations.

  7. Evidence for the Psychometric Validity, Internal Consistency and Measurement Invariance of Warwick Edinburgh Mental Well-being Scale Scores in Scottish and Irish Adolescents.

    Science.gov (United States)

    McKay, Michael T; Andretta, James R

    2017-09-01

    Mental well-being is an important indicator of the current, and also the future, health of adolescents. The 14-item Warwick Edinburgh Mental Well-being Scale (WEMWBS) has been well validated in adults world-wide, but less work has been undertaken to examine the psychometric validity and internal consistency of WEMWBS scores in adolescents. In particular, little research has examined scores on the short 7-item version of the WEMWBS. The present study used two large samples of school children in Scotland and Northern Ireland and found that for both forms of the WEMWBS, scores were psychometrically valid, internally consistent, factor saturated, and measurement invariant by country. Using the WEMWBS full form, males reported significantly higher scores than females, and Northern Irish adolescents reported significantly higher scores than their Scottish counterparts. Last, the lowest overall levels of well-being were observed among Scottish females. Copyright © 2017. Published by Elsevier B.V.

  8. Utilizing the Antarctic Master Directory to find orphan datasets

    Science.gov (United States)

    Bonczkowski, J.; Carbotte, S. M.; Arko, R. A.; Grebas, S. K.

    2011-12-01

    While most Antarctic data are housed at an established disciplinary-specific data repository, there are data types for which no suitable repository exists. In some cases, these "orphan" data, without an appropriate national archive, are served from local servers by the principal investigators who produced the data. There are many pitfalls with data served privately, including the frequent lack of adequate documentation to ensure the data can be understood by others for re-use, and the impermanence of personal web sites. For example, if an investigator leaves an institution and the data moves, the published link is no longer accessible. To ensure continued availability of data, submission to long-term national data repositories is needed. As stated in the National Science Foundation Office of Polar Programs (NSF/OPP) Guidelines and Award Conditions for Scientific Data, investigators are obligated to submit their data for curation and long-term preservation; this includes the registration of a dataset description into the Antarctic Master Directory (AMD), http://gcmd.nasa.gov/Data/portals/amd/. The AMD is a Web-based, searchable directory of thousands of dataset descriptions, known as DIF records, submitted by scientists from over 20 countries. It serves as a node of the International Directory Network/Global Change Master Directory (IDN/GCMD). The US Antarctic Program Data Coordination Center (USAP-DCC), http://www.usap-data.org/, funded through NSF/OPP, was established in 2007 to help streamline the process of data submission and DIF record creation. When data do not quite fit within any existing disciplinary repository, they can be registered within the USAP-DCC as the fallback data repository. Within the scope of the USAP-DCC we undertook the challenge of discovering and "rescuing" orphan datasets currently registered within the AMD. In order to find which DIF records led to data served privately, all records relating to US data within the AMD were parsed. After

  9. A geospatial database model for the management of remote sensing datasets at multiple spectral, spatial, and temporal scales

    Science.gov (United States)

    Ifimov, Gabriela; Pigeau, Grace; Arroyo-Mora, J. Pablo; Soffer, Raymond; Leblanc, George

    2017-10-01

    In this study the development and implementation of a geospatial database model for the management of multiscale datasets encompassing airborne imagery and associated metadata is presented. To develop the multi-source geospatial database we used a Relational Database Management System (RDBMS) on a Structured Query Language (SQL) server, which was then integrated into ArcGIS and implemented as a geodatabase. The acquired datasets were compiled, standardized, and integrated into the RDBMS, where logical associations between different types of information were linked (e.g. location, date, and instrument). Airborne data, at different processing levels (digital numbers through geocorrected reflectance), were implemented in the geospatial database, where the datasets are linked spatially and temporally. An example dataset consisting of airborne hyperspectral imagery, collected for inter- and intra-annual vegetation characterization and detection of potential hydrocarbon seepage events over pipeline areas, is presented. Our work provides a model for the management of airborne imagery, which is a challenging aspect of data management in remote sensing, especially when large volumes of data are collected.

  10. Homogenised Australian climate datasets used for climate change monitoring

    International Nuclear Information System (INIS)

    Trewin, Blair; Jones, David; Collins, Dean; Jovanovic, Branislava; Braganza, Karl

    2007-01-01

    The Australian Bureau of Meteorology has developed a number of datasets for use in climate change monitoring. These datasets typically cover 50-200 stations distributed as evenly as possible over the Australian continent, and have been subject to detailed quality control and homogenisation. The time period over which data are available for each element is largely determined by the availability of data in digital form. Whilst nearly all Australian monthly and daily precipitation data have been digitised, a significant quantity of pre-1957 data (for temperature and evaporation) or pre-1987 data (for some other elements) remains to be digitised, and is not currently available for use in the climate change monitoring datasets. In the case of temperature and evaporation, the start date of the datasets is also determined by major changes in instruments or observing practices for which no adjustment is feasible at the present time. The datasets currently available cover: monthly and daily precipitation (most stations commence 1915 or earlier, with many extending back to the late 19th century, and a few to the mid-19th century); annual temperature (commences 1910); daily temperature (commences 1910, with limited station coverage pre-1957); twice-daily dewpoint/relative humidity (commences 1957); monthly pan evaporation (commences 1970); and cloud amount (commences 1957) (Jovanovic et al. 2007). As well as the station-based datasets listed above, an additional dataset being developed for use in climate change monitoring (and other applications) covers tropical cyclones in the Australian region. This is described in more detail in Trewin (2007). The datasets already developed are used in analyses of observed climate change, which are available through the Australian Bureau of Meteorology website (http://www.bom.gov.au/silo/products/cli_chg/). They are also used as a basis for routine climate monitoring, and in the datasets used for the development of seasonal

  11. Developing a Resource for Implementing ArcSWAT Using Global Datasets

    Science.gov (United States)

    Taggart, M.; Caraballo Álvarez, I. O.; Mueller, C.; Palacios, S. L.; Schmidt, C.; Milesi, C.; Palmer-Moloney, L. J.

    2015-12-01

    This project developed a comprehensive user manual outlining methods for adapting and implementing global datasets for use within ArcSWAT for international applications. The Soil and Water Assessment Tool (SWAT) is a hydrologic model that estimates a number of hydrologic variables, including runoff and the chemical makeup of water at a given location on the Earth's surface, using Digital Elevation Models (DEM), land cover, soil, and weather data. However, the application of ArcSWAT for projects outside of the United States is challenging, as there is no standard framework for inputting global datasets into ArcSWAT. This project aims to remove this obstacle by outlining methods for adapting and implementing these global datasets via the user manual. The manual takes the user through the processes of data conditioning while providing solutions and suggestions for common errors. The efficacy of the manual was explored using examples from watersheds located in Puerto Rico, Mexico and Western Africa. Each run explored the various options for setting up an ArcSWAT project as well as a range of satellite data products and soil databases. Future work will incorporate in-situ data for validation and calibration of the model and outline additional resources to assist future users in efficiently implementing the model for worldwide applications. The capacity to manage and monitor freshwater availability is of critical importance in both developed and developing countries. As populations grow and climate changes, both the quality and quantity of freshwater are affected, resulting in negative impacts on the health of the surrounding population. The use of hydrologic models such as ArcSWAT can help stakeholders and decision makers understand the future impacts of these changes, enabling informed and substantiated decisions.

  12. Happiness as stable extraversion: internal consistency reliability and construct validity of the Oxford Happiness Questionnaire among undergraduate students

    OpenAIRE

    Robbins, Mandy; Francis, Leslie J.; Edwards, Bethan

    2010-01-01

    The Oxford Happiness Questionnaire (OHQ) was developed by Hills and Argyle (2002) to provide a more accessible equivalent measure of the Oxford Happiness Inventory (OHI). The aim of the present study was to examine the internal consistency reliability and construct validity of this new instrument alongside the Eysenckian dimensional model of personality. The Oxford Happiness Questionnaire was completed by a sample of 131 undergraduate students together with the abbreviated form of the Revise...
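
    Internal consistency reliability of a scale such as the OHQ is conventionally summarised with Cronbach's alpha. The snippet below is a minimal NumPy implementation of the standard formula, run on synthetic responses; it is illustrative only and not tied to the study's data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
    return k / (k - 1) * (1 - item_vars / total_var)

# Toy check: three strongly correlated items should give alpha close to 1.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
scores = np.column_stack([base + 0.3 * rng.normal(size=200) for _ in range(3)])
print(round(cronbach_alpha(scores), 2))
```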

  13. The case for developing publicly-accessible datasets for health services research in the Middle East and North Africa (MENA) region

    Directory of Open Access Journals (Sweden)

    El-Jardali Fadi

    2009-10-01

    Background: The existence of publicly-accessible datasets comprises a significant opportunity for health services research to evolve into a science that supports health policy making and evaluation, proper inter- and intra-organizational decisions, and optimal clinical interventions. This paper investigated the role of publicly-accessible datasets in the enhancement of health care systems in the developed world and highlighted the importance of their wide existence and use in the Middle East and North Africa (MENA) region. Discussion: A search was conducted to explore the availability of publicly-accessible datasets in the MENA region. Although datasets were found in most countries in the region, they were limited in terms of their relevance, quality and public accessibility. With rare exceptions, publicly-accessible datasets - as present in the developed world - were absent. Based on this, we proposed a gradual approach and a set of recommendations to promote the development and use of publicly-accessible datasets in the region. These recommendations target potential actions by governments, researchers, policy makers and international organizations. Summary: We argue that the limited number of publicly-accessible datasets in the MENA region represents a lost opportunity for the evidence-based advancement of health systems in the region. The availability and use of publicly-accessible datasets would encourage policy makers in this region to base their decisions on solid representative data and not on estimates or small-scale studies; researchers would be able to exercise their expertise in a meaningful manner to both policy makers and the public. The population of the MENA countries would exercise the right to benefit from locally- or regionally-based studies, versus imported and, in the best cases, customized ones. Furthermore, on a macro scale, the availability of regionally comparable publicly-accessible datasets would allow for the

  14. Data Mining for Imbalanced Datasets: An Overview

    Science.gov (United States)

    Chawla, Nitesh V.

    A dataset is imbalanced if the classification categories are not approximately equally represented. Recent years have brought increased interest in applying machine learning techniques to difficult "real-world" problems, many of which are characterized by imbalanced data. Additionally, the distribution of the testing data may differ from that of the training data, and the true misclassification costs may be unknown at learning time. Predictive accuracy, a popular choice for evaluating the performance of a classifier, might not be appropriate when the data are imbalanced and/or the costs of different errors vary markedly. In this chapter, we discuss some of the sampling techniques used for balancing datasets, and the performance measures more appropriate for mining imbalanced datasets.
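
    One widely used sampling technique of the kind surveyed in this chapter is SMOTE, which synthesises new minority-class examples by interpolating between nearest neighbours instead of duplicating rows. A minimal sketch on a synthetic 10:1 problem, assuming the imbalanced-learn toolbox is installed:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Build a roughly 10:1 imbalanced binary problem.
X, y = make_classification(n_samples=1100, weights=[10 / 11], random_state=0)
print("before:", Counter(y))

# SMOTE generates synthetic minority samples along lines joining neighbours.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))
```

    When evaluating a classifier trained on such data, measures like the F1 score or the area under the ROC curve are usually more informative than raw accuracy, for the reasons the chapter gives.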

  15. Performance evaluation of tile-based Fisher Ratio analysis using a benchmark yeast metabolome dataset.

    Science.gov (United States)

    Watson, Nathanial E; Parsons, Brendon A; Synovec, Robert E

    2016-08-12

    Performance of tile-based Fisher Ratio (F-ratio) data analysis, recently developed for discovery-based studies using comprehensive two-dimensional gas chromatography coupled with time-of-flight mass spectrometry (GC×GC-TOFMS), is evaluated with a metabolomics dataset that had previously been analyzed in great detail, albeit with a brute-force approach. The previously analyzed data (referred to herein as the benchmark dataset) were intracellular extracts from Saccharomyces cerevisiae (yeast), either metabolizing glucose (repressed) or ethanol (derepressed), which define the two classes in the discovery-based analysis to find metabolites that are statistically different in concentration between the two classes. Beneficially, this previously analyzed dataset provides a concrete means to validate the tile-based F-ratio software. Herein, we demonstrate and validate the significant benefits of applying tile-based F-ratio analysis. The yeast metabolomics data are analyzed much more rapidly, in about one week versus one year for the prior studies with this dataset. Furthermore, a null distribution analysis is implemented to statistically determine an adequate F-ratio threshold, whereby the variables with F-ratio values below the threshold can be ignored as not class distinguishing, which provides the analyst with confidence when analyzing the hit table. Forty-six of the fifty-four benchmarked changing metabolites were discovered by the new methodology, while all but one of the nineteen benchmarked false-positive metabolites previously identified were consistently excluded. Copyright © 2016 Elsevier B.V. All rights reserved.
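
    At its core, F-ratio analysis computes a one-way ANOVA F statistic per variable and compares it against a permutation-based null distribution to set the significance threshold. The sketch below illustrates that logic on synthetic data; it is not the published tile-based GC×GC-TOFMS software, and the class sizes, effect size and quantile are arbitrary.

```python
import numpy as np

def f_ratio(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-feature two-class ANOVA F statistic (between/within variance)."""
    na, nb = len(a), len(b)
    grand = np.vstack([a, b]).mean(axis=0)
    between = na * (a.mean(0) - grand) ** 2 + nb * (b.mean(0) - grand) ** 2
    within = ((a - a.mean(0)) ** 2).sum(0) + ((b - b.mean(0)) ** 2).sum(0)
    return between / (within / (na + nb - 2))

rng = np.random.default_rng(1)
repressed = rng.normal(size=(6, 500))      # 6 runs x 500 candidate features
derepressed = rng.normal(size=(6, 500))
derepressed[:, :10] += 4.0                 # ten genuinely changing "metabolites"
observed = f_ratio(repressed, derepressed)

# Null distribution: recompute F-ratios under random label permutations and
# take an upper quantile as the threshold below which hits are ignored.
pooled = np.vstack([repressed, derepressed])
null = []
for _ in range(200):
    idx = rng.permutation(len(pooled))
    null.append(f_ratio(pooled[idx[:6]], pooled[idx[6:]]))
threshold = np.quantile(np.concatenate(null), 0.999)
print("features above threshold:", int((observed > threshold).sum()))
```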

  16. A hybrid organic-inorganic perovskite dataset

    Science.gov (United States)

    Kim, Chiho; Huan, Tran Doan; Krishnan, Sridevi; Ramprasad, Rampi

    2017-05-01

    Hybrid organic-inorganic perovskites (HOIPs) have been attracting a great deal of attention due to their versatility of electronic properties and fabrication methods. We prepare a dataset of 1,346 HOIPs, which features 16 organic cations, 3 group-IV cations and 4 halide anions. Using a combination of an atomic structure search method and density functional theory calculations, the optimized structures, the bandgap, the dielectric constant, and the relative energies of the HOIPs are uniformly prepared and validated by comparing with relevant experimental and/or theoretical data. We make the dataset available at Dryad Digital Repository, NoMaD Repository, and Khazana Repository (http://khazana.uconn.edu/), hoping that it could be useful for future data-mining efforts that can explore possible structure-property relationships and phenomenological models. Progressive extension of the dataset is expected as new organic cations become appropriate within the HOIP framework, and as additional properties are calculated for the new compounds found.

  17. Genomics dataset of unidentified disclosed isolates

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-09-01

    Analysis of DNA sequences is necessary for the higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was compiled to examine the complexities of the unidentified DNA sequences disclosed in patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates, while AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset of cleavage codes and enzyme codes from the restriction digestion study, helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here is new revelatory data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis. Keywords: BioLABs, Blunt ends, Genomics, NEB cutter, Restriction digestion, Short DNA sequences, Sticky ends
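
    The AT/GC and restriction-digestion analyses described here reduce to simple string operations. A minimal sketch, with a toy sequence and the EcoRI site chosen purely for illustration:

```python
def at_gc_content(seq: str) -> tuple[float, float]:
    """Return (AT%, GC%); higher GC implies a more thermally stable duplex."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq) * 100
    at = (seq.count("A") + seq.count("T")) / len(seq) * 100
    return at, gc

def find_sites(seq: str, site: str = "GAATTC") -> list[int]:
    """0-based start positions of a restriction site (EcoRI by default)."""
    seq, site = seq.upper(), site.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]

toy = "ATGCGAATTCGTTAACGGCTAAGAATTC"   # toy sequence, not from the dataset
print(at_gc_content(toy))
print(find_sites(toy))
```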

  18. A dynamic Thurstonian item response theory of motive expression in the picture story exercise: solving the internal consistency paradox of the PSE.

    Science.gov (United States)

    Lang, Jonas W B

    2014-07-01

    The measurement of implicit or unconscious motives using the picture story exercise (PSE) has long been a target of debate in the psychological literature. Most debates have centered on the apparent paradox that PSE measures of implicit motives typically show low internal consistency reliability on common indices like Cronbach's alpha but nevertheless predict behavioral outcomes. I describe a dynamic Thurstonian item response theory (IRT) model that builds on dynamic system theories of motivation, theorizing on the PSE response process, and recent advancements in Thurstonian IRT modeling of choice data. To assess the model's capability to explain the internal consistency paradox, I first fitted the model to archival data (Gurin, Veroff, & Feld, 1957) and then simulated data based on bias-corrected model estimates from the real data. Simulation results revealed that the average squared correlation reliability for the motives in the Thurstonian IRT model was .74 and that Cronbach's alpha values were similar to the real data (value of extant evidence from motivational research using PSE motive measures. (c) 2014 APA, all rights reserved.

  19. The Iranian version of the 12-item Short Form Health Survey (SF-12): factor structure, internal consistency and construct validity

    Directory of Open Access Journals (Sweden)

    Mousavi Sayed

    2009-09-01

    Background: The 12-item Short Form Health Survey (SF-12), a shorter alternative to the SF-36, is widely used in health outcomes surveys. The aim of this study was to validate the SF-12 in Iran. Methods: A random sample of the general population aged 15 years and over living in Tehran, Iran completed the SF-12. Reliability was estimated using internal consistency, and validity was assessed using known-groups comparison and convergent validity. In addition, the factor structure of the questionnaire was extracted by performing both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Results: In all, 5587 individuals were studied (2721 male and 2866 female). The mean age and formal education of the respondents were 35.1 (SD = 15.4) and 10.2 (SD = 4.4) years, respectively. The results showed satisfactory internal consistency for both summary measures, namely the Physical Component Summary (PCS) and the Mental Component Summary (MCS); Cronbach's α for PCS-12 and MCS-12 was 0.73 and 0.72, respectively. Known-groups comparison showed that the SF-12 discriminated well between men and women and those who differed in age and educational status (P…). Conclusion: In general the findings suggest that the SF-12 is a reliable and valid measure of health-related quality of life among the Iranian population. However, further studies are needed to establish stronger psychometric properties for this alternative form of the SF-36 Health Survey in Iran.

  20. On the visualization of water-related big data: extracting insights from drought proxies' datasets

    Science.gov (United States)

    Diaz, Vitali; Corzo, Gerald; van Lanen, Henny A. J.; Solomatine, Dimitri

    2017-04-01

    Big data is a growing area of science from which hydroinformatics can benefit greatly. There have been a number of important developments in the area of data science aimed at the analysis of large datasets. Water-related datasets of this kind include measurements, simulations, reanalyses, scenario analyses and proxies. By convention, information contained in these databases is referenced to a specific time and space (i.e., longitude/latitude). This work is motivated by the need to extract insights from large water-related datasets, i.e., transforming large amounts of data into useful information that helps us better understand water-related phenomena, particularly drought. In this context, data visualization, part of data science, involves techniques to create and communicate data by encoding it as visual graphical objects. These may help to better understand data and detect trends. Based on existing methods of data analysis and visualization, this work aims to develop tools for visualizing large water-related datasets. These tools build on existing data-visualization libraries and produce a group of graphs including polar area diagrams (PADs) and radar charts (RDs). In both graphs, time steps are represented by the polar angles and the percentages of area in drought by the radii. For illustration, three large datasets of drought proxies were chosen to identify trends, prone areas and the spatio-temporal variability of drought in a set of case studies. The datasets are (1) SPI-TS2p1 (1901-2002, 11.7 GB), (2) SPI-PRECL0p5 (1948-2016, 7.91 GB) and (3) SPEI-baseV2.3 (1901-2013, 15.3 GB). All are on a monthly basis, with a spatial resolution of 0.5 degrees. The first two were retrieved from the repository of the International Research Institute for Climate and Society (IRI). They are included in the Analyses Standardized Precipitation Index (SPI) project (iridl.ldeo.columbia.edu/SOURCES/.IRI/.Analyses/.SPI/). The third dataset was
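
    A polar area diagram of the kind proposed maps time steps to polar angles and the percentage of area in drought to the radii; a radar chart is the line-plot analogue on the same polar axes. A minimal matplotlib sketch with synthetic monthly values (not values from the SPI/SPEI datasets named above):

```python
import numpy as np
import matplotlib.pyplot as plt

# Twelve synthetic monthly values of "% area in drought".
rng = np.random.default_rng(2)
months = np.arange(12)
area_pct = np.clip(20 + 15 * np.sin(months / 12 * 2 * np.pi)
                   + rng.normal(0, 3, 12), 0, None)

theta = months / 12 * 2 * np.pi          # month -> polar angle
width = 2 * np.pi / 12

ax = plt.subplot(projection="polar")
ax.bar(theta, area_pct, width=width, alpha=0.6)   # polar area diagram
ax.plot(theta, area_pct, color="k")               # radar-chart style overlay
ax.set_xticks(theta)
ax.set_xticklabels(list("JFMAMJJASOND"))
ax.set_title("% area in drought by month (synthetic)")
plt.show()
```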

  1. Employment Growth and International Trade

    DEFF Research Database (Denmark)

    Ibsen, Rikke; Warzynski, Frederic; Westergård-Nielsen, Niels Chr.

    In this paper, we use a detailed dataset containing information about all international trade transactions of the population of Danish firms over more than a decade to analyze the relationship between export and import decisions and employment growth. We further distinguish between imports of final...

  2. 26 CFR 1.338-8 - Asset and stock consistency.

    Science.gov (United States)

    2010-04-01

    26 Internal Revenue 4 (2010-04-01), Income Taxes, Effects on Corporation, § 1.338-8 Asset and stock consistency: (a) Introduction (1) ...; (6) Stock consistency: this section limits the application of ... stock of target affiliates that are controlled foreign corporations.

  3. Omicseq: a web-based search engine for exploring omics datasets

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S.; Xu, Tianlei; Chen, Li; Zwick, Michael E.; Jiang, Xiaoqian; Wang, Fusheng

    2017-01-01

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve ‘findability’ of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. PMID:28402462
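
    The details of trackRank are given in the paper; purely to illustrate the idea of ranking by a dataset's numerical content rather than its metadata, the toy below scores each dataset by the percentile of the query gene's value within that dataset. All names and values are hypothetical, and this is not the published algorithm.

```python
import numpy as np

def rank_datasets(query_gene, datasets):
    """Score each dataset by how extreme the query gene's value is within it,
    so datasets where the gene is strongly represented rank first."""
    scored = []
    for name, values in datasets.items():
        if query_gene in values:
            vals = np.array(list(values.values()))
            scored.append((name, float((vals < values[query_gene]).mean())))
    return sorted(scored, key=lambda t: t[1], reverse=True)

demo = {  # hypothetical processed signals per gene
    "chipseq_run1": {"TP53": 9.2, "EGFR": 1.1, "MYC": 2.0},
    "rnaseq_run7": {"TP53": 0.4, "EGFR": 5.5, "MYC": 6.1},
}
print(rank_datasets("TP53", demo))
```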

  4. Age, Gender, and Fine-Grained Ethnicity Prediction using Convolutional Neural Networks for the East Asian Face Dataset

    Energy Technology Data Exchange (ETDEWEB)

    Srinivas, Nisha [ORNL; Rose, Derek C [ORNL; Bolme, David S [ORNL; Mahalingam, Gayathri [ORNL; Atwal, Harleen [ORNL; Ricanek, Karl [ORNL

    2017-01-01

    This paper examines the difficulty associated with performing machine-based automatic demographic prediction on a sub-population of Asian faces. We introduce the Wild East Asian Face Dataset (WEAFD), a new and unique dataset, to the research community. This dataset consists primarily of labeled face images of individuals from East Asian countries, including Vietnam, Burma, Thailand, China, Korea, Japan, Indonesia, and Malaysia. East Asian Mechanical Turk annotators were specifically used to judge the age and fine-grained ethnicity attributes, to reduce the impact of the other-race effect and improve the quality of the annotations. We focus on predicting the age, gender and fine-grained ethnicity of an individual by providing baseline results with a convolutional neural network (CNN). Fine-grained ethnicity prediction refers to predicting the ethnicity of an individual by country or sub-region (Chinese, Japanese, Korean, etc.) of the East Asian continent. Performance for two CNN architectures is presented, highlighting the difficulty of these tasks and showcasing potential design considerations that ease network optimization by promoting region-based feature extraction.
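
    A common way to structure such a predictor is a shared convolutional trunk with one head per attribute: regression for age, binary classification for gender, and multi-class classification for fine-grained ethnicity. The PyTorch sketch below shows that layout with arbitrary layer sizes; it is not the architecture evaluated in the paper.

```python
import torch
import torch.nn as nn

class DemographicsCNN(nn.Module):
    """Shared trunk, three task-specific heads (sizes are illustrative)."""
    def __init__(self, n_ethnicities: int = 8):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.age = nn.Linear(64, 1)                     # regression
        self.gender = nn.Linear(64, 2)                  # 2-way classification
        self.ethnicity = nn.Linear(64, n_ethnicities)   # fine-grained classes

    def forward(self, x):
        h = self.trunk(x)
        return self.age(h), self.gender(h), self.ethnicity(h)

out = DemographicsCNN()(torch.randn(4, 3, 64, 64))  # four 64x64 RGB faces
print([o.shape for o in out])
```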

  5. Nanoparticle-organic pollutant interaction dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Dataset presents concentrations of organic pollutants, such as polyaromatic hydrocarbon compounds, in water samples. Water samples of known volume and concentration...

  6. Segmentation of teeth in CT volumetric dataset by panoramic projection and variational level set

    Energy Technology Data Exchange (ETDEWEB)

    Hosntalab, Mohammad [Islamic Azad University, Faculty of Engineering, Science and Research Branch, Tehran (Iran); Aghaeizadeh Zoroofi, Reza [University of Tehran, Control and Intelligent Processing Center of Excellence, School of Electrical and Computer Engineering, College of Engineering, Tehran (Iran); Abbaspour Tehrani-Fard, Ali [Islamic Azad University, Faculty of Engineering, Science and Research Branch, Tehran (Iran); Sharif University of Technology, Department of Electrical Engineering, Tehran (Iran); Shirani, Gholamreza [Faculty of Dentistry Medical Science of Tehran University, Oral and Maxillofacial Surgery Department, Tehran (Iran)

    2008-09-15

    Quantification of teeth is of clinical importance for various computer-assisted procedures such as dental implants, orthodontic planning, and face, jaw and cosmetic surgeries. In this regard, segmentation is a major step. In this paper, we propose a method for segmentation of teeth in volumetric computed tomography (CT) data using panoramic re-sampling of the dataset in the coronal view and a variational level set. The proposed method consists of five steps, as follows: first, we extract a mask of the CT images using Otsu thresholding. Second, the teeth are segmented from other bony tissues by utilizing anatomical knowledge of the teeth in the jaws. Third, the method proceeds by estimating the arcs of the upper and lower jaws and panoramic re-sampling of the dataset. Separation of the upper and lower jaws and initial segmentation of the teeth are performed by employing the horizontal and vertical projections of the panoramic dataset, respectively. Based on the above-mentioned procedures, an initial mask for each tooth is obtained. Finally, we utilize the initial mask of the teeth and apply a variational level set to refine the initial teeth boundaries to final contours. The proposed algorithm was evaluated on 30 multi-slice CT datasets comprising 3,600 images. Experimental results reveal the effectiveness of the proposed method. In the proposed algorithm, the variational level set technique was utilized to trace the contour of the teeth. Since this technique is based on the characteristics of the overall region of the tooth image, it is possible to extract a very smooth and accurate tooth contour using it. On the available datasets, the proposed technique was successful in teeth segmentation compared to previous techniques. (orig.)
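
    The first step of this pipeline, Otsu thresholding, picks the intensity cut that best separates two intensity populations in the histogram. A minimal scikit-image sketch on a synthetic slice (the actual pipeline operates on volumetric CT data and adds the anatomical and level-set stages described above):

```python
import numpy as np
from skimage.filters import threshold_otsu

def bone_mask(ct_slice: np.ndarray) -> np.ndarray:
    """Binary mask of high-intensity (bone/teeth) pixels via a global Otsu cut."""
    return ct_slice > threshold_otsu(ct_slice)

# Synthetic stand-in for a CT slice: a bright "bone" block on a dark background.
rng = np.random.default_rng(3)
slice_ = rng.normal(0, 10, size=(128, 128))
slice_[40:80, 40:80] += 900
print(int(bone_mask(slice_).sum()), "pixels above the Otsu threshold")
```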

  7. Segmentation of teeth in CT volumetric dataset by panoramic projection and variational level set

    International Nuclear Information System (INIS)

    Hosntalab, Mohammad; Aghaeizadeh Zoroofi, Reza; Abbaspour Tehrani-Fard, Ali; Shirani, Gholamreza

    2008-01-01

    Quantification of teeth is of clinical importance for various computer-assisted procedures such as dental implants, orthodontic planning, and face, jaw and cosmetic surgeries. In this regard, segmentation is a major step. In this paper, we propose a method for segmentation of teeth in volumetric computed tomography (CT) data using panoramic re-sampling of the dataset in the coronal view and a variational level set. The proposed method consists of five steps, as follows: first, we extract a mask of the CT images using Otsu thresholding. Second, the teeth are segmented from other bony tissues by utilizing anatomical knowledge of the teeth in the jaws. Third, the method proceeds by estimating the arcs of the upper and lower jaws and panoramic re-sampling of the dataset. Separation of the upper and lower jaws and initial segmentation of the teeth are performed by employing the horizontal and vertical projections of the panoramic dataset, respectively. Based on the above-mentioned procedures, an initial mask for each tooth is obtained. Finally, we utilize the initial mask of the teeth and apply a variational level set to refine the initial teeth boundaries to final contours. The proposed algorithm was evaluated on 30 multi-slice CT datasets comprising 3,600 images. Experimental results reveal the effectiveness of the proposed method. In the proposed algorithm, the variational level set technique was utilized to trace the contour of the teeth. Since this technique is based on the characteristics of the overall region of the tooth image, it is possible to extract a very smooth and accurate tooth contour using it. On the available datasets, the proposed technique was successful in teeth segmentation compared to previous techniques. (orig.)

  8. A new bed elevation dataset for Greenland

    Directory of Open Access Journals (Sweden)

    J. L. Bamber

    2013-03-01

    We present a new bed elevation dataset for Greenland derived from a combination of multiple airborne ice thickness surveys undertaken between the 1970s and 2012. Around 420 000 line kilometres of airborne data were used, with roughly 70% of this having been collected since the year 2000, when the last comprehensive compilation was undertaken. The airborne data were combined with satellite-derived elevations for non-glaciated terrain to produce a consistent bed digital elevation model (DEM) over the entire island including across the glaciated–ice free boundary. The DEM was extended to the continental margin with the aid of bathymetric data, primarily from a compilation for the Arctic. Ice thickness was determined where an ice shelf exists from a combination of surface elevation and radar soundings. The across-track spacing between flight lines warranted interpolation at 1 km postings for significant sectors of the ice sheet. Grids of ice surface elevation, error estimates for the DEM, ice thickness and data sampling density were also produced alongside a mask of land/ocean/grounded ice/floating ice. Errors in bed elevation range from a minimum of ±10 m to about ±300 m, as a function of distance from an observation and local topographic variability. A comparison with the compilation published in 2001 highlights the improvement in resolution afforded by the new datasets, particularly along the ice sheet margin, where ice velocity is highest and changes in ice dynamics most marked. We estimate that the volume of ice included in our land-ice mask would raise mean sea level by 7.36 m, excluding any solid earth effects that would take place during ice sheet decay.

  9. Large-scale Labeled Datasets to Fuel Earth Science Deep Learning Applications

    Science.gov (United States)

    Maskey, M.; Ramachandran, R.; Miller, J.

    2017-12-01

    Deep learning has revolutionized computer vision and natural language processing with various algorithms scaled using high-performance computing. However, generic large-scale labeled datasets such as ImageNet are the fuel that drives the impressive accuracy of deep learning results. Large-scale labeled datasets already exist in domains such as medical science, but creating them in the Earth science domain is a challenge. While there are ways to apply deep learning using limited labeled datasets, there is a need in the Earth sciences for creating large-scale labeled datasets for benchmarking and scaling deep learning applications. At the NASA Marshall Space Flight Center, we are using deep learning for a variety of Earth science applications where we have encountered the need for large-scale labeled datasets. We will discuss our approaches for creating such datasets and why these datasets are just as valuable as deep learning algorithms. We will also describe successful usage of these large-scale labeled datasets with our deep learning based applications.

  10. An Affinity Propagation Clustering Algorithm for Mixed Numeric and Categorical Datasets

    Directory of Open Access Journals (Sweden)

    Kang Zhang

    2014-01-01

    Clustering has been widely used in different fields of science, technology, social science, and so forth. In the real world, numeric as well as categorical features are usually used to describe data objects. Accordingly, many clustering methods can process datasets that are either numeric or categorical. Recently, algorithms that can handle mixed-data clustering problems have been developed. The affinity propagation (AP) algorithm is an exemplar-based clustering method which has demonstrated good performance on a wide variety of datasets; however, it has limitations in processing mixed datasets. In this paper, we propose a novel similarity measure for mixed-type datasets, and an adaptive AP clustering algorithm is proposed to cluster the mixed datasets. Several real-world datasets are studied to evaluate the performance of the proposed algorithm. Comparisons with other clustering algorithms demonstrate that the proposed method works well not only on mixed datasets but also on pure numeric and categorical datasets.
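
    Since scikit-learn's AffinityPropagation accepts a precomputed similarity matrix, a mixed-type measure can be plugged in directly. The sketch below uses a generic Gower-style similarity as a stand-in for the paper's proposed measure; the four toy records are hypothetical.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def mixed_similarity(num: np.ndarray, cat: np.ndarray) -> np.ndarray:
    """Gower-style similarity: range-normalised distance on numeric columns,
    simple mismatch on categorical ones, negated because AP wants similarities."""
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / ranges
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    return -np.concatenate([d_num, d_cat], axis=2).mean(axis=2)

num = np.array([[1.0, 200.0], [1.1, 210.0], [9.0, 900.0], [8.8, 880.0]])
cat = np.array([["a"], ["a"], ["b"], ["b"]])
labels = AffinityPropagation(affinity="precomputed", random_state=0).fit_predict(
    mixed_similarity(num, cat))
print(labels)   # expect the first two and last two records grouped together
```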

  11. Using Multiple Big Datasets and Machine Learning to Produce a New Global Particulate Dataset: A Technology Challenge Case Study

    Science.gov (United States)

    Lary, D. J.

    2013-12-01

    A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster with an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing (Matlab) has been used. Key considerations for the system are high-speed access due to the large data volume, persistence of the large data volumes, and a precise process-time scheduling capability.

  12. Chemical product and function dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — Merged product weight fraction and chemical function data. This dataset is associated with the following publication: Isaacs , K., M. Goldsmith, P. Egeghy , K....

  13. General Purpose Multimedia Dataset - GarageBand 2008

    DEFF Research Database (Denmark)

    Meng, Anders

    This document describes a general-purpose multimedia dataset to be used in cross-media machine learning problems. In more detail, we describe the genre taxonomy applied at http://www.garageband.com, from where the dataset was collected, and how that taxonomy has been fused into a more human-understandable taxonomy. Finally, a description of various features extracted from both the audio and text is presented.

  14. Omicseq: a web-based search engine for exploring omics datasets.

    Science.gov (United States)

    Sun, Xiaobo; Pittard, William S; Xu, Tianlei; Chen, Li; Zwick, Michael E; Jiang, Xiaoqian; Wang, Fusheng; Qin, Zhaohui S

    2017-07-03

    The development and application of high-throughput genomics technologies has resulted in massive quantities of diverse omics data that continue to accumulate rapidly. These rich datasets offer unprecedented and exciting opportunities to address long-standing questions in biomedical research. However, our ability to explore and query the content of diverse omics data is very limited. Existing dataset search tools rely almost exclusively on the metadata. A text-based query for gene name(s) does not work well on datasets wherein the vast majority of their content is numeric. To overcome this barrier, we have developed Omicseq, a novel web-based platform that facilitates the easy interrogation of omics datasets holistically to improve 'findability' of relevant data. The core component of Omicseq is trackRank, a novel algorithm for ranking omics datasets that fully uses the numerical content of the dataset to determine relevance to the query entity. The Omicseq system is supported by a scalable and elastic, NoSQL database that hosts a large collection of processed omics datasets. In the front end, a simple, web-based interface allows users to enter queries and instantly receive search results as a list of ranked datasets deemed to be the most relevant. Omicseq is freely available at http://www.omicseq.org. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Quantifying uncertainty in observational rainfall datasets

    Science.gov (United States)

    Lennard, Chris; Dosio, Alessandro; Nikulin, Grigory; Pinto, Izidine; Seid, Hussen

    2015-04-01

    The CO-ordinated Regional Downscaling Experiment (CORDEX) has to date seen the publication of at least ten journal papers that examine the African domain during 2012 and 2013. Five of these papers consider Africa generally (Nikulin et al. 2012, Kim et al. 2013, Hernandes-Dias et al. 2013, Laprise et al. 2013, Panitz et al. 2013) and five have regional foci: Tramblay et al. (2013) on Northern Africa, Mariotti et al. (2014) and Gbobaniyi et al. (2013) on West Africa, Endris et al. (2013) on East Africa and Kalagnoumou et al. (2013) on southern Africa. There are also a further three papers that the authors know to be under review. These papers all use observed rainfall and/or temperature data to evaluate/validate the regional model output and often proceed to assess projected changes in these variables due to climate change in the context of these observations. The most popular reference rainfall data used are the CRU, GPCP, GPCC, TRMM and UDEL datasets. However, as Kalagnoumou et al. (2013) point out, there are many other rainfall datasets available for consideration, for example, CMORPH, FEWS, TAMSAT & RIANNAA, TAMORA and the WATCH & WATCH-DEI data. They, with others (Nikulin et al. 2012, Sylla et al. 2012), show that the observed datasets can have a very wide spread at a particular space-time coordinate. As more ground-, space- and reanalysis-based rainfall products become available, all of which use different methods to produce precipitation data, the selection of reference data is becoming an important factor in model evaluation. A number of factors can contribute to uncertainty in terms of the reliability and validity of the datasets, such as radiance conversion algorithms, the quantity and quality of available station data, interpolation techniques and the blending methods used to combine satellite and gauge-based products. However, to date no comprehensive study has been performed to evaluate the uncertainty in these observational datasets. We assess 18 gridded

  16. Modelling and analysis of turbulent datasets using Auto Regressive Moving Average processes

    International Nuclear Information System (INIS)

    Faranda, Davide; Dubrulle, Bérengère; Daviaud, François; Pons, Flavio Maria Emanuele; Saint-Michel, Brice; Herbert, Éric; Cortet, Pierre-Philippe

    2014-01-01

    We introduce a novel way to extract information from turbulent datasets by applying an Auto Regressive Moving Average (ARMA) statistical analysis. Such analysis goes well beyond the analysis of the mean flow and of the fluctuations, and links the behavior of the recorded time series to a discrete version of a stochastic differential equation which is able to describe the correlation structure in the dataset. We introduce a new index Υ that measures the difference between the resulting analysis and the Obukhov model of turbulence, the simplest stochastic model reproducing both Richardson law and the Kolmogorov spectrum. We test the method on datasets measured in a von Kármán swirling flow experiment. We found that the ARMA analysis is well correlated with spatial structures of the flow, and can discriminate between two different flows with comparable mean velocities, obtained by changing the forcing. Moreover, we show that Υ is highest in regions where shear-layer vortices are present, thereby establishing a link between deviations from the Kolmogorov model and coherent structures. These deviations are consistent with the ones observed by computing the Hurst exponents for the same time series. We show that some salient features of the analysis are preserved when considering global instead of local observables. Finally, we analyze flow configurations with multistability features where the ARMA technique is efficient in discriminating different stability branches of the system.
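
    The Υ index itself is defined in the paper; its first ingredient, fitting ARMA(p, q) models to a recorded series and selecting an order by an information criterion, can be sketched with statsmodels. The synthetic AR(2) series and the small BIC grid below are illustrative only.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a measured velocity time series: an AR(2) process.
rng = np.random.default_rng(4)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.7 * x[t - 1] - 0.2 * x[t - 2] + rng.normal()

# Grid-search a few ARMA(p, q) orders and keep the one with the lowest BIC;
# the selected model could then be compared against a reference model.
best = min(((p, q, ARIMA(x, order=(p, 0, q)).fit().bic)
            for p in range(1, 4) for q in range(0, 3)),
           key=lambda t: t[2])
print("selected ARMA order:", best[:2])
```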

  17. Making of a solar spectral irradiance dataset I: observations, uncertainties, and methods

    Directory of Open Access Journals (Sweden)

    Schöll Micha

    2016-01-01

    Context. Changes in the spectral solar irradiance (SSI) are a key driver of the variability of the Earth's environment, strongly affecting the upper atmosphere, but also impacting climate. However, its measurements have been sparse and of different quality. The “First European Comprehensive Solar Irradiance Data Exploitation project” (SOLID) aims at merging the complete set of European irradiance data, complemented by archive data that include data from non-European missions. Aims. As part of SOLID, we present all available space-based SSI measurements, reference spectra, and relevant proxies in a unified format with regular temporal re-gridding, interpolation, gap-filling as well as associated uncertainty estimations. Methods. We apply a coherent methodology to all available SSI datasets. Our pipeline approach consists of the pre-processing of the data, the interpolation of missing data by utilizing the spectral coherency of SSI, the temporal re-gridding of the data, an instrumental outlier detection routine, and a proxy-based interpolation for missing and flagged values. In particular, to detect instrumental outliers, we combine an autoregressive model with proxy data. We independently estimate the precision and stability of each individual dataset and flag all changes due to processing in an accompanying quality mask. Results. We present a unified database of solar activity records with accompanying meta-data and uncertainties. Conclusions. This dataset can be used for further investigations of the long-term trend of solar activity and the construction of a homogeneous SSI record.

  18. A contribution to semantic indexing and retrieval based on FCA - An application to song datasets

    OpenAIRE

    Codocedo, Victor; Lykourentzou, Ioanna; Napoli, Amedeo

    2012-01-01

    Semantic indexing and retrieval is an important research area, as the amount of information available on the Web continues to grow. In this paper, we introduce an original approach to semantic indexing and retrieval based on Formal Concept Analysis. The concept lattice is used as a semantic index, and we propose an original algorithm for traversing the lattice and answering user queries. This framework has been used and evaluated on a song dataset.

  19. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

    OpenAIRE

    Lemaitre, Guillaume; Nogueira, Fernando; Aridas, Christos

    2017-01-01

    imbalanced-learn is an open-source Python toolbox aiming to provide a wide range of methods to cope with the problem of imbalanced datasets frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods can be categorized into 4 groups: (i) under-sampling, (ii) over-sampling, (iii) combination of over- and under-sampling, and (iv) ensemble learning methods. The proposed toolbox depends only on numpy, scipy, and scikit-learn...

  20. Turkey Run Landfill Emissions Dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — landfill emissions measurements for the Turkey run landfill in Georgia. This dataset is associated with the following publication: De la Cruz, F., R. Green, G....

  1. The StreamCat Dataset: Accumulated Attributes for NHDPlusV2 Catchments (Version 2.1) for the Conterminous United States: Predicted Biological Condition

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset consists of predicted probabilities of good biological condition based in the US EPA 2008/2009 National Rivers and Streams Assessment (NRSA). NRSA...

  2. Topic modeling for cluster analysis of large biological and medical datasets.

    Science.gov (United States)

    Zhao, Weizhong; Zou, Wen; Chen, James J

    2014-01-01

    The big data moniker is nowhere better deserved than in describing the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors. Although multivariate techniques such as cluster analysis may allow researchers to identify groups, or clusters, of related variables, the accuracy and effectiveness of traditional clustering methods diminish for large and hyper-dimensional datasets. Topic modeling is an active research field in machine learning and has mainly been used as an analytical tool to structure large textual corpora for data mining. Its ability to reduce high dimensionality to a small number of latent variables makes it suitable as a means for clustering or for overcoming clustering difficulties in large biological and medical datasets. In this study, three topic-model-derived clustering methods (highest probable topic assignment, feature selection and feature extraction) are proposed and tested on the cluster analysis of three large datasets: a Salmonella pulsed-field gel electrophoresis (PFGE) dataset, a lung cancer dataset, and a breast cancer dataset, which represent various types of large biological or medical datasets. All three methods are shown to improve the effectiveness of clustering results on the three datasets in comparison to traditional methods. A preferable cluster analysis method emerged for each of the three datasets on the basis of replicating known biological truths. Topic modeling could be advantageously applied to the large datasets of biological or medical research. The three proposed topic-model-derived clustering methods yield clustering improvements for the three different data types. Clusters more efficaciously represent truthful groupings and subgroupings in the data than traditional methods, suggesting
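
    Of the three proposed methods, feature extraction is the simplest to sketch: reduce the count matrix to latent topic proportions, then cluster in that low-dimensional space; "highest probable topic assignment" then corresponds to taking the argmax over the same proportions. The scikit-learn example below uses synthetic Poisson counts as a stand-in for PFGE band or expression data.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
lam1 = np.array([8, 8, 8, 1, 1, 1, 1, 1, 1, 1])   # two latent count profiles
X = np.vstack([rng.poisson(lam1, size=(30, 10)),
               rng.poisson(lam1[::-1], size=(30, 10))])

# Feature extraction: represent each sample by its topic proportions.
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(theta)
print(labels)

# "Highest probable topic assignment" variant: cluster = dominant topic.
print(theta.argmax(axis=1))
```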

  3. An Analysis of the GTZAN Music Genre Dataset

    DEFF Research Database (Denmark)

    Sturm, Bob L.

    2012-01-01

    Most research in automatic music genre recognition has used the dataset assembled by Tzanetakis et al. in 2001. The composition and integrity of this dataset, however, have never been formally analyzed. For the first time, we provide an analysis of its composition, and create a machine...

  4. Personality Predicts Mortality Risk: An Integrative Data Analysis of 15 International Longitudinal Studies.

    Science.gov (United States)

    Graham, Eileen K; Rutsohn, Joshua P; Turiano, Nicholas A; Bendayan, Rebecca; Batterham, Philip J; Gerstorf, Denis; Katz, Mindy J; Reynolds, Chandra A; Sharp, Emily S; Yoneda, Tomiko B; Bastarache, Emily D; Elleman, Lorien G; Zelinski, Elizabeth M; Johansson, Boo; Kuh, Diana; Barnes, Lisa L; Bennett, David A; Deeg, Dorly J H; Lipton, Richard B; Pedersen, Nancy L; Piccinin, Andrea M; Spiro, Avron; Muniz-Terrera, Graciela; Willis, Sherry L; Schaie, K Warner; Roan, Carol; Herd, Pamela; Hofer, Scott M; Mroczek, Daniel K

    2017-10-01

    This study examined the Big Five personality traits as predictors of mortality risk, and smoking as a mediator of that association. Replication was built into the fabric of our design: we used a Coordinated Analysis with 15 international datasets, representing 44,094 participants. We found that high neuroticism and low conscientiousness, extraversion, and agreeableness were consistent predictors of mortality across studies. Smoking had a small mediating effect for neuroticism. Country and baseline age explained variation in effects: studies with older baseline age showed a pattern of protective effects (HReffects for extraversion. This study demonstrated coordinated analysis as a powerful approach to enhance replicability and reproducibility, especially for aging-related longitudinal research.

  5. Dataset definition for CMS operations and physics analyses

    Science.gov (United States)

    Franzoni, Giovanni; Compact Muon Solenoid Collaboration

    2016-04-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets and secondary datasets/dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during LHC Run I, and we discuss the plans for Run II.

  6. Dataset definition for CMS operations and physics analyses

    CERN Document Server

    AUTHOR|(CDS)2051291

    2016-01-01

    Data recorded at the CMS experiment are funnelled into streams, integrated in the HLT menu, and further organised in a hierarchical structure of primary datasets, secondary datasets, and dedicated skims. Datasets are defined according to the final-state particles reconstructed by the high level trigger, the data format and the use case (physics analysis, alignment and calibration, performance studies). During the first LHC run, new workflows have been added to this canonical scheme to exploit at best the flexibility of the CMS trigger and data acquisition systems. The concepts of data parking and data scouting have been introduced to extend the physics reach of CMS, offering the opportunity of defining physics triggers with extremely loose selections (e.g. a dijet resonance trigger collecting data at 1 kHz). In this presentation, we review the evolution of the dataset definition during the first run, and we discuss the plans for the second LHC run.

  7. Dataset of NRDA emission data

    Data.gov (United States)

    U.S. Environmental Protection Agency — Emissions data from open air oil burns. This dataset is associated with the following publication: Gullett, B., J. Aurell, A. Holder, B. Mitchell, D. Greenwell, M....

  8. Medical Image Data and Datasets in the Era of Machine Learning: Whitepaper from the 2016 C-MIMI Meeting Dataset Session.

    Science.gov (United States)

    Kohli, Marc D; Summers, Ronald M; Geis, J Raymond

    2017-08-01

    At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data-starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.

  9. Discovery and Reuse of Open Datasets: An Exploratory Study

    Directory of Open Access Journals (Sweden)

    Sara

    2016-07-01

    Objective: This article analyzes twenty cited or downloaded datasets and the repositories that house them, in order to produce insights that can be used by academic libraries to encourage discovery and reuse of research data in institutional repositories. Methods: Using Thomson Reuters’ Data Citation Index and repository download statistics, we identified twenty cited/downloaded datasets. We documented the characteristics of the cited/downloaded datasets and their corresponding repositories in a self-designed rubric. The rubric includes six major categories: basic information; funding agency and journal information; linking and sharing; factors to encourage reuse; repository characteristics; and data description. Results: Our small-scale study suggests that cited/downloaded datasets generally comply with basic recommendations for facilitating reuse: data are documented well; formatted for use with a variety of software; and shared in established, open access repositories. Three significant factors also appear to contribute to dataset discovery: publishing in discipline-specific repositories; indexing in more than one location on the web; and using persistent identifiers. The cited/downloaded datasets in our analysis came from a few specific disciplines, and tended to be funded by agencies with data publication mandates. Conclusions: The results of this exploratory research provide insights that can inform academic librarians as they work to encourage discovery and reuse of institutional datasets. Our analysis also suggests areas in which academic librarians can target open data advocacy in their communities in order to begin to build open data success stories that will fuel future advocacy efforts.

  10. The SAIL databank: linking multiple health and social care datasets

    Directory of Open Access Journals (Sweden)

    Ford David V

    2009-01-01

    Background: Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Methods: Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records, to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly, the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and the optimum matching technique. Results: The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were … Conclusion: With the infrastructure that has been put in place, the reliable matching process that has been developed enables an ALF to be consistently allocated to records in the databank. The SAIL databank represents a research-ready platform for record-linkage studies.

  11. Visualization of conserved structures by fusing highly variable datasets.

    Science.gov (United States)

    Silverstein, Jonathan C; Chhadia, Ankur; Dech, Fred

    2002-01-01

    Skill, effort, and time are required to identify and visualize anatomic structures in three dimensions from radiological data. Fundamentally, automating these processes requires a technique that uses symbolic information not in the dynamic range of the voxel data. We were developing such a technique based on mutual information for automatic multi-modality image fusion (MIAMI Fuse, University of Michigan). This system previously demonstrated facility at fusing one voxel dataset with integrated symbolic structure information to a CT dataset (of different scale and resolution) from the same person. The next step in the development of our technique was aimed at accommodating the variability of anatomy from patient to patient by using warping to fuse our standard dataset to arbitrary patient CT datasets. A standard symbolic information dataset was created from the full color Visible Human Female by segmenting the liver parenchyma, portal veins, and hepatic veins and overwriting each set of voxels with a fixed color. Two arbitrarily selected patient CT scans of the abdomen were used for reference datasets. We used the warping functions in MIAMI Fuse to align the standard structure data to each patient scan. The key to successful fusion was the focused use of multiple warping control points that place themselves around the structure of interest automatically. The user assigns only a few initial control points to align the scans. Fusions 1 and 2 transformed the atlas with 27 points around the liver to CT1 and CT2, respectively. Fusion 3 transformed the atlas with 45 control points around the liver to CT1 and Fusion 4 transformed the atlas with 5 control points around the portal vein. The CT dataset is augmented with the transformed standard structure dataset, such that the warped structure masks are visualized in combination with the original patient dataset. This combined volume visualization is then rendered interactively in stereo on the ImmersaDesk in an immersive Virtual
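
    Fusion driven by mutual information seeks the transform that maximises the MI between the intensity distributions of the two volumes. A minimal histogram-based MI estimator (illustrative only, not the MIAMI Fuse implementation):

```python
import numpy as np

def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
    """MI between two equally shaped images, from their joint histogram."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = hist / hist.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0   # avoid log(0); zero cells contribute nothing
    return float((pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum())

rng = np.random.default_rng(6)
img = rng.normal(size=(64, 64))
print(mutual_information(img, img))                        # high: identical images
print(mutual_information(img, rng.normal(size=(64, 64))))  # near zero: unrelated
```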

  12. Dataset - Adviesregel PPL 2010

    NARCIS (Netherlands)

    Evert, van F.K.; Schans, van der D.A.; Geel, van W.C.A.; Slabbekoorn, J.J.; Booij, R.; Jukema, J.N.; Meurs, E.J.J.; Uenk, D.

    2011-01-01

    This dataset contains experimental data from a number of field experiments with potato in The Netherlands (Van Evert et al., 2011). The data are presented as an SQL dump of a PostgreSQL database (version 8.4.4). An outline of the entity-relationship diagram of the database is given in an

  13. Tension in the recent Type Ia supernovae datasets

    International Nuclear Information System (INIS)

    Wei, Hao

    2010-01-01

    In the present work, we investigate the tension in the recent Type Ia supernovae (SNIa) datasets Constitution and Union. We show that they are in tension not only with the observations of the cosmic microwave background (CMB) anisotropy and the baryon acoustic oscillations (BAO), but also with other SNIa datasets such as Davis and SNLS. Then, we find the main sources responsible for the tension. Further, we make this more robust by employing the method of random truncation. Based on the results of this work, we suggest two truncated versions of the Union and Constitution datasets, namely the UnionT and ConstitutionT SNIa samples, whose behaviors are more regular.

  14. Viability of Controlling Prosthetic Hand Utilizing Electroencephalograph (EEG) Dataset Signal

    Science.gov (United States)

    Miskon, Azizi; A/L Thanakodi, Suresh; Raihan Mazlan, Mohd; Mohd Haziq Azhar, Satria; Nooraya Mohd Tawil, Siti

    2016-11-01

    This project presents the development of an artificial hand controlled by electroencephalograph (EEG) signal datasets for prosthetic applications. The EEG signal datasets were used to improve the way the prosthetic hand is controlled, as compared to electromyography (EMG). EMG has disadvantages for people who have not used the relevant muscles for a long time, and for people with degenerative issues due to age. Thus, EEG datasets were found to be an alternative to EMG. The datasets used in this work were taken from a Brain Computer Interface (BCI) project and were already classified for open, close and combined movement operations. They served as input to control the prosthetic hand via an interface between Microsoft Visual Studio and Arduino. The obtained results reveal the prosthetic hand to be more efficient and faster in response to the EEG datasets, with an additional LiPo (lithium polymer) battery attached to the prosthetic. Some limitations were also identified in terms of hand movements and the weight of the prosthetic, and suggestions for improvement are presented. Overall, the objective of this paper was achieved, as the prosthetic hand was found to be feasible in operation utilizing the EEG datasets.

  15. Technical note: An inorganic water chemistry dataset (1972–2011 ...

    African Journals Online (AJOL)

    A national dataset of inorganic chemical data of surface waters (rivers, lakes, and dams) in South Africa is presented and made freely available. The dataset comprises more than 500 000 complete water analyses from 1972 up to 2011, collected from more than 2 000 sample monitoring stations in South Africa. The dataset ...

  16. The effects of spatial population dataset choice on estimates of population at risk of disease

    Directory of Open Access Journals (Sweden)

    Gething Peter W

    2011-02-01

    consistently more accurate than the others in estimating PAR. The sizes of such differences among modeled human populations were related to variations in the methods, input resolution, and date of the census data underlying each dataset. Data quality varied from country to country within the spatial population datasets. Conclusions Detailed, highly spatially resolved human population data are an essential resource for planning health service delivery for disease control, for the spatial modeling of epidemics, and for decision-making processes related to public health. However, our results highlight that for the low-income regions of the world where disease burden is greatest, existing datasets display substantial variations in estimated population distributions, resulting in uncertainty in disease assessments that utilize them. Increased efforts are required to gather contemporary and spatially detailed demographic data to reduce this uncertainty, particularly in Africa, and to develop population distribution modeling methods that match the rigor, sophistication, and ability to handle uncertainty of contemporary disease mapping and spread modeling. In the meantime, studies that utilize a particular spatial population dataset need to acknowledge the uncertainties inherent within them and consider how the methods and data that comprise each will affect conclusions.

  17. Wind and wave dataset for Matara, Sri Lanka

    Science.gov (United States)

    Luo, Yao; Wang, Dongxiao; Priyadarshana Gamage, Tilak; Zhou, Fenghua; Madusanka Widanage, Charith; Liu, Taiwei

    2018-01-01

    We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset as comprehensive a description of the data as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  18. Wind and wave dataset for Matara, Sri Lanka

    Directory of Open Access Journals (Sweden)

    Y. Luo

    2018-01-01

    Full Text Available We present a continuous in situ hydro-meteorology observational dataset from a set of instruments first deployed in December 2012 in the south of Sri Lanka, facing toward the north Indian Ocean. In these waters, simultaneous records of wind and wave data are sparse due to difficulties in deploying measurement instruments, although the area hosts one of the busiest shipping lanes in the world. This study describes the survey, deployment, and measurements of wind and waves, with the aim of offering future users of the dataset as comprehensive a description of the data as possible. This dataset advances our understanding of the nearshore hydrodynamic processes and wave climate, including sea waves and swells, in the north Indian Ocean. Moreover, it is a valuable resource for ocean model parameterization and validation. The archived dataset (Table 1) is examined in detail, including wave data at two locations with water depths of 20 and 10 m comprising synchronous time series of wind, ocean astronomical tide, air pressure, etc. In addition, we use these wave observations to evaluate the ERA-Interim reanalysis product. Based on Buoy 2 data, the swells are the main component of waves year-round, although monsoons can markedly alter the proportion between swell and wind sea. The dataset (Luo et al., 2017) is publicly available from Science Data Bank (https://doi.org/10.11922/sciencedb.447).

  19. Heuristics for Relevancy Ranking of Earth Dataset Search Results

    Science.gov (United States)

    Lynnes, Christopher; Quinn, Patrick; Norton, James

    2016-01-01

    As the variety of Earth science datasets increases, science researchers find it more challenging to discover and select the datasets that best fit their needs. The most common way for search providers to address this problem is to rank the datasets returned for a query by their likely relevance to the user. Large web-page search engines typically use text matching supplemented with reverse link counts, semantic annotations and user-intent modeling. However, this produces uneven results when applied to dataset metadata records simply externalized as web pages. Fortunately, data and search providers have decades of experience in serving data user communities, allowing them to form heuristics that leverage the structure of the metadata together with knowledge about the user community. Some of these heuristics include specific ways of matching the user input to the essential measurements in the dataset and determining overlaps of time ranges and spatial areas. Heuristics based on the novelty of the datasets can prioritize later, better versions of data over similar predecessors. And knowledge of how different user types and communities use data can be brought to bear in cases where characteristics of the user (discipline, expertise) or their intent (applications, research) can be divined. The Earth Observing System Data and Information System has begun implementing some of these heuristics in the relevancy algorithm of its Common Metadata Repository search engine.
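
    A hedged sketch of how heuristics like these might be combined (this is not the Common Metadata Repository implementation): temporal overlap, spatial overlap, and a version-recency bonus are folded into one score. The weights, field names, and the 1-D latitude stand-in for a real spatial test are all invented for illustration.

```python
# Toy relevance score combining temporal overlap, spatial overlap, and
# dataset-version recency. Weights and metadata fields are assumptions.
def overlap(lo1, hi1, lo2, hi2):
    """Fractional overlap of [lo2, hi2] with the query range [lo1, hi1]."""
    inter = max(0.0, min(hi1, hi2) - max(lo1, lo2))
    return inter / (hi1 - lo1) if hi1 > lo1 else 0.0

def score(query, dataset, w_time=0.4, w_space=0.4, w_version=0.2):
    t = overlap(*query["time"], *dataset["time"])
    s = overlap(*query["lat"], *dataset["lat"])  # 1-D stand-in for an area test
    v = dataset["version"] / max(d["version"] for d in ALL_DATASETS)
    return w_time * t + w_space * s + w_version * v  # newer versions rank higher

ALL_DATASETS = [
    {"time": (2000, 2010), "lat": (-30, 30), "version": 3},
    {"time": (2005, 2016), "lat": (0, 60), "version": 5},
]
q = {"time": (2004, 2008), "lat": (10, 20)}
ranked = sorted(ALL_DATASETS, key=lambda d: score(q, d), reverse=True)
```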

  20. QSAR ligand dataset for modelling mutagenicity, genotoxicity, and rodent carcinogenicity

    Directory of Open Access Journals (Sweden)

    Davy Guan

    2018-04-01

    Full Text Available Five datasets were constructed from ligand and bioassay result data from the literature. These datasets include bioassay results from the Ames mutagenicity assay, Greenscreen GADD-45a-GFP assay, Syrian Hamster Embryo (SHE) assay, and 2-year rat carcinogenicity assay results. These datasets provide information about chemical mutagenicity, genotoxicity and carcinogenicity.

  1. The Dataset of Countries at Risk of Electoral Violence

    OpenAIRE

    Birch, Sarah; Muchlinski, David

    2017-01-01

    Electoral violence is increasingly affecting elections around the world, yet researchers have been limited by a paucity of granular data on this phenomenon. This paper introduces and describes a new dataset of electoral violence – the Dataset of Countries at Risk of Electoral Violence (CREV) – that provides measures of 10 different types of electoral violence across 642 elections held around the globe between 1995 and 2013. The paper provides a detailed account of how and why the dataset was ...

  2. VideoWeb Dataset for Multi-camera Activities and Non-verbal Communication

    Science.gov (United States)

    Denina, Giovanni; Bhanu, Bir; Nguyen, Hoang Thanh; Ding, Chong; Kamal, Ahmed; Ravishankar, Chinya; Roy-Chowdhury, Amit; Ivers, Allen; Varda, Brenda

    Human-activity recognition is one of the most challenging problems in computer vision. Researchers from around the world have tried to solve this problem and have come a long way in recognizing simple motions and atomic activities. As the computer vision community heads toward fully recognizing human activities, a challenging and labeled dataset is needed. To respond to that need, we collected a dataset of realistic scenarios in a multi-camera network environment (VideoWeb) involving multiple persons performing dozens of different repetitive and non-repetitive activities. This chapter describes the details of the dataset. We believe that this VideoWeb Activities dataset is unique and it is one of the most challenging datasets available today. The dataset is publicly available online at http://vwdata.ee.ucr.edu/ along with the data annotation.

  3. Toward computational cumulative biology by combining models of biological datasets.

    Science.gov (United States)

    Faisal, Ali; Peltonen, Jaakko; Georgii, Elisabeth; Rung, Johan; Kaski, Samuel

    2014-01-01

    A main challenge of data-driven sciences is how to make maximal use of the progressively expanding databases of experimental datasets in order to keep research cumulative. We introduce the idea of a modeling-based dataset retrieval engine designed for relating a researcher's experimental dataset to earlier work in the field. The search is (i) data-driven to enable new findings, going beyond the state of the art of keyword searches in annotations, (ii) modeling-driven, to include both biological knowledge and insights learned from data, and (iii) scalable, as it is accomplished without building one unified grand model of all data. Assuming each dataset has been modeled beforehand, by the researchers or automatically by database managers, we apply a rapidly computable and optimizable combination model to decompose a new dataset into contributions from earlier relevant models. By using the data-driven decomposition, we identify a network of interrelated datasets from a large annotated human gene expression atlas. While tissue type and disease were major driving forces for determining relevant datasets, the found relationships were richer, and the model-based search was more accurate than the keyword search; moreover, it recovered biologically meaningful relationships that are not straightforwardly visible from annotations: for instance, between cells in different developmental stages such as thymocytes and T-cells. Data-driven links and citations matched to a large extent; the data-driven links even uncovered corrections to the publication data, as two of the most linked datasets were not highly cited and turned out to have wrong publication entries in the database.
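
    To make the decomposition idea concrete, here is a minimal sketch that expresses a new dataset as a non-negative weighted combination of signatures from earlier models. The paper's combination model is more sophisticated; scipy's non-negative least squares is used here purely as an illustrative stand-in.

```python
# Decompose a new dataset into contributions from earlier models (NNLS toy).
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
earlier_models = rng.standard_normal((100, 5))  # columns: per-dataset model signatures
new_dataset = earlier_models @ np.array([0.7, 0.0, 0.3, 0.0, 0.0])

weights, residual = nnls(earlier_models, new_dataset)
relevant = np.flatnonzero(weights > 1e-6)  # earlier datasets most related to the new one
```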

  4. 3DSEM: A 3D microscopy dataset

    Directory of Open Access Journals (Sweden)

    Ahmad P. Tafti

    2016-03-01

    Full Text Available The Scanning Electron Microscope (SEM as a 2D imaging instrument has been widely used in many scientific disciplines including biological, mechanical, and materials sciences to determine the surface attributes of microscopic objects. However the SEM micrographs still remain 2D images. To effectively measure and visualize the surface properties, we need to truly restore the 3D shape model from 2D SEM images. Having 3D surfaces would provide anatomic shape of micro-samples which allows for quantitative measurements and informative visualization of the specimens being investigated. The 3DSEM is a dataset for 3D microscopy vision which is freely available at [1] for any academic, educational, and research purposes. The dataset includes both 2D images and 3D reconstructed surfaces of several real microscopic samples. Keywords: 3D microscopy dataset, 3D microscopy vision, 3D SEM surface reconstruction, Scanning Electron Microscope (SEM

  5. Data-driven analysis of collections of big datasets by the Bi-CoPaM method yields field-specific novel insights

    DEFF Research Database (Denmark)

    Abu-Jamous, Basel; Liu, Chao; Roberts, David, J.

    2017-01-01

    Massive amounts of data have recently been, and are increasingly being, generated from various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions, and were analysed accordingly. However, the scope ... not commonly considered. To bridge this gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims at extracting field... clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression...

  6. Active Semisupervised Clustering Algorithm with Label Propagation for Imbalanced and Multidensity Datasets

    Directory of Open Access Journals (Sweden)

    Mingwei Leng

    2013-01-01

    Full Text Available The accuracy of most existing semisupervised clustering algorithms is low when only a small labeled dataset is available and the data are multi-density and imbalanced, and labeling data is expensive and time-consuming in many real-world applications. This paper focuses on active data selection and semisupervised clustering for multi-density and imbalanced datasets, and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active mechanism for data selection to minimize the amount of labeled data, and it uses multiple thresholds to expand the labeled datasets on multi-density and imbalanced data. Three standard datasets and one synthetic dataset are used to demonstrate the proposed algorithm, and the experimental results show that the proposed semisupervised clustering algorithm has higher accuracy and more stable performance than other clustering and semisupervised clustering algorithms, especially when the datasets are multi-density and imbalanced.
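
    The label-propagation ingredient named in the title can be illustrated with scikit-learn. This is not the paper's algorithm, which adds active data selection and multi-threshold expansion on top; the sketch only shows the underlying semi-supervised step, with -1 marking unlabeled points.

```python
# Label propagation from a small labeled set to the rest of the data.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=200, noise=0.1, random_state=0)
labels = np.full(len(y), -1)             # -1 marks unlabeled points
labels[:5], labels[-5:] = y[:5], y[-5:]  # a small (actively selected) labeled set

model = LabelPropagation().fit(X, labels)
accuracy = (model.transduction_ == y).mean()  # propagated labels vs. ground truth
```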

  7. A reanalysis dataset of the South China Sea

    Science.gov (United States)

    Zeng, Xuezhi; Peng, Shiqiu; Li, Zhijin; Qi, Yiquan; Chen, Rongyu

    2014-01-01

    Ocean reanalysis provides a temporally continuous and spatially gridded four-dimensional estimate of the ocean state for a better understanding of the ocean dynamics and its spatial/temporal variability. Here we present a 19-year (1992–2010) high-resolution ocean reanalysis dataset of the upper ocean in the South China Sea (SCS) produced from an ocean data assimilation system. A wide variety of observations, including in-situ temperature/salinity profiles, ship-measured and satellite-derived sea surface temperatures, and sea surface height anomalies from satellite altimetry, are assimilated into the outputs of an ocean general circulation model using a multi-scale incremental three-dimensional variational data assimilation scheme, yielding a daily high-resolution reanalysis dataset of the SCS. Comparisons between the reanalysis and independent observations support the reliability of the dataset. The presented dataset provides the research community of the SCS an important data source for studying the thermodynamic processes of the ocean circulation and meso-scale features in the SCS, including their spatial and temporal variability. PMID:25977803

  8. A Dataset for Visual Navigation with Neuromorphic Methods

    Directory of Open Access Journals (Sweden)

    Francisco eBarranco

    2016-02-01

    Full Text Available Standardized benchmarks in Computer Vision have greatly contributed to the advance of approaches to many problems in the field. If we want to enhance the visibility of event-driven vision and increase its impact, we will need benchmarks that allow comparison among different neuromorphic methods as well as comparison to Computer Vision conventional approaches. We present datasets to evaluate the accuracy of frame-free and frame-based approaches for tasks of visual navigation. Similar to conventional Computer Vision datasets, we provide synthetic and real scenes, with the synthetic data created with graphics packages, and the real data recorded using a mobile robotic platform carrying a dynamic and active pixel vision sensor (DAVIS and an RGB+Depth sensor. For both datasets the cameras move with a rigid motion in a static scene, and the data includes the images, events, optic flow, 3D camera motion, and the depth of the scene, along with calibration procedures. Finally, we also provide simulated event data generated synthetically from well-known frame-based optical flow datasets.

  9. Sparse Group Penalized Integrative Analysis of Multiple Cancer Prognosis Datasets

    Science.gov (United States)

    Liu, Jin; Huang, Jian; Xie, Yang; Ma, Shuangge

    2014-01-01

    In cancer research, high-throughput profiling studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Recent studies have shown that integrative analysis, which simultaneously analyzes multiple datasets, can be more effective than single-dataset analysis and classic meta-analysis. Most existing integrative analyses assume the homogeneity model, which postulates that different datasets share the same set of markers, and several approaches have been designed to reinforce this assumption. In practice, different datasets may differ in terms of patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. In this study, we assume the heterogeneity model, under which different datasets are allowed to have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) model to describe survival; this model may have the lowest computational cost among popular semiparametric survival models. For marker selection, we adopt a sparse group MCP (minimax concave penalty) approach. This approach has an intuitive formulation and can be computed using an effective group coordinate descent algorithm. A simulation study shows that it outperforms existing approaches under both the homogeneity and heterogeneity models. Data analysis further demonstrates the merit of the heterogeneity model and the proposed approach. PMID:23938111
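
    As an illustration of the penalty at the core of such coordinate-descent algorithms, the following numpy sketch implements the elementwise proximal operator of the plain (non-group) MCP; the paper's sparse group MCP additionally penalizes at the group level, which is not shown here.

```python
# Elementwise proximal operator of the minimax concave penalty (MCP).
import numpy as np

def prox_mcp(z, lam, gamma=3.0):
    """Prox of MCP with regularization lam and concavity gamma > 1."""
    z = np.asarray(z, dtype=float)
    out = z.copy()
    small = np.abs(z) <= gamma * lam
    soft = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)
    out[small] = soft[small] / (1.0 - 1.0 / gamma)  # firm-thresholding region
    return out  # entries with |z| > gamma*lam are left unpenalized

print(prox_mcp([-2.0, -0.5, 0.2, 1.0, 4.0], lam=0.5))
```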

  10. Internally consistent gamma ray burst time history phenomenology

    International Nuclear Information System (INIS)

    Cline, T.L.

    1985-01-01

    A phenomenology for gamma ray burst time histories is outlined, attempting to impose order on their generally chaotic appearance. It is based on the speculation that any one burst event can be represented above 150 keV as a superposition of similarly shaped increases of varying intensity. The increases can overlap, confusing the picture, but a given event must at least exhibit its own limiting characteristic rise and decay times if the measurements are made with instruments having adequate temporal resolution. Most catalogued observations may be of doubtful or marginal utility for testing this hypothesis, but some time histories from Helios-2, Pioneer Venus Orbiter and other instruments having one- to several-millisecond capabilities appear to provide consistency. Also, recent studies of temporally resolved Solar Maximum Mission burst energy spectra are entirely compatible with this picture. The phenomenology suggested here, if correct, may assist as an analytic tool for the modelling of burst processes and possibly in the definition of burst source populations

  11. Total ozone trends from 1979 to 2016 derived from five merged observational datasets - the emergence into ozone recovery

    Science.gov (United States)

    Weber, Mark; Coldewey-Egbers, Melanie; Fioletov, Vitali E.; Frith, Stacey M.; Wild, Jeannette D.; Burrows, John P.; Long, Craig S.; Loyola, Diego

    2018-02-01

    We report on updated trends using different merged datasets from satellite and ground-based observations for the period from 1979 to 2016. Trends were determined by applying a multiple linear regression (MLR) to annual mean zonal mean data. Merged datasets used here include NASA MOD v8.6 and National Oceanic and Atmospheric Administration (NOAA) merge v8.6, both based on data from the series of Solar Backscatter UltraViolet (SBUV) and SBUV-2 satellite instruments (1978-present) as well as the Global Ozone Monitoring Experiment (GOME)-type Total Ozone (GTO) and GOME-SCIAMACHY-GOME-2 (GSG) merged datasets (1995-present), mainly comprising satellite data from GOME, the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY), and GOME-2A. The fifth dataset consists of the monthly mean zonal mean data from ground-based measurements collected at the World Ozone and UV Data Center (WOUDC). The addition of four more years of data since the last World Meteorological Organization (WMO) ozone assessment (2013-2016) shows that for most datasets and regions the trends since the stratospheric halogen reached its maximum (˜ 1996 globally and ˜ 2000 in polar regions) are mostly not significantly different from zero. However, for some latitudes, in particular the Southern Hemisphere extratropics and Northern Hemisphere subtropics, several datasets show small positive trends of slightly below +1 % decade^-1 that are barely statistically significant at the 2σ uncertainty level. In the tropics, only two datasets show significant trends of +0.5 to +0.8 % decade^-1, while the others show near-zero trends. Positive trends since 2000 have been observed over Antarctica in September, but near-zero trends are found in October as well as in March over the Arctic. Uncertainties due to possible drifts between the datasets, from the merging procedure used to combine satellite datasets and related to the low sampling of ground-based data, are not accounted for in the trend

  12. Developing consistent pronunciation models for phonemic variants

    CSIR Research Space (South Africa)

    Davel, M

    2006-09-01

    Full Text Available Pronunciation lexicons often contain pronunciation variants. This can create two problems: It can be difficult to define these variants in an internally consistent way and it can also be difficult to extract generalised grapheme-to-phoneme rule sets...

  13. International commodity prices and civil war outbreak: new evidence for Sub-Saharan Africa and beyond

    OpenAIRE

    Ciccone, Antonio

    2018-01-01

    A new dataset by Bazzi and Blattman (2014) allows examining the effects of international commodity prices on the risk of civil war outbreak with more comprehensive data. I find that international commodity price downturns sparked civil wars in Sub-Saharan Africa. Another finding with the new dataset is that commodity price downturns also sparked civil wars beyond Sub-Saharan Africa since 1980. Effects are sizable relative to the baseline risk of civil war outbreak. My conclusions contrast wit...

  14. Large scale validation of the M5L lung CAD on heterogeneous CT datasets

    Energy Technology Data Exchange (ETDEWEB)

    Lopez Torres, E., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [CEADEN, Havana 11300, Cuba and INFN, Sezione di Torino, Torino 10125 (Italy); Fiorina, E.; Pennazio, F.; Peroni, C. [Department of Physics, University of Torino, Torino 10125, Italy and INFN, Sezione di Torino, Torino 10125 (Italy); Saletta, M.; Cerello, P., E-mail: Ernesto.Lopez.Torres@cern.ch, E-mail: cerello@to.infn.it [INFN, Sezione di Torino, Torino 10125 (Italy); Camarlinghi, N.; Fantacci, M. E. [Department of Physics, University of Pisa, Pisa 56127, Italy and INFN, Sezione di Pisa, Pisa 56127 (Italy)

    2015-04-15

    Purpose: M5L, a fully automated computer-aided detection (CAD) system for the detection and segmentation of lung nodules in thoracic computed tomography (CT), is presented and validated on several image datasets. Methods: M5L is the combination of two independent subsystems, based on the Channeler Ant Model as a segmentation tool [lung channeler ant model (lungCAM)] and on the voxel-based neural approach. The lungCAM was upgraded with a scan equalization module and a new procedure to recover the nodules connected to other lung structures; its classification module, which makes use of a feed-forward neural network, is based on a small number of features (13), so as to minimize the risk of lacking generalization, which could be possible given the large difference between the size of the training and testing datasets, which contain 94 and 1019 CTs, respectively. The lungCAM (standalone) and M5L (combined) performance was extensively tested on 1043 CT scans from three independent datasets, including a detailed analysis of the full Lung Image Database Consortium/Image Database Resource Initiative database, which is not yet found in literature. Results: The lungCAM and M5L performance is consistent across the databases, with a sensitivity of about 70% and 80%, respectively, at eight false positive findings per scan, despite the variable annotation criteria and acquisition and reconstruction conditions. A reduced sensitivity is found for subtle nodules and ground glass opacity (GGO) structures. A comparison with other CAD systems is also presented. Conclusions: The M5L performance on a large and heterogeneous dataset is stable and satisfactory, although the development of a dedicated module for GGO detection could further improve it, as well as an iterative optimization of the training procedure. The main aim of the present study was accomplished: M5L results do not deteriorate when increasing the dataset size, making it a candidate for supporting radiologists on large

  15. Climatic Analysis of Oceanic Water Vapor Transports Based on Satellite E-P Datasets

    Science.gov (United States)

    Smith, Eric A.; Sohn, Byung-Ju; Mehta, Vikram

    2004-01-01

    Understanding the climatically varying properties of water vapor transports from a robust observational perspective is an essential step in calibrating climate models. This is tantamount to measuring year-to-year changes of monthly- or seasonally-averaged, divergent water vapor transport distributions. This cannot be done effectively with conventional radiosonde data over ocean regions where sounding data are generally sparse. This talk describes how a methodology designed to derive atmospheric water vapor transports over the world oceans from satellite-retrieved precipitation (P) and evaporation (E) datasets circumvents the problem of inadequate sampling. Ultimately, the method is intended to take advantage of the relatively complete and consistent coverage, as well as continuity in sampling, associated with E and P datasets obtained from satellite measurements. Independent P and E retrievals from Special Sensor Microwave Imager (SSM/I) measurements, along with P retrievals from Tropical Rainfall Measuring Mission (TRMM) measurements, are used to obtain transports by solving a potential function for the divergence of water vapor transport as balanced by large scale E - P conditions.
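
    The balance being solved can be sketched as follows: writing div(Q) = E - P for the vertically integrated vapor transport Q, a velocity potential chi with laplacian(chi) = E - P can be solved for, and the divergent transport recovered as grad(chi). The toy below does this spectrally on a doubly periodic Cartesian grid with synthetic E - P, a deliberate simplification of the spherical treatment a real analysis would need.

```python
# Solve laplacian(chi) = E - P spectrally; divergent transport = grad(chi).
import numpy as np

n, L = 128, 1.0e7                      # grid points and domain size in metres
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
kx, ky = np.meshgrid(k, k)
k2 = kx**2 + ky**2
k2[0, 0] = 1.0                         # avoid division by zero for the mean mode

e_minus_p = np.random.default_rng(1).standard_normal((n, n))  # synthetic E - P
e_minus_p -= e_minus_p.mean()          # solvability: source must integrate to zero

chi_hat = -np.fft.fft2(e_minus_p) / k2
qx = np.real(np.fft.ifft2(1j * kx * chi_hat))  # divergent transport components
qy = np.real(np.fft.ifft2(1j * ky * chi_hat))
```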

  16. An Analysis on Better Testing than Training Performances on the Iris Dataset

    NARCIS (Netherlands)

    Schutten, Marten; Wiering, Marco

    2016-01-01

    The Iris dataset is a well known dataset containing information on three different types of Iris flowers. A typical and popular method for solving classification problems on datasets such as the Iris set is the support vector machine (SVM). In order to do so, the dataset is separated into a set used
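
    A minimal version of the setup the paper analyzes, assuming a plain RBF-kernel SVM and a random half/half split: with a dataset this small and easy, the test score can indeed occasionally exceed the training score.

```python
# SVM on Iris with a train/test split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=7)

clf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
print("train:", clf.score(X_tr, y_tr), "test:", clf.score(X_te, y_te))
```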

  17. Interactive visualization and analysis of multimodal datasets for surgical applications.

    Science.gov (United States)

    Kirmizibayrak, Can; Yim, Yeny; Wakid, Mike; Hahn, James

    2012-12-01

    Surgeons use information from multiple sources when making surgical decisions. These include volumetric datasets (such as CT, PET, MRI, and their variants), 2D datasets (such as endoscopic videos), and vector-valued datasets (such as computer simulations). Presenting all the information to the user in an effective manner is a challenging problem. In this paper, we present a visualization approach that displays the information from various sources in a single coherent view. The system allows the user to explore and manipulate volumetric datasets, display analysis of dataset values in local regions, combine 2D and 3D imaging modalities and display results of vector-based computer simulations. Several interaction methods are discussed: in addition to traditional interfaces including mouse and trackers, gesture-based natural interaction methods are shown to control these visualizations with real-time performance. An example of a medical application (medialization laryngoplasty) is presented to demonstrate how the combination of different modalities can be used in a surgical setting with our approach.

  18. The Rucio Consistency Service

    CERN Document Server

    Serfon, Cedric; The ATLAS collaboration

    2016-01-01

    One of the biggest challenges with a large-scale data management system is to ensure consistency between the global file catalog and what is physically present on all storage elements. To tackle this issue, the Rucio software, which is used by the ATLAS Distributed Data Management system, has been extended to automatically handle lost or unregistered files (aka Dark Data). This system automatically detects these inconsistencies and takes actions such as recovery or deletion of unneeded files in a central manner. In this talk, we will present this system, explain the internals and give some results.
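
    Conceptually, the detection step reduces to set differences between a catalog dump and a storage dump; the sketch below shows that idea only and is not the Rucio API.

```python
# Detect dark data and lost files as set differences (conceptual only).
def consistency_check(catalog: set, storage: set):
    dark_data = storage - catalog   # physically present but unregistered -> delete
    lost_files = catalog - storage  # registered but missing -> recover or declare lost
    return dark_data, lost_files

catalog = {"/data/f1", "/data/f2", "/data/f3"}
storage = {"/data/f2", "/data/f3", "/data/tmp0"}
dark, lost = consistency_check(catalog, storage)
print("dark:", dark, "lost:", lost)
```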

  19. Something From Nothing (There): Collecting Global IPv6 Datasets from DNS

    NARCIS (Netherlands)

    Fiebig, T.; Borgolte, Kevin; Hao, Shuang; Kruegel, Christopher; Vigna, Giovanny; Spring, Neil; Riley, George F.

    2017-01-01

    Current large-scale IPv6 studies mostly rely on non-public datasets, as most public datasets are domain specific. For instance, traceroute-based datasets are biased toward network equipment. In this paper, we present a new methodology to collect IPv6 address datasets that does not require access to

  20. Automatic processing of multimodal tomography datasets.

    Science.gov (United States)

    Parsons, Aaron D; Price, Stephen W T; Wadeson, Nicola; Basham, Mark; Beale, Andrew M; Ashton, Alun W; Mosselmans, J Frederick W; Quinn, Paul D

    2017-01-01

    With the development of fourth-generation high-brightness synchrotrons on the horizon, the already large volume of data that will be collected on imaging and mapping beamlines is set to increase by orders of magnitude. As such, an easy and accessible way of dealing with such large datasets as quickly as possible is required in order to be able to address the core scientific problems during the experimental data collection. Savu is an accessible and flexible big data processing framework that is able to deal with both the variety and the volume of multimodal and multidimensional scientific dataset outputs, such as those from chemical tomography experiments on the I18 microfocus scanning beamline at Diamond Light Source.

  1. International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 3 Final, Individual Reports in the International Maritime Meteorological Archive Format version 1 (IMMA1)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset, the International Comprehensive Ocean-Atmosphere Data Set (ICOADS), is the most widely-used freely available collection of surface marine observations,...

  2. GUDM: Automatic Generation of Unified Datasets for Learning and Reasoning in Healthcare.

    Science.gov (United States)

    Ali, Rahman; Siddiqi, Muhammad Hameed; Idris, Muhammad; Ali, Taqdir; Hussain, Shujaat; Huh, Eui-Nam; Kang, Byeong Ho; Lee, Sungyoung

    2015-07-02

    A wide array of biomedical data are generated and made available to healthcare experts. However, due to the diverse nature of the data, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) to provide a global unified data structure for all data sources and to generate a unified dataset with a "data modeler" tool. The proposed tool implements a user-centric, priority-based approach which can easily resolve the problems of unified data modeling and overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset and physical activity data collected using different sensors. To demonstrate the significance of the unified dataset, we adopted a well-known rough set theory based rule creation process to create rules from the unified dataset. The evaluation of the tool on six different sets of locally created diverse datasets shows that the tool, on average, reduces the time effort of the experts and knowledge engineer by 94.1% while creating unified datasets.
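
    The priority-based resolution of overlapping attributes can be pictured with pandas: when two sources both carry an attribute, the higher-priority source wins and its gaps are filled from the lower-priority one. Source and column names here are invented, not GUDM's schema.

```python
# Priority merge of overlapping attributes across two data sources.
import pandas as pd

clinical = pd.DataFrame({"patient": [1, 2], "glucose": [7.1, None]})
sensor = pd.DataFrame({"patient": [1, 2], "glucose": [6.8, 6.2], "steps": [4000, 9000]})

# Priority: clinical > sensor; combine_first keeps the caller's values and
# fills its missing entries from the lower-priority source.
unified = clinical.set_index("patient").combine_first(sensor.set_index("patient"))
print(unified.reset_index())
```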

  3. Cross-cultural adaptation, reliability, internal consistency and validation of the Hand Function Sort (HFS©) for French speaking patients with upper limb complaints.

    Science.gov (United States)

    Konzelmann, M; Burrus, C; Hilfiker, R; Rivier, G; Deriaz, O; Luthi, F

    2015-03-01

    Functional evaluation of the upper limb is not only based on clinical findings but requires self-administered questionnaires to address the patients' perspective. The Hand Function Sort (HFS©) had only been validated in English. The aim of this study was the French cross-cultural adaptation and validation of the HFS© (HFS-F). 150 patients with various upper limb impairments were recruited in a rehabilitation center. Translation and cross-cultural adaptation were made according to international guidelines. Construct validity was estimated through correlations with the Disabilities of the Arm, Shoulder and Hand (DASH) questionnaire, the SF-36 mental component summary (MCS), the SF-36 physical component summary (PCS) and pain intensity. Internal consistency was assessed by Cronbach's α and test-retest reliability by intraclass correlation. Cronbach's α was 0.98, and test-retest reliability was excellent at 0.921 (95% CI 0.871-0.971), the same as the original HFS©. Correlations with the DASH were -0.779 (95% CI -0.847 to -0.685); with the SF-36 PCS 0.452 (95% CI 0.276-0.599); with pain -0.247 (95% CI -0.429 to -0.041); with the SF-36 MCS 0.242 (95% CI 0.042-0.422). There were no floor or ceiling effects. The HFS-F has the same good psychometric properties as the original HFS© (internal consistency, test-retest reliability, convergent validity with the DASH, divergent validity with the SF-36 MCS, and no floor or ceiling effects). The convergent validity with the SF-36 PCS was poor; we found no correlation with pain. The HFS-F can be used with confidence in a population of working patients. Other studies are necessary to establish its psychometric properties in other populations.

  4. Bootstrap embedding: An internally consistent fragment-based method

    Energy Technology Data Exchange (ETDEWEB)

    Welborn, Matthew; Tsuchimochi, Takashi; Van Voorhis, Troy [Department of Chemistry, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139 (United States)

    2016-08-21

    Strong correlation poses a difficult problem for electronic structure theory, with computational cost scaling quickly with system size. Fragment embedding is an attractive approach to this problem. By dividing a large complicated system into smaller manageable fragments “embedded” in an approximate description of the rest of the system, we can hope to ameliorate the steep cost of correlated calculations. While appealing, these methods often converge slowly with fragment size because of small errors at the boundary between fragment and bath. We describe a new electronic embedding method, dubbed “Bootstrap Embedding,” a self-consistent wavefunction-in-wavefunction embedding theory that uses overlapping fragments to improve the description of fragment edges. We apply this method to the one dimensional Hubbard model and a translationally asymmetric variant, and find that it performs very well for energies and populations. We find Bootstrap Embedding converges rapidly with embedded fragment size, overcoming the surface-area-to-volume-ratio error typical of many embedding methods. We anticipate that this method may lead to a low-scaling, high accuracy treatment of electron correlation in large molecular systems.

  5. Providing Consistent (A)ATSR Solar Channel Radiometry for Climate Studies

    Science.gov (United States)

    Smith, D.; Latter, B. G.; Poulsen, C.

    2012-04-01

    Data from the solar reflectance channels of the Along Track Scanning Radiometer (ATSR) series of instruments are being used in applications for monitoring trends in clouds and aerosols. In order to provide quantitative information, the radiometric calibrations of the sensors must be consistent, stable and ideally traced to international standards. This paper describes the methods used to monitor the calibrations of the ATSR instruments to provide consistent Level 1b radiometric data sets. Comparisons of the in-orbit calibrations are made by reference to data from quasi stable sites such as DOME-C in Antarctica or Saharan Desert sites. Comparisons are performed either by time coincident match-ups of the radiometric data for sensors having similar spectral bands and view/solar geometry and overpass times as for AATSR and MERIS; or via a reference BRDF model derived from averages of measurements over the site from a reference sensor where there is limited or no temporal overlap (e.g. MODIS-Aqua, ATSR-1 and ATSR-2). The results of the intercomparisons provide values of the long term calibration drift and systematic biases between the sensors. Look-up tables based on smoothed averages of the drift measurements are used to provide the corrections to Level 1b data. The uncertainty budgets for the comparisons are provided. It is also possible to perform comparisons of measurements against high spectral resolution instruments that are co-located on the same platform, i.e. AATSR/SCIA on ENVISAT and ATSR-2/GOME on ERS-2. The comparisons are performed by averaging the spectrometer data to the spectral response of the filter radiometer, and averaging the radiometer data to the spatial resolution of the spectrometer. In this paper, the authors present the results of the inter-comparisons to achieve a consistent calibration for the solar channels of the complete ATSR dataset. An assessment of the uncertainties associated with the techniques will be discussed. The impacts of the

  6. Construct validity, test-retest reliability and internal consistency of the Thai version of the disabilities of the arm, shoulder and hand questionnaire (DASH-TH) in patients with carpal tunnel syndrome.

    Science.gov (United States)

    Buntragulpoontawee, Montana; Phutrit, Suphatha; Tongprasert, Siam; Wongpakaran, Tinakon; Khunachiva, Jeeranan

    2018-03-27

    This study evaluated additional psychometric properties of the Thai version of the disabilities of the arm, shoulder and hand questionnaire (DASH-TH), including test-retest reliability, construct validity and internal consistency, in patients with carpal tunnel syndrome. To determine construct validity, the Thai EuroQOL questionnaire (EQ-5D-5L) was also administered in order to examine convergent and divergent validity. Fifty patients completed both questionnaires. The DASH-TH showed excellent test-retest reliability (intraclass correlation coefficient = 0.811) and internal consistency (Cronbach's alpha = 0.911). The exploratory factor analysis yielded a six-factor solution, while the confirmatory factor analysis indicated that the hypothesized model adequately fit the data, with a comparative fit index of 0.967 and a Tucker-Lewis index of 0.964. The related subscales of the DASH-TH and the Thai EQ-5D-5L were significantly correlated, indicating the DASH-TH's convergent and discriminant validity. The DASH-TH demonstrated good reliability, internal consistency, construct validity and multidimensionality in assessing upper extremity function in carpal tunnel syndrome patients.

  7. Process mining in oncology using the MIMIC-III dataset

    Science.gov (United States)

    Prima Kurniati, Angelina; Hall, Geoff; Hogg, David; Johnson, Owen

    2018-03-01

    Process mining is a data analytics approach to discover and analyse process models based on the real activities captured in information systems. There is a growing body of literature on process mining in healthcare, including oncology, the study of cancer. In earlier work we found 37 peer-reviewed papers describing process mining research in oncology, with a regular complaint being the limited availability and accessibility of datasets with suitable information for process mining. Publicly available datasets are one option, and this paper describes the potential to use MIMIC-III for process mining in oncology. MIMIC-III is a large open access dataset of de-identified patient records. There are 134 publications listed as using the MIMIC dataset, but none of them has used process mining. The MIMIC-III dataset has 16 event tables which are potentially useful for process mining, and this paper demonstrates the opportunities to use MIMIC-III for process mining in oncology. Our research applied the L* lifecycle method to provide a worked example showing how process mining can be used to analyse cancer pathways. The results and data quality limitations are discussed, along with opportunities for further work and reflection on the value of MIMIC-III for reproducible process mining research.
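
    As a flavor of what process mining does with such event tables, the sketch below computes directly-follows counts (the core of control-flow discovery) from a MIMIC-like event table using plain pandas. Real studies would use a process-mining toolkit and the L* lifecycle steps described above; the table and column names here are illustrative, not the MIMIC-III schema.

```python
# Directly-follows counts from an event log (toy, not the MIMIC-III schema).
import pandas as pd

events = pd.DataFrame({
    "case_id": [1, 1, 1, 2, 2],
    "activity": ["admit", "chemo", "discharge", "admit", "discharge"],
    "timestamp": pd.to_datetime(
        ["2012-01-01", "2012-01-03", "2012-01-09", "2012-02-01", "2012-02-04"]),
})

events = events.sort_values(["case_id", "timestamp"])
events["next_activity"] = events.groupby("case_id")["activity"].shift(-1)
dfg = events.dropna().groupby(["activity", "next_activity"]).size()
print(dfg)  # edge frequencies of the directly-follows graph
```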

  8. Integrating pipeline data management application and Google maps dataset on web based GIS application using open source technology Sharp Map and Open Layers

    Energy Technology Data Exchange (ETDEWEB)

    Wisianto, Arie; Sania, Hidayatus [PT PERTAMINA GAS, Bontang (Indonesia); Gumilar, Oki [PT PERTAMINA GAS, Jakarta (Indonesia)

    2010-07-01

    PT Pertamina Gas operates 3 pipe segments carrying natural gas from producers to PT Pupuk Kaltim in the Kalimantan area. The company wants to build a pipeline data management system covering pipeline facilities, inspections and risk assessments, running on a Geographic Information System (GIS) platform. The aim of this paper is to present the integration of the pipeline data management system with GIS. A web-based GIS application is developed using the combination of Google Maps datasets with local spatial datasets. In addition, Open Layers is used to integrate the pipeline data model and the Google Maps dataset into a single map display on Sharp Map. The GIS-based pipeline data management system developed herein constitutes a low-cost, powerful and efficient web-based GIS solution.

  9. Examination of Inequalities in Hungary by Microsimulation in Consistency with Macro Data

    OpenAIRE

    Cserháti, Ilona; Keresztély, Tibor; Takács, Tibor

    2016-01-01

    Effective decision making uses various databases, including both micro- and macro-level datasets. In many cases it is a big challenge to ensure the consistency of the two levels. Different types of problems can occur and several methods can be used to solve them. The paper concentrates on the input alignment of households' income for microsimulation, which refers to improving the elements of a micro data survey (EU-SILC) by using macro data from administrative sources. We use a combin...

  10. Spillover effects of international standards

    DEFF Research Database (Denmark)

    Trifkovic, Neda

    Most studies focus on trade effects and organizational outcomes of international standards, neglecting the effect of standards on employees. Using a two-year matched firm–employee panel dataset, this paper finds that the application of standards improves work conditions in small and medium... The study reveals unexpected benefits from certification, calling for higher investment in standards...

  11. SU-E-J-29: Audiovisual Biofeedback Improves Tumor Motion Consistency for Lung Cancer Patients

    International Nuclear Information System (INIS)

    Lee, D; Pollock, S; Makhija, K; Keall, P; Greer, P; Arm, J; Hunter, P; Kim, T

    2014-01-01

    Purpose: To investigate whether the breathing-guidance system of audiovisual (AV) biofeedback improves tumor motion consistency for lung cancer patients, in order to minimize respiratory-induced tumor motion variations across cancer imaging and radiotherapy procedures. This is the first study to investigate the impact of respiratory guidance on tumor motion. Methods: Tumor motion consistency was investigated in five lung cancer patients (age: 55 to 64), who underwent a training session to become familiar with AV biofeedback, followed by two MRI sessions on different dates (pre- and mid-treatment). During the training session in a CT room, two patient-specific breathing patterns were obtained before (Breathing-Pattern-1) and after (Breathing-Pattern-2) training with AV biofeedback. In each MRI session, four MRI scans were performed to obtain 2D coronal and sagittal image datasets in free breathing (FB) and with AV biofeedback utilizing Breathing-Pattern-2. After normalization of the 2D images per dataset and Gaussian filtering per image, image pixel values were used to extract tumor motion. Tumor motion consistency in the superior-inferior (SI) direction was evaluated in terms of the average tumor motion range and period. Results: Audiovisual biofeedback improved tumor motion consistency by 60% (p value = 0.019), from 1.0±0.6 mm (FB) to 0.4±0.4 mm (AV) in SI motion range, and by 86% (p value < 0.001), from 0.7±0.6 s (FB) to 0.1±0.2 s (AV) in period. Conclusion: This study demonstrated that audiovisual biofeedback improves both breathing pattern and tumor motion consistency for lung cancer patients. These results suggest that AV biofeedback has the potential to facilitate reproducible tumor motion towards achieving more accurate medical imaging and radiation therapy procedures

  12. SU-E-J-29: Audiovisual Biofeedback Improves Tumor Motion Consistency for Lung Cancer Patients

    Energy Technology Data Exchange (ETDEWEB)

    Lee, D; Pollock, S; Makhija, K; Keall, P [The University of Sydney, Camperdown, NSW (Australia); Greer, P [The University of Newcastle, Newcastle, NSW (Australia); Calvary Mater Newcastle Hospital, Newcastle, NSW (Australia); Arm, J; Hunter, P [Calvary Mater Newcastle Hospital, Newcastle, NSW (Australia); Kim, T [The University of Sydney, Camperdown, NSW (Australia); University of Virginia Health System, Charlottesville, VA (United States)

    2014-06-01

    Purpose: To investigate whether the breathing-guidance system of audiovisual (AV) biofeedback improves tumor motion consistency for lung cancer patients, in order to minimize respiratory-induced tumor motion variations across cancer imaging and radiotherapy procedures. This is the first study to investigate the impact of respiratory guidance on tumor motion. Methods: Tumor motion consistency was investigated in five lung cancer patients (age: 55 to 64), who underwent a training session to become familiar with AV biofeedback, followed by two MRI sessions on different dates (pre- and mid-treatment). During the training session in a CT room, two patient-specific breathing patterns were obtained before (Breathing-Pattern-1) and after (Breathing-Pattern-2) training with AV biofeedback. In each MRI session, four MRI scans were performed to obtain 2D coronal and sagittal image datasets in free breathing (FB) and with AV biofeedback utilizing Breathing-Pattern-2. After normalization of the 2D images per dataset and Gaussian filtering per image, image pixel values were used to extract tumor motion. Tumor motion consistency in the superior-inferior (SI) direction was evaluated in terms of the average tumor motion range and period. Results: Audiovisual biofeedback improved tumor motion consistency by 60% (p value = 0.019), from 1.0±0.6 mm (FB) to 0.4±0.4 mm (AV) in SI motion range, and by 86% (p value < 0.001), from 0.7±0.6 s (FB) to 0.1±0.2 s (AV) in period. Conclusion: This study demonstrated that audiovisual biofeedback improves both breathing pattern and tumor motion consistency for lung cancer patients. These results suggest that AV biofeedback has the potential to facilitate reproducible tumor motion towards achieving more accurate medical imaging and radiation therapy procedures.

  13. Veterans Affairs Suicide Prevention Synthetic Dataset

    Data.gov (United States)

    Department of Veterans Affairs — The VA's Veteran Health Administration, in support of the Open Data Initiative, is providing the Veterans Affairs Suicide Prevention Synthetic Dataset (VASPSD). The...

  14. Internal migration and income of immigrant families

    OpenAIRE

    Rashid, Saman

    2004-01-01

    Using a longitudinal dataset from the years 1995 and 2000, respectively, this study examines whether migration within the host country of Sweden generates higher total annual income for (two-earner) immigrant families. The empirical findings indicate that internal migration generates a positive outcome in terms of higher family income for newly arrived refugee-immigrant families. Further, with the length of residence in the host country, the monetary gain accruing from internal migration decr...

  15. SAR image classification based on CNN in real and simulation datasets

    Science.gov (United States)

    Peng, Lijiang; Liu, Ming; Liu, Xiaohua; Dong, Liquan; Hui, Mei; Zhao, Yuejin

    2018-04-01

    Convolution neural networks (CNNs) have achieved great success in image classification tasks. Even in the field of synthetic aperture radar automatic target recognition (SAR-ATR), state-of-the-art results have been obtained by learning deep representations of features on the MSTAR benchmark. However, the raw MSTAR data have shortcomings for training a SAR-ATR model because of the high similarity in background among the SAR images of each class. This indicates that a CNN would learn hierarchies of features of the backgrounds as well as of the targets. To validate the influence of the background, additional SAR image datasets were created containing simulated SAR images of 10 manufactured targets, such as tanks and fighter aircraft, with backgrounds sampled from the whole original MSTAR data. These simulation datasets include one in which the backgrounds of each class of images correspond to one kind of MSTAR target or clutter background, and one in which each image is given a random background drawn from the whole set of MSTAR targets or clutter. In addition, mixed datasets of MSTAR and simulated data were created for use in the experiments. The CNN architecture proposed in this paper is trained on all the datasets mentioned above. The experimental results show that the architecture achieves high performance on all the datasets, even when the backgrounds of the images are miscellaneous, which indicates that the architecture can learn a good representation of the targets despite drastic changes in the background.
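
    For readers unfamiliar with the setup, a small image-classification CNN of the general kind used in SAR-ATR experiments is sketched below in PyTorch; the layer sizes, the 128x128 chip size, and the 10-class output are illustrative assumptions, not the architecture proposed in the paper.

```python
# A small CNN for 10-class SAR chip classification (illustrative only).
import torch
import torch.nn as nn

class SmallSARNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 29 * 29, n_classes)  # 128 -> 29 after convs/pools

    def forward(self, x):              # x: (batch, 1, 128, 128) SAR chips
        x = self.features(x)
        return self.classifier(x.flatten(1))

logits = SmallSARNet()(torch.randn(4, 1, 128, 128))
```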

  16. Really big data: Processing and analysis of large datasets

    Science.gov (United States)

    Modern animal breeding datasets are large and getting larger, due in part to the recent availability of DNA data for many animals. Computational methods for efficiently storing and analyzing those data are under development. The amount of storage space required for such datasets is increasing rapidl...

  17. Possible world based consistency learning model for clustering and classifying uncertain data.

    Science.gov (United States)

    Liu, Han; Zhang, Xianchao; Zhang, Xiaotong

    2018-06-01

    The possible world approach has been shown to be effective for handling various types of data uncertainty in uncertain data management. However, few uncertain data clustering and classification algorithms have been proposed based on possible worlds. Moreover, existing possible world based algorithms suffer from the following issues: (1) they deal with each possible world independently and ignore the consistency principle across different possible worlds; (2) they require an extra post-processing procedure to obtain the final result, which means that their effectiveness relies heavily on the post-processing method and their efficiency is also not very good. In this paper, we propose a novel possible world based consistency learning model for uncertain data, which can be extended both for clustering and for classifying uncertain data. This model utilizes the consistency principle to learn a consensus affinity matrix for uncertain data, which makes full use of the information across different possible worlds and thereby improves clustering and classification performance. Meanwhile, the model imposes a new rank constraint on the Laplacian matrix of the consensus affinity matrix, ensuring that the number of connected components in the consensus affinity matrix is exactly equal to the number of classes. This also means that the clustering and classification results can be obtained directly, without any post-processing procedure. Furthermore, for the clustering and classification tasks, we respectively derive efficient optimization methods to solve the proposed model. Experimental results on real benchmark datasets and real-world uncertain datasets show that the proposed model outperforms the state-of-the-art uncertain data clustering and classification algorithms in effectiveness and performs competitively in efficiency. Copyright © 2018 Elsevier Ltd. All rights reserved.
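
    The practical payoff of the rank constraint can be demonstrated directly: when the learned consensus affinity matrix has exactly k connected components, cluster labels follow from a connected-components pass with no post-processing. A toy example, with an invented two-block affinity matrix standing in for the learned one:

```python
# Cluster labels read directly off a block-structured affinity matrix.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy consensus affinity with two blocks (i.e., Laplacian has rank n - 2).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

n_clusters, labels = connected_components(csr_matrix(A), directed=False)
print(n_clusters, labels)  # 2 [0 0 0 1 1]
```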

  18. A Comparative Analysis of Classification Algorithms on Diverse Datasets

    Directory of Open Access Journals (Sweden)

    M. Alghobiri

    2018-04-01

    Full Text Available Data mining involves computational processes for finding patterns in large data sets. Classification, one of the main domains of data mining, involves generalizing a known structure to apply it to a new dataset and predict its class. There are various classification algorithms in use, based on different methods such as probability, decision trees, neural networks, nearest neighbors, boolean and fuzzy logic, and kernels. In this paper, we apply three diverse classification algorithms to ten datasets. The datasets have been selected based on their size and/or the number and nature of their attributes. Results are discussed using performance evaluation measures such as precision, accuracy, F-measure, Kappa statistics, mean absolute error, relative absolute error and ROC area. Comparative analysis has been carried out using accuracy, precision and F-measure. We specify the features and limitations of the classification algorithms for datasets of diverse nature.

  19. Acquiring a four-dimensional computed tomography dataset using an external respiratory signal

    International Nuclear Information System (INIS)

    Vedam, S S; Keall, P J; Kini, V R; Mostafavi, H; Shukla, H P; Mohan, R

    2003-01-01

    Four-dimensional (4D) methods strive to achieve highly conformal radiotherapy, particularly for lung and breast tumours, in the presence of respiratory-induced motion of tumours and normal tissues. Four-dimensional radiotherapy accounts for respiratory motion during imaging, planning and radiation delivery, and requires a 4D CT image in which the internal anatomy motion as a function of the respiratory cycle can be quantified. The aims of our research were (a) to develop a method to acquire 4D CT images from a spiral CT scan using an external respiratory signal and (b) to examine the potential utility of 4D CT imaging. A commercially available respiratory motion monitoring system provided an 'external' tracking signal of the patient's breathing. Simultaneous recording of a TTL 'X-Ray ON' signal from the CT scanner indicated the start time of CT image acquisition, thus facilitating time stamping of all subsequent images. An over-sampled spiral CT scan was acquired using a pitch of 0.5 and scanner rotation time of 1.5 s. Each image from such a scan was sorted into an image bin that corresponded with the phase of the respiratory cycle in which the image was acquired. The complete set of such image bins accumulated over a respiratory cycle constitutes a 4D CT dataset. Four-dimensional CT datasets of a mechanical oscillator phantom and a patient undergoing lung radiotherapy were acquired. Motion artefacts were significantly reduced in the images in the 4D CT dataset compared to the three-dimensional (3D) images, for which respiratory motion was not accounted. Accounting for respiratory motion using 4D CT imaging is feasible and yields images with less distortion than 3D images. 4D images also contain respiratory motion information not available in a 3D CT image
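
    The retrospective sorting step described above can be sketched as follows: stamp each image with its acquisition time, estimate the respiratory phase from the external signal (here, the fraction of the current peak-to-peak breathing cycle elapsed), and drop the image into the corresponding phase bin. The synthetic breathing trace, the peak-based phase definition, and the choice of ten bins are simplifying assumptions.

```python
# Retrospective phase binning of time-stamped CT images (schematic).
import numpy as np
from scipy.signal import find_peaks

def phase_at(t, peak_times):
    """Phase in [0, 1): fraction of the current breathing cycle elapsed at time t."""
    i = np.searchsorted(peak_times, t) - 1
    t0, t1 = peak_times[i], peak_times[i + 1]
    return (t - t0) / (t1 - t0)

fs = 25.0                                    # Hz, respiratory monitor sampling rate
t = np.arange(0, 60, 1 / fs)
resp = np.sin(2 * np.pi * t / 4.0)           # surrogate breathing trace, 4 s period
peak_times = t[find_peaks(resp)[0]]          # inhale peaks mark cycle boundaries

n_bins = 10
image_times = np.arange(5.0, 55.0, 1.5)      # one image per 1.5 s gantry rotation
bins = [int(phase_at(ti, peak_times) * n_bins) % n_bins for ti in image_times]
```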

  20. An assessment of differences in gridded precipitation datasets in complex terrain

    Science.gov (United States)

    Henn, Brian; Newman, Andrew J.; Livneh, Ben; Daly, Christopher; Lundquist, Jessica D.

    2018-01-01

    Hydrologic modeling and other geophysical applications are sensitive to precipitation forcing data quality, and there are known challenges in spatially distributing gauge-based precipitation over complex terrain. We conduct a comparison of six high-resolution, daily and monthly gridded precipitation datasets over the Western United States. We compare the long-term average spatial patterns, and interannual variability of water-year total precipitation, as well as multi-year trends in precipitation across the datasets. We find that the greatest absolute differences among datasets occur in high-elevation areas and in the maritime mountain ranges of the Western United States, while the greatest percent differences among datasets relative to annual total precipitation occur in arid and rain-shadowed areas. Differences between datasets in some high-elevation areas exceed 200 mm yr-1 on average, and relative differences range from 5 to 60% across the Western United States. In areas of high topographic relief, true uncertainties and biases are likely higher than the differences among the datasets; we present evidence of this based on streamflow observations. Precipitation trends in the datasets differ in magnitude and sign at smaller scales, and are sensitive to how temporal inhomogeneities in the underlying precipitation gauge data are handled.

  1. Strontium removal jar test dataset for all figures and tables.

    Data.gov (United States)

    U.S. Environmental Protection Agency — The datasets were used to generate data to demonstrate strontium removal under various water quality and treatment conditions. This dataset is associated with the...

  2. Benchmarking of Typical Meteorological Year datasets dedicated to Concentrated-PV systems

    Science.gov (United States)

    Realpe, Ana Maria; Vernay, Christophe; Pitaval, Sébastien; Blanc, Philippe; Wald, Lucien; Lenoir, Camille

    2016-04-01

    Accurate analysis of meteorological and pyranometric data for long-term analysis is the basis of decision-making for banks and investors, regarding solar energy conversion systems. This has led to the development of methodologies for the generation of Typical Meteorological Year (TMY) datasets. The most used method for solar energy conversion systems was proposed in 1978 by the Sandia Laboratory (Hall et al., 1978), considering a specific weighted combination of different meteorological variables, notably global and diffuse horizontal and direct normal irradiances, air temperature, wind speed and relative humidity. In 2012, a new approach was proposed in the framework of the European project FP7 ENDORSE. It introduced the concept of "driver", defined by the user as an explicit function of the pyranometric and meteorological relevant variables to improve the representativeness of the TMY datasets with respect to the specific solar energy conversion system of interest. The present study aims at comparing and benchmarking different TMY datasets considering a specific Concentrated-PV (CPV) system as the solar energy conversion system of interest. Using long-term (15+ years) time-series of high quality meteorological and pyranometric ground measurements, three types of TMY datasets were generated: by the Sandia method, by a simplified driver with DNI as the only representative variable, and by a more sophisticated driver. The latter takes into account the sensitivities of the CPV system with respect to the spectral distribution of the solar irradiance and wind speed. Different TMY datasets from the three methods have been generated considering different numbers of years in the historical dataset, ranging from 5 to 15 years. The comparisons and benchmarking of these TMY datasets are conducted considering the long-term time series of simulated CPV electric production as a reference. The results of this benchmarking clearly show that the Sandia method is not
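
    The Sandia method mentioned above selects, for each calendar month, the historical year whose cumulative distributions best match the long-term record, using weighted Finkelstein-Schafer (FS) statistics. A minimal sketch under that reading (the data layout and names are assumptions):

        import numpy as np

        def fs_statistic(candidate, long_term):
            # Mean absolute difference between the candidate month's
            # empirical CDF and the long-term CDF of one variable
            grid = np.sort(np.asarray(long_term, dtype=float))
            cdf_lt = np.arange(1, grid.size + 1) / grid.size
            cand = np.sort(np.asarray(candidate, dtype=float))
            cdf_cand = np.searchsorted(cand, grid, side="right") / cand.size
            return np.abs(cdf_cand - cdf_lt).mean()

        def select_typical_month(monthly_data, weights):
            # monthly_data: {year: {variable: daily values}} for one calendar month
            # weights: {variable: weight}; a DNI-only "driver" would be {"dni": 1.0}
            long_term = {v: np.concatenate([d[v] for d in monthly_data.values()])
                         for v in weights}
            scores = {year: sum(w * fs_statistic(d[v], long_term[v])
                                for v, w in weights.items())
                      for year, d in monthly_data.items()}
            return min(scores, key=scores.get)

    Concatenating the twelve selected months produces the TMY; the driver-based variants differ only in which variables enter the score and with what weights.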

  3. SIAM 2007 Text Mining Competition dataset

    Data.gov (United States)

    National Aeronautics and Space Administration — Subject Area: Text Mining Description: This is the dataset used for the SIAM 2007 Text Mining competition. This competition focused on developing text mining...

  4. Environmental Dataset Gateway (EDG) REST Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  5. Cross-cultural validation of the Brazilian Portuguese version of the Social Phobia Inventory (SPIN): study of the items and internal consistency.

    Science.gov (United States)

    Osório, Flávia de Lima; Crippa, José Alexandre S; Loureiro, Sonia Regina

    2009-03-01

    The objective of the present study was to carry out the cross-cultural validation for Brazilian Portuguese of the Social Phobia Inventory, an instrument for the evaluation of fear, avoidance and physiological symptoms associated with social anxiety disorder. The process of translation and adaptation involved four bilingual professionals, appreciation and approval of the back-translation by the authors of the original scale, a pilot study with 30 Brazilian university students, and appreciation by raters who confirmed the face validity of the Portuguese version, which was named 'Inventário de Fobia Social'. As part of the psychometric study of the Social Phobia Inventory, analysis of the items and evaluation of the internal consistency of the instrument were performed in a study conducted on 2314 university students. The results demonstrated that item 11, related to the fear of public speaking, was the most frequently scored item. The correlation of the items with the total score was quite adequate, ranging from 0.44 to 0.71, as was the internal consistency, which ranged from 0.71 to 0.90. The authors conclude that the Brazilian Portuguese version of the Social Phobia Inventory proved to be adequate regarding the psychometric properties initially studied, with qualities quite close to those of the original study. Studies that will evaluate the remaining indicators of validity of the Social Phobia Inventory in clinical and non-clinical samples are considered to be opportune and necessary.
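
    Internal consistency values such as those reported above (0.71 to 0.90) are typically Cronbach's alpha coefficients; a minimal sketch of the standard computation (pure NumPy, the score matrix layout is an assumption):

        import numpy as np

        def cronbach_alpha(scores):
            # scores: (n_respondents, n_items) matrix of item scores
            X = np.asarray(scores, dtype=float)
            k = X.shape[1]
            item_var = X.var(axis=0, ddof=1).sum()
            total_var = X.sum(axis=1).var(ddof=1)
            return k / (k - 1) * (1 - item_var / total_var)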

  6. Is cosmology consistent?

    International Nuclear Information System (INIS)

    Wang Xiaomin; Tegmark, Max; Zaldarriaga, Matias

    2002-01-01

    We perform a detailed analysis of the latest cosmic microwave background (CMB) measurements (including BOOMERaNG, DASI, Maxima and CBI), both alone and jointly with other cosmological data sets involving, e.g., galaxy clustering and the Lyman Alpha Forest. We first address the question of whether the CMB data are internally consistent once calibration and beam uncertainties are taken into account, performing a series of statistical tests. With a few minor caveats, our answer is yes, and we compress all data into a single set of 24 bandpowers with associated covariance matrix and window functions. We then compute joint constraints on the 11 parameters of the 'standard' adiabatic inflationary cosmological model. Our best fit model passes a series of physical consistency checks and agrees with essentially all currently available cosmological data. In addition to sharp constraints on the cosmic matter budget in good agreement with those of the BOOMERaNG, DASI and Maxima teams, we obtain a heaviest neutrino mass range 0.04-4.2 eV and the sharpest constraints to date on gravity waves which (together with preference for a slight red-tilt) favor 'small-field' inflation models

  7. Geoseq: a tool for dissecting deep-sequencing datasets

    Directory of Open Access Journals (Sweden)

    Homann Robert

    2010-10-01

    Full Text Available Abstract Background Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
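
    A toy version of the tile-coverage idea: Geoseq indexes each library with suffix arrays, but for small inputs an exact-substring hash map gives the same counts (names and the tile length here are illustrative):

        from collections import Counter

        def tile_coverage(reference, reads, tile_len=25):
            # Count every length-tile_len substring occurring in the reads
            kmer_counts = Counter()
            for read in reads:
                for i in range(len(read) - tile_len + 1):
                    kmer_counts[read[i:i + tile_len]] += 1
            # Slide the same window over the reference and report coverage per tile
            tiles = [reference[i:i + tile_len]
                     for i in range(len(reference) - tile_len + 1)]
            return [(tile, kmer_counts[tile]) for tile in tiles]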

  8. 26 CFR 301.6222(a)-1 - Consistent treatment of partnership items.

    Science.gov (United States)

    2010-04-01

    (a) In general. The treatment of a partnership item on the partner's return must be consistent with the treatment of that item by the partnership on the partnership...

  9. Test-Retest Reliability, Convergent Validity, and Internal Consistency of the Persian Version of Fullerton Advanced Balance Scale in Iranian Community-Dwelling Older Adults

    OpenAIRE

    Azar Sabet; Akram Azad; Ghorban Taghizadeh

    2016-01-01

    Objectives: This study was performed to evaluate the convergent validity, test-retest reliability and internal consistency of the Persian translation of the Fullerton advanced balance (FAB) scale for use in Iranian community-dwelling older adults and to improve the quality of their functional balance assessment. Methods & Materials: The original scale was translated with a forward-backward protocol. In the next step, using convenience sampling and inclusion criteria, 88 functionally indep...

  10. Sensitivity of a numerical wave model on wind re-analysis datasets

    Science.gov (United States)

    Lavidas, George; Venugopal, Vengatesan; Friedrich, Daniel

    2017-03-01

    Wind is the dominant process for wave generation. Detailed evaluation of metocean conditions strengthens our understanding of issues concerning potential offshore applications. However, the scarcity of buoys and the high cost of monitoring systems pose a barrier to properly defining offshore conditions. Through the use of numerical wave models, metocean conditions can be hindcasted and forecasted, providing reliable characterisations. This study reports the sensitivity of wind inputs on a numerical wave model for the Scottish region. Two re-analysis wind datasets with different spatio-temporal characteristics are used, the ERA-Interim Re-Analysis and the CFSR-NCEP Re-Analysis dataset. Different wind products alter results, affecting the accuracy obtained. The scope of this study is to assess the different available wind databases and provide information concerning the most appropriate wind dataset for the specific region, based on temporal, spatial and geographic terms, for wave modelling and offshore applications. Both wind input datasets delivered results from the numerical wave model with good correlation. Wave results from the 1-h dataset have higher peaks and lower biases, at the expense of a high scatter index. On the other hand, the 6-h dataset has lower scatter but higher biases. The study shows how the wind dataset affects numerical wave modelling performance, and that depending on location and study needs, different wind inputs should be considered.
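
    The comparison statistics usually reported in such sensitivity studies (bias, RMSE, scatter index and correlation against buoy observations) can be computed as follows; a sketch, with the metric definitions assumed rather than taken from the paper:

        import numpy as np

        def validation_stats(model, obs):
            model = np.asarray(model, dtype=float)
            obs = np.asarray(obs, dtype=float)
            bias = np.mean(model - obs)
            rmse = np.sqrt(np.mean((model - obs) ** 2))
            si = rmse / np.mean(obs)           # scatter index
            r = np.corrcoef(model, obs)[0, 1]  # linear correlation
            return {"bias": bias, "rmse": rmse, "scatter_index": si, "r": r}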

  11. Spatial datasets of radionuclide contamination in the Ukrainian Chernobyl Exclusion Zone

    Science.gov (United States)

    Kashparov, Valery; Levchuk, Sviatoslav; Zhurba, Marina; Protsak, Valentyn; Khomutinin, Yuri; Beresford, Nicholas A.; Chaplow, Jacqueline S.

    2018-02-01

    The dataset Spatial datasets of radionuclide contamination in the Ukrainian Chernobyl Exclusion Zone was developed to enable data collected between May 1986 (immediately after Chernobyl) and 2014 by the Ukrainian Institute of Agricultural Radiology (UIAR) after the Chernobyl accident to be made publicly available. The dataset includes results from comprehensive soil sampling across the Chernobyl Exclusion Zone (CEZ). Analyses include radiocaesium (134Cs and 137Cs), 90Sr, 154Eu and soil property data; plutonium isotope activity concentrations in soil (including distribution in the soil profile); analyses of hot (or fuel) particles from the CEZ (data from Poland and across Europe are also included); and results of monitoring in the Ivankov district, a region adjacent to the exclusion zone. The purpose of this paper is to describe the available data and methodology used to obtain them. The data will be valuable to those conducting studies within the CEZ in a number of ways, for instance (i) for helping to perform robust exposure estimates to wildlife, (ii) for predicting comparative activity concentrations of different key radionuclides, (iii) for providing a baseline against which future surveys in the CEZ can be compared, (iv) as a source of information on the behaviour of fuel particles (FPs), (v) for performing retrospective dose assessments and (vi) for assessing natural background dose rates in the CEZ. The CEZ has been proposed as a radioecological observatory (i.e. a radioactively contaminated site that will provide a focus for long-term, radioecological collaborative international research). Key to the future success of this concept is open access to data for the CEZ. The data presented here are a first step in this process. The data and supporting documentation are freely available from the Environmental Information Data Centre (EIDC) under the terms and conditions of the Open Government Licence: https://doi.org/10.5285/782ec845-2135-4698-8881-b38823e533bf.

  12. Spatial datasets of radionuclide contamination in the Ukrainian Chernobyl Exclusion Zone

    Directory of Open Access Journals (Sweden)

    V. Kashparov

    2018-02-01

    Full Text Available The dataset Spatial datasets of radionuclide contamination in the Ukrainian Chernobyl Exclusion Zone was developed to enable data collected between May 1986 (immediately after Chernobyl) and 2014 by the Ukrainian Institute of Agricultural Radiology (UIAR) after the Chernobyl accident to be made publicly available. The dataset includes results from comprehensive soil sampling across the Chernobyl Exclusion Zone (CEZ). Analyses include radiocaesium (134Cs and 137Cs), 90Sr, 154Eu and soil property data; plutonium isotope activity concentrations in soil (including distribution in the soil profile); analyses of hot (or fuel) particles from the CEZ (data from Poland and across Europe are also included); and results of monitoring in the Ivankov district, a region adjacent to the exclusion zone. The purpose of this paper is to describe the available data and methodology used to obtain them. The data will be valuable to those conducting studies within the CEZ in a number of ways, for instance (i) for helping to perform robust exposure estimates to wildlife, (ii) for predicting comparative activity concentrations of different key radionuclides, (iii) for providing a baseline against which future surveys in the CEZ can be compared, (iv) as a source of information on the behaviour of fuel particles (FPs), (v) for performing retrospective dose assessments and (vi) for assessing natural background dose rates in the CEZ. The CEZ has been proposed as a radioecological observatory (i.e. a radioactively contaminated site that will provide a focus for long-term, radioecological collaborative international research). Key to the future success of this concept is open access to data for the CEZ. The data presented here are a first step in this process. The data and supporting documentation are freely available from the Environmental Information Data Centre (EIDC) under the terms and conditions of the Open Government Licence: https://doi.org/10.5285/782ec845-2135-4698-8881-b38823e533bf.

  13. The Added Utility of Hydrological Model and Satellite Based Datasets in Agricultural Drought Analysis over Turkey

    Science.gov (United States)

    Bulut, B.; Hüsami Afşar, M.; Yilmaz, M. T.

    2017-12-01

    Analysis of agricultural drought, which causes substantial socioeconomic costs in Turkey and in the world, is critical for understanding the characteristics of this natural disaster (intensity, duration, influence area) and for research on possible precautions. Soil moisture, one of the most important parameters used to observe agricultural drought, can be obtained using different methods. The most common, consistent and reliable soil moisture datasets used for large-scale analysis are obtained from hydrologic models and remote sensing retrievals. On the other hand, the Normalized Difference Vegetation Index (NDVI) and gauge-based precipitation observations are also commonly used for drought analysis. In this study, soil moisture products obtained from different platforms, NDVI and precipitation datasets over several different agricultural regions under various climate conditions in Turkey are obtained for the growing season. These datasets are later used to investigate agricultural drought with the help of annual crop yield data of selected agricultural lands. The type of vegetation over these regions is obtained using CORINE Land Cover (CLC 2012) data. The crop yield data were taken from the records of the related districts' statistics provided by the Turkish Statistical Institute (TÜİK). This project is supported by TÜBİTAK project number 114Y676.

  14. Querying Large Biological Network Datasets

    Science.gov (United States)

    Gulsoy, Gunhan

    2013-01-01

    New experimental methods have resulted in an increasing amount of genetic interaction data being generated every day. Biological networks are used to store the genetic interaction data gathered. The increasing amount of available data requires fast, large-scale analysis methods. Therefore, we address the problem of querying large biological network datasets.…

  15. BanglaLekha-Isolated: A multi-purpose comprehensive dataset of Handwritten Bangla Isolated characters

    Directory of Open Access Journals (Sweden)

    Mithun Biswas

    2017-06-01

    Full Text Available BanglaLekha-Isolated, a Bangla handwritten isolated character dataset, is presented in this article. This dataset contains 84 different characters comprising 50 Bangla basic characters, 10 Bangla numerals and 24 selected compound characters. 2000 handwriting samples for each of the 84 characters were collected, digitized and pre-processed. After discarding mistakes and scribbles, 166,105 handwritten character images were included in the final dataset. The dataset also includes labels indicating the age and the gender of the subjects from whom the samples were collected. This dataset could be used not only for optical handwriting recognition research but also to explore the influence of gender and age on handwriting. The dataset is publicly available at https://data.mendeley.com/datasets/hf6sf8zrkc/2.

  16. Standardization of GIS datasets for emergency preparedness of NPPs

    International Nuclear Information System (INIS)

    Saindane, Shashank S.; Suri, M.M.K.; Otari, Anil; Pradeepkumar, K.S.

    2012-01-01

    The probability of a major nuclear accident that could lead to a large-scale release of radioactivity into the environment is made extremely small by the incorporation of safety systems and the defence-in-depth philosophy. Nevertheless, emergency preparedness for the implementation of countermeasures to reduce the consequences is required for all major nuclear facilities. Iodine prophylaxis, sheltering, evacuation, etc. are protective measures to be implemented for members of the public in the unlikely event of any significant release from nuclear facilities. Bhabha Atomic Research Centre has developed a GIS-supported Nuclear Emergency Preparedness Program. Preparedness for response to nuclear emergencies needs geographical details of the affected locations, especially Nuclear Power Plant sites and the nearby public domain. Geographical information system datasets which the planners are looking for will have appropriate details in order to take decisions and mobilize the resources in time, following the Standard Operating Procedures. Maps are 2-dimensional representations of our real world, and GIS makes it possible to manipulate large amounts of geo-spatially referenced data and convert them into information. This has become an integral part of nuclear emergency preparedness and response planning. These GIS datasets, consisting of layers such as village settlements, roads, hospitals, police stations and shelters, are standardized and effectively used during an emergency. The paper focuses on the need for standardization of GIS datasets, which in turn can be used as a tool to display and evaluate the impact of standoff distances and selected zones in community planning. It will also highlight the database specifications which will help in fast processing of data and analysis to derive useful and helpful information. GIS has the capability to store, manipulate, analyze and display the large amount of required spatial and tabular data. This study intends to carry out a proper response and preparedness

  17. An 18-yr long (1993–2011) snow and meteorological dataset from a mid-altitude mountain site (Col de Porte, France, 1325 m alt.) for driving and evaluating snowpack models

    Directory of Open Access Journals (Sweden)

    S. Morin

    2012-07-01

    Full Text Available A quality-controlled snow and meteorological dataset spanning the period 1 August 1993–31 July 2011 is presented, originating from the experimental station Col de Porte (1325 m altitude, Chartreuse range, France). Emphasis is placed on meteorological data relevant to the observation and modelling of the seasonal snowpack. In-situ driving data, at the hourly resolution, consist of measurements of air temperature, relative humidity, windspeed, incoming short-wave and long-wave radiation, and precipitation rate partitioned between snow- and rainfall, with a focus on the snow-dominated season. Meteorological data for the three summer months (generally from 10 June to 20 September), when the continuity of the field record is not warranted, are taken from a local meteorological reanalysis (SAFRAN) in order to provide a continuous and consistent gap-free record. Data relevant to snowpack properties are provided at the daily (snow depth, snow water equivalent, runoff and albedo) and hourly (snow depth, albedo, runoff, surface temperature, soil temperature) time resolution. Internal snowpack information is provided from weekly manual snowpit observations (mostly consisting of penetration resistance, snow type, snow temperature and density profiles) and from an hourly record of the temperature and height of vertically free 'settling' disks. This dataset has been partially used in the past to assist in developing snowpack models and is presented here comprehensively for the purpose of multi-year model performance assessment. The data are placed on the PANGAEA repository (http://dx.doi.org/10.1594/PANGAEA.774249) as well as on the public ftp server ftp://ftp-cnrm.meteo.fr/pub-cencdp/.

  18. A dataset of human decision-making in teamwork management

    Science.gov (United States)

    Yu, Han; Shen, Zhiqi; Miao, Chunyan; Leung, Cyril; Chen, Yiqiang; Fauvel, Simon; Lin, Jun; Cui, Lizhen; Pan, Zhengxiang; Yang, Qiang

    2017-01-01

    Today, most endeavours require teamwork by people with diverse skills and characteristics. In managing teamwork, decisions are often made under uncertainty and resource constraints. The strategies and the effectiveness of the strategies different people adopt to manage teamwork under different situations have not yet been fully explored, partially due to a lack of detailed large-scale data. In this paper, we describe a multi-faceted large-scale dataset to bridge this gap. It is derived from a game simulating complex project management processes. It presents the participants with different conditions in terms of team members' capabilities and task characteristics for them to exhibit their decision-making strategies. The dataset contains detailed data reflecting the decision situations, decision strategies, decision outcomes, and the emotional responses of 1,144 participants from diverse backgrounds. To our knowledge, this is the first dataset simultaneously covering these four facets of decision-making. With repeated measurements, the dataset may help establish baseline variability of decision-making in teamwork management, leading to more realistic decision theoretic models and more effective decision support approaches.

  19. Demonstrating the value of publishing open data by linking DOI-based citations of source datasets to uses in research and policy

    Science.gov (United States)

    Copas, K.; Legind, J. K.; Hahn, A.; Braak, K.; Høftt, M.; Noesgaard, D.; Robertson, T.; Méndez Hernández, F.; Schigel, D.; Ko, C.

    2017-12-01

    GBIF—the Global Biodiversity Information Facility—has recently demonstrated a system that tracks publications back to individual datasets, giving data providers demonstrable evidence of the benefit and utility of sharing data to support an array of scholarly topics and practical applications. GBIF is an open-data network and research infrastructure funded by the world's governments. Its community consists of more than 90 formal participants and almost 1,000 data-publishing institutions, which currently make tens of thousands of datasets containing nearly 800 million species occurrence records freely and publicly available for discovery, use and reuse across a wide range of biodiversity-related research and policy investigations. Starting in 2015 with the help of DataONE, GBIF introduced DOIs as persistent identifiers for the datasets shared through its network. This enhancement soon extended to the assignment of DOIs to user downloads from GBIF.org, which typically filter the available records with a variety of taxonomic, geographic, temporal and other search terms. Despite the lack of widely accepted standards for citing data among researchers and publications, this technical infrastructure is beginning to take hold and support open, transparent, persistent and repeatable use and reuse of species occurrence data. These 'download DOIs' provide canonical references for the search results researchers process and use in peer-reviewed articles—a practice GBIF encourages by confirming new DOIs with each download and offering guidelines on citation. GBIF has recently started linking these citation results back to dataset and publisher pages, offering more consistent, traceable evidence of the value of sharing data to support others' research. GBIF's experience may be a useful model for other repositories to follow.

  20. EVALUATION OF LAND USE/LAND COVER DATASETS FOR URBAN WATERSHED MODELING

    International Nuclear Information System (INIS)

    S.J. BURIAN; M.J. BROWN; T.N. MCPHERSON

    2001-01-01

    Land use/land cover (LULC) data are a vital component for nonpoint source pollution modeling. Most watershed hydrology and pollutant loading models use, in some capacity, LULC information to generate runoff and pollutant loading estimates. Simple equation methods predict runoff and pollutant loads using runoff coefficients or pollutant export coefficients that are often correlated to LULC type. Complex models use input variables and parameters to represent watershed characteristics and pollutant buildup and washoff rates as a function of LULC type. Whether using simple or complex models, an accurate LULC dataset with an appropriate spatial resolution and level of detail is paramount for reliable predictions. The study presented in this paper compared and evaluated several LULC dataset sources for application in urban environmental modeling. The commonly used USGS LULC datasets have coarser spatial resolution and lower levels of classification than other LULC datasets. In addition, the USGS datasets do not accurately represent the land use in areas that have undergone significant land use change during the past two decades. We performed a watershed modeling analysis of three urban catchments in Los Angeles, California, USA to investigate the relative difference in average annual runoff volumes and total suspended solids (TSS) loads when using the USGS LULC dataset versus using a more detailed and current LULC dataset. When the two LULC datasets were aggregated to the same land use categories, the relative differences in predicted average annual runoff volumes and TSS loads from the three catchments were 8 to 14% and 13 to 40%, respectively. The relative differences did not have a predictable relationship with catchment size.
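
    The simple-equation methods mentioned above reduce to multiplying each LULC class area by a runoff coefficient and a precipitation depth; a hypothetical example of how switching LULC datasets changes the estimate (all class areas, coefficients and the precipitation value are invented for illustration):

        def annual_runoff_m3(lulc_areas_m2, runoff_coeff, annual_precip_mm):
            # Sum over LULC classes: area x runoff coefficient x precipitation depth
            return sum(area * runoff_coeff[c] * annual_precip_mm / 1000.0
                       for c, area in lulc_areas_m2.items())

        coeff = {"residential": 0.5, "commercial": 0.8, "open": 0.2}
        usgs_lulc = {"residential": 6.0e6, "commercial": 1.0e6, "open": 3.0e6}
        recent_lulc = {"residential": 5.0e6, "commercial": 2.5e6, "open": 2.5e6}

        v1 = annual_runoff_m3(usgs_lulc, coeff, 380.0)
        v2 = annual_runoff_m3(recent_lulc, coeff, 380.0)
        print(f"relative difference: {abs(v1 - v2) / v1:.1%}")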

  1. Interpolation of diffusion weighted imaging datasets

    DEFF Research Database (Denmark)

    Dyrby, Tim B; Lundell, Henrik; Burke, Mark W

    2014-01-01

    Diffusion weighted imaging (DWI) is used to study white-matter fibre organisation, orientation and structural connectivity by means of fibre reconstruction algorithms and tractography. For clinical settings, limited scan time compromises the possibilities to achieve high image resolution for finer anatomical details and signal-to-noise-ratio for reliable fibre reconstruction. We assessed the potential benefits of interpolating DWI datasets to a higher image resolution before fibre reconstruction using a diffusion tensor model. Simulations of straight and curved crossing tracts smaller than or equal... interpolation methods fail to disentangle fine anatomical details if PVE is too pronounced in the original data. As for validation we used ex-vivo DWI datasets acquired at various image resolutions as well as Nissl-stained sections. Increasing the image resolution by a factor of eight yielded finer geometrical...
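
    A minimal sketch of the interpolation step, assuming the DWI data are held as a 4-D array (x, y, z, gradient direction) and doubling each spatial axis, which would correspond to the factor of eight above if that factor counts voxels; trilinear interpolation stands in for whatever scheme a given study uses:

        import numpy as np
        from scipy.ndimage import zoom

        def upsample_dwi(dwi, factor=2.0, order=1):
            # Interpolate each diffusion-weighted volume separately;
            # order=1 selects trilinear interpolation
            return np.stack([zoom(dwi[..., g], factor, order=order)
                             for g in range(dwi.shape[-1])], axis=-1)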

  2. ClimateNet: A Machine Learning dataset for Climate Science Research

    Science.gov (United States)

    Prabhat, M.; Biard, J.; Ganguly, S.; Ames, S.; Kashinath, K.; Kim, S. K.; Kahou, S.; Maharaj, T.; Beckham, C.; O'Brien, T. A.; Wehner, M. F.; Williams, D. N.; Kunkel, K.; Collins, W. D.

    2017-12-01

    Deep Learning techniques have revolutionized commercial applications in computer vision, speech recognition and control systems. The key to all of these developments was the creation of a curated, labeled dataset, ImageNet, which enabled multiple research groups around the world to develop methods, benchmark performance and compete with each other. The success of Deep Learning can be largely attributed to the broad availability of this dataset. Our empirical investigations have revealed that Deep Learning is similarly poised to benefit the task of pattern detection in climate science. Unfortunately, labeled datasets, a key prerequisite for training, are hard to find. Individual research groups are typically interested in specialized weather patterns, making it hard to unify and share datasets across groups and institutions. In this work, we propose ClimateNet: a labeled dataset that provides labeled instances of extreme weather patterns, as well as associated raw fields in model and observational output. We develop a schema in NetCDF to enumerate weather pattern classes/types and to store bounding boxes and pixel-masks. We are also working on a TensorFlow implementation to natively import such NetCDF datasets, and are providing a reference convolutional architecture for binary classification tasks. Our hope is that researchers in Climate Science, as well as ML/DL, will be able to use (and extend) ClimateNet to make rapid progress in the application of Deep Learning for Climate Science research.

  3. BASE MAP DATASET, INYO COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  4. BASE MAP DATASET, JACKSON COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  5. BASE MAP DATASET, KINGFISHER COUNTY, OKLAHOMA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  6. Do countries consistently engage in misinforming the international community about their efforts to combat money laundering? Evidence using Benford's law

    NARCIS (Netherlands)

    Deleanu, Ioana Sorina

    2017-01-01

    Indicators of compliance and efficiency in combatting money laundering, collected by EUROSTAT, are plagued with shortcomings. In this paper, I have carried out a forensic analysis on a 2003-2010 dataset of indicators of compliance and efficiency in combatting money laundering, that European Union

  7. Small Open Economy Firms in International Trade

    DEFF Research Database (Denmark)

    Eriksson, Tor Viking; Smeets, Valérie; Warzynski, Frederic

    In this paper, we use a rich dataset disaggregating the import and export decisions, by product and origin/destination, of all Danish companies for the period 1993-2003 to provide key elements in characterizing Danish firms in international trade. Most evidence to date emanates from the U.S. or devel... ... in Denmark than in the U.S. There are few traces of the European Union's Single Market Program and the adoption of the Euro in 1998. We observe no impact of these changes on the number of exporters, but some signs of impacts on the number of products and export destination countries. Finally, we find that trade...

  8. Impact of automatization in temperature series in Spain and comparison with the POST-AWS dataset

    Science.gov (United States)

    Aguilar, Enric; López-Díaz, José Antonio; Prohom Duran, Marc; Gilabert, Alba; Luna Rico, Yolanda; Venema, Victor; Auchmann, Renate; Stepanek, Petr; Brandsma, Theo

    2016-04-01

    Climate data records are most of the time affected by inhomogeneities. In particular, inhomogeneities introducing network-wide biases are sometimes related to changes happening almost simultaneously in an entire network. Relative homogenization is difficult in these cases, especially at the daily scale. A good example of this is the substitution of manual observations (MAN) by automatic weather stations (AWS). Parallel measurements (i.e. records taken at the same time with the old (MAN) and new (AWS) sensors) can provide an idea of the bias introduced and help to evaluate the suitability of different correction approaches. We present here a quality-controlled dataset compiled under the DAAMEC Project, comprising 46 stations across Spain and over 85,000 parallel measurements (AWS-MAN) of daily maximum and minimum temperature. We study the differences between both sensors and compare them with the available metadata to account for internal inhomogeneities. The differences between both systems vary considerably across stations, with patterns more related to their particular settings than to climatic/geographical reasons. The typical median biases (AWS-MAN) by station (comprised within the interquartile range) oscillate between -0.2°C and 0.4°C in daily maximum temperature and between -0.4°C and 0.2°C in daily minimum temperature. These and other results are compared with a larger network, the dataset of the Parallel Observations Scientific Team, a working group of the International Surface Temperatures Initiative (ISTI-POST), which comprises our stations as well as others from different countries in America, Asia and Europe.
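
    The per-station median bias and interquartile range quoted above can be reproduced from a table of parallel daily measurements; a pandas sketch with assumed column names:

        import pandas as pd

        def station_bias_summary(df):
            # df columns (hypothetical): 'station', 'tmax_aws', 'tmax_man'
            df = df.assign(diff=df["tmax_aws"] - df["tmax_man"])
            # 25th, 50th and 75th percentiles of the daily AWS - MAN differences
            return df.groupby("station")["diff"].quantile([0.25, 0.5, 0.75]).unstack()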

  9. Image segmentation evaluation for very-large datasets

    Science.gov (United States)

    Reeves, Anthony P.; Liu, Shuang; Xie, Yiting

    2016-03-01

    With the advent of modern machine learning methods and fully automated image analysis there is a need for very large image datasets having documented segmentations for both computer algorithm training and evaluation. Current approaches of visual inspection and manual markings do not scale well to big data. We present a new approach that depends on fully automated algorithm outcomes for segmentation documentation, requires no manual marking, and provides quantitative evaluation for computer algorithms. The documentation of new image segmentations and new algorithm outcomes is achieved by visual inspection. The burden of visual inspection on large datasets is minimized by (a) customized visualizations for rapid review and (b) reducing the number of cases to be reviewed through analysis of quantitative segmentation evaluation. This method has been applied to a dataset of 7,440 whole-lung CT images for 6 different segmentation algorithms designed to fully automatically facilitate the measurement of a number of very important quantitative image biomarkers. The results indicate that we could achieve 93% to 99% successful segmentation for these algorithms on this relatively large image database. The presented evaluation method may be scaled to much larger image databases.

  10. Evaluating the use of different precipitation datasets in simulating a flood event

    Science.gov (United States)

    Akyurek, Z.; Ozkaya, A.

    2016-12-01

    Floods caused by convective storms in mountainous regions are sensitive to the temporal and spatial variability of rainfall. Space-time estimates of rainfall from weather radar, satellites and numerical weather prediction models can be a remedy to represent the pattern of the rainfall, albeit with some inaccuracy. However, there is a strong need for evaluation of the performance and limitations of these estimates in hydrology. This study provides a comparison of gauge, radar, satellite (Hydro-Estimator (HE)) and numerical weather prediction model (Weather Research and Forecasting (WRF)) precipitation datasets during an extreme flood event (22.11.2014) lasting 40 hours in Samsun, Turkey. For this study, hourly rainfall data from 13 ground observation stations were used in the analyses. This event, having a peak discharge of 541 m3/s, created flooding downstream of the Terme Basin. Comparisons were performed in two parts. First, the analyses were performed in an areal and point-based manner. Second, a semi-distributed hydrological model was used to assess the accuracy of the rainfall datasets in simulating river flows for the flood event. Kalman filtering was used for the bias correction of radar rainfall data against gauge measurements. Radar, gauge, corrected radar, HE and WRF rainfall data were used as model inputs. Generally, the HE product underestimates the cumulative rainfall amounts at all stations; the radar data underestimate the results in a cumulative sense but keep consistency in the results. On the other hand, almost all stations in the WRF mean statistics computations have better results compared to the HE product but worse than the radar dataset. Results of the point comparisons indicated that the trend of the rainfall is captured well by the radar rainfall estimation, but the radar underestimates the maximum values. According to the cumulative gauge value, radar underestimated the cumulative rainfall amount by 32%. Contrary to the other datasets, the bias of WRF is positive.
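
    One common way to apply Kalman filtering to radar-gauge bias correction is to track the log mean-field bias as a random walk; a sketch under that assumption (the study's exact formulation may differ, and the noise variances q and r here are placeholders):

        import numpy as np

        def kalman_bias_correction(radar, gauge, q=0.01, r=0.1):
            x, p = 0.0, 1.0  # log-bias state and its variance
            corrected = np.zeros(len(radar))
            for t, (rad, gau) in enumerate(zip(radar, gauge)):
                p += q  # predict: the bias evolves as a random walk
                if rad > 0 and gau > 0:
                    y = np.log(gau / rad)  # observed log bias this hour
                    k = p / (p + r)        # Kalman gain
                    x += k * (y - x)
                    p *= 1.0 - k
                corrected[t] = rad * np.exp(x)
            return corrected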

  11. A New Dataset Size Reduction Approach for PCA-Based Classification in OCR Application

    Directory of Open Access Journals (Sweden)

    Mohammad Amin Shayegan

    2014-01-01

    Full Text Available A major problem of pattern recognition systems is due to the large volume of training datasets, which include duplicate and similar training samples. In order to overcome this problem, some dataset size reduction and also dimensionality reduction techniques have been introduced. The algorithms presently used for dataset size reduction usually remove samples near the centers of classes or support vector samples between different classes. However, the samples near a class center include valuable information about the class characteristics, and the support vectors are important for evaluating system efficiency. This paper reports on the use of the Modified Frequency Diagram technique for dataset size reduction. In this new proposed technique, a training dataset is rearranged and then sieved. The sieved training dataset, along with automatic feature extraction/selection using Principal Component Analysis, is used in an OCR application. The experimental results obtained when using the proposed system on one of the biggest handwritten Farsi/Arabic numeral standard OCR datasets, Hoda, show about 97% accuracy in the recognition rate. The recognition speed increased by 2.28 times, while the accuracy decreased only by 0.7%, when a sieved version of the dataset, which is only half the size of the initial training dataset, was used.
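
    A rough stand-in for the sieving-plus-PCA pipeline: drop training samples that nearly duplicate an already kept sample of the same class, then feed the survivors to PCA and a simple classifier. The actual Modified Frequency Diagram rearrangement is not reproduced here, and the threshold and model choices are illustrative:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline

        def sieve(X, y, threshold=0.5):
            # Keep a sample only if no kept sample of the same class is
            # closer than the threshold (a crude duplicate/similarity filter)
            kept_X, kept_y = [], []
            for xi, yi in zip(X, y):
                same = [k for k, ky in zip(kept_X, kept_y) if ky == yi]
                if not same or min(np.linalg.norm(xi - k) for k in same) > threshold:
                    kept_X.append(xi)
                    kept_y.append(yi)
            return np.array(kept_X), np.array(kept_y)

        model = make_pipeline(PCA(n_components=50), KNeighborsClassifier())
        # Xs, ys = sieve(X_train, y_train); model.fit(Xs, ys)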

  12. The CMS dataset bookkeeping service

    Science.gov (United States)

    Afaq, A.; Dolgert, A.; Guo, Y.; Jones, C.; Kosyakov, S.; Kuznetsov, V.; Lueking, L.; Riley, D.; Sekhri, V.

    2008-07-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  13. The CMS dataset bookkeeping service

    Energy Technology Data Exchange (ETDEWEB)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V [Fermilab, Batavia, Illinois 60510 (United States); Dolgert, A; Jones, C; Kuznetsov, V; Riley, D [Cornell University, Ithaca, New York 14850 (United States)

    2008-07-15

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  14. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, A; Guo, Y; Kosyakov, S; Lueking, L; Sekhri, V; Dolgert, A; Jones, C; Kuznetsov, V; Riley, D

    2008-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  15. The CMS dataset bookkeeping service

    International Nuclear Information System (INIS)

    Afaq, Anzar; Dolgert, Andrew; Guo, Yuyi; Jones, Chris; Kosyakov, Sergey; Kuznetsov, Valentin; Lueking, Lee; Riley, Dan; Sekhri, Vijay

    2007-01-01

    The CMS Dataset Bookkeeping Service (DBS) has been developed to catalog all CMS event data from Monte Carlo and Detector sources. It provides the ability to identify MC or trigger source, track data provenance, construct datasets for analysis, and discover interesting data. CMS requires processing and analysis activities at various service levels and the DBS system provides support for localized processing or private analysis, as well as global access for CMS users at large. Catalog entries can be moved among the various service levels with a simple set of migration tools, thus forming a loose federation of databases. DBS is available to CMS users via a Python API, a command line interface, and a Discovery web page. The system is built as a multi-tier web application with Java servlets running under Tomcat, with connections via JDBC to Oracle or MySQL database backends. Clients connect to the service through HTTP or HTTPS with authentication provided by GRID certificates and authorization through VOMS. DBS is an integral part of the overall CMS Data Management and Workflow Management systems.

  16. Toward a new data standard for combined marine biological and environmental datasets - expanding OBIS beyond species occurrences

    Directory of Open Access Journals (Sweden)

    Daphnis De Pooter

    2017-01-01

    Full Text Available The Ocean Biogeographic Information System (OBIS) is the world's most comprehensive online, open-access database of marine species distributions. OBIS grows with millions of new species observations every year. Contributions come from a network of hundreds of institutions, projects and individuals with common goals: to build a scientific knowledge base that is open to the public for scientific discovery and exploration and to detect trends and changes that inform society as essential elements in conservation management and sustainable development. Until now, OBIS has focused solely on the collection of biogeographic data (the presence of marine species in space and time) and operated with optimized data flows, quality control procedures and data standards specifically targeted to these data. Based on requirements from the growing OBIS community to manage datasets that combine biological, physical and chemical measurements, the OBIS-ENV-DATA pilot project was launched to develop a proposed standard and guidelines to make sure these combined datasets can stay together and are not, as is often the case, split and sent to different repositories. The proposal in this paper allows for the management of sampling methodology, animal tracking and telemetry data, biological measurements (e.g., body length, percent live cover, ...) as well as environmental measurements such as nutrient concentrations, sediment characteristics or other abiotic parameters measured during sampling to characterize the environment from which biogeographic data was collected. The recommended practice builds on the Darwin Core Archive (DwC-A) standard and on practices adopted by the Global Biodiversity Information Facility (GBIF). It consists of a DwC Event Core in combination with a DwC Occurrence Extension and a proposed enhancement to the DwC MeasurementOrFact Extension. This new structure enables the linkage of measurements or facts - quantitative and qualitative properties - to

  17. A cross-country Exchange Market Pressure (EMP dataset

    Directory of Open Access Journals (Sweden)

    Mohit Desai

    2017-06-01

    Full Text Available The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence intervals (high and low values) for the point estimates of the ρ's. Using the standard errors of the estimates of the ρ's, we obtain one-sigma intervals around the mean estimates of the EMP values. These values are also reported in the dataset.

  18. A cross-country Exchange Market Pressure (EMP) dataset.

    Science.gov (United States)

    Desai, Mohit; Patnaik, Ila; Felman, Joshua; Shah, Ajay

    2017-06-01

    The data presented in this article are related to the research article titled "An exchange market pressure measure for cross country analysis" (Patnaik et al. [1]). In this article, we present the dataset of Exchange Market Pressure (EMP) values for 139 countries along with their conversion factors, ρ (rho). Exchange Market Pressure, expressed as a percentage change in the exchange rate, measures the change in the exchange rate that would have taken place had the central bank not intervened. The conversion factor ρ can be interpreted as the change in the exchange rate associated with $1 billion of intervention. Estimates of the conversion factor ρ allow us to calculate a monthly time series of EMP for 139 countries. Additionally, the dataset contains the 68% confidence intervals (high and low values) for the point estimates of the ρ's. Using the standard errors of the estimates of the ρ's, we obtain one-sigma intervals around the mean estimates of the EMP values. These values are also reported in the dataset.
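
    From the description in both records, a monthly EMP series follows from the observed exchange-rate change plus the conversion factor times intervention; a sketch with assumed sign conventions and units:

        import numpy as np

        def emp_series(exchange_rate, intervention_bn_usd, rho):
            # EMP_t (percent) = observed % change in the exchange rate
            #                   + rho * intervention (USD billion),
            # i.e. the change that would have occurred absent intervention
            e = np.asarray(exchange_rate, dtype=float)
            pct_change = 100.0 * np.diff(e) / e[:-1]
            return pct_change + rho * np.asarray(intervention_bn_usd, dtype=float)[1:]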

  19. Time series pCO2 at a coastal mooring: Internal consistency, seasonal cycles, and interannual variability

    Science.gov (United States)

    Reimer, Janet J.; Cai, Wei-Jun; Xue, Liang; Vargas, Rodrigo; Noakes, Scott; Hu, Xinping; Signorini, Sergio R.; Mathis, Jeremy T.; Feely, Richard A.; Sutton, Adrienne J.; Sabine, Christopher; Musielewicz, Sylvia; Chen, Baoshan; Wanninkhof, Rik

    2017-08-01

    Marine carbonate system monitoring programs often consist of multiple observational methods that include underway cruise data, moored autonomous time series, and discrete water bottle samples. Monitored parameters include all, or some, of the following: the partial pressure of CO2 of the water (pCO2w) and air, dissolved inorganic carbon (DIC), total alkalinity (TA), and pH. Any combination of at least two of the aforementioned parameters can be used to calculate the others. In this study at the Gray's Reef (GR) mooring in the South Atlantic Bight (SAB) we: examine the internal consistency of pCO2w from underway cruises, the moored autonomous time series, and values calculated from bottle samples (DIC-TA pairing); describe the seasonal to interannual pCO2w time series variability and air-sea flux (FCO2), as well as the potential sources of pCO2w variability; and determine the source/sink status with respect to atmospheric pCO2. Over the 8.5 years of the GR mooring time series, mooring-underway and mooring-bottle-calculated pCO2w strongly correlate, with r-values > 0.90. The pCO2w and FCO2 time series follow seasonal thermal patterns; however, seasonal non-thermal processes, such as terrestrial export, net biological production, and air-sea exchange, also influence variability. The linear slope of the pCO2w time series increases by 5.2 ± 1.4 μatm y-1, with FCO2 increasing 51-70 mmol m-2 y-1. The net FCO2 sign can switch interannually, with the magnitude varying greatly. Non-thermal pCO2w is also increasing over the time series, likely indicating that terrestrial export and net biological processes drive the long-term pCO2w increase.
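
    The reported trend (5.2 ± 1.4 μatm per year) is a least-squares slope; a sketch of deseasonalizing a monthly series and fitting such a trend (the variable layout is assumed):

        import numpy as np

        def deseasonalize(months, values):
            # Remove the mean seasonal cycle (per calendar month, 1-12)
            v = np.asarray(values, dtype=float).copy()
            m = np.asarray(months)
            for month in range(1, 13):
                sel = m == month
                v[sel] -= v[sel].mean()
            return v

        def linear_trend(t_decimal_years, pco2w):
            # Slope in uatm per year from an ordinary least-squares fit
            slope, intercept = np.polyfit(t_decimal_years, pco2w, 1)
            return slope, intercept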

  20. Knowledge Mining from Clinical Datasets Using Rough Sets and Backpropagation Neural Network

    Directory of Open Access Journals (Sweden)

    Kindie Biredagn Nahato

    2015-01-01

    Full Text Available The availability of clinical datasets and knowledge mining methodologies encourages researchers to pursue research in extracting knowledge from clinical datasets. Different data mining techniques have been used for mining rules, and mathematical models have been developed to assist the clinician in decision making. The objective of this research is to build a classifier that will predict the presence or absence of a disease by learning from a minimal set of attributes extracted from the clinical dataset. In this work, the rough set indiscernibility relation method with a backpropagation neural network (RS-BPNN) is used. This work has two stages. The first stage is the handling of missing values to obtain a smooth dataset and the selection of appropriate attributes from the clinical dataset by the indiscernibility relation method. The second stage is classification using a backpropagation neural network on the selected reducts of the dataset. The classifier has been tested with the hepatitis, Wisconsin breast cancer, and Statlog heart disease datasets obtained from the University of California at Irvine (UCI) machine learning repository. The accuracy obtained from the proposed method is 97.3%, 98.6%, and 90.4% for hepatitis, breast cancer, and heart disease, respectively. The proposed system provides an effective classification model for clinical datasets.
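
    A loose scikit-learn analogue of the two-stage pipeline: impute missing values, select a small attribute subset (mutual information stands in here for rough-set reducts), then classify with a backpropagation-trained network (all hyperparameters are placeholders):

        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.impute import SimpleImputer
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline

        model = make_pipeline(
            SimpleImputer(strategy="most_frequent"),  # stage 1a: missing values
            SelectKBest(mutual_info_classif, k=8),    # stage 1b: attribute subset
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000),  # stage 2
        )
        # model.fit(X_train, y_train); model.score(X_test, y_test)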

  1. Institutional Distance and Partner Selection in International Technological Alliances

    NARCIS (Netherlands)

    Krammer, Marius

    2013-01-01

    This study posits that institutional distance has a negative influence on partner selection in international technological alliances. Empirical results based on a dataset of firms in the global tire industry confirm that firms prefer technological partners from closer cognitive, normative and

  2. The Iranian version of 12-item Short Form Health Survey (SF-12): factor structure, internal consistency and construct validity.

    Science.gov (United States)

    Montazeri, Ali; Vahdaninia, Mariam; Mousavi, Sayed Javad; Omidvari, Speideh

    2009-09-16

    The 12-item Short Form Health Survey (SF-12), as a shorter alternative to the SF-36, is widely used in health outcomes surveys. The aim of this study was to validate the SF-12 in Iran. A random sample of the general population aged 15 years and over living in Tehran, Iran completed the SF-12. Reliability was estimated using internal consistency, and validity was assessed using known-groups comparison and convergent validity. In addition, the factor structure of the questionnaire was extracted by performing both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). In all, 5587 individuals were studied (2721 male and 2866 female). The mean age and formal education of the respondents were 35.1 (SD = 15.4) and 10.2 (SD = 4.4) years, respectively. The results showed satisfactory internal consistency for both summary measures, that is, the Physical Component Summary (PCS) and the Mental Component Summary (MCS); Cronbach's alpha for the PCS-12 and MCS-12 was 0.73 and 0.72, respectively. Known-groups comparison showed that the SF-12 discriminated well between men and women and those who differed in age and educational status (P < 0.001). In addition, correlations between the SF-12 scales and single items showed that the physical functioning, role physical, bodily pain and general health subscales correlated more strongly with the PCS-12 score, while the vitality, social functioning, role emotional and mental health subscales correlated more strongly with the MCS-12 score, lending support to its good convergent validity. Finally, the principal component analysis indicated a two-factor structure (physical and mental health) that jointly accounted for 57.8% of the variance. The confirmatory factor analysis also indicated a good fit to the data for the two-latent structure (physical and mental health). In general the findings suggest that the SF-12 is a reliable and valid measure of health-related quality of life among the Iranian population. However, further studies are needed to

  3. Spatially-explicit estimation of geographical representation in large-scale species distribution datasets.

    Science.gov (United States)

    Kalwij, Jesse M; Robertson, Mark P; Ronk, Argo; Zobel, Martin; Pärtel, Meelis

    2014-01-01

    Much ecological research relies on existing multispecies distribution datasets. Such datasets, however, can vary considerably in quality, extent, resolution or taxonomic coverage. We provide a framework for a spatially-explicit evaluation of geographical representation within large-scale species distribution datasets, using the comparison of an occurrence atlas with a range atlas dataset as a working example. Specifically, we compared occurrence maps for 3773 taxa from the widely-used Atlas Florae Europaeae (AFE) with digitised range maps for 2049 taxa of the lesser-known Atlas of North European Vascular Plants. We calculated the level of agreement at a 50-km spatial resolution using average latitudinal and longitudinal species range, and area of occupancy. Agreement in species distribution was calculated and mapped using Jaccard similarity index and a reduced major axis (RMA) regression analysis of species richness between the entire atlases (5221 taxa in total) and between co-occurring species (601 taxa). We found no difference in distribution ranges or in the area of occupancy frequency distribution, indicating that atlases were sufficiently overlapping for a valid comparison. The similarity index map showed high levels of agreement for central, western, and northern Europe. The RMA regression confirmed that geographical representation of AFE was low in areas with a sparse data recording history (e.g., Russia, Belarus and the Ukraine). For co-occurring species in south-eastern Europe, however, the Atlas of North European Vascular Plants showed remarkably higher richness estimations. Geographical representation of atlas data can be much more heterogeneous than often assumed. Level of agreement between datasets can be used to evaluate geographical representation within datasets. Merging atlases into a single dataset is worthwhile in spite of methodological differences, and helps to fill gaps in our knowledge of species distribution ranges. Species distribution
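
    A toy sketch of the per-cell agreement calculation described above, assuming presence/absence sets per 50-km grid cell; the cell ids and taxa below are hypothetical:

```python
# Jaccard agreement between two presence/absence atlas grids.
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| (1.0 when both sets are empty)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)

# Hypothetical dicts mapping a 50-km grid-cell id to the taxa recorded there.
atlas_a = {"cell_001": {"Acer", "Betula"}, "cell_002": {"Salix"}}
atlas_b = {"cell_001": {"Acer"}, "cell_002": {"Salix", "Pinus"}}

agreement = {cell: jaccard(atlas_a.get(cell, set()), atlas_b.get(cell, set()))
             for cell in atlas_a.keys() | atlas_b.keys()}
print(agreement)   # per-cell agreement values, mappable at 50-km resolution
```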

  4. The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset

    Science.gov (United States)

    Huffman, George J.; Adler, Robert F.; Arkin, Philip; Chang, Alfred; Ferraro, Ralph; Gruber, Arnold; Janowiak, John; McNab, Alan; Rudolf, Bruno; Schneider, Udo

    1997-01-01

    The Global Precipitation Climatology Project (GPCP) has released the GPCP Version 1 Combined Precipitation Data Set, a global, monthly precipitation dataset covering the period July 1987 through December 1995. The primary product in the dataset is a merged analysis incorporating precipitation estimates from low-orbit-satellite microwave data, geosynchronous-orbit-satellite infrared data, and rain gauge observations. The dataset also contains the individual input fields, a combination of the microwave and infrared satellite estimates, and error estimates for each field. The data are provided on 2.5 deg x 2.5 deg latitude-longitude global grids. Preliminary analyses show general agreement with prior studies of global precipitation and extend prior studies of El Nino-Southern Oscillation precipitation patterns. At the regional scale there are systematic differences with standard climatologies.
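
    A toy illustration of the merging idea, combining two estimates per grid cell with inverse-error-variance weights; the error values are assumptions, and the actual GPCP combination is considerably more elaborate:

```python
# Inverse-error-variance merge of satellite and gauge precipitation estimates.
import numpy as np

sat = np.array([3.1, 0.8, 5.6])        # mm/day, multisatellite estimate
gauge = np.array([2.7, 1.1, np.nan])   # gauge analysis (missing over ocean)
sat_err, gauge_err = 1.0, 0.5          # assumed RMS errors (mm/day)

w_sat, w_gauge = 1 / sat_err**2, 1 / gauge_err**2
merged = np.where(
    np.isnan(gauge), sat,                                   # satellite-only cells
    (w_sat * sat + w_gauge * gauge) / (w_sat + w_gauge),    # weighted merge
)
print(merged)
```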

  5. A new dataset and algorithm evaluation for mood estimation in music

    OpenAIRE

    Godec, Primož

    2014-01-01

    This thesis presents a new dataset of perceived and induced emotions for 200 audio clips. The gathered dataset provides users' perceived and induced emotions for each clip, the association of color, along with demographic and personal data, such as user's emotion state and emotion ratings, genre preference, music experience, among others. With an online survey we collected more than 7000 responses for a dataset of 200 audio excerpts, thus providing about 37 user responses per clip. The foc...

  6. A Large-Scale 3D Object Recognition dataset

    DEFF Research Database (Denmark)

    Sølund, Thomas; Glent Buch, Anders; Krüger, Norbert

    2016-01-01

    geometric groups; concave, convex, cylindrical and flat 3D object models. The object models have varying amount of local geometric features to challenge existing local shape feature descriptors in terms of descriptiveness and robustness. The dataset is validated in a benchmark which evaluates the matching...... performance of 7 different state-of-the-art local shape descriptors. Further, we validate the dataset in a 3D object recognition pipeline. Our benchmark shows as expected that local shape feature descriptors without any global point relation across the surface have a poor matching performance with flat...

  7. The Wind Integration National Dataset (WIND) toolkit (Presentation)

    Energy Technology Data Exchange (ETDEWEB)

    Caroline Draxl: NREL

    2014-01-01

    Regional wind integration studies require detailed wind power output data at many locations to perform simulations of how the power system will operate under high penetration scenarios. The wind datasets that serve as inputs into the study must realistically reflect the ramping characteristics, spatial and temporal correlations, and capacity factors of the simulated wind plants, as well as being time synchronized with available load profiles.As described in this presentation, the WIND Toolkit fulfills these requirements by providing a state-of-the-art national (US) wind resource, power production and forecast dataset.

  8. An integrated pan-tropical biomass map using multiple reference datasets

    NARCIS (Netherlands)

    Avitabile, V.; Herold, M.; Heuvelink, G.B.M.; Lewis, S.L.; Phillips, O.L.; Asner, G.P.; Armston, J.; Asthon, P.; Banin, L.F.; Bayol, N.; Berry, N.; Boeckx, P.; Jong, De B.; Devries, B.; Girardin, C.; Kearsley, E.; Lindsell, J.A.; Lopez-gonzalez, G.; Lucas, R.; Malhi, Y.; Morel, A.; Mitchard, E.; Nagy, L.; Qie, L.; Quinones, M.; Ryan, C.M.; Slik, F.; Sunderland, T.; Vaglio Laurin, G.; Valentini, R.; Verbeeck, H.; Wijaya, A.; Willcock, S.

    2016-01-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of

  9. The SAIL databank: linking multiple health and social care datasets.

    Science.gov (United States)

    Lyons, Ronan A; Jones, Kerina H; John, Gareth; Brooks, Caroline J; Verplancke, Jean-Philippe; Ford, David V; Brown, Ginevra; Leake, Ken

    2009-01-16

    Vast amounts of data are collected about patients and service users in the course of health and social care service delivery. Electronic data systems for patient records have the potential to revolutionise service delivery and research. But in order to achieve this, it is essential that the ability to link the data at the individual record level be retained whilst adhering to the principles of information governance. The SAIL (Secure Anonymised Information Linkage) databank has been established using disparate datasets, and over 500 million records from multiple health and social care service providers have been loaded to date, with further growth in progress. Having established the infrastructure of the databank, the aim of this work was to develop and implement an accurate matching process to enable the assignment of a unique Anonymous Linking Field (ALF) to person-based records to make the databank ready for record-linkage research studies. An SQL-based matching algorithm (MACRAL, Matching Algorithm for Consistent Results in Anonymised Linkage) was developed for this purpose. Firstly, the suitability of using a valid NHS number as the basis of a unique identifier was assessed using MACRAL. Secondly, MACRAL was applied in turn to match primary care, secondary care and social services datasets to the NHS Administrative Register (NHSAR), to assess the efficacy of this process and the optimum matching technique. The validation of using the NHS number yielded specificity values > 99.8% and sensitivity values > 94.6% using probabilistic record linkage (PRL) at the 50% threshold, and error rates were low. The SAIL databank represents a research-ready platform for record-linkage studies.
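
    A toy illustration of threshold-based matching in the spirit of the description above; MACRAL itself is SQL-based, and the field names and weights here are hypothetical:

```python
# Weighted agreement score over identifying fields, accepted at a threshold.
from dataclasses import dataclass

@dataclass
class Record:
    nhs_number: str
    surname: str
    dob: str        # ISO date
    postcode: str

# Agreement weights per field (illustrative values, not MACRAL's).
WEIGHTS = {"nhs_number": 0.6, "surname": 0.2, "dob": 0.15, "postcode": 0.05}

def match_score(a: Record, b: Record) -> float:
    """Weighted fraction of agreeing identifiers, in [0, 1]."""
    return sum(w for f, w in WEIGHTS.items() if getattr(a, f) == getattr(b, f))

a = Record("9434765919", "JONES", "1970-01-31", "SA2 8PP")
b = Record("9434765919", "JONES", "1970-01-31", "SA2 8QQ")
# Accept the pair as the same person at the 50% threshold used in the study.
print(match_score(a, b), match_score(a, b) >= 0.5)
```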

  10. Factor structure and internal consistency of the 12-item General Health Questionnaire (GHQ-12) and the Subjective Vitality Scale (VS), and the relationship between them: a study from France

    Directory of Open Access Journals (Sweden)

    Ismaïl Amany

    2009-03-01

    Background: The objectives of this study were to test the factor structure and internal consistency of the 12-item General Health Questionnaire (GHQ-12) and the Subjective Vitality Scale (VS) in elderly French people, and to test the relationship between these two questionnaires. Methods: Using a standard 'forward-backward' translation procedure, the English-language versions of the two instruments (i.e. the 12-item General Health Questionnaire and the Subjective Vitality Scale) were translated into French. A sample of adults aged 58–72 years then completed both questionnaires. Internal consistency was assessed by Cronbach's alpha coefficient. The factor structures of the two instruments were extracted by confirmatory factor analysis (CFA). Finally, the relationship between the two instruments was assessed by correlation analysis. Results: In all, 217 elderly adults participated in the study. The mean age of the respondents was 61.7 (SD = 6.2) years. The mean GHQ-12 score was 17.4 (SD = 8.0), and analysis showed satisfactory internal consistency (Cronbach's alpha coefficient = 0.78). The mean VS score was 22.4 (SD = 7.4) and its internal consistency was found to be good (Cronbach's alpha coefficient = 0.83). While CFA showed that the VS was uni-dimensional, analysis for the GHQ-12 demonstrated a good fit not only to the two-factor model (positive vs. negative items) but also to a three-factor model. As expected, there was a strong and significant negative correlation between the GHQ-12 and the VS (r = -0.71, P < 0.001). Conclusion: The results showed that the French versions of the 12-item General Health Questionnaire (GHQ-12) and the Subjective Vitality Scale (VS) are reliable measures of psychological distress and vitality. They also confirm a significant negative correlation between these two instruments, lending support to their convergent validity in an elderly French population. The findings indicate that both measures have good structural
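
    For reference, Cronbach's alpha, the internal-consistency statistic reported by these validation studies, can be computed directly from an items-by-respondents matrix; a minimal sketch with simulated data:

```python
# Cronbach's alpha from a respondents-by-items score matrix (toy data).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: shape (n_respondents, k_items)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))                 # one underlying trait
items = latent + 0.8 * rng.normal(size=(200, 12))  # 12 noisy items
print(f"alpha = {cronbach_alpha(items):.2f}")      # high for coherent items
```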

  11. Global Human Built-up And Settlement Extent (HBASE) Dataset From Landsat

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Human Built-up And Settlement Extent (HBASE) Dataset from Landsat is a global map of HBASE derived from the Global Land Survey (GLS) Landsat dataset for...

  12. Passive Containment DataSet

    Science.gov (United States)

    This data is for Figures 6 and 7 in the journal article. The data also includes the two EPANET input files used for the analysis described in the paper, one for the looped system and one for the block system. This dataset is associated with the following publication: Grayman, W., R. Murray, and D. Savic. Redesign of Water Distribution Systems for Passive Containment of Contamination. JOURNAL OF THE AMERICAN WATER WORKS ASSOCIATION. American Water Works Association, Denver, CO, USA, 108(7): 381-391, (2016).

  13. Knowledge Evolution in Distributed Geoscience Datasets and the Role of Semantic Technologies

    Science.gov (United States)

    Ma, X.

    2014-12-01

    Knowledge evolves in geoscience, and the evolution is reflected in datasets. In a context with distributed data sources, the evolution of knowledge may cause considerable challenges to data management and re-use. For example, a short news item published in 2009 (Mascarelli, 2009) revealed the geoscience community's concern that the International Commission on Stratigraphy's change to the definition of Quaternary may bring heavy reworking of geologic maps. Now we are in the era of the World Wide Web, and geoscience knowledge is increasingly modeled and encoded in the form of ontologies and vocabularies by using semantic technologies. Accordingly, knowledge evolution leads to a consequence called ontology dynamics. Flouris et al. (2008) summarized 10 topics of general ontology changes/dynamics, such as ontology mapping, morphism, evolution, debugging and versioning. Ontology dynamics has impacts at several stages of a data life cycle and causes challenges such as requests for reworking of extant data in a data center, semantic mismatch among data sources, differentiated understanding of the same dataset between data providers and data users, as well as error propagation in cross-discipline data discovery and re-use (Ma et al., 2014). This presentation will analyze the best practices in the geoscience community so far and summarize a few recommendations to reduce the negative impacts of ontology dynamics in a data life cycle, including: communities of practice and collaboration on ontology and vocabulary building, linking data records to standardized terms, and methods for (semi-)automatic reworking of datasets using semantic technologies. References: Flouris, G., Manakanatas, D., Kondylakis, H., Plexousakis, D., Antoniou, G., 2008. Ontology change: classification and survey. The Knowledge Engineering Review 23 (2), 117-152. Ma, X., Fox, P., Rozell, E., West, P., Zednik, S., 2014. Ontology dynamics in a data life cycle: Challenges and recommendations

  14. The Lunar Source Disk: Old Lunar Datasets on a New CD-ROM

    Science.gov (United States)

    Hiesinger, H.

    1998-01-01

    A compilation of previously published datasets on CD-ROM is presented. This Lunar Source Disk is intended to be a first step in the improvement/expansion of the Lunar Consortium Disk, in order to create an "image-cube"-like data pool that can be easily accessed and might be useful for a variety of future lunar investigations. All datasets were transformed to a standard map projection that allows direct comparison of different types of information on a pixel-by-pixel basis. Lunar observations have a long history and have been important to mankind for centuries, notably since the work of Plutarch and Galileo. As a consequence of centuries of lunar investigations, knowledge of the characteristics and properties of the Moon has accumulated over time. However, a side effect of this accumulation is that it has become more and more complicated for scientists to review all the datasets obtained through different techniques, to interpret them properly, to recognize their weaknesses and strengths in detail, and to combine them synoptically in geologic interpretations. Such synoptic geologic interpretations are crucial for the study of planetary bodies through remote-sensing data in order to avoid misinterpretation. In addition, many of the modern datasets, derived from Earth-based telescopes as well as from spacecraft missions, are acquired at different geometric and radiometric conditions. These differences make it challenging to compare or combine datasets directly or to extract information from different datasets on a pixel-by-pixel basis. Also, as there is no convention for the presentation of lunar datasets, different authors choose different map projections, depending on the location of the investigated areas and their personal interests. Insufficient or incomplete information on the map parameters used by different authors further complicates the reprojection of these datasets to a standard geometry. The goal of our efforts was to transfer previously published lunar

  15. Structural Consistency, Consistency, and Sequential Rationality.

    OpenAIRE

    Kreps, David M; Ramey, Garey

    1987-01-01

    Sequential equilibria comprise consistent beliefs and a sequentially rational strategy profile. Consistent beliefs are limits of Bayes rational beliefs for sequences of strategies that approach the equilibrium strategy. Beliefs are structurally consistent if they are rationalized by some single conjecture concerning opponents' strategies. Consistent beliefs are not necessarily structurally consistent, notwithstanding a claim by Kreps and Robert Wilson (1982). Moreover, the spirit of stru...

  16. Factor structure, internal consistency and reliability of the Posttraumatic Stress Disorder Checklist (PCL): an exploratory study

    Directory of Open Access Journals (Sweden)

    Eduardo de Paula Lima

    2012-01-01

    INTRODUCTION: Posttraumatic stress disorder (PTSD) is an anxiety disorder resulting from exposure to traumatic events. The Posttraumatic Stress Disorder Checklist (PCL) is a self-report measure largely used to evaluate the presence of PTSD. OBJECTIVE: To investigate the internal consistency, temporal reliability and factor validity of the Portuguese-language version of the PCL used in Brazil. METHODS: A total of 186 participants were recruited. The sample was heterogeneous with regard to occupation, sociodemographic data, mental health history, and exposure to traumatic events. Subjects answered the PCL on two occasions within a 15-day window (range: 5-15 days). RESULTS: Cronbach's alpha coefficients indicated high internal consistency for the total scale (0.91) and for the theoretical dimensions of the Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) (0.83, 0.81, and 0.80). Temporal reliability (test-retest) was high and consistent for different cutoffs. Maximum likelihood exploratory factor analysis (EFA) was conducted and oblique rotation (Promax) was applied. The Kaiser-Meyer-Olkin (KMO) index (0.911) and Bartlett's test of sphericity (χ² = 1,381.34, p < 0.001)

  17. ENHANCED DATA DISCOVERABILITY FOR IN SITU HYPERSPECTRAL DATASETS

    Directory of Open Access Journals (Sweden)

    B. Rasaiah

    2016-06-01

    Field spectroscopic metadata is a central component in the quality assurance, reliability, and discoverability of hyperspectral data and the products derived from it. Cataloguing, mining, and interoperability of these datasets rely upon the robustness of metadata protocols for field spectroscopy, and on the software architecture to support the exchange of these datasets. Currently no standard for in situ spectroscopy data or metadata protocols exists. This inhibits the effective sharing of growing volumes of in situ spectroscopy datasets and the exploitation of the benefits of integrating with the evolving range of data sharing platforms. A core metadataset for field spectroscopy was introduced by Rasaiah et al. (2011-2015), with extended support for specific applications. This paper presents a prototype model for an OGC- and ISO-compliant, platform-independent metadata discovery service aligned to the specific requirements of field spectroscopy. In this study, a proof-of-concept metadata catalogue has been described and deployed in a cloud-based architecture as a demonstration of an operationalized field spectroscopy metadata standard and web-based discovery service.

  18. Environmental Dataset Gateway (EDG) CS-W Interface

    Data.gov (United States)

    U.S. Environmental Protection Agency — Use the Environmental Dataset Gateway (EDG) to find and access EPA's environmental resources. Many options are available for easily reusing EDG content in other...

  19. Annotating spatio-temporal datasets for meaningful analysis in the Web

    Science.gov (United States)

    Stasch, Christoph; Pebesma, Edzer; Scheider, Simon

    2014-05-01

    More and more environmental datasets that vary in space and time are available in the Web. This comes along with the advantage that the data can be used for purposes other than those originally foreseen, but also with the danger that users may apply inappropriate analysis procedures due to a lack of important assumptions made during the data collection process. In order to guide users towards a meaningful (statistical) analysis of spatio-temporal datasets available in the Web, we have developed a Higher-Order-Logic formalism that captures some relevant assumptions in our previous work [1]. It allows meaningful spatial prediction and aggregation to be proved in a semi-automated fashion. In this poster presentation, we will present a concept for annotating spatio-temporal datasets available in the Web with concepts defined in our formalism. To this end, we have defined a subset of the formalism as a Web Ontology Language (OWL) pattern. It allows capturing the distinction between the different spatio-temporal variable types, i.e. point patterns, fields, lattices and trajectories, that in turn determine whether a particular dataset can be interpolated or aggregated in a meaningful way using a certain procedure. The actual annotations that link spatio-temporal datasets with the concepts in the ontology pattern are provided as Linked Data. In order to allow data producers to add the annotations to their datasets, we have implemented a Web portal that uses a triple store at the backend to store the annotations and to make them available in the Linked Data cloud. Furthermore, we have implemented functions in the statistical environment R to retrieve the RDF annotations and, based on these annotations, to support a stronger typing of spatio-temporal datatypes, guiding towards a meaningful analysis in R. [1] Stasch, C., Scheider, S., Pebesma, E., Kuhn, W. (2014): "Meaningful spatial prediction and aggregation", Environmental Modelling & Software, 51, 149-165.
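
    A minimal sketch of such an annotation published as Linked Data, assuming Python's rdflib; the ontology namespace and dataset URI below are hypothetical, not the authors' actual vocabulary:

```python
# Type a dataset as a spatio-temporal "field" so clients can decide whether
# interpolation is a meaningful operation for it.
from rdflib import RDF, Graph, Namespace, URIRef

MEAN = Namespace("http://example.org/meaningful-st#")   # hypothetical ontology
dataset = URIRef("http://example.org/data/pm10-sensors")  # hypothetical dataset

g = Graph()
g.add((dataset, RDF.type, MEAN.Field))           # continuous field -> interpolable
g.add((dataset, MEAN.supports, MEAN.SpatialPrediction))
print(g.serialize(format="turtle"))              # ready for a triple store
```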

  20. Evolving hard problems: Generating human genetics datasets with a complex etiology

    Directory of Open Access Journals (Sweden)

    Himmelstein Daniel S

    2011-07-01

    Background: A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models. Results: Here we develop and evaluate a model-free evolution strategy to generate datasets which display a complex relationship between individual genotype and disease susceptibility. We show that this model-free approach is capable of generating a diverse array of datasets with distinct gene-disease relationships for an arbitrary interaction order and sample size. We specifically generate eight hundred Pareto fronts, one for each independent run of our algorithm. In each run the predictiveness of single genetic variants and pairs of genetic variants has been minimized, while the predictiveness of third-, fourth-, or fifth-order combinations is maximized. Two hundred runs of the algorithm are further dedicated to creating datasets with predictive fourth- or fifth-order interactions and minimized lower-level effects. Conclusions: This method and the resulting datasets will allow the capabilities of novel methods to be tested without pre-specified genetic models. This allows researchers to evaluate which methods will succeed on human genetics problems where the model is not known in advance. We further make freely available to the community the entire Pareto-optimal front of datasets from each run so that novel methods may be rigorously evaluated. These 76,600 datasets are available from http://discovery.dartmouth.edu/model_free_data/.

  1. A Dataset from TIMSS to Examine the Relationship between Computer Use and Mathematics Achievement

    Science.gov (United States)

    Kadijevich, Djordje M.

    2015-01-01

    Because the relationship between computer use and achievement is still puzzling, there is a need to prepare and analyze good quality datasets on computer use and achievement. Such a dataset can be derived from TIMSS data. This paper describes how this dataset can be prepared. It also gives an example of how the dataset may be analyzed. The…

  2. Protective Factors, Risk Indicators, and Contraceptive Consistency Among College Women.

    Science.gov (United States)

    Morrison, Leslie F; Sieving, Renee E; Pettingell, Sandra L; Hellerstedt, Wendy L; McMorris, Barbara J; Bearinger, Linda H

    2016-01-01

    To explore risk and protective factors associated with consistent contraceptive use among emerging adult female college students and whether effects of risk indicators were moderated by protective factors. Secondary analysis of National Longitudinal Study of Adolescent to Adult Health Wave III data. Data collected through in-home interviews in 2001 and 2002. National sample of 18- to 25-year-old women (N = 842) attending 4-year colleges. We examined relationships between protective factors, risk indicators, and consistent contraceptive use. Consistent contraceptive use was defined as use all of the time during intercourse in the past 12 months. Protective factors included external supports of parental closeness and relationship with caring nonparental adult and internal assets of self-esteem, confidence, independence, and life satisfaction. Risk indicators included heavy episodic drinking, marijuana use, and depression symptoms. Multivariable logistic regression models were used to evaluate relationships between protective factors and consistent contraceptive use and between risk indicators and contraceptive use. Self-esteem, confidence, independence, and life satisfaction were significantly associated with more consistent contraceptive use. In a final model including all internal assets, life satisfaction was significantly related to consistent contraceptive use. Marijuana use and depression symptoms were significantly associated with less consistent use. With one exception, protective factors did not moderate relationships between risk indicators and consistent use. Based on our findings, we suggest that risk and protective factors may have largely independent influences on consistent contraceptive use among college women. A focus on risk and protective factors may improve contraceptive use rates and thereby reduce unintended pregnancy among college students. Copyright © 2016 AWHONN, the Association of Women's Health, Obstetric and Neonatal Nurses. Published

  3. International Comprehensive Ocean-Atmosphere Data Set (ICOADS) with Enhanced Trimming, Release 3

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains the latest official release of International Comprehensive Ocean-Atmosphere Data Set (ICOADS) with Enhanced Trimming, provided in a common...

  4. Data Recommender: An Alternative Way to Discover Open Scientific Datasets

    Science.gov (United States)

    Klump, J. F.; Devaraju, A.; Williams, G.; Hogan, D.; Davy, R.; Page, J.; Singh, D.; Peterson, N.

    2017-12-01

    Over the past few years, institutions and government agencies have adopted policies to openly release their data, which has resulted in huge amounts of open data becoming available on the web. When trying to discover the data, users face two challenges: an overload of choice and the limitations of the existing data search tools. On the one hand, there are too many datasets to choose from, and therefore, users need to spend considerable effort to find the datasets most relevant to their research. On the other hand, data portals commonly offer keyword and faceted search, which depend fully on the user queries to search and rank relevant datasets. Consequently, keyword and faceted search may return loosely related or irrelevant results, even though those results contain the query terms. They may also return highly specific results that depend more on how well metadata was authored. They do not account well for variance in metadata due to variance in author styles and preferences. The top-ranked results may also come from the same data collection, and users are unlikely to discover new and interesting datasets. These search modes mainly suit users who can express their information needs in terms of the structure and terminology of the data portals, but may pose a challenge otherwise. The above challenges reflect that we need a solution that delivers the most relevant (i.e., similar and serendipitous) datasets to users, beyond the existing search functionalities on the portals. A recommender system is an information filtering system that presents users with relevant and interesting contents based on users' context and preferences. Delivering data recommendations to users can make data discovery easier, and as a result may enhance user engagement with the portal. We developed a hybrid data recommendation approach for the CSIRO Data Access Portal. The approach leverages existing recommendation techniques (e.g., content-based filtering and item co-occurrence) to produce
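
    A minimal sketch of a hybrid score blending content similarity with item co-occurrence, in the spirit described above; the data and the blending weight are toy assumptions, not the CSIRO portal's actual tuning:

```python
# Hybrid recommendation score: content cosine similarity + co-access counts.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Rows: datasets; columns: TF-IDF-like metadata features (toy values).
content = np.array([[1.0, 0.2, 0.0],
                    [0.9, 0.3, 0.1],
                    [0.0, 0.1, 1.0]])
# co_access[i, j]: how often datasets i and j were accessed together.
co_access = np.array([[0, 12, 1],
                      [12, 0, 2],
                      [1, 2, 0]], dtype=float)
co_norm = co_access / co_access.max()

def hybrid_scores(query_idx: int, alpha: float = 0.6) -> np.ndarray:
    sims = np.array([cosine(content[query_idx], c) for c in content])
    scores = alpha * sims + (1 - alpha) * co_norm[query_idx]
    scores[query_idx] = -np.inf          # never recommend the item itself
    return scores

print(np.argsort(hybrid_scores(0))[::-1])  # datasets ranked for dataset 0
```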

  5. Analytic Intermodel Consistent Modeling of Volumetric Human Lung Dynamics.

    Science.gov (United States)

    Ilegbusi, Olusegun; Seyfi, Behnaz; Neylon, John; Santhanam, Anand P

    2015-10-01

    Human lung undergoes breathing-induced deformation in the form of inhalation and exhalation. Modeling the dynamics is numerically complicated by the lack of information on lung elastic behavior and on fluid-structure interactions between air and the tissue. A mathematical method is developed to integrate deformation results from a deformable image registration (DIR) and physics-based modeling approaches in order to represent consistent volumetric lung dynamics. The computational fluid dynamics (CFD) simulation assumes the lung is a poro-elastic medium with spatially distributed elastic properties. Simulation is performed on a 3D lung geometry reconstructed from a four-dimensional computed tomography (4DCT) dataset of a human subject. The heterogeneous Young's modulus (YM) is estimated from a linear elastic deformation model with the same lung geometry and 4D lung DIR. The deformation obtained from the CFD is then coupled with the displacement obtained from the 4D lung DIR by means of the Tikhonov regularization (TR) algorithm. The numerical results include 4DCT registration, CFD, and optimal displacement data which collectively provide a consistent estimate of the volumetric lung dynamics. The fusion method is validated by comparing the optimal displacement with the results obtained from the 4DCT registration.
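
    A deliberately reduced illustration of Tikhonov-regularised fusion: at each voxel, find the displacement minimising a weighted sum of squared deviations from the CFD and DIR estimates. The fields and noise levels below are synthetic, and the full method involves more than this closed-form blend:

```python
# Minimise ||u - u_cfd||^2 + lam * ||u - u_dir||^2 voxel-wise; the closed-form
# solution is a lam-weighted blend of the two displacement estimates.
import numpy as np

def fuse(u_cfd: np.ndarray, u_dir: np.ndarray, lam: float) -> np.ndarray:
    return (u_cfd + lam * u_dir) / (1.0 + lam)

rng = np.random.default_rng(1)
truth = rng.normal(size=(32, 32, 3))                 # toy displacement field (mm)
u_cfd = truth + 0.3 * rng.normal(size=truth.shape)   # physics-based estimate
u_dir = truth + 0.2 * rng.normal(size=truth.shape)   # image-registration estimate

fused = fuse(u_cfd, u_dir, lam=2.0)
for name, u in [("CFD", u_cfd), ("DIR", u_dir), ("fused", fused)]:
    print(name, f"RMSE = {np.sqrt(np.mean((u - truth) ** 2)):.3f}")
```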

  6. Artificial intelligence (AI) systems for interpreting complex medical datasets.

    Science.gov (United States)

    Altman, R B

    2017-05-01

    Advances in machine intelligence have created powerful capabilities in algorithms that find hidden patterns in data, classify objects based on their measured characteristics, and associate similar patients/diseases/drugs based on common features. However, artificial intelligence (AI) applications in medical data have several technical challenges: complex and heterogeneous datasets, noisy medical datasets, and explaining their output to users. There are also social challenges related to intellectual property, data provenance, regulatory issues, economics, and liability. © 2017 ASCPT.

  7. Full-Scale Approximations of Spatio-Temporal Covariance Models for Large Datasets

    KAUST Repository

    Zhang, Bohai; Sang, Huiyan; Huang, Jianhua Z.

    2014-01-01

    of dataset and application of such models is not feasible for large datasets. This article extends the full-scale approximation (FSA) approach by Sang and Huang (2012) to the spatio-temporal context to reduce computational complexity. A reversible jump Markov

  8. PERFORMANCE COMPARISON FOR INTRUSION DETECTION SYSTEM USING NEURAL NETWORK WITH KDD DATASET

    Directory of Open Access Journals (Sweden)

    S. Devaraju

    2014-04-01

    Intrusion detection systems face the challenging task of determining whether a user of an organizational information system is a normal user or an attacker. An intrusion detection system is an effective method to deal with this kind of problem in networks. Different classifiers are used to detect the different kinds of attacks in networks. In this paper, the performance of intrusion detection is compared across various neural network classifiers. In the proposed research the four types of classifiers used are the Feed Forward Neural Network (FFNN), Generalized Regression Neural Network (GRNN), Probabilistic Neural Network (PNN) and Radial Basis Neural Network (RBNN). The performance on the full-featured KDD Cup 1999 dataset is compared with that on a reduced-feature KDD Cup 1999 dataset. MATLAB is used to train and test on the dataset, and the efficiency and False Alarm Rate are measured. It is shown that the reduced dataset performs better than the full-featured dataset.

  9. Recent Development on the NOAA's Global Surface Temperature Dataset

    Science.gov (United States)

    Zhang, H. M.; Huang, B.; Boyer, T.; Lawrimore, J. H.; Menne, M. J.; Rennie, J.

    2016-12-01

    Global Surface Temperature (GST) is one of the most widely used indicators for climate trend and extreme analyses. A widely used GST dataset is the NOAA merged land-ocean surface temperature dataset known as NOAAGlobalTemp (formerly MLOST). NOAAGlobalTemp was recently updated from version 3.5.4 to version 4. The update includes a significant improvement in the ocean surface component (Extended Reconstructed Sea Surface Temperature or ERSST, from version 3b to version 4), which resulted in increased temperature trends in recent decades. Since then, advancements in both the ocean component (ERSST) and the land component (GHCN-Monthly) have been made, including the inclusion of Argo float SSTs and expanded EOT modes in ERSST, and the use of the ISTI databank in GHCN-Monthly. In this presentation, we describe the impact of those improvements on the merged global temperature dataset, in terms of global trends and other aspects.

  10. The OXL format for the exchange of integrated datasets

    Directory of Open Access Journals (Sweden)

    Taubert Jan

    2007-12-01

    A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to (i) cover data from a broad range of application domains, (ii) be flexible and extensible to combine many different complex data structures, (iii) include metadata and semantic definitions, (iv) include inferred information, (v) identify the original data source for integrated entities and (vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way.

  11. Application of data science tools to quantify and distinguish between structures and models in molecular dynamics datasets.

    Science.gov (United States)

    Kalidindi, Surya R; Gomberg, Joshua A; Trautt, Zachary T; Becker, Chandler A

    2015-08-28

    Structure quantification is key to successful mining and extraction of core materials knowledge from both multiscale simulations as well as multiscale experiments. The main challenge stems from the need to transform the inherently high dimensional representations demanded by the rich hierarchical material structure into useful, high value, low dimensional representations. In this paper, we develop and demonstrate the merits of a data-driven approach for addressing this challenge at the atomic scale. The approach presented here is built on prior successes demonstrated for mesoscale representations of material internal structure, and involves three main steps: (i) digital representation of the material structure, (ii) extraction of a comprehensive set of structure measures using the framework of n-point spatial correlations, and (iii) identification of data-driven low dimensional measures using principal component analyses. These novel protocols, applied on an ensemble of structure datasets output from molecular dynamics (MD) simulations, have successfully classified the datasets based on several model input parameters such as the interatomic potential and the temperature used in the MD simulations.
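
    A compact sketch of the three-step protocol on synthetic 'microstructures', assuming NumPy, SciPy and scikit-learn; real inputs would be digitised MD snapshots rather than the smoothed noise fields used here:

```python
# (i) digitise structures, (ii) compute two-point autocorrelations via FFT,
# (iii) embed the correlation functions in a low-dimensional space with PCA.
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.decomposition import PCA

def two_point_autocorr(m: np.ndarray) -> np.ndarray:
    """Periodic two-point autocorrelation of a binary microstructure."""
    F = np.fft.fftn(m)
    return np.real(np.fft.ifftn(F * np.conj(F))) / m.size

rng = np.random.default_rng(0)
stats, labels = [], []
for label, sigma in [(0, 1.0), (1, 3.0)]:     # two 'model parameters'
    for _ in range(20):
        m = (gaussian_filter(rng.normal(size=(32, 32)), sigma) > 0).astype(float)
        stats.append(two_point_autocorr(m).ravel())
        labels.append(label)

scores = PCA(n_components=2).fit_transform(np.asarray(stats))
print(scores[:3], labels[:3])   # PC scores separate the two structure classes
```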

  12. Application of data science tools to quantify and distinguish between structures and models in molecular dynamics datasets

    International Nuclear Information System (INIS)

    Kalidindi, Surya R; Gomberg, Joshua A; Trautt, Zachary T; Becker, Chandler A

    2015-01-01

    Structure quantification is key to successful mining and extraction of core materials knowledge from both multiscale simulations as well as multiscale experiments. The main challenge stems from the need to transform the inherently high dimensional representations demanded by the rich hierarchical material structure into useful, high value, low dimensional representations. In this paper, we develop and demonstrate the merits of a data-driven approach for addressing this challenge at the atomic scale. The approach presented here is built on prior successes demonstrated for mesoscale representations of material internal structure, and involves three main steps: (i) digital representation of the material structure, (ii) extraction of a comprehensive set of structure measures using the framework of n-point spatial correlations, and (iii) identification of data-driven low dimensional measures using principal component analyses. These novel protocols, applied on an ensemble of structure datasets output from molecular dynamics (MD) simulations, have successfully classified the datasets based on several model input parameters such as the interatomic potential and the temperature used in the MD simulations. (paper)

  13. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmuller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer-generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  14. BASE MAP DATASET, MAYES COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications: cadastral, geodetic control,...

  15. APPLICATION OF THE BAGGING TECHNIQUE TO CLASSIFICATION ALGORITHMS TO ADDRESS CLASS IMBALANCE IN MEDICAL DATASETS

    Directory of Open Access Journals (Sweden)

    Rizki Tri Prasetio

    2016-03-01

    ABSTRACT – The class imbalance problem has been reported to severely hinder the classification performance of many standard learning algorithms, and has attracted a great deal of attention from researchers in different fields. Therefore, a number of methods, such as sampling methods, cost-sensitive learning methods, and bagging- and boosting-based ensemble methods, have been proposed to solve these problems. Some medical datasets have two (binominal) classes whose imbalance causes a lack of accuracy in classification. This research proposes a combination of the bagging technique with classification algorithms to improve the accuracy on medical datasets, using bagging to address the problem of imbalanced classes. The proposed method is applied to three classifier algorithms, i.e., naïve Bayes, decision tree and k-nearest neighbor. This research uses five medical datasets obtained from the UCI Machine Learning repository, i.e., breast-cancer, liver-disorder, heart-disease, pima-diabetes and vertebral-column. The results indicate that the proposed method yields a significant improvement for two of the classification algorithms, i.e., decision tree (t-test p value 0.0184) and k-nearest neighbor (t-test p value 0.0292), but not for naïve Bayes (t-test p value 0.9236). After the bagging technique was applied to the five medical datasets, naïve Bayes had the highest accuracy for the breast-cancer dataset at 96.14% with an AUC of 0.984, for heart-disease at 84.44% with an AUC of 0.911, and for pima-diabetes at 74.73% with an AUC of 0.806, while k-nearest neighbor had the best accuracy for the liver-disorder dataset at 62.03% with an AUC of 0.632 and for vertebral-column at 82.26% with an AUC of 0.867. Keywords: ensemble technique, bagging, imbalanced class, medical dataset.
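
    A minimal sketch of bagging wrapped around a base classifier and scored by AUC, assuming scikit-learn; the bundled breast-cancer data stands in for the UCI files used in the study:

```python
# Compare a single decision tree with a bagged ensemble of trees by AUC.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

tree = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(tree, n_estimators=50, random_state=0)

for name, model in [("single tree", tree), ("bagged trees", bagged)]:
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```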

  16. International Comprehensive Ocean Atmosphere Data Set (ICOADS) in Near-Real Time (NRT)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Near-Real-Time (NRT) product is an extension of the official ICOADS dataset with preliminary...

  17. CERC Dataset (Full Hadza Data)

    DEFF Research Database (Denmark)

    2016-01-01

    The dataset includes demographic, behavioral, and religiosity data from eight different populations from around the world. The samples were drawn from: (1) Coastal and (2) Inland Tanna, Vanuatu; (3) Hadzaland, Tanzania; (4) Lovu, Fiji; (5) Pointe aux Piment, Mauritius; (6) Pesqueiro, Brazil; (7) Kyzyl, Tyva Republic; and (8) Yasawa, Fiji. Related publication: Purzycki, et al. (2016). Moralistic Gods, Supernatural Punishment and the Expansion of Human Sociality. Nature, 530(7590): 327-330.

  18. A new integrated and homogenized global monthly land surface air temperature dataset for the period since 1900

    Science.gov (United States)

    Xu, Wenhui; Li, Qingxiang; Jones, Phil; Wang, Xiaolan L.; Trewin, Blair; Yang, Su; Zhu, Chen; Zhai, Panmao; Wang, Jinfeng; Vincent, Lucie; Dai, Aiguo; Gao, Yun; Ding, Yihui

    2018-04-01

    A new dataset of integrated and homogenized monthly surface air temperature over global land for the period since 1900 [China Meteorological Administration global Land Surface Air Temperature (CMA-LSAT)] is developed. In total, 14 sources have been collected and integrated into the newly developed dataset, including three global (CRUTEM4, GHCN, and BEST), three regional and eight national sources. Duplicate stations are identified, and those with the higher priority are chosen or spliced. Then, a consistency test and a climate outlier test are conducted to ensure that each station series is quality controlled. Next, two steps are adopted to assure the homogeneity of the station series: (1) homogenized station series in existing national datasets (by National Meteorological Services) are directly integrated into the dataset without any changes (50% of all stations), and (2) the inhomogeneities are detected and adjusted for in the remaining data series using a penalized maximal t test (50% of all stations). Based on the dataset, we re-assess the temperature changes in global and regional areas compared with GHCN-V3 and CRUTEM4, as well as the temperature changes during the three periods of 1900-2014, 1979-2014 and 1998-2014. The best estimates of warming trends and their 95% confidence ranges for 1900-2014 are approximately 0.102 ± 0.006 °C/decade for the whole year, and 0.104 ± 0.009, 0.112 ± 0.007, 0.090 ± 0.006, and 0.092 ± 0.007 °C/decade for the DJF (December, January, February), MAM, JJA, and SON seasons, respectively. MAM saw the most significant warming trend in both 1900-2014 and 1979-2014. For an even shorter and more recent period (1998-2014), MAM, JJA and SON show similar warming trends, while DJF shows opposite trends. The results show that the ability of CMA-LSAT to describe global temperature changes is similar to that of other existing products, while there are some differences when describing regional temperature changes.
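
    For intuition, a simplified, unpenalised version of the maximal t statistic scan used in such homogeneity testing; the operational penalized maximal t (PMT) test adds an empirical penalty factor that is omitted here for brevity:

```python
# Scan all candidate breakpoints of a series and take the maximal |t|.
import numpy as np

def maximal_t(series: np.ndarray):
    """Return (best split index, max |t|) over interior split points."""
    n = len(series)
    best = (None, 0.0)
    for k in range(5, n - 5):             # keep both segments non-trivial
        a, b = series[:k], series[k:]
        sp = np.sqrt(((len(a) - 1) * a.var(ddof=1)
                      + (len(b) - 1) * b.var(ddof=1)) / (n - 2))
        t = abs(a.mean() - b.mean()) / (sp * np.sqrt(1 / len(a) + 1 / len(b)))
        if t > best[1]:
            best = (k, t)
    return best

rng = np.random.default_rng(0)
series = rng.normal(size=120)
series[70:] += 0.8                        # artificial inhomogeneity at index 70
print(maximal_t(series))                  # breakpoint detected near index 70
```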

  19. Error characterisation of global active and passive microwave soil moisture datasets

    Directory of Open Access Journals (Sweden)

    W. A. Dorigo

    2010-12-01

    Understanding the error structures of remotely sensed soil moisture observations is essential for correctly interpreting observed variations and trends in the data or assimilating them in hydrological or numerical weather prediction models. Nevertheless, a spatially coherent assessment of the quality of the various globally available datasets is often hampered by the limited availability over space and time of reliable in-situ measurements. As an alternative, this study explores the triple collocation error estimation technique for assessing the relative quality of several globally available soil moisture products from active (ASCAT) and passive (AMSR-E and SSM/I) microwave sensors. The triple collocation is a powerful statistical tool to estimate the root mean square error while simultaneously solving for systematic differences in the climatologies of a set of three linearly related data sources with independent error structures. Prerequisite for this technique is the availability of a sufficiently large number of timely corresponding observations. In addition to the active and passive satellite-based datasets, we used the ERA-Interim and GLDAS-NOAH reanalysis soil moisture datasets as a third, independent reference. The prime objective is to reveal trends in uncertainty related to different observation principles (passive versus active), the use of different frequencies (C-, X-, and Ku-band) for passive microwave observations, and the choice of the independent reference dataset (ERA-Interim versus GLDAS-NOAH). The results suggest that the triple collocation method provides realistic error estimates. Observed spatial trends agree well with the existing theory and studies on the performance of different observation principles and frequencies with respect to land cover and vegetation density. In addition, if all theoretical prerequisites are fulfilled (e.g. a sufficiently large number of common observations is available and errors of the different
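
    The covariance form of the triple collocation estimate is compact enough to sketch directly; the three synthetic products below mimic an active, a passive and a reanalysis dataset with independent errors (a real application would first rescale the products to a common climatology):

```python
# Textbook triple-collocation error-variance estimate for three datasets
# that observe the same truth with independent errors.
import numpy as np

rng = np.random.default_rng(0)
truth = rng.normal(size=5000)
x = truth + 0.3 * rng.normal(size=5000)   # e.g. active product
y = truth + 0.5 * rng.normal(size=5000)   # e.g. passive product
z = truth + 0.4 * rng.normal(size=5000)   # e.g. reanalysis reference

C = np.cov([x, y, z])
err_var = [
    C[0, 0] - C[0, 1] * C[0, 2] / C[1, 2],  # error variance of x
    C[1, 1] - C[0, 1] * C[1, 2] / C[0, 2],  # error variance of y
    C[2, 2] - C[0, 2] * C[1, 2] / C[0, 1],  # error variance of z
]
print(np.sqrt(err_var))   # RMSEs, approximately [0.3, 0.5, 0.4]
```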

  20. Synthetic ALSPAC longitudinal datasets for the Big Data VR project.

    Science.gov (United States)

    Avraam, Demetris; Wilson, Rebecca C; Burton, Paul

    2017-01-01

    Three synthetic datasets - of observation size 15,000, 155,000 and 1,555,000 participants, respectively - were created by simulating eleven cardiac and anthropometric variables from nine collection ages of the ALSPAC birth cohort study. The synthetic datasets retain similar data properties to the ALSPAC study data they are simulated from (covariance matrices, as well as the mean and variance values of the variables) without including the original data itself or disclosing participant information. In this instance, the three synthetic datasets have been utilised in an academia-industry collaboration to build a prototype virtual reality data analysis software, but they could have a broader use in method and software development projects where sensitive data cannot be freely shared.
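
    A minimal sketch of the simulation idea: draw synthetic participants from a mean vector and covariance matrix so that aggregate structure is preserved without releasing any original record. The variables and values below are toy assumptions, not ALSPAC statistics:

```python
# Simulate synthetic participants that reproduce means and covariances.
import numpy as np

mean = np.array([165.0, 60.0, 70.0])             # height, weight, heart rate
cov = np.array([[80.0, 45.0,  5.0],
                [45.0, 90.0,  8.0],
                [ 5.0,  8.0, 60.0]])

rng = np.random.default_rng(42)
synthetic = rng.multivariate_normal(mean, cov, size=15_000)

print(synthetic.mean(axis=0))                    # close to the target means
print(np.cov(synthetic, rowvar=False).round(1))  # close to the target covariance
```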

  1. BASE MAP DATASET, HONOLULU COUNTY, HAWAII, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  2. BASE MAP DATASET, LOS ANGELES COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  3. BASE MAP DATASET, CHEROKEE COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  4. BASE MAP DATASET, EDGEFIELD COUNTY, SOUTH CAROLINA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  5. BASE MAP DATASET, SANTA CRUZ COUNTY, CALIFORNIA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — FEMA Framework Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme,...

  6. FASTQSim: platform-independent data characterization and in silico read generation for NGS datasets.

    Science.gov (United States)

    Shcherbina, Anna

    2014-08-15

    High-throughput next generation sequencing technologies have enabled rapid characterization of clinical and environmental samples. Consequently, the largest bottleneck to actionable data has become sample processing and bioinformatics analysis, creating a need for accurate and rapid algorithms to process genetic data. Perfectly characterized in silico datasets are a useful tool for evaluating the performance of such algorithms: background contaminating organisms are observed in real sequenced mixtures of organisms, whereas in silico samples provide exact ground truth. To create the best value for evaluating algorithms, in silico data should mimic actual sequencer data as closely as possible. FASTQSim is a tool that provides the dual functionality of NGS dataset characterization and metagenomic data generation. FASTQSim is sequencing-platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with specific error profiles obtained in the characterization step. FASTQSim enables users to assess the quality of NGS datasets. The tool provides information about read length, read quality, repetitive and non-repetitive indel profiles, and single base pair substitutions. FASTQSim allows the user to simulate individual read datasets that can be used as standardized test scenarios for planning sequencing projects or for benchmarking metagenomic software. In this regard, in silico datasets generated with the FASTQSim tool hold several advantages over natural datasets: they are sequencing platform independent, extremely well characterized, and less expensive to generate. Such datasets are valuable in a number of applications, including the training of assemblers for multiple platforms, benchmarking bioinformatics algorithm performance, and creating challenge

  7. Se-SAD serial femtosecond crystallography datasets from selenobiotinyl-streptavidin

    Science.gov (United States)

    Yoon, Chun Hong; Demirci, Hasan; Sierra, Raymond G.; Dao, E. Han; Ahmadi, Radman; Aksit, Fulya; Aquila, Andrew L.; Batyuk, Alexander; Ciftci, Halilibrahim; Guillet, Serge; Hayes, Matt J.; Hayes, Brandon; Lane, Thomas J.; Liang, Meng; Lundström, Ulf; Koglin, Jason E.; Mgbam, Paul; Rao, Yashas; Rendahl, Theodore; Rodriguez, Evan; Zhang, Lindsey; Wakatsuki, Soichi; Boutet, Sébastien; Holton, James M.; Hunter, Mark S.

    2017-04-01

    We provide a detailed description of selenobiotinyl-streptavidin (Se-B SA) co-crystal datasets recorded using the Coherent X-ray Imaging (CXI) instrument at the Linac Coherent Light Source (LCLS) for selenium single-wavelength anomalous diffraction (Se-SAD) structure determination. Se-B SA was chosen as the model system for its high affinity between biotin and streptavidin where the sulfur atom in the biotin molecule (C10H16N2O3S) is substituted with selenium. The dataset was collected at three different transmissions (100, 50, and 10%) using a serial sample chamber setup which allows for two sample chambers, a front chamber and a back chamber, to operate simultaneously. Diffraction patterns from Se-B SA were recorded to a resolution of 1.9 Å. The dataset is publicly available through the Coherent X-ray Imaging Data Bank (CXIDB) and also on LCLS compute nodes as a resource for research and algorithm development.

  8. Dataset of transcriptional landscape of B cell early activation

    Directory of Open Access Journals (Sweden)

    Alexander S. Garruss

    2015-09-01

    Signaling via B cell receptors (BCR) and Toll-like receptors (TLRs) results in activation of B cells with distinct physiological outcomes, but the transcriptional regulatory mechanisms that drive activation and distinguish these pathways remain unknown. At early time points after BCR and TLR ligand exposure, 0.5 and 2 h, RNA-seq was performed, allowing observations on rapid transcriptional changes. At 2 h, ChIP-seq was performed to allow observations on important regulatory mechanisms potentially driving transcriptional change. The dataset includes RNA-seq, ChIP-seq of control (Input), RNA Pol II, H3K4me3, H3K27me3, and a separate RNA-seq for miRNA expression, which can be found at Gene Expression Omnibus Dataset GSE61608. Here, we provide details on the experimental and analysis methods used to obtain and analyze this dataset and to examine the transcriptional landscape of B cell early activation.

  9. U.S. Climate Divisional Dataset (Version Superseded)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This data has been superseded by a newer version of the dataset. Please refer to NOAA's Climate Divisional Database for more information. The U.S. Climate Divisional...

  10. UK surveillance: provision of quality assured information from combined datasets.

    Science.gov (United States)

    Paiba, G A; Roberts, S R; Houston, C W; Williams, E C; Smith, L H; Gibbens, J C; Holdship, S; Lysons, R

    2007-09-14

    Surveillance information is most useful when provided within a risk framework, which is achieved by presenting results against an appropriate denominator. Often the datasets are captured separately and for different purposes, and will have inherent errors and biases that can be further confounded by the act of merging. The United Kingdom Rapid Analysis and Detection of Animal-related Risks (RADAR) system contains data from several sources and provides both data extracts for research purposes and reports for wider stakeholders. Considerable efforts are made to optimise the data in RADAR during the Extraction, Transformation and Loading (ETL) process. Despite efforts to ensure data quality, the final dataset inevitably contains some data errors and biases, most of which cannot be rectified during subsequent analysis. So, in order for users to establish the 'fitness for purpose' of data merged from more than one data source, Quality Statements are produced as defined within the overarching surveillance Quality Framework. These documents detail identified data errors and biases following ETL and report construction as well as relevant aspects of the datasets from which the data originated. This paper illustrates these issues using RADAR datasets, and describes how they can be minimised.

  11. Climate Prediction Center IR 4km Dataset

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — CPC IR 4km dataset was created from all available individual geostationary satellite data which have been merged to form nearly seamless global (60N-60S) IR...

  12. Multivariate Analysis of Multiple Datasets: a Practical Guide for Chemical Ecology.

    Science.gov (United States)

    Hervé, Maxime R; Nicolè, Florence; Lê Cao, Kim-Anh

    2018-03-01

    Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analyses of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of a biological system under study. The present article provides a comprehensive resource for analyzing such complex datasets using multivariate methods, from the necessary pre-treatment of data, including data transformations and distance calculations, to the application of both gold-standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis, along with detailed interpretation of the results, for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools, with reproducible R code and chemical-ecological datasets, to practice and teach multivariate methods.

  13. Self-consistent asset pricing models

    Science.gov (United States)

    Malevergne, Y.; Sornette, D.

    2007-08-01

    We discuss the foundations of factor or regression models in the light of the self-consistency condition that the market portfolio (and more generally the risk factors) is (are) constituted of the assets whose returns it is (they are) supposed to explain. As already reported in several articles, self-consistency implies correlations between the return disturbances. As a consequence, the alphas and betas of the factor model are unobservable. Self-consistency leads to renormalized betas with zero effective alphas, which are observable with standard OLS regressions. When the conditions derived from internal consistency are not met, the model is necessarily incomplete, which means that some sources of risk cannot be replicated (or hedged) by a portfolio of stocks traded on the market, even for infinite economies. Analytical derivations and numerical simulations show that, for arbitrary choices of the proxy which are different from the true market portfolio, a modified linear regression holds with a non-zero value αi at the origin between an asset i's return and the proxy's return. Self-consistency also introduces “orthogonality” and “normality” conditions linking the betas, alphas (as well as the residuals) and the weights of the proxy portfolio. Two diagnostics based on these orthogonality and normality conditions are implemented on a basket of 323 assets which have been components of the S&P500 in the period from January 1990 to February 2005. These two diagnostics show interesting departures from dynamical self-consistency starting about 2 years before the end of the Internet bubble. Assuming that the CAPM holds with the self-consistency condition, the OLS method automatically obeys the resulting orthogonality and normality conditions and therefore provides a simple way to self-consistently assess the parameters of the model by using proxy portfolios made only of the assets which are used in the CAPM regressions.

  14. Construct validity and internal consistency reliability of the Malay version of the 21-item depression anxiety stress scale (Malay-DASS-21) among male outpatient clinic attendees in Johor.

    Science.gov (United States)

    Rusli, B N; Amrina, K; Trived, S; Loh, K P; Shashi, M

    2017-10-01

    The 21-item English version of the Depression Anxiety Stress Scale (DASS-21) has been proposed as a method for assessing self-perceived depression, anxiety and stress over the past week in various clinical and nonclinical populations. Several Malay versions of the DASS-21 have been validated in various populations with varying success. One particular Malay version has been validated in various occupational groups (such as nurses and automotive workers) but not among male clinic outpatient attendees in Malaysia. The aim of this study was to validate the Malay version of the DASS-21 (Malay-DASS-21) among male outpatient clinic attendees in Johor. A validation study with a random sample of 402 male respondents attending the outpatient clinic of a major public outpatient clinic in Johor Bahru and Segamat was carried out from January to March 2016. Construct validity of the Malay-DASS-21 was examined using Exploratory Factor Analysis (KMO = 0.947; Bartlett's test of sphericity significant at p < 0.001) and internal consistency reliability using Cronbach's alpha. Construct validity of the Malay-DASS-21 based on eigenvalues and factor loadings to confirm the three-factor structure (depression, anxiety, and stress) was acceptable. The internal consistency reliability of the factor construct was very impressive, with Cronbach's alpha values in the range of 0.837 to 0.863. The present study showed that the Malay-DASS-21 has acceptable psychometric construct validity and high internal consistency reliability to measure self-perceived depression, anxiety and stress over the past week in male outpatient clinic attendees in Johor. Further studies are necessary to revalidate the Malay-DASS-21 across different populations and cultures, and using confirmatory factor analyses.
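
    For readers unfamiliar with the reliability statistic reported here, the following is a minimal sketch of Cronbach's alpha; the simulated seven-item response matrix is invented for illustration and has no connection to the study's data.

    ```python
    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score)
    import numpy as np

    def cronbach_alpha(items):
        """items: (n_respondents, n_items) array of item scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
        return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(100, 1))                    # shared trait
    responses = latent + 0.5 * rng.normal(size=(100, 7))  # 7 correlated items
    print(round(cronbach_alpha(responses), 3))            # high alpha expected
    ```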

  15. Domain Adaptation for Pedestrian Detection Based on Prediction Consistency

    Directory of Open Access Journals (Sweden)

    Yu Li-ping

    2014-01-01

    Pedestrian detection is an active area of research in computer vision. It remains a quite challenging problem in many applications where many factors cause a mismatch between the source dataset used to train the pedestrian detector and samples in the target scene. In this paper, we propose a novel domain adaptation model for merging plentiful source domain samples with scarce target domain samples to create a scene-specific pedestrian detector that performs as well as if rich target domain samples were present. Our approach combines the boosting-based learning algorithm with an entropy-based transferability score, which is derived from the prediction consistency with the source classifications, to selectively choose the source domain samples showing positive transferability to the target domain. Experimental results show that our approach can improve the detection rate, especially with insufficient labeled data in the target scene.
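
    The abstract does not give the exact weighting formula, so the sketch below shows one plausible entropy-based transferability score: source samples with confident (low-entropy) predictions keep a weight near 1, while ambiguous samples are down-weighted toward 0. The exact weighting used by the authors may differ.

    ```python
    # Entropy-based transferability: 1 at full confidence, 0 at a uniform prediction.
    import math

    def transferability(class_probs):
        """class_probs: predicted class distribution for one source sample."""
        k = len(class_probs)
        entropy = -sum(p * math.log(p) for p in class_probs if p > 0)
        return 1.0 - entropy / math.log(k)

    print(transferability([0.9, 0.1]))  # confident prediction -> weight ~0.53
    print(transferability([0.5, 0.5]))  # ambiguous prediction -> weight 0.0
    ```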

  16. Large Scale Flood Risk Analysis using a New Hyper-resolution Population Dataset

    Science.gov (United States)

    Smith, A.; Neal, J. C.; Bates, P. D.; Quinn, N.; Wing, O.

    2017-12-01

    Here we present the first national-scale flood risk analyses using high-resolution Facebook Connectivity Lab population data and data from a hyper-resolution flood hazard model. In recent years the field of large-scale hydraulic modelling has been transformed by new remotely sensed datasets, improved process representation, highly efficient flow algorithms and increases in computational power. These developments have allowed flood risk analysis to be undertaken in previously unmodeled territories and from continental to global scales. Flood risk analyses are typically conducted via the integration of modelled water depths with an exposure dataset. Over large scales and in data-poor areas, these exposure data typically take the form of a gridded population dataset, estimating population density using remotely sensed data and/or locally available census data. The local nature of flooding dictates that for robust flood risk analysis to be undertaken, both hazard and exposure data should sufficiently resolve local-scale features. Global flood frameworks are enabling flood hazard data to be produced at 90 m resolution, resulting in a mismatch with available population datasets, which are typically more coarsely resolved. Moreover, these exposure data are typically focused on urban areas and struggle to represent rural populations. In this study we integrate a new population dataset with a global flood hazard model. The population dataset was produced by the Connectivity Lab at Facebook, providing gridded population data at 5 m resolution, representing a resolution increase over previous countrywide datasets of multiple orders of magnitude. Flood risk analyses undertaken over a number of developing countries are presented, along with a comparison of flood risk analyses undertaken using pre-existing population datasets.
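
    The core integration step is conceptually simple: overlay modelled water depths on the population grid and total the population in cells that flood. A toy sketch with synthetic grids follows; in practice the 5 m population data and 90 m hazard data would first be resampled to a common grid, and the threshold is illustrative.

    ```python
    # Count the population in cells whose modelled water depth exceeds a threshold.
    import numpy as np

    rng = np.random.default_rng(42)
    depth_m = rng.exponential(scale=0.3, size=(100, 100))  # modelled water depth (m)
    population = rng.poisson(lam=2.0, size=(100, 100))     # people per grid cell

    THRESHOLD_M = 0.1  # depth above which a cell counts as flooded
    exposed = population[depth_m > THRESHOLD_M].sum()
    print(f"exposed population: {exposed} of {population.sum()}")
    ```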

  17. Comparing the accuracy of food outlet datasets in an urban environment

    Directory of Open Access Journals (Sweden)

    Michelle S. Wong

    2017-05-01

    Studies that investigate the relationship between the retail food environment and health outcomes often use geospatial datasets. Prior studies have identified challenges of using the most common data sources. Retail food environment datasets created through academic-government partnership present an alternative, but their validity (retail existence, type, location) has not been assessed yet. In our study, we used ground-truth data to compare the validity of two datasets, a 2015 commercial dataset (InfoUSA) and data collected from 2012 to 2014 through the Maryland Food Systems Mapping Project (MFSMP), an academic-government partnership, on the retail food environment in two low-income, inner city neighbourhoods in Baltimore City. We compared sensitivity and positive predictive value (PPV) of the commercial and academic-government partnership data to ground-truth data for two broad categories of unhealthy food retailers: small food retailers and quick-service restaurants. Ground-truth data was collected in 2015 and analysed in 2016. Compared to the ground-truth data, MFSMP and InfoUSA generally had similar sensitivity that was greater than 85%. MFSMP had higher PPV compared to InfoUSA for both small food retailers (MFSMP: 56.3% vs InfoUSA: 40.7%) and quick-service restaurants (MFSMP: 58.6% vs InfoUSA: 36.4%). We conclude that data from academic-government partnerships like MFSMP might be an attractive alternative option and improvement to relying only on commercial data. Other research institutes or cities might consider efforts to create and maintain such an environmental dataset. Even if these datasets cannot be updated on an annual basis, they are likely more accurate than commercial data.

  18. Comparing the accuracy of food outlet datasets in an urban environment.

    Science.gov (United States)

    Wong, Michelle S; Peyton, Jennifer M; Shields, Timothy M; Curriero, Frank C; Gudzune, Kimberly A

    2017-05-11

    Studies that investigate the relationship between the retail food environment and health outcomes often use geospatial datasets. Prior studies have identified challenges of using the most common data sources. Retail food environment datasets created through academic-government partnership present an alternative, but their validity (retail existence, type, location) has not been assessed yet. In our study, we used ground-truth data to compare the validity of two datasets, a 2015 commercial dataset (InfoUSA) and data collected from 2012 to 2014 through the Maryland Food Systems Mapping Project (MFSMP), an academic-government partnership, on the retail food environment in two low-income, inner city neighbourhoods in Baltimore City. We compared sensitivity and positive predictive value (PPV) of the commercial and academic-government partnership data to ground-truth data for two broad categories of unhealthy food retailers: small food retailers and quick-service restaurants. Ground-truth data was collected in 2015 and analysed in 2016. Compared to the ground-truth data, MFSMP and InfoUSA generally had similar sensitivity that was greater than 85%. MFSMP had higher PPV compared to InfoUSA for both small food retailers (MFSMP: 56.3% vs InfoUSA: 40.7%) and quick-service restaurants (MFSMP: 58.6% vs InfoUSA: 36.4%). We conclude that data from academic-government partnerships like MFSMP might be an attractive alternative option and improvement to relying only on commercial data. Other research institutes or cities might consider efforts to create and maintain such an environmental dataset. Even if these datasets cannot be updated on an annual basis, they are likely more accurate than commercial data.
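
    The two validity metrics reduce to simple count ratios once dataset records have been matched against the ground truth; a small sketch follows (the counts are hypothetical, not the study's).

    ```python
    # Sensitivity: share of ground-truth outlets the dataset found.
    # PPV: share of dataset records confirmed on the ground.
    def sensitivity(true_pos, false_neg):
        return true_pos / (true_pos + false_neg)

    def ppv(true_pos, false_pos):
        return true_pos / (true_pos + false_pos)

    tp, fp, fn = 58, 41, 8  # hypothetical match counts for one outlet category
    print(f"sensitivity = {sensitivity(tp, fn):.1%}, PPV = {ppv(tp, fp):.1%}")
    ```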

  19. Global-scale evaluation of 22 precipitation datasets using gauge observations and hydrological modeling

    Directory of Open Access Journals (Sweden)

    H. E. Beck

    2017-12-01

    We undertook a comprehensive evaluation of 22 gridded, (quasi-)global, (sub-)daily precipitation (P) datasets for the period 2000–2016. Thirteen non-gauge-corrected P datasets were evaluated using daily P gauge observations from 76 086 gauges worldwide. Another nine gauge-corrected datasets were evaluated using hydrological modeling, by calibrating the HBV conceptual model against streamflow records for each of 9053 small to medium-sized (< 50 000 km2) catchments worldwide, and comparing the resulting performance. Marked differences in spatio-temporal patterns and accuracy were found among the datasets. Among the uncorrected P datasets, the satellite- and reanalysis-based MSWEP-ng V1.2 and V2.0 datasets generally showed the best temporal correlations with the gauge observations, followed by the reanalyses (ERA-Interim, JRA-55, and NCEP-CFSR) and the satellite- and reanalysis-based CHIRP V2.0 dataset, then the estimates based primarily on passive microwave remote sensing of rainfall (CMORPH V1.0, GSMaP V5/6, and TMPA 3B42RT V7) or near-surface soil moisture (SM2RAIN-ASCAT), and finally, estimates based primarily on thermal infrared imagery (GridSat V1.0, PERSIANN, and PERSIANN-CCS). Two of the three reanalyses (ERA-Interim and JRA-55) unexpectedly obtained lower trend errors than the satellite datasets. Among the corrected P datasets, the ones directly incorporating daily gauge data (CPC Unified, and MSWEP V1.2 and V2.0) generally provided the best calibration scores, although the good performance of the fully gauge-based CPC Unified is unlikely to translate to sparsely gauged or ungauged regions. Next best results were obtained with P estimates directly incorporating temporally coarser gauge data (CHIRPS V2.0, GPCP-1DD V1.2, TMPA 3B42 V7, and WFDEI-CRU), which in turn outperformed the one indirectly incorporating gauge data through another multi-source dataset (PERSIANN-CDR V1R1). Our results highlight large differences in estimation accuracy.
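
    For the non-gauge-corrected datasets, the evaluation hinges on per-gauge temporal correlations between the gauge series and the co-located grid cell; a minimal sketch follows, with synthetic daily series standing in for real gauge and dataset values.

    ```python
    # Pearson temporal correlation between a gauge and a co-located dataset series.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    days = pd.date_range("2000-01-01", periods=365, freq="D")
    gauge = pd.Series(rng.gamma(shape=0.5, scale=4.0, size=len(days)), index=days)
    estimate = pd.Series(0.8 * gauge.values + rng.normal(scale=1.0, size=len(days)),
                         index=days)  # imperfect gridded estimate

    print(f"temporal correlation at this gauge: r = {gauge.corr(estimate):.2f}")
    ```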

  20. Creation of the Naturalistic Engagement in Secondary Tasks (NEST) distracted driving dataset.

    Science.gov (United States)

    Owens, Justin M; Angell, Linda; Hankey, Jonathan M; Foley, James; Ebe, Kazutoshi

    2015-09-01

    Distracted driving has become a topic of critical importance to driving safety research over the past several decades. Naturalistic driving data offer a unique opportunity to study how drivers engage with secondary tasks in real-world driving; however, the complexities involved with identifying and coding relevant epochs of naturalistic data have limited its accessibility to the general research community. This project was developed to help address this problem by creating an accessible dataset of driver behavior and situational factors observed during distraction-related safety-critical events and baseline driving epochs, using the Strategic Highway Research Program 2 (SHRP2) naturalistic dataset. The new NEST (Naturalistic Engagement in Secondary Tasks) dataset was created using crashes and near-crashes from the SHRP2 dataset that were identified as including secondary task engagement as a potential contributing factor. Data coding included frame-by-frame video analysis of secondary task and hands-on-wheel activity, as well as summary event information. In addition, information about each secondary task engagement within the trip prior to the crash/near-crash was coded at a higher level. Data were also coded for four baseline epochs and trips per safety-critical event. In total, 1,180 events and baseline epochs were coded, and a dataset was constructed. The project team is currently working to determine the most useful way to allow broad public access to the dataset. We anticipate that the NEST dataset will be extraordinarily useful in allowing qualified researchers access to timely, real-world data concerning how drivers interact with secondary tasks during safety-critical events and baseline driving. The coded dataset developed for this project will allow future researchers to have access to detailed data on driver secondary task engagement in the real world. It will be useful for standalone research, as well as for integration with additional SHRP2 data.

  1. A multimodal dataset for authoring and editing multimedia content: The MAMEM project

    Directory of Open Access Journals (Sweden)

    Spiros Nikolopoulos

    2017-12-01

    We present a dataset that combines multimodal biosignals and eye tracking information gathered under a human-computer interaction framework. The dataset was developed in the vein of the MAMEM project that aims to endow people with motor disabilities with the ability to edit and author multimedia content through mental commands and gaze activity. The dataset includes EEG, eye-tracking, and physiological (GSR and heart rate) signals collected from 34 individuals (18 able-bodied and 16 motor-impaired). Data were collected during interaction with a specifically designed interface for web browsing and multimedia content manipulation, and during imaginary movement tasks. The presented dataset will contribute towards the development and evaluation of modern human-computer interaction systems that would foster the integration of people with severe motor impairments back into society.

  2. Gene-Environment Interplay in Internalizing Disorders: Consistent Findings across Six Environmental Risk Factors

    Science.gov (United States)

    Hicks, Brian M.; Dirago, Ana C.; Iacono, William G.; McGue, Matt

    2009-01-01

    Background: Behavior genetic methods can help to elucidate gene-environment (G-E) interplay in the development of internalizing (INT) disorders (i.e., major depression and anxiety disorders). To date, however, no study has conducted a comprehensive analysis examining multiple environmental risk factors with the purpose of delineating general…

  3. An energetically consistent vertical mixing parameterization in CCSM4

    DEFF Research Database (Denmark)

    Nielsen, Søren Borg; Jochum, Markus; Eden, Carsten

    2018-01-01

    An energetically consistent, stratification-dependent vertical mixing parameterization is implemented in the Community Climate System Model 4 and forced with energy conversion from the barotropic tides to internal waves. The structures of the resulting dissipation and diffusivity fields are compared... The simulated ocean state, however, depends greatly on the details of the vertical mixing parameterization, where the new energetically consistent parameterization results in low thermocline diffusivities and a sharper and shallower thermocline. It is also investigated whether the ocean state is more sensitive to a change in forcing...

  4. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    Data.gov (United States)

    National Aeronautics and Space Administration — We propose to mine and utilize the combination of Earth Science dataset, metadata with usage metrics and user feedback to objectively extract relevance for improved...

  5. An integrated dataset for in silico drug discovery

    Directory of Open Access Journals (Sweden)

    Cockell Simon J

    2010-12-01

    Drug development is expensive and prone to failure. It is potentially much less risky and expensive to reuse a drug developed for one condition for treating a second disease, than it is to develop an entirely new compound. Systematic approaches to drug repositioning are needed to increase throughput and find candidates more reliably. Here we address this need with an integrated systems biology dataset, developed using the Ondex data integration platform, for the in silico discovery of new drug repositioning candidates. We demonstrate that the information in this dataset allows known repositioning examples to be discovered. We also propose a means of automating the search for new treatment indications of existing compounds.

  6. Geographic information system datasets of regolith-thickness data, regolith-thickness contours, raster-based regolith thickness, and aquifer-test and specific-capacity data for the Lost Creek Designated Ground Water Basin, Weld, Adams, and Arapahoe Counties, Colorado

    Science.gov (United States)

    Arnold, L. Rick

    2010-01-01

    These datasets were compiled in support of U.S. Geological Survey Scientific-Investigations Report 2010-5082-Hydrogeology and Steady-State Numerical Simulation of Groundwater Flow in the Lost Creek Designated Ground Water Basin, Weld, Adams, and Arapahoe Counties, Colorado. The datasets were developed by the U.S. Geological Survey in cooperation with the Lost Creek Ground Water Management District and the Colorado Geological Survey. The four datasets are described as follows and methods used to develop the datasets are further described in Scientific-Investigations Report 2010-5082: (1) ds507_regolith_data: This point dataset contains geologic information concerning regolith (unconsolidated sediment) thickness and top-of-bedrock altitude at selected well and test-hole locations in and near the Lost Creek Designated Ground Water Basin, Weld, Adams, and Arapahoe Counties, Colorado. Data were compiled from published reports, consultant reports, and from lithologic logs of wells and test holes on file with the U.S. Geological Survey Colorado Water Science Center and the Colorado Division of Water Resources. (2) ds507_regthick_contours: This dataset consists of contours showing generalized lines of equal regolith thickness overlying bedrock in the Lost Creek Designated Ground Water Basin, Weld, Adams, and Arapahoe Counties, Colorado. Regolith thickness was contoured manually on the basis of information provided in the dataset ds507_regolith_data. (3) ds507_regthick_grid: This dataset consists of raster-based generalized thickness of regolith overlying bedrock in the Lost Creek Designated Ground Water Basin, Weld, Adams, and Arapahoe Counties, Colorado. Regolith thickness in this dataset was derived from contours presented in the dataset ds507_regthick_contours. (4) ds507_welltest_data: This point dataset contains estimates of aquifer transmissivity and hydraulic conductivity at selected well locations in the Lost Creek Designated Ground Water Basin, Weld, Adams, and Arapahoe Counties, Colorado.

  7. Probabilistic and machine learning-based retrieval approaches for biomedical dataset retrieval

    Science.gov (United States)

    Karisani, Payam; Qin, Zhaohui S; Agichtein, Eugene

    2018-01-01

    The bioCADDIE dataset retrieval challenge brought together different approaches to retrieval of biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe experiments in applying a data-driven, machine learning-based approach to biomedical dataset retrieval as part of this challenge. We report on a series of experiments carried out to evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval, as applied to this challenge. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than other methods. We also show that although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models are not able to improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine to provide access to biomedical datasets. The retrieval performance is expected to be further improved by using additional training data that is created by expert annotation, or gathered through usage logs, clicks and other processes during natural operation of the system. Database URL: https://github.com/emory-irlab/biocaddie
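
    As a toy illustration of the paper's main finding (boosting the weights of important keywords in a verbose query), the sketch below scores documents by weighted term overlap; the corpus, query, and boost values are all invented for illustration.

    ```python
    # Rank toy dataset descriptions by boosted term overlap with a verbose query.
    from collections import Counter

    docs = {
        "ds1": "rna-seq dataset of human b cell activation",
        "ds2": "mouse liver proteomics dataset",
    }
    query = "looking for an rna-seq dataset of b cell activation in human"
    boosts = {"rna-seq": 3.0, "b": 2.0, "cell": 2.0}  # keywords deemed important

    def score(doc_text):
        tf = Counter(doc_text.split())
        return sum(boosts.get(term, 1.0) * tf[term] for term in set(query.split()))

    print(sorted(docs, key=lambda d: score(docs[d]), reverse=True))  # ds1 ranks first
    ```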

  8. An innovative privacy preserving technique for incremental datasets on cloud computing.

    Science.gov (United States)

    Aldeen, Yousra Abdul Alsahib S; Salleh, Mazleena; Aljeroudi, Yazan

    2016-08-01

    Cloud computing (CC) is a magnificent service-based delivery model with gigantic computer processing power and data storage across connected communications channels. It has imparted overwhelming technological impetus to the internet (web) mediated IT industry, where users can easily share private data for further analysis and mining. Furthermore, user-friendly CC services enable sundry applications to be deployed economically. Meanwhile, simple data sharing has impelled various phishing attacks and malware-assisted security threats. Some privacy-sensitive applications, such as health services on the cloud, that are built with several economic and operational benefits necessitate enhanced security. Thus, absolute cyberspace security and mitigation against phishing blitz became mandatory to protect overall data privacy. Typically, diverse application datasets are anonymized to give owners better privacy, but without providing all secrecy requirements for newly added records. Some proposed techniques have addressed this issue by re-anonymizing the datasets from scratch. The utmost privacy protection over incremental datasets on CC is far from being achieved. Certainly, the distribution of huge dataset volumes across multiple storage nodes limits privacy preservation. In this view, we propose a new anonymization technique to attain better privacy protection with high data utility over distributed and incremental datasets on CC. The proficiency of data privacy preservation and improved confidentiality requirements is demonstrated through performance evaluation. Copyright © 2016 Elsevier Inc. All rights reserved.

  9. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias; Bibi, Adel Aamer; Giancola, Silvio; Al-Subaihi, Salman; Ghanem, Bernard

    2018-01-01

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.
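
    Benchmarks like this one rest on per-frame overlap between predicted and ground-truth boxes; below is a minimal sketch of the standard intersection-over-union computation (the boxes and the 0.5 success threshold are illustrative, not TrackingNet's evaluation code).

    ```python
    # IoU between two axis-aligned boxes given as (x, y, width, height).
    def iou(box_a, box_b):
        ax, ay, aw, ah = box_a
        bx, by, bw, bh = box_b
        ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
        iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
        inter = ix * iy
        union = aw * ah + bw * bh - inter
        return inter / union if union > 0 else 0.0

    pred = [(10, 10, 50, 80), (12, 14, 48, 80)]
    truth = [(12, 11, 50, 80), (60, 60, 40, 40)]
    overlaps = [iou(p, t) for p, t in zip(pred, truth)]
    print(overlaps)                                        # per-frame overlap
    print(sum(o > 0.5 for o in overlaps) / len(overlaps))  # success rate at IoU > 0.5
    ```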

  10. TrackingNet: A Large-Scale Dataset and Benchmark for Object Tracking in the Wild

    KAUST Repository

    Müller, Matthias

    2018-03-28

    Despite the numerous developments in object tracking, further development of current tracking algorithms is limited by small and mostly saturated datasets. As a matter of fact, data-hungry trackers based on deep-learning currently rely on object detection datasets due to the scarcity of dedicated large-scale tracking datasets. In this work, we present TrackingNet, the first large-scale dataset and benchmark for object tracking in the wild. We provide more than 30K videos with more than 14 million dense bounding box annotations. Our dataset covers a wide selection of object classes in broad and diverse context. By releasing such a large-scale dataset, we expect deep trackers to further improve and generalize. In addition, we introduce a new benchmark composed of 500 novel videos, modeled with a distribution similar to our training dataset. By sequestering the annotation of the test set and providing an online evaluation server, we provide a fair benchmark for future development of object trackers. Deep trackers fine-tuned on a fraction of our dataset improve their performance by up to 1.6% on OTB100 and up to 1.7% on TrackingNet Test. We provide an extensive benchmark on TrackingNet by evaluating more than 20 trackers. Our results suggest that object tracking in the wild is far from being solved.

  11. An Adapted Measure of Sibling Attachment: Factor Structure and Internal Consistency of the Sibling Attachment Inventory in Youth.

    Science.gov (United States)

    Noel, Valerie A; Francis, Sarah E; Tilley, Micah A

    2018-04-01

    Parent-youth and peer relationship inventories based on attachment theory measure communication, trust, and alienation, yet sibling relationships have been overlooked. We developed the Sibling Attachment Inventory and evaluated its psychometric properties in a sample of 172 youth ages 10-14 years. We adapted the 25-item Sibling Attachment Inventory from the Inventory of Parent and Peer Attachment-Revised peer measure. Items loaded onto three factors, identified as communication, trust, and alienation, α = 0.93, 0.90, and 0.76, respectively. Sibling trust and alienation correlated with depression (r_s = -0.33, r_s = 0.48) and self-worth (r_s = 0.23; r_s = -0.32); sibling trust and alienation correlated with depression after controlling for parent trust and parent alienation (r_s = -0.23, r_s = 0.22). Preliminary analyses showed good internal consistency, construct validity, and incremental predictive validity. Following replication of these properties, this measure can facilitate large cohort assessments of sibling attachment.

  12. Web mapping system for complex processing and visualization of environmental geospatial datasets

    Science.gov (United States)

    Titov, Alexander; Gordov, Evgeny; Okladnikov, Igor

    2016-04-01

    Environmental geospatial datasets (meteorological observations, modeling and reanalysis results, etc.) are used in numerous research applications. Due to a number of objective reasons, such as the inherent heterogeneity of environmental datasets, big dataset volume, complexity of the data models used, and syntactic and semantic differences that complicate the creation and use of unified terminology, the development of environmental geodata access, processing and visualization services, as well as client applications, turns out to be quite a sophisticated task. According to general INSPIRE requirements for data visualization, geoportal web applications have to provide such standard functionality as data overview, image navigation, scrolling, scaling and graphical overlay, displaying map legends and corresponding metadata information. It should be noted that modern web mapping systems as integrated geoportal applications are developed based on the SOA and might be considered as complexes of interconnected software tools for working with geospatial data. In the report, a complex web mapping system including a GIS web client and corresponding OGC services for working with a geospatial (NetCDF, PostGIS) dataset archive is presented. The GIS web client comprises three basic tiers: (1) a tier of geospatial metadata retrieved from a central MySQL repository and represented in JSON format; (2) a tier of JavaScript objects implementing methods for handling NetCDF metadata, the Task XML object for configuring user calculations and input and output formats, and OGC WMS/WFS cartographical services; and (3) a graphical user interface (GUI) tier of JavaScript objects implementing the web application business logic. The metadata tier consists of a number of JSON objects containing technical information describing the geospatial datasets (such as spatio-temporal resolution, meteorological parameters, valid processing methods, etc.).

  13. Validity and internal consistency of a Hausa version of the Ibadan knee/hip osteoarthritis outcome measure

    Directory of Open Access Journals (Sweden)

    Akinpelu Aderonke O

    2008-10-01

    Background: The Ibadan Knee/Hip Osteoarthritis Outcome Measure (IKHOAM) was developed for measuring end results of care in patients with knee or hip OA in Nigeria. The purpose of this study was to validate a Hausa translation of IKHOAM in order to promote its use among the Hausa populations of Nigeria and other West African countries. Methods: Sixty-seven patients with knee OA, literate in Hausa and English, recruited consecutively from all government hospitals in Kano, were assessed on both English and Hausa versions of IKHOAM. The order of assessment with the versions was randomized and separated by 24 hours. Participants also rated their pain intensity on the Visual Analogue Scale. Data were analyzed using the Spearman rank-order correlation and Cronbach's alpha. Results: The participants (17 males, 50 females) were aged 55.7 ± 13.4 years. Participants' scores on the Hausa version correlated significantly with the original version (r = 0.67, p = 0.000) and with pain intensity scores on the Visual Analogue Scale (r = -0.24, p = 0.005). The Cronbach's alpha for correlation on the different parts of the Hausa version ranged between 0.28 and 0.95. Conclusion: The Hausa version of IKHOAM meets the criteria for validity and internal consistency and may be used in the Hausa-speaking parts of Nigeria and other West African countries.

  14. Nurses' knowledge and attitudes towards aged sexuality: validity and internal consistency of the Dutch version of the Aging Sexual Knowledge and Attitudes Scale.

    Science.gov (United States)

    Mahieu, Lieslot; de Casterlé, Bernadette Dierckx; Van Elssen, Kim; Gastmans, Chris

    2013-11-01

    This paper reports a study testing the content and face validity and internal consistency of the Dutch version of the Aging Sexual Knowledge and Attitudes Scale. The ability of older residents to sexually express themselves is known to be influenced by the knowledge and attitudes of nursing home staff towards later-life sexuality. Although the Aging Sexual Knowledge and Attitudes Scale is a widely used instrument to measure this, no validated Dutch translation was available. Instrument development. Following a standard forward/backward translation into Dutch, the scale was further adapted for use in Flemish nursing home settings. Content and face validity and user-friendliness were assessed. The psychometric properties were determined by means of an exploratory study. Data were collected from March to April 2011 at eight Flemish nursing homes. Reliability was assessed using internal consistency and item-total correlations. Both subscales of the Flemish adaptation showed acceptable content validity. The face validity and user-friendliness were deemed favourable, with hardly any remarks given by the expert panel. The Cronbach's α was 0.80 and 0.88 for the knowledge and attitude subscales, respectively. The item-total correlations ranged from 0.21 to 0.48 for the knowledge section and from 0.09 to 0.68 for the attitude subscale. We conclude from our study that the Dutch version of the scale has acceptable to good psychometric properties. The Flemish adaptation therefore seems to be a valuable instrument for studying nursing staff's knowledge and attitudes towards aged sexuality in Flanders. © 2013 Blackwell Publishing Ltd.

  15. Decoys Selection in Benchmarking Datasets: Overview and Perspectives

    Science.gov (United States)

    Réau, Manon; Langenfeld, Florent; Zagury, Jean-François; Lagarde, Nathalie; Montes, Matthieu

    2018-01-01

    Virtual Screening (VS) is designed to prospectively help identify potential hits, i.e., compounds capable of interacting with a given target and potentially modulating its activity, out of large compound collections. Among the variety of methodologies, it is crucial to select the protocol that is the most adapted to the query/target system under study and that yields the most reliable output. To this aim, the performance of VS methods is commonly evaluated and compared by computing their ability to retrieve active compounds in benchmarking datasets. The benchmarking datasets contain a subset of known active compounds together with a subset of decoys, i.e., assumed non-active molecules. The composition of both the active and the decoy compound subsets is critical to limit the biases in the evaluation of the VS methods. In this review, we focus on the selection of decoy compounds, which has considerably changed over the years, from randomly selected compounds to highly customized or experimentally validated negative compounds. We first outline the evolution of decoy selection in benchmarking databases, as well as current benchmarking databases that tend to minimize the introduction of biases, and secondly, we propose recommendations for the selection and the design of benchmarking datasets. PMID:29416509

  16. Multiresolution persistent homology for excessively large biomolecular datasets

    Energy Technology Data Exchange (ETDEWEB)

    Xia, Kelin; Zhao, Zhixiong [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Wei, Guo-Wei, E-mail: wei@math.msu.edu [Department of Mathematics, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824 (United States); Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824 (United States)

    2015-10-07

    Although persistent homology has emerged as a promising tool for the topological simplification of complex data, it is computationally intractable for large datasets. We introduce multiresolution persistent homology to handle excessively large datasets. We match the resolution with the scale of interest so as to represent large scale datasets with appropriate resolution. We utilize the flexibility-rigidity index to assess the topological connectivity of the data set and define a rigidity density for the filtration analysis. By appropriately tuning the resolution of the rigidity density, we are able to focus the topological lens on the scale of interest. The proposed multiresolution topological analysis is validated by a hexagonal fractal image which has three distinct scales. We further demonstrate the proposed method for extracting topological fingerprints from DNA molecules. In particular, the topological persistence of a virus capsid with 273 780 atoms is successfully analyzed, which would otherwise be inaccessible to the normal point cloud method and unreliable by using coarse-grained multiscale persistent homology. The proposed method has also been successfully applied to protein domain classification; to our knowledge, this is the first time that persistent homology has been used for practical protein domain analysis. The proposed multiresolution topological method has potential applications in arbitrary data sets, such as social networks, biological networks, and graphs.
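
    The resolution control comes from the kernel width of the density used in the filtration. Below is a hedged sketch of a rigidity-style density (our simplification of the flexibility-rigidity index, with synthetic coordinates standing in for atomic positions).

    ```python
    # Gaussian-kernel rigidity density; eta is the resolution/scale parameter.
    import numpy as np

    def rigidity_density(grid_points, atoms, eta):
        """grid_points: (m, 3); atoms: (n, 3); returns density values of shape (m,)."""
        d2 = ((grid_points[:, None, :] - atoms[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / eta**2).sum(axis=1)

    rng = np.random.default_rng(3)
    atoms = rng.uniform(0, 10, size=(50, 3))       # stand-in atomic coordinates
    grid = rng.uniform(0, 10, size=(5, 3))         # sample evaluation points
    print(rigidity_density(grid, atoms, eta=1.0))  # fine scale
    print(rigidity_density(grid, atoms, eta=4.0))  # coarse scale, smoother field
    ```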

  17. Integrating Diverse Data Systems for International Collaboration

    Science.gov (United States)

    Fox, Peter

    2014-05-01

    International collaborations, especially ones that arise with little or no financial resources, still face challenges in opening up data collections via a wide variety of differing and often non-interoperable means. In turn, this hampers the collaborative process and slows or even prevents scientific exchange. Early efforts that proposed a centralized, project-specific data archive encountered many difficulties, ranging from little or no adoption to the inability to provide the required documentation and metadata to make the datasets findable or usable. In time, virtualized approaches appeared to gain traction, for example virtual observatories. In this contribution, we report on several international collaboration case studies with distributed data systems, covering their needs, successes, challenges and failures, and synthesize a set of suggested practices to inform future international collaboration efforts.

  18. Joint local and global consistency on interdocument and interword relationships for co-clustering.

    Science.gov (United States)

    Bao, Bing-Kun; Min, Weiqing; Li, Teng; Xu, Changsheng

    2015-01-01

    Co-clustering has recently received a lot of attention due to its effectiveness in simultaneously partitioning words and documents by exploiting the relationships between them. However, most of the existing co-clustering methods neglect or only partially reveal the interword and interdocument relationships. To fully utilize those relationships, the local and global consistencies on both the word and document spaces need to be considered. Local consistency indicates that the label of a word/document can be predicted from its neighbors, while global consistency enforces a smoothness constraint on word/document labels over the whole data manifold. In this paper, we propose a novel co-clustering method, called co-clustering via local and global consistency, to make use not only of the relationship between words and documents but also to jointly explore the local and global consistency on both the word and document spaces. The proposed method has the following characteristics: 1) the word-document relationship is modeled by following information-theoretic co-clustering (ITCC); 2) the local consistency on both interword and interdocument relationships is revealed by a local predictor; and 3) the global consistency on both interword and interdocument relationships is explored by a global smoothness regularization. The fitting errors from these three aspects are finally integrated together to formulate an objective function, which is iteratively optimized by a provably convergent updating procedure. Extensive experiments on two benchmark document datasets validate the effectiveness of the proposed co-clustering method.
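
    A schematic of how the three terms might combine (the notation is ours, inferred from the description above, not the authors' exact formulation):

    ```latex
    % F_w, F_d: word and document label matrices; s_{ij}: neighborhood weights;
    % L_w, L_d: graph Laplacians on the word and document spaces.
    \[
    \min_{F_w,\,F_d}\; J
      = J_{\mathrm{ITCC}}(F_w, F_d)
      + \lambda_1 \sum_{x\in\{w,d\}} \sum_i \Bigl\| F_x(i) - \sum_{j\in\mathcal{N}(i)} s_{ij}\,F_x(j) \Bigr\|^2
      + \lambda_2 \sum_{x\in\{w,d\}} \operatorname{tr}\bigl(F_x^{\top} L_x F_x\bigr)
    \]
    ```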

  19. Data Discovery of Big and Diverse Climate Change Datasets - Options, Practices and Challenges

    Science.gov (United States)

    Palanisamy, G.; Boden, T.; McCord, R. A.; Frame, M. T.

    2013-12-01

    Developing data search tools is a very common, but often confusing, task for most data-intensive scientific projects. These search interfaces need to be continually improved to handle the ever-increasing diversity and volume of data collections. There are many aspects which determine the type of search tool a project needs to provide to its user community. These include: the number of datasets, the amount and consistency of discovery metadata, ancillary information such as the availability of quality information and provenance, and the availability of similar datasets from other distributed sources. The Environmental Data Science and Systems (EDSS) group within the Environmental Science Division at the Oak Ridge National Laboratory has a long history of successfully managing diverse and big observational datasets for various scientific programs via various data centers, such as DOE's Atmospheric Radiation Measurement Program (ARM), DOE's Carbon Dioxide Information and Analysis Center (CDIAC), USGS's Core Science Analytics and Synthesis (CSAS) metadata Clearinghouse and NASA's Distributed Active Archive Center (ORNL DAAC). This talk will showcase some of the recent developments for improving data discovery within these centers. The DOE ARM program recently developed a data discovery tool which allows users to search and discover over 4000 observational datasets. These datasets are key to research efforts related to global climate change. The ARM discovery tool features many new functions such as filtered and faceted search logic, multi-pass data selection, filtering data based on data quality, graphical views of data quality and availability, direct access to data quality reports, and data plots. The ARM Archive also provides discovery metadata to other broader metadata clearinghouses such as ESGF, IASOA, and GOS. In addition to the new interface, ARM is also currently working on providing DOI metadata records to publishers such as Thomson Reuters and Elsevier.

  20. Cross-Cultural Concept Mapping of Standardized Datasets

    DEFF Research Database (Denmark)

    Kano Glückstad, Fumiko

    2012-01-01

    This work compares four feature-based similarity measures derived from cognitive sciences. The purpose of the comparative analysis is to verify the potentially most effective model that can be applied for mapping independent ontologies in a culturally influenced domain [1]. Here, datasets based...

  1. dataTEL - Datasets for Technology Enhanced Learning

    NARCIS (Netherlands)

    Drachsler, Hendrik; Verbert, Katrien; Sicilia, Miguel-Angel; Wolpers, Martin; Manouselis, Nikos; Vuorikari, Riina; Lindstaedt, Stefanie; Fischer, Frank

    2011-01-01

    Drachsler, H., Verbert, K., Sicilia, M. A., Wolpers, M., Manouselis, N., Vuorikari, R., Lindstaedt, S., & Fischer, F. (2011). dataTEL - Datasets for Technology Enhanced Learning. STELLAR Alpine Rendez-Vous White Paper. Alpine Rendez-Vous 2011 White paper collection, Nr. 13., France (2011)

  2. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander; Mularoni, Loris; Cope, Leslie M.; Medvedeva, Yulia; Mironov, Andrey A.; Makeev, Vsevolod J.; Wheelan, Sarah J.

    2012-01-01

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.

  3. Principal Component Analysis of Process Datasets with Missing Values

    Directory of Open Access Journals (Sweden)

    Kristen A. Severson

    2017-07-01

    Datasets with missing values arising from causes such as sensor failure, inconsistent sampling rates, and merging data from different systems are common in the process industry. Methods for handling missing data typically operate during data pre-processing, but missing values can also be handled during model building. This article considers missing data within the context of principal component analysis (PCA), a method originally developed for complete data that has widespread industrial application in multivariate statistical process control. Due to the prevalence of missing data and the success of PCA for handling complete data, several PCA algorithms that can act on incomplete data have been proposed. Here, algorithms for applying PCA to datasets with missing values are reviewed. A case study is presented to demonstrate the performance of the algorithms, and suggestions are made with respect to choosing which algorithm is most appropriate for particular settings. An alternating algorithm based on the singular value decomposition achieved the best results in the majority of test cases involving process datasets.
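
    A minimal sketch of that alternating, SVD-based family of algorithms, assuming a rank-k model: fill the gaps, fit a truncated SVD, re-impute the gaps from the low-rank reconstruction, and repeat. The data and rank below are synthetic; the article's exact algorithm may differ in details such as convergence checks.

    ```python
    # Alternating SVD imputation for PCA on incomplete data.
    import numpy as np

    def pca_with_missing(X, rank, n_iter=50):
        X = X.copy()
        missing = np.isnan(X)
        col_means = np.nanmean(X, axis=0)
        X[missing] = np.take(col_means, np.where(missing)[1])  # initial fill
        for _ in range(n_iter):
            mu = X.mean(axis=0)
            U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
            recon = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
            X[missing] = recon[missing]  # re-impute only the gaps
        return X, Vt[:rank]              # completed data and PCA loadings

    rng = np.random.default_rng(7)
    data = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 8))  # rank-3 "process" data
    data[rng.random(data.shape) < 0.1] = np.nan                 # 10% missing entries
    completed, loadings = pca_with_missing(data, rank=3)
    print(np.isnan(completed).any())  # False: all gaps filled
    ```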

  4. Exploring massive, genome scale datasets with the genometricorr package

    KAUST Repository

    Favorov, Alexander

    2012-05-31

    We have created a statistically grounded tool for determining the correlation of genomewide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. Availability and implementation: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor. © 2012 Favorov et al.
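
    GenometriCorr itself is an R package; as a language-agnostic sketch of the underlying idea, the toy permutation test below compares the observed mean distance from each query point to its nearest reference point against the distances obtained for randomly placed queries (all coordinates are synthetic, and the real package offers several other association statistics).

    ```python
    # Permutation test for spatial association between two point sets on one chromosome.
    import numpy as np

    rng = np.random.default_rng(11)
    CHROM_LEN = 1_000_000
    reference = np.sort(rng.integers(0, CHROM_LEN, size=200))      # feature midpoints
    query = np.sort(reference[:50] + rng.integers(-500, 500, 50))  # clustered near features

    def mean_nearest_distance(points, ref):
        idx = np.searchsorted(ref, points).clip(1, len(ref) - 1)
        left, right = ref[idx - 1], ref[idx]
        return np.minimum(np.abs(points - left), np.abs(points - right)).mean()

    observed = mean_nearest_distance(query, reference)
    null = [mean_nearest_distance(np.sort(rng.integers(0, CHROM_LEN, len(query))),
                                  reference) for _ in range(1000)]
    p_value = (np.sum(np.array(null) <= observed) + 1) / (len(null) + 1)
    print(f"observed mean distance = {observed:.0f} bp, p = {p_value:.3f}")
    ```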

  5. Testing the Neutral Theory of Biodiversity with Human Microbiome Datasets

    OpenAIRE

    Li, Lianwei; Ma, Zhanshan (Sam)

    2016-01-01

    The human microbiome project (HMP) has made it possible to test important ecological theories for arguably the most important ecosystem to human health - the human microbiome. The limited number of existing studies have reported conflicting evidence in the case of the neutral theory; the present study aims to comprehensively test the neutral theory with extensive HMP datasets covering all five major body sites inhabited by the human microbiome. Utilizing 7437 datasets of bacterial community samples...

  6. Self-Reported Juvenile Firesetting: Results from Two National Survey Datasets

    OpenAIRE

    Howell Bowling, Carrie; Merrick, Joav; Omar, Hatim A.

    2013-01-01

    The main purpose of this study was to address gaps in existing research by examining the relationship between academic performance and attention problems with juvenile firesetting. Two datasets from the Achenbach System for Empirically Based Assessment (ASEBA) were used. The Factor Analysis Dataset (N = 975) was utilized and results indicated that adolescents who report lower academic performance are more likely to set fires. Additionally, adolescents who report a poor attitude toward school ...

  7. Self-reported juvenile firesetting: Results from two national survey datasets

    OpenAIRE

    Carrie Howell Bowling; Joav Merrick; Hatim A. Omar

    2013-01-01

    The main purpose of this study was to address gaps in existing research by examining the relationship between academic performance and attention problems with juvenile firesetting. Two datasets from the Achenbach System for Empirically Based Assessment (ASEBA) were used. The Factor Analysis Dataset (N = 975) was utilized and results indicated that adolescents who report lower academic performance are more likely to set fires. Additionally, adolescents who report a poor attitude toward school...

  8. A high quality finger vascular pattern dataset collected using a custom designed capturing device

    NARCIS (Netherlands)

    Ton, B.T.; Veldhuis, Raymond N.J.

    2013-01-01

    Finger vascular pattern datasets available to the research community are scarce; therefore, a new finger vascular pattern dataset containing 1440 images is presented. This dataset is unique in its kind, as the images are of high resolution and have a known pixel density.

  9. RetroTransformDB: A Dataset of Generic Transforms for Retrosynthetic Analysis

    Directory of Open Access Journals (Sweden)

    Svetlana Avramova

    2018-04-01

    Presently, software tools for retrosynthetic analysis are widely used by organic, medicinal, and computational chemists. Rule-based systems extensively use collections of retro-reactions (transforms). While there are many public datasets with reactions in the synthetic direction (usually non-generic reactions), there are no publicly available databases with generic reactions in computer-readable format which can be used for the purposes of retrosynthetic analysis. Here we present RetroTransformDB—a dataset of transforms, compiled and coded in SMIRKS line notation by us. The collection is comprised of more than 100 records, with each one including the reaction name, SMIRKS linear notation, the functional group to be obtained, and the transform type classification. All SMIRKS transforms were tested syntactically, semantically, and from a chemical point of view in different software platforms. The overall dataset design and the retrosynthetic fitness were analyzed and curated by organic chemistry experts. The RetroTransformDB dataset may be used by open-source and commercial software packages, as well as chemoinformatics tools.
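
    To show how such SMIRKS-coded transforms are consumed in practice, here is a hedged example using RDKit; the amide disconnection below is our own toy transform, not necessarily an entry from RetroTransformDB.

    ```python
    # Apply a retro-amide SMIRKS transform to a target molecule with RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    # Disconnect C(=O)-N into a carboxylic acid and an amine fragment.
    retro_amide = AllChem.ReactionFromSmarts("[C:1](=[O:2])[N:3]>>[C:1](=[O:2])O.[N:3]")

    target = Chem.MolFromSmiles("CC(=O)Nc1ccccc1")  # acetanilide
    for precursors in retro_amide.RunReactants((target,)):
        smiles = []
        for mol in precursors:
            Chem.SanitizeMol(mol)  # fix valences/aromaticity on the product fragments
            smiles.append(Chem.MolToSmiles(mol))
        print(" + ".join(smiles))  # expected: acetic acid + an aniline fragment
    ```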

  10. A multi-environment dataset for activity of daily living recognition in video streams.

    Science.gov (United States)

    Borreo, Alessandro; Onofri, Leonardo; Soda, Paolo

    2015-08-01

    Public datasets have played a key role in the increasing level of interest that vision-based human action recognition has attracted in recent years. While the production of such datasets has taken into account the variability introduced by different actors performing the actions, the different modalities of interaction with the environment introduced by varying the scenes around the actors have scarcely been taken into account. As a consequence, public datasets do not provide a proper test-bed for recognition algorithms that aim at achieving high accuracy irrespective of the environment where actions are performed. This is all the more so when systems are designed to recognize activities of daily living (ADL), which are characterized by a high level of human-environment interaction. For that reason, we present in this manuscript the MEA dataset, a new multi-environment ADL dataset, which permitted us to show how the change of scenario can affect the performance of state-of-the-art approaches for action recognition.

  11. Visual Comparison of Multiple Gene Expression Datasets in a Genomic Context

    Directory of Open Access Journals (Sweden)

    Borowski Krzysztof

    2008-06-01

    Full Text Available The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of different experiments for the same genome.

  12. AFSC/REFM: Seabird Necropsy dataset of North Pacific

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The seabird necropsy dataset contains information on seabird specimens that were collected under salvage and scientific collection permits primarily by...

  13. Using national hip fracture registries and audit databases to develop an international perspective

    DEFF Research Database (Denmark)

    Johansen, Antony; Golding, David; Brent, Louise

    2017-01-01

    to audit the care offered to older people by health services around the world. We have reviewed the reports of eight national audit programmes, to examine the approach used in each, and highlight differences in case mix, management and outcomes in different countries. The national audits provide a consistent picture of typical patients - an average age of 80 years, with less than a third being men, and a third of all patients having cognitive impairment - but there was surprising variation in the type of fracture, of operation and of anaesthesia and hospital length of stay in different countries. These national audits provide a unique opportunity to compare how health care systems of different countries are responding to the same clinical challenge. This review will encourage the development and reporting of a standardised dataset to support international collaboration in healthcare audit.

  14. NCDC International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 3

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The International Best Track Archive for Climate Stewardship (IBTrACS) dataset was developed by the NOAA National Climatic Data Center, which took the initial step...

  15. Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator

    Science.gov (United States)

    Seyed, P.; Chastain, K.; McGuinness, D. L.

    2013-12-01

    library of vocabularies to assist the user in locating terms to describe observed entities, their properties, and relationships. The Annotator leverages vocabulary definitions of these concepts to guide the user in describing data in a logically consistent manner. The vocabularies made available through the Annotator are open, as is the Annotator itself. We have taken a step towards making semantic annotation/translation of data more accessible. Our vision for the Annotator is as a tool that can be integrated into a semantic data 'workbench' environment, which would allow semantic annotation of a variety of data formats, using standard vocabularies. The vocabularies involved enable search for similar datasets and integration with any semantically-enabled applications for analysis and visualization.
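
    As a concrete (if simplified) picture of vocabulary-based annotation of this kind, the sketch below tags a single observation with terms from the W3C SOSA vocabulary using rdflib. The dataset namespace and the observed-property URI are hypothetical placeholders; only the SOSA terms are real, and this is not the Annotator's own API.

```python
# Minimal sketch: annotating one data record with an open vocabulary (W3C SOSA)
# so it becomes machine-interpretable RDF. URIs under example.org are
# hypothetical placeholders, not part of SemantEco.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/dataset/")    # hypothetical dataset namespace
SOSA = Namespace("http://www.w3.org/ns/sosa/")   # real W3C observation vocabulary

g = Graph()
obs = EX["observation/1"]
g.add((obs, RDF.type, SOSA.Observation))
g.add((obs, SOSA.observedProperty, URIRef("http://example.org/vocab/waterTemperature")))
g.add((obs, SOSA.hasSimpleResult, Literal(11.5)))
print(g.serialize(format="turtle"))              # emit the annotation as Turtle
```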

  16. Random Coefficient Logit Model for Large Datasets

    NARCIS (Netherlands)

    C. Hernández-Mireles (Carlos); D. Fok (Dennis)

    2010-01-01

    We present an approach for analyzing market shares and product price elasticities based on large datasets containing aggregate sales data for many products, several markets and relatively long time periods. We consider the recently proposed Bayesian approach of Jiang et al [Jiang,
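
    Although the record is truncated, the core object of such models is the random-coefficients logit market share, an integral over the distribution of consumer tastes. The sketch below evaluates that share formula by Monte Carlo with numpy; all dimensions and parameter values are illustrative assumptions, and this is not the estimator of Jiang et al.

```python
# Minimal sketch: random-coefficients logit market shares by Monte Carlo.
# s_j = E_beta[ exp(x_j' beta) / (1 + sum_k exp(x_k' beta)) ], beta ~ N(mean, sd^2)
import numpy as np

rng = np.random.default_rng(0)
J, K, R = 5, 3, 10_000            # products, attributes, simulation draws (illustrative)
X = rng.normal(size=(J, K))       # product attributes, e.g. price and features
beta_mean = np.array([-1.0, 0.5, 0.2])
beta_sd = np.array([0.3, 0.1, 0.1])

betas = beta_mean + beta_sd * rng.normal(size=(R, K))   # individual-level coefficients
util = betas @ X.T                                      # (R, J) utilities
expu = np.exp(util)
shares = (expu / (1.0 + expu.sum(axis=1, keepdims=True))).mean(axis=0)
print(shares, shares.sum())       # inside-good shares; remainder is the outside good
```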

  17. NOAA Global Surface Temperature Dataset, Version 4.0

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The NOAA Global Surface Temperature Dataset (NOAAGlobalTemp) is derived from two independent analyses: the Extended Reconstructed Sea Surface Temperature (ERSST)...

  18. International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 3.0 - Monthly Summary Groups (MSG)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset, the International Comprehensive Ocean-Atmosphere Data Set (ICOADS), is the most widely-used freely available collection of surface marine observations,...

  19. Election Districts and Precincts, PrecinctPoly-The data set is a polygon feature consisting of 220 segments representing voter precinct boundaries., Published in 1991, Davis County Government.

    Data.gov (United States)

    NSGIC Local Govt | GIS Inventory — Election Districts and Precincts dataset current as of 1991. PrecinctPoly-The data set is a polygon feature consisting of 220 segments representing voter precinct...

  20. Internal consistency, test-retest reliability and measurement error of the self-report version of the social skills rating system in a sample of Australian adolescents.

    Directory of Open Access Journals (Sweden)

    Sharmila Vaz

    Full Text Available The social skills rating system (SSRS) is used to assess social skills and competence in children and adolescents. While its characteristics based on United States (US) samples are published, corresponding Australian figures are unavailable. Using a 4-week retest design, we examined the internal consistency, retest reliability and measurement error (ME) of the SSRS secondary student form (SSF) in a sample of Year 7 students (N = 187), from five randomly selected public schools in Perth, Western Australia. Internal consistency (IC) of the total scale and most subscale scores (except empathy) on the frequency rating scale was adequate to permit independent use. On the importance rating scale, most IC estimates for girls fell below the benchmark. Test-retest estimates of the total scale and subscales were insufficient to permit reliable use. ME of the total scale score (frequency rating) for boys was equivalent to the US estimate, while that for girls was lower than the US error. ME of the total scale score (importance rating) was larger than the error using the frequency rating scale. The study finding supports the idea of using multiple informants (e.g. teacher and parent reports), not just the student, as recommended in the manual. Future research needs to substantiate the clinical meaningfulness of the MEs calculated in this study by corroborating them against the respective Minimum Clinically Important Difference (MCID).

  1. Internal consistency, test-retest reliability and measurement error of the self-report version of the social skills rating system in a sample of Australian adolescents.

    Science.gov (United States)

    Vaz, Sharmila; Parsons, Richard; Passmore, Anne Elizabeth; Andreou, Pantelis; Falkmer, Torbjörn

    2013-01-01

    The social skills rating system (SSRS) is used to assess social skills and competence in children and adolescents. While its characteristics based on United States (US) samples are published, corresponding Australian figures are unavailable. Using a 4-week retest design, we examined the internal consistency, retest reliability and measurement error (ME) of the SSRS secondary student form (SSF) in a sample of Year 7 students (N = 187), from five randomly selected public schools in Perth, Western Australia. Internal consistency (IC) of the total scale and most subscale scores (except empathy) on the frequency rating scale was adequate to permit independent use. On the importance rating scale, most IC estimates for girls fell below the benchmark. Test-retest estimates of the total scale and subscales were insufficient to permit reliable use. ME of the total scale score (frequency rating) for boys was equivalent to the US estimate, while that for girls was lower than the US error. ME of the total scale score (importance rating) was larger than the error using the frequency rating scale. The study finding supports the idea of using multiple informants (e.g. teacher and parent reports), not just the student, as recommended in the manual. Future research needs to substantiate the clinical meaningfulness of the MEs calculated in this study by corroborating them against the respective Minimum Clinically Important Difference (MCID).
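
    The headline statistic in both versions of this record, internal consistency, is conventionally estimated with Cronbach's alpha. The sketch below computes it from an item-score matrix; the data are simulated stand-ins rather than the SSRS sample, though the respondent count mirrors the study's N = 187.

```python
# Minimal sketch: Cronbach's alpha, the usual internal-consistency estimate.
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, k_items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

rng = np.random.default_rng(42)
latent = rng.normal(size=(187, 1))                  # shared trait across items
scores = latent + 0.8 * rng.normal(size=(187, 10))  # 10 correlated items, simulated
print(f"alpha = {cronbach_alpha(scores):.3f}")
```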

  2. A multimodal MRI dataset of professional chess players.

    Science.gov (United States)

    Li, Kaiming; Jiang, Jing; Qiu, Lihua; Yang, Xun; Huang, Xiaoqi; Lui, Su; Gong, Qiyong

    2015-01-01

    Chess is a good model for studying high-level human brain functions such as spatial cognition, memory, planning, learning and problem solving. Recent studies have demonstrated that non-invasive MRI techniques are valuable for investigating the underlying neural mechanisms of playing chess. For professional chess players (e.g., chess grand masters and masters, or GM/Ms), the structural and functional alterations due to long-term professional practice, and how these alterations relate to behavior, remain largely unknown. Here, we report a multimodal MRI dataset from 29 professional Chinese chess players (most of whom are GM/Ms) and 29 age-matched novices. We hope that this dataset will provide researchers with new materials to further explore high-level human brain functions.

  3. Experiences and lessons learned from creating a generalized workflow for data publication of field campaign datasets

    Science.gov (United States)

    Santhana Vannan, S. K.; Ramachandran, R.; Deb, D.; Beaty, T.; Wright, D.

    2017-12-01

    This paper summarizes the workflow challenges of curating and publishing data produced from disparate data sources and provides a generalized workflow solution to efficiently archive data generated by researchers. The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) for biogeochemical dynamics and the Global Hydrology Resource Center (GHRC) DAAC have been collaborating on the development of a generalized workflow solution to efficiently manage the data publication process. The generalized workflow presented here is built on lessons learned from implementations of the workflow system. Data publication consists of the following steps: accepting the data package from the data providers and ensuring the full integrity of the data files; identifying and addressing data quality issues; assembling standardized, detailed metadata and documentation, including file-level details, processing methodology, and characteristics of data files; setting up data access mechanisms; setting up the data in data tools and services for improved data dissemination and user experience; registering the dataset in online search and discovery catalogues; and preserving the data location through Digital Object Identifiers (DOIs). We will describe the steps taken to automate, and realize efficiencies in, the above process. The goals of the workflow system are to reduce the time taken to publish a dataset, to increase the quality of documentation and metadata, and to track individual datasets through the data curation process. Utilities developed to achieve these goals will be described. We will also share the metrics-driven value of the workflow system and discuss future steps towards the creation of a common software framework.

  4. A review of culturally adapted versions of the Oswestry Disability Index: the adaptation process, construct validity, test-retest reliability and internal consistency.

    Science.gov (United States)

    Sheahan, Peter J; Nelson-Wong, Erika J; Fischer, Steven L

    2015-01-01

    The Oswestry Disability Index (ODI) is a self-report-based outcome measure used to quantify the extent of disability related to low back pain (LBP), a substantial contributor to workplace absenteeism. The ODI tool has been adapted for use by patients in several non-English speaking nations. It is unclear, however, if these adapted versions of the ODI are as credible as the original ODI developed for English-speaking nations. The objective of this study was to conduct a review of the literature to identify culturally adapted versions of the ODI and to report on the adaptation process, construct validity, test-retest reliability and internal consistency of these ODIs. Following a pragmatic review process, data were extracted from each study with regard to these four outcomes. While most studies applied adaptation processes in accordance with best-practice guidelines, there were some deviations. However, all studies reported high-quality psychometric properties: group mean construct validity was 0.734 ± 0.094 (indicated via a correlation coefficient), test-retest reliability was 0.937 ± 0.032 (indicated via an intraclass correlation coefficient) and internal consistency was 0.876 ± 0.047 (indicated via Cronbach's alpha). Researchers can be confident when using any of these culturally adapted ODIs, or when comparing and contrasting results between cultures where these versions were employed. Implications for Rehabilitation Low back pain is the second leading cause of disability in the world, behind only cancer. The Oswestry Disability Index (ODI) has been developed as a self-report outcome measure of low back pain for administration to patients. An understanding of the various cross-cultural adaptations of the ODI is important for more concerted multi-national research efforts. This review examines 16 cross-cultural adaptations of the ODI and should inform the work of health care and rehabilitation professionals.

  5. An integrated pan-tropical biomass map using multiple reference datasets

    OpenAIRE

    Avitabile, V.; Herold, M.; Heuvelink, G. B. M.; Lewis, S. L.; Phillips, O. L.; Asner, G. P.; Armston, J.; Ashton, P. S.; Banin, L.; Bayol, N.; Berry, N. J.; Boeckx, P.; de Jong, B. H. J.; DeVries, B.; Girardin, C. A. J.

    2016-01-01

    We combined two existing datasets of vegetation aboveground biomass (AGB) (Proceedings of the National Academy of Sciences of the United States of America, 108, 2011, 9899; Nature Climate Change, 2, 2012, 182) into a pan-tropical AGB map at 1-km resolution using an independent reference dataset of field observations and locally calibrated high-resolution biomass maps, harmonized and upscaled to 14 477 1-km AGB estimates. Our data fusion approach uses bias removal and weighted linear averaging...
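
    The fusion step named at the end of the record (bias removal followed by weighted linear averaging) can be pictured with the toy sketch below. The arrays are synthetic, and inverse-variance weights are one standard choice assumed here for illustration; the paper's exact weighting scheme is not reproduced.

```python
# Minimal sketch: de-bias two biomass maps against reference data, then fuse
# them by weighted linear averaging (inverse-variance weights assumed).
import numpy as np

rng = np.random.default_rng(1)
truth = rng.uniform(50, 300, size=(100, 100))            # stand-in reference AGB (Mg/ha)
map_a = truth + rng.normal(20, 30, size=truth.shape)     # biased, noisier input map
map_b = truth + rng.normal(-10, 20, size=truth.shape)    # second input map

def debias(m, ref):
    return m - (m - ref).mean()        # remove mean bias relative to the reference

a, b = debias(map_a, truth), debias(map_b, truth)
wa, wb = 1.0 / (a - truth).var(), 1.0 / (b - truth).var()
fused = (wa * a + wb * b) / (wa + wb)
for name, m in [("map_a", map_a), ("map_b", map_b), ("fused", fused)]:
    print(name, np.abs(m - truth).mean())   # fused map should have the lowest error
```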

  6. Application of global datasets for hydrological modelling of a remote, snowmelt driven catchment in the Canadian Sub-Arctic

    Science.gov (United States)

    Casson, David; Werner, Micha; Weerts, Albrecht; Schellekens, Jaap; Solomatine, Dimitri

    2017-04-01

    Hydrological modelling in the Canadian Sub-Arctic is hindered by the limited spatial and temporal coverage of local meteorological data. Local watershed modelling often relies on data from a sparse network of meteorological stations with a rough density of 3 active stations per 100,000 km2. Global datasets hold great promise for application due to more comprehensive spatial and extended temporal coverage. A key objective of this study is to demonstrate the application of global datasets and data assimilation techniques for hydrological modelling of a data-sparse, Sub-Arctic watershed. Application of available datasets and modelling techniques is currently limited in practice due to a lack of local capacity and understanding of available tools. Due to the importance of snow processes in the region, this study also aims to evaluate the performance of global SWE products for snowpack modelling. The Snare Watershed is a 13,300 km2 snowmelt-driven sub-basin of the Mackenzie River Basin, Northwest Territories, Canada. The Snare watershed is data sparse in terms of meteorological data, but is well gauged, with consistent discharge records since the late 1970s. End-of-winter snowpack surveys have been conducted every year from 1978 to the present. The application of global re-analysis datasets from the EU FP7 eartH2Observe project is investigated in this study. Precipitation data are taken from Multi-Source Weighted-Ensemble Precipitation (MSWEP) and temperature data from WATCH Forcing Data applied to European Reanalysis (ERA)-Interim data (WFDEI). GlobSnow-2 is a global Snow Water Equivalent (SWE) measurement product funded by the European Space Agency (ESA) and is also evaluated over the local watershed. Downscaled precipitation, temperature and potential evaporation datasets are used as forcing data in a distributed version of the HBV model implemented in the WFLOW framework. Results demonstrate the successful application of global datasets in local watershed modelling, but
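
    At the heart of the HBV model mentioned above is a simple degree-day snow routine, which makes the role of the temperature and precipitation forcings concrete. The sketch below is a minimal, uncalibrated version; the parameter values and the synthetic forcing year are assumptions for illustration only, not the study's configuration.

```python
# Minimal sketch: degree-day snow accumulation/melt as used in HBV-type models.
# Parameters are illustrative, not calibrated to the Snare Watershed.
import numpy as np

TT = 0.0       # threshold temperature (degC) separating snowfall from melt
CFMAX = 3.5    # degree-day melt factor (mm / degC / day)

def simulate_swe(precip_mm, temp_c):
    """Accumulate SWE when T <= TT; melt at CFMAX * (T - TT) when T > TT."""
    swe, series = 0.0, []
    for p, t in zip(precip_mm, temp_c):
        if t <= TT:
            swe += p                                  # precipitation falls as snow
        else:
            swe = max(0.0, swe - CFMAX * (t - TT))    # degree-day melt
        series.append(swe)
    return np.array(series)

days = np.arange(365)
temp = -15 + 30 * np.sin((days - 100) * 2 * np.pi / 365)  # synthetic Sub-Arctic year
precip = np.full(365, 1.5)                                # constant 1.5 mm/day
print(f"peak SWE = {simulate_swe(precip, temp).max():.0f} mm")
```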

  7. Providing comprehensive and consistent access to astronomical observatory archive data: the NASA archive model

    Science.gov (United States)

    McGlynn, Thomas; Fabbiano, Giuseppina; Accomazzi, Alberto; Smale, Alan; White, Richard L.; Donaldson, Thomas; Aloisi, Alessandra; Dower, Theresa; Mazzerella, Joseph M.; Ebert, Rick; Pevunova, Olga; Imel, David; Berriman, Graham B.; Teplitz, Harry I.; Groom, Steve L.; Desai, Vandana R.; Landry, Walter

    2016-07-01

    Since the turn of the millennium, astronomical archives have begun providing data to the public through standardized protocols, unifying data from disparate physical sources and wavebands across the electromagnetic spectrum into an astronomical virtual observatory (VO). In October 2014, NASA began support for the NASA Astronomical Virtual Observatories (NAVO) program to coordinate the efforts of NASA astronomy archives in providing data to users through implementation of protocols agreed within the International Virtual Observatory Alliance (IVOA). A major goal of the NAVO collaboration has been to step back from a piecemeal implementation of IVOA standards and define what the appropriate presence for the US and NASA astronomy archives in the VO should be. This includes evaluating what optional capabilities in the standards need to be supported, the specific versions of standards that should be used, and returning feedback to the IVOA to support modifications as needed. We discuss a standard archive model developed by the NAVO for data archive presence in the virtual observatory, built upon a consistent framework of standards defined by the IVOA. Our standard model provides for discovery of resources through the VO registries, access to observation and object data, downloads of image and spectral data and general access to archival datasets. It defines specific protocol versions, minimum capabilities, and all dependencies. The model will evolve as the capabilities of the virtual observatory and needs of the community change.
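
    One of the simplest IVOA protocols such archives implement is Simple Cone Search, which returns a VOTable of records around a sky position. The sketch below issues such a request with the requests library; the service URL is a hypothetical placeholder, while RA, DEC and SR are the parameter names defined by the standard.

```python
# Minimal sketch: an IVOA Simple Cone Search request. The base URL below is a
# placeholder, not a real NAVO endpoint; RA/DEC/SR come from the SCS standard.
import requests

SCS_URL = "https://archive.example.org/scs"   # hypothetical cone-search service
params = {
    "RA": 83.633,   # right ascension of the search centre (degrees)
    "DEC": 22.014,  # declination of the search centre (degrees)
    "SR": 0.1,      # search radius (degrees)
}
resp = requests.get(SCS_URL, params=params, timeout=30)
resp.raise_for_status()
print(resp.text[:200])   # the response body is a VOTable (XML) of matching records
```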

  8. Technical note: Correlation of respiratory motion between external patient surface and internal anatomical landmarks

    Science.gov (United States)

    Fayad, Hadi; Pan, Tinsu; Clément, Jean-François; Visvikis, Dimitris

    2011-01-01

    Purpose Current respiratory motion monitoring devices used for motion synchronization in medical imaging and radiotherapy provide either 1D respiratory signals over a specific region or 3D information based on a few external or internal markers. On the other hand, newer technology may offer the potential to monitor the entire external patient surface in real time. The main objective of this study was to assess the correlation between the motion of such an external patient surface and the motion of internal anatomical landmarks. Methods Four-dimensional computed tomography (4D CT) volumes for ten patients were used in this study. Anatomical landmarks were manually selected in the thoracic region across the 4D CT datasets by two experts. The landmarks included normal structures as well as the tumour location. In addition, a distance map representing the entire external patient surface, which corresponds to surfaces acquired by a Time of Flight (ToF) camera or similar devices, was created by segmenting the skin of all 4D CT volumes using a thresholding algorithm. Finally, the correlation between the internal landmarks and external surface motion was evaluated for different regions (placement and size) throughout a patient’s surface. Results Significant variability was observed in the motion of the different parts of the external patient surface. The largest motion magnitude was consistently measured in the central regions of the abdominal and thoracic areas for the different patient datasets considered. The highest correlation coefficients were observed between the motion of these external surface areas and internal landmarks such as the diaphragm and mediastinum structures as well as the tumour location landmarks (0.8 ± 0.18 and 0.72 ± 0.12 for the abdominal and the thoracic regions respectively). Weaker correlation was observed for landmarks not significantly influenced by respiratory motion, such as the apex and the sternum. Discussion and conclusions There
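
    The correlation figures quoted in the results are plain correlation coefficients between two displacement traces. As a minimal sketch, the snippet below correlates a synthetic surface-region trace with a synthetic landmark trace sampled at ten respiratory phases, loosely mimicking a 4D CT cycle; the signals are invented for illustration.

```python
# Minimal sketch: Pearson correlation between an external surface-motion trace
# and an internal landmark trace over ten (synthetic) 4D CT respiratory phases.
import numpy as np

phases = np.linspace(0, 2 * np.pi, 10, endpoint=False)
surface = np.sin(phases)                                  # external surface displacement
noise = 0.05 * np.random.default_rng(3).normal(size=10)
landmark = 0.8 * np.sin(phases - 0.2) + noise             # lagged internal landmark

r = np.corrcoef(surface, landmark)[0, 1]
print(f"correlation = {r:.2f}")   # values near 0.8 echo the abdominal-region result
```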

  9. USGS National Hydrography Dataset from The National Map

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — USGS The National Map - National Hydrography Dataset (NHD) is a comprehensive set of digital spatial data that encodes information about naturally occurring and...

  10. Newton SSANTA Dr Water using POU filters dataset

    Data.gov (United States)

    U.S. Environmental Protection Agency — This dataset contains information about all the features extracted from the raw data files, the formulas that were assigned to some of these features, and the...

  11. A proposed grading system for standardizing tumor consistency of intracranial meningiomas.

    Science.gov (United States)

    Zada, Gabriel; Yashar, Parham; Robison, Aaron; Winer, Jesse; Khalessi, Alexander; Mack, William J; Giannotta, Steven L

    2013-12-01

    Tumor consistency plays an important and underrecognized role in the surgeon's ability to resect meningiomas, especially with evolving trends toward minimally invasive and keyhole surgical approaches. Aside from descriptors such as "hard" or "soft," no objective criteria exist for grading, studying, and conveying the consistency of meningiomas. The authors designed a practical 5-point scale for intraoperative grading of meningiomas based on the surgeon's ability to internally debulk the tumor and on the subsequent resistance to folding of the tumor capsule. Tumor consistency grades and features are as follows: 1) extremely soft tumor, internal debulking with suction only; 2) soft tumor, internal debulking mostly with suction, and remaining fibrous strands resected with easily folded capsule; 3) average consistency, tumor cannot be freely suctioned and requires mechanical debulking, and the capsule then folds with relative ease; 4) firm tumor, high degree of mechanical debulking required, and capsule remains difficult to fold; and 5) extremely firm, calcified tumor, approaches density of bone, and capsule does not fold. Additional grading categories included tumor heterogeneity (with minimum and maximum consistency scores) and a 3-point vascularity score. This grading system was prospectively assessed in 50 consecutive patients undergoing craniotomy for meningioma resection by 2 surgeons in an independent fashion. Grading scores were subjected to a linear weighted kappa analysis for interuser reliability. Fifty patients (100 scores) were included in the analysis. The mean maximal tumor diameter was 4.3 cm. The distribution of overall tumor consistency scores was as follows: Grade 1, 4%; Grade 2, 9%; Grade 3, 43%; Grade 4, 44%; and Grade 5, 0%. Regions of Grade 5 consistency were reported only focally in 14% of heterogeneous tumors. Tumors were designated as homogeneous in 68% and heterogeneous in 32% of grades. The kappa analysis score for overall tumor consistency
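
    The inter-rater statistic named at the (truncated) end of this record, a linear weighted kappa, penalizes disagreements in proportion to their distance on the ordinal scale. A minimal sketch with scikit-learn follows; the two surgeons' grade lists are simulated, not the study data.

```python
# Minimal sketch: linear weighted kappa for two raters' 5-point consistency grades.
# The grade lists are simulated examples, not data from the meningioma study.
from sklearn.metrics import cohen_kappa_score

surgeon_1 = [3, 4, 3, 2, 4, 3, 4, 1, 3, 4]
surgeon_2 = [3, 4, 4, 2, 4, 3, 3, 1, 3, 4]
kappa = cohen_kappa_score(surgeon_1, surgeon_2, weights="linear")
print(f"linear weighted kappa = {kappa:.2f}")
```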

  12. Full-Scale Approximations of Spatio-Temporal Covariance Models for Large Datasets

    KAUST Repository

    Zhang, Bohai

    2014-01-01

    Various continuously-indexed spatio-temporal process models have been constructed to characterize spatio-temporal dependence structures, but the computational complexity of model fitting and prediction grows cubically with the size of the dataset, so applying such models to large datasets is not feasible. This article extends the full-scale approximation (FSA) approach of Sang and Huang (2012) to the spatio-temporal context to reduce computational complexity. A reversible jump Markov chain Monte Carlo (RJMCMC) algorithm is proposed to select knots automatically from a discrete set of spatio-temporal points. Our approach is applicable to nonseparable and nonstationary spatio-temporal covariance models. We illustrate the effectiveness of our method through simulation experiments and application to an ozone measurement dataset.
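
    The structure of the full-scale approximation is easy to state: a reduced-rank term built on knots captures long-range dependence, and a tapered version of the residual covariance restores the short range. The sketch below builds both pieces for a 1D toy problem; the exponential kernel, Wendland taper and gridded knots are illustrative assumptions (the paper selects knots by RJMCMC instead).

```python
# Minimal sketch: full-scale approximation C ~ low-rank + taper * (C - low-rank).
# Kernel, taper and knot placement are illustrative; the paper chooses knots
# automatically via RJMCMC.
import numpy as np

rng = np.random.default_rng(0)
s = rng.uniform(0, 10, size=(500, 1))           # observation locations (1D toy)
knots = np.linspace(0, 10, 15).reshape(-1, 1)   # fixed gridded knots

def cov(a, b, range_=2.0):                      # exponential covariance kernel
    return np.exp(-np.abs(a - b.T) / range_)

def taper(a, b, gamma=1.0):                     # Wendland taper, zero beyond gamma
    d = np.abs(a - b.T)
    return np.clip(1 - d / gamma, 0, None) ** 4 * (1 + 4 * d / gamma)

C_sk, C_kk = cov(s, knots), cov(knots, knots)
low_rank = C_sk @ np.linalg.solve(C_kk, C_sk.T)       # reduced-rank long-range part
residual = (cov(s, s) - low_rank) * taper(s, s)       # sparse short-range correction
C_fsa = low_rank + residual
print(f"max abs error = {np.abs(C_fsa - cov(s, s)).max():.4f}")
```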

  13. USGS National Boundary Dataset (NBD) Downloadable Data Collection

    Data.gov (United States)

    U.S. Geological Survey, Department of the Interior — The USGS Governmental Unit Boundaries dataset from The National Map (TNM) represents major civil areas for the Nation, including States or Territories, counties (or...

  14. Dynamics of International Reserve Accumulation in Turkish Economy

    Directory of Open Access Journals (Sweden)

    Duygu Ayhan

    2015-05-01

    Full Text Available Many emerging market economies embody macroeconomic and structural vulnerabilities due to large deficits, high inflation, slowing growth and heavy reliance on short-term capital inflows. Therefore, accumulation of international reserve holdings has frequently been used by authorities as insurance against the volatility of capital flows and to strengthen the fragile nature of these economies. The Turkish economy, classified as one of the most fragile of the emerging economies, has been experiencing a similar process of international reserve accumulation. The chronically high current account deficit and low savings rate boost the importance of international reserves. Thus, the aim of this paper is to investigate the determinants of international reserves in Turkey. The dataset covers the 2000-2013 period. We find that international reserve accumulation is mainly explained by the current account balance, per capita income and past crisis experience.

  15. A high-throughput system for high-quality tomographic reconstruction of large datasets at Diamond Light Source.

    Science.gov (United States)

    Atwood, Robert C; Bodey, Andrew J; Price, Stephen W T; Basham, Mark; Drakopoulos, Michael

    2015-06-13

    Tomographic datasets collected at synchrotrons are becoming very large and complex, and, therefore, need to be managed efficiently. Raw images may have high pixel counts, and each pixel can be multidimensional and associated with additional data such as those derived from spectroscopy. In time-resolved studies, hundreds of tomographic datasets can be collected in sequence, yielding terabytes of data. Users of tomographic beamlines are drawn from various scientific disciplines, and many are keen to use tomographic reconstruction software that does not require a deep understanding of reconstruction principles. We have developed Savu, a reconstruction pipeline that enables users to rapidly reconstruct data to consistently create high-quality results. Savu is designed to work in an 'orthogonal' fashion, meaning that data can be converted between projection and sinogram space throughout the processing workflow as required. The Savu pipeline is modular and allows processing strategies to be optimized for users' purposes. In addition to the reconstruction algorithms themselves, it can include modules for identification of experimental problems, artefact correction, general image processing and data quality assessment. Savu is open source, open licensed and 'facility-independent': it can run on standard cluster infrastructure at any institution.
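
    The "modular pipeline" idea is just a configurable chain of processing stages applied in order. The toy sketch below shows that pattern on a fake sinogram; it illustrates the pattern only and is not Savu's actual plugin API.

```python
# Minimal sketch of a modular processing pipeline: an ordered, user-configurable
# chain of stages. This mimics the pattern described for Savu, not its real API.
import numpy as np

def normalise(data):                 # placeholder intensity-correction stage
    span = max(np.ptp(data), 1e-9)
    return (data - data.min()) / span

def despeckle(data):                 # simple 3-point median filter along rows
    padded = np.pad(data, ((0, 0), (1, 1)), mode="edge")
    return np.median(np.stack([padded[:, :-2], padded[:, 1:-1], padded[:, 2:]]), axis=0)

pipeline = [normalise, despeckle]    # stage order is a per-user configuration choice

sinogram = np.random.default_rng(7).random((64, 128))   # fake sinogram data
for stage in pipeline:
    sinogram = stage(sinogram)
print(sinogram.shape, sinogram.min(), sinogram.max())
```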

  16. Cultural competence in mental health nursing: validity and internal consistency of the Portuguese version of the multicultural mental health awareness scale-MMHAS.

    Science.gov (United States)

    de Almeida Vieira Monteiro, Ana Paula Teixeira; Fernandes, Alexandre Bastos

    2016-05-17

    Cultural competence is an essential component in rendering effective and culturally responsive services to culturally and ethnically diverse clients. Still, great difficulty exists in assessing the cultural competence of mental health nurses. There are no validated Portuguese measurement instruments to assess cultural competence in mental health nurses. This paper reports a study testing the reliability and validity of the Portuguese version of the Multicultural Mental Health Awareness Scale-MMHAS in a sample of Portuguese nurses. Following a standard forward/backward translation into Portuguese, the adapted version of MMHAS, along with a sociodemographic questionnaire, were applied to a sample of 306 Portuguese nurses (299 males, 77 females; ages 21-68 years, M = 35.43, SD = 9.85 years). A psychometric research design was used with content and construct validity and reliability. Reliability was assessed using internal consistency and item-total correlations. Construct validity was determined using factor analysis. The factor analysis confirmed that the Portuguese version of MMHAS has a three-factor structure of multicultural competencies (Awareness, Knowledge, and Skills) explaining 59.51% of the total variance. Strong content validity and reliability correlations were demonstrated. The Portuguese version of MMHAS has a strong internal consistency, with a Cronbach's alpha of 0.958 for the total scale. The results supported the construct validity and reliability of the Portuguese version of MMHAS, proving that it is a reliable and valid measure of multicultural counselling competencies in mental health nursing. The MMHAS Portuguese version can be used to evaluate the effectiveness of multicultural competency training programs in Portuguese-speaking mental health nurses. The scale can also be useful in future studies of multicultural competencies in Portuguese-speaking nurses.

  17. Thesaurus Dataset of Educational Technology in Chinese

    Science.gov (United States)

    Wu, Linjing; Liu, Qingtang; Zhao, Gang; Huang, Huan; Huang, Tao

    2015-01-01

    The thesaurus dataset of educational technology is a knowledge description of educational technology in Chinese. The aims of this thesaurus were to collect the subject terms in the domain of educational technology, facilitate the standardization of terminology and promote the communication between Chinese researchers and scholars from various…

  18. BASE MAP DATASET, LE FLORE COUNTY, OKLAHOMA, USA

    Data.gov (United States)

    Federal Emergency Management Agency, Department of Homeland Security — Basemap datasets comprise six of the seven FGDC themes of geospatial data that are used by most GIS applications (Note: the seventh framework theme, orthographic...

  19. The New Planetary Science Archive (PSA): Exploration and Discovery of Scientific Datasets from ESA's Planetary Missions

    Science.gov (United States)

    Heather, David; Besse, Sebastien; Vallat, Claire; Barbarisi, Isa; Arviset, Christophe; De Marchi, Guido; Barthelemy, Maud; Coia, Daniela; Costa, Marc; Docasal, Ruben; Fraga, Diego; Grotheer, Emmanuel; Lim, Tanya; MacFarlane, Alan; Martinez, Santa; Rios, Carlos; Vallejo, Fran; Saiz, Jaime

    2017-04-01

    -to-date with versions 3 and 4 of the PDS standards, as PDS4 will be used for ESA's ExoMars and upcoming BepiColombo missions. Users will have direct access to documentation, information and tools that are relevant to the scientific use of the dataset, including ancillary datasets, Software Interface Specification (SIS) documents, and any tools/help that the PSA team can provide. The new PSA interface was released in January 2017. The home page provides direct and simple access to the scientific data, aiming to help scientists discover and explore its content. The archive can be explored through a set of parameters that allow the selection of products through space and time. Quick views provide the information needed for the selection of appropriate scientific products. During 2017, the PSA team will focus their efforts on developing a map search interface using GIS technologies to display ESA planetary datasets, an image gallery providing navigation through images to explore the datasets, and interoperability with international partners. This will be done in parallel with making additional metadata (i.e., geometry) searchable through the interface, and with a dedication to improving the content of 20 years of space exploration.

  20. Exudate-based diabetic macular edema detection in fundus images using publicly available datasets

    Energy Technology Data Exchange (ETDEWEB)

    Giancardo, Luca [ORNL; Meriaudeau, Fabrice [ORNL; Karnowski, Thomas Paul [ORNL; Li, Yaquin [University of Tennessee, Knoxville (UTK); Garg, Seema [University of North Carolina; Tobin Jr, Kenneth William [ORNL; Chaum, Edward [University of Tennessee, Knoxville (UTK)

    2011-01-01

    Diabetic macular edema (DME) is a common vision-threatening complication of diabetic retinopathy. In a large-scale screening environment DME can be assessed by detecting exudates (a type of bright lesion) in fundus images. In this work, we introduce a new methodology for diagnosis of DME using a novel set of features based on colour, wavelet decomposition and automatic lesion segmentation. These features are employed to train a classifier able to automatically diagnose DME through the presence of exudation. We present a new publicly available dataset with ground-truth data containing 169 patients from various ethnic groups and levels of DME. This and two other publicly available datasets are employed to evaluate our algorithm. We are able to achieve diagnosis performance comparable to retina experts on the MESSIDOR (an independently labelled dataset with 1200 images) with cross-dataset testing (e.g., the classifier was trained on an independent dataset and tested on MESSIDOR). Our algorithm obtained an AUC between 0.88 and 0.94 depending on the dataset/features used. Additionally, it does not need ground truth at lesion level to reject false positives and is computationally efficient, as it generates a diagnosis in an average of 4.4 s (9.3 s including optic nerve localization) per image on a 2.6 GHz platform with an unoptimized Matlab implementation.
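
    The reported evaluation boils down to training a classifier on per-image features and scoring it by AUC. The sketch below reproduces only that skeleton with scikit-learn; the features, labels and random-forest choice are simulated stand-ins, not the paper's classifier or data.

```python
# Minimal sketch: feature-based diagnosis scored by AUC. Features and labels are
# simulated; the paper's actual colour/wavelet/lesion features are not used here.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))                         # 12 image-level features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=400) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]
print(f"AUC = {roc_auc_score(y_te, scores):.2f}")      # cf. the paper's 0.88-0.94
```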