WorldWideScience

Sample records for eadgene chicken data-set

  1. Gene set analysis of the EADGENE chicken data-set

    DEFF Research Database (Denmark)

    Skarman, Axel; Jiang, Li; Hornshøj, Henrik

    2009-01-01

    Background: Gene set analysis is considered to be a way of improving the biological interpretation of observed expression patterns. This paper describes different methods applied to analyse expression data from a chicken DNA microarray dataset. Results: Applying different gene set analyses to the chicken expression data led to different rankings of the Gene Ontology terms tested. A method for prediction of possible annotations was applied. Conclusion: Biological interpretation based on gene set analyses depended on the statistical method used. Methods for predicting the possible...

  2. Analysis of the real EADGENE data set:

    DEFF Research Database (Denmark)

    Jaffrézic, Florence; de Koning, Dirk-Jan; Boettcher, Paul J

    2007-01-01

    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods for the detection of differentially expressed genes, in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE-funded microarray study looking at the gene expression changes following artificial infection with two... quarters. Very little transcriptional variation was observed for the bacteria S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised...

  3. Analysis of the real EADGENE data set:

    DEFF Research Database (Denmark)

    Sørensen, Peter; Bonnet, Agnès; Buitenhuis, Bart

    2007-01-01

    The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) ...

  4. Data set for the proteomic inventory and quantitative analysis of chicken uterine fluid during eggshell biomineralization

    Directory of Open Access Journals (Sweden)

    Pauline Marie

    2014-12-01

    Chicken eggshell is the protective barrier of the egg. It is a biomineral composed of 95% calcium carbonate in calcitic form and 3.5% organic matrix proteins. The mineralization process occurs in the uterus, within the uterine fluid. This acellular fluid contains ions and organic matrix protein precursors, which interact with the mineral phase and control crystal growth, eggshell structure and mechanical properties. We performed a proteomic approach and identified 308 uterine fluid proteins. Gene Ontology term enrichments were determined to investigate their potential functions. Mass spectrometry analyses were also combined with label-free quantitative analysis to determine the relative abundance of 96 proteins at initiation, rapid growth phase and termination of shell calcification. Sixty-four showed differential abundance according to the mineralization stage. Their potential functions have been annotated. The complete proteomic, bioinformatic and functional analyses are reported in Marie et al., J. Proteomics (2015) [1].

  5. The EADGENE and SABRE post-analyses workshop

    DEFF Research Database (Denmark)

    Jaffrezic, Florence; Hedegaard, Jakob; Sancristobal, Magali

    2009-01-01

    of phenotypic outcomes using gene expression results. Prior to the workshop, we distributed two sets of data to the workshop participants. The first set of gene expression data deals with experimental challenge of chicken with two types of Eimeria. This experiment is described in some detail in one...

  6. Data set for the proteomic inventory and quantitative analysis of chicken eggshell matrix proteins during the primary events of eggshell mineralization and the active growth phase of calcification.

    Science.gov (United States)

    Marie, Pauline; Labas, Valérie; Brionne, Aurélien; Harichaux, Grégoire; Hennequet-Antier, Christelle; Rodriguez-Navarro, Alejandro B; Nys, Yves; Gautron, Joël

    2015-09-01

    Chicken eggshell is a biomineral composed of 95% calcite calcium carbonate mineral and of 3.5% organic matrix proteins. The assembly of mineral and its structural organization is controlled by its organic matrix. In a recent study [1], we have used quantitative proteomic, bioinformatic and functional analyses to explore the distribution of 216 eggshell matrix proteins at four key stages of shell mineralization defined as: (1) widespread deposition of amorphous calcium carbonate (ACC), (2) ACC transformation into crystalline calcite aggregates, (3) formation of larger calcite crystal units and (4) rapid growth of calcite as columnar structure with preferential crystal orientation. The current article detailed the quantitative analysis performed at the four stages of shell mineralization to determine the proteins which are the most abundant. Additionally, we reported the enriched GO terms and described the presence of 35 antimicrobial proteins equally distributed at all stages to keep the egg free of bacteria and of 81 proteins, the function of which could not be ascribed.

  7. Data set for the proteomic inventory and quantitative analysis of chicken eggshell matrix proteins during the primary events of eggshell mineralization and the active growth phase of calcification

    Directory of Open Access Journals (Sweden)

    Pauline Marie

    2015-09-01

    Chicken eggshell is a biomineral composed of 95% calcite calcium carbonate mineral and of 3.5% organic matrix proteins. The assembly of mineral and its structural organization is controlled by its organic matrix. In a recent study [1], we have used quantitative proteomic, bioinformatic and functional analyses to explore the distribution of 216 eggshell matrix proteins at four key stages of shell mineralization defined as: (1) widespread deposition of amorphous calcium carbonate (ACC), (2) ACC transformation into crystalline calcite aggregates, (3) formation of larger calcite crystal units and (4) rapid growth of calcite as columnar structure with preferential crystal orientation. The current article detailed the quantitative analysis performed at the four stages of shell mineralization to determine the proteins which are the most abundant. Additionally, we reported the enriched GO terms and described the presence of 35 antimicrobial proteins equally distributed at all stages to keep the egg free of bacteria and of 81 proteins, the function of which could not be ascribed.

  8. Large Data Set Mining

    NARCIS (Netherlands)

    Leemans, I.B.; Broomhall, Susan

    2017-01-01

    Digital emotion research has yet to make history. Until now large data set mining has not been a very active field of research in early modern emotion studies. This is indeed surprising since first, the early modern field has such rich, copyright-free, digitized data sets and second, emotion studies

  9. General Paleoclimatology Data Sets

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — Data of past climate and environment derived from unusual proxy evidence. Parameter keywords describe what was measured in this data set. Additional summary...

  10. Analysis of successive data sets

    NARCIS (Netherlands)

    Spreeuwers, Lieuwe Jan; Breeuwer, Marcel; Haselhoff, Eltjo Hans

    2008-01-01

    The invention relates to the analysis of successive data sets. A local intensity variation is formed from such successive data sets, that is, from data values in successive data sets at corresponding positions in each of the data sets. A region of interest is localized in the individual data sets on

  11. Analysis of successive data sets

    NARCIS (Netherlands)

    Spreeuwers, Lieuwe Jan; Breeuwer, Marcel; Haselhoff, Eltjo Hans

    2002-01-01

    The invention relates to the analysis of successive data sets. A local intensity variation is formed from such successive data sets, that is, from data values in successive data sets at corresponding positions in each of the data sets. A region of interest is localized in the individual data sets on

  12. The IRI marketing data set

    NARCIS (Netherlands)

    Bronnenberg, B.J.; Kruger, M.W.; Mela, C.

    2008-01-01

    This paper describes a new data set available to academic researchers (at the following website: http://mktsci.pubs.informs.org). These data are comprised of store sales and consumer panel data for 30 product categories. The store sales data contain 5 years of product sales, pricing, and promotion

  13. Data Sets from Major NCI Initiatives

    Science.gov (United States)

    The NCI Data Catalog includes links to data collections produced by major NCI initiatives and other widely used data sets, including animal models, human tumor cell lines, epidemiology data sets, genomics data sets from TCGA, TARGET, COSMIC, GSK, NCI60.

  14. Wind Integration Data Sets | Grid Modernization | NREL

    Science.gov (United States)

    NREL's wind integration data sets provide ten-minute time-series wind data for 2004, 2005, and 2006 to help energy professionals perform wind integration studies and estimate power production from hypothetical wind power plants.

  15. Initial data sets for the Schwarzschild spacetime

    International Nuclear Information System (INIS)

    Gomez-Lobo, Alfonso Garcia-Parrado; Kroon, Juan A. Valiente

    2007-01-01

    A characterization of initial data sets for the Schwarzschild spacetime is provided. This characterization is obtained by performing a 3+1 decomposition of a certain invariant characterization of the Schwarzschild spacetime given in terms of concomitants of the Weyl tensor. This procedure renders a set of necessary conditions--which can be written in terms of the electric and magnetic parts of the Weyl tensor and their concomitants--for an initial data set to be a Schwarzschild initial data set. Our approach also provides a formula for a static Killing initial data set candidate--a KID candidate. Sufficient conditions for an initial data set to be a Schwarzschild initial data set are obtained by supplementing the necessary conditions with the requirement that the initial data set possesses a stationary Killing initial data set of the form given by our KID candidate. Thus, we obtain an algorithmic procedure of checking whether a given initial data set is Schwarzschildean or not

  16. Solar Integration Data Sets | Grid Modernization | NREL

    Science.gov (United States)

    NREL provides the energy community with modeled solar data for energy professionals, such as transmission planners, utility planners, project developers, and university researchers, who perform solar integration studies.

  17. Ontology-based geographic data set integration

    NARCIS (Netherlands)

    Uitermark, H.T.J.A.; Uitermark, Harry T.; Oosterom, Peter J.M.; Mars, Nicolaas; Molenaar, Martien; Molenaar, M.

    1999-01-01

    In order to develop a system to propagate updates we investigate the semantic and spatial relationships between independently produced geographic data sets of the same region (data set integration). The goal of this system is to reduce operator intervention in update operations between corresponding

  18. Evaluation of integrated data sets: four examples

    International Nuclear Information System (INIS)

    Bolivar, S.L.; Freeman, S.B.; Weaver, T.A.

    1982-01-01

    Several large data sets have been integrated and utilized for rapid evaluation on a reconnaissance scale for the Montrose 1° x 2° quadrangle, Colorado. The data sets include Landsat imagery, hydrogeochemical and stream sediment analyses, airborne geophysical data, known mineral occurrences, and a geologic map. All data sets were registered to a 179 x 119 rectangular grid and projected onto Universal Transverse Mercator coordinates. A grid resolution of 1 km was used. All possible combinations of three, for most data sets, were examined for general geologic correlations by utilizing a color microfilm output. In addition, gray-level pictures of statistical output, e.g., factor analysis, have been employed to aid evaluations. Examples for the data sets dysprosium-calcium, lead-copper-zinc, and equivalent uranium-uranium in water-uranium in sediment are described with respect to geologic applications, base-metal regimes, and geochemical associations.

  19. Accuracy in Robot Generated Image Data Sets

    DEFF Research Database (Denmark)

    Aanæs, Henrik; Dahl, Anders Bjorholm

    2015-01-01

    In this paper we present a practical innovation concerning how to achieve high accuracy of camera positioning when using a 6-axis industrial robot to generate high quality data sets for computer vision. This innovation is based on the realization that, to a very large extent, the robot's positioning error is deterministic and can as such be calibrated away. We have successfully used this innovation in our efforts for creating data sets for computer vision. Since the use of this innovation has a significant effect on the data set quality, we here present it in some detail, to better aid others...

  20. Uniform Facility Data Set US (UFDS-1997)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Uniform Facility Data Set (UFDS), formerly the National Drug and Alcohol Treatment Unit Survey or NDATUS, was designed to measure the scope and use of drug abuse...

  1. SIS - Species and Stock Administrative Data Set

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Species and Stock Administrative data set within the Species Information System (SIS) defines entities within the database that serve as the basis for recording...

  2. Uniform Facility Data Set US (UFDS-1998)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Uniform Facility Data Set (UFDS) was designed to measure the scope and use of drug abuse treatment services in the United States. The survey collects information...

  3. 2010 Federal STEM Education Inventory Data Set

    Data.gov (United States)

    Office of Science and Technology Policy, Executive Office of the President — This data set provides information for STEM education (pre-kindergarten through graduate) investments funded by Federal agencies at the level of $300,000 or above.

  4. Wind and solar resource data sets

    DEFF Research Database (Denmark)

    Clifton, Andrew; Hodge, Bri-Mathias; Draxl, Caroline

    2017-01-01

    The range of resource data sets spans from static cartography showing the mean annual wind speed or solar irradiance across a region to high temporal and high spatial resolution products that provide detailed information at a potential wind or solar energy facility. These data sets are used to support continental-scale, national, or regional renewable energy development; facilitate prospecting by developers; and enable grid integration studies. This review first provides an introduction to the wind and solar resource data sets, then provides an overview of the common methods used for their creation and validation. A brief history of wind and solar resource data sets is then presented, followed by areas for future research. For further resources related to this article, please visit the WIREs website.

  5. Health Outcomes Survey - Limited Data Set

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Medicare Health Outcomes Survey (HOS) limited data sets (LDS) are comprised of the entire national sample for a given 2-year cohort (including both respondents...

  6. Long Term Care Minimum Data Set (MDS)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Long-Term Care Minimum Data Set (MDS) is a standardized, primary screening and assessment tool of health status that forms the foundation of the comprehensive...

  7. Complex Urban LiDAR Data Set

    OpenAIRE

    Jeong, Jinyong; Cho, Younggun; Shin, Young-Sik; Roh, Hyunchul; Kim, Ayoung

    2018-01-01

    This paper presents a Light Detection and Ranging (LiDAR) data set that targets complex urban environments. Urban environments with high-rise buildings and congested traffic pose a significant challenge for many robotics applications. The presented data set is unique in the sense that it is able to capture the genuine features of an urban environment (e.g. metropolitan areas, large building complexes and underground parking lots). Data of two-dimensional (2D) and three-dimensional (3D) LiDAR, which...

  8. Search Engine Customization and Data Set Builder

    OpenAIRE

    Arias Moreno, Fco Javier

    2009-01-01

    There are two core objectives in this work: firstly, to build a data set, and secondly, to customize a search engine. The first objective is to design and implement a data set builder. There are two steps required for this. The first step is to build a crawler. The second step is to include a cleaner. The crawler collects Web links. The cleaner extracts the main content and removes noise from the files crawled. The goal of this application is crawling Web news sites to find the...

  9. Multiple variables data sets visualization in ROOT

    International Nuclear Information System (INIS)

    Couet, O

    2008-01-01

    The ROOT graphical framework provides support for many different functions including basic graphics, high-level visualization techniques, output on files, 3D viewing etc. They use well-known world standards to render graphics on screen, to produce high-quality output files, and to generate images for Web publishing. Many techniques allow visualization of all the basic ROOT data types, but the graphical framework was still somewhat weak in the visualization of data sets with multiple variables. This paper presents the latest developments in the ROOT framework for visualizing data sets with multiple (>4) variables.

  10. EFFECTIVE SUMMARY FOR MASSIVE DATA SET

    Directory of Open Access Journals (Sweden)

    A. Radhika

    2015-07-01

    The growing size of data sets has increased interest in designing effective algorithms for reducing space and time requirements. Applying high-dimensional techniques over large data sets is difficult, so randomized techniques are used to analyze data that must be collected and analyzed continuously from parts of storage in networks. Previously, a collaborative filtering approach was used to find similar patterns based on user rankings, but linear approaches require high running time and more space. To overcome this, sketching is used to represent massive data sets. Sketching produces short fingerprints of the item sets of users, which allow approximate computation of the similarity between the sets of different users. The idea is to generate a minimal subset of records that represents all the original records. Sketching performs two techniques: dimensionality reduction, which reduces rows or columns, and data reduction. It is shown that sketching can be performed using Principal Component Analysis to find index values.
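
    The sketching idea above can be illustrated with a minimal MinHash example: short signatures computed per user allow approximate Jaccard similarity between item sets without storing the sets themselves. This is a generic sketch of the technique the abstract names, not the paper's implementation; the item sets and signature length are invented for illustration.

```python
import random

def minhash_signature(items, num_hashes=64, seed=0):
    """MinHash signature: for each simulated hash function, keep the
    minimum hash value observed over the item set."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, item)) for item in items) for salt in salts]

def estimate_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard
    similarity |A & B| / |A | B| of the underlying sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical users: true Jaccard similarity is 3/5 = 0.6.
user_a = {"item1", "item2", "item3", "item4"}
user_b = {"item2", "item3", "item4", "item5"}
print(estimate_jaccard(minhash_signature(user_a), minhash_signature(user_b)))
```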

  11. Earth Observing System precursor data sets

    Science.gov (United States)

    Mah, Grant R.; Eidenshink, Jeff C.; Sheffield, K. W.; Myers, Jeffrey S.

    1993-08-01

    The Land Processes Distributed Active Archive Center (DAAC) is archiving and processing precursor data from airborne and spaceborne instruments such as the thermal infrared multispectral scanner (TIMS), the NS-001 and thematic mapper simulators (TMS), and the advanced very high resolution radiometer (AVHRR). The instrument data are being used to construct data sets that simulate the spectral and spatial characteristics of the advanced spaceborne thermal emission and reflection radiometer (ASTER) and the moderate resolution imaging spectrometer (MODIS) flight instruments scheduled to be flown on the EOS-AM spacecraft. Ames Research Center has developed and is flying a MODIS airborne simulator (MAS), which provides coverage in both MODIS and ASTER bands. A simulation of an ASTER data set over Death Valley, California has been constructed using a combination of TMS and TIMS data, along with existing digital elevation models that were used to develop the topographic information. MODIS data sets are being simulated by using MAS for full-band site coverage at high resolution and AVHRR for global coverage at 1 km resolution.

  12. Developing a Data-Set for Stereopsis

    Directory of Open Access Journals (Sweden)

    D.W Hunter

    2014-08-01

    Current research on binocular stereopsis in humans and non-human primates has been limited by a lack of available data-sets. Current data-sets fall into two categories: stereo-image sets with vergence but no ranging information (Hibbard, 2008, Vision Research, 48(12), 1427-1439) or combinations of depth information with binocular images and video taken from cameras in fixed fronto-parallel configurations exhibiting neither vergence nor focus effects (Hirschmüller & Scharstein, 2007, IEEE Conf. Computer Vision and Pattern Recognition). The techniques for generating depth information are also imperfect. Depth information is normally inaccurate or simply missing near edges and on partially occluded surfaces. For many areas of vision research these are the most interesting parts of the image (Goutcher, Hunter, Hibbard, 2013, i-Perception, 4(7), 484; Scarfe & Hibbard, 2013, Vision Research). Using state-of-the-art open-source ray-tracing software (PBRT) as a back-end, our intention is to release a set of tools that will allow researchers in this field to generate artificial binocular stereoscopic data-sets. Although not as realistic as photographs, computer generated images have significant advantages in terms of control over the final output, and ground-truth information about scene depth is easily calculated at all points in the scene, even partially occluded areas. While individual researchers have been developing similar stimuli by hand for many decades, we hope that our software will greatly reduce the time and difficulty of creating naturalistic binocular stimuli. Our intention in making this presentation is to elicit feedback from the vision community about what sort of features would be desirable in such software.

  13. Data set for Tifinagh handwriting character recognition

    Directory of Open Access Journals (Sweden)

    Omar Bencharef

    2015-09-01

    The Tifinagh alphabet-IRCAM is the official alphabet of the Amazigh language, widely used in North Africa [1]. It includes thirty-one basic letters and two letters each composed of a base letter followed by the sign of labialization. Normalized only in 2003 (Unicode) [2], IRCAM-Tifinagh is a young character repertoire, which needs more work on all levels. In this context we propose a data set for handwritten Tifinagh characters composed of 1376 images: 43 images for each character. The dataset can be used to train a Tifinagh character recognition system, or to extract the characteristic features of each character.

  14. Multidimensional scaling for large genomic data sets

    Directory of Open Access Journals (Sweden)

    Lu Henry

    2008-04-01

    Background: Multi-dimensional scaling (MDS) aims to represent high dimensional data in a low dimensional space with preservation of the similarities between data points. This reduction in dimensionality is crucial for analyzing and revealing the genuine structure hidden in the data. For noisy data, dimension reduction can effectively reduce the effect of noise on the embedded structure. For large data sets, dimension reduction can effectively reduce information retrieval complexity. Thus, MDS techniques are used in many applications of data mining and gene network research. However, although there have been a number of studies that applied MDS techniques to genomics research, the number of analyzed data points was restricted by the high computational complexity of MDS. In general, a non-metric MDS method is faster than a metric MDS, but it does not preserve the true relationships. The computational complexity of most metric MDS methods is over O(N²), so that it is difficult to process a data set of a large number of genes N, such as in the case of whole genome microarray data. Results: We developed a new rapid metric MDS method with a low computational complexity, making metric MDS applicable for large data sets. Computer simulation showed that the new method of split-and-combine MDS (SC-MDS) is fast, accurate and efficient. Our empirical studies using microarray data on the yeast cell cycle showed that the performance of K-means in the reduced dimensional space is similar to or slightly better than that of K-means in the original space, but about three times faster to obtain the clustering results. Our clustering results using SC-MDS are more stable than those in the original space. Hence, the proposed SC-MDS is useful for analyzing whole genome data. Conclusion: Our new method reduces the computational complexity from O(N³) to O(N) when the dimension of the feature space is far less than the number of genes N, and it successfully...
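
    As a point of reference for the complexity claims, here is a minimal classical (Torgerson) metric MDS in numpy; the eigendecomposition is the O(N³) step that SC-MDS sidesteps. This is a textbook baseline, not the SC-MDS algorithm itself, and the toy distance matrix is invented.

```python
import numpy as np

def classical_mds(D, k=2):
    """Embed N points into k dimensions from an N x N distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # the O(N^3) bottleneck
    idx = np.argsort(w)[::-1][:k]            # top-k eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Toy example: four points on a line are recovered up to sign and shift.
x = np.array([0.0, 1.0, 2.0, 4.0])
D = np.abs(np.subtract.outer(x, x))
print(classical_mds(D, k=1).ravel())
```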

  15. Benchmark data set for wheat growth models

    DEFF Research Database (Denmark)

    Asseng, S; Ewert, F.; Martre, P

    2015-01-01

    The data set includes a current representative management treatment from detailed, quality-tested sentinel field experiments with wheat from four contrasting environments including Australia, The Netherlands, India and Argentina. Measurements include local daily climate data (solar radiation, max...... analysis with 26 models and 30 years (1981-2010) for each location, for elevated atmospheric CO2 and temperature changes, a heat stress sensitivity analysis at anthesis, and a sensitivity analysis with soil and crop management variations and a Global Climate Model end-century scenario....

  16. A FGGE water vapor wind data set

    Science.gov (United States)

    Stewart, Tod R.; Hayden, Christopher M.

    1985-01-01

    It has been recognized for some time that water vapor structure visible in infrared imagery offers a potential for obtaining motion vectors when several images are considered in sequence (Fischer et al., 1981). A study evaluating water vapor winds obtained from the VISSR atmospheric sounder (Stewart et al., 1985) has confirmed the viability of the approach. More recently, 20 data sets have been produced from METEOSAT water vapor imagery for the FGGE period of 10-25 November 1979. Where possible, two data sets were prepared for each day at 0000 and 1200 GMT and compared with rawinsondes over Europe and Africa, and aircraft observations over the oceans. Procedures for obtaining winds were, in general, similar to the earlier study. Motions were detected both by single-pixel tracking and by a cross-correlation method, using three images individually separated by one hour. A height assignment was determined by matching the measured brightness temperature to the temperature structure represented by the FGGE-IIIB analyses. Results show that the METEOSAT water vapor winds provide uniform horizontal coverage of mid-level flow over the globe with good accuracy.
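
    A toy sketch of the cross-correlation step, assuming it amounts to finding where a patch from one image best matches the next image and reading the shift off as a motion vector; the synthetic frames and patch size are invented, and real water vapor wind retrieval adds quality controls and the height assignment described above.

```python
import numpy as np

def best_match(patch, image):
    """Offset (row, col) in image whose window best matches patch, scored
    by normalized zero-mean cross-correlation."""
    ph, pw = patch.shape
    p = patch - patch.mean()
    best, best_score = (0, 0), -np.inf
    for dy in range(image.shape[0] - ph + 1):
        for dx in range(image.shape[1] - pw + 1):
            win = image[dy:dy + ph, dx:dx + pw]
            w = win - win.mean()
            score = np.sum(w * p) / (np.linalg.norm(w) * np.linalg.norm(p) + 1e-12)
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best

rng = np.random.default_rng(0)
frame1 = rng.random((20, 20))
frame2 = np.roll(frame1, shift=(2, 3), axis=(0, 1))  # scene moves 2 rows, 3 cols
y0, x0 = 5, 5
y1, x1 = best_match(frame1[y0:y0 + 5, x0:x0 + 5], frame2)
print((y1 - y0, x1 - x0))  # (2, 3): displacement per time step, i.e. the motion vector
```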

  17. Wind and solar resource data sets: Wind and solar resource data sets

    Energy Technology Data Exchange (ETDEWEB)

    Clifton, Andrew [National Renewable Energy Laboratory, Golden CO USA; Hodge, Bri-Mathias [National Renewable Energy Laboratory, Golden CO USA; Power Systems Engineering Center, National Renewable Energy Laboratory, Golden CO USA; Draxl, Caroline [National Renewable Energy Laboratory, Golden CO USA; National Wind Technology Center, National Renewable Energy Laboratory, Golden CO USA; Badger, Jake [Department of Wind Energy, Danish Technical University, Copenhagen Denmark; Habte, Aron [National Renewable Energy Laboratory, Golden CO USA; Power Systems Engineering Center, National Renewable Energy Laboratory, Golden CO USA

    2017-12-05

    The range of resource data sets spans from static cartography showing the mean annual wind speed or solar irradiance across a region to high temporal and high spatial resolution products that provide detailed information at a potential wind or solar energy facility. These data sets are used to support continental-scale, national, or regional renewable energy development; facilitate prospecting by developers; and enable grid integration studies. This review first provides an introduction to the wind and solar resource data sets, then provides an overview of the common methods used for their creation and validation. A brief history of wind and solar resource data sets is then presented, followed by areas for future research.

  18. The Data Set on the Multiple Abilities

    DEFF Research Database (Denmark)

    Klynge, Alice Heegaard

    2008-01-01

    This paper presents a data set on multiple abilities. The abilities cover the Literacy and Math Ability, the Creative and Innovative Ability, the Learning Ability, the Communication Ability, the Social Competency, the Self-Management Ability, the Environmental Awareness, the Civic Competency......, the Intercultural Awareness, and the Health Awareness. The data stems from a unique cross-sectional survey carried out for the adult population in Denmark. Several dimensions and many questions pinpoint and measure every ability. The dimensions cover areas such as the individual behavior at work, the individual...... behavior in leisure, the motivation for using an ability, the working conditions for using an ability, and the educational conditions for using an ability. The paper defines every ability and describes the dimensions and the questions underlying the abilities. It reports the categories of answers...

  19. Entropy estimates of small data sets

    Energy Technology Data Exchange (ETDEWEB)

    Bonachela, Juan A; Munoz, Miguel A [Departamento de Electromagnetismo y Fisica de la Materia and Instituto de Fisica Teorica y Computacional Carlos I, Facultad de Ciencias, Universidad de Granada, 18071 Granada (Spain); Hinrichsen, Haye [Fakultaet fuer Physik und Astronomie, Universitaet Wuerzburg, Am Hubland, 97074 Wuerzburg (Germany)

    2008-05-23

    Estimating entropies from limited data series is known to be a non-trivial task. Naive estimations are plagued with both systematic (bias) and statistical errors. Here, we present a new 'balanced estimator' for entropy functionals (Shannon, Renyi and Tsallis) specially devised to provide a compromise between low bias and small statistical errors, for short data series. This new estimator outperforms other currently available ones when the data sets are small and the probabilities of the possible outputs of the random variable are not close to zero. Otherwise, other well-known estimators remain a better choice. The potential range of applicability of this estimator is quite broad, especially for biological and digital data series. (fast track communication)
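
    To make the bias problem concrete, here is the naive plug-in Shannon estimator next to the standard Miller-Madow correction; this illustrates the systematic error the abstract refers to, but it is not the authors' balanced estimator, and the toy series is invented.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Naive (maximum-likelihood) Shannon entropy estimate, in nats."""
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in Counter(samples).values())

def miller_madow_entropy(samples):
    """Plug-in estimate plus the first-order (K - 1) / (2n) bias correction,
    where K is the number of observed outcomes."""
    return plugin_entropy(samples) + (len(set(samples)) - 1) / (2 * len(samples))

random.seed(1)
short_series = [random.randrange(8) for _ in range(30)]  # true entropy = ln 8
print(math.log(8), plugin_entropy(short_series), miller_madow_entropy(short_series))
# The plug-in value falls below ln 8 on average; the correction narrows the gap.
```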

  20. Entropy estimates of small data sets

    International Nuclear Information System (INIS)

    Bonachela, Juan A; Munoz, Miguel A; Hinrichsen, Haye

    2008-01-01

    Estimating entropies from limited data series is known to be a non-trivial task. Naive estimations are plagued with both systematic (bias) and statistical errors. Here, we present a new 'balanced estimator' for entropy functionals (Shannon, Renyi and Tsallis) specially devised to provide a compromise between low bias and small statistical errors, for short data series. This new estimator outperforms other currently available ones when the data sets are small and the probabilities of the possible outputs of the random variable are not close to zero. Otherwise, other well-known estimators remain a better choice. The potential range of applicability of this estimator is quite broad, especially for biological and digital data series. (fast track communication)

  1. Spatial occupancy models for large data sets

    Science.gov (United States)

    Johnson, Devin S.; Conn, Paul B.; Hooten, Mevin B.; Ray, Justina C.; Pond, Bruce A.

    2013-01-01

    Since its development, occupancy modeling has become a popular and useful tool for ecologists wishing to learn about the dynamics of species occurrence over time and space. Such models require presence–absence data to be collected at spatially indexed survey units. However, only recently have researchers recognized the need to correct for spatially induced overdispersion by explicitly accounting for spatial autocorrelation in occupancy probability. Previous efforts to incorporate such autocorrelation have largely focused on logit-normal formulations for occupancy, with spatial autocorrelation induced by a random effect within a hierarchical modeling framework. Although useful, computational time generally limits such an approach to relatively small data sets, and there are often problems with algorithm instability, yielding unsatisfactory results. Further, recent research has revealed a hidden form of multicollinearity in such applications, which may lead to parameter bias if not explicitly addressed. Combining several techniques, we present a unifying hierarchical spatial occupancy model specification that is particularly effective over large spatial extents. This approach employs a probit mixture framework for occupancy and can easily accommodate a reduced-dimensional spatial process to resolve issues with multicollinearity and spatial confounding while improving algorithm convergence. Using open-source software, we demonstrate this new model specification using a case study involving occupancy of caribou (Rangifer tarandus) over a set of 1080 survey units spanning a large contiguous region (108,000 km²) in northern Ontario, Canada. Overall, the combination of a more efficient specification and open-source software allows for a facile and stable implementation of spatial occupancy models for large data sets.

  2. Handling Imbalanced Data Sets in Multistage Classification

    Science.gov (United States)

    López, M.

    Multistage classification is a logical approach, based on a divide-and-conquer solution, for dealing with problems with a high number of classes. The classification problem is divided into several sequential steps, each one associated to a single classifier that works with subgroups of the original classes. In each level, the current set of classes is split into smaller subgroups of classes until they (the subgroups) are composed of only one class. The resulting chain of classifiers can be represented as a tree, which (1) simplifies the classification process by using fewer categories in each classifier and (2) makes it possible to combine several algorithms or use different attributes in each stage. Most of the classification algorithms can be biased in the sense of selecting the most populated class in overlapping areas of the input space. This can degrade a multistage classifier performance if the training set sample frequencies do not reflect the real prevalence in the population. Several techniques such as applying prior probabilities, assigning weights to the classes, or replicating instances have been developed to overcome this handicap. Most of them are designed for two-class (accept-reject) problems. In this article, we evaluate several of these techniques as applied to multistage classification and analyze how they can be useful for astronomy. We compare the results obtained by classifying a data set based on Hipparcos with and without these methods.
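
    One of the rebalancing techniques the article evaluates, weighting classes inversely to their training frequency, can be sketched in a few lines; scikit-learn and the synthetic 9:1 data set are illustrative assumptions, not the article's actual setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = np.array([0] * 180 + [1] * 20)  # 9:1 class imbalance

# Weight each class inversely to its frequency so the minority class is
# not swamped in overlapping regions of the input space.
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))   # minority class weighted up, e.g. {0: ~0.56, 1: 5.0}

clf = LogisticRegression(class_weight="balanced").fit(X, y)
```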

  3. International spinal cord injury cardiovascular function basic data set

    DEFF Research Database (Denmark)

    Krassioukov, A; Alexander, M S; Karlsson, Anders Hans

    2010-01-01

    To create an International Spinal Cord Injury (SCI) Cardiovascular Function Basic Data Set within the framework of the International SCI Data Sets.

  4. International Spinal Cord Injury Male Sexual Function Basic Data Set

    DEFF Research Database (Denmark)

    Alexander, M S; Biering-Sørensen, F; Elliott, S

    2011-01-01

    To create the International Spinal Cord Injury (SCI) Male Sexual Function Basic Data Set within the International SCI Data Sets.

  5. Chicken Picadillo

    Science.gov (United States)

    Chicken Picadillo recipe from MedlinePlus (https://medlineplus.gov/recipe/chickenpicadillo.html). Ingredients include 1 pound chicken breast, boneless, skinless, cut into thin strips ...

  6. Chicken Stew

    Science.gov (United States)

    Chicken Stew recipe from MedlinePlus (https://medlineplus.gov/recipe/chickenstew.html). Ingredients include 8 chicken pieces (breasts or legs), 1 cup water ...

  7. Users Manual for TMY3 Data Sets (Revised)

    Energy Technology Data Exchange (ETDEWEB)

    Wilcox, S.; Marion, W.

    2008-05-01

    This users manual describes how to obtain and interpret the data in the Typical Meteorological Year version 3 (TMY3) data sets. These data sets are an update to the TMY2 data released by NREL in 1994.

  8. International spinal cord injury musculoskeletal basic data set

    DEFF Research Database (Denmark)

    Biering-Sørensen, Fin; Burns, A S; Curt, A

    2012-01-01

    To develop an International Spinal Cord Injury (SCI) Musculoskeletal Basic Data Set as part of the International SCI Data Sets to facilitate consistent collection and reporting of basic musculoskeletal findings in the SCI population. Setting: International.

  9. International spinal cord injury pulmonary function basic data set

    DEFF Research Database (Denmark)

    Biering-Sørensen, Fin; Krassioukov, A; Alexander, M S

    2012-01-01

    To develop the International Spinal Cord Injury (SCI) Pulmonary Function Basic Data Set within the framework of the International SCI Data Sets in order to facilitate consistent collection and reporting of basic bronchopulmonary findings in the SCI population.

  10. Nuclear data sets for reactor design calculations - approved 1975

    International Nuclear Information System (INIS)

    Anon.

    1978-01-01

    This standard identifies and describes the specifications for developing, preparing, and documenting nuclear data sets to be used in reactor design calculations. The specifications include (a) criteria for acceptance of evaluated nuclear data sets, (b) criteria for processing evaluated data and preparation of processed continuous data and averaged data sets, and (c) identification of specific evaluated, processed continuous, and averaged data sets which meet these criteria for specific reactor types

  11. American National Standard: nuclear data sets for reactor design calculations

    International Nuclear Information System (INIS)

    1983-01-01

    This standard identifies and describes the specifications for developing, preparing, and documenting nuclear data sets to be used in reactor design calculations. The specifications include criteria for acceptance of evaluated nuclear data sets, criteria for processing evaluated data and preparation of processed continuous data and averaged data sets, and identification of specific evaluated, processed continuous, and averaged data sets which meet these criteria for specific reactor types

  12. American National Standard nuclear data sets for reactor design calculations

    International Nuclear Information System (INIS)

    Anon.

    1975-01-01

    A standard is presented which identifies and describes the specifications for developing, preparing, and documenting nuclear data sets to be used in reactor design calculations. The specifications include (a) criteria for acceptance of evaluated nuclear data sets, (b) criteria for processing evaluated data and preparation of processed continuous data and averaged data sets, and (c) identification of specific evaluated, processed continuous, and averaged data sets which meet these criteria for specific reactor types

  13. Ontology-based integration of topographic data sets

    NARCIS (Netherlands)

    Uitermark, HT; van Oosterom, PJM; Mars, NJI; Molenaar, M

    The integration of topographic data sets is defined as the process of establishing relationships between corresponding object instances in different, autonomously produced, topographic data sets of the same geographic space. The problem of integrating topographic data sets is in finding these

  14. Dissecting random and systematic differences between noisy composite data sets.

    Science.gov (United States)

    Diederichs, Kay

    2017-04-01

    Composite data sets measured on different objects are usually affected by random errors, but may also be influenced by systematic (genuine) differences in the objects themselves, or the experimental conditions. If the individual measurements forming each data set are quantitative and approximately normally distributed, a correlation coefficient is often used to compare data sets. However, the relations between data sets are not obvious from the matrix of pairwise correlations since the numerical value of the correlation coefficient is lowered by both random and systematic differences between the data sets. This work presents a multidimensional scaling analysis of the pairwise correlation coefficients which places data sets into a unit sphere within low-dimensional space, at a position given by their CC* values [as defined by Karplus & Diederichs (2012), Science, 336, 1030-1033] in the radial direction and by their systematic differences in one or more angular directions. This dimensionality reduction can not only be used for classification purposes, but also to derive data-set relations on a continuous scale. Projecting the arrangement of data sets onto the subspace spanned by systematic differences (the surface of a unit sphere) allows, irrespective of the random-error levels, the identification of clusters of closely related data sets. The method gains power with increasing numbers of data sets. It is illustrated with an example from low signal-to-noise ratio image processing, and an application in macromolecular crystallography is shown, but the approach is completely general and thus should be widely applicable.
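
    The general idea, placing data sets in a low-dimensional space from their matrix of pairwise correlation coefficients so that related sets cluster, can be sketched with off-the-shelf MDS. Using 1 - CC as the dissimilarity is a simplifying assumption; the paper's actual placement uses CC* for the radial direction, and the toy matrix is invented.

```python
import numpy as np
from sklearn.manifold import MDS

# Toy pairwise correlations for four data sets: {0, 1} and {2, 3} are
# closely related pairs with weak relations across the pairs.
CC = np.array([[1.0, 0.9, 0.2, 0.1],
               [0.9, 1.0, 0.3, 0.2],
               [0.2, 0.3, 1.0, 0.8],
               [0.1, 0.2, 0.8, 1.0]])

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(1.0 - CC)  # higher correlation -> smaller distance
print(coords)  # related data sets land near each other
```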

  15. Delve: A Data Set Retrieval and Document Analysis System

    KAUST Repository

    Akujuobi, Uchenna Thankgod

    2017-12-29

    Academic search engines (e.g., Google Scholar or Microsoft Academic) provide a medium for retrieving various information on scholarly documents. However, most of these popular scholarly search engines overlook the area of data set retrieval, which should provide information on relevant data sets used for academic research. Due to the increasing volume of publications, it has become a challenging task to locate suitable data sets on a particular research area for benchmarking or evaluations. We propose Delve, a web-based system for data set retrieval and document analysis. This system is different from other scholarly search engines as it provides a medium for both data set retrieval and real time visual exploration and analysis of data sets and documents.

  16. Analytic webs support the synthesis of ecological data sets.

    Science.gov (United States)

    Ellison, Aaron M; Osterweil, Leon J; Clarke, Lori; Hadley, Julian L; Wise, Alexander; Boose, Emery; Foster, David R; Hanson, Allen; Jensen, David; Kuzeja, Paul; Riseman, Edward; Schultz, Howard

    2006-06-01

    A wide variety of data sets produced by individual investigators are now synthesized to address ecological questions that span a range of spatial and temporal scales. It is important to facilitate such syntheses so that "consumers" of data sets can be confident that both input data sets and synthetic products are reliable. Necessary documentation to ensure the reliability and validation of data sets includes both familiar descriptive metadata and formal documentation of the scientific processes used (i.e., process metadata) to produce usable data sets from collections of raw data. Such documentation is complex and difficult to construct, so it is important to help "producers" create reliable data sets and to facilitate their creation of required metadata. We describe a formal representation, an "analytic web," that aids both producers and consumers of data sets by providing complete and precise definitions of scientific processes used to process raw and derived data sets. The formalisms used to define analytic webs are adaptations of those used in software engineering, and they provide a novel and effective support system for both the synthesis and the validation of ecological data sets. We illustrate the utility of an analytic web as an aid to producing synthetic data sets through a worked example: the synthesis of long-term measurements of whole-ecosystem carbon exchange. Analytic webs are also useful validation aids for consumers because they support the concurrent construction of a complete, Internet-accessible audit trail of the analytic processes used in the synthesis of the data sets. Finally we describe our early efforts to evaluate these ideas through the use of a prototype software tool, SciWalker. We indicate how this tool has been used to create analytic webs tailored to specific data-set synthesis and validation activities, and suggest extensions to it that will support additional forms of validation. The process metadata created by SciWalker is

  17. International urodynamic basic spinal cord injury data set.

    Science.gov (United States)

    Biering-Sørensen, F; Craggs, M; Kennelly, M; Schick, E; Wyndaele, J-J

    2008-07-01

    To create the International Urodynamic Basic Spinal Cord Injury (SCI) Data Set within the framework of the International SCI Data Sets. International working group. The draft of the data set was developed by a working group consisting of members appointed by the Neurourology Committee of the International Continence Society, the European Association of Urology, the American Spinal Injury Association (ASIA), the International Spinal Cord Society (ISCoS) and a representative of the Executive Committee of the International SCI Standards and Data Sets. The final version of the data set was developed after review and comments by members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, ASIA Board, relevant and interested (international) organizations and societies (around 40) and persons and the ISCoS Council. Endorsement of the data set by relevant organizations and societies will be obtained. To make the data set uniform, each variable and each response category within each variable have been specifically defined in a way that is designed to promote the collection and reporting of comparable minimal data. Variables included in the International Urodynamic Basic SCI Data Set are date of data collection, bladder sensation during filling cystometry, detrusor function, compliance during filling cystometry, function during voiding, detrusor leak point pressure, maximum detrusor pressure, cystometric bladder capacity and post-void residual volume.

  18. International urinary tract imaging basic spinal cord injury data set

    DEFF Research Database (Denmark)

    Biering-Sørensen, F; Craggs, M; Kennelly, M

    2008-01-01

    OBJECTIVE: To create an International Urinary Tract Imaging Basic Spinal Cord Injury (SCI) Data Set within the framework of the International SCI Data Sets. SETTING: An international working group. METHODS: The draft of the Data Set was developed by a working group comprising members appointed...... of comparable minimal data. RESULTS: The variables included in the International Urinary Tract Imaging Basic SCI Data Set are the results obtained using the following investigations: intravenous pyelography or computed tomography urogram or ultrasound, X-ray, renography, clearance, cystogram, voiding cystogram...

  19. Measures for the characterisation of pattern-recognition data sets

    CSIR Research Space (South Africa)

    Van der Walt, Christiaan M

    2007-11-01

    artificial data sets to construct a meta-classifier. 4.1. Classifiers: We will use model-based and discriminative classifiers to construct our meta-classifier; these classifiers are the Naïve Bayes (NB), Gaussian (Gauss), Gaussian Mixture Model (GMM)... of these classifiers for real-world data sets. 4.2. Artificial data: We will make use of artificial data sets to construct a meta-classification training set; these artificial data sets are generated with very specific data properties that influence classifi...
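
    A minimal sketch of the meta-classification workflow the excerpt describes: run several candidate classifiers on a (here synthetic) data set and record which performs best, producing the label a meta-classifier would learn to predict from data-set properties. The scikit-learn models stand in for the NB, Gaussian and GMM classifiers named above, and the data set is invented.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

# One synthetic data set with known properties; the meta-training set would
# repeat this over many generated data sets.
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           random_state=0)

candidates = {
    "NB": GaussianNB(),                          # Naive Bayes
    "Gauss": QuadraticDiscriminantAnalysis(),    # full-covariance Gaussian
    "kNN": KNeighborsClassifier(n_neighbors=5),
}
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)  # 'best' is the meta-classification target label
```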

  20. Chicken Art

    Science.gov (United States)

    Bickett, Marianne

    2009-01-01

    In this article, the author describes how a visit from a flock of chickens provided inspiration for the children's chicken art. The gentle clucking of the hens, the rooster crowing, and the softness of the feathers all provided rich aural, tactile, visual, and emotional experiences. The experience affirms the importance and value of direct…

  1. Greenhouse Effect Detection Experiment (GEDEX). Selected data sets

    Science.gov (United States)

    Olsen, Lola M.; Warnock, Archibald, III

    1992-01-01

    This CD-ROM contains selected data sets compiled by the participants of the Greenhouse Effect Detection Experiment (GEDEX) workshop on atmospheric temperature. The data sets include surface, upper air, and/or satellite-derived measurements of temperature, solar irradiance, clouds, greenhouse gases, fluxes, albedo, aerosols, ozone, and water vapor, along with Southern Oscillation Indices and Quasi-Biennial Oscillation statistics.

  2. International Spinal Cord Injury Upper Extremity Basic Data Set

    DEFF Research Database (Denmark)

    Biering-Sørensen, F; Bryden, A; Curt, A

    2014-01-01

    OBJECTIVE: To develop an International Spinal Cord Injury (SCI) Upper Extremity Basic Data Set as part of the International SCI Data Sets, which facilitates consistent collection and reporting of basic upper extremity findings in the SCI population. SETTING: International. METHODS: A first draft...

  3. The International Spinal Cord Injury Pain Basic Data Set

    DEFF Research Database (Denmark)

    Widerstrom-Noga, E.; Bryce, T.; Cardenas, D.D.

    2008-01-01

    Objective: To develop a basic pain data set (International Spinal Cord Injury Basic Pain Data Set, ISCIPDS:B) within the framework of the International spinal cord injury (SCI) data sets that would facilitate consistent collection and reporting of pain in the SCI population. Setting: International. Methods: The ISCIPDS:B was developed by a working group consisting of individuals with published evidence of expertise in SCI-related pain regarding taxonomy, psychophysics, psychology, epidemiology and assessment, and one representative of the Executive Committee of the International SCI Standards and Data Sets... on suggestions from members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, ASIA and APS Boards, and the Neuropathic Pain Special Interest Group of the IASP, individual reviewers and societies and the ISCoS Council. Results: The final ISCIPDS:B contains...

  4. Detecting gallbladders in chicken livers using spectral analysis

    DEFF Research Database (Denmark)

    Jørgensen, Anders; Mølvig Jensen, Eigil; Moeslund, Thomas B.

    2015-01-01

    This paper presents a method for detecting gallbladders attached to chicken livers using spectral imaging. Gallbladders can contaminate good livers, making them unfit for human consumption. A data set consisting of chicken livers with and without gallbladders has been captured using 33 wavelengths...

  5. International spinal cord injury pulmonary function basic data set.

    Science.gov (United States)

    Biering-Sørensen, F; Krassioukov, A; Alexander, M S; Donovan, W; Karlsson, A-K; Mueller, G; Perkash, I; Sheel, A William; Wecht, J; Schilero, G J

    2012-06-01

    To develop the International Spinal Cord Injury (SCI) Pulmonary Function Basic Data Set within the framework of the International SCI Data Sets in order to facilitate consistent collection and reporting of basic bronchopulmonary findings in the SCI population. International. The SCI Pulmonary Function Data Set was developed by an international working group. The initial data set document was revised on the basis of suggestions from members of the Executive Committee of the International SCI Standards and Data Sets, the International Spinal Cord Society (ISCoS) Executive and Scientific Committees, American Spinal Injury Association (ASIA) Board, other interested organizations and societies and individual reviewers. In addition, the data set was posted for 2 months on ISCoS and ASIA websites for comments. The final International SCI Pulmonary Function Data Set contains questions on the pulmonary conditions diagnosed before spinal cord lesion, if available, to be obtained only once; smoking history; pulmonary complications and conditions after the spinal cord lesion, which may be collected at any time. These data include information on pneumonia, asthma, chronic obstructive pulmonary disease and sleep apnea. Current utilization of ventilator assistance including mechanical ventilation, diaphragmatic pacing, phrenic nerve stimulation and Bi-level positive airway pressure can be reported, as well as results from pulmonary function testing, which include: forced vital capacity, forced expiratory volume in one second and peak expiratory flow. The complete instructions for data collection and the data sheet itself are freely available on the website of ISCoS (http://www.iscos.org.uk).

  6. International urinary tract imaging basic spinal cord injury data set.

    Science.gov (United States)

    Biering-Sørensen, F; Craggs, M; Kennelly, M; Schick, E; Wyndaele, J-J

    2009-05-01

    To create an International Urinary Tract Imaging Basic Spinal Cord Injury (SCI) Data Set within the framework of the International SCI Data Sets. An international working group. The draft of the Data Set was developed by a working group comprising members appointed by the Neurourology Committee of the International Continence Society, the European Association of Urology, the American Spinal Injury Association (ASIA), the International Spinal Cord Society (ISCoS) and a representative of the Executive Committee of the International SCI Standards and Data Sets. The final version of the Data Set was developed after review and comments by members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, ASIA Board, relevant and interested international organizations and societies (around 40), individual persons with specific expertise and the ISCoS Council. Endorsement of the Data Sets by relevant organizations and societies will be obtained. To make the Data Set uniform, each variable and each response category within each variable have been specifically defined in a way that is designed to promote the collection and reporting of comparable minimal data. The variables included in the International Urinary Tract Imaging Basic SCI Data Set are the results obtained using the following investigations: intravenous pyelography or computed tomography urogram or ultrasound, X-ray, renography, clearance, cystogram, voiding cystogram or micturition cystourethrogram or videourodynamics. The complete instructions for data collection and the data sheet itself are freely available on the websites of both ISCoS (http://www.iscos.org.uk) and ASIA (http://www.asia-spinalinjury.org).

  7. International urodynamic basic spinal cord injury data set

    DEFF Research Database (Denmark)

    Craggs, M.; Kennelly, M.; Schick, E.

    2008-01-01

    of the data set was developed after review and comments by members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, ASIA Board, relevant and interested (international) organizations and societies (around 40) and persons and the ISCoS Council......: Variables included in the International Urodynamic Basic SCI Data Set are date of data collection, bladder sensation during filling cystometry, detrusor function, compliance during filling cystometry, function during voiding, detrusor leak point pressure, maximum detrusor pressure, cystometric bladder...

  8. Prairie Chicken

    Data.gov (United States)

    Kansas Data Access and Support Center — An outline of the general range occupied by greater and lesser prairie chickens. The range was delineated by expert opinion, then verified by local wildlife...

  9. A geographic distribution data set of biodiversity in Italian freshwaters

    Directory of Open Access Journals (Sweden)

    Angela Boggero

    2016-10-01

    Full Text Available We present a data set on the biodiversity of Italian freshwaters, including lakeshores and riverbanks of natural (N=379: springs, streams and lakes) and artificial (N=11: fountains) sites. The data set belongs partly to the Italian Long Term Ecological Research network (LTER-Italy) and partly to LifeWatch, the European e-Science infrastructure for biodiversity and ecosystem research. The data cover a time period corresponding to the last fifty years (1962-2014). They span a large number of taxa from prokaryotes and unicellular eukaryotes to vertebrates and plants, including taxa linked to the aquatic habitat in at least part of their life cycles (like immature stages of insects, amphibians, birds and vascular plants). The data set consists of 6463 occurrence and distribution records for 1738 species. The complete data set is available in csv file format via the LifeWatch Service Centre.

  10. Treatment Episode Data Set: Discharges (TEDS-D-2011)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  11. Treatment Episode Data Set: Discharges (TEDS-D-2008)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  12. Nursing Minimum Data Set Based on EHR Archetypes Approach.

    Science.gov (United States)

    Spigolon, Dandara N; Moro, Cláudia M C

    2012-01-01

    The establishment of a Nursing Minimum Data Set (NMDS) can facilitate the use of health information systems. Adopting these sets and representing them with archetypes is a way of developing and supporting health systems. The objective of this paper is to describe the definition of a minimum data set for nursing in endometriosis, represented with archetypes. The study was divided into two steps: defining the Nursing Minimum Data Set for endometriosis, and developing archetypes related to the NMDS. The nursing data set for endometriosis was represented in the form of an archetype, using the evaluation item for the perception of organs and senses. This form of representation is an important tool for semantic interoperability and knowledge representation for health information systems.

  13. International spinal cord injury endocrine and metabolic extended data set

    DEFF Research Database (Denmark)

    Bauman, W A; Wecht, J M; Biering-Sørensen, F

    2017-01-01

    findings in the SCI population. SETTING: This study was conducted in an international setting. METHODS: The ISCIEMEDS was developed by a working group. The initial ISCIEMEDS was revised based on suggestions from members of the International SCI Data Sets Committee, the International Spinal Cord Society......OBJECTIVE: The objective of this study was to develop the International Spinal Cord Injury (SCI) Endocrine and Metabolic Extended Data Set (ISCIEMEDS) within the framework of the International SCI Data Sets that would facilitate consistent collection and reporting of endocrine and metabolic...... (ISCoS) Executive and Scientific Committees, American Spinal Injury Association (ASIA) Board, other interested organizations, societies and individual reviewers. The data set was posted for two months on ISCoS and ASIA websites for comments. Variable names were standardized, and a suggested database...

  14. Treatment Episode Data Set: Admissions (TEDS-A-2002)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  15. Treatment Episode Data Set: Discharges (TEDS-D-2010)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  16. Treatment Episode Data Set: Admissions (TEDS-A-1994)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  17. Treatment Episode Data Set: Admissions (TEDS-A-2008)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  18. Treatment Episode Data Set: Admissions (TEDS-A-2003)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  19. Treatment Episode Data Set: Admissions (TEDS-A-2006)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  20. Treatment Episode Data Set: Admissions (TEDS-A-2011)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  1. Treatment Episode Data Set: Admissions (TEDS-A-1999)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  2. Treatment Episode Data Set: Admissions (TEDS-A-1997)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  3. Treatment Episode Data Set: Admissions (TEDS-A-2000)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  4. Treatment Episode Data Set: Admissions (TEDS-A-2009)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  5. Comprehensive Ocean - Atmosphere Data Set (COADS) LMRF Arctic Subset

    Data.gov (United States)

    National Aeronautics and Space Administration — The Comprehensive Ocean - Atmosphere Data Set (COADS) LMRF Arctic subset contains marine surface weather reports for the region north of 65 degrees N from ships,...

  6. Portland, Oregon Test Data Set Arterial Loop Detector Data

    Data.gov (United States)

    Department of Transportation — This set of data files was acquired under USDOT FHWA cooperative agreement DTFH61-11-H-00025 as one of the four test data sets acquired by the USDOT Data Capture and...

  7. Delve: A Data Set Retrieval and Document Analysis System

    KAUST Repository

    Akujuobi, Uchenna Thankgod; Zhang, Xiangliang

    2017-01-01

    Academic search engines (e.g., Google scholar or Microsoft academic) provide a medium for retrieving various information on scholarly documents. However, most of these popular scholarly search engines overlook the area of data set retrieval, which

  8. Resident Assessment Instrument/Minimum Data Set (RAI/MDS)

    Data.gov (United States)

    Department of Veterans Affairs — The Resident Assessment Instrument/Minimum Data Set (RAI/MDS) is a comprehensive assessment and care planning process used by the nursing home industry since 1990 as...

  9. Electronic Health Information Legal Epidemiology Data Set 2014

    Data.gov (United States)

    U.S. Department of Health & Human Services — Authors: Cason Schmit, JD, Gregory Sunshine, JD, Dawn Pepin, JD, MPH, Tara Ramanathan, JD, MPH, Akshara Menon, JD, MPH, Matthew Penn, JD, MLIS This legal data set...

  10. Portland, Oregon Test Data Set Freeway Loop Detector Data

    Data.gov (United States)

    Department of Transportation — This set of data files was acquired under USDOT FHWA cooperative agreement DTFH61-11-H-00025 as one of the four test data sets acquired by the USDOT Data Capture and...

  11. Treatment Episode Data Set: Admissions (TEDS-A-2010)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  12. Treatment Episode Data Set: Admissions (TEDS-A-1998)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  13. Treatment Episode Data Set: Admissions (TEDS-A-2007)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  14. Treatment Episode Data Set: Discharges (TEDS-D-2007)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  15. Treatment Episode Data Set: Admissions (TEDS-A-1993)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  16. Treatment Episode Data Set: Admissions (TEDS-A-1995)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  17. Treatment Episode Data Set: Discharges (TEDS-D-2006)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  18. Treatment Episode Data Set: Admissions (TEDS-A-1996)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  19. Treatment Episode Data Set: Discharges (TEDS-D-2009)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  20. Treatment Episode Data Set: Admissions (TEDS-A-2005)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  1. Treatment Episode Data Set: Admissions (TEDS-A-1992)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  2. Treatment Episode Data Set: Admissions (TEDS-A-2001)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  3. Treatment Episode Data Set: Admissions (TEDS-A-2004)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  4. Treatment Episode Data Set: Admissions (TEDS-A-2013)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  5. Treatment Episode Data Set: Admissions (TEDS-A-2012)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Admissions (TEDS-A) is a national census data system of annual admissions to substance abuse treatment facilities. TEDS-A provides...

  6. Identification of noise in linear data sets by factor analysis

    International Nuclear Information System (INIS)

    Roscoe, B.A.; Hopke, Ph.K.

    1982-01-01

    A technique which has the ability to identify bad data points after the data have been generated is classical factor analysis. The ability of classical factor analysis to identify two different types of data errors makes it ideally suited for scanning large data sets. Since the results yielded by factor analysis indicate correlations between parameters, one must know something about the nature of the data set and the analytical techniques used to obtain it to confidently isolate errors. (author)
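
    A rough sketch of the idea (not the paper's implementation; the rank-2 factor model, synthetic data and 3-sigma residual cutoff are assumptions made for illustration) is to fit a low-rank factor model and flag the points whose residuals the common factors cannot explain:

      import numpy as np

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 6))                     # 200 samples, 6 parameters
      X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)    # induce a correlation
      X[17, 1] += 8.0                                   # plant one bad data point

      # Standardize, then use a rank-k truncated SVD as a simple factor model.
      Z = (X - X.mean(axis=0)) / X.std(axis=0)
      U, s, Vt = np.linalg.svd(Z, full_matrices=False)
      k = 2
      Z_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

      # Points poorly explained by the common factors are candidate errors.
      resid = np.sqrt(((Z - Z_hat) ** 2).sum(axis=1))
      print(np.where(resid > resid.mean() + 3 * resid.std())[0])   # includes 17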

  7. International spinal cord injury cardiovascular function basic data set.

    Science.gov (United States)

    Krassioukov, A; Alexander, M S; Karlsson, A-K; Donovan, W; Mathias, C J; Biering-Sørensen, F

    2010-08-01

    To create an International Spinal Cord Injury (SCI) Cardiovascular Function Basic Data Set within the framework of the International SCI Data Sets. An international working group. The draft of the data set was developed by a working group comprising members appointed by the American Spinal Injury Association (ASIA), the International Spinal Cord Society (ISCoS) and a representative of the executive committee of the International SCI Standards and Data Sets. The final version of the data set was developed after review by members of the executive committee of the International SCI Standards and Data Sets, the ISCoS scientific committee, ASIA board, relevant and interested international organizations and societies, individual persons with specific interest and the ISCoS Council. To make the data set uniform, each variable and each response category within each variable have been specifically defined in a way that is designed to promote the collection and reporting of comparable minimal data. The variables included in the International SCI Cardiovascular Function Basic Data Set include the following items: date of data collection, cardiovascular history before the spinal cord lesion, events related to cardiovascular function after the spinal cord lesion, cardiovascular function after the spinal cord lesion, medications affecting cardiovascular function on the day of examination; and objective measures of cardiovascular functions, including time of examination, position of examination, pulse and blood pressure. The complete instructions for data collection and the data sheet itself are freely available on the websites of both ISCoS (http://www.iscos.org.uk) and ASIA (http://www.asia-spinalinjury.org).

  8. International Spinal Cord Injury Urinary Tract Infection Basic Data Set

    DEFF Research Database (Denmark)

    Goetz, L L; Cardenas, D D; Kennelly, M

    2013-01-01

    To develop an International Spinal Cord Injury (SCI) Urinary Tract Infection (UTI) Basic Data Set presenting a standardized format for the collection and reporting of a minimal amount of information on UTIs in daily practice or research.

  9. Report from the Passive Microwave Data Set Management Workshop

    Science.gov (United States)

    Armstrong, Ed; Conover, Helen; Goodman, Michael; Krupp, Brian; Liu, Zhong; Moses, John; Ramapriyan, H. K.; Scott, Donna; Smith, Deborah; Weaver, Ronald

    2011-01-01

    Passive microwave data sets are some of the most important data sets in the Earth Observing System Data and Information System (EOSDIS), providing data as far back as the early 1970s. The widespread use of passive microwave (PM) radiometer data has led to their collection and distribution over the years at several different Earth science data centers. The user community is often confused by this proliferation and the uneven spread of information about the data sets. In response to this situation, a Passive Microwave Data Set Management Workshop was held 17-19 May 2011 at the Global Hydrology Resource Center, sponsored by the NASA Earth Science Data and Information System (ESDIS) Project. The workshop attendees reviewed all primary (Level 1-3) PM data sets from NASA and non-NASA sensors held by NASA Distributed Active Archive Centers (DAACs), as well as high-value data sets from other NASA-funded organizations. This report provides the key findings and recommendations from the workshop as well as detailed tabulations of the data sets considered.

  10. Rapid prediction of multi-dimensional NMR data sets

    International Nuclear Information System (INIS)

    Gradmann, Sabine; Ader, Christian; Heinrich, Ines; Nand, Deepak; Dittmann, Marc; Cukkemane, Abhishek; Dijk, Marc van; Bonvin, Alexandre M. J. J.; Engelhard, Martin; Baldus, Marc

    2012-01-01

    We present a computational environment for Fast Analysis of multidimensional NMR DAta Sets (FANDAS) that allows assembling multidimensional data sets from a variety of input parameters and facilitates comparing and modifying such “in silico” data sets during the various stages of the NMR data analysis. The input parameters can vary from (partial) NMR assignments directly obtained from experiments to values retrieved from in silico prediction programs. The resulting predicted data sets enable a rapid evaluation of sample labeling in light of spectral resolution and structural content, using standard NMR software such as Sparky. In addition, direct comparison to experimental data sets can be used to validate NMR assignments, distinguish different molecular components, refine structural models or other parameters derived from NMR data. The method is demonstrated in the context of solid-state NMR data obtained for the cyclic nucleotide binding domain of a bacterial cyclic nucleotide-gated channel and on membrane-embedded sensory rhodopsin II. FANDAS is freely available as a web portal under WeNMR (http://www.wenmr.eu/services/FANDAS).

  11. Rapid prediction of multi-dimensional NMR data sets

    Energy Technology Data Exchange (ETDEWEB)

    Gradmann, Sabine; Ader, Christian [Utrecht University, Faculty of Science, Bijvoet Center for Biomolecular Research (Netherlands); Heinrich, Ines [Max Planck Institute for Molecular Physiology, Department of Physical Biochemistry (Germany); Nand, Deepak [Utrecht University, Faculty of Science, Bijvoet Center for Biomolecular Research (Netherlands); Dittmann, Marc [Max Planck Institute for Molecular Physiology, Department of Physical Biochemistry (Germany); Cukkemane, Abhishek; Dijk, Marc van; Bonvin, Alexandre M. J. J. [Utrecht University, Faculty of Science, Bijvoet Center for Biomolecular Research (Netherlands); Engelhard, Martin [Max Planck Institute for Molecular Physiology, Department of Physical Biochemistry (Germany); Baldus, Marc, E-mail: m.baldus@uu.nl [Utrecht University, Faculty of Science, Bijvoet Center for Biomolecular Research (Netherlands)

    2012-12-15

    We present a computational environment for Fast Analysis of multidimensional NMR DAta Sets (FANDAS) that allows assembling multidimensional data sets from a variety of input parameters and facilitates comparing and modifying such 'in silico' data sets during the various stages of the NMR data analysis. The input parameters can vary from (partial) NMR assignments directly obtained from experiments to values retrieved from in silico prediction programs. The resulting predicted data sets enable a rapid evaluation of sample labeling in light of spectral resolution and structural content, using standard NMR software such as Sparky. In addition, direct comparison to experimental data sets can be used to validate NMR assignments, distinguish different molecular components, refine structural models or other parameters derived from NMR data. The method is demonstrated in the context of solid-state NMR data obtained for the cyclic nucleotide binding domain of a bacterial cyclic nucleotide-gated channel and on membrane-embedded sensory rhodopsin II. FANDAS is freely available as a web portal under WeNMR (http://www.wenmr.eu/services/FANDAS).

  12. Iterative dictionary construction for compression of large DNA data sets.

    Science.gov (United States)

    Kuruppu, Shanika; Beresford-Smith, Bryan; Conway, Thomas; Zobel, Justin

    2012-01-01

    Genomic repositories increasingly include individual as well as reference sequences, which tend to share long identical and near-identical strings of nucleotides. However, the sequential processing used by most compression algorithms, and the volumes of data involved, mean that these long-range repetitions are not detected. An order-insensitive, disk-based dictionary construction method can detect this repeated content and use it to compress collections of sequences. We explore a dictionary construction method that improves repeat identification in large DNA data sets. Our adaptation, COMRAD, of an existing disk-based method identifies exact repeated content in collections of sequences with similarities within and across the set of input sequences. COMRAD compresses the data over multiple passes, which is an expensive process, but allows COMRAD to compress large data sets within reasonable time and space. COMRAD allows for random access to individual sequences and subsequences without decompressing the whole data set. COMRAD has no competitor in terms of the size of data sets that it can compress (extending to many hundreds of gigabytes) and, even for smaller data sets, the results are competitive compared to alternatives; as an example, 39 S. cerevisiae genomes compressed to 0.25 bits per base.
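
    A toy, in-memory sketch of the dictionary idea (COMRAD itself is disk-based, runs multiple passes and handles long variable-length repeats; the fixed k-mer length and count threshold here are simplifying assumptions):

      from collections import Counter

      def repeat_dictionary(sequences, k=8, min_count=2):
          """Count every k-mer across all sequences and keep those repeated
          often enough to be worth replacing with a short code."""
          counts = Counter()
          for seq in sequences:
              for i in range(len(seq) - k + 1):
                  counts[seq[i:i + k]] += 1
          return {kmer: f"<{n}>" for n, (kmer, c)
                  in enumerate(counts.most_common()) if c >= min_count}

      seqs = ["ACGTACGTGGTTACGTACGT", "TTACGTACGTGGACGTACGT"]
      print(repeat_dictionary(seqs))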

  13. [Essential data set's archetypes for nursing care of endometriosis patients].

    Science.gov (United States)

    Spigolon, Dandara Novakowski; Moro, Claudia Maria Cabral

    2012-12-01

    This study aimed to develop an Essential Data Set for Nursing Care of Patients with Endometriosis (CDEEPE), represented by archetypes. It was an exploratory applied study with specialists' participation, carried out at the Health Informatics Laboratory of PUCPR between February and November of 2010. The study was divided into two stages: construction and evaluation of the CDEEPE, including Nursing Process phases and Basic Human Needs, and development of archetypes based on this data set. The CDEEPE was evaluated by doctors and nurses with 95.9% consensus and contains 51 data items. The archetype "Perception of Organs and Senses" was created to represent this data set. This study identified important information for nursing practices, contributing to the computerization and application of the nursing process during care. The CDEEPE was the basis for archetype creation, which will make possible structured, organized, efficient, interoperable, and semantic records.

  14. International bowel function extended spinal cord injury data set

    DEFF Research Database (Denmark)

    Krogh, K; Perkash, I; Stiens, S A

    2008-01-01

    and the ASIA Board. Relevant and interested scientific and professional organizations and societies (around 40) were also invited to review the data set and it was posted on the ISCoS and ASIA websites for 3 months to allow comments and suggestions. The ISCoS Scientific Committee, ISCoS Council and ASIA Board......STUDY DESIGN: International expert working group. OBJECTIVE: To develop an International Bowel Function Extended Spinal Cord Injury (SCI) Data Set presenting a standardized format for the collection and reporting of an extended amount of information on bowel function. SETTING: Working group...... consisting of members appointed by the American Spinal Injury Association (ASIA) and the International Spinal Cord Society (ISCoS). METHODS: A draft prepared by the working group was reviewed by Executive Committee of the International SCI Standards and Data Sets and later by the ISCoS Scientific Committee...

  15. International bowel function basic spinal cord injury data set

    DEFF Research Database (Denmark)

    Krogh, K; Perkash, I; Stiens, S A

    2008-01-01

    S Scientific Committee and the ASIA Board. Relevant and interested scientific and professional (international) organizations and societies (approximately 40) were also invited to review the data set and it was posted on the ISCoS and ASIA websites for 3 months to allow comments and suggestions. The ISCo......STUDY DESIGN: International expert working group. OBJECTIVE: To develop an International Bowel Function Basic Spinal Cord Injury (SCI) Data Set presenting a standardized format for the collection and reporting of a minimal amount of information on bowel function in daily practice or in research....... SETTING: Working group consisting of members appointed by the American Spinal Injury Association (ASIA) and the International Spinal Cord Society (ISCoS). METHODS: A draft prepared by the working group was reviewed by Executive Committee of the International SCI Standards and Data Sets, and later by ISCo...

  16. Comparing initial-data sets for binary black holes

    International Nuclear Information System (INIS)

    Pfeiffer, Harald P.; Cook, Gregory B.; Teukolsky, Saul A.

    2002-01-01

    We compare the results of constructing binary black hole initial data with three different decompositions of the constraint equations of general relativity. For each decomposition we compute the initial data using a superposition of two Kerr-Schild black holes to fix the freely specifiable data. We find that these initial-data sets differ significantly, with the ADM energy varying by as much as 5% of the total mass. We find that all initial-data sets currently used for evolutions might contain unphysical gravitational radiation of the order of several percent of the total mass. This is comparable to the amount of gravitational-wave energy observed during the evolved collision. More astrophysically realistic initial data will require more careful choices of the freely specifiable data and boundary conditions for both the metric and extrinsic curvature. However, we find that the choice of extrinsic curvature affects the resulting data sets more strongly than the choice of conformal metric

  17. Testing the statistical compatibility of independent data sets

    International Nuclear Information System (INIS)

    Maltoni, M.; Schwetz, T.

    2003-01-01

    We discuss a goodness-of-fit method which tests the compatibility between statistically independent data sets. The method gives sensible results even in cases where the χ² minima of the individual data sets are very low or when several parameters are fitted to a large number of data points. In particular, it avoids the problem that a possible disagreement between data sets becomes diluted by data points which are insensitive to the crucial parameters. A formal derivation of the probability distribution function for the proposed test statistic is given, based on standard theorems of statistics. The application of the method is illustrated on data from neutrino oscillation experiments, and its complementarity to the standard goodness-of-fit is discussed.
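
    A minimal numerical sketch of the statistic (invented data; a single shared parameter and known Gaussian errors are assumed for simplicity): the test isolates the chi-square contribution that arises only from forcing both data sets onto common parameter values.

      import numpy as np
      from scipy import stats

      a = np.array([4.8, 5.1, 5.0, 4.9])   # data set 1
      b = np.array([5.9, 6.1, 6.0])        # data set 2
      sigma = 0.2                          # assumed known measurement error

      def chi2(data, mu):
          return (((data - mu) / sigma) ** 2).sum()

      chi2_a = chi2(a, a.mean())           # best fit to each set alone
      chi2_b = chi2(b, b.mean())
      mu_joint = np.concatenate([a, b]).mean()
      chi2_tot = chi2(a, mu_joint) + chi2(b, mu_joint)

      # Chi-square contributed purely by the compatibility requirement;
      # degrees of freedom = parameters fitted per set minus joint parameters.
      chi2_pg = chi2_tot - (chi2_a + chi2_b)
      dof = 1 + 1 - 1
      print(chi2_pg, stats.chi2.sf(chi2_pg, dof))   # tiny p-value: incompatible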

  18. Cancer survival classification using integrated data sets and intermediate information.

    Science.gov (United States)

    Kim, Shinuk; Park, Taesung; Kon, Mark

    2014-09-01

    Although numerous studies related to cancer survival have been published, increasing the prediction accuracy of survival classes still remains a challenge. Integration of different data sets, such as microRNA (miRNA) and mRNA, might increase the accuracy of survival class prediction. Therefore, we suggested a machine learning (ML) approach to integrate different data sets, and developed a novel method based on feature selection with Cox proportional hazard regression model (FSCOX) to improve the prediction of cancer survival time. FSCOX provides us with intermediate survival information, which is usually discarded when separating survival into 2 groups (short- and long-term), and allows us to perform survival analysis. We used an ML-based protocol for feature selection, integrating information from miRNA and mRNA expression profiles at the feature level. To predict survival phenotypes, we used the following classifiers, first, existing ML methods, support vector machine (SVM) and random forest (RF), second, a new median-based classifier using FSCOX (FSCOX_median), and third, an SVM classifier using FSCOX (FSCOX_SVM). We compared these methods using 3 types of cancer tissue data sets: (i) miRNA expression, (ii) mRNA expression, and (iii) combined miRNA and mRNA expression. The latter data set included features selected either from the combined miRNA/mRNA profile or independently from miRNAs and mRNAs profiles (IFS). In the ovarian data set, the accuracy of survival classification using the combined miRNA/mRNA profiles with IFS was 75% using RF, 86.36% using SVM, 84.09% using FSCOX_median, and 88.64% using FSCOX_SVM with a balanced 22 short-term and 22 long-term survivor data set. These accuracies are higher than those using miRNA alone (70.45%, RF; 75%, SVM; 75%, FSCOX_median; and 75%, FSCOX_SVM) or mRNA alone (65.91%, RF; 63.64%, SVM; 72.73%, FSCOX_median; and 70.45%, FSCOX_SVM). Similarly in the glioblastoma multiforme data, the accuracy of miRNA/mRNA using IFS

  19. Data Sets, Ensemble Cloud Computing, and the University Library (Invited)

    Science.gov (United States)

    Plale, B. A.

    2013-12-01

    The environmental researcher at the public university has new resources at their disposal to aid in research and publishing. Cloud computing provides compute cycles on demand for analysis and modeling scenarios. Cloud computing is attractive for e-Science because of the ease with which cores can be accessed on demand, and because the virtual machine implementation that underlies cloud computing reduces the cost of porting a numeric or analysis code to a new platform. Many libraries at larger universities are developing the e-Science skills to serve as repositories of record for publishable data sets. But these are confusing times for the publication of data sets from environmental research. The large publishers of scientific literature are advocating a process whereby data sets are tightly tied to a publication. In other words, a paper published in the scientific literature that gives results based on data must have an associated data set accessible that backs up the results. This approach supports reproducibility of results in that publishers maintain a repository for the papers they publish and the data sets that the papers used. Does such a solution that maps one data set (or subset) to one paper fit the needs of the environmental researcher who, among other things, uses complex models, mines longitudinal databases, and generates observational results? The second school of thought has emerged out of NSF-, NOAA-, and NASA-funded efforts over time: data sets exist coherently at a location, such as occurs at the National Snow and Ice Data Center (NSIDC). But when a collection is coherent, reproducibility of individual results is more challenging. We argue for a third complementary option: the university repository as a location for data sets produced as a result of university-based research. This location for a repository relies on the expertise developing in university libraries across the country, and leverages tools, such as are being developed

  20. Accelerated EM-based clustering of large data sets

    NARCIS (Netherlands)

    Verbeek, J.J.; Nunnink, J.R.J.; Vlassis, N.

    2006-01-01

    Motivated by the poor performance (linear complexity) of the EM algorithm in clustering large data sets, and inspired by the successful accelerated versions of related algorithms like k-means, we derive an accelerated variant of the EM algorithm for Gaussian mixtures that: (1) offers speedups that
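
    For reference, a minimal sketch of the plain (unaccelerated) EM iteration for a one-dimensional, two-component Gaussian mixture; the linear per-iteration cost over all n points is exactly what the accelerated variant attacks. Starting values and data are arbitrary assumptions.

      import numpy as np

      rng = np.random.default_rng(1)
      x = np.concatenate([rng.normal(-2, 1, 500), rng.normal(3, 0.5, 500)])

      w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
      for _ in range(100):
          # E-step: responsibility of each component for each point.
          dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
          r = dens / dens.sum(axis=1, keepdims=True)
          # M-step: re-estimate weights, means and variances from soft counts.
          n_k = r.sum(axis=0)
          w = n_k / len(x)
          mu = (r * x[:, None]).sum(axis=0) / n_k
          var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

      print(w.round(2), mu.round(2), var.round(2))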

  1. A Labeled Data Set For Flow-based Intrusion Detection

    NARCIS (Netherlands)

    Sperotto, Anna; Sadre, R.; van Vliet, Frank; Pras, Aiko; Nunzi, Giorgio; Scoglio, Caterina; Li, Xing

    2009-01-01

    Flow-based intrusion detection has recently become a promising security mechanism in high speed networks (1-10 Gbps). Despite the richness in contributions in this field, benchmarking of flow-based IDS is still an open issue. In this paper, we propose the first publicly available, labeled data set

  2. data sets Simulations in articulating light-weight PRS

    NARCIS (Netherlands)

    Van den Berg, Bert

    2008-01-01

    The data sets are the output of 3 different steps in the development of a simulation of a PRS as described in chapter 3.3: Simulations in articulating light-weight PRS. A case for Pedagogy-oriented and Rating-based Hybrid Recommendation Strategies. Rob Nadolski, Bert van den Berg, Adriana Berlanga, Hans

  3. Satellite data sets for the atmospheric radiation measurement (ARM) program

    Energy Technology Data Exchange (ETDEWEB)

    Shi, L.; Bernstein, R.L. [SeaSpace Corp., San Diego, CA (United States)

    1996-04-01

    This abstract describes the type of data obtained from satellite measurements in the Atmospheric Radiation Measurement (ARM) program. The data sets have been widely used by the ARM team to derive cloud-top altitude, cloud cover, snow and ice cover, surface temperature, water vapor and wind, vertical profiles of temperature, and continuous observations of weather needed to track and predict severe weather.

  4. Handwriting and Gender: A Multi-Use Data Set

    Science.gov (United States)

    Bradley, Sean

    2015-01-01

    Can individuals guess the gender of a writer based on a sample of his or her handwriting? We administer an electronic survey twice to the same individuals to find out. The resulting data set is interesting to students, rich enough to be amenable to a wide array of activities, and open to a variety of exploratory tacks for statistics students and…

  5. Parallel clustering algorithm for large-scale biological data sets.

    Science.gov (United States)

    Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

    2014-01-01

    The recent explosion of biological data brings a great challenge for traditional clustering algorithms. With the increasing scale of data sets, much larger memory and longer runtimes are required for cluster identification problems. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a great bottleneck when handling large-scale data sets. Moreover, the similarity matrix, whose construction takes a long runtime, is required before running the affinity propagation algorithm, since the algorithm clusters data sets based on the similarities between data pairs. Two types of parallel architectures are proposed in this paper to accelerate the similarity matrix construction and the affinity propagation algorithm. The shared-memory architecture is used to construct the similarity matrix, and the distributed system is used for the affinity propagation algorithm because of its large memory size and great computing capacity. An appropriate scheme of data partition and reduction is designed in our method in order to minimize the global communication cost among processes. A speedup of 100 is gained with 128 cores. The runtime is reduced from several hours to a few seconds, which indicates that the parallel algorithm is capable of handling large-scale data sets effectively. The parallel affinity propagation also achieves good performance when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
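
    A small serial baseline using the scikit-learn implementation (not the parallel version proposed in the paper; the damping value and synthetic blobs are assumptions) makes the bottleneck concrete, since the full n-by-n similarity matrix is formed internally:

      import numpy as np
      from sklearn.cluster import AffinityPropagation

      rng = np.random.default_rng(2)
      X = np.vstack([rng.normal(0, 0.3, (100, 2)),
                     rng.normal(3, 0.3, (100, 2)),
                     rng.normal((0, 4), 0.3, (100, 2))])

      # Message passing runs over all n^2 point pairs -- the cost that the
      # paper distributes across processes for large n.
      ap = AffinityPropagation(damping=0.9, random_state=0).fit(X)
      print(len(ap.cluster_centers_indices_))   # number of exemplars found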

  6. Fast generation of multiple resolution instances of raster data sets

    NARCIS (Netherlands)

    Arge, L.; Haverkort, H.J.; Tsirogiannis, C.P.

    2012-01-01

    In many GIS applications it is important to study the characteristics of a raster data set at multiple resolutions. Often this is done by generating several coarser resolution rasters from a fine resolution raster. In this paper we describe efficient algorithms for different variants of this

  7. A Molecular Iodine Spectral Data Set for Rovibronic Analysis

    Science.gov (United States)

    Williamson, J. Charles; Kuntzleman, Thomas S.; Kafader, Rachael A.

    2013-01-01

    A data set of 7,381 molecular iodine vapor rovibronic transitions between the X and B electronic states has been prepared for an advanced undergraduate spectroscopic analysis project. Students apply standard theoretical techniques to these data and determine the values of three X-state constants (image omitted) and four B-state constants (image…

  8. International spinal cord injury endocrine and metabolic extended data set.

    Science.gov (United States)

    Bauman, W A; Wecht, J M; Biering-Sørensen, F

    2017-05-01

    The objective of this study was to develop the International Spinal Cord Injury (SCI) Endocrine and Metabolic Extended Data Set (ISCIEMEDS) within the framework of the International SCI Data Sets that would facilitate consistent collection and reporting of endocrine and metabolic findings in the SCI population. This study was conducted in an international setting. The ISCIEMEDS was developed by a working group. The initial ISCIEMEDS was revised based on suggestions from members of the International SCI Data Sets Committee, the International Spinal Cord Society (ISCoS) Executive and Scientific Committees, American Spinal Injury Association (ASIA) Board, other interested organizations, societies and individual reviewers. The data set was posted for two months on the ISCoS and ASIA websites for comments. Variable names were standardized, and a suggested database structure for the ISCIEMEDS was provided by the Common Data Elements (CDEs) project at the National Institute of Neurological Disorders and Stroke (NINDS) of the US National Institutes of Health (NIH), and is available at https://commondataelements.ninds.nih.gov/SCI.aspx#tab=Data_Standards. The final ISCIEMEDS contains questions on the endocrine and metabolic conditions related to SCI. Because the information may be collected at any time, the date of data collection is important to determine the time after SCI. ISCIEMEDS includes information on carbohydrate metabolism (6 variables), calcium and bone metabolism (12 variables), thyroid function (9 variables), adrenal function (2 variables), gonadal function (7 variables), pituitary function (6 variables), sympathetic nervous system function (1 variable) and renin-aldosterone axis function (2 variables). The complete instructions for data collection and the data sheet itself are freely available on the website of ISCoS (http://www.iscos.org.uk/international-sci-data-sets).

  9. The international spinal cord injury pain basic data set.

    Science.gov (United States)

    Widerström-Noga, E; Biering-Sørensen, F; Bryce, T; Cardenas, D D; Finnerup, N B; Jensen, M P; Richards, J S; Siddall, P J

    2008-12-01

    To develop a basic pain data set (International Spinal Cord Injury Basic Pain Data Set, ISCIPDS:B) within the framework of the International spinal cord injury (SCI) data sets that would facilitate consistent collection and reporting of pain in the SCI population. International. The ISCIPDS:B was developed by a working group consisting of individuals with published evidence of expertise in SCI-related pain regarding taxonomy, psychophysics, psychology, epidemiology and assessment, and one representative of the Executive Committee of the International SCI Standards and Data Sets. The members were appointed by four major organizations with an interest in SCI-related pain (International Spinal Cord Society, ISCoS; American Spinal Injury Association, ASIA; American Pain Society, APS and International Association for the Study of Pain, IASP). The initial ISCIPDS:B was revised based on suggestions from members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, ASIA and APS Boards, and the Neuropathic Pain Special Interest Group of the IASP, individual reviewers and societies and the ISCoS Council. The final ISCIPDS:B contains core questions about clinically relevant information concerning SCI-related pain that can be collected by health-care professionals with expertise in SCI in various clinical settings. The questions concern pain severity, physical and emotional function and include a pain-intensity rating, a pain classification and questions related to the temporal pattern of pain for each specific pain problem. The impact of pain on physical, social and emotional function, and sleep is evaluated for each pain.

  10. Optimizing distance-based methods for large data sets

    Science.gov (United States)

    Scholl, Tobias; Brenner, Thomas

    2015-10-01

    Distance-based methods for measuring spatial concentration of industries have received increasing popularity in the spatial econometrics community. However, a limiting factor for using these methods is their computational complexity, since both their memory requirements and running times are in O(n^2). In this paper, we present an algorithm with constant memory requirements and shorter running time, enabling distance-based methods to deal with large data sets. We discuss three recent distance-based methods in spatial econometrics: the D&O-Index by Duranton and Overman (Rev Econ Stud 72(4):1077-1106, 2005), the M-function by Marcon and Puech (J Econ Geogr 10(5):745-762, 2010) and the Cluster-Index by Scholl and Brenner (Reg Stud (ahead-of-print):1-15, 2014). Finally, we present an alternative calculation for the latter index that allows the use of data sets with millions of firms.
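
    The constant-memory idea can be sketched as block-wise accumulation of a pairwise-distance histogram, the basic ingredient of such concentration measures (a generic sketch rather than any of the three cited indices; block size and bins are illustrative assumptions):

      import numpy as np

      def distance_histogram(points, bins, block=1024):
          """Histogram of all pairwise distances, accumulated block by block
          so memory stays O(block^2) instead of O(n^2)."""
          hist = np.zeros(len(bins) - 1, dtype=np.int64)
          n = len(points)
          for i in range(0, n, block):
              for j in range(i, n, block):
                  a, b = points[i:i + block], points[j:j + block]
                  d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
                  if i == j:            # drop self-pairs and double counting
                      d = d[np.triu_indices_from(d, k=1)]
                  hist += np.histogram(d.ravel(), bins)[0]
          return hist

      pts = np.random.default_rng(3).uniform(size=(5000, 2))   # firm locations
      print(distance_histogram(pts, bins=np.linspace(0, 1.5, 16)))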

  11. CDMS: CAD data set system design description. Revision 1

    International Nuclear Information System (INIS)

    Gray, E.L.

    1994-01-01

    This document is intended to formalize the program design of the CAD Data Set Management System (CDMS) and to be the vehicle to communicate the design to the Engineering, Design Services, and Configuration Management organizations and the WHC IRM Analysts/Programmers. The SDD shows how the software system will be structured to satisfy the requirements identified in the WHC-SD-GN-CSRS-30005 CDMS Software Requirement Specification (SRS). It is a description of the software structure, software components, interfaces, and data that make up the CDMS System. The design descriptions contained within this document will describe in detail the software product that will be developed to assist the aforementioned organizations for the express purpose of managing CAD data sets associated with released drawings, replacing the existing locally developed system and laying the foundation for automating the configuration management

  12. Behavior Identification Based on Geotagged Photo Data Set

    Directory of Open Access Journals (Sweden)

    Guo-qi Liu

    2014-01-01

    Full Text Available The popularity of mobile devices has produced a set of image data with geographic information, time information, and text description information, which is called a geotagged photo data set. Dividing this kind of data by behavior and location not only identifies the user’s important locations and daily behaviors, but also helps users sort huge image collections. This paper proposes a method to build an index based on multiple classification results: it divides the data set multiple times and assigns labels to the data according to the estimated probability of the classification results, in order to identify users’ important locations and daily behaviors. This paper collects 1400 discrete sets of data as experimental data to verify the proposed method. The result of the experiment shows that the index and the actual tagging results agree closely.

  13. Querying Large Physics Data Sets Over an Information Grid

    CERN Document Server

    Baker, N; Kovács, Z; Le Goff, J M; McClatchey, R

    2001-01-01

    Optimising use of the Web (WWW) for LHC data analysis is a complex problem and illustrates the challenges arising from the integration of and computation across massive amounts of information distributed worldwide. Finding the right piece of information can, at times, be extremely time-consuming, if not impossible. So-called Grids have been proposed to facilitate LHC computing and many groups have embarked on studies of data replication, data migration and networking philosophies. Other aspects such as the role of 'middleware' for Grids are emerging as requiring research. This paper positions the need for appropriate middleware that enables users to resolve physics queries across massive data sets. It identifies the role of meta-data for query resolution and the importance of Information Grids for high-energy physics analysis rather than just Computational or Data Grids. This paper identifies software that is being implemented at CERN to enable the querying of very large collaborating HEP data-sets, initially...

  14. The 1990 conterminous U.S. AVHRR data set

    International Nuclear Information System (INIS)

    Eidenshink, J.C.

    1992-01-01

    The U.S. Geological Survey, using NOAA-11 Advanced Very High Resolution Radiometer (AVHRR) 1-km data, has produced a time series of 19 biweekly maximum normalized difference vegetation index (NDVI) composites of the conterminous United States for the 1990 growing season. Each biweekly composite included data from approximately 20 calibrated and georegistered daily overpasses. The output is a data set which includes all five calibrated AVHRR channels, NDVI values, three satellite/solar viewing angles, and date of observation pointer for each biweekly composite. The data set is intended for assessing seasonal variations in vegetation condition and provides a foundation for studying long-term changes in vegetation resulting from human interactions or global climate alterations. 12 refs
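
    The NDVI and maximum-value compositing steps are simple to state in code; the sketch below uses synthetic stand-ins for the calibrated red (channel 1) and near-infrared (channel 2) reflectances, so shapes and values are assumptions:

      import numpy as np

      rng = np.random.default_rng(4)
      # ~20 daily passes over a small 100x100 tile.
      red = rng.uniform(0.05, 0.3, (20, 100, 100))   # AVHRR channel 1
      nir = rng.uniform(0.2, 0.6, (20, 100, 100))    # AVHRR channel 2

      ndvi = (nir - red) / (nir + red)   # normalized difference vegetation index

      # Maximum-value compositing: per pixel, keep the pass with the highest
      # NDVI, which tends to reject cloud-contaminated observations.
      composite = ndvi.max(axis=0)
      pass_used = ndvi.argmax(axis=0)    # analogue of the date-of-observation pointer
      print(composite.shape, pass_used.min(), pass_used.max())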

  15. AOIPS data base management systems support for GARP data sets

    Science.gov (United States)

    Gary, J. P.

    1977-01-01

    A data base management system is identified that was developed to provide flexible access to data sets produced by GARP during its data systems tests. The content and coverage of the data base are defined, and a computer-aided, interactive information storage and retrieval system, implemented to facilitate access to user-specified data subsets, is described. The computer programs developed to provide the capability were implemented on the highly interactive, minicomputer-based AOIPS and are referred to as the data retrieval system (DRS). Implemented as a user-interactive but menu-guided system, the DRS permits users to inventory the data tape library and create duplicate or subset data sets based on a user-selected window defined by time and latitude/longitude boundaries. The DRS permits users to select, display, or produce formatted hard copy of individual data items contained within the data records.

  16. Evolution and revision of the Perioperative Nursing Data Set.

    Science.gov (United States)

    Petersen, Carol; Kleiner, Cathy

    2011-01-01

    The Perioperative Nursing Data Set (PNDS) is a nursing language that provides standardized terminology to support perioperative nursing practice. The PNDS represents perioperative nursing knowledge and comprises data elements and definitions that demonstrate the nurse's influence on patient outcomes. Emerging issues and changes in practice associated with the PNDS standardized terminology require ongoing maintenance and periodic in-depth review of its content. Like each new edition of the Perioperative Nursing Data Set, the third edition, published in 2010, underwent content validation by numerous experts in clinical practice, vocabulary development, and informatics. The goal of this most recent edition is to enable the perioperative nurse to use the PNDS in a meaningful manner, as well as to promote standardization of PNDS implementation in practice, both in written documentation and the electronic health record.

  17. Generation new MP3 data set after compression

    Science.gov (United States)

    Atoum, Mohammed Salem; Almahameed, Mohammad

    2016-02-01

    The success of an audio steganography technique rests on ensuring the imperceptibility of the embedded secret message in the stego file and on withstanding any form of intentional or unintentional degradation of the secret message (robustness). Crucial to this is the choice of digital audio file, such as MP3, which comes at different compression rates; research studies have shown that performing steganography on MP3 files after compression is the most suitable approach. Unfortunately, researchers have so far been unable to test and implement their algorithms because no standard data set of MP3 files after compression has been generated. This paper therefore focuses on generating a standard data set with different compression ratios and different genres to help researchers implement their algorithms.
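
    One possible sketch of such a generation step (assuming ffmpeg with the LAME encoder is installed; the file names and bitrate list are hypothetical choices):

      import subprocess
      from pathlib import Path

      src = Path("genre_samples/rock_01.wav")        # hypothetical source file
      for kbps in (64, 128, 192, 256, 320):
          out = src.with_name(f"{src.stem}_{kbps}k.mp3")
          # Re-encode the same source at several bitrates to cover the
          # compression rates the data set is meant to span.
          subprocess.run(
              ["ffmpeg", "-y", "-i", str(src),
               "-codec:a", "libmp3lame", "-b:a", f"{kbps}k", str(out)],
              check=True,
          )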

  18. Chicken and Food Poisoning

    Science.gov (United States)

    Americans eat more chicken every year than any other meat. Chicken can ...

  19. Estimating the re-identification risk of clinical data sets

    Directory of Open Access Journals (Sweden)

    Dankar Fida

    2012-07-01

    Full Text Available Abstract Background De-identification is a common way to protect patient privacy when disclosing clinical data for secondary purposes, such as research. One type of attack that de-identification protects against is linking the disclosed patient data with public and semi-public registries. Uniqueness is a commonly used measure of re-identification risk under this attack. If uniqueness can be measured accurately then the risk from this kind of attack can be managed. In practice, it is often not possible to measure uniqueness directly, therefore it must be estimated. Methods We evaluated the accuracy of uniqueness estimators on clinically relevant data sets. Four candidate estimators were identified because they were evaluated in the past and found to have good accuracy or because they were new and not evaluated comparatively before: the Zayatz estimator, slide negative binomial estimator, Pitman’s estimator, and mu-argus. A Monte Carlo simulation was performed to evaluate the uniqueness estimators on six clinically relevant data sets. We varied the sampling fraction and the uniqueness in the population (the value being estimated). The median relative error and inter-quartile range of the uniqueness estimates were measured across 1000 runs. Results There was no single estimator that performed well across all of the conditions. We developed a decision rule which selected between the Pitman, slide negative binomial and Zayatz estimators depending on the sampling fraction and the difference between estimates. This decision rule had the best consistent median relative error across multiple conditions and data sets. Conclusion This study identified an accurate decision rule that can be used by health privacy researchers and disclosure control professionals to estimate uniqueness in clinical data sets. The decision rule provides a reliable way to measure re-identification risk.
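
    Sample uniqueness itself, the quantity from which the estimators extrapolate, is straightforward to compute; in the toy sketch below the choice of quasi-identifiers is a hypothetical assumption:

      import pandas as pd

      df = pd.DataFrame({
          "age":    [34, 34, 51, 51, 51, 67],
          "sex":    ["F", "F", "M", "M", "M", "F"],
          "region": ["N", "N", "S", "S", "E", "E"],
      })

      quasi = ["age", "sex", "region"]          # attacker-linkable attributes
      sizes = df.groupby(quasi).size()

      # Share of records whose quasi-identifier combination occurs exactly
      # once in the sample; population uniqueness must then be estimated.
      print((sizes == 1).sum() / len(df))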

  20. Designing minimum data sets of health smart card system

    OpenAIRE

    Mohtaram Nematollahi

    2014-01-01

    Introduction: Nowadays different countries benefit from health systems based on health cards and projects related to smart cards. The lack of facilities covering this technology is obvious in our society. This paper aims to design the Minimum Data Sets of a Health Smart Card System for Iran. Method: This research was an applied descriptive study. At first, we reviewed similar projects and guidelines of selected countries and the proposed model was designed in accordance with the country’s ...

  1. Subject de-biasing of data sets: A Bayesian approach

    International Nuclear Information System (INIS)

    Pate-Cornell, M.E.

    1994-01-01

    In this paper, the authors examine the relevance of data sets (for instance, of past incidents) for risk management decisions when there are reasons to believe that all types of incidents have not been reported at the same rate. Their objective is to infer from the data reports what actually happened in order to assess the potential benefits of different safety measures. The authors use a simple Bayesian model to correct (de-bias) the data sets given the nonreport rates, which are assessed (subjectively) by experts and encoded as the probabilities of reports given different characteristics of the events of interest. They compute a probability distribution for the past number of events given the past number of reports. They illustrate the method by the cases of two data sets: incidents in anesthesia in Australia, and oil spills in the Gulf of Mexico. In the first case, the de-biasing allows correcting for the fact that some types of incidents, such as technical malfunctions, are more likely to be reported when they occur than anesthetist mistakes. In the second case, the authors have to account for the fact that the rates of oil spill reports in different incident categories have increased over the years, perhaps at the same time as the rates of incidents themselves.
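
    A minimal sketch of the de-biasing step under stated assumptions (a flat prior over the true count, a binomial report process, and invented numbers):

      import numpy as np
      from scipy import stats

      r = 12     # incidents actually reported
      p = 0.4    # expert-assessed probability that an incident is reported

      # Posterior over the true number of incidents n, for a flat prior and
      # reports ~ Binomial(n, p).
      n = np.arange(r, 301)
      post = stats.binom.pmf(r, n, p)
      post /= post.sum()

      print(round((n * post).sum(), 1))   # de-biased estimate, roughly r / p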

  2. Designing minimum data sets of health smart card system

    Directory of Open Access Journals (Sweden)

    Mohtaram Nematollahi

    2014-10-01

    Full Text Available Introduction: Nowadays different countries benefit from health systems based on health cards and projects related to smart cards. The lack of facilities covering this technology is obvious in our society. This paper aims to design the Minimum Data Sets of a Health Smart Card System for Iran. Method: This research was an applied descriptive study. At first, we reviewed similar projects and guidelines of selected countries and the proposed model was designed in accordance with the country’s needs, taking people’s attitude about it by the Delphi technique. In the study stage, an analysis of the MDS (Minimum Data Sets) of the Health Smart Card in the selected countries was done by comparative tables and determination of similarities and differences of the MDS. In the validation stage, the model was assessed with descriptive statistics, in terms of absolute and relative frequency, through SPSS (version 16). Results: The MDS of the Health Smart Card for Iran is presented in the patient’s card and the health provider’s card on the basis of studies in America, Australia, Turkey and Belgium and the needs of our country, and was confirmed after applying the Delphi technique with 94 percent agreement. Conclusion: The Minimum Data Sets of the Health Smart Card provide continuous care for patients and communication among providers. Thus, they decrease the complications of threatening diseases. Collection of the MDS of diseases increases the quality of care assessment.

  3. Looking at large data sets using binned data plots

    Energy Technology Data Exchange (ETDEWEB)

    Carr, D.B.

    1990-04-01

    This report addresses the monumental challenge of developing exploratory analysis methods for large data sets. The goals of the report are to increase awareness of large data set problems and to contribute simple graphical methods that address some of the problems. The graphical methods focus on two- and three-dimensional data and common tasks such as finding outliers and tail structure, assessing central structure, and comparing central structures. The methods handle large sample size problems through binning, incorporate information from statistical models, and adapt image processing algorithms. Examples demonstrate the application of the methods to a variety of publicly available large data sets. The most novel application addresses the "too many plots to examine" problem by using cognostics, computer guiding diagnostics, to prioritize plots. The particular application prioritizes views of computational fluid dynamics solution sets on the fly. That is, as each time step of a solution set is generated on a parallel processor, the cognostics algorithms assess virtual plots based on the previous time step. Work in such areas is in its infancy and the examples suggest numerous challenges that remain. 35 refs., 15 figs.
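
    As a minimal illustration of the binning idea (not the report's own algorithms), a hexagonally binned scatter plot replaces a million overplotted points with counts per bin, making central structure and tail structure visible; Python with matplotlib is assumed here:

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(0)
        x = rng.standard_normal(1_000_000)            # synthetic large data set
        y = 0.5 * x + rng.standard_normal(1_000_000)

        fig, ax = plt.subplots()
        hb = ax.hexbin(x, y, gridsize=60, bins="log")  # log-scaled bin counts
        fig.colorbar(hb, label="log10(count)")
        ax.set_xlabel("x"); ax.set_ylabel("y")
        plt.show()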

  4. Using statistical correlation to compare geomagnetic data sets

    Science.gov (United States)

    Stanton, T.

    2009-04-01

    The major features of data curves are often matched, to a first order, by bump and wiggle matching to arrive at an offset between data sets. This poster describes a simple statistical correlation program that has proved useful during this stage by determining the optimal correlation between geomagnetic curves using a variety of fixed and floating windows. Its utility is suggested by the fact that it is simple to run, yet generates meaningful data comparisons, often when data noise precludes the obvious matching of curve features. Data sets can be scaled, smoothed, normalised and standardised, before all possible correlations are carried out between selected overlapping portions of each curve. Best-fit offset curves can then be displayed graphically. The program was used to cross-correlate directional and palaeointensity data from Holocene lake sediments (Stanton et al., submitted) and Holocene lava flows. Some example curve matches are shown, including some that illustrate the potential of this technique when examining particularly sparse data sets. Stanton, T., Snowball, I., Zillén, L. and Wastegård, S., submitted. Detecting potential errors in varve chronology and 14C ages using palaeosecular variation curves, lead pollution history and statistical correlation. Quaternary Geochronology.
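
    The core of such a program can be sketched in a few lines: slide one curve past the other and keep the offset with the highest correlation. This is an illustrative reconstruction, not Stanton's code; equal sampling of the two curves is assumed:

        import numpy as np

        def best_offset(ref, target, max_lag):
            """Return (lag, r): the sample offset maximizing the Pearson
            correlation between two equally sampled curves."""
            best = (0, -np.inf)
            for lag in range(-max_lag, max_lag + 1):
                a = ref[lag:] if lag >= 0 else ref[:lag]
                b = target[:len(target) - lag] if lag >= 0 else target[-lag:]
                n = min(len(a), len(b))
                r = np.corrcoef(a[:n], b[:n])[0, 1]
                if r > best[1]:
                    best = (lag, r)
            return best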

  5. Data Set for Pathology Reporting of Cutaneous Invasive Melanoma

    Science.gov (United States)

    Judge, Meagan J.; Evans, Alan; Frishberg, David P.; Prieto, Victor G.; Thompson, John F.; Trotter, Martin J.; Walsh, Maureen Y.; Walsh, Noreen M.G.; Ellis, David W.

    2013-01-01

    An accurate and complete pathology report is critical for the optimal management of cutaneous melanoma patients. Protocols for the pathologic reporting of melanoma have been independently developed by the Royal College of Pathologists of Australasia (RCPA), Royal College of Pathologists (United Kingdom) (RCPath), and College of American Pathologists (CAP). In this study, data sets, checklists, and structured reporting protocols for pathologic examination and reporting of cutaneous melanoma were analyzed by an international panel of melanoma pathologists and clinicians with the aim of developing a common, internationally agreed upon, evidence-based data set. The International Collaboration on Cancer Reporting cutaneous melanoma expert review panel analyzed the existing RCPA, RCPath, and CAP data sets to develop a protocol containing “required” (mandatory/core) and “recommended” (nonmandatory/noncore) elements. Required elements were defined as those that had agreed evidentiary support at National Health and Medical Research Council level III-2 level of evidence or above and that were unanimously agreed upon by the review panel to be essential for the clinical management, staging, or assessment of the prognosis of melanoma or fundamental for pathologic diagnosis. Recommended elements were those considered to be clinically important and recommended for good practice but with lesser degrees of supportive evidence. Sixteen core/required data elements for cutaneous melanoma pathology reports were defined (with an additional 4 core/required elements for specimens received with lymph nodes). Eighteen additional data elements with a lesser level of evidentiary support were included in the recommended data set. Consensus response values (permitted responses) were formulated for each data item. Development and agreement of this evidence-based protocol at an international level was accomplished in a timely and efficient manner, and the processes described herein may

  6. Inferring source attribution from a multiyear multisource data set of Salmonella in Minnesota.

    Science.gov (United States)

    Ahlstrom, C; Muellner, P; Spencer, S E F; Hong, S; Saupe, A; Rovira, A; Hedberg, C; Perez, A; Muellner, U; Alvarez, J

    2017-12-01

    Salmonella enterica is a global health concern because of its widespread association with foodborne illness. Bayesian models have been developed to attribute the burden of human salmonellosis to specific sources with the ultimate objective of prioritizing intervention strategies. Important considerations of source attribution models include the evaluation of the quality of input data, assessment of whether attribution results logically reflect the data trends and identification of patterns within the data that might explain the detailed contribution of different sources to the disease burden. Here, more than 12,000 non-typhoidal Salmonella isolates from human, bovine, porcine, chicken and turkey sources that originated in Minnesota were analysed. A modified Bayesian source attribution model (available in a dedicated R package), accounting for non-sampled sources of infection, attributed 4,672 human cases to sources assessed here. Most (60%) cases were attributed to chicken, although there was a spike in cases attributed to a non-sampled source in the second half of the study period. Molecular epidemiological analysis methods were used to supplement risk modelling, and a visual attribution application was developed to facilitate data exploration and comprehension of the large multiyear data set assessed here. A large amount of within-source diversity and low similarity between sources was observed, and visual exploration of data provided clues into variations driving the attribution modelling results. Results from this pillared approach provided first attribution estimates for Salmonella in Minnesota and offer an understanding of current data gaps as well as key pathogen population features, such as serotype frequency, similarity and diversity across the sources. Results here will be used to inform policy and management strategies ultimately intended to prevent and control Salmonella infection in the state. © 2017 Blackwell Verlag GmbH.

  7. Data sets for modeling: A retrospective collection of Bidirectional Reflectance and Forest Ecosystems Dynamics Multisensor Aircraft Campaign data sets

    Energy Technology Data Exchange (ETDEWEB)

    Walthall, C.L.; Kim, M. (Univ. of Maryland, College Park, MD (United States). Dept. of Geography); Williams, D.L.; Meeson, B.W.; Agbu, P.A.; Newcomer, J.A.; Levine, E.R.

    1993-12-01

    The Biospheric Sciences Branch, within the Laboratory for Terrestrial Physics at NASA's Goddard Space Flight Center, has assembled two data sets for free dissemination to the remote sensing research community. One data set, referred to as the Retrospective Bidirectional Reflectance Distribution Function (BRDF) Data Collection, is a collection of bidirectional reflectance and supporting biophysical measurements of surfaces ranging in diversity from bare soil to heavily forested canopies. The other data collection, resulting from measurements made in association with the Forest Ecosystems Dynamics Multisensor Aircraft Campaign (FED MAC), contains data that are relevant to ecosystem process models, particularly those which have been modified to incorporate remotely sensed data. Both of these collections are being made available to the science community at large in order to facilitate model development, validation, and usage. These data collections are subsets which have been compiled and consolidated from individual researchers or from several large data set collections, including: the First International Satellite Land Surface Climatology Project (ISLSCP) Field Experiment (FIFE); FED MAC; the Superior National Forest Project (SNF); the Geologic Remote Sensing Field Experiment (GRSFE); and Agricultural Inventories through Space Applications of Remote Sensing (AgriStars). The complete, stand-alone FED MAC Data Collection contains atmospheric, vegetation, and soils data acquired during field measurement campaigns conducted at International Paper's Northern Experimental Forest, located approximately 40 km north of Bangor, Maine. Reflectance measurements at the canopy, branch, and needle level are available, along with the detailed canopy architectural measurements.

  8. Under-utilized Important Data Sets from Barrow, Alaska

    Science.gov (United States)

    Jensen, A. M.; Misarti, N.

    2012-12-01

    The Barrow region has a number of high resolution data sets of high quality and high scientific and stakeholder relevance. Many are described as being of long duration, yet span mere decades. Here we highlight the fact that there are data sets available in the Barrow area that span considerably greater periods of time (centuries to millennia), at varying degrees of resolution. When used appropriately, these data sets can contribute to the study and understanding of the changing Arctic. However, because these types of data are generally acquired as part of archaeological projects, funded through Arctic Social Science and similar programs, their use in other sciences has been limited. Archaeologists focus on analyzing these data sets in ways designed to answer particular anthropological questions. That in no way precludes archaeological collaboration with other types of scientists nor the analysis of these data sets in new and innovative ways, in order to look at questions of Arctic change over a time span beginning well before the Industrial Revolution introduced complicating factors. One major data group consists of zooarchaeological data from sites in the Barrow area. This consists of faunal remains of human subsistence activities, recovered either from middens (refuse deposits) or dwellings. In effect, occupants of a site were sampling their environment as it existed at the time of occupation, although not in a random or systematic way. When analyzed to correct for biases introduced by taphonomic and human behavioral factors, such data sets are used by archaeologists to understand past people's subsistence practices, and how such practices changed through time. However, there is much additional information that can be obtained from these collections. Certain species have fairly specific habitat requirements, and their presence in significant numbers at a site indicates that such conditions existed relatively nearby at a particular time in the past, and

  9. Results of LLNL investigation of NYCT data sets

    International Nuclear Information System (INIS)

    Sale, K; Harrison, M; Guo, M; Groza, M

    2007-01-01

    Upon examination we have concluded that none of the alarms indicate the presence of a real threat. A brief history and results from our examination of the NYCT ASP occupancy data sets dated from 2007-05-14 19:11:07 to 2007-06-20 15:46:15 are presented in this letter report. When the ASP data collection campaign at NYCT was completed, rather than being shut down, the Canberra ASP annunciator box was unplugged, leaving the data acquisition system running. By the time it was discovered that the ASP was still acquiring data, about 15,000 occupancies had been recorded. Among these were about 500 alarms (classified by the ASP analysis system as either Threat Alarms or Suspect Alarms). At your request, these alarms have been investigated. Our conclusion is that none of the alarm data sets indicate the presence of a real threat (within statistics). The data sets (ICD1 and ICD2 files with concurrent JPEG pictures) were delivered to LLNL on a removable hard drive labeled FOUO. The contents of the data disk amounted to 53.39 GB of data, requiring over two days for the standard LLNL virus checking software to scan before work could really get started. Our first step was to walk through the directory structure of the disk and create a database of occupancies. For each occupancy, the database was populated with the occupancy date and time, occupancy number, file path to the ICD1 data and the alarm ('No Alarm', 'Suspect Alarm' or 'Threat Alarm') from the ICD2 file, along with some other incidental data. In an attempt to get a global understanding of what was going on, we investigated the occupancy information. The occupancy date/time and alarm type were binned into one-hour counts. These data are shown in Figures 1 and 2.
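
    The binning step is straightforward to reproduce. A hypothetical sketch with pandas (column names are illustrative, not those of the actual LLNL database):

        import pandas as pd

        occupancies = pd.DataFrame({
            "time": pd.to_datetime(["2007-05-14 19:11:07",
                                    "2007-05-14 19:40:02",
                                    "2007-05-14 20:05:33"]),
            "alarm": ["No Alarm", "Threat Alarm", "Suspect Alarm"],
        })

        # Bin occupancy date/time and alarm type into one-hour counts
        hourly = (occupancies
                  .set_index("time")
                  .groupby("alarm")
                  .resample("1h")
                  .size()
                  .unstack(level=0, fill_value=0))
        print(hourly)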

  10. Description of Simulated Small Satellite Operation Data Sets

    Science.gov (United States)

    Kulkarni, Chetan S.; Guarneros Luna, Ali

    2018-01-01

    A set of two BP930 batteries (identified as PK31 and PK35) were operated continuously to complete a single cycle of a simulated satellite operation profile. The battery packs were charged to an initial voltage of around 8.35 V, corresponding to 100% SOC, before the experiment was started. This document explains the structure of the battery data sets. Please cite this paper when using this dataset: Z. Cameron, C. Kulkarni, A. Guarneros, K. Goebel, S. Poll, "A Battery Certification Testbed for Small Satellite Missions", IEEE AUTOTESTCON 2015, Nov 2-5, 2015, National Harbor, MD

  11. Data Set for Empirical Validation of Double Skin Facade Model

    DEFF Research Database (Denmark)

    Kalyanova, Olena; Jensen, Rasmus Lund; Heiselberg, Per

    2008-01-01

    During the recent years the attention to the double skin facade (DSF) concept has greatly increased. Nevertheless, the application of the concept depends on whether a reliable model for simulation of the DSF performance will be developed or pointed out. This is, however, not possible to do, until...... the International Energy Agency (IEA) Task 34 Annex 43. This paper describes the full-scale outdoor experimental test facility ‘the Cube', where the experiments were conducted, the experimental set-up and the measurements procedure for the data sets. The empirical data is composed for the key-functioning modes...

  12. Fast generation of multiple resolution instances of raster data sets

    DEFF Research Database (Denmark)

    Arge, Lars; Haverkort, Herman; Tsirogiannis, Constantinos

    2012-01-01

    In many GIS applications it is important to study the characteristics of a raster data set at multiple resolutions. Often this is done by generating several coarser resolution rasters from a fine resolution raster. In this paper we describe efficient algorithms for different variants of this problem. For one variant we describe an algorithm that runs in O(U log N) time in internal memory, where U is the size of the output, and we show how this algorithm can be adapted to perform efficiently in external memory using O(sort(U)) data transfers from the disk. We also provide two algorithms that solve this problem in external memory, that is, when the input raster is larger than the main memory. The first external algorithm is very easy to implement and requires O(sort(N)) data block transfers from/to the external memory. We have also implemented two of the presented algorithms...
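
    A toy version of the basic operation (one of the simpler variants, ignoring the memory-hierarchy issues the paper is actually about) aggregates factor x factor blocks of cells:

        import numpy as np

        def coarsen(raster, factor, agg=np.nanmean):
            """Build a coarser-resolution raster by aggregating
            factor x factor blocks of cells (mean by default)."""
            rows, cols = raster.shape
            # Trim edges so the shape divides evenly by the factor
            raster = raster[:rows - rows % factor, :cols - cols % factor]
            blocks = raster.reshape(rows // factor, factor,
                                    cols // factor, factor)
            return agg(blocks, axis=(1, 3))

        fine = np.arange(36, dtype=float).reshape(6, 6)
        print(coarsen(fine, 2))  # 3 x 3 raster of 2 x 2 block means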

  13. Geophysical Data Sets in GeoMapApp

    Science.gov (United States)

    Goodwillie, A. M.

    2017-12-01

    GeoMapApp (http://www.geomapapp.org), a free map-based data tool developed at Lamont-Doherty Earth Observatory, provides access to hundreds of integrated geoscience data sets that are useful for geophysical studies. Examples include earthquake and volcano catalogues, gravity and magnetics data, seismic velocity tomographic models, geological maps, geochemical analytical data, lithospheric plate boundary information, geodetic velocities, and high-resolution bathymetry and land elevations. Users can also import and analyse their own data files. Data analytical functions provide contouring, shading, profiling, layering and transparency, allowing multiple data sets to be seamlessly compared. A new digitization and field planning portal allows stations and waypoints to be generated. Sessions can be saved and shared with colleagues and students. In this eLightning presentation we will demonstrate some of GeoMapApp's capabilities with a focus upon subduction zones and tectonics. In the attached screen shot of the Cascadia margin, the contoured depth to the top of the subducting Juan de Fuca slab is overlain on a shear wave velocity depth slice. Geochemical data coloured on Al2O3 and scaled on MgO content are shown as circles. The stack of data profiles was generated along the white line.

  14. Towards a Framework for Change Detection in Data Sets

    Science.gov (United States)

    Böttcher, Mirko; Nauck, Detlef; Ruta, Dymitr; Spott, Martin

    Since the world with its markets, innovations and customers is changing faster than ever before, the key to survival for businesses is the ability to detect, assess and respond to changing conditions rapidly and intelligently. Discovering changes and reacting to or acting upon them before others do has therefore become a strategic issue for many companies. However, existing data analysis techniques are insufficient for this task since they typically assume that the domain under consideration is stable over time. This paper presents a framework that detects changes within a data set at virtually any level of granularity. The underlying idea is to derive a rule-based description of the data set at different points in time and to subsequently analyse how these rules change. Nevertheless, further techniques are required to assist the data analyst in interpreting and assessing their changes. Therefore the framework also contains methods to discard rules that are non-drivers for change and to assess the interestingness of detected changes.
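
    A toy sketch of the underlying idea (illustrative only; the paper's framework is far richer): describe each snapshot of the data by the support of simple attribute=value rules, then report rules whose support shifted:

        from collections import Counter

        def rule_support(records):
            """Describe a data set as the relative frequency (support) of
            each attribute=value pair -- a crude rule-based summary."""
            counts = Counter((k, v) for rec in records for k, v in rec.items())
            total = len(records)
            return {rule: n / total for rule, n in counts.items()}

        def changed_rules(old, new, threshold=0.1):
            """Report rules whose support shifted by more than threshold."""
            rules = set(old) | set(new)
            return {r: (old.get(r, 0.0), new.get(r, 0.0))
                    for r in rules
                    if abs(old.get(r, 0.0) - new.get(r, 0.0)) > threshold}

        t1 = [{"segment": "A", "churned": "no"}] * 8 + [{"segment": "B", "churned": "yes"}] * 2
        t2 = [{"segment": "A", "churned": "no"}] * 5 + [{"segment": "B", "churned": "yes"}] * 5
        print(changed_rules(rule_support(t1), rule_support(t2)))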

  15. A public data set of human balance evaluations

    Directory of Open Access Journals (Sweden)

    Damiana A. Santos

    2016-11-01

    Full Text Available The goal of this study was to create a public data set with results of qualitative and quantitative evaluations related to human balance. Subjects' balance was evaluated by posturography using a force platform and by the Mini Balance Evaluation Systems Test. In the posturography test, we evaluated subjects standing still for 60 s in four different conditions where vision and the standing surface were manipulated: on a rigid surface with eyes open; on a rigid surface with eyes closed; on an unstable surface with eyes open; on an unstable surface with eyes closed. Each condition was performed three times and the order of the conditions was randomized. In addition, the following tests were employed in order to better characterize each subject: Short Falls Efficacy Scale International; International Physical Activity Questionnaire Short Version; and Trail Making Test. The subjects were also interviewed to collect information about their socio-cultural, demographic, and health characteristics. The data set comprises signals from the force platform (raw data for the force, moments of forces, and centers of pressure) of 163 subjects, plus one file with information about the subjects and balance conditions and the results of the other evaluations. All the data are available at PhysioNet and at Figshare.

  16. Virtual endoscopy post-processing of helical CT data sets

    International Nuclear Information System (INIS)

    Dessl, A.; Giacomuzzi, S.M.; Springer, P.; Stoeger, A.; Pototschnig, C.; Voelklein, C.; Schreder, S.G.; Jaschke, W.

    1997-01-01

    Purpose: The purpose of this work was to test newly developed post-processing software for virtual CT endoscopic methods. Virtual endoscopic images were generated from helical CT data sets in the region of the shoulder joint (n=2), the tracheobronchial system (n=3), the nasal sinuses (n=2), the colon (n=2), and the common carotid artery (n=1). Software developed specifically for virtual endoscopy ('Navigator') was used which, after a previous threshold value selection, makes the reconstruction of internal body surfaces possible by an automatic segmentation process. We evaluated the usability of the software, the reconstruction time for individual images and sequences of images, as well as the quality of the reconstruction. All pathological findings of the virtual endoscopy were confirmed by surgery. Results: The post-processing program is easy to use and provides virtual endoscopic images within 50 seconds. Depending on the extent of the data set, virtual tracheobronchoscopy as a cine loop sequence required about 15 minutes. Through use of the threshold value-dependent surface reconstruction, the demands on the computer configuration are limited; however, this also created quality problems in image calculation as a consequence of the accompanying loss of data. Conclusions: The Navigator software enables the calculation of virtual endoscopic models with only moderate demands on the hardware. (orig.)

  17. Gap filling strategies for long term energy flux data sets

    DEFF Research Database (Denmark)

    Falge, E.; Baldocchi, D.; Olson, R.

    2001-01-01

    At present a network of over 100 field sites is measuring carbon dioxide, water vapor and sensible heat fluxes between the biosphere and atmosphere, on a nearly continuous basis. Gaps in the long term measurements of evaporation and sensible heat flux must be filled before these data can be used for hydrological and meteorological applications. We adapted methods of gap filling for NEE (net ecosystem exchange of carbon) to energy fluxes and applied them to data sets available from the EUROFLUX and AmeriFlux eddy covariance databases. The average data coverage for the sites selected was 69% and 75% for latent heat (λE) and sensible heat (H), respectively. The methods were based on mean diurnal variations (half-hourly binned means of fluxes based on previous and subsequent days, MDV) and look-up tables for fluxes during assorted meteorological conditions (LookUp), and the impact of different gap filling methods...
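
    A minimal sketch of the MDV approach (illustrative assumptions: half-hourly data in a pandas series and a centred window of neighbouring days, not the paper's exact windowing):

        import numpy as np
        import pandas as pd

        def fill_mdv(series, window_days=7):
            """Fill gaps in a half-hourly flux series with the mean diurnal
            variation (MDV): the average of the same half-hour slot over a
            window of surrounding days."""
            df = series.to_frame("flux")
            df["slot"] = df.index.hour * 2 + df.index.minute // 30
            # Rolling mean of each half-hour slot across neighbouring days
            mdv = (df.groupby("slot")["flux"]
                     .transform(lambda s: s.rolling(window_days, min_periods=1,
                                                    center=True).mean()))
            return series.fillna(mdv)

        idx = pd.date_range("2000-06-01", periods=48 * 14, freq="30min")
        flux = pd.Series(np.sin(np.linspace(0, 14 * 2 * np.pi, len(idx))), index=idx)
        flux.iloc[100:130] = np.nan  # artificial gap
        print(fill_mdv(flux).isna().sum())  # 0 remaining gaps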

  18. A global data set of land-surface parameters

    International Nuclear Information System (INIS)

    Claussen, M.; Lohmann, U.; Roeckner, E.; Schulzweida, U.

    1994-01-01

    A global data set of land surface parameters is provided for the climate model ECHAM developed at the Max-Planck-Institut fuer Meteorologie in Hamburg. These parameters are: background (surface) albedo α, surface roughness length z0, leaf area index LAI, fractional vegetation cover or vegetation ratio cv, and forest ratio cF. The global set of surface parameters is constructed by allocating parameters to major ecosystem complexes of Olson et al. (1983). The global distribution of ecosystem complexes is given at a resolution of 0.5° x 0.5°. The latter data are compatible with the vegetation types used in the BIOME model of Prentice et al. (1992), which is a potential candidate for an interactive submodel within a comprehensive model of the climate system. (orig.)

  19. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets

    DEFF Research Database (Denmark)

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole

    2016-01-01

    ...and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e. contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate for the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.
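
    To make the hit-integration idea concrete, a crude sketch (loosely inspired by the description above, not the published algorithm): merge alignment hits from many phage genomes against one contig and score the contig by coverage-weighted identity:

        def phage_score(contig_len, hits):
            """Crude 'integrated' score: fraction of the contig covered by
            the union of hits, times the alignment-length-weighted identity.
            hits: (start, end, pct_identity) tuples, 0-based half-open."""
            if not hits:
                return 0.0
            # Merge overlapping intervals to get union coverage
            merged = []
            for start, end, _ in sorted(hits):
                if merged and start <= merged[-1][1]:
                    merged[-1][1] = max(merged[-1][1], end)
                else:
                    merged.append([start, end])
            coverage = sum(e - s for s, e in merged) / contig_len
            # Identity averaged over hits, weighted by hit length
            total = sum(e - s for s, e, _ in hits)
            identity = sum((e - s) * pid for s, e, pid in hits) / total
            return coverage * identity

        print(phage_score(1000, [(0, 400, 90.0), (300, 700, 80.0)]))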

  20. NASIS data base management system: IBM 360 TSS implementation. Volume 3: Data set specifications

    Science.gov (United States)

    1973-01-01

    The data set specifications for the NASA Aerospace Safety Information System (NASIS) are presented. The data set specifications describe the content, format, and medium of communication of every data set required by the system. All relevant information pertinent to a particular data set is prepared in a standard form and centralized in a single document. The format for the data set is provided.

  1. Physics Mining of Multi-Source Data Sets

    Science.gov (United States)

    Helly, John; Karimabadi, Homa; Sipes, Tamara

    2012-01-01

    Powerful new parallel data mining algorithms can produce diagnostic and prognostic numerical models and analyses from observational data. These techniques yield higher-resolution measures of environmental parameters than ever before by fusing synoptic imagery and time-series measurements. These techniques are general and relevant to observational data, including raster, vector, and scalar, and can be applied in all Earth- and environmental science domains. Because they can be highly automated and are parallel, they scale to large spatial domains and are well suited to change and gap detection. This makes it possible to analyze spatial and temporal gaps in information, and facilitates within-mission replanning to optimize the allocation of observational resources. The basis of the innovation is the extension of a recently developed set of algorithms packaged into MineTool to multi-variate time-series data. MineTool is unique in that it automates the various steps of the data mining process, thus making it amenable to autonomous analysis of large data sets. Unlike techniques such as Artificial Neural Nets, which yield a black-box solution, MineTool's outcome is always an analytical model in parametric form that expresses the output in terms of the input variables. This has the advantage that the derived equation can then be used to gain insight into the physical relevance and relative importance of the parameters and coefficients in the model. This is referred to as physics-mining of data. The capabilities of MineTool are extended to include both supervised and unsupervised algorithms, to handle multi-type data sets, and to run in parallel.

  2. BICEP2. II. Experiment and three-year data set

    Energy Technology Data Exchange (ETDEWEB)

    Ade, P. A. R. [School of Physics and Astronomy, Cardiff University, Cardiff, CF24 3AA (United Kingdom); Aikin, R. W.; Bock, J. J.; Brevik, J. A.; Filippini, J. P.; Golwala, S. R.; Hildebrandt, S. R. [Department of Physics, California Institute of Technology, Pasadena, CA 91125 (United States); Amiri, M.; Davis, G.; Halpern, M.; Hasselfield, M. [Department of Physics and Astronomy, University of British Columbia, Vancouver, BC (Canada); Barkats, D. [Joint ALMA Observatory, ESO, Santiago (Chile); Benton, S. J. [Department of Physics, University of Toronto, Toronto, ON (Canada); Bischoff, C. A.; Buder, I. [Harvard-Smithsonian Center for Astrophysics, 60 Garden Street MS 42, Cambridge, MA 02138 (United States); Bullock, E. [Minnesota Institute for Astrophysics, University of Minnesota, Minneapolis, MN 55455 (United States); Day, P. K.; Dowell, C. D. [Jet Propulsion Laboratory, Pasadena, CA 91109 (United States); Duband, L. [Université Grenoble Alpes, CEA INAC-SBT, F-38000 Grenoble (France); Fliescher, S., E-mail: ogburn@stanford.edu [Department of Physics, University of Minnesota, Minneapolis, MN 55455 (United States); Collaboration: Bicep2 Collaboration; and others

    2014-09-01

    We report on the design and performance of the BICEP2 instrument and on its three-year data set. BICEP2 was designed to measure the polarization of the cosmic microwave background (CMB) on angular scales of 1°-5° (ℓ = 40-200), near the expected peak of the B-mode polarization signature of primordial gravitational waves from cosmic inflation. Measuring B-modes requires dramatic improvements in sensitivity combined with exquisite control of systematics. The BICEP2 telescope observed from the South Pole with a 26 cm aperture and cold, on-axis, refractive optics. BICEP2 also adopted a new detector design in which beam-defining slot antenna arrays couple to transition-edge sensor (TES) bolometers, all fabricated on a common substrate. The antenna-coupled TES detectors supported scalable fabrication and multiplexed readout that allowed BICEP2 to achieve a high detector count of 500 bolometers at 150 GHz, giving unprecedented sensitivity to B-modes at degree angular scales. After optimization of detector and readout parameters, BICEP2 achieved an instrument noise-equivalent temperature of 15.8 μK√s. The full data set reached Stokes Q and U map depths of 87.2 nK in square-degree pixels (5.2 μK arcmin) over an effective area of 384 deg² within a 1000 deg² field. These are the deepest CMB polarization maps at degree angular scales to date. The power spectrum analysis presented in a companion paper has resulted in a significant detection of B-mode polarization at degree scales.

  3. A Standardized Reference Data Set for Vertebrate Taxon Name Resolution.

    Science.gov (United States)

    Zermoglio, Paula F; Guralnick, Robert P; Wieczorek, John R

    2016-01-01

    Taxonomic names associated with digitized biocollections labels have flooded into repositories such as GBIF, iDigBio and VertNet. The names on these labels are often misspelled, out of date, or present other problems, as they were often captured only once during accessioning of specimens, or have a history of label changes without clear provenance. Before records are reliably usable in research, it is critical that these issues be addressed. However, still missing is an assessment of the scope of the problem, the effort needed to solve it, and a way to improve effectiveness of tools developed to aid the process. We present a carefully human-vetted analysis of 1000 verbatim scientific names taken at random from those published via the data aggregator VertNet, providing the first rigorously reviewed, reference validation data set. In addition to characterizing formatting problems, human vetting focused on detecting misspelling, synonymy, and the incorrect use of Darwin Core. Our results reveal a sobering view of the challenge ahead, as less than 47% of name strings were found to be currently valid. More optimistically, nearly 97% of name combinations could be resolved to a currently valid name, suggesting that computer-aided approaches may provide feasible means to improve digitized content. Finally, we associated names back to biocollections records and fit logistic models to test potential drivers of issues. A set of candidate variables (geographic region, year collected, higher-level clade, and the institutional digitally accessible data volume) and their 2-way interactions all predict the probability of records having taxon name issues, based on model selection approaches. We strongly encourage further experiments to use this reference data set as a means to compare automated or computer-aided taxon name tools for their ability to resolve and improve the existing wealth of legacy data.
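
    Computer-aided resolution of the kind suggested above can be sketched with simple fuzzy string matching (an illustration only; the reference list and cutoff are hypothetical, and real tools consult full taxonomic backbones):

        import difflib

        reference = ["Puma concolor", "Panthera onca", "Lynx rufus"]

        def resolve(name, cutoff=0.8):
            """Suggest the closest currently valid name for a verbatim
            (possibly misspelled) scientific name string."""
            match = difflib.get_close_matches(name, reference, n=1, cutoff=cutoff)
            return match[0] if match else None

        print(resolve("Puma concolour"))  # -> 'Puma concolor'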

  4. A Standardized Reference Data Set for Vertebrate Taxon Name Resolution.

    Directory of Open Access Journals (Sweden)

    Paula F Zermoglio

    Full Text Available Taxonomic names associated with digitized biocollections labels have flooded into repositories such as GBIF, iDigBio and VertNet. The names on these labels are often misspelled, out of date, or present other problems, as they were often captured only once during accessioning of specimens, or have a history of label changes without clear provenance. Before records are reliably usable in research, it is critical that these issues be addressed. However, still missing is an assessment of the scope of the problem, the effort needed to solve it, and a way to improve effectiveness of tools developed to aid the process. We present a carefully human-vetted analysis of 1000 verbatim scientific names taken at random from those published via the data aggregator VertNet, providing the first rigorously reviewed, reference validation data set. In addition to characterizing formatting problems, human vetting focused on detecting misspelling, synonymy, and the incorrect use of Darwin Core. Our results reveal a sobering view of the challenge ahead, as less than 47% of name strings were found to be currently valid. More optimistically, nearly 97% of name combinations could be resolved to a currently valid name, suggesting that computer-aided approaches may provide feasible means to improve digitized content. Finally, we associated names back to biocollections records and fit logistic models to test potential drivers of issues. A set of candidate variables (geographic region, year collected, higher-level clade, and the institutional digitally accessible data volume) and their 2-way interactions all predict the probability of records having taxon name issues, based on model selection approaches. We strongly encourage further experiments to use this reference data set as a means to compare automated or computer-aided taxon name tools for their ability to resolve and improve the existing wealth of legacy data.

  5. CLAAS: the CM SAF cloud property data set using SEVIRI

    Science.gov (United States)

    Stengel, M. S.; Kniffka, A. K.; Meirink, J. F. M.; Lockhoff, M. L.; Tan, J. T.; Hollmann, R. H.

    2014-04-01

    An 8-year record of satellite-based cloud properties named CLAAS (CLoud property dAtAset using SEVIRI) is presented, which was derived within the EUMETSAT Satellite Application Facility on Climate Monitoring. The data set is based on SEVIRI measurements of the Meteosat Second Generation satellites, of which the visible and near-infrared channels were intercalibrated with MODIS. Applying two state-of-the-art retrieval schemes ensures high accuracy in cloud detection, cloud vertical placement and microphysical cloud properties. These properties were further processed to provide daily to monthly averaged quantities, mean diurnal cycles and monthly histograms. In particular, the per-month histogram information enhances the insight into spatio-temporal variability of clouds and their properties. Due to the underlying intercalibrated measurement record, the stability of the derived cloud properties is ensured, which is exemplarily demonstrated for three selected cloud variables for the entire SEVIRI disc and a European subregion. All data products and processing levels are introduced and validation results indicated. The sampling uncertainty of the averaged products in CLAAS is minimized due to the high temporal resolution of SEVIRI. This is emphasized by studying the impact of reduced temporal sampling rates taken at typical overpass times of polar-orbiting instruments. In particular, cloud optical thickness and cloud water path are very sensitive to the sampling rate, which in our study amounted to systematic deviations of over 10% if only sampled once a day. The CLAAS data set facilitates many cloud related applications at small spatial scales of a few kilometres and short temporal scales of a few hours. Beyond this, the spatiotemporal characteristics of clouds on diurnal to seasonal, but also on multi-annual scales, can be studied.
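
    The sampling effect is easy to illustrate. A toy sketch (made-up diurnal cycle and numbers, chosen only to show the mechanism, not the CLAAS results):

        import numpy as np

        # Toy diurnal cycle of cloud optical thickness, sampled every 15 min
        t = np.linspace(0, 24, 96, endpoint=False)
        cot = 10 + 6 * np.sin(2 * np.pi * (t - 14) / 24)  # peak mid-afternoon

        full_mean = cot.mean()
        # Sampling once a day at a fixed polar-orbiter overpass time (10:30)
        overpass = cot[np.argmin(np.abs(t - 10.5))]
        print(f"full-sampling mean: {full_mean:.2f}")
        print(f"10:30 snapshot:     {overpass:.2f}  "
              f"({100 * (overpass - full_mean) / full_mean:+.1f}% bias)")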

  6. Reducing Information Overload in Large Seismic Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    HAMPTON,JEFFERY W.; YOUNG,CHRISTOPHER J.; MERCHANT,BION J.; CARR,DORTHE B.; AGUILAR-CHANG,JULIO

    2000-08-02

    Event catalogs for seismic data can become very large. Furthermore, as researchers collect multiple catalogs and reconcile them into a single catalog that is stored in a relational database, the reconciled set becomes even larger. The sheer number of these events makes searching for relevant events to compare with events of interest problematic. Information overload in this form can lead to the data sets being under-utilized and/or used incorrectly or inconsistently. Thus, efforts have been initiated to research techniques and strategies for helping researchers to make better use of large data sets. In this paper, the authors present their efforts to do so in two ways: (1) the Event Search Engine, which is a waveform correlation tool and (2) some content analysis tools, which are a combination of custom-built and commercial off-the-shelf tools for accessing, managing, and querying seismic data stored in a relational database. The current Event Search Engine is based on a hierarchical clustering tool known as the dendrogram tool, which is written as a MatSeis graphical user interface. The dendrogram tool allows the user to build dendrogram diagrams for a set of waveforms by controlling phase windowing, down-sampling, filtering, enveloping, and the clustering method (e.g. single linkage, complete linkage, flexible method). It also allows the clustering to be based on two or more stations simultaneously, which is important to bridge gaps in the sparsely recorded event sets anticipated in such a large reconciled event set. Current efforts are focusing on tools to help the researcher winnow the clusters defined using the dendrogram tool down to the minimum optimal identification set. This will become critical as the number of reference events in the reconciled event set continually grows. The dendrogram tool is part of the MatSeis analysis package, which is available on the Nuclear Explosion Monitoring Research and Engineering Program Web Site. As part of the research
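
    A minimal sketch of the dendrogram idea (generic scipy hierarchical clustering on correlation distances between waveforms; not the MatSeis tool itself):

        import numpy as np
        import matplotlib.pyplot as plt
        from scipy.cluster.hierarchy import linkage, dendrogram
        from scipy.spatial.distance import pdist

        # Toy waveform set: five similar events plus three unrelated traces
        rng = np.random.default_rng(1)
        base = np.sin(np.linspace(0, 8 * np.pi, 200))
        waveforms = np.vstack([base + 0.1 * rng.standard_normal(200)
                               for _ in range(5)] +
                              [rng.standard_normal(200) for _ in range(3)])

        # Correlation distance between every pair of waveforms
        dist = pdist(waveforms, metric="correlation")
        tree = linkage(dist, method="complete")  # complete-linkage clustering
        dendrogram(tree)  # similar events join at low heights
        plt.show()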

  7. Interaction Support for the Global Fluxnet Data Set

    Science.gov (United States)

    Agarwal, D.; Humphrey, M.; Beekwilder, N.; Goode, M.; Jackson, K.; Weber, R.; van Ingen, C.; Baldocchi, D.

    2008-12-01

    The FLUXNET synthesis data set contains on the order of 960 site-years of sensor data from over 260 sites around the world. This is a living data set; a data update this year should add new site-years from over 200 sites. The data are the ground truth for carbon-climate studies linking models and remote sensing as well as comparative field analyses. Over 65 synthesis teams are using this data to do global and regional scale analyses. The size of the dataset makes browsing the data difficult; for example, a search of the dataset for sites with particular meteorological characteristics would require a download of the complete dataset and then running all of the data through a preliminary analysis. Synthesis studies often need additional non- sensor measurements such as root biomass, soil composition, or fire occurrence; some of these variables require detailed knowledge of the site and the science. The large number of sites makes the assembly, cleaning, and long term curation of the non-sensor data daunting; a virtual conversation between the data providers, data users, and data curators is needed. The large number of sites also makes tracking updates to the site information and communicating with site PIs difficult for synthesis study teams. We have developed a collaborative web portal which enables data browsing on line, orchestrates the data curation virtual conversation, and enables the synthesis team conversation with sites. Behind the portal is an archive database and OLAP data cube for simple data browsing through query. Scientists can download data files, browse data summaries, update metadata and annotate the data through the portal. Synthesis teams can select sites and exchange e-mail with those sites through the portal. The data can also be browsed directly from Excel spreadsheets or MatLab from the scientist desktop; the scientist sees no difference between data "in the cloud" and on the desktop. We believe the portal enables science researchers to

  8. Three Dimensional (3D) Lumbar Vertebrae Data Set

    Directory of Open Access Journals (Sweden)

    H. Bennani

    2016-08-01

    Full Text Available 3D modelling can be used for a variety of purposes, including biomedical modelling for orthopaedic or anatomical applications. Low back pain is prevalent in society, yet few validated 3D models of the lumbar spine exist to facilitate assessment. We therefore created a 3D surface data set from human lumbar vertebrae. Models from 86 lumbar vertebrae were constructed using an inexpensive method involving image capture by digital camera and reconstruction of 3D models via an image-based technique. The reconstruction method was validated using a laser-based arm scanner and measurements derived from real vertebrae using electronic callipers. Results show a mean relative error of 5.2% between image-based models and real vertebrae, a mean relative error of 4.7% between image-based and arm scanning models, and 95% of vertices’ errors are less than 3.5 millimetres, with a median of 1.1 millimetres. The accuracy of the method indicates that the generated models could be useful for biomechanical modelling or 3D visualisation of the spine.

  9. Runoff simulation using the North American regional reanalysis data set

    International Nuclear Information System (INIS)

    Rasmussen, P.; Kim, S.J.; Moore, A.; Choi, W.

    2008-01-01

    In part due to concerns about the impact of climate change, there has been an increased interest in hydrological modelling of watersheds in Canada. Most of Canada is sparsely populated and a recurrent problem is the lack of quality weather data that are often not available at the sites of interest. Continuous hydrologic models require input of temperature and precipitation as a minimum, and often additional information such as solar radiation and humidity. It is not uncommon that such information must be obtained by interpolating information from weather stations located far outside the watershed. The difficulty in obtaining good calibration results is obvious in such cases. The recently released North American Regional Reanalysis (NARR) data set has been found to be in reasonable agreement with surface observations. NARR surface data, including those commonly required in hydrologic models, are available on a 32 km by 32 km grid which is appropriate for hydrologic modelling. The objective of this paper is to investigate whether hydrologic models for selected watersheds in Central Canada can be adequately calibrated using NARR data rather than conventional station information. For the specific case studies considered here, it is found that calibration with NARR weather information is quite acceptable and similar to what can be obtained using interpolated weather station data. (author)

  10. The self-describing data sets file protocol and Toolkit

    International Nuclear Information System (INIS)

    Borland, M.; Emery, L.

    1995-01-01

    The Self-Describing Data Sets (SDDS) file protocol continues to be used extensively in commissioning the Advanced Photon Source (APS) accelerator complex. SDDS protocol has proved useful primarily due to the existence of the SDDS Toolkit, a growing set of about 60 generic commandline programs that read and/or write SDDS files. The SDDS Toolkit is also used extensively for simulation postprocessing, giving physicists a single environment for experiment and simulation. With the Toolkit, new SDDS data is displayed and subjected to complex processing without developing new programs. Data from EPICS, lab instruments, simulation, and other sources are easily integrated. Because the SDDS tools are commandline-based, data processing scripts are readily written using the user's preferred shell language. Since users work within a UNIX shell rather than an application-specific shell or GUI, they may add SDDS-compliant programs and scripts to their personal toolkits without restriction or complication. The SDDS Toolkit has been run under UNIX on SUN OS4, HP-UX, and LINUX. Application of SDDS to accelerator operation is being pursued using Tcl/Tk to provide a GUI

  11. Congestion Quantification Using the National Performance Management Research Data Set

    Directory of Open Access Journals (Sweden)

    Virginia P. Sisiopiku

    2017-11-01

    Full Text Available Monitoring of transportation system performance is a key element of any transportation operation and planning strategy. Estimation of dependable performance measures relies on analysis of large amounts of traffic data, which are often expensive and difficult to gather. National databases can assist in this regard, but challenges still remain with respect to data management, accuracy, storage, and use for performance monitoring. In an effort to address such challenges, this paper showcases a process that utilizes the National Performance Management Research Data Set (NPMRDS) for generating performance measures for congestion monitoring applications in the Birmingham region. The capabilities of the relational database management system (RDBMS) are employed to manage the large amounts of NPMRDS data. Powerful visual maps are developed using GIS software and used to illustrate congestion location, extent and severity. Travel time reliability indices are calculated and utilized to quantify congestion, and congestion intensity measures are developed and employed to rank and prioritize congested segments in the study area. The process for managing and using big traffic data described in the Birmingham case study is a great example that can be replicated by small and mid-size Metropolitan Planning Organizations to generate performance-based measures and monitor congestion in their jurisdictions.
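
    For instance, the standard travel time reliability measures commonly computed from NPMRDS-style probe data can be sketched per segment as follows (thresholds and aggregation windows vary by agency; the sample numbers are illustrative):

        import numpy as np

        def reliability_indices(travel_times, free_flow):
            """Common congestion measures from a travel-time sample:
            Travel Time Index (mean vs free flow), Planning Time Index
            (95th percentile vs free flow), and Buffer Index."""
            tt = np.asarray(travel_times, dtype=float)
            mean_tt = tt.mean()
            tt95 = np.percentile(tt, 95)
            return {
                "TTI": mean_tt / free_flow,
                "PTI": tt95 / free_flow,
                "BI": (tt95 - mean_tt) / mean_tt,
            }

        # Five-minute travel times (minutes) on a segment; free flow = 4 min
        print(reliability_indices([4.2, 4.5, 5.1, 7.8, 6.3, 4.4], 4.0))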

  12. Visualization of diversity in large multivariate data sets.

    Science.gov (United States)

    Pham, Tuan; Hess, Rob; Ju, Crystal; Zhang, Eugene; Metoyer, Ronald

    2010-01-01

    Understanding the diversity of a set of multivariate objects is an important problem in many domains, including ecology, college admissions, investing, machine learning, and others. However, to date, very little work has been done to help users achieve this kind of understanding. Visual representation is especially appealing for this task because it offers the potential to allow users to efficiently observe the objects of interest in a direct and holistic way. Thus, in this paper, we attempt to formalize the problem of visualizing the diversity of a large (more than 1000 objects), multivariate (more than 5 attributes) data set as one worth deeper investigation by the information visualization community. In doing so, we contribute a precise definition of diversity, a set of requirements for diversity visualizations based on this definition, and a formal user study design intended to evaluate the capacity of a visual representation for communicating diversity information. Our primary contribution, however, is a visual representation, called the Diversity Map, for visualizing diversity. An evaluation of the Diversity Map using our study design shows that users can judge elements of diversity consistently and as or more accurately than when using the only other representation specifically designed to visualize diversity.

  13. A Proposal for an Austrian Nursing Minimum Data Set (NMDS)

    Science.gov (United States)

    Hackl, W.O.; Ammenwerth, E.

    2014-01-01

    Objective: Nursing Minimum Data Sets can be used to compare nursing care across clinical populations, settings, geographical areas, and time. NMDS can support nursing research, nursing management, and nursing politics. However, in contrast to other countries, Austria does not have a unified NMDS. The objective of this study is to identify possible data elements for an Austrian NMDS. Methods: A two-round Delphi survey was conducted, based on a review of available NMDS, 22 expert interviews, and a focus group discussion. Results: After reaching consensus, the experts proposed the following 56 data elements for an NMDS: six data elements concerning patient demographics, four data elements concerning data of the healthcare institution, four data elements concerning the patient's medical condition, 20 data elements concerning patient problems (nursing assessment, nursing diagnoses, risk assessment), eight data elements concerning nursing outcomes, 14 data elements concerning nursing interventions, and no additional data elements concerning nursing intensity. Conclusion: The proposed NMDS focuses on the long-term and acute care setting. It must now be implemented and tested in nursing practice. PMID:25024767

  14. Optimizing Distributed Machine Learning for Large Scale EEG Data Set

    Directory of Open Access Journals (Sweden)

    M Bilal Shaikh

    2017-06-01

    Full Text Available Distributed Machine Learning (DML) has gained importance more than ever in this era of Big Data. There are many challenges in scaling machine learning techniques on distributed platforms. When it comes to scalability, improving the processor technology for high-level computation of data is at its limit; however, increasing machine nodes and distributing data along with computation looks like a viable solution. Different frameworks and platforms are available to solve DML problems. These platforms provide automated random data distribution of datasets, which misses the power of user-defined intelligent data partitioning based on domain knowledge. We conducted an empirical study using an EEG data set collected through the P300 Speller component of an ERP (Event Related Potential), which is widely used in BCI problems; it helps in translating the intention of the subject while performing any cognitive task. EEG data contains noise due to waves generated by other activities in the brain, which contaminates the true P300 Speller signal. Use of machine learning techniques could help in detecting errors made by the P300 Speller. We solve this classification problem by partitioning the data into different chunks and preparing distributed models using an Elastic CV Classifier. To present a case of optimizing distributed machine learning, we propose an intelligent, user-defined data partitioning approach that could impact the accuracy of distributed machine learners on average. Our results show better average AUC as compared to the average AUC obtained after applying random data partitioning, which gives the user no control over data partitioning. It improves the average accuracy of the distributed learner due to the domain-specific intelligent partitioning by the user. Our customized approach achieves 0.66 AUC on individual sessions and 0.75 AUC on mixed sessions, whereas random / uncontrolled data distribution records 0.63 AUC.
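
    A toy sketch of the comparison (synthetic data and scikit-learn logistic regression stand in for the EEG features and the Elastic CV classifier; all names and numbers are illustrative):

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        n = 1200
        session = np.repeat([0, 1, 2], n // 3)          # e.g. recording sessions
        X = rng.standard_normal((n, 8)) + session[:, None] * 0.3  # session drift
        y = (X[:, 0] + 0.2 * rng.standard_normal(n) > session * 0.3).astype(int)

        def mean_auc(partitions):
            """Train one learner per data partition; average held-out AUC."""
            aucs = []
            for part in partitions:
                tr, te = part[: len(part) // 2], part[len(part) // 2 :]
                clf = LogisticRegression().fit(X[tr], y[tr])
                aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
            return np.mean(aucs)

        by_session = [np.where(session == s)[0] for s in range(3)]  # domain-aware
        random_parts = np.array_split(rng.permutation(n), 3)        # uncontrolled
        print("session-aware:", mean_auc(by_session))
        print("random:       ", mean_auc(random_parts))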

  15. An effective filter for IBD detection in large data sets.

    KAUST Repository

    Huang, Lin

    2014-03-25

    Identity by descent (IBD) inference is the task of computationally detecting genomic segments that are shared between individuals by means of common familial descent. Accurate IBD detection plays an important role in various genomic studies, ranging from mapping disease genes to exploring ancient population histories. The majority of recent work in the field has focused on improving the accuracy of inference, targeting shorter genomic segments that originate from a more ancient common ancestor. The accuracy of these methods, however, is achieved at the expense of high computational cost, resulting in a prohibitively long running time when applied to large cohorts. To enable the study of large cohorts, we introduce SpeeDB, a method that facilitates fast IBD detection in large unphased genotype data sets. Given a target individual and a database of individuals that potentially share IBD segments with the target, SpeeDB applies an efficient opposite-homozygous filter, which excludes chromosomal segments from the database that are highly unlikely to be IBD with the corresponding segments from the target individual. The remaining segments can then be evaluated by any IBD detection method of choice. When examining simulated individuals sharing 4 cM IBD regions, SpeeDB filtered out 99.5% of genomic regions from consideration while retaining 99% of the true IBD segments. Applying the SpeeDB filter prior to detecting IBD in simulated fourth cousins resulted in an overall running time that was 10,000x faster than inferring IBD without the filter and retained 99% of the true IBD segments in the output.
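
    The opposite-homozygous idea itself is simple to sketch (a minimal illustration with genotypes coded as minor-allele counts 0/1/2; the window size, tolerance and data layout are illustrative, not SpeeDB's):

        import numpy as np

        def opposite_homozygous_filter(target, database, window=100, max_hits=0):
            """Exclude windows that cannot be IBD with the target because
            they contain opposite-homozygous sites (target 0 vs database 2,
            or vice versa); rows of `database` are individuals."""
            n_ind, n_sites = database.shape
            keep = []
            for start in range(0, n_sites - window + 1, window):
                sl = slice(start, start + window)
                opposite = ((database[:, sl] == 0) & (target[sl] == 2)) | \
                           ((database[:, sl] == 2) & (target[sl] == 0))
                # Keep (individual, window) pairs with few opposite sites
                for ind in np.where(opposite.sum(axis=1) <= max_hits)[0]:
                    keep.append((ind, start))
            return keep

        # Toy usage: 3 database individuals, 300 sites
        rng = np.random.default_rng(0)
        db = rng.integers(0, 3, size=(3, 300))
        tgt = db[0].copy()  # individual 0 shares every segment with the target
        print(opposite_homozygous_filter(tgt, db))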

  16. The Minimum Data Set 3.0 Cognitive Function Scale.

    Science.gov (United States)

    Thomas, Kali S; Dosa, David; Wysocki, Andrea; Mor, Vincent

    2017-09-01

    The Minimum Data Set (MDS) 3.0 introduced the Brief Interview for Mental Status (BIMS), a short performance-based cognitive screener for nursing home (NH) residents. Not all residents are able to complete the BIMS and are consequently assessed by staff. We designed a Cognitive Function Scale (CFS) integrating self-report and staff-report data and present evidence of the scale's construct validity. The design was a retrospective cohort study. The subjects consisted of 3 cohorts: (1) long-stay NH residents (N=941,077), (2) new admissions (N=2,066,580) during 2011-2012, and (3) residents with both the older MDS 2.0 assessment in 2010 and the newer MDS 3.0 assessment (n=688,511). MDS 3.0 items were used to create a single, integrated 4-category hierarchical CFS that was compared with residents' prior MDS 2.0 Cognitive Performance Scale scores and other concurrent MDS 3.0 measures to assess construct validity. The new CFS suggests that 28% of the long-stay cohort in 2011-2012 were cognitively intact, 22% were mildly impaired, 33% were moderately impaired, and 17% were severely impaired. For the admission cohort, the CFS noted 56% as cognitively intact, 23% as mildly impaired, 17% as moderately impaired, and 4% as severely impaired. The CFS corresponded closely with residents' prior MDS 2.0 Cognitive Performance Scale scores and with performance of Activities of Daily Living, and nurses' judgments of function and behavior in both the admission and long-stay cohorts. The new CFS is valuable to researchers as it provides a single, integrated measure of NH residents' cognitive function, regardless of the mode of assessment.

  17. An effective filter for IBD detection in large data sets.

    KAUST Repository

    Huang, Lin; Bercovici, Sivan; Rodriguez, Jesse M; Batzoglou, Serafim

    2014-01-01

    Identity by descent (IBD) inference is the task of computationally detecting genomic segments that are shared between individuals by means of common familial descent. Accurate IBD detection plays an important role in various genomic studies, ranging from mapping disease genes to exploring ancient population histories. The majority of recent work in the field has focused on improving the accuracy of inference, targeting shorter genomic segments that originate from a more ancient common ancestor. The accuracy of these methods, however, is achieved at the expense of high computational cost, resulting in a prohibitively long running time when applied to large cohorts. To enable the study of large cohorts, we introduce SpeeDB, a method that facilitates fast IBD detection in large unphased genotype data sets. Given a target individual and a database of individuals that potentially share IBD segments with the target, SpeeDB applies an efficient opposite-homozygous filter, which excludes chromosomal segments from the database that are highly unlikely to be IBD with the corresponding segments from the target individual. The remaining segments can then be evaluated by any IBD detection method of choice. When examining simulated individuals sharing 4 cM IBD regions, SpeeDB filtered out 99.5% of genomic regions from consideration while retaining 99% of the true IBD segments. Applying the SpeeDB filter prior to detecting IBD in simulated fourth cousins resulted in an overall running time that was 10,000x faster than inferring IBD without the filter and retained 99% of the true IBD segments in the output.

  18. The EADGENE Microarray Data Analysis Workshop

    DEFF Research Database (Denmark)

    de Koning, Dirk-Jan; Jaffrézic, Florence; Lund, Mogens Sandø

    2007-01-01

    Microarray analyses have become an important tool in animal genomics. While their use is becoming widespread, there is still a lot of ongoing research regarding the analysis of microarray data. In the context of a European Network of Excellence, 31 researchers representing 14 research groups from...... 10 countries performed and discussed the statistical analyses of real and simulated 2-colour microarray data that were distributed among participants. The real data consisted of 48 microarrays from a disease challenge experiment in dairy cattle, while the simulated data consisted of 10 microarrays...... statistical weights, to omitting a large number of spots or omitting entire slides. Surprisingly, these very different approaches gave quite similar results when applied to the simulated data, although not all participating groups analysed both real and simulated data. The workshop was very successful...

  19. International Spinal Cord Injury Female Sexual and Reproductive Function Basic Data Set

    DEFF Research Database (Denmark)

    Alexander, M S; Biering-Sørensen, F; Elliott, S

    2011-01-01

    To create the International Spinal Cord Injury (SCI) Female Sexual and Reproductive Function Basic Data Set within the International SCI Data Sets.

  20. International spinal cord injury skin and thermoregulation function basic data set

    DEFF Research Database (Denmark)

    Karlsson, Annette; Krassioukov, A; Alexander, M S

    2012-01-01

    To create an international spinal cord injury (SCI) skin and thermoregulation basic data set within the framework of the International SCI Data Sets.

  1. NASIS data base management system - IBM 360/370 OS MVT implementation. 3: Data set specifications

    Science.gov (United States)

    1973-01-01

    The data set specifications for the NASA Aerospace Safety Information System (NASIS) are presented. The data set specifications describe the content, format, and medium of communication of every data set required by the system. All relevant information pertinent to a particular set is prepared in a standard form and centralized in a single document. The format for the data set is provided.

  2. Saving a Unique Data Set for Space Weather Research

    Science.gov (United States)

    Bilitza, D.; Benson, R. F.; Reinisch, B. W.; Huang, X. A.

    2017-12-01

    The Canadian/US International Satellites for Ionospheric Studies (ISIS) program included the four satellites Alouette 1 and 2 and ISIS 1 and 2, launched in 1962, 1965, 1969, and 1971, respectively, and in operation for 10, 10, 21, and 19 years, respectively. The core experiment on these satellites was a topside sounder that could determine the ionospheric electron density from the orbit altitude down to about 250-500 km, near where the ionosphere reaches its point of highest density, the F-peak. The mission was long lasting and highly successful, producing a wealth of information about the topside ionosphere in the form of analog ionosphere soundings on 7-track tapes. The analysis process required a tedious manual scaling of ionogram traces that could then, with appropriate software, be converted into electron density profiles. Even with the combined effort of ionospheric groups from many countries, only a relatively small percentage of the huge volume of recorded ionograms could be converted to electron density profiles. Even with this limited number, significant new insights were achieved, as documented by the many Alouette/ISIS-related papers published in the 1960s and 1970s. Recognizing the importance of this unique data set for space weather research, a new effort was undertaken in the late 1990s to analyze more of the Alouette/ISIS ionograms. The immediate cause for action was the threat to the more than 100,000 analog telemetry tapes in storage in Canada because of space limitations and storage costs. We were able to have nearly 20,000 tapes shipped to the NASA Goddard Space Flight Center for analog-to-digital conversion and succeeded in developing software that automatically scales and converts the ionograms to electron density profiles. This rescue effort is still ongoing, has already produced a significant increase in the information available for the topside ionosphere, and has resulted in numerous publications. The data have led to improvements of the

  3. Identification of irradiated chicken

    International Nuclear Information System (INIS)

    Spiegelberg, A.; Heide, L.; Boegl, K.W.

    1990-01-01

    Frozen chicken and chicken parts were irradiated at a dose of 5 kGy with Co-60. The irradiated chicken and chicken parts were identified by determination of three radiation-induced hydrocarbons from the lipid fraction. Isolation was carried out by high-vacuum distillation with a cold-finger apparatus. Detection of the hydrocarbons was possible in all irradiated samples by gas chromatography/mass spectrometry. (orig.)

  4. Data set on the bioprecipitation of sulfate and trivalent arsenic by acidophilic non-traditional sulfur reducing bacteria.

    Science.gov (United States)

    de Matos, Letícia Paiva; Costa, Patrícia Freitas; Moreira, Mariana; Gomes, Paula Cristine Silva; de Queiroz Silva, Silvana; Gurgel, Leandro Vinícius Alves; Teixeira, Mônica Cristina

    2018-04-01

    Data presented here are related to the original paper "Simultaneous removal of sulfate and arsenic using immobilized non-traditional sulfate reducing bacteria (SRB) mixed culture and alternative low-cost carbon sources" published by the same authors (Matos et al., 2018) [1]. The data set presented here aims to facilitate comprehension of that paper by giving readers some additional information. The data set includes a brief description of the experimental conditions and the results obtained during both batch and semi-continuous reactor experiments. The data confirmed that arsenic and sulfate were simultaneously removed under acidic pH by a biological treatment based on the activity of a non-traditional sulfur reducing bacteria consortium. This microbial consortium was able to utilize glycerol and powdered chicken feathers as carbon donors, and proved to be resistant to arsenite up to 8.0 mg L-1. Data on sulfate and arsenic removal efficiencies, residual arsenite and sulfate contents, and pH and Eh measurements obtained under different experimental conditions are depicted in graphical format. Refers to https://doi.org/10.1016/j.cej.2017.11.035.

  5. Data set on the bioprecipitation of sulfate and trivalent arsenic by acidophilic non-traditional sulfur reducing bacteria

    Directory of Open Access Journals (Sweden)

    Letícia Paiva de Matos

    2018-04-01

    Full Text Available Data presented here are related to the original paper “Simultaneous removal of sulfate and arsenic using immobilized non-traditional sulfate reducing bacteria (SRB) mixed culture and alternative low-cost carbon sources” published by the same authors (Matos et al., 2018) [1]. The data set presented here aims to facilitate comprehension of that paper by giving readers some additional information. The data set includes a brief description of the experimental conditions and the results obtained during both batch and semi-continuous reactor experiments. The data confirmed that arsenic and sulfate were simultaneously removed under acidic pH by a biological treatment based on the activity of a non-traditional sulfur reducing bacteria consortium. This microbial consortium was able to utilize glycerol and powdered chicken feathers as carbon donors, and proved to be resistant to arsenite up to 8.0 mg L−1. Data on sulfate and arsenic removal efficiencies, residual arsenite and sulfate contents, and pH and Eh measurements obtained under different experimental conditions are depicted in graphical format. Refers to https://doi.org/10.1016/j.cej.2017.11.035 Keywords: Arsenite, Sulfate reduction, Bioremediation, Immobilized cells, Acid pH

  6. Comparative analysis of chicken chromosome 28 provides new clues to the evolutionary fragility of gene-rich vertebrate regions

    NARCIS (Netherlands)

    Gordon, L.; Yang, S.; Tran-Gyamfi, M.; Baggott, D.; Christensen, M.; Hamilton, A.; Crooijmans, R.P.M.A.; Groenen, M.A.M.; Lucas, S.; Ovcharenko, I.; Stubbs, L.

    2007-01-01

    The chicken genome draft sequence has provided a valuable resource for studies of an important agricultural and experimental model species and an important data set for comparative analysis. However, some of the most gene-rich segments are missing from chicken genome draft assemblies, limiting the

  7. Global Roads Open Access Data Set, Version 1 (gROADSv1)

    Data.gov (United States)

    National Aeronautics and Space Administration — The Global Roads Open Access Data Set, Version 1 (gROADSv1) was developed under the auspices of the CODATA Global Roads Data Development Task Group. The data set...

  8. Using secondary data sets in health care management : opportunities and challenges

    OpenAIRE

    Buttigieg, Sandra; Annual Meeting of the American Academy of Management

    2013-01-01

    The importance of secondary data sets within the medical services and management sector is discussed. Secondary data sets, which are readily available and thus considerably reduce costs, can also provide accurate, valid and reliable evidence.

  9. International Comprehensive Ocean Atmosphere Data Set (ICOADS) And NCEI Global Marine Observations

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — International Comprehensive Ocean Atmosphere Data Set (ICOADS) consists of digital data set DSI-1173, archived at the National Center for Environmental Information...

  10. Assessing the validity of commercial and municipal food environment data sets in Vancouver, Canada.

    Science.gov (United States)

    Daepp, Madeleine Ig; Black, Jennifer

    2017-10-01

    The present study assessed systematic bias and the effects of data set error on the validity of food environment measures in two municipal and two commercial secondary data sets. Sensitivity, positive predictive value (PPV) and concordance were calculated by comparing the two municipal and two commercial secondary data sets with ground-truthed data collected within 800 m buffers surrounding twenty-six schools. Logistic regression examined associations of sensitivity and PPV with commercial density and neighbourhood socio-economic deprivation. Kendall's τ estimated correlations between density and proximity of food outlets near schools constructed with secondary data sets v. ground-truthed data. Setting: Vancouver, Canada; food retailers located within 800 m of twenty-six schools. All data sets scored relatively poorly across validity measures, although, overall, municipal data sets had higher levels of validity than did commercial data sets. Food outlets were more likely to be missing from municipal health inspection lists and commercial data sets in neighbourhoods with higher commercial density. Still, both proximity and density measures constructed from all secondary data sets were highly correlated (Kendall's τ > 0.70) with measures constructed from ground-truthed data. Despite the relatively low levels of validity in all secondary data sets examined, food environment measures constructed from them remained highly correlated with ground-truthed data. Findings suggest that secondary data sets can be used to measure the food environment, although estimates should be treated with caution in areas with high commercial density.
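
    The validation arithmetic reduces to set overlap between a secondary list and the ground-truthed observations. A minimal sketch with hypothetical outlet identifiers (not the study's data):

        # Hypothetical outlet identifiers.
        ground_truth = {"outlet_a", "outlet_b", "outlet_c", "outlet_d"}
        secondary = {"outlet_a", "outlet_b", "outlet_e"}

        tp = len(ground_truth & secondary)  # listed and really there
        fn = len(ground_truth - secondary)  # missed by the secondary list
        fp = len(secondary - ground_truth)  # listed but absent on the ground

        sensitivity = tp / (tp + fn)        # share of real outlets captured
        ppv = tp / (tp + fp)                # share of listed outlets that exist
        print(f"sensitivity={sensitivity:.2f}, PPV={ppv:.2f}")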

  11. ccPDB: compilation and creation of data sets from Protein Data Bank.

    Science.gov (United States)

    Singh, Harinder; Chauhan, Jagat Singh; Gromiha, M Michael; Raghava, Gajendra P S

    2012-01-01

    ccPDB (http://crdd.osdd.net/raghava/ccpdb/) is a database of data sets compiled from the literature and Protein Data Bank (PDB). First, we collected and compiled data sets from the literature used for developing bioinformatics methods to annotate the structure and function of proteins. Second, data sets were derived from the latest release of PDB using standard protocols. Third, we developed a powerful module for creating a wide range of customized data sets from the current release of PDB. This is a flexible module that allows users to create data sets using a simple six step procedure. In addition, a number of web services have been integrated in ccPDB, which include submission of jobs on PDB-based servers, annotation of protein structures and generation of patterns. This database maintains >30 types of data sets such as secondary structure, tight-turns, nucleotide interacting residues, metals interacting residues, DNA/RNA binding residues and so on.

  12. A Nonparametric, Multiple Imputation-Based Method for the Retrospective Integration of Data Sets

    Science.gov (United States)

    Carrig, Madeline M.; Manrique-Vallier, Daniel; Ranby, Krista W.; Reiter, Jerome P.; Hoyle, Rick H.

    2015-01-01

    Complex research questions often cannot be addressed adequately with a single data set. One sensible alternative to the high cost and effort associated with the creation of large new data sets is to combine existing data sets containing variables related to the constructs of interest. The goal of the present research was to develop a flexible, broadly applicable approach to the integration of disparate data sets that is based on nonparametric multiple imputation and the collection of data from a convenient, de novo calibration sample. We demonstrate proof of concept for the approach by integrating three existing data sets containing items related to the extent of problematic alcohol use and associations with deviant peers. We discuss both necessary conditions for the approach to work well and potential strengths and weaknesses of the method compared to other data set integration approaches. PMID:26257437

  13. The Effect of Training Data Set Composition on the Performance of a Neural Image Caption Generator

    Science.gov (United States)

    2017-09-01

    Technical report ARL-TR-8124, US Army Research Laboratory, September 2017: The Effect of Training Data Set Composition on the Performance of a Neural Image Caption Generator.

  14. Applications and Benefits for Big Data Sets Using Tree Distances and The T-SNE Algorithm

    Science.gov (United States)

    2016-03-01

    Applications and Benefits for Big Data Sets Using Tree Distances and the T-SNE Algorithm. Master's thesis by Suyoung Lee, March 2016; thesis advisor: Samuel E. Buttrey. Approved for public release; distribution is unlimited. Abstract (truncated): Modern data sets often consist of unstructured data

  15. Learning Data Set Influence on Identification Accuracy of Gas Turbine Neural Network Model

    Science.gov (United States)

    Kuznetsov, A. V.; Makaryants, G. M.

    2018-01-01

    Many studies have addressed gas turbine engine identification via dynamic neural network models, which should minimize the error between the model and the real object during identification. Questions about the composition of the training data sets for such networks are, however, usually overlooked. This article presents a study of the influence of data set type on the accuracy of a gas turbine neural network model. The identification object is a thermodynamic model of a micro gas turbine engine, whose input signal is the fuel consumption and whose output signal is the engine rotor rotation frequency. Four types of input signal (step, fast, slow and mixed) were used to create the training and testing data sets for the dynamic neural network models. Four dynamic neural networks were created from these training data sets, and each network was tested against all four types of test data set. As a result, 16 transition processes from the four neural networks and four test data sets were compared against the corresponding solutions of the thermodynamic model, and the ranges of error values for each test data set were established. These ranges are small; the influence of training data set type on identification accuracy is therefore low.

  16. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2010

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  17. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2015

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  18. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2014

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  19. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2009

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  20. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2016

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  1. EPA Enforcement and Compliance History Online: Water Discharge Monitoring Report Data Sets for FY2013

    Data.gov (United States)

    U.S. Environmental Protection Agency — Integrated Compliance Information System (ICIS) National Pollutant Discharge Elimination System (NPDES) Discharge Monitoring Report (DMR) data sets for Clean Water...

  2. Benchmarking Data Sets for the Evaluation of Virtual Ligand Screening Methods: Review and Perspectives.

    Science.gov (United States)

    Lagarde, Nathalie; Zagury, Jean-François; Montes, Matthieu

    2015-07-27

    Virtual screening methods are commonly used nowadays in drug discovery processes. However, to ensure their reliability, they have to be carefully evaluated. This evaluation is often performed retrospectively, notably by studying the enrichment of benchmarking data sets. To this purpose, numerous benchmarking data sets were developed over the years, and the resulting improvements led to the availability of high-quality benchmarking data sets. However, some points still have to be considered in the selection of the active compounds, decoys, and protein structures to obtain optimal benchmarking data sets.
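
    Enrichment is typically summarized by an enrichment factor: the active rate among the top-scored fraction of the benchmark divided by the active rate in the whole benchmark. A minimal sketch with made-up scores and labels (the metric itself, not any particular data set):

        def enrichment_factor(scores, is_active, fraction=0.01):
            """Active rate in the top-scored fraction divided by the
            active rate overall; higher scores mean predicted active."""
            ranked = sorted(zip(scores, is_active), reverse=True)
            n_top = max(1, int(len(ranked) * fraction))
            top_actives = sum(active for _, active in ranked[:n_top])
            total_actives = sum(is_active)
            return (top_actives / n_top) / (total_actives / len(ranked))

        scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
        labels = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
        print(enrichment_factor(scores, labels, fraction=0.2))  # 3.33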

  3. Marine and land temperature data sets: A comparison and a look at recent trends

    International Nuclear Information System (INIS)

    Jones, P.D.; Wigley, T.M.L.; Farmer, G.

    1990-01-01

    Comparisons are made among the various data sets of marine and land temperatures. Emphasis in the analyses is placed on the first intercomparison of the two marine data sets, the United Kingdom Meteorological Office (UKMO) and the Comprehensive Ocean-Atmosphere Data Set (COADS). The results of the analyses show that the two data sets are not the same, as some authors have assumed. Important differences are noted prior to 1940, with hemispheric averages differing by up to 0.2 °C for some decades during the nineteenth century. Patterns of regional temperature change over the two major periods of global warming this century, 1920-39 and 1967-86, are shown

  4. An evaluation of four crop:weed competition models using a common data set

    NARCIS (Netherlands)

    Deen, W.; Cousens, R.; Warringa, J.; Bastiaans, L.; Carberry, P.; Rebel, K.; Riha, S.; Murphy, C.; Benjamin, L.R.; Cloughley, C.; Cussans, J.; Forcella, F.

    2003-01-01

    To date, several crop:weed competition models have been developed. Developers of the various models were invited to compare model performance using a common data set. The data set consisted of wheat and Lolium rigidum grown in monoculture and mixtures under dryland and irrigated conditions.

  5. Assessing fitness for use: the expected value of spatial data sets

    NARCIS (Netherlands)

    Bruin, de S.; Bregt, A.K.; Ven, van de M.

    2001-01-01

    This paper proposes and illustrates a decision analytical approach to compare the value of alternative spatial data sets. In contrast to other work addressing value of information, its focus is on value of control. This is a useful concept when choosing the best data set for decision making under

  6. International spinal cord injury skin and thermoregulation function basic data set.

    Science.gov (United States)

    Karlsson, A K; Krassioukov, A; Alexander, M S; Donovan, W; Biering-Sørensen, F

    2012-07-01

    To create an international spinal cord injury (SCI) skin and thermoregulation basic data set within the framework of the International SCI Data Sets. An international working group. The draft of the data set was developed by a working group comprising members appointed by the American Spinal Injury Association (ASIA), the International Spinal Cord Society (ISCoS) and a representative of the Executive Committee of the International SCI Standards and Data Sets. The final version of the data set was developed after review and comments by members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, ASIA Board, relevant and interested international organizations and societies, individual persons with specific interest and the ISCoS Council. To make the data set uniform, each variable and each response category within each variable have been specifically defined to promote the collection and reporting of comparable minimal data. Variables included in the present data set are: date of data collection, thermoregulation history after SCI, including hyperthermia or hypothermia (noninfectious or infectious), as well as the history of hyperhidrosis or hypohidrosis above or below level of lesion. Body temperature and the time of measurement are included. Details regarding the presence of any pressure ulcer and stage, location and size of the ulcer(s), date of appearance of the ulcer(s) and whether surgical treatment has been performed are included. The history of any pressure ulcer during the last 12 months is also noted.

  7. Issues and Considerations regarding Sharable Data Sets for Recommender Systems in Technology Enhanced Learning

    DEFF Research Database (Denmark)

    Drachsler, Hendrik; Bogers, Toine; Vuorikari, Riina

    2010-01-01

    This paper raises the issue of missing standardised data sets for recommender systems in Technology Enhanced Learning (TEL) that can be used as benchmarks to compare different recommendation approaches. It discusses how suitable data sets could be created according to some initial suggestions...

  8. Teaching the Assessment of Normality Using Large Easily-Generated Real Data Sets

    Science.gov (United States)

    Kulp, Christopher W.; Sprechini, Gene D.

    2016-01-01

    A classroom activity is presented, which can be used in teaching students statistics with an easily generated, large, real world data set. The activity consists of analyzing a video recording of an object. The colour data of the recorded object can then be used as a data set to explore variation in the data using graphs including histograms,…

  9. The international spinal cord injury endocrine and metabolic function basic data set

    DEFF Research Database (Denmark)

    Bauman, W A; Biering-Sørensen, Fin; Krassioukov, A

    2011-01-01

    To develop the International Spinal Cord Injury (SCI) Endocrine and Metabolic Function Basic Data Set within the framework of the International SCI Data Sets that would facilitate consistent collection and reporting of basic endocrine and metabolic findings in the SCI population....

  10. Simpson's Paradox: A Data Set and Discrimination Case Study Exercise

    Science.gov (United States)

    Taylor, Stanley A.; Mickel, Amy E.

    2014-01-01

    In this article, we present a data set and case study exercise that can be used by educators to teach a range of statistical concepts including Simpson's paradox. The data set and case study are based on a real-life scenario where there was a claim of discrimination based on ethnicity. The exercise highlights the importance of performing…
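
    A worked toy example of the paradox (hypothetical admission counts, not the case-study data): group A has the higher rate within each department, yet group B has the higher rate overall, because A applied mostly to the department that admits few applicants.

        counts = {  # department -> group -> (admitted, applicants)
            "dept_1": {"A": (9, 10), "B": (80, 100)},
            "dept_2": {"A": (20, 100), "B": (1, 10)},
        }

        for dept, groups in counts.items():
            rates = {g: adm / app for g, (adm, app) in groups.items()}
            print(dept, rates)  # A beats B within every department

        for group in ("A", "B"):
            adm = sum(counts[d][group][0] for d in counts)
            app = sum(counts[d][group][1] for d in counts)
            print(group, "overall:", adm / app)  # yet B beats A when pooled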

  11. Development of land data sets for studies of global climate change

    International Nuclear Information System (INIS)

    Sadowski, F.G.; Watkins, A.H.

    1991-01-01

    The U.S. Geological Survey has begun a major initiative to organize, produce, and distribute land data sets that will support the land data requirements of the global change science community. Satellite image data sets, produced from the National Oceanic and Atmospheric Administration's Advanced Very High Resolution Radiometer sensors, will be developed to provide repetitive, synoptic coverage of regional, continental, and global land areas. These data sets, integrated with related land data and supplemented by coregistered Landsat data sets, will enable scientists to quantify the fundamental land surface attributes that are needed to model land surface processes, to detect and monitor land surface change, and to map land cover. These well-structured, consistent land data sets will form the historical record of land observations prior to the era of the National Aeronautics and Space Administration's Earth Observing System sensors

  12. Challenges in combining different data sets during analysis when using grounded theory.

    Science.gov (United States)

    Rintala, Tuula-Maria; Paavilainen, Eija; Astedt-Kurki, Päivi

    2014-05-01

    To describe the challenges in combining two data sets during grounded theory analysis. The use of grounded theory in nursing research is common. It is a suitable method for studying human action and interaction. It is recommended that many alternative sources of data are collected to create as rich a dataset as possible. Data from interviews with people with diabetes (n=19) and their family members (n=19). Combining two data sets. When using grounded theory, there are numerous challenges in collecting and managing data, especially for the novice researcher. One challenge is to combine different data sets during the analysis. There are many methodological textbooks about grounded theory but there is little written in the literature about combining different data sets. Discussion is needed on the management of data and the challenges of grounded theory. This article provides a means for combining different data sets in the grounded theory analysis process.

  13. The prevalence of terraced treescapes in analyses of phylogenetic data sets.

    Science.gov (United States)

    Dobrin, Barbara H; Zwickl, Derrick J; Sanderson, Michael J

    2018-04-04

    The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. Terraces were identified in nearly all data sets in which taxon coverage was incomplete. Terraces found during bootstrap resampling reduced overall support. If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates with data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
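
    Taxon coverage density, the sampling property emphasized above, is simple to compute from a presence/absence matrix. A minimal sketch with a made-up matrix:

        import numpy as np

        # Hypothetical matrix: rows = taxa, columns = genes; True where
        # sequence data exist for that taxon-by-gene cell.
        coverage = np.array([
            [True,  True,  False],
            [True,  False, False],
            [True,  True,  True ],
            [False, True,  False],
        ])

        density = coverage.mean()  # proportion of filled cells
        print(f"taxon coverage density = {density:.2f}")  # 0.58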

  14. Quality Control and Peer Review of Data Sets: Mapping Data Archiving Processes to Data Publication Requirements

    Science.gov (United States)

    Mayernik, M. S.; Daniels, M.; Eaker, C.; Strand, G.; Williams, S. F.; Worley, S. J.

    2012-12-01

    Data sets exist within scientific research and knowledge networks as both technical and non-technical entities. Establishing the quality of data sets is a multi-faceted task that encompasses many automated and manual processes. Data sets have always been essential for science research, but now need to be more visible as first-class scholarly objects at national, international, and local levels. Many initiatives are establishing procedures to publish and curate data sets, as well as to promote professional rewards for researchers that collect, create, manage, and preserve data sets. Traditionally, research quality has been assessed by peer review of textual publications, e.g. journal articles, conference proceedings, and books. Citation indices then provide standard measures of productivity used to reward individuals for their peer-reviewed work. Whether a similar peer review process is appropriate for assessing and ensuring the quality of data sets remains as an open question. How does the traditional process of peer review apply to data sets? This presentation will describe current work being done at the National Center for Atmospheric Research (NCAR) in the context of the Peer REview for Publication & Accreditation of Research Data in the Earth sciences (PREPARDE) project. PREPARDE is assessing practices and processes for data peer review, with the goal of developing recommendations. NCAR data management teams perform various kinds of quality assessment and review of data sets prior to making them publicly available. The poster will investigate how notions of peer review relate to the types of data review already in place at NCAR. We highlight the data set characteristics and management/archiving processes that challenge the traditional peer review processes by using a number of questions as probes, including: Who is qualified to review data sets? What formal and informal documentation is necessary to allow someone outside of a research team to review a data set

  15. The Chicken Problem.

    Science.gov (United States)

    Reeves, Charles A.

    2000-01-01

    Uses the chicken problem for sixth-grade students to scratch the surface of systems of equations using intuitive approaches. Provides students' responses to the problem and suggests similar problems for extensions. (ASK)

  16. Eggcited about Chickens

    Science.gov (United States)

    Jones, Carolyn; Brown, Paul

    2012-01-01

    In this article, the authors describe St Peter's Primary School's and Honiton Primary School's experiences of keeping chickens. The authors also describe the benefits they bring and the reactions of the children. (Contains 5 figures.)

  17. A comparison of simulation results from two terrestrial carbon cycle models using three climate data sets

    International Nuclear Information System (INIS)

    Ito, Akihiko; Sasai, Takahiro

    2006-01-01

    This study addressed how different climate data sets influence simulations of the global terrestrial carbon cycle. For the period 1982-2001, we compared the results of simulations based on three climate data sets (NCEP/NCAR, NCEP/DOE AMIP-II and ERA40) employed in meteorological, ecological and biogeochemical studies and two different models (BEAMS and Sim-CYCLE). The models differed in their parameterizations of photosynthetic and phenological processes but used the same surface climate (e.g. shortwave radiation, temperature and precipitation), vegetation, soil and topography data. The three data sets give different climatic conditions, especially for shortwave radiation, in terms of long-term means, linear trends and interannual variability. Consequently, the simulation results for global net primary productivity varied by 16%-43% solely from differences in the climate data sets, especially in those regions where the shortwave radiation data differed markedly: differences in the climate data set can strongly influence simulation results. The differences among the climate data sets and between the two models resulted in slightly different spatial distribution and interannual variability of the net ecosystem carbon budget. To minimize uncertainty, we should pay attention to the specific climate data used. We recommend developing an accurate standard climate data set for simulation studies

  18. Recommendations for translation and reliability testing of International Spinal Cord Injury Data Sets.

    Science.gov (United States)

    Biering-Sørensen, F; Alexander, M S; Burns, S; Charlifue, S; DeVivo, M; Dietz, V; Krassioukov, A; Marino, R; Noonan, V; Post, M W M; Stripling, T; Vogel, L; Wing, P

    2011-03-01

    To provide recommendations regarding translation and reliability testing of International Spinal Cord Injury (SCI) Data Sets. The Executive Committee for the International SCI Standards and Data Sets. Translations of any specific International SCI Data Set can be accomplished by translation from the English version into the target language, and be followed by a back-translation into English, to confirm that the original meaning has been preserved. Another approach is to have the initial translation performed by translators who have knowledge of SCI, and afterwards controlled by other person(s) with the same kind of knowledge. The translation process includes both language translation and cultural adaptation, and therefore shall not be made word for word, but will strive to include conceptual equivalence. At a minimum, the inter-rater reliability should be tested by no less than two independent observers, and preferably in multiple countries. Translations must include information on the name, role and background of everyone involved in the translation process, and shall be dated and noted with a version number. By following the proposed guidelines, translated data sets should assure comparability of data acquisition across countries and cultures. If the translation process identifies irregularities or misrepresentation in either the original English version or the target language, the working group for the particular International SCI Data Set shall revise the data set accordingly, which may include re-wording of the original English version in order to accomplish a compromise in the content of the data set.

  19. Suitability of public use secondary data sets to study multiple activities.

    Science.gov (United States)

    Putnam, Michelle; Morrow-Howell, Nancy; Inoue, Megumi; Greenfield, Jennifer C; Chen, Huajuan; Lee, YungSoo

    2014-10-01

    The aims of this study were to inventory activity items within and across U.S. public use data sets, to identify gaps in represented activity domains and challenges in interpreting domains, and to assess the potential for studying multiple activity engagement among older adults using existing data. We engaged in content analysis of activity measures of 5 U.S. public use data sets with nationally representative samples of older adults. Data sets included the Health & Retirement Survey (HRS), Americans' Changing Lives Survey (ACL), Midlife in the United States Survey (MIDUS), the National Health Interview Survey (NHIS), and the Panel Study of Income Dynamics survey (PSID). Two waves of each data set were analyzed. We identified 13 distinct activity domains across the 5 data sets, with substantial differences in representation of those domains among the data sets, and variance in the number and type of activity measures included in each. Our findings indicate that although it is possible to study multiple activity engagement within existing data sets, fuller sets of activity measures need to be developed in order to evaluate the portfolio of activities older adults engage in and the relationship of these portfolios to health and wellness outcomes. Importantly, clearer conceptual models of activity broadly conceived are required to guide this work. © The Author 2013. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. International lower urinary tract function basic spinal cord injury data set.

    Science.gov (United States)

    Biering-Sørensen, F; Craggs, M; Kennelly, M; Schick, E; Wyndaele, J-J

    2008-05-01

    To create the International Lower Urinary Tract Function Basic Spinal Cord Injury (SCI) Data Set within the framework of the International SCI Data Sets. International working group. The draft of the Data Set was developed by a working group consisting of the members appointed by the International Continence Society, the European Association of Urology, the American Spinal Injury Association (ASIA), the International Spinal Cord Society (ISCoS) and a representative of the Executive Committee of the International SCI Standards and Data Sets. The final version of the Data Set was developed after review and comments by the members of the Executive Committee of the International SCI Standards and Data Sets, the ISCoS Scientific Committee, ASIA Board, relevant and interested (international) organizations and societies (around 40) and persons, and the ISCoS Council. Endorsement of the Data Set by relevant organizations and societies will be obtained. To make the Data Set uniform, each variable and each response category within each variable have been specifically defined in a way that is designed to promote the collection and reporting of comparable minimal data. Variables included in the International Lower Urinary Tract Function Basic SCI Data Set are as follows: date of data collection, urinary tract impairment unrelated to spinal cord lesion, awareness of the need to empty the bladder, bladder emptying, average number of voluntary bladder emptyings per day during the last week, incontinence within the last 3 months, collecting appliances for urinary incontinence, any drugs for the urinary tract within the last year, surgical procedures on the urinary tract and any change in urinary symptoms within the last year. Complete instruction for data collection, data sheet and training cases available at the website of ISCoS (www.iscos.org.uk) and ASIA (www.asia-spinalinjury.org).

  1. KFUPM-KAUST Red Sea model: Digital viscoelastic depth model and synthetic seismic data set

    KAUST Repository

    Al-Shuhail, Abdullatif A.; Mousa, Wail A.; Alkhalifah, Tariq Ali

    2017-01-01

    The Red Sea is geologically interesting due to its unique structures and abundant mineral and petroleum resources, yet no digital geologic models or synthetic seismic data of the Red Sea are publicly available for testing algorithms to image and analyze the area's interesting features. This study compiles a 2D viscoelastic model of the Red Sea and calculates a corresponding multicomponent synthetic seismic data set. The models and data sets are made publicly available for download. We hope this effort will encourage interested researchers to test their processing algorithms on this data set and model and share their results publicly as well.

  2. Operational Aspects of Dealing with the Large BaBar Data Set

    Energy Technology Data Exchange (ETDEWEB)

    Trunov, Artem G

    2003-06-13

    To date, the BaBar experiment has stored over 0.7 PB of data in an Objectivity/DB database. Approximately half this data set comprises simulated data, of which more than 70% has been produced at more than 20 collaborating institutes outside of SLAC. The operational aspects of managing such a large data set and providing access to the physicists in a timely manner pose a challenging and complex problem. We describe the operational aspects of managing such a large distributed data set, as well as importing and exporting data from geographically spread BaBar collaborators. We also describe problems common to dealing with such large data sets.

  3. Neutron fluence-to-dose equivalent conversion factors: a comparison of data sets and interpolation methods

    International Nuclear Information System (INIS)

    Sims, C.S.; Killough, G.G.

    1983-01-01

    Various segments of the health physics community advocate the use of different sets of neutron fluence-to-dose equivalent conversion factors as a function of energy, and different methods of interpolation between discrete points in those data sets. The major data sets and interpolation methods are used to calculate the spectrum-averaged fluence-to-dose equivalent conversion factors for five spectra associated with the various shielded conditions of the Health Physics Research Reactor. The results obtained by use of the different data sets and interpolation methods are compared and discussed. (author)
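
    The quantity being compared is the fluence-weighted average of the conversion factor over each spectrum, with the tabulated factors interpolated between discrete energies. A minimal sketch with illustrative (not evaluated) numbers, using log-log interpolation as one of the candidate methods:

        import numpy as np

        def loglog_interp(e, e_tab, h_tab):
            """Log-log interpolation of tabulated conversion factors."""
            return np.exp(np.interp(np.log(e), np.log(e_tab), np.log(h_tab)))

        e_tab = np.array([1e-2, 1e-1, 1e0, 1e1])        # MeV (illustrative)
        h_tab = np.array([1e-11, 4e-11, 3e-10, 5e-10])  # Sv cm^2 (illustrative)

        e = np.logspace(-2, 1, 50)   # evaluation grid
        phi = e ** -0.5              # toy fluence spectrum shape
        h = loglog_interp(e, e_tab, h_tab)

        h_bar = np.sum(phi * h) / np.sum(phi)  # spectrum-averaged factor
        print(f"spectrum-averaged factor = {h_bar:.3e} Sv cm^2")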

  4. KFUPM-KAUST Red Sea model: Digital viscoelastic depth model and synthetic seismic data set

    KAUST Repository

    Al-Shuhail, Abdullatif A.

    2017-06-01

    The Red Sea is geologically interesting due to its unique structures and abundant mineral and petroleum resources, yet no digital geologic models or synthetic seismic data of the Red Sea are publicly available for testing algorithms to image and analyze the area's interesting features. This study compiles a 2D viscoelastic model of the Red Sea and calculates a corresponding multicomponent synthetic seismic data set. The models and data sets are made publicly available for download. We hope this effort will encourage interested researchers to test their processing algorithms on this data set and model and share their results publicly as well.

  5. Pathogenicity of Shigella in chickens.

    Science.gov (United States)

    Shi, Run; Yang, Xia; Chen, Lu; Chang, Hong-tao; Liu, Hong-ying; Zhao, Jun; Wang, Xin-wei; Wang, Chuan-qing

    2014-01-01

    Shigellosis in chickens was first reported in 2004. This study aimed to determine the pathogenicity of Shigella in chickens and the possibility of cross-infection between humans and chickens. The pathogenicity of Shigella in chickens was examined via infection of three-day-old SPF chickens with Shigella strain ZD02 isolated from a human patient. The virulence and invasiveness were examined by infection of the chicken intestines and primary chicken intestinal epithelial cells. The results showed Shigella can cause death via intraperitoneal injection in SPF chickens, but only induces depression via crop injection. Immunohistochemistry and transmission electron microscopy revealed that Shigella can invade the intestinal epithelia. Immunohistochemistry of the primary chicken intestinal epithelial cells infected with Shigella showed the bacteria were internalized into the epithelial cells. Electron microscopy also confirmed that Shigella invaded primary chicken intestinal epithelia and was encapsulated by phagosome-like membranes. Our data demonstrate that Shigella can invade primary chicken intestinal epithelial cells in vitro and chicken intestinal mucosa in vivo, resulting in pathogenicity and even death. The findings suggest that Shigella isolated from humans or chickens shares similar pathogenicity, as well as the possibility of human-poultry cross-infection, which is of public health significance.

  6. Treatment Episode Data Set: Discharges (TEDS-D-2006-2011)

    Data.gov (United States)

    U.S. Department of Health & Human Services — The Treatment Episode Data Set -- Discharges (TEDS-D) is a national census data system of annual discharges from substance abuse treatment facilities. TEDS-D...

  7. A Global Data Set of Leaf Photosynthetic Rates, Leaf N and P, and Specific Leaf Area

    Data.gov (United States)

    National Aeronautics and Space Administration — ABSTRACT: This global data set of photosynthetic rates and leaf nutrient traits was compiled from a comprehensive literature review. It includes estimates of Vcmax...

  8. A Global Data Set of Leaf Photosynthetic Rates, Leaf N and P, and Specific Leaf Area

    Data.gov (United States)

    National Aeronautics and Space Administration — This global data set of photosynthetic rates and leaf nutrient traits was compiled from a comprehensive literature review. It includes estimates of Vcmax (maximum...

  9. EPA Enforcement and Compliance History Online: Hazardous Waste Sites Data Set

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  10. EPA Enforcement and Compliance History Online: Clean Air Act Data Set (ZIP)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  11. Matched molecular pair-based data sets for computer-aided medicinal chemistry

    Science.gov (United States)

    Bajorath, Jürgen

    2014-01-01

    Matched molecular pairs (MMPs) are widely used in medicinal chemistry to study changes in compound properties including biological activity, which are associated with well-defined structural modifications. Herein we describe up-to-date versions of three MMP-based data sets that have originated from in-house research projects. These data sets include activity cliffs, structure-activity relationship (SAR) transfer series, and second generation MMPs based upon retrosynthetic rules. The data sets have in common that they have been derived from compounds included in the ChEMBL database (release 17) for which high-confidence activity data are available. Thus, the activity data associated with MMP-based activity cliffs, SAR transfer series, and retrosynthetic MMPs cover the entire spectrum of current pharmaceutical targets. Our data sets are made freely available to the scientific community. PMID:24627802

  12. Foreign Language Optical Character Recognition, Phase II: Arabic and Persian Training and Test Data Sets

    National Research Council Canada - National Science Library

    Davidson, Robert

    1997-01-01

    .... Each data set is divided into a training set, which is made available to developers, and a carefully matched equal-sized set of closely analogous samples, which is reserved for testing of the developers' products...

  13. EPA Enforcement and Compliance History Online: Clean Water Act Dischargers Data Set (effluent violations)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  14. Data sets for manuscript titled Unexpected benefits of reducing aerosol cooling effects

    Data.gov (United States)

    U.S. Environmental Protection Agency — These data sets were created using extensive model simulation results from the WRF-CMAQ model, population distributions, and through the use of an health impact...

  15. A Fast Logdet Divergence Based Metric Learning Algorithm for Large Data Sets Classification

    Directory of Open Access Journals (Sweden)

    Jiangyuan Mei

    2014-01-01

    ... the basis of classifiers, for example, the k-nearest neighbors classifier. Experiments on benchmark data sets demonstrate that the proposed algorithm compares favorably with the state-of-the-art methods.
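
    Although the abstract is truncated here, the divergence named in the title has a standard closed form: for positive-definite matrices, D(A, A0) = tr(A A0^-1) - log det(A A0^-1) - n, which is zero exactly when A = A0 and is therefore used to keep a learned Mahalanobis metric close to a prior one. A minimal numeric sketch of the divergence itself (not the paper's learning algorithm):

        import numpy as np

        def logdet_divergence(A, A0):
            """Burg/LogDet matrix divergence for positive-definite A, A0."""
            n = A.shape[0]
            M = A @ np.linalg.inv(A0)
            _, logdet = np.linalg.slogdet(M)
            return np.trace(M) - logdet - n

        A = np.array([[2.0, 0.3], [0.3, 1.0]])
        print(logdet_divergence(A, np.eye(2)))  # divergence from identity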

  16. International Comprehensive Ocean Atmosphere Data Set (ICOADS) in Near-Real Time (NRT)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Near-Real-Time (NRT) product is an extension of the official ICOADS dataset with preliminary...

  17. The GRENE-TEA model intercomparison project (GTMIP) Stage 1 forcing data set

    Science.gov (United States)

    Sueyoshi, T.; Saito, K.; Miyazaki, S.; Mori, J.; Ise, T.; Arakida, H.; Suzuki, R.; Sato, A.; Iijima, Y.; Yabuki, H.; Ikawa, H.; Ohta, T.; Kotani, A.; Hajima, T.; Sato, H.; Yamazaki, T.; Sugimoto, A.

    2016-01-01

    Here, the authors describe the construction of a forcing data set for land surface models (including both physical and biogeochemical models; LSMs) with eight meteorological variables for the 35-year period from 1979 to 2013. The data set is intended for use in a model intercomparison study, called GTMIP, which is a part of the Japanese-funded Arctic Climate Change Research Project. In order to prepare a set of site-fitted forcing data for LSMs with realistic yet continuous entries (i.e. without missing data), four observational sites across the pan-Arctic region (Fairbanks, Tiksi, Yakutsk, and Kevo) were selected to construct a blended data set using both global reanalysis and observational data. Marked improvements were found in the diurnal cycles of surface air temperature and humidity, wind speed, and precipitation. The data sets and participation in GTMIP are open to the scientific community (doi:10.17592/001.2015093001).

  18. International Comprehensive Ocean-Atmosphere Data Set (ICOADS) with Enhanced Trimming, Release 3

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset contains the latest official release of International Comprehensive Ocean-Atmosphere Data Set (ICOADS) with Enhanced Trimming, provided in a common...

  19. Development of the International Spinal Cord Injury Activities and Participation Basic Data Set

    DEFF Research Database (Denmark)

    Post, M W; Charlifue, S; Biering-Sørensen, F

    2016-01-01

    STUDY DESIGN: Consensus decision-making process. OBJECTIVES: The objective of this study was to develop an International Spinal Cord Injury (SCI) Activities and Participation (A&P) Basic Data Set. SETTING: International working group. METHODS: A committee of experts was established to select and define A&P data elements to be included in this data set. A draft data set was developed and posted on the International Spinal Cord Society (ISCoS) and American Spinal Injury Association websites and was also disseminated among appropriate organizations for review. Suggested revisions were considered...... on a three-point scale for each item completes the total of 24 A&P variables. CONCLUSION: Collection of the International SCI A&P Basic Data Set variables in all future research on SCI outcomes is advised to facilitate comparison of results across published studies from around the world. Additional......

  20. EPA Enforcement and Compliance History Online: Clean Water Act Dischargers Data Set

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  1. NASA Shuttle Radar Topography Mission Combined Image Data Set V003

    Data.gov (United States)

    National Aeronautics and Space Administration — The NASA SRTM data sets result from a collaborative effort by the National Aeronautics and Space Administration (NASA) and the National Geospatial-Intelligence...

  2. International Comprehensive Ocean-Atmosphere Data Set (ICOADS) Release 3.0 - Monthly Summary Groups (MSG)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — This dataset, the International Comprehensive Ocean-Atmosphere Data Set (ICOADS), is the most widely-used freely available collection of surface marine observations,...

  3. International lower urinary tract function basic spinal cord injury data set

    DEFF Research Database (Denmark)

    Craggs, M.; Kennelly, M.; Schick, E.

    2008-01-01

    OBJECTIVE: To create the International Lower Urinary Tract Function Basic Spinal Cord Injury (SCI) Data Set within the framework of the International SCI Data Sets. SETTING: International working group. METHODS: The draft of the Data Set was developed by a working group consisting of the members...... RESULTS: Variables included in the International Lower Urinary Tract Function Basic SCI Data Set are as follows: date of data collection, urinary tract impairment unrelated to spinal cord lesion, awareness of the need to empty the bladder, bladder emptying, average number of voluntary bladder emptyings per day during the last week, incontinence within the last 3 months, collecting appliances for urinary incontinence, any drugs for the urinary tract within the last year, surgical procedures on the urinary tract and any change in urinary symptoms within the last year. Complete instruction for data collection, data sheet...

  4. EPA Enforcement and Compliance History Online: Legacy System Clean Air Act Data Set (ZIP)

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  5. Stratospheric Water and OzOne Satellite Homogenized (SWOOSH) data set

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — The Stratospheric Water and Ozone Satellite Homogenized (SWOOSH) data set is a merged record of stratospheric ozone and water vapor measurements taken by a number of...

  6. EPA Enforcement and Compliance History Online: EPA Enforcement Action Data Set

    Data.gov (United States)

    U.S. Environmental Protection Agency — The Enforcement and Compliance History Online (ECHO) data sets have been compiled for access to larger sets of national data to ensure that ECHO meets your data...

  7. Bidirectional Active Learning: A Two-Way Exploration Into Unlabeled and Labeled Data Set.

    Science.gov (United States)

    Zhang, Xiao-Yu; Wang, Shupeng; Yun, Xiaochun

    2015-12-01

    In practical machine learning applications, human instruction is indispensable for model construction. To utilize the precious labeling effort effectively, active learning queries the user with selective sampling in an interactive way. Traditional active learning techniques merely focus on the unlabeled data set under a unidirectional exploration framework and suffer from model deterioration in the presence of noise. To address this problem, this paper proposes a novel bidirectional active learning algorithm that explores into both unlabeled and labeled data sets simultaneously in a two-way process. For the acquisition of new knowledge, forward learning queries the most informative instances from unlabeled data set. For the introspection of learned knowledge, backward learning detects the most suspiciously unreliable instances within the labeled data set. Under the two-way exploration framework, the generalization ability of the learning model can be greatly improved, which is demonstrated by the encouraging experimental results.
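
    A minimal sketch of the two directions, using plain uncertainty sampling forward and lowest self-label probability backward; these are generic stand-ins for, not reproductions of, the paper's selection criteria:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X_lab = rng.normal(size=(20, 2))
        y_lab = (X_lab[:, 0] > 0).astype(int)
        y_lab[0] ^= 1                      # one deliberately noisy label
        X_unlab = rng.normal(size=(100, 2))

        model = LogisticRegression().fit(X_lab, y_lab)

        # Forward: query the unlabeled point the model is least sure
        # about (predicted probability closest to 0.5).
        p_unlab = model.predict_proba(X_unlab)[:, 1]
        query = np.argmin(np.abs(p_unlab - 0.5))

        # Backward: flag the labeled point whose own label the model
        # finds least probable -- a candidate for relabeling.
        p_own = model.predict_proba(X_lab)[np.arange(len(y_lab)), y_lab]
        suspect = np.argmin(p_own)
        print("query unlabeled:", query, "| suspect labeled:", suspect)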

  8. Analysis of Water and Energy Budgets and Trends Using the NLDAS Monthly Data Sets

    Science.gov (United States)

    Vollmer, Bruce E.; Rui, Hualan; Mocko, David M.; Teng, William L.; Lei, Guang-Dih

    2012-01-01

    The North American Land Data Assimilation System (NLDAS) is a collaborative project between NASA GSFC, NOAA, Princeton University, and the University of Washington. NLDAS has created surface meteorological forcing data sets using the best-available observations and reanalyses. The forcing data sets are used to drive four separate land-surface models (LSMs), Mosaic, Noah, VIC, and SAC, to produce data sets of soil moisture, snow, runoff, and surface fluxes. NLDAS hourly data, accessible from the NASA GES DISC Hydrology Data Holdings Portal, http://disc.sci.gsfc.nasa.gov/hydrology/data-holdings, are widely used by various user communities in modeling, research, and applications, such as drought and flood monitoring, watershed and water quality management, and case studies of extreme events. More information is available at http://ldas.gsfc.nasa.gov/. To further facilitate analysis of water and energy budgets and trends, NLDAS monthly data sets have been recently released by NASA GES DISC.

  9. Can survival prediction be improved by merging gene expression data sets?

    Directory of Open Access Journals (Sweden)

    Haleh Yasrebi

    Full Text Available BACKGROUND: High-throughput gene expression profiling technologies, generating a wealth of data, are increasingly used for characterization of tumor biopsies for clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, as well as prediction of treatment response. However, the limited number of patients enrolled in a single trial study limits the power of machine learning approaches due to over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria and follow-up treatment. It is therefore not clear at all whether the advantage of increased sample size outweighs the disadvantage of higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets. RESULTS: Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, we see overall no significant improvement or deterioration of survival prediction with merged data sets as compared to individual data sets. This apparently was due to the fact that a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1. CONCLUSIONS: Merging did not deteriorate performance on average despite (a) the diversity of microarray platforms used, (b) the heterogeneity of patient cohorts, (c) the heterogeneity of breast cancer disease, (d) substantial variation of time to death or relapse, and (e) the reduced number of genes in the merged data

  10. Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis

    KAUST Repository

    Mueller, B.

    2013-10-01

    Land evapotranspiration (ET) estimates are available from several global data sets. Here, monthly global land ET synthesis products, merged from these individual data sets over the time periods 1989-1995 (7 yr) and 1989-2005 (17 yr), are presented. The merged synthesis products over the shorter period are based on a total of 40 distinct data sets while those over the longer period are based on a total of 14 data sets. In the individual data sets, ET is derived from satellite and/or in situ observations (diagnostic data sets) or calculated via land-surface models (LSMs) driven with observations-based forcing or output from atmospheric reanalyses. Statistics for four merged synthesis products are provided, one including all data sets and three including only data sets from one category each (diagnostic, LSMs, and reanalyses). The multi-annual variations of ET in the merged synthesis products display realistic responses. They are also consistent with previous findings of a global increase in ET between 1989 and 1997 (0.13 mm yr-2 in our merged product) followed by a significant decrease in this trend (-0.18 mm yr-2), although these trends are relatively small compared to the uncertainty of absolute ET values. The global mean ET from the merged synthesis products (based on all data sets) is 493 mm yr-1 (1.35 mm d-1) for both the 1989-1995 and 1989-2005 products, which is relatively low compared to previously published estimates. We estimate global runoff (precipitation minus ET) to 263 mm yr-1 (34 406 km3 yr-1) for a total land area of 130 922 000 km2. Precipitation, being an important driving factor and input to most simulated ET data sets, presents uncertainties between single data sets as large as those in the ET estimates. In order to reduce uncertainties in current ET products, improving the accuracy of the input variables, especially precipitation, as well as the parameterizations of ET, are crucial. 2013 Author(s).
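
    The global runoff volume quoted above can be cross-checked with a one-line unit conversion (a sketch; the small discrepancy comes from rounding of the quoted inputs):

        # Cross-check of the quoted global runoff volume.
        land_area_km2 = 130_922_000      # total land area used in the study
        runoff_mm_per_yr = 263           # precipitation minus ET
        # 1 mm of water over 1 km^2 equals 1e-6 km^3.
        runoff_km3_per_yr = runoff_mm_per_yr * 1e-6 * land_area_km2
        print(f"{runoff_km3_per_yr:,.0f} km^3/yr")  # ~34,432, near the quoted 34 406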

  11. Benchmark products for land evapotranspiration: LandFlux-EVAL multi-data set synthesis

    KAUST Repository

    Mueller, B.; Hirschi, M.; Jimenez, C.; Ciais, P.; Dirmeyer, P.A.; Dolman, A.J.; Fisher, J.B.; Jung, M.; Ludwig, F.; Maignan, F.; Miralles, D.G.; McCabe, Matthew; Reichstein, M.; Sheffield, J.; Wang, K.; Wood, E.F.; Zhang, Y.; Seneviratne, S.I.

    2013-01-01

    Land evapotranspiration (ET) estimates are available from several global data sets. Here, monthly global land ET synthesis products, merged from these individual data sets over the time periods 1989-1995 (7 yr) and 1989-2005 (17 yr), are presented. The merged synthesis products over the shorter period are based on a total of 40 distinct data sets while those over the longer period are based on a total of 14 data sets. In the individual data sets, ET is derived from satellite and/or in situ observations (diagnostic data sets) or calculated via land-surface models (LSMs) driven with observations-based forcing or output from atmospheric reanalyses. Statistics for four merged synthesis products are provided, one including all data sets and three including only data sets from one category each (diagnostic, LSMs, and reanalyses). The multi-annual variations of ET in the merged synthesis products display realistic responses. They are also consistent with previous findings of a global increase in ET between 1989 and 1997 (0.13 mm yr-2 in our merged product) followed by a significant decrease in this trend (-0.18 mm yr-2), although these trends are relatively small compared to the uncertainty of absolute ET values. The global mean ET from the merged synthesis products (based on all data sets) is 493 mm yr-1 (1.35 mm d-1) for both the 1989-1995 and 1989-2005 products, which is relatively low compared to previously published estimates. We estimate global runoff (precipitation minus ET) to 263 mm yr-1 (34 406 km3 yr-1) for a total land area of 130 922 000 km2. Precipitation, being an important driving factor and input to most simulated ET data sets, presents uncertainties between single data sets as large as those in the ET estimates. In order to reduce uncertainties in current ET products, improving the accuracy of the input variables, especially precipitation, as well as the parameterizations of ET, are crucial. 2013 Author(s).

  12. The 1 km resolution global data set: needs of the International Geosphere Biosphere Programme

    Science.gov (United States)

    Townshend, J.R.G.; Justice, C.O.; Skole, D.; Malingreau, J.-P.; Cihlar, J.; Teillet, P.; Sadowski, F.; Ruttenberg, S.

    1994-01-01

    Examination of the scientific priorities for the International Geosphere Biosphere Programme (IGBP) reveals a requirement for global land data sets in several of its Core Projects. These data sets need to be at several space and time scales. Requirements are demonstrated for the regular acquisition of data at spatial resolutions of 1 km and finer and at high temporal frequencies. Global daily data at a resolution of approximately 1 km are sensed by the Advanced Very High Resolution Radiometer (AVHRR), but they have not been available in a single archive. It is proposed that a global data set of the land surface is created from remotely sensed data from the AVHRR to support a number of IGBP's projects. This data set should have a spatial resolution of 1 km and should be generated at least once every 10 days for the entire globe. The minimum length of record should be a year, and ideally a system should be put in place which leads to the continuous acquisition of 1 km data to provide a baseline data set prior to the Earth Observing System (EOS) towards the end of the decade. Because of the high cloud cover in many parts of the world, it is necessary to plan for the collection of data from every orbit. Substantial effort will be required in the preprocessing of the data set, involving radiometric calibration, atmospheric correction, geometric correction and temporal compositing, to make it suitable for the extraction of information.

  13. An historical and geographic data set on the distribution of macroinvertebrates in Italian mountain lakes

    Directory of Open Access Journals (Sweden)

    Angela Boggero

    2017-11-01

    Full Text Available Macroinvertebrates play a key role in freshwater food webs, acting as major links between organic matter resources, primary consumers (such as bacteria) and secondary consumers (e.g. fish, amphibians, birds, and reptiles). In this paper we present a data set encompassing all geographic and historical data available on macroinvertebrates of the Italian mountain lakes from 1902 to 2016. The data set, divided per Italian mountain range (Alps and Apennines) and administrative region, covers more than a century of studies by many foreign and Italian scientists. The data set includes 2372 records and shows macroinvertebrate occurrence data in 176 Alpine and in 13 Apennine lakes, of which 178 are of natural origin, 5 are reservoirs, and 6 are artificially extended. The data set lists 605 taxa, updated on the basis of their current taxonomic position. Only 353 taxa are identified at species level, highlighting the still poorly investigated biodiversity of Italian mountain lake macroinvertebrates. Since they function as key elements to characterize lake ecological status, our data set emphasizes the huge taxonomic effort that still has to be undertaken to fully characterize these ecosystems. The data set is available in csv (comma-separated values) format.

  14. Demonstration of a software design and statistical analysis methodology with application to patient outcomes data sets.

    Science.gov (United States)

    Mayo, Charles; Conners, Steve; Warren, Christopher; Miller, Robert; Court, Laurence; Popple, Richard

    2013-11-01

    With the emergence of clinical outcomes databases as tools utilized routinely within institutions comes the need for software tools to support automated statistical analysis of these large data sets and intrainstitutional exchange from independent federated databases to support data pooling. In this paper, the authors present a design approach and analysis methodology that addresses both issues. A software application was constructed to automate analysis of patient outcomes data using a wide range of statistical metrics, by combining use of C#.Net and R code. The accuracy and speed of the code was evaluated using benchmark data sets. The approach provides data needed to evaluate combinations of statistical measurements for their ability to identify patterns of interest in the data. Through application of the tools to a benchmark data set for dose-response threshold and to SBRT lung data sets, an algorithm was developed that uses receiver operator characteristic curves to identify a threshold value and combines use of contingency tables, Fisher exact tests, Welch t-tests, and Kolmogorov-Smirnov tests to filter the large data set to identify values demonstrating dose-response. Kullback-Leibler divergences were used to provide additional confirmation. The work demonstrates the viability of the design approach and the software tool for analysis of large data sets.
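
    A hedged sketch of the kind of statistical filter named above, using SciPy equivalents of the cited tests; the variable names, toy data and threshold are illustrative, not taken from the paper's software:

        # Sketch with SciPy equivalents of the named tests.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        dose_no_event = rng.normal(20, 5, size=200)  # doses, patients without outcome
        dose_event = rng.normal(28, 5, size=50)      # doses, patients with outcome

        # Welch t-test (unequal variances) and two-sample Kolmogorov-Smirnov test.
        _, welch_p = stats.ttest_ind(dose_event, dose_no_event, equal_var=False)
        _, ks_p = stats.ks_2samp(dose_event, dose_no_event)

        # Contingency table at a candidate threshold, checked with Fisher's exact test.
        threshold = 25.0
        table = [[int(np.sum(dose_event > threshold)), int(np.sum(dose_event <= threshold))],
                 [int(np.sum(dose_no_event > threshold)), int(np.sum(dose_no_event <= threshold))]]
        odds_ratio, fisher_p = stats.fisher_exact(table)

        print(f"Welch p={welch_p:.2e}, KS p={ks_p:.2e}, Fisher p={fisher_p:.2e}")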

  15. Measuring the Value of Research Data: A Citation Analysis of Oceanographic Data Sets

    Science.gov (United States)

    Belter, Christopher W.

    2014-01-01

    Evaluation of scientific research is becoming increasingly reliant on publication-based bibliometric indicators, which may result in the devaluation of other scientific activities, such as data curation, that do not necessarily result in the production of scientific publications. This issue may undermine the movement to openly share and cite data sets in scientific publications because researchers are unlikely to devote the effort necessary to curate their research data if they are unlikely to receive credit for doing so. This analysis attempts to demonstrate the bibliometric impact of properly curated and openly accessible data sets by generating citation counts for three data sets archived at the National Oceanographic Data Center. My findings suggest that all three data sets are highly cited, with estimated citation counts in most cases higher than 99% of all the journal articles published in Oceanography during the same years. I also find that methods of citing and referring to these data sets in scientific publications are highly inconsistent, despite the fact that a formal citation format is suggested for each data set. These findings have important implications for developing a data citation format, encouraging researchers to properly curate their research data, and evaluating the bibliometric impact of individuals and institutions. PMID:24671177

  16. Regression with Small Data Sets: A Case Study using Code Surrogates in Additive Manufacturing

    Energy Technology Data Exchange (ETDEWEB)

    Kamath, C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Fan, Y. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-04-11

    There has been an increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. While it is easy to collect such large data sets in some application domains, there are others where collecting even a single data point can be very expensive, so the resulting data sets have only tens or hundreds of samples. For example, when complex computer simulations are used to understand a scientific phenomenon, we want to run the simulation for many different values of the input parameters and analyze the resulting output. The data set relating the simulation inputs and outputs is typically quite small, especially when each run of the simulation is expensive. However, regression techniques can still be used on such data sets to build an inexpensive "surrogate" that could provide an approximate output for a given set of inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments. In this paper, we compare different regression techniques to determine how well they predict melt-pool characteristics in the problem domain of additive manufacturing. Our analysis indicates that some of the commonly used regression methods do perform quite well even on small data sets.
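
    A minimal sketch of comparing regression surrogates on a small data set with leave-one-out cross-validation; the models and synthetic inputs below are placeholders, not the paper's melt-pool data:

        # Sketch: leave-one-out comparison of candidate surrogates.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import LeaveOneOut, cross_val_score

        rng = np.random.default_rng(2)
        X = rng.uniform(0, 1, size=(40, 3))  # e.g. three simulation input parameters
        y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=40)

        models = [("ridge", Ridge()),
                  ("gp", GaussianProcessRegressor()),
                  ("forest", RandomForestRegressor(n_estimators=200, random_state=0))]
        for name, model in models:
            scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                                     scoring="neg_mean_absolute_error")
            print(f"{name:>6}: LOO MAE = {-scores.mean():.3f}")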

  17. Asian-Style Chicken Wraps

    Science.gov (United States)

    ... https://medlineplus.gov/recipe/asianstylechickenwraps.html Asian-Style Chicken Wraps ... Tbsp lime juice (or about 2 limes) For chicken: 1 Tbsp peanut oil or vegetable oil 1 ...

  18. An intercomparison of observational precipitation data sets over Northwest India during winter

    Science.gov (United States)

    Nageswararao, M. M.; Mohanty, U. C.; Ramakrishna, S. S. V. S.; Dimri, A. P.

    2018-04-01

    Winter (DJF) precipitation over Northwest India (NWI) is very important for the cultivation of Rabi crops. Thus, an accurate estimation of high-resolution observations, evaluation of high-resolution numerical models, and understanding of the local variability trends are essential. The objective of this study is to verify the quality of a new high spatial resolution (0.25° × 0.25°) gridded daily precipitation data set of the India Meteorological Department (IMD1) over NWI during winter. An intercomparison with four existing precipitation data sets at 0.5° × 0.5° of IMD (IMD2), 1° × 1° of IMD (IMD3), 0.25° × 0.25° of APHRODITE (APRD1), and 0.5° × 0.5° of APHRODITE (APRD2) resolution during a common period of 1971-2003 is done. The evaluation of the data quality of these five data sets against the available 26 station observations is carried out, and the results clearly indicate that all five data sets agree reasonably well with the station observations. However, the errors are relatively larger in all five data sets at the four stations in Jammu and Kashmir (Srinagar, Drass, Banihal top, and Dawar), while they are smaller at the other stations. This may be due to the lack of station observations over the region. The quality of the IMD1 data set over NWI for winter precipitation is reasonably better than that of the other data sets. The intercomparison analysis suggests that the climatological mean, interannual variability, and coefficient of variation from IMD1 are similar to those of the other data sets. Further, the analysis is extended to the Indian meteorological subdivisions over the region. This analysis indicates overestimation in IMD3 and underestimation in APRD1 and APRD2 over Jammu and Kashmir, Himachal Pradesh, and NWI as a whole, whereas IMD2 is closer to IMD1. Moreover, all five data sets are highly correlated (>0.5) among themselves at the 99.9% confidence level for all subdivisions. It is remarkably noticed that multicategorical (light precipitation, moderate precipitation, heavy

  19. Concordance and predictive value of two adverse drug event data sets.

    Science.gov (United States)

    Cami, Aurel; Reis, Ben Y

    2014-08-22

    Accurate prediction of adverse drug events (ADEs) is an important means of controlling and reducing drug-related morbidity and mortality. Since no single "gold standard" ADE data set exists, a range of different drug safety data sets are currently used for developing ADE prediction models. There is a critical need to assess the degree of concordance between these various ADE data sets and to validate ADE prediction models against multiple reference standards. We systematically evaluated the concordance of two widely used ADE data sets - Lexi-comp from 2010 and SIDER from 2012. The strength of the association between ADE (drug) counts in Lexi-comp and SIDER was assessed using Spearman rank correlation, while the differences between the two data sets were characterized in terms of drug categories, ADE categories and ADE frequencies. We also performed a comparative validation of the Predictive Pharmacosafety Networks (PPN) model using both ADE data sets. The predictive power of PPN using each of the two validation sets was assessed using the area under Receiver Operating Characteristic curve (AUROC). The correlations between the counts of ADEs and drugs in the two data sets were 0.84 (95% CI: 0.82-0.86) and 0.92 (95% CI: 0.91-0.93), respectively. Relative to an earlier snapshot of Lexi-comp from 2005, Lexi-comp 2010 and SIDER 2012 introduced a mean of 1,973 and 4,810 new drug-ADE associations per year, respectively. The difference between these two data sets was most pronounced for Nervous System and Anti-infective drugs, Gastrointestinal and Nervous System ADEs, and postmarketing ADEs. A minor difference of 1.1% was found in the AUROC of PPN when SIDER 2012 was used for validation instead of Lexi-comp 2010. In conclusion, the ADE and drug counts in Lexi-comp and SIDER data sets were highly correlated and the choice of validation set did not greatly affect the overall prediction performance of PPN. Our results also suggest that it is important to be aware of the
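
    The concordance measure described above reduces to a rank correlation between per-drug ADE counts in the two data sets; a minimal sketch with hypothetical counts:

        # Sketch of the concordance measure: Spearman rank correlation between
        # per-drug ADE counts in two data sets (counts below are hypothetical).
        from scipy.stats import spearmanr

        ade_counts_set_a = [12, 40, 7, 55, 23, 9, 31]
        ade_counts_set_b = [10, 44, 9, 60, 20, 12, 28]
        rho, p_value = spearmanr(ade_counts_set_a, ade_counts_set_b)
        print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")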

  20. Global temperature response to the major volcanic eruptions in multiple reanalysis data sets

    Directory of Open Access Journals (Sweden)

    M. Fujiwara

    2015-12-01

    Full Text Available The global temperature responses to the eruptions of Mount Agung in 1963, El Chichón in 1982, and Mount Pinatubo in 1991 are investigated using nine currently available reanalysis data sets (JRA-55, MERRA, ERA-Interim, NCEP-CFSR, JRA-25, ERA-40, NCEP-1, NCEP-2, and 20CR). Multiple linear regression is applied to the zonal and monthly mean time series of temperature for two periods, 1979–2009 (for eight reanalysis data sets) and 1958–2001 (for four reanalysis data sets), by considering explanatory factors of seasonal harmonics, linear trends, Quasi-Biennial Oscillation, solar cycle, and El Niño Southern Oscillation. The residuals are used to define the volcanic signals for the three eruptions separately, and common and different responses among the older and newer reanalysis data sets are highlighted for each eruption. In response to the Mount Pinatubo eruption, most reanalysis data sets show strong warming signals (up to 2–3 K for a 1-year average) in the tropical lower stratosphere and weak cooling signals (down to −1 K) in the subtropical upper troposphere. For the El Chichón eruption, warming signals in the tropical lower stratosphere are somewhat smaller than those for the Mount Pinatubo eruption. The response to the Mount Agung eruption is asymmetric about the equator with strong warming in the Southern Hemisphere midlatitude upper troposphere to lower stratosphere. Comparison of the results from several different reanalysis data sets confirms the atmospheric temperature response to these major eruptions qualitatively, but also shows quantitative differences even among the most recent reanalysis data sets. The consistencies and differences among different reanalysis data sets provide a measure of the confidence and uncertainty in our current understanding of the volcanic response. The results of this intercomparison study may be useful for validation of climate model responses to volcanic forcing and for assessing proposed
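
    A hedged sketch of the regression setup described above: seasonal harmonics plus a linear trend are fitted to a monthly series and the residuals retained as the candidate signal (the QBO, solar and ENSO regressors are omitted here for brevity, and the series is synthetic):

        # Sketch: fit intercept, linear trend and seasonal harmonics by least
        # squares; the residuals are the candidate signal.
        import numpy as np

        months = np.arange(360)            # 30 years of monthly means
        t = months / 12.0
        rng = np.random.default_rng(3)
        series = 0.01 * t + np.sin(2 * np.pi * t) + 0.2 * rng.normal(size=360)

        design = np.column_stack([
            np.ones_like(t), t,                            # intercept, trend
            np.sin(2 * np.pi * t), np.cos(2 * np.pi * t),  # annual harmonic
            np.sin(4 * np.pi * t), np.cos(4 * np.pi * t),  # semi-annual harmonic
        ])
        coeffs, *_ = np.linalg.lstsq(design, series, rcond=None)
        residuals = series - design @ coeffs               # candidate signal
        print(f"residual std = {residuals.std():.3f}")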

  1. Gamma radiation and chickens

    International Nuclear Information System (INIS)

    Toropilova, D.; Takac, L.; Toropila, M.; Tomko, M. M.

    2014-01-01

    In our work, we focused on the effect of low doses of gamma radiation on metabolic parameters in chickens. In the first group of chickens we monitored changes in the concentrations of glucose and cholesterol after whole-body irradiation with a dose of 3 Gy. In the second group of chickens we studied the combined effect of radiation and intraperitoneal application of a zinc chloride solution on changes in the concentrations of glucose and total cholesterol. Microelements are found in the tissues of organisms only in very small amounts; however, they are of particular importance in a number of enzymatic, catalytic and regulatory processes. Zinc is found in all cells of the body, with the highest percentage of zinc contained in muscle and bone cells. Resorption takes place in the small intestine, especially in the duodenum. For both groups of chickens, we performed analyses on the 3rd, 7th, 14th, 21st and 30th day. The results and overview of the work can be helpful in the peaceful uses of nuclear energy and in preventing diseases from exposure to radiation, as well as in dealing with the consequences of nuclear accidents. (authors)

  2. Chicken from Farm to Table

    Science.gov (United States)

    ... on fresh chicken. However, if chicken is processed, additives such as MSG, salt, or sodium erythorbate may be added but must be listed on the label. Foodborne Organisms Associated with Chicken As on any perishable meat, fish, or poultry, bacteria can be found on raw ...

  3. The international spinal cord injury endocrine and metabolic function basic data set.

    Science.gov (United States)

    Bauman, W A; Biering-Sørensen, F; Krassioukov, A

    2011-10-01

    To develop the International Spinal Cord Injury (SCI) Endocrine and Metabolic Function Basic Data Set within the framework of the International SCI Data Sets that would facilitate consistent collection and reporting of basic endocrine and metabolic findings in the SCI population. International. The International SCI Endocrine and Metabolic Function Data Set was developed by a working group. The initial data set document was revised on the basis of suggestions from members of the Executive Committee of the International SCI Standards and Data Sets, the International Spinal Cord Society (ISCoS) Executive and Scientific Committees, American Spinal Injury Association (ASIA) Board, other interested organizations and societies, and individual reviewers. In addition, the data set was posted for 2 months on ISCoS and ASIA websites for comments. The final International SCI Endocrine and Metabolic Function Data Set contains questions on the endocrine and metabolic conditions diagnosed before and after spinal cord lesion. If available, information collected before injury is to be obtained only once, whereas information after injury may be collected at any time. These data include information on diabetes mellitus, lipid disorders, osteoporosis, thyroid disease, adrenal disease, gonadal disease and pituitary disease. The question of gonadal status includes stage of sexual development and that for females also includes menopausal status. Data will be collected for body mass index and for the fasting serum lipid profile. The complete instructions for data collection and the data sheet itself are freely available on the websites of ISCoS (http://www.iscos.org.uk) and ASIA (http://www.asia-spinalinjury.org).

  4. Diverse Data Sets Can Yield Reliable Information through Mechanistic Modeling: Salicylic Acid Clearance.

    Science.gov (United States)

    Raymond, G M; Bassingthwaighte, J B

    This is a practical example of a powerful research strategy: putting together data from studies covering a diversity of conditions can yield a scientifically sound grasp of the phenomenon when the individual observations failed to provide definitive understanding. The rationale is that defining a realistic, quantitative, explanatory hypothesis for the whole set of studies brings about a "consilience" of the often competing hypotheses considered for individual data sets. An internally consistent conjecture linking multiple data sets simultaneously provides stronger evidence on the characteristics of a system than does analysis of individual data sets limited to narrow ranges of conditions. Our example examines three very different data sets on the clearance of salicylic acid from humans: a high-concentration set from aspirin overdoses; a set with medium concentrations from a research study on the influences of the route of administration and of sex on the clearance kinetics; and a set on low-dose aspirin for cardiovascular health. Three models were tested: (1) a first-order reaction, (2) a Michaelis-Menten (M-M) approach, and (3) an enzyme kinetic model with forward and backward reactions. The reaction rates found from model 1 were distinctly different for the three data sets, having no commonality. The M-M model 2 fitted each of the three data sets but gave reliable estimates of the Michaelis constant only for the medium-level data (Km = 24±5.4 mg/L); analyzing the three data sets together with model 2 gave Km = 18±2.6 mg/L. (Estimating parameters using larger numbers of data points in an optimization increases the degrees of freedom, constraining the range of the estimates.) Using the enzyme kinetic model (3) increased the number of free parameters but nevertheless improved the goodness of fit to the combined data sets, giving tighter constraints and a lower estimated Km = 14.6±2.9 mg/L, demonstrating that fitting diverse data sets with a single model
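
    A minimal sketch of fitting the Michaelis-Menten form of model (2) to synthetic concentration/elimination-rate pairs; the data and starting values are illustrative, with Km chosen near the combined estimate reported above:

        # Sketch: fit the M-M elimination rate v = Vmax*C/(Km + C).
        import numpy as np
        from scipy.optimize import curve_fit

        def michaelis_menten(c, vmax, km):
            """Elimination rate as a function of plasma concentration c (mg/L)."""
            return vmax * c / (km + c)

        conc = np.array([2, 5, 10, 20, 40, 80, 160], dtype=float)
        rng = np.random.default_rng(4)
        rate = michaelis_menten(conc, 30.0, 16.0) * (1 + 0.05 * rng.normal(size=conc.size))

        (vmax_hat, km_hat), _ = curve_fit(michaelis_menten, conc, rate, p0=(20.0, 10.0))
        print(f"Vmax = {vmax_hat:.1f}, Km = {km_hat:.1f} mg/L")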

  5. Visualization of Penile Suspensory Ligamentous System Based on Visible Human Data Sets

    Science.gov (United States)

    Chen, Xianzhuo; Wu, Yi; Tao, Ling; Yan, Yan; Pang, Jun; Zhang, Shaoxiang; Li, Shirong

    2017-01-01

    Background The aim of this study was to use a three-dimensional (3D) visualization technology to illustrate and describe the anatomical features of the penile suspensory ligamentous system based on the Visible Human data sets and to explore the suspensory mechanism of the penis for the further improvement of the penis-lengthening surgery. Material/Methods Cross-sectional images retrieved from the first Chinese Visible Human (CVH-1), third Chinese Visible Human (CVH-3), and Visible Human Male (VHM) data sets were used to segment the suspensory ligamentous system and its adjacent structures. The magnetic resonance imaging (MRI) images of this system were studied and compared with those from the Visible Human data sets. The 3D models reconstructed from the Visible Human data sets were used to provide morphological features of the penile suspensory ligamentous system and its related structures. Results The fundiform ligament was a superficial, loose, fibro-fatty tissue which originated from Scarpa’s fascia superiorly and continued to the scrotal septum inferiorly. The suspensory ligament and arcuate pubic ligament were dense fibrous connective tissues which started from the pubic symphysis and terminated by attaching to the tunica albuginea of the corpora cavernosa. Furthermore, the arcuate pubic ligament attached to the inferior rami of the pubis laterally. Conclusions The 3D model based on Visible Human data sets can be used to clarify the anatomical features of the suspensory ligamentous system, thereby contributing to the improvement of penis-lengthening surgery. PMID:28530218

  6. INTEGRATED SFM TECHNIQUES USING DATA SET FROM GOOGLE EARTH 3D MODEL AND FROM STREET LEVEL

    Directory of Open Access Journals (Sweden)

    L. Inzerillo

    2017-08-01

    Full Text Available Structure from motion (SfM) represents a widespread photogrammetric method that uses the photogrammetric rules to carry out a 3D model from a photo data set collection. Some complex ancient buildings, such as cathedrals, theatres, castles, etc., need to integrate the data set realized from street level with a UAV one in order to obtain the 3D roof reconstruction. Nevertheless, the use of UAVs is strongly limited by government rules. In these last years, Google Earth (GE) has been enriched with 3D models of the earth's sites. For this reason, it seemed convenient to test the potential offered by GE in order to extract from it a data set that replaces the UAV function, to close the aerial building data set, using screen images of high-resolution 3D models. Users can take unlimited "aerial photos" of a scene while flying around in GE at any viewing angle and altitude. The challenge is to verify the metric reliability of the SfM model carried out with an integrated data set (the one from street level and the one from GE) aimed at replacing the UAV use in an urban context. This model is called the integrated GE SfM model (i-GESfM). In this paper a case study will be presented: the Cathedral of Palermo.

  7. MiniWall Tool for Analyzing CFD and Wind Tunnel Large Data Sets

    Science.gov (United States)

    Schuh, Michael J.; Melton, John E.; Stremel, Paul M.

    2017-01-01

    It is challenging to review and assimilate large data sets created by Computational Fluid Dynamics (CFD) simulations and wind tunnel tests. Over the past 10 years, NASA Ames Research Center has developed and refined a software tool dubbed the MiniWall to increase productivity in reviewing and understanding large CFD-generated data sets. Under the recent NASA ERA project, the application of the tool expanded to enable rapid comparison of experimental and computational data. The MiniWall software is browser based so that it runs on any computer or device that can display a web page. It can also be used remotely and securely by using web server software such as the Apache HTTP server. The MiniWall software has recently been rewritten and enhanced to make it even easier for analysts to review large data sets and extract knowledge and understanding from these data sets. This paper describes the MiniWall software and demonstrates how the different features are used to review and assimilate large data sets.

  8. Use of simulated data sets to evaluate the fidelity of Metagenomicprocessing methods

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri; Shapiro, Harris; Goltsman, Eugene; McHardy, Alice C.; Rigoutsos, Isidore; Salamov, Asaf; Korzeniewski, Frank; Land, Miriam; Lapidus, Alla; Grigoriev, Igor; Richardson, Paul; Hugenholtz, Philip; Kyrpides, Nikos C.

    2006-12-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity--based (blast hit distribution) and two sequence composition--based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

  9. A Comparison of Heuristics with Modularity Maximization Objective using Biological Data Sets

    Directory of Open Access Journals (Sweden)

    Pirim Harun

    2016-01-01

    Full Text Available Finding groups of objects exhibiting similar patterns is an important data analytics task. Many disciplines have their own terminologies, such as cluster, group, clique, community, etc., for defining the similar objects in a set. Adopting the term community, many exact and heuristic algorithms have been developed to find the communities of interest in available data sets. Here, three heuristic algorithms for finding communities are compared using five gene expression data sets. The heuristics have a common objective function of maximizing the modularity, which is a quality measure of a partition and a reflection of objects' relevance in communities. Partitions generated by the heuristics are compared with the real ones using the adjusted Rand index, one of the most commonly used external validation measures. The paper discusses the results of the partitions on the mentioned biological data sets.
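
    A minimal sketch of the workflow described above, using a greedy modularity-maximization heuristic and the adjusted Rand index; the planted-partition toy graph stands in for the gene expression data sets:

        # Sketch: greedy modularity maximization, partition compared to the
        # planted ground truth by adjusted Rand index.
        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities
        from sklearn.metrics import adjusted_rand_score

        G = nx.planted_partition_graph(l=3, k=10, p_in=0.6, p_out=0.05, seed=0)
        true_labels = [node // 10 for node in G.nodes()]  # three planted blocks

        communities = greedy_modularity_communities(G)
        found_labels = [0] * G.number_of_nodes()
        for cid, members in enumerate(communities):
            for node in members:
                found_labels[node] = cid

        print(f"ARI = {adjusted_rand_score(true_labels, found_labels):.2f}")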

  10. Modeling study of solute transport in the unsaturated zone. Information and data sets. Volume 1

    International Nuclear Information System (INIS)

    Polzer, W.L.; Fuentes, H.R.; Springer, E.P.; Nyhan, J.W.

    1986-05-01

    The Environmental Science Group (HSE-12) is conducting a study to compare various approaches of modeling water and solute transport in porous media. Various groups representing different approaches will model a common set of transport data so that the state of the art in modeling and field experimentation can be discussed in a positive framework with an assessment of current capabilities and future needs in this area of research. This paper provides information and sets of data that will be useful to the modelers in meeting the objectives of the modeling study. The information and data sets include: (1) a description of the experimental design and methods used in obtaining solute transport data, (2) supporting data that may be useful in modeling the data set of interest, and (3) the data set to be modeled

  11. Epiphytic bryozoans on Neptune grass - a sample-based data set.

    Science.gov (United States)

    Lepoint, Gilles; Heughebaert, André; Michel, Loïc N

    2016-01-01

    The seagrass Posidonia oceanica L. Delile, commonly known as Neptune grass, is an endemic species of the Mediterranean Sea. It hosts a distinctive and diverse epiphytic community, dominated by various macroalgal and animal organisms. Mediterranean bryozoans have been extensively studied but quantitative data assessing temporal and spatial variability have rarely been documented. In Lepoint et al. (2014a, b) occurrence and abundance data of epiphytic bryozoan communities on leaves of Posidonia oceanica inhabiting Revellata Bay (Corsica, Mediterranean Sea) were reported and trophic ecology of Electra posidoniae Gautier assessed. Here, metadata information is provided on the data set discussed in Lepoint et al. (2014a) and published on the GBIF portal as a sampling-event data set (http://ipt.biodiversity.be/resource?r=ulg_bryozoa&v=1.0). The data set is enriched by data concerning species settled on Posidonia scales (dead petiole of Posidonia leaves, remaining after limb abscission).

  12. Good validity of the international spinal cord injury quality of life basic data set

    DEFF Research Database (Denmark)

    Post, M W M; Adriaansen, J J E; Charlifue, S

    2016-01-01

    STUDY DESIGN: Cross-sectional validation study. OBJECTIVES: To examine the construct and concurrent validity of the International Spinal Cord Injury (SCI) Quality of Life (QoL) Basic Data Set. SETTING: Dutch community. PARTICIPANTS: People 28-65 years of age, who obtained their SCI between 18...... and 35 years of age, were at least 10 years post SCI and were wheelchair users in daily life.Measure(s):The International SCI QoL Basic Data Set consists of three single items on satisfaction with life as a whole, physical health and psychological health (0=complete dissatisfaction; 10=complete...... and psychological health (0.70). CONCLUSIONS: This first validity study of the International SCI QoL Basic Data Set shows that it appears valid for persons with SCI....

  13. Ubiquitous information for ubiquitous computing: expressing clinical data sets with openEHR archetypes.

    Science.gov (United States)

    Garde, Sebastian; Hovenga, Evelyn; Buck, Jasmin; Knaup, Petra

    2006-01-01

    Ubiquitous computing requires ubiquitous access to information and knowledge. With the release of openEHR Version 1.0 there is a common model available to solve some of the problems related to accessing information and knowledge by improving semantic interoperability between clinical systems. Considerable work has been undertaken by various bodies to standardise Clinical Data Sets. Notwithstanding their value, several problems remain unsolved with Clinical Data Sets without the use of a common model underpinning them. This paper outlines these problems like incompatible basic data types and overlapping and incompatible definitions of clinical content. A solution to this based on openEHR archetypes is motivated and an approach to transform existing Clinical Data Sets into archetypes is presented. To avoid significant overlaps and unnecessary effort during archetype development, archetype development needs to be coordinated nationwide and beyond and also across the various health professions in a formalized process.

  14. International spinal cord injury bowel function basic data set (Version 2.0)

    DEFF Research Database (Denmark)

    Krogh, K; Emmanuel, A; Perrouin-Verbe, B

    2017-01-01

    : Working group appointed by the American Spinal Injury Association (ASIA) and the International Spinal Cord Society (ISCoS). METHODS: The draft prepared by the working group was reviewed by the International SCI Data Set Committee and later by members of the ISCoS Executive and Scientific Committees......STUDY DESIGN: International expert working group. OBJECTIVES: To revise the International Spinal Cord Injury (SCI) Bowel Function Basic Data Set as a standardized format for the collecting and reporting of a minimal amount of information on bowel function in clinical practice and research. SETTING...... and the ASIA board. The revised data set was posted on the ASIA and ISCoS websites for 1 month to allow further comments and suggestions. Changes resulting from a Delphi process among experts in children with SCI were included. Members of ISCoS Executive and Scientific Committees and the ASIA board made

  15. A Multicriteria Decision Making Approach for Estimating the Number of Clusters in a Data Set

    Science.gov (United States)

    Peng, Yi; Zhang, Yong; Kou, Gang; Shi, Yong

    2012-01-01

    Determining the number of clusters in a data set is an essential yet difficult step in cluster analysis. Since this task involves more than one criterion, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper proposes an MCDM-based approach to estimate the number of clusters for a given data set. In this approach, MCDM methods consider different numbers of clusters as alternatives and the outputs of any clustering algorithm on validity measures as criteria. The proposed method is examined by an experimental study using three MCDM methods, the well-known clustering algorithm k-means, ten relative measures, and fifteen public-domain UCI machine learning data sets. The results show that MCDM methods work fairly well in estimating the number of clusters in the data and outperform the ten relative measures considered in the study. PMID:22870181
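
    A sketch of the ingredients underlying the approach: k-means partitions for several candidate numbers of clusters, each scored by a relative validity measure (silhouette here); the MCDM aggregation step itself is not reproduced:

        # Sketch: score k-means partitions for candidate k with silhouette.
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs
        from sklearn.metrics import silhouette_score

        X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
        for k in range(2, 8):
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
            print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")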

  16. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

    Energy Technology Data Exchange (ETDEWEB)

    Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Barry, Kerrie [U.S. Department of Energy, Joint Genome Institute; Shapiro, Harris [U.S. Department of Energy, Joint Genome Institute; Goltsman, Eugene [U.S. Department of Energy, Joint Genome Institute; McHardy, Alice C. [IBM T. J. Watson Research Center; Rigoutsos, Isidore [IBM T. J. Watson Research Center; Salamov, Asaf [U.S. Department of Energy, Joint Genome Institute; Korzeniewski, Frank [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Grigoriev, Igor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

    2007-01-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.

  17. A proposal to order the neutron data set in neutron spectrometry using the RDANN methodology

    Energy Technology Data Exchange (ETDEWEB)

    Ortiz R, J.M.; Martinez B, M.R.; Vega C, H.R. [UAZ, Av. Ramon Lopez Velarde No. 801, 98000 Zacatecas (Mexico)

    2006-07-01

    A new proposal to order a neutron data set in the design process of artificial neural networks in the neutron spectrometry field is presented for the first time. The robust design of artificial neural networks methodology was applied to a 187 neutron spectra data set compiled by the International Atomic Energy Agency. Four cases of grouping the neutron spectra were considered, and around 1000 different neural networks, each with a different net topology, were designed, trained and tested. After carrying out the systematic methodology for all the cases, it was determined that the topology producing the best reconstructed neutron spectra was obtained with the 187 neutron spectra data set: 7 entrance neurons, 14 neurons in a hidden layer and 31 neurons in the exit layer, with a value of 0.1 for the learning rate and 0.1 for the momentum. (Author)

  18. A proposal to order the neutron data set in neutron spectrometry using the RDANN methodology

    International Nuclear Information System (INIS)

    Ortiz R, J.M.; Martinez B, M.R.; Vega C, H.R.

    2006-01-01

    A new proposal to order a neutron data set in the design process of artificial neural networks in the neutron spectrometry field is presented for the first time. The robust design of artificial neural networks methodology was applied to a 187 neutron spectra data set compiled by the International Atomic Energy Agency. Four cases of grouping the neutron spectra were considered, and around 1000 different neural networks, each with a different net topology, were designed, trained and tested. After carrying out the systematic methodology for all the cases, it was determined that the topology producing the best reconstructed neutron spectra was obtained with the 187 neutron spectra data set: 7 entrance neurons, 14 neurons in a hidden layer and 31 neurons in the exit layer, with a value of 0.1 for the learning rate and 0.1 for the momentum. (Author)

  19. Strategy for Developing Local Chicken

    Directory of Open Access Journals (Sweden)

    Sofjan Iskandar

    2006-12-01

    Full Text Available The chicken industry in Indonesia offers jobs for people in village areas. The balance in the development of the selected and local chicken industries has to be anticipated, as there has been a threat of reduced importation of grandparent stock of selected chickens due to global avian influenza. In the meantime, high appreciation of the local chicken has been shown by the existence of local chicken farms at a business scale. For the local chicken business, the government has built programs, projects, and infrastructures, although the programs and projects were scattered across several institutions, ending up with less significant impact on the people. Therefore, it is time for the government to put more effort into integrating various resources, focusing on enhancing the local chicken industry.

  20. mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets.

    Science.gov (United States)

    Dalke, Andrew; Hert, Jérôme; Kramer, Christian

    2018-05-29

    Matched molecular pair analysis (MMPA) enables the automated and systematic compilation of medicinal chemistry rules from compound/property data sets. Here we present mmpdb, an open-source matched molecular pair (MMP) platform to create, compile, store, retrieve, and use MMP rules. mmpdb is suitable for the large data sets typically found in pharmaceutical and agrochemical companies and provides new algorithms for fragment canonicalization and stereochemistry handling. The platform is written in Python and based on the RDKit toolkit. It is freely available from https://github.com/rdkit/mmpdb.

  1. A Data Set of Human Body Movements for Physical Rehabilitation Exercises.

    Science.gov (United States)

    Vakanski, Aleksandar; Jun, Hyung-Pil; Paul, David; Baker, Russell

    2018-03-01

    The article presents the University of Idaho - Physical Rehabilitation Movement Data (UI-PRMD), a publicly available data set of movements related to common exercises performed by patients in physical rehabilitation programs. For the data collection, 10 healthy subjects performed 10 repetitions of different physical therapy movements, with a Vicon optical tracker and a Microsoft Kinect sensor used for the motion capturing. The data are in a format that includes positions and angles of full-body joints. The objective of the data set is to provide a basis for mathematical modeling of therapy movements, as well as for establishing performance metrics for evaluation of patient consistency in executing the prescribed rehabilitation exercises.

  2. Brief communication: Getting Greenland's glaciers right - a new data set of all official Greenlandic glacier names

    Science.gov (United States)

    Bjørk, A. A.; Kruse, L. M.; Michaelsen, P. B.

    2015-12-01

    Place names in Greenland can be difficult to get right, as they are a mix of Greenlandic, Danish, and other foreign languages. In addition, orthographies have changed over time. With this new data set, we give the researcher working with Greenlandic glaciers the proper tool to find the correct name for glaciers and ice caps in Greenland and to locate glaciers described in the historic literature with the old Greenlandic orthography. The data set contains information on the names of 733 glaciers, 285 originating from the Greenland Ice Sheet (GrIS) and 448 from local glaciers and ice caps (LGICs).

  3. Chicken Astrovirus Infection

    African Journals Online (AJOL)

    Dr Olaleye

    35 nm in diameter with a ... named chicken astrovirus (CAstV) isolated from broiler chicks (Baxendale and Mebatsion, 2004). CAstV has .... successfully used the RT-PCR method to detect CAstV in field samples from across the USA while Day et ...

  4. Analysis and classification of data sets for calibration and validation of agro-ecosystem models

    DEFF Research Database (Denmark)

    Kersebaum, K C; Boote, K J; Jorgenson, J S

    2015-01-01

    Experimental field data are used at different levels of complexity to calibrate, validate and improve agro-ecosystem models to enhance their reliability for regional impact assessment. A methodological framework and software are presented to evaluate and classify data sets into four classes regar...

  5. 3D visualization of a resistivity data set - an example from a sludge disposal site

    International Nuclear Information System (INIS)

    Bernstone, C.; Dahlin, T.; Jonsson, P.

    1997-01-01

    A relatively large 2D inverted CVES resistivity data set from a waste pond area in southern Sweden was visualized as an animated 3D model using state-of-the-art techniques and tools. The presentation includes a description of the hardware and software used, outline of the case study and examples of scenes from the animation

  6. Data fusion analysis of a surface direct-current resistivity and well pick data set

    International Nuclear Information System (INIS)

    Clayton, E.A.; Lewis, R.E.

    1995-09-01

    Pacific Northwest Laboratory (PNL) has been tasked with testing, debugging, and refining the Hanford Site data fusion workstation (DFW), with the assistance of Coleman Research Corporation (CRC), before delivering the DFW to the environmental restoration client at the Hanford Site. Data fusion is the mathematical combination (or fusion) of disparate data sets into a single interpretation. The data fusion software used in this study was developed by CRC. This report discusses the results of evaluating a surface direct-current (dc) resistivity and well-pick data set using two methods: data fusion technology and commercially available software (i.e., RESIX Plus from Interpex Ltd., Golden, Colorado), the conventional method of analysis. The report compares the two technologies; describes the survey, procedures, and results; and includes conclusions and recommendations. The surface dc resistivity and well-pick data set had been acquired by PNL from a study performed in May 1993 at Eielson Air Force Base near Fairbanks, Alaska. The resistivity survey data were acquired to map the top of permafrost in support of a hydrogeologic study. This data set provided an excellent opportunity to test and refine the dc resistivity capabilities of the DFW; previously, the data fusion software was untested on dc resistivity data. The DFW was used to evaluate the dc resistivity survey data and to produce a 3-dimensional earth model of the study area

  7. Mixture modeling of multi-component data sets with application to ion-probe zircon ages

    Science.gov (United States)

    Sambridge, M. S.; Compston, W.

    1994-12-01

    A method is presented for detecting multiple components in a population of analytical observations for zircon and other ages. The procedure uses an approach known as mixture modeling, in order to estimate the most likely ages, proportions and number of distinct components in a given data set. Particular attention is paid to estimating errors in the estimated ages and proportions. At each stage of the procedure several alternative numerical approaches are suggested, each having their own advantages in terms of efficiency and accuracy. The methodology is tested on synthetic data sets simulating two or more mixed populations of zircon ages. In this case, true ages and proportions of each population are known and compare well with the results of the new procedure. Two examples are presented of its use with sets of SHRIMP U-238 - Pb-206 zircon ages from Palaeozoic rocks. A published data set for altered zircons from bentonite at Meishucun, South China, previously treated as a single-component population after screening for gross alteration effects, can be resolved into two components by the new procedure and their ages, proportions and standard errors estimated. The older component, at 530 +/- 5 Ma (2 sigma), is our best current estimate for the age of the bentonite. Mixture modeling of a data set for unaltered zircons from a tonalite elsewhere defines the magmatic U-238 - Pb-206 age at high precision (2 sigma +/- 1.5 Ma), but one-quarter of the 41 analyses detect hidden and significantly older cores.
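
    A hedged sketch of the mixture-modeling idea on synthetic ages: Gaussian mixtures with one to three components are fitted and compared by BIC; the data below are illustrative, not the Meishucun measurements:

        # Sketch: Gaussian mixtures fitted to synthetic ages (Ma); BIC selects
        # the number of components.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(5)
        ages = np.concatenate([rng.normal(530, 5, size=30),   # younger component
                               rng.normal(560, 6, size=10)])  # older component
        ages = ages.reshape(-1, 1)

        for n in (1, 2, 3):
            gm = GaussianMixture(n_components=n, random_state=0).fit(ages)
            print(f"{n} component(s): BIC = {gm.bic(ages):.1f}, "
                  f"means = {gm.means_.ravel().round(1)}")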

  8. A Decomposition Model for HPLC-DAD Data Set and Its Solution by Particle Swarm Optimization

    Directory of Open Access Journals (Sweden)

    Lizhi Cui

    2014-01-01

    Full Text Available This paper proposes a separation method, based on the model of Generalized Reference Curve Measurement and the algorithm of Particle Swarm Optimization (GRCM-PSO), for the High Performance Liquid Chromatography with Diode Array Detection (HPLC-DAD) data set. Firstly, initial parameters are generated to construct reference curves for the chromatogram peaks of the compounds based on its physical principle. Then, a General Reference Curve Measurement (GRCM) model is designed to transform these parameters to scalar values, which indicate the fitness for all parameters. Thirdly, rough solutions are found by searching an individual target for every parameter, and reinitialization only around these rough solutions is executed. Then, the Particle Swarm Optimization (PSO) algorithm is adopted to obtain the optimal parameters by minimizing the fitness of these new parameters given by the GRCM model. Finally, spectra for the compounds are estimated based on the optimal parameters and the HPLC-DAD data set. Through simulations and experiments, the following conclusions are drawn: (1) the GRCM-PSO method can separate the chromatogram peaks and spectra from the HPLC-DAD data set without knowing the number of the compounds in advance, even when severe overlap and white noise exist; (2) the GRCM-PSO method is able to handle the real HPLC-DAD data set.

  9. Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures

    Science.gov (United States)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.

    2016-12-01

    The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
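
    A minimal sketch of the clustering step on commodity hardware using mini-batch k-means; the manycore and SIMD optimizations described above are outside the scope of a few lines of Python:

        # Sketch: mini-batch k-means handles large point counts in bounded memory.
        import numpy as np
        from sklearn.cluster import MiniBatchKMeans

        rng = np.random.default_rng(6)
        points = rng.normal(size=(1_000_000, 8)).astype(np.float32)  # stand-in data

        model = MiniBatchKMeans(n_clusters=50, batch_size=4096, n_init=3,
                                random_state=0)
        labels = model.fit_predict(points)
        print(np.bincount(labels)[:10])  # sizes of the first ten clusters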

  10. Good validity of the international spinal cord injury quality of life basic data set

    NARCIS (Netherlands)

    Post, M. W. M.; Adriaansen, J. J. E.; Charlifue, S.; Biering-Sorensen, F.; van Asbeck, F. W. A.

    Study design: Cross-sectional validation study. Objectives: To examine the construct and concurrent validity of the International Spinal Cord Injury (SCI) Quality of Life (QoL) Basic Data Set. Setting: Dutch community. Participants: People 28-65 years of age, who obtained their SCI between 18 and 35

  11. Large data sets in finance and marketing: introduction by the special issue editor

    NARCIS (Netherlands)

    Ph.H.B.F. Franses (Philip Hans)

    1998-01-01

    On December 18 and 19, 1997, a small conference on the "Statistical Analysis of Large Data Sets in Business Economics" was organized by the Rotterdam Institute for Business Economic Studies. Eleven presentations were delivered in plenary sessions, which were attended by about 90

  12. Early Predictors of ASD in Young Children Using a Nationally Representative Data Set

    Science.gov (United States)

    Jeans, Laurie M.; Santos, Rosa Milagros; Laxman, Daniel J.; McBride, Brent A.; Dyer, W. Justin

    2013-01-01

    Current clinical diagnosis of Autism Spectrum Disorders (ASD) occurs between 3 and 4 years of age, but increasing evidence indicates that intervention begun earlier may improve outcomes. Using secondary analysis of the Early Childhood Longitudinal Study-Birth Cohort data set, the current study identifies early predictors prior to the diagnosis of…

  13. Estimating Pay Gaps for Workers with Disabilities: Implications from Broadening Definitions and Data Sets

    Science.gov (United States)

    Hallock, Kevin F.; Jin, Xin; Barrington, Linda

    2014-01-01

    Purpose: To compare pay gap estimates across 3 different national survey data sets for people with disabilities relative to those without disabilities when pay is measured as wage and salary alone versus a (total compensation) definition that includes an estimate of the value of benefits. Method: Estimates of the cost to the employers of employee…

  14. The Evolution of School Nursing Data Indicators in Massachusetts: Recommendations for a National Data Set

    Science.gov (United States)

    Gapinski, Mary Ann; Sheetz, Anne H.

    2014-01-01

    The National Association of School Nurses' research priorities include the recommendation that data reliability, quality, and availability be addressed to advance research in child and school health. However, identifying a national school nursing data set has remained a challenge for school nurses, school nursing leaders, school nurse professional…

  15. International Spinal Cord Injury Core Data Set (version 2.0)-including standardization of reporting

    NARCIS (Netherlands)

    Biering-Sorensen, F.; DeVivo, M. J.; Charlifue, S.; Chen, Y.; New, P. W.; Noonan, V.; Post, M. W. M.; Vogel, L.

    Study design: The study design includes expert opinion, feedback, revisions and final consensus. Objectives: The objective of the study was to present the new knowledge obtained since the International Spinal Cord Injury (SCI) Core Data Set (Version 1.0) published in 2006, and describe the

  16. The Minimum Data Set Depression Quality Indicator: Does It Reflect Differences in Care Processes?

    Science.gov (United States)

    Simmons, S.F.; Cadogan, M.P.; Cabrera, G.R.; Al-Samarrai, N.R.; Jorge, J.S.; Levy-Storms, L.; Osterweil, D.; Schnelle, J.F.

    2004-01-01

    Purpose. The objective of this work was to determine if nursing homes that score differently on prevalence of depression, according to the Minimum Data Set (MDS) quality indicator, also provide different processes of care related to depression. Design and Methods. A cross-sectional study with 396 long-term residents in 14 skilled nursing…

  17. Goodness of Fit of Skills Assessment Approaches: Insights from Patterns of Real vs. Synthetic Data Sets

    Science.gov (United States)

    Beheshti, Behzad; Desmarais, Michel C.

    2015-01-01

    This study investigates the goodness of fit of different skills assessment models using both synthetic and real data. Synthetic data is generated from the different skills assessment models. The results show wide differences in performance among the skills assessment models over synthetic data sets. The set of relative performances…

  18. EarthVision 2000: Examining Students' Representations of Complex Data Sets.

    Science.gov (United States)

    Vellom, R. Paul; Pape, Stephen J.

    2000-01-01

    Examines pencil-and-paper graphs produced by students at the beginning of a 1-week summer teacher/student institute as well as computer-based graphs produced by those same students at the end of the institute. Initial problems with managing data sets and producing meaningful graphs disappeared quickly as students used the process of "building…

  19. Reliability of the International Spinal Cord Injury Musculoskeletal Basic Data Set

    DEFF Research Database (Denmark)

    Baunsgaard, C B; Chhabra, H S; Harvey, L A

    2016-01-01

    STUDY DESIGN: Psychometric study. OBJECTIVES: To determine the intra- and inter-rater reliability and content validity of the International Spinal Cord Injury (SCI) Musculoskeletal Basic Data Set (ISCIMSBDS). SETTING: Four centers, one each in Australia, England, India and...

  20. Characteristics Associated with Genital Herpes Testing among Young Adults: Assessing Factors from Two National Data Sets

    Science.gov (United States)

    Gilbert, Lisa K.; Levandowski, Brooke A.; Roberts, Craig M.

    2010-01-01

    Objectives and Participants: In the United States, genital herpes (GH) prevalence is 10.6% among 20- to 29-year-olds and about 90% of seropositive persons do not know their status. This study investigated individual characteristics associated with GH screening and diagnosis in sexually active young adults aged 18 to 24. Methods: Two data sets were…

  1. International Spinal Cord Injury Core Data Set (version 2.0)-including standardization of reporting

    NARCIS (Netherlands)

    Biering-Sørensen, F; DeVivo, M J; Charlifue, Susan; Chen, Y; New, P.W.; Noonan, V.; Post, M W M; Vogel, L.

    STUDY DESIGN: The study design includes expert opinion, feedback, revisions and final consensus. OBJECTIVES: The objective of the study was to present the new knowledge obtained since the International Spinal Cord Injury (SCI) Core Data Set (Version 1.0) published in 2006, and describe the

  2. Mass Spectrometry Data Set for Renal Cell Carcinoma and Polycystic Kidney Disease Cell Models

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, B. J. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-01-05

    This data set will be evaluated by collaborators at UC Davis for possible inclusion in a research paper for publication in a scientific journal and to assist in the design of additional experiments. Researchers from UC Davis and LLNL will contribute to the manuscript.

  3. Childhood intelligence and adult mortality in the Brabant data set: First report

    NARCIS (Netherlands)

    Cramer, J.S.

    2011-01-01

    The Brabant Data Set, now freely accessible, contains information on a sample cohort of 3,000 individuals born around 1940, from surveys in 1952, 1983 and 1993, as well as on deaths between 1994 and 2009. In line with numerous epidemiological studies, we find that among the early variables recorded at

  4. Relevance of the international spinal cord injury basic data sets to youth

    DEFF Research Database (Denmark)

    Carroll, A; Vogel, L C; Zebracki, K

    2017-01-01

    STUDY DESIGN: Mixed methods, using the Modified Delphi Technique and Expert Panel Review. OBJECTIVE: To evaluate the utility and relevance of the International Spinal Cord Injury (SCI) Core and Basic Data Sets for children and youth with SCI. SETTING: International. METHODS: Via 20 electronic...

  5. Management of a Large Qualitative Data Set: Establishing Trustworthiness of the Data

    Directory of Open Access Journals (Sweden)

    Debbie Elizabeth White RN, PhD

    2012-07-01

    Full Text Available Health services research is multifaceted and impacted by the multiple contexts and stakeholders involved. Hence, large data sets are necessary to fully understand the complex phenomena (e.g., scope of nursing practice) being studied. The management of these large data sets can lead to numerous challenges in establishing trustworthiness of the study. This article reports on strategies utilized in data collection and analysis of a large qualitative study to establish trustworthiness. Specific strategies undertaken by the research team included training of interviewers and coders, variation in participant recruitment, consistency in data collection, completion of data cleaning, development of a conceptual framework for analysis, consistency in coding through regular communication and meetings between coders and key research team members, use of N6™ software to organize data, and creation of a comprehensive audit trail with internal and external audits. Finally, we make eight recommendations that will help ensure rigour for studies with large qualitative data sets: organization of the study by a single person; thorough documentation of the data collection and analysis process; attention to timelines; the use of an iterative process for data collection and analysis; internal and external audits; regular communication among the research team; adequate resources for timely completion; and time for reflection and diversion. Following these steps will enable researchers to complete a rigorous, qualitative research study when faced with large data sets to answer complex health services research questions.

  6. Advantages of Synthetic Noise and Machine Learning for Analyzing Radioecological Data Sets.

    Directory of Open Access Journals (Sweden)

    Igor Shuryak

    Full Text Available The ecological effects of accidental or malicious radioactive contamination are insufficiently understood because of the hazards and difficulties associated with conducting studies in radioactively-polluted areas. Data sets from severely contaminated locations can therefore be small. Moreover, many potentially important factors, such as soil concentrations of toxic chemicals, pH, and temperature, can be correlated with radiation levels and with each other. In such situations, commonly-used statistical techniques like generalized linear models (GLMs) may not be able to provide useful information about how radiation and/or these other variables affect the outcome (e.g. abundance of the studied organisms). Ensemble machine learning methods such as random forests offer powerful alternatives. We propose that analysis of small radioecological data sets by GLMs and/or machine learning can be made more informative by using the following techniques: (1) adding synthetic noise variables to provide benchmarks for distinguishing the performances of valuable predictors from irrelevant ones; (2) adding noise directly to the predictors and/or to the outcome to test the robustness of analysis results against random data fluctuations; (3) adding artificial effects to selected predictors to test the sensitivity of the analysis methods in detecting predictor effects; (4) running a selected machine learning method multiple times (with different random-number seeds) to test the robustness of the detected "signal"; (5) using several machine learning methods to test the "signal's" sensitivity to differences in analysis techniques. Here, we applied these approaches to simulated data, and to two published examples of small radioecological data sets: (I) counts of fungal taxa in samples of soil contaminated by the Chernobyl nuclear power plant accident (Ukraine), and (II) bacterial abundance in soil samples under a ruptured nuclear waste storage tank (USA). We show that the proposed
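
    Technique (1) above, the synthetic-noise benchmark, can be sketched with a random forest on toy data (all variable names and effect sizes below are invented for illustration): a predictor whose importance does not clearly exceed that of a pure-noise column is suspect.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 80                                          # small, as radioecological data often are
X = pd.DataFrame({"radiation": rng.lognormal(0, 1, n),
                  "soil_pH": rng.normal(6.5, 0.5, n)})
y = 10 - 2.0 * np.log(X["radiation"]) + rng.normal(0, 1, n)  # toy outcome
X["noise"] = rng.normal(size=n)                 # synthetic noise benchmark

# Technique (4): repeat with many seeds and average the importances.
imp = np.zeros(X.shape[1])
for seed in range(20):
    rf = RandomForestRegressor(n_estimators=300, random_state=seed).fit(X, y)
    imp += rf.feature_importances_
imp /= 20
for name, v in zip(X.columns, imp):
    print(f"{name}: {v:.3f}")   # predictors scoring below 'noise' are suspect
```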

  7. Developing Archive Information Packages for Data Sets: Early Experiments with Digital Library Standards

    Science.gov (United States)

    Duerr, R. E.; Yang, M.; Gooyabadi, M.; Lee, C.

    2008-12-01

    The key to interoperability between systems is often metadata, yet metadata standards in the digital library and data center communities have evolved separately. In the data center world, NASA's Directory Interchange Format (DIF), the Content Standard for Digital Geospatial Metadata (CSDGM), and most recently the international Geographic Information: Metadata (ISO 19115:2003) are used for descriptive metadata at the data set level to allow catalog interoperability; but use of anything other than repository-based metadata standards for the individual files that comprise a data set is rare, making true interoperability, at the data rather than data set level, across archives difficult. While the Open Archival Information Systems (OAIS) Reference Model with its call for creating Archive Information Packages (AIP) containing not just descriptive metadata but also preservation metadata is slowly being adopted in the community, the PREservation Metadata Implementation Strategies (PREMIS) standard, the only extant OAIS-compliant preservation metadata standard, has scarcely even been recognized as being applicable to the community. The digital library community in the meantime has converged upon the Metadata Encoding and Transmission Standard (METS) for interoperability between systems as evidenced by support for the standard by digital library systems such as Fedora and Greenstone. METS is designed to allow inclusion of other XML-based standards as descriptive and administrative metadata components. A recent Stanford study suggests that a combination of METS with included FGDC and PREMIS metadata could work well for individual granules of a data set. However, some of the lessons learned by the data center community over the last 30+ years of dealing with digital data are 1) that data sets as a whole need to be preserved and described and 2) that discovery and access mechanisms need to be hierarchical. Only once a user has reviewed a data set description and determined

  8. A consistent data set of Antarctic ice sheet topography, cavity geometry, and global bathymetry

    Directory of Open Access Journals (Sweden)

    R. Timmermann

    2010-12-01

    Full Text Available Sub-ice shelf circulation and freezing/melting rates in ocean general circulation models depend critically on an accurate and consistent representation of cavity geometry. Existing global or pan-Antarctic topography data sets have turned out to contain various inconsistencies and inaccuracies. The goal of this work is to compile independent regional surveys and maps into a global data set. We use the S-2004 global 1-min bathymetry as the backbone and add an improved version of the BEDMAP topography (ALBMAP) as bedrock topography for an area that roughly coincides with the Antarctic continental shelf. The position of the merging line is individually chosen in different sectors in order to capture the best of both data sets. High-resolution gridded data for ice shelf topography and cavity geometry of the Amery, Fimbul, Filchner-Ronne, Larsen C and George VI Ice Shelves, and for Pine Island Glacier are carefully merged into the ambient ice and ocean topographies. Multibeam survey data for bathymetry in the former Larsen B cavity and the southeastern Bellingshausen Sea have been obtained from the data centers of the Alfred Wegener Institute (AWI), British Antarctic Survey (BAS) and Lamont-Doherty Earth Observatory (LDEO), gridded, and blended into the existing bathymetry map. The resulting global 1-min Refined Topography data set (RTopo-1) contains self-consistent maps for upper and lower ice surface heights, bedrock topography, and surface type (open ocean, grounded ice, floating ice, bare land surface). The data set is available in NetCDF format from the PANGAEA database at doi:10.1594/pangaea.741917.
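
    A hedged sketch of reading the RTopo-1 NetCDF file with xarray follows; the file name, coordinate names, and variable names are assumptions and should be checked against the metadata of the actual PANGAEA download:

```python
import xarray as xr

# Assumed file name; the real download may be split or named differently.
ds = xr.open_dataset("RTopo-1_global_1min.nc")
print(ds.data_vars)                   # list the fields actually present

bed = ds["bedrock_topography"]        # assumed variable name
# Subset roughly to the Filchner-Ronne sector and plot (needs matplotlib);
# the slice directions assume ascending lat/lon coordinates.
bed.sel(lat=slice(-85, -74), lon=slice(-85, -30)).plot()
```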

  9. International Spinal Cord Injury Data Sets for non-traumatic spinal cord injury.

    Science.gov (United States)

    New, P W; Marshall, R

    2014-02-01

    Multifaceted: extensive discussions at workshop and conference presentations, survey of experts and feedback. Present the background, purpose and development of the International Spinal Cord Injury (SCI) Data Sets for Non-Traumatic SCI (NTSCI), including a hierarchical classification of aetiology. International. Consultation via e-mail, presentations and discussions at ISCoS conferences (2006-2009), and a workshop (1 September 2008). The consultation processes aimed to: (1) clarify aspects of the classification structure, (2) determine placement of certain aetiologies and identify important missing causes of NTSCI and (3) resolve coding issues and refine definitions. Every effort was made to consider feedback and suggestions from participants. The International Data Sets for NTSCI include a basic and an extended version. The extended data set includes a two-axis classification system for the causes of NTSCI. Axis 1 consists of a five-level, two-tier (congenital-genetic and acquired) hierarchy that allows for increasing detail to specify the aetiology. Axis 2 uses the International Statistical Classification of Diseases (ICD) and Related Health Problems for coding the initiating disease(s) that may have triggered the events that resulted in the axis 1 diagnosis, where appropriate. Additional items cover the timeframe of onset of NTSCI symptoms and presence of iatrogenicity. Complete instructions for data collection, data sheet and training cases are available at the websites of ISCoS (http://www.iscos.org.uk) and ASIA (http://www.asia-spinalinjury.org). The data sets should facilitate comparative research involving NTSCI participants, especially epidemiological studies and prevention projects. Further work is anticipated to refine the data sets, particularly regarding iatrogenicity.

  10. Multisource data set integration and characterization of uranium mineralization for the Montrose Quadrangle, Colorado

    International Nuclear Information System (INIS)

    Bolivar, S.L.; Balog, S.H.; Campbell, K.; Fugelso, L.E.; Weaver, T.A.; Wecksung, G.W.

    1981-04-01

    Several data-classification schemes were developed by the Los Alamos National Laboratory to detect potential uranium mineralization in the Montrose 1° x 2° quadrangle, Colorado. A first step was to develop and refine the techniques necessary to digitize, integrate, and register various large geological, geochemical, and geophysical data sets, including Landsat 2 imagery, for the Montrose quadrangle, Colorado, using a grid resolution of 1 km. All data sets for the Montrose quadrangle were registered to the Universal Transverse Mercator projection. The data sets include hydrogeochemical and stream sediment analyses for 23 elements, uranium-to-thorium ratios, airborne geophysical survey data, the locations of 90 uranium occurrences, a geologic map and Landsat 2 (bands 4 through 7) imagery. Geochemical samples were collected from 3965 locations in the 19,200 km² quadrangle; aerial data were collected on flight lines flown with 3- to 5-km spacing. These data sets were smoothed by universal kriging and interpolated to a 179 x 119 rectangular grid. A mylar transparency of the geologic map was prepared and digitized. Locations for the known uranium occurrences were also digitized. The Landsat 2 imagery was digitally manipulated and rubber-sheet transformed to quadrangle boundaries and bands 4 through 7 were resampled to both a 1-km and 100-m resolution. All possible combinations of three, for all data sets, were examined for general geologic correlations by utilizing a color microfilm output. Subsets of data were further examined for selected test areas. Two classification schemes for uranium mineralization, based on selected test areas in both the Cochetopa and Marshall Pass uranium districts, are presented. Areas favorable for uranium mineralization, based on these schemes, were identified and are discussed
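
    As a toy illustration of the gridding step described above, the following hedged sketch interpolates scattered sample values onto a 179 x 119 grid; scipy's griddata is used here as a simple stand-in for universal kriging, and all coordinates and values are synthetic.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
# Synthetic stand-ins for 3965 scattered geochemical sample locations (km).
x, y = rng.uniform(0, 179, 3965), rng.uniform(0, 119, 3965)
u = np.exp(-((x - 90) ** 2 + (y - 60) ** 2) / 800) + rng.normal(0, 0.05, x.size)

# Interpolate onto the 179 x 119 rectangular grid (1-km resolution).
gx, gy = np.meshgrid(np.arange(179), np.arange(119))
grid = griddata((x, y), u, (gx, gy), method="cubic")   # shape (119, 179)
```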

  11. Generalized index for spatial data sets as a measure of complete spatial randomness

    Science.gov (United States)

    Hackett-Jones, Emily J.; Davies, Kale J.; Binder, Benjamin J.; Landman, Kerry A.

    2012-06-01

    Spatial data sets, generated from a wide range of physical systems, can be analyzed by counting the number of objects in a set of bins. Previous work has been limited to equal-sized bins, which are inappropriate for some domains (e.g., circular). We consider a nonequal-size bin configuration whereby overlapping or nonoverlapping bins cover the domain. A generalized index, defined in terms of a variance between bin counts, is developed to indicate whether or not a spatial data set, generated from exclusion or nonexclusion processes, is at the complete spatial randomness (CSR) state. Limiting values of the index are determined. Using examples, we investigate trends in the generalized index as a function of density and compare the results with those using equal-sized bins. The smallest bin size must be much larger than the mean size of the objects. We can determine whether a spatial data set is at the CSR state or not by comparing the values of the generalized index for different bin configurations—the values will be approximately the same if the data are at the CSR state, while the values will differ if the data set is not at the CSR state. In general, the generalized index is lower than the limiting value of the index, since objects do not have access to the entire region due to blocking by other objects. These methods are applied to two applications: (i) spatial data sets generated from a cellular automata model of cell aggregation in the enteric nervous system and (ii) a known plant data distribution.
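
    The equal-bin special case of such an index reduces to the familiar variance-to-mean ratio of bin counts, which is approximately 1 at the CSR state; a minimal sketch follows (the paper's generalized, unequal-bin index is not reproduced):

```python
import numpy as np

def dispersion_index(points, n_bins=10, domain=(1.0, 1.0)):
    """Variance-to-mean ratio of 2-D bin counts; ~1 under CSR."""
    H, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=n_bins,
                             range=[(0, domain[0]), (0, domain[1])])
    counts = H.ravel()
    return counts.var(ddof=1) / counts.mean()

rng = np.random.default_rng(0)
csr = rng.uniform(size=(1000, 2))                    # uniformly random points
clustered = 0.5 + 0.05 * rng.normal(size=(1000, 2))  # one tight cluster
print(dispersion_index(csr))        # close to 1
print(dispersion_index(clustered))  # much greater than 1
```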

  12. Preliminary Survey of Ectoparasites Infesting Chickens (Gallus ...

    African Journals Online (AJOL)

    ectoparasites of chickens in four areas of Sokoto metropolis, Nigeria, on 160 chickens raised under free-range ... 90% mortality of local free range chickens. Arthropod ... some cases premature death. ... from the birds by displaying the feathers.

  13. Secondary data analysis of large data sets in urology: successes and errors to avoid.

    Science.gov (United States)

    Schlomer, Bruce J; Copp, Hillary L

    2014-03-01

    Secondary data analysis is the use of data collected for research by someone other than the investigator. In the last several years there has been a dramatic increase in the number of these studies being published in urological journals and presented at urological meetings, especially those involving secondary data analysis of large administrative data sets. Along with this expansion, skepticism toward secondary data analysis studies has grown among many urologists. In this narrative review we discuss the types of large data sets that are commonly used for secondary data analysis in urology, and discuss the advantages and disadvantages of secondary data analysis. A literature search was performed to identify urological secondary data analysis studies published since 2008 using commonly used large data sets, and examples of high-quality studies published in high-impact journals are given. We outline an approach for performing a successful hypothesis- or goal-driven secondary data analysis study and highlight common errors to avoid. More than 350 secondary data analysis studies using large data sets have been published on urological topics since 2008, with likely many more studies presented at meetings but never published. Studies that were not hypothesis or goal driven have likely constituted some of this work and have probably contributed to the increased skepticism of this type of research. However, many high-quality, hypothesis-driven studies addressing research questions that would have been difficult to conduct with other methods have been performed in the last few years. Secondary data analysis is a powerful tool that can address questions which could not be adequately studied by another method. Knowledge of the limitations of secondary data analysis and of the data sets used is critical for a successful study. There are also important errors to avoid when planning and performing a secondary data analysis study. Investigators and the urological community need to strive to use

  14. Application of 3D X-ray CT data sets to finite element analysis

    International Nuclear Information System (INIS)

    Bossart, P.L.; Martz, H.E.; Brand, H.R.; Hollerbach, K.

    1995-01-01

    Finite Element Modeling (FEM) is becoming more important as industry drives toward concurrent engineering. A fundamental hindrance to fully exploiting the power of FEM is the human effort required to acquire complex part geometry, particularly as-built geometry, as a FEM mesh. Many Quantitative Non Destructive Evaluation (QNDE) techniques that produce three-dimensional (3D) data sets provide a substantial reduction in the effort required to apply FEM to as-built parts. This paper describes progress at LLNL on the application of 3D X-ray computed tomography (CT) data sets to more rapidly produce high-quality FEM meshes of complex, as-built geometries. Issues related to the volume segmentation of the 3D CT data as well as the use of this segmented data to tailor generic hexahedral FEM meshes to part specific geometries are discussed. The application of these techniques to FEM analysis in the medical field is reported here

  15. Representation and display of vector field topology in fluid flow data sets

    Science.gov (United States)

    Helman, James; Hesselink, Lambertus

    1989-01-01

    The visualization of physical processes in general and of vector fields in particular is discussed. An approach to visualizing flow topology that is based on the physics and mathematics underlying the physical phenomenon is presented. It involves determining critical points in the flow where the velocity vector vanishes. The critical points, connected by principal lines or planes, determine the topology of the flow. The complexity of the data is reduced without sacrificing the quantitative nature of the data set. By reducing the original vector field to a set of critical points and their connections, a representation of the topology of a two-dimensional vector field that is much smaller than the original data set but retains with full precision the information pertinent to the flow topology is obtained. This representation can be displayed as a set of points and tangent curves or as a graph. Analysis (including algorithms), display, interaction, and implementation aspects are discussed.
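
    The critical-point analysis described above can be sketched numerically: locate zeros of a toy 2-D velocity field and classify each by the eigenvalues of the local Jacobian (real eigenvalues of opposite sign indicate a saddle; complex eigenvalues a spiral or center). The flow field below is invented for illustration:

```python
import numpy as np
from scipy.optimize import fsolve

def velocity(p):
    x, y = p
    return [x * (1 - x) - y, y * (x - 0.5)]   # toy 2-D flow field

def jacobian(p, h=1e-6):
    """Central-difference Jacobian of the velocity field at point p."""
    J = np.zeros((2, 2))
    for j in range(2):
        dp = np.zeros(2); dp[j] = h
        J[:, j] = (np.array(velocity(p + dp)) - np.array(velocity(p - dp))) / (2 * h)
    return J

for guess in [(0.1, 0.1), (0.6, 0.2), (1.0, 0.0)]:
    cp = fsolve(velocity, guess)          # point where the velocity vanishes
    ev = np.linalg.eigvals(jacobian(cp))
    kind = ("saddle" if np.isreal(ev).all() and ev.real.prod() < 0
            else "spiral/center" if np.iscomplex(ev).any()
            else "node")
    print(np.round(cp, 3), kind)
```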

  16. Estimating clinical chemistry reference values based on an existing data set of unselected animals.

    Science.gov (United States)

    Dimauro, Corrado; Bonelli, Piero; Nicolussi, Paola; Rassu, Salvatore P G; Cappio-Borlino, Aldo; Pulina, Giuseppe

    2008-11-01

    In an attempt to standardise the determination of biological reference values, the International Federation of Clinical Chemistry (IFCC) has published a series of recommendations on developing reference intervals. The IFCC recommends the use of an a priori sampling of at least 120 healthy individuals. However, such a high number of samples and laboratory analyses is expensive, time-consuming and not always feasible, especially in veterinary medicine. In this paper, an alternative (a posteriori) method is described and is used to determine reference intervals for biochemical parameters of farm animals using an existing laboratory data set. The method is based on the detection and removal of outliers to obtain, from the existing data set, a large sample of animals likely to be healthy. This allowed the estimation of reliable reference intervals for biochemical parameters in Sarda dairy sheep. This method may also be useful for the determination of reference intervals for different species, ages and genders.
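
    A hedged sketch of the a posteriori idea (the paper's exact outlier test may differ; Tukey fences are used here for illustration): iteratively strip outliers from an unselected laboratory data set, then take the central 95% of the retained values as the reference interval.

```python
import numpy as np

def reference_interval(values, k=1.5, max_iter=10):
    """Iteratively remove Tukey-fence outliers, then return the 2.5th-97.5th percentiles."""
    v = np.asarray(values, dtype=float)
    for _ in range(max_iter):
        q1, q3 = np.percentile(v, [25, 75])
        keep = (v >= q1 - k * (q3 - q1)) & (v <= q3 + k * (q3 - q1))
        if keep.all():
            break
        v = v[keep]
    return np.percentile(v, [2.5, 97.5])

rng = np.random.default_rng(0)
# Mostly healthy animals plus a small diseased tail with elevated values.
data = np.concatenate([rng.normal(60, 8, 950), rng.normal(120, 20, 50)])
print(reference_interval(data))   # close to the healthy 2.5th-97.5th percentiles
```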

  17. A Data Set of Human Body Movements for Physical Rehabilitation Exercises

    Directory of Open Access Journals (Sweden)

    Aleksandar Vakanski

    2018-01-01

    Full Text Available The article presents the University of Idaho-Physical Rehabilitation Movement Data (UI-PRMD), a publicly available data set of movements related to common exercises performed by patients in physical rehabilitation programs. For the data collection, 10 healthy subjects performed 10 repetitions of different physical therapy movements, with a Vicon optical tracker and a Microsoft Kinect sensor used for motion capture. The data are in a format that includes positions and angles of full-body joints. The objective of the data set is to provide a basis for mathematical modeling of therapy movements, as well as for establishing performance metrics for evaluation of patient consistency in executing the prescribed rehabilitation exercises.

  18. New and Improved GLDAS Data Sets and Data Services at NASA GES DISC

    Science.gov (United States)

    Rui, Hualan; Beaudoing, Hiroko; Teng, William; Vollmer, Bruce; Rodell, Matthew; Lei, Guang-Dih

    2012-01-01

    The goal of a Land Data Assimilation System (LDAS) is to ingest satellite- and ground-based observational data products, using advanced land surface modeling and data assimilation techniques, in order to generate optimal fields of land surface states and fluxes and thereby facilitate hydrology and climate modeling, research, and forecasting. With the motivation of creating more climatologically consistent data sets, NASA GSFC's Hydrological Sciences Laboratory has generated more than 60 years (Jan. 1948 - Dec. 2008) of Global LDAS Version 2 (GLDAS-2) data, by using the Princeton Forcing Data Set and upgraded versions of Land Surface Models (LSMs). GLDAS data and data services are provided at the NASA GES DISC Hydrology Data and Information Services Center (HDISC), in collaboration with HSL and LDAS.

  19. A posteriori noise estimation in variable data sets. With applications to spectra and light curves

    Science.gov (United States)

    Czesla, S.; Molle, T.; Schmitt, J. H. M. M.

    2018-01-01

    Most physical data sets contain a stochastic contribution produced by measurement noise or other random sources along with the signal. Usually, neither the signal nor the noise are accurately known prior to the measurement so that both have to be estimated a posteriori. We have studied a procedure to estimate the standard deviation of the stochastic contribution assuming normality and independence, requiring a sufficiently well-sampled data set to yield reliable results. This procedure is based on estimating the standard deviation in a sample of weighted sums of arbitrarily sampled data points and is identical to the so-called DER_SNR algorithm for specific parameter settings. To demonstrate the applicability of our procedure, we present applications to synthetic data, high-resolution spectra, and a large sample of space-based light curves and, finally, give guidelines for applying the procedure in situations not explicitly considered here to promote its adoption in data analysis.
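
    The DER_SNR estimator cited above, which the authors' procedure reduces to for specific parameter settings, is compact enough to state directly: a median of second differences of the flux, scaled for Gaussian noise.

```python
import numpy as np

def der_snr_sigma(flux):
    """DER_SNR noise estimate for a sufficiently well-sampled 1-D signal."""
    f = np.asarray(flux, dtype=float)
    # |2 f_i - f_{i-2} - f_{i+2}| cancels smooth signal, leaving noise.
    diff = np.abs(2.0 * f[2:-2] - f[:-4] - f[4:])
    return 1.482602 / np.sqrt(6.0) * np.median(diff)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 2000)
spectrum = np.sin(x) + rng.normal(0, 0.05, x.size)   # known sigma = 0.05
print(der_snr_sigma(spectrum))                        # approximately 0.05
```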

  20. Development of a daily gridded precipitation data set for the Middle East

    Directory of Open Access Journals (Sweden)

    A. Yatagai

    2008-03-01

    Full Text Available We present an algorithm to construct a rain-gauge-based analysis of daily precipitation for the Middle East. One of the key points of our algorithm is the construction of an accurate climatological distribution. One possible advantage of this product is to validate high-resolution climate models and/or to diagnose the impact of climate change on local hydrological resources. Many users are familiar with a monthly precipitation data set (New et al., 1999) and a satellite-based daily precipitation data set (Huffman et al., 2001), yet our data set, unlike theirs, clearly shows the effect of orography on daily precipitation and other extreme events, especially over the Fertile Crescent region. Currently the Middle East precipitation analysis product consists of a 25-year data set for 1979-2003 based on more than 1300 stations.

  1. The Construction of 3-d Neutral Density for Arbitrary Data Sets

    Science.gov (United States)

    Riha, S.; McDougall, T. J.; Barker, P. M.

    2014-12-01

    The Neutral Density variable allows inference of water pathways from thermodynamic properties in the global ocean, and is therefore an essential component of global ocean circulation analysis. The widely used algorithm for the computation of Neutral Density yields accurate results for data sets which are close to the observed climatological ocean. Long-term numerical climate simulations, however, often generate a significant drift from present-day climate, which renders the existing algorithm inaccurate. To remedy this problem, new algorithms which operate on arbitrary data have been developed, which may potentially be used to compute Neutral Density during runtime of a numerical model. We review existing approaches for the construction of Neutral Density in arbitrary data sets, detail their algorithmic structure, and present an analysis of the computational cost for implementations on a single-CPU computer. We discuss possible strategies for the implementation in state-of-the-art numerical models, with a focus on distributed computing environments.

  2. The International Spinal Cord Injury Pain Basic Data Set (version 2.0)

    DEFF Research Database (Denmark)

    Widerström-Noga, E; Biering-Sørensen, F; Bryce, T N

    2014-01-01

    OBJECTIVES: To revise the International Spinal Cord Injury Pain Basic Data Set (ISCIPBDS) based on new developments in the field and on suggestions from the spinal cord injury (SCI) and pain clinical and research community. SETTING: International. METHODS: The ISCIPBDS working group evaluated...... suggestions regarding the utility of the ISCIPBDS and made modifications in response to these and to significant developments in the field. The revised ISCIPBDS (version 2.0) was reviewed by members of the Executive Committee of the International SCI Standards and Data Sets, the International Spinal Cord...... Society (ISCoS) Executive and Scientific Committees, the American Spinal Injury Association and American Pain Society Boards and the Neuropathic Pain Special Interest Group of the International Association for the Study of Pain, individual reviewers and societies and the ISCoS Council. RESULTS...

  3. A learning algorithm for adaptive canonical correlation analysis of several data sets.

    Science.gov (United States)

    Vía, Javier; Santamaría, Ignacio; Pérez, Jesús

    2007-01-01

    Canonical correlation analysis (CCA) is a classical tool in statistical analysis to find the projections that maximize the correlation between two data sets. In this work we propose a generalization of CCA to several data sets, which is shown to be equivalent to the classical maximum variance (MAXVAR) generalization proposed by Kettenring. The reformulation of this generalization as a set of coupled least squares regression problems is exploited to develop a neural structure for CCA. In particular, the proposed CCA model is a two-layer feedforward neural network with lateral connections in the output layer to achieve the simultaneous extraction of all the CCA eigenvectors through deflation. The CCA neural model is trained using a recursive least squares (RLS) algorithm. Finally, the convergence of the proposed learning rule is proved by means of stochastic approximation techniques and its performance is analyzed through simulations.
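
    Classical two-set CCA, the starting point of the abstract, can be run with scikit-learn; the paper's multi-set (MAXVAR) generalization and its RLS-trained neural implementation are not reproduced in this sketch:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))                 # shared signal
X = latent @ rng.normal(size=(1, 5)) + 0.5 * rng.normal(size=(500, 5))
Y = latent @ rng.normal(size=(1, 4)) + 0.5 * rng.normal(size=(500, 4))

cca = CCA(n_components=1).fit(X, Y)
U, V = cca.transform(X, Y)
print(np.corrcoef(U[:, 0], V[:, 0])[0, 1])         # leading canonical correlation
```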

  4. The CAIN computer code for the generation of MABEL input data sets: a user's manual

    International Nuclear Information System (INIS)

    Tilley, D.R.

    1983-03-01

    CAIN is an interactive FORTRAN computer code designed to overcome the substantial effort involved in manually creating the thermal-hydraulics input data required by MABEL-2. CAIN achieves this by processing output from either of the whole-core codes, RELAP or TRAC, interpolating where necessary, and by scanning RELAP/TRAC output in order to generate additional information. This user's manual describes the actions required in order to create RELAP/TRAC data sets from magnetic tape, to create the other input data sets required by CAIN, and to operate the interactive command procedure for the execution of CAIN. In addition, the CAIN code is described in detail. This programme of work is part of the Nuclear Installations Inspectorate (NII)'s contribution to the United Kingdom Atomic Energy Authority's independent safety assessment of pressurized water reactors. (author)

  5. Sensitive study of the climatological SST by using ATSR global SST data sets

    Science.gov (United States)

    Xue, Yong; Lawrence, Sean P.; Llewellyn-Jones, David T.

    1995-12-01

    Establishing climatological sea surface temperature (SST) is an initial step in global climate process monitoring. A comparison has been made by using Oberhuber's SST data set and two years of monthly averaged SST from ATSR thermal-band data to force the OGCM. In the eastern Pacific Ocean, the choice of forcing data set makes only a small difference to modeled SST. In the western Pacific Ocean, the use of Oberhuber's data set gives higher climatological SSTs than the use of ATSR data. SSTs were also simulated for 1992 using climatological SSTs from two years of monthly averaged ATSR data and from Oberhuber's data. The forcing with SSTs from ATSR data was found to give a better SST simulation than that from Oberhuber's data. Our study has confirmed that ATSR can provide accurate monthly averaged global SSTs for global climate process monitoring.

  6. Wind resource and plant output data sets for wind integration studies

    Energy Technology Data Exchange (ETDEWEB)

    Frank, Jaclyn D.; Manobianco, John; Alonge, Charles J.; Brower, Michael C. [AWS Truepower, Albany, NY (United States)

    2010-07-01

    One of the first steps toward understanding the impact of increasing penetrations of wind is developing data sets of wind power output over large regions. To facilitate the development of these data sets, AWS Truepower (AWST) generated wind speeds over multiple years (2-3) using the Mesoscale Atmospheric Simulation System (MASS). These simulations were performed with high spatial resolution (1-2 km) to capture the wind flows over each area of interest. Output was saved at 10-minute intervals to capture variations in wind speed so that plant output could be analyzed against utility load and system operations. This paper will describe the methodology of mesoscale modeling, site selection, conversion to power, and downscaling to high-frequency output. Additionally, the generation of synthetic forecasts will be discussed. The validation results from recent studies in the eastern United States and Hawaii will be highlighted. (orig.)

  7. Integrating the nursing management minimum data set into the logical observation identifier names and codes system.

    Science.gov (United States)

    Subramanian, Amarnath; Westra, Bonnie; Matney, Susan; Wilson, Patricia S; Delaney, Connie W; Huff, Stan; Huff, Stanley M; Huber, Diane

    2008-11-06

    This poster describes the process used to integrate the Nursing Management Minimum Data Set (NMMDS), an instrument to measure the nursing context of care, into the Logical Observation Identifier Names and Codes (LOINC) system to facilitate contextualization of quality measures. Integration of the first three of 18 elements resulted in 48 new codes including five panels. The LOINC Clinical Committee has approved the presented mapping for their next release.

  8. A new CM SAF Solar Surface Radiation Climate Data Set derived from Meteosat Satellite Observations

    Science.gov (United States)

    Trentmann, J.; Mueller, R. W.; Pfeifroth, U.; Träger-Chatterjee, C.; Cremer, R.

    2014-12-01

    The incoming surface solar radiation has been defined as an essential climate variable by GCOS. It is mandatory to monitor this part of the earth's energy balance, and thus gain insights into the state and variability of the climate system. In addition, data sets of the surface solar radiation have received increased attention over recent years as an important source of information for the planning of solar energy applications. The EUMETSAT Satellite Application Facility on Climate Monitoring (CM SAF) derives surface solar radiation from geostationary and polar-orbiting satellite instruments. While CM SAF focuses on the generation of high-quality long-term climate data records, data are also provided operationally with a short latency of 8 weeks. Here we present SARAH (Solar Surface Radiation Dataset - Heliosat), i.e. the new CM SAF Solar Surface Radiation data set based on Meteosat satellite observations. SARAH provides instantaneous, daily- and monthly-averaged data of the effective cloud albedo (CAL), the direct normalized solar radiation (DNI) and the solar irradiance (SIS) from 1983 to 2013 for the full view of the Meteosat satellite (i.e., Europe, Africa, parts of South America, and the Atlantic Ocean). The data sets are generated with a high spatial resolution of 0.05 deg allowing for detailed regional studies, and are available in netCDF format at no cost without restrictions at www.cmsaf.eu. We provide an overview of the data sets, including a validation against reference measurements from the BSRN and GEBA surface station networks.

  9. Atlantic frugivory: a plant-frugivore interaction data set for the Atlantic Forest.

    Science.gov (United States)

    Bello, Carolina; Galetti, Mauro; Montan, Denise; Pizo, Marco A; Mariguela, Tatiane C; Culot, Laurence; Bufalo, Felipe; Labecca, Fabio; Pedrosa, Felipe; Constantini, Rafaela; Emer, Carine; Silva, Wesley R; da Silva, Fernanda R; Ovaskainen, Otso; Jordano, Pedro

    2017-06-01

    The data set provided here includes 8,320 frugivory interactions (records of pairwise interactions between plant and frugivore species) reported for the Atlantic Forest. The data set includes interactions between 331 vertebrate species (232 birds, 90 mammals, 5 fishes, 1 amphibian, and 3 reptiles) and 788 plant species. We also present information on traits directly related to the frugivory process (endozoochory), such as the size of fruits and seeds and the body mass and gape size of frugivores. Data were extracted from 166 published and unpublished sources spanning from 1961 to 2016. While this is probably the most comprehensive data set available for a tropical ecosystem, it is arguably taxonomically and geographically biased. The plant families better represented are Melastomataceae, Myrtaceae, Moraceae, Urticaceae, and Solanaceae. Myrsine coriacea, Alchornea glandulosa, Cecropia pachystachya, and Trema micrantha are the plant species with the most animal dispersers (83, 76, 76, and 74 species, respectively). Among the animal taxa, the highest number of interactions is reported for birds (3,883) followed by mammals (1,315). The woolly spider monkey or muriqui, Brachyteles arachnoides, and Rufous-bellied Thrush, Turdus rufiventris, are the frugivores with the most diverse fruit diets (137 and 121 plants species, respectively). The most important general patterns that we note are that larger seeded plant species (>12 mm) are mainly eaten by terrestrial mammals (rodents, ungulates, primates, and carnivores) and that birds are the main consumers of fruits with a high concentration of lipids. Our data set is geographically biased, with most interactions recorded for the southeast Atlantic Forest. © 2017 by the Ecological Society of America.

  10. Categorizing Biases in High-Confidence High-Throughput Protein-Protein Interaction Data Sets*

    Science.gov (United States)

    Yu, Xueping; Ivanic, Joseph; Memišević, Vesna; Wallqvist, Anders; Reifman, Jaques

    2011-01-01

    We characterized and evaluated the functional attributes of three yeast high-confidence protein-protein interaction data sets derived from affinity purification/mass spectrometry, protein-fragment complementation assay, and yeast two-hybrid experiments. The interacting proteins retrieved from these data sets formed distinct, partially overlapping sets with different protein-protein interaction characteristics. These differences were primarily a function of the deployed experimental technologies used to recover these interactions. This affected the total coverage of interactions and was especially evident in the recovery of interactions among different functional classes of proteins. We found that the interaction data obtained by the yeast two-hybrid method was the least biased toward any particular functional characterization. In contrast, interacting proteins in the affinity purification/mass spectrometry and protein-fragment complementation assay data sets were over- and under-represented among distinct and different functional categories. We delineated how these differences affected protein complex organization in the network of interactions, in particular for strongly interacting complexes (e.g. RNA and protein synthesis) versus weak and transient interacting complexes (e.g. protein transport). We quantified methodological differences in detecting protein interactions from larger protein complexes, in the correlation of protein abundance among interacting proteins, and in their connectivity of essential proteins. In the latter case, we showed that minimizing inherent methodology biases removed many of the ambiguous conclusions about protein essentiality and protein connectivity. We used these findings to rationalize how biological insights obtained by analyzing data sets originating from different sources sometimes do not agree or may even contradict each other. An important corollary of this work was that discrepancies in biological insights did not

  11. Acoustic Metadata Management and Transparent Access to Networked Oceanographic Data Sets

    Science.gov (United States)

    2015-09-30

    Marie A. Roch, Dept. of Computer Science, San Diego State University. ...specific technologies for processing Excel spreadsheets and Access databases. The architecture (Figure 4) is based on a client-server model...

  12. Analyzing large data sets from XGC1 magnetic fusion simulations using apache spark

    Energy Technology Data Exchange (ETDEWEB)

    Churchill, R. Michael [Princeton Plasma Physics Lab. (PPPL), Princeton, NJ (United States)

    2016-11-21

    Apache Spark is explored as a tool for analyzing large data sets from the magnetic fusion simulation code XGC1. Implementation details of Apache Spark on the NERSC Edison supercomputer are discussed, including binary file reading and parameter setup. Here, an unsupervised machine learning algorithm, k-means clustering, is applied to XGC1 particle distribution function data, showing that highly turbulent spatial regions do not have common coherent structures, but rather broad, ring-like structures in velocity space.
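
    A hedged PySpark sketch of the k-means step described above; the Parquet path and the velocity-space column names are assumptions, standing in for the XGC1 binary reader discussed in the report:

```python
from pyspark.sql import SparkSession
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.appName("xgc1-kmeans").getOrCreate()

# Assumed: particle data already converted to a columnar file with
# velocity-space features v_par and v_perp per spatial region.
df = spark.read.parquet("xgc1_distribution.parquet")
assembler = VectorAssembler(inputCols=["v_par", "v_perp"], outputCol="features")
feats = assembler.transform(df)

model = KMeans(k=8, seed=1).fit(feats)             # cluster in velocity space
model.transform(feats).groupBy("prediction").count().show()
spark.stop()
```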

  13. Two Types of Social Grooming discovered in Primitive and Modern Communication Data-Sets

    OpenAIRE

    Takano, Masanori

    2017-01-01

    Social networking sites (SNS) provide innovative social bonding methods known as social grooming, which have drastically decreased the time and distance constraints of social grooming. Here we show two types of social grooming (elaborate social grooming and lightweight social grooming) discovered in a model constructed from thirty communication data sets, including face-to-face interaction, SNS, mobile phones, and Chacma baboons. This demarcation is caused by a trade-off between the number and strength of social re...

  14. Exploring the SDSS Data Set with Linked Scatter Plots. I. EMP, CEMP, and CV Stars

    Energy Technology Data Exchange (ETDEWEB)

    Carbon, Duane F.; Henze, Christopher; Nelson, Bron C., E-mail: Duane.F.Carbon@nasa.gov [NASA Ames Research Center, NASA Advanced Supercomputing Facility, Moffett Field, CA, 94035-1000 (United States)

    2017-02-01

    We present the results of a search for extremely metal-poor (EMP), carbon-enhanced metal-poor (CEMP), and cataclysmic variable (CV) stars using a new exploration tool based on linked scatter plots (LSPs). Our approach is especially designed to work with very large spectrum data sets such as the SDSS, LAMOST, RAVE, and Gaia data sets, and it can be applied to stellar, galaxy, and quasar spectra. As a demonstration, we conduct our search using the SDSS DR10 data set. We first created a 3326-dimensional phase space containing nearly 2 billion measures of the strengths of over 1600 spectral features in 569,738 SDSS stars. These measures capture essentially all the stellar atomic and molecular species visible at the resolution of SDSS spectra. We show how LSPs can be used to quickly isolate and examine interesting portions of this phase space. To illustrate, we use LSPs coupled with cuts in selected portions of phase space to extract EMP stars, CEMP stars, and CV stars. We present identifications for 59 previously unrecognized candidate EMP stars and 11 previously unrecognized candidate CEMP stars. We also call attention to 2 candidate He II emission CV stars found by the LSP approach that have not yet been discussed in the literature.

  15. Data Sets Replicas Placements Strategy from Cost-Effective View in the Cloud

    Directory of Open Access Journals (Sweden)

    Xiuguo Wu

    2016-01-01

    Full Text Available Replication technology is commonly used to improve data availability and reduce data access latency in cloud storage systems by providing users with different replicas of the same service. Most current approaches largely focus on improving system performance and neglect the management cost of deciding the number of replicas and their storage locations, which places a great financial burden on cloud users because the cost of replica storage and consistency maintenance can grow quickly as new replicas are added in a pay-as-you-go paradigm. In this paper, towards achieving an approximate minimum data set management cost in a practical manner, we propose a replica placement strategy from a cost-effectiveness view, under the premise that system performance meets requirements. First, we design data set management cost models, including storage cost and transfer cost. Second, we use the access frequency and the average response time to decide which data set should be replicated. Then, a method for calculating the number of replicas and their storage locations with minimum management cost is proposed, based on a location-problem graph. Both theoretical analysis and simulations show that the proposed strategy offers the benefit of lower management cost with fewer replicas.
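
    The storage/transfer trade-off behind such cost models can be sketched with a toy function (all prices and the remote-read fraction below are illustrative assumptions, not the paper's model): storage cost grows linearly with the replica count while transfer cost falls, and the cost-effective replica count minimizes their sum.

```python
def monthly_cost(n_replicas, size_gb, reads_per_month,
                 storage_price=0.023,      # $/GB-month (assumed)
                 transfer_price=0.09,      # $/GB moved (assumed)
                 remote_fraction=lambda n: max(0.0, 1.0 - 0.3 * n)):
    """Toy pay-as-you-go cost: replica storage plus remote-read transfer."""
    storage = n_replicas * size_gb * storage_price
    transfer = (reads_per_month * size_gb
                * remote_fraction(n_replicas) * transfer_price)
    return storage + transfer

for n in range(1, 5):
    print(n, round(monthly_cost(n, size_gb=200, reads_per_month=50), 2))
# The minimum of this curve is the cost-effective replica count.
```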

  16. Identification of a robust gene signature that predicts breast cancer outcome in independent data sets

    International Nuclear Information System (INIS)

    Korkola, James E; Waldman, Frederic M; Blaveri, Ekaterina; DeVries, Sandy; Moore, Dan H II; Hwang, E Shelley; Chen, Yunn-Yi; Estep, Anne LH; Chew, Karen L; Jensen, Ronald H

    2007-01-01

    Breast cancer is a heterogeneous disease, presenting with a wide range of histologic, clinical, and genetic features. Microarray technology has shown promise in predicting outcome in these patients. We profiled 162 breast tumors using expression microarrays to stratify tumors based on gene expression. A subset of 55 tumors with extensive follow-up was used to identify gene sets that predicted outcome. The predictive gene set was further tested in previously published data sets. We used different statistical methods to identify three gene sets associated with disease free survival. A fourth gene set, consisting of 21 genes in common to all three sets, also had the ability to predict patient outcome. To validate the predictive utility of this derived gene set, it was tested in two published data sets from other groups. This gene set resulted in significant separation of patients on the basis of survival in these data sets, correctly predicting outcome in 62–65% of patients. By comparing outcome prediction within subgroups based on ER status, grade, and nodal status, we found that our gene set was most effective in predicting outcome in ER positive and node negative tumors. This robust gene selection with extensive validation has identified a predictive gene set that may have clinical utility for outcome prediction in breast cancer patients

  17. A full scale approximation of covariance functions for large spatial data sets

    KAUST Repository

    Sang, Huiyan

    2011-10-10

    Gaussian process models have been widely used in spatial statistics but face tremendous computational challenges for very large data sets. The model fitting and spatial prediction of such models typically require O(n^3) operations for a data set of size n. Various approximations of the covariance functions have been introduced to reduce the computational cost. However, most existing approximations cannot simultaneously capture both the large- and the small-scale spatial dependence. A new approximation scheme is developed to provide a high quality approximation to the covariance function at both the large and the small spatial scales. The new approximation is the summation of two parts: a reduced rank covariance and a compactly supported covariance obtained by tapering the covariance of the residual of the reduced rank approximation. Whereas the former part mainly captures the large-scale spatial variation, the latter part captures the small-scale, local variation that is unexplained by the former part. By combining the reduced rank representation and sparse matrix techniques, our approach allows for efficient computation for maximum likelihood estimation, spatial prediction and Bayesian inference. We illustrate the new approach with simulated and real data sets. © 2011 Royal Statistical Society.

  18. Global long-term ozone trends derived from different observed and modelled data sets

    Science.gov (United States)

    Coldewey-Egbers, M.; Loyola, D.; Zimmer, W.; van Roozendael, M.; Lerot, C.; Dameris, M.; Garny, H.; Braesicke, P.; Koukouli, M.; Balis, D.

    2012-04-01

    The long-term behaviour of stratospheric ozone amounts during the past three decades is investigated on a global scale using different observed and modelled data sets. Three European satellite sensors GOME/ERS-2, SCIAMACHY/ENVISAT, and GOME-2/METOP are combined and a merged global monthly mean total ozone product has been prepared using an inter-satellite calibration approach. The data set covers the 16-year period from June 1995 to June 2011 and exhibits the excellent long-term stability required for such trend studies. A multiple linear least-squares regression algorithm using different explanatory variables is applied to the time series, and statistically significant positive trends are detected in the northern mid-latitudes and subtropics. Global trends are also estimated using a second satellite-based Merged Ozone Data set (MOD) provided by NASA. For a few selected geographical regions, ozone trends are additionally calculated using well-maintained measurements of individual Dobson/Brewer ground-based instruments. A reasonable agreement in the spatial patterns of the trends is found amongst the European satellite, the NASA satellite, and the ground-based observations. Furthermore, two long-term simulations obtained with the chemistry-climate models E39C-A (provided by the German Aerospace Center) and UMUKCA-UCAM (provided by the University of Cambridge) are analysed.
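
    The regression step described above can be sketched with ordinary least squares on synthetic monthly total-ozone values; the explanatory variables here (a linear trend plus one seasonal harmonic) are a reduced, illustrative subset of those used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(192) / 12.0                         # 16 years of monthly samples
ozone = 300 + 0.5 * t + 10 * np.sin(2 * np.pi * t) + rng.normal(0, 3, t.size)

# Design matrix: intercept, linear trend, annual harmonic.
A = np.column_stack([np.ones_like(t), t,
                     np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
coef, *_ = np.linalg.lstsq(A, ozone, rcond=None)
resid = ozone - A @ coef
# 2-sigma uncertainty of the trend coefficient from the residual variance.
cov = np.linalg.inv(A.T @ A) * resid.var(ddof=A.shape[1])
print(f"trend: {coef[1]:.3f} +/- {2 * np.sqrt(cov[1, 1]):.3f} DU/yr")
```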

  19. Dark Energy Survey Year 1 Results: The Photometric Data Set for Cosmology

    Science.gov (United States)

    Drlica-Wagner, A.; Sevilla-Noarbe, I.; Rykoff, E. S.; Gruendl, R. A.; Yanny, B.; Tucker, D. L.; Hoyle, B.; Carnero Rosell, A.; Bernstein, G. M.; Bechtol, K.; Becker, M. R.; Benoit-Lévy, A.; Bertin, E.; Carrasco Kind, M.; Davis, C.; de Vicente, J.; Diehl, H. T.; Gruen, D.; Hartley, W. G.; Leistedt, B.; Li, T. S.; Marshall, J. L.; Neilsen, E.; Rau, M. M.; Sheldon, E.; Smith, J.; Troxel, M. A.; Wyatt, S.; Zhang, Y.; Abbott, T. M. C.; Abdalla, F. B.; Allam, S.; Banerji, M.; Brooks, D.; Buckley-Geer, E.; Burke, D. L.; Capozzi, D.; Carretero, J.; Cunha, C. E.; D’Andrea, C. B.; da Costa, L. N.; DePoy, D. L.; Desai, S.; Dietrich, J. P.; Doel, P.; Evrard, A. E.; Fausti Neto, A.; Flaugher, B.; Fosalba, P.; Frieman, J.; García-Bellido, J.; Gerdes, D. W.; Giannantonio, T.; Gschwend, J.; Gutierrez, G.; Honscheid, K.; James, D. J.; Jeltema, T.; Kuehn, K.; Kuhlmann, S.; Kuropatkin, N.; Lahav, O.; Lima, M.; Lin, H.; Maia, M. A. G.; Martini, P.; McMahon, R. G.; Melchior, P.; Menanteau, F.; Miquel, R.; Nichol, R. C.; Ogando, R. L. C.; Plazas, A. A.; Romer, A. K.; Roodman, A.; Sanchez, E.; Scarpine, V.; Schindler, R.; Schubnell, M.; Smith, M.; Smith, R. C.; Soares-Santos, M.; Sobreira, F.; Suchyta, E.; Tarle, G.; Vikram, V.; Walker, A. R.; Wechsler, R. H.; Zuntz, J.; DES Collaboration

    2018-04-01

    We describe the creation, content, and validation of the Dark Energy Survey (DES) internal year-one cosmology data set, Y1A1 GOLD, in support of upcoming cosmological analyses. The Y1A1 GOLD data set is assembled from multiple epochs of DES imaging and consists of calibrated photometric zero-points, object catalogs, and ancillary data products (e.g., maps of survey depth and observing conditions, star–galaxy classification, and photometric redshift estimates) that are necessary for accurate cosmological analyses. The Y1A1 GOLD wide-area object catalog consists of ∼137 million objects detected in co-added images covering ∼1800 deg² in the DES grizY filters. The 10σ limiting magnitude for galaxies is g = 23.4, r = 23.2, i = 22.5, z = 21.8, and Y = 20.1. Photometric calibration of Y1A1 GOLD was performed by combining nightly zero-point solutions with stellar locus regression, and the absolute calibration accuracy is better than 2% over the survey area. DES Y1A1 GOLD is the largest photometric data set at the achieved depth to date, enabling precise measurements of cosmic acceleration at z ≲ 1.

  20. A Comprehensive Training Data Set for the Development of Satellite-Based Volcanic Ash Detection Algorithms

    Science.gov (United States)

    Schmidl, Marius

    2017-04-01

    We present a comprehensive training data set covering a large range of atmospheric conditions, including disperse volcanic ash and desert dust layers. These data sets contain all information required for the development of volcanic ash detection algorithms based on artificial neural networks, which are urgently needed since volcanic ash in the airspace is a major concern of aviation safety authorities. Selected parts of the data are used to train the volcanic ash detection algorithm VADUGS. They contain atmospheric and surface-related quantities as well as the corresponding simulated satellite data for the channels in the infrared spectral range of the SEVIRI instrument on board MSG-2. To get realistic results, ECMWF, IASI-based, and GEOS-Chem data are used to calculate all parameters describing the environment, whereas the software package libRadtran is used to perform radiative transfer simulations returning the brightness temperatures for each atmospheric state. As optical properties are a prerequisite for radiative simulations accounting for aerosol layers, the development also included the computation of optical properties for a set of different aerosol types from different sources. A description of the developed software and the methods used is given, along with an overview of the resulting data sets.
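
    To make the training idea concrete, here is a deliberately small sketch of fitting a neural-network ash classifier to simulated brightness temperatures. Everything in it is invented for illustration: the two-channel split-window difference stands in for the full set of simulated SEVIRI infrared channels, and scikit-learn's MLPClassifier stands in for the actual VADUGS network.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Toy stand-in for simulated infrared brightness temperatures: ash layers
# tend to show a negative 10.8-12.0 um difference, other scenes a positive one.
rng = np.random.default_rng(1)
n = 5000
bt108 = rng.normal(260, 15, n)
ash = rng.random(n) < 0.3
bt120 = bt108 + np.where(ash, rng.normal(-2.0, 0.8, n), rng.normal(1.5, 0.8, n))

X = np.column_stack([bt108, bt120, bt108 - bt120])
X_tr, X_te, y_tr, y_te = train_test_split(X, ash, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=1000, random_state=0)
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```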

  1. The integrated water balance and soil data set of the Rollesbroich hydrological observatory

    Science.gov (United States)

    Qu, Wei; Bogena, Heye R.; Huisman, Johan A.; Schmidt, Marius; Kunkel, Ralf; Weuthen, Ansgar; Schiedung, Henning; Schilling, Bernd; Sorg, Jürgen; Vereecken, Harry

    2016-10-01

    The Rollesbroich headwater catchment located in western Germany is a densely instrumented hydrological observatory and part of the TERENO (Terrestrial Environmental Observatories) initiative. The measurements acquired in this observatory present a comprehensive data set that contains key hydrological fluxes in addition to important hydrological states and properties. Meteorological data (i.e., precipitation, air temperature, air humidity, radiation components, and wind speed) are continuously recorded and actual evapotranspiration is measured using the eddy covariance technique. Runoff is measured at the catchment outlet with a gauging station. In addition, spatiotemporal variations in soil water content and temperature are measured at high resolution with a wireless sensor network (SoilNet). Soil physical properties were determined using standard laboratory procedures from samples taken at a large number of locations in the catchment. This comprehensive data set can be used to validate remote sensing retrievals and hydrological models, to improve the understanding of the spatiotemporal dynamics of soil water content, to optimize data assimilation and inverse techniques for hydrological models, and to develop upscaling and downscaling procedures for soil water content information. The complete data set is freely available online (http://www.tereno.net, doi:10.5880/TERENO.2016.001, doi:10.5880/TERENO.2016.004, doi:10.5880/TERENO.2016.003) and additionally referenced by three persistent identifiers securing long-term data and metadata availability.

  2. International Spinal Cord Injury Core Data Set (version 2.0)

    DEFF Research Database (Denmark)

    Biering-Sørensen, F; DeVivo, M J; Charlifue, S

    2017-01-01

    STUDY DESIGN: The study design includes expert opinion, feedback, revisions and final consensus. OBJECTIVES: The objective of the study was to present the new knowledge obtained since the International Spinal Cord Injury (SCI) Core Data Set (Version 1.0) was published in 2006, and to describe the adjustments made in Version 2.0, including standardization of data reporting. SETTING: International. METHODS: Comments received from the SCI community were discussed in a working group (WG); suggestions from the WG were reviewed and revisions were made. All suggested revisions were considered, and a final version was circulated for final approval. RESULTS: The International SCI Core Data Set (Version 2.0) consists of 25 variables. Changes made to this version include the deletion of one variable, 'Total Days Hospitalized', and the addition of two variables, 'Date of Rehabilitation Admission' and 'Date of Death…'

  3. Visualization of big data security: a case study on the KDD99 cup data set

    Directory of Open Access Journals (Sweden)

    Zichan Ruan

    2017-11-01

    Full Text Available Cyber security has been thrust into the limelight in the modern technological era because of an array of attacks that often bypass untrained intrusion detection systems (IDSs). Therefore, greater attention has been directed at deciphering better methods for identifying attack types so that IDSs can be trained more effectively. Key cyber-attack insights exist in big data; however, an efficient approach is required to determine strong attack types to train IDSs to become more effective in key areas. Despite the rising growth in IDS research, there is a lack of studies involving big data visualization, which is key. The KDD99 data set has served as a strong benchmark since 1999; therefore, we utilized this data set in our experiment. In this study, we utilized a hash algorithm, a weight table, and a sampling method to deal with the inherent problems caused by analyzing big data: volume, variety, and velocity. By utilizing a visualization algorithm, we were able to gain insights into the KDD99 data set with a clear identification of “normal” clusters and described distinct clusters of effective attacks.
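
    The abstract does not spell out the weight table, but the idea of weighting a sample so that rare attack types survive the volume reduction can be sketched as follows; the class counts and the inverse-frequency weighting are invented stand-ins for the paper's hash/weight-table scheme.

```python
import random
from collections import Counter

# Toy stand-in for KDD99 connection records labelled by attack type.
records = [("smurf",)] * 50000 + [("neptune",)] * 20000 \
        + [("normal",)] * 30000 + [("rootkit",)] * 10
random.seed(0)

# Weight table: each label's total weight is equal, so rare attack types
# remain visible after down-sampling instead of being drowned out.
counts = Counter(label for (label,) in records)
weights = [1.0 / counts[label] for (label,) in records]

sample = random.choices(records, weights=weights, k=2000)
print(Counter(label for (label,) in sample))
```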

  4. Taming the data wilderness with the VHO: Integrating heliospheric data sets

    Science.gov (United States)

    Schroeder, P.; Szabo, A.; Narock, T.

    Currently, space physicists are faced with a bewildering array of heliospheric missions, experiments, and data sets available at archives distributed around the world. Daunting even for those most familiar with the field, physicists in other concentrations (solar physics, magnetospheric physics, etc.) find locating the heliospheric data that they need extremely challenging, if not impossible. The Virtual Heliospheric Observatory (VHO) will help to solve this problem by creating an Application Programming Interface (API) and web portal that integrates these data sets to find the highest quality data for a given task. The VHO will locate the best available data, often found only at PI institutions rather than at national archives like the NSSDC. The VHO will therefore facilitate a dynamic data environment where improved data products are made available immediately. In order to accomplish this, the VHO will enforce a metadata standard on participating data providers with sufficient depth to allow for meaningful scientific evaluation of similar data products. The VHO will provide an automated way for secondary sites to keep mirrors of data archives up to date, encouraging the generation of secondary or added-value data products. The VHO will interact seamlessly with the Virtual Solar Observatory (VSO) and other Virtual Observatories (VxOs) to allow for inter-disciplinary data searching. Software tools for these data sets will also be available through the VHO. Finally, the VHO will provide linkages to the modeling community and will develop metadata standards for the…

  5. A full scale approximation of covariance functions for large spatial data sets

    KAUST Repository

    Sang, Huiyan; Huang, Jianhua Z.

    2011-01-01

    Gaussian process models have been widely used in spatial statistics but face tremendous computational challenges for very large data sets. The model fitting and spatial prediction of such models typically require O(n³) operations for a data set of size n. Various approximations of the covariance functions have been introduced to reduce the computational cost. However, most existing approximations cannot simultaneously capture both the large- and the small-scale spatial dependence. A new approximation scheme is developed to provide a high-quality approximation to the covariance function at both the large and the small spatial scales. The new approximation is the summation of two parts: a reduced rank covariance and a compactly supported covariance obtained by tapering the covariance of the residual of the reduced rank approximation. Whereas the former part mainly captures the large-scale spatial variation, the latter part captures the small-scale, local variation that is unexplained by the former part. By combining the reduced rank representation and sparse matrix techniques, our approach allows for efficient computation for maximum likelihood estimation, spatial prediction and Bayesian inference. We illustrate the new approach with simulated and real data sets. © 2011 Royal Statistical Society.
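
    A one-dimensional toy version of the scheme makes the two-part construction explicit. Everything below is illustrative: an exponential covariance, a predictive-process-style reduced-rank part built from a small set of knots, and a Wendland-type compactly supported taper applied to the residual covariance.

```python
import numpy as np

def expcov(a, b, range_=0.3):
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-d / range_)

def taper(a, b, gamma=0.15):
    # Wendland-type compactly supported correlation (exactly zero beyond gamma).
    d = np.abs(a[:, None] - b[None, :]) / gamma
    return np.clip(1 - d, 0, None) ** 4 * (4 * d + 1)

s = np.linspace(0, 1, 400)           # observation locations (1-D for clarity)
knots = np.linspace(0, 1, 20)        # knots for the reduced-rank part

C_full = expcov(s, s)
# Reduced-rank part: C_sk C_kk^{-1} C_ks captures the large-scale variation.
C_sk, C_kk = expcov(s, knots), expcov(knots, knots)
C_rr = C_sk @ np.linalg.solve(C_kk, C_sk.T)

# Full-scale approximation: reduced rank plus the tapered residual covariance,
# which restores the small-scale dependence the low-rank part misses.
C_fsa = C_rr + (C_full - C_rr) * taper(s, s)

print(f"max abs error of the approximation: {np.abs(C_fsa - C_full).max():.4f}")
```

    In the real method it is the sparsity of the tapered residual, together with the low rank of the first part, that yields fast likelihood and prediction computations; the dense toy matrices here only illustrate the decomposition itself.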

  6. The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets.

    Science.gov (United States)

    González-Recio, O; Jiménez-Montero, J A; Alenda, R

    2013-01-01

    In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. The results showed that the modification of the original boosting algorithm could be run in 1% of the time used with the original algorithm and with negligible differences in accuracy
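
    The core loop of boosting on residuals with a random marker subset per round can be sketched in a few lines. This is a simplification, not the paper's implementation: genotypes and phenotypes are simulated, and the weak learner is a single-SNP least-squares regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 2000                               # animals x SNP markers (toy scale)
X = rng.integers(0, 3, (n, p)).astype(float)   # genotypes coded 0/1/2
beta_true = np.zeros(p)
beta_true[:50] = rng.normal(0, 0.3, 50)        # 50 truly associated markers
y = X @ beta_true + rng.normal(0, 1, n)        # phenotype stand-in

shrink, n_iter, m = 0.10, 300, 100   # shrinkage, rounds, markers per round
pred = np.full(n, y.mean())
for _ in range(n_iter):
    resid = y - pred                           # fit residuals of the committee
    cols = rng.choice(p, m, replace=False)     # "random boosting" subset
    Xc = X[:, cols] - X[:, cols].mean(axis=0)
    j = int(np.abs(Xc.T @ resid).argmax())     # best marker in the subset
    xc = Xc[:, j]
    b = xc @ resid / (xc @ xc)                 # least-squares slope
    pred += shrink * b * xc                    # shrunken update

print(f"in-sample correlation: {np.corrcoef(pred, y)[0, 1]:.2f}")
```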

  7. [Impact of universal vaccination against chicken pox in Navarre, 2006-2010].

    Science.gov (United States)

    Cenoz, M García; Catalán, J Castilla; Zamarbide, F Irisarri; Berastegui, M Arriazu; Gurrea, A Barricarte

    2011-01-01

    In 2007 universal vaccination against chicken pox was introduced in the vaccine calendar of Navarre. The aim of this study is to evaluate the impact of this measure on the incidence of chicken pox in both the vaccinated cohorts (direct effect) and in the unvaccinated cohorts (indirect effect). Chicken pox is a disease of individualized compulsory notification. We analyzed the annual incidence by age groups between 2006 and 2010. Hospital admittances with chicken pox or complicated chicken pox as the principal diagnosis were taken from the minimum basic data set on hospital discharges for the years 2006 to 2009. The incidence of chicken pox has fallen by 93.0%, from 8.04 cases per 1,000 inhabitants in 2006 to 0.56 per 1,000 inhabitants in 2010. In the cohort vaccinated in the second year of life, the incidence of chicken pox has fallen by 96.3%. In the cohorts vaccinated at 10 and 14 years, a fall of 93.6% can also be observed in children from 10 to 14 years, and of 85.0% in those of 15 to 19 years. In the unvaccinated age groups we can observe falls of 88.2% in children under one year, of 73.3% in those of 7 to 9 years, and of 84.6% in people over 20 years. In 2006 there were 25 hospital admissions due to chicken pox in Navarre and in 2009 this figure fell to 7. The rate of admissions fell by 71%. The introduction of universal chicken pox vaccination in Navarre has resulted in a rapid and very steep reduction of the incidence of chicken pox in both vaccinated and unvaccinated people.

  8. Market trials of irradiated chicken

    International Nuclear Information System (INIS)

    Fox, John A.; Olson, Dennis G.

    1998-01-01

    The potential market for irradiated chicken breasts was investigated using a mail survey and a retail trial. Results from the mail survey suggested a significantly higher level of acceptability of irradiated chicken than did the retail trial. A subsequent market experiment involving actual purchases showed levels of acceptability similar to those of the mail survey when similar information about food irradiation was provided.

  9. Detecting representative data and generating synthetic samples to improve learning accuracy with imbalanced data sets.

    Directory of Open Access Journals (Sweden)

    Der-Chiang Li

    Full Text Available It is difficult for learning models to achieve high classification performance with imbalanced data sets, because when one of the classes is much larger than the others, most machine learning and data mining classifiers are overly influenced by the larger classes and ignore the smaller ones. As a result, classification algorithms often have poor learning performance due to slow convergence in the smaller classes. To balance such data sets, this paper presents a strategy that involves reducing the size of the majority data and generating synthetic samples for the minority data. In the reducing operation, we use the box-and-whisker plot approach to exclude outliers and the Mega-Trend-Diffusion method to find representative data from the majority data. To generate the synthetic samples, we propose a counterintuitive hypothesis to find the distributed shape of the minority data, and then produce samples according to this distribution. Four real data sets were used to examine the performance of the proposed approach. We used paired t-tests to compare the Accuracy, G-mean, and F-measure scores of the proposed data pre-processing (PPDP) method merged into the D3C method (PPDP+D3C) with those of the one-sided selection (OSS) method, the well-known SMOTEBoost (SB) method, the normal distribution-based oversampling (NDO) approach, and the PPDP method alone. The results indicate that the classification performance of the proposed approach is better than that of the above-mentioned methods.
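
    The two balancing steps can be illustrated with a small numeric sketch. The fences of the box-and-whisker plot stand in for the paper's outlier-exclusion step, and a simple normal fit stands in for its diffusion-based estimate of the minority distribution; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
majority = rng.normal(0, 1, (1000, 2))
minority = rng.normal(3, 0.5, (40, 2))

# Reduce the majority class: drop points outside the box-and-whisker fences
# (Q1 - 1.5*IQR, Q3 + 1.5*IQR) in any feature.
q1, q3 = np.percentile(majority, [25, 75], axis=0)
iqr = q3 - q1
keep = np.all((majority >= q1 - 1.5 * iqr) & (majority <= q3 + 1.5 * iqr), axis=1)
majority_reduced = majority[keep]

# Oversample the minority class: estimate its distributed shape and draw
# synthetic points from that estimate until the classes are balanced.
mu, sigma = minority.mean(axis=0), minority.std(axis=0, ddof=1)
n_new = len(majority_reduced) - len(minority)
synthetic = rng.normal(mu, sigma, (n_new, 2))
minority_balanced = np.vstack([minority, synthetic])

print(len(majority_reduced), len(minority_balanced))   # roughly balanced classes
```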

  10. Data linkage of inpatient hospitalization and workers' claims data sets to characterize occupational falls.

    Science.gov (United States)

    Bunn, Terry L; Slavova, Svetla; Bathke, Arne

    2007-07-01

    The identification of industry, occupation, and associated injury costs for worker falls in Kentucky has not been fully examined. The purpose of this study was to determine the associations between industry and occupation and 1) hospitalization length of stay; 2) hospitalization charges; and 3) workers' claims costs in workers suffering falls, using linked inpatient hospitalization discharge and workers' claims data sets. Hospitalization cases were selected with ICD-9-CM external cause of injury codes for falls and a payer code of workers' claims for the years 2000-2004. Selection criteria for workers' claims cases were International Association of Industrial Accident Boards and Commissions Electronic Data Interchange Nature (IAIABCEDIN) injuries coded as falls and/or slips. Common data variables between the two data sets, such as date of birth, gender, date of injury, and hospital admission date, were used to perform probabilistic data linkage using LinkSolv software. Statistical analysis was performed with non-parametric tests. Construction falls were the most prevalent for male workers and incurred the highest hospitalization and workers' compensation costs, whereas most female worker falls occurred in the services industry. The largest percentage of male worker falls was from one level to another, while the largest percentage of females experienced a fall, slip, or trip (not otherwise classified). When male construction worker falls were further analyzed, laborers and helpers had longer hospital stays as well as higher total charges when the worker fell from one level to another. Data linkage of hospitalization and workers' claims falls data provides additional information on industry, occupation, and costs that are not available when examining either data set alone.
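
    The linkage step can be pictured with a toy example. LinkSolv computes probabilistic match weights; the crude agreement score below (exact match on date of birth and sex, injury dates within two days) is only a stand-in, and all records are invented.

```python
import pandas as pd

# Toy stand-ins for the two record systems being linked.
hosp = pd.DataFrame({
    "dob": ["1970-01-02", "1980-05-06"],
    "sex": ["M", "F"],
    "injury_date": ["2003-03-10", "2002-07-01"],
    "los_days": [4, 9],
})
claims = pd.DataFrame({
    "dob": ["1970-01-02", "1980-05-06"],
    "sex": ["M", "F"],
    "injury_date": ["2003-03-11", "2002-07-01"],
    "claim_cost": [12000, 35000],
})

# Score every candidate pair on agreement of the shared fields.
pairs = hosp.assign(key=1).merge(claims.assign(key=1), on="key",
                                 suffixes=("_h", "_c"))
date_gap = abs(pd.to_datetime(pairs.injury_date_h)
               - pd.to_datetime(pairs.injury_date_c)).dt.days
pairs["score"] = ((pairs.dob_h == pairs.dob_c).astype(int)
                  + (pairs.sex_h == pairs.sex_c).astype(int)
                  + (date_gap <= 2).astype(int))

links = pairs[pairs.score >= 3]          # accept only fully agreeing pairs
print(links[["dob_h", "sex_h", "los_days", "claim_cost"]])
```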

  11. An evolutionary algorithm for tomographic reconstructions in limited data sets problems

    International Nuclear Information System (INIS)

    Turcanu, Catrinel; Craciunescu, Teddy

    2000-01-01

    The paper proposes a new method for tomographic reconstructions. Unlike nuclear medicine applications, in physical science problems we are often confronted with limited data sets: constraints in the number of projections or limited angle views. The problem of image reconstruction from projections may be considered as a problem of finding an image (solution) having projections that match the experimental ones. In our approach, we choose a statistical correlation coefficient to evaluate the fitness of any potential solution. The optimization process is carried out by an evolutionary algorithm. Our algorithm has some problem-oriented characteristics. One of them is that a chromosome, representing a potential solution, is not linear but coded as a matrix of pixels corresponding to a two-dimensional image. This internal representation mirrors the structure of the problem: slight differences between two points in the original problem space give rise to similar differences once they become coded. Another particular feature is a newly built crossover operator: the grid-based crossover, suitable for high-dimension two-dimensional chromosomes. Except for the population size and the dimension of the cutting grid for the grid-based crossover, all the other parameters of the algorithm are independent of the geometry of the tomographic reconstruction. The performance of the method is evaluated in comparison with a traditional tomographic method, based on the maximization of the entropy of the image, that has proved to work well with limited data sets. The test phantom is typical of an application with limited data sets: the determination of neutron energy spectra with time resolution in the case of short-pulsed neutron emission. Both the qualitative judgement and the quantitative one, based on some figures of merit, point out that the proposed method ensures an improved reconstruction of shapes, sizes and resolution in the image, even in the presence of noise.
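
    A stripped-down version of such an algorithm fits in a short script: image-shaped chromosomes, a correlation fitness against the measured projections, and a grid-based crossover that swaps rectangular cells between parents. The phantom, population sizes and rates below are illustrative only, and only two orthogonal projections are used to mimic a limited data set.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
phantom = np.zeros((N, N))
phantom[5:11, 5:11] = 1.0                      # simple square test object

def project(img):
    # A deliberately limited data set: two orthogonal projections only.
    return np.concatenate([img.sum(axis=0), img.sum(axis=1)])

target = project(phantom)

def fitness(img):
    # Statistical correlation between candidate and "experimental" projections.
    return np.corrcoef(project(img), target)[0, 1]

def grid_crossover(a, b, cell=4):
    # Grid-based crossover: swap whole cells of the image between parents.
    child = a.copy()
    for i in range(0, N, cell):
        for j in range(0, N, cell):
            if rng.random() < 0.5:
                child[i:i + cell, j:j + cell] = b[i:i + cell, j:j + cell]
    return child

pop = [rng.random((N, N)) for _ in range(60)]
for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:20]
    children = [grid_crossover(elite[rng.integers(20)], elite[rng.integers(20)])
                for _ in range(40)]
    for c in children:                         # mutation: perturb a few pixels
        mask = rng.random((N, N)) < 0.02
        c[mask] = rng.random(int(mask.sum()))
    pop = elite + children

pop.sort(key=fitness, reverse=True)
print(f"best fitness after 200 generations: {fitness(pop[0]):.3f}")
```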

  12. Effects of errors and gaps in spatial data sets on assessment of conservation progress.

    Science.gov (United States)

    Visconti, P; Di Marco, M; Álvarez-Romero, J G; Januchowski-Hartley, S R; Pressey, R L; Weeks, R; Rondinini, C

    2013-10-01

    Data on the location and extent of protected areas, ecosystems, and species' distributions are essential for determining gaps in biodiversity protection and identifying future conservation priorities. However, these data sets always come with errors in the maps and associated metadata. Errors are often overlooked in conservation studies, despite their potential negative effects on the reported extent of protection of species and ecosystems. We used 3 case studies to illustrate the implications of 3 sources of errors in reporting progress toward conservation objectives: protected areas with unknown boundaries that are replaced by buffered centroids, propagation of multiple errors in spatial data, and incomplete protected-area data sets. As of 2010, the frequency of protected areas with unknown boundaries in the World Database on Protected Areas (WDPA) caused the estimated extent of protection of 37.1% of the terrestrial Neotropical mammals to be overestimated by an average 402.8% and of 62.6% of species to be underestimated by an average 10.9%. Estimated level of protection of the world's coral reefs was 25% higher when using recent finer-resolution data on coral reefs as opposed to globally available coarse-resolution data. Accounting for additional data sets not yet incorporated into WDPA contributed up to 6.7% of additional protection to marine ecosystems in the Philippines. We suggest ways for data providers to reduce the errors in spatial and ancillary data and ways for data users to mitigate the effects of these errors on biodiversity assessments. © 2013 Society for Conservation Biology.

  13. ATLANTIC BATS: a data set of bat communities from the Atlantic Forests of South America.

    Science.gov (United States)

    Muylaert, Renata D L; Stevens, Richard D; Esbérard, Carlos E L; Mello, Marco A R; Garbino, Guilherme S T; Varzinczak, Luiz H; Faria, Deborah; Weber, Marcelo D M; Kerches Rogeri, Patricia; Regolin, André L; Oliveira, Hernani F M D; Costa, Luciana D M; Barros, Marília A S; Sabino-Santos, Gilberto; Crepaldi de Morais, Mara Ariane; Kavagutti, Vinicius S; Passos, Fernando C; Marjakangas, Emma-Liina; Maia, Felipe G M; Ribeiro, Milton C; Galetti, Mauro

    2017-12-01

    Bats are the second most diverse mammal order and they provide vital ecosystem functions (e.g., pollination, seed dispersal, and nutrient flux in caves) and services (e.g., crop pest suppression). Bats are also important vectors of infectious diseases, harboring more than 100 different virus types. In the present study, we compiled information on bat communities from the Atlantic Forests of South America, a species-rich biome that is highly threatened by habitat loss and fragmentation. The ATLANTIC BATS data set comprises 135 quantitative studies carried out in 205 sites, which cover most vegetation types of the tropical and subtropical Atlantic Forest: dense ombrophilous forest, mixed ombrophilous forest, semideciduous forest, deciduous forest, savanna, steppe, and open ombrophilous forest. The data set includes information on more than 90,000 captures of 98 bat species of eight families. Species richness averaged 12.1 per site, with a median value of 10 species (ranging from 1 to 53 species). Six species occurred in more than 50% of the communities: Artibeus lituratus, Carollia perspicillata, Sturnira lilium, Artibeus fimbriatus, Glossophaga soricina, and Platyrrhinus lineatus. The number of captures divided by sampling effort, a proxy for abundance, varied from 0.000001 to 0.77 individuals·h⁻¹·m⁻² (0.04 ± 0.007 individuals·h⁻¹·m⁻²). Our data set reveals a hyper-dominance of eight species that together comprise 80% of all captures: Platyrrhinus lineatus (2.3%), Molossus molossus (2.8%), Artibeus obscurus (3.4%), Artibeus planirostris (5.2%), Artibeus fimbriatus (7%), Sturnira lilium (14.5%), Carollia perspicillata (15.6%), and Artibeus lituratus (29.2%). © 2017 by the Ecological Society of America.

  14. A reconstructed melanoma data set for evaluating differential treatment benefit according to biomarker subgroups

    Directory of Open Access Journals (Sweden)

    Jaya M. Satagopan

    2017-06-01

    Full Text Available The data presented in this article are related to the research article entitled “Measuring differential treatment benefit across marker specific subgroups: the choice of outcome scale” (Satagopan and Iasonos, 2015) [1]. These data were digitally reconstructed from figures published in Larkin et al. (2015) [2]. This article describes the steps to digitally reconstruct patient-level data on time-to-event outcome and treatment and biomarker groups using published Kaplan-Meier survival curves. The reconstructed data set and the corresponding computer programs are made publicly available to enable further statistical methodology research.
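
    The key step, converting drops in a digitized survival curve back into patient-level event times, can be sketched under a strong no-censoring assumption. The numbers below are invented, and published algorithms such as Guyot et al. (2012) additionally use the reported numbers at risk to allocate censoring; the article's own procedure may differ in detail.

```python
import numpy as np

# Digitized points from a published Kaplan-Meier curve (hypothetical values):
# time in months and the survival probability read off the figure.
times = np.array([0, 3, 6, 9, 12, 18, 24])
surv = np.array([1.00, 0.92, 0.80, 0.71, 0.60, 0.48, 0.41])
n_start = 100                                  # patients at risk at time zero

# With no censoring, S(t) is the fraction still event-free, so each drop in
# the curve converts directly into a count of events in that interval.
at_risk = np.round(surv * n_start).astype(int)
events = -np.diff(at_risk)

patient_times = np.repeat(times[1:], events)   # one event time per patient
patient_event = np.ones_like(patient_times)    # 1 = event observed
print(len(patient_times), "reconstructed patients; first five times:",
      patient_times[:5])
```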

  15. Chemical Topic Modeling: Exploring Molecular Data Sets Using a Common Text-Mining Approach.

    Science.gov (United States)

    Schneider, Nadine; Fechner, Nikolas; Landrum, Gregory A; Stiefl, Nikolaus

    2017-08-28

    Big data is one of the key transformative factors which increasingly influences all aspects of modern life. Although this transformation brings vast opportunities it also generates novel challenges, not the least of which is organizing and searching this data deluge. The field of medicinal chemistry is no different: more and more data are being generated, for instance, by technologies such as DNA encoded libraries, peptide libraries, text mining of large literature corpora, and new in silico enumeration methods. Handling those huge sets of molecules effectively is quite challenging and requires compromises that often come at the expense of the interpretability of the results. In order to find an intuitive and meaningful approach to organizing large molecular data sets, we adopted a probabilistic framework called "topic modeling" from the text-mining field. Here we present the first chemistry-related implementation of this method, which allows large molecule sets to be assigned to "chemical topics" and the relationships between them to be investigated. In this first study, we thoroughly evaluate this novel method in different experiments and discuss both its disadvantages and advantages. We show very promising results in reproducing human-assigned concepts, using the approach to identify and retrieve chemical series from sets of molecules. We have also created an intuitive visualization of the chemical topics output by the algorithm. This is a huge benefit compared to other unsupervised machine-learning methods, like clustering, which are commonly used to group sets of molecules. Finally, we applied the new method to the 1.6 million molecules of the ChEMBL22 data set to test its robustness and efficiency. In about 1 h we built a 100-topic model of this large data set in which we could identify interesting topics like "proteins", "DNA", or "steroids". Along with this publication we provide our data sets and an open-source implementation of the new method (CheTo), which…
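
    The mechanics of topic modeling on molecules mirror text exactly: molecules play the role of documents and substructure fragments the role of words. The sketch below runs scikit-learn's LDA on a synthetic molecule-by-fragment count matrix with three planted topics; in the actual work the counts come from hashed fingerprint substructures, which are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic molecule-by-fragment counts drawn from three planted "topics".
rng = np.random.default_rng(0)
n_mols, n_frags, n_topics = 300, 200, 3
topic_word = rng.dirichlet(np.full(n_frags, 0.1), n_topics)
doc_topic = rng.dirichlet(np.full(n_topics, 0.3), n_mols)
counts = np.array([rng.multinomial(50, doc_topic[i] @ topic_word)
                   for i in range(n_mols)])

lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
theta = lda.fit_transform(counts)              # molecule-topic loadings

# Molecules sharing a dominant topic behave like a retrieved chemical series.
print("dominant topic of the first 10 molecules:", theta[:10].argmax(axis=1))
```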

  16. A global, high-resolution data set of ice sheet topography, cavity geometry, and ocean bathymetry

    Science.gov (United States)

    Schaffer, Janin; Timmermann, Ralph; Arndt, Jan Erik; Savstrup Kristensen, Steen; Mayer, Christoph; Morlighem, Mathieu; Steinhage, Daniel

    2016-10-01

    The ocean plays an important role in modulating the mass balance of the polar ice sheets by interacting with the ice shelves in Antarctica and with the marine-terminating outlet glaciers in Greenland. Given that the flux of warm water onto the continental shelf and into the sub-ice cavities is steered by complex bathymetry, a detailed topography data set is an essential ingredient for models that address ice-ocean interaction. We followed the spirit of the global RTopo-1 data set and compiled consistent maps of global ocean bathymetry, upper and lower ice surface topographies, and global surface height on a spherical grid with now 30 arcsec grid spacing. For this new data set, called RTopo-2, we used the General Bathymetric Chart of the Oceans (GEBCO_2014) as the backbone and added the International Bathymetric Chart of the Arctic Ocean version 3 (IBCAOv3) and the International Bathymetric Chart of the Southern Ocean (IBCSO) version 1. While RTopo-1 primarily aimed at a good and consistent representation of the Antarctic ice sheet, ice shelves, and sub-ice cavities, RTopo-2 now also contains ice topographies of the Greenland ice sheet and outlet glaciers. In particular, we aimed at a good representation of the fjord and shelf bathymetry surrounding the Greenland continent. We modified data from earlier gridded products in the areas of Petermann Glacier, Hagen Bræ, and Sermilik Fjord, assuming that sub-ice and fjord bathymetries roughly follow plausible Last Glacial Maximum ice flow patterns. For the continental shelf off Northeast Greenland and the floating ice tongue of Nioghalvfjerdsfjorden Glacier at about 79° N, we incorporated a high-resolution digital bathymetry model considering original multibeam survey data for the region. Radar data for surface topographies of the floating ice tongues of Nioghalvfjerdsfjorden Glacier and Zachariæ Isstrøm have been obtained from the data centres of Technical University of Denmark (DTU), Operation Icebridge (NASA

  17. The NAFE'05/CoSMOS Data Set: Toward SMOS Soil Moisture Retrieval, Downscaling, and Assimilation

    DEFF Research Database (Denmark)

    Panciera, Rocco; Walker, Jeffrey P.; Kalma, Jetse D.

    2008-01-01

    The National Airborne Field Experiment 2005 (NAFE'05) and the Campaign for validating the Operation of Soil Moisture and Ocean Salinity (COSMOS) were undertaken in November 2005 in the Goulburn River catchment, which is located in southeastern Australia. The objective of the joint campaign......-resolution data from SMOS; and 3) testing its assimilation into land surface models for root zone soil moisture retrieval. This paper describes the NAFE'05 and COSMOS airborne data sets together with the ground data collected in support of both aircraft campaigns. The airborne L-band acquisitions included 40 km x...

  18. EASE-Grid 2.0: Incremental but Significant Improvements for Earth-Gridded Data Sets

    OpenAIRE

    Brodzik, Mary J.; Billingsley, Brendan; Haran, Terry; Raup, Bruce; Savoie, Matthew H.

    2012-01-01

    Defined in the early 1990s for use with gridded satellite passive microwave data, the Equal-Area Scalable Earth Grid (EASE-Grid) was quickly adopted and used for distribution of a variety of satellite and in situ data sets. Conceptually easy to understand, EASE-Grid suffers from limitations that make it impossible to format in the widely popular GeoTIFF convention without reprojection. Importing EASE-Grid data into standard mapping software packages is nontrivial and error-prone. This article...

  19. Iterative algorithm of discrete Fourier transform for processing randomly sampled NMR data sets

    International Nuclear Information System (INIS)

    Stanek, Jan; Kozminski, Wiktor

    2010-01-01

    Spectra obtained by application of multidimensional Fourier Transformation (MFT) to sparsely sampled nD NMR signals are usually corrupted due to missing data. In the present paper this phenomenon is investigated on simulations and experiments. An effective iterative algorithm for artifact suppression for sparse on-grid NMR data sets is discussed in detail. It includes automated peak recognition based on statistical methods. The results enable one to study NMR spectra of high dynamic range of peak intensities, preserving the benefits of random sampling, namely the superior resolution in indirectly measured dimensions. Experimental examples include 3D ¹⁵N- and ¹³C-edited NOESY-HSQC spectra of human ubiquitin.
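
    The flavour of such artifact suppression can be shown with a CLEAN-style loop on a sparsely sampled one-dimensional signal: repeatedly pick the strongest spectral peak, then subtract its contribution at the sampled points only, so the sampling artifacts it generated vanish with it. The signal, sampling density and simple maximum-peak picking below are illustrative; the published algorithm recognizes peaks statistically.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 512
t = np.arange(n)
# Synthetic FID with two on-grid frequencies; only 25% of points are sampled.
fid = np.exp(2j * np.pi * 60 * t / n) + 0.4 * np.exp(2j * np.pi * 160 * t / n)
mask = np.zeros(n, bool)
mask[rng.choice(n, n // 4, replace=False)] = True

residual = np.where(mask, fid, 0)
clean = np.zeros(n, complex)
for _ in range(50):
    spec = np.fft.fft(residual)
    k = int(np.abs(spec).argmax())        # strongest remaining peak
    amp = spec[k] / mask.sum()            # correct for the sampling density
    clean[k] += amp
    # Remove the recognized peak at the sampled points only.
    residual -= np.where(mask, amp * np.exp(2j * np.pi * k * t / n), 0)

top = sorted(int(i) for i in np.abs(clean).argsort()[-2:])
print("recovered frequency bins:", top,
      "amplitudes:", [round(float(abs(clean[k])), 2) for k in top])
```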

  20. An evaluation of sampling and full enumeration strategies for Fisher Jenks classification in big data settings

    Science.gov (United States)

    Rey, Sergio J.; Stephens, Philip A.; Laura, Jason R.

    2017-01-01

    Large data contexts present a number of challenges to optimal choropleth map classifiers. Application of optimal classifiers to a sample of the attribute space is one proposed solution. The properties of alternative sampling-based classification methods are examined through a series of Monte Carlo simulations. The impacts of spatial autocorrelation, number of desired classes, and form of sampling are shown to have significant impacts on the accuracy of map classifications. Tradeoffs between improved speed of the sampling approaches and loss of accuracy are also considered. The results suggest the possibility of guiding the choice of classification scheme as a function of the properties of large data sets.
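
    The speed-versus-accuracy trade-off is easy to reproduce. Fisher-Jenks finds the one-dimensional classification that minimizes within-class variance; since that optimum coincides with 1-D k-means, the sketch below uses k-means as a stand-in and compares breaks computed on the full attribute against breaks computed on a small random sample. The data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# A large, skewed attribute, e.g. counts behind a choropleth map.
values = rng.lognormal(mean=3, sigma=1, size=200_000)

def class_breaks(x, k=5):
    # 1-D k-means as a stand-in for the Fisher-Jenks optimum.
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(x.reshape(-1, 1))
    centers = np.sort(km.cluster_centers_.ravel())
    return (centers[:-1] + centers[1:]) / 2    # breaks midway between centers

full = class_breaks(values)                       # full enumeration
sampled = class_breaks(rng.choice(values, 5000))  # sampling-based classifier
print(np.round(full, 1))
print(np.round(sampled, 1))   # close to the full breaks at a fraction of the cost
```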

  1. Data sets for hydrogen reflection and their use in neutral transport calculations

    International Nuclear Information System (INIS)

    Eckstein, W.; Heifetz, D.B.

    1986-08-01

    A realistic characterization of the interaction of ions and neutral particles with device walls is important for any edge plasma calculation. Present reflection models vary in detail and computational efficiency. This paper presents a data set for the distribution of the reflection coefficient, R_N, over reflected energy, polar, and azimuthal angles, as functions of incident polar angle and energy. These results have been computed using a vectorized version of the TRIM Monte Carlo code. The data are stored using an algorithm for reducing the data into three one-dimensional distributions, resulting in a realistic reflection model which can be used very efficiently in plasma edge calculations. (orig.)

  2. The Role in the Virtual Astronomical Observatory in the Era of Massive Data Sets

    Science.gov (United States)

    Berriman, G. Bruce; Hanisch, Robert J.; Lazio, T. Joseph W.

    2012-01-01

    The Virtual Observatory (VO) is realizing global electronic integration of astronomy data. One of the long-term goals of the U.S. VO project, the Virtual Astronomical Observatory (VAO), is development of services and protocols that respond to the growing size and complexity of astronomy data sets. This paper describes how VAO staff are active in such development efforts, especially in innovative strategies and techniques that recognize the limited operating budgets likely available to astronomers even as demand increases. The project has a program of professional outreach whereby new services and protocols are evaluated.

  3. A search for refraction in the Kepler gas giant data set

    Science.gov (United States)

    Sheets, Holly A.; Jacob, Laurent; Cowan, Nicolas; Deming, Drake

    2018-06-01

    I present the results of our systematic search for refraction in the atmospheres of giant planets in the Kepler data set. We chose our candidates using the approximations of Sidis and Sari (ApJ, 2010, 720, 904S), selecting those that had an expected signal greater than 10 parts per million. We model the refraction shoulders as simple exponentials outside of transit and fit a transit+shoulder model to individual candidates. We find that the effect is not present to the extent predicted from the approximations.
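
    The abstract's "transit + shoulder" model can be written down directly: a box-shaped transit plus exponential brightening shoulders just outside ingress and egress. The functional form below follows that one-line description, but the parameter values and the box transit are invented simplifications.

```python
import numpy as np

def transit_plus_shoulders(t, t0=0.0, dur=0.2, depth=1e-3, amp=2e-5, tau=0.05):
    # Box-shaped transit of the given depth plus exponential refraction
    # shoulders that brighten the light curve outside ingress and egress.
    flux = np.ones_like(t)
    flux[np.abs(t - t0) < dur / 2] -= depth
    before = t < t0 - dur / 2
    after = t > t0 + dur / 2
    flux[before] += amp * np.exp(-(t0 - dur / 2 - t[before]) / tau)
    flux[after] += amp * np.exp(-(t[after] - (t0 + dur / 2)) / tau)
    return flux

t = np.linspace(-0.5, 0.5, 1001)
model = transit_plus_shoulders(t)
print(f"out-of-transit max excess: {model.max() - 1:.1e}")   # the shoulder signal
```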

  4. The WACMOS-ET project – Part 2: Evaluation of global terrestrial evaporation data sets

    KAUST Repository

    Miralles, D. G.; Jiménez, C.; Jung, M.; Michel, D.; Ershadi, Ali; McCabe, Matthew; Hirschi, M.; Martens, B.; Dolman, A. J.; Fisher, J. B.; Mu, Q.; Seneviratne, S. I.; Wood, E. F.; Fernández-Prieto, D.

    2015-01-01

    The WAter Cycle Multi-mission Observation Strategy – EvapoTranspiration (WACMOS-ET) project aims to advance the development of land evaporation estimates on global and regional scales. Its main objective is the derivation, validation, and intercomparison of a group of existing evaporation retrieval algorithms driven by a common forcing data set. Three commonly used process-based evaporation methodologies are evaluated: the Penman–Monteith algorithm behind the official Moderate Resolution Imaging Spectroradiometer (MODIS) evaporation product (PM-MOD), the Global Land Evaporation Amsterdam Model (GLEAM), and the Priestley–Taylor Jet Propulsion Laboratory model (PT-JPL). The resulting global spatiotemporal variability of evaporation, the closure of regional water budgets, and the discrete estimation of land evaporation components or sources (i.e. transpiration, interception loss, and direct soil evaporation) are investigated using river discharge data, independent global evaporation data sets and results from previous studies. In a companion article (Part 1), Michel et al. (2016) inspect the performance of these three models at local scales using measurements from eddy-covariance towers and include in the assessment the Surface Energy Balance System (SEBS) model. In agreement with Part 1, our results indicate that the Priestley and Taylor products (PT-JPL and GLEAM) perform best overall for most ecosystems and climate regimes. While all three evaporation products adequately represent the expected average geographical patterns and seasonality, there is a tendency in PM-MOD to underestimate the flux in the tropics and subtropics. Overall, results from GLEAM and PT-JPL appear more realistic when compared to surface water balances from 837 globally distributed catchments and to separate evaporation estimates from ERA-Interim and the model tree ensemble (MTE). Nonetheless, all products show large dissimilarities during conditions of water stress and drought and…

  5. A Data Set of Portuguese Traditional Recipes Based on Published Cookery Books

    Directory of Open Access Journals (Sweden)

    Alexandra Soveral Dias

    2018-03-01

    Full Text Available This paper presents a data set resulting from the abstraction of books of traditional recipes for Portuguese cuisine. Only starters, main courses, side dishes, and soups were considered. Desserts, cakes, sweets, puddings, and pastries were not included. Recipes were characterized by the province and ingredients regardless of quantities or preparation. An exploratory characterization of recipes and ingredients is presented. Results show that Portuguese traditional recipes organize differently among the eleven provinces considered, setting up the basis for more detailed analyses of the 1382 recipes and 421 ingredients inventoried.

  6. MXA: a customizable HDF5-based data format for multi-dimensional data sets

    International Nuclear Information System (INIS)

    Jackson, M; Simmons, J P; De Graef, M

    2010-01-01

    A new digital file format is proposed for the long-term archival storage of experimental data sets generated by serial sectioning instruments. The format is known as the multi-dimensional eXtensible Archive (MXA) format and is based on the public domain Hierarchical Data Format (HDF5). The MXA data model and its description by means of an eXtensible Markup Language (XML) file with an associated Document Type Definition (DTD) are described in detail. The public domain MXA package is available through a dedicated web site (mxa.web.cmu.edu), along with implementation details and example data files.

  7. SAR matrices: automated extraction of information-rich SAR tables from large compound data sets.

    Science.gov (United States)

    Wassermann, Anne Mai; Haebel, Peter; Weskamp, Nils; Bajorath, Jürgen

    2012-07-23

    We introduce the SAR matrix data structure that is designed to elucidate SAR patterns produced by groups of structurally related active compounds, which are extracted from large data sets. SAR matrices are systematically generated and sorted on the basis of SAR information content. Matrix generation is computationally efficient and enables processing of large compound sets. The matrix format is reminiscent of SAR tables, and SAR patterns revealed by different categories of matrices are easily interpretable. The structural organization underlying matrix formation is more flexible than standard R-group decomposition schemes. Hence, the resulting matrices capture SAR information in a comprehensive manner.

  8. Envision: An interactive system for the management and visualization of large geophysical data sets

    Science.gov (United States)

    Searight, K. R.; Wojtowicz, D. P.; Walsh, J. E.; Pathi, S.; Bowman, K. P.; Wilhelmson, R. B.

    1995-01-01

    Envision is a software project at the University of Illinois and Texas A&M, funded by NASA's Applied Information Systems Research Project. It provides researchers in the geophysical sciences convenient ways to manage, browse, and visualize large observed or model data sets. Envision integrates data management, analysis, and visualization of geophysical data in an interactive environment. It employs commonly used standards in data formats, operating systems, networking, and graphics. It also attempts, wherever possible, to integrate with existing scientific visualization and analysis software. Envision has an easy-to-use graphical interface, distributed process components, and an extensible design. It is a public domain package, freely available to the scientific community.

  9. Analysis of Availability of Data Sets Necessary for Decision Making in Air Quality Assessment

    Science.gov (United States)

    Sanzhapov, B. Kh; Rashevskiy, N. M.; Sinitsyn, A. A.

    2017-11-01

    The article analyzes the problems in obtaining the environmental information data sets necessary for decision making in the field of air quality assessment. The authors describe their experience in searching for the data required to run the WRF and Calpuff systems to estimate air quality in the city of Volgograd, the Russian Federation. The problems the authors encountered, and which other researchers may encounter, are indicated. A conceptual scheme of a resource that will help to increase the speed of searching for necessary information, and also inform users about the rules for use of a particular resource, is suggested.

  10. Hydro-Climatic Data Network (HCDN) Streamflow Data Set, 1874-1988

    Science.gov (United States)

    Slack, James Richard; Lumb, Alan M.; Landwehr, Jurate Maciunas

    1993-01-01

    The potential consequences of climate change to continental water resources are of great concern in the management of those resources. Critically important to society is what effect fluctuations in the prevailing climate may have on hydrologic conditions, such as the occurrence and magnitude of floods or droughts and the seasonal distribution of water supplies within a region. Records of streamflow that are unaffected by artificial diversions, storage, or other works of man in or on the natural stream channels or in the watershed can provide an account of hydrologic responses to fluctuations in climate. By examining such records given known past meteorologic conditions, we can better understand hydrologic responses to those conditions and anticipate the effects of postulated changes in current climate regimes. Furthermore, patterns in streamflow records can indicate when a change in the prevailing climate regime may have occurred in the past, even in the absence of concurrent meteorologic records. A streamflow data set, which is specifically suitable for the study of surface-water conditions throughout the United States under fluctuations in the prevailing climatic conditions, has been developed. This data set, called the Hydro-Climatic Data Network, or HCDN, consists of streamflow records for 1,659 sites throughout the United States and its Territories. Records cumulatively span the period 1874 through 1988, inclusive, and represent a total of 73,231 water years of information. Development of the HCDN Data Set: Records for the HCDN were obtained through a comprehensive search of the extensive surface-water data holdings of the U.S. Geological Survey (USGS), which are contained in the USGS National Water Storage and Retrieval System (WATSTORE). All streamflow discharge records in WATSTORE through September 30, 1988, were examined for inclusion in the HCDN in accordance with strictly defined criteria of measurement accuracy and natural conditions. No reconstructed

  11. Evaluating the relationship between evolutionary divergence and phylogenetic accuracy in AFLP data sets.

    Science.gov (United States)

    García-Pereira, María Jesús; Caballero, Armando; Quesada, Humberto

    2010-05-01

    Using in silico amplified fragment length polymorphism (AFLP) fingerprints, we explore the relationship between sequence similarity and phylogeny accuracy to test when, in terms of genetic divergence, the quality of AFLP data becomes too low to be informative for a reliable phylogenetic reconstruction. We generated DNA sequences with known phylogenies using balanced and unbalanced trees with recent, uniform and ancient radiations, and average branch lengths (from the most internal node to the tip) ranging from 0.02 to 0.4 substitutions per site. The resulting sequences were used to emulate the AFLP procedure. Trees were estimated by maximum parsimony (MP), neighbor-joining (NJ), and minimum evolution (ME) methods from both DNA sequences and virtual AFLP fingerprints. The estimated trees were compared with the reference trees using a score that measures overall differences in both topology and relative branch length. As expected, the accuracy of AFLP-based phylogenies decreased dramatically in the more divergent data sets. Above a divergence of approximately 0.05, AFLP-based phylogenies were largely inaccurate irrespective of the distinct topology, radiation model, or phylogenetic method used. This value represents an upper bound of expected tree accuracy for data sets with a simple divergence history; AFLP data sets with a similar divergence but with unbalanced topologies and short ancestral branches produced much less accurate trees. The lack of homology of AFLP bands quickly increases with divergence and reaches its maximum value (100%) at a divergence of only 0.4. Low guanine-cytosine (GC) contents increase the number of nonhomologous bands in AFLP data sets and lead to less reliable trees. However, the effect of the lack of band homology on tree accuracy is surprisingly small relative to the negative impact due to the low information content of AFLP characters. Tree-building methods based on genetic distance displayed similar trends and outperformed parsimony

  12. Security Optimization for Distributed Applications Oriented on Very Large Data Sets

    Directory of Open Access Journals (Sweden)

    Mihai DOINEA

    2010-01-01

    Full Text Available The paper presents the main characteristics of applications which work with very large data sets and the issues related to their security. The first section addresses the optimization process and how it is approached when dealing with security. The second section describes the concept of very large data set management, while in the third section the related risks are identified and classified. Finally, a security optimization schema is presented with a cost-efficiency analysis of its feasibility. Conclusions are drawn and future approaches are identified.

  13. The WACMOS-ET project – Part 2: Evaluation of global terrestrial evaporation data sets

    KAUST Repository

    Miralles, D. G.

    2015-10-19

    The WAter Cycle Multi-mission Observation Strategy – EvapoTranspiration (WACMOS-ET) project aims to advance the development of land evaporation estimates on global and regional scales. Its main objective is the derivation, validation, and intercomparison of a group of existing evaporation retrieval algorithms driven by a common forcing data set. Three commonly used process-based evaporation methodologies are evaluated: the Penman–Monteith algorithm behind the official Moderate Resolution Imaging Spectroradiometer (MODIS) evaporation product (PM-MOD), the Global Land Evaporation Amsterdam Model (GLEAM), and the Priestley–Taylor Jet Propulsion Laboratory model (PT-JPL). The resulting global spatiotemporal variability of evaporation, the closure of regional water budgets, and the discrete estimation of land evaporation components or sources (i.e. transpiration, interception loss, and direct soil evaporation) are investigated using river discharge data, independent global evaporation data sets and results from previous studies. In a companion article (Part 1), Michel et al. (2016) inspect the performance of these three models at local scales using measurements from eddy-covariance towers and include in the assessment the Surface Energy Balance System (SEBS) model. In agreement with Part 1, our results indicate that the Priestley and Taylor products (PT-JPL and GLEAM) perform best overall for most ecosystems and climate regimes. While all three evaporation products adequately represent the expected average geographical patterns and seasonality, there is a tendency in PM-MOD to underestimate the flux in the tropics and subtropics. Overall, results from GLEAM and PT-JPL appear more realistic when compared to surface water balances from 837 globally distributed catchments and to separate evaporation estimates from ERA-Interim and the model tree ensemble (MTE). Nonetheless, all products show large dissimilarities during conditions of water stress and drought and…

  14. DNA microarray global gene expression analysis of influenza virus-infected chicken and duck cells

    Directory of Open Access Journals (Sweden)

    Suresh V. Kuchipudi

    2015-06-01

    Full Text Available The data described in this article pertain to the article by Kuchipudi et al. (2014) titled “Highly Pathogenic Avian Influenza Virus Infection in Chickens But Not Ducks Is Associated with Elevated Host Immune and Pro-inflammatory Responses” [1]. While infection of chickens with highly pathogenic avian influenza (HPAI) H5N1 virus subtypes often leads to 100% mortality within 1 to 2 days, infection of ducks in contrast causes mild or no clinical signs. The rapid onset of fatal disease in chickens, but with no evidence of severe clinical symptoms in ducks, suggests underlying differences in their innate immune mechanisms. We used Chicken GeneChip microarrays (Affymetrix) to analyse the gene expression profiles of primary chicken and duck lung cells infected with a low pathogenic avian influenza (LPAI) H2N3 virus and two HPAI H5N1 virus subtypes to understand the molecular basis of host susceptibility and resistance in chickens and ducks. Here, we describe the experimental design, quality control and analysis that were performed on the data set. The data are publicly available through the Gene Expression Omnibus (GEO) database with accession number GSE33389, and the analysis and interpretation of these data are included in Kuchipudi et al. (2014) [1].

  15. 7 CFR 65.160 - Ground chicken.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 3 2010-01-01 2010-01-01 false Ground chicken. 65.160 Section 65.160 Agriculture... OF BEEF, PORK, LAMB, CHICKEN, GOAT MEAT, PERISHABLE AGRICULTURAL COMMODITIES, MACADAMIA NUTS, PECANS, PEANUTS, AND GINSENG General Provisions Definitions § 65.160 Ground chicken. Ground chicken means...

  16. 7 CFR 65.120 - Chicken.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 3 2010-01-01 2010-01-01 false Chicken. 65.120 Section 65.120 Agriculture Regulations..., PORK, LAMB, CHICKEN, GOAT MEAT, PERISHABLE AGRICULTURAL COMMODITIES, MACADAMIA NUTS, PECANS, PEANUTS, AND GINSENG General Provisions Definitions § 65.120 Chicken. Chicken has the meaning given the term in...

  17. Lipoxygenase in chicken muscle

    International Nuclear Information System (INIS)

    Grossman, S.; Bergman, M.; Sklan, D.

    1988-01-01

    The presence of lipoxygenase-type enzymes was demonstrated in chick muscles. Examination of the oxidation products of [¹⁴C]arachidonic acid revealed the presence of 15-lipoxygenase. The enzyme was partially purified by affinity chromatography on linoleoyl-aminoethyl-Sepharose. The enzyme was stable on frozen storage, and activity was almost completely preserved after 12 months of storage at −20 °C. During this period the content of cis,cis-1,4-pentadiene fatty acids decreased slightly. It is suggested that lipoxygenase may be responsible for some of the oxidative changes occurring in fatty acids during frozen storage of chicken meat

  18. The 1 km AVHRR global land data set: first stages in implementation

    Science.gov (United States)

    Eidenshink, J.C.; Faundeen, J.L.

    1994-01-01

    The global land 1 km data set project represents an international effort to acquire, archive, process, and distribute 1 km AVHRR data of the entire global land surface in order to meet the needs of the international science community. A network of 26 high resolution picture transmission (HRPT) stations, along with data recorded by the National Oceanic and Atmospheric Administration (NOAA), has been acquiring daily global land coverage since 1 April 1992. A data set of over 30000 AVHRR images has been archived and made available for distribution by the United States Geological Survey, EROS Data Center and the European Space Agency. Under the guidance of the International Geosphere Biosphere programme, processing standards for the AVHRR data have been developed for calibration, atmospheric correction, geometric registration, and the production of global 10-day maximum normalized difference vegetation index (NDVI) composites. The major uses of the composites are related to the study of surface vegetation cover. A prototype 10-day composite was produced for the period of 21–30 June 1992. Production of an 18-month time series of 10-day composites is underway.

  19. A proposed minimum data set for international primary care optometry: a modified Delphi study.

    Science.gov (United States)

    Davey, Christopher J; Slade, Sarah V; Shickle, Darren

    2017-07-01

    To identify a minimum list of metrics of international relevance to public health, research and service development which can be extracted from practice management systems and electronic patient records in primary optometric practice. A two stage modified Delphi technique was used. Stage 1 categorised metrics that may be recorded as being part of a primary eye examination by their importance to research using the results from a previous survey of 40 vision science and public health academics. Delphi stage 2 then gauged the opinion of a panel of seven vision science academics and achieved consensus on contentious metrics and methods of grading/classification. A consensus regarding inclusion and response categories was achieved for nearly all metrics. A recommendation was made of 53 metrics which would be appropriate in a minimum data set. This minimum data set should be easily integrated into clinical practice yet allow vital data to be collected internationally from primary care optometry. It should not be mistaken for a clinical guideline and should not add workload to the optometrist. A pilot study incorporating an additional Delphi stage prior to implementation is advisable to refine some response categories. © 2017 The Authors. Ophthalmic and Physiological Optics published by John Wiley & Sons Ltd on behalf of College of Optometrists.

  20. Global change data sets: Excerpts from the Master Directory, version 2.0

    International Nuclear Information System (INIS)

    Beier, J.

    1992-02-01

    The recent awakening to the reality of human-induced changes to the environment has resulted in an organized effort to promote global change research. The goal of this research, as outlined by NASA's Earth System Science Committee (Earth System Science: A Closer View, 1988), is to understand the entire Earth system on a global scale by describing how its component parts and their interactions have evolved, how they function, and how they may be expected to evolve on all timescales. The practical result is the capacity to predict that evolution over the next decade to century. Key variables important for the study of global change include external forcing factors (solar radiance, UV flux), radiatively and chemically important trace species (CO2, CH4, N2O, etc.), atmospheric response variables (temperature, pressure, winds), land-surface properties (river run-off, snow cover, albedo, soil moisture, vegetation cover), and oceanic variables (sea surface temperature, sea ice extent, sea level, ocean wind stress, currents, chlorophyll, biogeochemical fluxes). The purpose of this document is to identify existing data sets (both remotely sensed and in situ) covering some of these variables. This is not intended to be a complete list of global change data, but merely a highlight of what is available. The information was extracted from the Master Directory (MD), an on-line scientific data information service which may be used by any researcher. This report contains the coverage dates for the data sets, the sources (satellites, instruments) of the data, and where they are archived

  1. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    Science.gov (United States)

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic data sets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
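
    A minimal sketch of the k-mer spectrum idea behind kmer-SVM is given below: sequences are represented by counts of all length-k substrings and a linear SVM is trained on them, so that large positive weights point to predictive k-mers. It uses scikit-learn rather than the authors' server, the toy sequences are invented, and the reverse-complement handling of the real method is omitted.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Invented toy data: positive (bound/open) vs. background sequences
pos = ["ACGTACGTGG", "ACGTGGGTAC", "TTACGTACGT"]
neg = ["AAAAATTTTT", "TTTTTAAAAA", "ATATATATAT"]
seqs = pos + neg
labels = [1] * len(pos) + [0] * len(neg)

# k-mer spectrum features: counts of all length-4 substrings
vectorizer = CountVectorizer(analyzer="char", ngram_range=(4, 4), lowercase=False)
X = vectorizer.fit_transform(seqs)

svm = LinearSVC(C=1.0)
svm.fit(X, labels)

# Large positive weights point to k-mers predictive of the positive class
weights = dict(zip(vectorizer.get_feature_names_out(), svm.coef_[0]))
print(sorted(weights, key=weights.get, reverse=True)[:5])
```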

  2. NASA's Planetary Data System: Support for the Delivery of Derived Data Sets at the Atmospheres Node

    Science.gov (United States)

    Chanover, Nancy J.; Beebe, Reta; Neakrase, Lynn; Huber, Lyle; Rees, Shannon; Hornung, Danae

    2015-11-01

    NASA’s Planetary Data System is charged with archiving electronic data products from NASA planetary missions that are sponsored by NASA’s Science Mission Directorate. This archive, currently organized by science disciplines, uses standards for describing and storing data that are designed to enable future scientists who are unfamiliar with the original experiments to analyze the data, and to do this using a variety of computer platforms, with no additional support. These standards address the data structure, description contents, and media design. The new requirement in the NASA ROSES-2015 Research Announcement to include a Data Management Plan will result in an increase in the number of derived data sets that are being delivered to the PDS. These data sets may come from the Planetary Data Archiving, Restoration and Tools (PDART) program, other Data Analysis Programs (DAPs) or be volunteered by individuals who are publishing the results of their analysis. In response to this increase, the PDS Atmospheres Node is developing a set of guidelines and user tools to make the process of archiving these derived data products more efficient. Here we provide a description of Atmospheres Node resources, including a letter of support for the proposal stage, a communication schedule for the planned archive effort, product label samples and templates in extensible markup language (XML), documentation templates, and validation tools necessary for producing PDS4-compliant derived data bundles efficiently and accurately.

  3. Bioalerts: a python library for the derivation of structural alerts from bioactivity and toxicity data sets.

    Science.gov (United States)

    Cortes-Ciriano, Isidro

    2016-01-01

    Assessing compound toxicity at early stages of the drug discovery process is a crucial task to dismiss drug candidates likely to fail in clinical trials. Screening drug candidates against structural alerts, i.e. chemical fragments associated with a toxicological response prior to or after being metabolized (bioactivation), has proved a valuable approach for this task. During the last decades, diverse algorithms have been proposed for the automatic derivation of structural alerts from categorical toxicity data sets. Here, the python library bioalerts is presented, which comprises functionalities for the automatic derivation of structural alerts from categorical (dichotomous), e.g. toxic/non-toxic, and continuous bioactivity data sets, e.g. [Formula: see text] or [Formula: see text] values. The library bioalerts relies on the RDKit implementation of the circular Morgan fingerprint algorithm to compute chemical substructures, which are derived by considering radial atom neighbourhoods of increasing bond radius. In addition to the derivation of structural alerts, bioalerts provides functionalities for the calculation of unhashed (keyed) Morgan fingerprints, which can be used in predictive bioactivity modelling with the advantage of allowing for a chemically meaningful deconvolution of the chemical space. Finally, bioalerts provides functionalities for the easy visualization of the derived structural alerts.
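
    The following sketch illustrates the general idea of deriving structural alerts from categorical data with RDKit Morgan substructures: count how often each circular substructure occurs in toxic versus non-toxic compounds and flag strongly enriched ones. It is not the bioalerts API; the toy molecules, the 0.9 enrichment cutoff, and the absence of any significance test are simplifications of what the library actually does.

```python
from collections import Counter

from rdkit import Chem
from rdkit.Chem import AllChem

# Invented toy set of (SMILES, is_toxic) pairs
data = [("c1ccccc1N", True), ("CCO", False), ("c1ccccc1", False), ("CCN", True)]

toxic_counts, all_counts = Counter(), Counter()
for smiles, toxic in data:
    mol = Chem.MolFromSmiles(smiles)
    info = {}
    AllChem.GetMorganFingerprint(mol, 2, bitInfo=info)  # radius-2 substructures
    for bit in info:
        all_counts[bit] += 1
        if toxic:
            toxic_counts[bit] += 1

# Substructures strongly enriched among toxic compounds are alert candidates
for bit, n in toxic_counts.items():
    if n / all_counts[bit] > 0.9:
        print("candidate alert substructure id:", bit)
```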

  4. Solar radiation for buildings application: comparing reduced data set for mediterranean sites

    International Nuclear Information System (INIS)

    La Gennusa, M.; Rizzo, G.; Scaccianoce, G.; Sorrentino, G.

    2006-01-01

    A growing diffusion of computer programs for the thermal simulation of buildings has occurred in recent years; they describe the thermal behaviour of buildings over the year, in order to verify their energy efficiency and to suggest possible improvements, even at the design stage. These thermal simulation programs generally need a complete input data set, including information referring to the climatic conditions of the site where the building will be built. As is well known, a climatic quantity that is particularly important for the thermal energy balance is solar radiation. In this work we have updated a short reference year of solar radiation, namely the Monthly Average Day (MAD), for a town of Southern Italy (Palermo) whose climatic features are similar to those of other places in the Mediterranean basin. In addition, we have compared the MAD climatic data (for global and diffuse solar radiation) obtained from seven years of hourly measurements with those obtained both from geo-astronomical parameters and from the monthly average of daily global solar radiation, which is commonly adopted for this purpose. The comparison suggests particular caution in the choice of the method used for generating reduced solar radiation data sets for these Mediterranean sites. (Author)
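
    As an illustration of how a Monthly Average Day can be built from hourly records, the sketch below averages a multi-year hourly global-radiation series by (month, hour of day). The series is synthetic and the variable names are hypothetical; real MAD construction would also involve quality control of the measurements.

```python
import numpy as np
import pandas as pd

# Synthetic hourly global-radiation series (W/m^2) spanning seven years
idx = pd.date_range("2000-01-01", "2006-12-31 23:00", freq="h")
rng = np.random.default_rng(1)
ghi = pd.Series(rng.uniform(0, 900, len(idx)), index=idx)

# Monthly Average Day: for each month, the mean radiation at each hour of
# day, averaged over every day of that month in the whole record
mad = ghi.groupby([ghi.index.month, ghi.index.hour]).mean()
mad = mad.unstack(level=0)  # rows = hour 0..23, columns = month 1..12
print(mad.round(1))
```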

  5. Parallel analysis tools and new visualization techniques for ultra-large climate data set

    Energy Technology Data Exchange (ETDEWEB)

    Middleton, Don [National Center for Atmospheric Research, Boulder, CO (United States); Haley, Mary [National Center for Atmospheric Research, Boulder, CO (United States)

    2014-12-10

    ParVis was a project funded under LAB 10-05: “Earth System Modeling: Advanced Scientific Visualization of Ultra-Large Climate Data Sets”. Argonne was the lead lab with partners at PNNL, SNL, NCAR and UC-Davis. This report covers progress from January 1st, 2013 through Dec 1st, 2014. Two previous reports covered the period from Summer, 2010, through September 2011 and October 2011 through December 2012, respectively. While the project was originally planned to end on April 30, 2013, personnel and priority changes allowed many of the institutions to continue work through FY14 using existing funds. A primary focus of ParVis was introducing parallelism to climate model analysis to greatly reduce the time-to-visualization for ultra-large climate data sets. Work in the first two years was conducted on two tracks with different time horizons: one track to provide immediate help to climate scientists already struggling to apply their analysis to existing large data sets and another focused on building a new data-parallel library and tool for climate analysis and visualization that will give the field a platform for performing analysis and visualization on ultra-large datasets for the foreseeable future. In the final 2 years of the project, we focused mostly on the new data-parallel library and associated tools for climate analysis and visualization.

  6. The All-wavelength Extended Groth Strip International Survey (AEGIS) Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Davis, M.; Guhathakurta, P.; Konidaris, N.; Newman, J.A.; Ashby, M.L.N.; Biggs, A.D.; Barmby, P.; Bundy, K.; Chapman, S.; Coil, A.L.; Conselice, C.; Cooper, M.; Croton,; Eisenhardt, P.; Ellis, R.; Faber, S.; Fang, T.; Fazio, G.G.; Georgakakis, A.; Gerke, B.; Goss, W.M.; /UC, Berkeley, Astron. Dept. /Lick Observ. /LBL, Berkeley

    2006-07-21

    In this the first of a series of "Letters", we present a description of the panchromatic data sets that have been acquired in the Extended Groth Strip region of the sky. Our survey, the All-wavelength Extended Groth Strip International Survey (AEGIS), is intended to study the physical properties and evolutionary processes of galaxies at z ≈ 1. It includes the following deep, wide-field imaging data sets: Chandra/ACIS X-ray (0.5-10 keV), GALEX ultraviolet (1200-2500 Å), CFHT/MegaCam Legacy Survey optical (3600-9000 Å), CFHT/CFH12K optical (4500-9000 Å), Hubble Space Telescope/ACS optical (4400-8500 Å), Palomar/WIRC near-infrared (1.2-2.2 µm), Spitzer/IRAC mid-infrared (3.6-8.0 µm), Spitzer/MIPS far-infrared (24-70 µm), and VLA radio continuum (6-20 cm). In addition, this region of the sky has been targeted for extensive spectroscopy using the DEIMOS spectrograph on the Keck II 10 m telescope. Our survey is compared to other large multiwavelength surveys in terms of depth and sky coverage.

  7. The All-Wavelength Extended Groth Strip International Survey(AEGIS) Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Davis, M.; Guhathakurta, P.; Konidaris, N.P.; Newman, J.A.; Ashby, M.L.N.; Biggs, A.D.; Barmby, P.; Bundy, K.; Chapman, S.C.; Coil,A.L.; Conselice, C.J.; Cooper, M.C.; Croton, D.J.; Eisenhardt, P.R.M.; Ellis, R.S.; Faber, S.M.; Fang, T.; Fazio, G.G.; Georgakakis, A.; Gerke,B.F.; Goss, W.M.; Gwyn, S.; Harker, J.; Hopkins, A.M.; Huang, J.-S.; Ivison, R.J.; Kassin, S.A.; Kirby, E.N.; Koekemoer, A.M.; Koo, D.C.; Laird, E.S.; Le Floc' h, E.; Lin, L.; Lotz, J.M.; Marshall, P.J.; Martin,D.C.; Metevier, A.J.; Moustakas, L.A.; Nandra, K.; Noeske, K.G.; Papovich, C.; Phillips, A.C.; Rich,R. M.; Rieke, G.H.; Rigopoulou, D.; Salim, S.; Schiminovich, D.; Simard, L.; Smail, I.; Small,T.A.; Weiner,B.J.; Willmer, C.N.A.; Willner, S.P.; Wilson, G.; Wright, E.L.; Yan, R.

    2006-10-13

    In this the first of a series of Letters, we present a description of the panchromatic data sets that have been acquired in the Extended Groth Strip region of the sky. Our survey, the All-wavelength Extended Groth Strip International Survey (AEGIS), is intended to study the physical properties and evolutionary processes of galaxies at z ≈ 1. It includes the following deep, wide-field imaging data sets: Chandra/ACIS X-ray (0.5-10 keV), GALEX ultraviolet (1200-2500 Å), CFHT/MegaCam Legacy Survey optical (3600-9000 Å), CFHT/CFH12K optical (4500-9000 Å), Hubble Space Telescope/ACS optical (4400-8500 Å), Palomar/WIRC near-infrared (1.2-2.2 µm), Spitzer/IRAC mid-infrared (3.6-8.0 µm), Spitzer/MIPS far-infrared (24-70 µm), and VLA radio continuum (6-20 cm). In addition, this region of the sky has been targeted for extensive spectroscopy using the DEIMOS spectrograph on the Keck II 10 m telescope. Our survey is compared to other large multiwavelength surveys in terms of depth and sky coverage.

  8. Comparative Modeling and Benchmarking Data Sets for Human Histone Deacetylases and Sirtuin Families

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Histone Deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases and other types of diseases. Virtual screening (VS) has become a fairly effective approach for drug discovery of novel and highly selective Histone Deacetylases Inhibitors (HDACIs). To facilitate the process, we constructed the Maximal Unbiased Benchmarking Data Sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs covers all 4 Classes including Class III (Sirtuins family) and 14 HDACs isoforms, composed of 631 inhibitors and 24,609 unbiased decoys. Its ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matching with ligands and maximal unbiased in terms of “artificial enrichment” and “analogue bias”. We also conducted comparative studies with DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets, and demonstrate that our MUBD-HDACs is unique in that it can be applied unbiasedly to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the “2D bias” and “LBVS favorable” effect within the benchmarking sets. In summary, MUBD-HDACs is the only comprehensive and maximal-unbiased benchmark data sets for HDACs (including Sirtuins) that is available so far. MUBD-HDACs is freely available at http://www.xswlab.org/. PMID:25633490

  9. Global change data sets: Excerpts from the Master Directory, version 2.0

    Science.gov (United States)

    Beier, Joy

    1992-01-01

    The recent awakening to the reality of human-induced changes to the environment has resulted in an organized effort to promote global change research. The goal of this research as outlined by NASA's Earth System Science Committee (Earth System Science: A closer View, 1988) is to understand the entire Earth system on a global scale by describing how its component parts and their interactions have evolved, how they function, and how they may be expected to evolve on all timescales. The practical result is the capacity to predict that evolution over the next decade to century. Key variables important for the study of global change include external forcing factors (solar radiance, UV flux), radiatively and chemically important trace species (CO2, CH4, N2O, etc.), atmospheric response variables (temperature, pressure, winds), landsurface properties (river run-off, snow cover, albedo, soil moisture, vegetation cover), and oceanic variables (sea surface temperature, sea ice extent, sea level ocean wind stress, currents, chlorophyll, biogeochemical fluxes). The purpose of this document is to identify existing data sets available (both remotely sensed and in situ data) covering some of these variables. This is not intended to be a complete list of global change data, but merely a highlight of what is available. The information was extracted from the Master Directory (MD), an on-line scientific data information service which may be used by any researcher. This report contains the coverage dates for the data sets, sources (satellites, instruments) of the data and where they are archived.

  10. Fast Computation of Categorical Richness on Raster Data Sets and Related Problems

    DEFF Research Database (Denmark)

    de Berg, Mark; Tsirogiannis, Constantinos; Wilkinson, Bryan

    2015-01-01

    In many scientific fields, it is common to encounter raster data sets consisting of categorical data, such as soil type or land usage of a terrain. A problem that arises in the presence of such data is the following: given a raster G of n cells storing categorical data, compute for every cell c the categorical richness, i.e. the number of distinct categories appearing in a window around c. We present an algorithm for square windows that runs in O(n) time and one for circular windows that runs in O((1+K/r)n) time, where K is the number of different categories appearing in G. The algorithms are not only very efficient in theory, but also in practice: our experiments show that our algorithms can handle raster data sets of hundreds of millions of cells. The categorical richness problem is related to colored range counting, where the goal is to preprocess a colored point set such that we can efficiently count the number of colors appearing inside a query range. We present a data structure for colored range counting in R^2 for the case …
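
    To make the problem statement concrete, here is a brute-force baseline for square windows, counting distinct categories in the (2r+1)×(2r+1) window around each cell; it runs in O(n·r²) time, not the O(n) time of the algorithm in the paper, and the raster is invented.

```python
import numpy as np

def window_richness(grid, r):
    """Distinct categories in the (2r+1) x (2r+1) window around each cell.

    Brute-force O(n * r^2) baseline, not the O(n) algorithm of the paper.
    """
    rows, cols = grid.shape
    out = np.zeros((rows, cols), dtype=int)
    for i in range(rows):
        for j in range(cols):
            win = grid[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            out[i, j] = len(np.unique(win))
    return out

rng = np.random.default_rng(2)
soil = rng.integers(0, 5, size=(50, 50))  # invented soil-type raster
print(window_richness(soil, r=2))
```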

  11. Benchmarking Methods and Data Sets for Ligand Enrichment Assessment in Virtual Screening

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2014-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, the intrinsic differences of benchmarking sets from the real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. “analogue bias”, “artificial enrichment” and “false negative”. In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementations to three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The Leave-One-Out Cross-Validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased in terms of property matching, ROC curves and AUCs. PMID:25481478

  12. A variant reference data set for the Africanized honeybee, Apis mellifera.

    Science.gov (United States)

    Kadri, Samir M; Harpur, Brock A; Orsi, Ricardo O; Zayed, Amro

    2016-11-08

    The Africanized honeybee (AHB) is a population of Apis mellifera found in the Americas. AHBs originated in 1956 in Rio Claro, Brazil, where imported African A. m. scutellata escaped and hybridized with local populations of European A. mellifera. Africanized populations can now be found from northern Argentina to the southern United States. AHBs, often referred to as 'Killer Bees', are a major concern to the beekeeping industry as well as a model for the evolutionary genetics of colony defence. We performed high coverage pooled resequencing of 360 diploid workers from 30 Brazilian AHB colonies using Illumina Hi-Seq (150 bp PE). This yielded a high density SNP data set with an average read depth at each site of 20.25 reads. With 3,606,720 SNPs and 155,336 SNPs within 11,365 genes, this data set is the largest genomic resource available for AHBs and will enable high-resolution studies of the population dynamics, evolution, and genetics of this successful biological invader, in addition to facilitating the development of SNP-based tools for identifying AHBs.

  13. Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in prospective (i.e. real-world) efforts. However, the intrinsic differences of benchmarking sets from the real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementations to three important human histone deacetylases (HDACs) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. Comparative modeling and benchmarking data sets for human histone deacetylases and sirtuin families.

    Science.gov (United States)

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-02-23

    Histone deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases, and other types of diseases. Virtual screening (VS) has become a fairly effective approach for drug discovery of novel and highly selective histone deacetylase inhibitors (HDACIs). To facilitate the process, we constructed maximal unbiased benchmarking data sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs cover all four classes including Class III (Sirtuins family) and 14 HDAC isoforms, composed of 631 inhibitors and 24609 unbiased decoys. Its ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matching with ligands and maximal unbiased in terms of "artificial enrichment" and "analogue bias". We also conducted comparative studies with DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets and demonstrate that our MUBD-HDACs are unique in that they can be applied unbiasedly to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the "2D bias" and "LBVS favorable" effect within the benchmarking sets. In summary, MUBD-HDACs are the only comprehensive and maximal-unbiased benchmark data sets for HDACs (including Sirtuins) that are available so far. MUBD-HDACs are freely available at http://www.xswlab.org/ .

  15. ESA's Planetary Science Archive: Preserve and present reliable scientific data sets

    Science.gov (United States)

    Besse, S.; Vallat, C.; Barthelemy, M.; Coia, D.; Costa, M.; De Marchi, G.; Fraga, D.; Grotheer, E.; Heather, D.; Lim, T.; Martinez, S.; Arviset, C.; Barbarisi, I.; Docasal, R.; Macfarlane, A.; Rios, C.; Saiz, J.; Vallejo, F.

    2018-01-01

    The European Space Agency (ESA) Planetary Science Archive (PSA) is undergoing a significant refactoring of all its components to improve the services provided to the scientific community and the public. The PSA supports ESA's missions exploring the Solar System by archiving scientific peer-reviewed observations as well as engineering data sets. This includes the Giotto, SMART-1, Huygens, Venus Express, Mars Express, Rosetta, ExoMars 2016, ExoMars RSP, BepiColombo, and JUICE missions. The PSA offers a newly designed graphical user interface, which is meant both to maximise interaction with the scientific observations and to minimise the effort needed to download them. The PSA still offers the same services as before (i.e., FTP, documentation, helpdesk, etc.). In addition, it will support the two formats of the Planetary Data System (i.e., PDS3 and PDS4), as well as providing new ways of searching the data products with specific metadata and geometrical parameters. As well as enhanced services, the PSA will also provide new services to improve the visualisation of data products and scientific content (e.g., spectra). Together with improved access to the spacecraft engineering data sets, the PSA will provide easier access to scientific data products that will help to maximise the science return of ESA's space missions.

  16. Relevance of the international spinal cord injury basic data sets to youth: an Inter-Professional review with recommendations.

    Science.gov (United States)

    Carroll, A; Vogel, L C; Zebracki, K; Noonan, V K; Biering-Sørensen, F; Mulcahey, M J

    2017-09-01

    Mixed methods, using the Modified Delphi Technique and Expert Panel Review. To evaluate the utility and relevance of the International Spinal Cord Injury (SCI) Core and Basic Data Sets for children and youth with SCI. International. Via 20 electronic surveys, an interprofessional sample of healthcare professionals with pediatric SCI experience participated in an iterative critical review of the International SCI Data Sets, and submitted suggestions for modifications for use with four pediatric age groups. A panel of 5 experts scrutinized the utility of all data sets, correlated any modifications with the developing National Institute of Neurological Disorders and Stroke (NINDS) pediatric SCI Common Data Elements (CDE) and distributed final recommendations for modifications required to the adult data sets to the International SCI Data Set Committee and the associated Working Groups. Two International SCI Data Sets were considered relevant and appropriate for use with children without any changes. Three were considered not appropriate or applicable for use with children, regardless of age. Recommendations were made for five data sets to enhance their relevance and applicability to children across the age groups, and recommendations for seven data sets were specific to infants and younger children. The results of this critical review are significant in that substantive recommendations to align the International SCI Core and Basic Data Sets to pediatric practice were made. This project was funded by the Rick Hansen Institute Grant# 2015-27.

  17. Water mass census in the Nordic seas using climatological and observational data sets

    International Nuclear Information System (INIS)

    Piacsek, S.; Allard, R.; McClean, J.

    2008-01-01

    We have compared and evaluated the water mass census in the Greenland-Iceland-Norwegian (GIN) Sea area from climatologies, observational data sets and model output. The four climatologies evaluated were: the 1998 and 2001 versions of the World Ocean Atlas (WOA98, WOA01), and the United States Navy's GDEM90 (Generalized Digital Environmental Model) and MODAS01 (Modular Ocean Data Assimilation System) climatologies. Three observational data sets were examined: the multidecadal (1965-1995) set contained on the National Oceanographic Data Centre's (NODC) WOD98 (World Ocean Data) CD-ROM, and two seasonal data sets extracted from observations taken on six cruises by the SACLANT Research Center (SACLANTCEN) of NATO/Italy between 1986-1989. The model data are extracted from a global model run at 1/3 degree resolution for the years 1983-1997, using the POP (Parallel Ocean Program) model of the Los Alamos National Laboratory. The census computations focused on the Norwegian Sea, in the southern part of the GIN Sea, between 10°W-10°E and 60°N-70°N, especially for comparisons with the hydro casts and the model. Cases of such evaluation computations included: (a) short-term comparisons with quasi-synoptic CTD surveys carried out over a 4-year period in the southeastern GIN Sea; (b) climatological comparisons utilizing all available casts from the WOD98 CD-ROM, with four climatologies; and (c) a comparison between the WOA01 climatology and the POP model output ending in 1997. In this region in the spring, the fraction of ocean water that has salinity above 34.85 is ∼94%, and that has temperatures above 0°C is ∼33%. Three principal water masses dominated the census: the Atlantic Water (AW), the Deep Water (DW) and an intermediate water mass defined as Lower Arctic Intermediate Water (LAIW). Besides these classes, both the climatologies and the observations exhibited the significant presence of deep water masses with T-S characteristics that do not fall into the named
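
    A toy illustration of this kind of T-S census is sketched below: profile samples are binned into water-mass classes by simple salinity and temperature thresholds. The 34.85 salinity and 0°C cutoffs echo the figures quoted above, but the 3°C Atlantic Water threshold, the sample values, and the class rules are invented for illustration.

```python
import numpy as np

# Invented T-S samples (temperature in deg C, salinity)
T = np.array([3.2, -0.5, 0.4, 6.1, -1.0])
S = np.array([35.10, 34.91, 34.88, 35.20, 34.70])

def classify(t, s):
    """Toy water-mass labels loosely echoing the census criteria above."""
    if s > 34.85 and t > 3.0:   # warm, salty: Atlantic Water
        return "AW"
    if s > 34.85 and t < 0.0:   # cold, salty: Deep Water
        return "DW"
    return "LAIW/other"         # intermediate or unclassified

print([classify(t, s) for t, s in zip(T, S)])
```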

  18. 3D printing of normal and pathologic tricuspid valves from transthoracic 3D echocardiography data sets.

    Science.gov (United States)

    Muraru, Denisa; Veronesi, Federico; Maddalozzo, Anna; Dequal, Daniele; Frajhof, Leonardo; Rabischoffsky, Arnaldo; Iliceto, Sabino; Badano, Luigi P

    2017-07-01

    To explore the feasibility of using transthoracic 3D echocardiography (3DTTE) data to generate 3D patient-specific models of the tricuspid valve (TV). Multi-beat 3D data sets of the TV (32 vol/s) were acquired in five subjects with various TV morphologies from the apical approach and analysed offline with custom-made software. Coordinates representing the annulus and the leaflets were imported into MeshLab (Visual Computing Lab ISTICNR) to develop solid models to be converted to stereolithographic file format and 3D printed. Measurements of the TV annulus antero-posterior (AP) and medio-lateral (ML) diameters, perimeter (P), and TV tenting height (H) and volume (V) obtained from the 3D echo data set were compared with those performed on the 3D models using a caliper, a syringe and a millimeter tape. AP (4.2 ± 0.2 cm vs. 4.2 ± 0 cm), ML (3.7 ± 0.2 cm vs. 3.6 ± 0.1 cm), P (12.6 ± 0.2 cm vs. 12.7 ± 0.1 cm), H (11.2 ± 2.1 mm vs. 10.8 ± 2.1 mm) and V (3.0 ± 0.6 ml vs. 2.8 ± 1.4 ml) were similar (P = NS for all) when measured on the 3D data set and the printed model. The two sets of measurements were highly correlated (r = 0.991). The mean absolute error (2D - 3D) for AP, ML, P and tenting H was 0.7 ± 0.3 mm, indicating high accuracy of the 3D models. Printing of the TV from 3DTTE data is feasible, with highly conserved fidelity. This technique has the potential for rapid integration into clinical practice to assist with decision-making, surgical planning, and teaching. Published on behalf of the European Society of Cardiology. All rights reserved. © The Author 2016. For permissions, please email: journals.permissions@oup.com.

  19. Breeding and Genetics Symposium: really big data: processing and analysis of very large data sets.

    Science.gov (United States)

    Cole, J B; Newman, S; Foertter, F; Aguilar, I; Coffey, M

    2012-03-01

    Modern animal breeding data sets are large and getting larger, due in part to recent availability of high-density SNP arrays and cheap sequencing technology. High-performance computing methods for efficient data warehousing and analysis are under development. Financial and security considerations are important when using shared clusters. Sound software engineering practices are needed, and it is better to use existing solutions when possible. Storage requirements for genotypes are modest, although full-sequence data will require greater storage capacity. Storage requirements for intermediate and results files for genetic evaluations are much greater, particularly when multiple runs must be stored for research and validation studies. The greatest gains in accuracy from genomic selection have been realized for traits of low heritability, and there is increasing interest in new health and management traits. The collection of sufficient phenotypes to produce accurate evaluations may take many years, and high-reliability proofs for older bulls are needed to estimate marker effects. Data mining algorithms applied to large data sets may help identify unexpected relationships in the data, and improved visualization tools will provide insights. Genomic selection using large data requires a lot of computing power, particularly when large fractions of the population are genotyped. Theoretical improvements have made possible the inversion of large numerator relationship matrices, permitted the solving of large systems of equations, and produced fast algorithms for variance component estimation. Recent work shows that single-step approaches combining BLUP with a genomic relationship (G) matrix have similar computational requirements to traditional BLUP, and the limiting factor is the construction and inversion of G for many genotypes. A naïve algorithm for creating G for 14,000 individuals required almost 24 h to run, but custom libraries and parallel computing reduced that to
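
    Where the abstract mentions construction of the genomic relationship matrix G, the sketch below shows the common VanRaden formulation G = ZZ'/(2Σp(1-p)) on a synthetic 0/1/2 genotype matrix. The sizes and random genotypes are hypothetical; a real evaluation would use observed or base-population allele frequencies and far larger matrices.

```python
import numpy as np

# Hypothetical genotype matrix: rows = animals, entries = 0/1/2 allele counts
rng = np.random.default_rng(3)
M = rng.integers(0, 3, size=(500, 10000)).astype(float)

# VanRaden's first method: G = Z Z' / (2 * sum p_j (1 - p_j)),
# with Z the genotypes centred by twice the allele frequencies
p = M.mean(axis=0) / 2.0
Z = M - 2.0 * p
G = Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

print(G.shape)        # (animals x animals), dense and symmetric
print(G[:3, :3])
```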

  20. Validating hierarchical verbal autopsy expert algorithms in a large data set with known causes of death.

    Science.gov (United States)

    Kalter, Henry D; Perin, Jamie; Black, Robert E

    2016-06-01

    Physician assessment historically has been the most common method of analyzing verbal autopsy (VA) data. Recently, the World Health Organization endorsed two automated methods, Tariff 2.0 and InterVA-4, which promise greater objectivity and lower cost. A disadvantage of the Tariff method is that it requires a training data set from a prior validation study, while InterVA relies on clinically specified conditional probabilities. We undertook to validate the hierarchical expert algorithm analysis of VA data, an automated, intuitive, deterministic method that does not require a training data set. Using Population Health Metrics Research Consortium study hospital source data, we compared the primary causes of 1629 neonatal and 1456 1-59 month-old child deaths from VA expert algorithms arranged in a hierarchy to their reference standard causes. The expert algorithms were held constant, while five prior and one new "compromise" neonatal hierarchy, and three former child hierarchies were tested. For each comparison, the reference standard data were resampled 1000 times within the range of cause-specific mortality fractions (CSMF) for one of three approximated community scenarios in the 2013 WHO global causes of death, plus one random mortality cause proportions scenario. We utilized CSMF accuracy to assess overall population-level validity, and the absolute difference between VA and reference standard CSMFs to examine particular causes. Chance-corrected concordance (CCC) and Cohen's kappa were used to evaluate individual-level cause assignment. Overall CSMF accuracy for the best-performing expert algorithm hierarchy was 0.80 (range 0.57-0.96) for neonatal deaths and 0.76 (0.50-0.97) for child deaths. Performance for particular causes of death varied, with fairly flat estimated CSMF over a range of reference values for several causes. Performance at the individual diagnosis level was also less favorable than that for overall CSMF (neonatal: best CCC = 0.23, range 0
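
    The population-level metric used here, CSMF accuracy, has a standard closed form (Murray et al.): one minus the summed absolute error between true and predicted cause-specific mortality fractions, divided by its maximum possible value. A small sketch, with an invented three-cause example:

```python
import numpy as np

def csmf_accuracy(true_csmf, pred_csmf):
    """1 - total absolute CSMF error / its maximum possible value."""
    true_csmf = np.asarray(true_csmf, dtype=float)
    pred_csmf = np.asarray(pred_csmf, dtype=float)
    err = np.abs(true_csmf - pred_csmf).sum()
    return 1.0 - err / (2.0 * (1.0 - true_csmf.min()))

# Invented three-cause example
print(csmf_accuracy([0.5, 0.3, 0.2], [0.4, 0.35, 0.25]))  # 0.875
```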

  1. Olkiluoto hydrogeochemistry. A 3-D modelling approach for sparse data set

    International Nuclear Information System (INIS)

    Luukkonen, A.; Partamies, S.; Pitkaenen, P.

    2003-07-01

    Olkiluoto at Eurajoki has been selected as a candidate site for the final disposal repository for the used nuclear waste produced in Finland. In the long-term safety assessment, one of the principal tools for evaluating safe disposal is hydrogeochemistry. For assessment purposes, Posiva Oy is excavating an underground research laboratory (ONKALO) in the Olkiluoto bedrock. The complexity of the groundwater chemistry is characteristic of the Olkiluoto site and creates a need to examine and visualise these hydrogeochemical features in 3-D together with the structural model. The need to study the hydrogeochemical features arises not only in the stable, undisturbed (pre-excavation) conditions but also in the disturbed system caused by the construction activities and open-tunnel conditions of the ONKALO. The present 3-D approach is based on integrating the independently and separately developed structural model with the results from the geochemical mixing calculations of the groundwater samples. For spatial geochemical regression purposes, the study area is divided into four primary sectors on the basis of the occurrence of the samples. The geochemical information within the four primary sectors is summed up in four sector centroids that describe the depth distributions of the different water types within each primary sector area. The geographic locations of the centroids are used for a secondary division of the study area into secondary sectors. With the aid of the secondary sectors, spatial regressions between the centroids can be calculated, and interpolation of water type fractions within the centroid volume becomes possible. Similarly, extrapolations outside the centroid volume are possible as well. The mixing proportions of the five detected water types at an arbitrary point in the modelling volume can be estimated by applying the four centroids and using lateral linear regression. This study utilises two separate data sets: the older data set and the newer data set. The

  2. Regression Analysis of Long-Term Profile Ozone Data Set from BUV Instruments

    Science.gov (United States)

    Stolarski, Richard S.

    2005-01-01

    We have produced a profile merged ozone data set (MOD) based on the SBUV/SBUV2 series of nadir-viewing satellite backscatter instruments, covering the period from November 1978 - December 2003. In 2004, data from the Nimbus 7 SBUV and NOAA 9, 11, and 16 SBUV/2 instruments were reprocessed using the Version 8 (V8) algorithm and most recent calibrations. More recently, data from the Nimbus 4 BUV instrument, which was operational from 1970 - 1977, were also reprocessed using the V8 algorithm. As part of the V8 profile calibration, the Nimbus 7 and NOAA 9 (1993-1997 only) instrument calibrations have been adjusted to match the NOAA 11 calibration, which was established based on comparisons with SSBUV shuttle flight data. Differences between NOAA 11, Nimbus 7 and NOAA 9 profile zonal means are within plus or minus 5% at all levels when averaged over the respective periods of data overlap. NOAA 16 SBUV/2 data have insufficient overlap with NOAA 11, so its calibration is based on pre-flight information. Mean differences over 4 months of overlap are within plus or minus 7%. Given the level of agreement between the data sets, we simply average the ozone values during periods of instrument overlap to produce the MOD profile data set. Initial comparisons of coincident matches of Nimbus 4 BUV and Arosa Umkehr data show mean differences of 0.5 (0.5)% at 30 km; 7.5 (0.5)% at 35 km; and 11 (0.7)% at 40 km, where the number in parentheses is the standard error of the mean. In this study, we use the MOD profile data set (1978-2003) to estimate the change in profile ozone due to changing stratospheric chlorine levels. We use a standard linear regression model with proxies for the seasonal cycle, solar cycle, QBO, and ozone trend. To account for the non-linearity of stratospheric chlorine levels since the late 1990s, we use a time series of Effective Chlorine, defined as the global average of Chlorine + 50 * Bromine at 1 hPa, as the trend proxy. The Effective Chlorine data are taken from
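
    The regression model described here, ozone fitted against seasonal, solar, QBO, and Effective Chlorine proxies, can be sketched as an ordinary least-squares fit. The proxy series below are synthetic stand-ins (including the chlorine loading curve), so only the structure, not the data, reflects the study.

```python
import numpy as np

n = 300                         # months of synthetic data
t = np.arange(n)
rng = np.random.default_rng(4)
seasonal = np.cos(2 * np.pi * t / 12.0)
solar = np.sin(2 * np.pi * t / 132.0)      # ~11-year solar-cycle stand-in
qbo = np.sin(2 * np.pi * t / 28.0)         # ~28-month QBO stand-in
eff_cl = np.minimum(t, 200) / 200.0        # non-linear chlorine-loading stand-in
ozone = 2.0 - 1.5 * eff_cl + 0.5 * seasonal + 0.2 * solar \
        + rng.normal(0.0, 0.1, n)

# OLS fit: ozone = a + b*seasonal + c*solar + d*qbo + e*eff_cl
X = np.column_stack([np.ones(n), seasonal, solar, qbo, eff_cl])
coef, *_ = np.linalg.lstsq(X, ozone, rcond=None)
print("fitted coefficient on Effective Chlorine:", coef[-1].round(3))
```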

  3. Chemometrical exploration of an isotopic ratio data set of acetylsalicylic acid

    International Nuclear Information System (INIS)

    Stanimirova, I.; Daszykowski, M.; Van Gyseghem, E.; Bensaid, F.F.; Lees, M.; Smeyers-Verbeke, J.; Massart, D.L.; Vander Heyden, Y.

    2005-01-01

    A data set consisting of fourteen isotopic ratios or quantities derived from such ratios for samples of acetylsalicylic acid (aspirin), commercialized by various pharmaceutical companies from different countries, was analyzed. The goal of the data analysis was to explore whether the results can be linked to geographical origin or to other features of the samples, such as different manufacturing processes. The methods of data analysis used were principal component analysis (PCA), robust principal component analysis (RPCA), projection pursuit (PP) and multiple factor analysis (MFA). The results do not seem to depend on geographic origin, except for some samples from India. They do depend on the pharmaceutical companies. Moreover, it seems that the samples from certain pharmaceutical companies form clusters of similar samples, suggesting that there is some common feature between those pharmaceutical companies. Variable selection performed by means of MFA showed that the number of variables can be reduced to five without loss of information.
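
    A minimal PCA score-plot computation of the kind used in such exploratory analyses is sketched below with scikit-learn; the 40×14 matrix of "isotopic ratios" is random, standing in for the real measurements, and clustering would be judged from the two-component scores.

```python
import numpy as np
from sklearn.decomposition import PCA

# Random stand-in: rows = aspirin samples, columns = 14 isotopic ratios
rng = np.random.default_rng(5)
X = rng.normal(size=(40, 14))

# Autoscale, then project onto the first two principal components;
# clusters in this score plot would suggest shared manufacturing features
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
scores = PCA(n_components=2).fit_transform(Xs)
print(scores[:5].round(2))
```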

  4. Extraction of 3D velocity and porosity fields from GeoPET data sets

    Energy Technology Data Exchange (ETDEWEB)

    Lippmann-Pipke, Johanna; Kulenkampff, Johannes [Helmholtz-Zentrum Dresden-Rossendorf e.V., Dresden (Germany). Reactive Transport; Eichelbaum, S. [Nemtics Visualization, Leipzig (Germany)

    2017-06-01

    Geoscientific process monitoring with positron emission tomography (GeoPET) is proven to be applicable for quantitative tomographic transport process monitoring in natural geological materials. We benchmarked GeoPET by inversely fitting a numerical finite element model to a diffusive transport experiment in Opalinus clay. The obtained effective diffusion coefficients, D_e,parallel and D_e,perpendicular, are well in line with data from the literature. But more complex, heterogeneous migration and flow patterns cannot be similarly evaluated by inverse fitting using optimization tools. Alternatively, we started developing an algorithm that allows the quantitative extraction of velocity and porosity fields, v_i(x,y,z) with i = x,y,z and n(x,y,z), from GeoPET time series, c_PET(x,y,z,t). They may serve as constituent data sets for reactive transport modelling.

  5. EASE-Grid 2.0: Incremental but Significant Improvements for Earth-Gridded Data Sets

    Directory of Open Access Journals (Sweden)

    Matthew H. Savoie

    2012-03-01

    Full Text Available Defined in the early 1990s for use with gridded satellite passive microwave data, the Equal-Area Scalable Earth Grid (EASE-Grid) was quickly adopted and used for distribution of a variety of satellite and in situ data sets. Conceptually easy to understand, EASE-Grid suffers from limitations that make it impossible to format in the widely popular GeoTIFF convention without reprojection. Importing EASE-Grid data into standard mapping software packages is nontrivial and error-prone. This article defines a standard for an improved EASE-Grid 2.0 definition, addressing how the changes rectify issues with the original grid definition. Data distributed using the EASE-Grid 2.0 standard will be easier for users to import into standard software packages and will minimize common reprojection errors that users had encountered with the original EASE-Grid definition.
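
    For readers who want to see what "importing into standard software" looks like in practice, the sketch below projects longitude/latitude into EASE-Grid 2.0 global map coordinates (EPSG:6933) with pyproj and converts them to grid indices. The 25 km cell size and the grid-corner coordinates are illustrative assumptions, not authoritative grid parameters.

```python
from pyproj import Transformer

# Lon/lat -> EASE-Grid 2.0 global (EPSG:6933) map coordinates
to_ease2 = Transformer.from_crs("EPSG:4326", "EPSG:6933", always_xy=True)
x, y = to_ease2.transform(-105.0, 40.0)   # lon, lat in degrees
print(x, y)

# Map coordinates -> (col, row) in a hypothetical 25 km grid; the corner
# coordinates below are illustrative assumptions, not authoritative values
cell = 25000.0
x0, y0 = -17367530.45, 7314540.83
col, row = (x - x0) / cell, (y0 - y) / cell
print(int(col), int(row))
```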

  6. Clustering procedures for the optimal selection of data sets from multiple crystals in macromolecular crystallography

    International Nuclear Information System (INIS)

    Foadi, James; Aller, Pierre; Alguel, Yilmaz; Cameron, Alex; Axford, Danny; Owen, Robin L.; Armour, Wes; Waterman, David G.; Iwata, So; Evans, Gwyndaf

    2013-01-01

    A systematic approach to the scaling and merging of data from multiple crystals in macromolecular crystallography is introduced and explained. The availability of intense microbeam macromolecular crystallography beamlines at third-generation synchrotron sources has enabled data collection and structure solution from microcrystals of <10 µm in size. The increased likelihood of severe radiation damage where microcrystals or particularly sensitive crystals are used forces crystallographers to acquire large numbers of data sets from many crystals of the same protein structure. The associated analysis and merging of multi-crystal data is currently a manual and time-consuming step. Here, a computer program, BLEND, that has been written to assist with and automate many of the steps in this process is described. It is demonstrated how BLEND has successfully been used in the solution of a novel membrane protein

  7. Arctic sea ice albedo - A comparison of two satellite-derived data sets

    Science.gov (United States)

    Schweiger, Axel J.; Serreze, Mark C.; Key, Jeffrey R.

    1993-01-01

    Spatial patterns of mean monthly surface albedo for May, June, and July, derived from DMSP Operational Line Scan (OLS) satellite imagery are compared with surface albedos derived from the International Satellite Cloud Climatology Program (ISCCP) monthly data set. Spatial patterns obtained by the two techniques are in general agreement, especially for June and July. Nevertheless, systematic differences in albedo of 0.05 - 0.10 are noted which are most likely related to uncertainties in the simple parameterizations used in the DMSP analyses, problems in the ISCCP cloud-clearing algorithm and other modeling simplifications. However, with respect to the eventual goal of developing a reliable automated retrieval algorithm for compiling a long-term albedo data base, these initial comparisons are very encouraging.

  8. Seismic inference of 57 stars using full-length Kepler data sets

    Directory of Open Access Journals (Sweden)

    Creevey Orlagh

    2017-01-01

    Full Text Available We present stellar properties of 57 stars from a seismic inference using full-length data sets from Kepler (mass, age, radius, and distance). These stars comprise active stars, planet hosts, solar analogs, and binary systems. We validate the distances derived from the astrometric Gaia-Tycho solution. Ensemble analysis of the stellar properties reveals a trend of the mixing-length parameter with surface gravity and effective temperature. We derive a linear relationship with the seismic quantity ⟨r02⟩ to estimate the stellar age. Finally, we define the stellar regimes where the Kjeldsen et al. (2008) empirical surface correction for 1D model frequencies is valid.

  9. Search for outlying data points in multivariate solar activity data sets

    International Nuclear Information System (INIS)

    Bartkowiak, A.; Jakimiec, M.

    1989-01-01

    The aim of this paper is the investigation of outlying data points in solar activity data sets. Two statistical methods for identifying multivariate outliers are presented: the χ²-plot method based on the analysis of Mahalanobis distances, and the method based on principal component analysis, i.e. on scatter diagrams constructed from the first two or last two eigenvectors. We demonstrate the usefulness of these methods by applying them to some solar activity data. The methods allow one to reveal quite precisely the data vectors containing errors, and also some atypical vectors, i.e. vectors with unusually large values or with values revealing atypical relations compared with the common relations between the appropriate variables. 12 refs., 7 figs., 8 tabs. (author)
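
    The χ²-plot idea rests on the fact that, for multivariate normal data, squared Mahalanobis distances follow a χ² distribution with p degrees of freedom. A small sketch on synthetic data (one planted outlier; the 97.5% cutoff is a conventional choice, not taken from the paper):

```python
import numpy as np
from scipy.stats import chi2

# Synthetic multivariate "solar activity" records (rows = days)
rng = np.random.default_rng(6)
X = rng.normal(size=(200, 4))
X[0] += 8.0                                  # planted outlier

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", X - mu, cov_inv, X - mu)  # squared distances

# Under multivariate normality d2 ~ chi-square with p degrees of freedom
threshold = chi2.ppf(0.975, df=X.shape[1])
print("flagged rows:", np.where(d2 > threshold)[0])
```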

  10. Chemometrical exploration of an isotopic ratio data set of acetylsalicylic acid

    Energy Technology Data Exchange (ETDEWEB)

    Stanimirova, I. [ChemoAC, FABI, Analytical Chemistry and Pharmaceutical Technology, Vrije Universiteit Brussel-VUB, Laarbeeklaan 103, B-1090 Brussels (Belgium); Daszykowski, M. [ChemoAC, FABI, Analytical Chemistry and Pharmaceutical Technology, Vrije Universiteit Brussel-VUB, Laarbeeklaan 103, B-1090 Brussels (Belgium); Van Gyseghem, E. [Eurofins Scientific Analytics, Rue Pierre Adolphe Bobierre, 44323 Nantes Cedex 3 (France); Bensaid, F.F. [Eurofins Scientific Analytics, Rue Pierre Adolphe Bobierre, 44323 Nantes Cedex 3 (France); Lees, M. [Eurofins Scientific Analytics, Rue Pierre Adolphe Bobierre, 44323 Nantes Cedex 3 (France); Smeyers-Verbeke, J. [ChemoAC, FABI, Analytical Chemistry and Pharmaceutical Technology, Vrije Universiteit Brussel-VUB, Laarbeeklaan 103, B-1090 Brussels (Belgium); Massart, D.L. [ChemoAC, FABI, Analytical Chemistry and Pharmaceutical Technology, Vrije Universiteit Brussel-VUB, Laarbeeklaan 103, B-1090 Brussels (Belgium); Vander Heyden, Y. [ChemoAC, FABI, Analytical Chemistry and Pharmaceutical Technology, Vrije Universiteit Brussel-VUB, Laarbeeklaan 103, B-1090 Brussels (Belgium)]. E-mail: yvanvdh@vub.ac.be

    2005-11-03

    A data set consisting of fourteen isotopic ratios or quantities derived from such ratios for samples of acetylsalicylic acid (aspirin), commercialized by various pharmaceutical companies from different countries, was analyzed. The goal of the data analysis was to explore whether results can be linked to geographical origin or other features such as different manufacturing processes, of the samples. The methods of data analysis used were principal component analysis (PCA), robust principal component analysis (RPCA), projection pursuit (PP) and multiple factor analysis (MFA). The results do not seem to depend on geographic origin, except for some samples from India. They do depend on the pharmaceutical companies. Moreover, it seems that the samples from certain pharmaceutical companies form clusters of similar samples, suggesting that there is some common feature between those pharmaceutical companies. Variable selection performed by means of MFA showed that the number of variables can be reduced to five without loss of information.

  11. Inference on white dwarf binary systems using the first round Mock LISA Data Challenges data sets

    International Nuclear Information System (INIS)

    Stroeer, Alexander; Veitch, John; Roever, Christian; Bloomer, Ed; Clark, James; Christensen, Nelson; Hendry, Martin; Messenger, Chris; Meyer, Renate; Pitkin, Matthew; Toher, Jennifer; Umstaetter, Richard; Vecchio, Alberto; Woan, Graham

    2007-01-01

    We report on the analysis of selected single-source data sets from the first round of the Mock LISA Data Challenges (MLDC) for white dwarf binaries. We implemented an end-to-end pipeline consisting of a grid-based coherent pre-processing unit for signal detection and an automatic Markov chain Monte Carlo (MCMC) post-processing unit for signal evaluation. We demonstrate that signal detection with our coherent approach is secure and accurate, and that the MCMC stage further increases the accuracy and supplements the detection with additional information on the signal parameters. We also demonstrate that the MCMC routine is able to determine accurately the noise level in the frequency window of interest.

  12. A global high-resolution data set of ice sheet topography, cavity geometry and ocean bathymetry

    DEFF Research Database (Denmark)

    Schaffer, Janin; Timmermann, Ralph; Arndt, Jan Erik

    2016-01-01

    … of the Southern Ocean (IBCSO) version 1. While RTopo-1 primarily aimed at a good and consistent representation of the Antarctic ice sheet, ice shelves, and sub-ice cavities, RTopo-2 now also contains ice topographies of the Greenland ice sheet and outlet glaciers. In particular, we aimed at a good representation … For the continental shelf off Northeast Greenland and the floating ice tongue of Nioghalvfjerdsfjorden Glacier at about 79°N, we incorporated a high-resolution digital bathymetry model considering original multibeam survey data for the region. Radar data for surface topographies of the floating ice tongues … for the geometry of Getz, Abbot, and Fimbul ice shelf cavities. The data set is available in full and in regional subsets in NetCDF format from the PANGAEA database at doi:10.1594/PANGAEA.856844.

  13. New method of three-dimensional reconstruction from two-dimensional MR data sets

    International Nuclear Information System (INIS)

    Wrazidlo, W.; Schneider, S.; Brambs, H.J.; Richter, G.M.; Kauffmann, G.W.; Geiger, B.; Fischer, C.

    1989-01-01

    In medical diagnosis and therapy, cross-sectional images are obtained by means of US, CT, or MR imaging. The authors propose a new solution to the problem of constructing a shape over a set of cross-sectional contours from two-dimensional (2D) MR data sets. The authors' method reduces the problem of constructing a shape over the cross sections to one of constructing a sequence of partial shapes, each of them connecting two cross sections lying on adjacent planes. The solution makes use of the Delaunay triangulation, which is isomorphic in that specific situation. The authors compute this Delaunay triangulation. Shape reconstruction is then achieved section by section by pruning the Delaunay triangulations.

  14. Worldwide data sets constrain the water vapor uptake coefficient in cloud formation.

    Science.gov (United States)

    Raatikainen, Tomi; Nenes, Athanasios; Seinfeld, John H; Morales, Ricardo; Moore, Richard H; Lathem, Terry L; Lance, Sara; Padró, Luz T; Lin, Jack J; Cerully, Kate M; Bougiatioti, Aikaterini; Cozic, Julie; Ruehl, Christopher R; Chuang, Patrick Y; Anderson, Bruce E; Flagan, Richard C; Jonsson, Haflidi; Mihalopoulos, Nikos; Smith, James N

    2013-03-05

    Cloud droplet formation depends on the condensation of water vapor on ambient aerosols, the rate of which is strongly affected by the kinetics of water uptake as expressed by the condensation (or mass accommodation) coefficient, αc. Estimates of αc for droplet growth from activation of ambient particles vary considerably and represent a critical source of uncertainty in estimates of global cloud droplet distributions and the aerosol indirect forcing of climate. We present an analysis of 10 globally relevant data sets of cloud condensation nuclei to constrain the value of αc for ambient aerosol. We find that rapid activation kinetics (αc > 0.1) is uniformly prevalent. This finding resolves a long-standing issue in cloud physics, as the uncertainty in water vapor accommodation on droplets is considerably less than previously thought.

  15. A Distributed Architecture for Sharing Ecological Data Sets with Access and Usage Control Guarantees

    DEFF Research Database (Denmark)

    Bonnet, Philippe; Gonzalez, Javier; Granados, Joel Andres

    2014-01-01

    … new insights, there are significant barriers to the realization of this vision. One of the key challenges is to allow scientists to share their data widely while retaining some form of control over who accesses the data (access control) and, more importantly, how it is used (usage control). Access and usage control is necessary to enforce existing open data policies. We have proposed the vision of trusted cells: a decentralized infrastructure, based on secure hardware running on devices equipped with trusted execution environments at the edges of the Internet. We originally described the utilization … data sets with access and usage control guarantees. We rely on examples from terrestrial research and monitoring in the Arctic in the context of the INTERACT project.

  16. Clustering procedures for the optimal selection of data sets from multiple crystals in macromolecular crystallography

    Energy Technology Data Exchange (ETDEWEB)

    Foadi, James [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom); Imperial College, London SW7 2AZ (United Kingdom); Aller, Pierre [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom); Alguel, Yilmaz; Cameron, Alex [Imperial College, London SW7 2AZ (United Kingdom); Axford, Danny; Owen, Robin L. [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom); Armour, Wes [Oxford e-Research Centre (OeRC), Keble Road, Oxford OX1 3QG (United Kingdom); Waterman, David G. [Research Complex at Harwell (RCaH), Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0FA (United Kingdom); Iwata, So [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom); Imperial College, London SW7 2AZ (United Kingdom); Evans, Gwyndaf, E-mail: gwyndaf.evans@diamond.ac.uk [Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE (United Kingdom)

    2013-08-01

    A systematic approach to the scaling and merging of data from multiple crystals in macromolecular crystallography is introduced and explained. The availability of intense microbeam macromolecular crystallography beamlines at third-generation synchrotron sources has enabled data collection and structure solution from microcrystals of <10 µm in size. The increased likelihood of severe radiation damage where microcrystals or particularly sensitive crystals are used forces crystallographers to acquire large numbers of data sets from many crystals of the same protein structure. The associated analysis and merging of multi-crystal data is currently a manual and time-consuming step. Here, a computer program, BLEND, that has been written to assist with and automate many of the steps in this process is described. It is demonstrated how BLEND has successfully been used in the solution of a novel membrane protein.

  17. ANALYSIS DATA SETS USING HYBRID TECHNIQUES APPLIED ARTIFICIAL INTELLIGENCE BASED PRODUCTION SYSTEMS INTEGRATED DESIGN

    Directory of Open Access Journals (Sweden)

    Daniel-Petru GHENCEA

    2017-06-01

    Full Text Available The paper proposes a model for predicting spindle behavior with respect to thermal deformation and vibration level by highlighting and processing the characteristic equations. Such an analysis for spindles with similar electro-mechanical characteristics can be achieved using a hybrid approach based on artificial intelligence (genetic algorithms - artificial neural networks - fuzzy logic). The paper presents a prediction method that obtains a valid range of values for spindles with similar characteristics from data sets measured on a few test spindles, without requiring additional measurements. The main advantage of this method is the extraction of polynomial functions from graphs resulting from simultaneous measurements, predicting the dynamics of the two features under a multi-objective criterion.

  18. Correction of Magnetic Optics and Beam Trajectory Using LOCO Based Algorithm with Expanded Experimental Data Sets

    Energy Technology Data Exchange (ETDEWEB)

    Romanov, A.; Edstrom, D.; Emanov, F. A.; Koop, I. A.; Perevedentsev, E. A.; Rogovsky, Yu. A.; Shwartz, D. B.; Valishev, A.

    2017-03-28

    Precise beam based measurement and correction of magnetic optics is essential for the successful operation of accelerators. The LOCO algorithm is a proven and reliable tool, which in some situations can be improved by using a broader class of experimental data. The standard data sets for LOCO include the closed orbit responses to dipole corrector variation, dispersion, and betatron tunes. This paper discusses the benefits from augmenting the data with four additional classes of experimental data: the beam shape measured with beam profile monitors; responses of closed orbit bumps to focusing field variations; betatron tune responses to focusing field variations; BPM-to-BPM betatron phase advances and beta functions in BPMs from turn-by-turn coordinates of kicked beam. All of the described features were implemented in the Sixdsimulation software that was used to correct the optics of the VEPP-2000 collider, the VEPP-5 injector booster ring, and the FAST linac.
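
    As an illustration of the core computation in this class of methods, the sketch below shows how focusing errors can be fit from an orbit response matrix in the LOCO spirit: linearize the model response around the ideal lattice and solve a regularized least-squares problem. This is a generic outline under assumed inputs, not the Sixdsimulation implementation; the function name and the precomputed Jacobian are illustrative.

        import numpy as np

        def fit_focusing_errors(measured_orm, model_orm, jacobian, rcond=1e-3):
            """Fit focusing-field errors from orbit response matrix (ORM) data.

            measured_orm : (m, n) ORM measured by varying dipole correctors
            model_orm    : (m, n) ORM computed from the ideal lattice model
            jacobian     : (m*n, k) derivatives of the flattened model ORM with
                           respect to the k fit parameters (assumed precomputed,
                           e.g. by finite differences on the lattice model)
            """
            residual = (measured_orm - model_orm).ravel()
            # Least squares with a truncated pseudo-inverse (rcond) to tame
            # poorly constrained directions in parameter space
            errors, *_ = np.linalg.lstsq(jacobian, residual, rcond=rcond)
            return errors

    The additional data classes listed in the abstract (beam shapes, bump responses, tune responses, phase advances) would enter the same framework as extra rows of the residual vector and Jacobian, each weighted by its measurement uncertainty.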

  19. Biogas Production from Chicken Manure

    Directory of Open Access Journals (Sweden)

    Kenan Dalkılıç

    2013-11-01

    Full Text Available Traditionally, animal manures are burned for heating in Turkey. They are also used as soil conditioner, which has adverse environmental effects. Although the use of renewable energy sources in Turkey is very limited, application studies on biogas production from animal manure are increasing. Chicken manure makes up 25-30% of the total animal manure produced in Turkey, yet work on biogas production from chicken manure there is very limited. This paper reviews biogas production studies from chicken manure in Turkey and worldwide.

  20. Characteristics of monsoon inversions over the Arabian Sea observed by satellite sounder and reanalysis data sets

    Directory of Open Access Journals (Sweden)

    S. Dwivedi

    2016-04-01

    Full Text Available Monsoon inversion (MI) over the Arabian Sea (AS) is one of the important characteristics associated with monsoon activity over the Indian region during the summer monsoon season. In the present study, we have used 5 years (2009–2013) of temperature and water vapour measurements obtained from a satellite sounder instrument, the Infrared Atmospheric Sounding Interferometer (IASI) onboard the MetOp satellite, in addition to ERA-Interim data, to study their characteristics. The lower atmospheric data over the AS have been examined first to identify the areas where MIs are predominant and occur with higher strength. Based on this information, a detailed study has been made to investigate their characteristics separately in the eastern AS (EAS) and western AS (WAS) to examine their contrasting features. The initiation and dissipation times of MIs, their percentage occurrence, their strength, etc., have been examined using this large database. The relation with monsoon activity (rainfall) over the Indian region during normal and poor monsoon years is also studied. WAS ΔT values are ∼ 2 K less than those over the EAS, ΔT being the temperature difference between 950 and 850 hPa. A much larger contrast between the WAS and EAS in ΔT is noticed in the ERA-Interim data set vis-à-vis that observed by satellites. The possibility of detecting MIs from another parameter, refractivity N, obtained directly from another satellite constellation, GPS Radio Occultation (RO) (COSMIC), has also been examined. MIs detected from IASI and the Atmospheric Infrared Sounder (AIRS) onboard the NOAA satellite have been compared to see how far the two data sets can be combined to study MI characteristics. We suggest MI could also be included as one of the semipermanent features of the southwest monsoon along with the presently accepted six parameters.

  1. Developing a Minimum Data Set for an Information Management System to Study Traffic Accidents in Iran.

    Science.gov (United States)

    Mohammadi, Ali; Ahmadi, Maryam; Gharagozlu, Alireza

    2016-03-01

    Each year, around 1.2 million people die in the road traffic incidents. Reducing traffic accidents requires an exact understanding of the risk factors associated with traffic patterns and behaviors. Properly analyzing these factors calls for a comprehensive system for collecting and processing accident data. The aim of this study was to develop a minimum data set (MDS) for an information management system to study traffic accidents in Iran. This descriptive, cross-sectional study was performed in 2014. Data were collected from the traffic police, trauma centers, medical emergency centers, and via the internet. The investigated resources for this study were forms, databases, and documents retrieved from the internet. Forms and databases were identical, and one sample of each was evaluated. The related internet-sourced data were evaluated in their entirety. Data were collected using three checklists. In order to arrive at a consensus about the data elements, the decision Delphi technique was applied using questionnaires. The content validity and reliability of the questionnaires were assessed by experts' opinions and the test-retest method, respectively. An (MDS) of a traffic accident information management system was assigned to three sections: a minimum data set for traffic police with six classes, including 118 data elements; a trauma center with five data classes, including 57 data elements; and a medical emergency center, with 11 classes, including 64 data elements. Planning for the prevention of traffic accidents requires standardized data. As the foundation for crash prevention efforts, existing standard data infrastructures present policymakers and government officials with a great opportunity to strengthen and integrate existing accident information systems to better track road traffic injuries and fatalities.

  2. Derivation of inner magnetospheric electric field (UNH-IMEF model using Cluster data set

    Directory of Open Access Journals (Sweden)

    H. Matsui

    2008-09-01

    Full Text Available We derive an inner magnetospheric electric field (UNH-IMEF) model at L=2–10 using primarily Cluster electric field data for more than 5 years between February 2001 and October 2006. This electric field data set is divided into several ranges of the interplanetary electric field (IEF) values measured by ACE. As ring current simulations which require the electric field as an input parameter are often performed at L=2–6.6, we have included in our data set statistical results from ground radars and low-altitude satellites inside the perigee of Cluster (L~4). Electric potential patterns are derived from the average electric fields by solving an inverse problem. The electric potential pattern for small IEF values is probably affected by the ionospheric dynamo. The magnitudes of the electric field increase around the evening local time as the IEF increases, presumably due to the sub-auroral polarization stream (SAPS). Another region with enhanced electric fields during large IEF periods is located around 9 MLT at L>8, which is possibly related to solar wind-magnetosphere coupling. Our potential patterns are consistent with those derived from self-consistent simulations. As the potential patterns can be interpolated/extrapolated to any discrete IEF value within the measured ranges, we thus derive an empirical electric potential model. The performance of the model is evaluated by comparing the electric field derived from the model with the original one measured by Cluster and mapped to the equator. The model is open to the public through our website.

  4. Jumps in GNSS coordinates time series, a simple and fast methodology to clean the data sets

    Science.gov (United States)

    Bruni, Sara; Zerbini, Susanna; Raicich, Fabio; Errico, Maddalena; Santi, Efisio

    2014-05-01

    GNSS coordinate time series often suffer from undesired offsets of different natures which may impair reliable estimation of the long-period trend and which should be corrected in the original data sets. Examples of such discontinuities are those originating from earthquakes, monumentation problems, replacement/maintenance of station equipment, changes of the reference system, and a number of unforeseen events. We have developed an automated and fast data-inspection procedure for estimating the time of occurrence and the magnitude of the jumps and for correcting the time series accordingly. These processing characteristics are important because many time series now span almost two decades, and dense GNSS networks are becoming a reality. The procedure has been developed and tailored to GNSS data sets starting from the Sequential T-test Analysis of Regime Shifts (STARS) originally conceived by Rodionov (Geophys. Res. Lett., 31, L09204, 2004) in the context of climatic studies. This technique makes no a priori assumption on the time of occurrence or the magnitude of the discontinuities. A jump is detected and its magnitude estimated when, over two consecutive time windows of the same length, the mean value exhibits a statistically significant change. Three user-defined parameters are required: the cut-off length, L, representing the minimum time interval between two consecutive discontinuities; the significance level, p, of the exploited two-tailed Student t-test; and the Huber parameter, H, used to compute a weighted mean over the L-day intervals. The method has been tested on GPS coordinate time series of stations located in the southeastern Po Plain, in Italy. The series span more than 15 years and are affected by offsets of different natures. The methodology has proven to be effective, as confirmed by the comparison between the corrected GPS time series and those obtained by other co-located observation techniques such as
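
    The two-window test at the heart of such a procedure is compact enough to sketch. The following is a simplified detector in the spirit of STARS, assuming a regularly sampled daily series; it omits the Huber-weighted mean and uses a plain Welch t-test, and all names are illustrative.

        import numpy as np
        from scipy import stats

        def detect_jumps(series, L=90, p=0.01):
            """Scan a daily coordinate series with two consecutive windows of
            length L and flag a jump when a two-tailed t-test rejects equality
            of the window means at significance level p. Returns a list of
            (index, estimated offset) pairs."""
            jumps = []
            i = L
            while i <= len(series) - L:
                before = series[i - L:i]
                after = series[i:i + L]
                _, pval = stats.ttest_ind(before, after, equal_var=False)
                if pval < p:
                    jumps.append((i, float(np.mean(after) - np.mean(before))))
                    i += L  # L also acts as the minimum spacing between offsets
                else:
                    i += 1
            return jumps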

  5. An Uncertainty Data Set for Passive Microwave Satellite Observations of Warm Cloud Liquid Water Path

    Science.gov (United States)

    Greenwald, Thomas J.; Bennartz, Ralf; Lebsock, Matthew; Teixeira, João.

    2018-04-01

    The first extended comprehensive data set of the retrieval uncertainties in passive microwave observations of cloud liquid water path (CLWP) for warm oceanic clouds has been created for practical use in climate applications. Four major sources of systematic errors were considered over the 9-year record of the Advanced Microwave Scanning Radiometer-EOS (AMSR-E): clear-sky bias, cloud-rain partition (CRP) bias, cloud-fraction-dependent bias, and cloud temperature bias. Errors were estimated using a unique merged AMSR-E/Moderate resolution Imaging Spectroradiometer Level 2 data set as well as observations from the Cloud-Aerosol Lidar with Orthogonal Polarization and the CloudSat Cloud Profiling Radar. To quantify the CRP bias more accurately, a new parameterization was developed to improve the inference of CLWP in warm rain. The cloud-fraction-dependent bias was found to be a combination of the CRP bias, an in-cloud bias, and an adjacent precipitation bias. Globally, the mean net bias was 0.012 kg/m2, dominated by the CRP and in-cloud biases, but with considerable regional and seasonal variation. Good qualitative agreement between a bias-corrected AMSR-E CLWP climatology and ship observations in the Northeast Pacific suggests that the bias estimates are reasonable. However, a possible underestimation of the net bias in certain conditions may be due in part to the crude method used in classifying precipitation, underscoring the need for an independent method of detecting rain in warm clouds. This study demonstrates the importance of combining visible-infrared imager data and passive microwave CLWP observations for estimating uncertainties and improving the accuracy of these observations.

  6. Evaluating Fast Maximum Likelihood-Based Phylogenetic Programs Using Empirical Phylogenomic Data Sets

    Science.gov (United States)

    Zhou, Xiaofan; Shen, Xing-Xing; Hittinger, Chris Todd

    2018-01-01

    Abstract The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses. PMID:29177474

  7. Geologic mapping of the Hekla volcano (Iceland) using integrated data sets from optic and radar sensors

    Science.gov (United States)

    Wever, Tobias; Loercher, Gerhard

    1994-12-01

    During the MAC-Europe campaign in June/July 1991, different airborne data sets (AIRSAR, TMS and AVIRIS) were collected over Iceland. One test site is situated around the Hekla volcano in southern Iceland. This area is characterised by a sequence of lava flows of different ages together with tuffs and ashes. This case study contributes to demonstrating the potential of MAC-Europe data for geological mapping. The optical and SAR data were analysed separately to elaborate the strengths of the different sensors. An approach was then carried out to produce an image combining the advantages of the respective sensors in a single presentation. The synergetic approach clearly improves the separation of geological units by combining two completely different data sets, owing to the utilisation of spectral bands in the visible and infrared region on one side and in the microwave region on the other. Besides the petrographical information extracted from optical data using spectral signatures, the combination includes physical information such as the roughness and dielectric properties of a target. The geologic setting of the test area is characterised by a very uniform petrography, hence the spectral signatures show only little variation. Due to this fact, the differentiation of geological units using optical data alone is limited. The additional use of SAR data establishes the new dimension of surface roughness, which clearly improves the discrimination. This additional parameter provides new information about the state of weathering, age and sequence of the different lava flows. The NASA/JPL AIRSAR system is very suitable for this kind of investigation due to its multifrequency and polarimetric capabilities. The three SAR frequencies (C-, L- and P-band) enable the detection of a broad range of roughness differences. These results can be enhanced by exploiting the full scattering matrix of the polarimetric AIRSAR data.

  8. Day 1 for the Integrated Multi-Satellite Retrievals for GPM (IMERG) Data Sets

    Science.gov (United States)

    Huffman, G. J.; Bolvin, D. T.; Braithwaite, D.; Hsu, K. L.; Joyce, R.; Kidd, C.; Sorooshian, S.; Xie, P.

    2014-12-01

    The Integrated Multi-satellitE Retrievals for GPM (IMERG) is designed to compute the best time series of (nearly) global precipitation from "all" precipitation-relevant satellites and global surface precipitation gauge analyses. IMERG was developed to use GPM Core Observatory data as a reference for the international constellation of satellites of opportunity that constitute the GPM virtual constellation. Computationally, IMERG is a unified U.S. algorithm drawing on strengths of the three contributing groups, whose previous work includes: 1) the TRMM Multi-satellite Precipitation Analysis (TMPA); 2) the CPC Morphing algorithm with Kalman Filtering (K-CMORPH); and 3) the Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks using a Cloud Classification System (PERSIANN-CCS). We review the IMERG design, development, testing, and current status. IMERG provides 0.1°x0.1° half-hourly data and will be run at multiple latencies, providing successively more accurate estimates 4 hours, 8 hours, and 2 months after observation time. In Day 1 the spatial extent is 60°N-S, for the period March 2014 to the present. In subsequent reprocessing the data will extend to fully global coverage for the period 1998 to the present. Both the set of input retrievals and the IMERG system are substantially different from those used in previous U.S. products. The input passive microwave data are all being produced with GPROF2014, which is substantially upgraded compared to previous versions. For the first time, this includes microwave sounders. Accordingly, there is a strong need to carefully check the initial test data sets for performance. IMERG output will be illustrated using pre-operational test data, including the variety of supporting fields, such as the merged-microwave and infrared estimates, and the precipitation type. Finally, we will summarize the expected release of various output products, and the subsequent reprocessing sequence.

  9. Chromatin accessibility data sets show bias due to sequence specificity of the DNase I enzyme.

    Directory of Open Access Journals (Sweden)

    Hashem Koohy

    Full Text Available DNase I is an enzyme which cuts duplex DNA at a rate that depends strongly upon its chromatin environment. In combination with high-throughput sequencing (HTS) technology, it can be used to infer genome-wide landscapes of open chromatin regions. Using this technology, systematic identification of hundreds of thousands of DNase I hypersensitive sites (DHS) per cell type has been possible, and this in turn has helped to precisely delineate genomic regulatory compartments. However, to date there has been relatively little investigation into possible biases affecting this data. We report a significant degree of sequence preference spanning sites cut by DNase I in a number of published data sets. The two major protocols in current use each show a different pattern, but for a given protocol the pattern of sequence specificity seems to be quite consistent. The patterns are substantially different from biases seen in other types of HTS data sets, and in some cases the most constrained position lies outside the sequenced fragment, implying that this constraint must relate to the digestion process rather than events occurring during library preparation or sequencing. DNase I is a sequence-specific enzyme, with a specificity that may depend on experimental conditions. This sequence specificity is not taken into account by existing pipelines for identifying open chromatin regions. Care must be taken when interpreting DNase I results, especially when looking at the precise locations of the reads. Future studies may be able to improve the sensitivity and precision of chromatin state measurement by compensating for sequence bias.
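
    One way to expose such a preference is to tabulate base composition at each position around the mapped cut sites. The sketch below does this under assumed in-memory data structures (a list of cut sites and a genome dictionary); a real pipeline would work from BAM and FASTA files instead.

        import numpy as np

        def positional_base_frequencies(cut_sites, genome, flank=10):
            """Tabulate base composition in a window around each DNase I cut
            site. cut_sites: iterable of (chrom, position, strand) tuples;
            genome: dict mapping chromosome name to its sequence string."""
            counts = {b: np.zeros(2 * flank) for b in "ACGT"}
            for chrom, pos, strand in cut_sites:
                window = genome[chrom][pos - flank:pos + flank].upper()
                if strand == "-":  # reverse-complement minus-strand windows
                    window = window[::-1].translate(str.maketrans("ACGT", "TGCA"))
                for offset, base in enumerate(window):
                    if base in counts:
                        counts[base][offset] += 1
            total = sum(counts.values())
            return {b: c / np.maximum(total, 1) for b, c in counts.items()}

    Positions where one base is strongly over- or under-represented relative to the genomic background are the signature of the cutting bias described above.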

  10. Development of a Minimum Data Set (MDS) for C-Section Anesthesia Information Management System (AIMS).

    Science.gov (United States)

    Sheykhotayefeh, Mostafa; Safdari, Reza; Ghazisaeedi, Marjan; Khademi, Seyed Hossein; Seyed Farajolah, Seyedeh Sedigheh; Maserat, Elham; Jebraeily, Mohamad; Torabi, Vahid

    2017-04-01

    Caesarean section, also known as C-section, is a very common procedure in the world. A minimum data set (MDS) is defined as a set of data elements holding information regarding a series of target entities to provide a basis for planning, management, and performance evaluation. MDSs have found great use in health care information systems. An MDS can also be considered a basis for medical information management and has shown great potential for contributing to the provision of high-quality care and disease control measures. The principal aim of this research was to determine the MDS and required capabilities for an anesthesia information management system (AIMS) for C-section in Iran. Data items collected from several selected AIMSs were studied to establish an initial set of data. The study population, composed of 115 anesthesiologists, was asked to review the proposed data elements and score them in order of importance using a five-point Likert scale. The items scored as important or highly important by at least 75% of the experts were included in the final list of the minimum data set. Overall, 8 classes of data (consisting of 81 key data elements) were determined as the final set. The most important required capabilities were related to airway management and to hypertension and hypotension management. In the development of an information system (IS) based on an MDS, because of the broad involvement of users, IS capabilities must focus on the users' needs to form a successful system. Therefore, it is essential to assess the MDS watchfully by considering the planned uses of the data. Also, an IS should have the essential capabilities to meet the needs of its users.

  11. A method for statistical comparison of data sets and its uses in analysis of nuclear physics data

    International Nuclear Information System (INIS)

    Bityukov, S.I.; Smirnova, V.V.; Krasnikov, N.V.; Maksimushkina, A.V.; Nikitenko, A.N.

    2014-01-01

    The authors propose a method for statistical comparison of two data sets. The method is based on the statistical comparison of histograms. As an estimator of the quality of the decision made, it is proposed to use a value which can be interpreted as the probability that the decision (that the data sets differ) is correct.
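
    The abstract gives no formulas, but the flavor of a histogram-based comparison can be sketched with a standard chi-square homogeneity test. This is a generic stand-in for illustration, not the authors' estimator.

        import numpy as np
        from scipy import stats

        def compare_histograms(sample_a, sample_b, bins=20):
            """Bin two samples on a common grid and test the homogeneity of
            the resulting histograms with a chi-square test. Returns the test
            statistic and the p-value."""
            pooled = np.concatenate([sample_a, sample_b])
            edges = np.histogram_bin_edges(pooled, bins=bins)
            h_a, _ = np.histogram(sample_a, bins=edges)
            h_b, _ = np.histogram(sample_b, bins=edges)
            # Add-one smoothing so empty bins do not break the test
            table = np.vstack([h_a, h_b]) + 1
            chi2, pval, _, _ = stats.chi2_contingency(table)
            return chi2, pval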

  12. The Impacts of Different Meteorology Data Sets on Nitrogen Fate and Transport in the SWAT Watershed Model

    Science.gov (United States)

    In this study, we investigated how different meteorology data sets impact nitrogen fate and transport responses in the Soil and Water Assessment Tool (SWAT) model. We used two meteorology data sets: National Climatic Data Center (observed) and Mesoscale Model 5/Weather Research ...

  13. Molecular characterization of chicken syndecan-2 proteoglycan

    DEFF Research Database (Denmark)

    Chen, Ligong; Couchman, John R; Smith, Jacqueline

    2002-01-01

    A partial syndecan-2 sequence (147 bp) was obtained from chicken embryonic fibroblast poly(A)+ RNA by reverse transcription-PCR. This partial sequence was used to produce a 5'-end-labelled probe. A chicken liver cDNA library was screened with this probe, and overlapping clones were obtained......Da. Western blotting of chicken embryonic fibroblast cell lysates with species-specific monoclonal antibody mAb 8.1 showed that chicken syndecan-2 is substituted with heparan sulphate, and that the major form of chicken syndecan-2 isolated from chicken fibroblasts is consistent with the formation of SDS......-resistant dimers, which is common for syndecans. A 5'-end-labelled probe hybridized to two mRNA species in chicken embryonic fibroblasts, while Northern analysis with poly(A)+ RNAs from different tissues of chicken embryos showed wide and distinct distributions of chicken syndecan-2 during embryonic development...

  14. Error Characterisation and Merging of Active and Passive Microwave Soil Moisture Data Sets

    Science.gov (United States)

    Wagner, Wolfgang; Gruber, Alexander; de Jeu, Richard; Parinussa, Robert; Chung, Daniel; Dorigo, Wouter; Reimer, Christoph; Kidd, Richard

    2015-04-01

    As part of the Climate Change Initiative (CCI) programme of the European Space Agency (ESA), a data fusion system has been developed which is capable of ingesting surface soil moisture data derived from active and passive microwave sensors (ASCAT, AMSR-E, etc.) flown on different satellite platforms and merging them to create long and consistent time series of soil moisture suitable for use in climate change studies. The soil moisture data records created in this way (latest version: ESA CCI SM v02.1, released on 5/12/2014) are freely available and can be obtained from http://www.esa-soilmoisture-cci.org/. As described by Wagner et al. (2012), the principal steps of the data fusion process are: 1) error characterisation, 2) matching to account for data set specific biases, and 3) merging. In this presentation we present the current data fusion process and discuss how new error characterisation methods, such as the increasingly popular triple collocation method as discussed for example by Zwieback et al. (2012), may be used to improve it. The main benefit of an improved error characterisation would be a more reliable identification of the best performing microwave soil moisture retrieval(s) for each grid point and each point in time. In case two or more satellite data sets provide useful information, the estimated errors can be used to define the weights with which each satellite data set is merged, i.e. the lower its error the higher its weight. This is expected to bring a significant improvement over the current data fusion scheme, which is not yet based on quantitative estimates of the retrieval errors but on a proxy measure, namely the vegetation optical depth (Dorigo et al., 2015): over areas with low vegetation passive soil moisture retrievals are used, while over areas with moderate vegetation density active retrievals are used. In transition areas, where both products correlate well, both products are being used in a synergistic way: on time steps where only one of
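
    The weighting principle described here reduces, in its simplest form, to an inverse-variance weighted average of the co-located retrievals. The sketch below illustrates only that principle, not the full ESA CCI fusion chain; the array shapes and names are assumptions.

        import numpy as np

        def merge_retrievals(estimates, error_variances):
            """Inverse-variance weighted merge of co-located soil moisture
            retrievals: the lower a sensor's estimated error variance, the
            higher its weight. `estimates` and `error_variances` are
            (n_sensors, n_times) arrays; NaN marks missing retrievals."""
            est = np.asarray(estimates, dtype=float)
            w = 1.0 / np.asarray(error_variances, dtype=float)
            w = np.where(np.isnan(est), 0.0, w)  # drop missing retrievals
            wsum = w.sum(axis=0)
            merged = (w * np.nan_to_num(est)).sum(axis=0)
            # Time steps with no valid retrieval at all stay NaN
            return np.where(wsum > 0, merged / np.where(wsum > 0, wsum, 1), np.nan)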

  15. EUDAT and EPOS moving towards the efficient management of scientific data sets

    Science.gov (United States)

    Fiameni, Giuseppe; Bailo, Daniele; Cacciari, Claudio

    2016-04-01

    This abstract presents the collaboration between the European Collaborative Data Infrastructure (EUDAT) and the pan-European infrastructure for solid Earth science (EPOS), which draws on the management of scientific data sets through a reciprocal support agreement. EUDAT is a Consortium of European Data Centers and Scientific Communities whose focus is the development and realisation of the Collaborative Data Infrastructure (CDI), a common model for managing data spanning all European research data centres and data repositories and providing an interoperable layer of common data services. The EUDAT Service Suite is a set of a) implementations of the CDI model and b) standards, developed and offered by members of the EUDAT Consortium. These EUDAT Services include a baseline of CDI-compliant interface and API services - a "CDI Gateway" - plus a number of web-based GUIs and command-line client tools. On the other hand, the EPOS initiative aims at creating a pan-European infrastructure for solid Earth science to support a safe and sustainable society. In accordance with this scientific vision, the mission of EPOS is to integrate the diverse and advanced European Research Infrastructures for solid Earth Science relying on new e-science opportunities to monitor and unravel the dynamic and complex Earth System. EPOS will enable innovative multidisciplinary research for a better understanding of the Earth's physical and chemical processes that control earthquakes, volcanic eruptions, ground instability and tsunami as well as the processes driving tectonics and Earth's surface dynamics. Through the integration of data, models and facilities EPOS will allow the Earth Science community to make a step change in developing new concepts and tools for key answers to scientific and socio-economic questions concerning geo-hazards and geo-resources as well as Earth sciences applications to the environment and to human welfare. To achieve this integration challenge and the

  16. BIPOLAR MAGNETIC REGIONS ON THE SUN: GLOBAL ANALYSIS OF THE SOHO/MDI DATA SET

    International Nuclear Information System (INIS)

    Stenflo, J. O.; Kosovichev, A. G.

    2012-01-01

    The magnetic flux that is generated by dynamo processes inside the Sun emerges in the form of bipolar magnetic regions. The properties of these directly observable signatures of the dynamo can be extracted from full-disk solar magnetograms. The most homogeneous, high-quality synoptic data set of solar magnetograms has been obtained with the Michelson Doppler Imager (MDI) instrument on the Solar and Heliospheric Observatory spacecraft during 1995-2011. We have developed an IDL program that has, when applied to the 73,838 magnetograms of the MDI data set, automatically identified 160,079 bipolar magnetic regions that span a range of scale sizes across nearly four orders of magnitude. The properties of each region have been extracted and statistically analyzed, in particular with respect to the polarity orientations of the bipolar regions, including their tilt-angle distributions and their violations of Hale's polarity law. The latitude variation of the average tilt angles (with respect to the E-W direction), which is known as Joy's law, is found to closely follow the relation 32.1° × sin(latitude). There is no indication of a dependence on region size that one may expect if the tilts were produced by the Coriolis force during the buoyant rise of flux loops from the tachocline region. A few percent of all regions have orientations that violate Hale's polarity law. We show explicit examples, from different phases of the solar cycle, where well-defined medium-size bipolar regions with opposite polarity orientations occur side by side in the same latitude zone in the same magnetogram. Such oppositely oriented large bipolar regions cannot be part of the same toroidal flux system, but different flux systems must coexist at any given time in the same latitude zones. These examples are incompatible with the paradigm of coherent, subsurface toroidal flux ropes as the source of sunspots, and instead show that fluctuations must play a major role at all scales for the

  17. Snapshot science: new research possibilities facilitated by spatially dense data sets in limnology

    Science.gov (United States)

    Stanley, E. H.; Loken, L. C.; Crawford, J.; Butitta, V.; Schramm, P.

    2017-12-01

    The recent increase in availability of high frequency sensors is transforming the study of inland aquatic ecosystems, allowing the detection of rare or difficult-to-capture events, revealing previously unappreciated temporal dynamics, and providing rich data sets that can be used to calibrate or inform process-based models in ways that have not previously been possible. Yet sensor deployment is typically a 1-D practice, so insights are tempered by device placement. Limnologists have long known that there can be substantial spatial variability in physical, chemical, and biological features within water bodies, but in most cases, logistical difficulties limit our ability to quantify this heterogeneity. Recent improvements in remote sensing are helping to overcome this deficit for a subset of variables. Alternatively, devices such as the Fast Limnology Automated Measurement platform that deploy sensors on watercraft can be used to quickly generate spatially-rich data sets. This expanded capacity leads to new questions about what can be seen and learned about underlying processes. Surveys of multiple Wisconsin lakes reveal both homogeneity and heterogeneity among sites and variables, indicating that the limnological tradition of sampling at a single fixed point is unlikely to represent the entire lake area. Initial inferences drawn from surface water maps include identification of biogeochemical hotspots or areas of elevated loading. At a more sophisticated level, evaluation of changes in spatial structure among sites or dates is commonly used to infer process by landscape ecologists, and these same practices can now be applied to lakes and rivers. For example, a recent study documented significant changes in spatial variance and the magnitude of spatial autocorrelation of phycocyanin prior to the onset of a cyanobacterial bloom. This may provide information on population growth dynamics of cyanobacteria, and be used as early warnings of impending algal blooms. As the
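
    Global Moran's I is one standard way to quantify the spatial autocorrelation that such studies track as an early-warning indicator. A minimal version for point measurements with a binary distance-band weight matrix is sketched below; the statistic is standard, but the setup and names are illustrative.

        import numpy as np

        def morans_i(values, coords, cutoff):
            """Global Moran's I for point measurements, using binary weights
            for pairs of sites closer than `cutoff`. values: (n,) array of
            measurements (e.g. phycocyanin); coords: (n, 2) site positions."""
            z = values - values.mean()
            d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
            w = ((d > 0) & (d <= cutoff)).astype(float)
            # I = (n / S0) * (z' W z) / (z' z), with S0 the sum of all weights
            return (len(values) / w.sum()) * (z @ w @ z) / (z @ z)

    Tracked across repeated surveys, a rising I computed this way would flag the growing spatial coherence that the abstract associates with an impending bloom.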

  18. Monitoring and Quantifying Subsurface Ice and Water Content in Permafrost Regions Based on Geophysical Data Sets

    Science.gov (United States)

    Hauck, C.; Bach, M.; Hilbich, C.

    2007-12-01

    Based on recent observational evidence of climate change in permafrost regions, it is now recognised that a detailed knowledge of the material composition of the subsurface in permafrost regions is required for modelling of the future evolution of the ground thermal regime and an assessment of the hazard potential due to degrading permafrost. However, due to the remote location of permafrost areas and the corresponding difficulties in obtaining high-quality data sets of the subsurface, knowledge about the material composition in permafrost areas is scarce. In frozen ground subsurface material may consist of four different phases: rock/soil matrix, unfrozen pore water, ice and air-filled pore space. Applications of geophysical techniques for determining the subsurface composition are comparatively cheap and logistically feasible alternatives to the single point information from boreholes. Due to the complexity of the subsurface a combination of complementary geophysical methods (e.g. electrical resistivity tomography (ERT) and refraction seismic tomography) is often favoured to avoid ambiguities in the interpretation of the results. The indirect nature of geophysical soundings requires a relation between the measured variable (electrical resistivity, seismic velocity) and the rock-, water-, ice- and air content. In this contribution we will present a model which determines the volumetric fractions of these four phases from tomographic electrical and seismic data sets. The so-called 4-phase model is based on two well-known geophysical mixing rules using observed resistivity and velocity data as input data on a 2-dimensional grid. Material properties such as resistivity and P- wave velocity of the host rock material and the pore water have to be known beforehand. The remaining free model parameters can be determined by a Monte-Carlo approach, the results of which are used additionally as indicator for the reliability of the model results. First results confirm the
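
    A minimal sketch of such a four-phase model follows, assuming the two mixing rules most commonly used for this purpose, Archie's law for the resistivity of the water-bearing pore space and a time-average (Wyllie-type) equation for the P-wave velocity; all material constants are illustrative placeholders, not calibrated values.

        import numpy as np

        def four_phase_fractions(rho, v, phi, rho_w=50.0, a=1.0, m=2.0, n=2.0,
                                 v_w=1500.0, v_i=3500.0, v_a=330.0, v_r=6000.0):
            """Split the pore space (porosity phi) into water, ice and air
            fractions given electrical resistivity rho (ohm m) and P-wave
            velocity v (m/s). Water content comes from Archie's law; ice and
            air are then separated with the time-average velocity equation."""
            s_w = (a * rho_w / (rho * phi ** m)) ** (1.0 / n)  # Archie saturation
            f_w = np.clip(phi * s_w, 0.0, phi)                 # water fraction
            f_r = 1.0 - phi                                    # rock fraction
            rem = phi - f_w                                    # ice + air
            # 1/v = f_w/v_w + f_r/v_r + f_i/v_i + (rem - f_i)/v_a, solve for f_i
            f_i = ((1.0 / v) - f_w / v_w - f_r / v_r - rem / v_a) \
                  / (1.0 / v_i - 1.0 / v_a)
            f_i = np.clip(f_i, 0.0, rem)
            return f_w, f_i, rem - f_i

    Applied cell by cell to co-registered ERT and seismic tomograms, this yields the 2-D fraction maps described above; the Monte Carlo step mentioned in the abstract would then vary the assumed constants to gauge the reliability of the result.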

  19. Assessing the optimality of ASHRAE climate zones using high resolution meteorological data sets

    Science.gov (United States)

    Fils, P. D.; Kumar, J.; Collier, N.; Hoffman, F. M.; Xu, M.; Forbes, W.

    2017-12-01

    Energy consumed by built infrastructure constitutes a significant fraction of the nation's energy budget. According to a 2015 US Energy Information Agency report, 41% of the energy used in the US went to residential and commercial buildings, and additional research has shown that 32% of commercial building energy goes into heating and cooling the building. The American National Standards Institute and the American Society of Heating, Refrigerating and Air-Conditioning Engineers Standard 90.1 provide climate zones for the current state of practice, since heating and cooling demands are strongly influenced by spatio-temporal weather variations. For this reason, we have been assessing the optimality of the climate zones using high-resolution daily climate data from NASA's DAYMET database. We analyzed time series of meteorological data sets for all ASHRAE climate zones between 1980 and 2016 inclusive. We computed the mean, standard deviation, and other statistics for a set of meteorological variables (solar radiation, maximum and minimum temperature) within each zone. By plotting all the zonal statistics, we analyzed patterns and trends in these data over the past 36 years. We compared the mean of each zone to its standard deviation to determine the range of spatial variability that exists within each zone. If the band around the mean is too large, it indicates that regions in the zone experience a wide range of weather conditions, and perhaps a common set of building design guidelines would lead to a non-optimal energy consumption scenario. In this study we have observed strong variation among the different climate zones. Some have shown consistent patterns over the past 36 years, indicating that the zone was well constructed, while others have greatly deviated from their mean, indicating that the zone needs to be reconstructed. We also looked at redesigning the climate zones based on high-resolution climate data. We are using building simulation models like EnergyPlus to develop

  20. Building Bridges Between Geoscience and Data Science through Benchmark Data Sets

    Science.gov (United States)

    Thompson, D. R.; Ebert-Uphoff, I.; Demir, I.; Gel, Y.; Hill, M. C.; Karpatne, A.; Güereque, M.; Kumar, V.; Cabral, E.; Smyth, P.

    2017-12-01

    The changing nature of observational field data demands richer and more meaningful collaboration between data scientists and geoscientists. Thus, among other efforts, the Working Group on Case Studies of the NSF-funded RCN on Intelligent Systems Research To Support Geosciences (IS-GEO) is developing a framework to strengthen such collaborations through the creation of benchmark datasets. Benchmark datasets provide an interface between disciplines without requiring extensive background knowledge. The goals are to create (1) a means for two-way communication between geoscience and data science researchers; (2) new collaborations, which may lead to new approaches for data analysis in the geosciences; and (3) a public, permanent repository of complex data sets, representative of geoscience problems, useful to coordinate efforts in research and education. The group identified 10 key elements and characteristics for ideal benchmarks. High impact: A problem with high potential impact. Active research area: A group of geoscientists should be eager to continue working on the topic. Challenge: The problem should be challenging for data scientists. Data science generality and versatility: It should stimulate development of new general and versatile data science methods. Rich information content: Ideally the data set provides stimulus for analysis at many different levels. Hierarchical problem statement: A hierarchy of suggested analysis tasks, from relatively straightforward to open-ended tasks. Means for evaluating success: Data scientists and geoscientists need means to evaluate whether the algorithms are successful and achieve intended purpose. Quick start guide: Introduction for data scientists on how to easily read the data to enable rapid initial data exploration. Geoscience context: Summary for data scientists of the specific data collection process, instruments used, any pre-processing and the science questions to be answered. Citability: A suitable identifier to

  1. Interactive Visualization and Analysis of Geospatial Data Sets - TrikeND-iGlobe

    Science.gov (United States)

    Rosebrock, Uwe; Hogan, Patrick; Chandola, Varun

    2013-04-01

    The visualization of scientific datasets is becoming an ever-increasing challenge as advances in computing technologies have enabled scientists to build high-resolution climate models that have produced petabytes of climate data. To interrogate and analyze these large datasets in real time is a task that pushes the boundaries of computing hardware and software. But integration of climate datasets with geospatial data requires a considerable amount of effort and close familiarity with various data formats and projection systems, which has prevented widespread utilization outside of the climate community. TrikeND-iGlobe is a sophisticated software tool that bridges this gap, allows easy integration of climate datasets with geospatial datasets, and provides sophisticated visualization and analysis capabilities. The objective for TrikeND-iGlobe is the continued building of an open source 4D virtual globe application using NASA World Wind technology that integrates analysis of climate model outputs with remote sensing observations as well as demographic and environmental data sets. This will facilitate a better understanding of global and regional phenomena, and the impact analysis of climate extreme events. The critical aim is real-time interactive interrogation. At the data-centric level the primary aim is to enable the user to interact with the data in real time for the purpose of analysis, locally or remotely. TrikeND-iGlobe provides the basis for the incorporation of modular tools that provide extended interactions with the data, including sub-setting, aggregation, re-shaping, time series analysis methods and animation to produce publication-quality imagery. TrikeND-iGlobe may be run locally or can be accessed via a web interface supported by high-performance visualization compute nodes placed close to the data. It supports visualizing heterogeneous data formats: traditional geospatial datasets along with scientific data sets with geographic coordinates (NetCDF, HDF, etc

  2. Galaxy Evolution Insights from Spectral Modeling of Large Data Sets from the Sloan Digital Sky Survey

    Energy Technology Data Exchange (ETDEWEB)

    Hoversten, Erik A. [Johns Hopkins Univ., Baltimore, MD (United States)

    2007-10-01

    This thesis centers on the use of spectral modeling techniques on data from the Sloan Digital Sky Survey (SDSS) to gain new insights into current questions in galaxy evolution. The SDSS provides a large, uniform, high-quality data set which can be exploited in a number of ways. One avenue pursued here is to use the large sample size to measure precisely the mean properties of galaxies in increasingly narrow parameter ranges. The other route taken is to look for rare objects which open up for exploration new areas in galaxy parameter space. The crux of this thesis is revisiting the classical Kennicutt method for inferring the stellar initial mass function (IMF) from the integrated light properties of galaxies. A large data set (~10^5 galaxies) from the SDSS DR4 is combined with more in-depth modeling and quantitative statistical analysis to search for systematic IMF variations as a function of galaxy luminosity. Galaxy Hα equivalent widths are compared to a broadband color index to constrain the IMF. It is found that for the sample as a whole the best-fitting IMF power-law slope above 0.5 M_sun is Γ = 1.5 ± 0.1, with the error dominated by systematics. Galaxies brighter than around M_r,0.1 = -20 (including galaxies like the Milky Way, which has M_r,0.1 ~ -21) are well fit by a universal Γ ~ 1.4 IMF, similar to the classical Salpeter slope, and smooth, exponential star formation histories (SFH). Fainter galaxies prefer steeper IMFs, and the quality of the fits reveals that for these galaxies a universal IMF with smooth SFHs is actually a poor assumption. Related projects are also pursued. A targeted photometric search is conducted for strongly lensed Lyman break galaxies (LBG) similar to MS1512-cB58. The evolution of the photometric selection technique is described, as are the results of spectroscopic follow-up of the best targets. The serendipitous discovery of two interesting blue compact dwarf galaxies is reported. These

  3. Caught you: threats to confidentiality due to the public release of large-scale genetic data sets.

    Science.gov (United States)

    Wjst, Matthias

    2010-12-29

    Large-scale genetic data sets are frequently shared with other research groups and even released on the Internet to allow for secondary analysis. Study participants are usually not informed about such data sharing because data sets are assumed to be anonymous after stripping off personal identifiers. The assumption of anonymity of genetic data sets, however, is tenuous because genetic data are intrinsically self-identifying. Two types of re-identification are possible: the "Netflix" type and the "profiling" type. The "Netflix" type needs another small genetic data set, usually with less than 100 SNPs but including a personal identifier. This second data set might originate from another clinical examination, a study of leftover samples or forensic testing. When merged to the primary, unidentified set it will re-identify all samples of that individual. Even with no second data set at hand, a "profiling" strategy can be developed to extract as much information as possible from a sample collection. Starting with the identification of ethnic subgroups along with predictions of body characteristics and diseases, the asthma kids case as a real-life example is used to illustrate that approach. Depending on the degree of supplemental information, there is a good chance that at least a few individuals can be identified from an anonymized data set. Any re-identification, however, may potentially harm study participants because it will release individual genetic disease risks to the public.

  4. Caught you: threats to confidentiality due to the public release of large-scale genetic data sets

    Directory of Open Access Journals (Sweden)

    Wjst Matthias

    2010-12-01

    Full Text Available Abstract Background Large-scale genetic data sets are frequently shared with other research groups and even released on the Internet to allow for secondary analysis. Study participants are usually not informed about such data sharing because data sets are assumed to be anonymous after stripping off personal identifiers. Discussion The assumption of anonymity of genetic data sets, however, is tenuous because genetic data are intrinsically self-identifying. Two types of re-identification are possible: the "Netflix" type and the "profiling" type. The "Netflix" type needs another small genetic data set, usually with less than 100 SNPs but including a personal identifier. This second data set might originate from another clinical examination, a study of leftover samples or forensic testing. When merged to the primary, unidentified set it will re-identify all samples of that individual. Even with no second data set at hand, a "profiling" strategy can be developed to extract as much information as possible from a sample collection. Starting with the identification of ethnic subgroups along with predictions of body characteristics and diseases, the asthma kids case as a real-life example is used to illustrate that approach. Summary Depending on the degree of supplemental information, there is a good chance that at least a few individuals can be identified from an anonymized data set. Any re-identification, however, may potentially harm study participants because it will release individual genetic disease risks to the public.

  5. Measures of aging with disability in U.S. secondary data sets: Results of a scoping review.

    Science.gov (United States)

    Putnam, Michelle; Molton, Ivan R; Truitt, Anjali R; Smith, Amanda E; Jensen, Mark P

    2016-01-01

    There remain significant knowledge gaps in our understanding of aging with long-term disability. It is possible that important advances in knowledge could be gained using existing secondary data sets. However, little is known regarding which of the data sets available to researchers contain the age-related measures needed for this purpose, specifically age of onset and/or duration of disability measures. To better understand the capacity to investigate aging with long-term disability (e.g. mobility limitation) and aging with long-term chronic conditions (e.g. spinal cord injury, multiple sclerosis) using extant data. Public use national and regional data sets were identified through existing reports, web-based searches, and expert nomination. The age- and disability-related variables, including age of onset and duration of disability, were tabulated for data sets meeting inclusion criteria. Analysis was descriptive. A total of N = 44 data sets were reviewed. Of these, 22 contained both age and disability variables. Within these 22 data sets, 9 contained an age of onset or duration of disability variable. Six of the nine data sets contained age of diagnosis for a single or set of health conditions. Onset of functional limitation is in two, and onset of self-reported and/or employment disability is in four, of the nine data sets respectively. There is some, but limited opportunity to investigate aging with long-term disability in extant U.S. public use secondary data sets. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. SU-F-T-78: Minimum Data Set of Measurements for TG 71 Based Electron Monitor-Unit Calculations

    International Nuclear Information System (INIS)

    Xu, H; Guerrero, M; Prado, K; Yi, B

    2016-01-01

    Purpose: Building up a TG-71 based electron monitor-unit (MU) calculation protocol usually involves massive measurements. This work investigates a minimum data set of measurements and its calculation accuracy and measurement time. Methods: For the 6, 9, 12, 16, and 20 MeV beams of our Varian Clinac-series linear accelerators, complete measurements were performed at different depths using 5 square applicators (6, 10, 15, 20 and 25 cm) with different cutouts (2, 3, 4, 6, 10, 15 and 20 cm, up to applicator size) for 5 different SSDs. For each energy, there were 8 PDD scans and 150 point measurements for applicator factors, cutout factors and effective SSDs, which were then converted to air-gap factors for SSD 99–110 cm. The dependence of each dosimetric quantity on field size and SSD was examined to determine the minimum data set of measurements as a subset of the complete measurements. The “missing” data excluded from the minimum data set were approximated by linear or polynomial fitting functions based on the included data. The total measurement time and the calculated electron MU using the minimum and the complete data sets were compared. Results: The minimum data set includes 4 or 5 PDDs and 51 to 66 point measurements for each electron energy; more PDDs and fewer point measurements are generally needed as energy increases. Using only <50% of the complete measurement time, the minimum data set generates acceptable MU calculation results compared to those of the complete data set. The PDD difference is within 1 mm and the calculated MU difference is less than 1.5%. Conclusion: The data set measured for TG-71 electron MU calculations can be minimized based on knowledge of how each dosimetric quantity depends on various setup parameters. The suggested minimum data set allows acceptable MU calculation accuracy and shortens measurement time by a few hours.

  7. SU-F-T-78: Minimum Data Set of Measurements for TG 71 Based Electron Monitor-Unit Calculations

    Energy Technology Data Exchange (ETDEWEB)

    Xu, H; Guerrero, M; Prado, K; Yi, B [University of Maryland School of Medicine, Baltimore, MD (United States)

    2016-06-15

    Purpose: Building up a TG-71 based electron monitor-unit (MU) calculation protocol usually involves massive measurements. This work investigates a minimum data set of measurements and its calculation accuracy and measurement time. Methods: For the 6, 9, 12, 16, and 20 MeV beams of our Varian Clinac-series linear accelerators, complete measurements were performed at different depths using 5 square applicators (6, 10, 15, 20 and 25 cm) with different cutouts (2, 3, 4, 6, 10, 15 and 20 cm, up to applicator size) for 5 different SSDs. For each energy, there were 8 PDD scans and 150 point measurements for applicator factors, cutout factors and effective SSDs, which were then converted to air-gap factors for SSD 99–110 cm. The dependence of each dosimetric quantity on field size and SSD was examined to determine the minimum data set of measurements as a subset of the complete measurements. The “missing” data excluded from the minimum data set were approximated by linear or polynomial fitting functions based on the included data. The total measurement time and the calculated electron MU using the minimum and the complete data sets were compared. Results: The minimum data set includes 4 or 5 PDDs and 51 to 66 point measurements for each electron energy; more PDDs and fewer point measurements are generally needed as energy increases. Using only <50% of the complete measurement time, the minimum data set generates acceptable MU calculation results compared to those of the complete data set. The PDD difference is within 1 mm and the calculated MU difference is less than 1.5%. Conclusion: The data set measured for TG-71 electron MU calculations can be minimized based on knowledge of how each dosimetric quantity depends on various setup parameters. The suggested minimum data set allows acceptable MU calculation accuracy and shortens measurement time by a few hours.
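
    The fitting step that replaces the excluded measurements is straightforward; for example, unmeasured cutout factors can be approximated by a low-order polynomial in cutout size fitted to the measured subset, as sketched below with hypothetical numbers.

        import numpy as np

        def fill_missing_cutout_factors(sizes_measured, factors_measured,
                                        sizes_needed, degree=2):
            """Fit a low-order polynomial to measured cutout factors as a
            function of cutout size and evaluate it at unmeasured sizes."""
            coeffs = np.polyfit(sizes_measured, factors_measured, degree)
            return np.polyval(coeffs, sizes_needed)

        # e.g. factors measured at 3, 6 and 10 cm cutouts (values hypothetical):
        # fill_missing_cutout_factors([3, 6, 10], [0.962, 0.991, 1.000], [4, 8])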

  8. Whose data set is it anyway? Sharing raw data from randomized trials

    Directory of Open Access Journals (Sweden)

    Vickers Andrew J

    2006-05-01

    Full Text Available Abstract Background Sharing of raw research data is common in many areas of medical research, genomics being perhaps the most well-known example. In the clinical trial community investigators routinely refuse to share raw data from a randomized trial without giving a reason. Discussion Data sharing benefits numerous research-related activities: reproducing analyses; testing secondary hypotheses; developing and evaluating novel statistical methods; teaching; aiding design of future trials; meta-analysis; and, possibly, preventing error, fraud and selective reporting. Clinical trialists, however, sometimes appear overly concerned with being scooped and with misrepresentation of their work. Both possibilities can be avoided with simple measures such as inclusion of the original trialists as co-authors on any publication resulting from data sharing. Moreover, if we treat any data set as belonging to the patients who comprise it, rather than the investigators, such concerns fall away. Conclusion Technological developments, particularly the Internet, have made data sharing generally a trivial logistical problem. Data sharing should come to be seen as an inherent part of conducting a randomized trial, similar to the way in which we consider ethical review and publication of study results. Journals and funding bodies should insist that trialists make raw data available, for example, by publishing data on the Web. If the clinical trial community continues to fail with respect to data sharing, we will only strengthen the public perception that we do clinical trials to benefit ourselves, not our patients.

  9. Whose data set is it anyway? Sharing raw data from randomized trials.

    Science.gov (United States)

    Vickers, Andrew J

    2006-05-16

    Sharing of raw research data is common in many areas of medical research, genomics being perhaps the most well-known example. In the clinical trial community investigators routinely refuse to share raw data from a randomized trial without giving a reason. Data sharing benefits numerous research-related activities: reproducing analyses; testing secondary hypotheses; developing and evaluating novel statistical methods; teaching; aiding design of future trials; meta-analysis; and, possibly, preventing error, fraud and selective reporting. Clinical trialists, however, sometimes appear overly concerned with being scooped and with misrepresentation of their work. Both possibilities can be avoided with simple measures such as inclusion of the original trialists as co-authors on any publication resulting from data sharing. Moreover, if we treat any data set as belonging to the patients who comprise it, rather than the investigators, such concerns fall away. Technological developments, particularly the Internet, have made data sharing generally a trivial logistical problem. Data sharing should come to be seen as an inherent part of conducting a randomized trial, similar to the way in which we consider ethical review and publication of study results. Journals and funding bodies should insist that trialists make raw data available, for example, by publishing data on the Web. If the clinical trial community continues to fail with respect to data sharing, we will only strengthen the public perception that we do clinical trials to benefit ourselves, not our patients.

  10. Adaptive Fault Detection for Complex Dynamic Processes Based on JIT Updated Data Set

    Directory of Open Access Journals (Sweden)

    Jinna Li

    2012-01-01

    A novel fault detection technique is proposed to explicitly account for the nonlinear, dynamic, and multimodal problems that exist in practical and complex dynamic processes. The just-in-time (JIT) detection method and the k-nearest neighbor (KNN) rule-based statistical process control (SPC) approach are integrated to construct a flexible and adaptive detection scheme for control processes with nonlinear, dynamic, and multimodal cases. The Mahalanobis distance, representing the correlation among samples, is used to simplify and update the raw data set, which is the first merit of this paper. Based on it, the control limit is computed in terms of both the KNN rule and the SPC method, so that we can identify online whether the current data are normal or not. Note that the control limit changes as the database is updated, yielding an adaptive fault detection technique that can effectively eliminate the impact of data drift and shift on detection performance, which is the second merit of this paper. The efficiency of the developed method is demonstrated by numerical examples and an industrial case.
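
    A minimal sketch of the KNN-rule control limit described above, assuming simulated normal-operation data; k, the percentile, and the data shapes are illustrative choices rather than the paper's settings:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    normal = rng.normal(size=(200, 3))   # reference (normal-operation) samples

    # Mahalanobis distance to the data set centre, usable for deciding whether
    # a new sample adds information before inserting it into the database
    mean = normal.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(normal, rowvar=False))
    def mahalanobis(x):
        d = x - mean
        return float(np.sqrt(d @ cov_inv @ d))

    # KNN rule: distance of a sample to its k-th nearest neighbour in the
    # database (for training samples index k skips the zero self-distance;
    # for new samples this picks the (k+1)-th neighbour, a simplification)
    def knn_dist(x, data, k=5):
        return np.sort(np.linalg.norm(data - x, axis=1))[k]

    # SPC-style control limit from the empirical training-distance distribution
    train_d = np.array([knn_dist(x, normal) for x in normal])
    limit = np.percentile(train_d, 99)

    def is_fault(x):
        return knn_dist(x, normal) > limit
    ```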

  11. IO strategies and data services for petascale data sets from a global cloud resolving model

    International Nuclear Information System (INIS)

    Schuchardt, K L; Palmer, B J; Daily, J A; Elsethagen, T O; Koontz, A S

    2007-01-01

    Global cloud resolving models at resolutions of 4 km or less create significant challenges for simulation output, data storage, data management, and post-simulation analysis and visualization. To support efficient model output as well as data analysis, new methods for IO and data organization must be evaluated. The model we are supporting, the Global Cloud Resolving Model being developed at Colorado State University, uses a geodesic grid. The non-monotonic nature of the grid's coordinate variables requires enhancements to existing data processing tools and community standards for describing and manipulating grids. The resolution, size and extent of the data suggest the need for parallel analysis tools and allow for the possibility of new techniques in data mining, filtering and comparison to observations. We describe the challenges posed by various aspects of data generation, management, and analysis, our work exploring IO strategies for the model, and a preliminary architecture, web portal, and tool enhancements which, when complete, will enable broad community access to the data sets in ways familiar to the community.

  12. Lesion removal and lesion addition algorithms in lung volumetric data sets for perception studies

    Science.gov (United States)

    Madsen, Mark T.; Berbaum, Kevin S.; Ellingson, Andrew; Thompson, Brad H.; Mullan, Brian F.

    2006-03-01

    Image perception studies of medical images provide important information about how radiologists interpret images and insights for reducing reading errors. In the past, perception studies have been difficult to perform using clinical imaging studies because of the problems associated with obtaining images demonstrating proven abnormalities and appropriate normal control images. We developed and evaluated interactive software that allows the seamless removal of abnormal areas from CT lung image sets. We have also developed interactive software for capturing lung lesions in a database where they can be added to lung CT studies. The efficacy of the software to remove abnormal areas of lung CT studies was evaluated psychophysically by having radiologists select the one altered image from a display of four. The software for adding lesions was evaluated by having radiologists classify displayed CT slices with lesions as real or artificial scaled to 3 levels of confidence. The results of these experiments demonstrated that the radiologist had difficulty in distinguishing the raw clinical images from those that had been altered. We conclude that this software can be used to create experimental normal control and "proven" lesion data sets for volumetric CT of the lung fields. We also note that this software can be easily adapted to work with other tissue besides lung and that it can be adapted to other digital imaging modalities.

  13. ObspyDMT: a Python toolbox for retrieving and processing large seismological data sets

    Directory of Open Access Journals (Sweden)

    K. Hosseini

    2017-10-01

    We present obspyDMT, a free, open-source software toolbox for the query, retrieval, processing and management of seismological data sets, including very large, heterogeneous and/or dynamically growing ones. ObspyDMT simplifies and speeds up user interaction with data centers, in more versatile ways than existing tools. The user is shielded from the complexities of interacting with different data centers and data exchange protocols and is provided with powerful diagnostic and plotting tools to check the retrieved data and metadata. While primarily a productivity tool for research seismologists and observatories, easy-to-use syntax and plotting functionality also make obspyDMT an effective teaching aid. Written in the Python programming language, it can be used as a stand-alone command-line tool (requiring no knowledge of Python) or can be integrated as a module with other Python codes. It facilitates data archiving, preprocessing, instrument correction and quality control – routine but nontrivial tasks that can consume much user time. We describe obspyDMT's functionality, design and technical implementation, accompanied by an overview of its use cases. As an example of a typical problem encountered in seismogram preprocessing, we show how to check for inconsistencies in response files of two example stations. We also demonstrate the fully automated request, remote computation and retrieval of synthetic seismograms from the Synthetics Engine (Syngine) web service of the Data Management Center (DMC) at the Incorporated Research Institutions for Seismology (IRIS).

  14. A 60-year ocean colour data set from the continuous plankton recorder

    KAUST Repository

    Raitsos, Dionysios E.

    2012-11-20

    The phytoplankton colour index (PCI) of the Continuous Plankton Recorder (CPR) survey is an in situ measure of ocean colour, which is considered a proxy of the phytoplankton biomass. PCI has been extensively used to describe the major spatiotemporal patterns of phytoplankton in the North Atlantic Ocean and North Sea since 1931. Regardless of its wide application, the lack of an adequate evaluation to test the PCI's quantitative nature is an important limitation. To address this concern, a field trial over the main production season has been undertaken to assess the numerical values assigned by previous investigations for each category of the greenness of the PCI. CPRs were towed across the English Channel from Roscoff to Plymouth consecutively for each of 8 months producing 76 standard CPR samples, each representing 10 nautical miles of tow. The results of this experiment test and update the PCI methodology, and confirm the validity of this long-term in situ ocean colour data set. In addition, using a 60-year time series of the PCI of the western English Channel, a comparison is made between the previous and the current revised experimental calculations of PCI. © The Author 2012. Published by Oxford University Press. All rights reserved.

  15. The incidence of health financing in South Africa: findings from a recent data set.

    Science.gov (United States)

    Ataguba, John E; McIntyre, Di

    2018-01-01

    There is an international call for countries to ensure universal health coverage. This call has been embraced in South Africa (SA) in the form of a National Health Insurance (NHI). This is expected to be financed through general tax revenue, with the possibility of additional earmarked taxes including a surcharge on personal income and/or a payroll tax for employers. Currently, health services are financed in SA through allocations from general tax revenue, direct out-of-pocket payments, and contributions to medical schemes. This paper uses the most recent data set to assess the progressivity of each health financing mechanism and of the overall financing system in SA. Applying standard and innovative methodologies for assessing progressivity, the study finds that general taxes and medical scheme contributions remain progressive, and direct out-of-pocket payments and indirect taxes are regressive. However, private health insurance contributions, across only the insured, are regressive. The policy implications of these findings are discussed in the context of the NHI.
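
    A common way to quantify progressivity in this literature is the Kakwani index: the concentration index of health payments (with households ranked by income) minus the Gini index of income, positive for progressive and negative for regressive financing. A generic sketch using the covariance formulation, not the authors' exact code:

    ```python
    import numpy as np

    def fractional_rank(income):
        order = np.argsort(income)
        n = len(income)
        ranks = np.empty(n)
        ranks[order] = (np.arange(1, n + 1) - 0.5) / n
        return ranks

    def concentration_index(payments, income):
        # CI = 2 * cov(payment, income rank) / mean(payment)
        r = fractional_rank(income)
        return 2.0 * np.cov(payments, r, bias=True)[0, 1] / payments.mean()

    def kakwani(payments, income):
        # The Gini of income equals the concentration index of income on itself
        return concentration_index(payments, income) - concentration_index(income, income)
    ```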

  16. Comprehensive Analysis of MILE Gene Expression Data Set Advances Discovery of Leukaemia Type and Subtype Biomarkers.

    Science.gov (United States)

    Labaj, Wojciech; Papiez, Anna; Polanski, Andrzej; Polanska, Joanna

    2017-03-01

    Large collections of data in studies on cancers such as leukaemia call for tailored analysis algorithms to ensure optimal information extraction. In this work, a custom-fit pipeline is demonstrated for thorough investigation of the voluminous MILE gene expression data set. Three analyses are accomplished, each for gaining a deeper understanding of the processes underlying leukaemia types and subtypes. First, the main disease groups are tested for differential expression against the healthy control, as in a standard case-control study. Here, the basic knowledge on molecular mechanisms is confirmed quantitatively and by literature references. Second, pairwise comparison testing is performed to juxtapose the main leukaemia types among each other. In this case, the general relations are pointed out by means of the Dice coefficient similarity measure. Moreover, lists of candidate main leukaemia group biomarkers are proposed. Finally, with this approach being successful, the third analysis provides insight into all of the studied subtypes, followed by the emergence of four leukaemia subtype biomarkers. In addition, the class-enhanced DEG signature obtained on the basis of the novel pipeline processing leads to significantly better classification power of multi-class data classifiers. The developed methodology, consisting of batch effect adjustment, adaptive noise and feature filtration coupled with adequate statistical testing and biomarker definition, proves to be an effective approach towards knowledge discovery in high-throughput molecular biology experiments.
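
    The Dice coefficient used above to compare lists of differentially expressed genes (DEGs) is simple to compute; the gene sets below are invented placeholders:

    ```python
    def dice(a, b):
        """Dice similarity of two gene sets: 2|A∩B| / (|A| + |B|)."""
        a, b = set(a), set(b)
        return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 1.0

    degs_type1 = {"FLT3", "NPM1", "CEBPA", "KIT"}
    degs_type2 = {"FLT3", "TP53", "ATM", "KIT"}
    print(dice(degs_type1, degs_type2))  # -> 0.5
    ```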

  17. Reconstruction of the primordial power spectrum of curvature perturbations using multiple data sets

    DEFF Research Database (Denmark)

    Hunt, Paul; Sarkar, Subir

    2014-01-01

    Detailed knowledge of the primordial power spectrum of curvature perturbations is essential both in order to elucidate the physical mechanism (`inflation') which generated it, and for estimating the cosmological parameters from observations of the cosmic microwave background and large-scale structure. [...] content of the universe. Moreover the deconvolution problem is ill-conditioned so a regularisation scheme must be employed to control error propagation. We demonstrate that `Tikhonov regularisation' can robustly reconstruct the primordial spectrum from multiple cosmological data sets, a significant advantage being that both its uncertainty and resolution are then quantified. Using Monte Carlo simulations we investigate several regularisation parameter selection methods and find that generalised cross-validation and Mallow's Cp method give optimal results. We apply our inversion procedure to data from...
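
    A toy illustration of Tikhonov regularisation applied to a smoothing-kernel deconvolution, in the spirit of the reconstruction described above; the kernel, noise level and regularisation parameter are invented for the sketch (the paper selects the parameter by methods such as generalised cross-validation):

    ```python
    import numpy as np

    def tikhonov(A, b, lam):
        """Solve min ||A x - b||^2 + lam ||x||^2 via the normal equations."""
        n = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

    rng = np.random.default_rng(1)
    x_true = np.sin(np.linspace(0, np.pi, 50))                 # toy "spectrum"
    i = np.arange(50)
    K = np.exp(-0.5 * ((i[:, None] - i[None, :]) / 3.0) ** 2)  # smoothing kernel
    b = K @ x_true + 0.01 * rng.normal(size=50)                # noisy observable

    x_rec = tikhonov(K, b, lam=1e-2)  # regularised inverse of the convolution
    ```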

  18. Geostatistical analysis of groundwater chemistry in Japan. Evaluation of the base case groundwater data set

    Energy Technology Data Exchange (ETDEWEB)

    Salter, P.F.; Apted, M.J. [Monitor Scientific LLC, Denver, CO (United States); Sasamoto, Hiroshi; Yui, Mikazu

    1999-05-01

    Groundwater chemistry is an important part of the geological environment for performance assessment of a high-level radioactive waste disposal system. This report describes the results of a geostatistical analysis of groundwater chemistry in Japan. Over 15,000 separate analyses of deep Japanese groundwaters have been collected for the purpose of evaluating the range of geochemical conditions for geological radioactive waste repositories in Japan. These conditions are significant for issues such as radioelement solubility limits, sorption, corrosion of the overpack, behavior of compacted clay buffers, and many other factors involved in safety assessment. It is important, therefore, that a small but representative set of groundwater types be identified so that defensible models and data for generic repository performance assessment can be established. Principal component analysis (PCA) is used to categorize representative deep groundwater types from this extensive data set. PCA is a multivariate statistical technique, similar to factor analysis or eigenvector analysis, designed to provide the best possible resolution of the variability within multivariate data sets. PCA allows the graphical inspection of the most important similarities (clustering) and differences among samples, based on simultaneous consideration of all variables in the dataset, in a low-dimensionality plot. It also allows the analyst to determine the reasons behind any pattern that is observed. In this study, PCA has been aided by hierarchical cluster analysis (HCA), in which statistical indices of similarity among multiple samples are used to distinguish distinct clusters of samples. HCA allows the natural, a priori grouping of data into clusters showing similar attributes and is graphically represented in a dendrogram. Pirouette is the multivariate statistical software package used to conduct the PCA and HCA for the Japanese groundwater dataset. An audit of the initial 15,000-sample dataset on the basis of...
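
    A generic sketch of the PCA-plus-HCA workflow described above, using scikit-learn and SciPy; the file name, variable layout and cluster count are assumptions for illustration (the study itself used the Pirouette package):

    ```python
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # rows = groundwater samples, columns = chemical variables (hypothetical file)
    X = np.loadtxt("groundwater_chemistry.csv", delimiter=",", skiprows=1)

    Z = StandardScaler().fit_transform(X)           # common scale per variable
    scores = PCA(n_components=2).fit_transform(Z)   # low-dimensionality view

    # Ward-linkage HCA; cut the dendrogram into, e.g., 5 groundwater types
    tree = linkage(Z, method="ward")
    labels = fcluster(tree, t=5, criterion="maxclust")
    ```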

  19. Measurement of the bottom-strange meson mixing phase in the full CDF data set.

    Science.gov (United States)

    Aaltonen, T; Álvarez González, B; Amerio, S; Amidei, D; Anastassov, A; Annovi, A; Antos, J; Apollinari, G; Appel, J A; Arisawa, T; Artikov, A; Asaadi, J; Ashmanskas, W; Auerbach, B; Aurisano, A; Azfar, F; Badgett, W; Bae, T; Barbaro-Galtieri, A; Barnes, V E; Barnett, B A; Barria, P; Bartos, P; Bauce, M; Bedeschi, F; Behari, S; Bellettini, G; Bellinger, J; Benjamin, D; Beretvas, A; Bhatti, A; Bisello, D; Bizjak, I; Bland, K R; Blumenfeld, B; Bocci, A; Bodek, A; Bortoletto, D; Boudreau, J; Boveia, A; Brigliadori, L; Bromberg, C; Brucken, E; Budagov, J; Budd, H S; Burkett, K; Busetto, G; Bussey, P; Buzatu, A; Calamba, A; Calancha, C; Camarda, S; Campanelli, M; Campbell, M; Canelli, F; Carls, B; Carlsmith, D; Carosi, R; Carrillo, S; Carron, S; Casal, B; Casarsa, M; Castro, A; Catastini, P; Cauz, D; Cavaliere, V; Cavalli-Sforza, M; Cerri, A; Cerrito, L; Chen, Y C; Chertok, M; Chiarelli, G; Chlachidze, G; Chlebana, F; Cho, K; Chokheli, D; Chung, W H; Chung, Y S; Ciocci, M A; Clark, A; Clarke, C; Compostella, G; Convery, M E; Conway, J; Corbo, M; Cordelli, M; Cox, C A; Cox, D J; Crescioli, F; Cuevas, J; Culbertson, R; Dagenhart, D; d'Ascenzo, N; Datta, M; de Barbaro, P; Dell'Orso, M; Demortier, L; Deninno, M; Devoto, F; d'Errico, M; Di Canto, A; Di Ruzza, B; Dittmann, J R; D'Onofrio, M; Donati, S; Dong, P; Dorigo, M; Dorigo, T; Ebina, K; Elagin, A; Eppig, A; Erbacher, R; Errede, S; Ershaidat, N; Eusebi, R; Farrington, S; Feindt, M; Fernandez, J P; Field, R; Flanagan, G; Forrest, R; Frank, M J; Franklin, M; Freeman, J C; Funakoshi, Y; Furic, I; Gallinaro, M; Garcia, J E; Garfinkel, A F; Garosi, P; Gerberich, H; Gerchtein, E; Giagu, S; Giakoumopoulou, V; Giannetti, P; Gibson, K; Ginsburg, C M; Giokaris, N; Giromini, P; Giurgiu, G; Glagolev, V; Glenzinski, D; Gold, M; Goldin, D; Goldschmidt, N; Golossanov, A; Gomez, G; Gomez-Ceballos, G; Goncharov, M; González, O; Gorelov, I; Goshaw, A T; Goulianos, K; Grillo, L; Grinstein, S; Grosso-Pilcher, C; Group, R C; Guimaraes da Costa, J; Hahn, S R; Halkiadakis, E; Hamaguchi, A; Han, J Y; Happacher, F; Hara, K; Hare, D; Hare, M; Harr, R F; Hatakeyama, K; Hays, C; Heck, M; Heinrich, J; Herndon, M; Hewamanage, S; Hocker, A; Hopkins, W; Horn, D; Hou, S; Hughes, R E; Hurwitz, M; Husemann, U; Hussain, N; Hussein, M; Huston, J; Introzzi, G; Iori, M; Ivanov, A; James, E; Jang, D; Jayatilaka, B; Jeon, E J; Jindariani, S; Jones, M; Joo, K K; Jun, S Y; Junk, T R; Kamon, T; Karchin, P E; Kasmi, A; Kato, Y; Ketchum, W; Keung, J; Khotilovich, V; Kilminster, B; Kim, D H; Kim, H S; Kim, J E; Kim, M J; Kim, S B; Kim, S H; Kim, Y K; Kim, Y J; Kimura, N; Kirby, M; Klimenko, S; Knoepfel, K; Kondo, K; Kong, D J; Konigsberg, J; Kotwal, A V; Kreps, M; Kroll, J; Krop, D; Kruse, M; Krutelyov, V; Kuhr, T; Kurata, M; Kwang, S; Laasanen, A T; Lami, S; Lammel, S; Lancaster, M; Lander, R L; Lannon, K; Lath, A; Latino, G; LeCompte, T; Lee, E; Lee, H S; Lee, J S; Lee, S W; Leo, S; Leone, S; Lewis, J D; Limosani, A; Lin, C-J; Lindgren, M; Lipeles, E; Lister, A; Litvintsev, D O; Liu, C; Liu, H; Liu, Q; Liu, T; Lockwitz, S; Loginov, A; Lucchesi, D; Lueck, J; Lujan, P; Lukens, P; Lungu, G; Lys, J; Lysak, R; Madrak, R; Maeshima, K; Maestro, P; Malik, S; Manca, G; Manousakis-Katsikakis, A; Margaroli, F; Marino, C; Martínez, M; Mastrandrea, P; Matera, K; Mattson, M E; Mazzacane, A; Mazzanti, P; McFarland, K S; McIntyre, P; McNulty, R; Mehta, A; Mehtala, P; Mesropian, C; Miao, T; Mietlicki, D; Mitra, A; Miyake, H; Moed, S; Moggi, N; Mondragon, M N; Moon, C S; Moore, R; Morello, M J; Morlock, 
J; Movilla Fernandez, P; Mukherjee, A; Muller, Th; Murat, P; Mussini, M; Nachtman, J; Nagai, Y; Naganoma, J; Nakano, I; Napier, A; Nett, J; Neu, C; Neubauer, M S; Nielsen, J; Nodulman, L; Noh, S Y; Norniella, O; Oakes, L; Oh, S H; Oh, Y D; Oksuzian, I; Okusawa, T; Orava, R; Ortolan, L; Pagan Griso, S; Pagliarone, C; Palencia, E; Papadimitriou, V; Paramonov, A A; Patrick, J; Pauletta, G; Paulini, M; Paus, C; Pellett, D E; Penzo, A; Phillips, T J; Piacentino, G; Pianori, E; Pilot, J; Pitts, K; Plager, C; Pondrom, L; Poprocki, S; Potamianos, K; Prokoshin, F; Pranko, A; Ptohos, F; Punzi, G; Rahaman, A; Ramakrishnan, V; Ranjan, N; Redondo, I; Renton, P; Rescigno, M; Riddick, T; Rimondi, F; Ristori, L; Robson, A; Rodrigo, T; Rodriguez, T; Rogers, E; Rolli, S; Roser, R; Ruffini, F; Ruiz, A; Russ, J; Rusu, V; Safonov, A; Sakumoto, W K; Sakurai, Y; Santi, L; Sato, K; Saveliev, V; Savoy-Navarro, A; Schlabach, P; Schmidt, A; Schmidt, E E; Schwarz, T; Scodellaro, L; Scribano, A; Scuri, F; Seidel, S; Seiya, Y; Semenov, A; Sforza, F; Shalhout, S Z; Shears, T; Shepard, P F; Shimojima, M; Shochet, M; Shreyber-Tecker, I; Simonenko, A; Sinervo, P; Sliwa, K; Smith, J R; Snider, F D; Soha, A; Sorin, V; Song, H; Squillacioti, P; Stancari, M; St Denis, R; Stelzer, B; Stelzer-Chilton, O; Stentz, D; Strologas, J; Strycker, G L; Sudo, Y; Sukhanov, A; Suslov, I; Takemasa, K; Takeuchi, Y; Tang, J; Tecchio, M; Teng, P K; Thom, J; Thome, J; Thompson, G A; Thomson, E; Toback, D; Tokar, S; Tollefson, K; Tomura, T; Tonelli, D; Torre, S; Torretta, D; Totaro, P; Trovato, M; Ukegawa, F; Uozumi, S; Varganov, A; Vázquez, F; Velev, G; Vellidis, C; Vidal, M; Vila, I; Vilar, R; Vizán, J; Vogel, M; Volpi, G; Wagner, P; Wagner, R L; Wakisaka, T; Wallny, R; Wang, S M; Warburton, A; Waters, D; Wester, W C; Whiteson, D; Wicklund, A B; Wicklund, E; Wilbur, S; Wick, F; Williams, H H; Wilson, J S; Wilson, P; Winer, B L; Wittich, P; Wolbers, S; Wolfe, H; Wright, T; Wu, X; Wu, Z; Yamamoto, K; Yamato, D; Yang, T; Yang, U K; Yang, Y C; Yao, W-M; Yeh, G P; Yi, K; Yoh, J; Yorita, K; Yoshida, T; Yu, G B; Yu, I; Yu, S S; Yun, J C; Zanetti, A; Zeng, Y; Zhou, C; Zucchelli, S

    2012-10-26

    We report a measurement of the bottom-strange meson mixing phase β_s using the time evolution of B_s^0 → J/ψ(→ μ⁺μ⁻) φ(→ K⁺K⁻) decays in which the quark-flavor content of the bottom-strange meson is identified at production. This measurement uses the full data set of proton-antiproton collisions at √s = 1.96 TeV collected by the Collider Detector experiment at the Fermilab Tevatron, corresponding to 9.6 fb⁻¹ of integrated luminosity. We report confidence regions in the two-dimensional space of β_s and the B_s^0 decay-width difference ΔΓ_s, and measure β_s ∈ [−π/2, −1.51] ∪ [−0.06, 0.30] ∪ [1.26, π/2] at the 68% confidence level, in agreement with the standard model expectation. Assuming the standard model value of β_s, we also determine ΔΓ_s = 0.068 ± 0.026 (stat) ± 0.009 (syst) ps⁻¹ and the mean B_s^0 lifetime τ_s = 1.528 ± 0.019 (stat) ± 0.009 (syst) ps, which are consistent and competitive with determinations by other experiments.

  20. An updated global grid point surface air temperature anomaly data set: 1851--1990

    Energy Technology Data Exchange (ETDEWEB)

    Sepanski, R.J.; Boden, T.A.; Daniels, R.C.

    1991-10-01

    This document presents land-based monthly surface air temperature anomalies (departures from a 1951--1970 reference period mean) on a 5° latitude by 10° longitude global grid. Monthly surface air temperature anomalies (departures from a 1957--1975 reference period mean) for the Antarctic (grid points from 65°S to 85°S) are presented in a similar way as a separate data set. The data were derived primarily from the World Weather Records and the archives of the United Kingdom Meteorological Office. This long-term record of temperature anomalies may be used in studies addressing possible greenhouse-gas-induced climate changes. To date, the data have been employed in generating regional, hemispheric, and global time series for determining whether recent (i.e., post-1900) warming trends have taken place. This document also presents the monthly mean temperature records for the individual stations that were used to generate the set of gridded anomalies. The periods of record vary by station. Northern Hemisphere station data have been corrected for inhomogeneities, while Southern Hemisphere data are presented in uncorrected form. 14 refs., 11 figs., 10 tabs.
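
    Anomalies of this kind are departures of each monthly value from the reference-period mean for that calendar month; a generic sketch (the array layout is an assumption, not the data set's actual format):

    ```python
    import numpy as np

    def monthly_anomalies(temps, years, base=(1951, 1970)):
        """temps: (n_years, 12) monthly means; years: (n_years,) calendar years."""
        in_base = (years >= base[0]) & (years <= base[1])
        climatology = temps[in_base].mean(axis=0)   # 12 reference-period means
        return temps - climatology                  # departures, month by month
    ```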

  1. Covariance approximation for large multivariate spatial data sets with an application to multiple climate model errors

    KAUST Repository

    Sang, Huiyan

    2011-12-01

    This paper investigates the cross-correlations across multiple climate model errors. We build a Bayesian hierarchical model that accounts for the spatial dependence of individual models as well as cross-covariances across different climate models. Our method allows for a nonseparable and nonstationary cross-covariance structure. We also present a covariance approximation approach to facilitate the computation in the modeling and analysis of very large multivariate spatial data sets. The covariance approximation consists of two parts: a reduced-rank part to capture the large-scale spatial dependence, and a sparse covariance matrix to correct the small-scale dependence error induced by the reduced rank approximation. We pay special attention to the case that the second part of the approximation has a block-diagonal structure. Simulation results of model fitting and prediction show substantial improvement of the proposed approximation over the predictive process approximation and the independent blocks analysis. We then apply our computational approach to the joint statistical modeling of multiple climate model errors. © 2012 Institute of Mathematical Statistics.
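
    The two-part approximation described above (a reduced-rank term capturing large-scale dependence plus a sparse, e.g. block-diagonal, correction of the residual) can be sketched generically as follows; this illustrates the decomposition only, not the authors' Bayesian estimation procedure:

    ```python
    import numpy as np

    def reduced_rank_plus_blockdiag(C, rank, blocks):
        """Approximate covariance C by a rank-`rank` part plus a block-diagonal
        residual correction; `blocks` is a list of index arrays."""
        w, V = np.linalg.eigh(C)
        top = np.argsort(w)[::-1][:rank]
        C_lowrank = (V[:, top] * w[top]) @ V[:, top].T   # large-scale part
        R = C - C_lowrank                                # small-scale residual
        C_sparse = np.zeros_like(C)
        for idx in blocks:
            C_sparse[np.ix_(idx, idx)] = R[np.ix_(idx, idx)]
        return C_lowrank + C_sparse
    ```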

  2. Measurement of the TeV atmospheric muon charge ratio with the complete OPERA data set

    Science.gov (United States)

    Agafonova, N.; Aleksandrov, A.; Anokhina, A.; Aoki, S.; Ariga, A.; Ariga, T.; Bender, D.; Bertolin, A.; Bozza, C.; Brugnera, R.; Buonaura, A.; Buontempo, S.; Büttner, B.; Chernyavsky, M.; Chukanov, A.; Consiglio, L.; D'Ambrosio, N.; De Lellis, G.; De Serio, M.; Del Amo Sanchez, P.; Di Crescenzo, A.; Di Ferdinando, D.; Di Marco, N.; Dmitrievski, S.; Dracos, M.; Duchesneau, D.; Dusini, S.; Dzhatdoev, T.; Ebert, J.; Ereditato, A.; Fini, R. A.; Fukuda, T.; Galati, G.; Garfagnini, A.; Giacomelli, G.; Göllnitz, C.; Goldberg, J.; Gornushkin, Y.; Grella, G.; Guler, M.; Gustavino, C.; Hagner, C.; Hara, T.; Hollnagel, A.; Hosseini, B.; Ishida, H.; Ishiguro, K.; Jakovcic, K.; Jollet, C.; Kamiscioglu, C.; Kamiscioglu, M.; Kawada, J.; Kim, J. H.; Kim, S. H.; Kitagawa, N.; Klicek, B.; Kodama, K.; Komatsu, M.; Kose, U.; Kreslo, I.; Lauria, A.; Lenkeit, J.; Ljubicic, A.; Longhin, A.; Loverre, P.; Malgin, A.; Malenica, M.; Mandrioli, G.; Matsuo, T.; Matveev, V.; Mauri, N.; Medinaceli, E.; Meregaglia, A.; Mikado, S.; Monacelli, P.; Montesi, M. C.; Morishima, K.; Muciaccia, M. T.; Naganawa, N.; Naka, T.; Nakamura, M.; Nakano, T.; Nakatsuka, Y.; Niwa, K.; Ogawa, S.; Okateva, N.; Olshevsky, A.; Omura, T.; Ozaki, K.; Paoloni, A.; Park, B. D.; Park, I. G.; Pasqualini, L.; Pastore, A.; Patrizii, L.; Pessard, H.; Pistillo, C.; Podgrudkov, D.; Polukhina, N.; Pozzato, M.; Pupilli, F.; Roda, M.; Rokujo, H.; Roganova, T.; Rosa, G.; Ryazhskaya, O.; Sato, O.; Schembri, A.; Shakiryanova, I.; Shchedrina, T.; Sheshukov, A.; Shibuya, H.; Shiraishi, T.; Shoziyoev, G.; Simone, S.; Sioli, M.; Sirignano, C.; Sirri, G.; Spinetti, M.; Stanco, L.; Starkov, N.; Stellacci, S. M.; Stipcevic, M.; Strolin, P.; Takahashi, S.; Tenti, M.; Terranova, F.; Tioukov, V.; Tufanli, S.; Vilain, P.; Vladimirov, M.; Votano, L.; Vuilleumier, J. L.; Wilquet, G.; Wonsak, B.; Yoon, C. S.; Zemskova, S.; Zghiche, A.

    2014-07-01

    The OPERA detector, designed to search for ν_μ → ν_τ oscillations in the CNGS beam, is located in the underground Gran Sasso laboratory, a privileged location to study TeV-scale cosmic rays. For the analysis presented here, the detector was used to measure the atmospheric muon charge ratio in the TeV region. OPERA collected charge-separated cosmic ray data between 2008 and 2012. More than 3 million atmospheric muon events were detected and reconstructed, among which about 110000 multiple muon bundles. The charge ratio was measured separately for single and for multiple muon events. The analysis exploited the inversion of the magnet polarity, which was performed on purpose during the 2012 Run. The combination of the two data sets with opposite magnet polarities allowed minimizing systematic uncertainties and reaching an accurate determination of the muon charge ratio. Data were fitted to obtain relevant parameters on the composition of primary cosmic rays and the associated kaon production in the forward fragmentation region. In the surface energy range 1-20 TeV investigated by OPERA, the muon charge ratio is well described by a parametric model including only pion and kaon contributions to the muon flux, showing no significant contribution of the prompt component. The observed energy independence supports the validity of Feynman scaling in the fragmentation region up to the TeV/nucleon primary energy scale.

  3. A 60-year ocean colour data set from the continuous plankton recorder

    KAUST Repository

    Raitsos, Dionysios E.; Walne, Anthony W.; Lavender, Sam; Licandro, Priscilla; Reid, Philip Chris; Edwards, Martin

    2012-01-01

    The phytoplankton colour index (PCI) of the Continuous Plankton Recorder (CPR) survey is an in situ measure of ocean colour, which is considered a proxy of the phytoplankton biomass. PCI has been extensively used to describe the major spatiotemporal patterns of phytoplankton in the North Atlantic Ocean and North Sea since 1931. Regardless of its wide application, the lack of an adequate evaluation to test the PCI's quantitative nature is an important limitation. To address this concern, a field trial over the main production season has been undertaken to assess the numerical values assigned by previous investigations for each category of the greenness of the PCI. CPRs were towed across the English Channel from Roscoff to Plymouth consecutively for each of 8 months producing 76 standard CPR samples, each representing 10 nautical miles of tow. The results of this experiment test and update the PCI methodology, and confirm the validity of this long-term in situ ocean colour data set. In addition, using a 60-year time series of the PCI of the western English Channel, a comparison is made between the previous and the current revised experimental calculations of PCI. © The Author 2012. Published by Oxford University Press. All rights reserved.

  4. Does ecosystem variability explain phytoplankton diversity? Solving an ecological puzzle with long-term data sets

    Science.gov (United States)

    Sarker, Subrata; Lemke, Peter; Wiltshire, Karen H.

    2018-05-01

    Explaining species diversity as a function of ecosystem variability is a long-standing discussion in community-ecology research. Here, we aimed to establish a causal relationship between ecosystem variability and phytoplankton diversity in a shallow-sea ecosystem. We used long-term data on biotic and abiotic factors from Helgoland Roads, along with climate data, to assess the effect of ecosystem variability on phytoplankton diversity. A point cumulative semi-variogram method was used to estimate the long-term ecosystem variability. A Markov chain model was used to estimate the dynamical processes of species, i.e., the probabilities of occurrence, absence and being outcompeted. We identified that the 1980s was a period of high ecosystem variability, while the last two decades were comparatively less variable. Ecosystem variability was found to be an important predictor of phytoplankton diversity at Helgoland Roads. High diversity was related to low ecosystem variability, due to a non-significant relationship between the probability of a species' occurrence and absence, a significant negative relationship between the probability of a species' occurrence and the probability of its being outcompeted by others, and high species occurrence at low ecosystem variability. Using an exceptional marine long-term data set, this study established a causal relationship between ecosystem variability and phytoplankton diversity.
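
    A two-state Markov chain of the kind used above can be summarised by its stationary distribution, i.e., the long-run occurrence probability of a species; the transition probabilities below are illustrative, not estimates from the Helgoland Roads data:

    ```python
    import numpy as np

    # States: 0 = species absent, 1 = species present; P[i, j] = Pr(j | i)
    P = np.array([[0.7, 0.3],    # absent -> absent / present (occurrence)
                  [0.2, 0.8]])   # present -> absent (e.g. outcompeted) / present

    # Stationary distribution pi solves pi P = pi (eigenvector of P.T for lambda = 1)
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmax(np.real(w))])
    pi /= pi.sum()
    print(pi[1])   # long-run probability of occurrence (0.6 here)
    ```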

  5. Unbiased analysis of geomagnetic data sets and comparison of historical data with paleomagnetic and archeomagnetic records

    Science.gov (United States)

    Arneitz, Patrick; Egli, Ramon; Leonhardt, Roman

    2017-03-01

    Reconstructions of the past geomagnetic field provide fundamental constraints for understanding the dynamics of the Earth's interior, as well as serving as a basis for magnetostratigraphic and archeomagnetic dating tools. Such reconstructions, when extending over epochs that precede the advent of instrumental measurements, rely exclusively on magnetic records from archeological artifacts and, further in the past, from rocks and sediments. The most critical component of such indirect records is field intensity, because of possible biases introduced by material properties and by laboratory protocols, which do not reproduce exactly the original field recording conditions. Large biases are usually avoided by the use of appropriate checking procedures; however, smaller ones can remain undetected in individual studies and might significantly affect field reconstructions. We introduce a new general approach for analyzing geomagnetic databases in order to investigate the reliability of indirect records. This approach is based on the comparison of historical records with archeomagnetic and volcanic data, considering temporal and spatial mismatches with adequate weighting functions and error estimation. A good overall agreement is found between indirect records and historical measurements, while for several subsets systematic bias is detected (e.g., inclination shallowing of lava records). We also demonstrate that simple approaches to analyzing highly inhomogeneous and internally correlated paleomagnetic data sets can lead to incorrect conclusions about the efficiency of quality checks and corrections. Consistent criteria for selecting and weighting data are presented in this review and can be used to improve current geomagnetic field modeling techniques.

  6. A multivariate fall risk assessment model for VHA nursing homes using the minimum data set.

    Science.gov (United States)

    French, Dustin D; Werner, Dennis C; Campbell, Robert R; Powell-Cope, Gail M; Nelson, Audrey L; Rubenstein, Laurence Z; Bulat, Tatjana; Spehar, Andrea M

    2007-02-01

    The purpose of this study was to develop a multivariate fall risk assessment model beyond the current fall Resident Assessment Protocol (RAP) triggers for nursing home residents using the Minimum Data Set (MDS). Retrospective, clustered secondary data analysis. National Veterans Health Administration (VHA) long-term care nursing homes (N = 136). The study population consisted of 6577 national VHA nursing home residents who had an annual assessment during FY 2005, identified from the MDS, as well as an earlier annual or admission assessment within a 1-year look-back period. A dichotomous multivariate model of nursing home residents coded with a fall on selected fall risk characteristics from the MDS, estimated with general estimation equations (GEE). There were 17 170 assessments corresponding to 6577 long-term care nursing home residents. The increased odds ratio (OR) of being classified as a faller relative to the omitted "dependent" category of activities of daily living (ADL) ranged from OR = 1.35 for the "limited" ADL category up to OR = 1.57 for the "extensive-2" ADL category. The use of canes, walkers, or crutches, or the use of wheelchairs, also increased the odds of being a faller (OR = 1.17). These MDS elements are risk factors for falls in long-term care settings. The model incorporated an ADL index and adjusted for case mix by including only long-term care nursing home residents. The study offers clinicians practical estimates by combining multiple univariate MDS elements in an empirically based, multivariate fall risk assessment model.
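
    A GEE model of this general form can be fitted with statsmodels; the input file and column names below are hypothetical stand-ins for the MDS-derived variables:

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # One row per resident assessment; residents are clustered within facilities
    df = pd.read_csv("mds_assessments.csv")   # hypothetical file

    model = sm.GEE.from_formula(
        "faller ~ C(adl_category) + uses_cane_walker_crutch + uses_wheelchair",
        groups="facility_id",
        data=df,
        family=sm.families.Binomial(),
        cov_struct=sm.cov_struct.Exchangeable(),
    )
    result = model.fit()
    print(np.exp(result.params))   # coefficients expressed as odds ratios
    ```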

  7. Automated Classification and Analysis of Non-metallic Inclusion Data Sets

    Science.gov (United States)

    Abdulsalam, Mohammad; Zhang, Tongsheng; Tan, Jia; Webler, Bryan A.

    2018-05-01

    The aim of this study is to utilize principal component analysis (PCA), clustering methods, and correlation analysis to condense and examine large, multivariate data sets produced from automated analysis of non-metallic inclusions. Non-metallic inclusions play a major role in defining the properties of steel, and their examination has been greatly aided by automated analysis in scanning electron microscopes equipped with energy dispersive X-ray spectroscopy. The methods were applied to analyze inclusions in two sets of samples: two laboratory-scale samples and four industrial samples from near-finished 4140 alloy steel components with varying machinability. The laboratory samples had well-defined inclusion chemistries, composed of MgO-Al2O3-CaO, spinel (MgO-Al2O3), and calcium aluminate inclusions. The industrial samples contained MnS inclusions as well as (Ca,Mn)S + calcium aluminate oxide inclusions. PCA could be used to reduce the inclusion chemistry variables to a 2D plot, which revealed inclusion chemistry groupings in the samples. Clustering methods were used to automatically classify inclusion chemistry measurements into groups, i.e., no user-defined rules were required.

  8. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species

    Directory of Open Access Journals (Sweden)

    Kristopher J. L. Irizarry

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.

  9. Los Alamos geostationary orbit synoptic data set: a compilation of energetic particle data

    International Nuclear Information System (INIS)

    Baker, D.N.; Higbie, P.R.; Belian, R.D.; Aiello, W.P.; Hones, E.W. Jr.; Tech, E.R.; Halbig, M.F.; Payne, J.B.; Robinson, R.; Kedge, S.

    1981-08-01

    Energetic electron (30 to 2000 keV) and proton (145 keV to 150 MeV) measurements made by Los Alamos National Laboratory sensors at geostationary orbit (6.6 R_E) are summarized. The data are plotted in terms of daily average spectra, 3-h local time averages, and in a variety of statistical formats. The data summarize conditions from mid-1976 through 1978 (S/C 1976-059) and from early 1977 through 1978 (S/C 1977-007). The compilations correspond to measurements at 35°W, 70°W, and 135°W geographic longitude and, thus, are indicative of conditions at 9°, 11°, and 4.8° geomagnetic latitude, respectively. Most of this report is comprised of data plots that are organized according to Carrington solar rotations so that the data can be easily compared to solar-rotation-dependent interplanetary data. As shown in prior studies, variations in solar wind conditions modulate particle intensity within the terrestrial magnetosphere. The effects of these variations are demonstrated and discussed. Potential uses of the Synoptic Data Set by the scientific and applications-oriented communities are also discussed.

  10. Comparison of Co-Temporal Modeling Algorithms on Sparse Experimental Time Series Data Sets.

    Science.gov (United States)

    Allen, Edward E; Norris, James L; John, David J; Thomas, Stan J; Turkett, William H; Fetrow, Jacquelyn S

    2010-01-01

    Multiple approaches for reverse-engineering biological networks from time-series data have been proposed in the computational biology literature. These approaches can be classified by their underlying mathematical algorithms, such as Bayesian or algebraic techniques, as well as by their time paradigm, which includes next-state and co-temporal modeling. The types of biological relationships, such as parent-child or siblings, discovered by these algorithms are quite varied. It is important to understand the strengths and weaknesses of the various algorithms and time paradigms on actual experimental data. We assess how well the co-temporal implementations of three algorithms, continuous Bayesian, discrete Bayesian, and computational algebraic, can 1) identify two types of entity relationships, parent and sibling, between biological entities, 2) deal with experimental sparse time course data, and 3) handle experimental noise seen in replicate data sets. These algorithms are evaluated, using the shuffle index metric, for how well the resulting models match literature models in terms of siblings and parent relationships. Results indicate that all three co-temporal algorithms perform well, at a statistically significant level, at finding sibling relationships, but perform relatively poorly in finding parent relationships.

  11. Looking for exceptions on knowledge rules induced from HIV cleavage data set

    Directory of Open Access Journals (Sweden)

    Ronaldo Cristiano Prati

    2004-01-01

    The aim of data mining is to find useful knowledge in databases. In order to extract such knowledge, several methods can be used, among them machine learning (ML) algorithms. In this work we focus on ML algorithms that express the extracted knowledge in a symbolic form, such as rules. This representation may allow us to "explain" the data. Rule learning algorithms are mainly designed to induce classification rules that can predict new cases with high accuracy. However, these sorts of rules generally express common-sense knowledge, resulting in many interesting and useful rules not being discovered. Furthermore, domain-independent biases, especially those related to the language used to express the induced knowledge, could induce rules that are difficult to understand. Exceptions might be used in order to overcome these drawbacks. Exceptions are defined as rules that contradict common beliefs. This kind of rule can play an important role in the process of understanding the underlying data as well as in making critical decisions. By contradicting the user's common beliefs, exceptions are bound to be interesting. This work proposes a method to find exceptions. In order to illustrate the potential of our approach, we apply the method to a real-world data set to discover rules and exceptions in the HIV virus protein cleavage process. A good understanding of the process that generates these data plays an important role in the research of cleavage inhibitors. We believe that the proposed approach may help the domain expert to further understand this process.

  12. Development and validation of factor analysis for dynamic in-vivo imaging data sets

    Science.gov (United States)

    Goldschmied, Lukas; Knoll, Peter; Mirzaei, Siroos; Kalchenko, Vyacheslav

    2018-02-01

    In-vivo optical imaging provides information about the anatomical structures and function of tissues, ranging from single cells to entire organisms. Dynamic Fluorescent Imaging (DFI) is used to examine dynamic events related to normal physiology or disease progression in real time. In this work we improve this method by using factor analysis (FA) to automatically separate overlying structures. The proposed method is based on the previously introduced Transcranial Optical Vascular Imaging (TOVI), which exploits the natural and sufficient transparency of the intact cranial bones of a mouse. Fluorescent image acquisition is performed after intravenous fluorescent tracer administration. Afterwards, FA is used to extract structures with different temporal characteristics from dynamic contrast-enhanced studies without making any a priori assumptions about physiology. The method was validated with a dynamic light phantom based on the Arduino hardware platform and with dynamic fluorescent cerebral hemodynamics data sets. Using the phantom data, FA can separate the various light channels without user intervention. Applied to an image sequence obtained after fluorescent tracer administration, FA allows extracting valuable information about cerebral blood vessel anatomy and functionality without a priori assumptions about anatomy or physiology, while keeping the mouse cranium intact. Unsupervised color coding based on FA enhances the visibility and discrimination of blood vessels belonging to different compartments. DFI based on FA, especially in the case of transcranial imaging, can be used to separate dynamic structures.
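
    A minimal sketch of factor analysis applied to a dynamic image sequence, treating each pixel's time curve as one observation; the file name and the number of factors are assumptions:

    ```python
    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    movie = np.load("dfi_sequence.npy")       # hypothetical (t, h, w) sequence
    t, h, w = movie.shape
    X = movie.reshape(t, h * w).T             # one temporal curve per pixel

    fa = FactorAnalysis(n_components=3)       # e.g. arteries, veins, background
    loadings = fa.fit_transform(X)            # per-pixel weight on each factor

    # Map each factor's loadings back to image space for colour coding
    factor_maps = loadings.T.reshape(3, h, w)
    ```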

  13. Tattoos and body piercings in the United States: a national data set.

    Science.gov (United States)

    Laumann, Anne E; Derick, Amy J

    2006-09-01

    Little is known about the prevalence and consequences of body art application. Our aim was to provide US tattooing and body piercing prevalence, societal distribution, and medical and social consequence data. Random digit dialing technology was used to obtain a national probability sample of 253 women and 247 men who were 18 to 50 years of age. Of our respondents, 24% had tattoos and 14% had body piercings. Tattooing was equally common in both sexes, but body piercing was more common among women. Other associations were a lack of religious affiliation, extended jail time, previous drinking, and recreational drug use. Local medical complications, including broken teeth, were present in one third of those with body piercings. The prevalence of jewelry allergy increased with the number of piercings. Of those with tattoos, 17% were considering removal but none had had a tattoo removed. This was a self-reported data set with a 33% response rate. Tattooing and body piercing are associated with risk-taking activities. Body piercing has a high incidence of medical complications.

  14. REDEN: Named Entity Linking in Digital Literary Editions Using Linked Data Sets

    Directory of Open Access Journals (Sweden)

    Carmen Brando

    2016-07-01

    This paper proposes a graph-based Named Entity Linking (NEL) algorithm named REDEN for the disambiguation of authors' names in French literary criticism texts and scientific essays from the 19th and early 20th centuries. The algorithm is described and evaluated according to the two phases of NEL as reported in the current state of the art, namely, candidate retrieval and candidate selection. REDEN leverages knowledge from different Linked Data sources in order to select candidates for each author mention, subsequently crawls data from other Linked Data sets using equivalence links (e.g., owl:sameAs), and, finally, fuses graphs of homologous individuals into a non-redundant graph well-suited for graph centrality calculation; the resulting graph is used for choosing the best referent. The REDEN algorithm is distributed as open source and follows current standards in digital editions (TEI) and the semantic Web (RDF). Its integration into an editorial workflow of digital editions in digital humanities and cultural heritage projects is entirely plausible. Experiments are conducted, along with the corresponding error analysis, in order to test our approach and to study the weaknesses and strengths of our algorithm, thereby informing further improvements of REDEN.
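
    The candidate-selection idea (choose the candidate most central in the fused graph) can be illustrated with networkx; node names and edges are invented placeholders, not REDEN's data:

    ```python
    import networkx as nx

    # Fused, non-redundant graph of candidate referents and context entities
    G = nx.Graph()
    G.add_edges_from([
        ("Hugo_the_writer", "Les_Miserables"), ("Hugo_the_writer", "Paris"),
        ("Hugo_the_painter", "Painting"), ("Les_Miserables", "Paris"),
    ])

    candidates = ["Hugo_the_writer", "Hugo_the_painter"]
    centrality = nx.degree_centrality(G)          # any centrality measure works
    best = max(candidates, key=centrality.get)    # -> "Hugo_the_writer"
    ```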

  15. Construction of a century solar chromosphere data set for solar activity related research

    Science.gov (United States)

    Lin, Ganghua; Wang, Xiao Fan; Yang, Xiao; Liu, Suo; Zhang, Mei; Wang, Haimin; Liu, Chang; Xu, Yan; Tlatov, Andrey; Demidov, Mihail; Borovik, Aleksandr; Golovko, Aleksey

    2017-06-01

    This article introduces our ongoing project "Construction of a Century Solar Chromosphere Data Set for Solar Activity Related Research". Solar activities are the major sources of space weather that affects human lives. Some of the serious space weather consequences, for instance, include interruption of space communication and navigation, compromising the safety of astronauts and satellites, and damaging power grids. Therefore, the solar activity research has both scientific and social impacts. The major database is built up from digitized and standardized film data obtained by several observatories around the world and covers a time span of more than 100 years. After careful calibration, we will develop feature extraction and data mining tools and provide them together with the comprehensive database for the astronomical community. Our final goal is to address several physical issues: filament behavior in solar cycles, abnormal behavior of solar cycle 24, large-scale solar eruptions, and sympathetic remote brightenings. Significant signs of progress are expected in data mining algorithms and software development, which will benefit the scientific analysis and eventually advance our understanding of solar cycles.

  16. A digital phantom of the axilla based on the Visible Human Project data set

    Science.gov (United States)

    McCallum, S. J.; Welch, A. E.; Baker, L.

    2001-08-01

    In this paper, we describe the development of a new digital phantom designed for Monte Carlo simulations of breast cancer and particularly positron emission tomography (PET) imaging of the axillary lymph nodes. The phantom was based on data from the Visible Human Project female data set. The phantom covers the head-to-diaphragm region; 17 major tissue types were segmented and 66 individual lymph nodes were identified. The authors have used the phantom in Monte Carlo simulations to model a reduced field-of-view PET imager based on two flat plate arrays placed on either side of the shoulder. Images used a simple single-angle set of projections. The authors have conducted two preliminary studies: one modeling a single-frame PET acquisition 60 min after FDG injection and the other modeling a dynamic PET acquisition simulating four time frames after FDG injection. The dynamic results were processed into parametric images using the Patlak method and show the advantage to be gained by including the temporal information for lesion detection. The authors' preliminary results indicate that the performance of such an imager forming projection images is not sufficient for axillary node PET imaging.
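
    The Patlak method mentioned above fits a straight line to transformed time-activity data, with slope Ki (net influx rate) and intercept V0; a generic sketch in which the start of the linear phase, t_star, is an illustrative choice:

    ```python
    import numpy as np

    def patlak(t, c_tissue, c_plasma, t_star=20.0):
        """Patlak plot: c_tissue/c_plasma vs (running integral of c_plasma)/c_plasma.
        Assumes c_plasma > 0 over the fitted range."""
        # trapezoidal running integral of the plasma curve
        integral = np.concatenate(
            ([0.0], np.cumsum(np.diff(t) * 0.5 * (c_plasma[1:] + c_plasma[:-1]))))
        x = integral / c_plasma
        y = c_tissue / c_plasma
        late = t >= t_star                     # keep only the linear phase
        Ki, V0 = np.polyfit(x[late], y[late], 1)
        return Ki, V0
    ```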

  17. Construction of a century solar chromosphere data set for solar activity related research

    Directory of Open Access Journals (Sweden)

    Ganghua Lin

    2017-06-01

    This article introduces our ongoing project “Construction of a Century Solar Chromosphere Data Set for Solar Activity Related Research”. Solar activities are the major sources of space weather that affects human lives. Some of the serious space weather consequences, for instance, include interruption of space communication and navigation, compromising the safety of astronauts and satellites, and damaging power grids. Therefore, solar activity research has both scientific and social impacts. The major database is built up from digitized and standardized film data obtained by several observatories around the world and covers a timespan of more than 100 years. After careful calibration, we will develop feature extraction and data mining tools and provide them together with the comprehensive database for the astronomical community. Our final goal is to address several physical issues: filament behavior in solar cycles, abnormal behavior of solar cycle 24, large-scale solar eruptions, and sympathetic remote brightenings. Significant progress is expected in data mining algorithms and software development, which will benefit the scientific analysis and eventually advance our understanding of solar cycles.

  18. Three-dimensional seed reconstruction from an incomplete data set for prostate brachytherapy

    International Nuclear Information System (INIS)

    Narayanan, Sreeram; Cho, Paul S; Marks II, Robert J

    2004-01-01

    Intra-operative dosimetry in prostate brachytherapy requires the 3D coordinates of the implanted radioactive seeds. Since CT is not readily available during the implant operation, projection x-rays are commonly used for intra-operative seed localization. Three x-ray projections are usually used. The requirement of the current seed reconstruction algorithms is that the seeds must be identified on all three projections. However, in practice this is often difficult to accomplish due to the problem of heavily clustered and overlapping seeds. We have developed an algorithm that permits seed reconstruction from an incomplete data set. Instead of all three projections, the new algorithm requires only one of the three projections to be complete. Furthermore, even if all three projections are incomplete, it can reconstruct 100% of the implanted seeds depending on how the undetected seeds are distributed among the projections. The method utilizes the principles of epipolar imaging geometry and pseudo-matching of the undetected seeds. The algorithm was successfully applied to a large number of clinical cases where seeds imperceptibly overlap in some projections.
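
    The matching step rests on the standard two-view epipolar constraint; in the usual notation (a textbook result, not specific to this paper), a seed imaged at homogeneous coordinates x in one projection and x' in another must satisfy

    ```latex
    % Epipolar constraint between two x-ray projections:
    x'^{\top} F \, x = 0
    ```

    where F is the fundamental matrix relating the two views, so each undetected seed can be searched for along a single epipolar line rather than over the whole image.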

  19. Extraction of tacit knowledge from large ADME data sets via pairwise analysis.

    Science.gov (United States)

    Keefer, Christopher E; Chang, George; Kauffman, Gregory W

    2011-06-15

    Pharmaceutical companies routinely collect data across multiple projects for common ADME endpoints. Although at the time of collection the data is intended for use in decision making within a specific project, knowledge can be gained by data mining the entire cross-project data set for patterns of structure-activity relationships (SAR) that may be applied to any project. One such data mining method is pairwise analysis. This method has the advantage of being able to identify small structural changes that lead to significant changes in activity. In this paper, we describe the process for full pairwise analysis of our high-throughput ADME assays routinely used for compound discovery efforts at Pfizer (microsomal clearance, passive membrane permeability, P-gp efflux, and lipophilicity). We also describe multiple strategies for the application of these transforms in a prospective manner during compound design. Finally, a detailed analysis of the activity patterns in pairs of compounds that share the same molecular transformation reveals multiple types of transforms from an SAR perspective. These include bioisosteres, additives, multiplicatives, and a type we call switches as they act to either turn on or turn off an activity. Copyright © 2011 Elsevier Ltd. All rights reserved.

  20. Integrating Genomic Data Sets for Knowledge Discovery: An Informed Approach to Management of Captive Endangered Species.

    Science.gov (United States)

    Irizarry, Kristopher J L; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L; Barrett, Gini; Barr, Margaret C

    2016-01-01

    Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.

  1. ACE-FTS version 3.0 data set: validation and data processing update

    Directory of Open Access Journals (Sweden)

    Claire Waymark

    2014-01-01

    On 12 August 2003, the Canadian-led Atmospheric Chemistry Experiment (ACE) was launched into a 74° inclination orbit at 650 km with the mission objective to measure atmospheric composition using infrared and UV-visible spectroscopy (Bernath et al., 2005). The ACE mission consists of two main instruments, ACE-FTS and MAESTRO (McElroy et al., 2007), which are being used to investigate the chemistry and dynamics of the Earth's atmosphere. Here, we focus on the high-resolution (0.02 cm⁻¹) infrared Fourier transform spectrometer, ACE-FTS, which measures in the 750-4400 cm⁻¹ (2.2 to 13.3 µm) spectral region. This instrument has been making regular solar occultation observations for more than nine years. The current ACE-FTS data version (version 3.0) provides profiles of temperature and volume mixing ratios (VMRs) of more than 30 atmospheric trace gas species, as well as 20 subsidiary isotopologues of the most abundant trace atmospheric constituents, over a latitude range of ~85°N to ~85°S. This letter describes the current data version and recent validation comparisons and provides a description of our planned updates for the ACE-FTS data set. [...]

  2. Sound speed in the Mediterranean Sea: an analysis from a climatological data set

    Directory of Open Access Journals (Sweden)

    S. Salon

    2003-03-01

    This paper presents an analysis of sound speed distribution in the Mediterranean Sea based on climatological temperature and salinity data. In the upper layers, propagation is characterised by upward refraction in winter and an acoustic channel in summer. The seasonal cycle of the Mediterranean and the presence of gyres and fronts create a wide range of spatial and temporal variabilities, with relevant differences between the western and eastern basins. It is shown that the analysis of a climatological data set can help in defining regions suitable for successful monitoring by means of acoustic tomography. Empirical Orthogonal Function (EOF) decomposition of the profiles, performed on the seasonal cycle for some selected areas, demonstrates that two modes account for more than 98% of the variability of the climatological distribution. Reduced-order EOF analysis is able to correctly represent sound speed profiles within each zone, thus providing the a priori knowledge for Matched Field Tomography. It is also demonstrated that salinity can affect the tomographic inversion, creating a higher degree of complexity than in the open oceans. Key words: Oceanography: general (marginal and semi-enclosed seas); ocean acoustics
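
    A brief sketch of how the quoted explained-variance figure can be checked with an EOF (PCA) decomposition of the profile anomalies; the array shapes and function name are assumptions:

        import numpy as np

        def eof_variance(profiles, n_modes=2):
            """profiles: (n_samples, n_depths) sound-speed profiles. Returns the
            leading EOFs and the fraction of variance they jointly explain."""
            anomalies = profiles - profiles.mean(axis=0)
            # SVD of the anomaly matrix: rows of vt are the EOFs
            _, s, vt = np.linalg.svd(anomalies, full_matrices=False)
            explained = s**2 / np.sum(s**2)
            return vt[:n_modes], explained[:n_modes].sum()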

  3. Issues and Solutions for Bringing Heterogeneous Water Cycle Data Sets Together

    Science.gov (United States)

    Acker, James; Kempler, Steven; Teng, William; Belvedere, Deborah; Liu, Zhong; Leptoukh, Gregory

    2010-01-01

    The water cycle research community has generated many regional to global scale products using data from individual NASA missions or sensors (e.g., TRMM, AMSR-E); multiple ground- and space-based data sources (e.g., Global Precipitation Climatology Project [GPCP] products); and sophisticated data assimilation systems (e.g., Land Data Assimilation Systems [LDAS]). However, it is often difficult to access, explore, merge, analyze, and inter-compare these data in a coherent manner due to issues of data resolution, format, and structure. These difficulties were substantiated at the recent Collaborative Energy and Water Cycle Information Services (CEWIS) Workshop, where members of the NASA Energy and Water cycle Study (NEWS) community gave presentations, provided feedback, and developed scenarios which illustrated the difficulties and techniques for bringing together heterogeneous datasets. This presentation reports on the findings of the workshop, thus defining the problems and challenges of multi-dataset research. In addition, the CEWIS prototype shown at the workshop will be presented to illustrate new technologies that can mitigate data access roadblocks encountered in multi-dataset research, including: (1) Quick and easy search and access of selected NEWS data sets. (2) Multi-parameter data subsetting, manipulation, analysis, and display tools. (3) Access to input and derived water cycle data (data lineage). It is hoped that this presentation will encourage community discussion and feedback on heterogeneous data analysis scenarios, issues, and remedies.

  4. Multiple sclerosis rehabilitation outcomes: analysis of a national casemix data set from Australia.

    Science.gov (United States)

    Khan, F; Turner-Stokes, L; Stevermuer, T; Simmonds, F

    2009-07-01

    To examine the outcomes of inpatient rehabilitation for persons with multiple sclerosis (pwMS), using the Australian Rehabilitation Outcomes Centre (AROC) database. Deidentified data from the AROC database were analyzed for all rehabilitation admissions during 2003-2007, using four classes for functional level. The outcomes included Functional Independence Measure (FIM) scores and efficiency, hospital length of stay (LOS), and discharge destination. Of 1010 case episodes, 70% were women, most were admitted from home (n = 851) and discharged into the community (n = 890), and 97% (n = 986) were in the higher three classes for functional level (classes 216, 217, and 218). The majority of the more disabled pwMS were treated in the public hospital system, with a longer LOS compared with private facilities (P < 0.001). The FIM for classes 216-218 showed significant functional improvement during the admission (P < 0.001), and those in higher classes showed less change (likely due to higher FIM admission scores). FIM efficiency was significantly higher in class 217 than in the other classes (P < 0.001). The year-on-year trend was toward reducing hospital LOS and FIM efficiency, but these changes did not reach significance (P = 0.107, P = 0.634). The AROC data set is useful for describing rehabilitation outcomes for pwMS. However, additional information needs to be collected to evaluate the nature of the services provided and their implications.

  5. Expediting Combinatorial Data Set Analysis by Combining Human and Algorithmic Analysis.

    Science.gov (United States)

    Stein, Helge Sören; Jiao, Sally; Ludwig, Alfred

    2017-01-09

    A challenge in combinatorial materials science remains the efficient analysis of X-ray diffraction (XRD) data and its correlation to functional properties. Rapid identification of phase regions and proper assignment of the corresponding crystal structures are necessary to keep pace with improved methods for synthesizing and characterizing materials libraries. Therefore, a new modular software package called htAx (high-throughput analysis of X-ray and functional properties data) is presented that couples human intelligence tasks used for "ground-truth" phase-region identification with subsequent unbiased verification by an algorithm to efficiently analyze which phases are present in a materials library. Identified phases and phase regions may then be correlated to functional properties in an expedited manner. To prove the functionality of htAx, two previously published XRD benchmark data sets of the materials systems Al-Cr-Fe-O and Ni-Ti-Cu are analyzed with htAx. The analysis of ∼1000 XRD patterns takes less than 1 day with htAx. The proposed method reliably identifies phase-region boundaries and robustly identifies multiphase structures. The method also addresses the problem of identifying regions with previously unpublished crystal structures using a special daisy ternary plot.

  6. Musculoskeletal Simulation Model Generation from MRI Data Sets and Motion Capture Data

    Science.gov (United States)

    Schmid, Jérôme; Sandholm, Anders; Chung, François; Thalmann, Daniel; Delingette, Hervé; Magnenat-Thalmann, Nadia

    Today, computer models and simulations of the musculoskeletal system are widely used to study the mechanisms behind human gait and its disorders. The common way of creating musculoskeletal models is to use a generic model based on data derived from anatomical and biomechanical studies of cadaveric specimens. To adapt this generic model to a specific subject, the usual approach is to scale it. This scaling has been reported to introduce several errors because it does not always account for subject-specific anatomical differences. As a result, a novel semi-automatic workflow is proposed that creates subject-specific musculoskeletal models from magnetic resonance imaging (MRI) data sets and motion capture data. Based on subject-specific medical data and a model-based automatic segmentation approach, an accurate model of the anatomy can be produced while avoiding the scaling operation. This anatomical model, coupled with motion capture data, joint kinematics information, and muscle-tendon actuators, is finally used to create a subject-specific musculoskeletal model.

  7. Hierarchical ordering with partial pairwise hierarchical relationships on the macaque brain data sets.

    Directory of Open Access Journals (Sweden)

    Woosang Lim

    Hierarchical organizations of information processing in brain networks are known to exist and have been widely studied. To find proper hierarchical structures in the macaque brain, traditional methods need the entire set of pairwise hierarchical relationships between cortical areas. In this paper, we present a new method that discovers hierarchical structures of macaque brain networks using only partial information about pairwise hierarchical relationships. Our method uses graph-based manifold learning to exploit the inherent relationships and computes pseudo-distances of hierarchical levels for every pair of cortical areas. It then computes the hierarchy levels of all cortical areas by minimizing the sum of squared hierarchical distance errors, given the hierarchical information of a few cortical areas. We evaluate our method on macaque brain data sets whose true hierarchical levels are known from the FV91 model. The experimental results show that the hierarchy levels computed by our method are similar to those of the FV91 model, and its errors are much smaller than those of hierarchical clustering approaches.
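
    The level-estimation step can be read as an ordinary least-squares problem; a minimal sketch under that reading (the encoding of constraints and anchors is illustrative, not the authors' code):

        import numpy as np

        def fit_hierarchy_levels(n, constraints, anchors):
            """n: number of cortical areas; constraints: (i, j, d) triples meaning
            level[i] - level[j] ≈ d (the pseudo hierarchical distances);
            anchors: (k, v) pairs fixing the few known levels.
            Minimizes the sum of squared hierarchical distance errors."""
            rows, rhs = [], []
            for i, j, d in constraints:
                r = np.zeros(n); r[i], r[j] = 1.0, -1.0
                rows.append(r); rhs.append(d)
            for k, v in anchors:
                r = np.zeros(n); r[k] = 1.0
                rows.append(r); rhs.append(v)
            levels, *_ = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)
            return levels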

  8. ObspyDMT: a Python toolbox for retrieving and processing large seismological data sets

    Science.gov (United States)

    Hosseini, Kasra; Sigloch, Karin

    2017-10-01

    We present obspyDMT, a free, open-source software toolbox for the query, retrieval, processing and management of seismological data sets, including very large, heterogeneous and/or dynamically growing ones. ObspyDMT simplifies and speeds up user interaction with data centers, in more versatile ways than existing tools. The user is shielded from the complexities of interacting with different data centers and data exchange protocols and is provided with powerful diagnostic and plotting tools to check the retrieved data and metadata. While primarily a productivity tool for research seismologists and observatories, easy-to-use syntax and plotting functionality also make obspyDMT an effective teaching aid. Written in the Python programming language, it can be used as a stand-alone command-line tool (requiring no knowledge of Python) or can be integrated as a module with other Python codes. It facilitates data archiving, preprocessing, instrument correction and quality control - routine but nontrivial tasks that can consume much user time. We describe obspyDMT's functionality, design and technical implementation, accompanied by an overview of its use cases. As an example of a typical problem encountered in seismogram preprocessing, we show how to check for inconsistencies in response files of two example stations. We also demonstrate the fully automated request, remote computation and retrieval of synthetic seismograms from the Synthetics Engine (Syngine) web service of the Data Management Center (DMC) at the Incorporated Research Institutions for Seismology (IRIS).

  9. Multivariate modeling of complications with data driven variable selection: Guarding against overfitting and effects of data set size

    International Nuclear Information System (INIS)

    Schaaf, Arjen van der; Xu Chengjian; Luijk, Peter van; Veld, Aart A. van’t; Langendijk, Johannes A.; Schilstra, Cornelis

    2012-01-01

    Purpose: Multivariate modeling of complications after radiotherapy is frequently used in conjunction with data-driven variable selection. This study quantifies the risk of overfitting in a data-driven modeling method using bootstrapping for data with typical clinical characteristics, and estimates the minimum amount of data needed to obtain models with relatively high predictive power. Materials and methods: To facilitate repeated modeling and cross-validation with independent data sets for the assessment of true predictive power, a method was developed to generate simulated data with statistical properties similar to real clinical data sets. Characteristics of three clinical data sets from radiotherapy treatment of head and neck cancer patients were used to simulate data with set sizes between 50 and 1000 patients. A logistic regression method using bootstrapping and forward variable selection was used for complication modeling, resulting for each simulated data set in a selected number of variables and an estimated predictive power. The true optimal number of variables and true predictive power were calculated using cross-validation with very large independent data sets. Results: For all simulated data set sizes, the number of variables selected by the bootstrapping method was on average close to the true optimal number of variables, but showed considerable spread. Bootstrapping is more accurate in selecting the optimal number of variables than the AIC and BIC alternatives, but this did not translate into a significant difference in true predictive power. The true predictive power asymptotically converged toward a maximum for large data sets, and the estimated predictive power converged toward the true predictive power. More than half of the potential predictive power is gained after approximately 200 samples. Our simulations demonstrated severe overfitting (a predictive power lower than that of predicting 50% probability) in a number of small data sets.
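
    A schematic sketch of bootstrapped forward variable selection for logistic regression, in the spirit of the method described; the scoring metric, stopping rule, and selection-frequency summary are assumptions, not the paper's exact procedure:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import cross_val_score

        def bootstrap_forward_selection(X, y, n_boot=100, max_vars=10):
            """Returns, per variable, the fraction of bootstrap resamples in
            which greedy forward selection includes it in the model."""
            n, p = X.shape
            counts = np.zeros(p)
            rng = np.random.default_rng(0)
            for _ in range(n_boot):
                idx = rng.integers(0, n, n)            # bootstrap resample
                Xb, yb = X[idx], y[idx]
                chosen, best = [], -np.inf
                while len(chosen) < max_vars:
                    scores = {}
                    for j in range(p):
                        if j in chosen:
                            continue
                        cols = chosen + [j]
                        scores[j] = cross_val_score(
                            LogisticRegression(max_iter=1000),
                            Xb[:, cols], yb, cv=3, scoring="roc_auc").mean()
                    if not scores:
                        break
                    jbest, sbest = max(scores.items(), key=lambda kv: kv[1])
                    if sbest <= best:                  # no improvement: stop
                        break
                    chosen.append(jbest); best = sbest
                counts[chosen] += 1
            return counts / n_boot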

  10. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates

    DEFF Research Database (Denmark)

    Schwämmle, Veit; León, Ileana R.; Jensen, Ole Nørregaard

    2013-01-01

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry-driven proteomics experiments are frequently performed with few biological or technical replicates due to sample scarcity, duty-cycle or sensitivity constraints, or the limited capacity of the available instrumentation, leading to incomplete results where the detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant ... as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved ...
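
    For contrast with the improved ranking the paper proposes, the baseline it improves upon can be sketched as a plain per-feature Welch t-test that simply skips features with too many missing values; the names and minimum-replicate rule are illustrative:

        import numpy as np
        from scipy import stats

        def test_features(group_a, group_b, min_reps=2):
            """group_a, group_b: (n_features, n_replicates) intensity arrays with
            NaN for missing values. Returns one p-value per feature, NaN where
            the data are too sparse to test at all."""
            pvals = np.full(group_a.shape[0], np.nan)
            for i in range(group_a.shape[0]):
                a = group_a[i][~np.isnan(group_a[i])]
                b = group_b[i][~np.isnan(group_b[i])]
                if len(a) >= min_reps and len(b) >= min_reps:
                    pvals[i] = stats.ttest_ind(a, b, equal_var=False).pvalue
            return pvals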

  11. Reliability and validity of the International Spinal Cord Injury Basic Pain Data Set items as self-report measures

    DEFF Research Database (Denmark)

    Jensen, M P; Widerström-Noga, E; Richards, J S

    2010-01-01

    To evaluate the psychometric properties of a subset of International Spinal Cord Injury Basic Pain Data Set (ISCIBPDS) items that could be used as self-report measures in surveys, longitudinal studies and clinical trials....

  12. Impact of sample size on principal component analysis ordination of an environmental data set: effects on eigenstructure

    Directory of Open Access Journals (Sweden)

    Shaukat S. Shahid

    2016-06-01

    In this study, we used bootstrap simulation of a real data set to investigate the impact of sample size (N = 20, 30, 40 and 50) on the eigenvalues and eigenvectors resulting from principal component analysis (PCA). For each sample size, 100 bootstrap samples were drawn from an environmental data matrix of water quality variables (p = 22) from a small data set comprising 55 samples (stations) at which water samples were collected. Because data sets in ecology and the environmental sciences are invariably small, owing to the high cost of collection and analysis of samples, we restricted our study to relatively small sample sizes. We focused attention on comparison of the first 6 eigenvectors and the first 10 eigenvalues. Data sets were compared using agglomerative cluster analysis with Ward's method, which does not require any stringent distributional assumptions.
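
    A compact sketch of the bootstrap procedure described, assuming the variables are standardized through the correlation matrix; the seeding and output layout are illustrative:

        import numpy as np

        def bootstrap_eigenvalues(X, n_boot=100, seed=0):
            """X: (n_samples, n_variables) data matrix. Returns an array of shape
            (n_boot, n_variables) of correlation-matrix eigenvalues, sorted in
            descending order, one row per bootstrap sample."""
            rng = np.random.default_rng(seed)
            n = X.shape[0]
            out = []
            for _ in range(n_boot):
                Xb = X[rng.integers(0, n, n)]               # resample stations
                corr = np.corrcoef(Xb, rowvar=False)
                out.append(np.sort(np.linalg.eigvalsh(corr))[::-1])
            return np.array(out)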

  13. Performance Evaluation and Community Application of Low-Cost Sensors for Ozone and Nitrogen Dioxide Data Set

    Data.gov (United States)

    U.S. Environmental Protection Agency — Data set contains data collected during the DISCOVER-AQ Mission that support the journal article results. This dataset is associated with the following publication:...

  14. A spatially comprehensive, meteorological data set for Mexico, the U.S., and southern Canada (NCEI Accession 0129374)

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — A data set of observed daily and monthly averaged precipitation, maximum and minimum temperature, gridded to a 1/16° (~6km) resolution that spans the entire country...

  15. Anthropogenic Sulfur Dioxide Emissions, 1850-2005: National and Regional Data Set by Source Category, Version 2.86

    Data.gov (United States)

    National Aeronautics and Space Administration — The Anthropogenic Sulfur Dioxide Emissions, 1850-2005: National and Regional Data Set by Source Category, Version 2.86 provides annual estimates of anthropogenic...

  16. Tropospheric Aerosol Radiative Forcing Observational eXperiment - University of Washington instrumented C-131A aircraft Data Set

    Data.gov (United States)

    National Aeronautics and Space Administration — TARFOX_UWC131A is the Tropospheric Aerosol Radiative Forcing Observational eXperiment (TARFOX) - University of Washington instrumented C-131A aircraft data set. The...

  17. Analyzing Planck and low redshift data sets with advanced statistical methods

    Science.gov (United States)

    Eifler, Tim

    The recent ESA/NASA Planck mission has provided a key data set for constraining cosmology that is most sensitive to the physics of the early Universe, such as inflation and primordial non-Gaussianity (Planck 2015 results XIII). In combination with cosmological probes of the Large-Scale Structure (LSS), the Planck data set is a powerful source of information for investigating late-time phenomena (Planck 2015 results XIV), e.g. the accelerated expansion of the Universe, the impact of baryonic physics on the growth of structure, and the alignment of galaxies in their dark matter halos. The main objective of this proposal is to re-analyze the archival Planck data, 1) with different, more recently developed statistical methods for cosmological parameter inference, and 2) by combining Planck and ground-based observations in an innovative way. We will make the corresponding analysis framework publicly available and believe that it will set a new standard for future CMB-LSS analyses. Advanced statistical methods, such as the Gibbs sampler (Jewell et al. 2004, Wandelt et al. 2004), have been critical in the analysis of Planck data. More recently, Approximate Bayesian Computation (ABC; see Weyant et al. 2012, Akeret et al. 2015, Ishida et al. 2015, for cosmological applications) has matured into an interesting tool for cosmological likelihood analyses. It circumvents several assumptions that enter the standard Planck (and most LSS) likelihood analyses, most importantly the assumption that the functional form of the likelihood of the CMB observables is a multivariate Gaussian. Beyond applying new statistical methods to Planck data in order to cross-check and validate existing constraints, we plan to combine Planck and DES data in a new and innovative way and run multi-probe likelihood analyses of CMB and LSS observables. The complexity of multi-probe likelihood analyses scales (non-linearly) with the level of correlations amongst the individual probes that are included. [...]
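
    For readers unfamiliar with ABC, a bare-bones rejection sampler showing the idea the proposal invokes; the distance metric, tolerance, and function hooks are placeholders, not the proposed analysis framework:

        import numpy as np

        def abc_rejection(observed_summary, prior_sampler, simulator, summary,
                          n_draws=10000, eps=0.1):
            """Keep parameter draws whose simulated summary statistics fall
            within eps of the observed ones; the accepted draws approximate
            the posterior without ever evaluating a likelihood."""
            accepted = []
            for _ in range(n_draws):
                theta = prior_sampler()
                s = summary(simulator(theta))
                if np.linalg.norm(np.asarray(s) - observed_summary) < eps:
                    accepted.append(theta)
            return np.array(accepted)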

  18. HOMPRA Europe - A gridded precipitation data set from European homogenized time series

    Science.gov (United States)

    Rustemeier, Elke; Kapala, Alice; Meyer-Christoffer, Anja; Finger, Peter; Schneider, Udo; Venema, Victor; Ziese, Markus; Simmer, Clemens; Becker, Andreas

    2017-04-01

    Reliable monitoring data are essential for robust analyses of climate variability and, in particular, long-term trends. In this regard, a gridded, homogenized data set of monthly precipitation totals - HOMPRA Europe (HOMogenized PRecipitation Analysis of European in-situ data) - is presented. The database consists of 5373 homogenized monthly time series, a carefully selected subset held by the Global Precipitation Climatology Centre (GPCC). The chosen series cover the period 1951-2005 and contain less than 10% missing values. Due to the large number of series, an automatic algorithm had to be developed for their homogenization. In principle, the algorithm is based on three steps: * Selection of overlapping station networks in the same precipitation regime, based on rank correlation and Ward's method of minimal variance. Since the underlying time series should be as homogeneous as possible, the station selection is carried out using the deterministic first derivative in order to reduce artificial influences. * The natural variability and trends are temporally removed by means of highly correlated neighboring time series in order to detect artificial break-points in the annual totals. This ensures that only artificial changes can be detected. The detection method is based on the algorithm of Caussinus and Mestre (2004). * In the last step, the detected breaks are corrected monthly by means of a multiple linear regression (Mestre, 2003). Because the homogenization is automated, validation of the algorithm is essential; the method was therefore tested on artificial data sets. Additionally, the sensitivity of the method was tested by varying the neighborhood series. If available in digitized form, the station history was also used to search for systematic errors in the jump detection. Finally, the actual HOMPRA Europe product is produced by interpolating the homogenized series onto a 1° grid using one of the interpolation schemes used operationally at GPCC.
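
    A much-simplified, single-break illustration of the detection step; the operational algorithm uses the penalized multiple-break method of Caussinus and Mestre (2004), and the test statistic and minimum segment length here are ad hoc:

        import numpy as np

        def detect_break(candidate, reference):
            """Scan a candidate annual series against a highly correlated
            neighbor composite; subtracting the composite removes the shared
            climate signal, so a jump in the difference series suggests an
            artificial break. Returns the most contrasting year and its score."""
            diff = candidate - reference
            n = len(diff)
            best_k, best_stat = None, 0.0
            for k in range(5, n - 5):          # require a few years per segment
                m1, m2 = diff[:k].mean(), diff[k:].mean()
                s = np.sqrt(diff.var(ddof=1) * (1.0 / k + 1.0 / (n - k)))
                stat = abs(m1 - m2) / s
                if stat > best_stat:
                    best_k, best_stat = k, stat
            return best_k, best_stat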

  19. Atmospheric correction at AERONET locations: A new science and validation data set

    Science.gov (United States)

    Wang, Y.; Lyapustin, A.I.; Privette, J.L.; Morisette, J.T.; Holben, B.

    2009-01-01

    This paper describes an Aerosol Robotic Network (AERONET)-based Surface Reflectance Validation Network (ASRVN) and its data set of spectral surface bidirectional reflectance and albedo based on Moderate Resolution Imaging Spectroradiometer (MODIS) TERRA and AQUA data. The ASRVN is an operational data collection and processing system. It receives 50 × 50 km² subsets of MODIS level 1B (L1B) data from the MODIS adaptive processing system along with AERONET aerosol and water-vapor information. It then performs an atmospheric correction (AC) for about 100 AERONET sites based on accurate radiative-transfer theory, with complex quality control of the input data. The ASRVN processing software consists of an L1B data gridding algorithm, a new cloud-mask (CM) algorithm based on time-series analysis, and an AC algorithm using ancillary AERONET aerosol and water-vapor data. The AC is achieved by fitting the MODIS top-of-atmosphere measurements, accumulated over a 16-day interval, with a theoretical reflectance parameterized in terms of the coefficients of the Li Sparse-Ross Thick (LSRT) model of the bidirectional reflectance factor (BRF). The ASRVN takes several steps to ensure high quality of results: 1) filtering of opaque clouds by the CM algorithm; 2) an aerosol filter to remove residual semitransparent and subpixel clouds, as well as cases with high inhomogeneity of aerosols in the processing area; 3) a requirement that the new solution be consistent with the previously retrieved BRF and albedo; 4) rapid adjustment of the 16-day retrieval to surface changes using the last day of measurements; and 5) a seasonal backup spectral BRF database to increase data coverage. The ASRVN provides gapless or near-gapless coverage for the processing area; the gaps caused by clouds are filled most naturally with the latest solution for a given pixel. The ASRVN products include three parameters of the LSRT model (kL, kG, and kV), surface albedo [...]
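
    The kernel-driven BRF form that such a fit assumes can be written compactly; a sketch in the usual Ross-Li notation (the paper's exact kernel definitions may differ):

        \mathrm{BRF}(\theta_s, \theta_v, \phi) = k_L + k_G\, f_{\mathrm{geo}}(\theta_s, \theta_v, \phi) + k_V\, f_{\mathrm{vol}}(\theta_s, \theta_v, \phi)

    Here k_L is the isotropic (Lambertian) term, f_geo and f_vol are fixed geometric-optics and volumetric scattering kernels of the sun and view angles, and the coefficients (kL, kG, kV) follow from a linear least-squares fit to the accumulated 16-day measurements.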

  20. Nursing Minimum Data Sets for documenting nutritional care for adults in primary healthcare: a scoping review.

    Science.gov (United States)

    Håkonsen, Sasja Jul; Pedersen, Preben Ulrich; Bjerrum, Merete; Bygholm, Ann; Peters, Micah D J

    2018-01-01

    To identify all published nutritional screening instruments that have been validated in the adult population in primary healthcare settings and to report on their psychometric validity. Within health care, there is an urgent need for the systematic collection of nursing care data in order to make visible what nurses do and to facilitate comparison, quality assurance, management, research and funding of nursing care. To be effective, nursing records should accurately and comprehensively document all information required to support safe and high-quality care of patients. However, this process of documentation has been criticized from many perspectives as being highly inadequate. A Nursing Minimum Data Set covering the nutritional area in primary health care could therefore be beneficial in supporting nurses in their daily documentation and observation of patients. The review considered studies that included adults aged over 18 years of any gender, culture, diagnosis and ethnicity, as well as nutritional experts, patients and their relatives. The concepts of interest were: the nature and content of any nutritional screening tools validated (regardless of the type of validation) in the adult population in primary healthcare; and the views and opinions of eligible participants regarding the appropriateness of nutritional assessment. Included studies must have been conducted in primary healthcare settings, within both home care and nursing home facilities. This scoping review used a two-step approach as a preliminary step to the subsequent development of a Nursing Minimum Data Set within the nutritional area in primary healthcare: i) a systematic literature search of existing nutritional screening tools validated in primary health care; and ii) a systematic literature search on nutritional experts' opinions on the assessment of nutritional nursing care of adults in primary healthcare, as well as the views of patients and their relatives.